I’ve been playing with lazy sequences in the context of this post (a reimplementation of the *Power Series, Power Serious* paper). The performance problems I alluded to there were apparently due to nested/excessive memoization (via calls to `Seq.memoize`). I was doing this pessimistically in an attempt to recreate Haskell’s list behavior in OCaml. The ultimate fix I landed on was to implement all base operations from the ground up using non-memoized `Seq.t`s, and then expose a wrapper at the next level that simply wraps all of the inner `Seq.t`s in a call to `Seq.memoize`.
So, for example, here’s the base interface (`Series_trunc.mli`):
```ocaml
type t = Q.t Seq.t

val coeffs : t -> Q.t Seq.t
val nth : int -> t -> Q.t

val zero : t
val one : t
val z : t
val zpow : int -> t

val neg : t -> t
val ( + ) : t -> t -> t
val ( - ) : t -> t -> t
val ( *. ) : Q.t -> t -> t
val ( * ) : t -> t -> t
val ( / ) : t -> t -> t
val compose : t -> t -> t
```
And here’s the memoized wrapper (`Series_memo.ml`):
```ocaml
type t = Q.t Seq.t

let coeffs fs = fs

(* NOTE: We explicitly do NOT memoize anything we know to have O(1) terms. *)
let nth = Series_trunc.nth
let zero = Series_trunc.zero
let one = Series_trunc.one
let z = Series_trunc.z

(* TODO: Consider memoizing iff n exceeds some threshold. *)
let zpow n = Series_trunc.zpow n |> Seq.memoize

let neg t = Series_trunc.neg t |> Seq.memoize
let ( + ) fs gs = Series_trunc.(fs + gs) |> Seq.memoize
let ( - ) fs gs = Series_trunc.(fs - gs) |> Seq.memoize
let ( *. ) c fs = Series_trunc.(c *. fs) |> Seq.memoize
let ( * ) fs gs = Series_trunc.(fs * gs) |> Seq.memoize
let ( / ) fs gs = Series_trunc.(fs / gs) |> Seq.memoize
let compose fs gs = Series_trunc.compose fs gs |> Seq.memoize
```
You can see the rest of the implementation here.
Is there a better way of doing this? In general, what are the pitfalls to avoid when working with memoized `Seq`s? I’m not sure exactly what was going on in my original implementation (I no longer have that code and went through a lot of iterations), but I suspect the problem was that I was double-memoizing many internal `Seq`s, causing extra memory pressure. Based on empirical performance, the top-level memoization is clearly better, but it’s not obvious whether it’s the best I can do. It’s nearly certain that some memoization is happening where it isn’t needed, and also that some recursive operations in the base implementation aren’t memoized but should be.
The fundamental issue is that there is no memoization by default (unlike Haskell lists); only explicitly memoized sequences retain their elements. It looks like under the hood, `Seq.memoize` gets translated into a call to `Lazy.from_fun`. Even if that function explicitly avoided double-wrapping lazy results, I suspect that any level of indirection would prevent nested lazy cells from being collapsed.
Can anybody point me to reliable patterns for avoiding this issue with lazy sequences, or is the general advice simply to avoid lazy/memoized values altogether? (That’s the gist I got from the other thread.)
Beyond memoization per se, I’m not happy with how I ended up “manually” constructing the recursive sequence parts with helper functions (e.g., see here). I suspect there is a more idiomatic way to do this using the various `Seq` combinators, but I couldn’t figure out which one, if any. In this case, I’m essentially popping a single element off the head of each of two sequences and then constructing the tail as a recursive expression, possibly involving the heads and tails of both. This is more general than a simple operation like `map2`, and in some cases it also requires extending the result beyond the length of the shorter of the two `Seq`s.
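For reference, here is a minimal sketch of the “pop the heads, recurse on the tails” pattern I mean, written as pointwise addition of two coefficient sequences (using `int` rather than `Q.t` to keep it self-contained). Unlike `Seq.map2`, it keeps emitting the remainder of the longer sequence after the shorter one runs out:

```ocaml
(* Pointwise addition that pads past the end of the shorter sequence:
   once one side is Nil, the other side is returned as-is. *)
let rec add (fs : int Seq.t) (gs : int Seq.t) : int Seq.t =
 fun () ->
  match fs (), gs () with
  | Seq.Nil, g -> g
  | f, Seq.Nil -> f
  | Seq.Cons (f, fs'), Seq.Cons (g, gs') ->
      Seq.Cons (f + g, add fs' gs')

let () =
  let xs = List.to_seq [ 1; 2; 3 ] in
  let ys = List.to_seq [ 10; 20 ] in
  assert (List.of_seq (add xs ys) = [ 11; 22; 3 ])
```

Writing the function as `fun () -> match fs (), gs () with ...` keeps it lazy: neither input is forced until the result node itself is demanded.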
(Implementation note: I ended up implementing this differently from the original Haskell version for performance reasons. Many operations end up exponential in the number of sequences being combined, so you can dramatically speed up computation by truncating sequences at some upper bound; this lets you propagate zeros indefinitely past that point and, in many cases, terminate early. It adds a bit of complexity to the implementation but yields big wins in runtime.)
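A minimal sketch of that truncation idea (the name `truncate_at` is hypothetical, and `int` stands in for `Q.t`): cut the sequence off at the bound and append an infinite tail of zeros, so downstream operations see a well-defined zero past that point and can short-circuit on it:

```ocaml
(* Keep at most [bound] coefficients, then pad with zeros forever. *)
let truncate_at bound (fs : int Seq.t) : int Seq.t =
  Seq.append (Seq.take bound fs) (Seq.repeat 0)

let () =
  let fs = List.to_seq [ 1; 2; 3; 4 ] in
  assert (List.of_seq (Seq.take 5 (truncate_at 3 fs)) = [ 1; 2; 3; 0; 0 ])
```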