Laziest encoding of map_product for Seq.t

Hey, all! Somewhat of a quick question: Recently, I’ve been trying to optimise the memory usage of a program of mine that uses lists by swapping to Seq.t and switching from eager to lazy computation.

One thing I’ve found missing from the stdlib and from the stdlib replacement I’m currently using (Containers) is a generalised Cartesian product for Seqs: a function map_product : ('a -> 'b Seq.t) -> 'a Seq.t -> 'b list Seq.t.

Naturally, it’s not too hard to write your own version, but I’ve been running into difficulties working out what would be the best encoding for this function to ensure that the computation is as lazy as possible.

My initial encoding was just a straightforward translation of the map_product function for lists provided by Containers:

let seq_map_product_l (type a b) (f : a -> b Seq.t) (l : a Seq.t) : b list Seq.t =
  let rec prod_rec (left : b list) (right : a Seq.t) (acc : b list Seq.t) () =
    match right () with
    | Seq.Nil -> Seq.Cons (List.rev left, acc)
    | Seq.Cons (x, tail) ->
      let (choices : b Seq.t) = f x in
      Seq.fold_left (fun acc y -> prod_rec (y :: left) tail acc) acc choices ()
  in
  prod_rec [] l Seq.empty

But the problem turns out to be slightly more subtle than I expected, and this naive translation ends up being too eager. In particular, Seq.fold_left is eager: it fully traverses the sequence it is passed. So to get even a single element out of this combinator, you effectively have to compute all of the sequences produced by every invocation of f.
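To make the eagerness concrete, here's a small standalone sketch (the `forced` ref and `tap` helper are just illustrative tracing devices of mine, not from any library): even though Seq.map is lazy, a single Seq.fold_left traverses the whole sequence up front.

```ocaml
(* Record which elements have been forced. [forced] and [tap] are
   illustrative helpers, not part of any library. *)
let forced = ref []

let tap (s : int Seq.t) : int Seq.t =
  Seq.map (fun x -> forced := x :: !forced; x) s

(* Nothing is forced yet: Seq.map is lazy. *)
let nums = tap (List.to_seq [1; 2; 3])

(* Seq.fold_left traverses the whole sequence immediately... *)
let _sum = Seq.fold_left ( + ) 0 nums

(* ...so all three elements have now been forced. *)
let () = assert (List.length !forced = 3)
```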

After running into this, I ended up switching to the following implementation, which seems to fare a lot better:

let seq_map_product_l (type a b) (f : a -> b Seq.t) (l : a Seq.t) : b list Seq.t =
  let rec prod_rec (left : b list) (right : a Seq.t) () =
    match right () with
    | Seq.Nil -> Seq.Cons (List.rev left, Seq.empty)
    | Seq.Cons (x, tail) ->
      let (choices : b Seq.t) = f x in
      Seq.flat_map (fun y -> prod_rec (y :: left) tail) choices ()
  in
  prod_rec [] l
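As a quick sanity check of the laziness, here is a self-contained demo (repeating the implementation above, plus a hypothetical `nats` generator of my own): even when every inner sequence is infinite, the first tuple comes out in finite time.

```ocaml
let seq_map_product_l (type a b) (f : a -> b Seq.t) (l : a Seq.t) : b list Seq.t =
  let rec prod_rec (left : b list) (right : a Seq.t) () =
    match right () with
    | Seq.Nil -> Seq.Cons (List.rev left, Seq.empty)
    | Seq.Cons (x, tail) ->
      Seq.flat_map (fun y -> prod_rec (y :: left) tail) (f x) ()
  in
  prod_rec [] l

(* Infinite sequence of naturals starting at [n] (illustrative helper). *)
let rec nats n () = Seq.Cons (n, nats (n + 1))

(* Each of the two factors is infinite, yet forcing the head of the
   product only inspects the first element of each inner sequence. *)
let first =
  match seq_map_product_l (fun () -> nats 0) (List.to_seq [(); ()]) () with
  | Seq.Cons (tuple, _) -> tuple
  | Seq.Nil -> assert false
(* first = [0; 0] *)
```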

Anyway, I was wondering whether this updated implementation is optimal, or whether there are any further changes I should make (excluding things like memoisation, which I don’t need for my use case).

You might want to compare with the OSeq.map_product_l implementation, which produces values in a different order (it looks a bit more “fair” thanks to interleave, but which one is better depends on your use case!)
Also, the outer 'a Seq.t has to be forced from the get-go, so perhaps it’s easier to reason about an implementation with the type 'a Seq.t list -> 'a list Seq.t?
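For what it’s worth, here is a sketch of what that signature could look like (my own attempt, not a library function): the product of a list of sequences, built directly with Seq.flat_map.

```ocaml
(* Cartesian product of a list of sequences. Illustrative sketch:
   list_product is my own name, not a stdlib/Containers function. *)
let rec list_product (seqs : 'a Seq.t list) : 'a list Seq.t =
  match seqs with
  | [] -> Seq.return []                      (* one empty tuple *)
  | s :: rest ->
    (* For each head element x, prepend it to every tuple of the rest.
       flat_map keeps everything lazy: nothing is forced until the
       result is consumed. *)
    Seq.flat_map (fun x -> Seq.map (fun xs -> x :: xs) (list_product rest)) s

(* Usage: *)
let pairs = List.of_seq (list_product [List.to_seq [1; 2]; List.to_seq [10; 20]])
(* pairs = [[1; 10]; [1; 20]; [2; 10]; [2; 20]] *)
```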


Ah, nice, thanks for the pointer. I should definitely have a read through OSeq.

Yes, I had considered the issue of fairness w.r.t. the order of outputs. For now, though, since the original list implementation wasn’t doing any interleaving, I wanted to preserve the existing functional behaviour and experiment purely with the evaluation strategy (which actually turned out to be quite fruitful, taking my peak memory usage from 10GB down to just 200MB).

Yes, good point - I had considered taking a list as input, and maybe I should. My initial implementation operates over Seq.ts simply to keep a consistent interface across my utility functions: most of the sequences being passed around are already Seq.t, so switching the interface to list would mean adding conversion operators throughout my codebase.

Ha yes, that’s a good exercise! I mentioned interleave because it highlights an interesting difference between programming with List and with Seq: we have to take into account that sequences can be infinite lazy lists (see, for example, the LogicT paper for motivation).
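To illustrate why fair interleaving matters with possibly-infinite sequences, here is a minimal sketch of an alternating interleave (my own version, not OSeq’s actual implementation; `nats` is a hypothetical helper, and Seq.take needs OCaml >= 4.14):

```ocaml
(* Fair interleave of two (possibly infinite) sequences: take one
   element from [a], then swap the roles of [a] and [b]. *)
let rec interleave (a : 'a Seq.t) (b : 'a Seq.t) () =
  match a () with
  | Seq.Nil -> b ()
  | Seq.Cons (x, a') -> Seq.Cons (x, interleave b a')

(* Infinite sequence of naturals starting at [n]. *)
let rec nats n () = Seq.Cons (n, nats (n + 1))

(* Both inputs are infinite, yet every element eventually appears: *)
let first6 =
  List.of_seq
    (Seq.take 6 (interleave (nats 0) (Seq.map (fun n -> -n) (nats 1))))
(* first6 = [0; -1; 1; -2; 2; -3] *)
```

A naive concatenation-based product would get stuck enumerating the first infinite sequence forever; interleaving is what makes the "fair" enumeration order possible.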

If you are looking to reduce the work the GC has to do with temporaries (I’m assuming that is the memory optimisation you are referring to), then try looking at iterators too: they have somewhat less overhead than sequences, depending on the situation (the stdlib doesn’t provide composable iterators, but a few libraries on opam do), especially if you combine them with flambda.

I haven’t done my own benchmarks to confirm, but GitHub - rizo/streams-bench: Benchmarks for different streaming models: iterators, generators, sequences, transducers, etc. contains quite thorough measurements of the various sequence/iterator implementations (with flambda, YMMV).
There is a product implementation that you can compare with here too, Stream (streaming.Streaming.Stream) (and you can compare the eagerness/laziness of that implementation with yours).
