Seq vs List, optimization

@pveber flambda would definitely be an improvement, but shouldn't Seq allocate more than lists in any case?
@SkySkimmer, I think the order is correct, I’m using it like the pipe operator

I mean, shouldn't it be the non-rev version? It preserves the order, AFAIK.
Of course that means you need to decide what tailrec to use :wink:

Ah sorry. Yes, the results of both tests are reversed. If you reduce size to something it can work with, like 200000, and use List.map instead of List.rev_map, the results are the same: seq uses 1.48GB, list uses 450M.

Update about flambda:
Using the example with List.map instead of List.rev_map, size = 200000, and flambda, the results are not very different:

  • 1.36GB for seq (1.48GB without flambda)
  • 435M for list (450M without flambda)

Memory usage for a lazy data type like Seq.t shouldn’t be affected by flambda; I think what’s happening here is that you’re loading the entire seq into memory immediately because you’re building it from a list. And a fully-loaded seq actually takes more memory than its equivalent list.

EDIT: when you build the seq you need to use Seq.unfold, not List.init.
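To illustrate the difference, here is a minimal sketch using only stdlib functions (this is not the benchmark code from the thread):

```ocaml
(* Building a seq from a list materializes every element up front: *)
let eager : int Seq.t = List.to_seq (List.init 1_000_000 (fun i -> i))

(* Seq.unfold (OCaml >= 4.11) produces elements on demand; nothing is
   materialized until the consumer forces the next cell: *)
let lazy_seq : int Seq.t =
  Seq.unfold (fun i -> if i < 1_000_000 then Some (i, i + 1) else None) 0
```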

Hmmm… I understand that Seq.unfold would definitely be more efficient than List.init. However, in both cases the list is fully allocated to begin with; after that, I would expect the intermediate steps not to be reallocated.
I’m in the exact case described in the article… so if that is true, I’m guessing the article is also wrong?

I’m too lazy to work out the details, but maybe if you do, it won’t be so surprising. In lists you just have one cons cell per list element; in seq you get a cons cell plus a closure, so you can expect more to be allocated when processing with Seq.
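The stdlib definition makes the extra allocation visible: every Cons carries a closure for the tail, on top of the cell itself. (Types renamed here to avoid clashing with Stdlib.Seq.)

```ocaml
(* The shape of the stdlib Seq type: *)
type 'a seq = unit -> 'a node
and 'a node = Nil | Cons of 'a * 'a seq

(* Each element costs a Cons block *plus* the closure producing the
   next node, versus a single cons cell (x :: rest) for a list. *)
```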

1 Like

Ok, yeah, I can definitely see that being true… But then what’s the point of Seq? There’s three times the allocation, so it uses more memory and probably also takes more time (although in my example it seems to take more or less the same time)…

1 Like

Don’t ask me I was against its inclusion (but for other reasons).

More seriously, the point versus lists is that if your data is streamable, the whole program run may allocate more, but at any given point in time your program may have a smaller memory footprint, since not all the data has to be in memory.


Ahhh, then that makes it clear :slightly_smiling_face:
Thank you very much for your answers !

Alternatively, you could avoid doing roundtrips to lists and write your code as you naturally would:

module WITH_SEQ = struct
  let init n f = OSeq.init ~n f (* taken from OSeq because I'm lazy; the code is trivial *)
  let ( >| ) o f = f o
  let sum = OSeq.fold_left (+) 0
end

module WITH_LIST = struct
  let init n f = List.init n f
  let ( >| ) o f = List.rev_map f o
  let sum = List.fold_left (+) 0
end
And then, :magic:, a fabulous allocation rate that never gets promoted, because we never need to actually build the full lists, and a runtime divided by around 3… which, tada, is the number of maps in your code :tada:
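For reference, a stdlib-only variant of the two modules and a pipeline using them might look like this (Seq.init needs OCaml >= 4.14; names and sizes here are mine, not the benchmark's):

```ocaml
module With_seq = struct
  let init n f = Seq.init n f
  let ( >| ) o f = f o                (* plain pipe: every stage stays lazy *)
  let sum = Seq.fold_left ( + ) 0
end

module With_list = struct
  let init n f = List.init n f
  let ( >| ) o f = List.rev_map f o   (* each stage allocates a full list *)
  let sum = List.fold_left ( + ) 0
end

let seq_result =
  With_seq.(init 1_000 (fun i -> i) >| Seq.map (fun x -> x * 2) >| sum)

let list_result =
  With_list.(sum (init 1_000 (fun i -> i) >| fun x -> x * 2))
```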

Trace file list.mem (5.6 kB) [....]
Trace statistics:
   Minor: 279 MB/s allocations, 98.3% promoted
   Major: 274 MB/s allocations (  0 B/s direct), 270 MB/s collections
   Sizes (by word): 100% <= 4w, 100% <= 16w, 100% <= 256w. Max:  16 B
   Runtime: 8.216 s
Trace file seq.mem ( 10 kB) [....]
Trace statistics:
   Minor: 2.3 GB/s allocations, 0.0% promoted
   Major:   0 B/s allocations (  0 B/s direct),   0 B/s collections
   Sizes (by word): 49% <= 4w, 100% <= 16w, 100% <= 256w. Max:  40 B
   Runtime: 2.279 s

As @dbuenzli pointed out, Seq's point is to stream. Doing a back-and-forth through lists makes your code more convoluted, slower, and more memory-hungry.

Also, for fun: gen is an imperative enumerator. It allocates less, but it’s rather painful to manipulate, and most of the time it doesn’t really go much faster in practice.

Trace file gen.mem (2.8 kB)
Trace statistics:
   Minor: 607 MB/s allocations, 0.0% promoted
   Major:   0 B/s allocations (  0 B/s direct),   0 B/s collections
   Sizes (by word): 100% <= 4w, 100% <= 16w, 100% <= 256w. Max:   8 B
   Runtime: 2.123 s

And iter:

Trace file iter.mem (242 B) [...]
Trace statistics:
   Minor: -nan TB/s allocations, -nan% promoted
   Major: -nan TB/s allocations (-nan TB/s direct), -nan TB/s collections
   Sizes (by word): -nan% <= 4w, -nan% <= 16w, -nan% <= 256w. Max:   0 B
   Runtime: 0.000 s

Iter is basically a for loop: it barely allocates, especially on such a trivial benchmark. The actual runtime here is 2s. As highlighted before, the downside of Iter is that you can’t write combinators like zip with it, which makes it much more limited.
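For context, iter's representation explains both properties; here's a sketch using the conventional type (not the iter package itself):

```ocaml
(* An iter is just a function that pushes elements into a callback: *)
type 'a iter = ('a -> unit) -> unit

let of_list l : 'a iter = fun k -> List.iter k l

let sum (it : int iter) : int =
  let r = ref 0 in
  it (fun x -> r := !r + x);
  !r

(* zip : 'a iter -> 'b iter -> ('a * 'b) iter cannot be written in this
   style: both producers insist on driving the loop, so pairing their
   elements requires inversion of control. *)
```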

In conclusion, don’t include data-structure roundtrips in your benchmark measurements, and try to write producers and consumers that manipulate your streaming data structure directly when possible. :mage:

Conclusion²: iter smokes everything else when you can use it. :rocket:

(and sorry, about the emoji mood)


I’m sorry if I came off as aggressive/ungrateful/… in my questions; that was really not the point :sweat_smile:
My feeling was really “I don’t understand what is happening, it goes against what I read earlier”.

My benchmark was really what I wanted to do: List -> Seq -> List.
The reason is that my bit of code manipulates lists and I can’t change that without a very big effort that I don’t have time to put in right now. My questions were about the precise use-case described by the ocamlverse article in question, which seems to be wrong.
I’ve submitted an issue on the ocamlverse repo and will probably write a PR to fix this article soon.

1 Like

Right, but then you are trying to do something that is fundamentally not really efficient. Rewriting code using List to use Seq instead is generally quite simple (use Nil/Cons appropriately and insert some thunks here and there) and could get you the aforementioned benefits.
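As a sketch of how mechanical that rewrite is, here is a hand-written map in both styles (using the stdlib Seq constructors):

```ocaml
let rec list_map f = function
  | [] -> []
  | x :: xs -> f x :: list_map f xs

(* The Seq version has the same shape, but the result is a thunk and the
   tail is suspended instead of being computed eagerly. *)
let rec seq_map f (s : 'a Seq.t) : 'b Seq.t = fun () ->
  match s () with
  | Seq.Nil -> Seq.Nil
  | Seq.Cons (x, xs) -> Seq.Cons (f x, seq_map f xs)
```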

If you just want to fuse some filter/map together, even to/from lists, use iter and it’ll work fine.

Use an array if you want things to go fast.
A bigarray even, if you only have numeric types.

As Jean-Christophe Filliatre has observed, it’s pretty rare that a list is what you really want, and in all those other instances, an array is faster and smaller. Which doesn’t seem to stop me from using lists everywhere, alas.

1 Like

I think everybody does use lists as the default, because they are one of the basic data structures supported by the language.
Once (and if ever) you realize you have a performance problem, you change to something faster.

1 Like

There are contexts in which I find infinite lists to be very convenient.

Sometimes I write simulations where I set the initial state, and then use an (expensive) next-state function, and create a lazy sequence using that function. Suppose you run it in a REPL for 1000 steps, and see what happens. You can go back and look at previous states very easily, because you have the history of states stored in the sequence. And if you decide then that you need to look another 10 or 1000 steps further, you don’t have to recompute the first 1000 steps.

This strategy allows a nice modular structure to the code, too. Rather than writing a loop or a recursion in which you have to build in a termination condition, which might have to change for different purposes, you just write the infinite sequence, and then use a take function to get the number of steps you want, or a take_while function that checks for a condition. You can add whatever tests you want without changing the sequence definition, because the tests are external to the recursion that generates the sequence.
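With the stdlib Seq, that pattern might be sketched like this (states and next are my names; Seq.take needs OCaml >= 4.14; note that stdlib Seq is not memoized, unlike Clojure's lazy seqs, so revisiting earlier states recomputes them unless you use Seq.memoize):

```ocaml
(* An infinite sequence of simulation states: the initial state, then
   repeated applications of the next-state function. *)
let states ~init ~next : 'a Seq.t =
  Seq.unfold (fun s -> Some (s, next s)) init

(* Take as many steps as you need; the termination condition lives
   outside the sequence definition. *)
let first_five =
  states ~init:1 ~next:(fun s -> s * 2)
  |> Seq.take 5
  |> List.of_seq
```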

I found this strategy quite useful in Clojure, where the default sequences are lazy. I used the strategy in a project in OCaml too, first with Batteries.LazyList and then Core.Sequence, and in the end, I don’t think it was worth it. The small but significant conveniences for me of a lazy sequence were outweighed by the extra complexity of using separate libraries with their own quirks. (Or maybe it was just that particular project, though, where I ended up just re-running sequences all the time. Maybe I would have felt the same if I’d written the simulation in Clojure.)

(Another example–not about sequences, but about a lazy data structure: Despite some past reservations about Haskell, I’m currently learning it with a project that uses binary trees. The original mathematical definition of the trees makes them infinite, so I’m finding it natural to again just write an initial state and a next-branches recursion, and since Haskell is lazy, I don’t have to include a termination condition. In this case the recursion is not expensive, but I get the same modular simplicity to the definition of the tree. For this code it’s not really worth exploring the tree more than a few dozen steps out, since the size of the data grows exponentially, but it still seems natural to model a mathematically infinite tree as a lazy tree.)

A bit of a spoiler for an upcoming release of a few of our libraries at Nomadic Labs…

We had a bug report: calls to some RPCs exposed by some of our binaries would occasionally cause some lag. One of the root causes of the issue was JSON serialisation. The original serialisation scheme was intended for a limited range of uses (especially, small sizes) but then it was used outside of this intended range and some relatively big values were serialised and pushed down the RPC stack.

To circumvent this, we are about to release

  • a “json lexeme sequence” backend for our serialiser library: construct_seq : 'a encoding -> 'a -> json_lexeme Seq.t where json_lexeme = Jsonm.lexeme = [ `Null | `Bool of bool | … | `As | `Ae | `Os | `Oe ]
  • a json lexeme sequence to string sequence converter.

For this second part, we actually have three different converters intended for slightly different uses. They have different granularity, they have different allocation profiles, and they make slightly different assumptions, most notably about concurrency:

  • string_seq_of_json_lexeme_seq : chunk_size_hint:int -> json_lexeme Seq.t -> string Seq.t which uses one (1) internal buffer of size chunk_size_hint. Consuming one element of the resulting sequence causes several json lexemes to be consumed and printed onto the internal buffer until it is full. When this happens, a snapshot (copy) of the buffer is delivered in the Cons cell. So for a chunk_size_hint of, say, 1 kB, the sequence translator uses roughly 1 kB of memory and emits 1 kB chunks that the consumer is responsible for.
  • small_string_seq_of_json_lexeme_seq : json_lexeme Seq.t -> string Seq.t which translates each lexeme into a single string. It’s a little bit more than a simple map because it needs to insert separators and escape strings. It mostly returns statically allocated strings, so there are no big allocations at all.
  • blit_instructions_seq_of_jsonm_lexeme_seq : buffer: bytes -> json_lexeme Seq.t -> (bytes * int * int) Seq.t which works somewhat similarly to the first one but uses buffer instead of allocating its own. It returns a seq of (source, offset, length) triples which are intended to be blitted onto whatever the consumer wants to propagate the data to. This barely allocates at all (it currently does allocate relatively big chunks when escaping strings, but we have planned to improve this in the future). The sequence returns a source to blit; this source is physically equal to buffer most of the time but not always; specifically, for large strings present within the json data, the sequence just points to them as a source.
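As a rough illustration of the second converter's idea (not the Nomadic Labs code; this toy version covers only a few lexemes and omits separators and string escaping):

```ocaml
type lexeme = [ `Null | `Bool of bool | `Os | `Oe | `As | `Ae ]

(* Each lexeme maps to a (mostly statically allocated) string. *)
let string_of_lexeme : lexeme -> string = function
  | `Null -> "null"
  | `Bool true -> "true"
  | `Bool false -> "false"
  | `Os -> "{" | `Oe -> "}"
  | `As -> "[" | `Ae -> "]"

(* Lazily translate a lexeme sequence into a string sequence. *)
let strings (ls : lexeme Seq.t) : string Seq.t =
  Seq.map string_of_lexeme ls
```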

Note that the description above is a simplification: there is a bit more to it than that. Also note that all this is still Work In Progress. Check out (the value to json lexeme sequence code) and (the json lexeme sequence to string sequence code).


iter is in each val iter : … function of the stdlib :slightly_smiling_face: . It doesn’t need to be in the stdlib as there is no need for a shared type definition.

Note that while it can be faster (especially with flambda) it’s also more limited in its use cases. The json lexing library described by @raphael-proust would not be possible with iter (or rather only with inversion of control).

1 Like

Something to keep in mind about inversion of control: it’s not trivial to use with an I/O monad. This is one of the reasons we went for Seq rather than passing a callback through the abstraction layers. The Seq approach leaves the control near the main loop of our RPC server, which knows about Lwt; it is thus able to sequence writes correctly and yield as appropriate.