@pveber flambda would definitely be an improvement, but should Seq allocate more than List in any case?
@SkySkimmer, I think the order is correct, I’m using it like the pipe operator
I mean shouldn’t it be non-rev? Seq.map preserves the order AFAIK
Of course that means you need to decide which tail-recursive List.map to use.
Ah sorry. Yes, the results of both tests are reversed. If you reduce size to something that List.map can work with, like 200000, and use List.map instead of List.rev_map, the results are the same: seq uses 1.48GB, list uses 450M.
Update about flambda:
Using the example with List.map instead of List.rev_map, size = 200000, and flambda, the results are not very different:
- 1.36GB for seq (1.48GB without flambda)
- 435M for list (450M without flambda)
Memory usage for a lazy data type like Seq.t shouldn’t be affected by flambda. I think what’s happening here is that you’re loading the entire seq into memory immediately because you’re building it from a list. And a fully-loaded seq actually takes more memory than its equivalent list.
EDIT: when you build the seq you need to use Seq.unfold, not List.init.
Hmmm… I understand that Seq.unfold would definitely be more efficient than List.init. However, in both cases the list is allocated fully. Then I would expect Seq.map not to reallocate the intermediate steps.
I’m in the exact case described in the article… so if that is true, I’m guessing the article is also wrong?
I’m too lazy to work out the details, but maybe if you do, it won’t be so surprising. With lists you just have one cons cell per list element; with seq you get a cons cell plus a closure, so you can expect more to be allocated by processing with Seq.
ok, yeah, I can definitely see that being true… But then what’s the point of Seq? There’s 3 times the allocation, so it uses more memory and also probably takes more time (although in my example it seems to take more or less the same time)…
Don’t ask me, I was against its inclusion (but for other reasons).
More seriously, the point versus lists is that if your data is streamable, the whole program run may allocate more, but at any given point in time your program may have a smaller memory footprint, since not all the data has to be in memory.
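To make that concrete, here is a minimal sketch (mine, not from the thread; it assumes OCaml ≥ 4.11 for Seq.unfold) of the two pipeline shapes: the list version keeps full intermediate lists alive, while the seq version only ever holds the current element plus a few closures.
(* Both compute the same sum; only peak memory differs. *)
let sum_with_lists n =
  List.init n (fun i -> i)          (* first full list allocated *)
  |> List.map (fun x -> x * 2)      (* second full list allocated *)
  |> List.fold_left ( + ) 0

let sum_with_seq n =
  Seq.unfold (fun i -> if i < n then Some (i, i + 1) else None) 0
  |> Seq.map (fun x -> x * 2)       (* nothing materialised yet *)
  |> Seq.fold_left ( + ) 0          (* elements produced and dropped one by one *)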
Ahhh, then that makes it clear
Thank you very much for your answers !
Alternatively, you could avoid doing round trips to lists and write your code as you naturally would:
module WITH_SEQ = struct
let init n f = OSeq.init ~n f (* taken from OSeq because I'm lazy, the code is trivial *)
let ( >| ) o f = Seq.map f o
let sum = OSeq.fold_left (+) 0
end
module WITH_LIST = struct
let init n f = List.init n f
let ( >| ) o f = List.rev_map f o
let sum = List.fold_left (+) 0
end
And then, :magic:, a fabulous allocation rate where nothing ever gets promoted, because we never need to actually build the full lists, and a runtime divided by around 3… tada, it’s the number of maps in your code.
Trace file list.mem (5.6 kB) [....]
Trace statistics:
Minor: 279 MB/s allocations, 98.3% promoted
Major: 274 MB/s allocations ( 0 B/s direct), 270 MB/s collections
Sizes (by word): 100% <= 4w, 100% <= 16w, 100% <= 256w. Max: 16 B
Runtime: 8.216 s
Trace file seq.mem ( 10 kB) [....]
Trace statistics:
Minor: 2.3 GB/s allocations, 0.0% promoted
Major: 0 B/s allocations ( 0 B/s direct), 0 B/s collections
Sizes (by word): 49% <= 4w, 100% <= 16w, 100% <= 256w. Max: 40 B
Runtime: 2.279 s
As @dbuenzli pointed out, Seq’s point is to stream. Doing back-and-forth between lists makes your code more convoluted, slower, and more memory-hungry.
Also, for fun: gen is an imperative enumerator. It allocates less, but it’s rather painful to manipulate, and doesn’t really go much faster in practice most of the time.
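For context (my addition, not part of the original measurements): the gen library’s enumerator is essentially a function you call repeatedly until it returns None, so a pipeline step is one closure for the whole stream rather than a cons cell per element.
(* The core type of the gen library: calling the function yields the
   next element, or None once the enumerator is exhausted. *)
type 'a gen = unit -> 'a option

(* Mapping over a gen allocates a single closure for the step,
   plus a Some per element pulled. *)
let map f (g : 'a gen) : 'b gen =
  fun () -> match g () with None -> None | Some x -> Some (f x)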
Trace file gen.mem (2.8 kB)
Trace statistics:
Minor: 607 MB/s allocations, 0.0% promoted
Major: 0 B/s allocations ( 0 B/s direct), 0 B/s collections
Sizes (by word): 100% <= 4w, 100% <= 16w, 100% <= 256w. Max: 8 B
Runtime: 2.123 s
And iter:
Trace file iter.mem (242 B) [...]
Trace statistics:
Minor: -nan TB/s allocations, -nan% promoted
Major: -nan TB/s allocations (-nan TB/s direct), -nan TB/s collections
Sizes (by word): -nan% <= 4w, -nan% <= 16w, -nan% <= 256w. Max: 0 B
Runtime: 0.000 s
Iter is basically a for loop; it barely allocates, especially on such a trivial benchmark. The actual runtime is 2s here. As highlighted before, the downside of Iter is that you can’t write combinators like zip with it, making it much more limited.
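To make that limitation concrete, here is a minimal sketch of my own (not from the post) of the iter type: the producer drives the loop, so there is no way to pause two producers and interleave their elements, which is what zip would need.
(* An iter is "something you can run a callback over": the producer
   calls the callback once per element and keeps control of the loop. *)
type 'a iter = ('a -> unit) -> unit

let of_list l : 'a iter = fun k -> List.iter k l
let map f (it : 'a iter) : 'b iter = fun k -> it (fun x -> k (f x))
let fold f init (it : 'a iter) =
  let acc = ref init in
  it (fun x -> acc := f !acc x);
  !acc

(* zip : 'a iter -> 'b iter -> ('a * 'b) iter cannot be written directly:
   we would need to advance both producers in lockstep, but each one
   insists on running its own loop to completion. *)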
In conclusion, don’t include data-structure round trips in your benchmark measurements, and try to write producers and consumers that manipulate your streaming data structure directly when possible.
Conclusion²: iter smokes everything else when you can use it.
(and sorry about the emoji mood)
I’m sorry if I came off as aggressive/ungrateful/… in my questions; that was really not my intent.
My feeling was really “I don’t understand what is happening, it goes against what I read earlier”.
My benchmark was really what I wanted to do: List -> Seq -> List.
The reason is that my bit of code manipulates lists and I can’t change that without a very big effort that I don’t have time to put in right now. My questions were about the precise use case described by the ocamlverse article in question, which seems to be wrong.
I’ve submitted an issue on the ocamlverse repo and will probably write a PR to fix this article soon.
Right, but then you are trying to do something that is fundamentally not really efficient. Rewriting code using List to use Seq instead is generally quite simple (use Nil/Cons appropriately and insert some thunks here and there) and could get you the aforementioned benefits.
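As a rough sketch of that kind of rewrite (hypothetical code, assuming a simple recursive list function), the changes are mostly mechanical:
(* List version: *)
let rec evens = function
  | [] -> []
  | x :: rest -> if x mod 2 = 0 then x :: evens rest else evens rest

(* Seq version: same shape, but Cons carries a suspended tail instead of
   an already-computed list, so nothing is evaluated until it is consumed. *)
let rec evens_seq (s : int Seq.t) : int Seq.t = fun () ->
  match s () with
  | Seq.Nil -> Seq.Nil
  | Seq.Cons (x, rest) ->
      if x mod 2 = 0 then Seq.Cons (x, evens_seq rest)
      else evens_seq rest ()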
If you just want to fuse some filter/map together, even to/from lists, use iter and it’ll work fine.
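For instance, with the iter library (an illustrative sketch of mine; the combinator names come from iter’s API, but the pipeline itself is made up):
(* The filter/map steps are fused: no intermediate list is ever built,
   only the final result of to_list. *)
let fused (l : int list) : int list =
  Iter.of_list l
  |> Iter.filter (fun x -> x mod 2 = 0)
  |> Iter.map (fun x -> x * x)
  |> Iter.to_list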
Use an array if you want things to go fast.
A bigarray even, if you only have numeric types.
As Jean-Christophe Filliatre has observed, it’s pretty rare that a list is what you really want, and in all those other instances, an array is faster and smaller. Which doesn’t seem to stop me from using lists everywhere, alas.
I think everybody does use lists as the default, because they are one of the basic data structures supported by the language.
Once (and if ever) you realize you have a performance problem, you change to something faster.
There are contexts in which I find infinite lists to be very convenient.
Sometimes I write simulations where I set the initial state, and then use an (expensive) next-state function, and create a lazy sequence using that function. Suppose you run it in a REPL for 1000 steps, and see what happens. You can go back and look at previous states very easily, because you have the history of states stored in the sequence. And if you decide then that you need to look another 10 or 1000 steps further, you don’t have to recompute the first 1000 steps.
This strategy allows a nice modular structure to the code, too. Rather than writing a loop or a recursion in which you have to build in a termination condition, which might have to change for different purposes, you just write the infinite sequence, and then use a take function to get the number of steps you want, or a take_while function that checks for a condition. You can add whatever tests you want without changing the sequence definition, because the tests are external to the recursion that generates the sequence.
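A minimal OCaml sketch of that pattern (my own illustration; next_state and the initial state are stand-ins, and it assumes OCaml ≥ 4.14 for Seq.memoize, Seq.take and Seq.take_while):
(* Hypothetical simulation skeleton. *)
type state = { step : int; value : float }

let init_state = { step = 0; value = 1.0 }
let next_state s = { step = s.step + 1; value = s.value *. 1.01 }  (* expensive in real code *)

(* Infinite history of states; memoize caches each computed state, so
   looking further ahead never recomputes the earlier steps. *)
let history : state Seq.t =
  Seq.memoize (Seq.unfold (fun s -> Some (s, next_state s)) init_state)

(* Inspect the first 1000 steps, or stop on a condition, without
   touching the sequence definition. *)
let first_1000 = List.of_seq (Seq.take 1000 history)
let until_big = List.of_seq (Seq.take_while (fun s -> s.value < 1e6) history)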
I found this strategy quite useful in Clojure, where the default sequences are lazy. I used the strategy in a project in OCaml too, first with Batteries.LazyList and then Core.Sequence, and in the end, I don’t think it was worth it. The small but significant conveniences for me of a lazy sequence were outweighed by the extra complexity of using separate libraries with their own quirks. (Or maybe it was just that particular project, though, where I ended up just re-running sequences all the time. Maybe I would have felt the same if I’d written the simulation in Clojure.)
(Another example, not about sequences but about a lazy data structure: Despite some past reservations about Haskell, I’m currently learning it with a project that uses binary trees. The original mathematical definition of the trees makes them infinite, so I’m finding it natural to again just write an initial state and a next-branches recursion, and since Haskell is lazy, I don’t have to include a termination condition. In this case the recursion is not expensive, but I get the same modular simplicity to the definition of the tree. For this code it’s not really worth exploring the tree more than a few dozen steps out, since the size of the data grows exponentially, but it still seems natural to model a mathematically infinite tree as a lazy tree.)
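The same idea transcribes to OCaml with explicit thunks; a small sketch of my own (not the poster’s Haskell code):
(* Hypothetical infinite binary tree with explicit laziness; branches
   are only computed when forced. *)
type 'a tree = Node of 'a * 'a tree Lazy.t * 'a tree Lazy.t

let rec build (x : int) : int tree =
  Node (x, lazy (build (2 * x)), lazy (build (2 * x + 1)))

(* Explore only d levels; the rest of the infinite tree is never built. *)
let rec take_depth d (Node (x, l, r)) =
  if d = 0 then []
  else (x :: take_depth (d - 1) (Lazy.force l)) @ take_depth (d - 1) (Lazy.force r)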
A bit of a spoiler for an upcoming release of a few of our libraries at Nomadic Labs…
We had a bug report: calls to some RPCs exposed by some of our binaries would occasionally cause some lag. One of the root causes of the issue was JSON serialisation. The original serialisation scheme was intended for a limited range of uses (especially, small sizes) but then it was used outside of this intended range and some relatively big values were serialised and pushed down the RPC stack.
To circumvent this, we are about to release
- a “json lexeme sequence” backend for our serialiser library:
construct_seq : 'a encoding -> 'a -> json_lexeme Seq.t
where json_lexeme = Jsonm.lexeme = [ `Null | `Bool of bool | … | `As | `Ae | `Os | `Oe ]
- a json lexeme sequence to string sequence converter.
For this second part, we actually have three different converters intended for slightly different uses. They have different granularity, they have different allocation profiles, and they make slightly different assumptions, most notably about concurrency:
- string_seq_of_json_lexeme_seq : chunk_size_hint:int -> json_lexeme Seq.t -> string Seq.t, which uses one (1) internal buffer of size chunk_size_hint. Consuming one element of the resulting sequence causes several json lexemes to be consumed and printed onto the internal buffer until it is full. When this happens, a snapshot (copy) of the buffer is delivered in the Cons cell. So for a chunk_size_hint of, say, 1kB, the sequence translator uses roughly 1kB of memory and emits 1kB chunks of memory that the consumer is responsible for.
- small_string_seq_of_json_lexeme_seq : json_lexeme Seq.t -> string Seq.t, which translates each lexeme as a single string. It’s a little bit more than a simple Seq.map because it needs to insert separators and escape strings. It mostly returns statically allocated strings, so there are no big allocations at all.
- blit_instructions_seq_of_jsonm_lexeme_seq : buffer:bytes -> json_lexeme Seq.t -> (bytes * int * int) Seq.t, which works somewhat similarly to the first one but uses buffer instead of allocating its own. It returns a seq of (source, offset, length) triples which are intended to be blitted onto whatever the consumer wants to propagate the data to. This barely allocates at all (it currently does allocate relatively big chunks when escaping strings, but we plan to improve this in the future). The sequence returns a source to blit; this source is physically equal to buffer most of the time, but not always: specifically, for large strings that are present within the json data, the sequence just points to them as the source.
Note that the description above is a simplification: there is a bit more to it than that. Also note that all this is still Work In Progress. Check out https://gitlab.com/nomadic-labs/json-data-encoding/-/merge_requests/5 (the value to json lexeme sequence code) and https://gitlab.com/nomadic-labs/data-encoding/-/merge_requests/19 (the json lexeme sequence to string sequence code).
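As a purely illustrative sketch of how a consumer might drive the first converter (my own code, not Nomadic Labs’; only the string_seq_of_json_lexeme_seq name and its rough signature come from the description above, and the module qualification is elided):
(* Hypothetical consumer: pull roughly 1kB chunks out of the sequence and
   write them out one at a time, so at most one chunk is live at once. *)
let write_json oc lexemes =
  let chunks = string_seq_of_json_lexeme_seq ~chunk_size_hint:1024 lexemes in
  Seq.iter (output_string oc) chunks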
iter is in each val iter : … function of the stdlib. It doesn’t need to be in the stdlib as there is no need for a shared type definition.
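In other words (a small sketch of my own), any stdlib iter function becomes an iter value by partial application, which is why no shared type definition is needed:
(* type 'a iter = ('a -> unit) -> unit, as sketched earlier in the thread *)
let iter_of_list (l : 'a list) : ('a -> unit) -> unit = fun k -> List.iter k l
let iter_of_array (a : 'a array) : ('a -> unit) -> unit = fun k -> Array.iter k a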
Note that while it can be faster (especially with flambda), it’s also more limited in its use cases. The json lexing library described by @raphael-proust would not be possible with iter (or rather, only with inversion of control).
Something to keep in mind about inversion of control: it’s not trivial to use with an I/O monad. This is one of the reasons we went for Seq rather than passing a callback through the abstraction layers. The Seq approach leaves the control near the main loop of our RPC server, which knows about Lwt and is thus able to sequence writes correctly and yield as appropriate.
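A minimal sketch of that consumer-side control (my own illustration, assuming Lwt ≥ 5.1 for Lwt.Syntax and an already-built string Seq.t; not the actual server code):
(* The loop that consumes the sequence owns the scheduling: it can write
   each chunk, then yield to other Lwt tasks before pulling the next
   element -- something a callback-driven producer cannot offer. *)
let rec write_all oc (chunks : string Seq.t) : unit Lwt.t =
  match chunks () with
  | Seq.Nil -> Lwt.return_unit
  | Seq.Cons (chunk, rest) ->
      let open Lwt.Syntax in
      let* () = Lwt_io.write oc chunk in
      let* () = Lwt.pause () in   (* let other requests run between chunks *)
      write_all oc rest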