Module 'Stream' removed from 5.0 standard library?

This is not possible without introducing a cyclic dependency. The general philosophy is that each module defining a type for which it makes sense to convert to and from Seq.t should define its own to_seq and of_seq functions.
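
For example (a small sketch; the helper name is made up), this is what that convention looks like with existing Stdlib functions:

(* Hashtbl knows how to produce a Seq.t; List knows how to consume one. *)
let keys_of_table (tbl : (string, int) Hashtbl.t) : string list =
  List.of_seq (Hashtbl.to_seq_keys tbl)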

Cheers,
Nicolas

Whoops, right. This is why I shouldn’t code without a compiler :slightly_smiling_face:

In that case, an even easier PR would be to add a paragraph to the Seq module doc mentioning that the various conversion functions may be found in the modules of the respective data types.

3 Likes

Can you elaborate? How is Seq suddenly more basic or more important than List? My view is that lists are the default container in OCaml and that all other containers should offer import and export functions from and to lists.

I got recommended this on Discord a while back: GitHub - CraigFe/ocaml-stdlib-type-conversions: Graphs of the various type conversions provided by the OCaml standard library.

Would be great to have something like it in the manual actually.

1 Like

I wouldn’t call Seq more “important” than List, but it is a more general data structure as it can represent infinite lists.

Cheers,
Nicolas

2 Likes

Would the code below not work, given that it doesn’t depend on the List module but only on the built-in list type?

(* assuming these are defined inside the Seq module itself, so that Nil and Cons are in scope *)
let rec to_list seq =
  match seq () with
  | Nil -> []
  | Cons (x, next) -> x :: to_list next

let rec of_list list () =
  match list with
  | [] -> Nil
  | x :: next -> Cons (x, of_list next)

1 Like

Yes, you are right, thanks!

Cheers,
Nicolas

What if we used Seq.once or Seq.memoize on a side-effectful unfold, followed by a second history list or seq for peeking etc…?
I mean the channel could still be closed from underneath the unevaluated nodes’ feet…
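
Something along these lines, as a sketch only (the helper name is made up): each line is read at most once and remembered, but an early close of the channel still breaks the not-yet-forced tail.

let lines_of_channel ic : string Seq.t =
  Seq.memoize (Seq.of_dispenser (fun () -> In_channel.input_line ic))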

1 Like

At the same time, it is better to avoid relying on the fact that list is a built-in type. In particular, it is better to keep one unique name, List.t, for the list type in the documentation. And in many circumstances, Seq is a better intermediate representation than List.t, since it can avoid making a full copy of the data before doing a traversal.

3 Likes

Lists are much more widely used than any other container. That gives them priority when deciding which container type should be made the most practical to use. It’s in that sense that they’re more important.

They’re more used than anything else because of history, not because they’re truly better. If OCaml had been developed in the 2010s, it would probably instead feature the same HAMT structures that Clojure and Scala have. Anecdotally, in my code I tend to use Map/Set/Tbl/Vec a lot more (on the imperative side of OCaml), and list is more like glue for small things, not a good container for “serious” computation.

These days, most languages have a standard iterator abstraction, and so does OCaml. It’s not a container but it makes for a better interchange format than list; hopefully one day the compiler will even optimize common Seq.t uses into nothing.

4 Likes

Yes, and that’s like 80% of all programming. What matters here is the time spent by programmers managing their code, not how CPU time is spent. I’m all for having a Seq module but I don’t see how it would get used nearly as much as the List module. If you or other folks have a vision of the future where Seq.t becomes generally preferred over list, I’d love to hear more about it. I just don’t see it.

I think we are in new thread territory by now.

2 Likes

I think that the general direction of the Stdlib w.r.t. the Seq module, and conversion functions specifically, has been towards encouraging the use of Seq as an intermediate representation, a go-between for different data-structures.

let m = MyMap.of_seq (Array.to_seq a) in
…

For this use, Seq is arguably better than a list. Of course, different use cases have different needs, but in many cases Seq is better. E.g., if you are interrupted by a minor collection, you promote much less memory to the major heap than you would with a list.


Other uses of Seqs are of course different. There are major downsides to using Seqs over Lists.

  • Mixing with monads is more complicated. E.g., Lwt_list operates on the exact same lists as the Stdlib, but Lwt_seq needs a custom type to get the monad right there in the arrow.
  • Lack of pattern matching makes your code boilerplate-heavy. E.g., | [] -> … | [x] -> … | x::y::rest -> … is so concise and readable (see the sketch after this list).
  • etc.
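
To illustrate the pattern-matching point, a small sketch (hypothetical functions) of the same three-way case analysis on a list and on a Seq.t, which has to be forced one node at a time:

let describe_list = function
  | [] -> "empty"
  | [_] -> "one element"
  | _ :: _ :: _ -> "two or more"

let describe_seq s =
  match s () with
  | Seq.Nil -> "empty"
  | Seq.Cons (_, rest) ->
      (match rest () with
       | Seq.Nil -> "one element"
       | Seq.Cons _ -> "two or more")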

But I think that w.r.t. the conversion functions, it is good that the API encourages you to use Seq as an intermediate representation in between two data-structures.

1 Like

And a fold is better than a Seq. I don’t think this “use Seq to convert between data structures” tagline is very convincing.

1 Like

This seems like a poor idea. For example, consider Array.of_seq from the standard library. It first converts the sequence to a list, and then converts the list to an array. This means that Array.of_list (MyMap.to_list m) is actually much more efficient than Array.of_seq (MyMap.to_seq m).

1 Like

Sure. I was mostly replying in the context of questions being asked about the reason why Stdlib modules have Seq conversion functions but not List conversion functions.

Fold is not necessarily easy to compose, pass around, and such. (Writing MyMap.of_fold is a cool exercise.) So having some reasonable option in the Stdlib available is appreciated.
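
To give an idea of why passing a fold around is awkward, here is a minimal sketch (all names hypothetical, not Stdlib API): you need a record with an explicitly polymorphic field before a fold becomes a first-class value that an of_fold-style constructor can consume.

module IntMap = Map.Make (Int)

type 'elt fold = { fold : 'acc. ('acc -> 'elt -> 'acc) -> 'acc -> 'acc }

(* wrap a list's fold as a first-class value *)
let fold_of_list (l : 'a list) : 'a fold =
  { fold = (fun f init -> List.fold_left f init l) }

(* a hypothetical of_fold: build a map from any such fold *)
let map_of_fold (f : (int * string) fold) : string IntMap.t =
  f.fold (fun m (k, v) -> IntMap.add k v m) IntMap.empty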

It’s true that Seq is not necessarily always best. The case of Array.of_seq is just one example of a general issue with any data-structure which cannot be constructed by progressive agglomeration of chunks (you need to know the size in advance).

In fact, Array.of_list is not very efficient either, because you first need to traverse the list just to determine the size. So, to avoid traversing the data over and over again, what you’d really want is something along the lines of:

let (size, q) =
  MyMap.fold (fun k v (size, q) -> Queue.add (k, v) q; (size + 1, q)) m (0, Queue.create ()) in
Array.init size (fun _ -> Queue.pop q)

We could actually implement Array.of_seq using this general principle. It’d be even easier if we had an Array.init_rev so we could use just a list instead of a Queue.
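
As a rough sketch of that idea (names are made up; this is not the Stdlib implementation), using just a list accumulated in reverse instead of a queue; a hypothetical Array.init_rev would let you drop the index arithmetic at the end:

let array_of_seq (s : 'a Seq.t) : 'a array =
  (* build the list in reverse while counting, then fill the array back to front *)
  let size, rev = Seq.fold_left (fun (n, acc) x -> (n + 1, x :: acc)) (0, []) s in
  match rev with
  | [] -> [||]
  | last :: _ ->
      let a = Array.make size last in
      List.iteri (fun i x -> a.(size - 1 - i) <- x) rev;
      a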

Still, for many other uses Seq is better than a list as an intermediate data-structure.

Note: This problem with Array requiring a size beforehand should go away once we manage to agree on a dynamically-resized-array API for the standard library. Then DynArray.of_seq (MyMap.to_seq m) will have better space usage than DynArray.of_list (MyMap.to_list m), as it will not allocate an intermediate list for all values. (I would check carefully before trying to suggest that it will in fact be faster; we would be using short-lived closures rather than longer-lived cons cells.)
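
As a rough sketch of what such a DynArray.of_seq could do (the module does not exist in the Stdlib yet; names and growth policy here are made up), elements go straight into a growable backing array, with no intermediate list:

let dynarray_of_seq (s : 'a Seq.t) : 'a array =
  let buf = ref [||] and len = ref 0 in
  Seq.iter
    (fun x ->
       if !len = Array.length !buf then begin
         (* grow the backing array by doubling (minimum capacity 8) *)
         let grown = Array.make (max 8 (2 * !len)) x in
         Array.blit !buf 0 grown 0 !len;
         buf := grown
       end;
       (!buf).(!len) <- x;
       incr len)
    s;
  Array.sub !buf 0 !len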

1 Like

Note that, if you are using a queue, you might just as well use Queue.length. No need to count the size yourself, as the queue does it already.

Still, you might be underestimating the sheer cost of Queue.add. It is much faster to create a list and to then traverse it to measure its size, than it is to create a queue.

You should only ever use a queue when you need to push/pop in both front and back of a data structure. If you only ever work on one side, having a data structure faster than a list is challenging. The only way to win is by wasting less memory than a list, which is definitely not the case of a queue.

There was a rather lengthy discussion (here and here) about iteration models for the Stdlib and from what I recall Stdlib.Seq was picked because it’s simple.

There are numerous trade-offs involved in deciding an ideal iteration model and, for what it’s worth, I think Stdlib.Seq is an ok general-purpose streaming model for very basic use-cases.

I do think that (structurally-typed!) iter/fold would be a more natural choice for the Stdlib, considering we already have many iter functions for lists/arrays/maps.
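
As a minimal sketch of the structurally-typed iterator idea (names hypothetical): any container whose iter function has this shape can be consumed generically, without going through Seq.

type 'a iter = ('a -> unit) -> unit

let iter_of_list (l : 'a list) : 'a iter = fun f -> List.iter f l
let iter_of_array (a : 'a array) : 'a iter = fun f -> Array.iter f a

(* a generic consumer that works with either of the above *)
let sum (it : int iter) : int =
  let total = ref 0 in
  it (fun x -> total := !total + x);
  !total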

I wrote a versatile streaming library a while ago that focuses on performance while providing deterministic and safe resource management for streams that interact with effectful consumers and producers: see code and docs. In addition to being resource-safe, it’s also fast, especially when compiled with flambda.

I have work-in-progress extensions to the core streaming model that introduce batching (for efficient zero-copy processing of chunks of data, similar to angstrom’s internal memory model) and effect-based concurrency.

4 Likes