Module 'Stream' removed from 5.0 standard library?

This is not possible without introducing a cyclic dependency. The general philosophy is that each module defining a type for which it makes sense to convert to and from Seq.t should define its own to_seq and of_seq functions.
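
For example (a small sketch; the helper name is made up), this is what that convention looks like with existing Stdlib functions:

(* Hashtbl knows how to produce a Seq.t; List knows how to consume one. *)
let keys_of_table (tbl : (string, int) Hashtbl.t) : string list =
  List.of_seq (Hashtbl.to_seq_keys tbl)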

Cheers,
Nicolas

Whoops, right. This is why I shouldn’t code without a compiler :slightly_smiling_face:

In that case, an even easier PR would be to add a paragraph to the Seq module doc mentioning that the various conversion functions may be found in the modules of the respective data types.

3 Likes

Can you elaborate? How is Seq suddenly more basic or more important than List? My view is that lists are the default container in OCaml and that all other containers should offer import and export functions from and to lists.

I got recommended this on Discord a while back: GitHub - CraigFe/ocaml-stdlib-type-conversions: Graphs of the various type conversions provided by the OCaml standard library.

Would be great to have something like it in the manual actually.

1 Like

I wouldn’t call Seq more “important” than List, but it is a more general data structure as it can represent infinite lists.

Cheers,
Nicolas

2 Likes

Would the code below not work, given that it doesn’t depend on the List module but only on the built-in list type?

(* assuming these are defined inside the Seq module itself, so that Nil and Cons are in scope *)
let rec to_list seq =
  match seq () with
  | Nil -> []
  | Cons (x, next) -> x :: to_list next

let rec of_list list () =
  match list with
  | [] -> Nil
  | x :: next -> Cons (x, of_list next)

1 Like

Yes, you are right, thanks!

Cheers,
Nicolas

What if we used Seq.once or Seq.memoize on a side-effectful unfold, followed by a second history list or seq for peeking etc…?
I mean the channel could still be closed from underneath the unevaluated nodes’ feet…
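
Something along these lines, as a sketch only (the helper name is made up): each line is read at most once and remembered, but an early close of the channel still breaks the not-yet-forced tail.

let lines_of_channel ic : string Seq.t =
  Seq.memoize (Seq.of_dispenser (fun () -> In_channel.input_line ic))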

1 Like

At the same time, it is better to avoid relying on the fact that list is a built-in type. In particular, it is better to keep one unique name, List.t, for the list type in the documentation. And in many circumstances, Seq is a better intermediate representation than List.t, since it can avoid making a full copy of the data before doing a traversal.

3 Likes

Lists are much more widely used than any other container. That gives them priority when deciding which container type should be made the most practical to use. It’s in that sense that they’re more important.

They’re more used than anything else because of history, not because they’re truly better. If OCaml had been developed in the 2010s, it would probably instead feature the same HAMT structures that Clojure and Scala have. Anecdotally, in my code I tend to use Map/Set/Tbl/Vec a lot more (on the imperative side of OCaml), and list is more like glue for small things, not a good container for “serious” computation.

These days, most languages have a standard iterator abstraction, and so does OCaml. It’s not a container but it makes for a better interchange format than list; hopefully one day the compiler will even optimize common Seq.t uses into nothing.

4 Likes

Yes, and that’s like 80% of all programming. What matters here is the time spent by programmers managing their code, not how CPU time is spent. I’m all for having a Seq module but I don’t see how it would get used nearly as much as the List module. If you or other folks have a vision of the future where Seq.t becomes generally preferred over list, I’d love to hear more about it. I just don’t see it.

I think we are in new thread territory by now.

2 Likes

I think that the general direction of the Stdlib w.r.t. the Seq module, and conversion functions specifically, has been towards encouraging the use of Seq as an intermediate representation, a go-between for different data-structures.

let m = MyMap.of_seq (Array.to_seq a) in
…

For this use, Seq is arguably better than a list. Of course, different use cases have different needs, but in many cases Seq is better. E.g., if you are interrupted by a minor collection, you promote much less memory to the major heap than you would with a list.


Other uses of Seqs are of course different. There are major downsides to using Seqs over Lists.

  • Mixing with monads is more complicated. E.g., Lwt_list operates on the exact same lists as the Stdlib, but Lwt_seq needs a custom type to get the monad right there in the arrow.
  • Lack of pattern matching makes your code boilerplate-heavy. E.g., | [] -> … | [x] -> … | x::y::rest -> … is so concise and readable (see the sketch after this list).
  • etc.
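
To illustrate the pattern-matching point, a small sketch (hypothetical functions) of the same three-way case analysis on a list and on a Seq.t, which has to be forced one node at a time:

let describe_list = function
  | [] -> "empty"
  | [_] -> "one element"
  | _ :: _ :: _ -> "two or more"

let describe_seq s =
  match s () with
  | Seq.Nil -> "empty"
  | Seq.Cons (_, rest) ->
      (match rest () with
       | Seq.Nil -> "one element"
       | Seq.Cons _ -> "two or more")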

But I think that w.r.t. the conversion functions, it is good that the API encourages you to use Seq as an intermediate representation in between two data-structures.

1 Like

And a fold is better than a Seq. I don’t think this “use Seq to convert between data structures” tagline is very convincing.

1 Like

This seems like a poor idea. For example, consider Array.of_seq from the standard library. It first converts the sequence to a list, and then converts the list to an array. This means that Array.of_list (MyMap.to_list m) is actually much more efficient than Array.of_seq (MyMap.to_seq m).

1 Like

Sure. I was mostly replying in the context of questions being asked about the reason why Stdlib modules have Seq conversion functions but not List conversion functions.

Fold is not necessarily easy to compose, pass around, and such. (Writing MyMap.of_fold is a cool exercise.) So having some reasonable option in the Stdlib available is appreciated.
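
To give an idea of why passing a fold around is awkward, here is a minimal sketch (all names hypothetical, not Stdlib API): you need a record with an explicitly polymorphic field before a fold becomes a first-class value that an of_fold-style constructor can consume.

module IntMap = Map.Make (Int)

type 'elt fold = { fold : 'acc. ('acc -> 'elt -> 'acc) -> 'acc -> 'acc }

(* wrap a list's fold as a first-class value *)
let fold_of_list (l : 'a list) : 'a fold =
  { fold = (fun f init -> List.fold_left f init l) }

(* a hypothetical of_fold: build a map from any such fold *)
let map_of_fold (f : (int * string) fold) : string IntMap.t =
  f.fold (fun m (k, v) -> IntMap.add k v m) IntMap.empty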

It’s true that Seq is not necessarily always best. The case of Array.of_seq is just one example of a general issue with any data-structure which cannot be constructed by progressive agglomeration of chunks (you need to know the size in advance).

In fact, Array.of_list is not very efficient either, because you first need to traverse the list just to determine the size. So, to avoid traversing the data over and over again, what you’d really want is something along the lines of:

let (size, q) =
  MyMap.fold (fun k v (size, q) -> Queue.add (k, v) q; (size + 1, q)) m (0, Queue.create ()) in
Array.init size (fun _ -> Queue.pop q)

We could actually implement Array.of_seq using this general principle. It’d be even easier if we had an Array.init_rev so we could use just a list instead of a Queue.
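
As a rough sketch of that idea (names are made up; this is not the Stdlib implementation), using just a list accumulated in reverse instead of a queue; a hypothetical Array.init_rev would let you drop the index arithmetic at the end:

let array_of_seq (s : 'a Seq.t) : 'a array =
  (* build the list in reverse while counting, then fill the array back to front *)
  let size, rev = Seq.fold_left (fun (n, acc) x -> (n + 1, x :: acc)) (0, []) s in
  match rev with
  | [] -> [||]
  | last :: _ ->
      let a = Array.make size last in
      List.iteri (fun i x -> a.(size - 1 - i) <- x) rev;
      a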

Still, for many other uses Seq is better than a list as an intermediate data-structure.

Note: This problem with Array requiring a size beforehand should go away once we manage to agree on a dynamically-resized-array API for the standard library. Then DynArray.of_seq (MyMap.to_seq m) will have better space usage than DynArray.of_list (MyMap.to_list m), as it will not allocate an intermediate list for all values. (I would check carefully before trying to suggest that it will in fact be faster; we would be using short-lived closures rather than longer-lived cons cells.)
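
As a rough sketch of what such a DynArray.of_seq could do (the module does not exist in the Stdlib yet; names and growth policy here are made up), elements go straight into a growable backing array, with no intermediate list:

let dynarray_of_seq (s : 'a Seq.t) : 'a array =
  let buf = ref [||] and len = ref 0 in
  Seq.iter
    (fun x ->
       if !len = Array.length !buf then begin
         (* grow the backing array by doubling (minimum capacity 8) *)
         let grown = Array.make (max 8 (2 * !len)) x in
         Array.blit !buf 0 grown 0 !len;
         buf := grown
       end;
       (!buf).(!len) <- x;
       incr len)
    s;
  Array.sub !buf 0 !len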

1 Like

Note that, if you are using a queue, you might just as well use Queue.length. No need to count the size yourself, as the queue does it already.

Still, you might be underestimating the sheer cost of Queue.add. It is much faster to create a list and to then traverse it to measure its size, than it is to create a queue.

You should only ever use a queue when you need to push/pop in both front and back of a data structure. If you only ever work on one side, having a data structure faster than a list is challenging. The only way to win is by wasting less memory than a list, which is definitely not the case of a queue.

There was a rather lengthy discussion (here and here) about iteration models for the Stdlib and from what I recall Stdlib.Seq was picked because it’s simple.

There are numerous trade-offs involved in deciding an ideal iteration model and, for what it’s worth, I think Stdlib.Seq is an ok general-purpose streaming model for very basic use-cases.

I do think that (structurally-typed!) iter/fold would be a more natural choice for the Stdlib, considering we already have many iter functions for lists/arrays/maps.
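
As a minimal sketch of the structurally-typed iterator idea (names hypothetical): any container whose iter function has this shape can be consumed generically, without going through Seq.

type 'a iter = ('a -> unit) -> unit

let iter_of_list (l : 'a list) : 'a iter = fun f -> List.iter f l
let iter_of_array (a : 'a array) : 'a iter = fun f -> Array.iter f a

(* a generic consumer that works with either of the above *)
let sum (it : int iter) : int =
  let total = ref 0 in
  it (fun x -> total := !total + x);
  !total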

I wrote a versatile streaming library a while ago that focuses on performance while providing deterministic and safe resource management for streams that interact with effectful consumers and producers: see code and docs. In addition to being resource-safe, it’s also fast, especially when compiled with flambda.

I have work-in-progress extensions to the core streaming model that introduce batching (for efficient zero-copy processing of chunks of data, similar to angstrom’s internal memory model) and effect-based concurrency.

4 Likes