This is not possible without introducing a cyclic dependency. The general philosophy is that each module defining a type for which it makes sense to convert from- and to- Seq.t should define its own to_seq and of_seq functions.
Whoops, right. This is why I shouldnât code without a compiler
In that case an even easier PR to add a paragraph to the Seq module doc mentioning that the various conversion functions may be found in the modules of the respective data types.
Can you elaborate? How is Seq suddenly more basic or more important than List? My view is that lists are the default container in OCaml and that all other containers should offer import and export functions from and to lists.
Would the code below not work, given that it doesnât depend on the List module but only on the built-in list type?
let rec to_list seq =
match seq () with
| Nil -> []
| Cons (x, next) -> x :: to_list next
let rec of_list list () =
match list with
| [] -> Nil
| x :: next -> Cons (x, of_list next)
What if we used Seq.once or Seq.memoize on a side-effectful unfold, followed by a second history list or seq for peeking etc�
I mean the channel could still be closed from underneath the unevaluated nodesâ feetâŚ
At the same time, it is better to avoid relying on the fact that list is a built-in type. In particular, it is better to keep one unique name, List.t, for the list type in the documentation. And in many circumstances, Seq is a better intermediate representation than `List.t, since they can avoid making a full copy of data before doing a traversal.
Lists are much more widely used than any other container. That gives them priority when it comes to deciding which container type must be more practical to use. Itâs in that sense that theyâre more important.
Theyâre more used than anything else because of history, not because theyâre truly better. If OCaml had been developed in the 2010s it would probably feature instead the same HAMT structures that clojure and scala have. Anecdotically, in my code I tend to use Map/Set/Tbl/Vec a lot more (on the imperative side of OCaml), and list is more like glue for small things, not a good container for âseriousâ computation.
These days, most languages have a standard iterator abstraction, and so does OCaml. Itâs not a container but it makes for a better interchange format than list; hopefully one day the compiler will even optimize common Seq.t uses into nothing.
Yes, and thatâs like 80% of all programming. What matters here is the time spent by programmers managing their code, not how CPU time is spent. Iâm all for having a Seq module but I donât see how it would get used nearly as much as the List module. If you or other folks have a vision of the future where Seq.t becomes generally preferred over list, Iâd love to hear more about it. I just donât see it.
I think that the general direction of the Stdlib w.r.t. the Seq module and conversion functions specifically has been towards encouraging the use of Seq as an intermediate representation, a go-between different data-structures.
let m = MyMap.of_seq (Array.to_seq a) in
âŚ
For this use Seq is arguably better than list. Of course different use cases might have different needs and such, but in many cases Seq is better. E.g., if you are interrupted by a minor collection you promote much less memory to the major heap than in the case of a list.
Other uses of Seqs are of course different. There are major downsides to using Seqs over Lists.
Mixing with monads is more complicated. E.g., Lwt_list functions on the same exact lists as the Stdlib, but Lwt_seq needs a custom type to get the monad right there in the arrow.
Lack of pattern matching makes your code boilerplaty. E.g., | [] -> ⌠| [x] -> ⌠| x::y::rest -> ⌠is so concise and readable.
etc.
But I think that w.r.t. the conversion functions, it is good that the API encourages you to use Seq as an intermediate representation in between two data-structures.
This seems like a poor idea. For example, consider Array.of_seq from the standard library. It first converts the sequence to a list, and then converts the list to an array. This means that Array.of_list (MyMap.to_list m) is actually much more efficient than Array.of_seq (MyMap.to_seq m).
Sure. I was mostly replying in the context of questions being asked about the reason why Stdlib modules have Seq conversion functions but not List conversion functions.
Fold is not necessarily easy to compose, pass around, and such. (Writing MyMap.of_fold is a cool exercise.) So having some reasonable option in the Stdlib available is appreciated.
Itâs true that Seq is not necessarily always best. The case of Array.of_seq is just one example which applies to any data-structure which cannot be constructed by progressive agglomeration of chunks (you need to know the size in advance).
In fact Array.of_list is not very efficient either because you first need to traverse the list just to determine the size. So what youâd really want to avoid traversing the data over and over again is something along the lines of
let (size, q) = MyMap.fold (fun (size, q) k v -> (size+1, Queue.add (k,v) q)) (0, Queue.create ()) m in
Array.init size (fun _ -> Queue.pop q)
We could actually implement Array.of_seq using this general principle. Itâd be even easier if we had an Array.init_rev so we could use just a list instead of a Queue.
Still, for many other uses it is better than a list as an intermediate data-structure.
Note: This problem with Array requiring a size beforehand should go away once we manage to agree on a dynamically-resized-array API for the standard library. Then DynArray.of_seq (MyMap.to_seq m) will have better space usage than DynArray.of_list (MyMap.to_list m), as it will not allocate an intermediate list for all values. (I would check carefully before trying to suggest that it will in fact be faster; we would be using short-lived closures rather than longer-lived cons cells.)
Note that, if you are using a queue, you might just as well use Queue.length. No need to count the size yourself, as the queue does it already.
Still, you might be underestimating the sheer cost of Queue.add. It is much faster to create a list and to then traverse it to measure its size, than it is to create a queue.
You should only ever use a queue when you need to push/pop in both front and back of a data structure. If you only ever work on one side, having a data structure faster than a list is challenging. The only way to win is by wasting less memory than a list, which is definitely not the case of a queue.
There was a rather lengthy discussion (here and here) about iteration models for the Stdlib and from what I recall Stdlib.Seq was picked because itâs simple.
There are numerous trade-offs involved in deciding an ideal iteration model and, for what itâs worth, I think Stdlib.Seq is an ok general-purpose streaming model for very basic use-cases.
I do think that (structurally-typed!) iter/fold would be a a more natural choice for the Stdlib considering we already have many iter functions for lists/arrays/maps.
I wrote a versatile streaming library a while ago that focuses on performance while providing deterministic and safe resource management for streams that interact with effectful consumers and producers: see code and docs. In addition to being resource-safe, itâs also fast, specially when compiled with flambda.
I have work-in-progress extensions to the core streaming model to introduce: batching to allow efficient zero-copy processing of chunks of data (similar to angstromâs internal memory model), and effect-based concurrency.