A user of the fmlib library has reported that the library cannot be compiled with the 5.0.0 alpha version of OCaml, because the Stream module is no longer available in the standard library.
Some questions:
Is this just a bug in the alpha version or will the module be removed from 5.0 on?
If removed, what is the reason? Are there alternatives?
Such a removal from the standard library will break a lot of code. Is there a way to avoid that?
I would like to keep the fmlib library compatible with the standard library and I had expected upward compatibility of the standard library.
Another important point: the Stream module has been removed because its API has always been overfitted to camlp4's needs. The Seq module is a better alternative for nearly all use cases, and the Stream module has been considered deprecated for a long time.
Indeed, a quick glance at fmlib gives me the impression that it should have been using the Seq module. Deprecating and removing the Stream module from the standard library was done precisely to avoid this reliance on folkloric knowledge and to make the situation a lot clearer for everyone.
@octachron: Thanks for the explanation. fmlib has been designed to be functional; no imperative features are used. But at some point it has to interface with the real world, and for a parser the real world means reading from a file. Therefore I used the Stream module, because it interfaces nicely with in_channel. Since Seq is functional, it might not be possible to use it as an interface to in_channel.
Thanks for the hint, though; I will consider avoiding Stream in the future.
There is no major difference between Stream and Seq here: Seq can perfectly well be used to read from a file.
For instance, reading characters with Seq, without buffering, is a one-line function (in 4.14):
let from_channel ic = Seq.of_dispenser (fun () -> In_channel.input_char ic)
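For example, consuming it to count the characters of a file is just a fold (a quick sketch; count_chars and "data.txt" are only illustrative names):

let count_chars path =
  In_channel.with_open_bin path (fun ic ->
    Seq.fold_left (fun n _c -> n + 1) 0 (from_channel ic))

(* count_chars "data.txt" returns the number of characters in the file *)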
It can, but I think we should discourage people from doing so.
Hiding side effects and resource access behind Seq is full of surprises. Doing so is equivalent to Haskell's lazy IO, and by now everyone there seems to know its perils – maybe we can avoid enticing people to make the same mistakes.
When you deal with resources, folds are a better idea, since the inversion of control forces a (lifetime) scope on you – unless you start using that nifty OCaml 5 feature to escape it. This old blog post by @gasche could be refreshed.
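To make that concrete, here is a minimal sketch of the fold-based shape I have in mind (fold_chars and "data.txt" are only illustrative names, not from any library): the channel is opened and closed inside the fold, so it cannot leak out of its scope.

let fold_chars f acc path =
  let ic = open_in_bin path in
  let consume () =
    let rec loop acc =
      match input_char ic with
      | c -> loop (f acc c)
      | exception End_of_file -> acc
    in
    loop acc
  in
  (* the channel's lifetime is exactly the extent of the fold *)
  Fun.protect ~finally:(fun () -> close_in_noerr ic) consume

(* let len = fold_chars (fun n _c -> n + 1) 0 "data.txt" *)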
Uh, I could be wrong (it's been a while since I hacked at this low level), but while Camlp5 streams are built in a manner similar to Seq, they're consumed differently: consuming them is not functional, so Seq would not be a drop-in replacement.
@dbuenzli interesting comments! Some more unrefined thoughts on this topic below.
Personally I would try to think about impure iterators using concurrent separation logic, similarly to how I use it to reason about concurrent idioms (resources being owned either by the iterator or by the user, their ownership transfer, invariants where “the context” owns some resources, etc.). I think that this is also @fpottier’s mental model, it is visible through the documentation of the new Seq functions and the discussion of persistent vs. ephemeral sequences for example.
There is no language support for this form of program reasoning, so we’re on our own. (Same with multicore programming really.) I wonder if some of the design difficulties in Haskell, that I agree are well worth pointing out here, come from focusing static discipline on effects (having the iterator in a monad, etc.) rather than resources. I would hazard the guess that Rust actually has a better time with this, even though I have even less experience with Rust iterators than with the myriad of Haskell streaming libraries.
The analogy with multicore programming does not stop there. If you write code using mutexes, you have to be careful about “giving back” all resources to the lock invariant at the point where you unlock to leave a critical section, and lock/unlock are just function calls, they do not need to coincide with lexical scoping. You can enforce a correspondence between the lock/unlock bracketing and lexical scoping with a Mutex.with_ : lock -> (unit -> 'a) -> 'a function. I think this is the Mutex analogue of using fold rather than manual .next() calls.
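Such a function is only a few lines (a sketch; I am not claiming this exact thing is in the stdlib):

let with_lock m f =
  Mutex.lock m;
  (* unlock even if f raises; the critical section is exactly the
     lexical extent of f *)
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

(* with_lock : Mutex.t -> (unit -> 'a) -> 'a *)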
To summarize: I agree that some ways to use effects with iterators make it easy to shoot yourself in the foot, but my intuition is that concurrent separation logic is a good reasoning model for writing those programs safely.
@Chet_Murthy indeed, streams are “more imperative” than sequences. My experience using them is that it’s very easy to get it wrong (basically: only obviously-linear usage of each stream is reasonable), and my (brief) experience maintaining the implementation is that it’s a nest of unspeakable complexity. I’m happy to see them de-emphasized at least (it would be unfair for Multicore to take the blame for this compatibility break; it is just house cleaning). You also worked with Camlp[45] a lot, do you have a different opinion?
I maintain Camlp5, and am a big, big, big fan of the Camlp5 way of doing syntax extension. So everything I write below should be read with that in mind.
100% agree that they should be deprecated and separated from the OCaml main corpus. 100%. You will get no argument from me on this.
I would agree that the implementation of Stream is seriously complex. I have never had need to debug in there, but the one time I wanted to do something nontrivial and took a look, yeah, it was pretty complex.
But there’s a reason for this: these are imperative streams [hence my noting that I don’t think Stream is “overfitted” for Camlp4/5]. And imperative streams allow one to write complex parsers with good performance in a very high-level and functional style. I for one would not give them up; that’s why I originally took over maintenance of Camlp5 from @ddr – because I wanted the functionality for myself.
“My experience using them is that it’s very easy to get it wrong (basically: only obviously-linear usage of each stream is reasonable)” – yep. This is a functional-seeming veneer over arguably-imperative innards (and it could not be otherwise, for performance’s sake).
There are “functional streams” in Camlp5, which I’ve messed with, but … aren’t, it seems, what you want most of the time.
I’m glad that OCaml is a rich and powerful language, with a rich and powerful ecosystem. We can have different variants of significant layers of OCaml’s ecosystem, without our world collapsing. I personally would never write PPX the way you guys do it: it’s a (to quote a disciplined and experienced OCaml guy) “nest of unspeakable complexity”. I don’t know how you guys do it without tearing out your hair, really I don’t.
grin
ETA: I personally don’t feel it’s easy to get programming with Stream wrong, other than the LL(1) restriction. You’re writing parsers after all, and we all know how that works. And I think this is important: Stream.t is not a general-purpose type; its raison d’être is parsers. Parsers, parsers, parsers, parsers.
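To make the parser-shaped usage concrete, here is roughly the style I mean, written against the plain Stream API rather than Camlp5’s [< ... >] parser syntax (a sketch; it needs OCaml <= 4.14, or the camlp-streams compatibility package now that Stream is out of the stdlib):

(* consume a run of digits at the head of a char Stream.t, LL(1) style:
   peek one token, decide, then junk it *)
let rec digits buf s =
  match Stream.peek s with
  | Some ('0'..'9' as c) -> Stream.junk s; Buffer.add_char buf c; digits buf s
  | _ -> Buffer.contents buf

(* digits (Buffer.create 16) (Stream.of_string "123abc") = "123" *)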
I agree that Stream is too complex, but at the same time, we don’t have a great solution for performant streaming of data of all kinds. I recently needed a file stream fed into a decompression algorithm stream, which was then fed into an interpreter. I ended up using the Gen library, which is more efficient than Seq, but still far from the efficiency I would have liked. We need buffered byte streams for performance (high-performance libraries seem to just build them from scratch, which limits reusability), and Stream did seem to support that to some degree.
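Roughly the shape I ended up with, as a sketch (pull-based stages of type unit -> 'a option, which is both Gen’s type and Seq’s dispenser shape; upcase_stage is only a trivial stand-in for the real decompression stage, and "input.bin" is illustrative):

let of_channel ic : unit -> char option =
  fun () -> In_channel.input_char ic

(* stand-in for the decompressor: a stage that pulls from the previous
   stage and transforms what it gets; a real one also keeps internal buffers *)
let upcase_stage next : unit -> char option =
  fun () -> Option.map Char.uppercase_ascii (next ())

(* the interpreter just pulls from the last stage *)
let run next =
  let rec loop () =
    match next () with
    | None -> ()
    | Some c -> print_char c; loop ()
  in
  loop ()

(* In_channel.with_open_bin "input.bin"
     (fun ic -> run (upcase_stage (of_channel ic))) *)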
There seems to be a general assumption that Stream is efficient. It never was. Seq/Iter/Gen are several orders of magnitude faster.
Eight years ago, when I suggested deprecating Stream, I made a series of benchmarks; the old mail can be seen here. That was before flambda (and Sequence/BatSeq is now called Iter). The gap has only widened since (and there are numerous new contenders that are even faster; I’ll let their authors advertise them :p).
For standard solutions, if you want purely transient, use Iter. If you want transient with control, use Seq. If your memory constraints make the GC a problem, then consider Gen.
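For reference, the three shapes differ roughly like this (type definitions as found in Iter, the stdlib Seq, and Gen, modulo renaming):

type 'a iter = ('a -> unit) -> unit
  (* Iter: push-based; the producer drives, the consumer cannot pause it *)

type 'a seq = unit -> 'a seq_node
and 'a seq_node = Nil | Cons of 'a * 'a seq
  (* Seq: pull-based and purely functional; the consumer controls the pace *)

type 'a gen = unit -> 'a option
  (* Gen: pull-based and imperative; each call consumes an element *)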
Even for parsers, I doubt Stream has any advantages, and I suspect genlex rewritten with Seq/Gen would be strictly superior. In truth, it should be functorized (and allow experimenting with new APIs, notably ones that address @dbuenzli’s remarks).
Oh, interesting, Gen. I’ll have to have a look. Also though, your benchmark (which I cannot find) appears to be treating streams as if they’re mostly functional. That’s not what Stream was ever for.
ETA: by “which I cannot find” I mean that the links are broken and I can’t find the gist you reference. But the email seems to imply that you’re mostly treating streams in a functional manner, composing them, etc. That’s a use-case for Stream, but … in my experience they were never meant to be used for arbitrary lazy computation.
Now, I was unaware of Gen. I’ll have to take a look. If it’s got good performance, heck, maybe I’ll see if I can use it in place of Stream. Would be amusing, if nothing else, to try.
I’ll have to agree with @Drup about Stream not being particularly efficient.
Even for the use case of parsing, I’ve found that a small first-class module [1] was a reasonable tradeoff between modularity and speed, at least for reading binary formats. I haven’t really tried a token-level system (besides Lexing), but I suspect a similar approach would still beat Stream handily. The one feature Stream has that might make my approach more complicated is the peek-ahead thing.
A small nitpick, @Drup: functors are not great for picking an IO source, imho. You want to be able to make this choice at runtime without instantiating the whole functor for each case. First-class modules are really fast.
[1] I already posted it on this forum a few times, but:
module type READ = sig
  val byte : unit -> char
  val eof : unit -> bool
  val exact_string : bytes -> int -> int -> unit
end
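And a possible instance over an in_channel, just to show the first-class-module plumbing (a sketch; read_of_channel is an illustrative name, and the one-byte lookahead exists only to implement eof):

let read_of_channel ic : (module READ) =
  (module struct
    let peeked = ref None  (* one byte of lookahead, needed for eof *)
    let byte () =
      match !peeked with
      | Some c -> peeked := None; c
      | None -> input_char ic
    let eof () =
      match !peeked with
      | Some _ -> false
      | None ->
        (match input_char ic with
         | c -> peeked := Some c; false
         | exception End_of_file -> true)
    let exact_string buf pos len =
      match !peeked with
      | Some c when len > 0 ->
        (* flush the pending lookahead byte before reading the rest *)
        peeked := None;
        Bytes.set buf pos c;
        really_input ic buf (pos + 1) (len - 1)
      | _ -> really_input ic buf pos len
  end)

(* let (module R) = read_of_channel ic in ... R.byte () ... *)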
What Stream has going for it is the stream-parser syntax. It’s a very compact way of writing the parsers themselves. And if performance is an issue, you can implement a module that presents the (very limited) interface that stream-parsers use and choose a more efficient underlying data structure. Once I had to deserialize classfiles, and a ton of them, so I implemented a Stream interface on top of a byte array plus in_channel. This meant I could keep the stream-parser representation of the demarshaller while having a more efficient underlying input abstraction. And of course, some of the key low-level demarshallers (like the ones for integers and a few other things) I could write directly against the byte array, so there was no need to abide by the abstraction for everything.
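The core of that trick is small; something like this (a sketch, not my original code; it needs OCaml <= 4.14, or the camlp-streams package now that Stream is out of the stdlib):

(* a char Stream.t backed by a pre-read byte buffer; Stream.from calls the
   function with the index of the next element it wants *)
let stream_of_bytes (b : bytes) : char Stream.t =
  Stream.from (fun i ->
    if i < Bytes.length b then Some (Bytes.get b i) else None)

The stream-parsers don’t care where the characters come from, so they keep working unchanged on top of this.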
But hey, it’s all good, b/c OCaml allows us to pick-and-choose these abstractions as we think most appropriate. It’s even possible that I’ll decide that Gen is better than Stream, and switch.
On that note, I can’t figure out why there’s a Seq module. It looks sort of like a lazy list implementation but doesn’t provide to_list and of_list functions. This lack of basic explanations and interfaces leaves me confused.