…leads me to wonder what the future of the Streams interface might be. I’m currently using it in a project so that input to a lexical analyzer can come from either a string (for unit testing) or a file, through the same API. Should I be building my own similar gadget instead? (I’m not that in love with the interface anyway; I seem to prefer things that return options rather than throwing exceptions…)
But it’s certainly not what you want to use if you are streaming from files (which is the actual reason I’m against its inclusion in the stdlib).
When you are streaming from/to files you need to be able to handle errors and resource disposal. These things won’t do it, and simply plugging file reading functions in there will easily result in pain such as too many open fds. (You would be doing the equivalent of Haskell’s lazy IO; search for that and you should find plenty of discussion about its shortcomings.)
One way to sidestep the issue with these iterators is to first read the whole file into memory and start from that, but then you are no longer streaming (and the error handling story, e.g. for decoders, is still spotty).
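To make the resource point concrete, here is a minimal sketch of the alternative: scope the channel to a callback so that it is closed on every exit path, and surface I/O failures as a `result` instead of letting them escape lazily. It assumes only the standard library (OCaml 4.08 or later for `Fun.protect`); `with_input` and `count_lines` are hypothetical names used for illustration, not part of any library.

```ocaml
(* Hypothetical helper: open [path], run [f] on the channel, and always
   close the channel, even if [f] raises. I/O errors become [Error msg]. *)
let with_input (path : string) (f : in_channel -> 'a) : ('a, string) result =
  match open_in_bin path with
  | exception Sys_error msg -> Error msg
  | ic ->
    (match
       Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () -> f ic)
     with
     | v -> Ok v
     | exception Sys_error msg -> Error msg)

(* Streaming consumer: reads line by line without loading the whole file. *)
let count_lines (ic : in_channel) : int =
  let rec go n =
    match input_line ic with
    | _ -> go (n + 1)
    | exception End_of_file -> n
  in
  go 0
```

The consumer still streams, but the descriptor cannot outlive the call to `with_input`, which is exactly the guarantee a bare lazy iterator over a file does not give you.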
There was more discussion in the first PR, but somehow it was subsequently ignored.
Okay, so you’re saying that these won’t work particularly well for me. Do you have suggestions on what will? I could also try to build something primitive myself; it wouldn’t be too hard, but I had hoped for a more generic solution.
(Oh, and on file descriptor leaks: I take it from that there isn’t any way in OCaml to attach a finalizer to a data structure to (for example) free an fd if no one is using it any more…?)
While @dbuenzli’s concerns about resources are generally justified, using gen for lexing/parsing is, in practice, perfectly fine, and you can find several people doing exactly that. The usage pattern for parsing a file is extremely simple (open the file, lex, parse, close). Sedlex, in particular, works very well with gen.
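As a rough illustration of that open, lex, close pattern, here is a sketch that uses only the standard library rather than the actual gen or Sedlex APIs. The generator is a plain `unit -> char option` function, and `chars_of_channel`, `words` and `lex_file` are hypothetical names; the "lexer" is a toy that just splits on whitespace.

```ocaml
(* Generator of characters read from an already-open channel:
   each call yields the next character, or None at end of file. *)
let chars_of_channel (ic : in_channel) : unit -> char option =
  fun () ->
    match input_char ic with
    | c -> Some c
    | exception End_of_file -> None

(* Toy "lexer": split the character stream into whitespace-separated words. *)
let words (next : unit -> char option) : string list =
  let buf = Buffer.create 16 in
  (* Push the pending word, if any, onto the accumulator. *)
  let flush acc =
    if Buffer.length buf = 0 then acc
    else begin
      let w = Buffer.contents buf in
      Buffer.clear buf;
      w :: acc
    end
  in
  let rec go acc =
    match next () with
    | None -> List.rev (flush acc)
    | Some (' ' | '\n' | '\t' | '\r') -> go (flush acc)
    | Some c -> Buffer.add_char buf c; go acc
  in
  go []

(* Open, lex, close: the generator is fully consumed before the channel
   is closed, so it never escapes the scope of the open file. *)
let lex_file (path : string) : string list =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in_noerr ic)
    (fun () -> words (chars_of_channel ic))
```

The point is that the generator is created and drained inside one function, so there is no window in which an unconsumed generator holds the file open.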
My advice would be not to overthink the matter, unless you want to spend more time thinking about iterators than what your application does.
If you have more complex usage patterns, then yes, laziness comes back to bite you in the ass, but that’s not always the case.
Indeed, if I remember correctly, Gen are simple generators, and in those the producer drives the generation, which allows resources to be controlled more precisely (here’s a few implementations of them from first principles).
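For readers who have not seen them, here is a minimal from-scratch sketch of this style of generator, assuming the `unit -> 'a option` shape (which, as far as I know, is also the shape gen uses). The names `of_list`, `of_string`, `map` and `fold` below are defined locally for illustration and are not calls into the gen library.

```ocaml
(* A generator is a function that yields the next item on each call,
   and None once it is exhausted. *)
type 'a gen = unit -> 'a option

(* Generator over the elements of a list. *)
let of_list (l : 'a list) : 'a gen =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl -> rest := tl; Some x

(* Generator over the characters of a string (handy for unit tests). *)
let of_string (s : string) : char gen =
  let i = ref 0 in
  fun () ->
    if !i >= String.length s then None
    else begin
      let c = s.[!i] in
      incr i;
      Some c
    end

(* Transform each item as it is produced. *)
let map (f : 'a -> 'b) (g : 'a gen) : 'b gen =
  fun () -> Option.map f (g ())

(* Consume the generator to the end, accumulating a result. *)
let rec fold (f : 'acc -> 'a -> 'acc) (acc : 'acc) (g : 'a gen) : 'acc =
  match g () with
  | None -> acc
  | Some x -> fold f (f acc x) g
```

A lexer written against `char gen` can then be fed from `of_string` in tests and from a file-backed generator in production, and end of input is signalled with `None` rather than an exception.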