It is my pleasure to announce the first public release of
streaming – a library for building efficient, incremental data processing pipelines that compose and don’t leak resources.
I built streaming after many experiments with different streaming and iteration models for OCaml. There are multiple packages on OPAM that share some of the goals of
streaming (we even have
Stdlib.Seq now!), but none of them combines (1) excellent performance, (2) safe resource handling and (3) a pure functional style for combinators. Streaming addresses all three by implementing three basic and independent models: sources, sinks and flows. They represent the parts of a pipeline that produce, consume and transform elements, respectively. These models can be defined and composed independently to produce reusable “streaming blocks”.
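To make the three models concrete, here is a greatly simplified sketch of how sources, sinks and flows can be encoded and composed. The types and names below are illustrative only, not the library's actual API (in particular, the library's real sinks carry hidden internal state and support early termination):

```ocaml
(* Illustrative only: a simplified encoding of sources, sinks and flows.
   Not the streaming library's actual API. *)

(* A source produces elements on demand. *)
type 'a source = Src of (unit -> ('a * 'a source) option)

(* A sink consumes elements by folding them into a result. *)
type ('a, 'r) sink = { init : 'r; push : 'r -> 'a -> 'r }

(* A flow transforms a pipeline by adapting the downstream sink. *)
type ('a, 'b) flow = { flow : 'r. ('b, 'r) sink -> ('a, 'r) sink }

let rec of_list = function
  | [] -> Src (fun () -> None)
  | x :: xs -> Src (fun () -> Some (x, of_list xs))

let map f =
  { flow = fun k -> { init = k.init; push = (fun acc a -> k.push acc (f a)) } }

let sum = { init = 0; push = ( + ) }

(* Drive a source into a sink: the only place where iteration happens. *)
let into sink source =
  let rec go (Src pull) acc =
    match pull () with
    | None -> acc
    | Some (x, rest) -> go rest (sink.push acc x)
  in
  go source sink.init

(* Compose independent blocks: square each element, then sum. *)
let () = assert (into ((map (fun x -> x * x)).flow sum) (of_list [1; 2; 3]) = 14)
```

Note how each piece is defined in isolation: `of_list`, `map` and `sum` know nothing about each other and only meet when the pipeline is run.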
The library defines a central
Stream model that relies on sources, sinks and flows. This model is a push-based iterator with performance characteristics similar to the
iter iterator, which has type
('a -> unit) -> unit, and is known for being very efficient. But unlike
iter, it has a pure functional core (no need for mutable state or exceptions for flow control!) and can handle resource allocation and cleanup in a lazy and deterministic way. All of this while offering slightly better performance for common stream operations.
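For contrast, here is a sketch of why the iter model needs exceptions for flow control, next to a pure push-based fold where the step function itself signals whether to continue. This is an illustrative encoding (using Stdlib.Either, available since OCaml 4.12), not the library's internal representation:

```ocaml
(* The iter model: push every element into a callback. *)
type 'a iter = ('a -> unit) -> unit

let iter_of_list (l : 'a list) : 'a iter = fun k -> List.iter k l

(* Stopping early with iter requires mutation and an exception. *)
exception Stop

let first_iter (it : 'a iter) =
  let result = ref None in
  (try it (fun x -> result := Some x; raise Stop) with Stop -> ());
  !result

(* A pure push-based fold: the step function returns Left to stop
   with a final value, or Right to continue with a new accumulator. *)
type 'a stream =
  { run : 'r. init:'r -> step:('r -> 'a -> ('r, 'r) Either.t) -> 'r }

let stream_of_list l =
  { run = fun ~init ~step ->
      let rec go acc = function
        | [] -> acc
        | x :: xs ->
          (match step acc x with
           | Either.Left final -> final
           | Either.Right acc' -> go acc' xs)
      in
      go init l }

(* No mutation, no exceptions: stop as soon as one element arrives. *)
let first_stream s = s.run ~init:None ~step:(fun _ x -> Either.Left (Some x))

let () =
  assert (first_iter (iter_of_list [1; 2; 3]) = Some 1);
  assert (first_stream (stream_of_list [10; 20]) = Some 10)
```

Because termination is an ordinary return value rather than an exception, the pure variant composes safely with resource handling: the driver always regains control and can run cleanup deterministically.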
For those who are curious about the performance characteristics of
streaming and other models, I created a dedicated repository for stream benchmarks: https://github.com/rizo/streams-bench. In particular, it includes a few simple benchmarks comparing these models on common stream operations.
The library should soon be published on opam. In the meantime, I invite you to read the docs and explore the code:
- Library documentation: https://odis-labs.github.io/streaming
- Github project: https://github.com/odis-labs/streaming
Questions, opinions and suggestions are welcome!