I’ve been building a library for Apache Avro called avro-simple. It is an OCaml implementation of Apache Avro with a codec-based design, schema evolution support, and support for the Object Container File format.
The key principles for this library are:
- Value-centric design: Manual codec construction using combinators (no code generation required)
- Pure OCaml: No external C dependencies for core functionality
- Schema evolution: Built-in support for reading data with different schemas
- Container files: Full support for Avro Object Container File format
- Compression: Multiple compression codecs (null, deflate, snappy, zstandard), plus support for compression plugins
- Streaming: Memory-efficient block-level streaming for large files
- Type-safe: Codec-enforced types with composable combinators
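To illustrate what "manual codec construction using combinators" means, here is a minimal, self-contained sketch of the general idea over Avro's binary encoding: longs are zigzag-encoded variable-length integers, strings are a length followed by the raw bytes, and a record is its fields concatenated in schema order. This is not avro-simple's actual API. The `codec` type and the `long`, `string`, `pair`, `map`, `to_string` and `of_string` names are hypothetical, and values are limited to OCaml's 63-bit `int`.

```ocaml
(* Hypothetical codec type for illustration only, NOT the avro-simple API.
   decode takes the input and an offset, and returns the value plus the next offset. *)
type 'a codec = {
  encode : Buffer.t -> 'a -> unit;
  decode : string -> int -> 'a * int;
}

(* Avro int/long: zigzag encoding, then 7 bits per byte, high bit = "more bytes".
   Limited here to OCaml's 63-bit native int. *)
let long : int codec = {
  encode = (fun buf n ->
    let rec go z =
      if z land (lnot 0x7f) = 0 then Buffer.add_char buf (Char.chr z)
      else begin
        Buffer.add_char buf (Char.chr ((z land 0x7f) lor 0x80));
        go (z lsr 7)
      end
    in
    go ((n lsl 1) lxor (n asr 62)));
  decode = (fun input pos ->
    let rec go pos shift acc =
      let b = Char.code input.[pos] in
      let acc = acc lor ((b land 0x7f) lsl shift) in
      if b land 0x80 = 0 then (acc, pos + 1) else go (pos + 1) (shift + 7) acc
    in
    let z, pos = go pos 0 0 in
    (* undo zigzag *)
    ((z lsr 1) lxor (- (z land 1)), pos));
}

(* Avro string: length as a long, followed by the UTF-8 bytes. *)
let string : string codec = {
  encode = (fun buf s -> long.encode buf (String.length s); Buffer.add_string buf s);
  decode = (fun input pos ->
    let len, pos = long.decode input pos in
    (String.sub input pos len, pos + len));
}

(* Combinator: two codecs in sequence, as record fields are laid out. *)
let pair (a : 'a codec) (b : 'b codec) : ('a * 'b) codec = {
  encode = (fun buf (x, y) -> a.encode buf x; b.encode buf y);
  decode = (fun input pos ->
    let x, pos = a.decode input pos in
    let y, pos = b.decode input pos in
    ((x, y), pos));
}

(* Combinator: adapt a codec to another OCaml type via a pair of conversions. *)
let map (f : 'a -> 'b) (g : 'b -> 'a) (c : 'a codec) : 'b codec = {
  encode = (fun buf v -> c.encode buf (g v));
  decode = (fun input pos -> let x, pos = c.decode input pos in (f x, pos));
}

let to_string c v = let buf = Buffer.create 16 in c.encode buf v; Buffer.contents buf
let of_string c s = fst (c.decode s 0)

(* A record with fields {name: string, age: long}, built from combinators. *)
type user = { name : string; age : int }

let user : user codec =
  map (fun (name, age) -> { name; age }) (fun u -> (u.name, u.age)) (pair string long)

let () =
  let bytes = to_string user { name = "ada"; age = 36 } in
  (* "ada": length 3 zigzags to 0x06; age 36 zigzags to 72 = 0x48 *)
  assert (bytes = "\x06ada\x48");
  assert (of_string user bytes = { name = "ada"; age = 36 })
```

The appeal of this style is that the codec value itself is the schema-to-type mapping, so no code generation step is needed and the OCaml type checker enforces that encode and decode agree on the type.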
By comparison, ocaml-avro uses code generation based on JSON schemas and is missing some of the schema evolution features.
I’m mainly using this for reading and writing Avro container files, but it should also be possible to integrate it with ocaml-kafka if you use Kafka. Performance is reasonable so far: it is about 1.4 times slower than the fastest Rust-based library I could find, and I haven’t really tried to optimise it yet. It is also heaps faster than the official Apache Avro library for Python.
There are a few other places where I think I can improve performance and memory usage, but it should already be quite usable for small to medium-sized files. Enjoy!