I’m excited to share a project I’ve been working on: polars-ocaml, some OCaml bindings to the Polars dataframe library. If you’ve ever wanted to do data science or tabular data processing in OCaml, please consider trying this out!
Polars is a high-performance dataframe library written in Rust. Its API is built on top of the Apache Arrow format, and it uses parallelism and SIMD to get substantial speedups compared to processing regular records or using libraries like pandas.
We’ve ported most of the examples in the Polars user guide to OCaml in the form of expect tests: https://github.com/mt-caret/polars-ocaml/tree/main/guide
I encourage folks to take a look if you’re interested in seeing some examples of idiomatic usage of polars-ocaml; I think the labelled arguments and use of GADTs have made the API quite nice to work with!
How to get it
We’ve just released the first version to opam, so you can install it with opam install polars. It also works in OCaml Jupyter notebooks via ocaml-jupyter.
If you find any issues or have any questions, feel free to comment or raise an issue on GitHub. While we’ve exposed a fair amount of polars functionality, there’s quite a lot more we haven’t gotten around to, so PRs are very welcome!
I wonder how it interacts with standard OCaml types such as array, bigarray etc. Or is the general idea that all computations on Dataframes should be carried out within Polars functions?
Also, what would you recommend for interactive plotting?
Are you planning to write a blog post to show off the functionality at some point?
Or is the general idea that all computations on Dataframes should be carried out within Polars functions?
That’s the idea, since that makes it possible to take advantage of the performance gains provided by the Arrow memory layout, parallelism, SIMD, etc. You can go back and forth between OCaml values and Polars using functions like Series.create / Series.to_list.
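To illustrate the round trip, here is a toy model of how a GADT witness can keep create / to_list type-safe. This is only a sketch and not the polars-ocaml implementation: the data_type and series types below are hypothetical stand-ins, and the real Series.create / Series.to_list signatures may differ, so check the library's interface before relying on it.

```ocaml
(* Toy stand-in, NOT the polars-ocaml implementation. A GADT witness ties the
   OCaml element type to the kind of column being built or read. *)
type _ data_type =
  | Int : int data_type
  | Float : float data_type
  | String : string data_type

(* Hypothetical column representation: a name plus a typed array. *)
type series =
  | Int_col of string * int array
  | Float_col of string * float array
  | String_col of string * string array

(* OCaml list -> column. The witness decides which constructor is used. *)
let create : type a. a data_type -> string -> a list -> series =
 fun dt name values ->
  match dt with
  | Int -> Int_col (name, Array.of_list values)
  | Float -> Float_col (name, Array.of_list values)
  | String -> String_col (name, Array.of_list values)

(* Column -> OCaml list. Returns None if the witness does not match the
   column's actual element type. *)
let to_list : type a. a data_type -> series -> a list option =
 fun dt s ->
  match dt, s with
  | Int, Int_col (_, xs) -> Some (Array.to_list xs)
  | Float, Float_col (_, xs) -> Some (Array.to_list xs)
  | String, String_col (_, xs) -> Some (Array.to_list xs)
  | _ -> None

let () =
  let s = create Int "integer" [ 1; 2; 3 ] in
  assert (to_list Int s = Some [ 1; 2; 3 ]);
  (* Asking for floats from an int column is rejected at runtime. *)
  assert (to_list Float s = None)
```

The point of the witness argument is that the result type of to_list follows from the value passed in, so callers get an int list or a float list without casts.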
Also, what would you recommend for interactive plotting?
That’s an interesting idea. I think one potential issue with getting generator-like things out of Series.t is that Series.t is mutable, so things can mutate under you. If you aren’t concerned about that, I think you can create something like this right now by using Series.get, which lets you access a single element at a given index.
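A minimal sketch of that idea, using only the OCaml standard library. Since the exact signature of Series.get isn't shown here, the helper takes the element accessor and the length as plain arguments; series_to_seq, ~length and ~get are hypothetical names, and the array in the usage example is just a stand-in for a series.

```ocaml
(* [series_to_seq ~length ~get] lazily yields [get 0], [get 1], ...,
   [get (length - 1)]. Each element is read only when the sequence is forced,
   so mutations of the underlying data between steps will be observed. *)
let series_to_seq ~length ~get : 'a Seq.t =
  let rec from i () =
    if i >= length then Seq.Nil else Seq.Cons (get i, from (i + 1))
  in
  from 0

let () =
  (* Stand-in for a series: a plain array with an index-based accessor. *)
  let data = [| 10; 20; 30 |] in
  let seq =
    series_to_seq ~length:(Array.length data) ~get:(fun i -> data.(i))
  in
  assert (List.of_seq seq = [ 10; 20; 30 ])
```

With a real series, ~get would presumably be a partial application of Series.get and ~length the number of rows; both are assumptions about how the library would be wired in.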
Interesting approach. How was your experience with OCaml ↔ Rust interop?
I see it uses ocaml-interop which doesn’t require writing C stubs.
Also I notice that it is currently not available on OCaml 5, is that a limitation of ocaml-interop, or did you find some bugs with Rust interaction there?
Thanks! The experience was quite pleasant. I think @tizoc’s work here has eased many of the pain points around writing correct and performant FFIs (specifically, making sure that value lifetimes are properly handled), and I’m quite excited about the possibilities of this library.
Also I notice that it is currently not available on OCaml 5, is that a limitation of ocaml-interop, or did you find some bugs with Rust interaction there?