[ANN] First release of OCaml bindings to the Polars dataframe library

Hi everyone!

I’m excited to share a project I’ve been working on: polars-ocaml, some OCaml bindings to the Polars dataframe library. If you’ve ever wanted to do data science or tabular data processing in OCaml, please consider trying this out!

Polars is a highly performant dataframe library written in Rust. It is built on top of the Apache Arrow columnar format and focuses on performance, using parallelism and SIMD to get pretty big speedups compared to regular records or libraries like pandas.
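
To make the difference between "regular records" and a columnar layout concrete, here is a toy sketch in plain OCaml with no Polars involved. The type and function names are made up for illustration. It contrasts one record per row with one array per field, which is closer in spirit to Arrow's columnar layout. It says nothing about how Polars itself is implemented.

```ocaml
(* Row-oriented: one record per row, as you might model a table by hand. *)
type row = { name : string; price : float; qty : int }

(* Column-oriented (hypothetical toy type): one array per field,
   closer in spirit to the Arrow columnar layout. *)
type columns = { names : string array; prices : float array; qtys : int array }

(* Transpose a list of rows into per-field arrays. *)
let columns_of_rows (rows : row list) : columns =
  let a = Array.of_list rows in
  {
    names = Array.map (fun r -> r.name) a;
    prices = Array.map (fun r -> r.price) a;
    qtys = Array.map (fun r -> r.qty) a;
  }

(* Summing one field only walks a single float array, which OCaml stores
   unboxed by default, instead of following a pointer to each row record. *)
let total_price (c : columns) : float = Array.fold_left ( +. ) 0.0 c.prices

let () =
  let rows =
    [ { name = "a"; price = 1.5; qty = 2 }; { name = "b"; price = 2.5; qty = 1 } ]
  in
  Printf.printf "%.1f\n" (total_price (columns_of_rows rows))
```

Per-column operations like this one are where contiguous storage helps. Libraries in this space add vectorized kernels and parallelism on top of that.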

We’ve ported most of the examples in the Polars user guide to OCaml in the form of expect tests: https://github.com/mt-caret/polars-ocaml/tree/main/guide
I encourage folks to take a look if you’re interested in seeing some examples of idiomatic usage of polars-ocaml; I think the labelled arguments and use of GADTs have made the API quite nice to work with!

How to get it

We’ve just released the first version to opam, so you can install it with opam install polars. It also works with OCaml Jupyter notebooks via ocaml-jupyter.

Contributing

If you find any issues or have any questions, feel free to comment or raise an issue on GitHub. While we’ve exposed a fair amount of polars functionality, there’s quite a lot more we haven’t gotten around to, so PRs are very welcome!

This looks super exciting!

I wonder how it interacts with standard OCaml types such as array, bigarray etc. Or is the general idea that all computations on Dataframes should be carried out within Polars functions?

Also, what would you recommend for interactive plotting?

Are you planning to write a blog post to show off the functionality at some point?

Or is the general idea that all computations on Dataframes should be carried out within Polars functions?

That’s the idea, since then it’s possible to take advantage of the performance gains provided by the Arrow memory layout, parallelism, SIMD, etc. You can go back and forth between OCaml values and Polars using functions like Series.create / Series.to_list.
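
A rough sketch of what a GADT-typed round trip between OCaml lists and a column can look like. This is a hypothetical toy model, not polars-ocaml's implementation. The Data_type and Series modules below, and the option-returning to_list, are assumptions made for illustration. Check the polars-ocaml documentation for the real signatures of Series.create and Series.to_list.

```ocaml
(* Hypothetical toy model, not polars-ocaml's real types: a GADT witness
   ties each column data type to the OCaml type of its elements. *)
module Data_type = struct
  type _ t = Int64 : int t | Float64 : float t | Utf8 : string t
end

module Series = struct
  (* The element type is hidden inside the constructor (existential). *)
  type t = Series : string * 'a Data_type.t * 'a array -> t

  (* OCaml list -> column: the witness fixes the element type. *)
  let create (dtype : 'a Data_type.t) (name : string) (values : 'a list) : t =
    Series (name, dtype, Array.of_list values)

  let name (s : t) : string = match s with Series (n, _, _) -> n

  (* Column -> OCaml list: the caller names the type it expects, and a
     mismatch yields None rather than an unsafe cast. *)
  let to_list : type a. a Data_type.t -> t -> a list option =
   fun expected s ->
    match s with
    | Series (_, actual, values) -> (
        match (expected, actual) with
        | Data_type.Int64, Data_type.Int64 -> Some (Array.to_list values)
        | Data_type.Float64, Data_type.Float64 -> Some (Array.to_list values)
        | Data_type.Utf8, Data_type.Utf8 -> Some (Array.to_list values)
        | _ -> None)
end

let () =
  let s = Series.create Data_type.Int64 "ints" [ 1; 2; 3 ] in
  match Series.to_list Data_type.Int64 s with
  | Some xs -> List.iter (Printf.printf "%d\n") xs
  | None -> print_endline "type mismatch"
```

In this toy version, the witness lets the compiler know that to_list Data_type.Int64 produces an int list. Returning an option on a mismatch is only one possible design choice; raising an exception would be another.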

Also, what would you recommend for interactive plotting?

I haven’t personally tried it yet, but I think @laurent's ocaml-matplotlib (LaurentMazare/ocaml-matplotlib on GitHub, plotting for OCaml based on the matplotlib.pyplot library) should work with ocaml-jupyter, so it would probably be worth trying out.

Are you planning to write a blog post to show off the functionality at some point?

That’s a good idea, I have a few more features I’m planning to implement so I’ll write something up once those things are done!

It might be interesting to also provide generators, e.g. a Seq.t that could stream Series elements when a list would not fit in memory?

That’s an interesting idea. I think one potential issue with getting generator-like things out of a Series.t is that Series.t is mutable, so its contents can change under you. If you aren’t concerned about that, I think you can already build something like this using Series.get, which lets you access a single element at a given index.
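
Here is a minimal sketch of the index-based approach in plain OCaml. The seq_of_indexed helper is hypothetical. It takes the length and an index accessor as parameters, so it assumes nothing about Series.get's actual signature. An array stands in for the series. Because elements are read lazily, a mutation made after the sequence is created shows up when the sequence is consumed, which is the caveat above.

```ocaml
(* Hypothetical helper: lazily walk indices 0 .. length - 1, reading each
   element only when the consumer forces that step of the sequence. *)
let seq_of_indexed ~(length : int) ~(get : int -> 'a) : 'a Seq.t =
  let rec from i () =
    if i >= length then Seq.Nil else Seq.Cons (get i, from (i + 1))
  in
  from 0

let () =
  (* An int array stands in for a mutable series. *)
  let data = [| 10; 20; 30 |] in
  let s = seq_of_indexed ~length:(Array.length data) ~get:(Array.get data) in
  (* Mutating after creating the sequence is visible when it is consumed. *)
  data.(1) <- 99;
  Seq.iter (Printf.printf "%d\n") s (* prints 10, 99, 30 on separate lines *)
```

Note that the length is captured when the sequence is created. If the underlying data could shrink, the accessor would also need to handle out-of-range indices.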

Interesting approach. How was your experience with OCaml ↔ Rust interop?
I see it uses ocaml-interop which doesn’t require writing C stubs.

Also I notice that it is currently not available on OCaml 5, is that a limitation of ocaml-interop, or did you find some bugs with Rust interaction there?

Thanks! The experience was quite pleasant. I think @tizoc’s work here has addressed many of the pain points around writing correct and performant FFIs (specifically, making sure that value lifetimes are properly handled), and I’m quite excited about the possibilities of this library.

Also I notice that it is currently not available on OCaml 5, is that a limitation of ocaml-interop, or did you find some bugs with Rust interaction there?

Yes. I think (but I may be wrong here) that it doesn’t support OCaml 5 yet; for example, issue #42 on tizoc/ocaml-interop ("should the runtime be a thread-local variable?") is a blocker.

Would you please make a Windows port? e.g. for dkml …

Sorry, I don’t have a Windows machine to test my code, but I’m happy to accept a PR that adds support for this!

I doubt that would be possible. async does not have a Windows implementation, among other things.