Export/import data in OCaml

What is the best practice for exporting data from one OCaml program to be read or incorporated by another OCaml program? The data transfer is through a file.

A. Write a %a printer to output the data in OCaml-readable syntax. The advantage is that the output is human-readable and editable without introducing an external dependency, and the data can be incorporated at compile time.

B. Serialize the data through a web standard such as JSON. Encoding and decoding could require more work for anything beyond the most basic data structures.

C. output_value is pretty magical but the output is not human readable or editable. Does OCaml retain names of constructors at runtime or is it completely based on matching memory layout? Are there any potential pitfalls?

Answering this implies omniscience: knowing all options and requirements, including future ones. That’s utopian and will lead you nowhere. Focus on the properties you care about and use what suits you.

I might have a look at csexp (available on opam).

It depends on your use case, but doing this would only solve the “encoding” part, and you would still need to write a parser to decode data written in this format.

This is the goal of ppx preprocessors such as ppx_deriving_yojson (ocaml-ppx/ppx_deriving_yojson on GitHub, a Yojson codec generator for OCaml) and ppx_sexp_conv (janestreet/ppx_sexp_conv on GitHub, which generates S-expression conversion functions from type definitions), among others.

No, no such “type” information is kept in the marshalled data. The data is only guaranteed to be read back in a program compiled with the same version of OCaml (and of course against the exact same type definition that was used when marshalling).

Cheers,
Nicolas
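
To make the point about option C concrete, here is a minimal sketch of a round trip through a file using only the standard library's output_value and input_value. The point type and the save/load helpers are hypothetical names made up for this illustration. Note that the type annotation in load is simply trusted: nothing in the file records the type or the constructor names, so reading the file at a different type is not detected and can crash the program.

```ocaml
(* Hypothetical record type, used only for this illustration. *)
type point = { x : int; y : float; label : string }

(* Write a value to [path] with the stdlib's Marshal-based output_value. *)
let save (path : string) (p : point) : unit =
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_value oc p)

(* Read the value back. The annotation [: point] is NOT checked at runtime:
   the file holds only the memory representation, not the type. *)
let load (path : string) : point =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> (input_value ic : point))

let () =
  let path = Filename.temp_file "point" ".bin" in
  let p = { x = 1; y = 2.5; label = "origin" } in
  save path p;
  (* Same program, same type definition: the round trip is safe. *)
  assert (load path = p);
  Sys.remove path
```

This works here because the writer and the reader are the same program with the same type definition. Across separately compiled programs, both sides have to agree on the type definition by convention.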

It depends on your use case, but doing this would only solve the “encoding” part, and you would still need to write a parser to decode data written in this format.

I was actually thinking that the format would be readable by the compiler (a poor man’s version of a metacircular compiler), so that the parser comes for free too.

Ah yes indeed, but this will limit you to the case when the data to be read is available at compile-time (as you noted).

Cheers,
Nicolas
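
As a rough sketch of what option A could look like with hand-written printers: the code below prints a list of values as an OCaml let-binding, so the generated file can be compiled into the reading program. The shape type and the names pp_shape, pp_list, to_ocaml_source and write_module are all hypothetical, chosen for this example only.

```ocaml
(* Hypothetical data type, used only for this illustration. *)
type shape =
  | Circle of float
  | Rect of { w : float; h : float }

(* Print one value in OCaml syntax. %F prints a float as an OCaml literal;
   it may round long fractions, and %h (hexadecimal) is an exact alternative.
   The constructor argument is parenthesised so negative floats still parse. *)
let pp_shape ppf = function
  | Circle r -> Format.fprintf ppf "Circle (%F)" r
  | Rect { w; h } -> Format.fprintf ppf "Rect { w = %F; h = %F }" w h

(* Print a list in OCaml syntax, given a printer for its elements. *)
let pp_list pp_elt ppf xs =
  Format.fprintf ppf "[%a]"
    (Format.pp_print_list
       ~pp_sep:(fun ppf () -> Format.pp_print_string ppf "; ")
       pp_elt)
    xs

(* Render a top-level definition [let name = [...]] as a string. *)
let to_ocaml_source (name : string) (shapes : shape list) : string =
  Format.asprintf "let %s = %a@." name (pp_list pp_shape) shapes

(* Write the definition to a .ml file that the reader can compile in. *)
let write_module (path : string) (name : string) (shapes : shape list) : unit =
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc (to_ocaml_source name shapes))
```

The generated file contains only the value, so the reading program has to provide the type definition itself, for example by opening the module that defines shape before the generated code is compiled.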

By the way, to print things in OCaml syntax using a ppx you can use ppx_show (thierry-martinez/ppx_show on GitHub), a PPX deriver for show based on ppxlib.

Cheers,
Nicolas

Wow! That is really cool! This is exactly the kind of answer I was looking for that I didn’t even know how to pose the question for, hence the more general open-ended lede I had to resort to (@mro).

I usually use JSON as a storage format, with atd to describe the structured data and to generate parsers, including parsers for other languages.

Hoping this helps.

To add to @nojb’s comment, there is also a PPX deriver for Google’s protobuf format. And of course Thrift includes a type-based (de)marshaller compiler that can generate OCaml code for Thrift types, so you can use the Thrift format too.

Another interesting option could be the Umarshal module that was recently shipped with the Unison file sync tool. I took a look at the code and it looks fairly self-contained to me; could probably be extracted pretty easily. The cool thing about it is that it aims to be an almost drop-in replacement for OCaml’s Marshal module, except type-safe and stable across OCaml versions. It can also handle variant types, which few other serialization systems can.

A lot of people are happy with sexp.
The files are written as readable ASCII s-expressions, but don’t include the field names of OCaml structures. It doesn’t require a lot of work, since the readers and writers are derived from the annotated type definitions.

Oops, sorry, wrong link: the right library for serialising OCaml values is sexplib.
