Is there a tool to do automatic serialization/deserialization AND automatic schema migration (from OCaml types)?

e.g. I have some data (represented by OCaml types), that I want to persist, for example in a database or somewhere (maybe in a json format, or protobuf, or whatever). ppx_deriving_yojson, ocaml-protoc, or atdgen all give me a reasonable way to avoid writing the serialization/deserialization code by hand. Awesome!
However, I expect that I’ll want to change the types as development goes on (it’s inevitable), and I’d love to avoid hand-writing migration scripts for that – it seems like most of that should be automatable. e.g. if a tool was given “previous type definition” and “current type definition”, and maybe some hints about default values, it should be able to generate 80% of the code involved.

Has anyone done this? I couldn’t find anything about ppx_deriving(yojson), ocaml-protoc, or atdgen handling schema migration.
Is this a bad/unreasonable thing to want in the first place?

I think this part of atdgen documentation covers most of what you have to do

https://atd.readthedocs.io/en/latest/tutorial.html#smooth-protocol-upgrades

It looks to me like schema evolution is still a manual business for OCaml programmers, whether they use atdgen or any of the various ppx-based serialization methods, and there is nothing available at all for OCaml programmers to perform automatic schema migration.

This is my sad face: :slightly_frowning_face: I’d love to discover differently. It’s a problem I think about a lot when I consider the difficulty evangelizing the adoption of OCaml in professional settings.


:frowning:

The json format lets you add optional fields to records, so that’s generally the main property you rely on to avoid costly migrations. Also, use records generously and avoid tuples, so you can add or remove fields later; occasionally, it’s wise to create single-field records.
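To make the optional-field point concrete, here is a minimal sketch in plain OCaml. The `json` type is a hand-rolled stand-in for a real library such as Yojson, and the `user` record and its `nickname` field are hypothetical. A decoder that treats a missing field as `None` keeps reading data written before the field existed:

```ocaml
(* Minimal JSON value type, standing in for a real JSON library. *)
type json =
  | Null
  | Int of int
  | String of string
  | Assoc of (string * json) list

(* Hypothetical record; [nickname] is assumed to have been added later. *)
type user = { name : string; age : int; nickname : string option }

let user_of_json (j : json) : (user, string) result =
  match j with
  | Assoc fields ->
    (match List.assoc_opt "name" fields, List.assoc_opt "age" fields with
     | Some (String name), Some (Int age) ->
       (* A missing or null field decodes to None, so old data still loads. *)
       (match List.assoc_opt "nickname" fields with
        | None | Some Null -> Ok { name; age; nickname = None }
        | Some (String n) -> Ok { name; age; nickname = Some n }
        | Some _ -> Error "nickname: expected a string")
     | _ -> Error "name/age: missing or ill-typed")
  | _ -> Error "expected an object"

(* A document written before [nickname] existed, and one written after. *)
let old_doc = Assoc [ ("name", String "Ada"); ("age", Int 36) ]

let new_doc =
  Assoc [ ("name", String "Ada"); ("age", Int 36); ("nickname", String "ada") ]
```

Both documents decode with the same function, which is why no migration is needed for this kind of change.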

Now, in terms of migration between two incompatible types, what kind of tooling would you expect? Do you have a specific example?

Can you give examples of tools that you like in other ecosystems?

Personally, I just design my data structures and APIs such that no migration will be necessary. This has been working well in a startup context, where I’m in charge and features become unused before they need to be redesigned. I’m aware this is not typical.

I have started to design a tool for F# in that space.
From my understanding your requirements are hard (or may I say impossible) to implement.

There are at least 4 basic actions that you could perform on your type, and hence on your JSON:

  • Add new property
  • Drop existing property

Now these are easy. The next ones are much harder.

  • Move property
    • within Doc (eg Doc/some/path/to/prop => Doc/some/other/path/to/renamed_prop)
    • outside Doc (eg Doc/some/path/to/prop => OtherDoc/completely/diff/pathto/renamed_prop)

This again looks rather trivial, but it isn’t, because a move may also involve transformations.

  • Transformations
    • Simple type transformations: string => number and number => string. But that isn’t so simple either, because the transformation might not be safe (e.g. parseInt: string -> option<int>), so you might also need to define default values in case the transformation fails, or define the result as an Option type.
    • Splitting transformations: one property is split up into n different types. I have an instance where a business-related property is akin to ${process_id}-${incremental_counter}, which needs a transformer like string -> GUID * IncrementalCounter and then a move command that moves one part into one place and the other part into another.
    • Cardinality transformations: you have to move multiple previously stand-alone properties into a list, or the reverse.

If you combine those transformation types with movements (specifically to outside documents that may not exist yet), you will see that it is hard to create an “automatic” migration from just the 2 types.
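To illustrate why these actions need hints that cannot be read off two type definitions, here is a rough OCaml sketch that models them as data and interprets them over an untyped document. Everything in it is an assumption made for illustration: the hand-rolled `json` type, the `action` variant, and the example document. It only covers moves within one document; moves to other documents, splitting, and cardinality changes would need more machinery.

```ocaml
type json = Null | Int of int | String of string | Assoc of (string * json) list

type path = string list

(* Hypothetical migration actions, following the list above. *)
type action =
  | Add of path * json                                (* new property with a default *)
  | Drop of path
  | Move of path * path                               (* source, destination *)
  | Transform of path * (json -> json option) * json  (* fallible conversion, fallback *)

(* Look up the value at [path], if any. *)
let rec get path j =
  match path, j with
  | [], _ -> Some j
  | k :: rest, Assoc fs -> Option.bind (List.assoc_opt k fs) (get rest)
  | _ -> None

(* Remove the value at [path]; a missing path leaves the document unchanged. *)
let rec remove path j =
  match path, j with
  | [ k ], Assoc fs -> Assoc (List.remove_assoc k fs)
  | k :: rest, Assoc fs ->
    Assoc (List.map (fun (k', v) -> if k' = k then (k', remove rest v) else (k', v)) fs)
  | _ -> j

(* Set the value at [path], creating intermediate objects as needed.
   Field order is not preserved, and a non-object on the way is overwritten. *)
let rec set path v j =
  match path with
  | [] -> v
  | k :: rest ->
    let fs = match j with Assoc fs -> fs | _ -> [] in
    let child = Option.value (List.assoc_opt k fs) ~default:(Assoc []) in
    Assoc ((k, set rest v child) :: List.remove_assoc k fs)

let apply action doc =
  match action with
  | Add (p, default) ->
    (match get p doc with Some _ -> doc | None -> set p default doc)
  | Drop p -> remove p doc
  | Move (src, dst) ->
    (match get src doc with
     | Some v -> set dst v (remove src doc)
     | None -> doc)
  | Transform (p, f, fallback) ->
    (match get p doc with
     | Some v -> set p (Option.value (f v) ~default:fallback) doc
     | None -> doc)

(* Unsafe string -> number conversion: returns None when parsing fails. *)
let int_of_json_string = function
  | String s -> Option.map (fun n -> Int n) (int_of_string_opt s)
  | _ -> None

let doc =
  Assoc [ ("id", String "42"); ("legacy", Int 0);
          ("meta", Assoc [ ("owner", String "bob") ]) ]

let migrated =
  List.fold_left (fun d a -> apply a d) doc
    [ Drop [ "legacy" ];
      Move ([ "meta"; "owner" ], [ "owner" ]);
      Transform ([ "id" ], int_of_json_string, Int 0);
      Add ([ "tags" ], Null) ]
```

The conversion function, the fallback value, and the destination path all have to be supplied by a person, which is the part a type diff alone cannot provide.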

My approach looks like this:

  • versioned JSON-Schemas (as the mapping between Lang Type and JSON Type might in itself be complex)
  • transformation rules between those versioned schemas

The process would be like this

  • read the JSON and parse it into an (untyped) “json” object in the host lang (in my case F#, using Newtonsoft’s JSON lib)
  • compare the JSON’s version tag M with the version tag N of the type in the host lang
  • apply the transformation rules stepwise from version M => N, and after each step check the validity of the resulting JSON using the schema of that particular version
  • when version N is reached, deserialize into the type of the host lang
  • later (after doing your work) serialize the type back into JSON (which should now be the desired version N) and save it into the store
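The process above can be sketched in OCaml roughly as follows. This is an illustration under stated assumptions, not the F# tool itself: the `json` type is hand-rolled, the `user` type, the `"version"` tag name, the two rules `v1_to_v2` and `v2_to_v3`, and the `valid` function (a stand-in for real JSON Schema validation) are all hypothetical.

```ocaml
type json = Null | Int of int | String of string | Assoc of (string * json) list

(* Hypothetical host-language type, corresponding to schema version 3. *)
type user = { full_name : string; age : int }

let current_version = 3

(* Read the version tag; untagged documents are assumed to be version 1. *)
let version_of = function
  | Assoc fs -> (match List.assoc_opt "version" fs with Some (Int v) -> v | _ -> 1)
  | _ -> 1

let set_field k v = function
  | Assoc fs -> Assoc ((k, v) :: List.remove_assoc k fs)
  | j -> j

(* Rule 1 => 2: rename "name" to "full_name". *)
let v1_to_v2 j =
  match j with
  | Assoc fs ->
    (match List.assoc_opt "name" fs with
     | Some n -> Assoc (("full_name", n) :: List.remove_assoc "name" fs)
     | None -> j)
  | _ -> j

(* Rule 2 => 3: add "age" with a default of 0 when it is missing. *)
let v2_to_v3 j =
  match j with
  | Assoc fs when not (List.mem_assoc "age" fs) -> set_field "age" (Int 0) j
  | _ -> j

(* Each entry maps version M to the rule that produces version M + 1. *)
let steps = [ (1, v1_to_v2); (2, v2_to_v3) ]

(* Stand-in for validating a document against the schema of a version. *)
let valid version j =
  match version, j with
  | 1, Assoc fs -> List.mem_assoc "name" fs
  | 2, Assoc fs -> List.mem_assoc "full_name" fs
  | 3, Assoc fs -> List.mem_assoc "full_name" fs && List.mem_assoc "age" fs
  | _ -> false

(* Apply the rules one step at a time, validating after every step. *)
let rec migrate j =
  let v = version_of j in
  if v = current_version then Ok j
  else
    match List.assoc_opt v steps with
    | None -> Error (Printf.sprintf "no rule from version %d" v)
    | Some step ->
      let j' = set_field "version" (Int (v + 1)) (step j) in
      if valid (v + 1) j' then migrate j'
      else Error (Printf.sprintf "invalid document after step to version %d" (v + 1))

(* Only documents at the current version are turned into the typed value. *)
let user_of_json = function
  | Assoc fs ->
    (match List.assoc_opt "full_name" fs, List.assoc_opt "age" fs with
     | Some (String full_name), Some (Int age) -> Ok { full_name; age }
     | _ -> Error "ill-typed user")
  | _ -> Error "expected an object"

let load j = Result.bind (migrate j) user_of_json
```

Validating after each step means a faulty rule is reported at the version where it went wrong, rather than as a confusing decoding error at the end.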

My understanding is the Java ecosystem is where the tools I’m thinking about are mostly available. These are tools that keep a version history of the schema along with methods of translating data conforming to one version of the schema into its equivalent in a different version of the schema.

These tools are typically tightly coupled to the data modeling and interchange languages used as middleware between the application programming interface and persistent data storage services. The only one I know about that isn’t tightly coupled to a specific database management system is the one used in Apache Avro for the automatic schema evolution in its remote procedure call framework.

Well I went ahead and made one :sweat_smile: would love to hear your thoughts

It looks quite interesting, thanks!

One thing I was curious to look at, because I worked a bit on that part of ppx_import recently, is how you do the type lookup (I would guess: reading the appropriate .cmi files through compiler-libs?), but I didn’t find it in the code. If I understand correctly, this is done through the Analyze module that comes from your reason-language-server project, and I got a bit lost there.

I can’t comment on the package itself but the web site is just great. All someone would want to know before trying out a new library, in a very readable format. I wish more packages were presented as nicely! (One nitpick: a Reason/OCaml switchable syntax button for the examples)

Here is one example: Data.SafeCopy
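SafeCopy is a Haskell library; as I understand it, each version of a type declares the previous version it migrates from and supplies the migration function by hand. A rough OCaml analogue of that idea, with hypothetical modules `V1`, `V2`, `V3` and a hypothetical `stored` wrapper (none of this is SafeCopy's actual API), might look like this:

```ocaml
(* Each version of the type lives in its own module. *)
module V1 = struct
  type t = { name : string }
end

module V2 = struct
  type t = { first : string; last : string }

  (* Hand-written step from the previous version: split on the first space. *)
  let migrate (old : V1.t) : t =
    let name = old.V1.name in
    match String.index_opt name ' ' with
    | Some i ->
      { first = String.sub name 0 i;
        last = String.sub name (i + 1) (String.length name - i - 1) }
    | None -> { first = name; last = "" }
end

module V3 = struct
  type t = { first : string; last : string; email : string option }

  (* Adding an optional field needs no information from the old value. *)
  let migrate (old : V2.t) : t =
    { first = old.V2.first; last = old.V2.last; email = None }
end

(* What the store may contain: a value of any past version, tagged. *)
type stored = S1 of V1.t | S2 of V2.t | S3 of V3.t

(* Chain the single-step migrations until the latest version is reached. *)
let rec upgrade : stored -> V3.t = function
  | S1 v -> upgrade (S2 (V2.migrate v))
  | S2 v -> upgrade (S3 (V3.migrate v))
  | S3 v -> v
```

Here the compiler checks every step, but each `migrate` is still written by hand, so this is versioning support rather than automatic migration.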