Autogeneration of JSON exporter from OCaml datatypes in multiple code bases

Hello, OCaml beginner here.

I want to know how best to auto-derive code that exports values of an OCaml data type to JSON, where that type has a huge chain of dependencies spanning two code bases (in different GitHub repos) that I cannot change (i.e. I cannot annotate them with [@@deriving yojson]), including the OCaml standard library and, in particular, the arbitrary-precision arithmetic library Zarith.

What I have done is manually duplicate all the relevant data type definitions and annotate them with [@@deriving yojson]. That works (except for a fudge regarding arbitrary-precision arithmetic), but it does not help when the underlying repos change. What I’d like is to automate the whole process, so that I simply specify the base type to be serialised to JSON (and links to the GitHub repo dependencies) and everything else works automatically. I can see how to do the relevant scripting myself, but it looks like a huge amount of effort. Surely this problem must have been encountered by others before.

What’s the recommended approach in 2021?


Hi :)

This isn’t a direct answer to your question, but I think the problem you’re describing is exactly the one that ppx_import is trying to solve. Namely, “I have a large set of types that I want to run a deriver over, but I don’t control those types myself.” The idea here is to replace duplicated type definitions with [%import: type]; using an example from the README:

type longident = [%import: Longident.t] [@@deriving show]

This approach resolves the types at build time, meaning any transitive constraints on the packages that you depend on will be respected. Even if that’s not a concern for you, this approach might save you needing to write scripts that munge source trees manually.

Hope this helps.


Thanks Craig, I did try [%import: Longident.t] first, but somehow could not get it to work. This might well have been due to my lack of OCaml experience. Does [%import: Longident.t] automatically pull code from a GitHub repo?


The idea is that ppx_import gets the source types of a library you depend on by checking its compiled interfaces at build time, and imports the necessary types “just in time” before building your library / executable. So the actual sourcing of the dependency is done with a package manager as usual (e.g. via an opam dependency), and then dune + ppx_import take care of the rest.

In this case, I would add a dependency on compiler-libs, since the Longident.t type is defined therein. For example:

; --- ./dune ---

(executable
 (name main)
 (preprocess
  (staged_pps ppx_import ppx_deriving.show))
 (libraries compiler-libs.common))

(* --- ./main.ml --- *)

[@@@warning "-3" (* Ignore [Longident.parse] deprecation *)]

type longident = [%import: Longident.t] [@@deriving show]

let () = print_endline (show_longident (Longident.parse "Foo.Bar.baz"))
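
If everything is wired up, running dune exec ./main.exe should print a nested Ldot/Lident rendering of Foo.Bar.baz; the exact formatting depends on your ppx_deriving version.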

Hi, Zarith’s Z.t and Q.t provide string encoding/decoding functions, and it is normal to use those instead of encoding the internal contents of these types to JSON.
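
For instance, a quick sanity check (a minimal sketch; Z.of_string, Z.to_string and Z.equal are all part of Zarith’s documented API):

let () =
  (* A value well beyond the native int range round-trips losslessly via strings. *)
  let z = Z.of_string "123456789012345678901234567890" in
  assert (Z.equal z (Z.of_string (Z.to_string z)))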

If you are trying to encode a different data type, say something like:

type some = { thing : Z.t }

Then you can use ppx_deriving_yojson’s custom field annotations to tell it to encode Z.t as a string:

let z_to_json z = `String (Z.to_string z)

let z_of_json = function
  | `String s ->
    begin try Ok (Z.of_string s) with
    | Invalid_argument _ -> Error ("z_of_json: invalid z: " ^ s)
    end
  | _ -> Error "cannot decode to Z.t"

type some = {
  thing : Z.t
  [@to_yojson z_to_json]
  [@of_yojson z_of_json]
} [@@deriving yojson]
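
Usage would then look something like this (untested sketch; some_to_yojson and some_of_yojson are the functions ppx_deriving_yojson derives from the annotated definition):

let () =
  let json = some_to_yojson { thing = Z.of_string "123456789012345678901234567890" } in
  (* prints {"thing":"123456789012345678901234567890"} *)
  print_endline (Yojson.Safe.to_string json)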

If type some is from a third-party module M, then you would re-export the type as normal (the some = M.some equation keeps the two types interchangeable):

type some = M.some = {
  thing : Z.t
  [@to_yojson z_to_json]
  [@of_yojson z_of_json]
} [@@deriving yojson]

Note: I haven’t tested this, but based on the docs it should work :)


Martin, could I suggest you have a look at the tests that come with ppx_import? They’re actually pretty thorough, and I would start by figuring out how to build and run them myself. Then I would make some small example with my own types, and get that working, to test my knowledge.

Many of the PPX rewriters have extensive tests, and I’ve learned how they work mostly by studying those tests and figuring out how to build and run them myself, outside of the project (so, writing my own Makefiles; since I presume you use dune, for you that would mean dune files).

If you prefer something with a bit less metaprogramming, you can also use type aliases for this:

type z = Z.t

let z_of_yojson = ... (* converter bodies as in the earlier reply *)
let z_to_yojson = ...

type some = {
  thing : z
} [@@deriving yojson]

(Whether run via ppx_deriving or ppxlib, the yojson deriver will expand the z fields into calls to z_to_yojson and z_of_yojson, following its naming convention.)
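
That is, the generated encoder is roughly equivalent to the following (a simplified sketch, not the literal generated code):

let some_to_yojson (x : some) : Yojson.Safe.t =
  `Assoc [ ("thing", z_to_yojson x.thing) ]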


The usual way JSON works in the real world is that a public API is defined, and then code is written in various languages to work with this API. If this is what you’re trying to achieve, as opposed to, say, just pretty-printing, I recommend taking the time to learn how to use atd (please note I’m the original author). Unlike other magic code generators, it generates plain OCaml code which you can review and understand.

Regarding your specific problem, the solution with atd would involve copying the type definitions of all the external libraries into .atd files, using the <ocaml predef> annotation to reuse the external type definitions, and using the <ocaml from="Foo"> annotation to reference types from other .atd files. This is all a bit annoying to set up initially, but maintenance is easy.
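
For a rough flavour, referencing a type from another .atd file looks something like this (file and type names hypothetical; a sketch based on the atd manual, not tested):

(* part2.atd: ty1 is defined in part1.atd *)
type ty1 <ocaml from="Part1"> = abstract

type wrapper = { field : ty1 }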


@mjambon thanks.

What I am trying to achieve is this: there is an external code base that I want to add a JSON exporter to (basically a pretty-printer, so I can feed the output to other programs). Type definitions in the external code base are likely to evolve over time. I want to make a minimally invasive pull request, basically adding a line or two to the existing main.ml so that it has a -export-json option. (I don’t need JSON import, only export.) All the new code should be in external files, because I don’t think a pull request will be accepted if I change more than a few existing lines.

Your atd solution might do just that. Does atd also deal with the Zarith serialisation that @yawaramin mentioned?

Based on this info, I would do something like this (diff):

+type z = Z.t
+
+let z_to_yojson = ... (* As shown above *)
+
 type some = {
-  thing : Z.t
-}
+  thing : z
+} [@@deriving to_yojson]

Then add the -export-json option where appropriate.
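
The hook in main.ml could then be as small as the following (hypothetical sketch; result stands for whatever value the program already computes):

let result : some = { thing = Z.of_string "42" } (* placeholder for the real value *)

let () =
  (* Emit the JSON export when -export-json is passed on the command line. *)
  if Array.exists (String.equal "-export-json") Sys.argv then
    print_endline (Yojson.Safe.to_string (some_to_yojson result))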

Yes, you can derive serializers for existing types with atd. See the predef annotation.

Again, the two main points of atd are:

  • ease of maintenance and durability, due to not depending on camlp4 or other high-maintenance technology.
  • ability to derive (de)serializers for typed languages other than OCaml without duplicating the .atd files, which serve as the specification of the API.

If your goal is to reflect OCaml types directly in JSON, I’d say ppx_deriving_yojson looks more appropriate than atd.


Thanks!

I got %import to work, but not recursively. Let’s say I have the following type definitions in a module that I can’t edit:

type ty1 = B1 of bool
type ty2 = C1 of ty1

Let’s assume I want to derive show for values inhabiting ty2 only, so I need

 show_ty2 : ty2 -> string

I want to be able to do this using only one explicit line:

type ty2 = [%import: Stuff.ty2] [@@deriving show]

but that does not seem to work. I can get it to work like this:

type ty1 = [%import: Stuff.ty1] [@@deriving show]
type ty2 = [%import: Stuff.ty2] [@@deriving show]

So I seem to be missing a trick to make [@@deriving show] work recursively. (My use case has hundreds of intermediate types like ty1. I can use the [%import: ...] [@@deriving show] trick for each of them separately, and that would not be a huge deal, but I imagine it should not be necessary. The examples in the ppx_import repo all import each type separately.)

What you’re asking for is quite a big change in how something like ppx_import would work. I’m not even sure it’s doable. In your example, you have two separate structure-items, each of which is a typedef, and the second refers to the first. So far so good. For ppx_import to import the first along with the second means that it must follow chains of dependencies. What if that first one were in a different compilation unit: should it find that compilation unit and pull in the definition?

I think it’s useful to remember that all of this is supposed to be happening -before- any type-checking, name-resolution, or anything else has happened: these are -syntactic- macros only.

Now, there is one thing that I think ppx_import can and should do: right now it imports a single type, as a type-expression-level PPX extension. If it were possible to write one at {structure,signature}-item level, causing the import of that type and all the types in the same typedecl-group (viz. type t1 = .. and t2 = .. and t3 = ...), that would be (to me) much more useful.
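
Concretely, a typedecl-group is a set of mutually defined types, e.g.:

type t1 = A of t2
and t2 = B of t1 | Leaf of bool

A structure-item-level import would pull in such a group as a whole, whereas the current type-expression-level [%import: ...] names exactly one type.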

I implemented that in pa_ppx.import (part of camlp5/pa_ppx) and use it for example to succinctly write the imports required for pulling in the entire OCaml AST types. With some cppo, a single file suffices for importing all the OCaml AST types for all the versions of OCaml since 4.02.0: pa_ppx_migrate/ast.ORIG.ml at master · camlp5/pa_ppx_migrate · GitHub

To be sure, I’m not advocating that you use camlp5/pa_ppx; rather, that this sort of feature (importing entire typedecl-groups) would be a significant improvement in conciseness, while losing nothing in clarity.

Thanks very much, this is useful: it means that what I had in mind can’t work, which clarifies the issue. I can easily (if slightly laboriously) hack around this problem.