[json-data-encoding] Recursive type encoding

Hi.
I’m trying to encode an OCaml type with json-data-encoding (https://gitlab.com/nomadic-labs/json-data-encoding/) in order to generate a JSON schema.
The hardest part is a self-recursive type like this:

type t = {
  self: t option;
  int32f: int32;
  int64f: int64;
  boolf: bool;
}

Any suggestions? There is the Json_encoding.mu function, but I can’t work out how to express the recursive field self with it.

Hi,

The json-data-encoding library is definitely capable of defining encodings for this type. It can be a bit difficult to get into, so here is a quick walkthrough of a simple solution. Don’t hesitate to ask if you have more questions!

Prelude

First, let’s set up a minimal build environment. This simple dune file lets you compile the program and also gets all the Merlin features working.

$ cat dune
(executables
 (names jde)
 (libraries json-data-encoding)
 (flags :standard))

Then the main file, starting with the exact copy of your type declaration:

$ cat jde.ml

type t = {
  self: t option;
  int32f: int32;
  int64f: int64;
  boolf: bool;
}

We need one last piece of preparation: an encoding for int64. This is necessary because json-data-encoding doesn’t provide an encoding for that type. Why? Because JavaScript represents numbers as double-precision floats, and those can only represent every integer exactly up to 2^53 in magnitude. Beyond that, more and more integers are skipped as the values get larger.
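
To make that limit concrete, here is a small check that only uses the OCaml standard library. The names two_pow_53 and roundtrips_through_float are made up for this illustration. It converts an int64 to a float and back, which is roughly what happens to a large number once a JavaScript consumer reads it:

```ocaml
(* 2^53: up to this magnitude, every integer has an exact double representation. *)
let two_pow_53 = Int64.shift_left 1L 53

(* Hypothetical helper: does [n] survive a trip through a double-precision float? *)
let roundtrips_through_float (n : int64) : bool =
  Int64.equal (Int64.of_float (Int64.to_float n)) n

let () =
  Printf.printf "2^53     survives: %b\n" (roundtrips_through_float two_pow_53);
  Printf.printf "2^53 + 1 survives: %b\n"
    (roundtrips_through_float (Int64.succ two_pow_53))
```

The first check holds and the second does not, because 2^53 + 1 has no exact float representation and is rounded to 2^53.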

I’m going for a dead simple representation that’s not ideal: a string. It is relatively compact, but encoding and decoding go through conversion functions, some form of parsing, etc. You can maybe do better with, say, a pair of int32.

let int64 =
  let open Json_encoding in
  conv
    Int64.to_string
    Int64.of_string
    string
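
The pair-of-int32 alternative mentioned above could look roughly like the sketch below. It only uses the standard library; split and join are hypothetical names, and wiring them into an actual encoding (presumably with conv over a two-element tuple of int32) is an assumption that is not shown here.

```ocaml
(* Hypothetical helper: split an int64 into its high and low 32-bit halves. *)
let split (n : int64) : int32 * int32 =
  let hi = Int64.to_int32 (Int64.shift_right n 32) in
  let lo = Int64.to_int32 n in (* to_int32 keeps only the low 32 bits *)
  (hi, lo)

(* Hypothetical helper: rebuild the int64 from its two halves. *)
let join ((hi, lo) : int32 * int32) : int64 =
  Int64.logor
    (Int64.shift_left (Int64.of_int32 hi) 32)
    (* of_int32 sign-extends, so mask the low half back to 32 bits *)
    (Int64.logand (Int64.of_int32 lo) 0xFFFF_FFFFL)

let () =
  let n = 1234567890123456789L in
  Printf.printf "%Ld -> %Ld\n" n (join (split n))
```

Both halves fit in the int32 range, so they stay within what a JavaScript number can hold exactly, at the cost of a less readable JSON value than the string form.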

Core: mu

With that set up, we can define the encoding directly. The use of mu can look weird at first, but the core idea is this: you give it a name (as a string), which it uses internally (it shows up as the definition name in the generated schema), and you give it a function whose single parameter stands in for the very encoding you are defining.

There’s also a necessary step where you convert your record to and from a tuple. The library provides encodings for tuples because they are a generic enough form of data for a library to manipulate, whereas it cannot know anything about your record type.

let encoding : t Json_encoding.encoding =
  let open Json_encoding in
  mu
    "self"
    (fun self ->
       conv
         (fun {self; int32f; int64f; boolf} -> (self, int32f, int64f, boolf))
         (fun (self, int32f, int64f, boolf) -> {self; int32f; int64f; boolf})
       @@ obj4
         (opt "self" self)
         (req "int32f" int32)
         (req "int64f" int64)
         (req "boolf" bool))
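
If the role of the function argument is still unclear, a toy model may help. The sketch below is not how json-data-encoding implements mu: toy_encoding, toy_mu and chain are names invented for this illustration, and the toy "encoding" is only an encoder function, with no decoder and no schema. It only shows the knot-tying idea, where the parameter stands in for the encoding being defined.

```ocaml
(* Toy JSON type, shaped like the values the library produces. *)
type json =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of json list | `O of (string * json) list ]

(* Toy "encoding": just a function from values to JSON. *)
type 'a toy_encoding = 'a -> json

(* [toy_mu f] ties the knot: [f] receives the encoding under construction. *)
let toy_mu (f : 'a toy_encoding -> 'a toy_encoding) : 'a toy_encoding =
  let rec self v = f self v in
  self

type chain = { next : chain option; label : string }

let chain_encoding : chain toy_encoding =
  toy_mu (fun self { next; label } ->
      let fields = [ ("label", `String label) ] in
      match next with
      | None -> `O fields (* optional field left out *)
      | Some c -> `O (("next", self c) :: fields) (* recursive use *))
```

Here self plays the same role as the self parameter in the real encoding above: it is only called once a value is actually encoded, by which point the definition is complete.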

Postlude

You can get the schema and then print it very easily as follows:

let schema =
  Json_encoding.schema encoding

let () =
  Format.printf
    "%a\n%!"
    (Json_repr.pp (module Json_repr.Ezjsonm))
    (Json_schema.to_json schema)

And then you can just compile and run:

$ dune build
$ ./_build/default/jde.exe
{ "$schema": "http://json-schema.org/draft-04/schema#",
  "$ref": "#/definitions/self",
  "definitions":
    { "self":
        { "type": "object",
          "properties":
            { "self": { "$ref": "#/definitions/self" },
              "int32f":
                { "type": "integer", "minimum": -2147483648,
                  "maximum": 2147483647 }, "int64f": { "type": "string" },
              "boolf": { "type": "boolean" } },
          "required": [ "boolf", "int64f", "int32f" ],
          "additionalProperties": false } } }

Awesome! Thanks!
Does this library have tools for manipulating definitions, so that the schema above can be transformed into the following form?

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "self": {
      "$ref": "#/definitions/self"
    },
    "int32f": {
      "type": "integer",
      "minimum": -2147483648,
      "maximum": 2147483647
    },
    "int64f": {
      "type": "string"
    },
    "boolf": {
      "type": "boolean"
    }
  },
  "required": [
    "boolf",
    "int64f",
    "int32f"
  ],
  "additionalProperties": false,
  "definitions": {
    "self": {
      "$ref": "#"
    }
  }
}

I’m also trying to encode the self field as a required nullable property:

let encoding : t Json_encoding.encoding =
  let open Json_encoding in
  mu
    "self"
    (fun self ->
       conv
         (fun {self; int32f; int64f; boolf} -> (self, int32f, int64f, boolf))
         (fun (self, int32f, int64f, boolf) -> {self; int32f; int64f; boolf})
       @@ obj4
         (req "self" (option self))
         (req "int32f" int32)
         (req "int64f" int64)
         (req "boolf" bool))

but parsing fails at runtime with this encoding.

I’m not very familiar with JSON schema; I mostly maintain the encoding part of the library. If the transformation you want to apply is very regular and predictable, you can use the Json_query module. It has a set of primitives to:

  • query the content of a json value (to, say, check that the value has a form that you want to change)
  • insert/replace parts of the json value.

It is a sort of jq as an (OCaml) library.

This fails because of an important underlying reason: if you represent None as null and Some v directly as the encoding of v, then you cannot differentiate None and Some None. As a result, you lose the roundtrip property when encoding 'a option option.

(A similar issue happens if you represent the empty list as null, etc.)
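
The collision can be reproduced with a toy encoder that only uses the standard library. This is an illustration of the problem, not the library’s code: naive_option, int_enc and nested are hypothetical names.

```ocaml
(* Toy JSON type: just enough for the demonstration. *)
type json = [ `Null | `Float of float ]

(* Naive option encoder: None is null, Some v is the bare encoding of v. *)
let naive_option (enc : 'a -> json) : 'a option -> json = function
  | None -> `Null
  | Some v -> enc v

let int_enc (i : int) : json = `Float (float_of_int i)

(* Encoder for a nested option. *)
let nested (v : int option option) : json =
  naive_option (naive_option int_enc) v

let () =
  (* Both values are encoded as null, so a decoder cannot tell them apart. *)
  Printf.printf "None and Some None collide: %b\n"
    (nested None = nested (Some None))
```

Since None and Some None produce the same JSON, no decoder can recover which one was encoded.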

If you can guarantee that you don’t nest option, then you can use custom to write your own encoder. I haven’t built a whole solution for that, but it’d look something like this:

let nullable sch e =
  let open Json_encoding in
  custom
  (function None -> `Null | Some v -> construct e v)
  (function `Null -> None | j -> Some (destruct e j))
  ~schema:sch

You might even be able to generate the schema automatically by extracting the one from e and then wrangling it to a nullable form.

You would then be able to use (req "self" (nullable sch self)) with the appropriate schema sch (or none if you manage to generate it automatically).

Thanks.
This code works:

type t = {
  self: t option;
  int32f: int32;
  boolf: bool;
}

let nullable e =
  let open Json_encoding in
  custom
    (function None -> `Null | Some v -> construct e v)
    (function `Null -> None | j -> Some (destruct e j))
    ~schema:(schema (union [
        case (null) (fun _ -> None) (fun _ -> []) ;
        case (e) (fun _ -> None) (fun _ -> []) ;
      ]))

let encoding =
  let open Json_encoding in
  mu
    "self"
    (fun self ->
       conv
         (fun {self; int32f; boolf} -> (self, int32f, boolf))
         (fun (self, int32f, boolf) -> {self; int32f; boolf})
       @@ obj3
         (req "self" (nullable (self)))
         (req "int32f" int32)
         (req "boolf" bool))

and generates what I need:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "$ref": "#/definitions/self",
  "definitions": {
    "self": {
      "type": "object",
      "properties": {
        "self": {
          "oneOf": [
            {
              "type": "null"
            },
            {
              "$ref": "#/definitions/self"
            }
          ]
        },
        "int32f": {
          "type": "integer",
          "minimum": -2147483648,
          "maximum": 2147483647
        },
        "boolf": {
          "type": "boolean"
        }
      },
      "required": [
        "boolf",
        "int32f",
        "self"
      ],
      "additionalProperties": false
    }
  }
}

but I’m not sure I’m using the case function correctly.

In your use case (i.e., for generating a schema), the functions you pass to case are not important. In other circumstances, they are used as constructor/destructor for the values. E.g.,

type 'a t = Set of 'a list | Singleton of 'a

let encoding alpha =
  let open Json_encoding in
  union [
    case
      (list alpha) (* encoding for the payload of variant Set *)
      (function
        | Set l -> Some l (* recover payload of variant Set *)
        | _ -> None) (* or "fail" to recover if not Set *)
      (fun l -> Set l); (* construct a variant Set from its payload *)
    case
      alpha (* encoding for the payload of variant Singleton *)
      (function Singleton a -> Some a | _ -> None) (* destructor for Singleton *)
      (fun a -> Singleton a); (* constructor for Singleton *)
  ]

Note, however, that unions are not automatically “tagged” in json-data-encoding. So if you try to make a naive union encoding for type t = A of foo | B of foo then you probably won’t roundtrip, because both variants produce the same JSON. Instead, you’ll need to introduce field names in your union. You can do this by wrapping each case in an obj with a single named field. Something along the lines of

type 'a t = Sorted of 'a list | Unsorted of 'a list

let encoding alpha =
  let open Json_encoding in
  union [
    case
      (obj1 (req "sorted" @@ list alpha))
      (function | Sorted l -> Some l | _ -> None)
      (fun l -> Sorted l);
    case
      (obj1 (req "unsorted" @@ list alpha))
      (function | Unsorted l -> Some l | _ -> None)
      (fun l -> Unsorted l);
  ]

Or, depending on how you want the tags to work in your schema:

type 'a t = Sorted of 'a list | Unsorted of 'a list

let encoding alpha =
  let open Json_encoding in
  union [
    case
      (obj2
         (req "variant" @@ constant "sorted")
         (req "payload" @@ list alpha))
      (function | Sorted l -> Some ((), l) | _ -> None)
      (fun ((), l) -> Sorted l);
    case
      (obj2
         (req "variant" @@ constant "unsorted")
         (req "payload" @@ list alpha))
      (function | Unsorted l -> Some ((), l) | _ -> None)
      (fun ((), l) -> Unsorted l);
  ]
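
To see why the untagged version fails to roundtrip, here is a toy model written with the standard library only. It does not use json-data-encoding; the encode_* and decode_* functions are hypothetical, and the untagged decoder assumes that cases are tried in order and the first one that accepts the value wins.

```ocaml
type json = [ `Float of float | `O of (string * json) list ]

type t = A of int | B of int

(* Untagged: both variants are encoded as a bare number. *)
let encode_untagged : t -> json = function
  | A n | B n -> `Float (float_of_int n)

(* The first case that accepts a number is A, so B is never recovered. *)
let decode_untagged : json -> t option = function
  | `Float f -> Some (A (int_of_float f))
  | `O _ -> None

(* Tagged: each variant is wrapped in an object with one named field. *)
let encode_tagged : t -> json = function
  | A n -> `O [ ("a", `Float (float_of_int n)) ]
  | B n -> `O [ ("b", `Float (float_of_int n)) ]

let decode_tagged : json -> t option = function
  | `O [ ("a", `Float f) ] -> Some (A (int_of_float f))
  | `O [ ("b", `Float f) ] -> Some (B (int_of_float f))
  | _ -> None
```

With these definitions, B 1 comes back as A 1 through the untagged pair, and as B 1 through the tagged pair.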

Very useful examples. Thank you.

I have a question about Json_query.
How does Json_query.path_of_json_pointer work with wildcards (any examples)?
I only know the property name, and I want to find it and change its value.

P.S. Maybe the yojson manipulation functions would help too. Or I could just traverse the whole JSON value and replace the relevant `Assoc.
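
A minimal sketch of that traversal idea, for reference. It is written against a local type that mirrors a subset of the yojson constructors, so that it stands alone; with the real library the matching would be done on the yojson type instead. The function name map_field is hypothetical.

```ocaml
(* Local stand-in mirroring a subset of the yojson constructors. *)
type json =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `Assoc of (string * json) list | `List of json list ]

(* [map_field name f j] is [j] where the value of every field called [name],
   at any depth, is replaced by [f] applied to it (after its own children
   have been processed). *)
let rec map_field (name : string) (f : json -> json) (j : json) : json =
  match j with
  | `Assoc fields ->
    `Assoc
      (List.map
         (fun (k, v) ->
            let v = map_field name f v in
            if String.equal k name then (k, f v) else (k, v))
         fields)
  | `List items -> `List (List.map (map_field name f) items)
  | (`Null | `Bool _ | `Int _ | `Float _ | `String _) as leaf -> leaf
```

For example, map_field "self" (fun _ -> `Null) doc replaces the value of every "self" field in doc with null, whatever its depth.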

I haven’t used that part of the library. From a quick read of the doc and the source code, this is what I gather: you cannot make any `Star paths using this function. The wildcards parameter controls support for `Next.

If you need wildcards, you can build the path manually using the constructors directly. Or you can mix and match: make the bulk of the path using the path_of_json_pointer function and append a `Star at the end.

The function query (or query_all if you are using wildcards and suspect there might be several occurrences) will find the value. You can then inspect it or you can create a new value based on it.

The function replace will change some inner part of a given value. It’s in a functional, immutable style so here’s a full example:

(* [apply j path f] is [j] except that the part of [j] at [path]
    is modified by [f] *)
let apply j path f =
  let chunk = query path j in
  let new_chunk = … (* new value based on chunk *) in
  let jj = replace path new_chunk j in
  jj

Thanks. I think manual manipulation of the yojson type would be the easiest way in my case.

@raphael-proust
I found an issue with the json-data-encoding library.
A valid schema (according to https://www.jsonschemavalidator.net/) fails to parse.
I think the cause is the "$ref": "#" reference.

let schema = {|
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "self1": {
      "oneOf": [
        {
          "$ref": "#"
        },
        {
          "type": "null"
        }
      ]
    },
    "int32f": {
      "type": "integer",
      "minimum": -2147483648.0,
      "maximum": 2147483647.0
    }
  },
  "required": [
    "int32f",
    "self1"
  ],
  "additionalProperties": false
}
|}
let _ = schema |> Yojson.Safe.from_string |> Json_repr.from_yojson |> Json_schema.of_json
(* Fatal error: exception Json_schema.Cannot_parse(0, _)*)

Thanks! I’ve opened an issue about it: https://gitlab.com/nomadic-labs/json-data-encoding/-/issues/4

Don’t hesitate to open other issues if you find other problems, or to add examples related to the same issue.