[ANN] Jsont 0.1.0 – Declarative JSON data manipulation for OCaml

Hello,

It’s my pleasure to announce the first release of the jsont library:

Jsont is an OCaml library for declarative JSON data manipulation. It provides:

  • Combinators for describing JSON data using the OCaml values of your choice. The descriptions can be used by generic functions to decode, encode, query and update JSON data without having to construct a generic JSON representation.
  • A JSON codec with optional text location tracking and layout preservation. The codec is compatible with effect-based concurrency.

The descriptions are independent from the codec and can be used by third-party processors or codecs.
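
To make the separation between descriptions and processors concrete, here is a minimal, self-contained sketch. It is a toy and not jsont's actual API: the `json` and `desc` types and the `decode` and `encode` functions are all hypothetical names invented for illustration. Unlike jsont's codec, this toy goes through a generic JSON tree for brevity; the point is only that one description value can drive several independent processors.

```ocaml
(* Toy sketch only: none of these names are jsont's actual API. *)

(* A generic JSON value, used here as one possible processor target. *)
type json =
  | Null
  | Bool of bool
  | Number of float
  | String of string
  | Array of json list
  | Object of (string * json) list

(* A description of how JSON maps to OCaml values of type ['a]. It is plain
   data: it says nothing about how bytes are read or written. *)
type 'a desc =
  | Bool_d : bool desc
  | Number_d : float desc
  | String_d : string desc
  | List_d : 'a desc -> 'a list desc
  | Map_d : { dec : 'a -> 'b; enc : 'b -> 'a; of_ : 'a desc } -> 'b desc

(* Processor 1: decode a generic JSON value by following a description. *)
let rec decode : type a. a desc -> json -> (a, string) result =
  fun d j ->
  match d, j with
  | Bool_d, Bool b -> Ok b
  | Number_d, Number f -> Ok f
  | String_d, String s -> Ok s
  | List_d el, Array js ->
      let rec loop acc = function
        | [] -> Ok (List.rev acc)
        | j :: js ->
            (match decode el j with
             | Ok v -> loop (v :: acc) js
             | Error e -> Error e)
      in
      loop [] js
  | Map_d { dec; of_; _ }, j -> Result.map dec (decode of_ j)
  | _, _ -> Error "unexpected JSON value"

(* Processor 2: encode an OCaml value by following the same description. *)
let rec encode : type a. a desc -> a -> json =
  fun d v ->
  match d with
  | Bool_d -> Bool v
  | Number_d -> Number v
  | String_d -> String v
  | List_d el -> Array (List.map (encode el) v)
  | Map_d { enc; of_; _ } -> encode of_ (enc v)

(* A description that uses an OCaml type of our choice. *)
type celsius = Celsius of float

let celsius : celsius desc =
  Map_d { dec = (fun f -> Celsius f); enc = (fun (Celsius f) -> f);
          of_ = Number_d }

let temps = List_d celsius

let () =
  let j = Array [ Number 21.5; Number 19.0 ] in
  match decode temps j with
  | Ok vs -> assert (encode temps vs = j)
  | Error e -> failwith e
```

A third processor (say, a query or a schema printer) could be written against `desc` without touching `decode` or `encode`, which is the kind of independence the announcement describes.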

Jsont is distributed under the ISC license. It has no dependencies. The codec is optional and depends on the bytesrw library. The JavaScript support is optional and depends on the brr library.

The library has been used in practice but it’s new so a few adjustments may be needed and more convenience combinators added.

The library also enables quite a few things that I did not have the time to explore, like schema description generation from descriptions, quasi-streaming JSON transformations, description generation from dynamic type representations, etc. Lots of this can be done outside the core library. Do not hesitate to get in touch if you use the library and find interesting applications or pesky limitations.

Homepage: https://erratique.ch/software/jsont
Docs: https://erratique.ch/software/jsont/doc (or odig doc jsont)
Install: opam install jsont bytesrw

This first release was made possible thanks to a grant from the OCaml Software Foundation. I also thank my donors for their support.

Best,

Daniel

P.S. I think that the technique used by the library, which I dubbed finally tagged, is interesting in itself. You can read a paper about it here along with a smaller, self-contained, implementation of what the library does.

Since programmers are always curious about performance, and also a bit irrational about it :–) I just want to throw in a few numbers to nourish these irrationalities and convince you that, despite invoking bazillions of functions, jsont remains competitive. Don’t take the numbers below too seriously, except for ruling out that jsont is incredibly slow. In practice performance profiles are bound to be quite data dependent (floating point parsing, character data beyond ASCII, out-of-order case members, etc.).

I benchmarked a few tools that decode and minify JSON on a particular 78MB file of GeoJSON data that I found online. We try to keep things comparable (e.g. w.r.t. source layout and location tracking) among the tools but it’s a bit difficult: for example, Yojson is notorious for not checking UTF-8 validity on input (an unforgivable sin). This is measured on an ARM64 M2 with 16GB of memory and OCaml 5.2.0.

We compare json_xs (Perl with C bindings), jq, ydump (distributed with yojson), jsontrip (distributed with jsonm), jsont (distributed with jsont) and geojson, which is a direct modelling of GeoJSON with the jsont library (i.e. it codecs GeoJSON without going through a generic JSON representation):

Benchmark 1: json_xs -t json < tmp/parcels.json
  Time (mean ± σ):      1.344 s ±  0.008 s    [User: 1.244 s, System: 0.093 s]
  Range (min … max):    1.334 s …  1.359 s    10 runs
 
Benchmark 1: jq -c . tmp/parcels.json
  Time (mean ± σ):      1.930 s ±  0.015 s    [User: 1.780 s, System: 0.145 s]
  Range (min … max):    1.918 s …  1.965 s    10 runs
  
Benchmark 1: ydump -std -c tmp/parcels.json
  Time (mean ± σ):      3.647 s ±  0.013 s    [User: 3.529 s, System: 0.112 s]
  Range (min … max):    3.630 s …  3.677 s    10 runs
 
Benchmark 1: jsontrip tmp/parcels.json
  Time (mean ± σ):      3.059 s ±  0.009 s    [User: 3.013 s, System: 0.045 s]
  Range (min … max):    3.041 s …  3.075 s    10 runs

Benchmark 1: jsont fmt -fminify tmp/parcels.json
  Time (mean ± σ):      2.175 s ±  0.006 s    [User: 2.097 s, System: 0.073 s]
  Range (min … max):    2.168 s …  2.189 s    10 runs

Benchmark 1: geojson tmp/parcels.json
  Time (mean ± σ):      1.846 s ±  0.003 s    [User: 1.798 s, System: 0.044 s]
  Range (min … max):    1.843 s …  1.851 s    10 runs

Note that on encoding the bottleneck is formatting floating point numbers, with which that data file is littered. So the difference between using jsont fmt to decode to a generic representation and using geojson, which directly models GeoJSON, is a bit lost. This shows decoding only, on the tools that support it:

Benchmark 1: json_xs -t none < tmp/parcels.json
  Time (mean ± σ):     440.7 ms ±   2.4 ms    [User: 379.7 ms, System: 54.9 ms]
  Range (min … max):   437.8 ms … 445.7 ms    10 runs
 
Benchmark 1: jsontrip -dec tmp/parcels.json
  Time (mean ± σ):      1.557 s ±  0.003 s    [User: 1.529 s, System: 0.027 s]
  Range (min … max):    1.553 s …  1.561 s    10 runs

Benchmark 1: jsont fmt -d tmp/parcels.json
  Time (mean ± σ):      1.100 s ±  0.004 s    [User: 1.039 s, System: 0.056 s]
  Range (min … max):    1.095 s …  1.107 s    10 runs
  
Benchmark 1: geojson -d tmp/parcels.json
  Time (mean ± σ):     798.0 ms ±   1.5 ms    [User: 766.8 ms, System: 28.6 ms]
  Range (min … max):   796.1 ms … 800.3 ms    10 runs

Finally I’d like to say something about usability and then I will shut up :–)

If you program in a language like JavaScript, knowing that you will get data as JSON is always a relief: it means no work for getting the data in and out. At least so you think, until you dynamically realize that the data producer is not, or no longer, exactly producing what it told you it would.

So far I would not enjoy the same relief when I knew I’d have to deal with JSON in my OCaml programs. It is one of the goals of jsont to bring that.

Using jsont will still entail more work than in JavaScript: the descriptions (or queries) have to be written. But that extra work allows you to work with natural OCaml datatype definitions and, when producers start lying, you will get nice error messages with locations like these (here for the GeoJSON modelling mentioned before):

Error: Unexpected enum string value: Tapology. Should it be Topology ?
       File "tmp/topology.json", line 2, characters 10-20:
       File "tmp/topology.json", line 2, characters 2-8: in member type of
       File "tmp/topology.json", lines 1-2, characters 0-20: Topology object
       
Error: Unexpected member type value in Geometry object: Curve. Must be Point,
       MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon or
       GeometryCollection.
       File "tmp/topology.json", line 7, characters 21-22:
       File "tmp/topology.json", line 7, characters 6-12: in member type of
       File "tmp/topology.json", lines 6-7, characters 15-22: Geometry object
       File "tmp/topology.json", line 6, characters 4-13: in member example of
       File "tmp/topology.json", lines 5-7, characters 13-22: objects map object
       File "tmp/topology.json", line 5, characters 2-11: in member objects of
       File "tmp/topology.json", lines 1-7, characters 0-22: Topology object

Error: Missing member coordinates in Point object
       File "tmp/topology.json", lines 9-16, characters 8-9:
       File "tmp/topology.json", lines 9-16, characters 8-9: at index 0 of
       File "tmp/topology.json", lines 8-16, characters 20-9: array<Geometry object>
       File "tmp/topology.json", line 8, characters 6-18: in member geometries of
       File "tmp/topology.json", lines 6-16, characters 15-9: GeometryCollection object
       File "tmp/topology.json", line 6, characters 4-13: in member example of
       File "tmp/topology.json", lines 5-16, characters 13-9: objects map object
       File "tmp/topology.json", line 5, characters 2-11: in member objects of
       File "tmp/topology.json", lines 1-16, characters 0-9: Topology object

And of course all this happens in OCaml, free of any kind of ppx nonsense. The result is flexible and lightweight to use and works wonders against bit rot.

Do not fear the extra modelling and boilerplateish step!

Nice, I hadn’t heard of jsont before. While I understand not being a fan of the ppx stuff, I do have to admit that doing [@@deriving yojson] is so simple I can’t help but use it, but the error messages are atrocious. I’m hoping a ppx_deriving for jsont happens.

Well it’s the first release :–)

Perhaps, but that kind of thing doesn’t really help with dealing with JSON that you don’t control. I always feel it’s not worth the trouble.

That being said the amazing @art-w came up with something that I wanted to have in the library before prioritizing to solve other problems.

His let operator proposal makes it possible to deal with labelled object constructors (which is less footgunish once you start dealing with a lot of fields with the same types, e.g. that’s how @smondet convinced me to add let operators to cmdliner). I will have a closer look and may standardize the object construction on his proposal, so if you plan to use the library maybe stay tuned for 0.2.0 as it may entail a few breaking changes.

Nice. I think let-syntax is great for this too, e.g. [ANN] dream-html & pure-html 3.5.2 - #5 by yawaramin

Isn’t the object creation essentially an applicative?

Yes, and you can also use a monadic bind to add further rules. E.g. (from my library):

let* start_date = required unix_tm "start-date" in
let+ end_date = required (unix_tm ~min:start_date) "end-date" in
...
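
For readers unfamiliar with binding operators, the following self-contained sketch shows how `let+`, `and+` and `let*` could be defined over a toy decoder, and why the monadic `let*` is what lets a later rule depend on an earlier value. Everything here is hypothetical (the `form` and `decoder` types, `required`, `int_at_least`); it is not the API of the library quoted above, nor of jsont.

```ocaml
(* Toy sketch only: hypothetical decoder, not the API of any library
   mentioned in this thread. *)
type form = (string * string) list
type 'a decoder = form -> ('a, string) result

(* Look up a field by name and convert it, failing with a message. *)
let required (conv : string -> 'a option) (name : string) : 'a decoder =
  fun form ->
  match Option.bind (List.assoc_opt name form) conv with
  | Some v -> Ok v
  | None -> Error ("missing or invalid field: " ^ name)

(* Applicative part: fields are decoded independently of each other. *)
let ( let+ ) (d : 'a decoder) (f : 'a -> 'b) : 'b decoder =
  fun form -> Result.map f (d form)

let ( and+ ) (a : 'a decoder) (b : 'b decoder) : ('a * 'b) decoder =
  fun form ->
  match a form, b form with
  | Ok x, Ok y -> Ok (x, y)
  | Error e, _ | _, Error e -> Error e

(* Monadic bind: the next decoder is built from the decoded value. *)
let ( let* ) (d : 'a decoder) (f : 'a -> 'b decoder) : 'b decoder =
  fun form -> Result.bind (d form) (fun v -> f v form)

let int_at_least min s =
  match int_of_string_opt s with
  | Some i when i >= min -> Some i
  | _ -> None

(* Independent fields: the applicative operators are enough. *)
let point : (int * int) decoder =
  let+ x = required int_of_string_opt "x"
  and+ y = required int_of_string_opt "y" in
  (x, y)

(* Dependent fields: "stop" is validated against the decoded "start". *)
let range : (int * int) decoder =
  let* start = required (int_at_least 0) "start" in
  let+ stop = required (int_at_least start) "stop" in
  (start, stop)

let () =
  assert (point [ "x", "1"; "y", "2" ] = Ok (1, 2));
  assert (range [ "start", "3"; "stop", "7" ] = Ok (3, 7));
  assert (range [ "start", "3"; "stop", "1" ]
          = Error "missing or invalid field: stop")
```

Note that this toy only decodes. Nothing in it knows how to turn a decoded value back into fields.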

No. It’s a more complicated structure because you need the return type of the application when you apply members. The return type is used by member specifications to specify the projection function used on the result of the application when it’s time to encode back to JSON.
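
The role of the return type can be illustrated with a small, self-contained sketch. It is a toy under stated assumptions and not jsont's actual types: `value`, `obj`, `field`, `obj_map`, `map`, `mem` and `finish` are hypothetical names. The map carries two type parameters: `'o`, the final object type, which is fixed from the start, and `'dec`, what remains of the constructor after the members applied so far. Each member needs `'o` in order to type its encoding projection.

```ocaml
(* Toy sketch only: hypothetical names, not jsont's actual types. *)
type value = Str of string | Int of int
type obj = (string * value) list

(* How to convert one member value to and from OCaml. *)
type 'a field = { of_value : value -> 'a option; to_value : 'a -> value }

let string =
  { of_value = (function Str s -> Some s | _ -> None);
    to_value = (fun s -> Str s) }

let int =
  { of_value = (function Int i -> Some i | _ -> None);
    to_value = (fun i -> Int i) }

(* ['o] is the final object type; ['dec] is the partially applied
   constructor. The projections in [encs] all start from ['o]. *)
type ('o, 'dec) obj_map = {
  dec : obj -> 'dec option;
  encs : ('o -> string * value) list; (* most recent member first *)
}

let map (ctor : 'dec) : ('o, 'dec) obj_map =
  { dec = (fun _ -> Some ctor); encs = [] }

(* Apply one member: consume one constructor argument for decoding and
   record the projection [enc] from the final object for encoding. *)
let mem name field ~(enc : 'o -> 'a) (m : ('o, 'a -> 'b) obj_map)
  : ('o, 'b) obj_map =
  let dec o =
    match m.dec o, Option.bind (List.assoc_opt name o) field.of_value with
    | Some f, Some v -> Some (f v)
    | _ -> None
  in
  { dec; encs = (fun v -> name, field.to_value (enc v)) :: m.encs }

(* The map is complete once the constructor is fully applied. *)
let finish (m : ('o, 'o) obj_map) = m

let decode (m : ('o, 'o) obj_map) (o : obj) : 'o option = m.dec o
let encode (m : ('o, 'o) obj_map) (v : 'o) : obj =
  List.rev_map (fun e -> e v) m.encs

type person = { name : string; age : int }

let make name age = { name; age }

let person =
  map make
  |> mem "name" string ~enc:(fun p -> p.name)
  |> mem "age" int ~enc:(fun p -> p.age)
  |> finish

let () =
  let o = [ "name", Str "Ada"; "age", Int 36 ] in
  assert (decode person o = Some { name = "Ada"; age = 36 });
  assert (encode person { name = "Ada"; age = 36 } = o)
```

In this toy, `mem` changes the second parameter from `'a -> 'b` to `'b` while keeping `'o` fixed, which is why the structure is not a plain one-parameter applicative.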

It all started with a simple applicative for decoding generic JSON in memory, but as I wrote here I was frustrated that these applicative decode specifications would not allow me to encode. Having solved the encoding (and gained the ability to support a few other JSON object codec patterns) I lost the obvious applicative on the way – and my attempts at reframing it as an applicative were not successful.

Now @art-w, with some contortions, managed to reframe it as an applicative. But in my enthusiasm for his proposal I failed to see that he seems to be repeatedly constructing pairs to apply the object constructor, which I’m not really happy with as it brings quite a few more allocations for object construction that are not there with the API I settled on. I will have a look in the upcoming days to decide whether we keep the current way or switch to @art-w’s scheme.

I don’t claim domain expertise - I just wanted to share the following:

A parametrized abstraction for a codec typically consists of two parts: a reader and a writer.

Separating these two parts can be beneficial, at least in the private implementation (and perhaps even in the API, I’m not certain).

The reader part, being a producer of 'a, is covariant in its parameter. It can often be a great candidate for being an applicative.

The writer part, being a consumer of 'a, is contravariant in its parameter. It cannot provide a map function; instead, it would be a contra_map.

module Writer : sig
  type 'a t

  val contra_map : 'a t -> f:('b -> 'a) -> 'b t
end

As such, the combined codec cannot be an applicative. I don’t remember much about this, but I vaguely recall using a profunctor library to help with this kind of thing in the past.
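
A minimal sketch of the variance argument, with string-based readers and writers standing in for a real codec. The `Reader`, `Writer` and `Codec` modules and the `invmap` function are hypothetical and not taken from any library: the point is only that mapping a combined codec needs a function in each direction.

```ocaml
(* Toy sketch only: hypothetical modules, not taken from any library. *)
module Reader = struct
  (* Produces ['a]: covariant, so it supports [map]. *)
  type 'a t = string -> 'a option

  let map (r : 'a t) ~(f : 'a -> 'b) : 'b t = fun s -> Option.map f (r s)
end

module Writer = struct
  (* Consumes ['a]: contravariant, so it supports [contra_map] only. *)
  type 'a t = 'a -> string

  let contra_map (w : 'a t) ~(f : 'b -> 'a) : 'b t = fun b -> w (f b)
end

module Codec = struct
  (* ['a] occurs in both positions: the type is invariant in ['a]. *)
  type 'a t = { read : 'a Reader.t; write : 'a Writer.t }

  (* Mapping a codec needs both directions at once. *)
  let invmap (c : 'a t) ~(dec : 'a -> 'b) ~(enc : 'b -> 'a) : 'b t =
    { read = Reader.map c.read ~f:dec;
      write = Writer.contra_map c.write ~f:enc }

  let int : int t = { read = int_of_string_opt; write = string_of_int }
end

type user_id = User_id of int

let user_id : user_id Codec.t =
  Codec.invmap Codec.int
    ~dec:(fun i -> User_id i)
    ~enc:(fun (User_id i) -> i)

let () =
  assert (user_id.Codec.read "42" = Some (User_id 42));
  assert (user_id.Codec.write (User_id 42) = "42")
```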

Oh yes, obviously. I was so focused on decoding I totally forgot about encoding.

I believe the problem of embedding/projecting between a typed and an untyped language is also explored in this paper, which uses OCaml, too.