We’re planning a Yojson 2.0.0 release, as we’ve already started breaking the API by removing the dependency on biniou.
Before going any further, I wish to emphasize that nothing is set in stone yet, and that’s mostly why I’m starting this thread. GitHub search does provide partial answers to some of the questions we have, but I want to hear the community’s thoughts before making any decisions. Hopefully this will help us plan for a better 2.0.0.
First, we’d like to reduce the complexity of yojson’s codebase. Before going down that path, we’d be really interested in knowing which of Yojson, Yojson.Basic, Yojson.Safe or Yojson.Raw you use, and why.
We’d also like to know if anyone uses the custom Tuple and Variant constructions at all.
As you can imagine the idea is to get rid of those variants and have a single Yojson module with a single type t which would hopefully satisfy everyone. That would make it easier to remove the dependency on cppo and would make working on yojson a better and more accessible experience.
Another point of interest is the Util submodule. We’re wondering how widely used it is as we might consider improving its API a little bit.
Finally, we’re curious how many people would be interested in yojson-lwt and yojson-async packages with non-blocking I/O.
Thanks for your work on such a central piece of OCaml’s ecosystem. I use mostly Yojson.Safe because it’s what integrates with ppx_deriving_yojson, but have used Yojson.Basic in the past.
Happy to see Yojson under active development. I’m only getting started with OCaml, however it’s for the purpose of rewriting a company’s systems from scratch (formerly: Perl back-ends and JavaScript front-end). I plan to use Yojson indirectly via ocaml-protoc-yojson, to get a strongly typed IDL (Protobuf) not specific to OCaml (which rules out atdgen) and with interchangeable binary and JSON wire formats so we can save work by implementing private IPC and public services with a single API.
Therefore my main concern is performance (and Yojson 1 already beats V8 there!), and also that ocaml-protoc-yojson, despite being little-known, will hopefully remain compatible.
There are many ways to consume JSON ASTs: getters with exceptions, getters with result types, applicative combinators, etc. I don’t think it’s possible to make everybody happy with a manipulation API which is part of yojson itself. On the other hand, since yojson exposes the AST, it’s easy to have external libraries that do the manipulation.
One thing I’m curious about is location tracking: at the moment, yojson does not have a way to represent file locations in the AST (so that you could report errors during conversion “x.json: line 5, character 10: expected an integer, got a string” - note that it’s different from parse errors, which yojson handles already). Could the yojson parser expose that somehow?
I don’t think it’s possible to make everybody happy with a manipulation API which is part of yojson itself.
Why? Do we really need to have one library for yojson-getters-with-exceptions, another for yojson-getters-with-result-types, etc…? They could be different submodules within Yojson.Util.
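To make the submodule idea concrete, here is a minimal sketch of two getter styles living side by side. It is not yojson's actual API: the local `json` type is only assumed to mirror the shape of Yojson.Basic.t, and the module names `Exn` and `Res` are hypothetical.

```ocaml
(* Local stand-in for a JSON AST shaped like Yojson.Basic.t (assumption). *)
type json =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `List of json list | `Assoc of (string * json) list ]

(* Hypothetical submodule: getters that raise on a type mismatch. *)
module Exn = struct
  exception Type_error of string

  let to_int : json -> int = function
    | `Int n -> n
    | _ -> raise (Type_error "expected an integer")

  (* A missing field yields `Null; a non-object raises. *)
  let member (key : string) : json -> json = function
    | `Assoc fields -> (
        match List.assoc_opt key fields with Some v -> v | None -> `Null)
    | _ -> raise (Type_error "expected an object")
end

(* Hypothetical submodule: the same getters returning a result. *)
module Res = struct
  let to_int : json -> (int, string) result = function
    | `Int n -> Ok n
    | _ -> Error "expected an integer"

  let member (key : string) : json -> (json, string) result = function
    | `Assoc fields -> (
        match List.assoc_opt key fields with
        | Some v -> Ok v
        | None -> Error ("missing field " ^ key))
    | _ -> Error "expected an object"
end

let doc : json = `Assoc [ ("port", `Int 8080); ("host", `String "localhost") ]

(* Exception style: direct composition. *)
let port_exn = Exn.to_int (Exn.member "port" doc)

(* Result style: composition through Result.bind. *)
let port_res = Result.bind (Res.member "port" doc) Res.to_int
let host_res = Result.bind (Res.member "host" doc) Res.to_int
```

Both styles share one AST, so callers could mix them freely. The cost is a wider API surface to maintain in one package.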
I think it’s a good idea to collect feedback at this point.
There are different ways json is used, each of which brings different demands from users. It’s possible that some of them are incompatible, or just too hard to reconcile. Off the top of my head, here are some uses or requirements:
configuration files: parsing doesn’t have to be super fast, need for accurate locations, can benefit from json extensions that support comments (e.g. json5), benefits from good pretty-printing.
web APIs: need to support unusual data imposed by existing APIs (e.g. very long number literals, large data, malformed data worth supporting anyway, UTF-8 validation, support for non-UTF-8 encodings)
big data: need for speed, accommodation for parser/printer generators like atdgen
simple/transient use of json: convenient way of accessing this or that field without worrying about types, ideally with a javascript-like syntax like data.items[0].id
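For the last use case, a path-based accessor gets fairly close to `data.items[0].id` without any syntax extension. The following is only a sketch: the `json` type is a local stand-in assumed to mirror Yojson.Basic.t, and `step` and `get` are hypothetical names, not part of yojson.

```ocaml
(* Local stand-in for a JSON AST shaped like Yojson.Basic.t (assumption). *)
type json =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `List of json list | `Assoc of (string * json) list ]

(* Hypothetical path steps: an object key or an array index. *)
type step = Key of string | Index of int

(* Follow a path through the value; None if any step does not apply. *)
let rec get (path : step list) (js : json) : json option =
  match (path, js) with
  | [], _ -> Some js
  | Key k :: rest, `Assoc fields ->
      Option.bind (List.assoc_opt k fields) (get rest)
  | Index i :: rest, `List items when i >= 0 ->
      Option.bind (List.nth_opt items i) (get rest)
  | _ -> None

let data : json =
  `Assoc [ ("items", `List [ `Assoc [ ("id", `Int 42) ] ]) ]

(* Roughly data.items[0].id *)
let first_id = get [ Key "items"; Index 0; Key "id" ] data

(* A path that does not exist simply yields None. *)
let missing = get [ Key "items"; Index 3; Key "id" ] data
```

Returning an option keeps the accessor total, which suits throwaway code where a missing field is not worth an exception.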
I don’t have specific solutions in mind for making these things happen. I just want to point out that things like fast parsing and reporting useful error locations may be incompatible or difficult. In this case, multiple implementations and possibly multiple APIs may be easier to manage than a single one that tries to be good at everything.
Personally the only variant I ever use is Yojson.Safe if only for the sole reason that ppx_deriving_yojson uses it.
I think some of these concerns contradict each other. It is going to be rather difficult to retain location information without completely overhauling the AST that Yojson produces, thus making it a good deal bigger. On the other hand, multiple ASTs are an issue because it is likely that only one is going to get used (as happened with Yojson.Safe, to the point where removing all the others is under consideration, as indicated by this thread) or multiple are going to get used at the same time, thus creating a compatibility nightmare where library A uses Yojson.A and library B requires Yojson.B and now you have to convert.
What I personally would like to see is more standard compliance, like evaluating Yojson as part of JSONTestSuite but of course that is rather unlikely to cause compatibility issues. In fact there is already an issue about it, but so far I’ve been concentrating on a mix of relatively low effort and comparatively big payoff code changes.
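To illustrate the conversion burden, here is a sketch of what mapping an extended AST down to a standard-only one involves. Both types are local stand-ins written for this example. Yojson ships its own conversion (Yojson.Safe.to_basic), and this sketch does not claim to match its exact behavior.

```ocaml
(* Stand-in for a standard-only AST (assumed shape of Yojson.Basic.t). *)
type basic =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `List of basic list | `Assoc of (string * basic) list ]

(* Stand-in for an extended AST with the non-standard constructors. *)
type extended =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `List of extended list | `Assoc of (string * extended) list
  | `Tuple of extended list
  | `Variant of string * extended option ]

(* Every node is rebuilt: tuples become arrays, variants become a string
   or a two-element array. The encoding chosen here is an assumption. *)
let rec to_basic : extended -> basic = function
  | (`Null | `Bool _ | `Int _ | `Float _ | `String _) as atom -> atom
  | `List items | `Tuple items -> `List (List.map to_basic items)
  | `Assoc fields -> `Assoc (List.map (fun (k, v) -> (k, to_basic v)) fields)
  | `Variant (name, None) -> `String name
  | `Variant (name, Some arg) -> `List [ `String name; to_basic arg ]

let converted =
  to_basic (`Assoc [ ("pos", `Tuple [ `Int 1; `Int 2 ]); ("tag", `Variant ("A", None)) ])
```

The conversion is lossy (a tuple and a list become indistinguishable) and walks the whole tree, which is the cost a library boundary between two ASTs would impose on every call.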
I didn’t actually know what type we used, except that it was whatever ppx_deriving_yojson gave us. But I remember being super confused by the two types when I started, especially what to do with Variant and Tuple.
It is actually 4 types, Yojson.json, Yojson.Basic.json, Yojson.Safe.json and Yojson.Raw.json (now all renamed to t in their respective modules), which I think is one of the reasons @NathanReb is considering simplifying the code base.
I’ve used Yojson.Safe.json because of ppx_deriving_yojson (which is extremely useful). But I would definitely love to stick with Yojson.Basic.json instead, as it has only the standard JSON AST.
I’ve been using Yojson.Basic for most code I write, purely by habit.
As for features I’d like:
API that uses data types rather than exceptions, at this point I’ve written a bunch of these for work. Having the option to use options rather than exceptions would be great.
location error messages for parsing config and JSON api requests
support for programmatically constructing JSON types: on a number of occasions, constructing something using the polymorphic variant encoding yields a large, incomprehensible error message. This is partly because OCaml doesn’t do as good a job as it could printing out these errors, and partly because of the choice of polymorphic variants for the encoding.
It would be nice to have a standard of_string instead of (or in addition to) from_string to deserialize, as well as functions returning result instead of throwing exceptions.
Util seems quite useless, it would be nice to have a library of basic constructors instead (for basic types, tuples, list, array etc).
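A library of basic constructors along these lines could look like the sketch below. The `Build` module and all of its function names are hypothetical, and `json` is a local stand-in assumed to mirror Yojson.Basic.t. Annotating every result as `json` is one way to keep type errors short, since a mistake is reported against a named type rather than an inferred open variant.

```ocaml
(* Local stand-in for a JSON AST shaped like Yojson.Basic.t (assumption). *)
type json =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `List of json list | `Assoc of (string * json) list ]

(* Hypothetical constructor helpers, kept in a module to avoid shadowing. *)
module Build = struct
  let null : json = `Null
  let bool (b : bool) : json = `Bool b
  let int (n : int) : json = `Int n
  let float (f : float) : json = `Float f
  let string (s : string) : json = `String s

  (* Containers take an element encoder as first argument. *)
  let list f xs : json = `List (List.map f xs)
  let array f xs : json = list f (Array.to_list xs)
  let pair f g (a, b) : json = `List [ f a; g b ]
  let option f : _ option -> json = function None -> `Null | Some x -> f x
  let obj (fields : (string * json) list) : json = `Assoc fields
end

let doc =
  Build.(
    obj
      [ ("name", string "yojson");
        ("tags", list string [ "json"; "ocaml" ]);
        ("version", pair int int (2, 0));
        ("license", option string None) ])
```

Because each helper is an ordinary function, encoders for user types compose from them without touching the variant constructors directly.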
I’m using JSON to exchange data with third-party tools, servers, software, etc.
So, I shall never produce Tuple or Variant syntax and I would never receive them after parsing.
Hence, I’m always using Yojson.Basic; however, I would like to have the following features in addition to the existing ones:
both exception-based and exception-free conversion functions;
both a nicely indented pretty-printer and a very compact one;
constructors for large numbers, namely `Intlit and `Floatlit, available from Yojson.Raw;
constructors with comments and/or file locations, with auto-unrolling conversion functions, something like:
type t = [ ... | `Loc of string * int * t ]
let rec unroll = function `Loc (_, _, t) -> unroll t | #t as js -> js
let to_int js = match unroll js with `Int n -> n | _ -> raise ...
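A self-contained variant of this idea is sketched below. It fills in the elided parts with assumptions: the base constructors are assumed to mirror Yojson.Basic.t, the `Loc payload is assumed to be a file name and a line number, and `locate` is a hypothetical helper added so that conversion errors can report the location. It returns a result rather than raising, which is a choice made for the example only.

```ocaml
(* Hypothetical AST: standard JSON plus a location wrapper (file, line, value). *)
type t =
  [ `Null | `Bool of bool | `Int of int | `Float of float | `String of string
  | `List of t list | `Assoc of (string * t) list
  | `Loc of string * int * t ]

(* Strip any number of nested `Loc wrappers from the top of a value. *)
let rec unroll : t -> t = function `Loc (_, _, js) -> unroll js | js -> js

(* Like unroll, but also return the innermost location seen, if any. *)
let rec locate (loc : (string * int) option) : t -> (string * int) option * t =
  function
  | `Loc (file, line, js) -> locate (Some (file, line)) js
  | js -> (loc, js)

(* Conversion that reports where the offending value came from. *)
let to_int (js : t) : (int, string) result =
  match locate None js with
  | _, `Int n -> Ok n
  | Some (file, line), _ ->
      Error (Printf.sprintf "%s: line %d: expected an integer" file line)
  | None, _ -> Error "expected an integer"

let good = to_int (`Loc ("x.json", 5, `Int 3))
let bad = to_int (`Loc ("x.json", 5, `String "three"))
```

Code that does not care about locations can call `unroll` and pattern match as before, while converters that do care get the position for free.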
At Ahrefs, we use Basic and Safe pretty extensively when reading/writing json by hand. We used to have a usage of Raw, but it seems to be gone for now. We use Util often too, mostly the converters and, to a lesser extent, the filters.
I think a json5 parser would also be super cool, as it would be a good candidate to fill the need of a decent configuration language in OCaml. So far I’m not liking toml (and its API) very much, raw json lacks comments and trailing commas, yaml is a nightmare, and xml is too verbose
Have you thought of releasing yojson 2.0 as a new package (with a different name)?
One big pro for this is that both yojson 1.x and yojson 2.x (named differently) may be installed and used at the same time. Thus we don’t have to split the installable package set into two parts (those using yojson 1.x and those using yojson 2.x). Considering that yojson is so widely used (even merlin uses it, and it’s installed 90% of the time, I think), that may be an issue.