Has anybody thought about how to support labeled tuples?
So for the type (int * bool), the value (5, false) gets serialized as [5, false].
So maybe for the type (a:int * b:bool) the value (~a:5, ~b:false) might get serialized as [["a",5], ["b",false]] ?
I’m implementing this support for pa_ppx.deriving_plugins.yojson, so I have to make a decision about how to do it, so if you already had a decision made, I’d be happy to follow along.
ETA: and it appears that the same question is relevant for SEXP serializers – e.g. ppx_sexp_conv. They don’t (yet) support labeled tuples, and I see no issue describing how they intend to do so; surely somebody’s working on it, and it’d be good to know what the intended format will be.
My first instinct would be to have a normal JSON object, not an array of arrays. But that doesn’t work when some components are labelled and some aren’t.
Another viewpoint could be that the labels are just for convenience and type safety in the OCaml code and shouldn’t be reflected in serialization at all. And if you want them there, then records would be the better approach anyway.
I don’t know the latest state of ppxlib and recent OCaml version support, but perhaps that’s what you might already get with some default migrations which ignore the labels?
I had the same gut feeling that @Chet_Murthy’s encoding is not very idiomatic for JSON and quite annoying to process if you are not the original encoder.
The other problem with an object though is that the tuples are ordered and JSON members are not. Somehow you need to be able to recover the order on decoding if the members have been swapped by a processor (which is legit as far as the format is concerned).
One idea would be to encode to objects with a convention in object members so that the decoder can sort it out, namely you encode each component to a field named $(name)-$(pos) with $(name) the (possibly empty) label of the component and $(pos) its zero-based position.
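A minimal sketch of that naming convention, assuming a hand-rolled `json` type and a hypothetical `component` representation (an optional label plus an already-encoded value) rather than any particular library:

```ocaml
(* A tiny JSON value type, defined here so the sketch needs no library. *)
type json =
  | Int of int
  | Bool of bool
  | String of string
  | Array of json list
  | Object of (string * json) list

(* Hypothetical representation of a tuple component: optional label + value. *)
type component = string option * json

(* Member name "<label>-<pos>"; the label part is empty when there is none. *)
let member_name pos label =
  Printf.sprintf "%s-%d" (Option.value label ~default:"") pos

let encode (components : component list) : json =
  Object (List.mapi (fun pos (label, v) -> (member_name pos label, v)) components)

(* Split at the last '-' (OCaml labels cannot contain '-'). *)
let parse_member name =
  let i = String.rindex name '-' in
  let label = String.sub name 0 i in
  let pos = int_of_string (String.sub name (i + 1) (String.length name - i - 1)) in
  ((if label = "" then None else Some label), pos)

(* Sorting by position makes the member order in the object irrelevant. *)
let decode (j : json) : component list =
  match j with
  | Object members ->
    members
    |> List.map (fun (name, v) ->
           let label, pos = parse_member name in
           (pos, (label, v)))
    |> List.sort (fun (p, _) (q, _) -> compare p q)
    |> List.map snd
  | _ -> invalid_arg "decode: expected an object"
```

With this, a processor that reorders members does no harm: the position suffix carries the tuple order, at the cost of member names that look a little odd to other consumers.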
According to the JSON specification, the order of members is not guaranteed for objects, so an array seems to be the best choice. It can be a plain array (without labels) or an array of arrays (with labels). A plain array won’t prevent deserialising exactly what was serialised before.
You’ve convinced me! I’ll do a plain array, without labels. The argument that an array where some of the elements are arrays and some are not would be clumsy and not natural JSON is convincing, as is the fact that omitting the labels doesn’t prevent demarshalling.
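To illustrate why erasing the labels loses nothing on the round trip, here is a small sketch for the type (a:int * b:bool). It assumes a hand-rolled `json` type and uses an ordinary tuple, since the labels play no part in the encoding:

```ocaml
type json = Int of int | Bool of bool | Array of json list

(* With labels erased, (a:int * b:bool) serialises exactly like (int * bool). *)
let to_json ((a, b) : int * bool) : json = Array [ Int a; Bool b ]

(* Position alone determines which component is which. *)
let of_json : json -> (int * bool, string) result = function
  | Array [ Int a; Bool b ] -> Ok (a, b)
  | _ -> Error "expected [int, bool]"
```

A derived decoder would put the labels back from the type definition, so they never need to appear in the data.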
The other problem with an object though is that the tuples are ordered and JSON members are not. Somehow you need to be able to recover the order on decoding if the members have been swapped by a processor (which is legit as far as the format is concerned).
Why is that a problem? Decoders derived from record types already do that. In melange-json.ppx deriving we consider object encoding for labeled tuples (PR).
Yes of course no problem, the decoder knows the named order (you just need to invent names for those non-labelled components).
But note that the resulting data is not self-described for other processors.
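A sketch of the record-style decoding discussed here, for a hypothetical type (x:int * bool * y:string). The `json` type is hand-rolled, and naming the unlabelled component "1" (its zero-based position) is an assumption of this example, not something any of the derivers is known to do:

```ocaml
type json =
  | Int of int
  | Bool of bool
  | String of string
  | Object of (string * json) list

(* The decoder looks members up by name, so their order in the object does
   not matter. The tuple order lives in the decoder, not in the data. *)
let decode : json -> (int * bool * string) option = function
  | Object members ->
    (match
       ( List.assoc_opt "x" members,
         List.assoc_opt "1" members, (* invented name for the unlabelled slot *)
         List.assoc_opt "y" members )
     with
     | Some (Int x), Some (Bool b), Some (String y) -> Some (x, b, y)
     | _ -> None)
  | _ -> None
```

A generic processor that sees only the object cannot tell which member comes first in the tuple, which is the sense in which the data is not self-describing.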
OTOH [OK, it seems Frederic really has convinced me grin] is self-description necessary ? A value of type
int * int * int
could be a value of type int array right?
Musing a little while, it seems like a consideration might be “if we marshal to this representation (unlabeled arrays vs dictionaries), will it make it easier or harder to interoperate with other languages’ JSON marshallers?” This is difficult to judge (it’s been a long time since I hacked on JSON marshallers for other languages) but it would seem like this argument goes strongly towards “unlabeled arrays” – it would seem unlikely that other languages would support something like labeled tuples, whereas many of them support something like tuples.
It could be that I’m biased b/c I already implemented this (haha, it’s so easy to just erase the labels and keep goin’).
Well, most languages support labelled tuples: that’s just records/objects or whatever hashtable your other language is using. So you can see labelled tuples as anonymous records. But I’m also fine with your view that these labels are just annotating the components of an ordered tuple.
I’m not sure why it was so pressing to add yet another way to define datatypes in OCaml but then I have made peace with the idea that OCaml is not a designed language, it just aggregates features, it wants to be the C++ of MLs :–)
Haha, I should have been clearer: I meant “labeled tuples (à la OCaml) where only some of the members get labels, not either all or none”. It seems to me like that means you cannot treat them as anonymous records – you cannot reorder the members in expressions or patterns (IIRC – I’m not going to check, but that’s my memory from when I experimented with them back when they came out) after all.
In any case, I’m going to follow whatever ppx_sexp_conv and ppx_deriving_yojson do for their respective type-derivers.
Not in OCaml (though the actual memory layout is also fixed for records) but who cares for language interop ?
And if you don’t have a label, just use Printf.sprintf {|"%d"|} n as the member name of the nth label-less component. That’s a perfectly legit JSON member name and not an OCaml label name (I guess) so there’s no confusion.
giggle
A fair cop. Just for clarity, for
(~x:5, false, ~y:"foo") : (x:int * bool * y:string)
there seem to be three different representations:
[5, false, "foo"]
[["x", 5], false, ["y", "foo"]]
{ "x": 5, |2|: false, "y": "foo"}
I guess it is a symptom of how completely @Frederic_Loyer has convinced me, that I find the first to be the most … pleasant. Which doesn’t mean much, but I felt I should note it.
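For comparison, the three candidate representations can be produced from one component list. This is only a sketch: the `json` type and printer are hand-rolled, `component` is a hypothetical representation, and naming the unlabelled object member by its zero-based position is an assumption of this example.

```ocaml
type json =
  | Int of int
  | Bool of bool
  | String of string
  | Array of json list
  | Object of (string * json) list

(* Minimal printer; %S uses OCaml escaping, which matches JSON for plain ASCII. *)
let rec to_string = function
  | Int n -> string_of_int n
  | Bool b -> string_of_bool b
  | String s -> Printf.sprintf "%S" s
  | Array xs -> "[" ^ String.concat ", " (List.map to_string xs) ^ "]"
  | Object ms ->
    "{"
    ^ String.concat ", "
        (List.map (fun (k, v) -> Printf.sprintf "%S: %s" k (to_string v)) ms)
    ^ "}"

(* Hypothetical representation of a tuple component: optional label + value. *)
type component = string option * json

(* 1: labels erased. *)
let plain_array (cs : component list) = Array (List.map snd cs)

(* 2: labelled components become [label, value] pairs, others stay bare. *)
let tagged_array (cs : component list) =
  Array
    (List.map
       (function Some l, v -> Array [ String l; v ] | None, v -> v)
       cs)

(* 3: an object; unlabelled members are named by position (assumption). *)
let as_object (cs : component list) =
  Object
    (List.mapi (fun i (l, v) -> (Option.value l ~default:(string_of_int i), v)) cs)

let example = [ (Some "x", Int 5); (None, Bool false); (Some "y", String "foo") ]
```

Only the second form mixes element shapes within one array, which is what makes it awkward for decoders that expect uniform arrays.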
The ppx_sexp_conv folks chimed in ( OCaml 5.4 support · Issue #43 · janestreet/ppx_sexp_conv · GitHub ) and they’re going with
((~x 5) (. false) (~y foo))
[I hope I got the quoting of atoms right]
I wonder if you (and @Frederic_Loyer and anybody else) might have a suggestion for a way to support both #1 and #3 – perhaps an attribute that indicates that #1 is desired ?
Assuming the . is for any unnamed component (and thus will be repeated in the list), that strikes me as a particularly bad encoding. If you take the perspective of using a generic tool to process s-expressions, then you can neither query this as a dictionary (assuming you have the right encoding…) nor directly as a list (tuple). The best of both worlds!
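A small sketch of that objection, under the stated assumption that every unlabelled component is keyed by "." (the `sexp` type is hand-rolled and the example value is made up):

```ocaml
type sexp = Atom of string | List of sexp list

(* Assumed shape: one (key value) pair per component, with "." as the key
   of every unlabelled component. *)
let example =
  List
    [ List [ Atom "~x"; Atom "5" ];
      List [ Atom "."; Atom "false" ];
      List [ Atom "."; Atom "foo" ] ]

(* Dictionary-style lookup: the value of the first pair with this key. *)
let lookup key = function
  | List pairs ->
    List.find_map
      (function List [ Atom k; v ] when k = key -> Some v | _ -> None)
      pairs
  | Atom _ -> None

(* List-style access: the nth element is a (key value) pair, not the value. *)
let nth n = function List xs -> List.nth_opt xs n | Atom _ -> None
```

Looking up "." only ever finds the first unlabelled component, and positional access returns a wrapped pair that still has to be unpacked, so neither generic view works directly.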
That |2| should rather read "1". I personally much prefer this encoding because:
- It avoids dealing with heterogeneous arrays (e.g. in jsont they entail decoding to a generic representation because it optimizes for uniform arrays).
- It allows user friendly (named) queries.
But then this assumes you are interested in querying your serializations with generic JSON tools.