Have you heard of “jsonnet”? It’s a weird functional language that computes over JSON. Erm, that is, it originated as a language that computes over protocol buffers, but since protobufs are basically JSON … Places like Databricks use it to compute the config files for cloud deployments and such.
That is to say, computing over JSON is a big, big, big problem, and “well, write a bunch of code in your favorite programming language” isn’t an answer (as you rightly note) b/c boy howdy, that’ll take forever, and be about as maintainable as “write a bunch of code that accesses btree indexes directly” was at the dawn of relational databases.
We need query and computation languages over JSON for the same reason we needed relational query languages.
YAML ought to be[1] just another syntax for JSON. Most people who use YAML use it as precisely that. And boy howdy, everything you need for JSON, you also need for YAML. B/c the size of these YAML files that are generated by auto-configurators, and that you then have to modify by hand … geeeeeez.
[1] YAML has these weird bits of syntax that most people don’t use, b/c they recognize that different YAML parsers accept different subsets of the language. It’s all a big mess. I came up with my own subset, designed so that one could write a parser in any language that would accept that language on-the-nose, but hey, not like I can convince anybody to use it: GitHub - chetmurthy/yay: YAY Ain't YAML
You write stuff like JSON.(json |-> "people" |=> index |-> "name" |> as_string).
One nice feature is error messages give nice locations. For instance, assuming json has an origin of "file.json" (the origin is set at parse time), you can get errors like:
file.json, at people: not an array
file.json, at people.[42]: not an object
file.json, at people.[42]: missing field: name
file.json, at people.[42].name: not a string
A typical example to decode a record would be:
let decode_vector json = {
  x = JSON.(json |-> "x" |> as_float);
  y = JSON.(json |-> "y" |> as_float);
}
One drawback is that this approach is quadratic in the number of record fields, though, since each |-> lookup walks the object’s field list from the start. The alternative is to iterate over the fields one by one, setting mutable values along the way, before packing all those mutable values into a record, but that’s quite annoying to write (see the sketch below).
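A minimal sketch of that single-pass alternative, assuming a JSON.fields iterator over the object’s key/value pairs (a hypothetical name, in the spirit of the API above):

let decode_vector json =
  (* one pass over the fields, accumulating into mutable refs *)
  let x = ref None and y = ref None in
  JSON.(fields json) |> List.iter (fun (key, v) ->
    match key with
    | "x" -> x := Some JSON.(v |> as_float)
    | "y" -> y := Some JSON.(v |> as_float)
    | _ -> ());
  (* pack the mutable values into the record at the end *)
  match !x, !y with
  | Some x, Some y -> { x; y }
  | _ -> failwith "missing field: x or y"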
I’m sure something like this can be done for s-expressions.
I fail to see how a query language would help to convert JSON values to actual OCaml records; do you have an example? Unless the query language comes with a PPX or something like that, in which case the problem would be solved by the PPX itself, not the query language, right?
I was responding to @gasche's original problem. There, the issue is to construct some cut-down data structure from the original full sexp/JSON. In an earlier comment in this thread, I mentioned that I’d written an OCaml implementation of @stedolan's jq (so did someone else: Query-json: Re-implemented jq in Reason Native/OCaml); two thoughts:
this would allow the query-engine to produce an OCaml JSON value
at least when I wrote my interpreter, it was straightforward to imagine how to produce instead a code-generator, which could easily be converted into a PPX.
Now that I think about it, it seems … obvious that we could repurpose jq to solve your problem pretty much on-the-nose. Imagine:
s-expressions of the form ((a b) (c d)...) are treated as JSON dicts.
otherwise, s-expressions of the form (e1 e2 ...) are treated as JSON lists
other cons nodes are errors. Or maybe we invent syntax to do car/cadr
and everything else maps to strings.
This is a simple transformation of s-expressions to JSON, and you could construct the reverse, so that sexp->json->sexp is the identity function (ignoring case #3).
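A sketch of that forward mapping, with representation choices that are mine rather than part of the proposal: Sexplib-style sexps on one side and Yojson on the other. (Note that Sexplib has no dotted pairs, so case #3 never arises there.)

open Sexplib0  (* provides Sexp.Atom / Sexp.List *)

let rec json_of_sexp : Sexp.t -> Yojson.Safe.t = function
  | Sexp.Atom s -> `String s  (* everything else maps to strings *)
  | Sexp.List items
    when items <> []
         && List.for_all
              (function Sexp.List [ Sexp.Atom _; _ ] -> true | _ -> false)
              items ->
    (* ((a b) (c d) ...) is treated as a JSON dict *)
    `Assoc
      (List.map
         (function
           | Sexp.List [ Sexp.Atom k; v ] -> (k, json_of_sexp v)
           | _ -> assert false)
         items)
  | Sexp.List items ->
    (* otherwise (e1 e2 ...) is treated as a JSON list *)
    `List (List.map json_of_sexp items)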
Then, one could just reuse JQ to do the querying.
It seems like that’d solve your problem? And since there are two different JQ implementations in OCaml …
I just wrote a parser for my input data using Decoders_sexplib, and the result works. It’s the only library recommended in the thread that solves my specific problem, so far.
open Decoders_sexplib.Decode

let module_deps_decoder =
  let+ for_intf = field "for_intf" (list string)
  and+ for_impl = field "for_impl" (list string) in
  for_intf @ for_impl

let module_decoder entry_name =
  let+ name = field "name" string
  and+ impl = field "impl" (list string) |> map List.hd
  and+ deps = field "module_deps" module_deps_decoder in
  (entry_name, name, impl, deps)

let exec_decoder =
  let* entry_name = field "names" (list string) |> map List.hd in
  field "modules" (list (module_decoder entry_name))

let lib_decoder =
  let* entry_name = field "name" string in
  field "modules" (list (module_decoder entry_name))

let entry_decoder =
  list_filter
    (string |> uncons @@ fun kind ->
     match kind with
     | "executables" -> let+ v = list exec_decoder in Some v
     | "library" -> let+ v = list lib_decoder in Some v
     | _ -> succeed None)
  |> map List.flatten
  |> map List.flatten
Note that entry_decoder is an example of a decoder working on a sum type / variant as you mention:
in this example I use list_filter to only handle the variants executables and library and ignore the others
there is an extra level of list wrapping (and List.flatten in the result), due I think to the inner workings of the Decoders library, which was designed with JSON rather than s-exprs in mind. I’m not sure, but I think that it normalizes (polyline foo bar) into something like (polyline (foo bar)).
Yes, gitlab.inria.fr is a bad place to host community-oriented free software. Unfortunately the admins are aware of it and they don’t want to change this (and don’t have the workforce resources to change it), and most users don’t think too deeply about the implications of their hosting choice, or are not aware of the problem. I think the best route is to ping the authors kindly (here @esope) to see if they would consider hosting their software on gitlab.com instead – or some other place.
(Hopefully those problems will magically solve themselves once we have proper federation between git forges…)
Just curious, what didn’t work with Sexpq? I didn’t follow closely but I don’t see what you couldn’t have expressed.
That’s one of the problems with s-expressions. There is no well-defined encoding of dictionaries, or rather, the people who write them by hand do not want to use the clean ones.
In lisp you would write them as a list of bindings, a binding being (key . <s-exp>).
In config files, it seems no one wants to write that. You could do (key <s-exp>), but it seems, again, no one wants to write the extra parens when you bind a key to a list.
So we end up with this bastardized notion of binding, which is not so great, since you can no longer distinguish between a binding to a singleton list and a binding to an atom without external knowledge (it also makes substitution and other operations harder than they could be).
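To make that concrete: under that parens-light convention, binding key to the atom a and binding key to the singleton list (a) both end up written as

(key a)

so a reader needs external knowledge (a schema) to tell them apart, whereas the fully parenthesized forms (key a) and (key (a)) remain distinguishable.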
I didn’t try serialk because I understand that it’s not released / available on OPAM yet. The design looks nice, but I’m planning to include my sexp-extraction code in a PR in an upstream project and I want to stay with opam-released dependencies.
Unrelated: one thing I appreciate about Decoders (and I guess Serialk too) is that thought was given to error reporting. It’s not something that my quick&dirty hand-written code does, and I think that’s a large part of the value of using a specialized library.
Re the extra level of list wrapping: this is not intrinsic to Decoders itself, but it is intrinsic to the uncons combinator. uncons peels off the head of the list, but the tail is still a list.
To solve this I’d probably use uncons twice - once for the kind, as you already have, and again for the exec_decoder/lib_decoder.
You could define a let operator for uncons to make this look nicer. I’d add this to Decoders but I’m hesitant to add a whole barrage of cryptic operators.
Something like this:
let ( let*:: ) x f = uncons f x

let nil =
  list value >>= function
  | [] -> succeed ()
  | _ -> fail "expected an empty list"

let entry_decoder kind =
  match kind with
  | "executables" ->
    let+ v = exec_decoder in
    Some v
  | "library" ->
    let+ v = lib_decoder in
    Some v
  | _ -> succeed None

let entry_decoder =
  (* pop the first element off the list *)
  let*:: kind = string in
  (* now pop the second *)
  let*:: entry = entry_decoder kind in
  (* optional - assert we have nothing left to decode *)
  let+ () = nil in
  entry

let entries_decoder =
  list_filter entry_decoder |> map List.flatten
Note there is still a List.flatten. This is not due to Decoders, but due to the shape of the sexp and the shape of the desired result. In the source sexp the separate executables and library stanzas contain lists that we just want to concat together.
Hi @jnavila: I think the combinators you are looking for are variant, field and repeat_full_list. I pushed your example as a test on the repository.
And I also agree that it is a pity that gitlab.inria.fr is not open by default to external contributors. I can ask to open an account for you, if this is what you want.
I see, thanks (technically there’s a version available through the b0 package, but don’t use it) – given the way you wrote your message I thought you had hit some kind of expressiveness issue.
PPX is great where the producer and consumer are both under your control, in an internal codebase. For everything else, I want the expressivity of writing parsing code.
Decoders tries to make this as easy as possible. Often, composing decoders is mechanical and mirrors the shape of your types. But where it doesn’t, it’s easy to adjust the decoder, and the adjustment is transparent to other people reading your code.
For example, handling versioned data is trivial with decoders. Just use one_of: try the latest version first, and fallback to the old version if it fails. Or, if you’re lucky enough to have a version field in your data, decode it, and switch on it to choose how to decode the rest.
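A sketch of both patterns. The record type and the field names ("name" vs "hostname", "version") are invented for illustration:

open Decoders_sexplib.Decode

type server = { name : string; port : int }

let v2 =
  let+ name = field "name" string
  and+ port = field "port" int in
  { name; port }

let v1 =
  (* the hypothetical old schema called the field "hostname" *)
  let+ name = field "hostname" string
  and+ port = field "port" int in
  { name; port }

(* try the latest version first, fall back to the old one *)
let server_decoder = one_of [ ("v2", v2); ("v1", v1) ]

(* or, with an explicit version field, switch on it *)
let server_decoder' =
  let* version = field "version" int in
  match version with
  | 2 -> v2
  | 1 -> v1
  | n -> fail (Printf.sprintf "unknown version %d" n)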
In the current released version (0.7.0), Decoders tries to treat everything as being shaped like JSON. As such, it has to make a decision on how “objects” are represented in S-expressions (it follows Dune - see the note at the top of Decoders_sexplib.Decode · decoders-sexplib 0.7.0 · OCaml Packages).
In the next (unreleased) version, we are exposing a lower-level ('i, 'o) Decoder.t type (see decoder.mli). This is useful wherever you are decoding a type 'i into a type 'o with some possibility of error. I hope this will pave the way for Decoders interfaces to non-JSON-like formats, such as XML (see xml.ml).
Another tip: since there is nothing in here specific to S-expressions, you could write it as a functor using the Decoders.Decode.S interface:
module Decode (D : Decoders.Decode.S) = struct
  open D

  ...
end
Then you can instantiate it with module Sexp_decode = Decode(Decoders_sexplib.Decode).
The benefit is you can now decode JSON, CBOR, msgpck, or YAML of the same shape for free.
Might not be all that useful for your use case, but we use this pattern a lot so we can instantiate our backend decoders with Decoders_yojson.Basic.Decode and our frontend decoders with Decoders_bs.Decode (for Bucklescript/Melange).
If you’re writing a library, it also leaves your users free to choose their favorite OCaml JSON library (Yojson, Jsonm, jsonaf, etc.).
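Concretely (a sketch, assuming the decoders-yojson package is installed alongside decoders-sexplib), the Decode functor above gives you both backends:

module Sexp_decode = Decode (Decoders_sexplib.Decode)
module Json_decode = Decode (Decoders_yojson.Basic.Decode)

(* Sexp_decode and Json_decode now expose the same decoders,
   reading from s-expressions and from Yojson values respectively. *)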
Yes, this can be used to skip some fields, but the reason I’m not using it is that I find it more useful to explicitly capture and ignore all fields. This acts as a safety net when parsing responses from something that might change its schema (a bit like warning 9), and gives an opportunity to quickly change the type of the field when realizing later you need it.
Hi, years ago I was not able to use Jane-St’s solution (I didn’t find any example of use, I couldn’t figure out from the .mli how to use it to read s-expressions the same way as XML, and no one answered me when I asked on the forum/list about it), so I made my own solution:
Personally I just pattern match when there are only 1 or 2 parameters; with your example it would be:
| Expr [Atom "circle";
(Expr [Atom "center"; Atom cx; Atom cy]);
(Expr [Atom "radius"; Atom radius]) ] ->
(* convert cx, cy and radius to int or float here, and use it *)
If there are more than 2 parameters, and I want these parameters to be allowed in any order, I use getters like:
| Expr (Atom "circle" :: circ_attrs) ->
let cx, cy = get_circle_center circ_attrs in
(* convert cx, cy and radius to int or float here, and use it *)
I would also simplify your input (stroke (width 0.2032)) into (stroke_width 0.2032), which is what we have in CSS and SVG.
Also, all your primitives “circle”, “arc” and “polyline” are at the same level, which means that you can process them all in a very simple way with any iterator from the List module of the stdlib, as in the sketch below.
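For instance, a sketch using the Expr/Atom constructors from above, where process_circle, process_arc and process_polyline are hypothetical per-primitive handlers:

let process_primitives items =
  List.iter
    (function
      | Expr (Atom "circle" :: _) as s -> process_circle s
      | Expr (Atom "arc" :: _) as s -> process_arc s
      | Expr (Atom "polyline" :: _) as s -> process_polyline s
      | _ -> ())  (* ignore anything else *)
    items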