Hi everyone,
I’ve been working on ocaml-vega-lite, an OCaml representation of Vega-Lite. My priorities for this library are for it to be complete and safe, in the sense that hitting any representable value with to_yojson will yield a syntactically valid Vega-Lite JSON spec. To the extent possible, I’d also like it to be convenient and ergonomic.
I’m planning on supplementing this relatively low-level library with a higher-level one that supports interactive data exploration from environments like utop and IOCaml.
I have mixed feelings about ocaml-vega-lite as it currently stands, and I would love to get some feedback from the list. On the positive side, I feel that ocaml-vega-lite is on track to becoming complete and safe. It’s also usably ergonomic for generating visualizations in programs. The LOC counts for the OCaml in examples/ are comparable to those of the corresponding JSON in test/. The library’s types are relatively easy to discover in utop.
On the negative side, ocaml-vega-lite is still way too heavy to use for interactive data exploration. Compare the bar chart example:

    open VegaLite.V2

    type row = {
      a : string;
      b : int
    } [@@deriving yojson]

    let dataValues = [
      {a = "A"; b = 28}; {a = "B"; b = 55}; {a = "C"; b = 43};
      {a = "D"; b = 91}; {a = "E"; b = 81}; {a = "F"; b = 53};
      {a = "G"; b = 19}; {a = "H"; b = 87}; {a = "I"; b = 52}
    ]

    let dat = `InlineData InlineData.{
      values = `JSONs (List.map row_to_yojson dataValues);
      format = None
    }

    let enc : Encoding.t =
      let xf = PositionFieldDef.(make `Ordinal |> field (`String "a")) in
      let yf = PositionFieldDef.(make `Quantitative |> field (`String "b")) in
      Encoding.(make () |> x (`Field xf) |> y (`Field yf))

    let jsonSpec = CompositeUnitSpec.(make (`Mark `Bar)
      |> description "A simple bar chart with embedded data."
      |> data dat
      |> encoding enc
      |> to_yojson)

to the concision of Scala’s Vegas:

    Vegas("A simple bar chart with embedded data.").
      withData(Seq(
        Map("a" -> "A", "b" -> 28), Map("a" -> "B", "b" -> 55), Map("a" -> "C", "b" -> 43),
        Map("a" -> "D", "b" -> 91), Map("a" -> "E", "b" -> 81), Map("a" -> "F", "b" -> 53),
        Map("a" -> "G", "b" -> 19), Map("a" -> "H", "b" -> 87), Map("a" -> "I", "b" -> 52)
      )).
      encodeX("a", Ordinal).
      encodeY("b", Quantitative).
      mark(Bar).
      show

I can add a convenience layer in the higher-level library I mentioned, but right now that layer has a wide ergonomics gap to cover, which will mean lots of code and probably lots of ad-hoc decisions on my part. I’d love to find a way to get ocaml-vega-lite to be a bit lighter and more ergonomic without sacrificing completeness or safety.
To illustrate the challenges I’ve encountered, consider the “x” field of Encoding.t. In pseudocode, the possible shapes of this value are:

    | `Field of (PositionFieldDef.t = {
        type_ : Type.t;
        timeUnit : TimeUnit.t option;
        stack : StackOffset.t option;
        sort : [ `Field of VegaLite.V2.SortField.t | `Order of VegaLite.V2.SortOrder.t ];
        scale : Scale.t option;
        field : [ `Repeat of string | `String of string ] option;
        bin : [ `Bool of bool | `Params of VegaLite.V2.BinParams.t ] option;
        axis : VegaLite.V2.Axis.t option;
        aggregate : [ `Mean | `Median | `Stdev | ... ] option;
      })
    | `Value of [ `Bool of bool | `Float of float | `String of string | `Int of int ]

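To make that shape concrete, here is a reduced, self-contained model of it. This is only a sketch: the module name mirrors the library's, but the definitions are stand-ins written for this post (only type_ and field are kept, and make and field are assumed to behave like the builder functions used in the bar chart example). It is not ocaml-vega-lite's actual code.

```ocaml
(* Stand-in for the library's PositionFieldDef; only two of the fields
   from the pseudocode are modelled. Hypothetical, not the real module. *)
module PositionFieldDef = struct
  type type_ = [ `Nominal | `Ordinal | `Quantitative | `Temporal ]
  type field = [ `Repeat of string | `String of string ]
  type t = { type_ : type_; field : field option }

  (* The required field is an argument to [make]; optional ones start as None. *)
  let make type_ = { type_; field = None }

  (* Setter takes the record last so that it composes with [|>]. *)
  let field f t = { t with field = Some f }
end

(* The two top-level shapes an "x" channel can take. *)
type x =
  [ `Field of PositionFieldDef.t
  | `Value of [ `Bool of bool | `Float of float | `String of string | `Int of int ] ]

(* The same channel, once bound to a data column and once to a constant. *)
let by_field : x = `Field PositionFieldDef.(make `Ordinal |> field (`String "a"))
let by_value : x = `Value (`Float 1.0)
```

Even in this cut-down form, the caller has to pick the outer variant, the measurement type, and the inner variant of field before a column name can be supplied.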
Currently, ocaml-vega-lite makes you think about a lot of details when constructing this type:

    enc |> x (`Field PositionFieldDef.(make `Ordinal |> field (`String "a")))

By contrast, Vegas lets you say encodeX("a", Ordinal). If I understand correctly, there are two pieces that make this concision possible:
- Ad-hoc polymorphism. If you said encodeX(1.0), Vegas would know that the 1.0 means (`Value (`Float 1.0)). This isn’t possible in OCaml, but in theory one could have something like encodeX(`String ("a", `Ordinal)) and encodeX(`Float 1.0).
- A little bit of opinionation on the part of Vegas. In encodeX("a"), the "a" could mean:
  - In the Field variant of PositionFieldDef,
    - Either the Repeat or String variant of field
    - scale.scheme or scale.range
    - axis.title, axis.titleAlign or axis.format
  - The Value variant of PositionFieldDef
  It looks like Vegas decides that, since “a” is a column in the data, it should be interpreted as the String variant of field. This type of decision would arguably be more at home in the higher-level library.
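The first point can be approximated with a polymorphic variant argument, along the lines suggested above. The following is a minimal sketch of such a wrapper. The function encode_x and the cut-down types are hypothetical stand-ins defined here so that the example is self-contained; they are not part of ocaml-vega-lite.

```ocaml
(* Cut-down stand-ins for the library's types (assumptions for illustration). *)
type type_ = [ `Nominal | `Ordinal | `Quantitative | `Temporal ]

type field_def = {
  type_ : type_;
  field : [ `Repeat of string | `String of string ] option;
}

type x =
  [ `Field of field_def
  | `Value of [ `Bool of bool | `Float of float | `String of string | `Int of int ] ]

(* Hypothetical shorthand: one variant per "overload" that Vegas offers.
   A string paired with a type is read as a column name, which is the
   opinionated choice; a bare float is read as a constant value. *)
let encode_x : [ `String of string * type_ | `Float of float ] -> x = function
  | `String (name, type_) -> `Field { type_; field = Some (`String name) }
  | `Float v -> `Value (`Float v)

let x_a = encode_x (`String ("a", `Ordinal))
let x_one = encode_x (`Float 1.0)
```

A wrapper like this leaves the verbose constructors available for every other case and confines the decision about what a bare string means to one function, which fits the idea of making such choices in the higher-level library.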
Any thoughts on how to lighten ocaml-vega-lite up a bit, or on anything else about the library, would be welcome.
Cheers,
Anand