In light of the recent thread on, among other things, showing OCaml values, and because of Dream’s long-standing need for this to exist in OCaml, I’ve done, as suggested, a comparison of the available libraries. They seem to fall into three categories.
1. Libraries that walk the runtime representation of values and dump it.
These provide 'a -> string functions that can immediately print any value, and are the easiest to use. They are what I need in Dream – something that a normal person can use for debugging without any boilerplate. The accuracy of these libraries is limited because the type information preserved in the runtime is very bare.
There is also Memgraph, but this appears to only output DOT graphs, so I didn’t look into it in detail.
Since I am interested in these for my purposes, I wrote a tester that compares their outputs, and uploaded it into a gist. See the outputs for Console.log, Dum, Inspect. They are all variants of each other. Each has its own quirks and bugs, but they all look roughly like this:
Interestingly (but predictably), for extensible variants like exn, these dumpers are able to print the string representation of variant constructors even with OCaml’s current runtime.
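To illustrate why the accuracy of this category is limited, here is a minimal sketch of a runtime-representation dumper built on the standard Obj module. It is not the code of Console.log, Dum, or Inspect; the names dump and dump_obj and the output format are assumptions made for this example. It only handles immediates, strings, boxed floats, and ordinary constructor/tuple/record blocks, and treats everything else as opaque.

```ocaml
(* Hypothetical sketch: walk the runtime representation of a value.
   Only immediates, strings, boxed floats and ordinary blocks are
   printed; closures, objects, custom blocks, etc. are left opaque. *)
let rec dump_obj (depth : int) (o : Obj.t) : string =
  if Obj.is_int o then
    (* ints, chars, bools, unit and constant constructors all look alike *)
    string_of_int (Obj.obj o : int)
  else if depth <= 0 then "..."
  else
    let tag = Obj.tag o in
    if tag = Obj.string_tag then Printf.sprintf "%S" (Obj.obj o : string)
    else if tag = Obj.double_tag then string_of_float (Obj.obj o : float)
    else if tag <= Obj.last_non_constant_constructor_tag then
      (* tuples, records and non-constant constructors: tag plus fields *)
      let fields =
        List.init (Obj.size o) (fun i -> dump_obj (depth - 1) (Obj.field o i))
      in
      Printf.sprintf "(tag %d: %s)" tag (String.concat ", " fields)
    else Printf.sprintf "<opaque tag %d>" tag

(* The 'a -> string entry point; the depth limit guards against cycles. *)
let dump (x : 'a) : string = dump_obj 10 (Obj.repr x)

let () =
  print_endline (dump (Some 3));
  print_endline (dump [1; 2]);
  print_endline (dump (1, "a"))
```

Note that None, 0, false and () are indistinguishable here, and that a list is printed as nested two-field blocks: the constructor names and field names are simply not available at runtime.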
2. Libraries, such as ppx_deriving, that rely on helper values describing types, either generated by a PPX or written manually by the user, and then ask the user to pass those values to the functions that walk and dump values.
These are unsuitable for my goals. All of these require the user to pass in the type information to the printing function at each call site, because in the absence of modular implicits or type classes, the compiler cannot automatically associate the type information with the values. They provide, roughly, ('a -> string) -> 'a -> string functions. The user has to provide the 'a -> string for each 'a each time they would like to print an 'a.
If 'a is a “container” type, the required function is a higher-order function that needs additional function(s) for the element types. This is not ergonomic, as each call site where one would like to show a value needs boilerplate. Even if the 'a -> string is precomposed, the user has to remember what it is called and pick the right one, and the call is not resistant to refactorings such as wrapping the value in option. But such libraries are accurate, because the type information provided in the boilerplate can be precise.
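A minimal sketch of what this call-site boilerplate looks like, using hand-written printers rather than any of the libraries discussed here. All of the show_* names are hypothetical and chosen only for this illustration.

```ocaml
(* Hypothetical hand-written printers in the ('a -> string) -> ... style. *)
let show_int : int -> string = string_of_int
let show_string (s : string) : string = Printf.sprintf "%S" s

(* Printers for "container" types take the element printers as arguments. *)
let show_option (show_a : 'a -> string) (o : 'a option) : string =
  match o with
  | None -> "None"
  | Some x -> "Some (" ^ show_a x ^ ")"

let show_list (show_a : 'a -> string) (xs : 'a list) : string =
  "[" ^ String.concat "; " (List.map show_a xs) ^ "]"

let show_pair (show_a : 'a -> string) (show_b : 'b -> string)
    ((a, b) : 'a * 'b) : string =
  "(" ^ show_a a ^ ", " ^ show_b b ^ ")"

(* The call site has to spell out the structure of the type... *)
let shown = show_list (show_pair show_string show_int) ["a", 1; "b", 2]

(* ...and has to be edited again if the value is later wrapped in option. *)
let shown_opt =
  show_option (show_list (show_pair show_string show_int)) (Some ["a", 1])

let () =
  print_endline shown;
  print_endline shown_opt
```

The output is accurate, but the printer expression mirrors the type and must be kept in sync with it by hand.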
I didn’t try these out in detail because they are all unsuitable as Console.log-alikes for inspecting OCaml values without excessive boilerplate. As can be seen from their documentation, they all require boilerplate in the form of functions/witnesses/calling the right function, depending on the type, at the place where you’d like to show a value. The last four also require the user to manually build up the type representation using combinators. This is very awkward.
M.show_myfpclass FP_normal (* ppx_deriving: know the function. *)
Refl.show [%refl: (string * int) list] [] ["a", 1; "b", 2];;
(* refl: describe the type. *)
Print.show ~t:nat_t (S (S (S Z)))
(* lrt: provide the type info. *)
(* The docs for tpf are too obscure, but it's the same kind of library. *)
Fmt.str "%a\n" (pp t) { foo = None; bar = [ "foo" ] }
(* repr: build & provide the type info. *)
etc. These kinds of approaches are probably also present in other libraries that do e.g. JSON encoding.
3. Libraries that use a PPX at the call site to provide what looks like an 'a -> string function as in (1), but try to infer the type of the value being shown and derive its printer as in (2).
These don’t seem to handle separate compilation well, as could be expected, and generally appear fragile.
In my opinion, for the needs I see, the best approach would be runtime printing as in (1) with runtime type information that is accessible through pointers or indices stored in OCaml blocks. I wonder if this is what the LexiFi fork does. @nojb?
For those interested, we did the main part of the comparison on stream.
We don’t attach type information to values directly, as we don’t want to modify the runtime model of OCaml (also, this would only work with heap-allocated values, which would all become larger).
What we do instead is this: when a function has a labeled, non-optional argument of type 'a ttype (here 'a ttype is the “type of types” with constructors corresponding to each kind of type in OCaml) and the argument is not passed explicitly, the compiler synthesizes it at each call site.
Concretely, if we define a function of the form
let show ~(t: 'a ttype) (x: 'a) : string =
  match t with
  | Int -> string_of_int x
  | String -> x
  | ...
And if we call it as show 42, the compiler inserts ~t:Int as the first argument. For efficiency, type witnesses (the values of type 'a ttype) are actually computed at compile time whenever possible.
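For readers who want to try the idea with a stock compiler, here is a minimal sketch in which the witness is passed by hand. The ttype below is a small GADT invented for this example (the constructors List_of and Pair_of are hypothetical); it is an assumption and not the actual LexiFi type representation. The explicit ~t argument stands for what the patched compiler would insert automatically.

```ocaml
(* Hypothetical "type of types" as a GADT; not LexiFi's definition. *)
type _ ttype =
  | Int : int ttype
  | String : string ttype
  | List_of : 'a ttype -> 'a list ttype
  | Pair_of : 'a ttype * 'b ttype -> ('a * 'b) ttype

(* A type-directed printer: matching on the witness refines the type of x. *)
let rec show : type a. t:a ttype -> a -> string =
  fun ~t x ->
    match t with
    | Int -> string_of_int x
    | String -> Printf.sprintf "%S" x
    | List_of elt ->
      "[" ^ String.concat "; " (List.map (show ~t:elt) x) ^ "]"
    | Pair_of (ta, tb) ->
      let (a, b) = x in
      "(" ^ show ~t:ta a ^ ", " ^ show ~t:tb b ^ ")"

(* With a stock compiler the witness has to be written out at the call site. *)
let () =
  print_endline (show ~t:Int 42);
  print_endline (show ~t:(List_of (Pair_of (String, Int))) ["a", 1; "b", 2])
```

With witness synthesis as described above, the second call would presumably shrink to show ["a", 1; "b", 2].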
Wouldn’t the following be compatible with the standard ocamlc compiler and allow for a custom patched compiler that synthesizes 'a ttype correctly?
let show (type a) ?(t : a ttype option) (x: a) : string =
  match t with
  | None -> Dum.to_string x (* Any printer from @antron's Type 1 *)
  | Some Int -> string_of_int x
  | Some String -> x
  | _ -> "something else"
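A self-contained sketch of this optional-witness variant that compiles with a stock compiler. The three-constructor ttype and the fallback_dump function are placeholders assumed for this example; fallback_dump merely stands in for a runtime dumper such as Dum.to_string.

```ocaml
(* Hypothetical minimal witness type, for illustration only. *)
type _ ttype =
  | Int : int ttype
  | String : string ttype
  | Float : float ttype

(* Placeholder for a category (1) runtime dumper such as Dum.to_string. *)
let fallback_dump (_ : 'a) : string = "<runtime dump>"

let show (type a) ?(t : a ttype option) (x : a) : string =
  match t with
  | None -> fallback_dump x        (* no witness: fall back to the dumper *)
  | Some Int -> string_of_int x
  | Some String -> x
  | Some _ -> "something else"

let () =
  (* Stock compiler: the optional argument is omitted, so t is None. *)
  print_endline (show 42);
  (* What a witness-synthesizing compiler would effectively call. *)
  print_endline (show ~t:Int 42)
```

Because the positional argument x follows the optional ?t, a plain show 42 is a full application and t defaults to None, so the same source could be accepted by both compilers.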
You mention you have used this for many years in your LexiFi OCaml fork. It sounds like it would be robust and well ironed out by now, and would deal with the various needs that may have arisen over the years.
How about trying to get it into OCaml – have you tried? Was it not accepted?
I see there is a lot of enthusiasm for adding some form of type reflection to OCaml; that’s great! It is true that at LexiFi we have a tried-and-tested system in use for a long time. Let me try to give some perspective about it and answer some of the questions that came up:
The LexiFi patch actually consists of two parts: 1) the representation of types as an OCaml datatype; 2) a patch to the typechecker/middle end to have the compiler automatically generate type witnesses (as sketched above).
It is important to note that 1) is to an extent independent of 2); it comes down to giving a suitable definition of the “type of types”. I understand from past discussion that proposals in this direction would be welcomed by the OCaml dev team. Accordingly, one should concentrate for the most part on 1) to make progress.
For historical reasons the LexiFi version of 1) (ie the type representation, see here and here) has a number of quirks. Furthermore, it makes design choices that may not be the best ones in general. For example, it only represents closed types: no type constructors or type variables can be represented, and so in particular neither can exotic types such as GADTs, first-class modules, extensible types, polymorphic variants, etc.
The main challenge in devising a suitable representation of types is deciding how to handle abstract types (see the paper and the slides I linked to in the other thread). At LexiFi abstract types are represented via “global names” (ie we identify an abstract type M.t by its name "M.t"). This works reasonably well in practice, but is not a good solution in general (the notion of “name” for an abstract type is not well-defined). I suspect the answer may be something of a research problem…
LexiFi did discuss upstreaming a version of its fork a long time ago (~2011), but I suspect it wasn’t done mainly because of the theoretical shortcomings of the current implementation (eg handling of abstract types).
Accordingly, the LexiFi fork is not open-source: we don’t have the manpower to support it as an open-source project, we don’t want to release a version of this technology which has known limitations that make it easy to shoot yourself in the foot if you don’t know what you are doing, and finally there are some commercial considerations to take into account (but my impression is that if this technology were polished enough that it could be accepted upstream, LexiFi would be happy to do so).
Just wanted to add that you can use ppx_repr if you don’t want to build those representations manually (but the API has been designed so that it’s easy enough to write the record and variant representations manually).