Pretty printer for custom data types best practices?

I see a lot of libraries defining custom pretty printers for complex values such that these values can be easily printed by clients. Is there a blog post or article that explains and encourages this? Given that Format is not the easiest module to understand I would find such a resource helpful. For example, if you have a data type representing HTTP requests, how do you add a pretty printer for it and how do you use that as a client?

5 Likes

Format is … painful. Fmt’s combinators make that much, much easier. For anythingt that doesn’t have a defined wireline representation (so, not HTTP) I think writing pretty-printers with Fmt is the way to go. Failing that, generate one with ppx_deriving.show.

I’m not aware of any particular resource for this. Having tried to write something similar in the past, I’m aware that it’s difficult to give a concise / readable explanation that is appropriate for a broad audience. If someone could do that convincingly, I think it’d make an excellent section of Real World OCaml.

Having said that, I’m happy to regurgitate some of my thoughts on this matter:


Pretty-printers for an abstract type

The “standard” pretty-printer for an abstract type takes a formatter and a value to be formatted, returning unit:

type 'a pretty_printer = Format.formatter -> 'a -> unit

Format comes with stock pretty-printers of this type (with names of the form pp_print_<foo>) but since Format doesn’t define the pretty_printer type explicitly it’s a bit folkloric. (The fmt library defines this type as Fmt.t.) So a given type t will often come with an accompanying value pp of type t pretty_printer:

val pp : t pretty_printer

(* or, for polymorphic containers ['a t]: *)
val pp : 'a pretty_printer -> 'a t pretty_printer

(pp is sometimes called pp_dump in order to emphasise that the format is not stable or that it contains debugging information that would otherwise be hidden.)

Using pretty printers

These pretty printers can be used directly as functions, which is sometimes useful when all you want to do is pretty-print a value, either by itself or part of a sequence of print statements:

let () =
  Format.pp_print_string Format.std_formatter "This is my value: ";
  Library.pp Format.std_formatter Library.value;
  Format.pp_print_flush Format.std_formatter ()

It’s rare to use pretty-printers in their raw form like this, though. The real reason that pretty_printer is the conventional type of pretty-printers is that it works well with the “%a” conversion specification, which expects a pretty_printer of values of some type and a value of that type:

let () =
  Format.printf "This is my value: %a%!" Library.pp Library.value

Defining pretty-printers by hand

The best way to write a pretty-printer is to not bother and let a PPX such as ppx_deriving.show generate one for you. If you have too much time on your hands, there are broadly two ways to define pretty-printers manually:

  • function-level; using a library of combinators such as fmt to do point-free composition of pretty-printers. If you do use fmt, one useful tip is that Fmt.Dump contains useful high-level combinators for containers and records.

  • value-level; writing a function that takes a formatter and a value and makes one-or-more calls to Format.fprintf with that formatter. Here the choice of implememtation is up to you, but here’s a standard approach for records and variants:

type record = { foo : string; bar : int; baz : abstract }
type variant = Foo of string | Bar of int | Baz of abstract

let pp_record ppf r =
  Format.fprintf ppf "{ foo: %s; bar: %d; baz: %a }" r.foo r.bar pp_abstract
    r.baz

let pp_variant ppf = function
  | Foo s -> Format.fprintf ppf "Foo %s" s
  | Bar i -> Format.fprintf ppf "Bar %d" i
  | Baz a -> Format.fprintf ppf "Baz %a" pp_abstract a

or, with breakable spaces and boxes around recursively-invoked pretty-printers:

let pp_record ppf r =
  Format.fprintf ppf "{@ foo:@ %s;@ bar:@ %d;@ baz:@ @[%a@]@ }" r.foo r.bar
    pp_abstract r.baz

let pp_variant ppf = function
  | Foo s -> Format.fprintf ppf "Foo@ %s" s
  | Bar i -> Format.fprintf ppf "Bar@ %d" i
  | Baz a -> Format.fprintf ppf "Baz@ @[%a@]" pp_abstract a

Without going into the details of Format: using breaks and boxes helps your pretty-printers compose more gracefully, but this composability has its limits.

Isn’t this fundamentally backwards?

Given the complexity of Format and the resulting potential for inconsistencies and misuse of boxes and breaks, this convention of defining pretty-printers via Format clearly has issues. There are more fundamental problems though; even if you use a PPX to automatically derive them, there’s no getting around the following issues:

  • it’s impossible to add a pretty-printer for an abstract type that you don’t control;

  • the controller of the abstract type must pick the style of the pretty printer (OCaml-esque, JSON-esque, S-expression etc.), which means there’s no hope of achieving ecosystem-wide consistency.

This is in some sense a fundamental limitation of true abstraction, but to plug my own preferred workaround: there’s a more satisfying story for pretty-printers if you embrace run-time type representations and generic programming. If an abstract type comes along with a type representation – a value representing the internal structure of that type – it becomes possible for the consumer of an abstract type to pick the style of pretty-printer that they want:

(** Generic JSON pretty-printer *)
val json_printer_for : 'a Typerep.t -> 'a pretty_printer

module Library : sig
  type t
  val t : t Typerep.t (* ... likely generated by a PPX *)
  val value : t
end

let () =
  Format.fprintf "My library value: %a%!" 
      (json_printer_for Library.t) Library.value

Providing that such type representations are always provided for the types that you care about, it becomes easy to add any number of pretty-printers for those types after the fact. This is the approach that we use in Irmin and related libraries, and we find it works well there.

17 Likes

I think @CraigFe wrote an excellent detailed response. I’ll just add that I do tend to write a lot of printers by hand (with the type Format.formatter -> 'a -> unit) and wrote a blog post some years ago about how I use Format.

6 Likes

the Genprint library uses the compilers’ type representation stored
in .cmt files to print.
it will print abstract types too from installed libraries as long as they
were compiled -bin-annot.
and here is an excerpt from the test file on creating a custom
printer for an abstract type:

  let open Bigarray in
  let open Array1 in
  let a1=create Float32 C_layout 1 in
  set a1 0 99.9;
  (* this is abstract because it resides at the C level *)
  [%pr bigarray abstracted a1];
  let a1print ppf (a1: _ Array1.t) = Format.fprintf ppf "got it...%f" (get a1 0) in
  (* %printer only accepts an identifier, not a closure *)
  [%install_printer a1print];
  (* now uses the printer *)
  [%pr bigarray unmasked a1];
  (* and the value directly *)
  [%pr bigarray content(get a1 0 )];
  (* and remove it... *)
  [%remove_printer a1print];
  [%pr bigarray abstract again a1];

the project repo also has a demo of the library integrated into the debugger.

nb. the availability of .cmt files is a runtime requirement, in-line
with the development focus of the library.

The runtime reflection approach is quite good. Too bad it never got standardized and there’s a bunch of implementations to choose from.

I also think that using Format is fundamentally the wrong thing to use for pp’ing information for debugging. In addition to the problems you’ve pointed out, it’s also the wrong output format when presenting the data outside the console - e.g in debuggers.

In dune, we’ve solved this problem by creating a “universal” dynamic type that one can convert their data structure into:

type t =
  | Opaque
  | Unit
  | Int of int
  | Int64 of int64
  | Bool of bool
  | String of string
  | Bytes of bytes
  | Char of char
  | Float of float
  | Option of t option
  | List of t list
  | Array of t array
  | Tuple of t list
  | Record of (string * t) list
  | Variant of string * t list
  | Map of (t * t) list
  | Set of t list

One has to write manual to_dyn converters everywhere, but at least we get consistent printing everywhere and we don’t need to think about boxes.

I’ve wanted to use this approach in a few other places already, so perhaps I will release this as a separate library.

2 Likes

For the record: you don’t have to use the Format module to write your custom pretty-printers. For simple enough datatypes, a show or to_string function that produces a string representation of your data is good enough already.

The one advantage of pretty-printers with type Format.formatter -> mytype -> unit is that they can be used directly with the #install_printer directive of the toplevel REPL, and with the corresponding command of the ocamldebug debugger. But if you already have a show function, it’s trivial to define:

let pp p x = pp_print_string p (show x)
5 Likes

I’m not a fan of this approach. If I make a type abstract, it’s because I really want to hide the representation. Using the representation to print values of the abstract type will not produce satisfactory results. Take finite maps for instance. You want to display them as maps, e.g.

{ key1 => value1, ... keyN => valueN }

The last thing you want is to display the balanced binary tree or hash table that actually implements the map.

3 Likes

I’ve used JSON this way (playing the same role as dyn does in your code). And a-yup, it’s a lovely way to both get the value of pretty-printing, and eschew all the messiness of (a) choosing how to print something, and (b) dealing with all the messiness of boxes, indentation, whitespace, etc. JSON makes all those choices for you, and they don’t completely suck.

1 Like

Using the representation to print values of the abstract type will not produce satisfactory results. Take finite maps for instance. You want to display them as maps, e.g.

Indeed; I suppose this is a limitation of any “scrap your boilerplate” approach. In Irmin, we have two standard workarounds for this issue:

  • Having a DSL term for isomorphisms between types:

    val invmap : ('a -> 'b) -> ('b -> 'a) -> 'a repr -> 'b repr
    

    which can then be used to provide a more natural “concrete” representation for a type in terms of pretty-printing and serialisation. (In the case of maps, a sorted association list seems sensible.)

  • Allowing individual generic operations on a type to be customised via something analogous to “annotations”:

    val custom : ('a, 'b) operation -> 'b -> 'a repr -> 'a repr
    

    allowing a way to inject more performant or semantically-meaningful behaviour into the generically-derived operations.

Clearly neither of these is perfect: the former has a performance cost and the latter loses at least some of the extensibility that motivates typereps. Regardless, they seem to provide some helpful middle-ground in the boilerplate vs. intentionality trade-off. We have plenty of types in Irmin where the internal structure is not conducive to “default” typereps – particularly caches and hash-consing – but we still find typereps to be a useful tool in filling the gaps.

If I make a type abstract, it’s because I really want to hide the representation.

I agree, but there’s a spectrum here with differing tradeoffs w.r.t. verbosity / encapsulation, and typereps provide an intermediate point in that space. If “total” abstraction of a type has the immediate cost of implementing a set of codecs / comparators / pretty-printers etc. and then piping them through the necessary functors, I’m unlikely to do it until long after a typerep would have come in handy. Perhaps perfection can be the enemy of the good in this regard.

2 Likes

JSON works OK, but now you need to choose an encoding for variants. S-expressions also work fine (as seen in the JST libs), but one still needs to encode things needlessly. Still, it certainly beats working with boxes.

2 Likes

I did something similar to @rgrinberg https://github.com/returntocorp/pfff/blob/develop/commons/OCaml.ml
A type representing OCaml values, and some fonctions to convert values to this universal type.

1 Like

There is a nice tutorial on ocaml.org called Using the Format module. I read it every time I forget how to use boxes.

1 Like

Maybe it’s Stockholm Syndrome, but I think that Format gets more flak than it deserves. Yes, it does take a bit of thought and study to understand the core mental model of how it works, and there are some pointy corners. But it is incredibly powerful and flexible. I highly recommend reading Format Unraveled.

Another point is that I think that the “pretty” part of “pretty printer” is often overlooked. Using Format to implement a uniform serialization format that is human-readable-in-a-pinch is not its strength. For that just derive an sexp. Format excels at writing pretty-printers that work hard to be very readable, using layout to represent structure, and perhaps suppressing representation details in order to avoid information overload for the reader.

Regarding misuse of boxes and breaks, just aiming for standard compositionality goes a long way. In particular, only write a pp function that does not enclose its contents in a box for a strong reason, and avoid individual pp functions that do not emit balanced box opens and closes even more strongly. If you follow those conventions, either manually or by using a combinator library interface such as Containers.Format, then many confusions can be avoided.

8 Likes

I’d just like to add to what @jjb said, that using the most excellent Fmt (package “fmt”) combinators from @dbuenzli makes using the “Format” pretty-printing system much, much, much easier. Did I mention it makes it easier? It’s the same formatter, only with a nicer set of combinators to use it.

It’s 100% true that getting “lovely” output from Fmt, esp. for things like

if (...) then {
 ....
 ....
}

is a PITA, and not at all obvious. But getting usable output, for debugging? It’s a dawdle, a complete dawdle.

P.S. I’ve never looked at Containers.Format; now I’ll have to do so.

In some of my libraries I avoid to impose the Fmt library dependency and then I always end up rewriting an internal mini Fmt. It is a bit of a pain. I even cringe when I have to write M.pp : Format.formatter -> t -> unit instead of M.pp : t Fmt.t in my signatures.

In Format the combinator approach is missing and the names are too long (including the module name for a combinator approach).

I thought more than once about trying to upstream a new Stdlib.Fmt module along the lines of Fmt without the too fancy stuff but I always suspected it would not be well received. And even, if considered, the bikeshedding would likely be too insane for me to cope with.

Besides I often need some of the fancy stuff (but maybe not in libraries), in particular magnitudes dependent formatters so Fmt would likely still need to exist for me.

4 Likes

I cannot emphasize enough, that anybody thinking of writing pretty-printers, should look at Fmt. Just recently, I noticed that there’s a Dump sub-module that has stuff to quote+escape strings:

Fmt.(str "%a" Dump.string "foo\n");;
- : string = "\"foo\\n\""

Too great, too great.

2 Likes

Note that this is not too hard to get directly via format strings:

# Format.sprintf {|"%S"|} "foo\n";;
- : string = "\"\"foo\\n\"\""
3 Likes

grin Weeeell, it was only part of a larger formatting task:

  let wrap_comment pp1 pps x = Fmt.(pf pps "(* %a *)" pp1 x) 

        Fmt.(pf pps "%a := %a%a" AR.pp_hum x.lhs Pp_hum.expr x.rhs_expr
               (option (wrap_comment Dump.string)) x.msg) ;

so the combinator-based approach was a definite win. And this is only a modest example: I use combinators all over, and usually with much more complicated formatting tasks than this.

1 Like

I think fmt and Containers.Format (or CCFormat for the unwrapped module access) are pretty similar, except that fmt has a cleaner API. It’s hard to compete with @dbuenzli on API design :slightly_smiling_face: . A few things like the Dump submodule were clearly stolen inspired from fmt. I think CCFormat is mostly useful if you already depend on containers. I think there’s also a difference in the way color support works (containers supports nested colors, not sure if fmt does).

3 Likes