[ANN] First release of pretty_expressive: A Pretty Expressive Printer

Hi everyone!

I am happy to announce the release of pretty_expressive, an implementation of A Pretty Expressive Printer (OOPSLA’23).

Unlike other pretty printers in the OCaml ecosystem which are mostly focused on the OCaml style (as far as I can tell), pretty_expressive is general-purpose, making it suitable for formatting various styles (the pretty printer was originally created to format programs in the Racket language). pretty_expressive is further distinguished from other general-purpose pretty printers (e.g., Hughes’, Wadler’s, Bernardy’s) by its expressiveness and optimality.

This is also my first OCaml project (not counting Reason). Any feedback is welcome!

Thanks,
Sorawee

12 Likes

That look very nice.
I have a small nitpick. You seem to use <> to concatenate documents, but <> is the “not equal operator” in OCaml, which means that you cannot use it while your library is opened. Maybe a better choice would be ^^, because ^ is the string concatenation operator. Francois Pottier’s PPrint made this choice and I think its quite nice.

4 Likes

Looks interesting! Have you made, and is it worth to, a comparison with François Pottier’s pprint? Citing its documentation: The document language and the printing engine are inspired by Daan Leijen’s wl-pprint library, which itself is based on the ideas developed by Philip Wadler in the paper A Prettier Printer. This Haskell library exploits laziness to achieve a very low memory requirement: the entire document never needs to reside in memory. PPrint achieves greater simplicity and possibly higher throughput by requiring the entire document to be built in memory before it is printed.

Sounds like we’re going to need a set of benchmarks to compare all these pretty-printer libraries in practice :slightly_smiling_face:

@EmileTrotignon Thank you for the suggestion! This is fixed in core: rename <> to ^^ · sorawee/pretty-expressive-ocaml@7274ee7 · GitHub. The <> operator is still provided in a separate functor, in case anyone wants to follow the notation in the paper closely, but the default Make functor now provides ^^ instead.

1 Like

We (the authors of the paper) are actually corresponding with François right now! Yes, PPrint is essentially a Wadler-style pretty printer. The paper details differences between Wadler-style pretty printers and ours. In short: PPrint, which uses a greedy algorithm, is going to be faster, at the cost of not being as optimal/pretty and not being as expressive.

Here’s a concrete example to illustrate the non-optimality issue, taken from another paper A pretty but not greedy printer (Jean-Philippe Bernardy, ICFP’17): we would like to pretty print the following S-expression with the page width limit of 20.

((abcde ((a b c d) (a b c d) (a b c d) (a b c d)))
 (abcdefgh ((a b c d) (a b c d) (a b c d) (a b c d))))

Following is a program that runs both PPrint and pretty_expressive

open PPrint
open Pretty_expressive

type sexpr = Atom of string | List of sexpr list

let abcd = List [Atom "a"; Atom "b"; Atom "c"; Atom "d"]
let abcd4 = List [abcd; abcd; abcd; abcd]

let ex = List [ List [Atom "abcde"; abcd4]
              ; List [Atom "abcdefgh"; abcd4]
              ]

let print_pprint s w =
  let rec pretty s =
    match s with
    | Atom s -> string s
    | List l ->
      string "(" ^^
      align (separate (group (break 1)) (List.map pretty l)) ^^
      string ")" in
  ToChannel.pretty 1. w Stdlib.stdout (pretty s)

let print_pretty_expressive s w =
  let cf = Printer.default_cost_factory ~page_width:w () in
  let module P = Printer.Make (val cf) in
  let open P in

  let separate s = fold_doc (fun d1 d2 -> d1 ^^ s ^^ d2) in

  let rec pretty s =
    match s with
    | Atom s -> text s
    | List l ->
      text "(" ^^
      align (separate (group nl) (List.map pretty l)) ^^
      text ")"
  in
  pretty_print (pretty s) |> print_endline

let () = print_endline "1234567890123456789012345"
let () = print_endline "(* PPrint *)"
let () = print_pprint ex 20
let () = print_endline "\n"
let () = print_endline "(* Pretty_expressive *)"
let () = print_pretty_expressive ex 20

which outputs (I manually inserted the vertical bars to show the width limit):

12345678901234567890|12345
(* PPrint *)
((abcde ((a b c d) (|a
                    |b
                    |c
                    |d)
         (a b c d) (|a
                    |b
                    |c
                    |d)))
 (abcdefgh ((a b c d|)
            (a b c d|)
            (a b c d|)
            (a b c d|))))

(* Pretty_expressive *)
((abcde ((a b c d)  |
         (a b c d)  |
         (a b c d)  |
         (a b c d)))|
 (abcdefgh          |
  ((a b c d)        |
   (a b c d)        |
   (a b c d)        |
   (a b c d))))     |

As you can see, pretty_expressive stays within the specified page width limit whenever possible, and it also minimizes the number of lines. PPrint does neither.

The above example might be a bit “unfair” to PPrint, because we use the align combinator, which is well-known to not get along well with greedy algorithms. Without align, the output of PPrint should be more reasonable (though still not always optimal). Then again, users of a pretty printer might not be able to avoid using align (e.g., OCaml and Racket code has alignment everywhere).

8 Likes

Our paper does have a set of benchmarks, and we use it to compare our implementation, wl-pprint (Wadler/Leijen’s pretty printer, which is what François’s pretty printer is based on) in Haskell, and pprint-compact (Bernardy’s pretty printer) in Haskell. The result is that pretty_expressive is prettier and easier to use, but slower.

This is impressive work. I look forward to trying the library out!

Regarding the interface, one bit of feedback is that instead of binding all possible settings when the Printer functor and the CostFactory are instantiated, some settings could be left to the final caller of the pretty_print function.

To give a concrete example: currently if I want to build a doc that can be rendered with variable page_width, then any code for manipulating docs must be functorized to accept a Printer instance, and functors are heavyweight enough that I try not to use them excessively.

One alternative would be for the CostFactory to contain a second user-defined type that contains settings to be specified at printing time, and then pretty_print could take an instance of that type and pass it along to the relevant CostFactory functions.

2 Likes

That is a great observation!

For what it’s worth, what you described is exactly the interface that I initially wanted. And I am not totally happy with the current interface either. It’s in the current form mainly because I failed to appease the OCaml type system, but also because of other various reasons.

In the rest of this post, let’s say that the cost factory to be used for pretty printing consists of a type t along with operations on t.

The cost document

One feature in this OCaml implementation (but not in the paper) is the cost document. This feature allows you to add an arbitrarty cost of type t to an attached document.

A consequence of this feature is that the AST must be parameterized by t (unless the current interface is used). But structuring it this way causes a lot of problems related to weak polymorphism, and I gave up after fighting with it for a while.

Performance engineering

Even without the cost document, there’s still a reason to use the current interface anyway: performance engineering.

The pretty printing algorithm is roughly a dynamic programming on a directed acyclic graph. The straightforward approach is to use a hash table for the dynamic programming table, which maps a node to a collection of t. This approach has two downsides.

  • The lookup cost on the hash table is very high.
  • Subsequent pretty-printing on a slight modification of the document must start from scratch.[*]

Instead, my approach moves the dynamic programming intermediate values into the nodes. I.e., it uses the nodes themselves as a dynamic programming table[**]. This helps with the lookup cost issue, and because the values are stuffed into the AST, subsequent pretty-printing that recycles parts of the existing document will also be incremental: the previously computed values are automatically reused. This can be useful for a code formatting server that needs to re-format a slightly different AST after an edit.

But as a result, my approach creates a dependency between the cost factory and the AST, and the current interface is a natural fit for this.

[*] I guess the pretty printer could return a dynamic programming table as another result to be used for future computation, but I suspect this way is going to be tedious, and it doesn’t help with the lookup cost issue.

[**]: In this approach, the GC cannot free dynamic programming values for a node until the node itself can be collected. This could be a good thing or bad thing depending on whether you want to do the incremental computation afterward.

Partial evaluation

Related to the cost document is partial evaluation on cost document. A document cost c1 (cost c2 d) is semantically equivalent to cost (combine c1 c2) d, and the pretty printer should be able to partially evaluate the former to the latter.

A natural site for partial evaluation is the smart constructors for the document type. However, since combine is an operation from the cost factory, it is not possible to perform the reduction without knowing the cost factory in advance.

If we don’t want to provide a cost factory in advance, we can workaround by making partial evaluation a part of pretty_print instead. But that also means that on incremental computation, we need to do the same partial evaluation over and over again. I don’t think it’s a big deal, but the current interface seems neater in this sense.

Final thoughts

I am pretty positive that all issues I mentioned above can be fixed. It’s just that I am not (yet) proficient with OCaml—this is my first OCaml project after all. But yeah, I do want to give it another try to improve the interface, and would love any help or guidance, especially regarding the weak polymorphism issue.

3 Likes

Thanks sorawee for the detailed reply. I see your rationale pretty clearly now.

Regarding the issues with weak polymorphism, if you felt like posting a snippet of the earlier attempt that caused trouble, I’m sure there are folks in this forum who could offer advice. I would just suggest starting a separate topic to ensure visibility.

Best of luck on this cool project!