That is a great observation!
For what it’s worth, what you described is exactly the interface that I initially wanted. And I am not totally happy with the current interface either. It’s in the current form mainly because I failed to appease the OCaml type system, but also because of other various reasons.
In the rest of this post, let’s say that the cost factory to be used for pretty printing consists of a type t
along with operations on t
.
The cost
document
One feature in this OCaml implementation (but not in the paper) is the cost
document. This feature allows you to add an arbitrarty cost of type t
to an attached document.
A consequence of this feature is that the AST must be parameterized by t
(unless the current interface is used). But structuring it this way causes a lot of problems related to weak polymorphism, and I gave up after fighting with it for a while.
Performance engineering
Even without the cost document, there’s still a reason to use the current interface anyway: performance engineering.
The pretty printing algorithm is roughly a dynamic programming on a directed acyclic graph. The straightforward approach is to use a hash table for the dynamic programming table, which maps a node to a collection of t
. This approach has two downsides.
- The lookup cost on the hash table is very high.
- Subsequent pretty-printing on a slight modification of the document must start from scratch.[*]
Instead, my approach moves the dynamic programming intermediate values into the nodes. I.e., it uses the nodes themselves as a dynamic programming table[**]. This helps with the lookup cost issue, and because the values are stuffed into the AST, subsequent pretty-printing that recycles parts of the existing document will also be incremental: the previously computed values are automatically reused. This can be useful for a code formatting server that needs to re-format a slightly different AST after an edit.
But as a result, my approach creates a dependency between the cost factory and the AST, and the current interface is a natural fit for this.
[*] I guess the pretty printer could return a dynamic programming table as another result to be used for future computation, but I suspect this way is going to be tedious, and it doesn’t help with the lookup cost issue.
[**]: In this approach, the GC cannot free dynamic programming values for a node until the node itself can be collected. This could be a good thing or bad thing depending on whether you want to do the incremental computation afterward.
Partial evaluation
Related to the cost
document is partial evaluation on cost
document. A document cost c1 (cost c2 d)
is semantically equivalent to cost (combine c1 c2) d
, and the pretty printer should be able to partially evaluate the former to the latter.
A natural site for partial evaluation is the smart constructors for the document type. However, since combine
is an operation from the cost factory, it is not possible to perform the reduction without knowing the cost factory in advance.
If we don’t want to provide a cost factory in advance, we can workaround by making partial evaluation a part of pretty_print
instead. But that also means that on incremental computation, we need to do the same partial evaluation over and over again. I don’t think it’s a big deal, but the current interface seems neater in this sense.
Final thoughts
I am pretty positive that all issues I mentioned above can be fixed. It’s just that I am not (yet) proficient with OCaml—this is my first OCaml project after all. But yeah, I do want to give it another try to improve the interface, and would love any help or guidance, especially regarding the weak polymorphism issue.