Topiary is a formatter for OCaml and a universal formatting engine. It is a tool in the Tree-sitter ecosystem, designed for formatter authors and formatter users. Authors can create a formatter without having to write their own engine or even their own parser. Users benefit from uniform code style and the convenience of using a single formatter tool across multiple languages. Topiary is written in Rust and developed by Tweag.
What’s new?
From an OCaml user’s perspective, this release mostly includes the following changes:
You can try Topiary without installing it with our online playground. Topiary is also available via OPAM so you can simply rely on opam install topiary. If you are curious as to how we packaged a Rust project in OPAM, we have just the blog post. Nix users, you can find Topiary in nixpkgs or rely on the flake github:tweag/topiary. Finally, since this version, statically-linked binaries are available for download from the GitHub release.
We hope you like it; do try, report issues, ask for features, or tell us your love!
I tried to understand the design of those tools, so here is what I could gather by skimming the documentation and some code.
Topiary works by having users specify rules that match on Concrete Syntax Tree (CST) fragments and emit “formatting tokens” around those fragments. This is expressive (any user of the tool can define their own formatting rules, and there are elaborate cascading rules to override user-global or project-local rules locally) and relatively high-level, but the “formatting token” language is arguably ugly and ad-hoc. (It is unsurprising if you are familiar with pretty-printing libraries, but of limited expressivity and shoehorned in an ugly TOML syntax.) The formatters are also closely tied to the tree-sitter grammar, so watch out for large formatter updates if the tree-sitter grammar changes. (For example, it looks like OCaml tuples are currently parsed as a right-leaning tree rather than a proper sequence, but this could change overnight.)
ocamlformat works by compiling the compiler AST into Format formatting boxes. It is less configurable: users must choose among a fixed set of configuration options instead of being able to override formatting rules. On the other hand, it is more expressive and better able to accommodate special cases (which is a good and also a bad thing).
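To make "compiling an AST into Format boxes" concrete, here is a minimal sketch using the standard library's Format module. The expr type, the layout type and the function names are made up for illustration; this is not ocamlformat's actual code, which works on the compiler's Parsetree and is far more elaborate.

```ocaml
(* Toy AST, hypothetical: not the compiler's Parsetree. *)
type expr =
  | Var of string
  | Seq of expr list (* e1; e2; ...; en *)

(* Which kind of box to open around a sequence. *)
type layout = Horizontal | Vertical

(* Print an expression by opening a Format box and emitting break hints.
   In a horizontal box a break hint prints as a space; in a vertical box
   it always becomes a newline. *)
let rec pp_expr layout ppf = function
  | Var x -> Format.pp_print_string ppf x
  | Seq es ->
    (match layout with
     | Horizontal -> Format.pp_open_hbox ppf ()
     | Vertical -> Format.pp_open_vbox ppf 0);
    let pp_sep ppf () = Format.fprintf ppf ";@ " in
    Format.pp_print_list ~pp_sep (pp_expr layout) ppf es;
    Format.pp_close_box ppf ()

(* Render to a string with the default margin. *)
let render layout e = Format.asprintf "%a" (pp_expr layout) e

let () =
  let e = Seq [ Var "a"; Var "b"; Var "c" ] in
  print_endline (render Horizontal e);
  print_endline (render Vertical e)
```

The point of the sketch is that the printer only decides where breaks are allowed and which box they live in; Format decides the actual layout. A real formatter would mostly use boxes that break only when the line is too long.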
ocp-indent works by processing the token stream directly and inserting indentation based on an approximation of the corresponding parsing state. (It is only trying to indent, not format, so it chooses the indentation level but never inserts new line breaks.) The code to indent e1; e2; ...; en is there, but the comparison with the other two cannot be generalized, as other things are noticeably harder with this design.
I tried comparing topiary and ocamlformat quickly on some simple examples, and the formatting quality seems comparable – but then I would expect some corner cases to have been better tuned with ocamlformat.
The fact that topiary starts from the CST while ocamlformat starts from the AST implies differences in results: by default ocamlformat will print syntactically-equivalent constructs in the same way (for example, a ; b; is going to be printed just as a; b, unless a hackish special case is implemented in the printer), while by default topiary will preserve all non-whitespace tokens in the source program and merely add line breaks and indentation. This makes topiary a more lightweight formatter: it will more often preserve the input style instead of changing it to a tool-implemented style.
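Here is a toy illustration of that difference on the a ; b; example. Everything in it (the token type, tokenize, format_ast_style, format_cst_style) is hypothetical and handles only identifiers and semicolons; neither tool is implemented this way. The "AST-style" printer forgets the concrete tokens and prints a canonical form, while the "CST-style" printer keeps every non-whitespace token and only fixes the spacing.

```ocaml
(* Hypothetical toy tokens: identifiers and semicolons only. *)
type token = Ident of string | Semi

(* Split the source into non-whitespace tokens. *)
let tokenize (src : string) : token list =
  let toks = ref [] and buf = Buffer.create 8 in
  let flush () =
    if Buffer.length buf > 0 then begin
      toks := Ident (Buffer.contents buf) :: !toks;
      Buffer.clear buf
    end
  in
  String.iter
    (fun c ->
      match c with
      | ' ' | '\t' | '\n' -> flush ()
      | ';' -> flush (); toks := Semi :: !toks
      | c -> Buffer.add_char buf c)
    src;
  flush ();
  List.rev !toks

(* AST-style: keep only the sequence elements, then print canonically.
   A trailing semicolon is not part of the tree, so it disappears. *)
let format_ast_style (src : string) : string =
  tokenize src
  |> List.filter_map (function Ident x -> Some x | Semi -> None)
  |> String.concat "; "

(* CST-style: keep every token and only normalize the whitespace. *)
let format_cst_style (src : string) : string =
  let rec go = function
    | [] -> []
    | [ Semi ] -> [ ";" ]
    | Semi :: rest -> "; " :: go rest
    | Ident x :: (Ident _ :: _ as rest) -> (x ^ " ") :: go rest
    | Ident x :: rest -> x :: go rest
  in
  String.concat "" (go (tokenize src))

let () =
  print_endline (format_ast_style "a ; b;");
  print_endline (format_cst_style "a ; b;")
```

On the input a ; b; the first function returns a; b and the second returns a; b; so the trailing semicolon survives only in the token-preserving version.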
In fact, ocamlformat is becoming more and more CST-ised. Not everything is implemented, but a lot of constructs use CST-like nodes instead of dirty hacks nowadays.