[ANN] Release of Topiary 0.3.0

Hi everyone,

We are pleased to announce the release of Topiary v0.3.0 – Dreamy Dracaena.

What is Topiary?

Topiary is a formatter for OCaml and a universal formatting engine. It is a tool in the Tree-sitter ecosystem, designed for formatter authors and formatter users. Authors can create a formatter without having to write their own engine or even their own parser. Users benefit from uniform code style and the convenience of using a single formatter tool across multiple languages. Topiary is written in Rust and developed by Tweag.

What’s new?

From an OCaml user’s perspective, this release mostly includes the following changes:

There is more, but for that we will point you to the release, the changelog, or the full list of changes.

How to get it?

You can try Topiary without installing it with our online playground. Topiary is also available via OPAM, so you can simply run opam install topiary. If you are curious about how we packaged a Rust project in OPAM, we have just the blog post. Nix users can find Topiary in nixpkgs or use the flake github:tweag/topiary. Finally, starting with this version, statically linked binaries are available for download from the GitHub release.

We hope you like it; do try it, report issues, ask for features, or just tell us you love it!

Cheers,
The Topiary team.


I tried to understand the design of these tools; here is what I could gather by skimming the documentation and some code.

  • Topiary works by having users specify rules that match on Concrete Syntax Tree (CST) fragments and emit “formatting tokens” around those fragments. This is expressive (any user of the tool can define their own formatting rules, and there are elaborate cascading rules to override user-global or project-local rules locally) and relatively high-level, but the “formatting token” language is arguably ugly and ad hoc. (It is unsurprising if you are familiar with pretty-printing libraries, but it is of limited expressivity and shoehorned into an ugly TOML syntax.) The formatters are also closely tied to the tree-sitter grammar, so watch out for large formatter updates if the tree-sitter grammar changes. (For example, it looks like OCaml tuples are currently parsed as a right-leaning tree rather than a proper sequence, but this could change overnight.)

    The code to format e1; e2; ...; en is there.

  • ocamlformat works by compiling the compiler AST into Format formatting boxes. It is less configurable (in particular, users must choose from a fixed set of configuration options instead of being able to override formatting rules), but it is more expressive and better able to accommodate special cases (which is both a good and a bad thing).

    The code to format e1; e2; ...; en is there.

  • ocp-indent works by processing the token stream directly and inserting indentation based on an approximation of the corresponding parsing state. (It only tries to indent, not format, so it chooses indentation levels but never inserts new line breaks.) The code to indent e1; e2; ...; en is there, but the comparison with the other two does not generalize, as other things are noticeably harder with this design.
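To make the ocamlformat approach concrete, here is a minimal sketch of printing e1; e2; ...; en through Format boxes from a toy AST. It is not ocamlformat's code: the expr type, pp_expr, and to_string are hypothetical names, and the only real API used is the standard library's Format module. An hv box keeps the sequence on one line when it fits within the margin and otherwise puts each element on its own line.

```ocaml
(* Hypothetical toy AST: just enough to represent e1; e2; ...; en. *)
type expr =
  | Ident of string
  | Seq of expr list

(* Compile the AST into Format boxes: an hv box lays out its contents
   horizontally if they fit, otherwise every break becomes a newline. *)
let rec pp_expr ppf = function
  | Ident s -> Format.pp_print_string ppf s
  | Seq es ->
    Format.fprintf ppf "@[<hv>%a@]"
      (Format.pp_print_list
         ~pp_sep:(fun ppf () -> Format.fprintf ppf ";@ ")
         pp_expr)
      es

(* Render to a string with a given right margin (default 80). *)
let to_string ?(margin = 80) e =
  let buf = Buffer.create 64 in
  let ppf = Format.formatter_of_buffer buf in
  Format.pp_set_margin ppf margin;
  (* "@?" flushes the formatter so the buffer holds the full output. *)
  Format.fprintf ppf "%a@?" pp_expr e;
  Buffer.contents buf
```

For example, to_string ~margin:8 on a three-element sequence breaks it over three lines, whereas the default margin of 80 keeps it as e1; e2; e3. The layout decision is driven entirely by the width.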

I quickly compared Topiary and ocamlformat on some simple examples, and the formatting quality seems comparable, though I would expect some corner cases to be better tuned in ocamlformat.

The fact that topiary starts from the CST while ocamlformat starts from the AST implies differences in results: by default ocamlformat will print syntactically-equivalent constructs in the same way (for example, a ; b; is going to be printed just as a; b, unless a hackish special case is implemented in the printer), while by default topiary will preserve all non-whitespace tokens in the source program and merely add line breaks and indentation. This makes topiary a more lightweight formatter: it will more often preserve the input style instead of changing it to a tool-implemented style.
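A minimal sketch of that difference on the a ; b; example, assuming a hypothetical token type standing in for CST leaves: format_cst keeps every token and only decides whitespace (a fixed rule here plays the role of a query), while format_ast first builds a list of expressions, which forgets the trailing separator, and then prints it canonically. Neither function is actual Topiary or ocamlformat code.

```ocaml
(* Hypothetical token type standing in for the leaves of a CST. *)
type token = Word of string | Semi

(* CST-style: every token is kept; only whitespace is decided.
   Fixed rule: no space before ';', one space after it unless it is last. *)
let format_cst tokens =
  let buf = Buffer.create 32 in
  let rec go = function
    | [] -> ()
    | Word w :: rest -> Buffer.add_string buf w; go rest
    | Semi :: [] -> Buffer.add_char buf ';'
    | Semi :: rest -> Buffer.add_string buf "; "; go rest
  in
  go tokens;
  Buffer.contents buf

(* AST-style: parsing keeps only the expressions, so whether there was
   a trailing ';' is lost before printing. *)
let to_ast tokens =
  List.filter_map (function Word w -> Some w | Semi -> None) tokens

let format_ast tokens = String.concat "; " (to_ast tokens)

(* The input a ; b; as a token stream. *)
let input = [Word "a"; Semi; Word "b"; Semi]
```

Here format_cst input yields a; b; (the trailing separator survives), while format_ast input yields a; b.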


In fact, ocamlformat is becoming more and more CST-based. Not everything is implemented yet, but many constructs now use CST-like nodes instead of dirty hacks.

Very late answer on my end; but a late answer is still better than no answer!

That is true. As far as we know, no other tool takes the same approach yet. There is a lot of exploration on our end as well: we add helpers for expressivity and then sometimes decide that a particular helper wasn’t the right approach after all. If Topiary turns out to be successful, I would expect new tools to come up with better languages implementing the same approach. It would also not be hard to define another front-end language for Topiary queries; that could improve the situation syntactically but would of course not add expressivity.

That is also absolutely true. This is the usual compromise when depending on another library: you save a lot of work because part of it has already been done for you, but you expose yourself to the risk of breaking changes in that library. A maintainer of a Topiary query could always decide to maintain their own tree-sitter grammar if they wanted to.¹ Language-specific formatters have an advantage there, because they can potentially rely on the language’s own parser and grammar, which one can hope are more mature and stable, and change less often, than a tree-sitter grammar.

¹: This is not quite true: at the moment, grammars are shipped with Topiary, so there is little flexibility unless you recompile Topiary with your own grammar. It has, however, always been the goal to decouple the two. The Topiary team would provide the Topiary engine: a library exposing a function that formats an input given a grammar and a query, which would make it easy to choose exactly which version of a grammar to use with which version of a query. There would still be a Topiary CLI, but it would probably just read a configuration file and fetch the queries and grammars from Git repositories or similar.
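The decoupled design described in this footnote could be sketched as an interface like the one below. Everything in it is hypothetical: ENGINE, format, Toy_engine, words, and spacing are illustrative names, not Topiary's API (Topiary is written in Rust). The toy “grammar” is only a whitespace tokenizer, and the toy “query” only decides the spacing between adjacent tokens. The point is that grammar and query are independent arguments, so their versions can be chosen separately.

```ocaml
(* Hypothetical engine interface: grammar and query are supplied
   independently by the caller, so each can be pinned to its own version. *)
module type ENGINE = sig
  type grammar
  type query
  val format : grammar:grammar -> query:query -> string -> string
end

(* Toy stand-ins: a "grammar" turns source into tokens, and a "query"
   returns the whitespace to put between a token and the next one. *)
type tokenizer = string -> string list
type spacing_rule = string -> string -> string

module Toy_engine :
  ENGINE with type grammar = tokenizer and type query = spacing_rule =
struct
  type grammar = tokenizer
  type query = spacing_rule

  let format ~grammar ~query src =
    match grammar src with
    | [] -> ""
    | first :: rest ->
      let buf = Buffer.create (String.length src) in
      Buffer.add_string buf first;
      (* Thread the previous token through the fold so the query can
         look at both neighbours. *)
      ignore
        (List.fold_left
           (fun prev tok ->
              Buffer.add_string buf (query prev tok);
              Buffer.add_string buf tok;
              tok)
           first rest);
      Buffer.contents buf
end

(* Hypothetical grammar: split on spaces and drop empty pieces. *)
let words src = String.split_on_char ' ' src |> List.filter (( <> ) "")

(* Hypothetical query: one space between tokens, none before ';'. *)
let spacing _prev tok = if tok = ";" then "" else " "
```

With these, Toy_engine.format ~grammar:words ~query:spacing "let  x =  1 ;" returns "let x = 1;". Swapping in a different tokenizer or spacing rule requires no change to the engine.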

I would expect so as well. OCamlFormat has had more time to fine-tune and to gather feedback from the community, and it is based on a more expressive language. The goal of Topiary is not necessarily to replace language-specific formatters, so it might never beat OCamlFormat, and that’s perfectly fine. Hopefully, though, it will become mature enough to show that the approach is sound and to provide a tool that can format just about everything without requiring language-specific software to be installed. We mostly see it as a target for “smaller” languages without much manpower, which could hack together a reasonable formatter and syntax highlighting support in the same ecosystem. Who knows, though, this is just the beginning!

This is not actually true. Topiary takes the tokens from the grammar and prints them, adding spaces (and soft vs. hard newlines and the like) wherever the query tells it to. With an empty query, you will get all the tokens squashed together. However, Topiary does provide ways to preserve the input style, mostly by offering single-line and multi-line styles: if the input expression is written on a single line, Topiary uses its single-line mode, and otherwise a multi-line mode. This is independent of line width or the like. The belief (originally coming from Ormolu, Tweag’s formatter for Haskell) is that there is no satisfactory way to choose whether to write an expression on one or several lines, and that the user knows better.

[  1; 2   ; 3 ;]
{ x= 12  ; y  = "foo";    }

(* will get formatted as: *)

[1; 2; 3]
{ x = 12; y = "foo" }

(* while *)

[  1; 2
;   3 ]

{ x= 12  ; y =
               "foo" }

(* will get formatted as: *)

[
  1;
  2;
  3;
]
{
  x = 12;
  y = "foo";
}
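The single-line/multi-line rule can be sketched for list literals as follows. This is a simplification under stated assumptions: list_node, is_multi_line, and format_list are hypothetical, and Topiary expresses this through its queries rather than through per-construct OCaml code. Note that no line width appears anywhere: the only input to the decision is whether the node spans one source line or several.

```ocaml
(* Hypothetical, simplified CST node for a list literal: the text of its
   elements plus the source lines where the node starts and ends. *)
type list_node = {
  elements : string list;
  start_line : int;
  end_line : int;
}

(* The mode depends only on the input layout, never on a width limit. *)
let is_multi_line node = node.start_line <> node.end_line

let format_list node =
  if is_multi_line node then
    (* Multi-line mode: one element per line, each followed by ';'. *)
    "[\n"
    ^ String.concat "" (List.map (fun e -> "  " ^ e ^ ";\n") node.elements)
    ^ "]"
  else
    (* Single-line mode: "; " separators and no trailing ';'. *)
    "[" ^ String.concat "; " node.elements ^ "]"
```

Applied to a node with elements 1, 2, 3 written on a single line, it returns [1; 2; 3]; when the same node spans several lines, it returns the five-line layout shown above.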

I think this really is a different approach, and there are upsides and downsides to both. My mind is not entirely made up on this topic, actually.