[ANN] OCamlFormat open-source released

I’m pleased to announce the first public release of OCamlFormat.

OCamlFormat is a tool to automatically format OCaml code. It follows the same basic design as refmt for Reason code, but for OCaml. In particular, it works by parsing source code using the OCaml compiler’s standard parser, deciding where to place comments in the parsetree, and printing the parsetree and comments in a uniform style.
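
To make the "deciding where to place comments" step a little more concrete, here is a toy sketch. It assumes a hypothetical model in which top-level items and comments are known only by their start offsets, and it uses a made-up rule: attach each comment to the next item, or to the last item if nothing follows. The types and the `place_comments` function are illustrative only. OCamlFormat's own placement logic is not described in this post and will differ.

```ocaml
(* Hypothetical toy model of the comment-placement step. Items and comments
   are identified only by character offsets; this is not OCamlFormat's API. *)
type item = { name : string; start : int }
type comment = { text : string; pos : int }

(* Attach each comment to the first item that starts after it. A trailing
   comment with no following item is attached to the last item, and with no
   items at all it is attached to a placeholder "<file>" owner. *)
let place_comments (items : item list) (comments : comment list) =
  let last_item = match List.rev items with x :: _ -> Some x | [] -> None in
  List.map
    (fun c ->
      let target =
        match List.find_opt (fun it -> it.start > c.pos) items with
        | Some it -> Some it
        | None -> last_item
      in
      let owner = match target with Some it -> it.name | None -> "<file>" in
      (owner, c.text))
    comments
```

Once each comment has an owner, a printer can emit it next to that item wherever the item ends up in the reformatted output.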

At Facebook, we currently use this for the OCaml code of Infer to enable developers to stop thinking about line breaking, indentation, parenthesization, etc., to minimize stylistic nit-picking during code review, and to make it as visually obvious as possible when the parser’s interpretation of code does not match the programmer’s. We use it both through editor integration and as a pre-commit hook.

Development is taking place on GitHub. The license is MIT.

See the GitHub page for more information on installation, documentation, contributing, etc.

Are there any specifications available somewhere describing how the code is formatted?

ocamlformat requires source code that:

  • does not trigger warning 50 (“Unexpected documentation comment.”). For code that triggers warning 50, it is unlikely that ocamlformat will happen to preserve the docstring attachment;
  • parses without any preprocessing, using the version of the standard ocaml (not camlp4) parser with which ocamlformat was itself built. Attributes and extension points should be correctly preserved, but other mechanisms such as camlp4, cppo, etc. will not work;
  • is either a module implementation (.ml) or interface (.mli) but not a sequence of toplevel phrases (that is, including toplevel directives such as #use). jbuild files in ocaml syntax should work.

Under those conditions, ocamlformat is intended to run without raising exceptions. But there are bugs, so prior to terminating or modifying any input file, ocamlformat checks that:

  • the parsetrees obtained by parsing the original and formatted files are equal up to some minor normalization (see Normalize.equal_impl or equal_intf);
  • the docstrings, and their attachment, have been preserved (implicit in the parsetree check);
  • the set of comments in the original and formatted files is the same up to their location.
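
The two checks that compare the original and formatted files can be pictured with a toy model. The sketch below is hypothetical: the `expr` and `comment` types and the functions `strip`, `equal_up_to_locations` and `same_comments` are invented for illustration. It is not OCamlFormat's `Normalize` module, which works on the real compiler parsetree and may normalize more than locations; here the only "normalization" is erasing locations.

```ocaml
(* Hypothetical toy AST: like the real parsetree, every node carries a location. *)
type loc = { line : int; col : int }

type expr =
  | Int of int * loc
  | Var of string * loc
  | Apply of expr * expr list * loc

(* Toy normalization: erase all locations, so that reformatting (which moves
   code around) does not by itself count as a difference. *)
let dummy = { line = 0; col = 0 }

let rec strip = function
  | Int (n, _) -> Int (n, dummy)
  | Var (v, _) -> Var (v, dummy)
  | Apply (f, args, _) -> Apply (strip f, List.map strip args, dummy)

let equal_up_to_locations a b = strip a = strip b

(* Comments are compared by text only, ignoring where they appear. Sorting the
   texts compares them as multisets, so a dropped duplicate is still caught. *)
type comment = { text : string; at : loc }

let same_comments xs ys =
  let texts cs = List.sort compare (List.map (fun c -> c.text) cs) in
  texts xs = texts ys
```

If either function returned `false` for a file, the safe behavior described above would be to report an error and leave the input file untouched.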

That is as accurate and precise as I know how to state at the moment. But it does not say anything about all the choices among the technically correct ways to format the concrete syntax. For the choices that have been made, I would not call it a ‘specification’, but a (probably non-exhaustive) list of general guidelines I have tried to follow is:

  • legibility, in the sense of making it as hard as possible for quick visual parsing to give the wrong interpretation, is of highest priority;
  • whenever possible, the high-level structure of the code should be obvious by looking only at the left margin, in particular, it should not be necessary to visually jump from left to right hunting for critical keywords/tokens/etc.;
  • all else equal, compact code is preferable, so do not indent unless it helps legibility, do not insert horizontal space or open lines within individual value and type definitions, etc.;
  • special attention has been given to making some standard syntactic gotchas visually obvious;
  • when reformatting code, comments should not move around too much, but some movement seems to be unavoidable;
  • an explicit non-goal is to follow any existing set of guidelines, which have all AFAIK been written with human formatters in mind.
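
As an illustration of the kind of "standard syntactic gotcha" meant above (the post does not list which ones were targeted, so this particular case is an assumption), consider `;` after `if ... then`. The indentation below suggests both calls are conditional, but `;` binds more loosely than `if ... then`, so only the first one is. A formatter that indents according to the actual parse would presumably print the second call at the same level as the `if`, exposing the mismatch.

```ocaml
(* Misleadingly indented: only the first add_string is under the [then];
   the second one always runs, because [;] binds more loosely than [if]. *)
let trace cond =
  let buf = Buffer.create 8 in
  if cond then
    Buffer.add_string buf "a";
    Buffer.add_string buf "b";
  Buffer.contents buf
```

Calling `trace false` returns `"b"`, not the empty string the layout suggests; `trace true` returns `"ab"`.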

But even with some guidelines, a lot of the work has been to get something that is not too ugly, for as wide a variety of coding styles as I have had time to consider. One guideline I have failed to follow is keeping the number of special cases severely limited: there are already enough that it feels like maybe too many.

There is a huge space for subjective and personal preferences here, and it would be great to explore alternatives by adding options to the command line and config files.

It would be even more interesting to see proposals for changes to the output which are objectively better, as opposed to subjectively different.

I’m very much enjoying using it. As someone who just wants a “sensible standard” to follow, this is definitely good enough for me, and I’ll convert my code to it.

The only blocker to mechanising this conversion is the (very sensible) safety check that verifies that the two ASTs are equivalent, which is uncovering some edge case bugs. As soon as those are fixed I’ll continue the code reformatting :slight_smile:

Really neat indeed! Integration with editors is really straightforward. The small blocker in my case is the still-missing support for objects… sadly BuckleScript and some libraries I use (like postgresql-ocaml) still depend on them. Really looking forward to the feature, hopefully someone will take it on :wink:

A question, since I’m unfamiliar with how it works: rather than raising an exception (“classes not implemented”, “objects not implemented”), would it be possible to just skip the unsupported code and return it as is?

One possibility would be to use the Pprintast functions from compiler-libs. The problem would be that comments in that code would be dropped, and the final checks would fail. That didn’t seem like a usable solution to me, so I chose to just fail explicitly. I’d be happy with changes to this behavior if it would be useful.

OCamlFormat v0.2 has been released and is now available from opam.

v0.2 includes fixes for several assert-due-to-misformatting bugs reported by people who tried v0.1, as well as a few improvements to the generated output.

An opam package is now available after some build/packaging refactoring.

Many thanks to everyone who has submitted an issue or pull request!

This is exciting. Thank you!