ocamlformat requires source code that:
- does not trigger warning 50 (“Unexpected documentation comment.”). For code that triggers warning 50, it is unlikely that ocamlformat will happen to preserve the docstring attachment;
- parses without any preprocessing, using the version of the standard ocaml (not camlp4) parser with which ocamlformat was itself built. Attributes and extension points should be correctly preserved, but other mechanisms such as camlp4, cppo, etc. will not work;
- is either a module implementation (.ml) or interface (.mli) but not a sequence of toplevel phrases (that is, including toplevel directives such as
jbuild files in ocaml syntax should work.
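As a rough illustration of input that meets these requirements, the sketch below uses only docstrings that sit directly before the item they document, plus an attribute, which is ordinary syntax for the standard parser. The names `area` and `double` are hypothetical and chosen only for this example.

```ocaml
(* Hypothetical definitions, for illustration only. *)

(** [area r] is the area of a circle of radius [r]. *)
let area r = Float.pi *. r *. r

(** [double x] is [2 * x]. The [@inline] attribute needs no preprocessing. *)
let[@inline] double x = 2 * x

(* By contrast, a doc comment written where no item can receive it, for
   example in the middle of an expression, is the kind of thing that
   warning 50 reports. *)

let () = assert (double 21 = 42)
```

Each docstring here has exactly one item it can attach to, so there is no ambiguity to preserve or lose.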
Under those conditions, ocamlformat is intended to run without raising exceptions. But there are bugs, so prior to terminating or modifying any input file, ocamlformat checks that:
- the parsetrees obtained by parsing the original and formatted files are equal up to some minor normalization (see
- the docstrings, and their attachment, have been preserved (implicit in the parsetree check);
- the set of comments in the original and formatted files is the same up to their location.
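The last check can be pictured with a small sketch. This is not ocamlformat's implementation: the `comment` record, `texts`, and `same_comments` are hypothetical names, and treating repeated comments as distinct (a multiset rather than a strict set) is an assumption made here for illustration.

```ocaml
(* Hypothetical representation: a comment's text plus the line it starts on. *)
type comment = { text : string; line : int }

(* Sorted comment texts with locations dropped; duplicates are kept. *)
let texts (cs : comment list) : string list =
  List.sort String.compare (List.map (fun c -> c.text) cs)

(* True when both files carry the same comments, wherever they ended up. *)
let same_comments original formatted = texts original = texts formatted

let () =
  let original = [ { text = "TODO"; line = 3 }; { text = "note"; line = 7 } ] in
  let moved = [ { text = "note"; line = 9 }; { text = "TODO"; line = 2 } ] in
  let dropped = [ { text = "TODO"; line = 2 } ] in
  (* Comments that only moved still pass the check. *)
  assert (same_comments original moved);
  (* A lost comment makes the check fail. *)
  assert (not (same_comments original dropped))
```

Comparing only the texts is what "up to their location" amounts to in this sketch: a comment may move, but it may not vanish or change.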
That is as accurate and precise as I know how to state at the moment. But it does not say anything about all the choices among the technically correct ways to format the concrete syntax. For the choices that have been made, I would not call it a ‘specification’, but a (probably non-exhaustive) list of general guidelines I have tried to follow is:
- legibility, in the sense of making it as hard as possible for quick visual parsing to give the wrong interpretation, is of highest priority;
- whenever possible, the high-level structure of the code should be obvious by looking only at the left margin; in particular, it should not be necessary to visually jump from left to right hunting for critical keywords/tokens/etc.;
- all else equal, compact code is preferable, so do not indent unless it helps legibility, do not insert horizontal space or open lines within individual value and type definitions, etc.;
- special attention has been given to making some standard syntactic gotchas visually obvious;
- when reformatting code, comments should not move around too much, but some movement seems to be unavoidable;
- an explicit non-goal is to follow any existing set of guidelines, which have all AFAIK been written with human formatters in mind.
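The list above does not say which syntactic gotchas are meant. One well-known OCaml example, picked here as an assumption, is that `;` ends an `if ... then` without `else`, so only the first expression is guarded. The sketch below shows the behaviour itself, not ocamlformat's output; `log`, `unguarded`, and `guarded` are hypothetical names.

```ocaml
let log = Buffer.create 16

(* The [;] ends the conditional: only the first call is guarded. *)
let unguarded flag =
  if flag then Buffer.add_string log "a";
  Buffer.add_string log "b"

(* Parentheses (or begin ... end) keep both calls under the condition. *)
let guarded flag =
  if flag then (
    Buffer.add_string log "a";
    Buffer.add_string log "b")

let () =
  unguarded false;
  (* "b" is written even though the flag is false. *)
  assert (Buffer.contents log = "b");
  Buffer.clear log;
  guarded false;
  (* Nothing is written when both calls are grouped. *)
  assert (Buffer.contents log = "")
```

A layout that indented both calls of `unguarded` under the `if` would invite exactly the wrong reading that the legibility guideline is meant to prevent.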
But even with some guidelines, a lot of the work has been to get something that is not too ugly, for as wide a variety of coding styles as I have had time to consider. A guideline I have failed to follow is to keep the number of special cases severely limited; there are already enough that it feels like maybe too many.
There is a huge space for subjective and personal preferences here, and it would be great to explore alternatives by adding options to the command line and config files.
It would be even more interesting to see proposals for changes to the output which are objectively better, as opposed to subjectively different.