[ANN] OCamlformat 0.14.0

emillon · April 3, 2020, 8:35am

On behalf of the development team, I’d like to announce the release of ocamlformat version 0.14.0.

Here are the main highlights of this release:

Support for OCaml 4.10

This means that it compiles and runs with this version, and also that it can format 4.10-specific language features (module _ and multi-index operators).
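For readers who have not met these two OCaml 4.10 features yet, here is a minimal sketch of the syntax in question. The names grid and bottom_left, and the choice of % as the operator character, are made up for the illustration.

```ocaml
(* Anonymous module binding: the module is evaluated but given no name. *)
module _ = struct
  let () = assert (1 + 1 = 2)
end

(* Multi-index operator: the ';'-separated indices arrive as an array.
   The operator name and its meaning are assumptions for this example. *)
let ( .%{;..} ) grid idx = grid.(idx.(0)).(idx.(1))

let grid = [| [| 1; 2 |]; [| 3; 4 |] |]

(* Row 1, column 0. *)
let bottom_left = grid.%{1; 0}
```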

Preliminary support for invalid files

As OCamlformat operates on ASTs, it normally requires a valid input file. This release adds a --format-invalid-files option to detect invalid parts and print them verbatim. This feature is still experimental.

Preserving more concrete syntax

Starting with this release, OCamlformat is going to preserve more concrete syntax. For example, module M = functor (K : S) -> struct end and module M (K : S) = struct end are equivalent. In the past, both variants would be formatted as the latter. Now, the original syntax is preserved. In some cases, preserving was already possible through an option: for example, the choice between let%name x = e in body and [%name let x = e in body] was controlled by the extension-sugar option. This option is now deprecated and OCamlformat will now always preserve what was in the source file (this was the default behaviour).
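As a small illustration of the functor case, the two definitions below are written in the two concrete syntaxes and behave identically. The module names and the signature S with a single value n are assumptions made for the example.

```ocaml
module type S = sig
  val n : int
end

(* Explicit functor form. *)
module Succ_explicit = functor (K : S) -> struct
  let n = K.n + 1
end

(* Sugared form; it denotes the same functor. *)
module Succ_sugared (K : S) = struct
  let n = K.n + 1
end

module One = struct
  let n = 1
end

module A = Succ_explicit (One)
module B = Succ_sugared (One)
```

Since both forms produce the same AST, a formatter has to pick one or keep what the author wrote; from this release on it keeps what the author wrote.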

Similarly, it was possible to control how special characters are escaped in string and character literals through the escape-strings and escape-chars options. They are being deprecated and the only possible behavior will be preserving the concrete syntax (as done by default).
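To make the escaping point concrete, here is a sketch of one string written with several escape styles. The variable names are illustrative; all four literals denote the same value, which is why the choice between them is purely a matter of concrete syntax.

```ocaml
(* Four spellings of the same one-character string. *)
let plain = "A"
let decimal = "\065"
let hexadecimal = "\x41"
let octal = "\o101"

let all_equal =
  List.for_all (String.equal plain) [ decimal; hexadecimal; octal ]
```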

The reason for this change is that we feel that ocamlformat should be just about formatting. The fact that this behavior was configurable is in part due to the fact that it operates on OCaml ASTs, but end users should not have to be surprised by their code being transformed on reformatting.

In the future, we plan to extend that to other similar constructs, such as the choice between ( ) and begin/end, or spacing between module items.
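For the parentheses versus begin/end case, a minimal sketch of two equivalent spellings follows. The function names and the calls counter are made up for the example.

```ocaml
let calls = ref 0

(* Sequence grouped with parentheses. *)
let with_parens x =
  if x > 0 then (incr calls; x) else 0

(* The same sequence grouped with begin/end. *)
let with_begin_end x =
  if x > 0 then begin incr calls; x end else 0
```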

Placement of doc comments

Placing doc comments (** ... *) is controlled by the doc-comments configuration option. It is always possible to put them before the item they refer to, and this is what the doc-comments=before option does. The alternative doc-comments=after will try to do its best to put them after, but in some cases it is not possible. For example, in a variant type declaration, a doc-comment put immediately after will be attached to the last constructor by documentation tools. OCamlformat needs to preserve the meaning of programs, so in these cases, it will instead put the comment before. In the case of module declarations, putting the comment after might not be very useful if the corresponding module is very large.
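The variant case can be sketched as follows. The type color is an assumption for the example; the attachment behaviour is the one described above.

```ocaml
(** Documentation for the whole type [color]: it has to go before. *)
type color =
  | Red
  | Green  (** This comment documents the constructor [Green] only. *)
```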

This requires a complex rule to determine which comments will be put before and which comments will be put after. So in this version, we are deprecating this mechanism and replacing it with a simpler one controlled by doc-comments-val that applies only to val and external items. For these items, it is always possible to attach doc comments before or after them. For all other items, like type or module declarations, the doc comments will consistently be put before.
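A rough sketch of what the two doc-comments-val settings look like in a signature is given below. The signature QUEUE and its items are hypothetical; a real file would use a single style throughout, depending on the option.

```ocaml
module type QUEUE = sig
  (** Always placed before: [type] items are not affected by
      doc-comments-val. *)
  type t

  val empty : t
  (** With doc-comments-val=after, the comment follows the [val]. *)

  (** With doc-comments-val=before, the comment precedes the [val]. *)
  val length : t -> int
end
```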

Many bugs found by fuzzing

We hooked ocamlformat to AFL, looking for programs that parse correctly but trigger errors during formatting. This approach worked very well and more than 20 logical bugs were found with this technique.

Upgrading

To upgrade from ocamlformat 0.13.0, one needs to upgrade the ocamlformat binary and replace the version field in .ocamlformat files by 0.14.0 and then:

if you used doc-comments=after, you can replace it by doc-comments-val=after. This will move doc-comments before their item for all module items except val and external ones.
if you used doc-comments=before, you can remove it as it is now the default.
if you set escape-chars=preserve, escape-strings=preserve, or extension-sugar=preserve explicitly, you can remove them safely (they were the default).
if you used another value for one of these options (such as escape-strings=hexadecimal), you will need to remove them as well. This will not trigger a diff, but ocamlformat will not enforce a particular concrete syntax for new code.
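The upgrade steps above can be sketched as a small line-rewriting function. This is a hypothetical helper, not something shipped with ocamlformat: the names starts_with, upgrade_line and upgrade_config are made up, and it only handles the exact option spellings listed in this announcement. A later reply in this thread reports a surprising result after removing doc-comments=before, so it is worth reviewing the resulting diff.

```ocaml
(* Hypothetical helper: true when [s] begins with [prefix]. *)
let starts_with prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Options that the announcement says to remove, whatever their value. *)
let removed_options = [ "escape-chars="; "escape-strings="; "extension-sugar=" ]

(* Rewrite one line of a .ocamlformat file following the upgrade notes.
   Returns [None] when the line should be dropped. *)
let upgrade_line line =
  match String.trim line with
  | "version=0.13.0" -> Some "version=0.14.0"
  | "doc-comments=after" -> Some "doc-comments-val=after"
  | "doc-comments=before" -> None
  | l when List.exists (fun p -> starts_with p l) removed_options -> None
  | _ -> Some line

(* Apply the rewrite to every line; comments and other options are kept. *)
let upgrade_config lines = List.filter_map upgrade_line lines
```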

A note for new users

We encourage you to try ocamlformat, which can be installed from opam directly (opam install ocamlformat), but please remember that it is still beta software. We added a FAQ for new users that should help you decide if ocamlformat is the right choice for you.

Have a great day!

nojb · April 3, 2020, 8:40am

Thanks for the announcement!

A slightly off-topic question: have you considered making pre-compiled binaries available?

Especially on Windows, ocamlformat is hard to compile due to its many dependencies. Being able to just download a binary would be really convenient!

Cheers,
Nicolás

emillon · April 3, 2020, 8:50am

Hi! One thing we’d like to provide is self-contained source tarballs with vendored dependencies. That would also make sure that you can install ocamlformat in a switch that uses an incompatible version of base, for example. I think that would benefit Windows users as well.

nojb · April 3, 2020, 8:59am

Yes, that would help. But binaries would still be great : )

Cheers,
Nicolás

emillon · April 3, 2020, 9:21am

OK! I’ll check if we can tweak our CI pipeline to help with this

emillon · April 3, 2020, 4:21pm

This upgrade is likely to generate a huge diff on projects that use the default profile, so I would like to expand a bit on the reason.

According to the syntax rules used by the ocaml tools (the ocaml compilers, ocamldoc, odoc), it is always possible to put the doc-comment before an item.

Some teams prefer to put the documentation after. But that is not always possible. For example, type t = A | B (** doc *) will attach the doc-comment to B, not to t. The only way to attach the comment to t is by putting the comment before.

Enter ocamlformat: doc-comment placement is controlled by an option with two values, before or after. before will always place the comment before. after determines if it is possible to put the comment after, and if it is not, will put it before.

Some items cannot have comments after, like variant types (as described above). But there is another reason not to put comments after. In some cases, that can put the comment far from the thing it is documenting. Considering modules, the following is nice:

module M = L.M
(** doc *)

But this is not great if the structure is large:

module M = struct
  ...
  ...
end
(** doc *)

To summarize, when ocamlformat is configured to put comments after, it has to follow a complex heuristic to determine whether it has to fall back to before. In the case of a module, it depends on its shape, how many functor arguments there are, this kind of thing (for various reasons, we don’t know how large something is going to be in advance, so we have to look at its shape). The point is that it is complicated to understand and explain, and that fixing it always makes it more complex. Another aspect is that in the end, we want ocamlformat to be pretty stable when it reaches 1.0.0, and complex rules are at odds with this goal.

So, we have decided to simplify the rule: instead of looking deep in the AST, we just look at the kind of item this is. For val and external items, it is always possible to put the doc-comment after, so we follow exactly what the configuration option says.

As a user of the default profile, what this means for you: for items that are not val or external, and considered “simple” by the 0.13.0 heuristic, doc-comments are going to move from after to before.

Based on these reasons, you will understand that before is always simpler. You can opt into this by setting doc-comments-val=before. This will cause an even larger diff as all items are going to move before (that is: all items described just above, plus val and external items), but the rule gets extremely simple (everything is put before). It is possible that this option will become the default in the future, but we have not decided this yet (in this case, if you did not opt into it, you will see comments on val and external items move at that time).

octachron · April 3, 2020, 5:02pm

This is false, you can add an empty documentation comment to B:

type t = A | B (**)
(** doc for t *)

yawaramin · April 3, 2020, 6:36pm

Is there any rule I can set that will always keep the doc comment after the item, regardless of any heuristic?

emillon · April 6, 2020, 8:16am

TIL, thanks!

No, that’s not possible. Allowing that could change the meaning of your program by attaching doc-comments to a different place. (ocamlformat checks at the end that the output AST is equivalent to the input one, and exits with an error if that’s the case)

yawaramin · April 6, 2020, 1:39pm

Thanks. Btw, I guess you mean it exits with an error if the input and output ASTs are not equivalent?

emillon · April 6, 2020, 2:30pm

Oops, yes, sorry! we also exit with an error when we forget to output a comment that was present in the source file.
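The idea of checking the output against the input can be sketched with a toy model. This is not how ocamlformat is implemented: here the "AST" is just the list of whitespace-separated words, and the names words, format and format_checked are made up for the illustration.

```ocaml
(* Toy stand-in for a parser: split the input into whitespace-separated words. *)
let words s =
  String.map (function '\n' | '\t' -> ' ' | c -> c) s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

(* Toy "formatter": normalise all whitespace to single spaces. *)
let format s = String.concat " " (words s)

(* Format, re-read the output, and fail if the content changed. *)
let format_checked input =
  let output = format input in
  if words input = words output then Ok output
  else Error "formatting changed the meaning of the input"
```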

sagra · April 6, 2020, 5:13pm

Would it be possible to get an option to ignore the source file formatting?

emillon · April 6, 2020, 7:00pm

Do you mean, for example, enforcing only hexadecimal escapes and forbidding octal ones?

I’d say that this kind of thing is more a linter’s job than a formatter’s. So it would be better to let other tools do this kind of transformation. A linter can be smarter than a formatter, and for example allow octal numbers only for file modes, or require underscores in int literals larger than a certain size.
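To illustrate the literal examples, the pairs below denote the same integers; which spelling reads best depends on what the number means, which is the kind of judgement a linter could make. The variable names are assumptions for the example.

```ocaml
(* A file mode reads naturally in octal. *)
let mode_octal = 0o644
let mode_decimal = 420

(* A large count reads better with underscores. *)
let million_plain = 1000000
let million_grouped = 1_000_000
```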

We want OCamlformat 1.0 to be fairly stable, so it needs to have a clear scope. Formatting is already a pretty complex topic, so if we can make it do that one thing well it’s going to be great!

sagra · April 6, 2020, 7:41pm

In part, but also functor notation, ppx notation, essentially any point where there are multiple equivalent ways of representing something, I’d like the formatter to always output the same one.

If that’s not in the direction you want the tool to go in, that’s fine.

emillon · April 8, 2020, 9:26am

If we take ppx notation as an example (let%ext ... vs [%ext let...]), when there are two choices it is natural to expose that as an option, such as sugar and extension. But different preprocessors may have a different preferred concrete syntax: for example I think that it’s more common to use the let%bind form with ppx_let and [%expr ...] with metaquot, so it’s likely to cause disagreement with what option is the right one to use in a given codebase.

There are two techniques we’ve been using to improve this situation: the first one is trying to find a heuristic that works for most cases. In this case, that might be having a list of extension names and preferring sugar for bind and extension for expr. The other one is to preserve the concrete syntax, and delegate the decision to the user or to another part of the pipeline.
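The first technique, a per-extension heuristic, could look roughly like the sketch below. It is purely illustrative and not taken from ocamlformat: the type style and the function preferred_style are hypothetical, and the table only contains the two names mentioned above.

```ocaml
(* Which concrete syntax to emit for an extension node. *)
type style =
  | Sugar      (* let%name x = e in body *)
  | Extension  (* [%name let x = e in body] *)
  | Preserve   (* keep whatever the source file used *)

(* Hypothetical heuristic keyed on the extension name. *)
let preferred_style ext_name =
  match ext_name with
  | "bind" -> Sugar
  | "expr" -> Extension
  | _ -> Preserve
```

Every name missing from such a table still needs a fallback, which in this sketch is to preserve the source.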

Dune has already some support to use external tools that can push changes to source files (that’s how expect-style tests or linters work). In the future we’d like to improve the situation so that you can seamlessly configure a linting pipeline that applies some linting rules with corrections, and that can get fed into ocamlformat before promotion.

Under that architecture I think it makes more sense to go in a direction where these rules are enforced outside of ocamlformat. Maybe an example repository would be useful for people that want to use this kind of workflow.

ejgallego · April 8, 2020, 9:31pm

That doesn’t seem to work, in ocamlformat profile if I do this all my comments go back to after.

CraigFe · April 9, 2020, 7:50am

From looking at the code, this is unexpected. The ocamlformat default is set as before in the code, and the corresponding PR shows the effect of the change on OCamlformat itself. I’d double-check and consider submitting an issue

ejgallego · April 10, 2020, 3:05am

Indeed, that is weird. I saw this on the Coq repos, I dunno if I did some silly mistake, removing it here:

github.com

coq/coq/blob/master/.ocamlformat

version=0.13.0
profile=ocamlformat

# to enable a whole directory, put "disable=false" in dir/.ocamlformat
# to enable specific files put them in .ocamlformat-enable
disable=true

module-item-spacing=compact
sequence-style=terminator
cases-exp-indent=2
field-space=loose
exp-grouping=preserve
break-cases=fit
doc-comments=before

Zimm_i48 · April 10, 2020, 9:54am

This is consistent with the longer explanation here.

cc @emillo: probably something to fix in your OP

Zimm_i48 · April 10, 2020, 9:57am

That plan sounds good to me, except for spacing between module items. How can you consider this as anything else but formatting? I don’t want to have to think whether I separate my declarations by zero, one or two newlines.
