[ANN] OCamlformat 0.14.0

OK! I’ll check if we can tweak our CI pipeline to help with this :slight_smile:

This upgrade is likely to generate a huge diff on projects that use the default profile, so I would like to expand a bit on the reason.

According to the syntax rules used by the OCaml tools (the OCaml compilers, ocamldoc, odoc), it is always possible to put the doc-comment before an item.

Some teams prefer to put the documentation after. But that is not always possible. For example, type t = A | B (** doc *) will attach the doc-comment to B, not to t. The only way to attach the comment to t is by putting the comment before.
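
As a small illustration of the attachment rule described above (the type u and its constructors are made up for this sketch), compare a trailing comment with a leading one:

```ocaml
(* Trailing doc-comment: it documents the last constructor B, not type t. *)
type t = A | B (** doc *)

(** doc for u: placed before the item, so it documents the type itself. *)
type u = C | D
```

Running odoc or ocamldoc on such a file should show the first comment under B and the second one under u.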

Enter ocamlformat: doc-comment placement is controlled by an option with two values, before or after. before always places the comment before the item. after places the comment after the item when that is possible, and falls back to before when it is not.

Some items cannot have comments after, like variant types (as described above). But there is another reason not to put comments after. In some cases, that can put the comment far from the thing it is documenting. Considering modules, the following is nice:

module M = L.M
(** doc *)

But this is not great if the structure is large:

module M = struct
  ...
  ...
end
(** doc *)

To summarize, when ocamlformat is configured to put comments after, it has to follow a complex heuristic to determine whether it has to fall back to before. In the case of a module, it depends on its shape, how many functor arguments there are, that kind of thing (for various reasons, we don’t know how large something is going to be in advance, so we have to look at its shape). The point is that it is complicated to understand and explain, and that fixing it always makes it more complex. Another aspect is that in the end, we want ocamlformat to be pretty stable when it reaches 1.0.0, and complex rules are at odds with this goal.

So, we have decided to simplify the rule: instead of looking deep in the AST, we just look at the kind of item this is. For val and external items, it is always possible to put the doc-comment after, so we follow exactly what the configuration option says.
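
The simplified rule can be modelled roughly as follows. This is only a sketch: the types placement and item_kind and the function doc_comment_placement are hypothetical names invented for illustration, not ocamlformat's actual code.

```ocaml
(* Hypothetical model of the simplified rule; these names are illustrative
   and are not taken from ocamlformat's source. *)
type placement = Before | After

type item_kind = Val | External | Type | Module | Other

(* [configured] stands for the placement requested in the configuration. *)
let doc_comment_placement ~configured kind =
  match kind with
  | Val | External -> configured (* always possible: follow the option *)
  | Type | Module | Other -> Before (* no heuristic: always before *)
```

The decision depends only on the kind of item, never on its size or shape, which is what makes the rule easy to explain.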

As a user of the default profile, what this means for you: for items that are not val or external, and considered “simple” by the 0.13.0 heuristic, doc-comments are going to move from after to before.
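
For instance, taking the module alias from earlier in this post, the change would look roughly like this. The modules L, Style_0_13 and Style_0_14 exist only to make the sketch compile on its own; they are not part of any real code base.

```ocaml
(* Stand-in for the library module referenced by the alias. *)
module L = struct
  module M = struct
    let x = 1
  end
end

(* 0.13.0, default profile: a "simple" item keeps its comment after. *)
module Style_0_13 = struct
  module M = L.M
  (** doc *)
end

(* 0.14.0, default profile: the item is not a val or an external, so the
   comment moves before. *)
module Style_0_14 = struct
  (** doc *)
  module M = L.M
end
```

Both forms attach the comment to M, so only the layout changes.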

Based on these reasons, you will understand that before is always simpler. You can opt into this by setting doc-comments-val=before. This will cause an even larger diff as all items are going to move before (that is: all items described just above, plus val and external items), but the rule gets extremely simple (everything is put before). It is possible that this option will become the default in the future, but we have not decided this yet (in this case, if you did not opt into it, you will see comments on val and external items move at that time).

This is false, you can add an empty documentation comment to B:

type t = A | B (**)
(** doc for t *)

Is there any rule I can set that will always keep the doc comment after the item, regardless of any heuristic?

TIL, thanks!

No, that’s not possible. Allowing that could change the meaning of your program by attaching doc-comments to a different place. (ocamlformat checks at the end that the output AST is equivalent to the input one, and exits with an error if that’s the case)

Thanks. Btw, I guess you mean it exits with an error if the input and output ASTs are not equivalent?

Oops, yes, sorry! We also exit with an error when we forget to output a comment that was present in the source file.
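
The idea of such a round-trip check can be sketched on a toy example. Everything here is an assumption made for illustration: the "AST" is just a list of whitespace-separated tokens, and parse, print and format_checked are hypothetical helpers, not ocamlformat functions.

```ocaml
(* Toy "AST": the list of whitespace-separated tokens of the source. *)
let parse (src : string) : string list =
  String.split_on_char ' ' src |> List.filter (fun tok -> tok <> "")

(* Toy printer: exactly one space between tokens. *)
let print (ast : string list) : string = String.concat " " ast

(* Format, then re-parse the output and compare it with the input AST.
   The optional [printer] argument lets us plug in a buggy printer. *)
let format_checked ?(printer = print) (src : string) : (string, string) result =
  let input_ast = parse src in
  let output = printer input_ast in
  if parse output = input_ast then Ok output
  else Error "output AST differs from input AST"
```

With the default printer, format_checked "a   b" normalises the spacing and succeeds. A printer that drops tokens, such as fun _ -> "", makes the check fail instead of silently changing the program.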

Would it be possible to get an option to ignore the source file formatting?

Do you mean, for example, enforcing only hexadecimal escapes and forbidding octal ones?

I’d say that this kind of thing is more a linter’s job than a formatter’s. So it would be better to let other tools do this kind of transformation. A linter can be smarter than a formatter, and for example allow octal numbers only for file modes, or require underscores in int literals larger than a certain size.

We want OCamlformat 1.0 to be fairly stable, so it needs to have a clear scope. Formatting is already a pretty complex topic, so if we can make it do that one thing well it’s going to be great!

In part, but also functor notation, ppx notation, essentially any point where there are multiple equivalent ways of representing something, I’d like the formatter to always output the same one.

If that’s not the direction you want the tool to go in, that’s fine.

If we take ppx notation as an example (let%ext ... vs [%ext let...]), when there are two choices it is natural to expose that as an option, such as sugar and extension. But different preprocessors may have a different preferred concrete syntax: for example I think that it’s more common to use the let%bind form with ppx_let and [%expr ...] with metaquot, so it’s likely to cause disagreement about which option is the right one to use in a given codebase.

There are two techniques we’ve been using to improve this situation: the first one is trying to find a heuristic that works for most cases. In this case, that might be having a list of extension names and preferring sugar for bind and extension for expr. The other one is to preserve the concrete syntax, and delegate the decision to the user or to another part of the pipeline.
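
A heuristic of the first kind could look like the sketch below. The type style and the function preferred_style are hypothetical and not part of ocamlformat; the only entries are the two extension names mentioned above, and anything else falls back to the second technique.

```ocaml
(* Hypothetical per-extension preference table. *)
type style =
  | Sugar     (* let%ext ... *)
  | Extension (* [%ext ...] *)
  | Preserve  (* keep whatever the source file uses *)

let preferred_style (extension_name : string) : style =
  match extension_name with
  | "bind" -> Sugar
  | "expr" -> Extension
  | _ -> Preserve
```

The weak point of such a table is visible right away: every new preprocessor needs an entry, or its users get the fallback.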

Dune already has some support for external tools that can push changes to source files (that’s how expect-style tests or linters work). In the future we’d like to improve the situation so that you can seamlessly configure a linting pipeline that applies some linting rules with corrections, and that can get fed into ocamlformat before promotion.

Under that architecture I think it makes more sense to go in a direction where these rules are enforced outside of ocamlformat. Maybe an example repository would be useful for people who want to use this kind of workflow.

That doesn’t seem to work: in the ocamlformat profile, if I do this, all my comments go back to after.

From looking at the code, this is unexpected. The ocamlformat default is set as before in the code, and the corresponding PR shows the effect of the change on OCamlformat itself. I’d double-check and consider submitting an issue :slightly_smiling_face:

Indeed, that is weird. I saw this on the Coq repos, I dunno if I did some silly mistake, removing it here:

This is consistent with the longer explanation here.

cc @emillo: probably something to fix in your OP

That plan sounds good to me, except for spacing between module items. How can you consider this as anything else but formatting? I don’t want to have to think whether I separate my declarations by zero, one or two newlines.

Hello,

As described in this thread, ocamlformat 0.14.0 introduced a new algorithm to determine how documentation comments are placed. We underestimated the impact of making this the default, and this means that many unwanted diffs were present for 0.13.0 -> 0.14.0 upgrades.

We are going to prepare a 0.14.1 release next week reverting this behavior back to the 0.13.0 defaults. Users still on 0.13.0 are encouraged to wait for this and upgrade directly to 0.14.1.

Sorry for the inconvenience, and thanks for the feedback!

Yes, that’s a good point. Bottom line is that spacing between items is not super consistent at the moment (see Feature request: Simpler spacing heuristics for modules · Issue #1253 · ocaml-ppx/ocamlformat · GitHub for a recent example), so we’d like to improve that area. If we can find a simple rule that works, that’s even better!

Personally I do (at least between 0 or 1). I can’t imagine LaTeX choosing how to gather my sentences into paragraphs or splitting each of my paragraphs into one-sentence paragraphs.

In module implementations I very often group one-liner definitions according to their relatedness. ocamlformat insisting on putting a space between these one-liners is one of the reasons, among others, that I’m not using it: it worsens legibility through lack of compactness.

Basically the rule I would like is: after a one-liner definition don’t do anything special except collapsing blank lines to a single one if there is more than one following.
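
Taken literally, that rule is easy to sketch. The function collapse_blank_lines below is a hypothetical helper working on a list of source lines, not an ocamlformat option or function; it leaves the author's grouping alone and only squeezes runs of blank lines down to one.

```ocaml
(* Collapse every run of blank lines to a single empty line and leave
   all other lines untouched. *)
let collapse_blank_lines (lines : string list) : string list =
  let is_blank l = String.trim l = "" in
  let rec go prev_blank = function
    | [] -> []
    | l :: rest when is_blank l ->
      (* Keep the first blank line of a run, drop the others. *)
      if prev_blank then go true rest else "" :: go true rest
    | l :: rest -> l :: go false rest
  in
  go false lines
```

Two one-liners written with no blank line between them stay adjacent, and two separated by three blank lines end up separated by one.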

As Etienne mentioned, we have released OCamlformat 0.14.1, reverting the change to the defaults and our plans to deprecate the doc-comments option.

For projects that already upgraded to 0.14.0 (e.g. Coq), the doc-comments option will change its meaning again. It is necessary to add doc-comments=before to have the documentation comments placed before.
Moreover, the new option doc-comments-val added in 0.14.0 has a higher precedence than doc-comments, even when it’s not set. It is thus necessary to set them both to before to have the old “before” behavior.
This will be improved in the next release (see https://github.com/ocaml-ppx/ocamlformat/pull/1340).
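
Concretely, assuming the project keeps its settings in a .ocamlformat file at its root, getting the old “before” behavior with 0.14.1 would mean setting both options there:

```
doc-comments=before
doc-comments-val=before
```

Any other options already present in the file stay as they are.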

Thank you to our early adopters for bearing with us. We are improving our release process to reduce confusion for the next updates. As usual, if you have any feedback, please open an issue on https://github.com/ocaml-ppx/ocamlformat to discuss it with us.