With dune 1.4 shipping with support for ocamlformat I’m planning to convert some of our internal codebase to use it. The idea is we can stop bikeshedding how things should be indented, formatted etc., which will help with reviews and make it easier to onboard people (“just indent as ocamlformat does”). So take all of this not as complaining but as an attempt to get things right.
What I would like to do is to have it output some style that is similar to what we already do, or at least what the community tends to do. I am sure I’m not going to agree with all choices but if ocamlformat gets enough adoption I can adjust.
But when converting one of our non-trivially sized projects, I have run across some cases which are puzzling and which I don’t think represent “usual” OCaml code style:
Records are formatted as
type t = { hest : int
         ; giraf : string }
which is maybe a Haskellism, but I’ve never seen anyone define records this way in OCaml. Normally I see
type t = {
  hest : int;
  giraf : string;
}
with a few variations on whether to have a trailing semicolon or not. I’ve heard people argue that in Haskell the leading-separator style is nice because lines can be added and removed with minimal changes to lines in diffs, but having the curly brackets on the same line as the fields makes it worse than before.
A similar thing goes for lists, where the separator is again put in front, for no reason I can follow except to match records. Tuples as well.
The way extension points are added to types is sort of odd: they are just clumped onto a new line after the type, without indent.
type role =
  | Owner
  | Guest
  [@@deriving eq, show]
becomes
type role =
  | Owner
  | Guest
[@@deriving eq, show]
Which does not look attached to that type at all anymore. Similarly with records.
Our files begin with opening Core and/or Async sometimes and then some module aliases. Even in the most sparse setting I can’t get it not to lump these together. This is admittedly nitpicking.
Is there a setting to just group parens a bit more? I don’t particularly like the way it writes ) ) ) ) or puts ) on new lines. Maybe I am too much of a Lisper here, but the spaces add nothing to readability for me and I also haven’t seen codebases do a ) ) ) ) space dance.
It seems to introduce trailing spaces in recursive functions, but I guess this is just a bug and not expected behaviour.
Since I assume that the idea of ocamlformat is to provide a good out-of-the-box experience (similar to Clojure, where the indent style is basically “do how Emacs would indent it”) and not to have every single thing configurable in two dozen ways, I would like to start a discussion on how it formats things. There has been oddly little discussion on it so far, despite this forum being linked in its readme as the place to discuss it.
I have nothing against reasonable defaults and don’t really care about those, but for me to use it in my code bases it has to be reasonably tweakable and at least be able to agree with the way I have been using ocp-indent over the last years.
I doubt I’m a good reference point in the space of ocamlformat users, since I have been programming in OCaml for the last sixteen years and have certainly developed my own unwritten and rather compact style, so it will be harder for me to adapt to whatever is proposed.
In any case last time I investigated its result on my code bases there were too many things that hurt my typographic eyes to make it usable for me. In particular lines starting with punctuation, spurious spaces around parentheses and some operators, general lack of compactness and poor information density by degenerating to sequences of lines (e.g. notably signatures in .mlis, my brain doesn’t need a single element on a line to be able to scan things).
Aside. I understand that such tools can be a huge benefit for team work. But I also see them with a bit of circumspection. It’s the classical view by computer people that you can solve social problems via a computer program. I mainly see programming as an exercise in human communication in which typesetting and layout have a very important role, and I’m not sure a machine is or will always be able to provide the best answer (please spare me the discussion about AI).
We’ve done a lot of work to improve the defaults by our own lights at Jane Street. You can see the results in a number of our packages. Here’s one example:
We mostly fall into the punctuation-on-the-left camp. It’s hardly a settled question, but there are a lot of both styles in OCaml code out there. The argument we’ve had for putting the punctuation on the left is to make it easier to visually scan and understand the structure of the code. Note that this is consistent with the agreed-upon syntax for pattern matches and variant declarations.
I think our patches may fix some of your other issues (like opens and module aliases at the top, handling of parens), but ocamlformat definitely has its flaws. We’ve found that all in, we’re materially happier using it and thereby getting everyone on the same page as to how formatting should work, despite those flaws.
We have plans to get all of our patches upstreamed (some changing the default behavior, some just making it possible to configure ocamlformat as we want it configured internally), but that is a work in progress.
What I would like to do is to have it output some style that is similar to what we already do
You’re making me laugh because you have the standard reaction of anyone who has habits when they see a code formatter working. You want to use a code formatter because of the benefits it brings, but you can’t unless it closely matches your own habits.
Well, most of the things you said you don’t like are done like this with elm-format too. elm-format is used by pretty much everyone in the Elm community. And when some new programmers are unhappy about the formatting style, the standard answer is: get used to it.
And trust me, you do get used to it. I’ve become so used to it that this is now the style I’m using in my OCaml programs too (even if, I agree, I was not taught to put semicolons at the beginning of lines). So I’m pretty happy in fact that these are the defaults of ocamlformat.
A “common” style will always feel “uncommon” to some. Instead of debating about this, just try using this style and you’ll see you can get used to it pretty quickly.
Aside from the “common”/“uncommon” argument, there’s at least one objective point in @Leonidas’s comment that I believe deserves discussion: the typesetting of records/lists and the difficulty of re-ordering elements using ocamlformat’s formatting.
This point is not so much about feeling or the aesthetics of the formatting, but about the practicality of manipulating formatted code, which should be the important point. And in this case, ocamlformat’s choice seems to prevent (or make harder than necessary) manipulations that are relatively common.
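To make the re-ordering point concrete, here is a rough sketch (not anything ocamlformat itself does) that renders the same record fields in both layouts and counts how many lines have to be retyped, rather than merely moved, when the first two fields are swapped. The third field `ko`, the helper names, and the "line text not seen before" metric are all assumptions made up for this illustration.

```ocaml
(* Lines of a record body in the leading-separator layout:
   "{ a", "; b", "; c }". *)
let leading fields =
  let last = List.length fields - 1 in
  List.mapi
    (fun i f ->
      let sep = if i = 0 then "{ " else "; " in
      let close = if i = last then " }" else "" in
      sep ^ f ^ close)
    fields

(* Lines in the trailing-separator layout, braces on their own lines. *)
let trailing fields =
  ("{" :: List.map (fun f -> "  " ^ f ^ ";") fields) @ [ "}" ]

(* Number of lines in [after] whose text occurs nowhere in [before]:
   a crude stand-in for "lines that must be edited, not just moved". *)
let rewritten ~before ~after =
  List.length (List.filter (fun l -> not (List.mem l before)) after)

(* Hypothetical fields; [ko] is added so that the swap does not touch
   the last line. *)
let original = [ "hest : int"; "giraf : string"; "ko : bool" ]
let swapped = [ "giraf : string"; "hest : int"; "ko : bool" ]

let leading_cost =
  rewritten ~before:(leading original) ~after:(leading swapped)

let trailing_cost =
  rewritten ~before:(trailing original) ~after:(trailing swapped)

let () =
  Printf.printf "leading: %d, trailing: %d\n" leading_cost trailing_cost
```

With the leading layout both swapped lines change text (one gains the `{`, the other gains the `;`), whereas with the trailing layout the swap is a pure line move.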
If you want the tool to be used pervasively you still have to convince existing code bases and practices to switch to it so you should expect debate. Also styles tend to evolve over time.
The situation is quite different for young languages where such formatting tools seem to be nowadays introduced directly with the language itself.
I could get used to all sorts of things. I have become accustomed to migraines for example. However, I see no reason to get used to something that I don’t like, unless of course it is unavoidable because it was imposed upon me by fate.
Therefore, you will have to show me there’s a benefit here. If you want me to voluntarily adopt this style, you will have to convince me to do so of my own volition, and so far, I’ve seen little that convinces me I would like to do so.
We mostly fall into the punctuation-on-the-left camp. It’s hardly a settled question, but there are a lot of both styles in OCaml code out there. The argument we’ve had for putting the punctuation on the left is to make it easier to visually scan and understand the structure of the code. Note that this is consistent with the agreed-upon syntax for pattern matches and variant declarations.
I don’t really agree with that. , and ; are postfix separators (and you can add ; after the last item), whereas | in a normal pattern match is a prefix separator (you can and should add one before the first branch). So I put ; at the end of each line, including the last one, which is the most consistent with putting | before each branch of a match.
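A small sketch of the syntax being referred to here, with made-up type and value names: OCaml accepts a `;` after the last record field or list element, and a `|` before the first constructor or match case. (A trailing `,` in a tuple, by contrast, is not accepted.)

```ocaml
(* ';' may follow the last field of a record type. *)
type point = {
  x : int;
  y : int;
}

(* '|' may precede the first constructor of a variant type. *)
type shape =
  | Circle of int
  | Square of int

(* ';' may also follow the last field of a record expression
   and the last element of a list literal. *)
let origin = { x = 0; y = 0; }
let sizes = [ 1; 2; 3; ]

(* '|' may precede the first case of a match.
   The circle area is a rough integer approximation, for illustration only. *)
let area = function
  | Circle r -> 3 * r * r
  | Square s -> s * s

let () =
  assert (origin.x + origin.y = 0);
  assert (List.length sizes = 3);
  assert (area (Square 2) = 4)
```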
I completely agree with @perry and @dbuenzli’s comments above. If you want ocamlformat to take over, you have to get community buy-in. The only way that’s going to happen is if the decisions ocamlformat makes match those made by the majority of code people in the community see and use. Imposing a new style that doesn’t exist and isn’t agreed upon is not going to work and ocamlformat will just remain an esoteric tool used by a small section of the community, as it is now. This is particularly true since we have another tool - ocp-indent - that does satisfy the community’s standards. Since ocp-indent is well accepted, it seems to me that ocamlformat should strive to match ocp-indent wherever possible, and then offer the additional benefits that full parsing offers on top of that.
ocamlformat is able to read ocp-indent config files, and there are various efforts to keep both tools compatible, including automatic tests. If you find any difference in the output of the two tools, you are welcome to report an issue on the bug tracker.
I’d like to bring up that in this style, the indentation of the list of record fields is 4 rather than 2. The hypothesis is that a consistent offset between blocks of important material is more readable than otherwise.
module Model = struct
  type t =
    { symbol : string
    ; edge : float
    ; max_edge : float
    ; bsize : int
    ; bid : float
^^
  ^^^^
Shifting the fields to the left gives us:
module Model = struct
  type t =
  { symbol : string
  ; edge : float
  ; max_edge : float
  ; bsize : int
  ; bid : float
  }
^^
  ^^
It’s also an argument against these styles:
module Model =
struct
  ...
end

type t =
  {
    ...
  }

type u = { foo : int;
           bar : float }
Frankly, I strongly prefer this:
module Model = struct
  type t = {
    symbol : string;
    edge : float;
    max_edge : float;
    bsize : int;
    bid : float;
  }
end
The main arguments I find relevant when discussing this style are:
Indentation of stuff that matters should be consistently 2 for readability (the argument above).
I have been working in Elm for a couple of years where the agreed style is like:
{ symbol : string
; edge : float
From my experience, having the first element different is quite annoying: anything to do with the first element requires unnecessary extra operations, and sorting is a pain.
In Elm this style is mostly imposed by the compiler, as it can’t handle a leading or trailing comma. But in OCaml this is fine.
Here are my 2 cents as a relative newcomer.
I started using ocamlformat from the very beginning, as I wanted to try the idea of “one style to rule them all”. Previously I always disagreed. Now I do agree that we need one style, but I very much disagree with the current ocamlformat style, although I still use it.
I wholeheartedly agree with the common theme in this topic.
The only way I can produce ocamlformat's style is by using ocamlformat. I write it in some other (much closer to what most people here propose) style and then run ocamlformat.
I didn’t find this to be a problem in my experience with elm-format and ocamlformat. I have ocamlformat running every time I save my file and it helps me detach myself from formatting considerations. This means that I don’t even try to produce something that looks like reasonable formatting anymore; I let the tool do it for me.