Maintenance bottlenecks in the compiler distribution

Hi everyone,

This is a public announcement that we are experiencing a maintenance bottleneck in the development of the OCaml compiler distribution (the github/ocaml/ocaml repository).

Our development process naturally generates a fair amount of maintenance work to, among other things, discuss and integrate proposed patches, fix bugs, and react to feature requests. We don’t have enough people doing this maintenance work; currently the vast majority of this work is being done by about 5 people: David Allsopp, Florian Angeletti, Nicolás Ojeda Bär, Xavier Leroy, and myself.

Despair not! Bug fixes tend to be prioritized and handled quickly; I believe that OCaml releases remain of satisfactory quality. But other aspects are negatively affected, for example:

  • our ability to react to proposed changes in a timely manner,
  • the experience of people trying to contribute to the compiler codebase,
  • various potential improvements that get stalled by lack of manpower to work on them.

Context

The OCaml compiler distribution moved to GitHub in January 2014. Since then, maintainers have constantly complained that there are more people willing to submit changes/PRs than people willing to review them, creating a bottleneck on the reviewing side. (We point this out in the first section of our CONTRIBUTING.md document.)

But the effort to upstream Multicore OCaml has unfortunately made the situation worse, for at least two reasons:

  • Integrating the completely new Multicore runtime required a lot of review, integration and documentation work. We onboarded experienced Multicore developers as upstream maintainers and this helped smooth out the process, but we have still been less available for other maintenance tasks, which piled up in the meantime.

  • The sequential glaciation indirectly reduced the maintenance workforce. In November 2021 we stopped merging non-multicore-related features in the development version; as a result, various maintainers and heavy contributors moved away from working on the main development branch, to do their experiments in separate repositories (which is completely fine), and also more or less stopped following issues and performing maintenance on the main branch (which aggravated the maintenance issue).

Now that OCaml 5 has been released, many contributors will be coming back with many exciting change proposals to upstream. At the same time, our users are playing with Multicore features and will soon find countless bugs to fix, limitations to lift, etc. It’s not easy to play OCaml maintainer right now.

What can people do to help?

Contribute to the maintenance effort

Heavy contributors, in particular (but not only) core developers, should be expected to participate in this collective maintenance effort. We are having discussions right now about our expectations.

In my personal opinion, anyone who dedicates a substantial portion of their time to working on code intended for eventual upstreaming should dedicate a fraction of this time to collective maintenance of the upstream development trunk. (10%? 20%? Something like this.) This is the healthiest way to ensure that the volume of maintenance work scales with the volume of submissions. (If you are paid by someone to work on the compiler, please make sure that your pay also covers this maintenance fraction.)

Occasional contributors who would like to help with OCaml development should also consider whether they can help with this. (No pressure!) We have several instances of people helping with code reviews, triaging, helping make decisions on design questions etc. (I remember nice contributions in this direction from Daniel Bünzli, Gabriel Radanne, Nathanaëlle Courant, Favonia, Guillaume Munch-Maccagnoni and Kate for example.)

Generate less maintenance work

If you interact with the compiler distribution as a software project, please be mindful of the maintenance load that you generate.
If you send a Pull Request, make sure that its purpose/justification is explained very clearly, that it is easy to review, and that the benefits of the change (explain those clearly) outweigh both the short-term and the long-term costs of integrating it.
Similarly for feature requests or enhancement proposals: now is the time to focus on uncontroversial things that are clear improvements, and to justify and explain them very clearly.

How?

It may not be immediately clear to people what “contributing maintenance work” means concretely. Right now I see three obvious approaches.

  1. Subscribe to github/ocaml/ocaml notifications and jump in when you want.

  2. Look at our issues (258 open as I write this) and see whether you think you can help. Maybe some are out of date / irrelevant and could be closed – say so. Maybe some bugs could be fixed, or some enhancement requests could be fulfilled. If you can, give it a try. It’s best to start with issues where the desired outcome is consensual (a clear bug to be fixed, with no immediate downside; a small interface improvement that does not introduce much complexity and is well-justified; etc.), rather than work on some weird syntax proposal that will in turn require ample discussion and may be turned down in the end. (If you find a wonky proposal that failed to gather consensus and probably never will, it’s actually helpful to suggest closing the issue.)

  3. Look at our pull requests (246 open as I write this) and try to see whether you can help. Again, it’s best to focus on PRs where there is a clear motivation/need. Look at the code, feel free to ask questions on things you don’t understand or comment on aspects you don’t like so much. If the PR is stale, maybe it should be rebased (would you like to give it a try?), or there isn’t much that can be reused and it could be closed – feel free to say so.

We have received the feedback that some people are still unsure what to do. In the upcoming weeks (probably in January) we will have more discussions about how to organize maintenance, to find more focused processes that encourage people to contribute in this way. I don’t think that there is a silver bullet, a magic process that will make it much easier, so I would encourage anyone interested to first try those three basic approaches above and see if one works for them.

In my experience, people often self-censor and do not try to react to PRs or issues that are not in their area of expertise. But most of the compiler codebase is in only a very few people’s area of expertise; the rest of us (myself included) just make do with our imperfect understanding and try to help anyway. Do not hesitate to walk into issues outside of your comfort zone: it is a great way to learn about the compiler distribution codebase.

Happy maintaining!

Another possible thing to do is:

  1. Filter the issues/PRs you opened yourself and assess whether they are still relevant or reasonable at this point in time or for the foreseeable future (ideas and pie-in-the-sky stuff can be discussed to death on this forum).

I remember doing this a few times on the opam project, and I ended up closing a few things I had requested a few years earlier.

Note that many (most?) open-source communities have faced similar issues, there is a thread on the Rust discuss about the exact same issue:

Apparently the approach that the Rust people are taking is to have people sign up as “reviewer candidates”, and have their GitHub bot assign each new PR to one of the reviewer candidates at random, giving greater priority to potential reviewers that have a short review backlog. It is interesting that, at least in that discussion, there seems to be no emphasis on the idea that people who submit PRs should also contribute to review work.
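To make the assignment policy described above a bit more concrete, here is a minimal OCaml sketch of a backlog-weighted random choice. It is not the Rust bot’s actual code: the `candidate` type, the weight 1 / (1 + backlog) and the names `weight`, `pick_with` and `pick` are all assumptions made for illustration.

```ocaml
(* Hypothetical model: a reviewer candidate and the number of PRs
   currently waiting on them. *)
type candidate = { name : string; backlog : int }

(* Assumed weighting: a shorter backlog gives a larger weight.
   1 / (1 + backlog) is an arbitrary choice for this sketch. *)
let weight c = 1.0 /. float_of_int (1 + c.backlog)

(* [pick_with r candidates] selects a candidate from [r], a number drawn
   uniformly in [0, 1): scale [r] by the total weight, then walk the list,
   subtracting each weight until the target falls inside one candidate's
   share. Returns [None] when there are no candidates. *)
let pick_with r candidates =
  let total = List.fold_left (fun acc c -> acc +. weight c) 0.0 candidates in
  let rec go target = function
    | [] -> None
    | [ c ] -> Some c (* last candidate absorbs floating-point rounding *)
    | c :: rest ->
        let w = weight c in
        if target < w then Some c else go (target -. w) rest
  in
  go (r *. total) candidates

(* Draw from OCaml's global [Random] state. *)
let pick candidates = pick_with (Random.float 1.0) candidates

(* Usage: alice (empty backlog) is chosen about four times as often as
   bob (backlog of 3), since their weights are 1.0 and 0.25. *)
let () =
  Random.self_init ();
  let reviewers =
    [ { name = "alice"; backlog = 0 }; { name = "bob"; backlog = 3 } ]
  in
  match pick reviewers with
  | Some c -> print_endline ("assign to " ^ c.name)
  | None -> print_endline "no reviewer candidates"
```

Separating `pick_with` from the random draw keeps the selection logic deterministic and easy to test; a real bot would also need to refresh backlogs as reviews complete.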

The whole narrative around “contributing” and “PR welcome!” in OSS completely glosses over the fact that making a PR is only a tiny fraction, and the easiest one, of the work needed to integrate suggestions from random people on the internet into a cohesive result.

Maybe one thing you could try is to add a kind of tit-for-tat statement to CONTRIBUTING.md. Something of the form (better worded): we are happy to take your work, but integrating and maintaining it takes time and resources, so there is a social expectation that for every PR of yours that gets merged, you will do a full review of another PR.

It’s not obvious to me what should be done after a successful local rebase. It’s unlikely that any random person has the rights to force-push to every possible PR. Opening a new rebased PR would create more maintenance work, and would probably make the previous discussion harder to follow.

If the PR has not had any activity in a long time, it is likely that the original author has moved on, but it would be rude to assume that. So I would leave a note in the original PR, both asking whether the original author is interested in resuming work on the issue and letting them know that you have rebased the work in a different PR (no need to wait to open the separate PR, it can always be closed if needed).

Either the author will respond, or not, in which case the original PR can be closed and the discussion can be continued in the new one. Generally speaking, the original author should be credited in the new PR (and in the Changes entry when/if the PR is merged).

Lastly, note that sometimes PRs become stalled because of genuine issues with the design or implementation of the code. Absent a resolution of those issues, rebasing the PR will not really get it any closer to being merged.

Cheers,
Nicolas

I would:

  • push the rebased branch on my remote fork
  • post it on the stale issue, asking the author if they are willing to adopt it
  • after a few weeks without a reply, go ahead and submit a new PR and invite maintainers to close the previous one

Informed by further discussions on this topic, I proposed a change to the upstream CONTRIBUTING.md file to reflect our expectations regarding “collective maintenance”: https://github.com/ocaml/ocaml/blob/5e243d502e65b3e6df6023b4b8d7ebe3c7cc0609/CONTRIBUTING.md#collective-maintenance

The text as it currently stands is included below. Note that it describes expectations for all frequent contributors to the OCaml compiler codebase (proportionally to the volume of contributions: no expectation in practice for infrequent authors of small contributions), so it is relevant to more people than just the team of core developers.

Proposing changes to the OCaml compiler distribution generates
“maintenance work” for other people. Maintenance work includes, for
example:

  • reviewing Pull Requests or language change proposals,

  • considering change suggestions and giving feedback to turn them into
    actionable issues,

  • implementing bug fixes or feature requests of general interest,

  • improving the documentation of the tools or other usability aspects,

  • or documenting or clarifying the codebase to preserve and improve
    our ability to change it in the future.

Doing this collective maintenance work is a selfless task, and we
typically have far fewer people willing to do it than people
willing to submit new language features or generally evolve the
codebase for their own specific needs. Without a collective effort to
participate, we end up with a handful of people doing the vast
majority of this collective maintenance work. This is exhausting, does
not scale, and slows down the pace of improvement of the compiler
distribution.

To keep a healthy open source project, we need the total maintenance
work performed by all contributors to scale proportionally with the
total demand for maintenance work they generate. This can only work if
as many contributors as possible perform some (possibly small) amount of
maintenance work: collective maintenance. One could use the metaphor
of a shared house: things work well when most people, not just a few
people, participate in the house chores.

If your contributions generate maintenance work for others – in
particular, if you spend a substantial effort working on a change to
the language or compiler codebase meant to be eventually proposed
upstream – we expect that you will spend a fraction of your
contribution time on maintenance tasks, typically on the parts of the
compiler codebase that you are already working on. This approach is
good for the project, and also for you: helping maintain the codebase
will improve the quality of your own contributions, and the social
ties created by infrequent collaboration with other contributors will
be useful when submitting your own work.

Note: we have been asked whether groups of contributors could balance
maintenance work at the level of the whole group, rather than
individual contributors – for example a company where some frequent
OCaml contributors would do less maintenance and others would do more
to compensate. Yes, that sounds reasonable, but also harder to balance
than encouraging everyone to play nice individually.

I’m happy to hear any questions or remarks on this topic 🙂
(The text is a first iteration and may be refined later.)

To me, the most surprising thing about OCaml PRs is probably that simple, obvious PRs that would be easy wins get bikeshedded or ignored for long periods of time. For example, in https://github.com/ocaml/ocaml/pull/11993 Prof. Leroy writes:

Patiently waiting for the next “what about” comment…

Or this seemingly no-brainer that has been sitting without any go-ahead since 2018: https://github.com/ocaml/ocaml/pull/2170

As Leo White writes:

Could I suggest not getting too bogged down in discussions on what we would do in the hypothetical situation that we wish to extend the signature of these modules.

Maybe we can say that getting ‘ignored’ is just a symptom of the real bottlenecks. But I don’t think the bikeshedding is. Not all PRs are the same; some require careful review and discussion. Others need little more than rubber-stamping. The rest fall somewhere on the spectrum. In the OCaml review process I see a culture of bikeshedding on basic design and alternatives after the implementation is already up for review in someone’s PR. Sure, we could argue this is a holdover from the days of Mantis, when it was much easier to discuss issues in PRs, but those days are over now. Maybe we need to re-examine this culture.

I don’t really agree with your assessment. Some points about bikeshedding:

  • It is easier to just give opinions than to do a full review; if you feel that a PR discussion is full of opinions, sometimes the underlying issue is the lack of people available/motivated to do a full review. (But the opinions are already useful; for example, #11882 got many opinions and no full review, and this sufficed to make good progress on the design for a new iteration of the PR.)
  • I don’t agree with your “no-brainer” assessment on either of the two PRs you cite.
    • #11993: Changing the tools we install by default could have unforeseen effects on our users (who knows how those tools are used?), so we need a broad discussion with the community to make sure that a change is safe/acceptable. If we merge this as a “no-brainer” and then, at the next release, users end up complaining that their workflow broke, and we have to revert, this is a fair amount of time wasted for everyone. Of course, there is a risk of this happening no matter how long we discuss the PR, but I think it is important to take the time to hear people on this.
    • #2170: I fail to see how you can consider a fundamental change to many modules of the stdlib API a no-brainer. We don’t have operators in the stdlib for now (except in the initially-opened namespace), so figuring out how to do this right is tricky. Binding operators are also a new feature, and people don’t yet agree on the right way to design them (there is another PR open that could change our syntactic choices quite a bit).

The standard library is held to very strong backward-compatibility expectations, which make mistakes costly. This is one reason why there is a lot of hesitation about large stdlib changes. (For a while it was even completely frozen. The situation has improved a lot in the last few years, thanks to a few brave souls, notably @nojb, @c-cube and @dbuenzli.) We could relax this backward-compatibility requirement, but this would also have costs in the ecosystem.

The OCaml development process is very much not a “move fast and break things” sort of thing. People tend to be averse to change by default. This has costs – it can certainly be frustrating to contributors – but it also brings a lot of value for users of the language, who rarely need to be worried about moving to a new language version. I think that some of what you call a bike-shedding culture is part of what preserves this value for users.

Of course, there is also superfluous bike-shedding, and I am sure that there are various aspects of our change process that could be changed – including various aspects that I cannot see/recognize myself as I am too used to the current ways of doing things. But I remain of the impression that the main source of frustration for contributors is the delay in getting a decision on their PR, and that this is caused not by the discussions but by the lack of workforce to take those decisions – at the level of quality we have come to expect of these decisions. We could, certainly, decide to worry less about changes and be more trigger-happy with the merge button, but do you really believe that we would get a better language in the end?

In the OCaml review process I see a culture of bikeshedding on basic design and alternatives after the implementation is already up for review in someone’s PR.

What would you propose as a different way of doing things?

Let me give an overall answer instead of to each point. In my humble opinion we need a little bit of planning and notice of upcoming planned work to give people the chance to respond. E.g. imagine somewhere a list of upcoming planned changes:

  • Reduce size of compiler distribution by compiling the following tools to bytecode only (more details as needed)
  • Add submodules containing standard let-operators to the following data modules in Stdlib
  • Etc.

Should this be in the GitHub issues? Maybe. The planned work would get jumbled up with all the incoming issues filed by people. Maybe that is fixable with a ‘planned’ label. An RFC? That feels very heavyweight and bureaucratic for simple changes. Something else? Perhaps.

In any case, giving people a chance to discuss before actually doing implementation work could save some time and frustration. There should be a cutoff period for the decision though. I don’t think we needed three years to decide to add some new operators in nested submodules in Stdlib.

@gasche: I have a couple of questions.

  1. Would making a PR that removes the hardcoded CC=cl and LD=link for MSVC (when CC and LD are supplied, as in CC=.../MSVC/14.29.30133/bin/Hostx64/x64/cl.exe LD=.../link.exe ./configure) count as maintenance work? I think it falls under “implementing … feature requests of general interest”, but I’m really trying to tease out what “general interest” means (because only a fraction of the currently small Windows user base would use it). Obviously, that example PR would have to include improving the MSVC detection logic (i.e. if test x"$cc_basename" = "xcl" is a very brittle detector for MSVC), and that part is unambiguously maintenance work.
  2. (More important!) Is it fine to test things on OCaml 4.14 rather than OCaml 5 trunk? I have no ability to test the OCaml 5 trunk until MSVC is available for OCaml 5, and that won’t be soon. Obviously within reason; that is, the changed code would have to be similar between the trunk and 4.14.

Thanks.

Regarding whether a given change is going in the right direction for Windows, I have no idea, you know better than I do. The person who would judge the merit of such a PR would quite probably be @dra27. In general, we are in favor of making the life of OCaml Windows users better (assuming that the cost in complexity of the overall system is proportionate) – I wouldn’t assess the benefits just from the current Windows+OCaml userbase but rather the potential future userbase.

Testing on 4.14: yes, this is fine. (Maybe someday you will be able to test on top of ocaml/ocaml#11835, “Restore the MSVC port of OCaml” by dra27, instead.)

Yes, it’s a “bug” that configure ignores CC and LD when targeting MSVC (we’ve had similar issues with CFLAGS, CPPFLAGS and LDFLAGS in the past). Apart from anything else, I’ll happily review a PR which fixes that 🙂

As an aside, there is a slight issue with hard-coding those paths, in that upgrades to the packages within a Visual Studio installation can change them (which is why I went down the msvs-detect route, ensuring that the environment is set correctly in opam switches instead), but I’m guessing that you’re both aware of that and mitigating it in Diskuv.

Yes, definitely - I’d open the PR against trunk. I’m in two minds as to whether that specific fix should be back-ported to 4.14, but regardless I’d expect to test it either against 4.14 or on top of ocaml/ocaml#11835. Incidentally, Visual Studio 2022 17.5 is now released, so testing that PR no longer requires the “Preview” release of Visual Studio.

Thanks @gasche and @dra27; I understand the policy much better now.

I wouldn’t use a hardcoded path in an opam switch. My installer makes a “system” set of ocamlc, dune, etc. binaries available to Windows users outside of an opam switch. That lets newcomers start OCaml programming with dune.exe and utop.exe without knowing anything about opam. That “system” relies on my with-dkml shim for dune, which provides the MSVC and Unix environments. The hardcoded path change would let me move the shim off dune.exe and onto cl.exe and ml64.exe, which would expand the set of binaries that Windows users can use outside of an opam switch (e.g. ocamlc, ocamlbuild, etc. would work).

I opened an issue for myself to check whether the upgrade edge case was handled; I remember thinking about it, but sometimes that doesn’t translate into doing it. Regardless, it is a tiny change to handle that in the with-dkml shim that provides the environment.

But that reminds me … I don’t think C compiler build number upgrades (e.g. 16.6.30309.148 to 16.6.30320.27) will work in general. I’ve seen weird failures when GitHub does rolling upgrades of the Visual Studio environments in GitHub Actions: the C .obj and .lib files are cached on a machine with one build number, then a machine with a different build number downloads the cached objects, links against them, and fails (I guess flexdll*.obj would be one such cached object, and that the exported symbols differ slightly between VS builds). Anyway, that is a different but necessary discussion. (The lack of build-number robustness may limit the scope of relocatable opam switches to a single machine. The utopia would be to build a switch on a GitHub/GitLab CI machine and have it installed on an end-user machine.)

I wasn’t being very clear. I wouldn’t want it back-ported to 4.14 except in the Diskuv OCaml repository. My real concerns are about having patches in my own repositories; I never want to blindly walk into a situation where I make use of a patch that won’t be accepted upstream. So understanding the compiler PR policy is important to me (and I suspect others).
