Welcome new maintainers of opam repository, and introducing Obi

Following this thread, it became apparent that we need more maintainers of the opam package repository. The call for maintainers was very successful, and I am pleased to have expanded the team to include the following members: @perry, @djs55, @rgrinberg, @yomimono, @jpdeplaix, @hannes, @mseri and @hcarty.

They join the existing hard working maintainers, who are Damien Doligez, @whitequark, David Sheets, @samoht, @amir, @gasche, @lefessan, @yallop and Gregoire Henry.

So many contributors need a little more coordination, and luckily the OCaml 4.06.0 release has set our opam world on fire, just in time for the newcomers to assist :slight_smile: There has been a steady increase in the number of build failures, but the OCaml 4.06.0 release turned “safe-string” on by default, resulting in thousands of packages failing to build with the new release.

In order to keep up, I’ve built a prototype tool called “obi” (OCaml Build Infrastructure) which does tens of thousands of package builds on a build cluster hosted at OCaml Labs (for x86_64), Packet.net (for arm64) and IBM (for ppc64le). The log outputs are available at: http://obi.ocamllabs.io/

Some things I want to draw your attention to:

  • There are logs per compiler version that are rebuilt roughly daily across a suite of compiler releases. In particular, we build using 4.06.0 and the common variants (default-unsafe-string and flambda) to check for regressions there.
  • There is a triage page which runs an analysis over the results to classify failures. There are two areas right now which need attention: fixing regressions due to the 4.06.0 safe-string default, and failures with flambda.
  • The safe-string triage has thousands of packages failing, which require either constraints being added (for older revisions) or, ideally, a new release of the package that works with safe-string. This is where we could use help from the whole community to bring our repository back up to speed.
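
For anyone new to the safe-string failures, the usual shape of the fix is small. Here is a minimal sketch of code that mutates a string in place and one way it can be ported to Bytes. The function upcase_first is made up for illustration and is not taken from any package in the repository:

```ocaml
(* Hypothetical example, not taken from any real package.

   Before OCaml 4.06 (or with -unsafe-string), code like this compiled
   because string and bytes were the same type:

     let upcase_first s =
       s.[0] <- Char.uppercase_ascii s.[0];
       s

   With safe-string on, strings are immutable, so the mutation is done
   on a Bytes.t copy instead. *)
let upcase_first (s : string) : string =
  if s = "" then s
  else begin
    let b = Bytes.of_string s in        (* mutable copy of s *)
    Bytes.set b 0 (Char.uppercase_ascii (Bytes.get b 0));
    Bytes.to_string b                   (* back to an immutable string *)
  end

let () = print_endline (upcase_first "obi")
```

Packages that cannot be ported quickly are the ones that need an upper bound on the compiler version in their older revisions instead.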

As to where this is all going, it will be worked on at OCaml Labs over the next 12 months to improve the tooling considerably as OCaml continues to grow in popularity:

  • Obi is still an experimental deployment and all the URLs above are subject to change, so please don’t hard-link to it from anywhere important.
  • All the metadata is available online as sexp files, and so Obi will also have a CLI tool that implements the workflow. I just wanted to get the HTML build logs up as soon as possible to allow our new maintainers to help fix problems.
  • The non-x86 and non-Debian logs are available, but not yet rendered. If someone has a burning need to inspect them, let me know and I’ll push it up the priority queue.
  • Eventually, the manual opam-repository fixes can be automated via the CLI, but that feature obviously isn’t there yet. It will likely land only when we complete the release of opam 2.0 and can take advantage of some of the automation features.
  • Only Linux-based operating systems are being built right now, but we are expanding the cluster to include at least Windows, FreeBSD, OpenBSD and maybe OSX if we can figure out how to automate it. If anyone is particularly interested in helping with automation (e.g. something similar to containers) on the *BSDs in particular, that would be helpful.

For now though, let’s congratulate our new opam maintainers, and hopefully get to fixing some of these safe-string issues :slight_smile:

Anil, your friendly opam-repository overseer


Thanks to everyone for volunteering to help!

I’m generally a relatively absentee co-maintainer (there is no shortage of other fires to put out), but I participate in triaging in bursts, and in general repository care over the long term (typically during preparation periods for new language releases).

For PRs, my general approach is to first look at all PRs that are green, check quickly that they are reasonable and merge them, and then do a linear sweep over the non-green PRs. Being non-green often corresponds to one of the following:

  • There is a build error with the package under some OCaml version or OS. I consider that packagers should fix this before a merge, so I report the error and move on. (If I can make guesses at what causes the error, I make them.)
  • There is a dependency error on some OS that doesn’t have the right depexts. I try to complain about it as well, although I wouldn’t consider this a blocker for merging if the authors don’t know how to improve that.
  • There is a build error in some other package in the dependency chain. I complain to the maintainer. Sometimes they need to make a bound stricter, but it is also often the case that the problem is not with their package but with the dependency, or with a cross-cutting aspect. I wouldn’t hesitate to encourage the packagers/submitters to deal with the issue themselves (the more external people you can get to curate the repo, the better). If the issue is not solved in the submission’s opam file or in other versions of the same package, it is better to ask for a separate PR than to deal with several problems at once in the same PR – it will make CI reports easier to read, and otherwise you can end up with a bloated PR touching many different things at once, and never know what goes wrong.
  • Sometimes the CI is just wrong about things. In my experience it happens less often with Travis, which does simpler tests (they’re less informative but correspondingly more robust); I wouldn’t merge a package that keeps failing under Travis, but if it passes Travis and fails for a weird reason (lack of entropy, End_of_file or CentOS or whatever) you can merge and move on.
  • Sometimes the CI is right about something that we decide to ignore. For example, during the 4.06 release preparation, sexplib was broken under 4.06 and any package with a sexplib dependency would have a 4.06 build failure because of that; there is not much that we can do besides merging with a known failure.

@avsm: thanks for the bulk build results page, it looks very useful already.

The “safe-string build results” look ripe for a coordinated community action – not a fixed-time sprint, but at least a separate Discuss topic to coordinate, encourage people to participate and report on progress. Should I take care of starting this?

What is the refresh rate for the page? Did you do a dump today and will you do others regularly, or is this updated incrementally when the repo changes? How long should crowd-working contributors wait to see the effect of their fixes on the shared report?

Another thing maintainers may help with is to curate and triage the opam-repository issue tracker so that the number of issues does not get out of hand and concrete build problems get answered in a timely fashion.

This ranges from helping newcomers who run into build problems with opam/depexts itself, to pinging package maintainers when there’s a build problem reported for their package, to helping resolve issues with these packages when the maintainer is too busy or missing in action.


BTW, is there any feedback mechanism in the obi infrastructure that will tell a package maintainer that their package is failing a bulk build? I’m guessing not yet (after all, you only just created it), but one of those might be useful at some point if you ever have time, perhaps an automated email or some such. That way package owners will know something is wrong.

Also, does the obi triage page only show us “package itself is broken” or does it also show us “prerequisites for this package are broken”?

It’s building an old version of crunch (1.3.0); the newest released version is 2.1.0.

http://obi.ocamllabs.io/logs/88c27ea217115f89f875f286f77a4d38.txt

The solver seems to have gotten a strange result when trying to build mirage-clock – the triage page marks mirage-clock 1.3.0 as a failing build (log), and the build did indeed fail, but it’s because fmt version 0.7.1 was selected to fulfill the fmt dependency (the current version is 0.8.4 which installs fine on 4.06.0, as does mirage-clock 1.3.0). The container images doing these builds are using a sensible solver, right?

While it’s true that many packages are missing required lower bounds on their dependencies (and some automated work could be useful in addressing that!), I’d be surprised if this was intended behavior.

There have been tweaks to the solver criteria recently, so that may be a regression: I’d need to know the exact version of opam this was done with. I couldn’t reproduce it locally on master, though.

Some dependencies are solved differently depending on the order of installation. For example, I had a problem with biniou 1.0.6 when trying to install merlin 3.0.4, but a more recent version was chosen if yojson or conf-which were already installed. Also, I have utop 2.0.2 installed successfully while it failed on Obi. I suppose that’s for the same reason. BTW, I’m using the latest OPAM master.

A separate topic would be great – particularly with respect to releasing new versions of packages rather than simply constraining them.

While Obi is in experimental mode, the page will refresh every few days since it is semi-manually driven. The reason for this is important – the new CI runs on opam2-dev only since it takes advantage of new features (and @altgr is awesome at adding the interfaces we need for good CI and can’t do so on opam1).

Therefore, it is prone to running into bugs in opam2-trunk that cause transient blockage and so won’t be fully automated until opam 2.0 is released. We also have some instability coming up as we hook Windows tests into it as well, which may require a rearrangement of the container layouts.

However, I hope it is useful in its fledgling form today to highlight packages that can be immediately fixed. I am too short on time due to other duties to do the actual opam-repository work myself at the moment, unfortunately.


There is no such feedback mechanism yet, although I expect we will build that in the future. For now, I hope that having extensive build logs about maintainers’ own packages will make things easier for them to fix (as indeed, the datakit-ci reverse dependency checks have done).

The Obi triage page shows that a package is broken, including when one of its dependencies is down. @Altgr, do you think it is worth distinguishing a dependency failure from a failure of the install target itself in the opam install error code?

One tool that should come online soon to make this less manual is opam admin add-constraint (or a variation thereof for compiler revisions). @Altgr is working on a bidirectional branch for opam1/2, so in theory (if this works – he may well contradict me as being mad) we could use the opam2 administration tools to add constraints that reflect back into the opam1 git repository :slight_smile:

I’m reluctant to spend any time on improving the opam1 administration tools while we have our hands full with getting opam2 released, so this ability to use opam2 to improve opam1 would be very, very handy to have.

@altgr Does this by any chance mean that opam tries to pick the smallest compatible version for a given package? If that’s the case, that’s very useful: I think many build failures (at least in my packages) can be attributed to a lack of lower bounds.

I think the reason to distinguish them is, at least in part, that one should focus on the packages that are themselves known to be broken. The packages depending on a broken package would often be fixed as soon as the dependency is fixed.
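
To make the distinction concrete, here is a rough sketch of how failures could be split into packages that are broken themselves and packages that only fail because a dependency does. The package names and the dependency table are made up, and this is not how Obi computes its triage page; it only assumes we have each package's direct dependencies and the set of failing packages.

```ocaml
module S = Set.Make (String)

(* Made-up data: each package with its direct dependencies. *)
let deps = [
  ("app",  ["lib"; "util"]);
  ("lib",  ["core"]);
  ("util", []);
  ("core", []);
  ("tool", []);
]

(* Made-up bulk build result: every package reported as failing. *)
let failing = S.of_list ["app"; "lib"; "core"; "tool"]

(* A failing package is blocked if at least one of its direct
   dependencies is failing too; otherwise it is a root failure.
   Direct dependencies are enough here: a failure deeper in the chain
   also makes the direct dependency fail. *)
let split_failures deps failing =
  let has_failing_dep pkg =
    match List.assoc_opt pkg deps with
    | None -> false
    | Some ds -> List.exists (fun d -> S.mem d failing) ds
  in
  (* S.partition returns the elements satisfying the predicate first. *)
  S.partition (fun pkg -> not (has_failing_dep pkg)) failing

let roots, blocked = split_failures deps failing

let () =
  Printf.printf "fix first: %s\n" (String.concat ", " (S.elements roots));
  Printf.printf "likely fixed afterwards: %s\n"
    (String.concat ", " (S.elements blocked))
```

With this toy data, core and tool come out as the packages to fix first, while app and lib are only blocked by them. A blocked package can of course still have problems of its own, which only show up once its dependency builds again.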

No, not by default; the best on the user side is still to try to get the latest possible version of everything.
See here for the changes, which were performance-oriented but shouldn’t change the end results.

You can, however, try to install lowest bounds by maximising, instead of minimising, the count[version-lag,changed] criterion. Note that mccs doesn’t perform well with maximisations, though. aspcud is still an option for such cases. Since the criteria format is annoyingly different, see opam config report --solver aspcud for the defaults. The criteria you want would be something along the lines of -count(removed),-sum(request,version-lag),-count(down),-count(changed),+sum(changed,version-lag). For installing mirage-clock on an empty 4.04.2 switch, for example, this gives me:

  ∗  install ocamlbuild    0.9.1     [required by fmt]
  ∗  install conf-m4       1         [required by ocamlfind]
  ∗  install ocamlfind     1.6.2     [required by jbuilder]
  ∗  install jbuilder      1.0+beta9 [required by mirage-clock]
  ∗  install fmt           0.7.0     [required by mirage-device]
  ∗  install mirage-device 1.1.0     [required by mirage-clock]
  ∗  install mirage-clock  1.3.0

instead of

  ∗  install ocamlbuild    0.11.0     [required by fmt]
  ∗  install conf-m4       1          [required by ocamlfind]
  ∗  install result        1.2        [required by fmt]
  ∗  install uchar         0.0.2      [required by fmt]
  ∗  install ocamlfind     1.7.3      [required by jbuilder]
  ∗  install topkg         0.9.1      [required by fmt]
  ∗  install jbuilder      1.0+beta14 [required by mirage-clock]
  ∗  install fmt           0.8.4      [required by mirage-device]
  ∗  install mirage-device 1.1.0      [required by mirage-clock]
  ∗  install mirage-clock  1.3.0

It has been proposed in the past to use the “smallest version possible” by default in CI, but actually, since we now have a much better CI that can test many more things, why not just do both?
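
To illustrate why testing both ends is worthwhile, here is a toy sketch of the two selection strategies. It is not how opam or its solvers work (the real criteria are the ones quoted earlier in the thread), and the version parsing only handles plain dotted numbers. The fmt version numbers are the ones mentioned in this thread, while the lower bound is hypothetical.

```ocaml
(* Parse a plain dotted version into a list of integers, so that
   polymorphic compare orders versions component by component.
   Real opam versions use Debian-style ordering; this toy does not. *)
let parse v = List.map int_of_string (String.split_on_char '.' v)

let compare_versions a b = compare (parse a) (parse b)

(* Keep the versions that satisfy an optional lower bound. *)
let candidates ?at_least versions =
  match at_least with
  | None -> versions
  | Some bound -> List.filter (fun v -> compare_versions v bound >= 0) versions

(* Oldest and newest candidate, or None if nothing satisfies the bound. *)
let lowest ?at_least versions =
  match List.sort compare_versions (candidates ?at_least versions) with
  | [] -> None
  | v :: _ -> Some v

let latest ?at_least versions =
  match List.rev (List.sort compare_versions (candidates ?at_least versions)) with
  | [] -> None
  | v :: _ -> Some v

(* The fmt versions mentioned in this thread. *)
let fmt_versions = ["0.7.0"; "0.7.1"; "0.8.4"]

let show = function None -> "none" | Some v -> v

let () =
  (* No lower bound declared: the two strategies pick different versions. *)
  Printf.printf "latest: %s, lowest: %s\n"
    (show (latest fmt_versions)) (show (lowest fmt_versions));
  (* Hypothetical bound fmt >= 0.8.0: both strategies now agree. *)
  Printf.printf "with bound: latest: %s, lowest: %s\n"
    (show (latest ~at_least:"0.8.0" fmt_versions))
    (show (lowest ~at_least:"0.8.0" fmt_versions))
```

A CI run that installs the latest versions checks what most users will get, while a run that installs the lowest versions is the one that exposes a missing lower bound.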


When going over the current list of open PRs against opam-repository, we can waste some time looking at issues that have already been taken care of by another repository maintainer.

I would propose to use “assigned” issues to avoid this. If, as a maintainer, you do the work of understanding an issue and don’t merge right away, but feel confident that you will be able to take care of the interaction with the submitter from now on, feel free to just assign yourself the issue. Then other maintainers are free to skip this issue when looking for new issues requiring their attention – of course, giving a second opinion on an assigned PR is always welcome.


@Chris00 over at https://github.com/ocaml/opam-repository/pull/10821 is asking for someone to test a configuration script under OSX. Among the new or old opam-repository maintainers, is anyone running OSX and willing to do this?

(For the record, Iā€™m running Fedora and could help debug Fedora-specific issues. Fedora is pretty bad for OPAM, though, given that there is no decent external solver packaged for recent releases.)

Thanks for publicizing my request. Travis says all is fine, including on Mac OSX, so I guess I put in the right compilation instructions… :slight_smile:

Now it is @UnixJunkie’s turn to need a MacOS user’s hand to try to build a conf-package in #10926. It would be mildly helpful to have a list somewhere of which kind maintainers to ping in that situation.