[ANN] opam-ci: first release of a tool to check the health of your packages

Cross-posted from https://github.com/ocaml/infrastructure/wiki/Using-the-opam-ci-tool

Have you submitted a new opam package after testing it on your desktop, and then wondered whether it builds on OpenSUSE or CentOS, on an ARM or PowerPC architecture, or against a different version of the OCaml compiler? Or whether older versions continue to work a few years after being published? You probably don't have the resources to check all these combinations manually, especially as the opam package database now contains thousands of revisions of OCaml source code. The new opam ci plugin comes to the rescue! The remainder of this post describes its uses, how to triage issues, and how to fix them.

opam-ci provides an interface to the opam2 continuous integration cluster, which regularly rebuilds the full package repository across a variety of OCaml compiler versions, operating system distributions and CPU architectures. The builds run on remote infrastructure, and the results are pushed to a metadata repository, from which the CLI client fetches them so that you can query the status of your packages.

  • opam ci status shows a dashboard of the build results across this matrix. Packages can be filtered by maintainer substrings or tag names in the opam package description, so you see only those relevant to you.

  • opam ci logs shows you the build errors so you can fix them. It also generates a Dockerfile for the precise build, so you can reproduce the environment locally.

To get started, try these commands with the maintainer argument replaced with your own information or tags:

# show all the failing MirageOS packages
opam ci status -m org:mirage | less -R
# show all the packages maintained by anil@recoil.org
opam ci status -m anil@recoil.org --filter=all | less -R
# show all the packages failing on the latest RC of the OCaml compiler
opam ci status --filter=variants:rc | less -R
# display all failure logs for the mirage-xen package
opam ci logs mirage-xen

The status view shows a panel of icons that represent different combinations of ways to build opam packages. From left to right, these are:

  • Compiler: The circled numbers represent OCaml compiler versions (a circled 6 is OCaml 4.06, a circled 7 is 4.07, and so on).
  • Distro: The square letters indicate different OS distributions. D is Debian, F is Fedora, A is Alpine, U is Ubuntu and S is OpenSUSE.
  • CPU Architecture: The small circled letters represent different CPU architectures. x represents x86_64, a is arm64 and p is PowerPC64LE.

Some compiler variants are also tested to track down specific problems; these are shown by the icons at the far right of the display (see below for more information).

  • safe-string: The ss icon is for ‘safe-string’ failures, which would happen in OCaml 4.06 due to the switch to immutable strings.
  • flambda: The fl icon is for packages that fail to compile with the flambda variant of the compiler.
  • release-candidate: The flag icon is for packages that fail to compile with the latest release candidate of OCaml; this is useful to figure out how much of the ecosystem works with a soon-to-be-released compiler.

The colours indicate the result of the build: white means the package was not built due to constraints, green is a successful build, yellow means the build was skipped due to a dependency failure, red is a direct build failure of that package, and magenta and blue indicate package metadata errors, such as the solver failing to find a solution or the package sources being unavailable. (Note: coloured output is currently mandatory, which needs to be improved.)

Uses

Check your own package builds

See all failing builds by specifying the maintainer:

opam ci status -m anil@recoil.org | less -R

See all builds, including successes:

opam ci status -m anil@recoil.org -f all | less -R

Check on a project’s libraries

You can specify additional -m fields, which match against the maintainer: or tags: field in the opam metadata. For example, MirageOS uses org:mirage in its tags to group its libraries:

opam ci status -m org:mirage | less -R
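If you look after packages under several maintainer strings or tags, a small shell function can run the same query for each one in turn and label each report. This is a sketch rather than part of opam-ci: the name check_maintainers is hypothetical, and it only uses the -m option shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of opam-ci): run `opam ci status -m` once for
# each maintainer string or tag passed as an argument, with a header per query.
check_maintainers() {
  for who in "$@"; do
    printf '=== %s ===\n' "$who"
    opam ci status -m "$who"
  done
}
```

For example, check_maintainers anil@recoil.org org:mirage | less -R pages through both reports in one go.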

Packages you have forgotten about

Sometimes you constrain a package due to an incompatibility with a newer version of OCaml, but then forget to release a new version.

You can query for “lagging” packages whose latest version is incompatible with the latest release of OCaml due to constraints:

opam ci status -f lagging | less -R

Fix specific issues around OCaml features

Newer releases of OCaml come with some backwards-incompatible changes. You can find packages with some of those particular problems.

Migration to safe-string

OCaml shifted from mutable to immutable strings by default in OCaml 4.06, but there is a variant of the compiler with the old behaviour. You can list packages that break with the new immutable default but work with the older setting.

opam ci status -f variants:ss | less -R

Flambda inliner compilation

There is an experimental inliner available with the flambda configure-time variant of the compiler. A few packages fail to compile with this option, so you can list those explicitly to determine what's wrong:

opam ci status -f variants:fl | less -R

Testing on trunk / release candidate OCaml

When there are release candidates for OCaml, it is helpful to test packages on those pre-release versions. You can find packages that compile successfully on a stable release but fail on the bleeding edge compiler:

opam ci status -f variants:rc | less -R

Note that this shows actual build failures. You can use the “lagging” filter to find packages that have been constrained to prevent them from being compiled on the latest version entirely, which is useful to figure out what needs porting.

opam ci status -f lagging | less -R
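To review one maintainer's packages against every filter described in this section, you can loop over the -f values. This is only a sketch: health_report is a hypothetical name, the filter list is simply the set used in this post, and it assumes -m and -f can be combined, as in the --filter=all example earlier.

```shell
#!/bin/sh
# Hypothetical helper (not part of opam-ci): query one maintainer string or tag
# against each filter used in this post, printing a header before each query.
health_report() {
  who=$1
  for filter in lagging variants:ss variants:fl variants:rc; do
    printf '=== %s (%s) ===\n' "$who" "$filter"
    opam ci status -m "$who" -f "$filter"
  done
}
```

For example: health_report anil@recoil.org | less -R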

Community service

You can find all unmaintained packages that might need some assistance, since their maintainer fields are blank:

opam ci status -f orphaned | less -R

Identifying issues

Once you have identified a package that is failing, you can inspect the build logs to figure out how to fix the issue.

First, use opam ci logs to find out what's wrong. For example, I might run this on the xen-gnt-unix package:

$ opam ci logs xen-gnt-unix
xen-gnt-unix: multiple build failures found with different configuration parameters.
Please refine the command to select exactly one of the following:
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=amd64 --distro=ubuntu-18.04
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=amd64 --distro=fedora-28
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=amd64 --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06+flambda --arch=amd64 --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06+default-unsafe-string --arch=amd64 --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.05 --arch=amd64 --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.04 --arch=amd64 --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=amd64 --distro=alpine-3.7
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=ppc64le --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=arm64 --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=amd64 --distro=opensuse-42.3
  opam-ci logs xen-gnt-unix.3.0.1 --compiler=4.06 --arch=ppc64le --distro=debian-9
  opam-ci logs xen-gnt-unix.3.0.1 --compiler=4.06 --arch=amd64 --distro=opensuse-42.3

If just one failure is found, then the build logs are shown for that failure. If there is more than one failure, the output will give you a more precise command line to enter to select just one of the failures, as shown above.
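If you would rather not copy the suggestion by hand, the first refined command can be extracted with sed. This sketch assumes the suggestions are printed as plain (uncoloured), indented lines starting with opam-ci logs, exactly as in the output above; first_refined_command is a hypothetical name, and it prints nothing when opam-ci goes straight to a single log.

```shell
#!/bin/sh
# Hypothetical helper (not part of opam-ci): print the first "opam-ci logs ..."
# suggestion from the "Please refine the command" output for a package.
first_refined_command() {
  opam ci logs "$1" |
    sed -n 's/^[[:space:]]*\(opam-ci logs .*\)$/\1/p' |
    head -n 1
}
```

You could then run cmd=$(first_refined_command xen-gnt-unix) followed by eval "$cmd" to see the log for that configuration.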

Pick the first one, and the output shows the abbreviated failure log along with some metadata, such as which git revision of the opam-repository (https://github.com/ocaml/opam-repository) the package was built against:

$ opam-ci logs xen-gnt-unix.3.0.0 --compiler=4.06 --arch=amd64 --distro=ubuntu-18.04
====> xen-gnt-unix.3.0.0 4.06 Ubuntu 18.04 amd64 (exit code 31) (opam-repository 8425e617):
<snip>
### output ###
# File "/home/opam/.opam/4.06/lib/io-page/META", line 1, characters 0-0:
# Error: Library "io-page-unix" not found.
# -> required by library "io-page.unix" in /home/opam/.opam/4.06/lib/io-page
# Hint: try: jbuilder external-lib-deps --missing -p xen-gnt-unix @install

This should then point you towards a fix.
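The opam-repository revision in the ====> header is useful for reproducing a failure against the same package metadata. The sketch below pulls it out of a log read on standard input; repo_revision is a hypothetical name, and it assumes the header is formatted as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of opam-ci): read `opam-ci logs` output on stdin
# and print the revision from the "(opam-repository <rev>)" part of the header.
repo_revision() {
  sed -n 's/.*(opam-repository \([0-9a-f][0-9a-f]*\)).*/\1/p' | head -n 1
}
```

Piping the xen-gnt-unix log above into repo_revision prints 8425e617, which you could then check out with git checkout in a local clone of opam-repository.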

Fixing issues

The problems you find usually stem from a few root causes. If you think of more, please add them here by updating the wiki page or posting to the discussion forum.

Incorrect opam package constraints

If a package used to build but subsequently starts failing, it is probably because some of its dependencies have changed their interfaces. You can use opam package constraints to fix this: identify the offending dependency from the build failure, and modify your package to only select versions of that dependency that it is compatible with.

Feature: #5 tracks adding dependency information directly to the opam-ci metadata.

A package occasionally also becomes uninstallable due to dependency constraints resulting in an impossible situation. This is normally caught by the opam repository maintainers, but mistakes slip through. In this case, you’ll need to figure out the constraints that let your package install.

Feature: #7 tracks including opam lint output in opam-ci status to make finding these problems easier.

OCaml compiler version

OCaml now follows a 6-8 month release schedule, and the march of progress occasionally breaks existing code. Maintainers can then release a new version of their package that works, but older releases are still tracked in opam and should be constrained to prevent them from being selected with the new compiler.

One example is the migration to safe-string, which broke a large number of packages out of the box in OCaml 4.06.0. In this case, you might see an issue like this:

$ opam-ci logs syslog.1.4 --compiler=4.06 --arch=amd64 --distro=debian-9
# File "syslog.ml", line 196, characters 50-53:
# Error: This expression has type bytes but an expression was expected of type
# string

Here you need to prevent this package from being installed on OCaml 4.06.0 or higher, and release a new version of the package with the functionality fixed. The opam 1 constraint looks like this:

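available: [ ocaml-version < "4.06.0" ]

In the opam 2 file format, the equivalent is an upper bound on the ocaml dependency, such as "ocaml" {< "4.06.0"} in the depends: field.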

Operating System portability

A very common situation is that you test your package on your local desktop but cannot try it on the huge number of Linux distributions out there. Thanks to the magic of Docker containers, opam-ci shows build information for many distributions: Ubuntu, Debian, CentOS, Alpine, OpenSUSE and OracleLinux, across several versions (the full list is at https://github.com/avsm/ocaml-dockerfile/blob/master/src-opam/dockerfile_distro.mli#L25).

Taking zstd as an example, opam ci status zstd shows that it builds on Debian and Ubuntu but fails on CentOS and Alpine. Inspecting the opam file reveals why:

$ opam show zstd --raw
<snip>
depexts: [
  ["libzstd-dev"] {os-distribution = "debian"}
  ["libzstd-dev"] {os-distribution = "ubuntu"}
]

The depexts field is read by opam-depext, which understands a large number of operating systems. You can fix this by adding the package names for Alpine and CentOS and submitting the fix. The opam-repository CI will test your submission against the matrix of operating systems and report whether the fix worked, or you can use Docker locally via the OCaml containers before submitting.
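Before submitting, you can check which distributions the opam file already names. This sketch scans the output of opam show --raw for os-distribution = "<name>" filters and reports the requested distributions that are missing; it ignores other filters such as os-family, and missing_depext_distros is a hypothetical name. It relies on grep -o, which GNU, BSD and BusyBox grep support but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper (not part of opam or opam-ci): print each distribution
# given on the command line that never appears in an os-distribution = "..."
# filter in the package's opam file. Other filters (os-family, etc.) are ignored.
# Usage: missing_depext_distros <package> <distro>...
missing_depext_distros() {
  pkg=$1
  shift
  declared=$(opam show "$pkg" --raw |
    grep -o 'os-distribution *= *"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/')
  for distro in "$@"; do
    printf '%s\n' "$declared" | grep -qxF "$distro" || echo "$distro"
  done
}
```

With the zstd opam file above, missing_depext_distros zstd debian ubuntu alpine centos prints alpine and centos.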

CPU portability

OCaml runs on a variety of CPU architectures, but most of us develop on x86. The bulk builds also run regularly on arm64 and ppc64le, which exposes portability bugs in C bindings quite often.

The AFL testing system fails on ARM, for example:

$ opam-ci logs afl.2.52b --compiler=4.06 --arch=arm64 --distro=debian-9
### output ###
# [*] Checking for the ability to compile x86 code...
# /tmp/cctyEiSt.s: Assembler messages:
# /tmp/cctyEiSt.s:10: Error: unknown mnemonic `xorb' -- `xorb %al,%al'
#
# Oops, looks like your compiler can't generate x86 code.
#
# Don't panic! You can use the LLVM or QEMU mode, but see docs/INSTALL first.
# (To ignore this error, set AFL_NO_X86=1 and try again.)

In this case, you can fix the portability issue in your package and release a new version. To prevent the older (already released) versions from being selected by the opam solver on that architecture, add a CPU constraint to the available: field. An example is the yaml.0.2.0 package, which was broken on PowerPC and fixed in yaml.0.2.1:

$ opam show yaml.0.2.0 --raw
<snip>
available: arch != "ppc64"
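To compare one release across the three architectures covered by the bulk builds, you can loop over the --arch values that appear in the opam-ci logs suggestions earlier (amd64, arm64 and ppc64le). This sketch assumes opam ci logs accepts the same options as the opam-ci logs commands it suggests, and that each named configuration actually failed (otherwise there may be no log to show); logs_across_arches is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not part of opam-ci): fetch the logs for one package
# version, compiler and distro on each CPU architecture used by the bulk builds.
# Usage: logs_across_arches <package.version> <compiler> <distro>
logs_across_arches() {
  for arch in amd64 arm64 ppc64le; do
    printf '=== %s ===\n' "$arch"
    opam ci logs "$1" --compiler="$2" --arch="$arch" --distro="$3"
  done
}
```

For example: logs_across_arches afl.2.52b 4.06 debian-9 | less -R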

Contributions

We would love to see more contributions to opam-ci and the associated infrastructure. The most obvious way you can get involved is by looking at the build failures and submitting fixes to the opam-repository to help us maintain the rapidly growing package database.

If you would like to work on the CLI tool itself, feel free to look at the issue list and the contribution guidelines to get started, and/or post on the OCaml discussion forum. You can also add more triaging tips to the wiki to help other users.


This is absolutely mind-blowing.


:clap: :clap: :clap: :clap:

This is really useful and needed! Ensuring portable and reproducible environments through Docker/containerization is a must-have requirement.

In the past year I had some problems of this sort on 64-bit Arch Linux. I will check that again and try to figure out the problem with the help of this tool.

Thanks!

I’d love to add Arch Linux into the list of supported container distributions, but I’m not familiar with the tooling. This would be an excellent one to contribute to https://github.com/avsm/ocaml-dockerfile :slight_smile: