Removing "fat" compiler images on hub.docker.com/r/ocaml/opam2

When the container images on https://hub.docker.com/r/ocaml/opam2 were first assembled, the intention was to provide a distribution-specific image with all of the OCaml compilers available in a single image. This in turn allowed for CI systems to pull a single image and then opam switch into the different compilers very efficiently.

However, as time marches on, the number of simultaneous compiler images has become quite large, and this feature doesn’t seem to get much use. Therefore, we are planning to change the format of the OCaml compiler images hosted on hub.docker.com:

  • use the scheme that has been deployed on https://hub.docker.com/r/ocurrent/opam which is that a single image has a single compiler instance, with the version being explicit in the tag.
  • continue pushing the multiarch images to both ocurrent/opam and ocaml/opam2 from the same https://github.com/ocurrent/docker-base-images instance. This will not take up much extra space since the content on the Hub is hashed and this is just a set of new tags.
  • ensure that the CI scripts over at https://github.com/ocaml/ocaml-ci-scripts continue to work uninterrupted. Aside from the lack of multiple compilers, the only observable change in the new images is that WORKDIR is set to /home/opam/src instead of /home/opam/opam-repository.
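
To illustrate the one-compiler-per-tag scheme in the list above, here is a minimal sketch of how a CI script might map each compiler in its matrix to its own image, rather than running opam switch inside one fat image. The tag pattern <distro>-ocaml-<version> is an assumption inferred from tags such as debian-10-ocaml-4.09; the image_ref helper and the matrix values are hypothetical.

```shell
#!/bin/sh
# Sketch only: build image references for a per-compiler CI matrix.
set -eu

# Hypothetical helper: compose "<repo>:<distro>-ocaml-<version>".
# The tag pattern is an assumption, not a documented contract.
image_ref() {
  distro=$1
  version=$2
  printf '%s:%s-ocaml-%s\n' "ocaml/opam2" "$distro" "$version"
}

# Example matrix (illustrative values only).
for version in 4.09 4.10 4.11; do
  # Print the command instead of running it, so no Docker daemon is needed.
  echo "docker pull $(image_ref debian-10 "$version")"
done
```

Since the per-compiler images share their base layers, pulling several of them this way should mostly re-use already downloaded content.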

Comments welcome on all of this – if you are using the “fat” compiler images to reduce your bandwidth, please let me know. Most users I’ve seen prefer the slimmer single-compiler images instead, hence this change. You can comment here or on the https://github.com/ocaml/ocaml.org/issues/1195 tracking issue.

Most pragmatically, this change will let us update the /r/ocaml/opam2 images to contain 4.11. Various infrastructure issues are currently preventing us from pushing the larger images, and the new ocurrent-based CI infrastructure will fix them.

To slim the images, does it make sense to cut off the history of the opam repository that ships as part of the Docker image? The opam repository by itself is large, but in a typical use case it only gets updated and is never rolled back to an earlier commit.

The proposed strategy makes sense, and seems to be in line with other tool tagging strategies.

Thanks for your great work here. I’ve been very pleased with the ocaml docker images. I particularly appreciated the wide variety of OS/compiler/arch versions published.

if you are using the “fat” compiler images to reduce your bandwidth …

Note that the new per-compiler images share the same base layers (with opam installed), so there shouldn’t be much, if any, increase in bandwidth or disk space from using the new scheme.

I looked at docker pull ocaml/opam:debian-10-ocaml-4.09 (about 500MB compressed). The Git history of opam-repository inside the image takes up about 290MB (uncompressed) and packages/ about 150MB. Would it not be a good idea to trim the history in order to slim the image, and to provide a separate image with full history for those who need it? Or does it become impossible to update such a trimmed Git repo? I would argue that basically nobody needs the git history but many the ability to update the opam repository inside the Docker image.

I would argue that basically nobody needs the git history but many the ability to update the opam repository inside the Docker image.

Thanks for the suggestion – It depends what use you have for the shallow images that do not have git history. A lot of CI systems using these images git checkout to a specific Git revision of opam-repo when doing their tests (including the opam-repo-ci). Without the history, that pinning would fail.

That doesn’t mean that we can’t figure out some combination of git --depth options to reduce the default size of the repo though, since the CI should be able to pull a revision in. One complexity with these images is that the git version is inside the container, and so the CIs need to support some pretty ancient versions (like the one in CentOS 7, which is not yet EOL and has users). But if you can come up with a suggestion of how to use modern git to reduce duplication, I’m sure we can figure out the backwards compatibility issues.
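
To make the trade-off concrete, here is a minimal sketch using a throwaway local repository as a stand-in for opam-repository. It shows that a --depth=1 clone lacks older revisions, so pinning to one fails, and that fetching the missing history afterwards makes the pinned checkout work again. The repository layout, commit contents and variable names are all hypothetical, and this is not necessarily how the image builder or any CI does it.

```shell
#!/bin/sh
# Sketch only: shallow clone vs. pinning to an older revision.
set -eu

work=$(mktemp -d)
cd "$work"

# Placeholder identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a stand-in "upstream" repository with three commits.
git init -q upstream
for i in 1 2 3; do
  echo "$i" > upstream/file
  git -C upstream add file
  git -C upstream commit -q -m "commit $i"
done
old_rev=$(git -C upstream rev-parse HEAD~2)

# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth=1 "file://$work/upstream" shallow
shallow_count=$(git -C shallow rev-list --count HEAD)

# Check whether the older revision is present in the shallow clone.
if git -C shallow cat-file -e "$old_rev^{commit}" 2>/dev/null; then
  pinned_before=yes
else
  pinned_before=no
fi

# Fetch the rest of the history, then pin to the older revision.
git -C shallow fetch -q --unshallow
git -C shallow checkout -q "$old_rev"
full_count=$(git -C shallow rev-list --count --all)

echo "commits in shallow clone: $shallow_count"
echo "old revision available before unshallow: $pinned_before"
echo "commits after unshallow: $full_count"
```

A CI that only needs one pinned revision could fetch less than the full history, but whether that works depends on the git version inside the container, which is exactly the compatibility concern raised above.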

If you could open an issue on https://github.com/ocaml/opam/issues noting that disk usage is a concern for git checkouts of opam-repo, that would also be helpful. It’s not been a focus in opam 2.1, but we could improve the situation in a later release.

As a followup to this overall thread, the migration continues to ocaml/opam, as described in “Ocurrent/opam Docker images have moved to ocaml/opam”.

We could certainly be more efficient in the builder too. I’ve added an issue here:

It indirectly has, but we could still potentially do better. opam 2.1 compresses the repositories, decompressing them (to tmpfs) only when they’re updated and/or when the state cache needs to be regenerated.

This was done by @AltGr in order to speed up opam update (and it does - really, really, noticeably!), but it also has the effect of reducing the disk space required for a git clone of opam-repository from ~250MiB to ~95MiB.

I’ve opened this “PR issue” to track the idea of passing --depth=1. It seems to work, at least so far for me, and reduces the repo clone to 15MiB. Comments and further points welcomed there!
