Best practices around using external libraries on opam?

I would like to make sure that I can still build my OCaml project via opam if GitHub is down. My builds work like this: every build gets a fresh build VM that installs everything via opam and then produces my artifact.

What are the best practices in terms of maintaining local copies of all dependencies? Is it easy to maintain an opam cache on, say, S3, that all of my installs will go through?

If I had this concern, I’d create a local opam switch, copy my project’s _opam directory to the build VM after cloning the project into it but before running the build, and see whether the build still works.
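
A minimal sketch of that experiment, assuming opam 2.x (the compiler version is just an example). One caveat worth knowing: opam switches embed absolute paths, so the project would have to sit at the identical path on the build VM.

```sh
# Create a local switch; ./_opam ends up holding the compiler and all
# dependencies (4.14.1 is an arbitrary example version).
opam switch create . 4.14.1
opam install . --deps-only

# Archive the switch so the build VM can reuse it.
tar czf opam-switch.tar.gz _opam

# On the build VM, after cloning the project to the *same* path:
#   tar xzf opam-switch.tar.gz
#   eval $(opam env)
```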

You could build and upload a Docker image that already contains your project’s build dependencies, built and present in the dune cache, and start every build from there (sketched below). The image would need to be rebuilt every time the dependencies change, which would be annoying and time-consuming, but when the dependencies don’t change builds should be fast: each one just downloads the image and starts compiling from there.
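
A hypothetical sketch of such an image; the base image tag and project layout are assumptions on my part, not something prescribed by opam or dune:

```dockerfile
# Base image with opam and a compiler preinstalled (tag is an assumption).
FROM ocaml/opam:debian-12-ocaml-4.14
WORKDIR /home/opam/project

# Copy only the dependency metadata first, so Docker keeps this expensive
# layer cached until the dependencies actually change.
COPY --chown=opam *.opam ./
RUN opam install . --deps-only --yes

# CI builds start FROM this image, add the sources, and run dune build;
# only the project itself is left to compile.
```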

Another option would be to use Nix as the dependency management system and Cachix as a build-artifact caching service. It would definitely need more setup, but it would give you a fine-grained cache of built dependencies; a small usage sketch follows.
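
Day-to-day use might then look something like this, assuming the project is already packaged with Nix (the cache name is hypothetical):

```sh
# Configure the Cachix binary cache as a substituter (name is made up).
cachix use my-project-cache

# Build the project; dependencies already present in the cache are
# downloaded instead of rebuilt.
nix build
```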

Actually, what’s your specific concern with GitHub? That you may have a networking issue that makes GitHub inaccessible, or that GitHub itself could have a reliability problem that stalls your workflow?

FWIW, this is something companies worry about quite a bit.

Google opted not to use GitHub to coordinate Chrome and Android open-source development because (among other reasons) it was concerned that GitHub was a single point of failure: at the time, GitHub could only serve out of one datacenter. Instead, Google developed a multi-master Git service and its own code review/CI system, Gerrit.

Presumably Microsoft also learned that they were becoming very strategically dependent on GitHub and solved this problem by acquiring them.

I believe GitHub has addressed the all-eggs-in-one-datacenter concern since then though.

There are several levels of cache used by opam (a quick way to inspect them is sketched after this list):
① normally, opam does not rely on GitHub’s servers at all, unless you manually configured it to use the git repository rather than the one at opam.ocaml.org
② package archives are mirrored at opam.ocaml.org too, which is the primary source opam will try to fetch from
③ unless you run opam clean, downloaded archives are kept locally in ~/.opam/download-cache
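
A quick way to inspect those levels, assuming a default opam 2.x setup:

```sh
# ① which repositories opam fetches metadata from (default: opam.ocaml.org)
opam repository list --all

# ② is server-side (the archive mirror at opam.ocaml.org), so there is
#    nothing local to check for it.

# ③ archives kept locally until you run 'opam clean'
ls ~/.opam/download-cache
```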

Of course, if you use the repository through git directly, that may disable the use of the remote cache; or you may have packages pinned to git URLs.
At least for the former, you can manually set archive-mirrors: "https://opam.ocaml.org/cache" in ~/.opam/config to have opam attempt to use that cache regardless of where the package definition was found.
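
Concretely, the relevant excerpt of ~/.opam/config would look like this:

```
# ~/.opam/config (excerpt): always try the central archive cache,
# even for packages whose definitions came from a git repository.
archive-mirrors: "https://opam.ocaml.org/cache"
```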

Last, generating a local cache is quite easy (a full sketch follows the list):

  • git clone opam-repository
  • optionally, run opam admin filter from the clone directory so that you don’t cache the entire repo…
  • run opam admin cache to fetch all archives locally
  • use that local repo through opam repo add --all, or just configure the cache dir globally as above (using a file: URL)
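
Putting the steps together, a hedged sketch (the package and repository names are placeholders):

```sh
# 1. Clone the upstream package repository.
git clone https://github.com/ocaml/opam-repository.git
cd opam-repository

# 2. Optionally trim it so you don't cache thousands of packages
#    (see 'opam admin filter --help' for the selection options).
opam admin filter my-package

# 3. Fetch every remaining archive into ./cache.
opam admin cache

# 4. Use the clone as a repository for all switches.
opam repo add local-mirror . --all
```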

I also asked a similar question some time ago: How to setup local OPAM mirror

TL;DR: there is no easy step-by-step guide, so I opened an issue asking to document/change opam to make setting up a local or regional mirror easier: https://github.com/ocaml/opam/issues/4103

A related option, which we use to keep Facebook build machines from DoSing opam or GitHub, is to build a tarball of the dependencies, based on a lock file, using the extract_mini_repository script (rough sketch below). This has worked well for us, but you need to be willing to move to a lockfile-based workflow; otherwise you will be churning dependencies very often and will have significant skew between local and CI builds.
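
A rough sketch of that workflow; the exact invocation of extract_mini_repository and the file names here are assumptions on my part, and opam lock support depends on your opam version:

```sh
# Pin the exact dependency versions the project builds against.
opam lock ./myproject.opam            # writes myproject.opam.locked

# Bundle everything the lock file references into a tarball that build
# machines can install from without reaching opam.ocaml.org or GitHub
# (this invocation of extract_mini_repository is an assumption).
./extract_mini_repository.sh myproject.opam.locked deps.tar.gz
```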
