I’m wondering if there’s any drawback to caching the ~/.opam folder between builds when using ocaml/setup-ocaml@v2. It’s currently undocumented in the README (ocaml/setup-ocaml: GitHub Action for the OCaml programming language), and I tried looking around other “popular” projects and haven’t seen it done.
The basic idea is to restore the ~/.opam folder with uses: actions/cache/restore@v3 and a key like: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**.opam') }}, and to save the cache the same way after installation.
Note: on Windows, cache _opam/ instead.
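A minimal sketch of that restore/save pair, for illustration only. The step id opam-cache is hypothetical, the glob is written as **/*.opam on the assumption that this is the intended pattern, and the sketch does not verify how setup-ocaml behaves when ~/.opam already exists:

```yaml
# Job fragment: restore ~/.opam before installing, save it afterwards.
steps:
  - uses: actions/checkout@v3   # hashFiles needs the .opam files checked out

  - name: Restore opam cache
    id: opam-cache              # hypothetical id, referenced by the save step
    uses: actions/cache/restore@v3
    with:
      path: ~/.opam
      key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**/*.opam') }}

  # ... ocaml/setup-ocaml@v2 and dependency installation go here ...

  - name: Save opam cache
    # Only save on a miss; an existing key cannot be overwritten.
    if: steps.opam-cache.outputs.cache-hit != 'true'
    uses: actions/cache/save@v3
    with:
      path: ~/.opam
      key: ${{ steps.opam-cache.outputs.cache-primary-key }}
```

Reusing cache-primary-key from the restore step keeps the two keys from drifting apart.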
I have a few questions here:
I haven’t seen any improvement over non-cached installations. Do I need to tell opam somehow to be aware of the cache?
It’s not impossible, but there are two related problems which are quite tricky to solve with GHA caching in its present form. You need to be able to update the cache at some point, which means that the dependencies need to be part of the cache key. That risks ending up with lots of caches if/when dependencies update, which can blow the artefact storage allowance for caches. This could be mitigated by storing a differential cache - i.e. restoring .opam with the compiler (which is what setup-ocaml presently does) and then caching just the added packages - but I don’t think there’s an automated way to do this, so it’s quite a lot of work to put in place.
Incidentally, ocaml-ci solves these problems in two ways. First, it separates the dependency calculation into its own step (its Analysis stage), so the exact opam packages to install as dependencies are known up-front and the list of packages forms part of the cache key. Second, it solves the layering problem by using a snapshotting file system, so the compiler is shared between all jobs on a given worker because it’s essentially at an earlier RUN stage of the job.
Yes, sorry - I should have mentioned in passing above that you were doing that part!
The issue then is the cache sizes and actually getting updates - setup-ocaml is already caching the switch (a couple of hundred meg, I think) and caching .opam again will include that part. The other issue is that you’ll never get updates - so you need some kind of timing element in it (or, as you suggest, manually deleting the cache weekly/monthly). One possibility is to include, say, a week number in the cache key. ocaml-ci solves that by computing the oldest sha of opam-repository which gives the latest solution - that gets re-checked every time opam-repository receives a merge, but rebuilds are then only triggered if a relevant package has changed.
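A rough sketch of the week-number idea, assuming a bash shell is available on the runner. The step id week and the output name value are made up for the example:

```yaml
# Job fragment: add the ISO year and week to the key so the cache
# is rebuilt at most once per week even if the .opam files are unchanged.
steps:
  - uses: actions/checkout@v3

  - name: Compute week stamp
    id: week                    # hypothetical id
    shell: bash
    run: |
      # %G is the ISO year, %V the ISO week number, e.g. 2023-W07
      echo "value=$(date -u +%G-W%V)" >> "$GITHUB_OUTPUT"

  - name: Restore opam cache
    uses: actions/cache/restore@v3
    with:
      path: ~/.opam
      key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ steps.week.outputs.value }}-${{ hashFiles('**/*.opam') }}
```

The trade-off is one fully uncached build per matrix entry each week, in exchange for picking up opam-repository updates without deleting caches by hand.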
Totally. If it’s possible to use opam.lock (I haven’t used it in a while) as a key, you will get the updates whenever a local developer installs their environment and pushes to GH, which is a very desirable case.
Yes - using a lockfile should definitely work - the files are already checked out at that point, so it can be referred to in the hash as well. In essence, for a specific use-case it should be possible to improve caching with the action, it’s just slightly too risky for setup-ocaml to do that by default. If you have the time, I think a “FAQ” entry on how to improve caching would be a good addition to the README.
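For illustration, a lockfile-based key might look like the fragment below. It assumes the lockfiles are committed and named *.opam.locked (the name that opam lock generates); adjust the glob if the project uses something else:

```yaml
# Job fragment: key the cache on the committed lockfile(s) instead of
# the .opam files, so the cache changes only when the lockfile changes.
steps:
  - uses: actions/checkout@v3   # the lockfile must be checked out before hashing

  - name: Restore opam cache
    uses: actions/cache/restore@v3
    with:
      path: ~/.opam
      # Assumption: lockfiles match **/*.opam.locked
      key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**/*.opam.locked') }}
```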
Did you manage to get the caching to improve the performance of your actions?
On a project I’ve been working on recently (to be open sourced soon, hopefully), we were able to get our action times to drop from 30+ minutes to 5 minutes by making use of the cache.
I’ve included the relevant snippet from the action we used:
Actually, the hard work was done by an undergrad @mayank working with me, so I can’t really comment on whether this is actually a robust way of improving performance times.
I think the key step in reducing build times was to add the if check to the install steps to avoid running them when the cache was hit.
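The snippet referred to above is not reproduced here. As a stand-in, this is a minimal sketch of the if check under assumed names: the step id opam-cache and the install command are illustrative, not the poster's actual workflow:

```yaml
# Job fragment: skip the install step when the cache key matched exactly.
steps:
  - uses: actions/checkout@v3

  - name: Restore opam cache
    id: opam-cache              # hypothetical id
    uses: actions/cache/restore@v3
    with:
      path: ~/.opam
      key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**/*.opam') }}

  - name: Install dependencies
    # cache-hit is 'true' only on an exact key match
    if: steps.opam-cache.outputs.cache-hit != 'true'
    run: opam install . --deps-only
```

Because cache-hit is only 'true' for an exact match, a restore through a fallback restore-keys prefix would still run the install step.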
Yes, I have the exact same structure and I see an improvement over a fresh install, but there are times when I see “Nothing to do” and other times when I see (cached) while all the deps are installed, so I was wondering if there’s something inconsistent there.