Cache with gh action `ocaml/setup-ocaml@v2`


I’m wondering if there’s any drawback to caching the ~/.opam folder between builds when using ocaml/setup-ocaml@v2. It’s currently undocumented in the README (GitHub - ocaml/setup-ocaml: GitHub Action for the OCaml programming language), and I’ve looked around at other popular projects without finding an example of it.

The basic idea is to restore the cache with uses: actions/cache/restore@v3, pointing at the ~/.opam folder with a key like opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**.opam') }}, and to save the cache the same way after installation.

Note: in the case of Windows, cache _opam/ instead.
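Roughly what I have in mind, as an untested sketch (the step order and the explicit install step are my guesses):

```yaml
- name: Restore opam cache
  id: opam-cache
  uses: actions/cache/restore@v3
  with:
    path: ~/.opam
    key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**.opam') }}

- uses: ocaml/setup-ocaml@v2
  with:
    ocaml-compiler: ${{ matrix.ocaml-compiler }}

- name: Install dependencies
  run: opam install . --deps-only

- name: Save opam cache
  uses: actions/cache/save@v3
  with:
    path: ~/.opam
    key: ${{ steps.opam-cache.outputs.cache-primary-key }}
```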

I have a few questions here:

  • I haven’t seen any improvement over non-cached installations (do I need to tell opam somehow to be aware of the cache)?
  • Is there any problem with this approach?



It’s not impossible, but there are two related problems which are quite tricky to solve with GHA caching in its present form. You need to be able to update the cache at some point, which means that the dependencies need to be part of the cache key. That runs the risk of ending up with lots of caches if/when dependencies update, which can risk blowing the artefact storage allowance for caches. This could be mitigated by storing a differential cache - i.e. restoring .opam with the compiler (which is what setup-ocaml presently does) and then caching just the added packages - but I don’t think there’s an automated way to do this, so it’s quite a lot of work to set in place.

Incidentally, ocaml-ci solves these problems firstly by separating out the dependency calculation into a separate step (its Analysis stage), so that the exact opam packages to install as dependencies are known up-front (which means the list of packages forms part of the cache key). It solves the layering problem by using a snapshotting file system, so the compiler is shared between all jobs on a given worker because it’s essentially an earlier RUN stage of the job.


When using the key opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**.opam') }}, a few things are taken into account:

  • the content of all opam files
  • your current OS
  • the OCaml compiler version

The cache will therefore be invalidated as soon as you change any dependency (or any description, actually, but that’s fine), and it is stored per OS and per OCaml version.

It’s also pretty common to run an action that purges the cache each month or each week, depending on the activity of the repo.
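For example, a scheduled workflow along these lines could do the purge (an untested sketch; `gh cache delete --all` needs a recent gh CLI and a token with `actions: write` permission):

```yaml
name: Purge caches
on:
  schedule:
    - cron: '0 0 1 * *' # first day of every month
permissions:
  actions: write # needed to delete caches
jobs:
  purge:
    runs-on: ubuntu-latest
    steps:
      - name: Delete all caches in this repository
        run: gh cache delete --all --repo ${{ github.repository }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```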

Would that work?

Yes, sorry - I should have mentioned in passing above that you were doing that part!

The issue then is the cache sizes and actually getting updates - setup-ocaml is already caching the switch (a couple of hundred meg, I think) and caching .opam again will include that part. The other issue is that you’ll never get updates - so you need some kind of timing element in it (or, as you suggest, manually deleting the cache weekly/monthly). One possibility is to include, say, a week number in the cache key. ocaml-ci solves that by computing the oldest sha of opam-repository which gives the latest solution - that gets re-checked every time opam-repository receives a merge, but rebuilds are then only triggered if a relevant package has changed.
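To sketch the week-number idea (untested; the `date` format string is an assumption):

```yaml
- name: Compute the week number for the cache key
  id: week
  run: echo "week=$(date +%Y-%V)" >> "$GITHUB_OUTPUT"

- uses: actions/cache@v3
  with:
    path: ~/.opam
    key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-w${{ steps.week.outputs.week }}-${{ hashFiles('**.opam') }}
```

The key then rolls over once a week, so at most one full cache per week is stored while still giving hits within the week.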


Totally - if there’s the possibility of using opam.lock (I haven’t used it in a while) as a key, you will get updates whenever a local developer installs their environment and pushes to GH, which is a very desirable case.

Thanks for the fast answer @dra27


Yes - using a lockfile should definitely work. The files are already checked out at that point, so the lockfile can be referred to in the hash as well. In essence, for a specific use-case it should be possible to improve caching with the action - it’s just slightly too risky for setup-ocaml to do that by default. If you have the time, I think a “FAQ” entry on how to improve caching would be a good addition to the README.
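Hypothetically, hashing the lockfiles into the key could look like this (a sketch; `opam lock` generates `*.opam.locked` files, so the glob is an assumption about how the lockfiles are named in your repo):

```yaml
- uses: actions/cache@v3
  with:
    path: ~/.opam
    key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**/*.opam.locked') }}
```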


Did you manage to get the caching to improve the performance of your actions?

On a project I’ve been working on recently (to be open sourced soon, hopefully), we were able to get our action times to drop from 30+ minutes to 5 minutes by making use of the cache.

I’ve included the relevant snippet from the action we used:

      - name: Cache
        id: cache-opam
        uses: actions/cache@v3
        env:
          cache-name: cache-opam
        with:
          path: |
            ~/.opam  # assumed from this thread; the path was left blank in the original snippet
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/*.opam') }}

      - name: Opam & Coq
        if: steps.cache-opam.outputs.cache-hit != 'true'
        run: |
          opam repository add default --all-switches --set-default
          opam repository add coq-released --all-switches --set-default
      - name: Install Dune
        if: steps.cache-opam.outputs.cache-hit != 'true'
        run: |
          opam install dune
      - name: Install
        if: steps.cache-opam.outputs.cache-hit != 'true'
        run: |
          opam install . --deps-only

Actually, the hard work was done by an undergrad @mayank working with me, so I can’t really comment on whether this is actually a robust way of improving performance times.

I think the key step in reducing build times was to add the if check to the install steps to avoid running them when the cache was hit.

Yes, I have the exact same structure and I see an improvement over installing from scratch, but sometimes I see “Nothing to do” and other times opam still installs all the deps (marked as cached), so I was wondering if there’s something inconsistent there.