Cache with gh action `ocaml/setup-ocaml@v2`


I’m wondering if there’s any drawback to caching the ~/.opam folder between builds when using ocaml/setup-ocaml@v2. It’s currently undocumented in the README (GitHub - ocaml/setup-ocaml: GitHub Action for the OCaml programming language), and I’ve looked around at other popular projects without finding an example of it.

The basic idea is to restore the cache with uses: actions/cache/restore@v3, pointing at the ~/.opam folder with a key like opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**.opam') }}, and to save the cache the same way after installation.

Note: in the case of Windows, cache _opam/ instead.
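Roughly what I have in mind, as an untested sketch (the step order and the explicit install step are my guesses):

```yaml
- name: Restore opam cache
  id: opam-cache
  uses: actions/cache/restore@v3
  with:
    path: ~/.opam
    key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**.opam') }}

- uses: ocaml/setup-ocaml@v2
  with:
    ocaml-compiler: ${{ matrix.ocaml-compiler }}

- name: Install dependencies
  run: opam install . --deps-only

- name: Save opam cache
  uses: actions/cache/save@v3
  with:
    path: ~/.opam
    key: ${{ steps.opam-cache.outputs.cache-primary-key }}
```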

I have a few questions here:

  • I haven’t seen any improvement over non-cached installations (do I need to tell opam somehow to be aware of the cache)?
  • Is there any problem with this approach?



It’s not impossible, but there are two related problems which are quite tricky to solve with GHA caching in its present form. You need to be able to update the cache at some point, which means that the dependencies need to be part of the cache key. That runs the risk of ending up with lots of caches if/when dependencies update, which can risk blowing the artefact storage allowance for caches. This could be mitigated by storing a differential cache - i.e. restoring .opam with the compiler (which is what setup-ocaml presently does) and then caching just the added packages - but I don’t think there’s an automated way to do this, so it’s quite a lot of work to set in place.

Incidentally, ocaml-ci solves these problems firstly by separating out the dependency calculation into a separate step (its Analysis stage), so that the exact opam packages to install as dependencies are known up-front (which means the list of packages forms part of the cache key). It solves the layering problem by using a snapshotting file system, so the compiler is shared between all jobs on a given worker because it’s essentially an earlier RUN stage of the job.


When using the key opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**.opam') }}, a few things are taken into account:

  • the content of all opam files
  • your current OS
  • the OCaml compiler version

The cache will therefore be invalidated as soon as you change any dependency (or any description, actually, but that’s fine), and it is stored per OS and per OCaml version.

It’s also pretty common to run an action that purges the cache each month or each week, depending on the activity of the repo.
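For example, a scheduled workflow along these lines could do the purge (an untested sketch; `gh cache delete --all` needs a recent gh CLI and a token with `actions: write` permission):

```yaml
name: Purge caches
on:
  schedule:
    - cron: '0 0 1 * *' # first day of every month
permissions:
  actions: write # needed to delete caches
jobs:
  purge:
    runs-on: ubuntu-latest
    steps:
      - name: Delete all caches in this repository
        run: gh cache delete --all --repo ${{ github.repository }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```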

Would that work?

Yes, sorry - I should have mentioned in passing above that you were doing that part!

The issue then is the cache sizes and actually getting updates - setup-ocaml is already caching the switch (a couple of hundred meg, I think) and caching .opam again will include that part. The other issue is that you’ll never get updates - so you need some kind of timing element in it (or, as you suggest, manually deleting the cache weekly/monthly). One possibility is to include, say, a week number in the cache key. ocaml-ci solves that by computing the oldest sha of opam-repository which gives the latest solution - that gets re-checked every time opam-repository receives a merge, but rebuilds are then only triggered if a relevant package has changed.
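To sketch the week-number idea (untested; the `date` format string is an assumption):

```yaml
- name: Compute the week number for the cache key
  id: week
  run: echo "week=$(date +%Y-%V)" >> "$GITHUB_OUTPUT"

- uses: actions/cache@v3
  with:
    path: ~/.opam
    key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-w${{ steps.week.outputs.week }}-${{ hashFiles('**.opam') }}
```

The key then rolls over once a week, so at most one full cache per week is stored while still giving hits within the week.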


Totally - if there’s the possibility of using opam.lock (I haven’t used it in a while) as a key, you will get updates whenever a local developer installs their environment and pushes to GH, which is a very desirable case.

Thanks for the fast answer @dra27


Yes - using a lockfile should definitely work. The files are already checked out at that point, so the lockfile can be referred to in the hash as well. In essence, for a specific use-case it should be possible to improve caching with the action - it’s just slightly too risky for setup-ocaml to do that by default. If you have the time, I think a “FAQ” entry on how to improve caching would be a good addition to the README.
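Hypothetically, hashing the lockfiles into the key could look like this (a sketch; `opam lock` generates `*.opam.locked` files, so the glob is an assumption about how the lockfiles are named in your repo):

```yaml
- uses: actions/cache@v3
  with:
    path: ~/.opam
    key: opam-${{ matrix.os }}-${{ matrix.ocaml-compiler }}-${{ hashFiles('**/*.opam.locked') }}
```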


Did you manage to get the caching to improve the performance of your actions?

On a project I’ve been working on recently (to be open sourced soon, hopefully), we were able to get our action times to drop from 30+ minutes to 5 minutes by making use of the cache.

I’ve included the relevant snippet from the action we used:

      - name: Cache
        id: cache-opam
        uses: actions/cache@v3
        env:
          cache-name: cache-opam
        with:
          path: |
            ~/.opam  # assumed from this thread; the path was left blank in the original snippet
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/*.opam') }}

      - name: Opam & Coq
        if: steps.cache-opam.outputs.cache-hit != 'true'
        run: |
          opam repository add default --all-switches --set-default
          opam repository add coq-released --all-switches --set-default
      - name: Install Dune
        if: steps.cache-opam.outputs.cache-hit != 'true'
        run: |
          opam install dune
      - name: Install
        if: steps.cache-opam.outputs.cache-hit != 'true'
        run: |
          opam install . --deps-only

Actually, the hard work was done by an undergrad @mayank working with me, so I can’t really comment on whether this is actually a robust way of improving performance times.

I think the key step in reducing build times was to add the if check to the install steps to avoid running them when the cache was hit.

Yes, I have the exact same structure and I see an improvement over installing from scratch, but sometimes I see “Nothing to do” and other times opam still installs all the deps (marked as cached), so I was wondering if there’s something inconsistent there.