[ANN] opam-repository policy change: checksums (no md5) and no extra-files

Dear everyone,

The opam-repository policy has just changed: md5-only checksums are no longer accepted, and packages should no longer use extra-files (use extra-source instead).

NOTE: If you encounter issues during opam update, please make sure you have opam 2.1.6 and gpatch (GNU patch) installed, especially on BSD systems and macOS. The breakage may be silent; if you run into problems, run rm -rf ~/.opam/repo/default && opam update default. See further notes in Possible breakage in opam update · Issue #25961 · ocaml/opam-repository · GitHub.
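The prerequisites in the NOTE can be checked before running opam update. This is a minimal POSIX sh sketch, not an official opam tool: check_update_prereqs and version_ge are hypothetical helper names, and it assumes that "opam 2.1.6" means 2.1.6 or any newer release. It relies only on opam --version and on GNU patch identifying itself as "GNU patch" in patch --version.

```sh
# version_ge A B: succeed if dotted version A >= B, comparing the first three
# numeric components. Pre-release suffixes such as "~rc1" are not handled.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n -C
}

# check_update_prereqs: print a warning for each missing prerequisite and
# return non-zero if anything is missing.
check_update_prereqs() {
  status=0
  if ! command -v opam >/dev/null 2>&1; then
    echo "warning: opam not found"
    status=1
  else
    v=$(opam --version)
    if ! version_ge "$v" 2.1.6; then
      echo "warning: opam $v is older than 2.1.6"
      status=1
    fi
  fi
  # On BSD systems and macOS, GNU patch is usually installed as gpatch.
  if ! command -v gpatch >/dev/null 2>&1 &&
     ! patch --version 2>/dev/null | grep -q 'GNU patch'; then
    echo "warning: GNU patch (gpatch) not found"
    status=1
  fi
  return $status
}
```

Running check_update_prereqs && opam update would then only update once both prerequisites look present.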

What has been achieved?

These changes were automated using opam admin migrate-extrafiles and opam admin add-hashes (from the migrate-extra-files branch: GitHub - hannesm/opam at migrate-extra-files). A separate utility, GitHub - hannesm/opam-check-checksum, checks that the existing files and md5 checksums are still present in the new opam-repository.
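For package authors who want a quick local look at whether an opam file follows the two new rules, here is a rough heuristic in POSIX sh. It is an assumption-laden sketch: check_opam_policy is a hypothetical helper, it is not opam-check-checksum and not the repository's CI lint, and because it inspects whole files it will miss a file that mixes a strong url checksum with an md5-only checksum elsewhere.

```sh
# check_opam_policy FILE: heuristic check of the new policy.
# Prints one line per problem and returns non-zero if any was found.
check_opam_policy() {
  file=$1
  status=0
  # Rule 1: the extra-files field is no longer accepted.
  if grep -q '^[[:space:]]*extra-files[[:space:]]*:' "$file"; then
    echo "$file: uses extra-files (use extra-source instead)"
    status=1
  fi
  # Rule 2: md5 alone is not enough; look for a sha256 or sha512 checksum too.
  if grep -q 'md5=' "$file" && ! grep -Eq 'sha(256|512)=' "$file"; then
    echo "$file: md5-only checksum"
    status=1
  fi
  return $status
}
```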

Impact on users and developers

  • Many packages will be recompiled on opam upgrade (because their checksums changed, or their extra-files were migrated to extra-source); sorry for the extensive use of CPU time.
  • If you need to include a patch or an extra file in your opam package, you will need to host it elsewhere, for example in a gist (https://gist.github.com) or on your own server. All extra-source files will be cached by opam.ocaml.org.
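Once the file is hosted, the opam file refers to it through an extra-source section with a URL and a checksum. The sketch below prints such a section for a local copy of the hosted file; print_extra_source and sha256_of are hypothetical helpers, and it assumes that the local file is byte-for-byte identical to what the URL serves. It uses sha256sum where available and falls back to shasum -a 256 (macOS/BSD).

```sh
# sha256_of FILE: hex SHA-256 digest of FILE.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# print_extra_source NAME URL FILE: print an opam extra-source section for a
# file hosted at URL, with FILE being a local copy of that file.
print_extra_source() {
  printf 'extra-source "%s" {\n' "$1"
  printf '  src: "%s"\n' "$2"
  printf '  checksum: "sha256=%s"\n' "$(sha256_of "$3")"
  printf '}\n'
}
```

If the extra source is a patch, the package would then list it by name in its patches field, e.g. patches: ["fix-build.patch"].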

The reasoning for this change

Apart from simplifying the mental model of how opam-repository works (there is no longer a files subdirectory holding files that are added during the build), this change also makes cryptographically signing the repository much smoother: we can now rely on strong hash algorithms, and we don’t need to compute further hashes or add them to the repository.

We needed to get both changes (dropping weak hashes AND removing extra-files) through at some point, and today it has been done.


FWIW, thanks to everyone involved, including @kit-ty-kate, @mseri, @raphael-proust and @shonfeder, and for the earlier work by @AltGr and @cemerick. This has been a long journey (more than 4 years), including discussions in https://discuss.ocaml.org/t/opam-repository-security-and-data-integrity-posture/ and Stronger checksums · Issue #17315 · ocaml/opam-repository · GitHub.

I’m grateful we passed the finish line and can now realize further improvements. No need to look back.


Thank you for all your persistent work on this! It was a big lift, and we’ll all benefit :camel:


Thanks a lot to everyone involved. That’s a ton of work, both technical and to get everyone aligned.

I have a question regarding the layout of opam-repository. Now that there’s no files/ subdirectory, would it make sense to “flatten” the repository structure by storing all opam files for a package in a single directory? That is, moving packages/$x/$x.$version/opam to packages/$x/$x.$version.opam. I believe opam supports that out of the box, so I wonder if it has come up in the discussion (I think that “deep” hierarchy was meant for files/).
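To make the proposed move concrete, here is a dry-run sketch in POSIX sh that only prints the git mv commands it would run. flat_path and print_flatten_plan are hypothetical helpers. Whether opam actually accepts the flat packages/$x/$x.$version.opam layout is an open question (it is disputed later in this thread), which is why nothing is moved.

```sh
# flat_path PATH: packages/$x/$x.$version/opam -> packages/$x/$x.$version.opam
flat_path() {
  printf '%s.opam\n' "${1%/opam}"
}

# print_flatten_plan ROOT: print one "git mv" command per nested opam file
# found under ROOT, without executing anything.
print_flatten_plan() {
  find "$1" -type f -name opam | sort | while IFS= read -r p; do
    printf 'git mv %s %s\n' "$p" "$(flat_path "$p")"
  done
}
```

For example, print_flatten_plan packages would list one command per package version; after such a move, Git drops the emptied $x.$version directories on its own.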

Thanks!


On another note, does anyone know why the directory structure is packages/$x/$x.$version/opam and not packages/$x/$version/opam?

The current format allows grouping packages on disk, which is convenient. For example, we can have packages/frontend/x.version/opam and packages/backend/y.version/opam.

We use that feature to distinguish imports from the official opam repo, meta packages, internal packages, …
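The grouping works because the name and version are recoverable from the $name.$version directory alone, whatever directory sits above it. A small POSIX sh illustration (pkg_name_version is a hypothetical helper), assuming package names never contain a dot, so that the first dot separates the name from a version that may itself contain dots:

```sh
# pkg_name_version PATH: print "name version" for a path ending in
# .../$name.$version/opam; the grouping directory above it is ignored.
pkg_name_version() {
  nv=$(basename "$(dirname "$1")")
  printf '%s %s\n' "${nv%%.*}" "${nv#*.}"
}
```

With packages/$x/$version/opam instead, the name would have to come from the parent directory, so that directory could no longer be used freely for grouping.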


I like that approach. I do think this is supported by opam, but that would need to be verified.

It would also require some new lint checks, as well as updates to dune-release and opam-publish (to avoid too many issues in the common workflow). We would also need to investigate whether other tools rely on that specific layout.

An underlying question, which should be tested: if we move these files, will an opam client (after opam update) reinstall these packages or not?

I, for one, would appreciate removing ~32000 directories. So please go ahead and push this forward :smiley:


I don’t think it’s supported: the supported pattern (at least initially) was packages/**/$name.$version/opam, so you could have arbitrarily nested directories.
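The packages/**/$name.$version/opam pattern means a package is identified by its innermost directory, at any depth below packages/. The POSIX sh sketch below lists every $name.$version found that way; list_packages is a hypothetical helper for illustration and is not how opam itself scans a repository.

```sh
# list_packages ROOT: print every "$name.$version" directory that contains an
# opam file, at any nesting depth under ROOT, sorted.
list_packages() {
  find "$1" -type f -name opam | while IFS= read -r p; do
    basename "$(dirname "$p")"
  done | sort
}
```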

I for one, would appreciate removing ~32000 directories.

Are 32k files better than 32k directories? I don’t think Git likes either option (traversing a Git tree object is linear). It would probably be more efficient for Git (and the GitHub viewer) to have a split like packages/aa/***, packages/ab/***, etc.
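A prefix split keeps each Git tree object small: the top level holds at most one entry per two-character prefix, and each prefix directory only holds the packages starting with it. The sketch below computes a path under one possible such layout; both shard_path and the layout below the prefix are hypothetical, since nothing in the thread fixes what goes under packages/aa/.

```sh
# shard_path NAME VERSION: path of the opam file in a hypothetical layout
# sharded by the first two characters of the package name.
shard_path() {
  name=$1
  version=$2
  prefix=$(printf '%.2s' "$name")
  printf 'packages/%s/%s/%s.%s/opam\n' "$prefix" "$name" "$name" "$version"
}
```

For instance, shard_path dune 3.16.0 gives packages/du/dune/dune.3.16.0/opam.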
