In connection with another thread discussing how Bitbucket’s closure of Mercurial support has affected the availability of published versions of 60+ projects, I learned a number of concerning facts about how the opam repository is arranged and managed.
In summary, it seems that opam / opam-repository:
- Never retains “published” artifacts, only links to them as provided by library authors.
- Allows very weak hashes (even MD5).
- Allows authors to update artifact URLs and hashes of previously “published” versions.
- Offers scant support for individually signing artifacts or metadata.
To make things concrete, unless the gaps above are plugged (especially the first three):
- the availability and integrity of published libraries can be impacted by third-party hosting services changing or going offline (as in the case of the Bitbucket closure)
- the integrity of libraries can be impacted by authors non-maliciously publishing updates to already-released versions, affecting functionality, platform compatibility, build reproducibility, or all of the above (anecdotes of which were shared with me when talking about this issue earlier today)
- the integrity of libraries can be impacted by malicious authors publishing updates to already-released versions
- the integrity of libraries can be impacted by malicious non-authors changing the contents at tarball URLs to include code that could, e.g., exfiltrate sensitive data from within the organizations that use those libraries. This is definitely the nuclear nightmare scenario, and unfortunately opam is wide open to it because artifacts are not retained authoritatively and essential community libraries continue to use MD5 in 2020.
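To illustrate the kind of check that the last two scenarios depend on, here is a minimal Python sketch of verifying a downloaded artifact against a pinned digest while refusing weak algorithms. It is not how opam does this; the function name `verify_artifact`, the `ALLOWED_ALGORITHMS` policy, and the `<algorithm>=<hex digest>` string form are all assumptions made for the example.

```python
import hashlib

# Hypothetical policy for this sketch: only these algorithms are accepted.
# Anything else (including md5 and sha1) is refused outright.
ALLOWED_ALGORITHMS = {"sha256", "sha512"}


def verify_artifact(data: bytes, checksum: str) -> bool:
    """Return True if `data` matches `checksum`.

    `checksum` is assumed to look like "<algorithm>=<hex digest>".
    Raises ValueError for malformed checksums or disallowed algorithms.
    """
    algorithm, separator, expected_hex = checksum.partition("=")
    algorithm = algorithm.strip().lower()
    expected_hex = expected_hex.strip().lower()

    if not separator or not expected_hex:
        raise ValueError("checksum must look like '<algorithm>=<hex digest>'")
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"refusing hash algorithm: {algorithm!r}")

    # Hash the bytes we actually received and compare with the pinned value.
    actual_hex = hashlib.new(algorithm, data).hexdigest()
    return actual_hex == expected_hex
```

With a check like this, a tarball whose contents change after publication fails verification instead of being installed silently, and a package that only offers an MD5 digest is rejected rather than trusted. It does nothing for availability, which still requires someone to retain the artifact.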
Seeing that this has been well-established policy for years was honestly quite shocking (again, in comparison to other languages’ package managers that have had these problems licked for a very long time). I understand that opam and its repository probably have human-decades of work put into them, and that these topics have been discussed here and there (in somewhat piecemeal fashion AFAICT), so I’m certain I have not found (never mind read) all of the prior art, but I thought it reasonable to open a thread to gauge what the projects’ posture is in general.