[Conversation starter] The terrible economics of package registries and how to fix them (FOSDEM 2026)

The title is a bit dishonest as it doesn’t really say how to fix them, but at least I think this talk has the quality of being a great conversation starter.

As we are all aware, opam-repository also suffers from all of these problems, modulo size. While parts of this have been discussed profusely here before, I think it would be nice to have an economics-focused discussion featuring the interested parties (opam-repository maintainers, infrastructure maintainers, OCaml Software Foundation, companies with available funds).

I’m not currently available to bootstrap this, but hopefully someone reading this is.

13 Likes

Can I say, first, thank you for posting this. Second, wow wow wow, Michael Winser does a great job in that talk. Everybody should watch it. It really is great.

Third, I would very, very much like to read the discussion that might emerge from people watching that talk and commenting here. It’ll be fascinating.

And last, again, wow, that was a great talk to watch, and thank you -again- for posting it.

2 Likes

What is the current cost of opam infrastructure? Are there any figures around? How are this cost and the underlying metrics (e.g. number of monthly downloads) growing?

What could be decentralized? Could a peer-to-peer protocol reduce some parts of the infrastructure, e.g. the download servers? I understand that some parts (build servers, CI infrastructure, manpower to run all that, …) cannot be easily distributed.

Beyond the economics of distributing packages, what also concerns me is the trust we can have in those packages: when I download a random opam package, could I be introducing a security issue into my code?

I feel scalability and security are two sides of the same coin and should probably be tackled together.

3 Likes

It’s worth watching the talk. He discusses download bandwidth, caching, etc. at length, and how that’s just not a problem. It comes up repeatedly in the Q&A, which leads him to really insist on the point, with other maintainers in the audience chiming in. For a little taste: he puts up a leaderboard (haha, borrowed from “Family Feud”) of all the things that these repository maintainers work on, in the priority order in which resources get allocated to that work. And… “security development” is, literally, in last place on the list.

That, I thought, was a great example of the problem he’s trying to get at. To be clear, he has no solutions: he’s diagnosing a problem, and that’s valuable because before you can search for a solution you have to know there’s a problem that needs solving, what its impact is, etc.

I wonder if the OPAM maintainers could comment on that leaderboard.

1 Like

Opam is actually already decentralized. See this thread: “Opam package decentralization”.

It can be, but it isn’t for the vast majority of non-enterprise users.

Some good news in that regard in the OCaml world:

  • conex development is active again
  • discussions are happening with the opam team to focus on some security features
  • Gabriel is actively using the OCaml Software Foundation to support the opam repository maintainers, so that we have a robust team of trustworthy people to review new submissions

Note that there are actually two “opam” teams: one working on ocaml/opam (the source-based package manager itself) and one on ocaml/opam-repository (the main public package repository for opam). In fact there are even more than that, as different people take care of the infra for the CI used by the opam repo. This potentially makes the auditing and distribution of resources a bit more complicated.

4 Likes

Thanks for the link to the talk. This was very interesting.

My perspective is:

5 Likes

I know I’m not one of the interested parties, but I’ve been thinking for a long time that opam-repository seems to be under more pressure than other language-specific registries. opam’s is the only one that I know of which:

  • Manually approves each published version,
  • Discourages upper bounds upstream, then spends time retroactively adding them as conflicts arise,
  • Runs CI to find these conflicts,
  • And feels the need to periodically announce the archival of old versions to reduce its burden.

From the outside, this seems like a lot of maintenance time and money, a consequence of running a repository for OCaml like you’d run an OS distro package manager. I understand how opam got there, but I also think it’s inherently untenable for a growing library ecosystem, and way beyond the expectations of new users.

Maybe it’s excessive to apply opam’s workflow to the entirety of OCaml’s ecosystem. Compare with Go, which chose the minimal version selection strategy: manifest files only support lower bounds, and contain enough information to transitively recover a dependency graph, resolve the highest lower bound for each dependency, and download them directly from upstream, all without a package index.
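To make the resolution step concrete, here is a rough sketch in Python (chosen only for brevity). The manifests, module names and version numbers are all made up for illustration, and this is a simplification of the idea rather than Go's actual implementation: it walks the requirement graph from the root and keeps, for each module, the highest lower bound anyone asked for.

```python
from collections import deque

# Hypothetical manifests: (module, version) -> list of (dependency, minimum version).
# Versions are tuples so that Python's built-in tuple ordering compares them.
MANIFESTS = {
    ("app", (1, 0, 0)): [("a", (1, 2, 0)), ("b", (1, 1, 0))],
    ("a", (1, 2, 0)): [("c", (1, 3, 0))],
    ("b", (1, 1, 0)): [("c", (1, 4, 0))],
    ("c", (1, 3, 0)): [],
    ("c", (1, 4, 0)): [],
    ("c", (1, 5, 0)): [],  # exists upstream, but nothing requires it
}


def minimal_version_selection(root, manifests):
    """Return {module: version}, taking for each module the highest
    lower bound found in the transitive requirement graph of `root`."""
    selected = {}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in manifests[node]:
            name, version = dep
            # Keep the largest minimum requested so far for this module.
            if name not in selected or version > selected[name]:
                selected[name] = version
            # Each (module, version) manifest is read at most once.
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return selected


build_list = minimal_version_selection(("app", (1, 0, 0)), MANIFESTS)
```

With these made-up manifests, `c` resolves to 1.4.0: the highest version anyone asked for, not the newest one published (1.5.0). The only inputs are the manifests themselves, which is why no central index is needed for this step.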

So they don’t maintain a repository at all, despite Go having a lot more resources! The remaining features of a registry are performed by cache proxies, and Google’s proxy (there’s the bandwidth) builds the official package list and docs as download requests come in, without any expectations about curation or long-term archival.

3 Likes

I skimmed quickly through the lengthy discussion so I might have missed something, but in my original post, the idea of peer-to-peer / decentralization was to reduce bandwidth needs. In the post you mention, decentralization is more about having several package providers. Like others, I think having a central repository with quality checks on packages is a big bonus… with a cost.

1 Like