Opam package decentralization

Hello,

I think having all OPAM packages centralized in ocaml/opam-repository on GitHub (the main public package repository for opam, the source package manager of OCaml) creates a bottleneck. I don’t really see the advantage of this setup and would prefer a more decentralized approach.

What I do like, however, is the continuous integration (CI) system and the feedback mechanism you get when publishing packages to the OPAM repository. I believe that functionality should definitely be kept — but it shouldn’t necessarily require that all packages live in that central repository.

For experienced users, the review and approval process probably often feels unnecessary and just slows things down.

From what I understand, the only essential part for installing a package is its .opam file. For example, this URL always returns the latest release information:

https://codeberg.org/api/v1/repos/removewingman/restricted/releases/latest
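For example, the latest tag could be fetched with something like this (the jq filter is just for illustration; the Forgejo-style API that Codeberg exposes returns the release as JSON with a tag_name field):

    curl -s https://codeberg.org/api/v1/repos/removewingman/restricted/releases/latest | jq -r .tag_name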

Installing packages could then look something like this: opam install codeberg.org/removewingman/restricted restricted

Or something like: opam install git+codeberg.org/removewingman/restricted restricted

The idea would be to always use the latest tag. There might be other ways to achieve this — I like how Zig handles this with zig fetch, and I wonder if something similar could work for OCaml.

I know about opam pin, but it doesn’t quite fit this use case. Of course, I could write a wrapper that clones a repository before running opam install, but that feels like a workaround.
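For completeness, the closest I can get today seems to be pinning, either against the repository’s default branch or against a specific tag via a #fragment (the tag 1.0.0 below is hypothetical), but that is a per-switch step rather than a general install mechanism:

    opam pin add restricted git+https://codeberg.org/removewingman/restricted
    opam pin add restricted git+https://codeberg.org/removewingman/restricted#1.0.0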

I personally dislike how much of the OCaml ecosystem depends on GitHub, but that is another discussion.

Have there already been any discussions about decentralizing OPAM in this way, or is something like this already possible and I’ve just missed it? What do you think?

As an experienced user I’m extremely happy about the review and approval process – thanks in passing to the opam repository maintainers for the unglamorous work they do on it. It takes time, but it doesn’t just slow things down: it’s an excellent QA and usage-signalling tool.

For example, for the cmdliner 2.0 release, which was an exceptional breaking release, the centralized repository allowed me to pre-test the release and warn each package that was going to be broken by it. The opam repository is also routinely used by upstream to test changes and to see whether certain syntactic quirks are present in the wild.

Besides, it also shows me how my packages behave on platforms I wouldn’t personally take the time to test on.

If everyone publishes their package in their own corner, all that is lost. We no longer have a large corpus of language and package usage to inform breaking decisions.

Note that while the process is currently managed on GitHub, there is nothing in it that depends on GitHub. AFAIK there is not a single line of code in the basic mechanics of opam that relies on GitHub.

If you want to live disconnected from the ecosystem, you can always publish your own package repository.
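For instance, pointing opam at an additional repository is a one-liner (the name and URL below are placeholders; a git+https URL works just as well):

    opam repository add my-overlay https://example.org/my-opam-repository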

15 Likes

Reviewing the Wikipedia article “Software repository” may help.

It sounds to me like opam pin fully fits this use case. Could you elaborate?

I share this sentiment. Indeed it is another discussion. One we should have imo (but elsewhere). I’ll just note that it is not an ocaml-specific problem.

It is impossible to adequately upvote this point. I feel precisely the same way, and the enormous value that the maintainers provide by forcing a certain level of uniformity of quality on all packages is just impossible to calculate. I could go on and on using superlatives about this. The OPAM repository is not a mono-repo, and yet the machinery of CI makes it so much closer to one, and that means OPAM users can install packages pretty much without worrying about crazy conflicts, the sort that would come about in typical “you can release anything that builds” package systems.

It would be a massive step-down if this stopped being the case. A massive, massive step-down.

8 Likes

I think we all agree on centralization for safety, backwards compatibility, and testing breaking changes; but we also want decentralization and a faster process for publishing and sharing code (without the same guarantees).

It’s often said that pinning solves decentralized and fast publishing, but the experience, and the issues that pins carry, are far from the same as what you get by installing from the repository.

The work from the maintainers has been extremely valuable for me, but the chore of pushing a new library has been a barrier to sharing more code or even splitting libraries into smaller packages.

To add a bit more to my view, I think this blog post resonates with how opam update works and with the opam-repository clones needed to publish: “Package managers keep using git as a database, it never works out” by Andrew Nesbitt.

As mentioned, decentralization already exists if you want it; let’s not pretend it doesn’t. opam is not a centralized system, it is merely being used in this mode at the moment; you can change that if you want, it only depends on you.

So what remains is “fast”. When you publish a package, most people are busy with other things; they don’t care. You are not the center of the world: they will notice, perhaps say “oh, it looks nice”, and at best make a mental note to try it or upgrade to it later if they are using it. There’s little interest in being fast in releasing (unless security problems are involved – but for that you have responsible disclosure to provide leeway – or perhaps if you live in Silicon Valley).

Now I find this idea of using repo URLs to specify packages rather dubious. Suppose you move your project from GitHub to Codeberg. I doubt GitHub will provide you with a long-term 301 (if any). So congratulations, you just broke all of your (unknown) users for a problem that they did not have. They likely have more interesting problems to solve than updating the URL of your package in a dozen of their projects. A repository is like DNS: your code can move out of GitHub, there’s a single place where you need to update the reference, and your users won’t notice. Not to mention that the day I want to use yourbeautifulpackage, I’d rather just opam install yourbeautifulpackage than scout the internet for that golden random URL.
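To make that concrete: it is the url stanza of the package’s opam file in the repository, not your users’ projects, that records where the sources live, so a move means updating something like the following in one place (the archive URL and checksum are placeholders):

    url {
      src: "https://codeberg.org/removewingman/restricted/archive/1.0.0.tar.gz"
      checksum: "sha256=<hash of the archive>"
    }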

What the OP proposes is just terrible user experience. Basically it’s trading the publisher’s convenience for the users’ usability. At some point, if you want to share code, you need to decide whether you want to have users or not. In the latter case, pins are mostly fine.

Not sure how it relates to this discussion. This is a problem of the data representation you use for your package registry which is orthogonal to what is being discussed here.

3 Likes

Proponents of decentralisation often seem to propose breaking a working, centralised system as part of that push, but this is unnecessary. You could instead work on experimenting, staging and extending alternatives, which – if they are better – will eventually supplant the current system. It is entirely natural that some of these experiments will fail to deliver benefits, but we don’t need all of them to do so. And in the meanwhile, there is no risk of breaking the workflows of thousands of other users that depend on the current status quo.

For example, I am currently experimenting with moving the bulk of my code to tangled.org, which builds on the ATProto ecosystem and has the lovely property of allowing me to store my git repos on my personal servers, but manage the metadata via a more social database. The hypothesis is that this is “just enough social” while allowing much more flexibility about geographical code placement and provenance for large repositories. I am documenting things as I go along on my blog:

There are interesting issues yet to figure out (such as did:web support for full decentralisation), but other users have also been hacking on OCaml solutions such as this PDS: futur.blue/pegasus at main · tangled

I’d be delighted to read about other similar experiences people have with services like Codeberg, which I haven’t used myself. In the meanwhile, as Daniel points out, opam overlay repositories are very straightforward to put together. Some users might find it handy to try opam repomin, which I published over the weekend: an opam plugin that crunches multiple opam repos together to find a minimum cut of packages for one project.

1 Like

There’s little interest in being fast in releasing

You must be joking

Now I find this idea of using repo URLs to specify packages rather dubious

It’s the opam repository’s job to handle renames, redirects, and caching/proxying. In fact, once a week I get a CI failure because setup-ocaml on GitHub uses your direct URL instead.

The opam repository, in my view, could be a service (that’s the point I was trying to make with the linked blog post), replicated with Cloudflare KV and falling back to the GitHub repo in case Cloudflare is down.


I like the idea of ATProto; that could also be a great solution. A server-based approach is also a good way to avoid breaking any existing system.

Absolutely not.

So you are for a centralized repo :-)

What you disagree with is how that repo is managed. But again, if you disagree with that, you can already change it yourself.

This has nothing to do with the opam repository, but with how opam itself handles archives:

  • if you init opam via opam init (the default) using the http remote protocol, it’ll download archives from opam.ocaml.org for all packages.
  • if you init opam via opam init git+https://... using the git remote protocol, it’ll download archives directly from the URL specified in the opam file (see the sketch after this list).
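Concretely, the two modes look something like this (the git URL is the standard ocaml/opam-repository location; adjust it if you use a different default repository):

    opam init                                                     (http remote, archives via opam.ocaml.org)
    opam init git+https://github.com/ocaml/opam-repository.git    (git remote, archives from each package's upstream URL)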

There’s some stuff in the newer opam clients that I haven’t had a chance to keep up with to address the difference, but figuring out a solution by which CI can reliably go to an immutable content-addressable store would be nice, especially for more recent opam client releases.

I would like to go further. If the cost of the infrastructure required to set up an OPAM repository is an issue, our cooperative has developed a unikernel that does exactly what opam.ocaml.org does (namely, deliver an index.tar.gz of all the packages available in ocaml/opam-repository): it is fairly easy to replicate the repository to save bandwidth, as we try to do at every Mirage retreat.

The goal is not to replace opam.ocaml.org, but the distribution (and replication) of ocaml/opam-repository across multiple instances is fairly simple.

8 Likes

See this discussion: Proposal: membership and opam repository · Issue #45 · ocaml-community/meta · GitHub

Thanks for the link. In the linked discussion I had mentioned an idea for a repo-of-repos (see the full post in the discussion if you are intrigued). I still find the idea interesting, but it sort of assumes that there is going to be somewhat broad interest in “having your own custom opam repository”. That interest could be too little represented in the community at the moment.

Apart from that, if you want to hear the perspective of a user who does routinely use a custom opam repository: this solves a lot of pain points that I had felt before settling on this approach. For me it is not either/or; I actually enjoy using both my own repo and the public one in conjunction, for different things.

When I feel the need for speed, I have scripts over dune-release and opam-publish that will essentially publish a package and add the commit to my opam repo in mere seconds. I can use my own repo in CI as an alternative to pins. That works remarkably well, and kudos to opam: indeed, a decentralized system used mostly in a centralized fashion (as was said a few responses ago!).
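As a rough sketch of what such a script boils down to (the commands are real dune-release/opam-publish entry points, but the exact sequence and the overlay repository name are only illustrative of my setup, which assumes the overlay repo is hosted on GitHub since opam publish submits a pull request there):

    dune-release tag
    dune-release distrib
    opam publish --repo=<my-user>/my-opam-repository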

In addition, for the more personal packages that are not meant for wide dissemination, I also use the custom repo as a rehearsal/draft place where I can publish packages, verify I can opam update and opam install them, and do some other kinds of sanity checks. Then I have a command I can use later to “catch up” and submit to the public repo one or more releases at once of a package that has so far been released to the custom repo only.

It should also be noted that this workflow has some rough edges here and there, but this is an area of active development (e.g. see better support for this workflow discussed on this opam-publish issue).

I could list other things I like about this workflow, and things that don’t work too well yet, and I’m happy to discuss further with folks that are on the fence about trying this approach (e.g. on Zulip). Thanks!

4 Likes

Does anyone know what might be required to get AWS CodeArtifact to support OPAM at parity with Java, Rust, Python, and the like?