[ANN] Marking let-if and ssh-agent-unix unavailable

Dear all,

Recently I decided to mark two packages I published as unavailable. The first is let-if, a PPX that introduces a let%if construct mimicking the if let construct from other languages. The second is ssh-agent-unix, which provides some simple “Unix”-specific functions for my ssh-agent library. The reason is that these two packages seem to be unused; let-if I think shouldn’t be used at all, and ssh-agent-unix I initially didn’t intend to publish (but dune-release insisted and I didn’t want to fight the tooling), and its code quality is questionable. In either case I don’t want to maintain them anymore.

Another reason is that I think keeping every single package in the opam repository forever is unsustainable. Each package means more data transferred when running opam update, and more load when running the solver or searching for packages. Ideally, I would remove the two packages altogether, but that doesn’t seem possible at the moment. At least this should hopefully result in slightly less load on the opam solver. Given how many times opam is run across many, many machines, I think even a minute improvement has an impact.

2 Likes

opam packages cannot disappear from opam-repository.
Many people want the guarantee that if something was installable at time t, it is also installable at time t+N.

2 Likes

No package manager can guarantee that. If that guarantee is required, the only option is vendoring all the dependencies with the project.

3 Likes

I think there are 3 options:

  1. Vendoring (e.g. git submodule)
  2. An opam lock file
  3. Using opam repository add time-t --rank 1 'git+https://github.com/ocaml/opam-repository.git#GITREF_AT_TIME_T'

Of course, those assume the git repositories of dependencies don’t disappear.

Anyway, the third option is available to everybody at time t+N, so IMHO reducing the set of packages available at time t+N sounds desirable.
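
For concreteness, here is a rough sketch of what options 2 and 3 can look like on the command line (assuming opam 2.1, where opam lock is built in; the project name and repository name are just placeholders):

    # Option 2: an opam lock file recording the exact dependency versions in use
    opam lock ./myproject.opam              # writes myproject.opam.locked
    opam install ./myproject.opam --locked

    # Option 3: pin the package universe to opam-repository as it was at time t
    opam repository add time-t --rank 1 \
      'git+https://github.com/ocaml/opam-repository.git#GITREF_AT_TIME_T'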

2 Likes

Note that available: false does not remove the package; one can still list it, and install it via opam by ignoring the constraint.
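
For context, marking a package unavailable is just a small metadata change in opam-repository, and clients can still see it. Roughly (going from memory on the exact flags):

    # the maintainer adds this line to the package's opam file in opam-repository:
    #   available: false
    #
    # clients can still inspect and list the package, for example:
    opam show --raw let-if    # prints the raw opam file, including "available: false"
    opam list --all let-if    # --all includes unavailable/uninstalled packages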

Separately, please feel free to chip in to the discussion at Requests for comments: how does opam-repository scale? · Issue #23789 · ocaml/opam-repository · GitHub if you have good ideas on how to keep everything manageable. It has been discussed in a very scattered way multiple times, but it is becoming more and more important as time passes. We probably should not discuss it here, but we could open a new thread about it, for example.

4 Likes

FYI, the sources of every opam package are archived by Software Heritage. opam has the ability to look up sources that have disappeared on SWH. This is not done yet, as we still have to patch the repository with the SWHID of each package, but we’ll get there eventually.

4 Likes

There are suggestions already in the GH issue @mseri references, but I’d add that removing the packages is not the only way of solving these issues, and several of them are solved or solvable already:

  • opam update only transfers all the data with the very inefficient “http” remote; the Git remote doesn’t suffer from this problem (it’s a git fetch; see the sketch after this list). I would very much like to see the Git remote made the default in the next major release of opam. Another benefit of consuming opam-repository via a Git store is that it’s quite feasible to do so without “checking out” any of the files (although opam does not take advantage of that at the moment), so one doesn’t end up with tens of thousands of files being extracted.
  • In the case of these two leaf packages (no dependents), they are already not putting pressure on the solver: the package universe is trimmed prior to solving.
  • opam-repository at the moment is just shy of 30000 package descriptions coming from ~4500 distinct package names. In terms of search, aren’t we a long way from needing to worry about the impact of volumes of data?
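
To illustrate the first bullet, switching an existing installation to the Git remote is already possible today, along these lines (using the default repository name):

    # replace the default http remote with the Git remote of opam-repository
    opam repository set-url default 'git+https://github.com/ocaml/opam-repository.git'
    # subsequent updates are then an incremental git fetch rather than a tarball download
    opam update
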
3 Likes

I don’t quite follow your conclusions, @dra27. I think this needs a deep look into opam-repository, the maintenance hours spent on it, and the CI systems involved, together with expectations around the maintenance effort for packages (if you ask me, none of the packages I maintain will get new releases of old versions).

git vs http

At the moment, opam does not depend on git. I don’t see value in “switching the default repository to git”. One reason is that github.com is filtered in some countries (e.g. Cuba). Another is that being ‘quite feasible to do without “checking out” any of the files’ is something that can be achieved very easily with a tar.gz as well.
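
For example, something along these lines works with plain tar, without extracting anything to disk (the URL is the index served by the http remote, and the member path is only illustrative):

    # list the repository contents straight from the compressed index
    curl -sL https://opam.ocaml.org/index.tar.gz | tar -tzf - | head
    # read a single opam file to stdout without unpacking the archive
    curl -sL https://opam.ocaml.org/index.tar.gz | tar -xzOf - packages/tls/tls.0.9.2/opam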

smaller opam repositories

I stripped opam-repository down by 75% (7000 instead of 30000 opam files), and voilà, my 8-year-old laptop (X250) was way faster with all opam operations (this is opam 2.1.5). As mentioned in the referenced issue, I don’t think opam scales infinitely. The bottleneck may not primarily be the solver, but rather the number of files created, all of which need to be parsed (and then stored in a state file to avoid re-parsing).
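
For reference, a rough sketch of how such trimming can be done locally; the package names are just examples, and I’m going from memory on the opam admin interface, so take the exact invocation with a grain of salt:

    # clone the repository and keep only the packages one cares about
    git clone https://github.com/ocaml/opam-repository.git
    cd opam-repository
    opam admin filter mirage tls x509   # in practice, also keep their dependencies
    # then use the trimmed clone as a local repository
    opam repository add trimmed .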

Also, take into consideration the long time opam.ocaml.org needs for an update (due to opam2web iterating over all packages): roughly 4 hours, the last time I checked on OCurrent Deployer. Some hope that “integrating this with ocaml.org” will make it much faster, but honestly, how can that be faster? (And if it easily can be, why did nobody update opam2web in the meantime with those enhancements?)

From what I heard, at the moment removal “doesn’t work since there’s something broken on macOS”. That would be great to have fixed in the next opam release.

Discussion

I’d be happy to discuss this with the opam development team (and the opam-repository maintainers), with a view on current issues (CI systems, opam.ocaml.org updates, security, maintenance). When I release packages and see how old packages are failing, I feel really bad for each package that the CI has to build now but that will never be installed by anyone (e.g. old tls releases: who’ll ever install tls 0.9.2?). Why should we put computer and human resources into them? Triaging failures and updating upper bounds takes quite some time, at least for me, when I release packages that are used by lots of other packages.

PS: I like option number 3 from [ANN] Marking let-if and ssh-agent-unix unavailable - #4 by jbeckford.

1 Like

In suggesting the default delivery be via Git, I certainly did not mean necessarily switching it to GitHub. In passing, I agree it’s possible to analyse the tarball without checking all the files out (in fact, I contributed to ocaml-tar many years ago as part of an experiment doing exactly that, when porting what was then OPAM 1.3 to Windows…). My primary point was that, as a consequence of the design of Git, a git fetch only ever sends changes on opam update. Sending differential updates with the .tar.gz approach is harder to conceive, and definitely involves a lot more work in both the infrastructure and opam itself to support.

I also agree that opam performs better with smaller repositories; where I differ is that my response is to want to improve opam! (See, for example, the investigations done by @emillon in Performance analysis: opam show and opam switch list-available · Issue #4245 · ocaml/opam · GitHub.) Not that I’m ever against temporary solutions, but if we are after that elusive growth then fixing opam seems more important than a temporary reduction in the package universe, or we’re just kicking the can down the road. opam-repository presently has ~4500 distinct packages. crates.io has two orders of magnitude more distinct packages, and npm another order of magnitude beyond that (actually, I couldn’t tell if its claimed package count is distinct packages or versions). If today we can reduce from 30000 to 7000 by pruning old versions or the odd leaf package (mine included…), then brilliant, but what axe are we going to have to use if/when we have 30000 distinct packages in opam-repository?

opam2web performs three functions. One is generating the index.tar.gz update, which is a very small part of the update; another is generating the fall-back download cache, which is slightly slower but IIRC not that much slower, because it only has to download new packages’ files. Most of the time is dominated by generating the legacy opam.ocaml.org part of the website. I don’t know what the plans are on the website side, but with a Git remote (whether served as https://github.com/ocaml/opam-repository.git or, much better, as https://opam-repository.ocaml.org, etc.), that part of opam2web becomes largely redundant.

Well, without wishing to make it sound so hard that nobody would want to step up and assist, it’s also not that trivial (there is a tracking issue for the migration as well: Retiring opam2web in favor of ocaml.org · Issue #227 · ocaml-opam/opam2web · GitHub). Regardless (and I’m sure it’s not intentional), please could we avoid language that suggests people are being lazy? It’s tremendously discouraging, when one pours much of one’s life into trying to improve things, to read suggestions that one isn’t bothered/bothering.

3 Likes

I think Nix guarantees that.

Nope, it doesn’t. This is easy to look up: 404 error with `fetchFromGitLab` · Issue #48215 · NixOS/nixpkgs · GitHub

Sorry to hear that you find my sentence suggests people are being lazy; I’m not a native English speaker.

My intention was to highlight that opam2web does an incredible amount of work, and I don’t quite understand how “move to ocaml.org” will speed it up. I’m aware of the tracking issue, and read through it without finding anything that relates to performance.

Thanks for your experiments, and for your contributions to ocaml-tar, highly appreciated. Would you mind pointing to the commits where you use ocaml-tar instead of executing tar (and avoid unpacking the index.tar.gz), and even better to the results of that experiment?

Thanks a lot for your work. As mentioned in my earlier comment, I’m keen to have a constructive (video) meeting about the current pain points, the experiments conducted, and the future. I don’t have much of a clue who is “in charge” of opam these days (it seems Raja and Kate are doing a lot of work), nor who is working on opam-repository (mostly Marcello, it seems). But of course, everyone with experience, experiments, knowledge, and interest should be invited to participate.

Just to be clear, my overall goal for roughly a decade has been to attract more OCaml users, mainly in terms of people using applications written in OCaml, for which distributing binaries and having a concise security story is crucial. When I hear about pain points too often, I step up with the goal of fixing them. At the same time, I sometimes see a lack of libraries and develop them. Clearly, my focus is biased.

1 Like

If we are looking for performance, I don’t think tar is a very fast key-value store, if that’s what people are using it for (is it used in opam on Windows?)…