OPAM Experiment and future Developer Experience improvements

OPAM is the package manager for OCaml. I’ve heard many people complain that OPAM is not beginner-friendly and is a hassle to work with, and others argue the contrary: that OPAM is an unbelievably powerful, elegant package management solution. I wanted to understand where these sentiments come from, so I sat down 3 interns and had them try to publish a CLI that I wrote in JavaScript, Rust, and OCaml. None of them had ever published a package, but all had working knowledge of these languages from projects I’ve assigned them at work.

Results:

  • JS successful package creation: 3/3
  • Rust successful package creation: 3/3
  • OCaml successful package creation: 0/3

Average time it took to publish the package (includes the time it took to make an account on npm/cargo):

  • JS: 21 minutes
  • Rust: 18 minutes
  • OCaml: N/A (none succeeded; on average, they gave up after 47 minutes)

This was a bit alarming to me. The feedback I got:

  • Documentation was confusing
  • Opam commands, i.e. pin, switch, and source, are confusing and unlike anything they’ve used in other communities. They especially struggled with switch and pin. They were looking for something familiar, i.e. the package, publish, install, unpublish semantics of other package managers.

Many were also surprised that there was a review process for admission and that they needed to fork a repository; many were turned off by a ceremony similar to submitting an app to the App Store.

Hopefully this gives some insight into how OPAM could do a better job of streamlining package creation and distribution. I’m a little uncertain why OPAM takes the approach of forking a repository instead of just using a CLI like npm’s or cargo’s. Also, for opam 2, it might help to use commands that are familiar to users (package, publish, install, unpublish).

I asked the interns what they wish all these package managers did:

  • Package up the libraries in a format to be consumable by others
  • Optimize library size for consumption
  • Optimize library performance

Rust’s Cargo already does some of this.

I hope that the people who work on OPAM can try to improve the developer experience for people who might be interested in working on libraries in OPAM.

Actually, there is opam-publish, which does all the forking and GitHub PR work automatically and provides a CLI to publish packages. More generally, when I created my first package, I found the documentation on creating OCaml packages informative: it is accessible here, and you can reach it quickly by searching Google for ‘ocaml opam’ and then following the Documentation > Packaging link.

I think it would be interesting to know what kind of information each intern managed to learn before giving up and what they didn’t, in order to pinpoint precisely what could be improved. For instance, did they find the packaging documentation webpage? What parts of the package creation process did they understand, or not? Did they get as far as a working opam file? Did they stop just short of proposing a GitHub PR to the opam repository? Etc.

The fact that such a tool exists and isn’t baked into opam itself is wildly unintuitive.

All 3 interns found the Packaging documentation page you linked after searching on Google; this was actually the first thing all 3 did. Most of them fumbled with pin and switch, mistaking them for install, and were confused about the opam file format. All 3 stopped short of making a PR because they weren’t sure whether they had the correct versions of dependencies installed when building a distributable version of the CLI. They were also confused about how to use the source command and whether the package’s source would be found. Most of the confusion came from terminology and ambiguity, which led to frustration on their part.

I’m normally very sympathetic to this kind of criticism. Certainly, having gone to cargo’s homepage and npm’s homepage, it’s clear that their websites are more attractive than OPAM’s, and more usable as well. Cargo, for example, allows you to browse the packages by category, which is tremendously useful. However, I’m not sure it makes sense to measure the ability of neophytes to publish packages.

If you’re new to the language, it’s not clear that there’s much value in being able to publish packages. Instead, for this group of users, I would want to measure how intuitive it is to use OPAM for downloading packages, and I think OPAM is just as good as other options in this regard. By the time you’re ready to publish a package, you’re already invested in the language, and you know the process that’s required, or are willing to put in a little bit of time to learn it. I don’t think we want the kinds of contributions npm notoriously gets, where packages often involve nothing more than a couple of lines of code.

In other words, a little bit of a barrier to entry may be acceptable or even desirable in this particular case.

I agree it is surprising for it to be separate (though it is reasonable, seeing as it has additional dependencies compared to opam). That said, its usage is explained on the packaging documentation webpage, and these explanations come first (i.e. before the explanation of how to create a GitHub PR manually), so I find it surprising that they didn’t try it.

I don’t quite understand the problem here (probably because I’m now accustomed to the way opam works). The opam file has a reasonably well-documented depends field to specify a package’s dependencies, including version constraints, as mentioned at the bottom of the documentation webpage. Does that mean they weren’t sure which versions of other packages to depend on? Or does the package have non-OCaml dependencies?
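For reference, here is a minimal sketch of what such a depends field can look like, written out through a shell heredoc so the block runs as-is. It assumes the opam 2.0 file format and dune as the build system; the package name mycli, the cmdliner dependency, the maintainer address, the URLs and the version bounds are all hypothetical placeholders.

```sh
#!/bin/sh
# Write a minimal opam file for a hypothetical CLI package "mycli".
# Assumes the opam 2.0 file format and dune; names, URLs and bounds are placeholders.
set -e
cat > mycli.opam <<'EOF'
opam-version: "2.0"
synopsis: "A small command-line tool"
maintainer: "you@example.com"
authors: ["You"]
license: "ISC"
homepage: "https://github.com/user/mycli"
bug-reports: "https://github.com/user/mycli/issues"
dev-repo: "git+https://github.com/user/mycli.git"
depends: [
  "ocaml" {>= "4.08.0"}
  "dune" {>= "2.0"}
  "cmdliner" {>= "1.0.0" & < "2.0.0"}
]
build: [
  ["dune" "build" "-p" name "-j" jobs]
]
EOF
```

Each depends entry is a package name, optionally followed by a version constraint in braces. At install time opam's solver picks versions that satisfy all the constraints, so what gets published is the range of acceptable versions rather than whatever happens to be installed on the author's machine.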

I get the impression that there may have been a misunderstanding about how to distribute the package with opam. Did the interns try to build the CLI executable on their machines and then distribute it through opam? Or am I just misreading?

Another point that seems strange to me: opam pretty much only deals with source code. A package on opam-repository is not much more than a link to the source code plus the instructions to build it (along with some other useful metadata) in an opam file. This is actually quite clear when you look at the contents of a PR that adds a package, which in general consists of only three files:

  • a file containing the URL the sources are downloaded from
  • a file containing the description of the package
  • the package’s opam file

Do you mean that the interns weren’t sure where to specify the URL of the package’s source when preparing the PR?
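To make the three files above concrete, here is a sketch that recreates the opam 1.2-era layout of a single package version in opam-repository, packages/<name>/<name>.<version>/ containing descr, opam and url. The package name mycli, the URLs, the maintainer and the all-zero checksum are hypothetical placeholders; a real url file carries the actual checksum of the release archive.

```sh
#!/bin/sh
# Recreate the opam 1.2-era layout of one package version in opam-repository:
#   packages/<name>/<name>.<version>/{descr,opam,url}
# "mycli", the URLs, the maintainer and the checksum are hypothetical placeholders.
set -e
dir=packages/mycli/mycli.1.0.0
mkdir -p "$dir"

# url: where opam downloads the release sources from, plus their checksum
cat > "$dir/url" <<'EOF'
archive: "https://github.com/user/mycli/archive/v1.0.0.tar.gz"
checksum: "00000000000000000000000000000000"
EOF

# descr: the first line is the one-line synopsis, the rest is the long description
cat > "$dir/descr" <<'EOF'
A small command-line tool
mycli is an example package used to illustrate the opam-repository layout.
EOF

# opam: maintainer metadata, dependencies and build instructions
cat > "$dir/opam" <<'EOF'
opam-version: "1.2"
maintainer: "you@example.com"
authors: ["You"]
homepage: "https://github.com/user/mycli"
bug-reports: "https://github.com/user/mycli/issues"
dev-repo: "https://github.com/user/mycli.git"
build: [["dune" "build" "-p" name "-j" jobs]]
depends: ["dune" {build} "cmdliner"]
EOF
```

Note that opam 2.0 later folded url and descr into the opam file itself (a url { src: ... checksum: ... } section plus synopsis and description fields), so submissions in that format usually contain a single opam file per package version.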

This discussion is really the flip side of the “Serious OPAM package quality issues” thread.

I recently made my first PR to opam-repository and was delighted that the reviewer was thorough enough to discover an issue. Users should only trust a repository of executable code if there is some kind of verification in place. An alternative to reviews is to incrementally establish trust in publishers and individual packages after the fact, and this may be an increasingly common approach to regulation. But I can’t help thinking that part of the mechanism at work is to make early adopters (read: unsuspecting users) guinea pigs. At least without access to a really large number of users, propagating trust through social connections or known authorities seems more reliable, which may be the idea behind the upcoming conex.

For newcomers, I believe the most pressing issue would be finding a way to distribute packages to themselves and their colleagues, in which case one can bypass the PR and create a personal opam repository.
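A personal repository is just a directory (or git repository) laid out like opam-repository. The sketch below assumes opam 2.x, where the repository root needs a repo file declaring the format version and each package version has a single opam file; the names my-opam-repo, myrepo and mycli, and the git URL and tag, are hypothetical. The commands colleagues would run are shown only as comments, since they need opam installed.

```sh
#!/bin/sh
# Minimal personal opam repository, assuming the opam 2.x layout.
# Repository name, package name, URL and tag are hypothetical placeholders.
set -e
repo=my-opam-repo
pkg="$repo/packages/mycli/mycli.1.0.0"
mkdir -p "$pkg"

# The repo file marks the directory as an opam 2.0 repository
cat > "$repo/repo" <<'EOF'
opam-version: "2.0"
EOF

# One opam file per package version; the url section points at a git tag
cat > "$pkg/opam" <<'EOF'
opam-version: "2.0"
synopsis: "A small command-line tool"
maintainer: "you@example.com"
depends: ["dune" "cmdliner"]
build: [["dune" "build" "-p" name "-j" jobs]]
url {
  src: "git+https://github.com/user/mycli.git#v1.0.0"
}
EOF

# Afterwards, on each colleague's machine (requires opam 2.x):
#   opam repository add myrepo /path/to/my-opam-repo   # a git URL also works
#   opam install mycli
```

No review step is involved here: whoever can write to the directory or git repository decides what it contains, which is exactly the trade-off discussed above.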

I don’t doubt the value of reviews, but outside of OCaml, evaluating unreviewed packages is routine for many programmers. It has been for me, at least. You learn to use multiple criteria. It’s not perfect, but nothing is. (Depends on your needs, though. I don’t write banking or satellite launch code.)

Yes, I do too, and the fact that most of it is open source with full revision history, issue tracker, reverse deps, etc. makes a big difference. I think having reviews is a luxury, but we should have a way to establish initial trust in authors. General code quality and support status can be accumulated after publication [1], but if an anonymous author can post a package to a repo, that’s a security nightmare.

[1] For OPAM, splitting the official channel into a testing and a stable area may help avoid damage to production environments, but we’d need a way to collect metrics if we want to factor out the review for the stable channel. CI is very nice, but it does not replace human judgement on all points.

I would just like to point out that you didn’t test the working programmer experience here, but the newcomer experience. As such I find the title of your post and conclusions a bit misleading.

It is a common mistake among system designers to optimize for beginners rather than the working user; ideally both should be well served, but there is a tension (which is where all the interesting design choices lie, btw.). As your introduction mentions, we have anecdotal evidence that opam works very well for many working programmers, which is already quite encouraging.

It seems we now have anecdotal evidence that it works less well for newcomers, and we should try to streamline all this, including the documentation. However, I’m not too “alarmed” by this result. Good software takes time to produce, and as @bluddy mentions, by the time you’re ready to publish something you will likely be a bit more familiar with the surrounding tools.

In any case I do not want you to think that no one cares about these issues. In the platform project we are constantly trying to devise new tools and workflows to improve the eco-system experience and this includes lowering the barrier to entry to publish packages.

Here’s a working prototype workflow that relies on unreleased experimental software (so there are a few extra steps below and still a few rough edges in the out-of-the-box experience; this is just to show you). Assuming you use GitHub to publish your software, the following takes you from project setup to versioned publication on the OCaml opam-repository:

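```sh
# Install the (unreleased, experimental) tooling
opam pin add carcass https://github.com/dbuenzli/carcass.git
opam install publish topkg topkg-care
carcass setup    # Answer the personal questions
git clone git@github.com:user/mypkg.git && cd mypkg
# Create the project files from the topkg/pkg template and commit them
carcass body topkg/pkg . && git add . && git commit -m "First commit."
topkg tag v1.0.0 # Tag the release in git
topkg bistro     # Shortcut for topkg distrib, publish, opam pkg and opam submit
# You are done
```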

But this still needs a bit more work before everyone gets to use it. Note, however, that the topkg release workflow (see topkg help release) is already available for anyone to use (as is the whole thing, if you don’t mind dealing with unsupported software; I do mind, so I generally advise against it).

opam maintainer here.

Thanks for the experiment; as pointed out above, it’s a specific case (beginners), but it gives useful insights nonetheless. Posting on the opam tracker would have made sure I didn’t miss it.

It’s true opam publish is a bit confusing at the moment, but this is being worked on. I think the primary issue here is the packaging guide, which clearly needs a refresh. For example, it’s obvious your interns followed the instructions from the tl;dr as a sequence, while they are actually separate examples: maybe a big OR was missing, or the section should focus on a single workflow. Why else would they need the command to “get a local copy of an existing package and install from there”?

It seems the “OR” between using opam-publish and manually forking is not clear either: separating into “short version” / “detailed version” of the guide could help.

Maybe we should just start with the different ways to publish: in their case, if they only want to share with colleagues, a git repo containing an opam file with build instructions is all that is needed, with the command opam pin URL to install on the other side.

I like to think of myself and my colleagues as being in the middle ground between beginners and working OCaml programmers, say “casual fluent programmers” :wink:. That is, we know OCaml fairly well, even quite well for some of us, we are rather fluent in functional programming, software engineering and even a bit of PLT, but we are not everyday working programmers. So we may program quite intensively for a few weeks (to develop a proof of concept) and then not at all for months. For this reason, our personal experience with OCaml is not as smooth as I think it could be (though tremendous progress has been made in recent years, thanks to the efforts of a number of people, including yourself).

The main reason is that the developer experience relies on knowing a lot of things that, I must admit, we forget all the time since we practice them twice a year at most: setting up a build environment, understanding how to declare an opam package (this takes a while), how to publish it, how it works with GitHub PRs (yes: we use Git but are not used to GitHub, once again no more than once or twice a year), knowing what to do when opam publish fails, knowing a standard way to set up tests and run them automatically, knowing how to handle versioning automatically (and have the executable contain the version tag)…

We wish there were a single, automated tool to handle all this (as long as the projects are not too exotic, of course). I see we’re not too far from this (cf. the workflow you’re showing), in particular thanks to your efforts and those of OCamlPro, OCaml Labs and Jane Street (and others), but it’s still a bit rough around the edges, with too many things to learn. Well, at least for someone who prefers to spend his or her time designing and programming rather than setting up and maintaining a build/release environment :slight_smile:

I know this has already been said countless times, and that the people working on this topic know about it (and can even see its shortcomings), but having a single command like Rust’s cargo would be such a relief :smiley: