Extending Dune With Package Management

chrisnevers · April 28, 2020, 10:28pm

After looking at cargo and cabal, I was wondering why OCaml’s build system and package management tools are fragmented. If I recall, Dune wraps odoc for documentation generation. Is there any effort for Dune to “wrap” or make use of opam for installing project dependencies? As far as I know, Dune can tell me all the dependencies I don’t have installed, but I have to manually install them with opam?

Chet_Murthy · April 28, 2020, 10:40pm

Since I’m not a dune user, I don’t really care too much, but … I think it’s a dangerous step to start down the path of the build-tool automatically installing dependencies. That’s the direction Java went in, and it’s been a resounding failure: both from my own experience in the 21st century COBOL galleys (J2EE) and what I hear from my friends who toil there today.

I think it’s already a great thing, that dune informs you that there are deps missing, and tells you (in cut-and-paste-able form) how to remedy them. But it would be dangerous to go further than that, I feel.
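As a rough illustration of that kind of report (not dune’s actual implementation), here is a minimal OCaml sketch. The function names `missing_deps` and `install_hint` and the package lists are hypothetical; the code only diffs the required package names against the installed ones and renders an `opam install` line that can be pasted into a shell.

```ocaml
(* Hypothetical helpers for illustration only; this is not dune's API. *)
module S = Set.Make (String)

(* Packages required by the project but absent from the current switch,
   returned in alphabetical order. *)
let missing_deps ~required ~installed =
  S.elements (S.diff (S.of_list required) (S.of_list installed))

(* Render a cut-and-paste-able hint, or None when nothing is missing. *)
let install_hint ~required ~installed =
  match missing_deps ~required ~installed with
  | [] -> None
  | pkgs -> Some ("opam install " ^ String.concat " " pkgs)

let () =
  (* Made-up example data. *)
  let required = [ "lwt"; "yojson"; "cmdliner" ] in
  let installed = [ "dune"; "lwt" ] in
  match install_hint ~required ~installed with
  | None -> print_endline "all dependencies installed"
  | Some cmd -> print_endline cmd
```

The tool only reports; running the suggested command stays a deliberate step for the developer, which is the boundary being argued for here.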

c-cube · April 28, 2020, 11:14pm

Cargo does that, when asked to build a project, and people seem to generally love it. Npm too, but that’s probably more controversial.

What’s the problem with downloading dependencies if they’re only affecting the current project (nothing global)?

Chet_Murthy · April 28, 2020, 11:31pm

TL;DR Flexibility is a great thing. But too much flexibility is a rope, and it inevitably gets coiled around your neck at the least-opportune time.

My experience (and esp. that of others also) with Java, as well as with Ruby, is that:

(1) when dependencies are automatically downloaded, unless there is a team-wide way of controlling precise versions, things get rapidly out-of-control, with different developers (and different machines owned by the same developer) having different package/version collections. It’s hellish when it comes time to debug or share code.
(2) opam already provides a way to set up multiple independent ‘sandboxes’ on a single machine, so if you want to have different sets of packages for different projects, it’s already easy to do.
(3) a “dune project” is too small a granularity to be providing this sort of flexibility, and almost all the flexibility you desire is already provided at the opam level.
(4) [though I think dune already does a pretty good job of this, maybe if] dune could provide more-comprehensive reporting of what needs to be installed. Think of the way that debian’s package-builders output Build-Depends. Or the way Perl’s MakeMaker scripts do the same.

c-cube · April 29, 2020, 12:05am

For sure, it’s a bad thing to do without lockfiles, and a good one to do with lockfiles. But cargo has lockfiles and so you get better reproducibility than with the typical opam workflow. opam 2.1 might improve on that situation by also making lockfiles easy to use for us OCamlers. Other communities have the answer.

Chet_Murthy · April 29, 2020, 12:07am

agreed: lockfiles for opam will be nice. But still, I think that having downloaded-package-sets on a per-project basis is too much flexibility, when that flexibility is already available at the “opam sandbox” level.

Whatever: I’m a Makefile guy, so none of this affects me.

yawaramin · April 29, 2020, 12:32am

Opam already has lockfiles. What it’s missing is a shared package and build cache across projects. If it had that, it would save users a lot of time and disk space. Esy provides that and makes package management a breeze IMHO. For even more advanced package management and build caching capabilities some users (OCaml, Haskell, other communities) even turn to Nix, which is pretty much the state of the art.

Actually, one more thing that Opam is missing now is, I think, making lockfiles the default workflow. Right now I think it doesn’t automatically manage lockfiles in each project. You need to update the lockfile yourself every time any dependencies change. (Someone correct me if this has changed since I checked.)

Because this is not the default workflow, it effectively won’t get done.
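To make the “enforce it automatically” idea concrete, here is a small OCaml sketch of a check a tool or CI job could run. It is an assumption-laden model, not how opam works: the `lock_entry` type, the function names and the version numbers are all hypothetical, and a real opam lockfile is an `.opam.locked` file rather than an in-memory list.

```ocaml
(* Illustrative model of a lockfile: one pinned version per package. *)
type lock_entry = { name : string; version : string }

(* Version pinned for [dep] in the lockfile, if any. *)
let pinned_version ~lock dep =
  List.find_opt (fun e -> e.name = dep) lock
  |> Option.map (fun e -> e.version)

(* Declared dependencies with no pin. Extra lock entries are expected,
   since a lockfile also pins transitive dependencies. *)
let unlocked ~declared ~lock =
  List.filter (fun dep -> pinned_version ~lock dep = None) declared

(* Fail when the lockfile has fallen behind the declared dependencies. *)
let check_lock ~declared ~lock =
  match unlocked ~declared ~lock with
  | [] -> Ok ()
  | deps -> Error ("lockfile is missing: " ^ String.concat ", " deps)

let () =
  (* Made-up example data. *)
  let declared = [ "lwt"; "yojson" ] in
  let lock =
    [ { name = "lwt"; version = "5.3.0" };
      { name = "ocaml"; version = "4.10.0" } ]
  in
  match check_lock ~declared ~lock with
  | Ok () -> print_endline "lockfile covers all declared dependencies"
  | Error msg -> print_endline msg
```

Running a check like this on every build is one way to turn a manual convention into something the tooling enforces.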

Chet_Murthy · April 29, 2020, 1:12am

A couple of years ago, I did a significant project in Node.JS. It was blockchain-based, so used a -bunch- of different projects from all over the place in JS. Each project came with its own lockfile, and it was a bit of a rat’s nest untangling it all. I’ve had similar experiences with Golang and its vendor-branch stuff.

The problem with all these approaches is that they treat the -project- as the unit of granularity, and that is far, far too small. Each project gets its own entire set of external deps, and that allows developers to be lazy about making sure their code has the minimal set of dependencies on external behaviours. Usually when I’m developing, I’m working on a number of projects, usually one of them is the focus, but others I make small modifications to, e.g. bugfixes, fixups. And so it’s good to have them all supported by the same set of foreign modules – the same opam lockfile, in a sense.

other communities) even turn to Nix

I’m actually a little surprised by this. Do developers need to install new packages so frequently, that build-time is actually a problem? That’s … astounding, really. I’m hacking pretty hard right now, and maybe need to install/reinstall packages … a few times a week? Seems like a lot of extra moving parts, for not much gain.

I’m reminded of when a certain large Internet company started using bittorrent to distribute executables to their fleet. The guy who did it apparently spent the first weekend redeploying the app … like … every few minutes, trying to fix some bug. The extra flexibility was irresistible, and he sure used it to coil some rope around his company’s neck. Nothing bad happened, but it could have, and he was certainly irresponsible.

yawaramin · April 29, 2020, 1:53am

Yup, this makes sense, and the JavaScript community has evolved tools like Lerna and Rush to help with this problem. Maybe other package managers will evolve in a similar direction.

Yup, very much a problem, because we are constantly running builds and tests in CI. In most reasonable projects, every time you send a pull request or update one, it sets off a build and a series of checks. You would like that to be as quick as possible. A good build cache can bring it down to a couple of minutes.

But even if that weren’t the case, we all benefit if all builds, even cold builds, are super-fast. Moving between projects, checking out different branches, updating dependencies, and a bunch of other tasks all benefit hugely from near-instant builds.

Chet_Murthy · April 29, 2020, 2:14am

You’ve adduced two different scenarios: fast CI and “fast builds”. And I’m unconvinced that either is really dependent on extremely-flexible and fast “external package installation”.

(1) To dispel with [sic] the issue of “fast builds” for devs, I guess I’ll have to trust you, that devs actually are [un]installing, {up,down}grading external packages all the time, all. the. time. And that they’re doing this in a single opam installation, instead of having multiple such installations going for different projects that need different versions of external dependencies. I find that to be a bizarre way to work, and the existence of systems like “virtualenv” for Python tells me that I’m not alone in this.

There’s a difference between “fast setup of environments of external dependencies” and “fast builds”. To conflate the two doesn’t seem all that useful. Once upon a time, we thought we had to do that, because we could only have one environment at a time. But now with copious disk space, SSDs, and virtualization, that shouldn’t be a problem.

(2) It’s not clear to me how a per-project lockfile and set of external-deps helps speed up CI. Now, a build-cache of those deps would indeed help, but unless it’s done really carefully, will introduce version-skew, and that’ll be death. Furthermore, such a build-cache would be -independent- of a per-project set of external-deps. You could imagine that opam could build+install, but could also build the equivalent of a DEB/RPM, and that later it could install from a cache of such DEB/RPMs, instead of doing build+install. IIRC there was talk of doing that, in the past.
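One common way to keep a build-cache from introducing version-skew is to key each cached artifact on the exact versions of the package and everything it depends on. The OCaml sketch below illustrates that idea only; the `pkg` type, the `cache_key` function and the version numbers are hypothetical, and this is not a description of how opam, esy or Nix compute their keys.

```ocaml
(* Illustrative model: a package is its name, exact version, and the
   packages it directly depends on. *)
type pkg = { name : string; version : string; deps : pkg list }

(* Key = hash of the name, the version, and the sorted keys of all direct
   dependencies (which in turn cover their own dependencies), so a change
   anywhere in the dependency tree yields a different key. *)
let rec cache_key p =
  let dep_keys = List.sort compare (List.map cache_key p.deps) in
  Digest.to_hex
    (Digest.string (String.concat "\n" (p.name :: p.version :: dep_keys)))

let () =
  (* Made-up example data. *)
  let base v = { name = "base"; version = v; deps = [] } in
  let app b = { name = "app"; version = "1.0"; deps = [ b ] } in
  let k1 = cache_key (app (base "v0.13.0")) in
  let k2 = cache_key (app (base "v0.13.1")) in
  Printf.printf "same artifact reusable: %b\n" (k1 = k2)
```

Because the key depends only on package identities and versions, such a cache is independent of which project asked for the build, matching the point that the cache and the per-project dependency set are separate concerns.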

Further thought: for a while I was doing a ton of hacking on Thrift in C++ and other languages, and they have a pretty good CI setup over there. They use docker containers that get prebuilt with all the external deps, and those containers are used to run the builds. I didn’t see this as particularly burdensome; indeed, it meant that when I ran those CI builds on my workstation, I could use the same docker containers, thus preventing version-skew. I don’t see why this can’t be done for OCaml projects. For some languages, the required external dependencies weren’t available as DEBs, so it required some actual builds to be run, to set up the container. Again: all got done once, and those container-build scripts were upgraded whenever the -prereqs- of the project (Thrift) changed, never otherwise.

yawaramin · April 29, 2020, 2:42am

Not necessarily even ‘all. the. time.’ Even if you do it just once in a while, changing a dep and then having to wait for a ton of deps to get rebuilt can easily take you out of your flow. Flow is valuable. Distractions abound. I’d like to get my work done with minimal distraction.

It’s quite normal in my experience. You’re working on FeatureA, then you get a bug report suddenly and track it down to some release branch, which has to be fixed, and can easily have a different set of dependency versions, because of semantic version ranges. Now obviously this is not normal in the opam world, because opam makes this kind of workflow very painful. But with package managers that can reproduce a project’s dependency set fairly reliably, this is quite common.

There’s a grey area between them, which is–fast download of a cache of build artifacts. With this, your build system can leapfrog the slow process of building base dependencies from scratch.

You should check out Esy (linked above). It reliably solves these problems.

This is a more coarse-grained and heavyweight version of what I’m saying package managers can do for you with a proper build cache implementation.

cemerick · April 29, 2020, 2:55am

After you install new or updated packages, just run opam lock, and it will pave over your lockfile. Handily, the relevant portions of the lockfile are alphabetized, so the resulting diff is easy to double-check if you want.

(Or maybe you were meaning something else, like updating the lock file as part of package installation…)

yawaramin · April 29, 2020, 3:04am

Yup, exactly. If the tool doesn’t enforce it, it will get missed. Over time, the tendency will be that it just won’t get done.

Chet_Murthy · April 29, 2020, 3:11am

Not necessarily even ‘all. the. time.’ Even if you do it just once in a while, changing a dep and then having to wait for a ton of deps to get rebuilt can easily take you out of your flow.

Such work (changing the dependencies of a project) is very rare: if it weren’t, most of software would be chaos. Also, in any real software organization, I think that sort of thing is tightly controlled: again because it produces chaos, but also because the set of external dependencies is precisely also the set of code that the company has the least control over, and thus it will want to ensure that that code changes in the most well-controlled manner.

It’s quite normal in my experience. You’re working on FeatureA, then you get a bug report suddenly and track it down to some release branch, which has to be fixed

Unstated, is the assumption that you’re going to do this work on a release-branch defect, in the same “environment” as you use for your dev-branch work. I don’t see why that would be the case. Heck, I don’t see why I wouldn’t have multiple checked-out copies of the entire set of projects I work on – one for release, maybe one for really experimental work, and one for my dev responsibilities. Sure, the first time I have to switch from dev work to that release-branch defect, it’ll take some time; but every time after? The environment will be there, waiting. I mean, who even uses a single git repo per project these days? Sure, you can “stash” your changes, switch branches, etc, etc. But what about files that aren’t yet checked-in? A second-or-nth copy of the repo is no harder to keep up-to-date, and means you don’t have to actually “set anything aside” – just switch to a different set of windows.

And again, I point out that this is precisely what Python’s virtualenv is for.

This is a more coarse-grained and heavyweight version of what I’m saying package managers can do for you with a proper build cache implementation.

Perl and Python (among others) have build-systems that interface well with the operating-system. One can install a DEB or get the same effect by building-from-source. Why isn’t this sufficient? OS-level package-managers have been dealing with this sort of issue (as well as many others) for decades; why reinvent all of it for each different programming language?

I’ve been here before: watched Java try and fail to solve this problem, multiple times, because it wanted at all costs to -avoid- using the OS-level package-manager.

It is true that this doesn’t solve the problem for the individual developer. But as I noted, Perl/Python packages can be installed by dpkg/rpm, -or- by a source-level build. For a developer working on a defect, they install from source; for CI, external deps are installed by dpkg/rpm.

XVilka · April 29, 2020, 4:51am

Cabal is a poor example, to be honest. The Haskell community struggled a lot with the older (“v1”) and newer (“v2”) concepts, and thus invented Stack and Stackage. Still, dealing with building and dependencies in the Haskell world can often be cumbersome. This is why Nix has become very popular in that community. Cargo, on the other hand, is easier and more enjoyable to work with. We still have to see how this will play out in the long run though.

orbitz · April 29, 2020, 5:07am

FWIW, I don’t experience this problem when developing, but my CI runs take much longer than I think is desirable because they always start from scratch, and I require a CI run to merge.

Chet_Murthy · April 29, 2020, 5:09am

I completely agree that speeding up CI is important. And this is a good argument for opam to learn how to produce DEBs, so that one could rapidly install an entire suite of switch+packages for running a CI suite.

orbitz · April 29, 2020, 5:20am

If by DEBs you mean Debian packages, those don’t help me too much. I perform CI runs across multiple OSs. I don’t necessarily care what package manager is used but I do care that it’s not a massive effort to add a new OS to my list.

Whether or not opam needs that intelligence is up for debate. I remain sad Nix didn’t push hard to become the universal package manager since it solves so many problems every language-specific package manager invariably runs into.

Chet_Murthy · April 29, 2020, 5:29am

It seems like you’re saying “because it’s too much work to support multiple OS package-managers (which really means just DEB&RPM), we need to invent another non-OS-level package-manager and use that” ? That approach was tried before (specifically in the Java world) and it didn’t end well: it means that instead of there being one consistent way to install software on a machine, there were two passes (first the OS, then Java) and sometimes more than two (b/c some Java subsystems had their own special sauce …)

-A- solution is to ensure that the package format emitted by the build-system can be converted to all the desired OS-specific formats. That actually isn’t that hard to do for packages of software that don’t need OS-subsystem-level configuration. About the only detail is interacting with ld.so’s configuration.

Also, is your software delivered as source or as binaries? If as binaries, then don’t you have to package it to deliver it on the various OSes? And if you don’t, it seems to me, your users will have to do that themselves. [Obviously this is my opinion, so] OS-level package-managers really have won the argument when it comes to deploying. And since all testing (specifically, CI) is meant to give us greater certainty of what will transpire in deployments, it seems like CI should use OS-level package-management wherever feasible.

orbitz · April 29, 2020, 6:08am

which really means just DEB&RPM

I’m using FreeBSD and Alpine mostly, so no, it means more than that. If one believes DEB & RPM are basically all one needs then I can see why you would come to the conclusion you did, though.

we need to invent another non-OS-level package-manager and use that

I’m not quite sure what you mean by this. I was bemoaning that Nix, which handles pretty much everything, and does a very good job at it, isn’t the underlying technology of all of these language specific package managers. I wish most OS’s used it as well. Nix already exists and it is used for example in Haskell. And Nix existed before opam. So I am possibly saying the opposite: we should stop inventing non-OS-level package managers.

An issue I’ve come across at some point in language-specific package managers is language heterogeneous projects are just an absolutely terrible experience. And each of these language specific package management tools + build systems are fighting against each other and it’s really miserable. This is one reason why I, like you, tend to stick to Makefiles despite all their weaknesses.

I realize this vision is not attainable. But my statement was simply that I don’t know if opam should bear the responsibility for making OS-specific packages. And whatever that thing that does it ends up being, I hope it’s not a PITA to add a new OS.

Also, is your software delivered as source or as binaries? If as binaries, then don’t you have to package it to deliver it on the various OSes?

I’m not delivering software to customers in this case so I can get away with being pretty hacky. This original thread was in response to build caching where you proposed building DEBs was a solution to that. I don’t know if caching build dependencies is the same problem as delivering software to customers.

it seems like CI should use OS-level package-management wherever feasible.

I’m not sure this is true. I believe there is a difference between building software and installing it. Considering that I might be working on projects using different, incompatible, versions of libraries, requiring them to be installed in my OS seems inconvenient (unless you’re using Nix). And if you want to draw a distinction between developing on your desktop and what the CI is doing: the CI doing something different than my desktop has a lot of issues on its own.

The end result being an OS package makes sense to me, but not the dependencies. This is one reason we have opam as it is, isn’t it? Often OS’s have old versions of OCaml on them so we have opam switch. Being able to jump back and forth between versions of libraries is difficult as well. Keeping OS’s up to date with packages is hard so we have opam packages. The amount of diversity in OS package managers and best practices seems so large that doing this all the way through the stack seems like a lot of work with marginal benefit. But perhaps I’m being short sighted.
