Explorations on Package Management in Dune

The OCaml Platform team is excited to announce that we have started explorations to add support for package management in Dune.

This joint work, led by the Dune, opam and opam-monorepo teams, started a few months ago with discussions on how to address one of the most important pain points reported by the community. It continues the focus we had in 2022 on prototyping new developer workflows, and throughout 2023 we’ll explore the new workflows this integration enables. In particular, we’ve been developing similar workflows with opam-monorepo, and we’re building on our experience of the past 3 years to explore how to implement these workflows in Dune.

As we want to involve the community as much as possible and start gathering feedback early, we wrote an RFC on GitHub that lays down a high-level overview of the different features.

The project is in the prototyping stage, so the goal of the RFC is to invite feedback and discussion from the community, rather than serve as a definitive spec of package management in Dune. We will be opening separate RFCs for the different parts as we continue our explorations.

If you want to follow our work, you can look at issues and Pull Requests tagged as package-management in Dune. You can also have a look at the dev meeting minutes for Dune and opam.

It’s the beginning of quite a big project, with many people involved, so I want to take a moment to acknowledge the contributions of everyone. The initiative is spearheaded by Tarides and the Dune, opam and opam-monorepo development teams are leading the project. We are also grateful for the generous funding provided by Jane Street, whose support is instrumental to continuously improve the OCaml Platform.

Happy coding!

Can we add this to the project as well? :) "Automatically add opam dependencies to dune-project file" (ocaml/dune issue #7449)

That looks like a useful feature - the confusion between packages, libraries and modules is a big pain for beginners, and inferring the package from the library sounds like a low-hanging fruit to help with this.

It seems independent from package management support though, so I presume the Dune team will prioritize it during the dev meetings.

It’s a fine feature to add, but it only really works with packages that are built by dune and therefore respect our lazy loading scheme. To make it work in general, I’m afraid we’d need to ask opam which package the library belongs to. Once we can build packages ourselves, it would not be necessary to ask opam, but we would still need to build every single package in the lock file to answer which package provides a particular library.
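
To make that cost concrete, here is a minimal Python sketch of the lookup being discussed. It assumes a hypothetical `built_packages` mapping from each locked package to the libraries it installs; neither the name nor the data layout comes from Dune or opam. The only point is that the index cannot be computed until every locked package has been built.

```python
from typing import Dict, List

def index_libraries(built_packages: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert a package -> libraries mapping into library -> package.

    `built_packages` is a hypothetical stand-in for "what each package in
    the lock file installed"; it is only known after building them all.
    """
    index: Dict[str, str] = {}
    for package, libraries in built_packages.items():
        for library in libraries:
            # Keep the first provider if two packages claim the same library.
            index.setdefault(library, package)
    return index

# Illustrative data only: real package contents may differ.
built = {
    "lwt": ["lwt", "lwt.unix"],
    "cohttp-lwt-unix": ["cohttp-lwt-unix"],
}
library_to_package = index_libraries(built)
```

Looking up `library_to_package["lwt.unix"]` then answers "which package provides this library?", but only for packages that already appear in `built`.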

TL;DR: I have a question about your medium/long-term vision regarding the opam client. Is the idea to keep it available as a separate package-management solution (with continued support in Dune), or to deprecate it in favor of “dune pkg” subcommands?

If I understand the RFC correctly, the “Dune Package Management” plan includes reimplementing large parts of the current opam logic within Dune (not just calling opam through its existing API), such as:

  • Managing a cache of the opam repository data.

  • Solving dependency constraints. (PR)

  • Downloading repository archives. (PR)

  • Executing the package build, installing and in particular caching the build artifacts. (Caching is new, and it relies on fine-grained provenance tracking / incrementality, which Dune already has logic for.)

  • Configuring the build commands to locate installed packages.

I suppose that the main reusable ingredient is the parsing and programmatic interpretation of .opam files, and the main ingredients that you would not need at all are the handling of global switches (reusing Dune’s workspace logic instead), pins (native vendoring solutions), and most lower-level layers such as system interface and job scheduling.

Once all of this is ready, Dune users (which is most OCaml developers) will have the option to use “dune pkg” instead of the opam client (or esy) for their projects, and get various nice features programmed on top of it – direct vscode integration, better caching across separate projects, faster builds from scratch, etc. On the other hand, “dune pkg” will not create a local switch in a format that the opam client can understand: for each project on my machine I will have to use either dune pkg or opam to manage my development environment, but I cannot use both at the same time.

The question for the dune, opam and opam-monorepo teams: I suppose that, during a transition period, you would have to maintain both the opam client and the dune package-management logic, and support both approaches inside Dune (the logic to locate installed packages will likely differ in both modes). But what is the medium/long-term plan moving forward? Are you planning to:

  1. Keep both options available as first-class workflows (at the cost of extra effort to maintain two implementations), in particular for the benefit of the few (sometimes large) organizations that are not using Dune today, or for people who would stick to the existing workflow of using separate tools for their build system and package manager?

  2. Or is the plan to deprecate the opam client in the medium/long term and ask all users to use “dune pkg”?

How long should we expect Dune to remain compatible with development environments set up using the standard opam client?

In general, the bar for breaking existing workflows in dune is very high.

There’s no plan to remove or even deprecate any old functionality. The code is modular enough that maintenance isn’t going to be much of an additional burden.

What will happen with opam itself? And the opam package repository? It’s unclear from the RFC.

My understanding from the RFC is that the opam-repository will remain unchanged. There would be no benefits to migrating to a different format, and there is a lot of tooling around it (continuous integration, etc.) that would be burdensome to adapt.

There is no plan to deprecate the opam client. Both tools will coexist, in order to give a better experience to all users. From the RFC, there is no plan to make dune package management compatible with an opam install. As you pointed out, @gasche, there will be 2 main workflows on a given project:

  • Only use dune, for build & package management.
  • Use dune for build, and opam for package management to have more functionalities.

There are still some projects and people that don’t use dune, and they still need opam for package management. A lot of CI setups do too.
Besides, as opam is agnostic (only the default configuration is bound to OCaml), some people use it to manage non-OCaml packages. It’s minor, but it exists.

Keeping both options available as first-class workflows does not require much more effort than today. We need to keep synchronising on some fundamentals, and each project can continue to evolve.

Thank you @gasche for your interest and input!

Expanding on what @rgrinberg and @rjbou mentioned earlier, there are no intentions to phase out the opam client. As a matter of fact, the Dune team has currently forked opam and is patching the opam libraries, with the ultimate goal of merging them back upstream once the libraries exposing the necessary APIs for Dune package management appear to be stable. You can see some of this work in progress in pull requests like #5568, #5508, #5498, #5496, #5452, and so on.

As for the opam repository, you understood correctly that there is absolutely no plan to deprecate it, or even to make large changes to it in the context of Dune package management. The goal is for Dune package management to be 100% compatible with the opam repository.

To expand on a slightly divergent path and talk about the role of the OCaml Platform: the Platform essentially mirrors the state of the world. For the opam client to become deprecated in the Platform, it would need to become the de-facto reality first. While it might be that the opam client (as for any other tool) enters a maintenance mode and eventually becomes deprecated, that seems unlikely for now, given the number of users who are relying on the opam client. That being said, if and when that happens, the Platform’s role will be to make sure that there is a smooth transition path for users, and that’s something that will require careful planning and discussions. All of which is entirely out of scope for the initial release of package management in Dune.

On a different note, following a discussion with @dra27, opam switches are architected around findlib/ocamlfind. Dune package management presents an alternative solution to achieve the same result. As you point out, it’s not meant to be reusable between workflows: the opam packages Dune compiles for your project are intended for Dune’s internal use during its build, not for external use with the shell. This could be viewed as a parallel to how opam builds switches for opam exec (with eval $(opam env) serving as a convenient shortcut). So, you can think of Dune package management as performing a similar function but specifically for dune exec.

Now, there is the question of how to make sure this doesn’t create confusion and hurt adoption.
But that’s not something that’s specific to Dune and opam. If anything, it makes the Platform more cohesive: odoc, ocamlformat, merlin, utop and mdx are all tools that work well independently, but with which you don’t need to interact if you use Dune. Dune has grown into the frontend of the Platform, and the integration with opam is another step in this direction, not something very new if we look at what’s being done with the other tools.
And this is the best of both worlds: as a power user, you’re free to use each tool independently and you’re not locked in, but as a newcomer or even as a power user who’s happy with the default experience, you can just use Dune.

Reading the opam package management RFC gives me the impression that the situation with the opam client is in fact rather different from the situation with other tools, because Dune is in the process of reimplementing large parts of this logic internally, instead of delegating to opam, because you want a tighter integration into Dune internals than a pure-delegation model allows. As far as I know, dune is not reimplementing logic from merlin or odoc or ocamlformat.

I tried to make uninformed guesses at which part of the opam client responsibilities Dune would reuse (sharing code with the opam client) and replace (by dune-specific code) in my post above. My best guess is that the main part you could reuse in the long term is “parsing and programmatic understanding of opam files”. Are there other important ones that I missed?

Another consequence of this design is that the new features which are planned, and are indeed quite nice, will be specific to projects that use dune for package management. The plan is for Dune to provide, for example, good support for incremental rebuilding (when dependencies change), caching (of package artifacts across independent projects), a nice local-switch-first command-line UI with lockfile integration by default, but also editor support (building package dependencies from the IDE directly). None of those features are planned for people using the opam client – if I understand correctly. Some of those features (in particular incremental rebuilding) are clearly in the ballpark of a build system and implementing within Dune makes a lot of sense. But for some others, for example the latter three in my list, adding them to the opam client would also have been a possible approach, but you chose to work within Dune instead.

This is also the root of my question on whether the long-term strategy is to keep two tools/codebases alive, or just one. For ocamlformat or odoc, it wouldn’t make sense to ask whether odoc will disappear once dune gets first-class documentation support. For the opam client and package management, it does.

My longer term view has always been that we should focus on having well-specified file formats that our tools use, and let many domain specific tools that operate over that file metadata bloom. The reason for this is that files that are checked into a project have a habit of sticking around for the long-term (or forever, if you consider historical releases), whereas tools naturally evolve and perish.

The only thing necessary to publish something “into the OCaml community” (that is, something that shows up on a package search on the website) is a tarball with an opam file in it. This opam file specifies interdependencies and a build plan. We have, as of just now, 28296 of these checked into the central opam repository. The vast majority of those packages can be downloaded, extracted, and an installation plan generated simply by looking at the local opam file in the tarball and the central collection of them that represent potential dependencies (the opam repository).

Over the years, we’ve had many build tools spring up: OCamlMakefile, omake, ocamlbuild, oasis, b0, ninja, and jbuilder/dune. What makes dune so interesting from a long-term perspective is that the checked-in dune file is also separately versioned, so that it should (with a sufficiently good specification) be possible to analyse the build logic of a repository just by examining it. With most of the other build systems, you needed to run an executable to get a build plan (notably with ocamlbuild, and even with oasis running over ocamlbuild), which tightly couples it to a particular tool. That’s why I’ve been so resistant to the idea of publishing opam packages which do not include a generated opam file (even if it’s autogenerated from dune), since you then lose the property of simply being able to examine a published artefact to determine how to build it.

What other file formats do we have in common platform tools? We used to have .merlin files, but they’re autogenerated now from a dune build plan in most cases. There are .ocamlformat files, mostly in a k/v format. Generally speaking, we’ve been pretty good at promoting and exposing metadata in an opam or dune file and not having too much of a proliferation of other files.

What tools operate over opam files?

Given that an opam file exists, what tools can actually run over them?

  • opam.exe - the main CLI client, which exposes an excellent command-line interface so that you don’t have to parse the files directly.
  • opam-0install-solver - implements a much-simplified version of the solver to do ‘one-shot’ solutions that do not need to take existing packages into account.
  • (upcoming) the dune integration, which will also use opam files (and repositories) to perform source fetching operations. Notably, this also allows dune to generate build plans for non-dune packages, which was not possible before.
  • And others, like lsp-server, can also use these checked in files to perform editor-driven operations.
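
To illustrate what a ‘one-shot’ solve means, here is a small Python sketch. It is a plain backtracking search over a toy package universe and is not the algorithm used by opam-0install-solver or opam; the `Universe` layout and the `solve` name are assumptions made for this example. The key property is that it starts from nothing and never consults installed packages.

```python
from typing import Dict, List, Optional, Set

# universe[package][version] = {dependency: set of acceptable versions}
Universe = Dict[str, Dict[str, Dict[str, Set[str]]]]

def solve(universe: Universe, roots: List[str]) -> Optional[Dict[str, str]]:
    """Pick one version per needed package, or return None if impossible.

    Plain backtracking from scratch: no installed state is considered,
    which is what makes the solve "one-shot".
    """
    def go(pending: List[str], chosen: Dict[str, str],
           allowed: Dict[str, Set[str]]) -> Optional[Dict[str, str]]:
        if not pending:
            return chosen
        package, rest = pending[0], pending[1:]
        if package in chosen:
            return go(rest, chosen, allowed)
        # Try "newest" first; plain string order is a simplification.
        for version in sorted(universe.get(package, {}), reverse=True):
            if package in allowed and version not in allowed[package]:
                continue
            deps = universe[package][version]
            new_allowed = dict(allowed)
            ok = True
            for dep, versions in deps.items():
                # Intersect with constraints collected so far.
                narrowed = new_allowed.get(dep, versions) & versions
                if not narrowed or (dep in chosen and chosen[dep] not in narrowed):
                    ok = False
                    break
                new_allowed[dep] = narrowed
            if not ok:
                continue
            result = go(rest + list(deps), {**chosen, package: version}, new_allowed)
            if result is not None:
                return result
        return None

    return go(list(roots), {}, {})

# Toy universe: lib 2 needs json 1, but app needs json 2, so lib 1 is picked.
toy: Universe = {
    "app": {"1": {"lib": {"1", "2"}, "json": {"2"}}},
    "lib": {"1": {}, "2": {"json": {"1"}}},
    "json": {"1": {}, "2": {}},
}
plan = solve(toy, ["app"])
```

A real solver also has to handle version ordering, optional dependencies, conflicts and platform filters, none of which are modelled here.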

Do we actually need a CLI?

One key architectural difference between build systems and package managers is how stateful they are: build systems usually maintain very little outside of their build tree, whereas package managers (especially opam) have a lot more.

So why do we actually still need an active CLI? The zero-configuration ocaml-ci is a step towards showing that we don’t need anything beyond files that are checked into a source code repo! Consider the following operations, and mappings to how to do them by modifying files and having a background worker process watching for file changes:

  • opam install: Edit the opam file to add a new dependency, and then the background watcher can transactionally install it into a local switch.
  • opam pin: Edit the opam file to add a pin-depends.
  • opam remove: Edit the opam file to remove a dependency.
  • opam remote add: We don’t currently offer an official way to check in which opam repositories a project depends on. We could use an x-opam-repos extension field and establish a standard.
  • opam switch: Edit the dune-workspace file to register a new local switch for a project with a compiler version.
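
The install and remove cases above can be sketched in a few lines of Python. This assumes the dependency names have already been read out of the opam file into sets; `plan_actions` and the action tuples are hypothetical names for illustration, not part of opam or Dune.

```python
from typing import List, Set, Tuple

def plan_actions(before: Set[str], after: Set[str]) -> List[Tuple[str, str]]:
    """Return the switch operations implied by an edit to the dependency list.

    `before` and `after` are the dependency names declared in the opam file
    before and after the edit (parsing the file is out of scope here).
    """
    removals = [("remove", name) for name in sorted(before - after)]
    installs = [("install", name) for name in sorted(after - before)]
    # Removals come first so a transactional worker frees packages before adding.
    return removals + installs

# Editing the file to drop `yojson` and add `lwt`:
actions = plan_actions({"dune", "yojson"}, {"dune", "lwt"})
```

A background watcher would run something like this on every file change and apply the resulting actions to the local switch as one transaction.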

Storing all project state in existing metadata files has huge workflow advantages: it means you can statelessly build a project without having to reconstruct local pins/switches for others, which in turn means that CIs like ocaml-ci “just work” when you push the code remotely! It also makes the act of releasing a package much easier, since you can just remove pins/overrides progressively with help from local editors and global CI tools. It also works really well in a monorepo workflow.

The purpose of this little segue is to demonstrate why I think well-specified and versioned file formats are more important than tools, since you can then build the right tool to solve your particular workflow problem for a given context. And to go back to @gasche’s original question, I don’t think we should be thinking about the dune and opam projects/codebases merging, but rather about which elements of their respective codebases should be focussed on to allow more interoperability between tools for their respective file formats.

Some possible considerations:

  • solvers: the full solver libraries are quite heavyweight (and subjectively, overly complex C++ based solvers), but opam-0install is a lovely alternative if single-shot solutions are all that is required. Can these be made more accessible and embeddable to other CLIs (initially dune, but also LSP and whatever else wants to solve for version constraints?)
  • repositories: how can we manipulate opam repositories in a more unified way? Right now they are just a collection of files, but we do need to figure out how to move older packages out of the way, but still retain the ability to install them on demand. This is a top priority for the opam-repository maintainers, and presumably will become a problem for other downstream users such as the coq-opam-repository maintainers as they hit scale issues as well.
  • more formal specifications: if we view tools as interpreters over DSLs (the opam and dune files), then why aren’t we formally specifying these better? After all, we have close to 30000 of them published now by thousands of us across 12 years! And we need to interoperate with other distributions and their package managers. Wouldn’t it be lovely to be able to install opam packages from within Debian, or even other multi-version package managers like Pub.dev?

For dune, you can conduct a similar thought experiment, but the most obvious interop point is to take dune files and embed any OCaml project within a larger build system like Bazel or Buck2, without having to write any manual bridging rules.

I’m sketching out my thoughts on the opam repository management roadmap next, but I’d be delighted to hear more about others’ thoughts on what new tools you’d build over sufficiently well specified dune or opam file formats…

I hate to be a Reply Guy on this point, but it seems to me the obvious way to embed OCaml projects as Bazel packages is to write BUILD.bazel files for them.

I guess the actual question I have is “why do I want to manage packages with dune instead of a tool designed expressly for building multi-lingual projects?”

You can go right ahead and do that; nothing stops you from using Bazel directly with OCaml build rules today.

But the dune file is a very succinct way to write down the build specification of an OCaml project today, and so it seems natural to generate downstream build rules to interoperate with other build systems from those, and benefit from the good editor integration (e.g. LSP) when developing the OCaml code.

and in a later post:

"But the dune file is a very succinct way to write down the build specification of an OCaml project today, and so it seems natural to generate downstream build rules to interoperate with other build systems from those, and benefit from the good editor integration (e.g. LSP) when developing the OCaml code."

I’d like to register a respectful but strong dissent here, born of many months spent working on a dunefile to Bazel converter. IMO the dune language is not a good choice as a general meta-build language for OCaml. Its semantics are full of implicit Dune magic, and it doesn’t even have a published schema. And that’s just the dune part - dune files in the real world often depend heavily on free-form shell scripting (in Dune rule stanzas). So any tool that wants to translate dune files into some other language will have a lot of work to do.

I think the better solution is to decide on a schema (in some language to be determined) that can express the minimal amount of info required to specify a complete build, explicitly, in a way that minimizes the amount of work necessary to translate into specific build system languages. That means adding a lot of info to what dune files explicitly contain. The conciseness of dune files may be a boon to users who write them by hand, but it’s just a PITA for tool writers.

FWIW my conversion tool converts dune files to a more elaborate schema that contains all the info needed to generate BUILD.bazel files more-or-less directly. I believe (but have not verified) that it could be used to generate build files in other languages - BUCK2, ninja, etc. It might even be possible to use mustache templates for generation, I’ve got the mustache tooling but the data has to be massaged a little to be suitable for that - wouldn’t it be nice if you could just write some mustache templates to emit your build code?

The schema (which at this point is entirely informal) would need considerable refinement before it could be used generally, but I think it counts as a Proof-of-Concept. The general idea would be to use such a schema to intermediate between Dune and other systems - instead of converting dune files to (say) BUILD.bazel files, convert them to the generic schema, and from there to whatever. And of course one could go the other way around: start with (and maintain) the generic schema and translate to dune.
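
As a rough illustration of the "generic schema in the middle" idea, here is a Python sketch with one explicit record type and two renderers. The `Library` record is invented for this example and is not the schema used by the conversion tool described above; the `ocaml_library` rule and its attributes in the Bazel renderer are placeholders, not a claim about any real ruleset.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Library:
    """One explicit build target in a hypothetical generic schema."""
    name: str
    modules: List[str]
    deps: List[str] = field(default_factory=list)

def to_dune(lib: Library) -> str:
    """Render the target as a dune `library` stanza."""
    lines = ["(library", f" (name {lib.name})",
             f" (modules {' '.join(lib.modules)})"]
    if lib.deps:
        lines.append(f" (libraries {' '.join(lib.deps)})")
    return "\n".join(lines) + ")"

def to_bazel(lib: Library) -> str:
    """Render the target as a Bazel-style rule call.

    `ocaml_library` and its attributes are placeholders for illustration.
    """
    srcs = ", ".join(f'"{m}.ml"' for m in lib.modules)
    deps = ", ".join(f'"//{d}"' for d in lib.deps)
    return (f'ocaml_library(\n    name = "{lib.name}",\n'
            f'    srcs = [{srcs}],\n    deps = [{deps}],\n)')

greeter = Library(name="greeter", modules=["hello"], deps=["fmt"])
dune_text = to_dune(greeter)
bazel_text = to_bazel(greeter)
```

Because everything a renderer needs is explicit in the record, adding another target language means writing one more small function rather than reinterpreting dune's implicit defaults.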

I don’t disagree with you! I was merely pointing out that having the beginning of a concrete build specification is so much better than what came before (you’d have to execute ocamlbuild + plugins in order to determine the dynamically discovered build graph).

What you lay out with respect to a build system generation approach sounds promising, and I look forward to seeing it develop. Could dune do something to make that ‘implicit dune magic’ easier to access for you, so that dune files can be interpreted in a more standalone way?

I think another way I might more usefully express my concern is that dune is already a bit of a cognitive burden on newcomers.

It’s a whole language that you have to master (in addition to the OCaml language itself) in order to get anything useful done while you’re climbing the newcomer learning curve. Which would sort of be defensible if dune really were the only build tool that OCaml programmers ever need to use (it’s almost achieved that already, and fully achieving it soon enough seems within reach), but I’m not seeing how loading it up with a whole package management layer in addition to the one we already have in OPAM will help reduce the cognitive burden on newcomers.

Shorter james: we used to have the “what build tool should I use?” problem, and that’s rapidly fading away, but I’m not seeing how trading that for “what package manager should I use?” is a good trade.

Good question. The obvious thing is what you mentioned previously: a formal specification of the schema. Beyond that, I confess I’ve not put much thought into how it could be improved. I’ll keep that in mind as I go along. FYI I’m in the middle of a major refactoring, and the handling of rule stanzas still needs quite a bit more work. Maybe I’ll publish what I’ve got soonish and see if anybody wants to help out. The tools are all written in C and (s7) Scheme, by the way - fast and portable.

I should add: we have another entrant in the package management sweepstakes: Bazel modules (bzlmod). I’ve begun converting all my stuff and so far it works great: versioned packages, transitivity, etc. A huge improvement for Bazel. Includes an extension mechanism that supports integration with external systems like maven etc. Might be possible to integrate with OPAM in some manner, but I haven’t gotten to that point yet. More info at Bzlmod User Guide.

Right and now if you want to do anything serious build-wise you have to understand the programs that generate the dune files :joy:.

So that:

is pure fiction.
