Explorations on Package Management in Dune

Reading the opam package management RFC gives me the impression that the situation with the opam client is in fact rather different from the situation with the other tools: Dune is in the process of reimplementing large parts of this logic internally, instead of delegating to opam, because you want a tighter integration with Dune internals than a pure-delegation model allows. As far as I know, dune is not reimplementing logic from merlin, odoc, or ocamlformat.

Right, the integration is done differently: it uses opam as a library instead of invoking it as a binary, as is done with the other tools.

It might cause some level of duplication, but that by itself isn't a good enough reason to justify the deprecation of a separate opam client (the user base isn't the same, and the workflows are not the same). And indeed, as stated earlier, the current plan is to have the opam client and Dune package management co-exist.

But it's also important to remember that, for now, package management in Dune is still at the prototyping stage. Blockers are being found, and better ways to interoperate with opam are being discussed. As a recent example, we were discussing whether we should upstream the "Dune" lockfile generation to opam.

So I wouldn't be too quick to judge the plans based on where the code currently lives. This will change. And to echo @avsm, the files that are meant to be consumed by the different tools carry more weight than the tools that consume them. That belief is applied to the design of Dune lock files, whose goal is also to integrate with tools other than Dune, like Nix.

For now, in the interest of keeping a good development pace until we have the first version of package management in Dune, a lot of code is being written in Dune directly. But the opam team is part of the effort, and the plan is to have Dune and the opam client co-exist, so I'm not worried: we'll converge to something that minimizes maintenance efforts and benefits everyone down the line.


This whole discussion is making me reconsider my decision to adopt dune for the project I currently have underway as a follow-up to Orsetto. I don't want to contribute to this consolidation of an OCaml-specific build tool and language-independent package management.

Yes, god forbid we solve the fragmentation issues in one dimension of OCaml's ecosystem! :sweat_smile:

What do you worry about? Almost every language has its custom build system and package manager (or several systems, for the unlucky ones). Dune works well and is aware of a lot of things that would be really annoying to encode by hand in, say, make. There is no way Rust would be as popular or successful as it is if it did not have Cargo. Or even JavaScript/Node.js without npm (!).


That's not my view of what this proposal does. Instead, I see it as introducing more fragmentation into the ecosystem where none really exists today.

My view on this: when people talk about opam, it covers several different aspects.

  • the opam files and the opam repositories. The main repository is (for us OCamlers) ocaml/opam-repository, but the Coq community also has one, and many companies use their own private repositories. We do not want to break this workflow. The Dune package management proposals aim to stay fully compatible with it and to work with any opam package (using dune or not). There's also exciting ongoing work on package signing that we do want to see land at some point. We have also built, and are operating, an extensive CI infrastructure around these repositories: for instance, opam-repo-ci builds 100,000 jobs daily on all the Tier-1 supported platforms for OCaml. We do not want to rebuild all of this one more time!

  • the opam client(s). The main one is the opam CLI, but many more tools use the opam file metadata. There have been a few attempts at generating Nix derivations from those files. There's also esy and the package managers that try to close the gap between the OCaml and JavaScript ecosystems. The client is built around a library (opam-lib), but this has never been designed properly: when I wrote opam initially, it was only focused on the CLI. Later, with @AltGr, we tried to split it a bit more cleanly, but the API is still painful to use (for instance, every function that needs to read the filesystem takes a value that holds that state and would otherwise take dozens of arguments; then, as these functions perform filesystem or network effects, you somehow need to keep these values synchronised with the new filesystem state, which is painful and error-prone). This API can roughly be split into various parts (see the sketch after this list):

    1. Reading the opam repository state: parsing opam files and building a dependency graph (that's the part you mention, @gasche).
    2. Resolving constraints: opam has a pluggable interface for constraint solving and, by default, it will use whatever solver is installed on your system (or some internal heuristics, which used to be very naive but seem much better nowadays). Opam needs to serialise and parse solver requests, including solver errors, which somehow need to be pretty-printed for the user.
    3. If the solver can devise a build plan, parsing it and preparing it by downloading (and caching) the package sources.
    4. Running the build commands for all the packages and installing them locally.
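
To make step 1 concrete, here is a minimal sketch using the opam-format part of opam-lib. Take it as an illustration only: the exact function names and signatures have moved around between opam-lib versions, so the calls below may need adjusting.

(* A minimal sketch of step 1, assuming the opam-format library
   (part of opam-lib); exact signatures may differ between
   opam-lib versions. *)
let () =
  let contents = {|
opam-version: "2.0"
depends: [ "dune" {>= "3.0"} "fmt" ]
|} in
  (* Parse the opam metadata without touching any global opam state. *)
  let opam = OpamFile.OPAM.read_from_string contents in
  (* The resulting dependency formula is the input to step 2, the solver. *)
  let deps = OpamFile.OPAM.depends opam in
  print_endline (OpamFilter.string_of_filtered_formula deps)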

Nowadays, most package managers also have the option to snapshot the state of the build plan between steps 2 and 3. That is, for instance, what tools like opam-monorepo lock or opam lock are doing.
But when you do opam install --locked, opam still calls the solver (to check that your lock file is consistent and complete), so it is doing 3-2-3-4. And opam-monorepo pull does 3 and delegates 4 to dune build (so all your dependencies need to use dune and be co-installable in a dune workspace).

So, to come back to your question, @gasche: the Dune package management experiment will be using opam-lib to do 1, 2 and 3 (whether it's the current opam-lib or an improved version that relies a bit less on filesystem state is still under discussion; whatever the result is, it will be upstreamed). And it will be using the dune scheduler to do 4 (but still using the opam build instructions).


Right, and now if you want to do anything serious build-wise, you have to understand the programs that generate the dune files.

FYI, I'm in the middle of a major refactoring, and the handling of rule stanzas needs quite a bit more work.

At what level are you interested in integrating with these build rules? For instance, I am wondering if you have tried looking at the output of dune rules <target> (or even dune rules --makefile if you don't like S-expressions). They are known to be imperfect (and contributions to improve them are welcome), but maybe they could serve as a good, low-level integration point between the various build systems.

Thanks! This is a clear answer, and it usefully refines/complements the proposal. The code currently being written rather gives the impression that the goal was to do 1 with opam-lib and the rest in dune. @tmattio had already pointed out that this is a prototype, but having an idea of the long-term vision helps.

Two questions out of curiosity:

  1. If dune builds all the packages locally, could it also consider installing them into an opam switch once they are built? In a sense we could have a dune package export command that only reruns the install step to talk to another opam client, since it has done the build already.

  2. A few years ago I suggested that opam builds could be sped up by taking advantage of Dune composability: better integration between opam and dune for building? · Issue #4283 · ocaml/opam · GitHub. At the time, the response to the suggestion was to avoid special-casing dune behavior in opam. But I wonder if this could be reconsidered in the future: if the code to create "local monorepo islands" and build them faster is mostly there in Dune, the opam client could also take advantage of it.

Thanks, that's helpful for trying to figure out the semantics of dune stanzas, since it evidently prints the result of dune's processing; but that makes its output dune-specific and thus unsuitable for driving the generation of a build-system-agnostic specification.

What is specific to dune in there? A rule is just a set of inputs, outputs, and an action (so it can be pretty-printed easily as a Makefile rule).
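
To illustrate, here is a hypothetical, simplified OCaml model of such a rule. This is not Dune's internal representation, just the shape suggested by the dune rules output:

(* Hypothetical, simplified model of a rule: dependencies, targets and
   an action. Illustrative only; not Dune's internal representation. *)
type rule = {
  deps : string list;     (* input files *)
  targets : string list;  (* output files *)
  action : string;        (* command to run *)
}

(* Pretty-printing such a rule in Makefile syntax is mechanical. *)
let to_makefile { deps; targets; action } =
  Printf.sprintf "%s: %s\n\t%s\n"
    (String.concat " " targets)
    (String.concat " " deps)
    action

let () =
  print_string
    (to_makefile
       { deps = [ "pp_rewrite.mll" ];
         targets = [ "pp_rewrite.ml" ];
         action = "ocamllex -q -o pp_rewrite.ml pp_rewrite.mll" })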

But maybe that's too low-level, as it includes specific sandboxed paths? Are you looking for something with a more abstract definition of compilation/library units? Or something where paths are less concrete? It would be very helpful to understand the right level of abstraction to target, and I'm sure dune rules can be adapted :slight_smile:

Maybe I'm not doing it right. Take for example ast/dune in ppxlib:

(library
 (name astlib)
 (public_name ppxlib.astlib)
 (libraries ocaml-compiler-libs.common compiler-libs.common)
 (flags -w -9)
 (preprocess
  (action
   (run %{exe:pp/pp.exe} %{read:ast-version} %{input-file}))))

(rule
 (targets ast-version)
 (action
  (run %{ocaml} %{dep:config/gen.ml} %{ocaml_version})))

(cinaps
 (files *.ml *.mli)
 (libraries astlib_cinaps_helpers))

If I run dune rules ast I get a large amount of output that is not easily mappable to the dune stanzas. Just a snippet:

((deps ((File (In_source_tree ast/pp/pp.mli))))
 (targets ((files (default/ast/pp/pp.mli)) (directories ())))
 (action (copy ast/pp/pp.mli _build/default/ast/pp/pp.mli)))

((deps
  ((File (External /Users/gar/.opam/4.14.0/bin/ocamllex))
   (File (In_build_dir _build/default/ast/pp/pp_rewrite.mll))))
 (targets ((files (default/ast/pp/pp_rewrite.ml)) (directories ())))
 (context default)
 (action
  (chdir
   _build/default/ast/pp
   (chdir
    ../..
    (run
     /Users/gar/.opam/4.14.0/bin/ocamllex
     -q
     -o
     ast/pp/pp_rewrite.ml
     ast/pp/pp_rewrite.mll)))))

This shows how dune does it. Other build systems may do things differently, e.g. no copying to a _build subdir, no need to (chdir _build/default...) etc.

The specific task for (action ...) is to make explicit and unambiguous the tool, the inputs, the outputs, and the command syntax. For example, magic variables like %{exe:pp/pp.exe} must be replaced by fully-qualified build target labels, in this case something like //ast/pp:pp.exe (that's Bazel label syntax, but it's generic and unambiguous). Similarly, %{dep:config/gen.ml} becomes //ast/config:gen.ml. Etc. Currently I have to figure that stuff out going only from the dune files, which is definitely not easy. If a dune command could output that information (sexp syntax is good for me), it would be wonderful. The output would not be specifications of build actions, but an elaborated version of the dune file. That's essentially what my conversion tooling tries to do. Originally I was going from dune file to BUILD.bazel in a pretty ad-hoc manner, mainly because I had no global sense of the whole, but the better strategy is obviously to partition the logic in two: first convert the dune stuff to a generic syntax, then write code to emit build files.
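
As a rough sketch of the elaboration I have in mind, here is a toy rewriting of two variable forms into Bazel labels. The function names are hypothetical, and a real converter would need the full dune %{...} grammar plus knowledge of the actual project layout:

(* Toy elaboration of dune variables into Bazel-style labels. The
   function names are hypothetical; a real converter would handle the
   full %{...} grammar and the actual project layout. *)
let to_bazel_label ~package path =
  (* //ast/pp:pp.exe style: the directory part extends the package
     path, and the basename becomes the target name. *)
  let dir = Filename.dirname path and base = Filename.basename path in
  if dir = "." then Printf.sprintf "//%s:%s" package base
  else Printf.sprintf "//%s/%s:%s" package dir base

let elaborate_var ~package var =
  match String.index_opt var ':' with
  | Some i ->
    let kind = String.sub var 0 i in
    let arg = String.sub var (i + 1) (String.length var - i - 1) in
    (match kind with
     | "exe" | "dep" -> to_bazel_label ~package arg
     | _ -> var) (* leave other forms (%{ocaml}, %{read:...}, ...) alone *)
  | None -> var

let () =
  (* %{exe:pp/pp.exe} in directory ast -> //ast/pp:pp.exe *)
  print_endline (elaborate_var ~package:"ast" "exe:pp/pp.exe");
  (* %{dep:config/gen.ml} -> //ast/config:gen.ml *)
  print_endline (elaborate_var ~package:"ast" "dep:config/gen.ml")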

FWIW, I expect to finish the refactoring in a few days; then I can get back to the conversion logic. At that point I'll have some concrete examples, which should make it easier to discuss whether and how Dune itself might do something similar.


I had the idea of creating an opam plugin that would be able to read the dune setup and "export" it into an opam switch. That way you can have your local project managed by dune package management and, if needed, opam can read it and interact with it (in a read-only way) via the plugin.
Once dune package management has stabilised, we will see whether such a plugin is easily feasible.

Note that this workflow (installing OPAM dependencies locally) is useful in more than one way: by installing these dependencies you reduce the amount of work Dune has to do during incremental rebuilds. In large codebases, the extra cost incurred by scanning the sources of every dependency, checking whether everything is up to date, etc., is not negligible, and mostly useless once the dependencies are initially built, since you don't normally touch them. On Windows the effect is even more pronounced due to slower file/process operations.

As a data point, at LexiFi we put together a similar workflow where we build the OPAM dependencies of our code (they are kept manually in a monorepo; we don't actually use OPAM) in a single Dune workspace, and then dune install them to a local directory, which we make available to the rest of the codebase using the OCAMLPATH variable. This works very well for us and lets us avoid the hit on Dune incremental build times.

I am hoping that such a workflow will be supported by Dune at some point.

Cheers,
Nicolas
