My longer term view has always been that we should focus on having well-specified file formats that our tools use, and let many domain specific tools that operate over that file metadata bloom. The reason for this is that files that are checked into a project have a habit of sticking around for the long-term (or forever, if you consider historical releases), whereas tools naturally evolve and perish.
The only thing necessary to publish something “into the OCaml community” (that is, something that shows up on a package search on the website) is a tarball with an opam
file in it. This opam file specifies interdependencies and a build plan. We have, as of just now, 28296 of these checked into the central opam repository. The vast majority of those packages can be downloaded, extracted, and an installation plan generated simply by looking at the local opam
file in the tarball and the central collection of them that represent potential dependencies (the opam repository).
Over the years, we’ve had many build tools spring up: OCamlMakefile, omake, ocamlbuild, oasis, b0, ninja, and jbuilder/dune. What makes dune so interesting from a long-term perspective is that the checked in dune
file is also separately versioned, so that it should (with a sufficiently good specification) be possible to analyse the build logic of a repository just by examining it. With most of the other build systems, you needed to run an executable to get a build plan (notably with ocamlbuild, and even with oasis running over ocamlbuild), which tightly couples it to a particular tool. That’s why I’ve been so resistant to the idea of publishing opam packages which do not include a generated opam
file (even if its autogenerated from dune), since you then lose the property of simply being able to examine a published artefact to determine how to build it.
What other file formats do we have in common platform tools? We used to have .merlin
files, but they’re autogenerated now from a dune build plan in most cases. There are .ocamlformat
files, mostly in a k/v format. Generally speaking, we’ve been pretty good at promoting and exposing metadata in an opam or dune file and not having too much of a proliferation of other files.
What tools operate over opam files?
Given that an opam
file exists, what tools can actually run over them?
opam.exe
- the main CLI client, and which exposes an excellent CLI interface to avoid having to parse them directly.
opam-0install-solver
- implements a much-simplified version of the solver to do ‘one-shot’ solutions that do not need to take existing packages into account.
- (upcoming) the dune integration, which will also use opam files (and repositories) to perform source fetching operations. Notably, this also allows dune to generate build plans for non-dune packages, which was not possible before.
- And others, like lsp-server, can also use these checked in files to perform editor-driven operations.
Do we actually need a CLI?
One key architectural difference between build systems and package managers is how stateful they are: build systems usually maintain very little outside of their build tree, whereas package managers (especially opam) have a lot more.
So why do we actually still need an active CLI? The zero-configuration ocaml-ci is a step towards showing that we don’t need anything beyond files that are checked into a source code repo! Consider the following operations, and mappings to how to do them by modifying files and having a background worker process watching for file changes:
opam install
: Edit the opam
file to add a new dependency, and then the background watcher can transactionally install it into a local switch.
opam pin
: Edit the opam
file to add a pin-depends
.
opam remove
: Edit the opam
file to remove a dependency.
opam remote add
: We don’t current offer an official way to check in which opam repositories a project depends on. Could use an x-opam-repos
extension field and establish a standard.
opam switch
: Edit the dune-workspace
file to register a new local switch for a project with a compiler version.
Storing all project state in existing metadata files has huge workflow advantages: it means you can statelessly build a project without having to reconstruct local pins/switches for others, which in turns means that CIs like ocaml-ci “just work” when you push the code remotely! It also makes the act of releasing a package much easier, since you can just remove pins/overrides progressively with help from local editors and global CI tools. It also works really well in a monorepo workflow.
The purpose of this little segway is to demonstrate why I think well-specified and versioned file formats are more important than tools, since you can then build the right tool to solve your particular workflow problem for a given context. And to go back to the @gasche’s original question, I don’t think we should be thinking about the dune and opam projects/codebases merging, but rather what elements of their respective codebases should be focussed on to allow more interoperability between tools for their respective file formats.
Some possible considerations:
- solvers: the full solver libraries are quite heavyweight (and subjectively, overly complex C++ based solvers), but opam-0install is a lovely alternative if single-shot solutions are all that is required. Can these be made more accessible and embeddable to other CLIs (initially dune, but also LSP and whatever else wants to solve for version constraints?)
- repositories: how can we manipulate opam repositories in a more unified way? Right now they are just a collection of files, but we do need to figure out how to move older packages out of the way, but still retain the ability to install them on demand. This is a top priority for the opam-repository maintainers, and presumably will become a problem for other downstream users such as the coq-opam-repository maintainers as they hit scale issues as well.
- more formal specifications: if we view tools as interpreters over DSLs (the opam and dune files), then why aren’t we formally specifying these better? After all, we have close to 30000 of them published now by thousands of us across 12 years! And we need to interoperate with other distributions and their package managers. Wouldn’t it be lovely to be able to install opam packages from within Debian, or even other multi-version package managers like Pub.dev…
For dune, you can conduct a similar thought experiment, but the most obvious interop point is to take dune files and embed any OCaml project within a larger build system like Bazel or Buck2, without having to write any manual bridging rules.
I’m sketching out my thoughts on the opam repository management roadmap next, but I’d be delighted to hear more about others’ thoughts on what new tools you’d build over sufficiently well specified dune or opam file formats…