Dune: serious bug found in the design of library variants

Dear all,

Dune 1.9.0 which was released about 3 weeks ago introduced a new feature: library variants. However, we recently realised that there is a serious flaw with the current design of this feature: some of the choices Dune is making depend on the set of opam packages installed. This is a serious reproducibility issue. This post describes the problem, explains what immediate actions we are taking to mitigate it, and what it means if you were already using this feature or were planning to use it.

The problem

When you write:

(executable
 (name prog)
 (libraries a b c)
 (variants x y z))

If a, b, c, or one of their transitive dependencies is a virtual library and no concrete implementation is found in the list of transitive dependencies, then dune will look at the variants attached to all installed libraries in order to automatically select a concrete implementation. As you might have guessed, the result depends on what is installed on the system. In particular, reinstalling a previously installed package could lead to a different result.

Who is affected?

Anyone who is using the new library variants feature, i.e. any project with at least one of these fields in its dune files: variant, variants, default_implementation.

I grepped the dune-universe repository to see if this feature was already used by released packages. Fortunately, it doesn’t seem to be the case.

For the record, dune-universe is a repository that embeds a snapshot of the latest version of every package released in opam and using dune or jbuilder.

Resolution

We decided to take the two following immediate actions:

  1. we are narrowing the scope of library variants to something we are more confident can be safely supported
  2. we are putting the feature back into development mode

We will soon release Dune 1.9.2 with these changes and mark Dune 1.9.0 and 1.9.1 as unavailable in the opam repository to prevent these versions from being used by newly released packages.

Limiting the scope of variants

It will now be forbidden to attach a variant to a library that implements a virtual library from another project. More precisely, if you write:

(library
 (public_name foo)
 (implements bar)
 (variant blah))

Then foo and bar must be part of the same Dune project. Put another way, one must declare all the variants of a virtual library upfront.

Putting the feature back in development mode

In order to give us a bit more time to think about the design and come up with a strong final one, we are putting the feature back into development mode. Technically, this means that to be able to use the variant, variants and default_implementation fields in your dune files you will need to add the following line to your dune-project file:

(using library_variants 0.1)

The 0.x version indicates that the design of this feature is not finalized and might change at any point. Once the design of this feature is finalized, this line will need to be removed from dune-project files and the feature will be part of the vanilla dune language again.

It is OK to release packages in opam using this feature while it is still in development mode. However, it means that your package will get an upper bound on its dune dependency in a few weeks.

Future plans

We are planning to brainstorm more about library variants to come up with a more robust design. Once we find a more satisfying one, we will implement it, test it and finally integrate it into the vanilla dune language. In the meantime, feedback on library variants is warmly welcome!


While I understand the desire to have variants (it provides quite a bit more extensibility), I feel like everything would work fine for a while with just virtual and concrete libraries, and without the tag system that was added on top.

The current version of the tag system is very implicit and global, and strongly reminds me of all the wrong parts of Haskell’s typeclasses, notably their anti-modularity.


Yeah, I don’t disagree here. I heard that variants could help a lot for the Mirage project though. One possibility I had in mind was to strengthen the design by using a naming convention. For instance, the only library that could provide the xen variant for zarith would have to be called zarith+xen.

I have been thoroughly appreciating virtual libraries. I was initially skeptical of variants, but I’m now convinced that they (along with default implementations) are going to be very useful at scale and I’m eager to try them out.

I have a question about your description of the problem:

As you might have guessed, the result depends on what is installed on the system. In particular, reinstalling a previously installed package could lead to a different result.

I’m curious if this is just a problem with global shared package installs, or if the problem still exists in a more granularly sandboxed package build process like esy’s - which provides greater isolation during each build in the package tree.

In esy, you still have “transitive” visibility into your dependencies (which Dune would resolve variant implementations from), but the environment is re-evaluated at each node in the package build graph. So if A depends on B and C, and neither B nor C depends on the other, you won’t see A or C when you’re building B, so none of the concrete implementations that A or C define will get selected when building B. This gives you more predictability and reproducibility, because when B builds, its build doesn’t need to anticipate residing among an unpredictable set of packages. No other dependencies that you add to A can make B build in a different environment (unless adding those dependencies causes another version of B to be installed, in which case the cache is automatically invalidated anyway).

It’s still not perfect because it sees its transitive dependencies, and they may define new default implementations or variant tagged libraries, which B might be surprised about if B bumps its dependency versions, but usually those would be considered “breaking changes” and part of the “contract” of B’s dependencies.

In either case, selecting variants seems like the kind of thing that should only be done at the top level application package. If you had that restriction, are there any problems that remain in global environment, or in esy’s granular sandbox model?

There is an analog with Yarn/esy “resolutions” at the package level. Only the top level package’s resolutions are taken into account when solving constraints, because you need someone to be the final authority of which versions are selected when bypassing the constraint solver.

I’m happy to run any tests.


You are definitely right that it shouldn’t be an issue with esy. However, a lot of dune users are using opam or even neither opam nor esy, so we need to take that into account.

At this point, we could only rely on sandboxing if dune itself was restricting its view to the declared package dependencies. That’s an option, although there are some annoying corner cases to consider.

Another possibility I was thinking about would be to have only default implementations but allow them to contain variables that would be set at link time. i.e. you’d write (default_implementation foo-%{os}) in the virtual library and then set os when defining the final executable. That gives you the same flexibility as variants but with a much simpler system.
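A rough sketch of what that design could look like (note: the way the variable would be bound is hypothetical; the link_variables field below is invented for illustration and is not current dune syntax):

(library
 (public_name foo)
 (virtual_modules impl)
 ; hypothetical: %{os} would be substituted at link time
 (default_implementation foo-%{os}))

(executable
 (name main)
 (libraries foo)
 ; hypothetical field binding the variable when linking
 (link_variables (os unix)))

With os set to unix, the default implementation would resolve to a library named foo-unix, without any global scan of installed packages.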

I’m glad the opam case is being thought about thoroughly as well, but some feedback on this point: I appreciate how both opam and esy separate the concerns of the build system from the concerns of the dependency environment, so (like you, probably) I’m not exactly thrilled about dune having to add package manager specific dependency checks.

I’m not exactly clear on the failure mode/bug that happens even in a global environment though.

As you might have guessed, the result depends on what is installed on the system. In particular, reinstalling a previously installed package could lead to a different result.

What exactly is particularly bad about this? Doesn’t Dune invalidate its own internal build cache when new libraries are made available in the environment? If you install a new package into the environment and a new implementation becomes selected, isn’t that expected behavior? Is it that the order the packages are installed in somehow breaks ties?

Let’s assume you have the following configuration:

  • package a provides library a which is virtual
  • package b provides an implementation of a with variant v
  • package c provides an implementation of a with variant v
  • package d links an executable depending on a with (variants v). package d only depends on a
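In dune terms, that setup could be sketched as follows (library and module names are illustrative):

; in package a
(library
 (public_name a)
 (virtual_modules m))

; in package b
(library
 (public_name b)
 (implements a)
 (variant v))

; in package c
(library
 (public_name c)
 (implements a)
 (variant v))

; in package d
(executable
 (name prog)
 (libraries a)
 (variants v))

With only a and b installed, dune resolves the v variant of a to b; once c is also installed, two libraries carry the same tag and the choice becomes ambiguous.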

Now consider the following set of instructions:

$ opam install a b
$ opam install d
$ opam install c
$ opam reinstall d

The first three steps will succeed. The last one will fail because dune can’t choose between b and c. This doesn’t seem right.

IMO, the most important feature of variants is the ability to select implementations for transitive dependencies without having to specify all of them explicitly. And it seems to me that we can offer a solution that doesn’t suffer from the above issue.


package d links an executable depending on a with (variants v). package d only depends on a.

Shouldn’t d be required to specify a dependency on either b or c, or something that transitively depends on them at the library level? This seems like a middle ground because, yes, you do need to specify something per executable to get it to select concrete implementations, but you’re not forced to enumerate every single one of the concrete libraries. For example, someone could make a platform-js package that defines concrete implementations for several important virtual libraries, tagged with the variant "js". Then the executable only needs to add one library dependency on platform-js.
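As a sketch of that idea (all names hypothetical), the platform-js package would bundle js-tagged implementations behind one umbrella library, and the executable would pull them all in with a single dependency:

; inside the hypothetical platform-js package
(library
 (name vlib1-js)
 (implements vlib1)
 (variant js))

(library
 (name vlib2-js)
 (implements vlib2)
 (variant js))

; umbrella library re-exporting the js implementations
(library
 (public_name platform-js)
 (libraries vlib1-js vlib2-js))

; in the application
(executable
 (name app)
 (libraries vlib1 vlib2 platform-js))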

Even if d required b explicitly, the last step would still fail with the current implementation. That’s the thing we need to fix.


In this case, why isn’t the problem in the opam repository? Why don’t packages b and c conflict? Maybe there should be cooperation with opam by adding a provides field, like there is, for instance, in .deb files for virtual packages. In your example, packages b and c provide the same variant for the virtual library a, and so they conflict.

If I understand correctly how virtual libraries and variants work, if I do opam install a d and neither b nor c is installed, then the compilation of d should fail. But does opam know that d has a missing dependency?

Hmm, for a first integration of variants, it would be easier (and less error-prone) to only allow implementations of an interface within the same package, which is mostly the case currently in Mirage.

I can understand the purpose of letting others make their own implementations of an interface/variant, and it could be interesting for Mirage, but maybe we are missing a step before that.

@kantian this could indeed be considered a bug in the opam metadata. However, it seems to me that this is a pretty easy mistake to make. Moreover, such problems are likely to be a pain to debug as they might be triggered by transitive dependencies. We generally do whatever we can in Dune to prevent users from shooting themselves in the foot, and the feature in its current state seems too dodgy to me, so I prefer that we take a step back before it’s too late. We’ll re-release it once it’s foolproof.

@dinosaure that’s an option as well. If we can’t find a satisfying design, we could fall back to this.


I’m not entirely sure about this… reading up on variants, they should help us specify that mirage-net is only an interface, with the target-specific libraries (mirage-net-unix, mirage-net-xen, mirage-net-solo5) being its implementations.

For the other use case – choosing C vs OCaml implementations (e.g. for checkseum etc.) – all implementations are part of the same package.

Ah yes, I completely agree; it fits the MirageOS case entirely. But, IMHO, the old hack is still fragile in some ways (and dune can help us there), and as for the case where different OPAM packages provide implementations of a single interface defined in another package, we have not actually reached that case in practice yet; we still use functors for that.

I think it’s better to properly support the digestif/ptime/checkseum case in a first round, and be careful about what happens for dependents (in my case, irmin and ocaml-git).

And then, move on to the next stage if all seems fine.

I say that because we discovered, in the meantime, several bad designs (such as allowing a library to choose an implementation, for example), and this bug is one of them. It’s a more conservative way, and surely slower, but I still think that we don’t fully know the implications of this feature for the OCaml ecosystem.


After discussing a bit more with @dinosaure, I completely agree with him that variants spanning multiple packages should not be a goal (at least for an initial version of this feature), and are pretty tough to get right.

I see a bit of back and forth here regarding variants and virtual libraries/implementations and I just want to clarify one thing:

The issue discussed in this thread is only relevant to variants. There are no issues with implementations and virtual libraries spanning multiple packages.

This is all a bit of our fault for choosing such confusing terminology, but I want to re-iterate that the issues are only relevant to the tagging system we call “variants”.

One thing that comes to mind when thinking about the implications for the ecosystem: adding a function to a virtual module is a breaking change. As a result, it is likely a good idea for the virtual library and its various implementations to always be developed inside the same repository. Distributing the implementations as separate packages is fine as long as they are all developed in the same repo.
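For instance (names illustrative), a single repository could define the virtual library and its implementations side by side, while still releasing them as separate opam packages:

; net/dune — the virtual library; io.mli declares the interface
(library
 (public_name net)
 (virtual_modules io))

; net-unix/dune — one implementation, providing io.ml
(library
 (public_name net-unix)
 (implements net))

Since both live in the same repository, adding a function to io.mli and to each implementation’s io.ml can happen in a single change.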

In this respect, I don’t think it’s too different from other OCaml libraries. For example, if I publish a functor as part of my library, I have the same concerns to worry about.


Note that virtual/concrete libraries would already help quite a lot for this. We do not need variant tags: the mirage tool has complete knowledge of all the concrete libraries associated to a virtual library, and will simply select the right one, as done currently.

What variant tags would enable is to move to an open world, where the mirage tool doesn’t need to know about all the concrete libraries. This is desirable, but not at such a steep cost. :)

This is a sane and reasonable principle.

But, even without the conflict problem (the current bug with variants), there is still a dependency problem for opam. In its metadata, the package d should mention something like “I need a package that implements variant v for the virtual library a”, otherwise it would fail to compile. Currently I don’t think it’s possible to express this in opam, so d would have to explicitly mention a dependency on b or c. Am I wrong?

So even without the design problem with variants, if a package can’t say “I need the variant v for library a, no matter which package implements it” and other packages can’t claim “I provide variant v for library a”, the possibility to declare variants in external projects will not be very useful.