Finding the root of a Dune project (request for comments)

Hi there,

I’m reporting on a discussion I had with @diml on dune’s issue tracker.

As of now, when dune is run, it looks for the root of the project to be built. This is done by recursing upwards in the file hierarchy, looking for dune-workspace/dune-project files, until there are no more.

This means that if you happen to have such a file in a parent directory, your build may break because that directory has other descendants that fail to build. Like this:

R/dune-workspace
R/P/ ... my perfectly fine project ...
R/F/ ... a failing project ...

If you run dune from P, your build will fail due to F and the root being inferred to be R.
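To make the climbing rule concrete, here is a minimal Python sketch of the behavior as described above. It is not dune's actual implementation: the function names and the `ceiling` parameter (which only keeps the demo from looking above its temporary directory) are hypothetical, and I'm assuming that P carries its own dune-project file.

```python
import os
import tempfile

MARKERS = ("dune-workspace", "dune-project")


def outermost_root(start, ceiling=None):
    """Return the highest ancestor of `start` (inclusive) holding a marker file.

    Returns None if no marker is found. `ceiling` is a hypothetical knob for
    the demo: the climb never goes above it.
    """
    current = os.path.abspath(start)
    ceiling = os.path.abspath(ceiling) if ceiling else None
    found = None
    while True:
        if any(os.path.isfile(os.path.join(current, m)) for m in MARKERS):
            found = current  # keep climbing: a higher marker overrides this one
        parent = os.path.dirname(current)
        if current == ceiling or parent == current:
            return found
        current = parent


def demo_outermost():
    """Recreate the R/P/F layout in a temporary directory and infer the root from P."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        r = os.path.join(base, "R")
        p = os.path.join(r, "P")
        os.makedirs(p)
        os.makedirs(os.path.join(r, "F"))
        open(os.path.join(r, "dune-workspace"), "w").close()
        open(os.path.join(p, "dune-project"), "w").close()  # assumed to exist in P
        return os.path.basename(outermost_root(p, ceiling=base))


if __name__ == "__main__":
    print(demo_outermost())
```

Under these assumptions the inferred root is R rather than P, which is why F's broken files get pulled into the build.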

To prevent this, you may pass a --root command-line option to dune, but of course it means that you should be aware that there are dune-workspace files higher up in the hierarchy.

As I have already had build failures such as the one sketched above several times, I’d like to be able to set the root of a project explicitly, for instance by stopping at the first dune-workspace upward (incl. the current directory). @diml told me this question comes up from time to time, so he suggested I poll the community about this, so here I am.

What are your thoughts?

It’s probably worth noting that failures only occur when the other project has a broken dune file. Regular build failures are ok.

Personally, I like this feature, because it makes it easy for me to create local forks of libraries without fussing with opam. I don’t have nested dune-workspaces, but I think some libraries have their own and I’d like mine to take precedence.

As far as I’m concerned, I have old files hanging here and there, as well as erroneous dune or jbuild files written by students that I should assess. So it’s a bit cumbersome to suddenly have a failing build of a correct piece of software because of other sibling directories. Additionally, what if you have, by misfortune, dune-project/-workspace files in a piece of the hierarchy that you can’t control (e.g. no write access)?

I believe this is the root of the issue. In my opinion, the way you organize your R directory isn’t optimal. Why would you put a failing and a working project in the same workspace? I try to organize my workspaces a little more carefully than that. Hence I’m never in a situation where dune is discovering an invalid workspace.

However, I do think that we need the following features in dune:

  • A way to define workspaces without relying on the directory structure. I.e. there should just be a way to specify which directories are a part of the workspace. Relying on symlinks is sometimes quite annoying.

  • A way to turn off the automatic directory-climbing feature. Enough people seem to prefer to use dune with essentially `--root $(pwd)`. We might as well make it easy for them by making this a configurable thing. I don’t think this will ever become the default, however.

(First, let me stress that I’m very appreciative of dune and the work its developers have been doing, keep up the good work!)

Well the situation described was a simplification. It suffices to have such a file in a far ancestor directory to meet this error. Which can be the case:

  • by mistake (making a temporary copy in $HOME while trying new configurations, and then forgetting this copy for several days);
  • or making an erroneous copy by typing a wrong path without realizing it;
  • or being a person not understanding dune at all but being forced to use it and having no time or commitment to learn about it (= my ~100 students/year and fellow professors who have been TAing -very well- the language since the early 90s (yes!) but never use it on a daily basis);
  • or some fellow programmers who know OCaml well enough to put bits and pieces of code together, but not well enough to know that the build tool, contrary to classics like make, also looks upwards.

So I think this behavior is risky w.r.t. newcomers or casual users, which is a pity as dune, as it happens, makes things so much easier (dune runtest and dune utop are so cool when teaching!).

From a user point of view, I tend to think this behavior is a bit exotic, but I can see it has some advantages. But:

  1. I don’t know what is the proportion of people who leverage this default option to work in a better way (for some definition of better);
  2. Even if you don’t make it a default option, I would advocate giving users the possibility to explicitly mark the root of a project (without relying on the --root option). Either with an option in an existing file (dune-workspace for instance) or by using yet another, optional, file (say dune-root). Or something else you can think of.

I think this should be possible. This is much less of a restriction than just completely getting rid of composition. All we’re doing is just asking users to run commands “closer” to the workspace root they’re intending to use. Otherwise, they can also specify the workspace manually. This would apply to dune-project files as well.

I just did a bit of testing, and found that we have not been burned by this behavior of proceeding past the first dune-workspace / dune-project file only by dumb luck. No home directories need to be involved. With the current behavior, as soon as one of these dune- files is added in an ancestor directory of one of the ‘project’ directories below it, with its own dune-workspace and dune-project files in the repo, the build for the lower-level project is broken.

Perhaps there is a way to modify the build scripts to be insensitive to whether there are dominating dune- files, but I don’t know it. Perhaps it doesn’t matter if the targets dune is asked to build are only @ aliases, but in our setting we need to specify particular targets, and they end up having names that are not independent of the location of the _build directory.

I would very much like a setting that could be put in dune-workspace or dune-project files that would indicate that dune should not continue up the file system.

Perhaps there is a way to modify the build scripts to be insensitive to whether there are dominating dune- files, but I don’t know it. Perhaps it doesn’t matter if the targets dune is asked to build are only @ aliases, but in our setting we need to specify particular targets, and they end up having names that are not independent of the location of the _build directory.

Do your targets start with _build? The following two commands are supposed to always be equivalent:

$ dune build <a>/<b>
$ cd <a> && dune build <b>

So normally targets are independent of the root by default. It’s only if you have targets that start with the build directory that this is a problem. Indeed, dune interprets targets differently depending on whether they start with the build directory. Maybe that’s a mistake though. Instead we could have a special syntax for that, such as dune build "(in_context <context-name> <target>)"

Regarding the original topic, I’m a bit reluctant to make this aspect of dune configurable. I’d rather we settle on the right behaviour. A behaviour that might be less surprising is this one: dune stops at the first dune-workspace file. If no such file exists, it stops at the first dune-project one.

And that would still make sense regarding the composability aspect of dune. Indeed, the expectation is that each dune project has exactly one dune-project file and no dune-workspace file. Indeed, dune-workspace files are meant for setting up workspaces for local development.
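As an illustration only, the two-step rule suggested here can be sketched in Python. The names `ancestors` and `nearest_root` are hypothetical, as is the `ceiling` parameter that bounds the climb for the demo; nothing here is taken from dune's source.

```python
import os
import tempfile


def ancestors(start, ceiling=None):
    """Yield `start` and each ancestor, innermost first.

    Stops at `ceiling` (hypothetical, for the demo) or at the filesystem root.
    """
    current = os.path.abspath(start)
    ceiling = os.path.abspath(ceiling) if ceiling else None
    while True:
        yield current
        parent = os.path.dirname(current)
        if current == ceiling or parent == current:
            return
        current = parent


def nearest_root(start, ceiling=None):
    """Nearest dune-workspace wins; only without any workspace, nearest dune-project."""
    for marker in ("dune-workspace", "dune-project"):
        for directory in ancestors(start, ceiling):
            if os.path.isfile(os.path.join(directory, marker)):
                return directory
    return None


def demo_nearest():
    """Return the roots inferred for a nested-workspace case and a project-only case."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        # Case 1: R/dune-workspace, R/W/dune-workspace, R/W/P/dune-project
        p = os.path.join(base, "R", "W", "P")
        os.makedirs(p)
        open(os.path.join(base, "R", "dune-workspace"), "w").close()
        open(os.path.join(base, "R", "W", "dune-workspace"), "w").close()
        open(os.path.join(p, "dune-project"), "w").close()
        # Case 2: A/dune-project, A/B/dune-project, no workspace anywhere
        b = os.path.join(base, "A", "B")
        os.makedirs(b)
        open(os.path.join(base, "A", "dune-project"), "w").close()
        open(os.path.join(b, "dune-project"), "w").close()
        return (
            os.path.basename(nearest_root(p, ceiling=base)),
            os.path.basename(nearest_root(b, ceiling=base)),
        )


if __name__ == "__main__":
    print(demo_nearest())
```

In this sketch, a dune-workspace higher up still takes precedence over a closer dune-project, but an even higher dune-workspace is never reached.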

If the behavior is changed, wouldn’t it break projects which are nested?

Only if you run build commands in the nested sub projects I believe. In any case, this is simply changing the behavior of how dune infers project roots. For released builds, we don’t rely on inference but on specifying the root explicitly. So it wouldn’t count as breakage anyway.

I’m for this change. I believe it’s consistent with other tools like git which also stop climbing up as soon as they find a suitable root marker.

Even without configurability, the old behavior can always be faked with export DUNE_{WORKSPACE,ROOT}=.., so I don’t think it’s a big loss to make this non-configurable.

I’ve created a PR that simplifies the root detection: https://github.com/ocaml/dune/pull/2891

We believe the better behavior is to stop at the first dune-project or dune-workspace encountered. Please let us know if this addresses your problem.
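Going only by the sentence above (I have not modelled the PR itself), the simplified rule would look roughly like this in Python. `first_marked_dir` and its `ceiling` parameter are hypothetical names used for the demo.

```python
import os
import tempfile

MARKERS = ("dune-workspace", "dune-project")


def first_marked_dir(start, ceiling=None):
    """Return the closest ancestor of `start` (inclusive) holding either marker file."""
    current = os.path.abspath(start)
    ceiling = os.path.abspath(ceiling) if ceiling else None
    while True:
        if any(os.path.isfile(os.path.join(current, m)) for m in MARKERS):
            return current  # stop at the first hit, whichever marker it is
        parent = os.path.dirname(current)
        if current == ceiling or parent == current:
            return None
        current = parent


def demo_first_marked():
    """Layout: R/dune-workspace and R/P/dune-project; the search starts in R/P/src."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        p = os.path.join(base, "R", "P")
        src = os.path.join(p, "src")
        os.makedirs(src)
        open(os.path.join(base, "R", "dune-workspace"), "w").close()
        open(os.path.join(p, "dune-project"), "w").close()
        return os.path.basename(first_marked_dir(src, ceiling=base))


if __name__ == "__main__":
    print(demo_first_marked())
```

With this rule the search started inside P stops at P, so a dune-workspace in R no longer affects builds run from within P.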

Yes, they are mostly of the form _build/<context>/<something>.exe or _build/<context>/<something>.install. I don’t know of a way to select some but not all of the contexts without using such explicit targets, but maybe there is one?

I really don’t understand this point. The dune-workspace file is AFAIU the only way to specify the opam switches used for the different dune contexts. In order to have even almost reproducible builds, and for all developers of a project to see the same build errors, etc., this information is crucial. That is a very strong pressure to put it (or something that generates it) into source control.

Is there some other expected workflow that can support specifying opam switches to use for build contexts?

(There is an additional step to ensure that the opam switches have the “right” packages installed across developer machines, but that isn’t dune’s problem I think.)

I think that reproducibility is the job of the package manager rather than dune’s. What’s required for this is far more than what can be specified in the workspace file anyway. An opam lock is currently one way to provide this information.

Note that their not being checked in was phrased perhaps a bit too strongly here. Dune itself checks in workspace files that make it easy to test against multiple versions. I think the way to understand it is that the dune-workspace file describes the environment (or workspace) rather than the project itself. Hence, it should always be optional as the project should be usable in another workspace. You are of course free to provide configured environments to your users. But today, workspaces aren’t a great solution for that.

I don’t know of a way to select some but not all of the contexts without using such explicit targets, but maybe there is one?

Indeed, there is no way at the moment. That should probably be added though.

Regarding committing the dune-workspace file: yes, actually that seems fine. It just means that what you are sharing is not just a project but effectively a workspace. As Rudi mentioned, Dune’s workspace files don’t capture very much. In fact, it’s not even guaranteed that the opam switch names will mean the same thing on different machines or that they will be installed already. Some people might also build without opam.

Right now, if you commit a dune-workspace file, then anyone who would want to build the project by calling dune directly would likely need to install the various switches beforehand.

To be clear, that’s a breaking change and requires a major version bump. Since we are releasing Dune 2.0.0, we can make this change now.

I’m in favor of this.

I think that reproducibility is the job of the package manager rather than dune’s. What’s required for this is far more than what can be specified in the workspace file anyway. An opam lock is currently one way to provide this information.

I completely agree that the scope of this reaches beyond the responsibilities of dune. But I think it is still important for dune to support an interface to the other components (like opam and a lock file) that are needed for reproducible builds.

Note that their not being checked in was phrased perhaps a bit too strongly here. Dune itself checks in workspace files that make it easy to test against multiple versions. I think the way to understand it is that the dune-workspace file describes the environment (or workspace) rather than the project itself. Hence, it should always be optional as the project should be usable in another workspace. You are of course free to provide configured environments to your users.

I also agree with this, but each project can only be tested in some limited set of environments, and so “should be usable” is more an aspiration than something that a project can ensure always holds. So I think it is still important to have good support for a project to provide at least one “known good” configuration, which will currently involve some scripting to set up opam switches, lock files, and a dune-workspace file to tell dune to use those opam switches.

But today, workspaces aren’t a great solution for that.

Maybe, but they are a real help, even if not perfect / covering the whole job.

Note that in this discussion I’m not saying that there is a particular problem. I just want to raise awareness of the use case of keeping dune-workspace files under source control as part of a larger system to provide (close to) reproducible builds, as the discussion seemed to indicate that dune-workspace files were considered to be only a form of user preference file, while they are much more in some cases.