Upcoming Dune Feature: (include_subdirs qualified)

Hello, the dune team is working on adding (include_subdirs qualified) support to dune and would like your feedback on some user-facing details. I’ll explain how the feature works in this post, but you can also read the initial feature request for some background.

Wrapped Libraries

First, let’s review how wrapped libraries work. This is important because (include_subdirs qualified) just generalizes the scheme to arbitrary directories. Suppose we have the following library:

(library
 (wrapped true) ;; this is the default. it's added here for clarity
 (name foolib))

By default, dune will make every single module in this library available under Foolib - e.g. Foolib.X. The “library interface” module is Foolib and it is always present. In this example it is generated, but it can also be written by hand:

$ cat foolib.ml
(* We can choose to export whatever we want *)
module X = X

The advantage of hand writing this interface module is of course tighter control over the interface of the library. The disadvantage is that it has to be manually written.
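
As a rough single-file sketch of what wrapping amounts to, nested modules can stand in for the separate files. The mangled names Foolib__X and Foolib__Y and the functions greet and double are assumptions made up for illustration, not something taken from dune's output:

```ocaml
(* Single-file sketch: nested modules stand in for separate .ml files. *)

(* Stand-ins for x.ml and y.ml, under mangled names (assumed here). *)
module Foolib__X = struct
  let greet () = "hello from X"
end

module Foolib__Y = struct
  let double n = 2 * n
end

(* Stand-in for a hand-written foolib.ml: only X is re-exported,
   so users of the library can reach Foolib.X but not Foolib.Y. *)
module Foolib = struct
  module X = Foolib__X
end

let () = print_endline (Foolib.X.greet ())
```

With a generated interface module, every module of the library would get an alias like the one for X; the hand-written version simply lists fewer of them.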

Qualified Subdirectories

The stanza (include_subdirs qualified) generalizes the above scheme. In particular, one may introduce a directory with modules. For example:

$ ls
 foo.ml
 sub/
  x.ml
  y.ml

Inside foo.ml, we’ll refer to Sub.X and Sub.Y, while x.ml and y.ml will be able to refer to each other in an unqualified manner (X and Y). Naturally, the module Sub will also be an interface module and the user will have the option to write it manually. This is where we get some options.
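
To make the scoping concrete, here is a single-file approximation of the layout above, with nested modules standing in for the files. The values name, describe and summary are invented for the example, and the real feature would of course work on separate files rather than nested modules:

```ocaml
(* Single-file sketch of the layout above; nested modules stand in for files. *)
module Sub = struct
  (* sub/x.ml *)
  module X = struct
    let name = "x"
  end

  (* sub/y.ml: refers to its sibling as plain X, without the Sub prefix *)
  module Y = struct
    let describe () = "y next to " ^ X.name
  end
end

(* foo.ml: lives outside sub/, so it has to go through the Sub prefix *)
module Foo = struct
  let summary () = Sub.X.name ^ "; " ^ Sub.Y.describe ()
end

let () = print_endline (Foo.summary ())
```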

Interface Modules for Module Groups

Given the example above, where should the user write the interface module for Sub and how should it be referred to in dune files? I’ll list two options and briefly describe their advantages:

  • sub/sub.ml - this would be most similar to how we handle the toplevel library interface. It also maintains the invariant that every directory has at most one interface file.
  • sub.ml - this module would live in the same directory as sub/ and would allow sub/sub.ml to exist as Sub.Sub. I think this behavior is more intuitive to users.

Finally, how should we refer to such modules in dune files? For example, in dune files we can set per module preprocessing or mark some modules as private. How should we make the Sub private?

(library
 (name foolib)
 (private_modules foo)
 ;; or this
 (private_modules foo.foo))

If the interface module exists at sub/sub.ml, then we should probably just forbid foo.foo. If instead the interface module is sub.ml, both paths are allowed and simply refer to different modules (sub.ml or sub/sub.ml).

My questions to the community:

  • Which scheme do you think is more natural?

  • Do you have any other comments about the feature?

I think there’s also the consideration of how the interface module for Sub would be manually written.

If it’s sub/sub.ml, then in there module X = X would match the current scheme for wrapped libraries.

If it’s sub.ml next to sub/, then should you write module X = X or module X = Sub.X?
The former feels odd because at the library level X refers to sub/x.ml if it happens to be in sub.ml, but x.ml if it happens to be in some other file that doesn’t have a matching subdirectory.
The latter feels odd in a different way because you’re referring to Sub.X while defining the interface for Sub itself.

Good question. I expect the user to write module X = X. So the group interface for Sub would have Sub__ (the alias module) opened. This is the convention we already have for library interfaces.
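
A single-file sketch of that convention, assuming an alias module called Sub__ and made-up mangled names Sub__X and Sub__Y for the files in sub/. The open is written out explicitly here only because everything sits in one file; the idea is that the build would arrange it for the real sub.ml or sub/sub.ml:

```ocaml
(* Stand-ins for sub/x.ml and sub/y.ml under hypothetical mangled names. *)
module Sub__X = struct
  let id = "sub/x.ml"
end

module Sub__Y = struct
  let id = "sub/y.ml"
end

(* Hypothetical alias module for the group: short names -> mangled names. *)
module Sub__ = struct
  module X = Sub__X
  module Y = Sub__Y
end

(* Hand-written group interface for Sub. With the alias module opened,
   the right-hand X resolves to the file in sub/, so the user only
   has to write [module X = X]. *)
module Sub = struct
  open Sub__
  module X = X
end

let () = print_endline Sub.X.id
```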

Yup, we definitely want to avoid that.

Personally I think that having sub.ml at the same level as sub/ makes more sense, but note that you made the opposite choice for wrapped libraries, whose explicit interface lives inside the wrapped directory rather than outside it (so you cannot have Foolib.Foolib, as you remark for Foolib.Sub.Sub).

(On the other hand, wrapped libraries are not necessarily in a subdirectory of the project – although that is the standard convention – so sometimes it would not be clear where to put the interface file.)

Another option would be to use, in the subdirectory, a conventional file name that does not clash with a submodule name, for example (urgh) _.ml and _.mli.

An alternative would be that (include_subdirs qualified) also switches the behaviour for the toplevel to your first option, i.e. all modules are moved down into a subdirectory,

$ find *
dune
foo
foo/sub
foo/sub/y.ml
foo/sub/x.ml
foo/sub.ml
foo.ml

so that the fully qualified module path matches the directory path relative to the dune file.

Update: The weakness of this scheme is that a common library prefix needs to be repeated. E.g. consider a project irc containing three packages, irc, irc-async, and irc-lwt, where the first package contains two libraries, irc.core and irc. A possible layout would be:

irc.opam
irc/
  lib/
    dune
    irc_core/
    irc/
irc-async.opam
irc-async/
  lib/
    dune
    irc_async/
irc-lwt.opam
irc-lwt/
  lib/
    dune
    irc_lwt/

While this is very systematic, the “irc” prefix needs to be repeated at two levels, which might be inconvenient with longer names?

Can these subdir modules be integrated with ocamlfind subpackages in some way, and to allow migration from them?

E.g. as a concrete example, core_kernel currently has a uuid subpackage (core_kernel.uuid), but this leaks out of that namespace and causes conflicts with user programs that also have a module named Uuid (we’ve had to rename one such module in our code to Uuidx to avoid the conflict).
It would be good to have consistency between ocamlfind/opam namespacing and OCaml module namespacing, to avoid leaking such internal module names to the toplevel where they cause link time conflicts.

With this proposal would it then be possible to turn core_kernel/uuid/uuid.ml into Core_kernel.Uuid at the OCaml module level, while still making it possible for applications that explicitly link against core_kernel.uuid to get to that module more directly (without the Core_kernel prefix, just with Uuid)?

i.e. with your example:

(library
  (name foolib)
  (public_name foo_public)
  (public_submodule_name foo_public.sub)
)

Would create a foo_public ocamlfind library containing Foolib (which contains Foolib.Sub), and foo_public.sub ocamlfind library which contains just Sub as an alias for Foolib.Sub.
(or perhaps using a mechanism similar to the deprecated_library_name, where foo_public.sub is there only to aid migration between library versions).
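
In plain OCaml terms, the foo_public.sub library I have in mind would contain nothing more than a module alias. A single-file sketch, where the type t and the functions make and label are invented for the example and public_submodule_name itself is only a suggested field:

```ocaml
(* Stand-in for the wrapped library: Sub lives under the Foolib prefix. *)
module Foolib = struct
  module Sub = struct
    type t = { label : string }

    let make label = { label }
    let label t = t.label
  end
end

(* What the hypothetical foo_public.sub library might contain: one alias. *)
module Sub = Foolib.Sub

let () =
  (* A value built through the long path is usable through the short one,
     because the alias preserves type equality. *)
  let v = Foolib.Sub.make "shared" in
  print_endline (Sub.label v)
```

Since it is an alias rather than a copy, code that links against either name would see the same types and values.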

Python uses __init__.py for a similar purpose, though I think “init” would be a misnomer for our case.

That’s the python inspired approach and it’s also worth considering. Some even suggested the name __init__.ml. Personally I don’t mind this convention, but it seems like a more radical change than what’s already been proposed.

@paurkedal This approach would be quite nice if we had another convention for inserting binaries and tests in these trees without all these additional directories. But as you’ve mentioned, the repetitiveness isn’t going to be well received by the users.

@edwin you’ve pointed out a real problem, but it would require quite a bit more work than even this already hefty proposal. If there was a community wide agreement that this is the way to go, it could probably be done. For now, I would rather not build more namespacing on top of dune without knowing the compiler team’s intentions about namespacing in the future.

I don’t think I would mix tests and binaries in the same directory as the library. Isn’t dune already encouraging a dedicated directory, since it allows omitting the modules list in this case? Or maybe I misunderstood your point.

It occurred to me afterwards that this could be addressed by allowing the directory to be specified, e.g. with the same example:

(library
 (public_name irc)         ; Module name Irc ...
 (module_directory main))  ; ... with sources in main/ (and main.ml).
(library
 (name irc_core)           ; Module name Irc_core ...
 (public_name irc.core)
 (module_directory core))  ; ... with sources in core/ (and core.ml).

I’m still not sure whether it’s better; it’s maybe just as much a matter of where the dune file is relative to the sources as vice versa.

Interface module - is it on purpose that all the examples use an ‘.ml’ file there? I would’ve expected just an .mli (with the .ml auto-generated by dune to contain aliases for the needed things)

My vote would be for the Python approach. It doesn’t have to be 2 underscores since we don’t have the same Python convention of double underscores, so 1 can be enough. But it still means Python programmers would be right at home, and I think this approach also generally makes sense. In fact, it should hopefully become the default at some point, as the current method of turning each directory into its own library is highly suboptimal. EDIT: Maybe something like _module_.ml makes more sense. Newcomers to Python don’t understand why the file is called __init__.py. Unlike Python, we also don’t need to have it in every directory, though the convention is useful for leaving out certain directories (such as data directories).

What happens though with this approach if you have both a sub.ml in the same directory and a /sub subdirectory?

If this approach isn’t taken, I think the second choice is the better one.

@paurkedal what I mean is that we currently structure packages as:

$ ls
pkg.opam
pkg/
  lib/
  test/
  bin/

So we have a few directories that don’t correspond to any logical namespace but need to be attached to the package somehow. We could simplify this though. I’ve thought about schemes such as $name.exe.ml being interpreted as defining an executable or $name.test.ml defining a test. That would make it unnecessary to have all these subdirectories in the common case.

@edwin One should be able to write the .mli only as well.

@bluddy Yes, creating all the sub libraries is indeed suboptimal and this proposal is supposed to help with some of that. But note that _module_.ml (or whatever you call it) doesn’t give us any additional power. The other proposals for naming interface modules for a directory are just as good for the purpose.

That depends on which proposal we’ll adopt. If the interface file will live at sub/sub.ml, then sub.ml would not be allowed to be a neighbour of sub/. If the interface file is at sub/_module_.ml or sub.ml, then there’s no problem with sub/sub.ml existing as Sub.Sub.

This proposal seems to raise the same question as it did for the rust community with the switch from a contained mod.rs file to an outside directory <module>.rs file (rust-book). It could give us an idea about the advantages and drawbacks of each approach.

I have two concerns with moving the sub.ml file outside the sub directory:

  • It removes the “container” effect, where everything about this submodule lives in one place.
  • As mentioned above, if we can write sub.ml and sub/sub.ml, it can be confusing in this kind of situation:
(* --> sub.ml *)
module X = X
(* or depending on the choice *)
module X = Sub.X

(* --> sub/sub.ml *)
module X = struct
  let bar = ignore
end

(* --> sub/x.ml *)
let foo = ignore

Apart from the design choice opinion, this could introduce confusion in knowing the module’s source.
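
To spell out the ambiguity, here is the same situation collapsed into one file, under the assumption that the sibling sub.ml re-exports X with module X = X. The origin strings are invented so the two modules can be told apart:

```ocaml
(* Single-file sketch: the outer Sub plays the role of the sibling sub.ml. *)
module Sub = struct
  (* sub/x.ml, re-exported by sub.ml as Sub.X *)
  module X = struct
    let origin = "sub/x.ml"
  end

  (* sub/sub.ml, which defines its own inner X and shows up as Sub.Sub *)
  module Sub = struct
    module X = struct
      let origin = "sub/sub.ml"
    end
  end
end

let () =
  (* Two different modules that are both called X, one path segment apart. *)
  print_endline Sub.X.origin;
  print_endline Sub.Sub.X.origin
```

Both paths are valid at the same time, which is exactly why a reader could struggle to tell which file a given X comes from.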

The python approach would be the best as it keeps the “container” effect and does not introduce a name conflict in the structure.

Will this behaviour affect how the wrapped stanza is treated, or is it a new stanza with its own behaviour?

Aren’t you again solving something in dune that should likely be solved upstream?
Cue @lpw25’s namespaces.

Arguably. But I’m careful enough not to add any new concepts. I’m just lifting a limitation so that users can use our existing wrapping feature in more places than just top level library names.

How about declaring parent or sub libraries, and still explicitly defining a library in the subdir,
i.e. inside foolib/sub/

(library
  (name sub)
  (public_name foolib.sub)
  ....
)

The __init__.ml proposal looks interesting, and that could still be done (regardless of whether submodules are used) independently of this.

This would be more tedious for the library author (having to write a dune file for each subdir), but wouldn’t have to change any expectations on where to place things. That is, the question whether it should be sub/sub.ml or a sibling to sub/ would answer itself: it’d have to be inside sub/sub.ml because that is the only way in which you could declare the dune files currently if they were “full” libraries, and submodules retain most aspects of “full” libraries, except for the flexibility of wrapping things into submodules instead.

Obviously dune will have to assemble the foolib.ml after it becomes aware of all the foolib.<sub> declarations, so further requiring the directory hierarchy to be obeyed might both simplify the implementation and enforce good practice (you wouldn’t want to declare a foolib.sub submodule while inside barlib/sub by accident).

And we already have the concept of ocamlfind sublibraries, which can be created by just adding a . into the name, so why not make it possible to do something similar for public_name or name of a library itself?

Obviously there is value in simplifying this further, like in your initial proposal, so one doesn’t have to write dune files for all those subdirectories, but if the user needs to tweak anything with regard to the submodule then it would be useful to know that they can by just creating a dune file and declaring the name of the submodule using dots.
Dune wouldn’t be changing anything about how the language works (so when upstream gains namespace support dune can be taught to use it), it’d just offer a convenient way to autogenerate the toplevel foolib.ml and foolib.mli as it already does today for toplevel wrapping.
(one could manually achieve a similar outcome by manually writing all of foolib.ml, foolib_sub.ml, and so on).

However, please try to ensure that documentation generation still works when submodules are implemented. It is already losing module level documentation today when includes are used, and this might further complicate the situation. I realize that this may be an odoc bug, but if there is anything dune can do to make documentation generation more robust here, or use only the constructs that are known to work (especially when submodules will be used), that’d be great.

Note that the empty __init__.py files (just to mark package root directories) are unnecessary / optional since about python 3.4.

(Of course, they are still used as real, nonempty files in cases where re-exporting or adding to the top level package namespace is needed.)


Ian

Would this be equivalent to the user writing a top level library that composes all the sub libraries? If so, I would suggest doing this pattern manually and if there’s enough adoption, dune can help to cut the boilerplate. Or do you still mean that dune should take the public name foolib.sub into account when generating the names of compilation units?

@maiste The existing (wrapped true) is fully backwards compatible with this proposal. If we were to switch to __init__.ml it would have to be something we introduce in 3.x but make it opt-in. In 4.0, the default could change to allow this module to be the group or library interface.

This looks like a dope feature. OCaml has always been annoying with the global module namespace and this would solve it!

Personally, I’m always a fan of convention over configuration so anything that naturally reflects the filesystem layout is always better, i.e. foo.ml and private foo.

Thanks for the hard work!
