Properly wrap a package’s modules with dune

dune is great for brainless, fast compilation. However, it is hard to control precisely what it produces.

Situation

Say we are writing a library pkg with modules A and B, and we want a module PKG as the single entry point of the library.

Files:

$ ls
A.ml   B.ml   dune
A.mli  B.mli  pkg.opam

File dune:

(library
	(public_name  pkg)
	(name         PKG)
	(wrapped      true)    ; this is the default
)

As of dune 1.11.3 (with OCaml 4.08.1), when building this package, dune will first alpha-rename modules A and B to PKG__{A,B}, then compile them, then compile a generated wrapper module PKG, with submodules {A,B} which are aliases to PKG__{A,B}.
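To make the result concrete, here is a single-file sketch of the module layout described above. It is only a model: in a real build PKG__A and PKG__B are separate compilation units, and the contents of the modules (t, make, u, show) are hypothetical, made up for illustration.

```ocaml
(* Single-file model of what dune builds. In reality PKG__A and PKG__B
   are separate compilation units; their contents here are hypothetical. *)
module PKG__A = struct
  type t = int
  let make (x : int) : t = x
end

module PKG__B = struct
  type u = PKG__A.t
  let show (x : u) : string = string_of_int x
end

(* The generated wrapper: pure aliases, no code is copied. *)
module PKG = struct
  module A = PKG__A
  module B = PKG__B
end

(* Users go through PKG, but PKG__A and PKG__B remain visible too. *)
let () = print_endline (PKG.B.show (PKG.A.make 42))
```

Because PKG.A is only an alias, anything that needs the real definition of PKG.A (the type checker, documentation tools) has to follow it to PKG__A.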

This does provide an entry point named PKG with the contents we want, but it also exposes internal modules PKG__{A,B} in the environment (for example, they pollute the suggestions in utop). We cannot remove them because PKG.{A,B} are mere aliases to them (we need their .cmi for compiling, and their .cmti for documentation).

Things become worse when we try to write the wrapper module ourselves (for documentation purposes, or to select what to expose, or to customize module paths, or to have toplevel values). In that case, dune still generates the same wrapper module as before, only this time it names it PKG__. Once that generated module is compiled, our custom PKG.ml is compiled, and its references to A or B resolve to PKG__.{A,B} (which are themselves aliases to PKG__{A,B}). So now we have one more internal module polluting the environment.

(I was going to elaborate on how this makes documentation generation fragile as you customize your build more and more, with mentions of “underscored” modules sprinkled across the final user documentation, and/or missing pages for the “non-underscored” modules, but I realize that is going off-topic.)

0th try: private modules

dune has an option (private_modules A B), but its only apparent effect is to hide the PKG__{A,B}.cmi files in a directory .private/. This does make these modules inaccessible, but it also breaks the module aliases.

1st try: module inclusion

So the issue is providing modules which are aliases. Then, what about this:

File PKG.ml:

module A
: module type of A
= struct include A end

module B
: module type of B
= struct include B end

In this case, building the library with dune still produces the underscored modules PKG__ and PKG__{A,B} but, as we used module inclusion instead of module aliases, the sub-modules of PKG are not linked to these underscored modules. We can get rid of them once PKG is built. That is, we can get rid of their .cmi files, and they won’t be accessible anymore to the final user. We still need their .cmt/.cmti files for documentation.

That’s closer to what we want, but I found a number of pitfalls.

  1. I suspect that, if B depends on A, then A would be duplicated in the archive (both the original module PKG__A and its copy PKG.A would be packed in the archive; PKG.B would use PKG__A internally).
  2. If module B has a type u = A.t, then the type equality between PKG.A.t and PKG.B.u is lost.
  3. odoc produces cyclic links (e.g. if A has a functor Make, the documentation for PKG.A.Make is missing; the link leads back to PKG.A instead).
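Pitfall 2 can be reproduced in a single file. The sketch below is a model only: PKG__A and PKG__B stand in for the renamed compilation units, and their contents (make, to_int, default) are hypothetical. The signature given by module type of PKG__A keeps t abstract, so PKG.A.t becomes a fresh type, while PKG.B.u still points at PKG__A.t.

```ocaml
(* Stand-ins for the renamed units; contents are hypothetical. *)
module PKG__A : sig
  type t
  val make : int -> t
  val to_int : t -> int
end = struct
  type t = int
  let make x = x
  let to_int x = x
end

module PKG__B : sig
  type u = PKG__A.t
  val default : u
end = struct
  type u = PKG__A.t
  let default = PKG__A.make 0
end

(* The hand-written wrapper, using inclusion instead of aliases. *)
module PKG = struct
  module A : module type of PKG__A = struct include PKG__A end
  module B : module type of PKG__B = struct include PKG__B end
end

(* PKG.B.u is still PKG__A.t, so this type-checks: *)
let zero = PKG__A.to_int PKG.B.default

(* PKG.A.t is a different abstract type, so the compiler rejects this: *)
(* let _ = PKG.A.to_int PKG.B.default *)
```

Constraining the signature of PKG.A with "with type t = PKG__A.t" should restore the equality in this sketch, but the interface of PKG then mentions the underscored module again, which presumably defeats the purpose of hiding it.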

2nd try: source-level preprocessing

In fact, the only way I found to get exactly what I expect is by using a preprocessor to include modules at the source level:

File PKG.ml.cpp:

module A
#if __has_include("PKG/A.mli")
  : sig
    #include "PKG/A.mli"
  end
#endif
  = struct
    #include "PKG/A.ml"
  end

module B
#if __has_include("PKG/B.mli")
  : sig
    #include "PKG/B.mli"
  end
#endif
  = struct
    #include "PKG/B.ml"
  end

File dune:

(library
	(public_name  pkg)
	(name         PKG)
	(modules      PKG)
)

(rule
	(target  PKG.ml)
	(deps    (:dep PKG.ml.cpp) (glob_files PKG/*.ml{,i}))
	(action  (run cpp %{dep} -o %{target}))
	; source files are moved to a directory PKG/, because otherwise
	; the glob would also match PKG.ml, creating a cyclic dependency
)

This is easier to understand, and produces the right thing with respect to both exposed modules and documentation (you’ll still have to customize pkg.install a bit, because you’ll likely want to provide the true source files PKG.ml.cpp and PKG/* instead of the generated PKG.ml). But, of course, we lose separate parallel compilation…
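For illustration, here is roughly what the generated PKG.ml could look like once the includes are expanded. The contents of A and B are hypothetical (they are not the real files), and the sketch assumes both .mli files exist so that the signature branches are taken.

```ocaml
(* Sketch of PKG.ml after preprocessing; the contents of A and B are
   hypothetical. Inside the real library these are PKG.A and PKG.B. *)
module A : sig
  type t
  val make : int -> t
  val to_int : t -> int
end = struct
  type t = int
  let make x = x
  let to_int x = x
end

module B : sig
  type u = A.t          (* refers to the sibling module A directly *)
  val default : u
end = struct
  type u = A.t
  let default = A.make 0
end

(* The equality B.u = A.t holds, with no underscored module involved. *)
let zero = A.to_int B.default
```

One consequence of this layout is that the order of the blocks in PKG.ml.cpp has to follow the dependency order by hand: B can only refer to A because A is defined first.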

At this point, it does not seem reasonable to keep trying. Has someone else solved this issue? Or is it regarded by the community as a non-issue? I can see that real-world packages such as re are happy with bundling modules Re__*.

If I understand you correctly, the issue is that ‘physical’ modules like PKG__{A,B} are being exposed to package consumers, instead of just aliases like PKG.{A,B}, right?

My impression is that this is an accepted trade-off in the community right now. People are thinking about introducing true namespacing into OCaml, but that will take some time to figure out and in the meanwhile we use aliasing which works in this way as you can see.

Now, an argument has been made that this is a tooling issue and that tools should ignore ‘physical’ modules i.e. any module with a double-underscore in its name. Various tools in the ecosystem have varying levels of support for that, I guess. The problem is you have to get all tools involved (dune, odoc, ocamldoc, utop, opam, merlin) to agree that this is the way to go and then actually implement it.

Your system is basically replicating -for-pack but using a preprocessor. Unfortunately, the fundamental problem with this technique is that you will never get decent incremental build times as the number of modules and libraries increases.

However, one way to improve the current system without waiting on namespaces is to agree on some sort of convention for private modules. One way is indeed to bless the Alias__ convention in tools such as utop, odoc, etc. Another possibility is to agree on some sort of ppx attribute for marking modules as “private” ([@@@private] for example). The advantage of this is that it’s independent of any mangling scheme and that it will also work for private modules in unwrapped mode. Of course, dune will need to be updated to emit modules that follow this convention.

Ok, this is what I was suspecting, but I guess I needed confirmation. I look forward to having namespaces in OCaml. Otherwise, it doesn’t seem hard to add a boolean field to each module in an archive file to indicate whether it should be visible from the outside.