dune is great for brainless, fast compilation. However, it is hard to control precisely what it produces.
Situation
Say we are writing a library pkg with modules A and B, and we want a module PKG as the single entry point of the library.
Files:
$ ls
A.ml B.ml dune
A.mli B.mli pkg.opam
File dune:
(library
  (public_name pkg)
  (name PKG)
  (wrapped true) ; this is the default
)
As of dune 1.11.3 (with OCaml 4.08.1), when building this package, dune will first alpha-rename modules A and B to PKG__{A,B}, then compile them, then compile a generated wrapper module PKG, with submodules {A,B} which are aliases to PKG__{A,B}.
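The outcome of this default wrapping can be approximated in a single OCaml file. This is only a sketch under stated assumptions: the contents of A and B (the values x and y) are hypothetical, and in the real build PKG__A, PKG__B and PKG are separate compilation units produced by dune, not modules defined side by side.

```ocaml
(* Stand-ins for the alpha-renamed compilation units PKG__A and PKG__B.
   Their contents (x, y) are hypothetical. *)
module PKG__A = struct
  let x = 1
end

module PKG__B = struct
  let y = PKG__A.x + 1
end

(* Approximation of the generated wrapper: its submodules are plain aliases. *)
module PKG = struct
  module A = PKG__A
  module B = PKG__B
end

let () =
  (* Both paths denote the same module, and both are visible to the user. *)
  assert (PKG.A.x = PKG__A.x);
  Printf.printf "PKG.B.y = %d\n" PKG.B.y
```

Because PKG.A is only another name for PKG__A, the underscored module has to stay reachable for PKG.A to mean anything.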
This does provide an entry point named PKG with the contents we want, but it also exposes internal modules PKG__{A,B} in the environment (for example, they pollute the suggestions in utop). We cannot remove them because PKG.{A,B} are mere aliases to them (we need their .cmi for compiling, and their .cmti for documentation).
Things get worse when we try to write the wrapper module ourselves (for documentation purposes, or to select what to expose, or to customize module paths, or to have toplevel values). dune then still generates the same wrapper module as before, only this time it calls it PKG__. Only after that generated module is compiled is our custom PKG.ml compiled, with its references to A or B resolved to PKG__.{A,B} (which are themselves aliases to PKG__{A,B}). So now we have one more internal module polluting the environment.
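The resulting chain of aliases can be pictured with the following single-file sketch. It rests on assumptions: the contents of A and B and the toplevel value sum are hypothetical, and the explicit open only models how the short names A and B end up resolving to PKG__.{A,B}; how dune arranges that resolution internally is not shown here.

```ocaml
(* Stand-ins for the alpha-renamed units; contents are hypothetical. *)
module PKG__A = struct
  let x = 1
end

module PKG__B = struct
  let y = PKG__A.x + 1
end

(* Approximation of the generated alias module, now named PKG__. *)
module PKG__ = struct
  module A = PKG__A
  module B = PKG__B
end

(* Hand-written wrapper. The [open] is an assumption used to model how
   A and B resolve to PKG__.A and PKG__.B. *)
module PKG = struct
  open PKG__
  module A = A
  module B = B

  (* Hypothetical toplevel value, one reason to write the wrapper by hand. *)
  let sum = A.x + B.y
end

let () =
  (* Three names now reach the same value: PKG.A.x, PKG__.A.x, PKG__A.x. *)
  assert (PKG.A.x = PKG__.A.x && PKG__.A.x = PKG__A.x)
```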
(I was going to elaborate on how this makes documentation generation fragile as you customize your build more and more, with mentions of “underscored” modules sprinkled across the final user documentation, and/or missing pages for the “non-underscored” modules, but I realize that would be off-topic.)
0th try: private modules
dune has an option (private_modules A B), but its only apparent effect is to hide the PKG__{A,B}.cmi in a directory .private/, which effectively makes these modules inaccessible but also breaks the module aliases.
1st try: module inclusion
So the issue is providing modules which are aliases. Then, what about this:
File PKG.ml:
module A : module type of A = struct include A end
module B : module type of B = struct include B end
In this case, building the library with dune still produces the underscored modules PKG__ and PKG__{A,B} but, as we used module inclusion instead of module aliases, the sub-modules of PKG are not linked to these underscored modules. We can get rid of them once PKG is built. That is, we can get rid of their .cmi files, and they won’t be accessible anymore to the final user. We still need their .cmt/.cmti files for documentation.
That’s closer to what we want, but I found a number of pitfalls.
- I suspect that, if B depends on A, then A would be duplicated in the archive (both the original module PKG__A and its copy PKG.A would be packed in the archive; PKG.B would use PKG__A internally).
- If module B has a type u = A.t, then the type equality between PKG.A.t and PKG.B.u is lost.
- odoc produces cyclic links (e.g. if A has a functor Make, the documentation for PKG.A.Make is missing; instead the link leads back to PKG.A).
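The second pitfall can be reproduced in a single file. This is a minimal sketch with hypothetical module contents (an abstract type t with make and get, and a function twice), and it uses nested modules as stand-ins for the compilation units that dune would produce. The point it illustrates is that module type of yields a signature in which t is abstract, so sealing the included copy with it creates a fresh type.

```ocaml
(* Stand-in for PKG__A, with an abstract type t (hypothetical contents). *)
module PKG__A : sig
  type t
  val make : int -> t
  val get : t -> int
end = struct
  type t = int
  let make n = n
  let get n = n
end

(* Stand-in for PKG__B, whose type u is declared equal to A's t. *)
module PKG__B : sig
  type u = PKG__A.t
  val twice : u -> u
end = struct
  type u = PKG__A.t
  let twice v = PKG__A.make (2 * PKG__A.get v)
end

(* The "module inclusion" wrapper from the 1st try. *)
module PKG = struct
  module A : module type of PKG__A = struct include PKG__A end
  module B : module type of PKG__B = struct include PKG__B end
end

let () =
  (* Accepted: PKG.B.u is still equal to PKG__A.t. *)
  let v = PKG.B.twice (PKG__A.make 21) in
  assert (PKG__A.get v = 42)

(* Rejected by the type checker if uncommented, because PKG.A.t is a new
   abstract type, unrelated to PKG__A.t and therefore to PKG.B.u:

   let _ = PKG.B.twice (PKG.A.make 21)
*)
```

So a user who only sees PKG cannot pass a PKG.A.t to a function of PKG.B, even though the source files say the two types are equal.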
2nd try: source-level preprocessing
In fact, the only way I found to get exactly what I expect is by using a preprocessor to include modules at the source level:
File PKG.ml.cpp:
module A
#if __has_include("PKG/A.mli")
: sig
#include "PKG/A.mli"
end
#endif
= struct
#include "PKG/A.ml"
end
module B
#if __has_include("PKG/B.mli")
: sig
#include "PKG/B.mli"
end
#endif
= struct
#include "PKG/B.ml"
end
File dune:
(library
  (public_name pkg)
  (name PKG)
  (modules PKG)
)
(rule
  (target PKG.ml)
  (deps (:dep PKG.ml.cpp) (glob_files PKG/*.ml{,i}))
  (action (run cpp %{dep} -o %{target}))
  ; source files are moved to a directory PKG/, because otherwise
  ; the glob would also match PKG.ml, creating a cyclic dependency
)
This is easier to understand, and produces the right thing with respect to both exposed modules and documentation (you’ll still have to customize pkg.install a bit, because you’ll likely want to ship the true source files PKG.ml.cpp and PKG/* instead of the generated PKG.ml). But, of course, we lose separate, parallel compilation…
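For illustration, here is roughly what the preprocessed file amounts to, assuming hypothetical contents for PKG/A.ml{,i} and PKG/B.ml{,i} (an abstract type t in A and a type u = A.t in B). The snippet wraps everything in a nested module PKG so that it stands alone; in the real build this would be the body of the generated PKG.ml, and any line directives emitted by cpp are left out.

```ocaml
(* Hypothetical result of the textual inclusion, shown as a nested module. *)
module PKG = struct
  module A : sig
    type t
    val make : int -> t
    val get : t -> int
  end = struct
    type t = int
    let make n = n
    let get n = n
  end

  (* B refers to the A defined just above, not to any underscored unit. *)
  module B : sig
    type u = A.t
    val twice : u -> u
  end = struct
    type u = A.t
    let twice v = A.make (2 * A.get v)
  end
end

let () =
  (* Accepted: here PKG.B.u and PKG.A.t are the same type. *)
  let v = PKG.B.twice (PKG.A.make 21) in
  assert (PKG.A.get v = 42)
```

Since A and B are ordinary submodules of one compilation unit, no PKG__* module exists at all, and the type equality between PKG.A.t and PKG.B.u is kept.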
At this point, it does not seem reasonable to keep trying. Has anyone else solved this issue? Or is it regarded by the community as a non-issue? I can see that real-world packages such as re are happy with bundling modules Re__*.