What is the reason for the separation of module implementation and signatures in OCaml?

#16

See A History of OCaml. OCaml has its origin in Caml, Caml Light, and Caml Special Light as new implementations of ML – originally as a bytecode interpreter. Standard ML has signatures but at least Standard ML of New Jersey is an image system without separate compilation whereas Caml Light had files mapped to top-level modules. This raised the question of how to implement signatures for those files, and I believe interface files were a natural answer. In particular, interface files shield client modules from re-compilation if only the implementation changes. I believe this is not true in the case of native compilation, though.

Addendum: as pointed out below by @dbuenzli, the -opaque option can be used to make natively compiled modules only depend on interfaces for faster re-compilation at the cost of reduced code quality.

4 Likes

#17

That will indeed be useful, but yeah if it’s not done already I imagine there are some considerations involved.

0 Likes

#18

Nobody has mentioned that the separation between module implementation and signature allows separate compilation, or compilation in parallel.

(* a.ml *)
let message = "hello world !"

(* a.mli *)
val message : string

(* b.ml *)
let () = print_endline A.message

and now (sleep 1 is here only to be sure that compilation of a.ml will not end before b.ml is compiled):

% ocamlc -c a.mli; (ocamlc -c  b.ml & (sleep 1; ocamlc -c a.ml)); ocamlc a.cmo b.cmo -o prog

% ./prog
hello world !

Without a.mli:

% rm *.cm* a.mli
% ocamlc -c b.ml & (sleep 1; ocamlc -c a.ml)
[1] 3392
File "b.ml", line 1, characters 23-32:
Error: Unbound module A
[1]+  Exit 2               ocamlc -c b.ml

8 Likes

#19

For reference, here is Xavier Leroy’s original paper on the module system: Manifest types, modules and separate compilation (emphasis mine). The first paragraph of the introduction is titled “Modules and separate compilation”.

4 Likes

#20

@kantian good point, I wasn’t fully aware of what you just stated. But I think that you can still have separate compilation without .mli files, because the OCaml compiler generates a .cmi file from the .ml file when no .mli file is present.

2 Likes

#21

It’s a bit more powerful than that, since you can compile against a .mli file without having any implementation; you only need to provide one at link time. This means that you can compile against a given interface and choose an actual implementation (i.e. the concrete .ml file) only at link time.

Note that for this to work in native code, you used to have to hide the corresponding .cmx files; otherwise they would be used for cross-module inlining. Nowadays you should compile these .mli files with the -opaque option for this to work. See the docs.

5 Likes

#22

For sure, but in this case the compilation is not fully separate (and that’s why I added sleep 1 in my example; without it the second example may succeed). It is also cheaper to produce a .cmi from an .mli than from an .ml, because the .ml has to be type-checked first. Moreover, as @dbuenzli said, you don’t even need to implement a.ml in order to compile b.ml.

% ls
a.mli  b.ml

% ocamlc -c a.mli b.ml

% ls
a.cmi  a.mli  b.cmi  b.cmo  b.ml

% cat << EOF > a.ml
> let message = "hello world !"
> EOF

% ocamlc a.ml b.cmo -o prog
% ./prog 
hello world !

0 Likes

#23

I always write the .ml first and the .mli last, when everything is working.

2 Likes

#24

One reason is to allow for separate compilation.
That is, builds that scale despite the size and complexity of the software, provided it is well designed as separate modules.

0 Likes

#25

Standard ML has signatures but at least Standard ML of New Jersey is an image system without separate compilation whereas Caml Light had files mapped to top-level modules.

I think the comparison with SML is helpful. I prefer the SML approach, which does not treat files as implicit modules and so does not require signature files for specifying an interface. It is optional whether you put the signature in the same file or in a different one. But, of course, there are trade-offs and, iiuc, in SML one generally ends up moving signatures to a .sig file once the implementation file becomes sufficiently complex.

Despite my preference, this isn’t a bike shed I’d want to spend a whole heap of time in :slight_smile:

1 Like

#26

All that can be done without mli files, because the compiler can produce cmi files from ml files directly (although as kantian said the build will be slightly slower).

0 Likes

#27

No, you will not have separate compilation without a .mli file: this is a matter of the dependency graph. Instead of drawing a graph, I’ll write it in a Makefile.

With .mli file:

all: hello.byte hello.native

hello.byte: a.cmo b.cmo
	ocamlc -o $@ $^

hello.native: a.cmx b.cmx
	ocamlopt -o $@ $^

%.cmi: %.mli
	ocamlc -c -opaque $<

%.cmo: %.ml
	ocamlc -c $<

%.cmx: %.ml
	ocamlopt -c $<

# object files depend only on a.cmi
a.cmo a.cmx b.cmo b.cmx: a.cmi

The compilation is then fully separate; if I touch a.ml, I do not have to recompile b.ml:

% make
ocamlc -c -opaque a.mli
ocamlc -c a.ml
ocamlc -c b.ml
ocamlc -o hello.byte a.cmo b.cmo
ocamlopt -c a.ml
ocamlopt -c b.ml
ocamlopt -o hello.native a.cmx b.cmx

% touch a.ml
% make #no need to recompile b.ml
ocamlc -c a.ml
ocamlc -o hello.byte a.cmo b.cmo
ocamlopt -c a.ml
ocamlopt -o hello.native a.cmx b.cmx

Without .mli, the dependency graph in the Makefile will be:

all: hello.byte hello.native

hello.byte: a.cmo b.cmo
	ocamlc -o $@ $^

hello.native: a.cmx b.cmx
	ocamlopt -o $@ $^

%.cmo: %.ml
	ocamlc -c $<

%.cmx: %.ml
	ocamlopt -c $<

# object files from b.ml depend on object files from a.ml
b.cmo: a.cmo
b.cmx: a.cmx

Hence, if I touch a.ml, I have to recompile b.ml; the compilation is not separate:

% make
ocamlc -c a.ml
ocamlc -c b.ml
ocamlc -o hello.byte a.cmo b.cmo
ocamlopt -c a.ml
ocamlopt -c b.ml
ocamlopt -o hello.native a.cmx b.cmx

% touch a.ml
% make #must recompile b.ml
ocamlc -c a.ml
ocamlc -c b.ml
ocamlc -o hello.byte a.cmo b.cmo
ocamlopt -c a.ml
ocamlopt -c b.ml
ocamlopt -o hello.native a.cmx b.cmx

Summary: with separate compilation, once you have a stable interface, you can modify its implementation without having to recompile every module which depends on it. :wink:

5 Likes

#28

I’m rather skeptical about the usefulness of the particular type of separate compilation you described; it looks like a micro-optimization to me. You’d need empirical data and studies (or at least personal examples from your experience) to convince me of its usefulness.

In my experience, “changing the implementation without changing the interface” is only one possible way things can change, and a rare one. Much more common are:

  • The unstable phase, where the implementation and interface change together.
  • Different implementations that coexist, where you choose the one that best suits your needs each time (note that with the mli system, you are forced to have distinct mli files with identical content in this scenario, another example of “duplication of information”).
  • Even when an interface has become stable and you love it the way it has stabilized, much later you will want to add a tiny bit of functionality here and there (it is more pleasurable to enlarge a familiar module than to create a brand new one with a few isolated functionalities), and here again, the interface will change along with the implementation.

1 Like

#29

In the ML module system, modules represent abstract data types with existential types, as shown in the foundational work by Mitchell and Plotkin. Compare with conventional languages, such as Java or C++, where abstract data types are (poorly) modeled with classes (and interfaces) that bind together the nominal abstraction with the set of methods (operations). The ML module system does not invent any ad-hoc constructs, such as classes, but relies on mathematics to deliver proper definitions that are well-tested by time. In the ML module system, structures denote mathematical objects, and their types are denoted by the signatures.

The separation between abstraction and implementation is an essential part of modular programming in particular and of reasoning in general. Properly chosen abstractions reduce the amount of information that we need to reason about and allow us to build complex systems from smaller parts. One of the responsibilities of the module system in a programming language is to protect the abstractions by ensuring that modules depend on abstractions, not implementations. Consider Python, Common Lisp, and many other dynamically typed languages that do not protect the abstractions, as they do not provide mandatory information-hiding mechanisms. As a project evolves, a diffusion process erodes the module boundaries, which essentially leads to projects that are hard to maintain and hard to understand.
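To make the point about protected abstractions concrete, here is a minimal sketch. The module `Sorted` and its functions are made-up names for illustration, not part of any library. The signature hides the representation, so client code cannot build a value that violates the "always sorted" invariant the implementation relies on.

```ocaml
(* Hypothetical example: Sorted, of_list, to_list and min_elt are
   illustrative names, not taken from any library. *)
module Sorted : sig
  type t                          (* abstract: clients cannot see the list *)
  val of_list : int list -> t
  val to_list : t -> int list
  val min_elt : t -> int option
end = struct
  type t = int list               (* invariant: sorted in ascending order *)
  let of_list l = List.sort compare l
  let to_list s = s
  (* Relies on the invariant: the head of a sorted list is its minimum. *)
  let min_elt = function [] -> None | x :: _ -> Some x
end

let () =
  let s = Sorted.of_list [3; 1; 2] in
  assert (Sorted.min_elt s = Some 1);
  assert (Sorted.to_list s = [1; 2; 3])
  (* Sorted.min_elt [3; 1; 2] would be rejected by the type checker:
     outside the module, an int list is not a Sorted.t. *)
```

Because `t` is abstract, the implementation could switch to another representation without any client noticing, which is exactly the kind of dependency on the abstraction described above.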

Of course, the ML module system is not the only mechanism for implementing abstract data types. We also have classes and interfaces (as in Java, C++); another option is to use type classes as in Haskell (they all basically differ in how they represent polymorphism, which is a completely different topic). But in any case, just having types of definitions, without providing a mechanism to define types of mathematical structures (i.e., sets of operations), is not enough.

Whether or not to have a separate mli file for signatures is a design question. I personally like it, though it poses some technical problems and doesn’t play well with namespaces. In OCaml, you can consider mli files as a shorthand and even consider them optional. Some projects (e.g., ocamlbuild) define all their abstractions (module types) in one ml file, which is then used by different implementations. Although that’s not common today, it’s a viable option.
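A minimal sketch of that last style, with hypothetical names throughout: one module type, `STACK`, is written once and shared by two implementations, so the interface is not duplicated. In a real project `STACK` could live in its own .ml file; here everything sits in one file to keep the example self-contained.

```ocaml
(* Hypothetical names: STACK, List_stack and Counted_stack are
   illustrative only. *)
module type STACK = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
  val size : 'a t -> int
end

(* First implementation: a plain list, size computed on demand. *)
module List_stack : STACK = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
  let pop = function [] -> None | x :: s -> Some (x, s)
  let size = List.length
end

(* Second implementation: the size is cached next to the list. *)
module Counted_stack : STACK = struct
  type 'a t = { items : 'a list; count : int }
  let empty = { items = []; count = 0 }
  let push x s = { items = x :: s.items; count = s.count + 1 }
  let pop s =
    match s.items with
    | [] -> None
    | x :: rest -> Some (x, { items = rest; count = s.count - 1 })
  let size s = s.count
end

(* Client code written against the module type only. *)
let size_after_pushes (module S : STACK) =
  S.size (S.push 2 (S.push 1 S.empty))

let () =
  assert (size_after_pushes (module List_stack) = 2);
  assert (size_after_pushes (module Counted_stack) = 2)
```

The client function takes the implementation as a first-class module, so the choice between the two stacks is made by the caller rather than baked into the client.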

7 Likes

#30

I’m describing separate compilation, but you are arguing not against .mli files but against separate compilation. Separate compilation is this: a module B uses values defined in module A, but you can compile B without having to compile A. In what you describe, this is never possible. If you want an example of its usefulness: you change only the implementation of one function in A for performance reasons.

By the way, more fundamentally, you should ask yourself this question: “I was not aware of certain things (you admit it), and the compiler writers provide mechanisms to handle them (for instance the -opaque option), so maybe this is useful even if I still don’t see why?” Or, to put it another way: I do not have to convince you; rather, you have to convince the OCaml team and the OCaml users that .mli files are not useful. :wink:

1 Like

#31

I think the concept of separate compilation is nice as an ideal, but we’re going to see less and less of it de facto, simply because performance requires peeking into the dependent modules and importing/inlining functions and values. We see this happening with Flambda, and as the performance gap between optimized and non-optimized code increases, it’ll make separate compilation (as enabled by mli files, rather than the dependency graph) into an abstract topic.

1 Like

#32

That is true, but it may simply lead to two compilation “modes”. To some extent, while hacking on a piece of code you just want separate compilation in order to have short edit/compile/test cycles, and later you accept sacrificing separate compilation for performance when building the “final” binary.

3 Likes

#33

Many very good points were raised already, so I’ll just add a couple of historical notes and personal preferences.

The “one compilation unit = one .mli interface file + one .ml implementation file” design goes back to Caml Light and was taken from Modula-2, Wirth’s excellent Pascal successor. As previously mentioned, it works great with separate compilation and parallel “make”.

But the main point in favor, in my opinion, is that it clearly separates the user’s view of the module (the .mli file) from the implementor’s view (the .ml file). In particular, documentation comments on how to use the module go to the .mli file; comments about the implementation go to the .ml file. This stands in sharp contrast with most of the Java code I’ve written and read, where comments are an awful mixture of documentation comments and implementation comments, and serious IDE support is needed to see just the user’s view of a class and nothing else.

Also, in the .mli file declarations can be listed in the most natural / pedagogical order, starting with the most useful functions and finishing with less common ones, while definitions in .ml files must come in bottom-up dependency order.
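As a small illustration of the ordering point, here is a sketch in which an inline signature stands in for the .mli file and the structure for the .ml file. The module `Stats` and its functions are hypothetical names chosen for the example.

```ocaml
(* Hypothetical module: Stats, mean and sum are illustrative names. *)
module Stats : sig
  (* User's view: the headline function is listed first. *)
  val mean : float list -> float
  val sum : float list -> float
end = struct
  (* Implementor's view: sum must be defined before mean can use it. *)
  let sum l = List.fold_left ( +. ) 0.0 l
  let mean l = sum l /. float_of_int (List.length l)
end

let () = assert (Stats.mean [1.0; 2.0; 3.0] = 2.0)
```

The signature is free to present `mean` first even though the definitions have to appear in dependency order.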

OCaml combines this Modula-2 approach to compilation units with a Standard ML-like language of modules, featuring nested structures, functors, and multiple views of structures. The latter is a rarely-used but cool feature whereby a given structure can be constrained by several signatures to give different views on the implementation, e.g. one with full type abstraction for use by the general public and another with more transparent types for use by “friend” modules. Again, and perhaps even more than in Modula-2, it makes sense to separate structures (implementations) from signatures (interfaces) that control how much of the implementation is visible.
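A minimal sketch of such multiple views, under the assumption of made-up names (`Counter_impl`, `PUBLIC`, `FRIEND`): one structure is constrained by two signatures, one with an abstract type and one that exposes the representation.

```ocaml
(* Hypothetical names: Counter_impl, PUBLIC and FRIEND are illustrative. *)
module Counter_impl = struct
  type t = int
  let zero = 0
  let incr c = c + 1
  let to_int c = c
end

(* Public view: the representation is abstract. *)
module type PUBLIC = sig
  type t
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end

(* "Friend" view: the representation is exposed. *)
module type FRIEND = sig
  type t = int
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end

(* Two views of the same implementation. *)
module Counter : PUBLIC = Counter_impl
module Counter_friend : FRIEND = Counter_impl

let () =
  assert (Counter.to_int (Counter.incr Counter.zero) = 1);
  (* Friends may treat a counter as a plain int. *)
  assert (Counter_friend.incr 41 = 42)
  (* Counter.incr 41 would not type-check: Counter.t is abstract. *)
```

One caveat with this particular sketch: the type checker treats `Counter.t` and `Counter_friend.t` as different types, so values cannot be passed from one view to the other without going through `to_int`.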

27 Likes

#34

For sure, this method has been very, very well tested for a long time: Euclid already used it to organise his discourse. In ML module system terminology, Elements of geometry is a package with 13 modules (named Book 1 to Book 13). For instance, the value Prop 1 from Book 2 (cf. page 50) uses the values Prop 31 and Prop 34 from Book 1, hence the [Prop 1.31] and [Prop 1.34] in the proof. :slight_smile:

@xclerc already answered this point: it depends on why you are compiling your code.

  • if you are hacking, correcting bugs, reorganizing code, factorising… then it’s useful to rely on separate compilation (mostly with bytecode that you can then test in the toplevel);
  • if you are releasing, or pushing to prod, then you can give up separate compilation for performance reasons.

2 Likes

#35

This is something I’m really interested in, because it seems like a genuine problem that comes up quite often. How hard would it be to take, for example, the Map module in Stdlib, and make an additional Map.Internals module that has full access to the internals of the Map and is also fully compatible with the regular Map interface?

5 Likes