What is the reason for separating module implementations and signatures in OCaml?

A module signature can hide details of the implementation. In particular, it can hide the details of a type so that a client can only use what is revealed in the interface. Below is a FIFO queue that internally uses two lists, but this is not revealed to clients. A simpler implementation could use just one list. The implementation could also contain additional functions and values that are hidden from a client (which is not the case here).

(* fifo.mli *)
type 'a t
val empty: 'a t
val push : 'a -> 'a t -> 'a t
val pop  : 'a t -> 'a t
val peek : 'a t -> 'a option


(* fifo.ml *)
type 'a t = 'a list * 'a list

let empty = ([], [])

(* Invariant: if the front list is empty, the back list is empty too. *)
let push y = function
  | []    , []    -> [y], []
  | []    , _     -> assert false
  | xs    , ys    -> xs, y::ys

let pop = function
  | []    , []    -> failwith "fifo is empty"
  | [_]   , ys    -> List.rev ys, []
  | _::xs , ys    -> xs         , ys
  | []    , _     -> assert false

let peek = function
  | x::_  , _     -> Some x
  | []    , []    -> None
  | []    , _     -> assert false
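
The one-list alternative mentioned above could look roughly like the sketch below. To keep it self-contained in a single file, the interface is written as a module type; the names FIFO and Simple_fifo are made up for illustration and are not part of the original post. Clients written against the signature cannot tell which representation is behind it.

```ocaml
(* Same interface as fifo.mli, written as a module type (name is illustrative). *)
module type FIFO = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> 'a t
  val peek : 'a t -> 'a option
end

(* Hypothetical single-list variant: oldest element at the head. *)
module Simple_fifo : FIFO = struct
  type 'a t = 'a list

  let empty = []

  (* Appending at the end is O(n); the two-list version avoids this cost. *)
  let push y q = q @ [y]

  let pop = function
    | [] -> failwith "fifo is empty"
    | _ :: xs -> xs

  let peek = function
    | [] -> None
    | x :: _ -> Some x
end

let () =
  let q = Simple_fifo.(empty |> push 1 |> push 2) in
  assert (Simple_fifo.peek q = Some 1);
  assert (Simple_fifo.peek (Simple_fifo.pop q) = Some 2)
```

The trade-off is simplicity against the linear-time push.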

For such cases we can use private/public declarations or naming conventions.

P.S. I’m not against that language design. I just want to understand the reason for it, because I have some interesting (at least for me) ideas that I want to implement in my own ML-inspired language. So if there is no strong reason for this decision, I’ll try to implement signatures another way (I’ll implement explicit typing with no type inheritance, which should also speed up the compiler).

Do you want to keep the concept of an interface such that you can decide whether an implementation matches an interface? If so, I think this is a good reason to have an explicit syntax for interfaces (sig ... end), of which interface files (*.mli) are a special case.

I already argued about this elsewhere on this website.
Ideally one would have only ml files, and any wishes the programmer had about how a value in a module should appear outside the module would be specified in the ml file directly, around the definition of the value.
But that would have made the language harder to design and specify, I guess. Which is why OCaml retains this C-like cumbersome duplication of information into header and implementation files.

In your design, how would you express the signature of a functor argument?
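
For readers less familiar with functors, here is a minimal sketch of what is being asked about. A functor parameter is constrained by a module type, so some way of writing a signature separately from any implementation is needed. ORDERED and Make_max are made-up names for illustration; Int is the standard library module (OCaml 4.08 or later).

```ocaml
(* The signature a functor argument must satisfy (illustrative name). *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor that only knows its argument through ORDERED. *)
module Make_max (O : ORDERED) = struct
  let max (a : O.t) (b : O.t) = if O.compare a b >= 0 then a else b
end

(* The stdlib Int module has more items than ORDERED asks for; that is fine. *)
module Int_max = Make_max (Int)

let () = assert (Int_max.max 3 7 = 7)
```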

It’s not cumbersome. It seems we keep having this discussion.

Indeed, and I thank you for linking to a post where I completely disagree with everything you wrote. Let me answer it:

“It makes it easier to orient oneself in large code bases. A single file I can peruse will indicate me the exact piece of functionality the module is exposing to the rest of the code base without having to crawl through the private parts of the module.”

On the contrary, the larger the module, the more incomplete and the less useful the mli is and in most cases you’re better off going to the ml directly.

“encourages interface design and thinking”

No, it only encourages writing a shiny mli and does nothing to promote clean, well-organized code in the ml where it is most needed.
To avoid the punishment of having to update everything twice, programmers cannot develop the interface and implementation concurrently and are forced into always writing the interface first, which is merely one of several possible design paradigms.
Abstracting first is not always right. Abstraction is a trade-off like everything else. Think of leaky abstractions.

I have nothing against signatures or module types as part of the language. It’s mli files I object to.

An implementation (*.ml) is not forced to have an *.mli file but if you want to hide certain aspects, you would have to do that on the level of sub-modules.

(* fifo.ml *)

module T: sig
  type 'a t
  ..
end = struct
  type 'a t = 'a list * 'a list
  ..
end

But I understand that you would prefer annotations on value definitions that control visibility and only resort to signatures when this is not enough. (I don’t consider maintaining interface files much of a burden and consider them a good mechanism and place for documentation.)

There’s no such thing as an incomplete mli file. The .mli file tells you precisely what the module exposes to the rest of the code base. The rest is off limits thanks to the hiding property of interfaces, and this is precisely what makes modular understanding of a code base easier.

Neither does not having .mli files ;-). At least you get clean interfaces and a clear summary of the entry points to the module.

You are not forced to write the interface first (you can also not write it at all initially). These things tend to be developed and refined in conjunction.

I can argue the exact opposite: the larger the implementation, the more useful the interface file becomes to skim over the irrelevant implementation details and only bother with understanding the interface i.e. how to use the module.

Not really. You can think about the interface and the implementation separately–in fact that’s the whole point. Whether you write a well-organised implementation or not is up to you; the mli just encourages writing a well-organised interface to make things easier for your users.

Sure, and you definitely don’t have to abstract first, in fact a lot of people don’t in OCaml. Write the implementation first, then worry about the interface.

Side note: @egoholic, some people ask the related question, ‘Why do I need two separate files for the module interface and implementation?’ In case you were wondering, you don’t actually need two separate files, if you want a single file you can use the include trick:

(* id.ml *)
include (struct
  type t = int
  let make int = int
  let toInt t = t
end: sig
  type t
  val make: int -> t
  val toInt: t -> int
end)

This defines and constrains the module to a signature in a single file.
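
To see the effect from the client side, here is a small self-contained sketch. Since a single snippet cannot span two files, the module is written inline as Id with the same signature and body as above; the client code at the bottom is an illustrative assumption, not part of the original post.

```ocaml
(* Same structure and signature as id.ml above, written as an inline module. *)
module Id : sig
  type t
  val make : int -> t
  val toInt : t -> int
end = struct
  type t = int
  let make int = int
  let toInt t = t
end

let () =
  let id = Id.make 42 in
  (* Writing id + 1 here would be rejected by the type checker,
     because Id.t is abstract outside the module. *)
  assert (Id.toInt id = 42)
```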

I also personally like the clarity that comes from the separation.

That said, there are things that I find annoying about interface files:

  1. We can’t jump to a specific function implementation from interface (I think?).
  2. As a consequence of (1), we can’t jump to implementation from a documentation page.
  3. If you’re doing API-driven development, it would be nice to be able to automatically generate an .ml file with stub functions that satisfies the interface. A Jane Street blog post describes a workaround, but I think editor support would be nice to have.

I think .mli files could (and should), if present, be used as hints to do the .ml. Just like in Java where you say a class is implementing an interface, then the IDE would tell you that you’re missing an implementation of such-and-such methods.
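
To make the stub idea concrete, here is a hand-written sketch of what such generated code might look like. It is not the output of any actual tool, and COUNTER and Counter_stub are hypothetical names. The point is only that a stub can type-check against the interface before any real logic exists.

```ocaml
(* The interface one starts from (illustrative). *)
module type COUNTER = sig
  type t
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end

(* Placeholder implementation: it satisfies the signature but fails when used. *)
module Counter_stub : COUNTER = struct
  type t = unit
  let zero = ()
  let incr _ = failwith "incr: not implemented"
  let to_int _ = failwith "to_int: not implemented"
end
```

The compiler plays the role of the Java IDE here: removing any of the definitions from the stub produces a signature mismatch error naming the missing item.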

I tried more than once to talk @let-def into making Merlin’s C-c C-l switch between implementation and interface when the cursor is on the identifier of a val or let (as C-c C-a does at the file level), rather than stopping with the unhelpful “Already at definition point”, but I don’t remember exactly how those discussions ended. Maybe the problem is not as trivial as one might think once you factor in the full power of the language (or maybe he thinks it’s a bad idea).

No one has mentioned what the “original” reason was, so the history behind this might be known to only a few people. @xavierleroy?

I can only suspect that it was done for clarity. After all, one can specify the parameter types in the .ml file, so you don’t need an mli at all. But reading through the implementation code while searching for a signature doesn’t seem efficient; mli files are nicer indexes.

And if you were to abstract or obscure the implementation, it makes sense to produce a simple file of signatures to ship with your library. Note that you can initially auto-generate your mli (for example with ocamlc -i) and then trim it down.

See A History of OCaml. OCaml has its origin in Caml, Caml Light, and Caml Special Light as new implementations of ML, originally as a bytecode interpreter. Standard ML has signatures, but at least Standard ML of New Jersey is an image system without separate compilation, whereas Caml Light had files mapped to top-level modules. This raised the question of how to implement signatures for those, and I believe interface files were a natural answer. In particular, interface files shield client modules from re-compilation if only the implementation changes. I believe this is not true in the case of native compilation, though.

Addendum: as pointed out below by @dbuenzli, the -opaque option can be used to make natively compiled modules only depend on interfaces for faster re-compilation at the cost of reduced code quality.

That will indeed be useful, but yeah if it’s not done already I imagine there are some considerations involved.

Nobody mentions that separation between module implementation and signature allows separate compilation or compilation in parallel.

(* a.ml *)
let message = "hello world !"

(* a.mli *)
val message : string

(* b.ml *)
let () = print_endline A.message

and now (sleep 1 is here only to be sure that compilation of a.ml will not end before b.ml is compiled):

% ocamlc -c a.mli; (ocamlc -c  b.ml & (sleep 1; ocamlc -c a.ml)); ocamlc a.cmo b.cmo -o prog

% ./prog
hello world !

without a.mli

% rm *.cm* a.mli
% ocamlc -c b.ml & (sleep 1; ocamlc -c a.ml)
[1] 3392
File "b.ml", line 1, characters 23-32:
Error: Unbound module A
[1]+  Exit 2                  ocamlc -c b.ml

For reference, here is Xavier Leroy’s original paper on the module system, “Manifest types, modules and separate compilation” (emphasis mine). The first paragraph of the introduction is titled “Modules and separate compilation”.

@kantian good point, I wasn’t fully aware of what you just stated. But I think that you can still have separate compilation without mli files, because the OCaml compiler knows to create a cmi file from the ml file when there’s no mli file present.

It’s a bit more powerful than that, since you can compile against a .mli file without having any implementation, which you only need to provide at link time. This means that you can compile against a given interface and choose an actual implementation (i.e. the concrete .ml file) only at link time.

Note that for this to work in native code it used to be the case that you had to hide the corresponding .cmx file otherwise those would be used for cross-module inlining. Nowadays you should compile these mlis with the -opaque option for this to work. See the docs.
