De-duplicating module signatures that depend on abstract data types

MattWindsor91 · May 20, 2019, 8:19am

In trying to avoid duplicating my module type signatures across ml and mli files, I’ve ended up using the _intf.ml pattern:

(* foo_intf.ml *)
module type Basic = sig (* ... *) end
module type S = sig (* ... *) end

(* foo.mli *)
include module type of Foo_intf
module Make (B : Basic) : S

(* foo.ml *)
include Foo_intf
module Make (B : Basic) : S = struct
  (* ... *)
end

This usually works well (though I’m not sure if there is something more elegant I can do using one of the ppx_import type things). However, suppose I now want to add a module representing an abstract data type:

(* foo.mli *)
module Config : sig
  type t
  (* ... functions ... *)
end

(* foo.ml *)
module Config = struct
  type t = (* ... *)
end

If I want to then use Config.t inside the module types I declared in Foo_intf, then I find that I can’t easily do so without either:

moving the implementation into foo_intf and either leaving it transparent or restricting the interface I import out of it with an ‘expose these in the mli’ signature at the end of foo_intf;
adding the type into the Foo.Basic and/or Foo.S module types, then changing Make's types to add sharing constraints/destructive substitutions to insert Config.t;
declaring Config.t in another file and referring to it from foo_intf.

All of these approaches have fairly unpleasant drawbacks (I lose abstraction, bloat my code with more Weird Module System Things™, or have to split up what is conceptually one module just to solve a dependency problem). Is there anything I’m missing here?

(It may very well be that the problem is using _intf.ml in the first place.)

ivg · May 21, 2019, 1:41pm

It looks like you have abstracted your question too much; it is really hard to guess what you are trying to do. Therefore my answer will be a little bit unstructured.

There are a couple of problems with your approach. It could be because you are misunderstanding some of the concepts in OCaml’s module language, or because you are misusing them and trying to apply modules in a way for which they weren’t designed.

First of all, I would like to advise against using the include module type of construct. It has very few legitimate uses and is better avoided, as it has several drawbacks and caveats. For example, given a module Y : module type of X = X, we don’t have type equality between Y.t and X.t. Or, even stronger, module type of X refers to types which are different from the types of X.
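A minimal sketch of that caveat, assuming a hypothetical module X with a variant type (X, Y and their members are made up for illustration and are not from the original question). Sealing Y with module type of X gives Y its own copy of the type, so values of Y.t cannot be passed to functions of X:

```ocaml
(* Hypothetical module, used only for illustration. *)
module X = struct
  type t = A | B
  let default = A
  let to_string = function A -> "A" | B -> "B"
end

(* [module type of X] copies the definition of [t] but does not record
   that it is equal to [X.t], so sealing with it creates a new type. *)
module Y : module type of X = X

(* Values of [Y.t] work with the functions of [Y]... *)
let from_y = Y.to_string Y.default

(* ...but not with those of [X]. Uncommenting the next line is expected to
   fail with a type error, since [Y.t] and [X.t] are distinct types:
   let mixed = X.to_string Y.default *)
```

Note that the effect only shows up for abstract or freshly defined types; if t were a plain alias such as int, both modules would still agree on it.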

The same is more or less true for the include statement, you shall also use it sparingly. An abstraction that is introduced via include or, worse, include module type of is not an abstraction. Basically, if you want to refer to an abstraction, you shall refer to it directly by its name. If you want to refer to several abstractions, without having to enumerate them all, then instead of using the include statement, you shall create a new abstraction which refers all the abstractions you need directly by name, and then refer to this abstraction by name. Probably, the only legit usage of the include statement is when you’re extending an existing abstraction, e.g.,

module type S = sig 
    type t 
    val init : t
    val succ : t -> t
end

module Extend(Base : S) : sig 
   include S with type t = Base.t
   val (+) : t -> t -> t
   val (-) : t -> t -> t
   (* ... *)
end
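As a sketch of how such an extension might be implemented (the operation add_n and the module Nat are hypothetical stand-ins for the elided operators and for a concrete base), the functor body can simply include its argument and add the new operations on top:

```ocaml
module type S = sig
  type t
  val init : t
  val succ : t -> t
end

(* [add_n] is a hypothetical extra operation standing in for the
   operators elided in the signature above. *)
module Extend (Base : S) : sig
  include S with type t = Base.t
  val add_n : int -> t -> t
end = struct
  include Base
  (* Apply [succ] n times. *)
  let rec add_n n x = if n <= 0 then x else add_n (n - 1) (succ x)
end

(* A hypothetical base implementation. *)
module Nat = struct
  type t = int
  let init = 0
  let succ x = x + 1
end

module Nat_ext = Extend (Nat)

(* [Nat_ext.t] is known to be [Nat.t], so values flow between the two. *)
let three = Nat_ext.add_n 3 Nat.init
```

The sharing constraint on the included signature is what lets Nat.init be passed to Nat_ext.add_n.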

Another idea, that you might be missing, is that when you define a signature with an abstract type, e.g.,

module type S = sig 
   type t 
   val init : t 
   val succ : t -> t
end

Then every time you reference the signature S, either as a type of a functor parameter or as a module type in your interface, the type S.t will be always different, e.g.,

module X : S  
module Make(P : S) : S

In the example above, the type X.t is different from the type Make(X).t, and the type P.t of the parameter of the functor Make is likewise different from, and incompatible with, X.t and Make(X).t.

If you want to make them equal, you should use manifest types, for that, e.g., to make the functor Make return a module which has type t that is the same type that was passed to it, you have to manifest this,

 module Make (P : Basic) : S with type t = P.t
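The difference can be sketched as follows. This is only an illustration: it uses S for both the parameter and the result, and Counter and Make_opaque are hypothetical names introduced here.

```ocaml
module type S = sig
  type t
  val init : t
  val succ : t -> t
end

(* Hypothetical argument module. *)
module Counter = struct
  type t = int
  let init = 0
  let succ x = x + 1
end

(* With the constraint, the result type is manifestly equal to [P.t]. *)
module Make (P : S) : S with type t = P.t = struct
  type t = P.t
  let init = P.init
  let succ x = P.succ (P.succ x)
end

(* Without it, the same functor hides the equality. *)
module Make_opaque (P : S) : S = Make (P)

module M = Make (Counter)
module O = Make_opaque (Counter)

(* Accepted: [M.t] is [Counter.t], which is [int]. *)
let four = M.succ (M.succ Counter.init)

(* Rejected if uncommented: [O.t] is abstract and unrelated to [Counter.t].
   let bad = O.succ Counter.init *)

(* [O] is still usable on its own values, compared here by structure. *)
let opaque_ok = O.succ O.init = O.succ O.init
```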

To summarize, when you define an abstract type

module X : sig 
   type t
   val init : t
   val succ : t -> t
end

You define a structure with a set t and a pair of operations init, succ defined for that set. But when you define a module type

module type S = sig 
   type t 
   val init : t
   val succ : t -> t
end

You define an abstraction of an abstraction, i.e., a set of sets equipped with two operations, init,succ. And therefore, every time you reference an abstraction S you’re referencing different sets.

Going back to your problem, you shall decide whether your foo module operates with a set of sets abstraction, i.e., it is generic and applicable to any module which implements the Basic interface. Or it is actually specific to a particular abstract type Config.t with a specific interface S. If the latter, then it doesn’t make any sense to use a functor. It could be also possible, that you are just missing the sharing constraints in your interface and that is what confuses you.

Finally, the _intf.ml idiom should be used very differently. It is usually used, when you have several module types and a functor (or several functors) which operate on those module types, therefore in order to avoid duplication of signatures between the implementation and the signature files, we define a third file with all those module types, and then use those module types (usually with sharing constraints) by names, e.g.,


(* foo_intf.ml *)
module type Basic = sig (* ... *) end
module type S = sig (* ... *) end

(* foo.mli *)
open Foo_intf
module Make (Input : Basic) : S with type t := Input.t

(* foo.ml *)
open Foo_intf
module Make (Input : Basic) : S with type t := Input.t = struct
  (* ... *)
end
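To try this layout in a single file, nested modules can stand in for the three files. The sketch below makes that assumption, and the contents of Basic and S (compare, max) are hypothetical placeholders for the elided members:

```ocaml
(* Stands in for foo_intf.ml: module types only. *)
module Foo_intf = struct
  module type Basic = sig
    type t
    val compare : t -> t -> int
  end

  module type S = sig
    type t
    val max : t -> t -> t
  end
end

(* The signature stands in for foo.mli, the structure for foo.ml. *)
module Foo : sig
  open Foo_intf
  module Make (Input : Basic) : S with type t := Input.t
end = struct
  open Foo_intf
  module Make (Input : Basic) : S with type t := Input.t = struct
    let max a b = if Input.compare a b >= 0 then a else b
  end
end

(* [Int] from the standard library satisfies [Basic]. *)
module Int_max = Foo.Make (Int)
let largest = Int_max.max 3 7
```

Clients that need the module types themselves refer to Foo_intf.Basic and Foo_intf.S by name; Foo only exports the functor.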

On rare occasions, when it is nearly impossible to avoid this, we will do

(* foo.mli *)
include Foo_intf.S

You might see the code like this in Core, but you shouldn’t repeat this approach in your code. Not because it is bad, but because it is very specific to Janestreet Core library history and OCaml history.

MattWindsor91 · May 21, 2019, 3:37pm

It looks like you have abstracted your question too much; it is really hard to guess what you are trying to do. Therefore my answer will be a little bit unstructured.

Ah, apologies. I’ll clarify: the main issue I’m worrying about here is when I have something like this in my .mli (ignoring the intf pattern for now):

(* foo.mli *)

module Abstract_type : sig
  type t [@@deriving foo, bar, baz]
end

module type Basic = sig
  type t
  (* something *)
end
module type S = sig
  type t
  (* possibly `include Basic with type t := t` here *)
  val get_abstract : t -> Abstract_type.t
end
module Make (B : Basic) : S with type t = B.t (* ... *)

The problem then is avoiding, in the implementation, duplicating the module types Basic and S. By misreading Jane Street code, I got into my head that the usual way to avoid the duplication would be doing something like:

(* foo_intf.mli *)

module type Basic = sig type t (* etc *) end

module type S = sig
  type t
  (* and now we hit a problem -- we don't have Abstract_type.t! *)
  val get_abstract : t -> ??
end

(* foo.mli *)

include module type of Foo_intf (* !! *)
(* and so on *)

The question I was getting at is that I was wondering how to avoid this situation, where by pulling my module types out into an _intf file, I’ve lost the ability to refer to the Abstract_type I’ve introduced in foo.mli. The only ways I could think of were to add it back in at the return type of the Make functor using a sharing constraint, or to abandon the _intf hack itself.

Replies to rest inline:

There are a couple of problems with your approach. It could be because you are misunderstanding some of the concepts in OCaml’s module language, or because you are misusing them and trying to apply modules in a way for which they weren’t designed.

Almost certainly! This has become a recurring theme in my posts here this month. It seems like intermediate/advanced module usage is either something that isn’t visibly covered in the literature, or I managed to avoid all of it. (I came into OCaml through Real World OCaml v1. I know there was some module and functor documentation there, but I’m not sure how deep it went?)

First of all, I would like to advise against using the include module type of construct. It has very few legitimate uses and is better avoided, as it has several drawbacks and caveats. For example, given a module Y : module type of X = X, we don’t have type equality between Y.t and X.t. Or, even stronger, module type of X refers to types which are different from the types of X.

Fair—what you said at the end about the Jane Street codebase is what I missed here. I assumed it was a common idiom.

If you want to make them equal, you should use manifest types, for that, e.g., to make the functor Make return a module which has type t that is the same type that was passed to it, you have to manifest this,

Yeah, I’m aware of sharing constraints/destructive substitution, but I didn’t make this clear in my question. Apologies!

Going back to your problem, you shall decide whether your foo module operates with a set of sets abstraction, i.e., it is generic and applicable to any module which implements the Basic interface. Or it is actually specific to a particular abstract type Config.t with a specific interface S . If the latter, then it doesn’t make any sense to use a functor. It could be also possible, that you are just missing the sharing constraints in your interface and that is what confuses you.

Yeah, I meant to ask specifically about cases where there is generalisation. (Though, in my OCaml code at the moment, there’s a general tendency to overuse functors where parametric types would’ve done. I’m trying to stamp this out.)

Finally, the _intf.ml idiom should be used very differently. It is usually used, when you have several module types and a functor (or several functors) which operate on those module types, therefore in order to avoid duplication of signatures between the implementation and the signature files, we define a third file with all those module types, and then use those module types (usually with sharing constraints) by names, e.g.,
(* foo_intf.ml *)
module type Basic = sig (* ... *) end
module type S = sig (* ... *) end

(* foo.mli *)
open Foo_intf
module Make (Input : Basic) : S with type t := Input.t

(* foo.ml *)
open Foo_intf
module Make (Input : Basic) : S with type t := Input.t = struct
 (* ... *)
end

In this case, I presume I’d need to refer to the Basic and S types from outside Foo by either opening Foo_intf or replacing instances of Foo with Foo_intf; is this idiomatic? It seems like a bit of an abstraction leak, but I figure it’s better than the ocaml abuse I was doing earlier.

You might see the code like this in Core, but you shouldn’t repeat this approach in your code. Not because it is bad, but because it is very specific to Janestreet Core library history and OCaml history.

Ah. I figure that trying to get my idioms from Core without knowing their context was a big mistake.

ivg · May 21, 2019, 4:49pm

Yes, it is. I think it is the (poor) choice of name that confuses you. You assume that Foo_intf actually has something to do with the Foo module, hence the name. In fact, the idea is that you define your abstractions in the module Foo_intf, and then Foo and other modules depend on those abstractions rather than on implementations like Foo. I, myself, rarely, if ever, use the foo_intf.ml naming scheme. Usually, I tell myself that if I can’t give a name to an abstraction, then it is probably a bad abstraction to start with. I usually define some number of module types in a file called library_types.ml, e.g., compiler_types.ml, and then refer to those abstractions where necessary. Note also that any module type in OCaml acts as a generator for a family of module types, e.g., if you have a module type

module type S = sig 
   type t 
   val init : t 
   val succ : t -> t 
end

Then you can use it to create module types for concrete types, e.g., S with type t = int and S with type t = expr, etc. Therefore, your module types in compiler_types.ml should be as free from constraints as possible/reasonable.
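A small sketch of that idea, with two hypothetical implementations (Int_s and Char_s are names made up for this example):

```ocaml
module type S = sig
  type t
  val init : t
  val succ : t -> t
end

(* Two members of the family generated by [S]; both are hypothetical. *)
module Int_s : S with type t = int = struct
  type t = int
  let init = 0
  let succ x = x + 1
end

module Char_s : S with type t = char = struct
  type t = char
  let init = 'a'
  let succ c = Char.chr (Char.code c + 1)
end

(* Because [t] is manifest, ordinary [int] and [char] functions apply. *)
let one_plus_41 = Int_s.succ Int_s.init + 41
let upper_b = Char.uppercase_ascii (Char_s.succ Char_s.init)
```

The unconstrained S stays reusable, and each use site adds only the equations it needs.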

Concerning your style (I looked into act), it looks like it is heavily influenced by Haskell, where you define type classes, make your implementation dependent on those type classes, and then instantiate a solution with a concrete selection of implementation types.

Before applying the same approach in OCaml you shall consider a couple of differences between those two languages and correct your approach accordingly.

OCaml provides a stronger and more powerful module system than Haskell.
OCaml functors are more expressive but less mechanized than Haskell type classes, so they usually impose a higher cognitive burden.

The first point is that where in Haskell you have only type classes to protect your abstractions, in OCaml you have module types, with sharing constraints and strengthening. Modules with abstract (opaque) types provide sufficient protection, so in most cases it is fine to depend on a concrete module rather than on an abstraction that this module implements, e.g., consider the following two approaches:

module type Var = sig (* ... *) end
module type Exp = sig (* ... *) end
module type S = sig 
   type exp 
   val run : exp -> exp
end
module Optimizer(V : Var)(E : Exp with type var = V.t) : 
   S with type exp = E.t

which basically mimics the Haskell style, where you have two type classes (Var and Exp) and a generic function run defined in the context of those two type classes. And finally, you have a particular instantiation of your framework with concrete instances of the type classes,

module Exp = Non_hashconsed_exp(String) 
module Optimizer = Optimizer(String)(Exp)

let main input = 
   Exp.deserialize input |> 
   Optimizer.run |>
   Exp.serialize

This is a perfectly fine solution: you try to be as generic as possible so that your code is robust to future changes. I’m not advising against this style, except that the more abstractions you introduce, the more indirections you have, and the more choices you delay, the higher the cognitive burden of your framework, which at the end of the day affects its maintainability, testability, and usability. So you have to apply Occam’s razor: use the lightest method when you build your system, and call for the heavy artillery only if and when it is needed.

Going back to our example, it is perfectly fine to implement the Optimizer module by referring directly to the Var and Exp modules, especially since we don’t have (and probably do not plan to have in the near future) many different implementations of them. Keep in mind, though, that when you write a function val optimize : Exp.t -> Exp.t you’re actually introducing a dependency on an abstract type Exp.t, not on a concrete implementation, so the exp.mli interface protects you from the technical debt of poor choices made in the exp.ml implementation. Therefore, you shall design the exp.mli interface very carefully: try to find the strongest possible theory that is sufficient to implement the optimize function without leaking any details.

If you later decide to try another representation, you can generalize your Optimizer module into a functor, going back to the functorized solution once it is really needed. You can even make the change backward compatible, i.e., without breaking the interfaces. E.g., it starts as

(* file optimizer.mli *)
val run : Exp.t -> Exp.t

which is later generalized to

(* file optimizer_types.ml *)
module type Exp = sig (* ... *) end
module type Var = sig (* ... *) end
module type S = sig 
   type exp
   val run : exp -> exp
end

and

(* file optimizer.mli *)
open Optimizer_types

module Make(E : Exp) : S with type exp = E.t

(* and the default implementation, using concrete `Exp.t`  *)

include S with type exp = Exp.t

where the optimizer is usually generalized by just adding module Make(Exp : Exp) = struct ... end around the old function, e.g.,

(* file optimizer.ml *)
open Optimizer_types
module Make(Exp : Exp) = struct 
    let rec run input = 
       Exp.analyze input 
         ~case_add:(fun x y -> ...)
         ...
end

include Make(Exp)
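For readers who want to compile the generalized version, here is a single-file sketch in which nested modules stand in for the files. The operations of Exp (int, add, analyze with a case_int handler, to_string) and the constant-folding pass are assumptions made up for the example; only the overall shape (a Make functor plus include Make(Exp)) comes from the snippets above.

```ocaml
(* Stands in for optimizer_types.ml. The operations of [Exp] are assumptions. *)
module Optimizer_types = struct
  module type Exp = sig
    type t
    val int : int -> t
    val add : t -> t -> t
    val analyze :
      t -> case_int:(int -> 'a) -> case_add:(t -> t -> 'a) -> 'a
  end

  module type S = sig
    type exp
    val run : exp -> exp
  end
end

(* Stands in for exp.ml sealed by exp.mli: the representation is hidden. *)
module Exp : sig
  include Optimizer_types.Exp
  val to_string : t -> string
end = struct
  type t = Int of int | Add of t * t
  let int n = Int n
  let add a b = Add (a, b)
  let analyze e ~case_int ~case_add =
    match e with
    | Int n -> case_int n
    | Add (a, b) -> case_add a b
  let rec to_string = function
    | Int n -> string_of_int n
    | Add (a, b) -> "(" ^ to_string a ^ " + " ^ to_string b ^ ")"
end

(* Signature stands in for optimizer.mli, structure for optimizer.ml. *)
module Optimizer : sig
  open Optimizer_types
  module Make (E : Exp) : S with type exp = E.t
  include S with type exp = Exp.t
end = struct
  open Optimizer_types

  module Make (Exp : Exp) = struct
    type exp = Exp.t

    let as_int e =
      Exp.analyze e ~case_int:(fun n -> Some n) ~case_add:(fun _ _ -> None)

    (* Fold additions of two constants. *)
    let rec run input =
      Exp.analyze input
        ~case_int:Exp.int
        ~case_add:(fun x y ->
          let x = run x and y = run y in
          match as_int x, as_int y with
          | Some a, Some b -> Exp.int (a + b)
          | _ -> Exp.add x y)
  end

  (* The default instance keeps the old [Optimizer.run] entry point. *)
  include Make (Exp)
end

let folded =
  Exp.(add (int 1) (add (int 2) (int 3)))
  |> Optimizer.run
  |> Exp.to_string
```

Existing callers of Optimizer.run keep working unchanged, while new code can apply Optimizer.Make to another expression representation.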

To summarize, do not be afraid to depend on modules, as long as your modules have adequate mli files. A good indicator that you’re using a functor where you could just depend on a module is when you have lots of sharing constraints referring to concrete types in your mli files.

One final note: do not be afraid to duplicate signatures, as duplicating signatures (even via copy-pasting) is very different from code duplication. The main reason why code duplication is perceived as bad practice is that it duplicates errors: once you fix an error in one place, it still persists wherever it was duplicated. However, when you duplicate a signature, it is not code, since it doesn’t have any runtime semantics. In other words, it can’t go wrong. Moreover, whenever you update your signatures, the compiler automatically verifies that all the duplicates are still consistent, so that you can fix/update them.
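A minimal sketch of that compiler check, with hypothetical names throughout (Types, Lib, Zero). The signature of Lib spells out S by hand, while the implementation reuses Types.S; the compiler accepts this only while the two definitions are equivalent:

```ocaml
(* Stands in for a shared types file; all names are hypothetical. *)
module Types = struct
  module type S = sig
    type t
    val init : t
  end
end

module Lib : sig
  (* Copy of [Types.S], written out by hand as in a public .mli file. *)
  module type S = sig
    type t
    val init : t
  end
  module Make (X : Types.S) : S with type t = X.t
end = struct
  module type S = Types.S
  module Make (X : Types.S) = struct
    include X
  end
end

module Zero = struct
  type t = int
  let init = 0
end

module Z = Lib.Make (Zero)
let z = Z.init + 0
```

Adding a value to only one of the two copies of S is expected to make the definition of Lib fail to type-check, which is the consistency guarantee described above.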

Some may say that duplicating interfaces doubles the amount of reasoning about the code, since the reader might now need to read the same types twice. That is correct to a certain degree. However, indirection also increases cognitive burden, and you know this from reading Jane Street’s interfaces, where you will find lots of annoying include Foo_intf.S where foo_intf.ml itself includes other interfaces and so on, until you lose track of what you were looking for. So having all the interfaces right here at hand, inlined, is probably better.

I, myself, usually extend this approach to two or more sets of duplicated interfaces. For example, when I define a library, I have a set of modules, each having a so-called internal interface, and an umbrella module, which publishes a subset of those modules with an interface that is itself a subset of their union. This interface I call the public interface. Here is a concrete example, which in fact involved lots of module types and functors. (Note it is a work in progress, so it lacks documentation.) For more finished projects, consider the Bap Primus framework or the Monads Transformer Library. All those projects involve a substantial amount of signature duplication, e.g., all interfaces in monads_types.ml are repeated in monads.mli.

thierry-martinez · May 22, 2019, 12:48pm

You can indeed use ppx_import to avoid writing the definitions of Basic and S twice, and doing this lets you define Config.t first.

(* foo.mli *)
module Config : sig type t (* ... *) end
module type Basic = sig (* ... *) end
module type S = sig (* ... *) end
module Make (B : Basic) : S

(* foo.ml *)
module Config = struct type t (* ... *) end
module type Basic = [%import: (module Foo.Basic)]
module type S = [%import: (module Foo.S)]
module Make (B : Basic) : S = struct (* ... *) end

Yes, and you can even do the destructive substitutions before defining Make by defining the aliases Foo.Basic and Foo.S for Foo_intf.Basic and Foo_intf.S. This solution gives you code which is very close to the code you obtain with ppx_import. (The aliases are indeed written twice, but the core of the signatures of Basic and S is only written in foo_intf.ml.)

(* foo_intf.ml *)
module type Basic = sig type config_t (* ... *) end
module type S = sig type config_t (* ... *) end

(* foo.mli *)
module Config : sig type t (* ... *) end
module type Basic = Foo_intf.Basic with type config_t := Config.t
module type S = Foo_intf.S with type config_t := Config.t
module Make (B : Basic) : S

(* foo.ml *)
module Config = struct type t (* ... *) end
module type Basic = Foo_intf.Basic with type config_t := Config.t
module type S = Foo_intf.S with type config_t := Config.t
module Make (B : Basic) : S = struct (* ... *) end
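Here is a single-file sketch of this layout with the elided parts filled in by hypothetical members (name, adjust, describe, and the verbose flag of Config are all made up for illustration). Nested modules stand in for foo_intf.ml, foo.mli and foo.ml:

```ocaml
(* Stands in for foo_intf.ml. The members other than [config_t] are
   hypothetical. *)
module Foo_intf = struct
  module type Basic = sig
    type config_t
    val name : string
    val adjust : config_t -> config_t
  end

  module type S = sig
    type config_t
    val describe : config_t -> string
  end
end

(* Signature stands in for foo.mli, structure for foo.ml. *)
module Foo : sig
  module Config : sig
    type t
    val make : verbose:bool -> t
    val verbose : t -> bool
  end
  module type Basic = Foo_intf.Basic with type config_t := Config.t
  module type S = Foo_intf.S with type config_t := Config.t
  module Make (B : Basic) : S
end = struct
  module Config = struct
    type t = { verbose : bool }
    let make ~verbose = { verbose }
    let verbose c = c.verbose
  end
  module type Basic = Foo_intf.Basic with type config_t := Config.t
  module type S = Foo_intf.S with type config_t := Config.t
  module Make (B : Basic) : S = struct
    let describe c =
      let c = B.adjust c in
      B.name ^ (if Config.verbose c then " (verbose)" else "")
  end
end

(* A hypothetical client: it only sees the abstract [Foo.Config.t]. *)
module Loud = struct
  let name = "loud"
  let adjust _ = Foo.Config.make ~verbose:true
end

module Loud_foo = Foo.Make (Loud)
let description = Loud_foo.describe (Foo.Config.make ~verbose:false)
```

Config.t stays abstract outside Foo, and the bodies of Basic and S are written once, at the cost of repeating the two alias lines.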

I would just like to note that if foo_intf.ml is an interface file (without any values or modules), you may reflect this intention by defining foo_intf.mli instead (without foo_intf.ml). One of the benefits of this is that only a foo_intf.cmi file will be compiled for this module, and no .cm[ox].

Yes, but it is worth noticing that you can avoid these caveats with the following idiom: include module type of struct include X end.
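A short sketch of the idiom, using a hypothetical module X with a variant type (the names are made up for illustration):

```ocaml
(* Hypothetical module, used only for illustration. *)
module X = struct
  type t = A | B
  let default = A
  let to_string = function A -> "A" | B -> "B"
end

(* [struct include X end] has the strengthened signature of [X], in which
   [t] is declared equal to [X.t]; [module type of] then keeps that equation. *)
module Z : module type of struct include X end = X

(* Accepted: [Z.t] and [X.t] are the same type. *)
let mixed = X.to_string Z.default
```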

There is also a maintenance burden that remains true when you duplicate interfaces.

ivg · May 22, 2019, 1:02pm

Yes, you can strengthen like this, but indeed this only highlights the problem. And it is not the only caveat actually.

Sure, read the next paragraph after the one you quoted.

thierry-martinez · May 22, 2019, 1:05pm

This paragraph is about cognition burden, isn’t it?

ivg · May 22, 2019, 1:17pm

Yes, but I personally do not really distinguish the two. Maintenance requires the maintainer to understand the code. So the easier it is to reason about the code, the easier it is to update the code.

Concerning the maintenance. The BAP core library public interface file is 10,000 lines of code. This interface is fulfilled by 5 internal libraries, each having about 40 modules with their own interfaces, giving more than 200 modules. Thanks to the separation I can easily update the interfaces, and even keep the semantic versioning. Moreover, the separation gives me extra flexibility, as I can line up modules in the interface not in the order of dependencies, but in an order of importance and ease of cognition.

MattWindsor91 · May 23, 2019, 10:36am

I would just like to note that if foo_intf.ml is an interface file (without any values or modules), you may reflect this intention by defining foo_intf.mli instead (without foo_intf.ml). One of the benefits of this is that only a foo_intf.cmi file will be compiled for this module, and no .cm[ox].

dune seems like it strongly recommends against doing that, because of something to do with the OCaml compiler not really supporting it? Or is that no longer the case? I mean, this would be ideal otherwise.

Yes, but it is worth noticing that you can avoid these caveats with the following idiom: include module type of struct include X end .

I did wonder what this idiom was for… it seems like there are way too many little tricks in the OCaml module system to learn right the first time!

At the moment, I’m leaning towards keeping my *_intf.ml files for now, but switching from include to open with a view towards eventually renaming them to something a bit more sensible. I don’t really have enough time (or knowledge!) to clean up act properly yet, but I’ll slowly get around to it.

thierry-martinez · May 23, 2019, 11:02am

I don’t know whether dune recommends it or not, but there is a field dedicated to that: (modules_without_implementation <modules>).

The OCaml compiler itself uses modules without implementation: for instance, Asttypes and Parsetree are only defined by a .mli file.

One of the caveats of doing so is that you cannot include these modules (because there is no implementation to include), but you can use module type of on them (and you can even use the idiom module type of struct include <> end on them, because the include disappears after typing).

MattWindsor91 · May 24, 2019, 2:18pm

Ah, I think I misremembered dune’s warning: it’s actually this, emphasis mine:

(modules_without_implementation <modules>) specifies a list of modules that have only a .mli or .rei but no .ml or .re file. Such modules are usually referred as mli only modules . They are not officially supported by the OCaml compiler, however they are commonly used. Such modules must only define types. Since it is not reasonably possible for dune to check that this is the case, dune requires the user to explicitly list such modules to avoid surprises. <modules> must be a subset of the modules listed in the (modules ...) field.

I’m not sure whether to read this as ‘the OCaml compiler could break support of this at any time’, or ‘dune does something to make this work’, or indeed read anything into it at all. Does dune only support this because people already do it and it needs to support the pattern, or is it ok to start doing it as part of the solution to problems like this?

ivg · May 24, 2019, 5:25pm

I’m not sure how to interpret the “are not officially supported by the OCaml compiler” wording. In my interpretation it is simply not true. The OCaml compiler supports mli-only modules, uses them itself (or at least used to), and will always support them. Ok, the last could be an overstatement, as there is no legal obligation that forces the OCaml developers to keep things as they are, so they may change whatever they like. Maybe this is what “official support” meant.

The mli-only modules are indeed a little bit confusing, and people argue about whether it is worthwhile to have them at all. To understand them we first need to understand how the OCaml compiler translates the source code of a program into a binary (or bytecode). A program in its source code representation consists of types and definitions. Types are instructions to the compiler that constrain the possible interpretations of definitions. Types are used for checking that a program’s definitions are consistent and for generating efficient machine code. The types are erased when a program is translated to machine code by the OCaml compiler.

The OCaml compiler implements a separate compilation system, where each file is compiled into a compilation unit and then all compilation units that constitute a program are linked together into one big executable or archive. Compilation units contain only machine code. If you create a file that doesn’t have any definitions, but only types (your typical _intf.ml file), the resulting compilation unit will be quite bogus: it will contain only one function, called the entry point, which will have the following code (pardon my x86):

camlExample__entry:
	movq	$1, %rax
	ret

In other words, it will contain 8 bytes, 0x48,0xc7,0xc0,0x01,0x00,0x00,0x00,0xc3 (again, sorry for x86), which will be automatically executed by the CPU when your program is loaded. Ok, there will also be some overhead during dynamic linking, as camlExample__entry has to be relocated and put into the symbol table.

However, we don’t need to pay this price: since types have no runtime representation, why should we create code for them? Indeed, the compilation system doesn’t require it. When you reference a type (or module type, or class type) you use the dot notation, X.elt, and what the compiler does, and this is well documented and supported, is look for the x.cmi (or X.cmi) file in the search path, which starts with the local folder and is controlled with the -I argument. CMI files, which stand for “compiled module interface” files, serve as repositories of types and are used only during compilation. They serve no purpose at runtime and therefore are never linked into the binary. However, when your binary is a library, those cmi files provide the means to access definitions in your library, so they have to be installed if you want other people to be able to develop their own applications that use your library. They play the role of .h files in C, except that OCaml will not allow you to access a value in a library if you don’t have the corresponding cmi file. But for runtime and linking (even dynamic) they are still not needed, only for compilation.

Therefore, the idea was: why should we pack our types into compilation units if we don’t need to? Instead, we can put our type definitions into an mli file, compile it to a cmi file, and let the compiler read our types from it. This will reduce the linking time, loading time, and startup time. Negligible, but still neat.

Of course, the build automation system has to provide some support for it, e.g., it should not try to find a corresponding ml file, it should install it alongside the rest of the library interface, and so on. There is also a small problem with module aliases, which, depending on whom you ask, could be seen as no problem at all, a bug in the compiler, or a problem with mli-only files. I personally treat this as not-a-bug and not-a-problem, just a slightly confusing abuse of syntax introduced with module aliases.

To summarize, it is safe and ok to use mli-only files if you understand what they are. And now you do. You won’t win a lot from using them, though: slightly faster linking, smaller binaries, faster startup, and, most importantly, awe from young programmers.
