Extend existing module

ocaml
module

#1

If I want to extend List with additional functions, I can just do:

include module type of List

val flat_map : ('a -> 'b list) -> 'a list -> 'b list
include List
let flat_map f l = flatten (map f) l

and that’s fine. But if I want to extend Format, this simple trick doesn’t work so well, because the new module will have distinct types. So I need:

include module type of Format
  with type formatter = Format.formatter
   and type formatter_out_functions = Format.formatter_out_functions
   and type formatter_tag_functions = Format.formatter_tag_functions

val foo : …
include Format
let foo = …

which is much less satisfactory because if Format gets new types, I have to remember to alias them properly.
It would be much simpler to have a form of include in signatures, that would automatically alias the existing types… Thoughts on that?


(WIP) Frequently Asked Questions
#2

Note that using module type of may lead to versioning difficulties and possible breakage: Serious OPAM package quality issues


#3

The issue of strengthening module type of (or of having two versions) has been discussed on Mantis, see MPR#5241 and MPR#6307. If I remember correctly, @lpw25 has opinions about it.


#4

Easy peasy! There is an OCaml idiom for that:

include module type of struct include Format end

To understand why that works, you need a bit of knowledge on the module type system. The basic idea is the following: the struct include X end dance forces the typechecker to “strengthen” the module signature. Strenghtening adds a manifest back to all the types declarations, which means that type t = <decl> is turned into type t = M.t = <decl>.

I’m not convinced module type of shoudl strengthen, since it would make impossible to get an non-strengthened type out of the module.


#5

This is true if you use module type of ... in a contravariant position. Using it in a covariant position, as @c-cube does, is very common to extend existing modules, and is not nearly as problematic.


#6

I’d like to nominate this for FAQ status @yawaramin. Extending a module while keeping full type compatibility is an issue that, while it may not come up every day, is tricky to figure out when it does come up.


#7

really neat, thank you! :crazy_face:


#8

I’m not convinced module type of shoudl strengthen, since it would make impossible to get an non-strengthened type out of the module.

The problem is that the idea of a “non-strengthened type” has no consistent definition. Trying to pretend that it does causes most of the problems with both module type of and with module. I’m hoping to change the behaviour of module type of in 4.07 (with an attribute to get back the old behaviour to help with the transition) along with a number of other strengthenings in the module system.

(I tried out making the change for 4.06, but without doing the other changes simultaneously there were too many annoying places that needed changing in existing code. With the additional changes I think the change should not be too painful)


#9

I do not understand this. It has a pretty clear definition as far as I’m concerned, at least on the theoretical level. Arguably, the exact theory behind with module is not particularly clear. Can you clarify what you mean (and the changes you want to make) ?

Still, if we were to have all the module operation strengthen signatures by default, we would still need an operation to “unstrengthen” signatures, which doesn’t sound particularly straightforward.
That being said, It would make strengthening even more invisible to the users than it is, which is probably a good thing.


#10

It has a pretty clear definition as far as I’m concerned, at least on the theoretical level

The underlying issue is that it produces different module types for modules with the same type. It is making distinctions between different syntactic representations of the same type – which breaks fundamental properties of the language’s static semantics.

Consider a module M and an alias to it N:

module M = struct type t end
module N = M

both these modules have type =M, or without module aliases sig type t = M.t end. You should be able to freely replace references to M with references to N. However, any notion of “unstrengthened type” will try to give sig type t end for M and sig type t = M.t end for N.

Similarly, consider the modules A and B:

type a
module A = struct type t = a end

module B = struct type t end
type b = B.t

Both these modules have type sig type t = a end (well = b for B) yet A has unstrengthened type sig type t = a end whilst B has unstrengthened type sig type t end. Here again two (pairs of) definitions that are equivalent are being distinguished based on their syntax.

It is worth noting that this is a separate issue from the fact that module type of is fragile in contravariant positions. That is also a case of module type of breaking nice properties of the language, but there the property is more related to inference than the core static semantics.


#11

Ok, I see your point, you want some module types to be indistinguishable, but currently module type of allows you to observe their differences. I understand the idea, and I agree these signatures should probably not be distinguishable.

That being said, I’m a bit worried that we will lose some expressivity on the way. As Jacques said in one of the mantis tickets, with type/module only allows you to be less abstract. If you strengthen by default, you get non-abstract types, and can’t use type substitution anymore. Do you have a new operation in mind that would allows us to recover that?

P.S. The second exemple is, imho, better, as It doesn’t rely on aliases/singletons to demonstrate the issue. :wink:


#12

I like your discussion about the “issue” of module type of, and I find your last example easier to follow. But, nevertheless, I want to be sure I really understand the point.

module type S1 = sig
  type a
  module M : sig type t = a end
end

module type S2 = sig
  module M : sig type t end
  type a = M.t
end

Here you say that these two signatures are equivalent, and indeed they are:

module A : S1 = struct
  type a
  module M = struct type t = a end
end;;
module A : S1

module B = (A : S2);;
module B : S2

module C = (B:S1);;
module C : S1

but module type of don’t give the same result for their inner module M:

module type S = sig include module type of A.M end;;
module type S = sig type t = A.a end

module type S = sig include module type of B.M end;;
module type S = sig type t end

is this the problem?


#13

Do you have a new operation in mind that would allows us to recover that?

I have a couple of ideas, although my personal suggestion when people want an “unstrengthened” type is that they should give the module type a name. If they want two modules to have the same module type then clearly the module type means more than just the type of a particular module – if it has a meaning then it can reasonably be given a name. Of course that answer doesn’t satisfy everybody.

One option to help people who refuse my advice is to have a foo module type available in the initial environment for any foo.cmi in scope. This gives the equivalent of the unstrengthened module type of all top-level modules.

Another option would be to relax some of the restrictions on destructive substitution to let people weaken module types. The operations of destructive substitution are already pretty ad-hoc, and it was recently extended so that it needs to check the consistency of its result sometimes, so this doesn’t seem a huge stretch.

However, in the short-term my plan is to just leave the old semantics of module type of available using an [@weak] attribute. That is the simplest change for people to make to their code to keep it working, and gives a strong hint to people that such uses might not be the best idea.

is this the problem?

Essentially, yes.


#14

The problem is not only with the module type of construct but also, and mainly, with the with construct to specify a type implementation. You previously said that one of your concern is:

but this is also the case with these two signatures:

module type S1 = sig
  type a
  module M : sig type t = a end
end

module type S2 = sig
  module M : sig type t end
  type a = M.t
end

S1 and S2 are semantically equivalent but can be distinguished based on their syntax:

module type A = S1 with type a = int;;
module type A = sig type a = int module M : sig type t = a end end


module type A = S2 with type a = int;;
Error: In this `with' constraint, the new definition of a
       does not match its original definition in the constrained signature:
       Type declarations do not match:
         type a = a
       is not included in
         type a = M.t

module type A = S2 with type M.t = int;;
module type A = sig module M : sig type t = int end type a = int end

#15

That is not really the same – that’s why I said “essentially” and also why I explicitly did not talk about module types in my examples. There isn’t really an issue with S1 and S2 being different but mutually convertible types. For a start, the order of their fields differ which means that they are not equal from OCaml’s perspective. However there is a problem with the contexts you get from open S1 and open S2 being distinct because the ordering of definitions and the directions of equalities should not distinguish contexts – however module type of allows you to observe the difference.

That being said this issue with with is also a bit unpleasant. Ideally S2 with type a = int would succeed by just producing S2 with type M.t = int – I’m not sure how hard that would be to implement though.


#16

That’s not really the same, but the signatures S1 and S2 are the ones infered by the compiler for your examples:

module A = struct
  type a
  module M = struct type t = a end
end

module B = struct
  module M = struct type t end
  type b = M.t
end

The compiler infers signature S1 for A and signature S2 for B. What you want to express is in SML syntax (not sure if it’s still the case, but it was at the time Xavier Leroy wrote his article Manifest types, modules, and separate compilation):

module type S = sig
  type a
  module M = sig type t end
  sharing type a = M.t
end

But it seems that the approach of sharing type constraints leads to undecidability of signature elaboration (§2.6 of Xavier’s paper).

Yes, but it is also the case for your second example, and that the reason why module type of A.M is different from module type of B.M, even if theirs “strengthened” signatures are similar.

I don’t know if the internal of the compiler has changed, but in the above paper Xavier Leroy defines strengthening relatively to a given path (§3.4) in such a way that a module can share type constraints with itself. In this case, we can define the strengthened signature of a module M its signature strengthened relatively to its path, which will give this signature for my example:

module type A_strengthened = sig
  type a = A.a
  module M : sig type t = A.M.t end
end

module type B_strengthened = sig
  module M : sig type t = B.M.t end
  type b = B.b
end

Is it this signature that you want to be returned by module type of ?


#17

Firstly, I would like to reemphasize that my example did not include the modules A and B in your example. In terms of your example, I was discussing the types of the two Ms with the different definitions of t and M in scope, not the types of A and B. These are subtly different things.

but the signatures S1 and S2 are the ones infered by the compiler for your examples

Yes, but it is also the case for your second example, and that the reason why module type of A.M is different from module type of B.M, even if theirs “strengthened” signatures are similar.

You are making a circular argument here, and I think that is where you are missing my point a bit. You are saying something equivalent to:

S1 is the result of module type of for A and S2 is the result of module type of for B, since S1 and S2 are not equal types then the types of A and B are different and so it is fine that module type of behaves differently for A and B.

The key point here is that S1 is not the type inferred for A, the type inferred for the module expression A is:

sig type a = A.t module M : sig type t = a end

i.e. the strengthened type that you asked about above. This is indeed the correct result for module type of. S1 is merely the current result of module type of on A.


#18

I don’t think so, but sometimes I have some difficulties to express clearly my thought in english. I’ll try to be more precise.

I disagree with that:

$ cat /tmp/test.ml 
type a
module A = struct type t = a end

module B = struct type t end
type b = B.t

$ ocamlc -i /tmp/test.ml 
type a
module A : sig type t = a end
module B : sig type t end
type b = B.t

and if I wrap these definitions in modules (this is more convenient for use in the toplevel):

module M = struct
  type a
  module A = struct type t = a end
end;;
module M : sig type a module A : sig type t = a end end

module N = struct
  module B = struct type t end
  type b = B.t
end;;
module N : sig module B : sig type t end type b = B.t end

the signature S1 and S2 (or some equivalent) are the ones inferred by the compiler. I guess that the compiler is using the rules described in section 3.2 Typing rules of Xavier Leroy’s article.

In my turn, I would like to emphasize that, according to section 3.4 Sharing constraints and type strengthening of Xavier Leroy’s article, the type inferred for module M.A is not its strengthened type which is:

sig type t = M.A.t end

but we can infer (using rules of section 3.3) that the two types are equivalent, and that’s why he wrote:

Now, I only note that when we call module type of on a path it returns the type of the path in the current environment. That’s why my argument is not circular, this is not even an argument but only a remark.

If I understand your point correctly, you consider this behavior as an issue. For my part, I have no strong opinion about this. Nevertheless, I’d like to point out this behavior:

(* a module which "extends" M.A *)
module A = struct  include M.A end;;
module A : sig type t = M.a end

(* `module type of' applied to the definition of A *)
module type S = module type of struct include M.A end;;
module type S = sig type t = A.t end

Therefore, I don’t see exactly why you consider that the solution of this topic is an issue. On the other hand, this solution may not seem intuitive to the user, and I think (can you confirm this?) that the compiler must first evaluate the struct include M.A end before computing its type, which could be inefficient. Maybe a solution to obtain the signature you want without adding any keyword to the language could be this syntax:

module type of include A

which will only be valid if applied to a path, in such a way that we can’t write module type of include struct type t end.

As illustrations, we would have these results:

module A = struct type t end

module type of A
(* returns *)
sig type t end

module type of include A
(* returns *)
sig type t = A.t

(* and if we have a concrete type *)
module A = struct type t = A | B end

module type of A
(* returns *)
sig type t = A | B end

module type of include A
(* returns *)
sig type t = A.t = A | B end

I think this could be a lightweight addition to the language without backward incompatibilities. The algorithm should be simple: retrieve the signature of the path in the environment and add the manifest types where appropriate.


#19

module M = struct
type a
module A = struct type t = a end
end;;
module M : sig type a module A : sig type t = a end end

Here you are being mislead by OCaml. Since it is already printing the name of the module (M) it does not print the = M.a and = M.A.t required to show the full type of the module. This is also true about OCaml’s internal representation in the type-checker – it does not make these equations explicit until they are needed. However, the equations are still part of the type of the module. For example,

module M = struct type a module A = struct type t = a end end

module F (X : sig type a = M.a module A : sig type t = a end end) = struct end

module N = F(M)

Xavier’s paper is the origin of some of this confusion. Since it calls the “unstrengthened module type” the “module type” it causes people to think that something which is really an implementation detail is actually the type.

I’d like to point out this behavior

module type of struct include M.A end

I am aware of this trick. Unfortunately, it doesn’t quite work correctly because module type of aggressively removes module aliases, but I’m going to fix that soon.

However, the point is not whether we can obtain the module’s proper type, but that we should not be able to obtain it’s “unstrengthened type”.


#20

I think this is mostly a matter of definition of words, but your definition has a good advantage: with it, a module and its aliases have all the same module type.

I think I get your point now, and I’m pretty agree with you. If I want two distinct modules have the same signature with an abstract type, then this signature should be an important concept (in my mind the words type and concept are synonymous) for my purpose and I should give it a name. This is why you previously wrote:

For instance, if I want to define the algebraic concept of a group (with an additive connotation), I have to rely on the module system and I proceed as follow:

module type GROUP = sig
  type t
  val zero : t
  val add : t -> t -> t
  val neg : t -> t
end

and then I can define a particular group structure on the type int that way:

module Int_Group : GROUP with type t = int = struct
  type t = int
  let zero = 0
  let add = (+)
  let neg = (~-)
end

Int_Group.(add 1 (neg 2));;
- : int = -1

If I understand correctly, your point is that obtain the “unstrengthened” type of a module to define module types (which are concepts) is a bad design practice and that the language should not support such bad practices. From this point of view, your proposition to add an [@weak] attribute, to emphasise that is not a good idea, is indeed a good idea.