Nested functors: Compiler can't determine type equality

I have the following modules/functors:

module Number = struct
  module type S = sig
    type t

    (* ... *)
  end

  module OrderedVector (Num : S) = struct
    type t = Num.t Iarray.t

    let rec compare (a : t) (b : t) : int =
      (* ... *)
  end

  module Q : S with type t = Q.t = Q (* From [zarith] package. *)
end

module Polytope = struct
  module Make (Num : Number.S) : sig
    val h_to_v :
      (* ... *) ->
      Set.Make(Number.OrderedVector(Num)).t
  end
end

And a test case where I do the following:

module LinalgQ = Linalg.Make (Number.Q)
module PolytopeQ = Polytope.Make (Number.Q)
module VectorQ = Number.OrderedVector (Number.Q)
module VectorSetQ = Set.Make (VectorQ)

(* Tests follow here. *)

I got the following error:

173 |     (PolytopeQ.h_to_v ~simplify:false);
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: This expression has type
         Number.Q.t Polytope.constr list ->
         Number.Q.t Polytope.constr list ->
         Set.Make(MyProject.Number.OrderedVector(MyProject.Number.Q)).t
       but an expression was expected of type
         Number.Q.t Polytope.constr list ->
         Number.Q.t Polytope.constr list -> VectorSetQ.t
       Type
         Set.Make(MyProject.Number.OrderedVector(MyProject.Number.Q)).t
       is not compatible with type VectorSetQ.t

But when I make the following change, it works:

 module LinalgQ = Linalg.Make (Number.Q)
 module PolytopeQ = Polytope.Make (Number.Q)
 module VectorQ = Number.OrderedVector (Number.Q)
-module VectorSetQ = Set.Make (VectorQ)
+module VectorSetQ = Set.Make (Number.OrderedVector (Number.Q))

Note that I simply replaced VectorQ with Number.OrderedVector (Number.Q), which is how I defined VectorQ. Now my code compiles. Why?

I tried to find a simple example to reproduce this behavior, but the following code compiles too:

module type T = sig
  type t
end

module M = struct
  type t = int
end

module F (A : T) = struct
  type t = A.t
end

module G (B : T) = struct
  type t = B.t
end

module F0 = F (M)
module G1 = G (F0)
module G2 = G (F (M))

type t0 = F0.t
type t1 = G1.t
type t2 = G2.t

let foo01 : t0 -> t1 = fun x -> x
let foo02 : t0 -> t2 = fun x -> x
let foo12 : t1 -> t2 = fun x -> x

So how exactly are the following lines different:

module VectorQ = Number.OrderedVector (Number.Q)
module VectorSetQ = Set.Make (VectorQ)

versus

module VectorQ = Number.OrderedVector (Number.Q)
module VectorSetQ = Set.Make (Number.OrderedVector (Number.Q))

I would assume they both should be the same, right? But they are not.

Any help is appreciated.


I assume it must have to do with me using VectorQ in my test.

let testable_qvec =
  Alcotest.testable
    (fun ppf vec -> Format.pp_print_array Q.pp_print ppf (Iarray.to_array vec))
    (Iarray.for_all2 Q.equal)

let testable_qvecset = Alcotest.slist testable_qvec VectorQ.compare

let testable_qvecset_triple =
  let x = testable_qvecset in
  Alcotest.triple x x x

And then later:

  let verify (name : string) func =
    let vertices = VectorSetQ.to_list @@ func basis_equations constraints in
    Alcotest.check testable_qvecset name expected_vertices vertices
  in
  verify "Vertices equal to expectation when using h_to_v without simplify."
    (PolytopeQ.h_to_v ~simplify:false);

But still, isn’t a module instantiated twice with the same functor argument exactly the same module?

I think I could find a toy example where I run into the same problem:

module type T = sig
  type t
end

module M = struct
  type t = int
end

module F (A : T) = struct
  type t = A.t
end

module G (B : T) = struct
  type t
end

module F0 = F (M)
module G1 = G (F0)
module G2 = G (F (M))

type t0 = F0.t
type t1 = G1.t
type t2 = G2.t

let foo : t1 -> t2 = fun x -> x

Notice how I made type t of functor G abstract (as Set.Make.t is an abstract type as well).

File "./nestedfunctor.ml", line 25, characters 30-31:
25 | let foo : t1 -> t2 = fun x -> x
                                   ^
Error: The value x has type t1 = G(F0).t
       but an expression was expected of type t2 = G(F(M)).t
1 Like

I don’t have a complete answer for you, but from Practical example of applicative vs generative functors? , that’s true if it’s an applicative functor vs. a generative one, and what you might be running into is discussed in Generative Functors , where there is a distinction to a functor over a module path vs. a functor over a module expression.

2 Likes

Thank you, I will try to understand.

Interestingly, the following compiles:

module type S = sig end

module SomeModule = struct end

module Functor (Arg : S) = struct
  type t (* Note that [t] is abstract. *)
end

module Instance1 = Functor (SomeModule)
module Instance2 = Functor (SomeModule)

let foo : Instance1.t -> Instance2.t = fun x -> x

From the second linked thread:

So I think this is the minimal example of the issue I ran into:

module M = struct end

module Func (Arg : sig end) = struct
  type t
end

module Intermediate = Func (M)
module M1 = Func (Intermediate)
module M2 = Func (Func (M))

let foo : M1.t -> M2.t = fun x -> x

This results in:

File "./nestedfunctor.ml", line 11, characters 34-35:
11 | let foo : M1.t -> M2.t = fun x -> x
                                       ^
Error: The value x has type M1.t = Func(Intermediate).t
       but an expression was expected of type M2.t = Func(Func(M)).t

In my real-world use case, I think the problem is that when instantiating Set.Make, I need to make sure the exact same path is used. I think that will help me to avoid this problem in future.

Thank you so much!!

1 Like

The types created by a functor application are equal only if the argument of the functor have the same identity which is in fact a quite strong restriction (aka F(X).t = F(Y).t only if X==Y where M==N means the module M and N share the same identity).

For instance, in

module F(X:sig end) = struct type t end
module G = F

module A = struct end
module B = A
module Not_a = struct include A end 

module C = F(G(A))
module D = G(F(B))
module E = F(F(Not_a))
let eq (x:C.t) : D.t = ()

we have C.t = D.t because F == G and A == B . Contrarily, even if the contents of
Not_a is the same as A, it is not an identical module and thus we have E.t ≠ C.t

Note that you can look at the types of module to learn which modules are identical:

module F : (X : sig end) -> sig type t end
module G = F
module A : sig end
module B = A
module Not_a : sig end

However, functor application currently (this may change in future versions) always creates fresh new identity for modules. Thus the module C and D have distinct identities, and thus
the three functor applications

module X = F(C)
module Y = F(D)
module Z = F(F(F(A))))

create three mutually distinct types X.t, Y.t and Z.t.

If those two statements were true, then I don’t understand why M5.t and M6.t are the same types in the following example:

module M = struct end

module Func (Arg : sig end) = struct
  type t
end

module M1 = Func (M)
module M2 = Func (M)

(* (M == M) => (M1.t = M2.t) *)
let eq : M1.t -> M2.t = Fun.id

(* However, even if M1.t = M2.t, this doesn't mean M1 = M2.
   In fact, M1 != M2, as we see in the following. *)
module M3 = Func (M1)
module M4 = Func (M2)

(* Consequently, the following line does not compile. *)
(* let eq : M3.t -> M4.t = Fun.id *)

(* But if [Func (M)] and [Func (M)] create modules with
   different identity (note that M1 != M2), then why is
   M5.t = M6.t ? *)
module M5 = Func (Func (M))
module M6 = Func (Func (M))

(* This compiles, but why? *)
let eq : M5.t -> M6.t = Fun.id

Apparently, storing Func (M) in M1 and M2 makes those two functor applications distinct while if I write both Func (M) applications inline, then they seem to refer to the same module.

Which part am I missing?

Generally, I’m interested in understanding both the current rules and which part of the current rules is guaranteed by the language specification. The manual section on Generative Functors didn’t help me understand the rules.

Ah, my description wasn’t clear: naming a functor application creates a new identity, because it not possible to take an alias over a functor application. Thus when you write

module M = F(X)

this is equivalent to

module M = struct include F(X) end

rather than a hypothetical

module M == F(X)

and the new identity of M is no longer linked to the functor application.
This is not the case for a module expression like U(F(X)(G(W))) which is transparent in term of identity.

2 Likes

Re-thinking about this, it is a bit unfortunate (in my opinion). But I’m not sure. Let me share my use-case (where I don’t have a problem per-se, but I’m worried about perhaps running into problems in the future).

I have the following modules:

(** Generic number arithmetic. *)

(** Interface for generic numbers. *)
module type S = sig
  type t

  val equal : t -> t -> bool
  (** Compare numbers for equality. May fail reflexivity. *)

  (* ... *)

  val compare : t -> t -> int
  (** Negative if first number is less than second number, positive if first
      number is greater than second number, zero if numbers are equal. Must
      define a total order, i.e. [compare x x] must be [true]. *)

  (* ... *)
end

(** Generic immutable arrays of numbers with total order (to support sets). *)
module OrderedVector (Num : S) = struct
  type t = Num.t Iarray.t
  (** Vector type *)

  (** Total order of vectors of equal length. *)
  let rec compare (a : t) (b : t) : int =
    let len = Iarray.length a in
    if len <> Iarray.length b then
      invalid_arg "Cannot compare vectors of different length";
    let rec aux i =
      if i < len then
        let c = Num.compare (Iarray.get a i) (Iarray.get b i) in
        if c <> 0 then c else aux (i + 1)
      else 0
    in
    aux 0
end

(** Sets of vectors. *)
module VectorSet (Num : S) = Set.Make (OrderedVector (Num))

Note how I define an alias VectorSet. I don’t run into issues yet, even if I mix the two types Set.Make(OrderedVector(Num)).t and VectorSet.t because those types are equal even though the two modules Set.Make (OrderedVector (Num)) and VectorSet do not share the same identity. As I understand now, the two types are equal because OrderedVector (Num) and OrderedVector (Num) share the same identity as long as I don’t assign those to a new module identifier (attempting to create an alias, which does’t really create an alias, as I learned now).

But what if I later pass the VectorSet module into another future functor? Then any abstract types generated by that future functor would be incompatible when I instead of VectorSet pass Set.Make (OrderedVector (Num)) to that future functor.

My point is, creating supposedly an alias (which isn’t really an alias) is tempting:

module VectorSet (Num : S) = Set.Make (OrderedVector (Num))

But what happens here (if I understand right) is:

  • I create a new and distinct module VectorSet that will be distinct from any other module.
  • The generic type VectorSet.t, however, will not always be distinct. Specifically, it is equal to Set.Make(OrderedVector(Num)).t.
  • Yet I have to be cautious now to not interchange VectorSet and Set.Make (OrderedVector (Num)) when I pass that to further functors.

Isn’t that highly confusing? :sweat_smile:

What I am curious about: Should I generally refrain from creating named modules to avoid having to write down functor application repeatedly? What is the best strategy to not run into problems? Is this behavior planned to be changed soon and/or are their advantages coming with the current behavior?


Edit/Amendment: Maybe there is no real problem as long as I’m aware of the rules on each functor application. Defining an “alias” (that isn’t really an alias, as shown) doesn’t pose a problem per-se. Just for every functor application, one must pay attention to the actual arguments used. Maybe it’s not so difficult. But still somewhat difficult to remember the rules.

Perhaps adding some documentation on that behaviour would be nice (unless it’s going to change anytime soon).

My personal rules is rather than functor should not instantiate public instance of other functors. For instance, you get in troubles because your Polytope.Make functor is instantiating a public instance of Set.Make. If you lift all public instances of functor, you will have something like:

module VectorQ = Number.OrderedVector (Number.Q)
module VectorSetQ = Set.Make (VectorQ)
module LinalgQ = Linalg.Make (Number.Q)(VectorQ)
module PolytopeQ = Polytope.Make (Number.Q)(VectorQ)(VectorSetQ)

and then all modules derived from Number.Q have one canonical instance which means that there are no risks to lose any module identity by accident.

I think I see how that suggestion helps me avoiding the problem, but it feels like this impedes usability a bit. Also, if rules are changing in future, perhaps I’m unnecessarily adding extra overhead to my API?

Consider:

module type S = sig
  type t

  val compare : t -> t -> int
  val zero : t
end

module ProblematicFunctor (Elt : S) = struct
  let returns_set (x : Elt.t) : Set.Make(Elt).t =
    let module EltSet = Set.Make (Elt) in
    EltSet.of_list [ Elt.zero; x ]
end

module BetterFunctor (Elt : S) (EltSet : Set.S with type elt = Elt.t) = struct
  let returns_set (x : Elt.t) : EltSet.t =
    EltSet.of_list [ Elt.zero; x ]
end

module ButDoesThisScale
    (Elt : S)
    (EltSet : Set.S with type elt = Elt.t)
    (EltSetSet : Set.S with type elt = EltSet.t)
    (EltSetSetSet : Set.S with type elt = EltSetSet.t) =
struct
  let returns_set_of_set_of_set (x : Elt.t) : EltSetSetSet.t =
    EltSetSetSet.singleton @@
      EltSetSet.singleton @@
      EltSet.of_list [ Elt.zero; x ]
end

I think I would feel better about this if I actually understood the semantic reason for being explicit here with regard to instantiated modules. In my case (I think) I don’t need to be generic about different implementations of “sets”. I would like to be generic about numbers, not about anything else.

So I’m not sure if the following is (from a semantic point of view) really what I want:

module PolytopeQ = Polytope.Make (Number.Q)(VectorQ)(VectorSetQ)

Maybe it’s the idiomatic way because of technical limitations? Are those limitations expected to change? Or do they serve a specific purpose?

First beware that functor application is still a function application, which may construct large records. For instance, in

the functor application is allocating a record with more than 40 fields in order to construct a small set.

My personal point of view is that instantiating (large) functors should be seen as building libraries. And while it is fine to have some duplication of libraries, commonly used libraries should be explicitly shared. In particular, with

module PolytopeQ = Polytope.Make(Number.Q)(VectorQ)(VectorSetQ)

I expect that the PolytopeQ should use the common instance of ℚ-vectors rather than defining its own. Similarly, if there are multiple users of sets of vectors (which seems to be a subtle notion if one expects to have robust sets beyond fixed-point numbers or ℚ).

Some of the rules for aliasing functor applications and arguments are expected to be relaxed in some future versions of the language (5.6? 5.7? maybe even later?), but I would expect that sharing common instance of functor applications to be still more readable.

But at compile-time, right? There is no run-time overhead, or is there?

I really like how I can create abstract code in OCaml without referring to a specific type class or trait that has an “identity”. Signatures in OCaml do not have an identity, and that enables a lot of flexibility in my opinion.

For example, the Q module doesn’t need to know anything about the Set module. Even if both wouldn’t rely on the same naming conventions (t and compare), one could easily create something like:

module Qcompat =
  type t = Q.ty (* if Q would use this weird naming scheme *)
  let compare = Q.comp (* if Q would use this weird naming scheme *)
end

This makes Set being universally useful and entirely generic! We do not need a common “interface” for sortable types. In fact, even Set and Map come with their own:

In other languages, I used to often end up with the newtype construct, which I really dislike (see also newtype in Haskell and newtype in Rust), because it creates ongoing friction when using a library by having to convert types all the time.

In OCaml this feels much better.

Now as long as there’s just one level of functor application, it seems alright.

But maybe when there are multiple levels of functor application, things become more tricky because of potentially contradictory goals. I would like to be able to be generic with regard to both following aspects:

  • Have some generic code that creates the necessary data types used for my library (based on a single given generic numeric library/module).
  • Be generic over the data types used by my library.

If I want to achieve the latter, I cannot use the first (at least not without externalizing some of that code in a third module).

But do I always want to be generic over how the data types of my libraries are being used? I don’t think I want that, and compile-time performance aside, let’s look at two examples:

module type S = sig
  type t

  val compare : t -> t -> int
  val zero : t
end

module OkayishModule = struct
  module IntSet = Set.Make (Int)

  let returns_int_set (x : int) : IntSet.t =                        
    IntSet.of_list [ 0; x ] 
end

module ProblematicFunctor (Elt : S) = struct
  module EltSet = Set.Make (Elt)

  let returns_set (x : Elt.t) : EltSet.t =
    EltSet.of_list [ Elt.zero; x ]
end

Is OkayishModule to be considered bad practice? I assume it’s alright because it’s not a functor and it’s all concrete. I work with specific types: int and Set.Make(Int).t = IntSet.t. When I use this module with some other code, that other code needs to use the same types. It needs to use int (and not, for example, Z.t or int64). It also (effectively) needs to use Stdlib.Set and not, for example, Thirdparty.SomeFancySet.

But if OkayishModule really is alright and if ProblematicFunctor really is problematic, then what does it mean:

Do I either have to be completely concrete or be completely generic? Why can’t I be generic exactly when and where I would like to be generic? I don’t believe (currently) that whenever I deal with sets of some sort, it would be considered bad practice to not turn everything into a functor, or is it really bad practice to not do that? I.e. does a function that returns a set always need to reside within a functor that generalizes about the set implementation being used if I want to follow good coding style?

I would presume that the actual reason why OkayishModule is okay and ProblematicFunctor is problematic is rather a technical limitation (which is fine, but I’m curious to understand the true nature behind this).

Implementation complexity aside and following my arguments above, I think allowing aliasing functor applications would be a real advantage for enabling certain generic programming. Right now, if I decide to be generic in some ways (specifically as in “generically describing how concrete types are being constructed”), I will be at risk of running into the issues discussed in this thread.

But maybe I still don’t understand the implications fully.

Yes and no. Normally, all modules have a runtime representation, and a functor application results in a real function application at runtime. But there are two major exceptions that make functors seem almost compile-time:

  • Type (and module type) expressions do not have any runtime representation, so functor applications that occurs in a type expression do not introduce a runtime functor application. As an example, let x : F(X).t = ... doesn’t apply the functor F at runtime because its application only occurs in the type annotation for x.
  • Modules and functors are often defined at the toplevel of compilation units, and in those cases the compiler can usually compute the functor applications at compile time as an optimisation. This is highly dependent on the compiler’s optimisation flags though.
2 Likes

Oh, okay, that is something I was not aware of. I wrongly assumed this is all compile-time stuff.

I just tried myself, and indeed f2 below is slower than f1 (both using ocaml and ocamlopt):

module type S = sig
  val hello : unit -> unit
end

module World = struct
  let hello () = ()
end

module F (A : S) = struct
  let f () = A.hello ()
end

module G = F (World)

let f1 () = G.f ()

let f2 () =
  let module Glocal = F (World) in
  Glocal.f ()

let _ =
  for i = 0 to 100000000 do
    f1 () (* Replacing [f1] with [f2] will result in slower execution. *)
  done

Considering all that (for me new) information, I can see why I might want something like module PolytopeQ = Polytope.Make (Number.Q)(VectorQ)(VectorSetQ).

But the following question still puzzles me. Is the following alright:

Specifically:

How do I “canonically” return a set of int’s? Is there a canoncial way at all, or should I always work with functors here (i.e. turn OkayishModule into a functor)? Or would I create IntSet outside my module? But what if there is no “further outside layer”?

I’ve been following the discussion and I learned a lot of not so obvious things that I think would be useful to a wide audience. It’d be great to have something like a “Functor style guide” or “Best practices” or “How not to shoot yourself in the foot with functors” tutorial somewhere that is a bit more verbose than a manual section but more concise than this thread.

If you are using sets of integers in a project, you should define an IntSet module in a common place and use it from here

module Int_set = Set.Make(Int)

Then the fact that the functor is applicative will guarantee that this set of integers will be compatible with other instances of Set.Make(Int) for instance from other libraries. If you have a local use of a sets for a specific type it is probably fine to define it locally (yes, there is a balance to be found between sharing every instance of functor applications, and having locally useful instance that are not shared).

If you need to return a set for an arbitrary types in a single function, one option currently would be to use first-class module

let singleton (type a) (module S: Set.S with type elt = a) (x:a) = S.singleton x

but soon this form will be mostly superseded by module-dependent functions in OCaml 5.5

let singleton (module S: Set.S) x = S.singleton x

I would tend to use a functor dependent on the set implementation only if the functor body is really pervasively and meaningfully dependent on the set implementation, or if I am building a very generic functor-based library.

1 Like

I would like to attempt to summarize at least. (Disclaimer, I’m not much experienced with OCaml yet, but I want to give it a try.)

  • When creating a module Output thorugh a functor F by invoking F with some input module Input, i.e. F (Input), then the abstract types of the Output module will be equal to abstract types of another module if and only if that other module is created by passing the same Input to F. Specifically the module identity of the input needs to be equal.
type t1 = F(Input).t
type t2 = F(Input).t (* [t1] and [t2] are equal types. *)

type t3 = F(A).t
type t4 = F(B).t
(* [t3] and [t4] are equal if [A] and [B] have the same module identity. *)
  • A new module identity is created if

    • we pass a distinct input to the functor,
    • we pass an anonymous structure to the functor,
    • or if we store the result of a functor application in a module binding. E.g. module M1 = F (Input) and module M2 = F (Input) currently creates two distinct modules M1 and M2, even though they share the same abstract types because in both types we pass the same input module to F. This behavior may change in future versions of OCaml such that M1 and M2 may have the same module identity in future.
  • If we perform functor application and don’t bind the result to a module name, then the resulting modules have the same identity, i.e. even if M1 and M2 have different identities, calling F (Input) in one place and F (Input) in another place creates the same module. Only storing it under a specific name will generate a new identity for those modules.

module M1 = F (Input)
module M2 = F (Input)
(* Now [M1], [M2], and [F(Input)] have three different identities,
   but invoking [F(Input)] in one place and [F(Input)] in another place later,
   gives two modules with the same identity. 
  • Functor application may come at a runtime cost. So when several modules use the same functor output (e.g. several modules share a Set.Make (Int)), you should create that module once, store the result (e.g. as Int_set), and then use that module which has been created once. See:
  • As stored modules may have a distinct identity, it may (as of today’s OCaml) matter that we re-use exactly those created/stored modules, because if we re-create them elsewhere with the same input, they will not have the same module identity as our stored modules. Because functor application comes with a runtime cost, re-using already instantiated functors is advisable anyway. (Following the argumentation of @octachron here.)

  • When some functors are instantiated only internally, it may be okay to do that only locally and not use a common (outside) instantiation.

I hope this more-or-less summarizes the state of the discussion, and, of course, feel free to correct me.

I’m not sure how I feel about it. As of now, I think I would prefer a more functional rather than generative approach. I believe it would support generic programming better. Specifically, I would like to be able to make module generation (using functors and input modules) be part of a (generic) library without paying a runtime-penalty, running into problems with distinct abstract types, or having to externalize that module generation. But these wishes may be rooted in ignorance about how functors are actually implemented. Always evaluating everything at compile-time probably comes at a price too (if even possible).

1 Like

About the runtime cost of functions. If I do something like this

let append_stuff buffer stuff =
  let open module BufferOps(struct v = buffer end) in
  add_string stuff.name1;
  add_char ',';
  add_string stuff.name2;
  ..

there is a runtime penalty, right ? First a structure is build with all the functions of the “template” module copied and only after that those functions are called?