Question: Exposing types from inside a submodule of a module type

Howdy howdy howdy!

I’m an amateur OCaml fan, and I have a question about type-exposure in submodules.
This is a pretty lengthy question with a bunch of examples of my own reasoning, so…

TLDR:
Is there a way to expose the definition of a type from inside a submodule of a module type without:

  • declaring a new type in the top level of the implementing module or

  • declaring explicit signatures for submodules?

That is, something to the effect of:

module type Interface = sig
  module Inner : sig type t end
end

module Implementor : 
  Interface with type Inner.t = A | B | C
= struct ... end

Now, I’ll get into what I mean and why I’m curious.

Let’s say that I define a module type with a submodule like so:

module type Outer = sig 
  module Inner : sig 
    type t
    val transform : t -> t
  end
end

Then I might create a module that implements this module type:

module Example1 : sig
  include Outer
end = struct ... end

This is shorthand for:

module Example1 : sig
  module Inner : sig 
    type t
    val transform : t -> t
  end
end = struct ... end

Now let’s say I want to expose some details about the type Inner.t.
If Inner.t is some already-existing type (say, int), I can do this:

module Example2 : sig
  include Outer with type Inner.t = int
end = struct ... end

This is shorthand for:

module Example2 : sig
  module Inner : sig 
    type t = int
    val transform : t -> t
  end
end = struct ... end

I should also clarify that I’m getting these resulting module type annotations via the OCaml Platform language extension for VSCode, and it looks like they agree with what utop tells me.

Heck! I can even alias t out of the type entirely:

module Example2B : sig
  include Outer with type Inner.t := int
end = struct ... end

This is shorthand for:

module Example2B : sig
  module Inner : sig 
    val transform : int -> int
  end
end = struct ... end

Now there can be NO confusion what the inner type is.

But let’s say that t is some type I defined just for this one module, and I still want to expose its definition.

Maybe type t = A | B | C.

Now, I can’t do this:

module Fail : sig
  include Outer with type Inner.t = A | B | C
end = struct ... end
(* error from ocamllsp: 
    A type variable is unbound in this type 
    declaration. In type 'a the variable 'a 
    is unbound 
*)

I assume the problem here is that this attempted inline variant type definition is being interpreted as a polymorphic variant type. If I was trying to do that, the syntax here would be all wrong. Fair enough.

So what can I do? Well, I can add a type declaration to the top-level of the module signature:

module Example3 : sig
  type new_type = A | B | C
  include Outer with type Inner.t = new_type
end = struct ... end

This is shorthand for:

module Example3 : sig
  type new_type = A | B | C
  module Inner : sig 
    type t = new_type
    val transform : t -> t
  end
end = struct ... end

Again, I can also alias out t entirely:

module Example3B : sig
  type new_type = A | B | C
  include Outer with type Inner.t := new_type
end = struct ... end

This is shorthand for:

module Example3B : sig
  type new_type = A | B | C
  module Inner : sig 
    val transform : new_type -> new_type
  end
end = struct ... end

I don’t find this solution very attractive; it moves the declaration of t, and introduces a new type name.

If I had used t as the type name in the top-level, I would have gotten another error:

In module Inner1: The type t is required but not provided

Of course, I can achieve what I really want by writing out the module type completely:

module Example4 : sig
  include Outer
  module Inner : sig 
    type t = A | B | C
    val transform : t -> t
  end
end = struct ... end

This works in this instance, but I feel that it kind of goes against the purpose of defining a module type in the first place. Now the include Outer line is largely vestigial. It isn’t necessary if I want to use this module where I expect to use a module of type Outer, nor does it prevent me from accidentally using an inappropriate signature:

module Example4B : sig
  include Outer
  module Inner1 : sig
    type t = A | B | C

    val transform : int -> int (* whoops! *)
  end
end = struct ... end
(* No compiler error, UNTIL I try to use this as an input to a functor that expects a module of type Outer *)

And now let’s say that I want my module type Outer2 to have more than one submodule, and let’s say that these submodules are not independent of one another:

module type Outer2 = sig
  module Inner1 : sig 
    type t
    val transform : t -> t
  end

  module Inner2 : sig
    val consume : Inner1.t -> unit
  end
end

Now, the previous solution only works if I also include the intended signature of Inner2:

module Example5 : sig
  include Outer2

  module Inner1 : sig 
    type t = A | B | C
    val transform : t -> t
  end

  module Inner2 : sig 
    val consume : Inner1.t -> unit
  end
end = struct ... end

If I were to leave that out:

module Example5 : sig
  include Outer2

  module Inner1 : sig 
    type t = A | B | C
    val transform : t -> t
  end
end = struct ... end
(* error from ocamllsp: 
    Illegal shadowing of included module Inner1/2
    by Inner1.
    The module Inner2 has no valid type if 
    Inner1/2 is shadowed.
*)

So now I need to return to the previous solution, where we add a new type definition at the top-level:

module Example5 : sig
  type newtype = A | B | C
  include Outer2 with type Inner1.t = newtype
end = struct ... end

Again, I’m not a fan, but it’s certainly less out-of-control than having to write every submodule signature explicitly.

The nuclear option would be to refactor Outer2 itself, to allow for this pattern:

module type RefactoredOuter = sig
  type t

  module Inner1 : sig 
    val transform : t -> t
  end

  module Inner2 : sig
    val consume : t -> unit
  end
end

Then, do:

module Example5 : sig
  type t = A | B | C
  include RefactoredOuter with type t := t
end = struct ... end

which is shorthand for:

module Example5 : sig
  type t = A | B | C

  module Inner1 : sig 
    val transform : t -> t
  end

  module Inner2 : sig
    val consume : t -> unit
  end
end = struct ... end

and at least that has the same structure as RefactoredOuter, with no unnecessary new type names.

Still, I don’t love the idea of refactoring a module type based on the exhibitionist whims of an implementing module.

And if I’m not the writer of the module type in the first place, I can’t do that anyway.


In case anyone is curious, here’s the situation I found myself in that brought me to this question.
Please keep in mind that I am a hobbyist OCaml-er, and that this particular project is an exercise I devised for myself. So if anything seems oddly abstracted, that’s probably because it is.

I was interested in making a framework for quick-and-dirty programming language definitions. So, I defined a module type for languages that could be compiled in a three step process:

source code (character sequence) → token sequence → abstract syntax tree → compiled program

The idea was that I wanted to leave the definitions of ‘token’, ‘ast’, and ‘compiled program’ up to the particular implementation, as well as any types representing error states from any given point in the process. So, these are all left as abstract types by default.

Additionally, I wanted to represent these distinct-but-interrelated steps as separate submodules within a given implementation-module.

(Note: the name of the esoteric programming language BrainF*ck has been censored to protect innocent eyes, should they be unfortunate enough to stumble across this perverse thread.)

Here are the signatures in question:

module type Language = sig 
  module Lexing : sig 
    type token
    type lexing_err

    val lex_chars : char Seq.t -> (token Seq.t, lexing_err) result
  end

  module Parsing : sig 
    type ast
    type parsing_err

    val parse_tokens : Lexing.token Seq.t -> (ast, parsing_err) result
  end

  module Compiling : sig 
    type program
    type semantics_err

    val compile_ast : Parsing.ast -> (program, semantics_err) result
  end
end

module BrainF_ck : sig
  include Language

  module Lexing : sig
    type token
    type lexing_err = |

    val lex_chars :
      char Seq.t -> (token Seq.t, lexing_err) result
  end

  module Parsing : sig
    type ast
    type parsing_err

    val parse_tokens :
      Lexing.token Seq.t -> (ast, parsing_err) result
  end

  module Compiling : sig
    type program = unit -> unit
    type semantics_err = |

    val compile_ast :
      Parsing.ast -> (program, semantics_err) result
  end
end = struct ... end

As you can see, the explicit-signatures-solution produces some unwieldy boilerplate in this instance. All I really want is to expose the following equalities:

type BrainF_ck.Lexing.lexing_err = |
type BrainF_ck.Compiling.semantics_err = |
type BrainF_ck.Compiling.program = unit -> unit

This way, I’m informing the compiler that I can, for example, safely unwrap a value of type ('a, BrainF_ck.Lexing.lexing_err) result:

open BrainFuck.Lexing
let unwrap : (token Seq.t, lexing_err) result -> token Seq.t = 
  fun (Ok x) -> x
(* Statically confirmed not to raise, unlike Result.get_ok *)

The same goes for BrainF_ck.Compiling.semantics_err. This is because a Brainf*ck program can only fail to compile in my ast-generation step (where loops are handled). There aren’t any possible invalid tokens in Brainf*ck sourcecode or additional static semantics to consider in a Brainf*ck program.

Also, I’m revealing to the user (me) how I can use the type BrainF_ck.Compiling.program. In this case, it’s a unit thunk that runs the input program via stdin/stdout, or whatever. But maybe I want to compile to a string of C code, or to a file handle to a newly-created executable.

If I were to accept that this is an intractible issue, I could just do something like:

module type Language = sig 
  type program
  
  module Lexing : sig 
    type token
    type lexing_err

    val lex_chars : char Seq.t -> (token Seq.t, lexing_err) result
  end

  module Parsing : sig 
    type ast
    type parsing_err

    val parse_tokens : Lexing.token Seq.t -> (ast, parsing_err) result
  end

  module Compiling : sig 
    type semantics_err

    val compile_ast : Parsing.ast -> (program, semantics_err) result
  end
end

module BrainF_ck : sig
  type empty = |

  include Language
  with type program = unit -> unit
  with type Lexing.lexing_err = empty
  with type Compiling.semantics_err = empty
end = struct ... end

…since it’s very likely that an implementor (me) of the Language module type would desire to expose details about the program type in order to make it at all usable. Any additional type exposure is up to the implementor (me) to figure out.


A very long question with a lot of examples, I know. I just wanted to be sure that I covered everything I tried, and to (hopefully) be as clear as possible in what I’m going for.

I assume that I’m not the first person to want to do this kind of idealized type exposure, but I also don’t really know how to look up this question. So if this is a duplicate of another thread, I would greatly appreciate being pointed to that one.

Thank you for your time!

Theo C.

This part is more likely a bug in Merlin’s error recovery since your code is syntactically invalid. It has nothing to do with polymorphic variant.

Overall, your problem is that type declarations cannot moved around by with constraints. Thus if you have a module type with a type declaration, you need to build your module type starting with this declaration. This can be done by defining smaller module type components:

module type inner1_content = sig
   type t
   val transform: t -> t
end
module type outer_extension = sig
   module Inner1: inner1_content
   module Inner2: sig
       val consume: Inner1.t -> unit
   end
end

With those definitions, you can build your Outer2 starting from the definition of t:

module type Outer2 = sig
   module Inner1: sig
      type t = A | B | C
      include inner1_content with type t := t
   end
   include outer_extension with module Inner1:=Inner1
end
1 Like

Thank you so much for your answer, @octachron! (Apologies for the late reply.) This gets me exactly what I was looking for. The with module ... construct in particular allows much more flexible signature specification, once you extract the inner module signatures into dedicated definitions. By analogy with your example code, I ended up doing something along these lines:

module type inner1_content = sig
   type t
   val transform: t -> t
end
module type inner2_content = sig
   type t
   val consume: t -> unit
end
module type outer_extension = sig
   module Inner1: inner1_content
   module Inner2: inner2_content with type t := Inner1.t
end
module Outer: sig
   module Inner1: sig 
      type t = A | B | C
      include inner1_content with type t := t
   end
   module Inner2: inner2_content with type t := Inner1.t
   include outer_extension
      with module Inner1 := Inner1 
       and module Inner2 := Inner2
end = struct ... end

This is great! I can bake the inter-module type dependency into the definition of the outer module type, while still having direct access to the inner module types.

Thanks again!