Howdy howdy howdy!
I’m an amateur OCaml fan, and I have a question about type-exposure in submodules.
This is a pretty lengthy question with a bunch of examples of my own reasoning, so…
TLDR:
Is there a way to expose the definition of a type from inside a submodule of a module type without:
-
declaring a new type in the top level of the implementing module or
-
declaring explicit signatures for submodules?
That is, something to the effect of:
module type Interface = sig
module Inner : sig type t end
end
module Implementor :
Interface with type Inner.t = A | B | C
= struct ... end
Now, I’ll get into what I mean and why I’m curious.
Let’s say that I define a module type with a submodule like so:
module type Outer = sig
module Inner : sig
type t
val transform : t -> t
end
end
Then I might create a module that implements this module type:
module Example1 : sig
include Outer
end = struct ... end
This is shorthand for:
module Example1 : sig
module Inner : sig
type t
val transform : t -> t
end
end = struct ... end
Now let’s say I want to expose some details about the type Inner.t.
If Inner.t is some already-existing type (say, int), I can do this:
module Example2 : sig
include Outer with type Inner.t = int
end = struct ... end
This is shorthand for:
module Example2 : sig
module Inner : sig
type t = int
val transform : t -> t
end
end = struct ... end
I should also clarify that I’m getting these resulting module type annotations via the OCaml Platform language extension for VSCode, and it looks like they agree with what utop tells me.
Heck! I can even alias t out of the type entirely:
module Example2B : sig
include Outer with type Inner.t := int
end = struct ... end
This is shorthand for:
module Example2B : sig
module Inner : sig
val transform : int -> int
end
end = struct ... end
Now there can be NO confusion what the inner type is.
But let’s say that t is some type I defined just for this one module, and I still want to expose its definition.
Maybe type t = A | B | C.
Now, I can’t do this:
module Fail : sig
include Outer with type Inner.t = A | B | C
end = struct ... end
(* error from ocamllsp:
A type variable is unbound in this type
declaration. In type 'a the variable 'a
is unbound
*)
I assume the problem here is that this attempted inline variant type definition is being interpreted as a polymorphic variant type. If I was trying to do that, the syntax here would be all wrong. Fair enough.
So what can I do? Well, I can add a type declaration to the top-level of the module signature:
module Example3 : sig
type new_type = A | B | C
include Outer with type Inner.t = new_type
end = struct ... end
This is shorthand for:
module Example3 : sig
type new_type = A | B | C
module Inner : sig
type t = new_type
val transform : t -> t
end
end = struct ... end
Again, I can also alias out t entirely:
module Example3B : sig
type new_type = A | B | C
include Outer with type Inner.t := new_type
end = struct ... end
This is shorthand for:
module Example3B : sig
type new_type = A | B | C
module Inner : sig
val transform : new_type -> new_type
end
end = struct ... end
I don’t find this solution very attractive; it moves the declaration of t, and introduces a new type name.
If I had used t as the type name in the top-level, I would have gotten another error:
In module Inner1: The type t is required but not provided
Of course, I can achieve what I really want by writing out the module type completely:
module Example4 : sig
include Outer
module Inner : sig
type t = A | B | C
val transform : t -> t
end
end = struct ... end
This works in this instance, but I feel that it kind of goes against the purpose of defining a module type in the first place. Now the include Outer line is largely vestigial. It isn’t necessary if I want to use this module where I expect to use a module of type Outer, nor does it prevent me from accidentally using an inappropriate signature:
module Example4B : sig
include Outer
module Inner1 : sig
type t = A | B | C
val transform : int -> int (* whoops! *)
end
end = struct ... end
(* No compiler error, UNTIL I try to use this as an input to a functor that expects a module of type Outer *)
And now let’s say that I want my module type Outer2 to have more than one submodule, and let’s say that these submodules are not independent of one another:
module type Outer2 = sig
module Inner1 : sig
type t
val transform : t -> t
end
module Inner2 : sig
val consume : Inner1.t -> unit
end
end
Now, the previous solution only works if I also include the intended signature of Inner2:
module Example5 : sig
include Outer2
module Inner1 : sig
type t = A | B | C
val transform : t -> t
end
module Inner2 : sig
val consume : Inner1.t -> unit
end
end = struct ... end
If I were to leave that out:
module Example5 : sig
include Outer2
module Inner1 : sig
type t = A | B | C
val transform : t -> t
end
end = struct ... end
(* error from ocamllsp:
Illegal shadowing of included module Inner1/2
by Inner1.
The module Inner2 has no valid type if
Inner1/2 is shadowed.
*)
So now I need to return to the previous solution, where we add a new type definition at the top-level:
module Example5 : sig
type newtype = A | B | C
include Outer2 with type Inner1.t = newtype
end = struct ... end
Again, I’m not a fan, but it’s certainly less out-of-control than having to write every submodule signature explicitly.
The nuclear option would be to refactor Outer2 itself, to allow for this pattern:
module type RefactoredOuter = sig
type t
module Inner1 : sig
val transform : t -> t
end
module Inner2 : sig
val consume : t -> unit
end
end
Then, do:
module Example5 : sig
type t = A | B | C
include RefactoredOuter with type t := t
end = struct ... end
which is shorthand for:
module Example5 : sig
type t = A | B | C
module Inner1 : sig
val transform : t -> t
end
module Inner2 : sig
val consume : t -> unit
end
end = struct ... end
and at least that has the same structure as RefactoredOuter, with no unnecessary new type names.
Still, I don’t love the idea of refactoring a module type based on the exhibitionist whims of an implementing module.
And if I’m not the writer of the module type in the first place, I can’t do that anyway.
In case anyone is curious, here’s the situation I found myself in that brought me to this question.
Please keep in mind that I am a hobbyist OCaml-er, and that this particular project is an exercise I devised for myself. So if anything seems oddly abstracted, that’s probably because it is.
I was interested in making a framework for quick-and-dirty programming language definitions. So, I defined a module type for languages that could be compiled in a three step process:
source code (character sequence) → token sequence → abstract syntax tree → compiled program
The idea was that I wanted to leave the definitions of ‘token’, ‘ast’, and ‘compiled program’ up to the particular implementation, as well as any types representing error states from any given point in the process. So, these are all left as abstract types by default.
Additionally, I wanted to represent these distinct-but-interrelated steps as separate submodules within a given implementation-module.
(Note: the name of the esoteric programming language BrainF*ck has been censored to protect innocent eyes, should they be unfortunate enough to stumble across this perverse thread.)
Here are the signatures in question:
module type Language = sig
module Lexing : sig
type token
type lexing_err
val lex_chars : char Seq.t -> (token Seq.t, lexing_err) result
end
module Parsing : sig
type ast
type parsing_err
val parse_tokens : Lexing.token Seq.t -> (ast, parsing_err) result
end
module Compiling : sig
type program
type semantics_err
val compile_ast : Parsing.ast -> (program, semantics_err) result
end
end
module BrainF_ck : sig
include Language
module Lexing : sig
type token
type lexing_err = |
val lex_chars :
char Seq.t -> (token Seq.t, lexing_err) result
end
module Parsing : sig
type ast
type parsing_err
val parse_tokens :
Lexing.token Seq.t -> (ast, parsing_err) result
end
module Compiling : sig
type program = unit -> unit
type semantics_err = |
val compile_ast :
Parsing.ast -> (program, semantics_err) result
end
end = struct ... end
As you can see, the explicit-signatures-solution produces some unwieldy boilerplate in this instance. All I really want is to expose the following equalities:
type BrainF_ck.Lexing.lexing_err = |
type BrainF_ck.Compiling.semantics_err = |
type BrainF_ck.Compiling.program = unit -> unit
This way, I’m informing the compiler that I can, for example, safely unwrap a value of type ('a, BrainF_ck.Lexing.lexing_err) result:
open BrainFuck.Lexing
let unwrap : (token Seq.t, lexing_err) result -> token Seq.t =
fun (Ok x) -> x
(* Statically confirmed not to raise, unlike Result.get_ok *)
The same goes for BrainF_ck.Compiling.semantics_err. This is because a Brainf*ck program can only fail to compile in my ast-generation step (where loops are handled). There aren’t any possible invalid tokens in Brainf*ck sourcecode or additional static semantics to consider in a Brainf*ck program.
Also, I’m revealing to the user (me) how I can use the type BrainF_ck.Compiling.program. In this case, it’s a unit thunk that runs the input program via stdin/stdout, or whatever. But maybe I want to compile to a string of C code, or to a file handle to a newly-created executable.
If I were to accept that this is an intractible issue, I could just do something like:
module type Language = sig
type program
module Lexing : sig
type token
type lexing_err
val lex_chars : char Seq.t -> (token Seq.t, lexing_err) result
end
module Parsing : sig
type ast
type parsing_err
val parse_tokens : Lexing.token Seq.t -> (ast, parsing_err) result
end
module Compiling : sig
type semantics_err
val compile_ast : Parsing.ast -> (program, semantics_err) result
end
end
module BrainF_ck : sig
type empty = |
include Language
with type program = unit -> unit
with type Lexing.lexing_err = empty
with type Compiling.semantics_err = empty
end = struct ... end
…since it’s very likely that an implementor (me) of the Language module type would desire to expose details about the program type in order to make it at all usable. Any additional type exposure is up to the implementor (me) to figure out.
A very long question with a lot of examples, I know. I just wanted to be sure that I covered everything I tried, and to (hopefully) be as clear as possible in what I’m going for.
I assume that I’m not the first person to want to do this kind of idealized type exposure, but I also don’t really know how to look up this question. So if this is a duplicate of another thread, I would greatly appreciate being pointed to that one.
Thank you for your time!
Theo C.