Understanding type quantification and locally abstract types

Hi, I would like to understand (polymorphic) type quantification / annotation as well as locally abstract types better.

I have read the following, but still don’t understand well:

As far as I understand, explicitly polymorphic annotations and locally abstract types are two independent (orthogonal) features. However, there exists some syntactic sugar to combine both.

I tried this:

let f0 : 'a -> 'a = fun x -> x;;
let f1 : 'a. 'a -> 'a = fun x -> x;;
let f2 (type a) : a -> a = fun x -> x;;
let f3 : type a. a -> a = fun x -> x;;
let f3' : 'a. 'a -> 'a = fun (type a) -> (fun x -> x : a -> a);;

All of those end up as identity functions with a type displayed as 'a -> 'a. That actually means ∀a. a → a, right?

My understanding is (and please correct me if I’m wrong):

f0

The annotation : 'a -> 'a is “incomplete”, as in the compiler may replace 'a with a concrete type. Like:

# let inc : 'a -> 'a = fun x -> x + 0;;
val inc : int -> int = <fun>

f1

The annotation : 'a. 'a -> 'a is an explicitly polymorphic annotation. I.e. the following does not compile:

# let wrong : 'a. 'a -> 'a = fun x -> x + 0;;
Error: This definition has type int -> int which is less general than
         'a. 'a -> 'a
Click here for some side observation / question.

Here the type in the error message isn’t displayed as 'a -> 'a but as 'a. 'a -> 'a, even though utop displays f1 as of type 'a -> 'a:

# let f1 : 'a. 'a -> 'a = fun x -> x;;
val f1 : 'a -> 'a = <fun>

So what is the canonical way to describe the type of f1 here? 'a -> 'a or 'a. 'a -> 'a? Or does it matter on context which form to use?

f2

The syntax let f2 (type a) : a -> a = … for a locally abstract type confuses me a bit as apparently (type a) is not part of the “type” after the colon? The manual says, “The expression […] introduces a type constructor […] which is considered abstract in the scope of the sub-expression, but then replaced by a fresh type variable.”

I would understand this as in the part, I can refer to the type using the name a.

But this ability to reference the type doesn’t seem to be the only difference from f0. Consider that the following don’t work:

# type ('a, 'b) kind = I : (int, string) kind | F : (float, string) kind;;
type ('a, 'b) kind = I : (int, string) kind | F : (float, string) kind

# let zero_no_annotation = fun x -> match x with I -> 0 | F -> 0.0;;
Error: This pattern matches values of type (float, string) kind
       but a pattern was expected which matches values of type
         (int, string) kind
       Type float is not compatible with type int

# let zero_wrong_annotation : ('a, 'b) kind -> 'a = fun x -> match x with I -> 0 | F -> 0.0;;
Error: This pattern matches values of type (float, string) kind
       but a pattern was expected which matches values of type
         (int, string) kind
       Type float is not compatible with type int

# let zero_correct_annotation (type a) : (a, 'b) kind -> a = fun x -> match x with I -> 0 | F -> 0.0;;
val zero_correct_annotation : ('a, string) kind -> 'a = <fun>

Both zero_no_annotation and zero_wrong_annotation fail to compile, but zero_correct_annotation works. But I do not refer to type a at any point other than in the type signature itself.

Note that this issue is not related to explicit polymorphism, as the following also fails:

# let zero_polymorphic_annotation : 'a. ('a, 'b) kind -> 'a = fun x -> match x with I -> 0 | F -> 0.0;;
Error: This pattern matches values of type (float, string) kind
       but a pattern was expected which matches values of type
         (int, string) kind
       Type float is not compatible with type int

So why exactly do I need the (type a) and what exactly does it cause? According to the stackoverflow article, assuming I understand right, it has something to do with making the compiler to not “unify” the type in the expression fun x -> match x with I -> 0 | F -> 0.0, but I don’t really understand yet.

f3

As far as I understand, my example f3 (or f3' without syntactic sugar) is a combination of f1 (explicitly polymorphic annotation) and f2 (locally abstract type). Is there a use-case where you want/need to use f2 but not f3 or f3 but not f2?

I find it also a bit strange that in this syntax, the annotation is now completely after the colon, while the locally abstract type annotation is made before the colon.


Maybe someone can help me to understand all these different annotations and the rationale behind them.

7 Likes

Below are some non-expert answers; am sure an expert will eventually provide more and fuller detail.

Correct. Type variables in computed signatures are implicitly quantified over.

Correct. Type variables in user annotations are not quantified over in general; they are “unification” variables (they can unify with other type expressions).

Correct.

The more accurate way would be val f1 : 'a. 'a -> 'a but this is not legal OCaml syntax today (explicitly polymorphic annotations in signatures are not supported). So technically only 'a -> 'a is possible and one must rely on the compiler for the quantification over 'a.

The (type a) introduces an abstract type into the environment, and its scope extends over everything that follows the (type a). There is no requirement that a be part of the type of the function or anything else really. For example:

let f (type a b) x = let module M = struct type t = a * b end in x + 1;;

is a (admittedly, useless) valid function. The point is that types are allowed in places where type variables are not (eg the definition of t in the example above). Another crucial use of locally abstract types is when pattern matching on GADTs, as locally abstract types appearing in the type of the scrutinee are allowed to be refined in each branch of the pattern matching (see OCaml - Generalized algebraic datatypes), which is why your zero_correct_annotation typechecks.

f3 but not f2 is easy: when you need polymorphic recursion, that is when you need the function to be polymorphic in the body of the function (see OCaml - Polymorphism and its limitations).

f2 but not f3: f3 is strictly more polymorphic than f2 so anything you can do with f2 you can do with f3, so you won’t find an example where you want f2 but not f3. But there may be cases where f2 suffices and f3 may not be needed (when you don’t need polymorphism).

Hope that helps!

Cheers,
Nicolas

6 Likes

And just to add one more comment: don’t feel bad about having a hard time grokking this subject. I believe there is wide consensus that there is a usability issue around these different kinds of type variables and annotations (see eg better treatment of rigid type annotations · Issue #14253 · ocaml/ocaml · GitHub).

In other words, thanks for posting this – I’m sure many more share your puzzlement !

Cheers,
Nicolas

4 Likes

It actually is! This compiles

module M : sig val f : 'a. 'a -> 'a end = struct let f x = x end

This might be a recent addition though because I could have sworn this wasn’t valid as well.

'a in a signature is technically also just a unification variable and not an implicitly quantified type variable. This isn’t usually visible because it is generalized and immediately turns into a type variable but there are still ways to expose it.

For example, this compiles

type 'a t constraint 'a = int
module M : sig val f : 'a t -> 'a -> 'a end = struct let f _ x = x end;;

and defines

module M : sig val f : int t -> int -> int end

But this throws a type error as expected

module M : sig val f : 'a. 'a t -> 'a -> 'a end = struct let f _ x = x end
2 Likes

Indeed! Thanks for the correction. It looks like the feature was introduced in 4.14 in fact (in https://github.com/ocaml/ocaml/pull/10437), but it went somewhat unnoticed I think.

Cheers,
Nicolas

1 Like

Polymorphism is too often a tricky subject for advanced users, and a black box wrapped in mystery for beginners. First off, as @nojb said, don’t feel bad if this topic confuses you: OCaml syntax can be quite obscure, especially regarding type manipulations and type signatures. I’ll try my best to complement previous answers and explain how explicit polymorphism and locally abstract types work, but keep in mind it is a complex subject.

TL;DR

  • 'a. t is a polymorphic type with quantified type variable 'a, which can appear anywhere within the type t. It can only type truly polymorphic terms, defined either at the top-level or in let expressions.
  • 'a -> 'a can stand for either a truly (implicit) polymorphic function type or a function type of as-yet-unknown argument and return types. The difference is that the first types terms at generalization points (top-level and let-level terms), while the second can be used to annotate terms in a more “it should have this general shape, but I don’t know the specifics”
  • (type a) introduces a “rigid” type variable a which can be used by terms and cannot be made equivalent (through unification) to other types. You can think of them as any other fixed type such as int, bool or string. They are only quantified over (i.e, become a quantified type variable in a polymorphic type) if they appear in the type signature of inferred terms.
  • While the 'a in 'a. 'a -> 'a governs abstraction of types over types, but is not usable by terms, the (type a) in let f (type a) x y = ... governs abstraction of types over terms, but is not usable (cannot be referenced) by its inferred (or given) type.

OCaml type checker is based on a type inference algorithm which broadly works as follows, for a term let f x = <body>:

  1. introduce a placeholder type for x (a so-called unification variable, 'a), so that if <body> mentions x we can work with its type.
  2. infer the type of <body>. During this process, the type checker infers type information from the shape of the <body> term and any of its sub-terms. Furthermore, it may improve type information. For instance, if at any point it finds proof that the type of x should be some concrete type t, it will replace the placeholder type of x by t for the entire expression.
  3. find out any type variables free in the <body> term. I’m omitting a lot of detail here, but these type variables (no matter their origin) will become “specialized” in order to make the inferred type of f as polymorphic as possible. For instance, if after type-inference, the type of <body> is int -> 'a and the type of x is still 'a, then 'a is the only free variable of int -> 'a and will be used to create the polymorphic type signature ∀a. a → int → a.

Notice how I did not express the polymorphic type of f using OCaml syntax. This is because OCaml can represent this type in two distinct, although equivalent (to my knowledge), manners:

  • explicit polymorphic type signature: 'a. 'a -> int -> 'a
  • implicit polymorphic type signature: 'a -> int -> 'a

Even when you explicitly provide an explicit type signature to a term, OCaml will present the type of the term as an implicit type signature.

Also notice that the previous implicit polymorphic type signature can also be used as a regular type in a type annotation, something like (f: 'a -> int -> 'a) "Hello" 1. This ambiguity is an unfortunate reality of the language. The nature of a type variable name 'a is context dependent: in explicit polymorphic and inferred implicit polymorphic type signatures and such as above, it is regarded as a quantified type variable in the body of the type signature (the 'a -> int -> 'a part); otherwise, it should be interpreted as a regular unification variable.

So, what happens when a binding term such as f comes with a coupled type annotation? Well, it depends on whether the annotated type is an explicit polymorphic type signature or not. Very broadly:

  • let f: 'a. 'a -> 'b -> 'a = fun x -> <body>: the type checker will verify that the resulting type for the right-hand side can be generalized to the polymorphic type. Type variable 'a is “rigid”, as it cannot be equated to other concrete types, while 'b is treated as a unification variable and can thus be equated to any other type. Thus, the term let f: 'a. 'a -> 'b -> 'a = fun x (_: int) = x will result in the type 'a -> int -> 'a
  • let f: 'a -> 'b = fun x -> <body>: the type checker treats the type as a regular non-polymorphic type, and thus all type variables which look like are being quantified are actually being interpreted as unification variables. Thus, the resulting type of this expression could be 'a -> int -> 'a as much as it could be int -> int.

Furthermore, the type variables introduced by type signatures are not usable in the expressions they type. What you think is a type variable bound by a type signature is actually a new unification variable. In the term let g: 'a -> 'a = fun (x: 'a) -> (x: 'a), although it may seem both type variables 'a in the body of g are quantifiers brought by an implicit polymorphic signature, they are actually unification variables. Ain’t all this look-alike type variables and subtleties a lot of fun to disambiguate over :smiley: ?

Now, the type checker is often times smart, but it is not infallible. Sometimes, you want some of these unknown, placeholder types to not be equated to any other type, thus remaining unknown and abstract throughout the type checking procedure of a particular term. This is what happens with GADTs and polymorphic recursion. This is what the (type a) annotations do. They introduce a brand-new, unknown and abstract type which cannot be replaced by other types, but can be equated to them. Unlike unification type variables which are really not bound anywhere, these locally abstract types can be used by multiple terms while representing the exact same, unknown-but-general-enough type.

Let’s now go over the various functions from your kind example, and see why they work/don’t work:

let zero_wrong_annotation : ('a, 'b) kind -> 'a = fun x -> match x with I -> 0 | F -> 0.0
  1. The type checker interprets 'a and 'b as unification variables, and so the resulting type could replace them by any type, or quantify them and generalize the type to a polymorphic one.
  2. The type checker introduces x with type ('a, 'b) kind into scope (note that 'a and 'b are still unification variables)
  3. The first branch of pattern matching observes that the type of x should be one supporting the constructor I of type (int, string) kind. Thus, ('a, 'b) kind = (int, string) kind, resulting in 'a=int and 'b=string. The type checker then infers the body of the branch with no issues
  4. The next branch observes that the type of x should support the constructor F of type (float, string) kind. Thus, ('a, 'b) kind = (float, string) kind, resulting in 'a=int and 'b=string
  5. Although 'b=string is valid in both branches, the equations 'a=int and 'a=float cannot be both accepted and remain valid, since int =/= float. Thus, type checking fails for the entire expression
let zero_polymorphic_annotation : 'a. ('a, 'b) kind -> 'a = fun x -> match x with I -> 0 | F -> 0.0
  1. The type checker interprets 'a as a quantified type variable and 'b as an unification variable. This means the inferred type should have a polymorphic type such as 'a 'b. ('a, 'b) kind -> 'a (explicit) or ('a, string) kind -> 'a (implicit), but can never have non-polymorphic types such as (int, string) kind -> int.
  2. The type checker introduces x with type ('c, 'b) kind into scope (note that 'c and 'b are still unification variables. The type checker does not use 'a, as it turns quantified variables into unification ones for the purposes of type checking)
  3. The first branch of pattern matching observes that the type of x should be one supporting the constructor I of type (int, string) kind. Thus, ('c, 'b) kind = (int, string) kind, resulting in 'c=int and 'b=string. The type checker then infers the body of the branch with no issues
  4. The next branch observes that the type of x should support the constructor F of type (float, string) kind. Thus, ('c, 'b) kind = (float, string) kind, resulting in 'c=int and 'b=string
  5. Although 'b=string is valid in both branches, the equations 'c=int and 'c=float cannot be both accepted and remain valid, since int =/= float. Thus, type checking fails for the entire expression
let zero_correct_annotation (type a) : (a, 'b) kind -> a = fun x -> match x with I -> 0 | F -> 0.0
  1. The type checker notices this is a term which locally binds the abstract type a. It thus adds this type into scope, making it usable for any type or term on its right-hand side.
  2. The type checker notices the type of zero_correct_annotation (type a) should be (a, 'b) kind -> a. It thus interprets a as the local abstract type bound previously, and 'b as an unification variable. Note this does not make zero_correct_annotation completely polymorphic as of now: were it recursive, its type in any recursive call would be the monomorphic (a, 'b) kind -> a (a cannot change, because it is still locally bound!) instead of the (explicitly) polymorphic 'a. ('a, 'b) kind -> 'a (where 'a can be any other type)
  3. The type checker introduces x with type (a, 'b) kind into scope (note that only 'b is a unification variable).
  4. The first branch of pattern matching observes that the type of x should be one supporting the constructor I of type (int, string) kind. Thus, (a, 'b) kind = (int, string) kind, resulting in 'b=string only, since a is an abstract type, and can thus be equated to any other type. The type checker then infers the body of the branch with no issues.
  5. The next branch observes that the type of x should support the constructor F of type (float, string) kind. Thus, (a, 'b) kind = (float, string) kind, resulting in 'b=string, since a is abstract and thus can also support float. The type checker then infers the body of the branch with no issues.
  6. Because the two resulting equations are compatible, type checking proceeds, with equation ``b=string. The resulting type for zero_correct_annotationis the correct(a, string) kind → atype, which hasaas a free (locally abstract) type variable. This type is thus quantified overa, resulting in the (explicit) polymorphic signature 'a. ('a, string) kind → 'a`.

I understand this is a lot to maul over. I tried to leave most of the type theory behind, since this is already confusing as it is, but I could also delve into that if you’d like. Anyway, hope this helps, and please try to not feel discouraged because of these OCaml quirks: it’s a bit of a hurdle, but you can get over it!

5 Likes

In this expression, the unification variable 'a is the same in all of its four appearance. More generally, unification variables are scoped at the level of toplevel let definitions. For instance,

let f x =
   let y (w:'a) = 0 + w in
   (x:'a)

yields the type int -> int for the function f.

1 Like

Oops, indeed. I mixed two ideas there. Fixed :slight_smile:

Thanks, I was not aware that that was supported.

Just for completeness, another version of this workaround wraps the function in a record type:

type poly_f = { f : 'a. 'a -> 'a }

Hi all, thank you so much for your responses. I’ll go through some of your replies below.

(Sorry this got a bit long.)

The explicitly polymorphic annotation in a signature seems to be syntactic “noise” though (as in not changing the signature). This gives an error:

# module M : sig val f : 'a -> 'a end = struct let f _ = 0 end;;
Error: Signature mismatch:
       Modules do not match:
         sig val f : 'a -> int end
       is not included in
         sig val f : 'a -> 'a end
       Values do not match:
         val f : 'a -> int
       is not included in
         val f : 'a -> 'a
       The type 'a -> int is not compatible with the type 'a -> 'a
       Type int is not compatible with type 'a

While the following compiles:

# let f : 'a -> 'a = fun _ -> 0;;
val f : int -> int = <fun>

Also, what @Ben_Janes said:

So what 'a -> 'a actually means depends on context. As a type annotation in a struct, it is a “function type of as-yet-unknown argument and return types” while in a sig, it is always a polymorphic function type (where a prefix 'a. can be added for clarity but is just “noise”).

I think what I’m still struggling with is the difference between a “type variable” and an “abstract type”. I think the difference has to do with how the type inference works. I feel like (and not sure if I’m right), (type a) actually causes two different things:

  • It makes the type (constructor) referrable by a name (here a) in the function’s body. (edit, see near end of this post: not sure if that’s really specific to locally abstract types)
  • It affects type inference such that values with this abstract type may actually be of different types within the same expression.

Getting back to my example:

Let’s look at the expression x in match x with I -> 0 | F -> 0.0. What is the type of x? It can’t be 'a. ('a, string) kind, because it is not polymorphic: Even though it is either (int, string) kind or (float, string) kind, it can’t be used where a polymorphic 'a. ('a, string) kind value is expected. Let’s verify:

# type ('a, 'b) kind = I : (int, string) kind | F : (float, string) kind;;
type ('a, 'b) kind = I : (int, string) kind | F : (float, string) kind

# type poly = { poly : 'a. ('a, string) kind };;
type poly = { poly : 'a. ('a, string) kind; }

# let does_not_work (type a) : (a, 'b) kind -> a = fun x -> let _ = { poly = x } in match x with I -> 0 | F -> 0.0;;
Error: This field value has type (a, string) kind which is less general than
         'a. ('a, string) kind

It also can’t be ('a, string) kind for a specific type 'a, because if 'a is int, then the second branch of the match is wrong, and if 'a is float, then the first branch of the match is wrong.

I guess that is where you can say that it is of type (a, string) with some “abstract” type a? But what does it truly mean? Sounds pretty “abstract” to me. :sweat_smile:

Also interesting, what is the “outside” visible type of zero_correct_annotation? Here, it is polymorphic. I take that val zero_correct_annotation : ('a, string) kind -> 'a from above actually means val zero_correct_annotation : 'a. ('a, string) kind -> 'a. Let’s test that by passing some hypothetical value of type (unit, string) kind:

# type ('a, 'b) kind = I : (int, string) kind | F : (float, string) kind;;
type ('a, 'b) kind = I : (int, string) kind | F : (float, string) kind

# let zero_correct_annotation (type a) : (a, 'b) kind -> a = fun x -> match x with I -> 0 | F -> 0.0;;
val zero_correct_annotation : ('a, string) kind -> 'a = <fun>

# let cheat () : (unit, string) kind = raise (Failure "Runtime failure");;
val cheat : unit -> (unit, string) kind = <fun>

# zero_correct_annotation (cheat ());;
Exception: Failure "Runtime failure".

Note that even though the last statement fails at runtime, the compiler is satisfied, i.e. zero_correct_annotation is truly polymorphic as it does accept any ('a, string) kind argument (for an arbitrary type 'a).

Another way to see that zero_correct_annotation is polymorphic is by explicitly annotating it with a polymorphic type (according to case f1) while introducing the locally abstract type in the body:

# let zero_polymorphic_annotation : 'a. ('a, 'b) kind -> 'a =
    fun (type a) ->
    fun (x : (a, _) kind ) : a -> match x with I -> 0 | F -> 0.0;;
val zero_polymorphic_annotation : ('a, string) kind -> 'a = <fun>

So I would concluce that:

  • From the “outside” perspective, zero_correct_annotation is polymorphic, i.e. of type 'a. ('a, string) kind -> 'a (where the 'a. is noise that is usually omitted but implied).
  • From the “inside” perspective, the argument passed to zero_correct_annotation is of type (a, string) kind, where a is some sort of “abstract” type that I still don’t fully grasp.

I guess this also matches what you said here:

And here:

I guess the notion of “general-enough” is somehow between polymorphic and monomorphic? :thinking:

Okay, so with recursion, it is sometimes necessary to explicitly mark a function as polymorphic. And if I also need a “general-enough type”, I see how I need f3 or f3'.

I am not sure. I said above:

I feel like (and not sure if I’m right), (type a) actually causes two different things: […] It makes the type (constructor) referrable by a name (here a) in the function’s body. […]

So I guess there may be cases where I need to refer to the type, while the function is not (and cannot be) polymorphic. I tried the following:

# let swap (a : int) (b : int) (xs : 't array) =
  let temp = xs.(a) in
  xs.(a) <- xs.(b);
  xs.(b) <- temp;
  print_endline xs.(b);;
val swap : int -> int -> string array -> unit = <fun>

This function cannot be polymorphic because of the print_endline at the end. But let’s assume that for some reason, I would like to give let temp an explicit type annotation. I tried this, but it fails:

# let swap (type t) (a : int) (b : int) (xs : t array) =
  let temp : t = xs.(a) in
  xs.(a) <- xs.(b);
  xs.(b) <- temp;
  print_endline xs.(b);;
Error: This expression has type t but an expression was expected of type string

I don’t fully understand why. I guess this has to do with this “general-enoughness”, that string can’t fulfill?

I’m not sure if I can follow here. Yet another word where I don’t fully follow what it means: “rigid”. If you say that a cannot be made equivalent to other types, then I guess this means the compiler cannot say a is the same as string for example. So I guess that’s why my swap example above fails to compile (but why the GADT examples do compile). Is there a usecase that actually requires (type a) where a doesn’t become a quantified type variable in a polymorhic type (same question as above: are there cases in which I need f2, but where f3 would be illegal)?

I also have a hard time understanding this. The bits:

  • “abstraction of types over types” I don’t understand
  • “not usable by terms” I think I do understand: it means I canot refer to the type in an expression, right?
  • “abstraction of types over terms” again, I don’t understand
  • “but is not usable (cannot be referenced) by its inferred (or given) type” I also don’t understand. Does it mean that even if a happens to be a string, I cannot use a value of type a in places where a value of type string is expected? (Failure of my swap example above?)

Yes, that I understand. So for example the type 'a. unit -> 'a gets displayed as 'a as in here:

# module T = struct let x : 'a. unit -> 'a = fun () -> raise (Failure "X") end;;
module T : sig val x : unit -> 'a end

But not always. A counter-example:

# type t = { t : 'a. unit -> 'a };;
type t = { t : 'a. unit -> 'a; }
# type some_record = { some_field : 'a. unit -> 'a };;
type some_record = { some_field : 'a. unit -> 'a; }
# let _ = { some_field = T.x };;
- : some_record = {some_field = <fun>}

Note how OCaml / utop writes: type t = { t : 'a. unit -> 'a; }. So there are places where the type is reported with an explicit 'a. , in particular where it’s also required as an input, as the following is illegal to write:

# type some_record = { some_field : unit -> 'a };;
Error: The type variable 'a is unbound in this type declaration.

Though I’m not sure what “term” means. I didn’t find the term “term” in the language specification.

Except when in a field type as above, I guess?

I am not sue what this means. I actually thought it means that if I have a function with a type signature 'a -> 'a, then I cannot use 'a in the function body (or if I do, then it’s a different type variable). But that doesn’t seem to be true, as the following does not compile:

# let g : 'a -> 'a = fun (x: 'b) -> let _ : 'a = 42 in "Hello " ^ x;;
Error: The value x has type int but an expression was expected of type string

Whereas the following compiles:

# let g : 'a -> 'a = fun (x: 'b) -> let _ : 'c = 42 in "Hello " ^ x;;
val g : string -> string = <fun>

So I guess I have to revise what I wrote above:

I feel like (and not sure if I’m right), (type a) actually causes two different things:

  • It makes the type (constructor) referrable by a name (here a) in the function’s body. This seems to also work just using 'a without (type a)!?
  • It affects type inference such that values with this abstract type may actually be of different types within the same expression.

Again, some bits that I don’t understand: “This is what the (type a) annotations do. They introduce a brand-new, unknown and abstract type which cannot be replaced by other types, but can be equated to them.” I think I understand what “cannot be replaced by other types” mean, i.e. the type inference can’t just replace 'a with int when it finds one occurrence. But I don’t know how this differs from equating. Does this mean that a in (type a) can be an int and can be a string (but not be polymorphic), yet “equating” it to int in one place doesn’t mean it is equal to int? This again sounds to me like something in between “polymorphic” and “monomorphic”. :face_with_spiral_eyes:

But you write, “unification type variables […] are really not bound anywhere”. I don’t understand that. I thought it means that if I have 'a in a type signature, then I cannot refer to that 'a type in the function body. But as shown above, that doesn’t seem to be true. (But not sure if I understand your wording right, or what “being bound” means.

I think I do understand those explanations.

But overall, some questions remain. I feel like I (vaguely) understand the main point of locally abstract types as follows:

[…] a in (type a) can be an int and can be a string (but not be polymorphic), yet “equating” it to int in one place doesn’t mean it is equal to int?

Is that, more or less, the key point?

I’d be happy to understand the theory too, but not sure if I’m able too. But I’d be willing to give it a try.

I think it does help, even if some things still remain a bit unclear to me. I’m sorry my response got a bit long (and I adjusted some of my viewpoints while writing it), but I wanted to go through your responses thoroughly and really try to understand everything. (And maybe this also helps other readers of this thread.)

P.S.: I’m not discouraged at all. I find OCaml’s type system very interesting, specifically because of using functors instead of type classes.

In the context of OCaml, “abstract type” means that a is a type constructor for which a definition has not being given (ie it would be declared in a signature simply as type a instead of type a = <something>). So this is a specific type (not a type variable - it cannot be unified), about which the compiler knows nothing (so absent any other information, it will be unequal to every other type in scope).

Where it gets more complicated is that when pattern matching GADTs, the compiler is able to make use of extra type equalities introduced by the different GADT constructors. These type equalities are only valid in the corresponding branches of the pattern matching and are “forgotten” outside of that scope.

In your example zero_correct_annotation, x has type (a, string) kind where a is a specific, albeit “abstract” (= the compiler knows nothing about) type. However, upon entering the first pattern matching branch, the compiler learns about the quality a = int induced by the I constructor, while the second one introduces the equality a = float. The upshot is that the type of every branch is a (modulo the equalities introduced by each GADT constructor), and the whole function can now be given the type (a, string) kind -> a.

Cheers,
Nicolas

So I know abstract types from module types. Let me try to write a simple example:

module type SHIFT =    
  sig 
    type t  
    val zero : t
    val shift : t -> int -> t
  end;;

Here t is abstract. If I provide an implementation (while keeping the abstract interface), I get:

# module FloatModulo7 : SHIFT =
  struct
    type t = float 
    let zero = 0.0
    let shift f i = Float.rem (f +. Float.of_int i) 7.0
  end;;
module FloatModulo7 : SHIFT
# #show FloatModulo7.t;;
type t

Here, FloatModulo7.t is really abstract. We don’t know it’s a float internally. We also see that when using the module in utop:

# #show FloatModulo7.t;;
type t
# (FloatModulo7.shift FloatModulo7.zero 2);;
- : FloatModulo7.t = <abstr>

I.e. utop can’t display the result as a float, because it must not know that it is a float.

Yet internally it is equal to float, isn’t it? At least within the module’s implementation. So for me, an abstract type is a type that is a specific type but unknown for the outside user.

But I still don’t really understand how this relates to locally abstract types. To whom is the type kept abstract (thus deserving the name “abstract type”)? And a locally abstract type also doesn’t seem to be a specific type, as in the previous examples of let zero_correct_annotation (type a) : (a, 'b) kind -> a, the a sometimes seems to be an int and sometimes a float. So it is not a single, specific type, is it? But in the FloatModulo7 it is a specific type (a float in that example).

I don’t understand how these two cases, “abstract type in a module” and “locally abstract type in a function”, are similar. To me, they seem different. What are the common features that makes us call them “abstract”? I assume there is a commonality, but I just don’t see it yet. (Sorry if my questions seem a bit ignorant. I’m trying my best to understand.)

But this is something specific to locally abstract types? Or to abstract types in general?

That is not correct, I think. The type of zero_correct_annotation is polymorphic and no longer related to abstract types, i.e. the type is 'a. ('a, string) kind -> a (where the 'a. is implicit). This is what I tried to show here:

From the outside p.o.v., the “locally abstract type” is not visible (not even as an abstract type), but the function is simply polymorphic.

This is a good intuition, but note that a type can be abstract (= lacking a definition) by itself (ie you can declare type a in a module and this will define an abstract type), without referencing any “outside” view.

To the subexpression that follows (type a) (including the body of the function).

It is. But “is” is a tricky word :slight_smile: The name a always refers to same one type. But in each branch of the pattern matching, type equalities are introduced in the environment, so a becomes equal to int in the first branch, and a becomes equal to float in the second branch. It is as if in each branch of the pattern matching you learned more information about a and then promptly forgot it when passing to the following branch.

Simply that they are type constructors without a definition.

This is a general feature of GADT pattern matchings, which in OCaml is implemented by making use of the machinery of locally abstract types (which are also useful for other things, such as working with local modules).

The function is generalized and becomes polymorphic after it is defined, but this mechanism is completely general (it is the let-polymorphism typical of ML-style languages) and orthogonal to what we are discussing, I think.

No need to apologize!

Cheers,
Nicolas

Uhm, I guess so. I can’t do much with it though, I guess?

# module M :      
  sig
    val run : unit -> unit
  end 
  =
  struct
    type t         
    let left (a : t) (_ : t) = a                
    let run () = print_endline (left "Hello" "XY")    
  end;;
Error: This constant has type string but an expression was expected of type t

Is there any use-case where I can use a (non-locally) abstract type in a module implementation rather than in its signature?

Well, an easy use case is when using the FFI (eg with C or Javascript): you can declare a type which is opaque to OCaml, even in the implementation:

type t
external foo : unit -> t = "foo_stub"

Then the foreign stub is free to choose a suitable representation for t while your OCaml code is perfectly insulated from this choice.

Cheers,
Nicolas

Is this a correct way to rephrase this answer?

"At each application of zero_correct_annotation, only one branch is taken. So within that single call, a is indeed a single, specific type, and for each application it is true that if the argument was of type (a, _) kind then the return value is of type a. "

In some way this notion of polymorphism seems more natural than the usual polymorphic type variables introduced with 'a. .... The locally abstract type relates to what happens at each call site (each time with a different instance of the type a) whereas the polymorphic type variable requires that in different branches of the pattern matching it is the same type, even though at every application, only one branch is realized. Would it be possible to replace every 'a. ... with a (type a) ? What would be lost?

(disclaimer: absolutely non-expert)

Well, on the face of it, no, because (type a) does not produce a polymorphic function, you actually need to combine both constructs/syntaxes together to get a polymorphic function with locally abstract types, but there is a (yet another) dedicated syntax for this: type a. ... (see OCaml - Language extensions). As to replacing one form by the other, I think the devil is in the details, but as I understand it, it is the gist of proposals such as Treat universal variables as abstract in expected types by lpw25 · Pull Request #12732 · ocaml/ocaml · GitHub, so definitely a relevant idea.

Cheers,
Nicolas

Thanks for this perspctive, and I have been re-thinking about it also. I wonder if the following is a correct way to phrase it (maybe not formally correct, but let me try):

There exist different classes of “variable types”:

  • A type variable 't that may be replaced by type inference once during compilation with a concrete type or, if not replaced, will be polymorphic.
  • A weak type variable '_weak1, which must be replaced by type inference once during compilation with a concrete type.
  • A quantified type variable 't. 't, which cannot be replaced by type inference and is always polymorphic.
  • An abstract type type t or locally abstract type (type t) t that can be replaced with multiple different types depending on context (using with type t = x or pattern matching, respectively). Locally abstract types may also be replaced (once and forever) by type inference with a concrete type (which is a different mechanism) or be polymorphic (like 't).
  • The combination of a quantified type variable with locally abstract types:
    A locally abstract type with a polymorphic annotation type t. t, that can be replaced with multiple different types depending on context when used, like (type t) t, but behaves as polymorphic to the caller of the function, like 't. 't. i.e. it cannot be replaced by the type inference mechanism with a single type for all cases.

Did I get this (more or less) right?

Well, if I understand things right, then (type a) allows type inference to replace a with a single type or allow it to be polymorphic:

  • 't may be polymorphic
  • '_weak1 cannot be polymorphic
  • 't. 't is always polymorphic
  • (type t) t may be polymorphic
  • type t. t is always polymorphic

Sometimes the compiler needs to know that it is forbidden to apply type inference, which is why you need the explicit 't. or type .t quantifications (examples given in section “Explicitly polymorphic annotations” as also linked in my OP).

If I understand right, then this would mean treating 't. 't the same way as type t. t?

1 Like