Strange grammar in the real world ocaml book(chapter GADTs)

I met many problems in the process of learning “real world ocaml”

  1. Why is there _ between type and value? What’s the difference between this and 'a
type _ value =
  | Int : int -> int value
  | Bool : bool -> bool value

type _ expr =
  | Value : 'a value -> 'a expr
  1. Why is value after int and bool? Is it a recursion?
  | Int : int -> int value
  | Bool : bool -> bool value
  1. what does ‘type a.’ mean
let eval_value : type a. a value -> a = function
  | Int x -> x
  | Bool x -> x;;

This means that the expression Int x (with x of type int) is of type int value. Similarly, Bool x (with x of type bool) is of type bool value. Note that the fact that the same type (int or bool) appears on both the lhs and rhs of the arrow is just a coincidence. You could just as well have a third constructor

| Foo : string * float -> unit value

As mentioned above, the type foo value depends on the actual constructor. So, no single type can be put in place of foo, not even 'a, hence the underscore _ there. That is the main strength of a GADT.

It is just a way of giving a proper name to the type variable. In most contexts, type a. a value -> a can be interpreted as 'a value -> 'a.

1 Like

You’re probably familiar already with the standard variant syntax:

type 'a expr =
| Value of 'a value

The GADT syntax is more general, and every case written in normal syntax can be automatically converted to GADT syntax by replacing of with : and appending -> <type_decl> at the end, where <type_decl> is the name of the type being defined with its type parameters (the rule for constant constructors is slightly different, since in this case there is no of keyword to replace, and you need to insert : <type_decl> instead of the arrow version). So, on our example you get:

type 'a expr =
| Value : 'a value -> 'a expr

What you should note (and can be very confusing) is that in a GADT constructor definition, all type variables are local to the constructor: the occurrences of 'a in Value : 'a value -> 'a expr are linked together, meaning that when x has type foo value then Value x has type foo expr, but they are actually unrelated to the 'a in type 'a expr =. You would get an equivalent type is you used this definition:

type 'a expr =
| Value : 'b value -> 'b expr

This is unlike regular variants, where type variables occurring after the of keyword must be actually declared as parameters of the type.

Because of this scoping rule, when using GADT constructors it is very common to use _ instead of named type parameters, to avoid any possible confusion. Note that if you want to declare a type with multiple parameters, you will still have to be explicit about the number of parameters:

type (_,_) foo =
| Foo : 'a * 'b -> ('a, 'b) foo

You can interpret _ as a generic type variable that you know is not referenced directly.

To answer your second question, the value after int and bool is not recursion. You can see it as the return type of the associated constructor. Since you’re defining the type value, you’re not allowed to return anything that isn’t an instance of value, but you are allowed to restrict the parameters, and this is one of the main features of the GADT constructors.

And you can see this in action in your third question: each of the branches uses the information from the associated constructor to locally constrain the type of the parameter. In the Int branch, the parameter (which has type a value has its type refined to int value, meaning that locally we can treat int and a as equivalent. So returning x, which in this branch has type int, is allowed for the expected return type of a. Similarly, in the Bool branch a becomes equivalent to bool and we can return x which has type bool.
The reason for the type a. syntax is that this local equivalence cannot be done with normal type variables. Instead, we need a special kind of variable (technically, a local abstract type) on which local equations can be added, and outside the function’s scope it will look like a normal variable again (the function will end up with type 'a value -> 'a).

But I agree that the example is confusing.
It would have been more interesting to start with a more complicated but more generic example, such as:

type 'result computation =
| Add : int computation * int computation -> int computation
| Is_equal : int computation * int computation -> bool computation
| If_then_else : bool computation * 'a computation * 'a computation -> 'a computation
| Int_literal : int -> int computation
| True : bool computation
| False : bool computation

(You’ll note that instead of _, I used 'result as the type parameter; it’s equivalent, but the name can sometimes give a better idea of what the parameter stands for.)
Then you can write the equivalent of eval_value:

let rec compute : type a. a computation -> a = fun computation ->
  match computation with
  | Add (left, right) -> (* Here a is int *) (compute left) + (compute right)
  | Is_equal (left, right) -> (* Here a is bool *) Int.equal (compute left) (compute right)
  | If_then_else (cond, then_, else_) ->
    (* Here a is not known completely, but from the [If_then_else] constructor
       we still learn that then_ and else_ are also of the same [a computation]
       type. *) 
    if (compute cond) then (compute then_) else (compute else_)
  | Int_literal n -> (* Here a is int *) n
  | True -> (* Here a is bool *) true
  | False -> (* Here a is int *) false

is the period in a. part of the syntax, i.e. type a. a value -> a has to be written that way?

or a. is just a convention and you could write a name without period for the type var if you wanted? e.g. type a b value -> c