You’re probably familiar already with the standard variant syntax:
type 'a expr =
| Value of 'a value
The GADT syntax is more general, and every case written in normal syntax can be automatically converted to GADT syntax by replacing of
with :
and appending -> <type_decl>
at the end, where <type_decl>
is the name of the type being defined with its type parameters (the rule for constant constructors is slightly different, since in this case there is no of
keyword to replace, and you need to insert : <type_decl>
instead of the arrow version). So, on our example you get:
type 'a expr =
| Value : 'a value -> 'a expr
What you should note (and can be very confusing) is that in a GADT constructor definition, all type variables are local to the constructor: the occurrences of 'a
in Value : 'a value -> 'a expr
are linked together, meaning that when x
has type foo value
then Value x
has type foo expr
, but they are actually unrelated to the 'a
in type 'a expr =
. You would get an equivalent type is you used this definition:
type 'a expr =
| Value : 'b value -> 'b expr
This is unlike regular variants, where type variables occurring after the of
keyword must be actually declared as parameters of the type.
Because of this scoping rule, when using GADT constructors it is very common to use _
instead of named type parameters, to avoid any possible confusion. Note that if you want to declare a type with multiple parameters, you will still have to be explicit about the number of parameters:
type (_,_) foo =
| Foo : 'a * 'b -> ('a, 'b) foo
You can interpret _
as a generic type variable that you know is not referenced directly.
To answer your second question, the value
after int
and bool
is not recursion. You can see it as the return type of the associated constructor. Since you’re defining the type value
, you’re not allowed to return anything that isn’t an instance of value
, but you are allowed to restrict the parameters, and this is one of the main features of the GADT constructors.
And you can see this in action in your third question: each of the branches uses the information from the associated constructor to locally constrain the type of the parameter. In the Int
branch, the parameter (which has type a value
has its type refined to int value
, meaning that locally we can treat int
and a
as equivalent. So returning x
, which in this branch has type int
, is allowed for the expected return type of a
. Similarly, in the Bool
branch a
becomes equivalent to bool
and we can return x
which has type bool
.
The reason for the type a.
syntax is that this local equivalence cannot be done with normal type variables. Instead, we need a special kind of variable (technically, a local abstract type) on which local equations can be added, and outside the function’s scope it will look like a normal variable again (the function will end up with type 'a value -> 'a
).
But I agree that the example is confusing.
It would have been more interesting to start with a more complicated but more generic example, such as:
type 'result computation =
| Add : int computation * int computation -> int computation
| Is_equal : int computation * int computation -> bool computation
| If_then_else : bool computation * 'a computation * 'a computation -> 'a computation
| Int_literal : int -> int computation
| True : bool computation
| False : bool computation
(You’ll note that instead of _
, I used 'result
as the type parameter; it’s equivalent, but the name can sometimes give a better idea of what the parameter stands for.)
Then you can write the equivalent of eval_value
:
let rec compute : type a. a computation -> a = fun computation ->
match computation with
| Add (left, right) -> (* Here a is int *) (compute left) + (compute right)
| Is_equal (left, right) -> (* Here a is bool *) Int.equal (compute left) (compute right)
| If_then_else (cond, then_, else_) ->
(* Here a is not known completely, but from the [If_then_else] constructor
we still learn that then_ and else_ are also of the same [a computation]
type. *)
if (compute cond) then (compute then_) else (compute else_)
| Int_literal n -> (* Here a is int *) n
| True -> (* Here a is bool *) true
| False -> (* Here a is int *) false