Tutorial on GADTs

I’ve written a three-part tutorial on GADTs.
The first part shows the very basic. It is intended for anyone who already knows OCaml but hasn’t written a GADT before.
The second part is a collection of different techniques you can use when making your own GADT. It is intended as a follow-up of the first part or for programmers who have written GADTs before but feel like they still have things to learn.
The third part is a gallery of GADTs found in the public OCaml software ecosystem. It’s intended for anyone who wants to see practical-use examples.

I intend to expand the gallery of GADTs (part three). Let me know if you have suggestions.

28 Likes

Thanks for the articles.

In other words, you make a type where the possible emptiness of a list (of integers) can be tracked by the type system.

I found the first example not very convincing. As we can achieve the stated goal quite easily like so:

type 'a nonempty = Non_empty of ('a * 'a list)
let hd (Non_empty (x, _) : 'a nonempty) : 'a = x

We could also add another variant to track the empty list specifically.

So the GADT version appeared more complicated for no real advantage.

Thank you, very nice.

Trivia: doesn’t work very well on my android. Nice to have, not Essential.

I’m still struggling with the concept. Syntactically as I understand things what distinguishes a GADT from a variant is just ->- is that correct?

If that’s a GADT, then a plain old variant type must be an ADT, no?

Whence “Generalized”? We can define a variant without a generic type var but with ->.

You don’t need a -> to have a GADT. With a GADT you can do the following

type _ tag =
  | F : float tag
  | S : string tag

let f (type kind) (case : kind tag) (v : kind) =
  match case with
  | F ->
    (* `F` has type `float tag`,
        thus `case` has type `float tag`,
        thus `kind = float`,
        thus `v` has type `float` *)
    print_float v
  | S ->
   (* same here but with string instead of case *)
   print_string v

let () =
  f F 42.0;
  f S "42";
  f F "42"; (* type error *)
  f S 42.0; (* type error *)

The type of v depends on the value of case !

You could of course do:

type tag =
  | F of float
  | S of string

let f = function
  | F f -> print_float f
  | S s -> print_string s

But then you have some useless boxing going on.

You could do the same without boxing and with Obj.magic but this is unsafe:

type tag =
  | F
  | S

let f case v =
  match case with
  | F -> print_float (Obj.magic v)
  | S -> print_string (Obj.magic v)

let () =
  f F 42.0;
  f S "42";
  f F "42"; (* no type error, will segfault or print garbage *)
  f S 42.0  (* no type error, will segfault or print garbage *)
3 Likes

Ok thanks, but then the distinction is just : ?

That doesn’t sound right. It would make it a Dependent Type. Doesn’t it depend on the type of case?

Pretty sure it depends on the type of the “case” value. Which is very different than “depends on the value of case”.

That doesn’t sound right. It would make it a Dependent Type. Doesn’t it depend on the type of case?

GADTs can be seen as some kind of “lightweight” dependent types.

Pretty sure it depends on the type of the “case” value. Which is very different than “depends on the value of case”.

Yes, but the type of case depends on its value. When it’s F, it has type float tag, when it’s S it has type string tag.

1 Like

Nice post! One thought on Part 2, which alludes to a correspondence between hlists and tuples. Maybe it would be helpful to illustrate how far the similarity goes, as in the example below: for both stringify_hlist and stringify_tuple, the compiler is able to infer the correct argument type, and unlike with a regular list, it knows that both of these pattern-matches are exhaustive.

  type _ hlist =
    | [] : unit hlist
    | (::) : 'a * 'b hlist -> ('a * 'b) hlist

  let stringify_hlist: _ -> string list = function
    | [a; b; c] ->
      [Int.to_string a; b; Float.to_string c]

  let stringify_tuple: _ -> string list = function
    | (a, b, c) ->
      [Int.to_string a; b; Float.to_string c]

I completely agree but also I start the blog-post by “You probably don’t need GADTs” and “you can use some other OCaml features to encode many of of the invariants that you might care about”, and I end the first part with “This GADT is sort of useless” and “A better alternative is to use a private type alias” so I think the tutorial already covers that.

(As a side note: I prefer the private type alias to your nonempty type+constructor. That because you can have zero-allocation casts between the vanilla list type and the non-empty list type. And more importantly because it let’s you use the Stdlib.List module in full rather than to have to rewrite functions.)

The last part of the tutorial shows useful uses of GADTs. The first part is about teaching you the basic syntax and rules.

4 Likes

Yeah it doesn’t work very well on my android either. The code is far too small. I’ve tried to fix the CSS but it turns out I’m not any good at it. If you have a fix to suggest for the CSS I’d be happy to try it out and apply it.

I have to say, I don’t know what the “Generalized” part of the name means. Probably there’s someone on this forum who has the answer.

As for the distinction between ADTs and GADTs it’s not a syntactic one. Indeed, you can write non-generalized ADTs with the newer syntax. The distinction is that the type parameter on the right-hand side of the -> in the different constructor declarations is either

  • repeated all the same on all the constructors:

    (* not a GADT *)
    type _ option =
      | None : 'a option
      | Some : 'a -> 'a option
    
  • different for different branches:

    (* a GADT *)
    type _ v =
      | Char : char v
      | Int : int v
    

Not sure whether maybe you did not do so to avoid more explanations but you could have defined:

type empty
type nonempty

Since you never use the values. Doing so reinforces the idea that this happens mostly at the level of the type language.

Using abstract types as type-level tags is a dangerous anti-pattern, since the typechecker can only prove that they are not equal inside the current module. To make clear that the values are not intended to be used it is better to define private constructor:

type empty = private Empty_tag
type nonempty = private Nonempty_tag

which keeps a proof that the type two types are not equal available to the typechecker because Empty_tagNonempty_tag.

GADTs are generalized because they introduce the possibility to have existentially quantified type variable in the argument of the constructors:

type any_list = Any: 'a list -> any_list

and to refine the constructed type in function of the constructor

type _ numty = 
  | Int: int num
  | Float: float num

In other words, observing a constructor allow you to learn type equality about the type in the corresponding case.

Both features are not possible with a standard ADT . (The two features are partially independent, and before the final design GADTs, they were separate discussions on how to have existential quantification in variant types, but the feature was very easy to add to GADTs as “constructors + type equations” so
both features were introduced in the same package) .

Going back to the type equality aspect, this is why the eq type constructor,

type (_,_) eq =
  | Eq: ('a,'a) eq

is in many way the quintessential GADT, which can be used to encode other GADTs.

For instance, the above numty is equivalent to

type 'a numty' =
 | Int' of ('a,int) eq
 | Float' of ('a,float) eq

To check that the two types are equivalent, we can write a zero function for both types:

let zero: type n. n numty -> n = function
| Int -> 0
| Float -> 0.

let zero': type n. n numty' -> n = function
| Int' Eq -> 0
| Float' Eq -> 0.
7 Likes

Moreover, there was originally some ambiguity in the naming, Guarded Algebraic Data Types is also meaningful: the variants are “guarded” by type equations that apply to the particular variant. (For example: “A Constraint-Based Approach to Guarded Algebraic Data Types” by Vincent Simonet and François Pottier.)

1 Like

That’s the main reason. Something like that is mentioned in the tip/trick about making the property constructors private. But I think it’d be worth mentioning that leaving out the constructors is ok if the types are not exported. I’ll add that later.

Got it. It just felt like there must be some example that explains the benefit straight away, I’ll look at the other tutorials.

Wow :exploding_head:, what a technique. I did miss this on first read

Ok, I think I see see where you’re coming from. Problem is, F and S are not values. They’re unary ctors. So the type of v depends on the type of the arg used to parameterize the ctor. Which again is not the same as depending on the value of case.

Which means, if I’m not mistaken, that the standard notion “Dependent Type” is not adequate for this situation. (See Dependent type - Wikipedia or the HoTT Book)

Which imho is pretty dang interesting. Would this be sufficient to characterize GADTs? Dependent types where the dependency is not just on a value (ctor), but on the type parameter of a ctor.

Well F and S are constructors, but used as an expression they evaluate to values. As in

let x = F
let y = S

and we often say that F and S are values.

In this case, the type of the value that the variable x is assigned to is different from the type of the value that the variable y is assigned to. Different values, different types, no argument parametrising the constructors.

In fact, in @zapashcanon example there are no parameter for the constructors. So you can’t depend on “the type of the arg used to parameterize the ctor”. No such thing exists.

2 Likes

Oh man, I completely misread that. Thank you and sorry about the noise!

The Wikipedia article on GADTs has a Haskell example that suggests to me that “Specializable” would be more informative than “Generalized”. GADT ctors can produce e.g. Expr bool or Expr int, but ADTs can only produce Expr 'a. Does that make sense?