Tutorial on GADTs

raphael-proust · February 8, 2024, 7:57pm

I’ve written a three-part tutorial on GADTs.
The first part shows the very basic. It is intended for anyone who already knows OCaml but hasn’t written a GADT before.
The second part is a collection of different techniques you can use when making your own GADT. It is intended as a follow-up of the first part or for programmers who have written GADTs before but feel like they still have things to learn.
The third part is a gallery of GADTs found in the public OCaml software ecosystem. It’s intended for anyone who wants to see practical-use examples.

I intend to expand the gallery of GADTs (part three). Let me know if you have suggestions.

benjamin-thomas · February 9, 2024, 6:51pm

Thanks for the articles.

In other words, you make a type where the possible emptiness of a list (of integers) can be tracked by the type system.

I found the first example not very convincing. As we can achieve the stated goal quite easily like so:

type 'a nonempty = Non_empty of ('a * 'a list)
let hd (Non_empty (x, _) : 'a nonempty) : 'a = x

We could also add another variant to track the empty list specifically.

So the GADT version appeared more complicated for no real advantage.

mobileink · February 9, 2024, 7:41pm

Thank you, very nice.

Trivia: doesn’t work very well on my android. Nice to have, not Essential.

I’m still struggling with the concept. Syntactically as I understand things what distinguishes a GADT from a variant is just ->- is that correct?

If that’s a GADT, then a plain old variant type must be an ADT, no?

Whence “Generalized”? We can define a variant without a generic type var but with ->.

zapashcanon · February 9, 2024, 8:30pm

You don’t need a -> to have a GADT. With a GADT you can do the following

type _ tag =
  | F : float tag
  | S : string tag

let f (type kind) (case : kind tag) (v : kind) =
  match case with
  | F ->
    (* `F` has type `float tag`,
        thus `case` has type `float tag`,
        thus `kind = float`,
        thus `v` has type `float` *)
    print_float v
  | S ->
   (* same here but with string instead of case *)
   print_string v

let () =
  f F 42.0;
  f S "42";
  f F "42"; (* type error *)
  f S 42.0; (* type error *)

The type of v depends on the value of case !

You could of course do:

type tag =
  | F of float
  | S of string

let f = function
  | F f -> print_float f
  | S s -> print_string s

But then you have some useless boxing going on.

You could do the same without boxing and with Obj.magic but this is unsafe:

type tag =
  | F
  | S

let f case v =
  match case with
  | F -> print_float (Obj.magic v)
  | S -> print_string (Obj.magic v)

let () =
  f F 42.0;
  f S "42";
  f F "42"; (* no type error, will segfault or print garbage *)
  f S 42.0  (* no type error, will segfault or print garbage *)

mobileink · February 9, 2024, 8:36pm

Ok thanks, but then the distinction is just : ?

mobileink · February 9, 2024, 8:41pm

That doesn’t sound right. It would make it a Dependent Type. Doesn’t it depend on the type of case?

Pretty sure it depends on the type of the “case” value. Which is very different than “depends on the value of case”.

zapashcanon · February 9, 2024, 10:42pm

That doesn’t sound right. It would make it a Dependent Type. Doesn’t it depend on the type of case?

GADTs can be seen as some kind of “lightweight” dependent types.

Pretty sure it depends on the type of the “case” value. Which is very different than “depends on the value of case”.

Yes, but the type of case depends on its value. When it’s F, it has type float tag, when it’s S it has type string tag.

conroj · February 10, 2024, 2:00am

Nice post! One thought on Part 2, which alludes to a correspondence between hlists and tuples. Maybe it would be helpful to illustrate how far the similarity goes, as in the example below: for both stringify_hlist and stringify_tuple, the compiler is able to infer the correct argument type, and unlike with a regular list, it knows that both of these pattern-matches are exhaustive.

  type _ hlist =
    | [] : unit hlist
    | (::) : 'a * 'b hlist -> ('a * 'b) hlist

  let stringify_hlist: _ -> string list = function
    | [a; b; c] ->
      [Int.to_string a; b; Float.to_string c]

  let stringify_tuple: _ -> string list = function
    | (a, b, c) ->
      [Int.to_string a; b; Float.to_string c]

raphael-proust · February 10, 2024, 1:34pm

I completely agree but also I start the blog-post by “You probably don’t need GADTs” and “you can use some other OCaml features to encode many of of the invariants that you might care about”, and I end the first part with “This GADT is sort of useless” and “A better alternative is to use a private type alias” so I think the tutorial already covers that.

(As a side note: I prefer the private type alias to your nonempty type+constructor. That because you can have zero-allocation casts between the vanilla list type and the non-empty list type. And more importantly because it let’s you use the Stdlib.List module in full rather than to have to rewrite functions.)

The last part of the tutorial shows useful uses of GADTs. The first part is about teaching you the basic syntax and rules.

raphael-proust · February 10, 2024, 1:42pm

Yeah it doesn’t work very well on my android either. The code is far too small. I’ve tried to fix the CSS but it turns out I’m not any good at it. If you have a fix to suggest for the CSS I’d be happy to try it out and apply it.

I have to say, I don’t know what the “Generalized” part of the name means. Probably there’s someone on this forum who has the answer.

As for the distinction between ADTs and GADTs it’s not a syntactic one. Indeed, you can write non-generalized ADTs with the newer syntax. The distinction is that the type parameter on the right-hand side of the -> in the different constructor declarations is either

repeated all the same on all the constructors:

(* not a GADT *)
type _ option =
  | None : 'a option
  | Some : 'a -> 'a option

different for different branches:

(* a GADT *)
type _ v =
  | Char : char v
  | Int : int v

dbuenzli · February 10, 2024, 1:45pm

Not sure whether maybe you did not do so to avoid more explanations but you could have defined:

type empty
type nonempty

Since you never use the values. Doing so reinforces the idea that this happens mostly at the level of the type language.

octachron · February 10, 2024, 2:06pm

dbuenzli:

Not sure whether maybe you did not do so to avoid more explanations but you could have defined:
type empty
type nonempty
Since you never use the values. Doing so reinforces the idea that this happens mostly at the level of the type language.

Using abstract types as type-level tags is a dangerous anti-pattern, since the typechecker can only prove that they are not equal inside the current module. To make clear that the values are not intended to be used it is better to define private constructor:

type empty = private Empty_tag
type nonempty = private Nonempty_tag

which keeps a proof that the type two types are not equal available to the typechecker because Empty_tag ≠ Nonempty_tag.

GADTs are generalized because they introduce the possibility to have existentially quantified type variable in the argument of the constructors:

type any_list = Any: 'a list -> any_list

and to refine the constructed type in function of the constructor

type _ numty = 
  | Int: int num
  | Float: float num

In other words, observing a constructor allow you to learn type equality about the type in the corresponding case.

Both features are not possible with a standard ADT . (The two features are partially independent, and before the final design GADTs, they were separate discussions on how to have existential quantification in variant types, but the feature was very easy to add to GADTs as “constructors + type equations” so
both features were introduced in the same package) .

Going back to the type equality aspect, this is why the eq type constructor,

type (_,_) eq =
  | Eq: ('a,'a) eq

is in many way the quintessential GADT, which can be used to encode other GADTs.

For instance, the above numty is equivalent to

type 'a numty' =
 | Int' of ('a,int) eq
 | Float' of ('a,float) eq

To check that the two types are equivalent, we can write a zero function for both types:

let zero: type n. n numty -> n = function
| Int -> 0
| Float -> 0.

let zero': type n. n numty' -> n = function
| Int' Eq -> 0
| Float' Eq -> 0.

lukstafi · February 10, 2024, 2:55pm

Moreover, there was originally some ambiguity in the naming, Guarded Algebraic Data Types is also meaningful: the variants are “guarded” by type equations that apply to the particular variant. (For example: “A Constraint-Based Approach to Guarded Algebraic Data Types” by Vincent Simonet and François Pottier.)

raphael-proust · February 10, 2024, 3:11pm

That’s the main reason. Something like that is mentioned in the tip/trick about making the property constructors private. But I think it’d be worth mentioning that leaving out the constructors is ok if the types are not exported. I’ll add that later.

benjamin-thomas · February 11, 2024, 3:18pm

Got it. It just felt like there must be some example that explains the benefit straight away, I’ll look at the other tutorials.

Wow , what a technique. I did miss this on first read

mobileink · February 11, 2024, 8:46pm

Ok, I think I see see where you’re coming from. Problem is, F and S are not values. They’re unary ctors. So the type of v depends on the type of the arg used to parameterize the ctor. Which again is not the same as depending on the value of case.

Which means, if I’m not mistaken, that the standard notion “Dependent Type” is not adequate for this situation. (See Dependent type - Wikipedia or the HoTT Book)

Which imho is pretty dang interesting. Would this be sufficient to characterize GADTs? Dependent types where the dependency is not just on a value (ctor), but on the type parameter of a ctor.

raphael-proust · February 12, 2024, 8:11am

Well F and S are constructors, but used as an expression they evaluate to values. As in

let x = F
let y = S

and we often say that F and S are values.

In this case, the type of the value that the variable x is assigned to is different from the type of the value that the variable y is assigned to. Different values, different types, no argument parametrising the constructors.

In fact, in @zapashcanon example there are no parameter for the constructors. So you can’t depend on “the type of the arg used to parameterize the ctor”. No such thing exists.

mobileink · February 12, 2024, 2:24pm

Oh man, I completely misread that. Thank you and sorry about the noise!

mobileink · February 15, 2024, 7:54pm

The Wikipedia article on GADTs has a Haskell example that suggests to me that “Specializable” would be more informative than “Generalized”. GADT ctors can produce e.g. Expr bool or Expr int, but ADTs can only produce Expr 'a. Does that make sense?

jake · March 5, 2024, 4:02am

Incredible tutorial! I’d already read a couple other materials on the subject (RWO, and that hlist article everyone links to on diff-list typing tricks). This was very clarifying as to why and how we might use GADTs.

Topic		Replies	Views
ChatGPT explains GADTs Learning gadt	4	1130	March 24, 2023
Narrowing variant types alternatives Learning gadt , adt	11	5106	May 21, 2024
Telling the type checker that two GADTs are equal Learning gadt	6	226	March 24, 2025
"Parsing" terms into a well-typed representation: a GADT puzzle Learning gadt	4	2600	October 30, 2021
A pattern for using GADTs to subset cases Learning existential-types , polymorphic-variants , gadts	19	965	September 17, 2024

Tutorial on GADTs

Related topics