I’m not sure what format or how much information you would like (let me know), but here’s a summary. I’m simplifying, and omitting irrelevant details. If you want the “real deal” on the data types discussed below, see the actual lwt.ml
Also, if you want a more general and simpler discussion of GADTs, let me (and the community) know. The discussion below is maybe too focused on the actual types in Lwt, which could be a source of noise.
Using regular variants
An Lwt promise is basically
type 'a promise =
{mutable state : 'a promise_state}
and 'a promise_state = (* a promise can... *)
| Resolved of 'a (* resolve with a value *)
| Failed of exn (* fail with an exception *)
| Pending of ('a promise_state -> unit) list
(* still be pending, with a list of callbacks
to call when it completes *)
Those callbacks are used to implement things like bind, join, etc. They are always called exactly when a promise stops being Pending, i.e. when the state becomes Resolved or Failed. Therefore, those are the only two possible constructors that can be passed to the callbacks.
The problem
However, as you can see, the type promise_state has that third constructor Pending. So, lwt.ml was littered with code like this:
let bind p f =
(* ... *)
(* Ok, we figured out that `p` is pending, so let's add a callback to
it, to do something when it's not pending anymore. *)
let callback =
fun p's_new_state ->
match p's_new_state with
| Resolved v -> (* do something with `f v` *)
| Failed exn -> (* do something with `exn` *)
| Pending _ -> assert false
(* This case is impossible, but the type system doesn't know! *)
in
add_callback callback p;
(* ... *)
This is a simple example, but the large number of undocumented assert false, especially in rather esoteric helpers, was a source of anxiety. For each one:
- It’s clear that the case shouldn’t happen, but not why not.
- The error is a runtime error, so one is never sure if some crazy scenario can trigger it, and that the tests are thorough enough.
Why not use result?
One way to resolve this is to pass ('a, exn) result to callbacks, instead of 'a promise_state. However, this has two drawbacks:
-
It requires allocating an
('a, exn) resultevery time a promise completes. We would like to reuse the information already carried in thepromise_state. We could solve that by having a case| Completed of ('a, exn) resultinstead of separate
ResolvedandFailed, i.e. the Async way, but that’s still two allocations per promise, and it only seems plausible because I am simplifying – see next point… -
This only solves the problem for callbacks. There are other places where we want the type system to know the promise is only
Pending, or satisfies some other condition. Those also haveassert falseexpressions.
The GADT way
We can instead define
type completed
type pending
(* There are no values of these types. Their only purpose is for the
compiler to think that they are different types from each other. *)
type (_, _) promise_state =
| Resolved : 'a -> ('a, completed) promise_state
| Failed : exn -> ('a, completed) promise_state
| Pending :
(('a, completed) promise_state -> unit) list
-> ('a, pending) promise_state
Ok, this is much more verbose, but the only material differences here are:
- We tell the compiler what kind of
promise_stateeach constructor produces. Instead of everything being just one kind of'a promise_state, there is now('a, completed) promise_stateand('a, pending) promise_state. - We tell the compiler that callbacks take only the
completedvariety of promise state.
Now, because the compiler knows that callbacks don’t take a pending state (that’s why we need that type pending to be different from completed), we can do this:
let callback =
fun p's_new_state ->
match p's_new_state with
| Resolved v -> (* ... `f v` ... *)
| Failed exn -> (* ... something with exn ... *)
in
and simply drop the Pending case, because, again, the argument p's_new_state has type ('a, completed) promise_state in callbacks. And, in the place where we call these callbacks in the implementation, the compiler does not allow us to pass a promise_state that even might be pending!
The representation of this GADT in memory, at runtime, is the same as of the original variant shown above. But through the type system, we know that at this point only the first two cases are possible.
In fact, not only can we write the callback that way, but we have to. The compiler will yell cryptic errors at us if we accidentally add a Pending case!
Of course, the actual lwt.ml has more complex scenarios than this. I probably wouldn’t have used a GADT if this was the only thing we wanted to push into the type system. It would be overkill. If you want to see what else the GADTs are used for, take a look in the file. Hopefully, it’s documented clearly enough; otherwise, questions are welcome
As a note, the other major use of GADTs in the new lwt.ml is so-called existential types. That has little to do with eliminating assert false, but I can explain that separately, if desired.