Lwt core refactored and documented to be contributor-friendly

Lwt is making a major effort to become more reader- and contributor-friendly, as well as to unblock future work, such as writing a new manual and porting to multicore effects.

To that end, the Lwt core, lwt.ml, has been extensively commented and reorganized. The new documentation includes a thorough overview of all concepts used in the core. Hopefully, it is sufficient for a reader to become a contributor, or at least a particularly informed user. If not, we encourage you to submit a PR – it will teach us, and everyone involved, about what is missing, and how to explain the mechanics of Lwt better.

We want Lwt to be a work in progress, rather than a static codebase people are afraid to edit. Any review or feedback would be greatly appreciated!

14 Likes

Marking bugs as Easy, Medium or Hard can also promopt more contributors.

You might be interested in this talk.

1 Like

Thanks! I will watch that soon.

Marking the bugs is a good idea. Give me a few days for that :slight_smile:

I’ll also move the project suggestions out of the wiki into the regular issues, since they are kind of invisible. In case someone wants to see them now, they are here:

2 Likes

The legible, human-friendly Lwt core is, I believe, just about ready for merge. If anyone wants to/has time to sneak in some final reviews, please glance at the PR :slight_smile: Its current status is summarized in a recent comment. I hope to merge this thing on Sunday or Monday; stop me if that’s wrong!!

Interestingly, this PR was held up by the Skylake CPU bug. With all the (formerly) undiagnosed segfaults, we couldn’t really tell if the old Lwt and new Lwt were failing in the same way, and therefore had no confidence to merge such a heavy refactoring.

Thanks again to all the persons involved in troubleshooting the Skylake bug for unblocking this! Thanks to all the reviewers.

(P.S. With reference to the video posted by @vramana above, I watched it – thanks again to @vramana; as a result, among other things, Lwt now has a thriving easy issues list).

https://github.com/ocsigen/lwt/pull/354#issuecomment-312036236

3 Likes

Thank you for this wonderful piece of work, @antron. I’ve only started reading lwt.ml, but it’s already a complete joy. If the compiler had been this well documented… we’d be very far along indeed. Thanks again.

1 Like

Excellent work @antron. In the GPR your mentioned the usage of GADT leading to reduction of match cases and ‘assert false’. Would be interesting to read more about it if possible.

2 Likes

I’m not sure what format or how much information you would like (let me know), but here’s a summary. I’m simplifying, and omitting irrelevant details. If you want the “real deal” on the data types discussed below, see the actual lwt.ml :slight_smile: Also, if you want a more general and simpler discussion of GADTs, let me (and the community) know. The discussion below is maybe too focused on the actual types in Lwt, which could be a source of noise.


Using regular variants

An Lwt promise is basically

type 'a promise =
  {mutable state : 'a promise_state}

and 'a promise_state =   (* a promise can... *)
  | Resolved of 'a       (* resolve with a value *)
  | Failed of exn        (* fail with an exception *)
  | Pending of ('a promise_state -> unit) list
                         (* still be pending, with a list of callbacks
                            to call when it completes *)

Those callbacks are used to implement things like bind, join, etc. They are always called exactly when a promise stops being Pending, i.e. when the state becomes Resolved or Failed. Therefore, those are the only two possible constructors that can be passed to the callbacks.


The problem

However, as you can see, the type promise_state has that third constructor Pending. So, lwt.ml was littered with code like this:

let bind p f =
  (* ... *)

  (* Ok, we figured out that `p` is pending, so let's add a callback to
     it, to do something when it's not pending anymore. *)

  let callback =
    fun p's_new_state ->
      match p's_new_state with
      | Resolved v -> (* do something with `f v` *)
      | Failed exn -> (* do something with `exn` *)
      | Pending _ -> assert false
        (* This case is impossible, but the type system doesn't know! *)
  in
  add_callback callback p;

  (* ... *)

This is a simple example, but the large number of undocumented assert false, especially in rather esoteric helpers, was a source of anxiety. For each one:

  • It’s clear that the case shouldn’t happen, but not why not.
  • The error is a runtime error, so one is never sure if some crazy scenario can trigger it, and that the tests are thorough enough.

Why not use result?

One way to resolve this is to pass ('a, exn) result to callbacks, instead of 'a promise_state. However, this has two drawbacks:

  1. It requires allocating an ('a, exn) result every time a promise completes. We would like to reuse the information already carried in the promise_state. We could solve that by having a case

    | Completed of ('a, exn) result
    

    instead of separate Resolved and Failed, i.e. the Async way, but that’s still two allocations per promise, and it only seems plausible because I am simplifying – see next point…

  2. This only solves the problem for callbacks. There are other places where we want the type system to know the promise is only Pending, or satisfies some other condition. Those also have assert false expressions.


The GADT way

We can instead define

type completed
type pending
(* There are no values of these types. Their only purpose is for the
   compiler to think that they are different types from each other. *)

type (_, _) promise_state =
  | Resolved : 'a  -> ('a, completed) promise_state
  | Failed   : exn -> ('a, completed) promise_state
  | Pending  :
    (('a, completed) promise_state -> unit) list
                   -> ('a, pending)   promise_state

Ok, this is much more verbose, but the only material differences here are:

  1. We tell the compiler what kind of promise_state each constructor produces. Instead of everything being just one kind of 'a promise_state, there is now ('a, completed) promise_state and ('a, pending) promise_state.
  2. We tell the compiler that callbacks take only the completed variety of promise state.

Now, because the compiler knows that callbacks don’t take a pending state (that’s why we need that type pending to be different from completed), we can do this:

let callback =
  fun p's_new_state ->
    match p's_new_state with
    | Resolved v -> (* ... `f v` ... *)
    | Failed exn -> (* ... something with exn ... *)
in

and simply drop the Pending case, because, again, the argument p's_new_state has type ('a, completed) promise_state in callbacks. And, in the place where we call these callbacks in the implementation, the compiler does not allow us to pass a promise_state that even might be pending!

The representation of this GADT in memory, at runtime, is the same as of the original variant shown above. But through the type system, we know that at this point only the first two cases are possible.

In fact, not only can we write the callback that way, but we have to. The compiler will yell cryptic errors at us if we accidentally add a Pending case!

Of course, the actual lwt.ml has more complex scenarios than this. I probably wouldn’t have used a GADT if this was the only thing we wanted to push into the type system. It would be overkill. If you want to see what else the GADTs are used for, take a look in the file. Hopefully, it’s documented clearly enough; otherwise, questions are welcome :slight_smile: As a note, the other major use of GADTs in the new lwt.ml is so-called existential types. That has little to do with eliminating assert false, but I can explain that separately, if desired.

24 Likes

I should add that the most anxiety-inducing thing about all those assert false cases was thinking about editing the code of Lwt. Unless you know the proofs of why all those cases can’t happen, you basically can’t edit the code in any but the most trivial ways – unless you’re willing to be irresponsible. And it’s a lot of stuff to know, and I’m not sure if anyone still knew it. And without that, it takes a long while to infer from code.

Moving those proofs as much as possible into the type system allows Lwt contributors to stop worrying about them, and let the compiler do the tedious work of checking their correctness.

Maintainers can easily remember the couple of things not encoded, and check them during review.

3 Likes

Thank you for your illuminating answer. Indeed existential type usage would be my next question.

1 Like

Thank you for this work. I think it is not only beneficial to potential contributors but also to heavy users. In my experience, Lwt is really easy to work with until it’s not anymore.

Under, subtle behavior can start manifesting and often require digging deep into Lwt or at least getting some knowledge of how exactly it works. I’m thinking about stackoverlow caused by excessive waiters list or use of non tail recursive apis, starvation based on queue scheduler semantic, or high memory usage with waiters accumulating. This change is more than welcome and makes digging into the codebase easier for neophytes.

PS. i realize that this bit of knowledge should be documented and or fixed if possible, i will try to make a list and open proper issues.

1 Like

:scream:

We want to fix things like this on the Lwt end as well, e.g. this PR about Lwt_list and this issue partly about non-tail-recursive APIs in the Lwt core. Anything that is inherently problematic should be at least documented, so your list would be quite welcome!

We’ve fixed a few such things recently, but not enough. Making the code clearer is definitely towards being able to both understand and work on it faster :slight_smile:

Putting together something about existential types in the next few hours…

2 Likes

Note that you can do the same without a GADT at the cost of extra indirection for non-pending values:

type 'a promise_state =
  | Done of ('a, exn) result
  | Pending of (('a, exn) result -> unit) list

Yep, it’s mentioned under “Why not use result?”

Why not use polymorphic variants? Is it because of the sometimes needed coercion syntactic noise?

Sorry, I was too sleepy…

1 Like

Actually, I was planning to discuss how to handle this using polymorphic variants (also pointed out by @cristianoc on Discord), so here it is :slight_smile: As a teaser, the syntactic noise of coercion turns out to be a good thing, relative to what we actually resort to :stuck_out_tongue:


With polymorphic variants

One should be able to do:

type 'a completed_state =
  [ `Resolved of `a
  | `Failed of exn ]

type 'a pending_state =
  [ `Pending of ('a completed_state -> unit) list ]
    (* callbacks accept only 2 constructors *)

type 'a promise_state =
  [ 'a completed_state
  | 'a pending_state ]
    (* all 3 constructors included *)

type 'a promise =
  {mutable state : 'a promise_state}

Then, you get nice callbacks like

let callback =
  fun p's_new_state ->
    match p's_new_state with
    | `Resolved v -> (* ... `f v` ... *)
    | `Failed exn -> (* ... something with exn ... *) 

and again no need for a `Pending case. However…


Why the simple approach doesn’t work

There are several subtle difficulties. One is that we actually use an existential type right in the “real” promise_state, but let’s set that aside for the existential type post, and focus on simpler and more immediate things :slight_smile: Let’s try writing the function that takes a pending promise, and completes it:

let complete p new_completed_state =
  match p.state with    (* the existing state *)

  | `Resolved _ -> assert false    (* :( *)
  | `Failed   _ -> assert false    (* :(((((((((( *)

  | `Pending callback_list ->
    p.state <- (new_completed_state :> _ promise_state);
    callback_list |> List.iter (fun f -> f new_completed_state)

The problem here is that we can’t easily write a type for “a promise whose state field can only be `Pending.” So far, we can only put constraints directly on the constructors of promise_state.


Solution: add more type information to promise

One way to solve this is with types like completed and pending from the previous post:

(* Everything is as in the previous GADT post, except that I am explicitly
   showing that type promise now takes a 'state type parameter. *)
type completed
type pending

type ('a, 'state) promise =
  {mutable state : ('a, 'state) promise_state}

and ('a, 'state) promise_state =
  (* ...we would like to link each constructor with a possible value of
     type variable `'state`, i.e. `completed` and `pending`, and that's
     exactly 50% of what GADTs do (the other 50% being existential
     types). So let's just use a GADT: *)
  | Resolved : 'a  -> ('a, completed) promise_state
  | Failed   : exn -> ('a, completed) promise_state
  | Pending  : (('a, completed) promise_state -> unit) list
                   -> ('a, pending)   promise_state

So, with types completed and pending and a GADT, we get to have not only ('a, pending) promise_state, but also ('a, pending) promise, so we can write

val complete :
  ('a, pending) promise -> ('a, completed) promise_state -> unit

and the entire match expression reduces to a let:

let complete p new_completed_state =
  let Pending callback_list = p.state in
  p.state <- Obj.magic new_completed_state;
  callback_list |> List.iter (fun f -> f new_completed_state)

The Obj.magic is a source of major embarrassment; see the last section of this post.


Insisting on polymorphic variants

Another way to get precise typing, but using polymorphic variants, is to use promise_state itself as a type argument to promise:

(* This is the same as at the top of this post... *)
type 'a completed_state =
  [ `Resolved of 'a
  | `Failed of exn ]

type 'a pending_state =
  [ `Pending of ('a completed_state -> unit) list ]

type 'a promise_state =
  [ 'a completed_state
  | 'a pending_state ]

(* ...until here, where the type parameter is the type of the state! *)
type 'state promise =
  {mutable state : 'state}

and now we have

val complete :
  'a pending_state promise -> 'a completed_state -> unit

This actually seems not too bad, but I haven’t explored the consequences for all use cases in lwt.ml. Meanwhile, GADTs immediately fit all of them, and GADTs offer other things we need, like existential types, so GADTs were the clear choice. Their only real drawbacks are (1) relatively few good tutorials (2) the opaque and scary acronym GADT :slight_smile:


Regarding the coercion

In the first attempt with polymorphic variants, there was the line

p.state <- (new_completed_state :> _ promise_state);

This is unpleasant, but it’s good compared to what we do with the GADT solution, or would have to do with the precise polymorphic variant solution.

  1. GADT: if p is an ('a, pending) promise, then it has an ('a, pending) promise_state, and we have to use Obj.magic to assign an ('a, completed) promise_state to it.
  2. Polymorphic variant: if p is an 'a pending_state promise, then it has an 'a pending_state, and we also have to use Obj.magic to assign an 'a completed_state to it.

:scream:

As far as I know, this is an inevitable tradeoff when you have such precise types, but also mutability. What we currently do in lwt.ml is hide the Obj.magic inside a set_state function:

val set_state :
  ('a, _) promise -> ('a, 'state) promise_state -> ('a, 'state) promise

and you are supposed to always call it like this:

(* ... we have a promise `p` ... *)

let p = set_state p some_new_state in

(* ... the name `p` is now shadowed, and the new reference has
       accurate type constraints ... *)

This is a cheap imitation of a restricted form of linear typing. In practice, p is never used in lwt.ml after set_state anyway, but we still call set_state and shadow the reference, in order to be extra careful and paranoid.

I want to note that all of this is purely internal to the implementation of Lwt. No user has to interact with this casting, nor GADTs, etc.

And, if you have a better way, please propose or contribute it :slight_smile:

4 Likes

Hmm… impressive write-up, but somehow I prefer a few ‘assert false’ statements to using Obj.magic.

Put another way: if all your code needs is a little sprinkle of Obj.magic to make it work, it’s ultimately broken IMO. All it takes is the right optimization at the right time to mess it up. One thing possibly keeping this trick fairly ‘safe’ from the wrath of Flambda is the fact that GADTs are unlikely to be optimized too much anytime soon since they’re complex, but my view may be naive. I get that you want linear types, I understand how awesome it would be if it worked… but it doesn’t. I personally don’t think you want this machinery to be at the core of such an industrial-strength system that’s relied on by so much of the community. I could definitely be wrong here though, so I’m waiting to be contradicted.

1 Like

It’s definitely a tradeoff, but the advantage of precise types + Obj.magic is that much more reasoning is encoded in the type system. This saves a lot of human time. See below.


I want to be clear that we never use a naked Obj.magic in lwt.ml. It’s always wrapped in casts that expect values of a precise type, and produce (the same) values of another precise type. These casts actually document state changes and other possibilities in the code with their names. Here is an example:

let callback in_completion_loop p_result =
  let State_may_now_be_pending_proxy p'' = may_now_be_proxy p'' in
  (* ... *)

That whole line is actually Obj.magic, i.e. a no-op, but it tells the reader up-front, at the beginning of a callback, how p'''s state may have changed in between when p'' was first returned to the user, and when the callback was actually called.

It also gives an obvious function definition may_now_be_proxy to search for, where a big comment explains why exactly, and only, that state change is possible, etc.

Before, the state change, and the reasoning, was left for people to guess at each location. Without precise types, there is no need for the cast, so no particular reason to try hard documenting what’s going on.

With precise types, you not only get these explicit documenting casts, but also precise match cases afterwards, with no mysterious assert false.

Obviously, I’d rather do this without Obj.magic, but we need to either come up with something smarter, or OCaml needs better types.


For another example, take the Obj.magic inside set_state, described in the GADT vs. polymorphic variant writeup. It’s actually a bit more complex on the type level than I showed before. This is for safety and documentation purposes. It’s approximately:

type 'a state_may_have_changed =
  | State_may_have_changed of 'a

val set_state :
  ('a, _) promise -> ('a, 'state) promise_state ->
    ('a, 'state) promise state_may_have_changed

So the compiler forces you to use it like this:

let State_may_have_changed p = set_state p new_state in
(* ... *)

i.e. the compiler forces you to acknowledge and document what happened. This is true even if set_state is called deep within a helper, and that helper is returning you the result. Without precise types, nothing forces you to document a state change at all. You could trigger one inside a deep helper and not know it. It might then hit one of the assert false cases, that you or I could easily forget to update while refactoring some code. Perhaps this would happen only on rare occasions, when a user is running in Lwt under heavy load.

The State_may_have_changed constructor is optimized away by the compiler, so the whole thing is a no-op. It’s basically a compiler-checked, compiler-demanded comment.


Also, I want to note that the old lwt.ml had Obj.magic as well, for other reasons (which are still the case in the new one). So, it’s more sin, but sin that was already there :slight_smile: Those existing Obj.magic have to do with the variance of promises and resolvers, and the incompatibility of the variance with the internal representation. Maybe that should be another writeup :slight_smile:

This was edited into the post after I replied.

  • As noted,lwt.ml already used Obj.magic for many years. I do feel a bit dirty adding more, but if optimizations are going to break Obj.magic, that’s not a new threat to Lwt, unless it’s those specific use cases that would be broken. The new use cases all have to do with messing with phantom type parameters, whereas the pre-existing ones are just casts between entirely unrelated types.
  • lwt.ml suffers from immense paralysis, and needs at least better documentation. We could remove all the Obj.magic at the cost of changing the internal representation of promises and resolvers to be slightly less optimal (than it has been for something like a decade), and also inserting a lot more comments, and doing much more thorough review for all PRs.

In addition, Lwt is tested on Flambda. I think we are well-prepared to notice a problem from an overly-optimistic optimization pass, and adapt to it somehow. Lwt doesn’t "need* Obj.magic to make it work.