Multi-domain, multi-scheduler Lwt

I have started working on changing Lwt to allow multiple schedulers to run in parallel in separate domains. The current status is early work-in-progress, with some tests that actually exercise the multi-scheduler feature (Draft PR).

This is still far from release. Some major changes are still needed (e.g., managing signals, improving callback sending, restoring some broken single-domain tests, etc.). Still, I’d like to gather feedback from users of Lwt, especially those with hefty code bases and those who need to bring their legacy code bases into the OCaml 5 era (but don’t hesitate to contribute even if that’s not the case).

What kind of uses would you make of a multi-domain, multi-scheduler Lwt?

“None” is a valid response. Maybe you don’t want to use parallelism in your Lwt-based codebase.

I can see several reasonable uses:

  • a worker pool, so you can dispatch your server’s requests to different cores (you don’t really need to rewrite much of your code: you can keep your Lwt handlers and just set up a few domains and a few streams to send work around)
  • having multiple schedulers in which you can run bits of Lwt code (e.g., via run_in_domain: Domain.id -> (unit -> 'a Lwt.t) -> 'a, the multi-domain equivalent of run_in_main) from “proper” multi-domain code
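
To make the worker-pool shape concrete, here is a minimal sketch that uses only the OCaml 5 standard library. It does not use Lwt at all: the names chan, send, recv, worker and run_pool are made up for this illustration, and the mutex-guarded queue merely stands in for whatever stream would carry work between domains. In the multi-scheduler version, each worker domain would presumably run its own scheduler and its own Lwt handlers instead of a plain function.

```ocaml
(* Hypothetical work channel: a queue guarded by a mutex and a condition.
   [None] is used as the shutdown signal. *)
type 'a chan = {
  items : 'a option Queue.t;
  lock : Mutex.t;
  nonempty : Condition.t;
}

let make_chan () =
  { items = Queue.create (); lock = Mutex.create (); nonempty = Condition.create () }

let send chan x =
  Mutex.lock chan.lock;
  Queue.push x chan.items;
  Condition.signal chan.nonempty;
  Mutex.unlock chan.lock

let recv chan =
  Mutex.lock chan.lock;
  while Queue.is_empty chan.items do
    Condition.wait chan.nonempty chan.lock
  done;
  let x = Queue.pop chan.items in
  Mutex.unlock chan.lock;
  x

(* Each worker handles requests until it receives [None].
   Responses are summed into [total] just to have something observable. *)
let worker handler chan total =
  let rec loop () =
    match recv chan with
    | None -> ()
    | Some request ->
      ignore (Atomic.fetch_and_add total (handler request));
      loop ()
  in
  loop ()

let run_pool ~domains handler requests =
  let chan = make_chan () in
  let total = Atomic.make 0 in
  let workers =
    List.init domains (fun _ -> Domain.spawn (fun () -> worker handler chan total))
  in
  List.iter (fun r -> send chan (Some r)) requests;
  (* One shutdown signal per worker. *)
  List.iter (fun _ -> send chan None) workers;
  List.iter Domain.join workers;
  Atomic.get total
```

For example, run_pool ~domains:2 (fun x -> x * x) [1; 2; 3; 4] spreads four requests over two domains and sums the responses.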

and also some maybe less reasonable uses:

  • writing some code that mixes Lwt with regular old-fashioned, just-Unix blocking calls, and passing it off to a separate domain so it doesn’t block your main scheduler
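
As a rough illustration of that last use, here is what handing blocking work to another domain looks like with the standard library alone. The names blocking_work and offload are hypothetical, and a CPU-bound function stands in for a blocking Unix call. Note that Domain.join itself blocks the caller; a multi-scheduler Lwt would presumably hand back a promise instead.

```ocaml
(* Stand-in for blocking, non-Lwt code (e.g., a long Unix call). *)
let blocking_work n =
  let rec fib k = if k < 2 then k else fib (k - 1) + fib (k - 2) in
  fib n

(* Run [f x] in a fresh domain; the caller stays free until it joins. *)
let offload f x = Domain.spawn (fun () -> f x)

let () =
  let pending = offload blocking_work 25 in
  (* ... the main domain could keep running its scheduler here ... *)
  let result = Domain.join pending in
  Printf.printf "fib 25 = %d\n" result
```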

What parts of Lwt would you expect to be safe to share freely amongst domains? What parts would you expect to have safety checks?

Currently, the WIP version allows you to attach callbacks to promises regardless of which domain they were created in (it is safe to “read” any promise). This means that promises are not attached to a particular domain, but callbacks are.

OTOH, no effort is made to prevent data races on wakeners (it is unsafe to “write” the same promise from two different domains). In most cases, Lwt.wakeup (and friends) should only be used to create new Lwt-friendly abstractions (e.g., lache), so there is little reason for them to move across domains. Still, is that something that should be taken into account?
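
To give an idea of what a safety check on the “write” side could look like, here is a sketch of a write-once cell built on Atomic from the standard library. This is not how Lwt represents promises, and the names state, cell, create, try_resolve and count_winners are invented for the example. It only shows that a single compare-and-set is enough to guarantee one winner when several domains race to resolve the same cell.

```ocaml
type 'a state = Pending | Resolved of 'a

type 'a cell = 'a state Atomic.t

let create () : 'a cell = Atomic.make Pending

(* Returns true for the single winning writer, false for every other one.
   [Pending] is a constant constructor, so the physical-equality test done
   by compare_and_set is reliable here. *)
let try_resolve (cell : 'a cell) v =
  Atomic.compare_and_set cell Pending (Resolved v)

(* Race [n] domains on the same cell and count how many writes succeeded. *)
let count_winners n =
  let cell = create () in
  let domains =
    List.init n (fun i -> Domain.spawn (fun () -> try_resolve cell i))
  in
  List.length (List.filter Fun.id (List.map Domain.join domains))
```

The cost is one atomic operation per resolution, which is the kind of single-domain overhead that would need to be weighed against the extra safety.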

More generally, should the domain-safe abstractions replace the existing ones (e.g., should Lwt_stream make it safe to push/read from the same stream in parallel) or should there be new domain-safe abstractions (e.g., an additional Lwt_stream_par)? And what performance cost for single-domain programs is acceptable in order to make multi-domain programs safe?

Is one scheduler per-domain the right granularity/abstraction?

Would it be better to offer one scheduler per thread? Or is there little point in offering an unlimited number of schedulers to begin with?


Finally, I want to give quick thanks to @c-cube for off-band discussions about the possible futures of Lwt and to @ygrek for the highly technical conversation that led to the prototype.
