[ANN] Lwt.6.0.0~beta (direct-style, multi-domain parallelism)

After some feedback and some work, I’m happy to announce the release of lwt.6.0.0~beta00 and lwt_direct.6.0.0~beta00! (2 packages from ocsigen/lwt at 6.0.0~beta00 by raphael-proust · Pull Request #28558 · ocaml/opam-repository · GitHub)

The major changes are:

  • direct-style mode (see the alpha release announcement): use await : 'a Lwt.t -> 'a to turn any promise into a plain value. This lets you break out of the monad, which makes it possible to use libraries that were previously incompatible with Lwt.
  • multi-domain support: run separate schedulers in separate domains (some of the more advanced functions on Lwt are now domain-dependent, e.g., run_in_main becomes run_in_domain and takes one additional parameter)
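
To make the direct-style point concrete, here is a small sketch contrasting the two styles. It assumes the lwt, lwt.unix and lwt_direct packages are installed and that Lwt_direct.spawn returns a promise for the result of its body; the function names delayed_double_monadic and delayed_double_direct are made up for illustration.

```ocaml
(* Assumes the lwt, lwt.unix and lwt_direct packages (6.0.0~beta00 or later).
   Both function names below are hypothetical. *)

(* Monadic style: the continuation is passed to bind explicitly. *)
let delayed_double_monadic (x : int) : int Lwt.t =
  Lwt.bind (Lwt_unix.sleep 0.01) (fun () -> Lwt.return (2 * x))

(* Direct style: await suspends the enclosing spawn until the promise
   resolves, then execution continues with the next statement. *)
let delayed_double_direct (x : int) : int Lwt.t =
  Lwt_direct.spawn (fun () ->
    Lwt_direct.await (Lwt_unix.sleep 0.01);
    2 * x)

let () =
  let a = Lwt_main.run (delayed_double_monadic 21) in
  let b = Lwt_main.run (delayed_double_direct 21) in
  Printf.printf "%d %d\n" a b
```

Both functions return an ordinary int Lwt.t, so callers do not need to know which style was used inside.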

Feedback is very welcome. Happy beta-testing and good luck with the parallel-programming!


What are the constraints for this to work correctly? I assume Lwt is not thread-safe, so the 'a Lwt.t must have been created in the same thread and domain that calls Lwt.await?

For communicating between threads, Lwt had Lwt_unix.make_notification. Does that continue to be safe for all aspects of concurrency/parallelism (both threads and domains)?

Is there another (perhaps more efficient) way to wait on Lwt results between different domains using the new multi-domain support?

I don’t know whether it is a compiler bug or an Lwt bug (I tried both OCaml 5.3.0 and 5.4+beta2), but I have a reproducible SEGV in the multidomain testcase. I tweaked the testcase a bit to make it more reproducible, and now it reproduces in the CI too: Testcase for Lwt 6.0.0-beta00 segfault in multidomain by edwintorok · Pull Request #1080 · ocsigen/lwt · GitHub.
I tried capturing the SEGV under rr (with -c 10000, or --chaos), but no luck so far.


As far as I can tell, a 'a Lwt.t can still only be manipulated (>>=, on_xxx callbacks, await, etc.) from the thread it was created in, because promises are still not thread-safe.

I only know of make_notification. In my moonpool/lwt adapter I use the notifications (and hooks) to run a unit -> unit inside the main Lwt thread, including scheduling new promises, awaiting them, etc. I think it’d be nice if Lwt could use picos underneath, but I’m not sure it’ll be compatible enough. And it’s a lot of work.
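
For readers who haven't used notifications, here is a minimal sketch of the pattern, assuming the Lwt 5.x signatures (Lwt_unix.make_notification : ?once:bool -> (unit -> unit) -> int and Lwt_unix.send_notification : int -> unit) and the threads.posix library. The helper name compute_in_thread is hypothetical, and exceptions raised by the worker are not handled.

```ocaml
(* Assumes lwt.unix and threads.posix, with Lwt 5.x signatures.
   compute_in_thread is a hypothetical helper, not part of Lwt. *)
let compute_in_thread (work : unit -> 'a) : 'a Lwt.t =
  let promise, resolver = Lwt.wait () in
  (* The worker thread only writes to this slot; it never touches the promise. *)
  let slot = Atomic.make None in
  let id =
    Lwt_unix.make_notification ~once:true (fun () ->
      (* This callback runs inside the Lwt main loop, so resolving is safe. *)
      match Atomic.get slot with
      | Some v -> Lwt.wakeup resolver v
      | None -> ())
  in
  let _thread : Thread.t =
    Thread.create
      (fun () ->
        Atomic.set slot (Some (work ()));
        Lwt_unix.send_notification id)
      ()
  in
  promise

let () =
  let v = Lwt_main.run (compute_in_thread (fun () -> 6 * 7)) in
  Printf.printf "%d\n" v
```

The point of the design is that the promise is created and resolved on the Lwt side only; the other thread just stores a value and sends the notification.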

The make_notification function takes a domain as an optional parameter, defaulting to the current domain. You can do cross-domain notifications by explicitly setting this parameter.

The more direct way to synchronise between different domains is to just use the promise across domains just like a value. The internals of the promise have been updated:

  • if you call bind/on_*/etc. on a promise which originated from the same domain, it registers a normal callback as before,
  • if you call bind/on_*/etc. on a promise which originated from a different domain, it registers a callback which uses the aforementioned notification mechanism in order to execute the code in the current domain.

This is possible because promises have separate components for their “just data” part and their “concurrency/synchronisation” part. (There’s some kind of complicated metaphor involving particle/wave duality or something in there…)

If you want to see examples of this in action, the test test/multidomain/basic.ml creates two promises in the entry domain, and then spawns two domains, each responsible for waking up one of the promises and each needing to wait on the other promise. There are binds (via let*) that cross domain boundaries. There are wakeups that cross domain boundaries.


The await mechanism relies on an effect handler which is installed via Lwt_direct.spawn. The constraint is that you must call await from within the body of that spawn. (Where “within the body” excludes moving across threads/domains/callstacks.)
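
To illustrate why await only works inside the body of a spawn, here is a toy model built with the OCaml 5 standard library only. It is not Lwt's implementation: the promise type and the spawn, await and resolve functions below are simplified stand-ins written for this sketch.

```ocaml
(* OCaml 5 standard library only. A toy model, not Lwt's implementation. *)
open Effect
open Effect.Deep

(* A toy promise: resolved, or pending with a list of waiting callbacks. *)
type 'a state = Resolved of 'a | Pending of ('a -> unit) list
type 'a promise = 'a state ref

type _ Effect.t += Await : 'a promise -> 'a Effect.t

(* await performs an effect; it needs a handler somewhere up the call stack. *)
let await p = perform (Await p)

let resolve p v =
  match !p with
  | Resolved _ -> invalid_arg "already resolved"
  | Pending waiters ->
    p := Resolved v;
    List.iter (fun k -> k v) (List.rev waiters)

(* spawn installs the handler, so await is only valid underneath it. *)
let spawn (body : unit -> 'a) : 'a promise =
  let result = ref (Pending []) in
  match_with body ()
    { retc = (fun v -> resolve result v);
      exnc = raise;
      effc = (fun (type b) (eff : b Effect.t) ->
        match eff with
        | Await p ->
          Some (fun (k : (b, unit) continuation) ->
            match !p with
            | Resolved v -> continue k v
            (* Not ready yet: park the continuation on the promise. *)
            | Pending waiters -> p := Pending (continue k :: waiters))
        | _ -> None) };
  result

let () =
  let p : int promise = ref (Pending []) in
  let q = spawn (fun () -> await p + 1) in
  (* The body is suspended at await until p is resolved. *)
  assert (match !q with Pending _ -> true | Resolved _ -> false);
  resolve p 41;
  assert (match !q with Resolved 42 -> true | _ -> false)
```

In this model, calling await with no enclosing spawn raises Effect.Unhandled, because no handler is installed on the current stack. The same reasoning explains the restriction on moving across threads, domains or call stacks.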

Thank you! I observed that SEGV but didn’t manage to reproduce it reliably. It’ll be helpful to have a repro case :slight_smile:

lwt.6.0.0-beta01 has been released!

With this release comes a change in the title of this thread:

- [ANN] Lwt.6.0.0~beta (direct-style, multi-domain parallelism)
+ [ANN] Lwt.6.0.0~beta (direct-style, runtime-event tracing)

This is likely the last beta before the release of Lwt.6.0.0, so please test and share your feedback. The highlights are:

  • (compared to previous beta) no more multidomain-multischeduler parallelism
    • it was too buggy,
    • you can still use Lwt_domain
  • (compared to previous beta) runtime-events produce a trace of execution of your lwt program for better debugging
  • (compared to Lwt.5.9) direct-style with Lwt_direct
    • you can write direct-style lwt (within a given scope)

    • e.g., you can interact with libraries that only provide iter : ('a -> unit) -> 'a -> unit such as

      let iter_s f h =
        Lwt_direct.spawn @@ fun () ->
          Hashtbl.iter (fun k v -> Lwt_direct.await (f k v)) h
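
A possible way to call iter_s, as a sketch: it assumes the lwt, lwt.unix and lwt_direct packages, repeats the definition so the snippet is self-contained, and uses a made-up table and accumulator.

```ocaml
(* Assumes lwt, lwt.unix and lwt_direct (6.0.0~beta01 or later). *)
let iter_s (f : 'k -> 'v -> unit Lwt.t) (h : ('k, 'v) Hashtbl.t) : unit Lwt.t =
  Lwt_direct.spawn @@ fun () ->
    Hashtbl.iter (fun k v -> Lwt_direct.await (f k v)) h

(* Hypothetical usage: sum the values, yielding to the scheduler each time. *)
let sum_values (h : (string, int) Hashtbl.t) : int =
  let total = ref 0 in
  Lwt_main.run
    (iter_s
       (fun _key v ->
         Lwt.bind (Lwt.pause ()) (fun () ->
           total := !total + v;
           Lwt.return_unit))
       h);
  !total

let () =
  let h = Hashtbl.create 4 in
  Hashtbl.replace h "a" 1;
  Hashtbl.replace h "b" 2;
  Printf.printf "%d\n" (sum_values h)
```

Hashtbl.iter knows nothing about Lwt here; each call to the callback simply suspends at await until the promise returned by f resolves.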
      

Once again, thanks to @c-cube for the direct-style feature which makes it possible to use Lwt in conjunction with libraries even if they don’t include special amenities for it.

Thanks again to @edwin for the bug report on multi-scheduler-related failures.


It’s so great that I can’t wait for this major release! :sunny:

Do you mean that having one scheduler per domain will never be implemented?

The short answer is: it might be implemented in the future, but

  • no promises, no roadmap
  • it’s not worth holding up the direct-style work (allowing use of lwt alongside direct-style libraries) so it’s better to release those working features now
  • it’s not worth holding up the rest of the contributions that have been proposed as pull-requests/issues from being merged

The long answer is:
I think the basic idea is workable. The idea was that the state of a promise is kept inside the domain that allocated it, while other domains can attach callbacks to that promise; the callbacks are dispatched across domains so that each one is executed by the domain that attached it.
The difficulties are:

  • the volume of changes needed is large; maybe some preliminary refactoring would make this easier, possibly @gasche's work on eliminating uses of Obj.magic
  • the primary way that the Lwt scheduler communicates with other schedulers is through the notification system, which is quite basic
  • there’s not a lot of support in the ecosystem for this kind of work; having to build the underlying libraries adds to the complexity of the whole project
    • e.g., no way to query the domain-id from the C side (or maybe there is and I couldn’t find it?)

One can access both kinds of domain ids (sequential/dense and unique/sparse) from the C side. (I’m happy to help on more specific questions about this if/when you have them.)


As a preview for what the tracing gets you, here’s the trace captured by the examples/tracing/ code viewed in the perfetto trace viewer.

  1. Spans for the different promises created by let%lwt and other syntax extension nodes (as spans) and ticks of the scheduler (as carets).
  2. Details about the selected span with code location.
  3. Number of unresolved jobs (i.e., system calls and more generally C-side asynchronous IO).
  4. Number of unresolved pause promises.

Do you have a specification of the format used here? I’m really interested in a “good format” which does not enforce particular types of values.

The program that produces a perfetto-compatible file here seems to be using the JSON-based chrome tracing format, but if you’re looking for a decent (binary) tracing format that’s also supported by perfetto there’s the fuchsia trace format.

Lwt doesn’t specify a format for the traces. Lwt emits Runtime_events and exposes the payload types. It’s the program consuming the events which can then translate it to the actual tracing format.
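
As a rough illustration of the consumer side, here is a sketch that uses only the standard Runtime_events API for user events (OCaml 5.1 or later, linked with the runtime_events library). The event name "demo.tick" and the tag Demo_tick are hypothetical stand-ins; they are not the events that Lwt registers, and a real consumer would match on the tags and payload types that Lwt exposes.

```ocaml
(* Needs OCaml 5.1 or later, linked with the runtime_events library.
   "demo.tick" and Demo_tick are hypothetical; they are not Lwt's events. *)
type Runtime_events.User.tag += Demo_tick

let demo_tick : int Runtime_events.User.t =
  Runtime_events.User.register "demo.tick" Demo_tick Runtime_events.Type.int

(* Emit two events, then read them back from this process's own ring buffer. *)
let collect_demo_ticks () : int list =
  Runtime_events.start ();
  let cursor = Runtime_events.create_cursor None in
  let seen = ref [] in
  let callbacks =
    Runtime_events.Callbacks.create ()
    |> Runtime_events.Callbacks.add_user_event Runtime_events.Type.int
         (fun _ring_id _timestamp event value ->
           (* Dispatch on the tag; a converter would emit its own format here. *)
           match Runtime_events.User.tag event with
           | Demo_tick -> seen := value :: !seen
           | _ -> ())
  in
  Runtime_events.User.write demo_tick 1;
  Runtime_events.User.write demo_tick 2;
  ignore (Runtime_events.read_poll cursor callbacks None : int);
  Runtime_events.free_cursor cursor;
  List.rev !seen
```

In practice the consumer is usually a separate process that opens a cursor on the monitored program, but reading within the same process keeps the sketch short.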

In the example shown above it is done by the tailgate program. You can use whatever format you want, or even plug in some TUI for live monitoring.
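
For instance, a consumer targeting the JSON-based Chrome tracing format could render each span as a "complete" event. The sketch below uses the standard library only; the span record is a made-up type for illustration (not a Lwt payload type), the pid and tid values are placeholders, and the string escaping only covers quotes and backslashes.

```ocaml
(* Standard library only. The span record is hypothetical, not a Lwt type. *)
type span = { name : string; start_us : int; duration_us : int }

(* Escape quotes and backslashes only; control characters are not handled. *)
let json_string (s : string) : string =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* One "complete" event (ph = "X"); ts and dur are in microseconds. *)
let chrome_event { name; start_us; duration_us } : string =
  Printf.sprintf {|{"name":%s,"ph":"X","ts":%d,"dur":%d,"pid":1,"tid":1}|}
    (json_string name) start_us duration_us

(* The JSON array form of a trace: a list of events. *)
let chrome_trace (spans : span list) : string =
  "[" ^ String.concat "," (List.map chrome_event spans) ^ "]"

let () =
  print_endline
    (chrome_trace [ { name = "bind"; start_us = 10; duration_us = 5 } ])
```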

There’s a second example which doesn’t render the programs at all but instead monitors the events to detect suspicious pauses between scheduler ticks. lwt/examples/stall_detection/stallerlib.ml at lwt-6 · ocsigen/lwt · GitHub
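
The underlying idea can be sketched in a few lines. This is not the code of stallerlib.ml: the function name stalls and its inputs are hypothetical, and it works on a plain list of tick timestamps rather than on live events.

```ocaml
(* Standard library only; an illustration of the idea, not stallerlib.ml. *)

(* Given scheduler tick timestamps in seconds (ascending), return every gap
   longer than the threshold as a pair (start of gap, length of gap). *)
let stalls ~(threshold : float) (ticks : float list) : (float * float) list =
  let rec go acc = function
    | t0 :: (t1 :: _ as rest) ->
      let gap = t1 -. t0 in
      go (if gap > threshold then (t0, gap) :: acc else acc) rest
    | _ -> List.rev acc
  in
  go [] ticks

let () =
  stalls ~threshold:0.5 [ 0.0; 0.25; 1.25; 1.5 ]
  |> List.iter (fun (start, gap) ->
       Printf.printf "stall of %.2fs starting at %.2fs\n" gap start)
```

A live monitor would apply the same comparison incrementally, keeping only the timestamp of the previous tick.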
