Affect: Composable concurrency primitives for OCaml 5.0

It would be interesting to think about how this could be integrated with domainslib’s tasks (domainslib/task.mli at master in ocaml-multicore/domainslib on GitHub).
E.g. it should be possible to have a fiber scheduler running on top of Domainslib tasks, right?
That should provide both concurrency (for IO bound tasks) and parallelism (for CPU-bound tasks), without the user having to deal with how to (safely) integrate the two, i.e. it should be possible to spawn a fiber on one Domain, and wait for its result from another Domain without race conditions.
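To make the "wait for a result from another Domain" part concrete, here is a minimal sketch using only the OCaml 5 standard library (`Domain`, `Mutex`, `Condition`). The `promise` type and the `create`, `fulfill` and `await` names are hypothetical and not taken from Affect or domainslib. Note that this `await` blocks the whole domain, whereas a real fiber scheduler would presumably suspend only the waiting fiber.

```ocaml
(* Hypothetical cross-domain promise; not Affect's or domainslib's API. *)
type 'a promise = {
  mutex : Mutex.t;
  filled : Condition.t;
  mutable value : 'a option;
}

let create () =
  { mutex = Mutex.create (); filled = Condition.create (); value = None }

(* Store the result and wake up every waiter. *)
let fulfill p v =
  Mutex.lock p.mutex;
  p.value <- Some v;
  Condition.broadcast p.filled;
  Mutex.unlock p.mutex

(* Block the calling domain until the result is available. *)
let await p =
  Mutex.lock p.mutex;
  let rec wait () =
    match p.value with
    | Some v -> v
    | None -> Condition.wait p.filled p.mutex; wait ()
  in
  let v = wait () in
  Mutex.unlock p.mutex;
  v

(* One domain produces the value, another one waits for it. *)
let result =
  let p = create () in
  let producer = Domain.spawn (fun () -> fulfill p (6 * 7)) in
  let consumer = Domain.spawn (fun () -> await p) in
  Domain.join producer;
  Domain.join consumer
```

All access to `value` happens under the mutex, so it does not matter whether the producer or the consumer runs first.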

For that to work well, it might be useful to have a way to mark certain fibers as primarily IO-bound and to have at least one domainslib worker dedicated to running them. Reacting to IO as soon as results become available is important for low latency, and lots of CPU-bound tasks should not be able to flood the domainslib workers and starve the IO-bound fibers.
Perhaps this could take the form of optional attributes that are interpreted by the scheduler (e.g. one may also want to specify fiber affinity to a domain, or assign a priority to fibers, etc.). That way affect doesn’t have to deal with the complexity around those semantics (like avoiding priority inversion): it would be entirely up to the scheduler which attributes it defines, and a very simple scheduler could provide no attributes at all.
I’d suggest:

type fiber_attribute = ..
val spawn : ?finally:(unit -> unit) -> ?attribute:fiber_attribute -> (unit -> 'a) -> 'a t

(Not a list on purpose: this lets the scheduler define which attributes are valid and how to compose them. If it wants, it can extend the type with a list-like constructor, but the scheduler gets to choose the most efficient representation.)
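As a rough illustration of how a scheduler might extend and interpret such an open type, here is a self-contained sketch. The constructors (`Io_bound`, `Priority`, `Affinity`, `Many`), the `describe` function and this stand-in `spawn` are all hypothetical: the stand-in just runs the function immediately and returns a label next to the result, instead of creating a fiber.

```ocaml
(* Hypothetical sketch: none of these names are part of Affect's actual API. *)

(* The library only declares an open (extensible) type. *)
type fiber_attribute = ..

(* A scheduler extends it with the attributes it understands. *)
type fiber_attribute +=
  | Io_bound
  | Priority of int
  | Affinity of int               (* index of a domain *)
  | Many of fiber_attribute list  (* opt-in list-like composition *)

(* The scheduler interprets the attributes it knows; anything else,
   e.g. an attribute defined by another scheduler, is reported as unknown. *)
let rec describe : fiber_attribute -> string = function
  | Io_bound -> "io-bound"
  | Priority p -> Printf.sprintf "priority=%d" p
  | Affinity d -> Printf.sprintf "affinity=%d" d
  | Many attrs -> String.concat "," (List.map describe attrs)
  | _ -> "unknown"

(* Stand-in for spawn: runs [f] right away and returns the attribute label
   together with the result, only to show how the optional argument flows. *)
let spawn ?(finally = fun () -> ()) ?attribute f =
  let label =
    match attribute with
    | None -> "default"
    | Some a -> describe a
  in
  Fun.protect ~finally (fun () -> (label, f ()))
```

A very simple scheduler would just never match on the attribute, which is the "no attributes at all" case.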

I like that one can provide a custom scheduler. It would be interesting to think about how one can compose schedulers too, e.g.:

  • a scheduler for tracing (to be used when you want to debug a deadlock or understand a performance issue, for example), similar in concept to mirage-profile. One would only “pay the price” for tracing when this custom scheduler is active.
  • a scheduler that maintains certain metrics (e.g. using the metrics package or something else)
  • a scheduler that is meant to be integrated with a fuzzer to try to find data races in user code (e.g. it could base its yield decisions on input provided by the fuzzer)
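To illustrate the fuzzer idea, here is a toy round-robin-style scheduler built directly on OCaml 5 effect handlers, where a list of integers stands in for the fuzzer input and decides which runnable fiber goes next. The `Yield` effect and the `run` and `interleaving` functions are made up for this sketch and say nothing about how Affect schedulers are actually written.

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effect used by fibers to give up control. *)
type _ Effect.t += Yield : unit Effect.t

(* Run [fibers] to completion. At each scheduling point the next integer of
   [choices] (standing in for fuzzer input) selects which runnable fiber
   goes next; once the input is exhausted, the first runnable fiber is used. *)
let run ~choices fibers =
  let runnable = ref [] in
  let choices = ref choices in
  let next_choice n =
    match !choices with
    | [] -> 0
    | c :: rest -> choices := rest; ((c mod n) + n) mod n
  in
  (* Wrap a fiber so that Yield parks its continuation at the back of the queue. *)
  let start f () =
    try_with f ()
      { effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, _) continuation) ->
              runnable := !runnable @ [ (fun () -> continue k ()) ])
          | _ -> None) }
  in
  runnable := List.map start fibers;
  let rec loop () =
    match !runnable with
    | [] -> ()
    | rs ->
      let i = next_choice (List.length rs) in
      runnable := List.filteri (fun j _ -> j <> i) rs;
      (List.nth rs i) ();
      loop ()
  in
  loop ()

(* Two fibers that each log, yield once, and log again. *)
let interleaving choices =
  let trace = ref [] in
  let fiber name () =
    trace := (name ^ "1") :: !trace;
    perform Yield;
    trace := (name ^ "2") :: !trace
  in
  run ~choices [ fiber "a"; fiber "b" ];
  List.rev !trace
```

Different inputs give different interleavings of the same two fibers, which is the property a fuzzer would exploit: `interleaving [0; 0; 0; 0]` alternates between the fibers, while `interleaving [1; 1]` runs fiber "b" to completion first.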

Can one “reraise” effects from within the scheduler to build a hierarchy of schedulers? Does the hierarchy have to be static or can it be changed dynamically? (e.g. you may want to interpose a “tracing” scheduler for the duration of your fiber and any fibers spawned from it, but sibling fibers should use the default scheduler instead)
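At the level of plain OCaml 5 effect handlers, at least, "reraising" seems possible: a handler can perform the same effect again, and it is then handled by the next enclosing handler. Below is a small sketch with a hypothetical `Yield` effect and two hypothetical layers, `run_default` and `with_tracing`. Whether Affect's scheduler interface allows this kind of layering is exactly the open question.

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effect; stands in for whatever a scheduler handles. *)
type _ Effect.t += Yield : string -> unit Effect.t

let log : string list ref = ref []

(* Default "scheduler": records the Yield and resumes the fiber at once. *)
let run_default f =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Yield name ->
          Some (fun (k : (a, _) continuation) ->
            log := ("default:" ^ name) :: !log;
            continue k ())
        | _ -> None) }

(* Tracing layer: observes the Yield, then performs it again so that the
   enclosing handler does the actual scheduling. *)
let with_tracing f =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Yield name ->
          Some (fun (k : (a, _) continuation) ->
            log := ("trace:" ^ name) :: !log;
            continue k (perform (Yield name)))
        | _ -> None) }

(* Only the middle part of the computation is traced. *)
let () =
  run_default (fun () ->
    perform (Yield "a");
    with_tracing (fun () -> perform (Yield "b"));
    perform (Yield "c"))
```

Here the hierarchy is dynamic in the sense that `with_tracing` is installed only around one sub-computation, and the surrounding code still goes straight to the default handler. One caveat: if the outer scheduler is the one that runs newly spawned fibers, the tracing layer would presumably also have to intercept the spawn effect and wrap the child, otherwise children would escape the tracing handler.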
