domainslib provides such an abstraction: domainslib 0.5.1 (latest) · OCaml Package.
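For illustration, here is a minimal sketch of what using a domainslib task pool can look like. It assumes the domainslib package is installed; `sum_squares` and the pool size are hypothetical choices made for this example, not anything from XAPI.

```ocaml
(* Sketch only: requires the domainslib package. *)
module T = Domainslib.Task

(* Hypothetical workload: sum of i*i for i in 0 .. n-1, split across the pool. *)
let sum_squares pool n =
  T.run pool (fun () ->
      T.parallel_for_reduce ~start:0 ~finish:(n - 1)
        ~body:(fun i -> i * i)
        pool ( + ) 0)

let () =
  (* num_domains is the number of extra worker domains; the calling domain
     also takes part in the work, so stay at or below the core count. *)
  let extra = max 0 (Domain.recommended_domain_count () - 1) in
  let pool = T.setup_pool ~num_domains:extra () in
  let total = sum_squares pool 1_000 in
  T.teardown_pool pool;
  Printf.printf "sum of squares: %d\n" total
```

The pool is created once and reused, so the cost of spawning domains is paid up front rather than per task.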
Moonpool (to pick one of the OCaml 5 effect schedulers, but there are others too) also provides a useful abstraction: GitHub - c-cube/moonpool: Commodity thread pools and concurrency primitives for OCaml 5. Quoting its README:
This fixed pool of domains is shared between all the pools in moonpool. The rationale is that we should not have more domains than cores, so it’s easier to pre-allocate exactly that many domains, and run more flexible thread pools on top.
Yes, that is one of the problems that XAPI’s set of daemons has too: there are multiple daemons all running inside a VM that communicate through JSON/XML serialized messages. Moving each process to be its own domain may avoid some of that overhead but introduce other problems; I haven’t done the measurements yet.
Although if a design other than STW had been chosen for the GC, I think we might not have been able to move to OCaml 5 at all with XAPI. I think the current design choice in OCaml 5 is good for backward compat/gradual upgrade scenarios: you only pay the cost of possible data races or these performance issues if you start using Domains; if not, everything is as before with Threads.
IIUC, OCaml 5 multicore performance is currently in between OCaml 4 threads and OCaml 4 separate processes.
It is much faster than running workloads in separate threads in OCaml 4, because now at least some threads can truly run in parallel, even if they have to occasionally synchronise with each other. That is still a lot better than what we had before, where only one OCaml thread could run at a time and every other OCaml thread was blocked.
It is also slower than OCaml 4 with separate processes when those processes had an efficient way to share work (i.e. with no or little synchronization and with no or little data sharing).
If your OCaml 4 processes didn’t have an efficient way of sharing data, then it is less clear whether using OCaml 5 domains would be beneficial, and it depends a lot on the workload. My hope is that it might be beneficial for the kind of workload we have in XAPI, where delays, protocol and serialization overhead between oxenstored, xenopsd and XAPI can cause minute-long delays in the current threaded model. All of that can be avoided, because components that are part of the same process can be given direct read-only access to immutable data structures without any copying.
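To make the "no copying" point concrete, here is a small sketch using only the OCaml 5 standard library. The `vm_state` map and `count_running` are hypothetical stand-ins for state that would otherwise be serialized and sent between daemons; this is not how XAPI is structured today.

```ocaml
(* Sketch only: OCaml 5 standard library, no external packages. *)
module StringMap = Map.Make (String)

(* Hypothetical immutable state shared by all domains. *)
let vm_state : string StringMap.t =
  StringMap.empty
  |> StringMap.add "vm-1" "running"
  |> StringMap.add "vm-2" "halted"
  |> StringMap.add "vm-3" "running"

(* Pure read-only traversal: safe to run from several domains at once
   because the map is never mutated. *)
let count_running (m : string StringMap.t) =
  StringMap.fold (fun _ st n -> if st = "running" then n + 1 else n) m 0

let () =
  (* Each closure refers to the same map on the shared heap: nothing is
     copied or serialized when the domain is spawned. *)
  let readers =
    List.init 2 (fun _ -> Domain.spawn (fun () -> count_running vm_state))
  in
  let results = List.map Domain.join readers in
  List.iter (Printf.printf "running: %d\n") results
```

This only works without synchronization because the data is immutable; any mutable state shared this way would need a Mutex or Atomic.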
And it is only a starting point; I hope that the performance of OCaml 5 can be improved in the future!
Maybe we’ll need a collection of real workloads that are currently affected by OCaml 5 STW delays (microbenchmarks are great of course, but they may not capture all aspects of real applications), and @talex5's is certainly one of those real workloads.