I looked a bit into the kind of fiber abstraction and concurrency structure I would like to use with the new tools OCaml 5.0 is going to offer. You can find some results in affect’s Fiber module.
This fiber abstraction supports terminating by returning values or abnormally (by aborting or via a spurious exception). Termination of a fiber is aligned on function scopes: all the fibers spawned by a fiber function have to terminate in order for it to terminate.
This means that if your fiber returns a value it waits for its spawns to terminate (in any way) before returning the value. And if your fiber returns abnormally (uncaught exception or explicit abort) it first aborts all its non-terminated spawns before returning abnormally; this provides affect’s notion of cancellation.
Explicit fiber aborts raise the Abort exception in fibers. Combined with a disciplined use of Fun.protect and an optional finally handler specified at fiber spawn, this lets them release the resources they may hold when it’s time to say goodbye.
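To make the Fun.protect discipline concrete, here is a minimal sketch that does not use affect at all. The `Abort` exception below is a local stand-in declared only for illustration, and `worker`, `released` and `outcome` are hypothetical names; the point is just that the `~finally` function runs even when the body is interrupted by an exception.

```ocaml
(* Stand-in for the library's abort exception; declared locally so the
   sketch is self-contained. *)
exception Abort

(* Tracks whether the pretend resource has been released. *)
let released = ref false

let worker () =
  let acquire () = released := false in
  let release () = released := true in
  acquire ();
  (* [release] runs whether the body returns or raises. *)
  Fun.protect ~finally:release (fun () ->
      (* Simulate the fiber being aborted while holding the resource. *)
      raise Abort)

let outcome =
  match worker () with
  | () -> "returned"
  | exception Abort -> "aborted"
```

After this runs, `outcome` is `"aborted"` and `!released` is `true`: the exception still propagates, but only after the cleanup has happened.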
The module also provides a generic way of blocking and unblocking fibers that you can use to interface with your favourite event loop. It does so without requiring you to fiddle with effects: you just need to make judicious use of Fiber.block and provide a suitable function to Fiber.run's built-in scheduler to let it know about fibers that can be unblocked.
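The general shape of "block, then let the scheduler ask an external function what can be unblocked" can be sketched with plain OCaml 5 effects. This is not affect's API: the `Block` effect, `block`, `run`, `poll` and the integer event ids are all assumptions made up for the illustration, and the scheduler below handles a single fiber only.

```ocaml
(* Requires OCaml >= 5.0. Hypothetical effect: block until the event
   identified by the integer is reported ready. *)
type _ Effect.t += Block : int -> unit Effect.t

let block id = Effect.perform (Block id)

(* [run ~poll main] runs [main]. Each time it blocks, its continuation is
   parked; [poll ()] is then asked which event ids are ready and the
   matching continuations are resumed. *)
let run ~(poll : unit -> int list) (main : unit -> unit) : unit =
  let blocked : (int, (unit, unit) Effect.Deep.continuation) Hashtbl.t =
    Hashtbl.create 8
  in
  let handler =
    { Effect.Deep.effc = fun (type b) (eff : b Effect.t) ->
        match eff with
        | Block id ->
            Some (fun (k : (b, unit) Effect.Deep.continuation) ->
                Hashtbl.replace blocked id k)
        | _ -> None }
  in
  Effect.Deep.try_with main () handler;
  while Hashtbl.length blocked > 0 do
    List.iter
      (fun id ->
        match Hashtbl.find_opt blocked id with
        | Some k ->
            Hashtbl.remove blocked id;
            Effect.Deep.continue k ()
        | None -> ())
      (poll ())
  done

(* Toy "event loop": reports whatever ids were registered as pending. *)
let pending = ref []
let poll () = let ready = !pending in pending := []; ready

let trace = Buffer.create 16

let () =
  run ~poll (fun () ->
      Buffer.add_string trace "a";
      pending := [ 1 ];   (* pretend event 1 becomes ready later *)
      block 1;
      Buffer.add_string trace "b")
```

A real integration would have `poll` wrap something like select(2) or an SDL event pump, and would need to cope with the case where nothing ever becomes ready.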
A grab bag of comments:
The first goal of affect is to seek a concurrency and abort structure that are easy to understand, use and compose with event loops. Right now some efficiency and implementation aspects need to be improved. This will likely change the exposed set of primitive effects which doesn’t feel exactly right yet (if you want to build your own scheduler).
I use abort rather than cancel terminology. From my non-native English speaker perspective, cancelling is more about not doing something that was planned but didn’t happen yet. Aborting is more about stopping something that is going on. It also melds better with the uncaught exception case.
Say no to unit soups! Let fibers return values.
At this point I don’t feel the need to add a promise/future abstraction to the toolbox. The whole point of direct style is to get rid of this async madness.
There’s no synchronisation structure yet. Semaphores are always useful for throttling so I’ll certainly add that at some point or a more fundamental primitive like an mvar.
The Funix module has a few fiber-friendly Unix module functions for playing with timers and the network, see ping.ml for an example of use. In practice you want to be able to use something other than select(2) though. There are various ways one could go about this, see for example point 6. in these design notes.
mouse.ml has a basic example of interfacing with the SDL event loop, which gives another illustration of how to connect Fiber to an event loop.
I’m not fully convinced by everything yet. It will certainly need one or two more design rounds. If you try it, feel free to comment or make suggestions on the issue tracker.
It would be interesting to think about how this could be integrated with domainslib’s tasks (see domainslib/task.mli in ocaml-multicore/domainslib on GitHub).
E.g. it should be possible to have a fiber scheduler running on top of Domainslib tasks, right?
That should provide both concurrency (for IO bound tasks) and parallelism (for CPU-bound tasks), without the user having to deal with how to (safely) integrate the two, i.e. it should be possible to spawn a fiber on one Domain, and wait for its result from another Domain without race conditions.
For that to work well it might be useful to have a way to mark certain fibers as primarily IO bound and to have at least 1 domainslib worker dedicated to running these (for low latency it’ll be important to react to IO as soon as results become available; we don’t want lots of CPU-bound tasks flooding the domainslib workers and starving the IO-bound fibers).
Perhaps this could be in the form of optional attributes that can be interpreted by a scheduler (e.g. one may also want to specify fiber affinity to a domain, or have a priority assigned to fibers, etc.), that way affect doesn’t have to deal with the complexity around those semantics (like avoiding priority inversion): it’ll be entirely up to the scheduler what attributes it’ll define, and a very simple scheduler could provide no attributes at all.
I’d suggest:
type fiber_attribute = ..
val spawn : ?finally:( unit -> unit ) -> ?attribute:fiber_attribute -> ( unit -> 'a ) -> 'a t
(not a list on purpose: this allows the scheduler to define what attributes are valid and how to compose them. If it wants, it can extend the type with a list-like constructor, but we can let the scheduler choose the most efficient representation for it)
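As a rough sketch of how a scheduler could use such an open type: the constructors `Io_bound`, `Priority` and `Many` and the helper functions below are hypothetical, chosen only to illustrate the extension mechanism, including the list-like constructor mentioned above.

```ocaml
(* The open type that the fiber library would expose. *)
type fiber_attribute = ..

(* A hypothetical scheduler extends it with the attributes it understands,
   including a list-like constructor for combining several of them. *)
type fiber_attribute +=
  | Io_bound
  | Priority of int
  | Many of fiber_attribute list

(* First priority found, if any. Attributes from other schedulers fall
   into the wildcard case and are simply ignored. *)
let rec priority_of (a : fiber_attribute) : int option =
  match a with
  | Priority p -> Some p
  | Many attrs -> List.find_map priority_of attrs
  | _ -> None

let rec is_io_bound (a : fiber_attribute) : bool =
  match a with
  | Io_bound -> true
  | Many attrs -> List.exists is_io_bound attrs
  | _ -> false

let example = Many [ Io_bound; Priority 3 ]
```

Because the type is open, every match needs a wildcard case, which is also what lets a very simple scheduler ignore all attributes.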
I like that one can provide a custom scheduler, it would be interesting to think how one can compose schedulers too, e.g.:
a scheduler for tracing (to be used when you want to debug a deadlock or understand a performance issue for example) - similarly in concept to mirage-profile. One would only “pay the price” for tracing when this custom scheduler is active.
a scheduler that maintains certain metrics (e.g. using the metrics package or something else)
a scheduler that is meant to be integrated with a fuzzer to try to find data races in user code (e.g. it could base its yield decisions on input provided by the fuzzer)
Can one “reraise” effects from within the scheduler to build a hierarchy of schedulers? Does the hierarchy have to be static or can it be changed dynamically? (e.g. you may want to interpose a “tracing” scheduler for the duration of your fiber and any fibers spawned from it, but sibling fibers should use the default scheduler instead)
Would it be nicer to use a polymorphic variant as the return type? Without looking at the documentation, it’s pretty unclear what None, Some None, etc encode.
This was already mentioned to me but I’m not very fond of the idea. I think if you contrast with join or other return functions it’s not that unclear:
val poll : 'a t -> 'a option option
val join : 'a t -> 'a option
An option is the natural type you’d give to the poll function of an event loop, and why prevent us from using all the nice functions of the Option module on that result?
(I wouldn’t mind defining type 'a Fiber.ret = 'a option if people really feel strongly about that and use that instead of 'a option where appropriate)
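To compare the two encodings side by side, here is a small sketch that works on plain values rather than on fibers. It assumes a particular reading of the nested option (outer `None` for a fiber that has not terminated, `Some None` for abnormal termination, `Some (Some v)` for a returned value); that reading, the `status` type and all function names are assumptions for illustration.

```ocaml
(* Assumed reading of a poll result:
   None          -> not terminated yet
   Some None     -> terminated abnormally
   Some (Some v) -> returned v *)
let describe_poll (to_string : 'a -> string) (r : 'a option option) : string =
  match r with
  | None -> "still running"
  | Some None -> "terminated abnormally"
  | Some (Some v) -> "returned " ^ to_string v

(* The polymorphic variant alternative, as a conversion from the option
   encoding. *)
type 'a status = [ `Running | `Aborted | `Returned of 'a ]

let status_of_poll : 'a option option -> 'a status = function
  | None -> `Running
  | Some None -> `Aborted
  | Some (Some v) -> `Returned v

(* With options, the Option module applies directly. *)
let value_or ~default (r : 'a option option) : 'a =
  Option.value (Option.join r) ~default
```

The variant version documents itself at the use site, while the option version composes with `Option.join`, `Option.value`, `Option.map` and friends without any conversion.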
Makes sense from a compositionality and consistency point of view, so I think it’s a trade-off between those properties vs readability of code for people who are not deeply familiar with the API. Unclear which is more important.
I bet you can certainly do funny business by intercepting effects and re-perform them for someone above.
However, at least as it stands for now in affect, it’s the duty of schedulers to maintain state (which fiber value is current, which fiber value is blocked etc.). So I’m a bit doubtful about a notion of “hierarchy of schedulers”, as you’d need them to cooperate on that state, which I preferred to leave unspecified and which is going to be scheduler specific.
There’s a tension between having the Fiber module maintain strong invariants and leaving schedulers (resp. clients trying to be smart) more room for interpretation (resp. for screwing up).
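For what it’s worth, the mechanics of intercepting an effect and re-performing it for an enclosing handler look roughly like this with OCaml 5’s Effect module. The `Yield` effect, `with_tracing` and `with_scheduler` are hypothetical names for this sketch and have nothing to do with affect’s actual effects; note also that the sketch sidesteps the shared-state question entirely, since the outer handler only counts.

```ocaml
(* Requires OCaml >= 5.0. Hypothetical effect standing in for a
   scheduler primitive. *)
type _ Effect.t += Yield : unit Effect.t

let log : string list ref = ref []
let yields = ref 0

(* Inner handler: records the Yield, then re-performs it. The handler body
   runs outside of this handler, so the re-performed effect reaches the
   enclosing one. *)
let with_tracing (f : unit -> 'a) : 'a =
  Effect.Deep.try_with f ()
    { Effect.Deep.effc = fun (type b) (eff : b Effect.t) ->
        match eff with
        | Yield ->
            Some (fun (k : (b, _) Effect.Deep.continuation) ->
                log := "traced yield" :: !log;
                Effect.Deep.continue k (Effect.perform Yield))
        | _ -> None }

(* Outer handler: the "real" scheduler, here just counting yields. *)
let with_scheduler (f : unit -> 'a) : 'a =
  Effect.Deep.try_with f ()
    { Effect.Deep.effc = fun (type b) (eff : b Effect.t) ->
        match eff with
        | Yield ->
            Some (fun (k : (b, _) Effect.Deep.continuation) ->
                incr yields;
                Effect.Deep.continue k ())
        | _ -> None }

let result =
  with_scheduler (fun () ->
      Effect.perform Yield;                        (* not traced *)
      let x =
        with_tracing (fun () -> Effect.perform Yield; 1)  (* traced *)
      in
      Effect.perform Yield;                        (* not traced *)
      x + 1)
```

Here the tracing layer is installed dynamically, for the extent of one function call only, which is the easy case. The hard part remains the one described above: two layers that both want to own the fiber state.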
For now the module only tries to maintain the termination discipline and you can actually build a Spawn effect that wouldn’t have it. We can of course make it impossible for a client to build a spawn value without going through the module; I didn’t bother at this point.
Initially I had state for the “current” fiber in the module. One direct effect (yes) of removing it is that the payload of Abort' had to move from E.t to E.t option. It’s also no longer possible to give a simple implementation of Fiber.self_id as is mentioned in point 3. of the design notes. It’s a bit unfortunate but somehow I still like that different parts of my program can have a call to Fiber.run without them interfering.