How to "block" in an agnostic way?

My problem is that you stated that the boolean is “true if the task is successfully resumed and false if the task is cancelled”. I don’t understand from your explanation how the implementation can ensure this property. Is that option 3, as suggested by @gasche? Or is it something else?

It is easier to discuss this concretely with some working code. I’ll try to extend the MVar with cancellation in the next few days to capture what I mean.

I read the paper this morning. Remarks:

  • The core data structure is an unbounded queue that corresponds to Eio’s Stream, without a take_nonblocking operation. Producers write values into the queue, and consumers write suspended computations. Synchronization data structures are then built on top of this core queue.

  • For example, you can define a Mutex on top of this, where unlocking the mutex either returns directly (if there are no waiters) or “produces” a unit value, and locking the mutex either returns directly (if the lock is not held) or consumes the next value; see the sketch after this list.

  • It’s interesting that the core component corresponds to Eio’s (unbounded) “Stream” (where producers and consumers put and take concurrently) rather than Eio’s “Waiters” (where you can only put if there is already a waiter), even though Waiters are omnipresent in Eio’s synchronization data structures. They emphasize that this allows producers to run ahead of consumers, which is important for efficiency: when a consumer finds a result already available, it can continue running without suspending itself. But in practice, most examples they give only put/resume when we know for sure that there is already a consumer.

  • They distinguish two different APIs for cancellation: a “simple” pull-based API where resuming a consumer reports whether it had been cancelled (your 'a -> bool proposal), and a “smart” push-based API where the data-structure libraries (on top of the core queue) provide callbacks to update themselves as soon as a cancellation becomes known.

  • The “smart” cancellation mode is complex! Basically, it deals in a generic way with the idea of “if all consumers get cancelled, I need to push my value back into the data structure”.

  • There is an example where the “simple” cancellation mode is not good enough: readers-writer locks. Such a lock can be held either by many readers together or by a single writer. To respect ordering properties, if a writer is waiting on the lock, readers that arrive later must wait behind the writer. Now suppose a set of readers R1 holds the lock, a writer W is waiting on it, and a set of readers R2 is waiting behind W. You want the “smart” cancellation API to learn eagerly that W got cancelled, so that R2 can be added to the reading set immediately. With the “simple” API, the lock instead waits until the readers in R1 are done and have released the lock, then tries to resume W, notices that it has been cancelled, and only then gives access to R2.

  • Two comments on their treatment of futures/fibers:

    • They point out that it’s important to erase references to fibers that get cancelled, to avoid memory leaks where the GC retains values reachable from them for too long. I wonder whether OCaml concurrency libraries (Eio for example) are careful about this.
    • They do not give control back to cancelled fibers to clean up their resources. (Maybe in Kotlin it is usual to have resources handled by the GC using finalizers?) I wonder whether this is something that would be handled separately, or should be taken into account in these algorithms.
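To make the first two bullets concrete, here is a toy, lock-protected sketch of such a core queue, with a mutex built on top. The paper’s actual data structure (CQS) is lock-free and far more subtle; all names below are mine, and the resume-returns-bool convention models the “simple” cancellation API from the fourth bullet.

type 'a cell =
  | Value of 'a              (* a producer ran ahead of all consumers *)
  | Waiter of ('a -> bool)   (* resume function: false if cancelled *)

(* Invariant: the queue holds only Values or only Waiters. *)
type 'a core = { q : 'a cell Queue.t; m : Mutex.t }

let create () = { q = Queue.create (); m = Mutex.create () }

(* Producer side: resume waiters until one accepts the value,
   otherwise run ahead by storing the value for a future consumer. *)
let rec put t v =
  Mutex.lock t.m;
  match Queue.peek_opt t.q with
  | Some (Waiter _) ->
      let k =
        match Queue.pop t.q with Waiter k -> k | Value _ -> assert false
      in
      Mutex.unlock t.m;
      if not (k v) then put t v   (* the waiter was cancelled: try the next *)
  | Some (Value _) | None ->
      Queue.push (Value v) t.q;
      Mutex.unlock t.m

(* Consumer side: take a ready value immediately, or suspend by
   enqueuing a resume function. *)
let take t k =
  Mutex.lock t.m;
  match Queue.peek_opt t.q with
  | Some (Value _) ->
      let v =
        match Queue.pop t.q with Value v -> v | Waiter _ -> assert false
      in
      Mutex.unlock t.m;
      ignore (k v : bool)
  | Some (Waiter _) | None ->
      Queue.push (Waiter k) t.q;
      Mutex.unlock t.m

(* The mutex from the second bullet: [state] counts lock requests;
   unlock "produces" a unit permit, lock "consumes" one. *)
type mu = { permits : unit core; state : int Atomic.t }

let lock mu k =
  if Atomic.fetch_and_add mu.state 1 = 0
  then ignore (k () : bool)   (* the lock was free: enter directly *)
  else take mu.permits k      (* suspend until a permit is produced *)

let unlock mu =
  if Atomic.fetch_and_add mu.state (-1) > 1
  then put mu.permits ()      (* hand the lock to the next waiter *)

Note how “simple” cancellation behaves here: a cancelled waiter just sits in the queue until a producer reaches it, which is exactly the lazy behaviour that the readers-writer lock example complains about.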

CML has a similar problem, where a given continuation may be referenced from multiple channels due to selective communication. When the continuation is resumed from one of the blocked places, the other references must be removed to avoid a space leak.

CML solves it with a simple trick. Rather than storing the continuation directly in each of the locations, it stores a reference that points to the continuation. This reference is nulled out when the continuation is resumed. When subsequent channel operations observe that the continuation is gone, they skip over the entry.
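A sketch of that trick (all names are mine): each channel entry holds a shared slot; the first resumption claims the continuation and empties the slot, and later operations that find it empty skip the entry.

(* Some k = still blocked; None = already resumed or cancelled. *)
type 'a slot = ('a -> unit) option Atomic.t

let resume (s : 'a slot) (v : 'a) =
  match Atomic.exchange s None with
  | Some k -> k v   (* we claimed the continuation: run it *)
  | None -> ()      (* resumed through another channel: skip this entry *)

All the channels involved in a selective communication share the same slot, so resuming through one of them automatically invalidates the entries seen by the others.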

When tasks are cancelled, CQS removes the entry from the lock-free doubly-linked list. This is nice, but the underlying data structure is more complicated. Is cancellation so pervasive that leaving the blocked entries (but not the continuations) hanging in the queue is a problem?


(For context: CQS (CancellableQueueSynchronizer) is the name of the “core queue data structure” from the arXiv paper.)

I’m not sure either. I find the readers-writer lock example a more convincing reason for having a smart/eager cancellation API. (But even for this example there may be a simpler approach, for example checking whether the waiting writer has been cancelled whenever a new reader subscribes.)

Note that Waiters is an internal API, not exposed to users of Eio. It requires a mutex when used across domains, and I’d be happy to replace it with the algorithm from this paper (though it would need a bit of thinking about).

Yes. Waiters uses a doubly-linked list so that cancelled waiters can be removed promptly. There are several places in Lwt where cancelled callbacks can’t be removed (e.g. switches, timers) and can pile up if you’re not careful. I haven’t found that to be a big problem with Lwt in practice, though; you can leak a lot of memory on a modern computer without causing a noticeable problem! But I like not having to think about it.


I think it would be good to avoid using effects directly here. I doubt the specific effect needs to be baked into the stdlib, and doing so will risk backwards compatibility issues as the use of effects in the language evolves.

Rather than having a version of Lazy.force that performs Suspend if the value is already being forced, I think you can provide a function like:

val force_or_register :
  'a Lazy.t -> (unit -> 'b) -> ('b -> 'a -> unit) -> ('a, 'b) Either.t

such that force_or_register l create write either returns the result of forcing l directly, or calls create, registers write to be called with the result of create and the value of l once l has finished being forced, and returns the result of create.

I think an interface like this is enough to implement whatever form of suspension that a scheduler wants.

As an example, you could use the following to handle lazy values shared with other domains within Async:

let force l =
  match Lazy.force_or_register l Ivar.create Ivar.fill with
  | Either.Left x -> Deferred.return x   (* l was already forced *)
  | Either.Right i -> Ivar.read i        (* wait for the registered fill *)

FWIW my understanding of cancellation systems is that they invariably end up being the most complicated part of a system and that they are never used enough to justify this complexity. So I’d be pretty cautious about trying to build one into part of the stdlib.


On further thought, I think an interface like this might be better:

type 'a lazy_result =
  | Value : 'a -> 'a lazy_result
  | Suspend : (('a -> unit) -> unit) -> 'a lazy_result

val force_or_suspend : 'a Lazy.t -> 'a lazy_result

The important part is that any proposed effect-based interface can trivially be replaced by a non-effect one that returns ordinary data. This can easily be used with Async:

let force l =
  match force_or_suspend l with
  | Value x -> Deferred.return x   (* already forced *)
  | Suspend register ->
      (* being forced elsewhere: wait on an ivar filled at completion *)
      let i = Ivar.create () in
      register (fun x -> Ivar.fill i x);
      Ivar.read i

or using the Threads library:

let force l =
  match force_or_suspend l with
  | Value x -> x
  | Suspend register ->
      let r = ref None in
      (* the semaphore starts at 0, so [acquire] blocks until [release] *)
      let s = Semaphore.Binary.make false in
      register (fun x -> r := Some x; Semaphore.Binary.release s);
      Semaphore.Binary.acquire s;
      Option.get !r

or by something based on effects:

(* assuming an effect declared elsewhere, e.g.
   type _ Effect.t += Suspend : (('a -> unit) -> unit) -> 'a Effect.t *)
let force l =
  match force_or_suspend l with
  | Value x -> x
  | Suspend register -> Effect.perform (Suspend register)

In fact, I think it would be a good idea to add such functions to all the blocking primitives in the Threads library and move those versions of the primitives into the Stdlib, leaving only the blocking versions in the actual Threads library. That would allow code using different schedulers to interoperate with code using threads: a single mutex could be locked by blocking the current thread, by creating a Deferred.t that will run once the lock is held, or by suspending the current Eio fiber. (One possible shape for such a primitive is sketched below.) I guess that might require changing some of the implementations if they currently rely on the implementations provided by pthreads.
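For instance, the non-blocking variant of Mutex.lock could follow the same lazy_result pattern. This is only a hedged sketch; the names lock_result and lock_or_suspend are hypothetical:

type lock_result =
  | Locked                               (* acquired without blocking *)
  | Suspend of ((unit -> unit) -> unit)  (* register a wake-up callback *)

val lock_or_suspend : Mutex.t -> lock_result

A thread-based caller would block on a semaphore in the callback, an Async caller would fill an Ivar.t, and an effect-based scheduler would perform its suspend effect, exactly as in the three force examples above.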

Similarly, it would be good to get e.g. mvars and ivars into the stdlib without the actual blocking aspect, so that they can be shared by all the existing and future concurrency libraries.
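An ivar stripped of blocking could look something like this (a hypothetical module, not an existing stdlib API):

module Ivar : sig
  type 'a t
  val create : unit -> 'a t
  val fill : 'a t -> 'a -> unit
  (* [read_or_register t k] returns [Some v] if [t] is already filled,
     and otherwise registers [k] to be called once it is. *)
  val read_or_register : 'a t -> ('a -> unit) -> 'a option
end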


I like your second proposal.

I’m not sure it’s better than using a builtin Suspend effect. The rationale for your desire to avoid effects is to avoid compatibility issues once we make effects typed. But we’re kind of screwed here anyway if libraries like Eio use effects and start getting wide usage. (Besides, the kind of “breakage” arising from “now I see type errors in blocking code!” may actually be a good thing rather than a bad thing.)

Your proposal offloads the complexity to users of the library: it is much easier to use for implementers of other concurrency layers than for direct users. In contrast, the Suspend proposal keeps a very simple API (we could even use just force with the usual type). One downside, if the standard library provides a default handler for it that blocks the domain, is that the failure mode for users of dedicated concurrency libraries who forgot to set up their handlers is much less obvious – it could go completely unnoticed.
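For concreteness, here is a minimal sketch of what the builtin-effect alternative could look like. The effect name Suspend and this toy handler are hypothetical, and a real scheduler would enqueue the continuation on its run queue rather than resuming it directly from the wake-up callback:

open Effect.Deep

type _ Effect.t += Suspend : (('a -> unit) -> unit) -> 'a Effect.t

let run (main : unit -> unit) =
  try_with main ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Suspend register ->
            (* park the fiber; [register]'s callback resumes it later *)
            Some (fun (k : (a, _) continuation) ->
              register (fun v -> continue k v))
        | _ -> None) }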

I think that we could start to build steam around the “complex API for advanced users, no effects” design; it’s easy to add an effect on top later if we want to.


Note: at the start of the thread (no pun intended) I mentioned ( How to "block" in an agnostic way? - #2 by gasche ) that an alternative could be to just rely on the Threads module. It’s clunkier, but possibly much simpler, and I’m not sure that there would be a performance cost for concurrency libraries built on top: presumably they only spawn threads when they notice that the current threads are all blocked (except for the tick thread or something), and such blocking only comes from cross-domain or cross-library interactions, so it should be the rare case. I wonder if people have feedback on this – maybe it cannot work for an obvious reason?