How to "block" in an agnostic way?

The semantics are the same as Set or Map but the performance is different. So you only need to be careful if you want it to run quickly. You don’t need to be careful for safety reasons as you must with a hash table.

1 Like

Note: I think that “persistent interface over mutable hashtables” is a rather different discussion from “How to block in an agnostic way”, and Discuss doesn’t support threading so well, so I would encourage you to create a new thread to discuss this.

4 Likes

My current understanding of cancellation, thanks to other thread ( Understanding cancellation (in eio) ) is that it only comes into play if we want to use Suspend to expose low-level asynchronous operations, but not in the case where we use Suspend to wait on a computation happening somewhere else. I think that the latter is a very common use-case for “agnostic blocking”, and it can use your simpler Suspend effect. (Even a cancellation-aware scheduler can use the simple Suspend in this case.)

If we want to later provide a scheduler-agnostic way to implement low-level asynchronous operations with cancellation, we would possibly want a second effect with a different interface, but the users of the first effect would need no change.

1 Like

No, I just picked a bad example! Low-level operations can use a private effect and do whatever they like, but agnostic blocking needs cancellation too. Examples we have so far:

  • Moving Eio’s Promise, Stream and Semaphore to stdlib and using them with other schedulers. All of these need to support cancellation.

  • Communicating with domainslib. We should certainly be able to stop waiting for a domainslib Chan.t, for example. Presumably domainslib itself will support cancellation at some point, and in that case we should let it know about the cancellation too.

The kind of operations that might not need to support cancellation are things like unlinking a file, where the OS might provide no way to stop the operation once it has been submitted. In that case you just have to wait for it to complete.

When we cancel a task/fibre, how quickly do we want to let the synchronization structures know? For example, if a task that is currently blocked on an MVar is cancelled, do we want the MVar to be able to skip over it, and when? Does it remove the task from the MVar at the point of cancellation (strict), or lazily when the MVar attempts to resume the task. For example, the lazy semantics can be obtained by changing the interface of the scheduler to

(* Returns [true] if the task is successfully resumed. [false] if the task was cancelled. *) 
type 'a resumer = 'a -> bool 
type _ eff += Suspend : ('a resumer -> unit) -> 'a eff

Unclear how to do strict removal. If the underlying structures are implemented using lock-free libraries, then the design of the interface should be expressive enough to not restrict where the linearization point will be.

This paper looks quite interesting for this discussion: https://arxiv.org/pdf/2111.12682.pdf.

1 Like

I think “allowing users the freedom to develop their own concurrency libraries” is a big mistake because it can lead to users actually developing their own concurrency libraries. Imagine for a moment what it would be like if we had two incompatible async libraries?

2 Likes

The semantics issue that @talex5 is pointing out, I think, is the following: there is a difference between suspending on shared result and suspending on a unique resource.

  • If I suspend to wait until a lazy computation is available to all its waiters, cancellation is just a question of performance (if my suspension callback gets called but I was invalidated, the scheduler is going to try to invoke me later only to find that it is useless).
  • But if I suspend until a resource is available that I’m planning to consume, for example “take the next Stream element and give it to me”, then the producer I’m suspending onto needs to learn that I was cancelled for correctess reason, not just efficiency. The resource they produced needs to be given to another waiter, or kept, if I was cancelled.

(Again we see a transactional expectation: either the resource is consumed and I run, or I was cancelled and no resource is wasted.)

How does that work? A long time might pass between the moment the resumer function is called and the moment the execution is actually resumed, which means that the task might be cancelled during that period. This opens three choices:

  1. The boolean is only an approximation, for optimization purpose.
  2. The resumer function does not return until the task has actually resumed.
  3. Tasks cannot be cancelled if they are already waiting to be resumed.

I feel like options 2 and 3 are bad ideas, which only leaves option 1. Is there some other option I could have missed?

I believe that (3) is the only sensible option here, at least if you want to provide transactional guarantees (from the suspender side: either the resource we were waiting for was produced, and we run, or cancellation happened and the resource was not produced / not wasted.) I tried to discuss this in Understanding cancellation (in eio) - #16 by gasche

1 Like

I don’t see why this is a problem. We may expect to reason about cancellation using linearizability. The cancellation of a task may take effect before the MVar put or get operation by that task. In this case, the MVar operation by the task should not be matched. Or the cancellation occurs after the MVar operation is matched. In this case, the operation will be added to the scheduler queue to resume it. This case will be handled no differently than handling the cancellation of a task which is currently in the scheduler queue.

We may even ascribe weaker semantics for cancellation that the cancelled tasks are eventually reaped. In this case, the pending MVar operation of a cancelled task may be matched. This semantics is weaker than linearizability and may potentially lead to better performance.

1 Like

My problem is that you stated that the boolean is “true if the task is successfully resumed and false if the task is cancelled”. I don’t understand from your explanation how the implementation can ensure this property. Is that option 3, as suggested by @gasche? Or is it something else?

It is easier to discuss this concretely with some working code. I’ll try to extend the MVar with cancellation in the next few days to capture what I mean.

I read the paper this morning. Remarks:

  • The core data structure is an unbounded queue that corresponds to Eio’s Stream, without a take_nonblocking operation. Producers write values into the queue, and consumers write suspended computations. Synchronized data structures are then built on top of this core queue.

  • For example, you can define a Mutex on top of this, where unlocking the mutex (returns directly, if there are no waiters, or) “produces” a unit value, and locking the mutex (returns directly, if the lock is not held, or) consumes the next value.

  • It’s interesting that the core component corresponds to Eio (unbounded) “Stream” (where producers and consumers put and take concurrently) rather than Eio “Waiters” (where you can only put if there is already a waiter), while Waiters are omnipresent in Eio’s synchronized data structures. They insist that this allows producers to run ahead of consumers, which is important for efficiency: when the consumer finds a result already available, they can continue running without suspending themselves. But in practice most examples they give only put/resume when we know for sure that there is already a consumer.

  • They distinguish two different APIs for cancellations, a “simple” pull-based API where consumers tell if they were cancelled (your 'a -> bool proposal), and a “smart” push-based API where the data-structure libraries (on top of the core queue) provide callbacks to modify themselves as soon as cancellation is learned.

  • The “smart” cancellation mode is complex! Basically it deals with the idea of “if all consumers get cancelled, I need to push my value back into the datastructure” in a generic way.

  • There is an example where the “simple” cancellation mode is not good enough, with readers-writers lock. It’s a lock that can be held by either many readers together, or a single writer. To respect ordering properties, if a writer is waiting on the lock, then other readers trying to get read access need to wait on the writer first. In the case where a set of readers R1 is holding the lock, a writer W is waiting on it, and a set of readers R2 are waiting on W, you want to use the “smart” cancellation API to learn eagerly that W gets cancelled, to be able to add R2 to the reading set. With the “simple” API, what happens is that the lock waits until the R1 are done and released the lock, then it tries W, notices that it has been cancelled, and gives access to the R2.

  • Two comments on their treatment of futures/fibers:

    • They point out that it’s important to erase references to fibers that get cancelled, to avoid data leaks where the GC retains their reachable values for too long. I wonder if OCaml concurrency libraries (Eio for example) are careful about this?
    • They are not giving back control to cancelled fibers to cleanup their resources. (Maybe in Kotlin it’s usual to have resources handled by the GC using finalizers?) I wonder if this is something that would be handled separately, or should be taken into account in these algorithms.
2 Likes

CML has a similar problem, where a given continuation may be referred from multiple channels due to selective communication. When the continuation is resumed from one of the blocked places, the other references must be removed to avoid space leak.

CML solves it with a simple trick. Rather than store the continuation directly in each of the locations, it stores a reference that points to the continuation. This reference is null-ed when the continuation is resumed. When subsequent channel operations observe that the continuation is gone, they skip over them.

When tasks are cancelled, CQS removes the entry from the lock-free doubly-linked list. This is nice, but the underlying data structure is more complicated. Is cancellation so pervasive that leaving the blocks (but not the continuations) hanging in the queue a problem?

1 Like

(For context: CQS (CancellableQueueSynchronizer) is the name of the “core queue data structure” from the arXiv paper.)

I’m not sure either. I find the example of readers-writers lock a more convincing reason for having a smart/eager cancellation API. (But even for this example, there may be a simpler approach, for example checking when a new reader subscribes whether the writers have been cancelled.)

Note that Waiters is an internal API, not exposed to users of Eio. It requires a mutex when used across domains, and I’d be happy to replace it with the algorithm from this paper (though it would need a bit of thinking about).

Yes. Waiters uses a doubly-linked list so that cancelled waiters can be removed promptly. There are several places in Lwt where cancelled callbacks can’t be removed (e.g. switches, timers) and can pile up if you’re not careful. Though I haven’t found that to be a big problem with Lwt in practise; you can leak a lot of memory on a modern computer without causing a noticeable problem! But I like not having to think about that.

2 Likes

I think it would be good to avoid using effects directly here. I doubt the specific effect needs to be baked into the stdlib, and doing so will risk backwards compatibility issues as the use of effects in the language evolves.

Rather than having a version of Lazy.force that performs Suspend if the value is already being forced, I think you can provide a function like:

val force_or_register :
  'a Lazy.t -> (unit -> 'b) -> ('b -> 'a -> unit) -> ('a, 'b) either

such that force_or_register l create write either returns the result of forcing l or it returns the result of calling create and registers write to be called with the result of create and the value from l when l has finished being forced.

I think an interface like this is enough to implement whatever form of suspension that a scheduler wants.

As an example, you could use the following to handle lazy values shared with other domains within Async:

let force l =
  match Lazy.force_or_register l Ivar.create Ivar.fill with
  | Left x -> Deferred.return x
  | Right i -> Ivar.read i

FWIW my understanding of cancellation systems is that they invariably end up being the most complicated part of a system and that they are never used enough to justify this complexity. So I’d be pretty cautious about trying to build one into part of the stdlib.

1 Like

On further thought, I think an interface like this might be better:

type 'a lazy_result =
  | Value : 'a -> 'a lazy_result
  | Suspend : (('a -> unit) -> unit) -> 'a lazy_result

val force_or_suspend : 'a Lazy.t -> 'a lazy_result

The important part is that any proposed effect-based interface can trivially be replaced by a non-effect one that returns ordinary data. This can easily be used with async:

let force l =
  match force_or_suspend l with
  | Value x -> Deferred.return x
  | Suspend register ->
      let i = Ivar.create () in
      register (fun x -> Ivar.fill i x);
      Ivar.read i

or using the Threads library:

let force l =
  match force_or_suspend l with
  | Value x -> x
  | Suspend register ->
      let r = ref None in
      let s = Semaphore.Binary.make false in
      register (fun x -> r := Some x; Semaphore.Binary.release s);
      Semaphore.Binary.acquire s;
      Option.get !r

or by something based on effects:

let force l =
  match force_or_suspend l with
  | Value x -> x
  | Suspend register -> perform (Suspend register)

In fact, I think it would be a good idea to add such functions to all the blocking primitives in the Threads library and move those versions of the primitives into the Stdlib – leaving only the blocking versions in the actual Threads library. That will allow code using different schedulers to interoperate with code using threads. A single mutex can be locked by blocking the current thread, or by creating a Deferred.t that will run once the lock is held, or by suspending the current eio fiber. I guess that might require changing some of the implementations if they are currently relying on the implementations provided by pthreads.

Similarly, it would be good to get e.g mvars and ivars into the stdlib without the actual blocking aspect, so that they can be shared by all the existing and future concurrency libraries.

3 Likes

I like your second proposal.

I’m not sure it’s better than using a builtin Suspend effect. The rationale for your desire to avoid effect is to avoid compatibility issues once we make effects typed. But we’re kind of screwed here anyway if libraries like Eio use effects and start getting wide usage. (Besides, the kind of “breakage” arising from “now I see type errors in blocking code!” maye actually be a good thing rather than a bad thing.)

Your proposal offloads the complexity to users of the library, and is much easier to use by implementers of other concurrency layers than by direct users. In contrast, the Suspend proposal keeps a very simple API (we could even use just force with the usual type); one downside, if the standard library provides a default handler for it that blocks the domain, is that the failure mode for users of dedicated concurrency libraries that forgot to setup their handlers is much less obvious – it could go completely unnoticed.

I think that we could start to build steam around the “complex API for advanced users, no effects” design, it’s easy to add an effect on top if we want to.

1 Like

Note: at the start of the thread (no pun intended) I mentioned ( How to "block" in an agnostic way? - #2 by gasche ) that an alternative could be to just rely on the Threads module. It’s clunkier, but possibly much simpler, and I’m not sure that there would be a performance cost for concurrency libraries on top (presumably they only spawn threads when they notice that the current threads are all blocked (except for the tick thread or something), and such blocking only comes from cross-domain or cross-library interactions so they are the rare case.) I wonder if people have feedback about that – maybe it cannot work for an obvious reason?