[ANN] kcas and kcas_data 0.3.0: Software Transactional Memory

I’m happy to announce that, as of version 0.3.0, kcas can be considered to be a software transactional memory (STM) implementation based on lock-free multi-word compare-and-set (MCAS).

The main feature added in 0.3.0 is the ability to block — in a scheduler friendly manner — awaiting for changes to shared memory locations.

Let’s explore this by writing a short example with the help of MDX. (Yes, I’m actually testing this announcement.)

First we’ll require and open the kcas_data library:

# #require "kcas_data"
# open Kcas_data

kcas_data gives us a number of domain safe and composable data structures and communication and synchronization primitives, such as a Queue and a Promise, for concurrent programming.

Let’s then create a message queue:

# let greeter_queue = Queue.create ()
val greeter_queue : '_weak1 Kcas_data.Queue.t = <abstr>

And spawn a “greeter” domain that responds to messages with a greeting:

# let greeter_domain = Domain.spawn @@ fun () ->
    let rec loop () =
      match Queue.take_blocking greeter_queue with
      | `Close -> ()
      | `Greet (target, resolver) ->
        Promise.resolve resolver (Printf.sprintf "Hello, %s!" target);
        loop ()
    in
    loop ()
val greeter_domain : unit Domain.t = <abstr>

Let’s also create a helper function, greet, to interact with the greeter:

# let greet target =
    let promise, resolver = Promise.create () in
    Queue.add (`Greet (target, resolver)) greeter_queue;
    Promise.await promise
val greet : string -> string = <fun>

Now we can call the greeter, which is running in another domain, as if it was a regular function. So, here is to you:

# greet "fellow concurrent programmer"
- : string = "Hello, fellow concurrent programmer!"

And everyone else:

# greet "the rest of the world"
- : string = "Hello, the rest of the world!"

Let’s not forget to clean up:

# Queue.add `Close greeter_queue
- : unit = ()
# Domain.join greeter_domain
- : unit = ()

The blocking mechanism in kcas does not only work with plain domains and systhreads. It can also work across schedulers such as Eio and Domainslib, which both recently merged support for it and should soon have the necessary support out-of-the-box.

Finally, one might ask — what is the cost of all this?

It turns out that after some careful optimizations, kcas performs pretty much as well as it used to. As a random data point, at the time of writing this, the Queue provided by the latest version of kcas_data can actually be faster than the Michael-Scott queue implementation from the latest version of lockfree:

Kcas_data.Queue : mean = 0.005985, sd = 0.000001 tp=50121937.812194
Lockfree.MSQueue: mean = 0.013976, sd = 0.000001 tp=21465000.358236

Don’t be fooled, however. It is clear that the composability of kcas adds overhead — probably something generally between 1x to 4x in time and space — compared to non-composable lock-free data structures using plain Atomics and it has already been demonstrated that a much faster version of the Michael-Scott queue could be implemented.

Nevertheless, the take home message is that STM could very well be fast enough for your application. The extra nanoseconds are probably not going to be the main bottlenecks in most concurrent programs.

13 Likes

Thanks for the readable example! Also very interesting to read blocking transactions portion especially of GitHub - ocaml-multicore/kcas: STM based on lock-free MCAS !

Can you explain how this is integrated with Eio briefly – I know you have added a link but the discussion there is quite detailed…

Maybe it’s too early for that, but wouldn’t this be a good candidate for
being in the Stdlib? This way various schedulers and libraries could
haev a (pretty minimal) rendez-vous point for blocking, which is really
nice!

This looks like phenomenal work. I’ll definitely be investigating it when we move to start actually using multicore.

Re: inclusion in stdlib, I think it’s great to have and keep these things in userland so as to maximize experimentation and competition among different abstractions, operational controls, and so on. I have some history with Clojure, which shipped with an STM from its earliest days ~2008. Alas, that implementation naturally had its own idiosyncrasies and flaws/tradeoffs, but its presence in the language’s standard library resulted in ~no alternatives being developed in the broader ecosystem, even through to today.

1 Like

What happened with the reagents approach to providing an STM like library? Could you give an impression on how you see people using them? Maybe how they compare to lockfree.

Reagents are largely driven by kcas. kcas is a software solution for executing multiple atomic operations as a single transaction on architectures providing a single-word CAS only. The current implementation of kcas requires k+1 atomic operations for k-location update.

From the project readme it clearly uses KCAS.

I don’t know much about Reagents either. But I would support the use of STM. In some ways STM is simpler to think about in many ways than locks and you could get into more problems with locks. I was introduced to STM in Haskell (where most people are) and it really allows you to do so many things simply.

kcas STM seems very intuitive and approaches Haskell. The tutorial in the README.md is just excellently written – I recommend it.


BTW I was intrigued by the torn reads section in the kcas readme. (It does not affect correctness of the implementation but efficiency, because some transactions would never commit if they had torn reads). I am curious – does the Haskell STM implementation suffer from torn reads? Also how would you compare the kcas STM implementation with the Haskell one. It would be interesting to know because Haskell’s implementation is often hailed as the best STM implementation in a programming language.

I have a further question and suggestion:

Question: OCaml data structures are often mutable or have a ref in them. It might be useful to point out bad interactions between STM and mutability in OCaml. To a large extent mutability is orthogonal to mutability via Loc.t but it would be useful to point out pitfalls and things to remember.

Suggestion I think the library should be called stm. kcas sounds obscure and very technical. Ordinarily I wouldn’t even have paid attention to the library but since your announcement mentioned explicitly STM I explored it (and was rewarded). Even though kcas may be a better description of the library, stm will tell people when to use the library. Of course, there are many primitives (and data strutures) that can be used independently of STM but then experts will know that anyways.

1 Like