I’m happy to announce that, as of version 0.3.0, kcas can be considered a software transactional memory (STM) implementation based on lock-free multi-word compare-and-set (MCAS).
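To give a feel for the underlying STM interface, here is a minimal sketch of an atomic two-location update. The names used below (Loc.make, Loc.get, Xt.get, Xt.set, Xt.commit) reflect my reading of the kcas API around 0.3.0 and may differ in detail:

```ocaml
(* A minimal sketch of the kcas STM interface; the module and function
   names are assumptions about the 0.3.0 API. *)
let account_a = Kcas.Loc.make 100
let account_b = Kcas.Loc.make 0

(* Transfer atomically: other domains can never observe a state where
   the amount has left one location but not arrived at the other. *)
let transfer amount =
  Kcas.Xt.commit
    { tx =
        (fun ~xt ->
          Kcas.Xt.set ~xt account_a (Kcas.Xt.get ~xt account_a - amount);
          Kcas.Xt.set ~xt account_b (Kcas.Xt.get ~xt account_b + amount)) }

let () =
  transfer 42;
  assert (Kcas.Loc.get account_a = 58);
  assert (Kcas.Loc.get account_b = 42)
```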
The main feature added in 0.3.0 is the ability to block, in a scheduler-friendly manner, awaiting changes to shared memory locations.
Let’s explore this by writing a short example with the help of MDX. (Yes, I’m actually testing this announcement.)
First we’ll require and open the kcas_data library:
```ocaml
# #require "kcas_data"
# open Kcas_data
```
kcas_data gives us a number of domain-safe and composable data structures and communication and synchronization primitives, such as a Queue and a Promise, for concurrent programming.
Let’s then create a message queue:
```ocaml
# let greeter_queue = Queue.create ()
val greeter_queue : '_weak1 Kcas_data.Queue.t = <abstr>
```
And spawn a “greeter” domain that responds to messages with a greeting:
```ocaml
# let greeter_domain = Domain.spawn @@ fun () ->
    let rec loop () =
      match Queue.take_blocking greeter_queue with
      | `Close -> ()
      | `Greet (target, resolver) ->
        Promise.resolve resolver (Printf.sprintf "Hello, %s!" target);
        loop ()
    in
    loop ()
val greeter_domain : unit Domain.t = <abstr>
```
Let’s also create a helper function, greet, to interact with the greeter:
```ocaml
# let greet target =
    let promise, resolver = Promise.create () in
    Queue.add (`Greet (target, resolver)) greeter_queue;
    Promise.await promise
val greet : string -> string = <fun>
```
Now we can call the greeter, which is running in another domain, as if it were a regular function. So, here is to you:
```ocaml
# greet "fellow concurrent programmer"
- : string = "Hello, fellow concurrent programmer!"
```
And everyone else:
```ocaml
# greet "the rest of the world"
- : string = "Hello, the rest of the world!"
```
Let’s not forget to clean up:
```ocaml
# Queue.add `Close greeter_queue
- : unit = ()
# Domain.join greeter_domain
- : unit = ()
```
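The point of building on STM is that operations like the ones above can also, as I understand it, be composed into a single atomic transaction via the Xt submodules of the kcas_data structures. Treat Queue.Xt.take_opt, Queue.Xt.add, and Kcas.Xt.commit below as assumptions about the 0.3.0 API:

```ocaml
(* A hedged sketch: atomically move an element between two queues.
   The Xt submodule names are assumptions about the kcas_data API. *)
let q1 : string Kcas_data.Queue.t = Kcas_data.Queue.create ()
let q2 : string Kcas_data.Queue.t = Kcas_data.Queue.create ()

let () = Kcas_data.Queue.add "hello" q1

(* Because both operations run in one transaction, no other domain can
   ever observe the element in both queues, or in neither. *)
let () =
  Kcas.Xt.commit
    { tx =
        (fun ~xt ->
          match Kcas_data.Queue.Xt.take_opt ~xt q1 with
          | Some elem -> Kcas_data.Queue.Xt.add ~xt elem q2
          | None -> ()) }

let () = assert (Kcas_data.Queue.take_opt q2 = Some "hello")
```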
The blocking mechanism in kcas works not only with plain domains and systhreads. It can also work across schedulers such as Eio and Domainslib, both of which recently merged support for it and should soon have the necessary support out-of-the-box.
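As a concrete sketch with plain domains: one domain can block until a shared location satisfies a predicate. Loc.get_as and Retry.unless below are my reading of the kcas API and should be treated as assumptions:

```ocaml
(* A hedged sketch of blocking on a shared memory location; Loc.get_as
   and Retry.unless are assumptions about the kcas 0.3.0 API. *)
let flag = Kcas.Loc.make false

(* The spawned domain blocks, in a scheduler-friendly manner, until the
   flag location holds true. *)
let waiter =
  Domain.spawn @@ fun () ->
  Kcas.Loc.get_as (fun b -> Kcas.Retry.unless b) flag;
  "flag was set"

let () =
  Kcas.Loc.set flag true;
  assert (Domain.join waiter = "flag was set")
```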
Finally, one might ask: what is the cost of all this? It turns out that after some careful optimizations, kcas performs pretty much as well as it used to. As a random data point, at the time of writing this, the Queue provided by the latest version of kcas_data can actually be faster than the Michael-Scott queue implementation from the latest version of lockfree:
```
Kcas_data.Queue : mean = 0.005985, sd = 0.000001 tp=50121937.812194
Lockfree.MSQueue: mean = 0.013976, sd = 0.000001 tp=21465000.358236
```
Don’t be fooled, however. It is clear that the composability of kcas adds overhead, probably somewhere between 1x and 4x in time and space, compared to non-composable lock-free data structures using plain Atomics, and it has already been demonstrated that a much faster version of the Michael-Scott queue can be implemented. Nevertheless, the take-home message is that STM could very well be fast enough for your application. The extra nanoseconds are probably not going to be the main bottleneck in most concurrent programs.