Using `[@poll error]` attribute to implement systhread safe data structures

For a long time OCaml has supported lightweight threads exposed via the Thread module. These threads are often called “systhreads”, but I will simply call them “threads” in this post.

The OCaml Stdlib also provides many mutable data structures such as Hashtbl, Queue, and Stack. As the documentation alerts, none of these are thread-safe.

In this post I will very briefly describe an approach to implementing lock-free thread-safe data structures.

In OCaml 4 and in OCaml 5, within a single Domain, only a single thread may run at a time. In other words, threads do not run in parallel except when they run in different domains in OCaml 5. The OCaml runtime schedules threads and semi pre-emptively switches (within a domain) between threads (created within the domain) during “safe points”. In other words, thread switches cannot happen at arbitrary points — they may only happen at safe points. Memory allocations are safe points. Additional safe points (where no actual memory allocation happens) are inserted into various constructs such as loops.

This means that within a block of code where there are no safe points it is possible to make multiple read and write accesses atomically with respect to threads (within the domain).

How does one ensure that a block of code does not include a safe point?

The OCaml compiler provides an annotation [@poll error] that one can use on a function to ensure that the function does not include a safe point.

IOW, using [@poll error] one can essentially create functions that are executed atomically with respect to threads (within a domain).

With a particular application in mind, I have created a lock-free thread-safe (integer keyed) hash table, thread-table.

As mentioned in the README, the implementation has “zero synchronization overhead on lookups”. Indeed, if you look at the find operation implementation

let find t k' =
  let h = k' in
  let buckets = t.buckets in
  let n = Array.length buckets in
  let i = h land (n - 1) in
  find k' (Array.unsafe_get buckets i)

it includes no synchronization. In this case not even a [@poll error] attribute is needed.

For other operations the thread-table implementation uses functions annotated with the [@poll error] attribute (to make atomic updates) and familiar lock-free programming patterns such as retry loops and cooperation to avoid starvation. As an example, see the add operation implementation:

let[@poll error] add_atomically t buckets n i before after =
  t.rehash = 0 && buckets == t.buckets
  && before == Array.unsafe_get buckets i
  && begin
       Array.unsafe_set buckets i after;
       let length = t.length + 1 in
       t.length <- length;
       if n < length && n < max_buckets_div_2 then t.rehash <- n * 2;

let rec add t k' v' =
  let h = k' in
  maybe_rehash t;
  let buckets = t.buckets in
  let n = Array.length buckets in
  let i = h land (n - 1) in
  let before = Array.unsafe_get buckets i in
  let after = Cons (k', v', before) in
  if not (add_atomically t buckets n i before after) then add t k' v'

Compared to e.g. using a Stdlib Mutex to protect a data structure against concurrent accesses by threads, this sort of lock-free implementation can give better performance (especially for read-only operations) and also allows use of the operations in contexts, such as signal handlers, where locks are not appropriate.

Note that this technique is not sufficient for parallelism-safe implementation of data structures.


Thanks for the write-up! I do not remember someone writing about this before.

This trick is used in JaneStreet’s Nano_mutex and Thread_safe_queue. [@poll error] was in fact motivated by these use-cases (and I am surprised not to see them used in the latest version of JaneStreet’s libraries).

As you note, with multicore OCaml, these data structures should never be shared between different domains, but the technique remains valid and useful for data structures designed to stay on a single domain.

Be careful that [@poll error] is a recent addition (OCaml 4.14). Earlier version of OCaml require attributes to disable inlining (among other things), to avoid that polling points could be added during compilation via code transformations. [@poll error] has the correct semantics in this regard in OCaml 4.14, but earlier OCaml versions will disregard the attribute and potentially produce incorrect code in the absence of additional attributes. Also, I would not recommend trying to do without the [@poll error] attribute, because this is error-prone and requires knowledge of the compiler.

[@poll error] is also inoperant in bytecode (which is trickier because it has more polling locations).

Lastly, it should be noted that [@poll error] is very inexpressive in the kind of code that it accepts. The reasoning-about-polling-locations trick is also used in parts of the stdlib, for which [@poll error] is not expressive enough. I proposed a more expressive attribute to handle those cases, but it was not accepted. There is also a proposal to delay the polling with “masking” during critical sections, at runtime (hence even more expressive).

As far as I understand, the usage of [poll error] starts to be interesting when we start to use a mix of Thread and Domain? For instance, if we allocate only Domains, the usage of an Hashtbl into one (and uniquely one, the Hashtbl is not shared between Domains) is “safe”? Moreover, Mutex still is the best practice (regardless Domain or Thread) to protect an Hashtbl against data-race conditon?

Yes, and also when using only Threads (and no Domains).

One might ask why one would use Threads when we have Domains and effects?

My comment here hopefully provides some ideas where threads could still be very useful. The tl;dr is that threads could be used to allow effects based schedulers to effectively share domains and threads could also be used, in part, to e.g. implement IO in such a way that it becomes scheduler independent. If we do use threads, then it will likely be very useful to be able to implement communication between threads within a domain with as little synchronization as possible.

If you mean the Stdlib Hashtbl, then, yes, it is neither thread-safe nor parallelism-safe and one will need to e.g. use a Mutex to protect it when it might be accessed from multiple threads concurrently or from multiple domains in parallel.

As another currently available alternative, the Kcas library comes with a companion package of lock-free and parallelism-safe (and also thread-safe) data structures including a Hashtbl implementation that is designed to be an almost drop-in replacement for the Stdlib Hashtbl. When used in parallel from multiple domains it should provide better performance than Stdlib Hashtbl protected by a Mutex. It is also composable (read from “But why should you care about composability?”), which can make the implementation of more interesting use cases a breeze compared to the use of non-composable concurrent programming techniques.

Did you consider the idea to integrate that directly into the Stdlib’s Hashtbl?

Yes, and not really.

The API of Stdlib Hashtbl is not designed for concurrent programming. In sequential use cases it will be faster than any parallelism safe implementation that supports the full API.

The reason for mimicking the Stdlib Hashtbl API in Kcas is to allow for easier learning curve and to potentially make it easier to port an existing application to parallelism-safe OCaml 5.

In the future I expect there will be data structures that are designed from the start for concurrent programming and will e.g. avoid or de-emphasize features with inherent sequential bottlenecks (such as maintenance of exact length) and operations with inherent risk of starvation (such as being able to (atomically) insert an arbitrary number of elements to a data structure) and provide fused operations that support common use cases (such as get-or-add).

1 Like