Using `[@poll error]` attribute to implement systhread safe data structures

For a long time OCaml has supported lightweight threads exposed via the Thread module. These threads are often called “systhreads”, but I will simply call them “threads” in this post.

The OCaml Stdlib also provides many mutable data structures such as Hashtbl, Queue, and Stack. As the documentation alerts, none of these are thread-safe.

In this post I will very briefly describe an approach to implementing lock-free thread-safe data structures.

In OCaml 4 and in OCaml 5, within a single Domain, only a single thread may run at a time. In other words, threads do not run in parallel except when they run in different domains in OCaml 5. The OCaml runtime schedules threads and semi pre-emptively switches (within a domain) between threads (created within the domain) during “safe points”. In other words, thread switches cannot happen at arbitrary points — they may only happen at safe points. Memory allocations are safe points. Additional safe points (where no actual memory allocation happens) are inserted into various constructs such as loops.

This means that within a block of code where there are no safe points it is possible to make multiple read and write accesses atomically with respect to threads (within the domain).

How does one ensure that a block of code does not include a safe point?

The OCaml compiler provides an annotation [@poll error] that one can use on a function to ensure that the function does not include a safe point.

IOW, using [@poll error] one can essentially create functions that are executed atomically with respect to threads (within a domain).

With a particular application in mind, I have created a lock-free thread-safe (integer keyed) hash table, thread-table.

As mentioned in the README, the implementation has “zero synchronization overhead on lookups”. Indeed, if you look at the find operation implementation

let find t k' =
  let h = Mix.int k' in
  let buckets = t.buckets in
  let n = Array.length buckets in
  let i = h land (n - 1) in
  find k' (Array.unsafe_get buckets i)

it includes no synchronization. In this case not even a [@poll error] attribute is needed.

For other operations the thread-table implementation uses functions annotated with the [@poll error] attribute (to make atomic updates) and familiar lock-free programming patterns such as retry loops and cooperation to avoid starvation. As an example, see the add operation implementation:

let[@poll error] add_atomically t buckets n i before after =
  t.rehash = 0 && buckets == t.buckets
  && before == Array.unsafe_get buckets i
  && begin
       Array.unsafe_set buckets i after;
       let length = t.length + 1 in
       t.length <- length;
       if n < length && n < max_buckets_div_2 then t.rehash <- n * 2;
       true
     end

let rec add t k' v' =
  let h = Mix.int k' in
  maybe_rehash t;
  let buckets = t.buckets in
  let n = Array.length buckets in
  let i = h land (n - 1) in
  let before = Array.unsafe_get buckets i in
  let after = Cons (k', v', before) in
  if not (add_atomically t buckets n i before after) then add t k' v'

Compared to e.g. using a Stdlib Mutex to protect a data structure against concurrent accesses by threads, this sort of lock-free implementation can give better performance (especially for read-only operations) and also allows use of the operations in contexts, such as signal handlers, where locks are not appropriate.

Note that this technique is not sufficient for parallelism-safe implementation of data structures.

8 Likes

Thanks for the write-up! I do not remember someone writing about this before.

This trick is used in JaneStreet’s Nano_mutex and Thread_safe_queue. [@poll error] was in fact motivated by these use-cases (and I am surprised not to see them used in the latest version of JaneStreet’s libraries).

As you note, with multicore OCaml, these data structures should never be shared between different domains, but the technique remains valid and useful for data structures designed to stay on a single domain.

Be careful that [@poll error] is a recent addition (OCaml 4.14). Earlier version of OCaml require attributes to disable inlining (among other things), to avoid that polling points could be added during compilation via code transformations. [@poll error] has the correct semantics in this regard in OCaml 4.14, but earlier OCaml versions will disregard the attribute and potentially produce incorrect code in the absence of additional attributes. Also, I would not recommend trying to do without the [@poll error] attribute, because this is error-prone and requires knowledge of the compiler.

[@poll error] is also inoperant in bytecode (which is trickier because it has more polling locations).

Lastly, it should be noted that [@poll error] is very inexpressive in the kind of code that it accepts. The reasoning-about-polling-locations trick is also used in parts of the stdlib, for which [@poll error] is not expressive enough. I proposed a more expressive attribute to handle those cases, but it was not accepted. There is also a proposal to delay the polling with “masking” during critical sections, at runtime (hence even more expressive).

As far as I understand, the usage of [poll error] starts to be interesting when we start to use a mix of Thread and Domain? For instance, if we allocate only Domains, the usage of an Hashtbl into one (and uniquely one, the Hashtbl is not shared between Domains) is “safe”? Moreover, Mutex still is the best practice (regardless Domain or Thread) to protect an Hashtbl against data-race conditon?

Did you consider the idea to integrate that directly into the Stdlib’s Hashtbl?