Thread Safety in the Standard Library

https://v2.ocaml.org/api/Hashtbl.html:

Unsynchronized accesses to a hash table may lead to an invalid hash table state. Thus, concurrent accesses to a hash tables must be synchronized (for instance with a Mutex.t).

The above comment about the API of Hashtbl in the standard library warns that it is not thread safe. This will lead programmers to create thread-safe variants, possibly making subtle mistakes. Is there a prospect to make the standard library thread safe to avoid duplicated effort?

(post deleted by author)

2 Likes

The original Java JDK Hashtable was synchronized (hence, thread-safe). Later another “Hashmap” was added, which was unsynchronized. Typically you want both, b/c even with the most efficient synchronization primitives (compare-and-swap) you incur locked bus cycles, cache flushes, or something worse – and those are expensive resources on an SMP or NUMA machine. What might be useful is a version of the standard library that wrappered those datastructures that were thread-unsafe to make them safe.

(post deleted by author)

1 Like

David Bacon and I implemented “flat locks” in the JVM in 1998; that went on to become, IIUC, one of the standard ways of implementing pervasive locks/monitors in GCed languages. And yepper, it’s a lock-free technique for uncontended locking. It steals a word from the heap, so no indirection, no extra memory used in the expected (uncontended) case. You’d expect that if that was nearly free, you’d only have Hashtables, not Hashmaps. And yet, you have both, and the former predate the latter.

1 Like

(post deleted by author)

2 Likes

Part of the reason I disbelieve, is this:

People talk about “multicore” here a lot. But in fact, the thing they’re talking about has a different name: “SMP”. That is to say, “symmetric multiprocessing”, meaning that all memory is symmetrically accessible from all CPUs. “multicore” typically assumes NUMA, and that means there’ll be memory banks that are “near” to each CPU, and the rest are 'far" from that CPU. Even >10yr ago it was common that when running on high-performance multicore systems, you would have to “pin” your (Java) process to a particular CPU socket and “pin” your memory pages to the associated memory bank, in order to get adequate performance. Because the inter-socket bus was both slower, and more-contended.

Replacing a cache-local (or near-memory-bank-local) memory access with one that requires a cache-bus-wide transaction of some sort, is going to be slower, and you could see it in JVM applications back then.

Around here people forget that for all Java’s failings (and oh lordy, it had a lot) Java applications were heavily, heavily multithreaded (and successfully so) long before almost any other language had heavily-multithreaded applications in production. I made a career of shooting “interesting” concurrency bugs that appeared in production systems, and … there were a lot of 'em. Bugs that occurred once every six months (literally, I shit you not), stuff like that. So people’s experiences with what they think of as multithreading, is (to my mind) often not very applicable, b/c they haven’t actually experienced the horrors of really heavily multithreaded applications.

Regardless of the technical feud on performance above, the higher level picture as far as I know is that there is currently no active plan to make the standard library thread-safe as whole.

Partially (beware that I am not an expert on this subject) because thread-safety is at best a middle point: as soon as you have a mutable shared data structure, you have a distributed protocol. And functional correctness of operations on contended mutable data is very dependent on the protocol: consider for instance three concurrent user of a hashtable containing the key set K:

  • user A is deleting key a and adding key b
  • user B is printing all keys

Even with a thread-safe hashtable, we have four behaviours possible:

  1. B prints the key set K
  2. B prints the key set K + b
  3. B prints the key set K - a
  4. B prints the key set K - a + b

But maybe, your protocol needs to exclude the behaviour 2. and 3. . And when implementing this “protocol requirement”, you might end up enforcing exclusive access to the hashmap (typically, if I use a Mutex.t to make the whole set of user A operation atomic), which means that you might use a non thread-safe hashtable anyway.

Thus, the current consensus as far as I understand it to let libraries explore the design space; and probably slowly add “universal” thread-safe data structures to the standard library as an alternative to the thread-unsafe ones.

But then I imagine that if we have cases where intensive complexity analysis and benchmarks show that a thread-safe implementation is always as fast as an unsafe one, we could completly switch that data structure to the thread-safe implementation. As a non-expert I don’t know how likely this is to happen, even if I understand from @polytypic’s description that we are possibly quite close from this goal in some cases.

(Another option might to have Thread_safe variants of the standard library modules, in the same way that we have a Labels variant.)

1 Like

(post deleted by author)

5 Likes

(post deleted by author)