[ANN] ThreadSanitizer support for OCaml 5.0.0 – first public release

We are happy to announce the first release of ThreadSanitizer support for OCaml 5.

Motivation in a nutshell

With OCaml 5 comes parallel programming, and, with it, the possibility of introducing data races. A data race is when two or more threads access the same memory location concurrently, and at least one of the accesses is a write. Data races can lead to particularly hard-to-debug problems.

ThreadSanitizer (TSan for short) is an approach developed by Google to locate data races in code bases. It consists in instrumenting your executables to keep a history of previous memory accesses (at a certain performance cost), in order to detect a potential data race, even when this race has no visible effect on the execution. ThreadSanitizer has proved very effective in detecting hundreds of concurrency bugs in large projects.

There is now a new opam switch, 5.0.0+tsan, on which all your libraries and executables will be compiled with ThreadSanitizer instrumentation. On that switch, the ThreadSanitizer runtime will run as part of your executables and emit reports when it detects a potential data race:

==================
WARNING: ThreadSanitizer: data race (pid=4170290)
  Read of size 8 at 0x7f072bbfe498 by thread T4 (mutexes: write M0):
    #0 camlSimpleRace__fun_524 /tmp/simpleRace.ml:7 (simpleRace.exe+0x43dc9d)
    #1 camlStdlib__Domain__body_696 /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/stdlib/domain.ml:202 (simpleRace.exe+0x47b5dc)
    #2 caml_start_program ??:? (simpleRace.exe+0x4f51c3)
    #3 caml_callback_exn /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:168 (simpleRace.exe+0x4c2b93)
    #4 caml_callback /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:256 (simpleRace.exe+0x4c36e3)
    #5 domain_thread_func /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/domain.c:1093 (simpleRace.exe+0x4c6ad1)

  Previous write of size 8 at 0x7f072bbfe498 by thread T1 (mutexes: write M1):
    #0 camlSimpleRace__fun_520 /tmp/simpleRace.ml:6 (simpleRace.exe+0x43dc45)
    #1 camlStdlib__Domain__body_696 /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/stdlib/domain.ml:202 (simpleRace.exe+0x47b5dc)
    #2 caml_start_program ??:? (simpleRace.exe+0x4f51c3)
    #3 caml_callback_exn /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:168 (simpleRace.exe+0x4c2b93)
    #4 caml_callback /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:256 (simpleRace.exe+0x4c36e3)
    #5 domain_thread_func /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/domain.c:1093 (simpleRace.exe+0x4c6ad1)

  Mutex M0 (0x000000567ad8) created at:
    #0 pthread_mutex_init /home/olivier/other_projects/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1316 (libtsan.so.0+0x3cafb)
    [...]

SUMMARY: ThreadSanitizer: data race /tmp/simpleRace.ml:7 in camlSimpleRace__fun_524
==================
ThreadSanitizer: reported 1 warnings

OCaml executables are instrumented so that ThreadSanitizer will show precise backtraces, including when using exceptions, effect handlers, or C stubs.

ThreadSanitizer support for OCaml is a joint work with Fabrice Buoro, based on the work of Anmol Sahoo.

Usage instructions

At this stage, ThreadSanitizer for OCaml is supported only on the x86_64 architecture, and is not available on Windows.

Internally, the OCaml instrumentation uses the libunwind library, so if you’re on Linux you will need to install it using your system’s package manager. If you’re a macOS user, you have nothing to do: libunwind is installed by default.

Then you simply need to build and run your programs in the 5.0.0+tsan switch:

opam update
opam switch create 5.0.0+tsan

Usage examples and further information can be found in the README of the project.

We are eager to get your feedback! We believe that running your Multicore programs or test suites with TSan can be a huge time-saver in debugging.

24 Likes

A data race is reported every time two memory accesses are made to overlapping memory regions, and: - one of them is a write, and - there is no established happens-before relation between them. More information about TSan’s algorithm on their wiki.

This is using the C memory model to analyze OCaml programs, right?

It works by instrumenting the executables with calls to the TSan runtime whose API follows the C model, yes. That’s why we use the mapping between the OCaml and C11 memory models described in Explain mapping between OCaml memory model and C by stedolan · Pull Request #10995 · ocaml/ocaml · GitHub. As a consequence, every data race reported by TSan should be a data race in the OCaml sense, and conversely all OCaml races (on code paths that are actually executed at runtime) should be reported.

For instance, from the point of view of TSan, all OCaml non-atomic stores are release stores (more precisely a sequence consisting of an acquire fence and a release store).

Interesting, it sounds like some amount of work. The github repo seems to be a fork of the OCaml repo; in which files should I look if I wanted to dig further?

I thought that the release store could be omitted in some cases (e.g. storing an immediate).

How does it deal with the synchronization of initializing writes with reads from another domain? (a release write with no corresponding acquire read; this cannot be explained with the C memory model). I guess it will report a race, which is fair enough for C-like uses of atomics. On the other hand this might flag “interesting” uses of the OCaml memory model.

On this image you can see the git diff --stat with trunk. It’s a relatively self-contained change. The two interesting spots are the instrumentation pass, which operates on the Cmm intermediary represation, in asmcomp/thread_sanitizer.ml[i]; and the new C file runtime/tsan.c, which handles the task of letting TSan know about function entries and exits (which is required to have correct backtraces in all circumstances) when raising exceptions or handling effects. Finally, some of the instrumentation calls are inserted in the assembly routines of runtime/amd64.S.

Now, regarding the gory details (thankfully, understanding any of this is not required to use TSan with OCaml):

Interesting, do you have an example of code that would generate a store of an immediate? OCaml writes in general (except for initializing stores) translate to calls to the C function caml_modify, which we instrumented as well (using the TSan instrumentation provided by C compilers), so that the fence and atomic_exchange that it performs are properly taken into account.

If I’m not mistaken, there are only two ways a value can be shared with another domain:

  1. Through the closure in argument of Domain.spawn;
  2. or through a mutable reference (which implies a call to caml_modify).

Both Domain.spawn and caml_modify contain an acquire fence, so other domains should see the initializing store, and TSan’s internal state should reflect that, too.

[Edit: typos.]

Reflecting and reading more about this, it seems to be a trickier question than I thought.Subtleties of memory models are not my forte. I see in Explain mapping between OCaml memory model and C by stedolan · Pull Request #10995 · ocaml/ocaml · GitHub that publication safety of initializing writes should never be a problem, but I would need more time to fully understand why. All I can say for now is that while working on TSan, we never saw a data race false positive related to an initializing write.

Thanks for the map.

Below is a technical discussion but the gist is essentially that data races are undefined behaviour in C, well-defined behaviour in OCaml, and so encoding the OCaml memory model in terms of the C memory model has to change the meaning of data race. One wants the tool to report data races in the sense of OCaml, even if those have well-defined behaviour in terms of C. In the case of initializing writes made visible to other domains through non-atomic stores, the OCaml memory model relies on some fancy stuff that cannot be expressed in C, and IIUC the tool can identify those as races. They are arguably not false positives, but on the other hand it is an open question whether this well-defined behaviour of initializing writes can be used in practice for programming (as opposed to simply being there to make racy programs memory-safe). My take-away is that it is a bit subtle to translate the memory model in a way that preserves the notion of data race.


I am not sure but my remark stems from this part of the discussion on arm64 for multicore: https://github.com/ocaml/ocaml/pull/10972#issuecomment-1028979992. The following code on arm64 emits the load barrier but not the store barrier in the following case: Arm64 multicore support by ctk21 · Pull Request #10972 · ocaml/ocaml · GitHub. The corresponding x86-64 code does not need a load barrier either (stronger memory model) but on arm64 it is easier to see that this store is involved in maintaining the memory model. I am not an expert of this part of the compiler but it probably arises from the following shortcut for immediate assignments: ocaml/asmcomp/cmm_helpers.ml at be210179503c4a82b72dd4762560e13c408d37b7 · ocaml/ocaml · GitHub.

What is missing is an acquire fence on the side of the reader. Without it (or some understanding of data dependency by TSan), it will not see a synchronization. (Which may very well be what you want.) Maybe you already know the following related discussion: OCaml multicore memory model and C (runtime, FFI, VM) · Issue #10992 · ocaml/ocaml · GitHub.

l can imagine that there are theoretical difficulties, but to take a practical example:

type t = { mutable s : string }

let v = { s = "" }

let () =
  let t1 = Domain.spawn (fun () -> v.s <- "Hello"; Unix.sleep 1) in
  let t2 = Domain.spawn (fun () -> ignore v.s; Unix.sleep 1) in
  Domain.join t1;
  Domain.join t2

This is an OCaml data race. Although it has a well-defined behaviour, presumably users would like to be informed of such races (if the race is benign, it can be silenced away from the reports). Translated to the C11 model, the code above performs an acquire fence; release store and a plain load concurrently, which TSan does report as a data race. So we get what we want.

If there are cases where the C11 model makes it hard to get TSan to detect what we want to detect and no more, I’m hoping that we can adapt the instrumentation to work around it.

1 Like

You’re right, storing an immediate will not go through caml_modify, and will be compiled as e.g. a plain store on x86-64. In this case, our instrumentation inserts a call to __tsan_write8, which is the function to signal a plain store.

I do; but so far I do not quite get how exactly data dependencies save the day and make initializing stores always visible to other domains.

I think we are on the same page here, but there are two ways to see it. (IIUC)

  • You are using a translation that is supposed to send OCaml data races to well-defined C programs, whereas TSan is designed to spot buggy C programs. So you have found independently that the translation is incorrect for what it wants to achieve (it can send OCaml programs to C programs with undefined behaviour per the standard), just as per the issue linked above. In fact, the example you give is exactly the lack of a cross-domain synchronizing read after a synchronizing store.

  • You are using a translation that sends some OCaml data races to buggy C programs, which is good for your purposes since this lets TSan spot such data races. The obvious questions is, what is the design space of translations from the OCaml memory model to the C memory model that sends OCaml data races into C data races?

I would be curious about the point of view of an expert on memory models.

This is a legitimate question, and we had not laid it out in such clear terms before, so thank you for the insight.

I think we are mapping racy OCaml programs to buggy C programs so that TSan reports them; but we need to verify that 1. we are indeed doing this in all cases, 2. that race-free OCaml programs are always mapped to well-defined C programs and hopefully 3. that we signal to TSan all the happens-before relations that are present. I’m going to look into it.