TSan/Thread sanitizer instrumented OCaml not outputting source line info (macos)

Hello! I am trying to use the TSan instrumented OCaml compiler (5.2.1), however I keep getting bogus source location information in the TSan traces (for example: <null>:216698500 instead of <filename>:<linenum> in a TSan report). I am on an arm m1 mac, and my systems C compiler is set to apple’s clang 16, and building my project via dune.

1 Like

Disclaimer: I have limited knowledge on the subject.

I was able to make it output the filenames by adding a line that forced the program to wait for input.
But no line numbers.

In any case, I’d advise you to run the sanitizer on a Linux box, since on apple silicon it crashes (at least in the codebase I ran it on, which has some complexities including C bindings) and you might not get the expected results.

Yeah, I assumed the experience would be smoother on a Linux machine however I do not have one, and also the documentation states that my setup should be supported:

TSan support for OCaml is currently available for the x86_64 architecture, on FreeBSD, Linux and macOS, and for the arm64 architecture on Linux and macOS. Building OCaml with TSan support requires GCC or Clang. Minimal supported versions are GCC 11 and Clang 14. Note that TSan data race reports with GCC 11 are known to result in poor stack trace reporting (no line numbers), which is fixed in GCC 12.

(OCaml - Runtime detection of data races with ThreadSanitizer)

Hi,
Are those lines in OCaml code or C stubs? Is your executable compiled with debug info? (It should be the case if you build with Dune’s default dev profile.)

If you are able to put together a reproducer, I can have a look at it.

1 Like

Hey! yes I could: here is a repo that I’ve been trying to get traces to print with line info (GitHub - yosefAlsuhaibani/eio_race: domains in a pool modifying a `ref`). In the repo I did:

opam switch create 5.2.1+options --package ocaml-option-tsan
opam install eio eio_main
dune exec bin/main.exe

I got the following trace

dune(45468,0x206b00f80) malloc: nano zone abandoned due to inability to reserve vm space.
main.exe(45468,0x206b00f80) malloc: nano zone abandoned due to inability to reserve vm space.
==================
WARNING: ThreadSanitizer: data race (pid=45468)
  Write of size 8 at 0x000109cf2f78 by thread T4 (mutexes: write M0):
    #0 camlDune__exe__Main.incr_391 <null>:94326404 (main.exe:arm64+0x100006a98)
    #1 camlEio__Executor_pool.fun_814 <null>:94326404 (main.exe:arm64+0x1000724b8)
    #2 camlEio__core__Fiber.fun_1446 <null>:94326404 (main.exe:arm64+0x1000a0a78)
    #3 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #4 caml_runstack <null>:94326404 (main.exe:arm64+0x1001f7b74)
    #5 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #6 camlEio_posix__Sched.run_1365 <null>:94326404 (main.exe:arm64+0x10000c400)
    #7 camlEio_posix__Sched.with_sched_1228 <null>:94326404 (main.exe:arm64+0x10000b49c)
    #8 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #9 camlStdlib__Domain.body_735 <null>:94326404 (main.exe:arm64+0x10013267c)
    #10 caml_start_program <null>:94326404 (main.exe:arm64+0x1001f73d0)
    #11 caml_callback_exn callback.c:201 (main.exe:arm64+0x1001b6ce8)
    #12 domain_thread_func domain.c:1215 (main.exe:arm64+0x1001bada8)

  Previous write of size 8 at 0x000109cf2f78 by thread T1 (mutexes: write M1):
    #0 camlDune__exe__Main.incr_391 <null>:94326404 (main.exe:arm64+0x100006a98)
    #1 camlEio__Executor_pool.fun_814 <null>:94326404 (main.exe:arm64+0x1000724b8)
    #2 camlEio__core__Fiber.fun_1446 <null>:94326404 (main.exe:arm64+0x1000a0a78)
    #3 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #4 caml_runstack <null>:94326404 (main.exe:arm64+0x1001f7b74)
    #5 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #6 camlEio_posix__Sched.run_1365 <null>:94326404 (main.exe:arm64+0x10000c400)
    #7 camlEio_posix__Sched.with_sched_1228 <null>:94326404 (main.exe:arm64+0x10000b49c)
    #8 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #9 camlStdlib__Domain.body_735 <null>:94326404 (main.exe:arm64+0x10013267c)
    #10 caml_start_program <null>:94326404 (main.exe:arm64+0x1001f73d0)
    #11 caml_callback_exn callback.c:201 (main.exe:arm64+0x1001b6ce8)
    #12 domain_thread_func domain.c:1215 (main.exe:arm64+0x1001bada8)

  Mutex M0 (0x00010511cda0) created at:
    #0 pthread_mutex_init <null>:94326404 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x31470)
    #1 caml_plat_mutex_init platform.c:57 (main.exe:arm64+0x1001e4534)
    #2 caml_init_domains domain.c:943 (main.exe:arm64+0x1001ba168)
    #3 caml_init_gc gc_ctrl.c:353 (main.exe:arm64+0x1001ca170)
    #4 caml_startup_common startup_nat.c:111 (main.exe:arm64+0x1001f6790)
    #5 caml_main startup_nat.c:151 (main.exe:arm64+0x1001f696c)
    #6 main main.c:37 (main.exe:arm64+0x1001d93d0)

  Mutex M1 (0x00010511cc58) created at:
    #0 pthread_mutex_init <null>:94326404 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x31470)
    #1 caml_plat_mutex_init platform.c:57 (main.exe:arm64+0x1001e4534)
    #2 caml_init_domains domain.c:943 (main.exe:arm64+0x1001ba168)
    #3 caml_init_gc gc_ctrl.c:353 (main.exe:arm64+0x1001ca170)
    #4 caml_startup_common startup_nat.c:111 (main.exe:arm64+0x1001f6790)
    #5 caml_main startup_nat.c:151 (main.exe:arm64+0x1001f696c)
    #6 main main.c:37 (main.exe:arm64+0x1001d93d0)

  Thread T4 (tid=5064399, running) created by main thread at:
    #0 pthread_create <null>:94326404 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3062c)
    #1 caml_domain_spawn domain.c:1265 (main.exe:arm64+0x1001baa48)
    #2 caml_c_call <null>:94326404 (main.exe:arm64+0x1001f7240)
    #3 camlStdlib__Domain.spawn_730 <null>:94326404 (main.exe:arm64+0x100132570)
    #4 camlEio_posix__Domain_mgr.fun_1446 <null>:94326404 (main.exe:arm64+0x10001f994)
    #5 camlEio_posix__Sched.fun_1760 <null>:94326404 (main.exe:arm64+0x10000cedc)
    #6 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #7 camlEio_posix__Sched.run_1365 <null>:94326404 (main.exe:arm64+0x10000c400)
    #8 camlEio_posix__Sched.with_sched_1228 <null>:94326404 (main.exe:arm64+0x10000b49c)
    #9 camlDune__exe__Main.entry <null>:94326404 (main.exe:arm64+0x100006e74)
    #10 caml_program <null>:94326404 (main.exe:arm64+0x1000036c4)
    #11 caml_start_program <null>:94326404 (main.exe:arm64+0x1001f73d0)
    #12 caml_startup_common startup_nat.c:132 (main.exe:arm64+0x1001f6894)
    #13 caml_main startup_nat.c:151 (main.exe:arm64+0x1001f696c)
    #14 main main.c:37 (main.exe:arm64+0x1001d93d0)

  Thread T1 (tid=5064396, running) created by main thread at:
    #0 pthread_create <null>:94326404 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3062c)
    #1 caml_domain_spawn domain.c:1265 (main.exe:arm64+0x1001baa48)
    #2 caml_c_call <null>:94326404 (main.exe:arm64+0x1001f7240)
    #3 camlStdlib__Domain.spawn_730 <null>:94326404 (main.exe:arm64+0x100132570)
    #4 camlEio_posix__Domain_mgr.fun_1446 <null>:94326404 (main.exe:arm64+0x10001f994)
    #5 camlEio_posix__Sched.fun_1760 <null>:94326404 (main.exe:arm64+0x10000cedc)
    #6 camlStdlib__Fun.protect_326 <null>:94326404 (main.exe:arm64+0x10015a840)
    #7 camlEio_posix__Sched.run_1365 <null>:94326404 (main.exe:arm64+0x10000c400)
    #8 camlEio_posix__Sched.with_sched_1228 <null>:94326404 (main.exe:arm64+0x10000b49c)
    #9 camlDune__exe__Main.entry <null>:94326404 (main.exe:arm64+0x100006e74)
    #10 caml_program <null>:94326404 (main.exe:arm64+0x1000036c4)
    #11 caml_start_program <null>:94326404 (main.exe:arm64+0x1001f73d0)
    #12 caml_startup_common startup_nat.c:132 (main.exe:arm64+0x1001f6894)
    #13 caml_main startup_nat.c:151 (main.exe:arm64+0x1001f696c)
    #14 main main.c:37 (main.exe:arm64+0x1001d93d0)

source info for C functions are printed correctly (e.g startup_nat.c:151) but OCaml functions seem to get bad source info (e.g <null>:94326404)

Ocaml code; I did try building it with the initial default dev settings, but the issue persisted so I tried explicitly adding -g + -fsanitize=thread explicitly

Thanks. Opam is being funny right now so I can’t try it out, but I will try to do this soon.

The compiler has a number of tests for TSan’s behaviour. Unfortunately we are not running them regularly on a Mac at the moment. You can run them to see if they pass in your setup:

git clone https://github.com/ocaml/ocaml
cd ocaml
./configure --enable-tsan
make -j
cd testsuite
make one DIR=tests/tsan

@lambda_foo have you ever had issues with location reporting by TSan on macOS?

1 Like

Ran the commands as suggested, and ran the tests in the TSan dir and it seems like I fail 3 test cases

List of failed tests:
    tests/tsan/exn_from_c.ml
    tests/tsan/exn_in_callback.ml
    tests/tsan/exn_reraise.ml

and all three seem to fail in a similar way

> Running action 5/8 (check-program-output)
> Comparing program output to reference 
> Action 5/8 (check-program-output) => failed (program output differs from reference): 
> --- exn_reraise.reference
> +++ ocamlopt.byte/exn_reraise.opt.output
> @@ -4,11 +4,6 @@
>  Entering i
>  Throwing ExnA...
>  Caught an ExnA
> -Raised at Exn_reraise.i in file "exn_reraise.ml", line 28, characters 9-21
> -Called from Exn_reraise.h in file "exn_reraise.ml", line 33, characters 6-10
> -Called from Exn_reraise.g in file "exn_reraise.ml", line 39, characters 2-6
> -Called from Exn_reraise.f in file "exn_reraise.ml", line 44, characters 7-11
> -Leaving f
>  ==================
>  WARNING: ThreadSanitizer: data race (pid=<implemspecific>)
>    Write of size 8 at <implemspecific> by thread T1 (mutexes: write M<implemspecific>):
> @@ -49,4 +44,9 @@
>  
>  SUMMARY: ThreadSanitizer: data race (<systemspecific>:<implemspecific>) in camlExn_reraise$writer_<implemspecific>
>  ==================
> +Raised at Exn_reraise.i in file "exn_reraise.ml", line 28, characters 9-21
> +Called from Exn_reraise.h in file "exn_reraise.ml", line 33, characters 6-10
> +Called from Exn_reraise.g in file "exn_reraise.ml", line 39, characters 2-6
> +Called from Exn_reraise.f in file "exn_reraise.ml", line 44, characters 7-11
> +Leaving f
>  ThreadSanitizer: reported 1 warnings
> 
> )

(for some reason the series of Called from get printed out after the SUMMARY line)

Yes, this is a known flakiness issue that I haven’t had time to fix yet, but it’s minor.

I’m realizing that running these tests like this does not show much, because the location information is ignored by the test. What you can do is running ./ocamltest/ocamltest (relative path from the repo root) on one of them, for example:

./ocamltest/ocamltest -keep-test-dir-on-success testsuite/tests/tsan/array_elt.ml

it should print the commands used to build and run the executable. You should be able to copy-paste the one to run the executable yourself and see the backtrace.

I’m afraid it works for me

WARNING: ThreadSanitizer: data race (pid=2293850)
  Write of size 8 at 0x7f9bbb715b08 by thread T1 (mutexes: write M0):
    #0 camlDune__exe__Main.incr_391 /workspace_root/bin/main.ml:9 (main.exe+0x1fe178)
    #1 camlEio__Executor_pool.fun_814 /workspace_root/lib_eio/executor_pool.ml:31 (main.exe+0x272ce6)
    #2 camlEio__core__Fiber.fun_1446 /workspace_root/lib_eio/core/fiber.ml:21 (main.exe+0x295df6)
    #3 camlStdlib__Fun.protect_326 /home/olivier/.opam/5.2.1+options/.opam-switch/build/ocaml-variants.5.2.1+options/stdlib/fun.ml:34 (main.exe+0x32c77a)
    #4 caml_runstack ??:? (main.exe+0x3d2b43)
    #5 camlStdlib__Fun.protect_326 /home/olivier/.opam/5.2.1+options/.opam-switch/build/ocaml-variants.5.2.1+options/stdlib/fun.ml:34 (main.exe+0x32c77a)
    #6 camlEio_linux__Sched.run_1569 /workspace_root/lib_eio_linux/sched.ml:462 (main.exe+0x2166dc)
    #7 camlEio_linux__Sched.with_eventfd_1810 /workspace_root/lib_eio_linux/sched.ml:506 (main.exe+0x217ee4)
    #8 camlEio_linux__Sched.with_sched_inner_2870 /workspace_root/lib_eio_linux/sched.ml:543 (main.exe+0x21838f)
    #9 camlEio_linux.fun_4123 /workspace_root/lib_eio_linux/eio_linux.ml:286 (main.exe+0x225785)
    ...

Yeah I realized as well that the trace locs are masked; found the command for array_elt.ml and ran the test, the TSan output I get is similar

array_elt.opt(78953,0x200310840) malloc: nano zone abandoned due to inability to reserve vm space.
==================
WARNING: ThreadSanitizer: data race (pid=78953)
  Write of size 8 at 0x000106dffa70 by thread T1 (mutexes: write M0):
    #0 camlArray_elt$writer_280 <null>:82844336 (array_elt.opt:arm64+0x100003bf4)
    #1 camlStdlib__Domain$body_746 <null>:82844336 (array_elt.opt:arm64+0x10002b370)
    #2 caml_start_program <null>:82844336 (array_elt.opt:arm64+0x10007f414)
    #3 caml_callback_exn callback.c:206 (array_elt.opt:arm64+0x1000373b4)
    #4 caml_callback_res callback.c:321 (array_elt.opt:arm64+0x100037d04)
    #5 domain_thread_func domain.c:1263 (array_elt.opt:arm64+0x10003b9c0)

  Previous read of size 8 at 0x000106dffa70 by main thread (mutexes: write M1):
    #0 camlArray_elt$reader_334 <null>:82844336 (array_elt.opt:arm64+0x100003c80)
    #1 camlArray_elt$entry <null>:82844336 (array_elt.opt:arm64+0x100003ea4)
    #2 caml_program <null>:82844336 (array_elt.opt:arm64+0x100002de0)
    #3 caml_start_program <null>:82844336 (array_elt.opt:arm64+0x10007f414)
    #4 caml_startup_common startup_nat.c:127 (array_elt.opt:arm64+0x10007e15c)
    #5 caml_main startup_nat.c:146 (array_elt.opt:arm64+0x10007e234)
    #6 main main.c:37 (array_elt.opt:arm64+0x10005a204)

  As if synchronized via sleep:
    #0 nanosleep <null>:82844336 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2fa24)
    #1 wg_wait <null>:82844224 (array_elt.opt:arm64+0x10002be10)
    #2 camlArray_elt$writer_280 <null>:82844224 (array_elt.opt:arm64+0x100003bc8)
    #3 camlStdlib__Domain$body_746 <null>:82844224 (array_elt.opt:arm64+0x10002b370)
    #4 caml_start_program <null>:82844224 (array_elt.opt:arm64+0x10007f414)
    #5 caml_callback_exn callback.c:206 (array_elt.opt:arm64+0x1000373b4)
    #6 caml_callback_res callback.c:321 (array_elt.opt:arm64+0x100037d04)
    #7 domain_thread_func domain.c:1263 (array_elt.opt:arm64+0x10003b9c0)

  Mutex M0 (0x000106a001f8) created at:
    #0 pthread_mutex_init <null>:82844336 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3181c)
    #1 caml_plat_mutex_init platform.c:59 (array_elt.opt:arm64+0x10006ae34)
    #2 caml_init_domains domain.c:986 (array_elt.opt:arm64+0x10003abe0)
    #3 caml_init_gc gc_ctrl.c:358 (array_elt.opt:arm64+0x10004b49c)
    #4 caml_startup_common startup_nat.c:106 (array_elt.opt:arm64+0x10007e058)
    #5 caml_main startup_nat.c:146 (array_elt.opt:arm64+0x10007e234)
    #6 main main.c:37 (array_elt.opt:arm64+0x10005a204)

  Mutex M1 (0x000106a000b8) created at:
    #0 pthread_mutex_init <null>:82844336 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3181c)
    #1 caml_plat_mutex_init platform.c:59 (array_elt.opt:arm64+0x10006ae34)
    #2 caml_init_domains domain.c:986 (array_elt.opt:arm64+0x10003abe0)
    #3 caml_init_gc gc_ctrl.c:358 (array_elt.opt:arm64+0x10004b49c)
    #4 caml_startup_common startup_nat.c:106 (array_elt.opt:arm64+0x10007e058)
    #5 caml_main startup_nat.c:146 (array_elt.opt:arm64+0x10007e234)
    #6 main main.c:37 (array_elt.opt:arm64+0x10005a204)

  Thread T1 (tid=2906320, running) created by main thread at:
    #0 pthread_create <null>:82844224 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x309d8)
    #1 caml_domain_spawn domain.c:1316 (array_elt.opt:arm64+0x10003b628)
    #2 caml_c_call <null>:82844336 (array_elt.opt:arm64+0x10007f274)
    #3 camlStdlib__Domain$spawn_741 <null>:82844336 (array_elt.opt:arm64+0x10002b280)
    #4 camlArray_elt$entry <null>:82844336 (array_elt.opt:arm64+0x100003e98)
    #5 caml_program <null>:82844336 (array_elt.opt:arm64+0x100002de0)
    #6 caml_start_program <null>:82844336 (array_elt.opt:arm64+0x10007f414)
    #7 caml_startup_common startup_nat.c:127 (array_elt.opt:arm64+0x10007e15c)
    #8 caml_main startup_nat.c:146 (array_elt.opt:arm64+0x10007e234)
    #9 main main.c:37 (array_elt.opt:arm64+0x10005a204)

SUMMARY: ThreadSanitizer: data race (array_elt.opt:arm64+0x100003bf4) in camlArray_elt$writer_280+0x6c
==================
ThreadSanitizer: reported 1 warnings
[1]    78953 abort      ./array_elt.opt

It does look like a bug, sorry that you’re experiencing that. I’ve opened an issue on the compiler, feel free to complete if necessary.

You may want to try and play with some flags [1] such as external_symbolizer_path, or try to run utilities like addr2line on the hexadecimal code offsets directly.


  1. SanitizerCommonFlags · google/sanitizers Wiki · GitHub and ThreadSanitizerFlags · google/sanitizers Wiki · GitHub ↩︎

I wouldn’t be surprised if there are location reporting issues on macOS, though I haven’t noticed it myself. If you can reproduce it, please attach that to the issue opened.

I expect the missing source locations are caused by a lack of DWARF information on macOS. I have a writeup somewhere of exactly what is required, the short version is ocaml needs to write some DWARF 5 information to associate object files (and symbols) with source files and functions.