Hi OCaml community,
I’ve been struggling to run TSan (ThreadSanitizer) on Semgrep (a large legacy OCaml codebase), and I’d like to ask how I can start diagnosing a crash inside the sanitizer runtime. It looks like this:
[00.55][INFO]: Executed as: /home/ntaylor/.local/share/virtualenvs/cli-sQcphUvE/lib/python3.10/site-packages/semgrep/bin/semgrep-core-proprietary -json -rules /home/ntaylor/.semgrep/semgrep_rules.json -use_eio -j 8 -targets /home/ntaylor/.semgrep/semgrep_targets.txt -timeout 5 -timeout_threshold 3 -max_memory 5000 -fast -symbol_analysis -pro_inter_file -timeout_for_interfile_analysis 10800 . -debug
[00.55][INFO]: Version: 1.124.0
[00.55][INFO]: Proxy was configured with { Proxy.http_proxy = None;
https_proxy = None;
all_proxy = None; no_proxy = None;
credentials = None }
[00.63][INFO]: Parsing rules in /home/ntaylor/.semgrep/semgrep_rules.json
Program received signal SIGSEGV, Segmentation fault.
-----------------------------------------------------------------------------------------------------------------------[regs]
RAX: 0x0000600000FFFFF8 RBX: 0x000055555A658D89 RBP: 0x00007B6000000C00 RSP: 0x00007FFFFFFFD028 o d I t s Z a P c
RDI: 0x000055555AE7D21D RSI: 0x00007FFFF5DECA00 RDX: 0x00000000000095B0 RCX: 0x200055555AE7D21D RIP: 0x00007FFFF749D880
R8 : 0x00007FFFF5DECA00 R9 : 0x00000FFFD78B16A0 R10: 0x00007FFFCBBFF000 R11: 0x00007FFFFFFFD090 R12: 0x00007FFFCBC1E9C8
R13: 0x00007B6000000CA0 R14: 0x00007FFFFFFFD040 R15: 0x00007FFFE2916EC8
CS: 0033 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 002B
-----------------------------------------------------------------------------------------------------------------------[code]
=> 0x7ffff749d880 <__tsan_func_entry(void*)+112>: mov QWORD PTR [rax],rdi
0x7ffff749d883 <__tsan_func_entry(void*)+115>: add rax,0x8
0x7ffff749d887 <__tsan_func_entry(void*)+119>: mov QWORD PTR [rsi+0xc8],rax
0x7ffff749d88e <__tsan_func_entry(void*)+126>: ret
0x7ffff749d88f <__tsan_func_entry(void*)+127>: nop
0x7ffff749d890 <__tsan_func_entry(void*)+128>: sub rsp,0x400
0x7ffff749d897 <__tsan_func_entry(void*)+135>: call 0x7ffff74a8caf <__tsan_trace_switch_thunk>
0x7ffff749d89c <__tsan_func_entry(void*)+140>: add rsp,0x400
-----------------------------------------------------------------------------------------------------------------------------
0x00007ffff749d880 in __tsan::FuncEntry (pc=0x55555ae7d21d, thr=0x7ffff5deca00) at ../../../../src/libsanitizer/tsan/tsan_rtl.cpp:1039
1039 ../../../../src/libsanitizer/tsan/tsan_rtl.cpp: No such file or directory.
gdb$ bt
#0 0x00007ffff749d880 in __tsan::FuncEntry (pc=0x55555ae7d21d, thr=0x7ffff5deca00) at ../../../../src/libsanitizer/tsan/tsan_rtl.cpp:1039
#1 __tsan_func_entry (pc=0x55555ae7d21d <caml_raise_exception+57>) at ../../../../src/libsanitizer/tsan/tsan_interface_inl.h:104
#2 0x000055555ae7923c in caml_tsan_exit_on_raise (pc=0x55555a658d89, sp=<optimized out>, trapsp=0x7fffcbc1e9c8 "") at runtime/tsan.c:216
#3 0x000055555ae7d21d in caml_raise_exception ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
gdb$
Certainly, 0x0000600000FFFFF8 (the value in RAX at the faulting store) does not look like an address I should attempt to dereference. It perhaps makes sense that gdb isn’t able to walk the stack if we are unwinding from an exception, but the program counter value also seems nonsensical, so it’s possible the issue is more fundamental.
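As a sanity check on that address, here is a small OCaml sketch of the arithmetic. It only uses the RAX value from the register dump; the 16 MiB boundary is my own observation about the number, not something taken from the TSan sources, and I don’t know whether it is significant.

```ocaml
(* Faulting address, copied from RAX in the register dump above. *)
let rax = 0x0000600000FFFFF8

(* Assumption: 16 MiB is simply the boundary this value happens to sit
   next to; it is not a constant taken from the TSan runtime. *)
let boundary = 0x1000000

(* The faulting instruction stores 8 bytes at [rax], so the following
   slot would start at rax + 8. *)
let next_slot = rax + 8

let () =
  Printf.printf "rax       = 0x%016x\n" rax;
  Printf.printf "rax + 8   = 0x%016x\n" next_slot;
  Printf.printf "rax + 8 is 16 MiB aligned: %b\n"
    (next_slot mod boundary = 0)
```

In other words, the store targets the last 8-byte slot before a 16 MiB-aligned address, which might hint at a pointer that has run up to the end of some region, though that is only a guess.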
I admit I’m not sure how to even begin diagnosing this. The crash is at least deterministic, which suggests it perhaps isn’t owing to a race per se. I’m running OCaml 5.3.0 with the tsan variant on Linux, so nothing should be terribly nonstandard here. If you were me, what would your first step be?
Thanks,
Nathan