One option would be to use a per-domain ‘session’ object, e.g. with OCaml library : Domain.DLS. Then OCaml’s runtime lock should protect against concurrent accesses from different threads that are part of the same domain, as long as your C binding doesn’t release the runtime lock.
However that’ll also prevent other threads (in the same domain) from running OCaml code, so it isn’t very nice.
Another option might be to use per-thread state, but there isn’t anything built-in to help with that.
However both of these could cause some confusion on the OCaml side: what if you create the session in one thread and want to use it in another (as long as you’re only using it from one thread at a time)?
If you consider accessing a session object from multiple threads a programming error, then just detecting it might suffice. A Mutex (either on OCaml or C side) might suffice for that if you use a try_lock: when the program behaves correctly the mutex would be uncontended and fast to acquire and release (in theory entirely in userspace), and the only time it might get contended is when you’d raise an exception on a trylock failure.
If the (tiny) potential overhead of that is a worry then this could even be done in a way that a user can turn off if you wrap it in an assert (compile with -noassert and both mutex+unlock is gone, only thing that remains is its allocation):
let session_check_mutex = Mutex.create ()
let foo () =
assert (Mutex.try_lock session_check_mutex);
(* ... calll external ... *)
assert(Mutex.unlock m; true)
Here is a program to measure the overhead of the 2 approaches (mutex vs atomic) using N domains each with its own uncontended mutex/atomic:
let perform _ = ()
let session_mutex_create () = Mutex.create ()
let session_mutex_trylock m = Mutex.try_lock m
let session_mutex_unlock m = Mutex.unlock m; true
let loop_mutex n =
let t = session_mutex_create () in
for i = 1 to n do
assert (session_mutex_trylock t);
Sys.opaque_identity (perform t);
assert (session_mutex_unlock t)
done
let session_atomic_create () = Atomic.make false
let session_atomic_trylock a = Atomic.compare_and_set a false true
let session_atomic_unlock a = Atomic.compare_and_set a true false
let loop_atomic n =
let t = session_atomic_create () in
for i = 1 to n do
assert (session_atomic_trylock t);
Sys.opaque_identity (perform t);
assert (session_atomic_unlock t)
done
let loop_noop n =
let t = () in
for i = 1 to n do
Sys.opaque_identity (perform t);
done
let run_domains f n =
let domains = Array.init (Domain.recommended_domain_count ())
(fun _ -> Domain.spawn @@ fun () -> f n) in
Array.iter Domain.join domains
let () =
Arg.parse
[ ("-a", Arg.Int (run_domains loop_atomic), "loop atomic")
; ("-m", Arg.Int (run_domains loop_mutex), "loop mutex")
; ("-n", Arg.Int (run_domains loop_noop), "loop noop")
] ignore @@ Printf.sprintf "%s [-a <n> | -m <n> | -n <n>]" Sys.executable_name
ocamlfind ocamlopt -package threads.posix -I +threads -linkpkg asserts.ml -o asserts -thread
ocamlfind ocamlopt -package threads.posix -I +threads -linkpkg asserts.ml -o noasserts -thread -noassert
hyperfine './asserts -m 100000000' './asserts -a 100000000' './asserts -n 100000000' './noasserts -m 100000000' './noasserts -a 100000000'
Measured on a 32-core ‘AMD Ryzen 9 7950X’ after having disabled turbo, idle states and using perf scheduler:
sudo sh -c 'echo 0 >/sys/devices/system/cpu/cpufreq/boost'
sudo cpupower idle-set -D 0
sudo cpupower frequency-set -g performance
Results (where noop is not doing any of this):
Benchmark 1: ./asserts -m 100000000
Time (mean ± σ): 1.622 s ± 0.034 s [User: 50.053 s, System: 0.016 s]
Range (min … max): 1.589 s … 1.705 s 10 runs
Benchmark 2: ./asserts -a 100000000
Time (mean ± σ): 789.4 ms ± 5.4 ms [User: 24494.0 ms, System: 10.8 ms]
Range (min … max): 781.8 ms … 797.6 ms 10 runs
Benchmark 3: ./asserts -n 100000000
Time (mean ± σ): 55.4 ms ± 7.6 ms [User: 1547.7 ms, System: 9.5 ms]
Range (min … max): 48.6 ms … 75.2 ms 42 runs
Benchmark 4: ./noasserts -m 100000000
Time (mean ± σ): 53.7 ms ± 5.3 ms [User: 1532.4 ms, System: 8.1 ms]
Range (min … max): 48.1 ms … 66.0 ms 52 runs
Benchmark 5: ./noasserts -a 100000000
Time (mean ± σ): 54.0 ms ± 5.9 ms [User: 1543.1 ms, System: 9.7 ms]
Range (min … max): 48.3 ms … 71.5 ms 44 runs
Summary
'./noasserts -m 100000000' ran
1.01 ± 0.15 times faster than './noasserts -a 100000000'
1.03 ± 0.18 times faster than './asserts -n 100000000'
14.71 ± 1.46 times faster than './asserts -a 100000000'
30.23 ± 3.06 times faster than './asserts -m 100000000'
As you can see with noasserts you get “same” results as with not having any of this checking code at all, and atomics run 2x as fast as mutexes (they both results in function calls and are not inlined).
OTOH the overhead is quite small: ~8ns / check with the atomics. Which might be worth accepting if it saves you hours of debugging a race condition or memory corruption.