Test caml_state and conditionally caml_acquire_runtime_system - good or bad?

rwmjones · June 27, 2023, 11:03am

I’m adding OCaml 5 support to an existing project. The project is just some OCaml bindings and therefore fairly easy to deal with, but it also has callbacks. In other words:

OCaml -> C binding -> callback wrapper -> OCaml

Because most of the C code is long-running, we release the runtime lock around all C calls. So when we get to a callback wrapper, we reacquire the lock around the OCaml code. This part all works fine.

However there is a C atexit function which does some cleanup on process exit. Unfortunately during the exit path, callbacks can also be invoked. The path is:

atexit (runtime lock held) -> callback wrapper -> OCaml

In the callback wrapper we’re attempting to acquire the runtime lock again, resulting in the infamous error:

Fatal error: Fatal error during lock: Resource deadlock avoided

A way to fix this (for OCaml 5 only) is to change the callback wrapper to do:

bool not_acquired = (caml_state == NULL);
if (not_acquired) caml_acquire_runtime_system ();
caml_callback (...);
if (not_acquired) caml_release_runtime_system ();

This works, but the question is, is it a good idea? Is there a better way to deal with this case?

nojb · June 27, 2023, 1:08pm

I don’t know the answer to your question, but am I right that this used to work in OCaml 4.x? (That is, it was possible to acquire the runtime lock more than once.)

If yes, shouldn’t this be considered a bug in OCaml 5, given that one of its guiding principles is that FFI code that used to work in OCaml 4.x should continue to work in OCaml 5?

Cheers,
Nicolas

octachron · June 27, 2023, 1:28pm

Acquiring a lock twice was undefined behavior for POSIX threads in OCaml 4. OCaml 5 merely switched its mutexes to error-checking mode, so the same mistake is now reported as an error.
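
To illustrate the difference with plain POSIX threads (this is not OCaml runtime code, and the function name relock_errorcheck_mutex is made up for the example): a minimal sketch, assuming a POSIX system, that locks an error-checking mutex twice from the same thread. The second attempt fails with EDEADLK instead of deadlocking or silently misbehaving. On glibc the message for EDEADLK is "Resource deadlock avoided", the text that appears in the fatal error quoted in the first post.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK in strict ISO C modes */
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: lock an error-checking mutex twice from the same
   thread and return the result of the second attempt. */
int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int second;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);            /* first acquisition succeeds */
    second = pthread_mutex_lock(&lock);   /* relock by the owner is refused */

    pthread_mutex_unlock(&lock);          /* release the one lock we hold */
    pthread_mutex_destroy(&lock);
    return second;                        /* EDEADLK for an error-checking mutex */
}
```

With a default (non-error-checking) mutex, POSIX leaves the result of the second lock call undefined.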

kayceesrk · June 27, 2023, 1:29pm

Is it documented that it is safe to acquire the runtime system lock more than once in OCaml 4.x?

nojb · June 27, 2023, 1:46pm

I don’t know of any documentation regarding this point, but I remember seeing something very similar to what is reported by @rwmjones in production code.

Cheers,
Nicolas

gadmm · June 27, 2023, 2:10pm

I have implemented Caml_state_opt for this purpose in OCaml 5.0, due to a similar problem. Per the manual:

The macro Caml_state evaluates to the domain state variable, and checks in debug mode that the domain lock is held. Such a check is also placed in normal mode at key entry points of the C API; this is why calling some of the runtime functions and macros without correctly owning the domain lock can result in a fatal error: no domain lock held. The variant Caml_state_opt does not perform any check but evaluates to NULL when the domain lock is not held. This lets you determine whether a thread belonging to a domain currently holds its domain lock, for various purposes.

For OCaml 4 I use runtime hooks to implement a similar check: see boxroot/ocaml_hooks.c (main branch) in the ocaml-rust/ocaml-boxroot repository on GitLab.

gadmm · June 27, 2023, 2:14pm

It was definitely unsafe, and was in fact part of the motivation for letting caml_state be NULL when the domain lock is not held, and for adding Caml_state_opt so that users can check for this.

gadmm · June 27, 2023, 2:21pm

Lastly,

bool not_acquired = (caml_state == NULL);

works but only on platforms where caml_state can be accessed directly (in which case it is equal to Caml_state_opt). Prefer Caml_state_opt for portability.
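
The pattern discussed in this thread (acquire only if not already held, release only what you acquired) can be modelled without the OCaml runtime. The sketch below is self-contained and uses hypothetical stand-ins throughout: state_opt plays the role of Caml_state_opt, and runtime_acquire / runtime_release play the roles of caml_acquire_runtime_system / caml_release_runtime_system. None of these names, nor call_with_lock, lock_held or double_if_locked, belong to the OCaml C API; in a real binding the macro and functions from the OCaml headers would be used instead.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the OCaml runtime API. */
static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;
static int dummy_state;                        /* placeholder "domain state" */
static _Thread_local int *state_opt = NULL;    /* NULL when this thread does not hold the lock */

void runtime_acquire(void)
{
    pthread_mutex_lock(&master_lock);
    state_opt = &dummy_state;
}

void runtime_release(void)
{
    state_opt = NULL;
    pthread_mutex_unlock(&master_lock);
}

bool lock_held(void) { return state_opt != NULL; }

/* Sample callback: doubles its argument, or returns -1 if run without the lock. */
int double_if_locked(int x) { return lock_held() ? 2 * x : -1; }

/* Callback wrapper: run fn(arg) with the lock held, whether or not the
   caller already holds it, and leave the lock state as it was found. */
int call_with_lock(int (*fn)(int), int arg)
{
    bool acquired_here = !lock_held();
    if (acquired_here) runtime_acquire();
    int result = fn(arg);
    if (acquired_here) runtime_release();      /* release only what we acquired */
    return result;
}
```

Calling call_with_lock(double_if_locked, 21) from a thread that does not hold the lock takes and releases it around the callback; calling it from a thread that already holds the lock (the atexit path described above) runs the callback directly and leaves the lock held.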

rwmjones · June 27, 2023, 3:30pm

Yes, it “worked” in OCaml 4 but as several have pointed out the code was likely wrong.

I will try out Caml_state_opt now.

rwmjones · June 27, 2023, 3:35pm

Yes, that works great. Upstream fix: ocaml: Use Caml_state_opt in preference to caml_state · libguestfs/libguestfs@cade0b1 · GitHub
