Fatal error: Fatal error during lock: Resource deadlock avoided

It is true that if you complete your forking before starting any threads, your code is consistent with POSIX, so it might be a bug in OCaml (I do not know enough of the internals to know). But since you do fork before starting any threads, why not also delay calling caml_startup () until after you have forked, as I suggested? That should work fine in your case.

@cvine It is reasonable that hacks in the OCaml runtime should not interfere with sound uses of fork in a multithreaded setting. If there is enough interest, someone should open an issue about avoiding interference of caml_thread_reinitialize with normal usage of fork+execve.

@rwmjones I have not followed the entirety of your issue because the original error you mentioned was EDEADLK, as if trying to acquire the domain lock twice from the same thread. But in what you describe now, it is possible that the problem I mentioned is already present merely by linking threads. To test this, you can retry after removing the call to pthread_atfork in otherlibs/systhread/st_posix.h and see if it solves anything.

Because we need to run OCaml code before the fork. It’s probably best to look at what we’re really trying to do here; see the structure of nbdkit in “nbdkit-plugin - how to write nbdkit plugins”. All of those load/config/… phase calls are implemented as OCaml callbacks.

I don’t think caml_thread_reinitialize was my point, although it is some time since I looked at the code. My notes say that Unix.execve is itself unsafe in a multithreaded program because, before calling C’s execve, it calls caml_stat_alloc_noexc, which can allocate memory using malloc. That is unsafe.

To keep running OCaml after fork, you bank on the fact that there is a global lock ensuring some consistency. Sure, if the best effort made by the OCaml runtime does not work, you are out of luck. This includes when C threads might be running concurrently, which is a frequent scenario given the purpose of systhreads. So the approach of forking a multithreaded program from OCaml is clearly not recommended. Even if Unix.execve did not call malloc, lots of things can happen at any polling location (e.g. garbage collection, calling finalisers, signal handlers…).

However, as a solution, you might sensibly want to run fork+execve from a C function without calling back into OCaml, and it is detrimental that the OCaml runtime gets in the way via its pthread_atfork callback.
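
For illustration, here is a minimal sketch of what such a C function could look like. The name spawn_and_wait is hypothetical (it is not part of OCaml or nbdkit), and the sketch assumes the caller has already built argv and envp as plain C arrays before forking. The child calls only execve and _exit, both async-signal-safe, so it does no allocation and never re-enters OCaml. Note that this shows the pattern only: in a program linked with systhreads, fork would still run whatever pthread_atfork handlers the runtime has registered, which is exactly the interference described above.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not an OCaml or nbdkit API: fork, exec `path`
   with the given argv/envp, wait for the child, and return its exit
   status. Returns -1 if fork/waitpid fails or the child was killed
   by a signal. */
int spawn_and_wait(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed; errno is set */

    if (pid == 0) {
        /* Child: async-signal-safe calls only from here on.
           No malloc, no stdio, no callbacks into the OCaml runtime. */
        execve(path, argv, envp);
        _exit(127);             /* reached only if execve failed */
    }

    /* Parent: reap the child, retrying if a signal interrupts waitpid. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Everything that needs memory (building argv, copying strings out of OCaml values) has to happen in the parent before fork; the exit code 127 for a failed exec is just the usual shell convention, not something the thread prescribes.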

That is a fair point, and all the more reason, now that we have domains, to document the fact that Unix.execv* is not thread-safe.

Since the only legitimate use of fork in a program that forks after more than one thread is running is to set up and then exec, the best advice is to use Unix.create_process, which since OCaml 4.12 avoids both polling points and allocation in the child process, for the explicit purpose of being thread-safe. Unix.create_process_env is safe since OCaml 4.12 if the underlying OS provides either execvpe or posix_spawn; otherwise it is not.
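
As a rough C-level illustration of the posix_spawn route mentioned above, the sketch below starts a program without the caller ever running code in a forked child. It is not the OCaml runtime's implementation; the function name spawn_via_posix_spawn is hypothetical, and it assumes the parent's environment (environ) should be passed through unchanged.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;          /* the parent's environment, passed as-is */

/* Hypothetical helper: run `prog` (looked up in PATH) with argv, wait
   for it, and return its exit status, or -1 on failure. */
int spawn_via_posix_spawn(const char *prog, char *const argv[])
{
    pid_t pid;

    /* posix_spawnp returns an error number directly instead of
       setting errno; no file actions or spawn attributes are used. */
    int err = posix_spawnp(&pid, prog, NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    /* Reap the child, retrying if a signal interrupts waitpid. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

How a failed exec is reported varies between implementations: some return an error from posix_spawnp itself, others let the child exit with status 127, so a caller should be prepared for both.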