Fatal error: Fatal error during lock: Resource deadlock avoided

With OCaml 5 (not 4) my program deadlocks in caml_acquire_runtime_system (aka caml_leave_blocking_section). The example below demonstrates what I mean:

test.ml contains:

Printf.eprintf "hello from OCaml\n%!"

test-ocaml-5.c contains:

#include <stdio.h>
#include <stdlib.h>
#include <caml/misc.h>
#include <caml/callback.h>
#include <caml/threads.h>

int
main (int argc, char *argv[])
{
  fprintf (stderr, "starting up ...\n");
  caml_startup (argv);
  fprintf (stderr, "acquiring ...\n");
  caml_acquire_runtime_system ();
  fprintf (stderr, "acquired\n");
  // here is where I would be calling an OCaml callback
  // omitted for simplicity
  caml_release_runtime_system ();
  exit (0);
}

If I compile this without threads, I get the deadlock error (pthread_mutex_lock fails with EDEADLK). Note that the same build works with OCaml 4:

$ ocamlopt -g test-ocaml-5.c test.ml -o test-ocaml-5
$ ./test-ocaml-5 
starting up ...
hello from OCaml
acquiring ...
Fatal error: Fatal error during lock: Resource deadlock avoided

Aborted (core dumped)
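
For readers unfamiliar with EDEADLK: on glibc, "Resource deadlock avoided" is the strerror() text for that error code, and POSIX specifies that an error-checking mutex returns it when the owning thread tries to lock it a second time. The sketch below reproduces only that behaviour with plain pthreads. It assumes the runtime lock behaves like an error-checking mutex; it is not the OCaml runtime's code, and relock_error and relock_message are hypothetical names.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Lock an error-checking mutex twice from the same thread and return the
   error code of the second attempt. */
int relock_error(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&lock);   /* first lock succeeds */
    assert(rc == 0);
    rc = pthread_mutex_lock(&lock);   /* second lock reports an error instead of hanging */

    pthread_mutex_unlock(&lock);      /* undo the first (successful) lock */
    pthread_mutex_destroy(&lock);
    return rc;
}

/* Human-readable form of that error, as produced by strerror(). */
const char *relock_message(void)
{
    return strerror(relock_error());
}
```

Calling relock_error() returns EDEADLK, which mirrors the situation in the example: the thread that ran caml_startup already holds the lock when it asks for it again.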

If I compile it with threads, I get a hang instead (with both OCaml 4 and 5):

$ ocamlopt -g -I +unix unix.cmxa -I +threads threads.cmxa test-ocaml-5.c test.ml -o test-ocaml-5
$ ./test-ocaml-5 
starting up ...
hello from OCaml
acquiring ...

Obviously I’m doing something very wrong here, but I just don’t know what!

Supplemental question.

If my main program uses C threads but not OCaml threads, do I need to use threads.cmxa at all? I never did with OCaml 4 and things have worked out fine (I also never called caml_c_thread_register as it’s quite inconvenient because the real program doesn’t offer any place to register when new threads are created). Does this change in OCaml 5?

Since you have called caml_startup, the lock is already acquired by the current thread. So, you should not try to acquire it again by calling caml_acquire_runtime_system, as it would deadlock. (As a rule of thumb, caml_acquire_runtime_system should always follow caml_release_runtime_system, rather than the converse.)

I’ve always wondered whether something like the following would be a much stronger guarantee of correct user code, while hiding the need for caml_c_thread_register():

/* @file: caml/threads.h */

struct caml_ctx_to_call_ocaml;
struct caml_ctx_for_expensive_c;

/* @file: caml/threads.c */

struct caml_ctx_to_call_ocaml {
    /* if [!acquired], do:
         caml_c_thread_register()
         caml_acquire_runtime_system() */
    int acquired;
};
struct caml_ctx_for_expensive_c {
    /* if [!released], do:
       caml_release_runtime_system() */
    int released;
};

/* @file: usercode.c */

struct caml_ctx_to_call_ocaml *ctx1;
//  Use of &ctx and an opaque ctx enforces the correct calling order.
//  Implementation checks if lock already held.
caml_acquire_runtime_system_for_calling_ocaml (&ctx1);
fprintf (stderr, "acquired\n");
caml_release_runtime_system_for_calling_ocaml (ctx1);

//  reverse semantics
struct caml_ctx_for_expensive_c *ctx2;
caml_release_runtime_system_for_expensive_c(&ctx2);
fprintf (stderr, "released\n");
caml_acquire_runtime_system_for_expensive_c(ctx2);
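
To make the "implementation checks if lock already held" idea concrete, here is a minimal self-contained model using a plain pthread mutex and a thread-local flag. Everything in it is hypothetical: it does not touch the OCaml runtime, the names are invented for illustration, and it uses a caller-allocated context rather than the opaque pointer suggested above.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the runtime lock (hypothetical, not the OCaml master lock). */
static pthread_mutex_t runtime_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread record of whether this thread currently holds the lock. */
static _Thread_local bool holds_lock = false;

/* The context remembers whether this particular call took the lock,
   so that the matching release knows whether it has to give it back. */
struct ctx_to_call_ocaml {
    bool acquired_here;
};

void acquire_for_calling_ocaml(struct ctx_to_call_ocaml *ctx)
{
    ctx->acquired_here = !holds_lock;
    if (ctx->acquired_here) {
        pthread_mutex_lock(&runtime_lock);
        holds_lock = true;
    }
    /* Otherwise the thread already holds the lock: nothing to do. */
}

void release_after_calling_ocaml(struct ctx_to_call_ocaml *ctx)
{
    if (ctx->acquired_here) {
        holds_lock = false;
        pthread_mutex_unlock(&runtime_lock);
        ctx->acquired_here = false;
    }
}

bool thread_holds_lock(void)
{
    return holds_lock;
}
```

With this shape a nested acquire from a thread that already holds the lock becomes a no-op instead of a deadlock, and each release only undoes what its own acquire did.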

I really don’t understand how threads are supposed to work in OCaml 5. In my main thread I’m now calling caml_startup and then calling into OCaml callbacks successfully.

However my program also creates (from C) some other threads.

The first time one of these threads calls caml_c_thread_register to register itself, it hangs right there:

#0  0x00007ffff788a049 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff73f73b0 <thread_table+112>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7ffff73f73b0 <thread_table+112>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007ffff788a0cf in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff73f73b0 <thread_table+112>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007ffff788c9e9 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=<optimized out>, cond=0x7ffff73f7388 <thread_table+72>) at pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x7ffff73f7388 <thread_table+72>, mutex=<optimized out>) at pthread_cond_wait.c:618
#5  0x00007ffff7351b40 in st_masterlock_acquire () from tests/test-ocaml-plugin.so
#6  0x00007ffff7352b46 in caml_c_thread_register () from tests/test-ocaml-plugin.so
#7  0x00007ffff7fb9b5d in default_export_wrapper (readonly=0, is_tls=0) at /home/rjones/d/nbdkit/plugins/ocaml/plugin.c:321

where default_export_wrapper is a new C thread which calls caml_c_thread_register as it's supposed to.

Looking at the code it appears that C threads are added to OCaml domain 0. Since my main program is already running code in domain 0 and never calls caml_release_runtime_system it’ll always block when trying to add the new C thread. Probably I can fix this by releasing the runtime system around callbacks in the main program …
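
The situation can be modelled without OCaml at all. The sketch below is a toy "master lock" built from a mutex, a condition variable and a busy flag, which is the general shape the backtrace suggests (st_masterlock_acquire waiting in pthread_cond_wait). It is an illustration under that assumption, not the runtime's code, and all names are hypothetical. A second thread probes whether it could take the lock, without blocking, while the main thread holds it and again after the main thread releases it.

```c
#include <pthread.h>

/* Toy master lock: a busy flag guarded by a mutex, plus a condition
   variable that waiters sleep on. Illustration only. */
struct master_lock {
    pthread_mutex_t mu;
    pthread_cond_t is_free;
    int busy;
};

static struct master_lock ml = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

void ml_acquire(void)
{
    pthread_mutex_lock(&ml.mu);
    while (ml.busy)                       /* a new thread would sleep here */
        pthread_cond_wait(&ml.is_free, &ml.mu);
    ml.busy = 1;
    pthread_mutex_unlock(&ml.mu);
}

void ml_release(void)
{
    pthread_mutex_lock(&ml.mu);
    ml.busy = 0;
    pthread_cond_signal(&ml.is_free);
    pthread_mutex_unlock(&ml.mu);
}

/* Non-blocking probe run in the new thread: stores 1 if ml_acquire()
   would succeed right now, 0 if it would block. */
static void *probe(void *out)
{
    int ok;
    pthread_mutex_lock(&ml.mu);
    ok = !ml.busy;
    pthread_mutex_unlock(&ml.mu);
    *(int *)out = ok;
    return NULL;
}

/* Returns 1 if a freshly created thread could take the lock,
   0 if it would hang, -1 if the thread could not be created. */
int new_thread_can_register(void)
{
    pthread_t t;
    int ok = 0;
    if (pthread_create(&t, NULL, probe, &ok) != 0)
        return -1;
    pthread_join(t, NULL);
    return ok;
}
```

While the main thread holds the lock, new_thread_can_register() returns 0; once the main thread calls ml_release(), it returns 1. That matches the observation that the main program has to release the runtime system before another C thread can register.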

Indeed. caml_c_thread_register always defaults to registering the given thread in domain 0. This was a pragmatic backwards compatible choice. Until your example, we haven’t had the need for registering C threads with other domains. This needs a little bit of work as we really don’t have a user-programmable handle to domains in the C API.

The interaction of domains with systhreads is described in the OCaml manual.

I’m curious whether your suggested solution of releasing the runtime system around callbacks would work for you or would you prefer having a different API?

This patch series (see bottom of the page) is the solution I came up with, and it appears to work:

https://listman.redhat.com/archives/libguestfs/2023-June/thread.html#31879

The final code is upstream here:

I guess this works for us now unless we find issues later.

There is an additional problem, but it only happens sometimes and only with OCaml < 5: after fork (called from C, not OCaml), something in the OCaml runtime gets very confused and hangs when entering the thread.

I guess that caml_atfork_handler is not being run to reinitialize the thread state.

I’m also not sure why it works with OCaml 5 - might be just luck.

It might be useful to make an issue for this on the OCaml issue tracker so that we can investigate and fix any issues if we find them.

Do you mean this issue? [Libguestfs] nbdkit | Failed pipeline for master | 9d4b87e0

To be honest, although I am able to reproduce the problem easily, I'm not at all sure that it is related to fork; that might merely be a coincidence.

Edit: Let me see if I can make a standalone reproducer first.

The only thing you can do formally in the child process after forking in a multi-threaded program is to apply async-signal-safe functions and then exec. Could your issue be related to that?
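
As a minimal sketch of that rule in plain C: the child below calls nothing but execv and _exit, both of which are on the POSIX list of async-signal-safe functions, and the parent collects the exit status. The helper names spawn_and_wait and demo_exit_code are hypothetical, and the example assumes /bin/sh exists.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec `path` with `argv` in the child, and return the child's exit
   status in the parent (-1 on failure or abnormal termination). */
int spawn_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: only async-signal-safe calls between fork and exec. */
        execv(path, argv);
        _exit(127);                 /* reached only if execv failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example: run `sh -c "exit 7"` and report its exit status. */
int demo_exit_code(void)
{
    char *const argv[] = { "sh", "-c", "exit 7", NULL };
    return spawn_and_wait("/bin/sh", argv);
}
```

Anything beyond this in the child of a multi-threaded process, such as taking locks or allocating memory, is outside what POSIX guarantees.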

I can reproduce this in a program without threads, but using fork. NB. This only reproduces with OCaml 4, not OCaml 5.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
#include <caml/callback.h>
#include <caml/memory.h>
#include <caml/misc.h>
#include <caml/mlvalues.h>
#include <caml/threads.h>

static value f1_fn;
static value f2_fn;

int
main (int argc, char *argv[])
{
  int i;
  pid_t pid;

  fprintf (stderr, "startup & release runtime system ...\n");
  caml_startup (argv);
  caml_release_runtime_system ();

  fprintf (stderr, "call f1 a few times ...\n");
  for (i = 0; i < 8; ++i) {
    caml_acquire_runtime_system ();
    caml_callback (f1_fn, Val_unit);
    caml_release_runtime_system ();
  }

  pid = fork ();
  if (pid == -1) { perror ("fork"); exit (1); }
  if (pid == 0) {
    sleep (1);
    fprintf (stderr, "in child, call f2 a few times ...\n");
    for (i = 0; i < 8; ++i) {
      caml_acquire_runtime_system ();
      caml_callback (f2_fn, Val_unit);
      caml_release_runtime_system ();
    }
    _exit (0);
  }

  wait (NULL);

  exit (0);
}

value
forking_register_callback (value name, value fv)
{
  CAMLparam2 (name, fv);

  fprintf (stderr, "registering callback %s ...\n", String_val (name));

  if (strcmp (String_val (name), "f1") == 0) {
    f1_fn = fv;
    caml_register_generational_global_root (&f1_fn);
  }
  else if (strcmp (String_val (name), "f2") == 0) {
    f2_fn = fv;
    caml_register_generational_global_root (&f2_fn);
  }
  else
    abort ();

  CAMLreturn (Val_unit);
}

open Printf

let () = eprintf "in ocaml, starting up ...\n%!"

let f1 () = eprintf "in ocaml, function f1 ...\n%!"
let f2 () = eprintf "in ocaml, function f2 ...\n%!"

external register_callback : string -> (unit -> unit) -> unit
  = "forking_register_callback"

let () = register_callback "f1" f1 ; register_callback "f2" f2

$ ocamlopt -runtime-variant _pic -I +unix unix.cmxa -I +threads threads.cmxa forking.c forking_callbacks.ml -o forking
$ ./forking
...
in child, call f2 a few times ...
[hangs here]

Actually here’s an even simpler version:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <caml/callback.h>
#include <caml/misc.h>
#include <caml/threads.h>

int
main (int argc, char *argv[])
{
  pid_t pid;

  fprintf (stderr, "startup & release runtime system ...\n");
  caml_startup (argv);
  caml_release_runtime_system ();

  fprintf (stderr, "acquire and release runtime system ...\n");
  caml_acquire_runtime_system ();
  caml_release_runtime_system ();

  pid = fork ();
  if (pid == -1) { perror ("fork"); exit (1); }
  if (pid == 0) {
    sleep (1);
    fprintf (stderr, "in child, acquire and release runtime system ...\n");
    caml_acquire_runtime_system ();
    caml_release_runtime_system ();
    _exit (0);
  }

  wait (NULL);

  exit (0);
}

let () = Printf.eprintf "in ocaml, starting up ...\n%!"

Same ocamlopt command as above.

In your last example, if you don’t link in threads.cmxa it seems to work OK. I’ve not tested your other one.

I have to use threads.cmxa because the real program needs to call caml_c_thread_register.

In which case, since your program is multi-threaded, you cannot apply caml_acquire_runtime_system () and caml_release_runtime_system () in the child process once the program has launched another thread. Those functions are almost certainly not async-signal-safe so POSIX forbids it.

One option is to do all your forking before any threads (other than the starting main thread) have begun and before caml_startup () has been applied, and then (if you want to link in threads.cmxa) apply caml_startup () separately in each process that executes OCaml code.
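
The ordering suggested here can be sketched with plain pthreads. In the model below the process forks while it is still single-threaded, and only the child then performs its per-process initialisation and starts a thread. The function init_and_run_threads is a hypothetical stand-in for "apply caml_startup () and then create threads"; no OCaml runtime is involved, and the value 42 is arbitrary.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker thread: stores an arbitrary result for the caller. */
static void *worker(void *arg)
{
    *(int *)arg = 42;
    return NULL;
}

/* Hypothetical stand-in for per-process initialisation followed by
   thread creation. Returns the worker's result, or 1 on failure. */
static int init_and_run_threads(void)
{
    pthread_t t;
    int result = 0;

    if (pthread_create(&t, NULL, worker, &result) != 0)
        return 1;
    pthread_join(t, NULL);
    return result;
}

/* Fork first, while the process is still single-threaded; initialise and
   start threads only afterwards, in the child. Returns the child's exit
   status, or -1 on failure. */
int fork_first_then_threads(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(init_and_run_threads());
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because no other thread exists at the time of the fork, the child is an ordinary single-threaded process and is free to initialise whatever it needs before creating threads of its own.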

The only thing you can do formally in the child process after forking in a multi-threaded program is to apply async-signal-safe functions and then exec. Could your issue be related to that?

This is the crux of the issue as far as the last examples are concerned.

To add more context, OCaml 4 does some best-effort to try to be able to fork a multithreaded program anyway, by doing things that are not POSIX compliant. See Forks and multi-threading not well-supported · Issue #4577 · ocaml/ocaml · GitHub and caml_thread_reinitialize. In any case, this best-effort assumes that fork is called from OCaml.

For OCaml 5 this might work the same but will be limited to programs running on a single domain.

That is unfortunate because the Unix.execv* functions, which one might expect to be called after an OCaml Unix.fork, are not thread safe: they end up at caml_stat_alloc_noexc(), which in turn may call malloc(), which is not async-signal-safe in general (although glibc, as an extension, allows a call to malloc() between fork() and exec, whereas musl does not). Furthermore, Lwt_process, which was included to make forking safe under Lwt (which creates threads underneath you), isn't safe either, because it applies Unix.execve.

It’s a mess. Whether it is because people routinely don’t understand POSIX requirements for forking in a multi-threaded process, I don’t know. It wouldn’t be so bad if the Unix.execv* functions were documented as not usable in a multi-threaded program, but they aren’t. This is particularly tricky because exec is safe in C, so this confounds expectations.

I guess we need to talk about what the real program does, because the above is only a distillation of that. The real program forks into the background, and does not create any threads before doing that (same as the example). As far as I'm aware this is quite safe, is common in normal programs, and does not contravene POSIX in any way.

The example above also doesn’t create threads, although it links with threads.cmxa because if this was the real program we’d be calling caml_c_thread_register after (not before) the fork.

So I believe this is all fine and there’s a bug in OCaml somewhere. (Luckily, I suppose, only in OCaml 4 so maybe we can just make a lot of this conditional on OCaml >= 5).