Trying to understand `caml_acquire_runtime_system` when called from C threads

I am looking at multicore programming in OCaml 5.3. We have multiple threads in play, which are all spawned outside of OCaml.
Currently, they all belong to domain 0 because caml_c_thread_register is hardcoded to do this (we have a patch where the domain can be specified, but that is another topic).

I want to understand how I am supposed to guard OCaml against these many threads. The flow is currently, for all threads:

void *thread (void *arg) {
  caml_c_thread_register();
  caml_acquire_runtime_system();
  // call OCaml block
  caml_release_runtime_system();
  caml_c_thread_unregister();
  return NULL;
}

int main(int argc, char **argv) {
  caml_startup(argv);
  caml_release_runtime_system();
  // spawn threads and wait
  return 0;
}

My understanding is that there is at any given point only one thread inside the “OCaml block”, but this does not seem to be the case when testing (see below for example, if needed).
Am I misunderstanding how caml_acquire_runtime_system works (I figured it would involve taking a mutex at some point, which is then released later in caml_release_runtime_system)?

Example:
test.c:

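#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#include <caml/mlvalues.h>
#include <caml/callback.h>
#include <caml/threads.h>

void *thread (void *arg)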
{
  pthread_t tid = pthread_self();

  assert(caml_c_thread_register() == 1);
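  printf("%lld %lld acquiring\n", (long long)time(NULL), (long long)tid);
  caml_acquire_runtime_system();
  printf("%lld %lld in\n", (long long)time(NULL), (long long)tid);
  sleep(10);
  /* Look up the OCaml closure registered as "f" and call it with the thread id. */
  value res = caml_callback_exn(*caml_named_value("f"), Val_int(tid));
  printf("%lld OCaml returned %d\n", (long long)time(NULL), Int_val(res));
  printf("%lld %lld releasing\n", (long long)time(NULL), (long long)tid);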
  caml_release_runtime_system();
  caml_c_thread_unregister();
  return NULL;
}

int main (int argc, char *argv[])
{
  caml_startup(argv);
  caml_release_runtime_system();

  pthread_t threads[2];
  assert(!pthread_create(&threads[0], NULL, thread, NULL));
  assert(!pthread_create(&threads[1], NULL, thread, NULL));
  assert(!pthread_join(threads[0], NULL));
  assert(!pthread_join(threads[1], NULL));
}

test.ml:

let f d =
  let () = Format.printf "%f f called with %d\n%!" (Unix.gettimeofday ()) d in
  d + 1

let () = Callback.register "f" f

Gives the following output:

1746795929 68 acquiring
1746795929 64 acquiring
1746795929 68 in
1746795939 64 in
1746795939.441212 f called with 68
1746795949 OCaml returned 69
1746795949 68 releasing
1746795949.441531 f called with 64
1746795949 OCaml returned 65
1746795949 64 releasing

This seems to indicate that initially the lock works (there is a 10 second gap from when 64 tries to enter until something happens).
However, before 68 releases the lock, 64 manages to get in. This seems to be made possible by the caml_callback_exn call done by 68. Is this expected?

Thanks for any help!

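Yes, this is expected: the lock can be released while OCaml code is executing. This may happen at so-called “poll points” (or “safe points”), which include function calls, back edges of loops, and allocations. At such a point the running thread may release the domain lock, and another thread may acquire it. So caml_acquire_runtime_system does take a lock, but holding it only guarantees that no other thread of the domain runs OCaml code at the same instant, not that everything between the acquire and the release runs without another thread getting in.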

The output you observe can be explained by hypothesizing that thread 68 released the lock when allocating a float to store the return value of Unix.gettimeofday (), for instance; then 64 was able to acquire it and call back into OCaml too; but then control switched back to 68; and so on.
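If you need the whole block to be exclusive, one option is to add your own mutex around it. Below is a minimal sketch of that idea using plain pthreads only: `runtime_lock` is a stand-in for the domain lock, `poll_point` models a point where the runtime gives the lock up, and `block_lock`, `worker` and `run_workers` are hypothetical names made up for this illustration. None of this is the OCaml runtime API. The point it tries to show is the ordering: take your own lock before the runtime lock, so that a thread waiting for your lock never holds the runtime lock while it waits.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Stand-in for the OCaml domain lock. This is a model, NOT the runtime API. */
static pthread_mutex_t runtime_lock = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical application-level lock guarding the whole "OCaml block". */
static pthread_mutex_t block_lock = PTHREAD_MUTEX_INITIALIZER;

static int inside;     /* threads currently in the block; guarded by runtime_lock */
static int max_inside; /* largest value of `inside` seen; guarded by runtime_lock */

/* Models a poll point: give up the runtime lock, let others run, take it back. */
static void poll_point(void) {
  pthread_mutex_unlock(&runtime_lock);
  sched_yield();
  pthread_mutex_lock(&runtime_lock);
}

static void *worker(void *arg) {
  int serialize = *(int *)arg;
  /* Application lock first, while NOT holding the runtime lock. */
  if (serialize) pthread_mutex_lock(&block_lock);
  pthread_mutex_lock(&runtime_lock);   /* models caml_acquire_runtime_system */
  inside++;
  if (inside > max_inside) max_inside = inside;
  /* Models a callback into OCaml that hits many poll points. */
  for (int i = 0; i < 1000; i++) poll_point();
  inside--;
  pthread_mutex_unlock(&runtime_lock); /* models caml_release_runtime_system */
  if (serialize) pthread_mutex_unlock(&block_lock);
  return NULL;
}

/* Runs n workers (1 <= n <= 16) and returns how many were ever inside the
   block at the same time. */
int run_workers(int n, int serialize) {
  pthread_t threads[16];
  assert(n >= 1 && n <= 16);
  inside = 0;
  max_inside = 0;
  for (int i = 0; i < n; i++) {
    int rc = pthread_create(&threads[i], NULL, worker, &serialize);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < n; i++) pthread_join(threads[i], NULL);
  return max_inside;
}
```

With `serialize` set to 1, `run_workers` returns 1: only one thread is ever inside the block. With `serialize` set to 0 the result can be larger, which is the interleaving seen in the output above. In the real program the analogous change would be to lock your own mutex before calling caml_acquire_runtime_system and unlock it after caml_release_runtime_system; taking it in the opposite order risks a deadlock, since a thread could then block on your mutex while still holding the domain lock.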
