Trying to understand `caml_acquire_runtime_system` when called from C threads

I am looking at multicore programming in OCaml 5.3. We have multiple threads in play, which are all spawned outside of OCaml.
Currently, they all belong to domain 0 because caml_c_thread_register is hardcoded to do this (we have a patch where the domain can be specified, but that is another topic).

I want to understand how I am supposed to guard OCaml against these many threads. The flow is currently, for all threads:

void *thread (void *arg) {
  caml_c_thread_register();
  caml_acquire_runtime_system();
  // call OCaml block
  caml_release_runtime_system();
  return NULL;
}

int main(int argc, char **argv) {
  caml_startup(argv);
  caml_release_runtime_system();
  // spawn threads and wait
  return 0;
}

My understanding is that there is at any given point only one thread inside the “OCaml block”, but this does not seem to be the case when testing (see below for example, if needed).
Am I misunderstanding how caml_acquire_runtime_system works (I figured it would involve taking a mutex at some point, which is then released later in caml_release_runtime_system)?

Example:
test.c:

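#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <caml/callback.h>
#include <caml/mlvalues.h>
#include <caml/threads.h>

/* Note: the asserts below carry side effects, so do not build with -DNDEBUG. */
void *thread (void *arg)
{
  /* Cast for printing; assumes pthread_t is an integer type, as on Linux. */
  long long tid = (long long) pthread_self();

  assert(caml_c_thread_register() == 1);
  printf("%lld %lld acquiring\n", (long long) time(NULL), tid);
  caml_acquire_runtime_system();
  printf("%lld %lld in\n", (long long) time(NULL), tid);
  sleep(10);
  /* Call the OCaml function registered as "f"; the result is not checked for exceptions. */
  value res = caml_callback_exn(*caml_named_value("f"), Val_int(tid));
  printf("%lld OCaml returned %d\n", (long long) time(NULL), Int_val(res));
  printf("%lld %lld releasing\n", (long long) time(NULL), tid);
  caml_release_runtime_system();
  caml_c_thread_unregister();
  return NULL;
}

int main (int argc, char *argv[])
{
  caml_startup(argv);
  caml_release_runtime_system();

  pthread_t threads[2];
  assert(!pthread_create(&threads[0], NULL, thread, NULL));
  assert(!pthread_create(&threads[1], NULL, thread, NULL));
  assert(!pthread_join(threads[0], NULL));
  assert(!pthread_join(threads[1], NULL));
  return 0;
}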

test.ml:

let f d =
  let () = Format.printf "%f f called with %d\n%!" (Unix.gettimeofday ()) d in
  d + 1

let () = Callback.register "f" f

Gives the following output:

1746795929 68 acquiring
1746795929 64 acquiring
1746795929 68 in
1746795939 64 in
1746795939.441212 f called with 68
1746795949 OCaml returned 69
1746795949 68 releasing
1746795949.441531 f called with 64
1746795949 OCaml returned 65
1746795949 64 releasing

This seems to indicate that initially the lock works (there is a 10 second gap from when 64 tries to enter until something happens).
However, before 68 releases the lock, 64 manages to get in. This seems to be made possible by the caml_callback_exn call done by 68. Is this expected?

Thanks for any help!

Yes, the lock can be released during the execution of OCaml code. This may happen at so-called “poll points” or “safe points” which include function calls, back edges of loops and allocations. At such points a thread may release the domain lock and another one may acquire it.

The output you observe can be explained by hypothesizing that thread 68 released the lock when allocating a float to store the return value of Unix.gettimeofday (), for instance; then 64 was able to acquire it and call back into OCaml too; but then control switched back to 68; and so on.
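If the goal is to have only one C thread inside the whole “OCaml block” from start to finish, the runtime lock alone does not give that, for the reason described above. One option is an application-level mutex around the block. The following is only a sketch: `ocaml_block_mutex`, `run_exclusive` and the self-check helpers are hypothetical names, and the `block` callback stands in for the real sequence of `caml_acquire_runtime_system()`, the callback into OCaml, and `caml_release_runtime_system()`.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical application-level lock; it is not part of the OCaml runtime. */
static pthread_mutex_t ocaml_block_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Run block(arg) with the application lock held. In the real program, block
   would acquire the runtime, call into OCaml, and release the runtime. */
void run_exclusive(void (*block)(void *), void *arg)
{
  pthread_mutex_lock(&ocaml_block_mutex);
  block(arg);
  pthread_mutex_unlock(&ocaml_block_mutex);
}

/* ---- Self-check, using a stand-in for the OCaml block ---- */
static int inside = 0;      /* threads currently inside the block */
static int max_inside = 0;  /* highest value of inside ever observed */

static void probe(void *arg)
{
  (void) arg;
  inside++;
  if (inside > max_inside) max_inside = inside;
  sched_yield();            /* give other threads a chance to barge in */
  inside--;
}

static void *worker(void *arg)
{
  for (int i = 0; i < 1000; i++) run_exclusive(probe, arg);
  return NULL;
}

/* Spawn n threads (at most 16) and report the largest overlap seen. */
int max_overlap_with_threads(int n)
{
  pthread_t tids[16];
  if (n > 16) n = 16;
  max_inside = 0;
  for (int i = 0; i < n; i++) {
    int rc = pthread_create(&tids[i], NULL, worker, NULL);
    assert(rc == 0);
    (void) rc;
  }
  for (int i = 0; i < n; i++) pthread_join(tids[i], NULL);
  return max_inside;
}
```

Note the lock order this implies: the application mutex is taken before the runtime is acquired. A thread that blocked on the mutex while already holding the runtime lock would prevent the mutex owner from re-acquiring the runtime after a safe point, and both would be stuck.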

Is there documentation about these safe points available somewhere? I was also wondering how another thread can run OCaml code while the main thread is doing the same, so I built some external synchronization to let the main thread know that other threads want execution time. It runs with a huge overhead, and I would happily remove all of it if the OCaml compiler/runtime already does that for me ...

Whenever a runtime-owning thread performs a blocking call (it is going to wait for a while, or do a long-ish computation without needing the runtime system), it will release the runtime lock to let other threads from the same domain run OCaml runtime code. In addition, the Threads machinery sets up a timer so that each active thread is interrupted regularly even if it doesn’t make blocking calls. (This also applies when all the threads on a domain come from caml_c_thread_register.)
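C code that holds the runtime and is about to block can follow the same pattern by hand. Here is a minimal sketch of it, written so that it compiles without the OCaml headers: `runtime_hooks`, `call_without_runtime` and the `fake_*` stand-ins are hypothetical names. In a real program the two hooks would presumably be `caml_release_runtime_system` and `caml_acquire_runtime_system` from caml/threads.h, and the blocking function must not touch OCaml values or the runtime while the lock is released.

```c
#include <string.h>

/* Hypothetical pair of hooks; both real functions have type void (*)(void). */
typedef struct {
  void (*release)(void);
  void (*acquire)(void);
} runtime_hooks;

/* Run blocking(arg) with the runtime released, then re-acquire it. */
int call_without_runtime(const runtime_hooks *rt,
                         int (*blocking)(void *), void *arg)
{
  rt->release();
  int result = blocking(arg);   /* other threads of the domain may run here */
  rt->acquire();
  return result;
}

/* ---- Stand-ins so the sketch runs without the OCaml runtime ---- */
static char trace[8];
static size_t trace_len = 0;

static void record(char c)
{
  if (trace_len + 1 < sizeof trace) {
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
  }
}

static void fake_release(void) { record('R'); }
static void fake_acquire(void) { record('A'); }
static int fake_blocking(void *arg) { record('B'); return *(int *) arg; }

/* Returns 1 if the order was release, blocking call, acquire. */
int blocking_call_order_ok(void)
{
  const runtime_hooks rt = { fake_release, fake_acquire };
  int payload = 42;
  trace_len = 0;
  trace[0] = '\0';
  int result = call_without_runtime(&rt, fake_blocking, &payload);
  return result == 42 && strcmp(trace, "RBA") == 0;
}
```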

This applies only to when the thread is running OCaml code, right?
Or can threads be interrupted if they still have the domain lock and are running “pure” C code (so not using the OCaml runtime at all)?

On the C side, my understanding is that thread scheduling is cooperative: the thread that holds the lock is not preempted and has to release the lock itself, which can happen implicitly on common FFI operations. So a C thread that holds the lock can starve other threads on the same domain by never releasing it. This is not the intended or recommended interaction; C code running with the OCaml runtime lock should periodically check for pending runtime actions, for example by calling one of the caml_process_pending_actions* functions from caml/signals.h (see the documentation comments there).

P.S.: If C code keeps the lock but never processes OCaml runtime actions, it will also block other domains waiting for a stop-the-world event that all runtime-owning threads have to join.
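To illustrate the periodic check suggested above, here is a sketch of a long C loop that calls a polling hook every so often. It is written so that it compiles without the OCaml headers: `sum_with_polling`, `count_polls` and `counting_poll` are hypothetical names. In a real program the `poll` argument would presumably be `caml_process_pending_actions` from caml/signals.h, which has the matching type `void (*)(void)`.

```c
#include <stddef.h>

/* Sum n integers, calling poll() after every poll_interval elements.
   A poll_interval of 0 disables polling. */
long sum_with_polling(const int *data, size_t n, size_t poll_interval,
                      void (*poll)(void))
{
  long total = 0;
  for (size_t i = 0; i < n; i++) {
    total += data[i];
    if (poll_interval != 0 && (i + 1) % poll_interval == 0)
      poll();                 /* let the runtime handle pending actions */
  }
  return total;
}

/* ---- Stand-in so the sketch runs without the OCaml runtime ---- */
static int poll_count = 0;
static void counting_poll(void) { poll_count++; }

/* Run the loop over n zeroed elements (at most 64) and count the polls. */
int count_polls(size_t n, size_t poll_interval)
{
  int data[64] = { 0 };
  if (n > 64) n = 64;
  poll_count = 0;
  (void) sum_with_polling(data, n, poll_interval, counting_poll);
  return poll_count;
}
```

Keep in mind that processing pending actions can run OCaml code (signal handlers, finalisers) and trigger a GC, so any OCaml values the C function holds across the call need to be registered as roots in the usual way.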

Thank you, this was also my understanding from reading the documentation. I just wanted to confirm that this “interruption behavior” you mentioned was “OCaml-only”.

You are welcome. It’s a bit tricky to describe this in a way that is both simple and precise, because the distinction (as you correctly noted in your question) is not “running OCaml code” vs. “running C code”; what matters is whether the code is “calling the OCaml runtime system” (from OCaml or C).

(For context: both OCaml and C can call the runtime system. For OCaml code, the compiler produces code that enforces various guarantees about how the runtime is used, in particular that OCaml threads periodically release control, so from OCaml the scheduling feels more preemptive than cooperative. For C code using the runtime there is more flexibility, including various ways it can be misused.)