Coordinating (Java) systhreads / domains

cemerick · April 10, 2025, 12:07pm

I have integrated an OCaml shared library into a Java project, and have hit some rough waters with coordinating Java threads with OCaml’s domains.

To set the stage, the Java runtime is the primary executable here, and it is what starts the OCaml runtime via JNI. Everything works as expected with a single thread, but I get the dreaded “no domain lock held” fatal error when additional or different threads try to call into the initialized OCaml runtime via the handful of callback functions I’ve registered.

I’ve put guards around the calls into OCaml using pairs of caml_acquire_runtime_system/caml_release_runtime_system and caml_c_thread_register/caml_c_thread_unregister, with different interleavings, but with no good effect (and often a deadlock, insofar as I’m not understanding the proper usage of the locking guards).

Some poking around at internals makes me think that all “external” threads (i.e. those not created by OCaml) are just grouped into domain 0. Ideally, I’d like to have each new Java thread calling into OCaml to be in its own domain (the Java side does a fine job of pooling threads and such, so matching a Java thread with a domain feels like a natural state of affairs), but it looks like thread/domain controls necessary for that aren’t available from the C API?

(I have read e.g. OCaml - Interfacing C with OCaml and other related docs and OCaml C sources, but the multithreading discussions seem to consistently focus on allowing for parallel execution of long-running/blocking C code called from OCaml, not the other way around.)

As a workaround, I’m thinking I could set up an async single-threaded message bus between the Java and OCaml worlds, treating them like disjoint remote systems. This will definitely work, and will probably get 90% of the parallel throughput I’d expect from “direct” synchronous calls. I wonder if there is any accepted/safe way to avoid such workarounds?
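
A minimal Java-side sketch of the single-threaded bus described above, not the poster's actual code. `SingleThreadBridge` and `nativeCall` are hypothetical names; `nativeCall` stands in for whatever JNI entry point takes a serialized request and returns a serialized reply. It assumes the OCaml runtime is also initialized from the bridge thread, so that only one OS thread ever enters OCaml.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Funnels every call into the native runtime through one dedicated thread. */
final class SingleThreadBridge implements AutoCloseable {
    // The only thread that is ever allowed to enter the native runtime.
    private final ExecutorService lane = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "ocaml-bridge");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical stand-in for the JNI entry point (bytes in, bytes out).
    private final Function<byte[], byte[]> nativeCall;

    SingleThreadBridge(Function<byte[], byte[]> nativeCall) {
        this.nativeCall = nativeCall;
    }

    /** Callable from any Java thread; the work itself runs on the bridge thread. */
    CompletableFuture<byte[]> call(byte[] request) {
        return CompletableFuture.supplyAsync(() -> nativeCall.apply(request), lane);
    }

    /** Stops accepting new requests; already queued requests still run. */
    @Override
    public void close() {
        lane.shutdown();
    }
}
```

Callers on any thread get a `CompletableFuture` back and can chain on it or block with `join()`. Requests are processed one at a time in submission order, so this trades parallelism inside OCaml for never tripping the domain lock check.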

cemerick · April 10, 2025, 1:11pm

A prior thread confirms much of what I said above. @kayceesrk’s response basically answered my question:

(I guess this is a 2nd [or, Nth] example of where having a way to register C threads with other domains would be very useful! )

There’s a workaround linked to in that thread, but there’s also forking going on there, and it appears to be limited to using a single domain (perfectly reasonable, but less than what I’m aiming for here, i.e. pairing Java/systhreads 1:1 with OCaml domains).

kayceesrk · April 11, 2025, 2:18am

Would be useful to make a feature request on ocaml/ocaml so that it will be seen by the right people. Do mention a subset of these N examples so that the compiler devs know that the users want it. Thanks!

adrien · April 11, 2025, 4:02pm

I had an issue with pyml which I think is similar. I went through several stages of progress, but there was always a new issue. The last kind of issue I had was that OCaml would do a major collection and reclaim a Python object, and freeing it would require the Python lock, which was held in another domain. Unfortunately, that second domain would never release the lock, as it was stopped by the major collection. That was the last straw for me in trying to make things interface without resorting to tricks.

Generally speaking, I think you can’t have two worlds work together if they use locking and don’t provide a few primitives to manage the locks. For pyml (with OCaml as the primary world), I’ve used a dedicated domain and ensured that no major GC operation would be triggered for Python values (large minor heap and frequent minor collections), but obviously that’s not always possible.

For your issue, I was going to propose what you’ve described in your last paragraph. ;p

cemerick · April 11, 2025, 8:19pm

Yeah, I decided from the start that I wouldn’t be trying to get the runtimes’ respective GCs to play nicely together. I’m just shipping protobuf strings back and forth in order to avoid that mess entirely.

It’s turned out to be fine, certainly no more of a problem/speedbump than the concession to using protobufs. I only have the JVM side done so far (so at least I can ~guarantee no more “no domain lock held” failures), and I haven’t done any benchmarking, but I think it’ll be, again, fine.

gasche · April 12, 2025, 5:20pm

I suppose that we could provide support for a foreign thread joining as a new domain, and/or to attach to a given existing domain. But how would user code actually make use of this capability? Naively I would expect that it is very hard to use this correctly.

(In particular, notice that domains behave badly when there are more domains than available CPU cores, which is probably not such a strong requirement for your Java thread pools, so it’s not clearly a good idea to map each Java worker to a distinct OCaml domain.)

Maybe an easier programming model would be to spawn N OCaml domains independently from the Java stuff, and have each Java thread interested in running OCaml code temporarily register into one of the domains and un-register when the OCaml computation is done.
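
Seen from the Java side, the model suggested above could look like a fixed set of slots, one per OCaml domain, that Java threads borrow for the duration of a call. This is only a sketch of the shape under stated assumptions: `DomainSlots` is a hypothetical name, and since this thread says the C API offers no way to register a foreign thread with a chosen domain, the body below just receives a slot id rather than performing any real registration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntFunction;

/** Hands out a fixed number of slots, one per (hypothetical) OCaml domain. */
final class DomainSlots {
    private final BlockingQueue<Integer> free;

    DomainSlots(int domains) {
        free = new ArrayBlockingQueue<>(domains);
        for (int id = 0; id < domains; id++) {
            free.add(id);
        }
    }

    /** Waits for a free slot, runs the body with its id, then gives the slot back. */
    <T> T withSlot(IntFunction<T> body) {
        int id;
        try {
            id = free.take();          // stands in for "register with domain id"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a slot", e);
        }
        try {
            return body.apply(id);     // hypothetical native call bound to that domain
        } finally {
            free.add(id);              // stands in for "un-register"; slot is reusable
        }
    }
}
```

The number of slots is chosen independently of the Java pool size, which matches the point about not running more domains than CPU cores: extra Java threads simply wait in `withSlot` until a slot frees up.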

borisd · April 13, 2025, 1:06am

I’d like to mention that I’ve run into the “no domain lock held” issue when developing the libdispatch bindings in a roughly similar setup. I’m outsourcing the interaction of the native threads with the OCaml runtime to Ctypes hoping that it will do the right thing:

(* Assumes the ctypes and ctypes-foreign libraries. *)
open Ctypes

module Dispatch_function =
  (val Foreign.dynamic_funptr ~thread_registration:true ~runtime_lock:true
         (ptr void @-> returning void))

Just adding this as a data point for the OCaml devs to consider the needs of the community.

gasche · May 30, 2025, 12:55pm

I don’t understand what the needs are that you have in mind.

My best guess from your description is that you tried to access the OCaml runtime from a C thread without owning an OCaml runtime lock – this is a programming mistake and it cannot work. So maybe the need is better documentation of how the FFI works (I agree that there is room for improvement here), and your example of Ctypes usage doing this goes in this same direction, thanks!

Or maybe you have another need in mind that would benefit from a change to the C-facing API, but then more explanations would help.

borisd · May 30, 2025, 10:09pm

In my case I was not writing C stubs, yet I was interacting with a C API that was calling my OCaml functions. So I couldn’t use caml_release_runtime_system and caml_acquire_runtime_system directly. Perhaps the functionality to acquire the runtime system could be added to a low-level OCaml library module, e.g. Obj, so that the OCaml code could recover from such a condition without the application crashing. I have no idea if this would be feasible or what other implications it may have. But that’s an example of the needs of the community.

gasche · May 31, 2025, 5:02am

It is completely unsafe to run OCaml code without owning the runtime lock – so the solution to try to acquire the runtime lock from OCaml code will not work. If you want to pass a callback to a C API, you should probably pass a C function that takes the runtime lock before calling the OCaml function you have in mind.

borisd · May 31, 2025, 1:25pm

Suppose I want to write:

let fun_ptr_exportable_to_c = Callback.with_runtime_lock (fun arg -> ...)

Couldn’t with_runtime_lock do exactly what you described – take the runtime lock and call my function?

I know I can write this as my own external primitive, but if it were provided by the stdlib I would trust it more.

gasche · May 31, 2025, 1:59pm

For me providing with_runtime_lock as an OCaml function does not make sense: all OCaml code must run with the runtime lock, in particular this function. Taking/releasing the runtime lock is an operation to be done from C. You control the C code that jumps back into the OCaml world (this is not something that is done by libdispatch itself), so you should ensure that the runtime lock is held before calling an OCaml function.

I see that Ctypes provides a wrapper to do this automagically and this seems to suit your needs better than doing this on the C side. This is fine, but, again, this isn’t something that can reasonably be done on the OCaml side.

Within the C API, I suppose we could expose functions caml_callback<n>_with_runtime_system_res that call an OCaml callback with value parameters, and are careful to temporarily acquire the runtime lock when it is not held. (Doing this conditionally lets users write code that works in both contexts.)
