I have integrated an OCaml shared library into a Java project, and have hit some rough waters with coordinating Java threads with OCaml’s domains.
To set the stage, the Java runtime is the primary executable here, and it is what initializes the OCaml runtime via JNI. Everything works as expected with a single thread, but I get the dreaded “no domain lock held” fatal error when additional or different threads try to call into the initialized OCaml runtime via the handful of callback functions I’ve registered.
I’ve put guards around the calls into OCaml using pairs of caml_acquire_runtime_system/caml_release_runtime_system and caml_c_thread_register/caml_c_thread_unregister, with different interleavings, but to no good effect (and often a deadlock, presumably because I’m not understanding the proper usage of the locking guards).
Some poking around at internals makes me think that all “external” threads (i.e. those not created by OCaml) are just grouped into domain 0. Ideally, I’d like each new Java thread calling into OCaml to be in its own domain (the Java side does a fine job of pooling threads and such, so matching a Java thread with a domain feels like a natural state of affairs), but it looks like the thread/domain controls necessary for that aren’t available from the C API?
(I have read e.g. the “Interfacing C with OCaml” chapter of the OCaml manual, other related docs, and the OCaml C sources, but the multithreading discussions seem to consistently focus on allowing for parallel execution of long-running/blocking C code called from OCaml, not the other way around.)
As a workaround, I’m thinking I could set up an async single-threaded message bus between the Java and OCaml worlds, treating them like disjoint remote systems. This will definitely work, and will probably get 90% of the parallel throughput I’d expect from “direct” synchronous calls. I wonder if there is any accepted/safe way to avoid such workarounds?
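For what it's worth, a minimal sketch of the Java side of such a bus might look like the following. It assumes that every call into OCaml is funneled through one dedicated thread, so only that thread ever needs the domain lock. The class name `OcamlBridge` and the `nativeCall` function are hypothetical stand-ins for whatever JNI entry point is actually registered; the real native method, and the runtime initialization (which would presumably also have to run on that same thread), are not shown.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Funnels every call into the OCaml side through one dedicated thread. */
final class OcamlBridge implements AutoCloseable {
    // A single worker thread: it is the only thread that ever enters the
    // OCaml runtime, so no other Java thread needs to hold the domain lock.
    private final ExecutorService ocamlThread =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "ocaml-bridge");
            t.setDaemon(true);
            return t;
        });

    // Hypothetical stand-in for the JNI entry point (assumption: it takes
    // and returns serialized messages as byte arrays).
    private final Function<byte[], byte[]> nativeCall;

    OcamlBridge(Function<byte[], byte[]> nativeCall) {
        this.nativeCall = nativeCall;
    }

    /** Queues a request; the returned future completes with the reply. */
    CompletableFuture<byte[]> call(byte[] request) {
        return CompletableFuture.supplyAsync(
            () -> nativeCall.apply(request), ocamlThread);
    }

    @Override
    public void close() {
        ocamlThread.shutdown();
    }
}
```

Any Java thread can then call `bridge.call(bytes)` and either block on `join()` or chain further work onto the future. The calls themselves are serialized, so this trades parallelism inside OCaml for never touching the runtime from an unregistered thread.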
(I guess this is a 2nd [or, Nth] example of where having a way to register C threads with other domains would be very useful!)
There’s a workaround linked to in that thread, but there’s also forking going on there, and it appears to be limited to using a single domain (perfectly reasonable, but less than what I’m aiming for here, i.e. pairing Java/systhreads 1:1 with OCaml domains).
Would be useful to make a feature request on ocaml/ocaml so that it will be seen by the right people. Do mention a subset of these N examples so that the compiler devs know that the users want it. Thanks!
I had an issue with pyml which I think is similar, and I went through several stages of progress, but there was always a new issue. The last kind of issue I had was that OCaml would do a major collection and reclaim a Python object, and freeing it would require the Python lock, which was held in another domain. Unfortunately, that second domain would never release the lock, as it was stopped by the major collection. That was the last straw for me to try to make things interface without resorting to tricks.
Generally speaking, I think you can’t have two worlds work together if they use locking and don’t provide a few primitives to manage the locks. For pyml (with OCaml as the primary world), I’ve used a dedicated domain and ensured no major GC operation would be triggered for Python values (large minor heap and frequent minor collections), but obviously that’s not always possible.
For your issue, I was going to propose what you’ve described in your last paragraph. ;p
Yeah, I decided from the start that I wouldn’t be trying to get the runtimes’ respective GCs to play nicely together. I’m just shipping protobuf strings back and forth in order to avoid that mess entirely.
It’s turned out to be fine, certainly no more of a problem/speedbump than the concession to using protobufs. I only have the JVM side done so far (so at least I can ~guarantee no more “no domain lock held” failures), and I haven’t done any benchmarking, but I think it’ll be, again, fine.
I suppose that we could provide support for a foreign thread joining as a new domain, and/or to attach to a given existing domain. But how would user code actually make use of this capability? Naively I would expect that it is very hard to use this correctly.
(In particular, notice that domains behave badly when there are more domains than available CPU cores, which is probably not such a strong requirement for your Java thread pools, so it’s not clearly a good idea to map each Java worker to a distinct OCaml domain.)
Maybe an easier programming model would be to spawn N OCaml domains independently from the Java stuff, and have each Java thread interested in running OCaml code temporarily register into one of the domains and un-register when the OCaml computation is done.
I’d like to mention that I’ve run into the “no domain lock held” issue when developing the libdispatch bindings in a roughly similar setup. I’m outsourcing the interaction of the native threads with the OCaml runtime to Ctypes hoping that it will do the right thing: