OS threads of external origin

I write OCaml programs mainly on Haiku, which has a thread-based native C++ GUI. Each graphic window runs in its own thread, which starts up in response to a “run” member function. These threads are then dispatched via GUI events, and they execute OCaml code via C++ virtual function callbacks. I’m using OCaml 4.14.1 with ocamlopt.

I have yet to arrive at a really reliable implementation for this, and I’m wondering if there’s some good system for handling the non-reentrant parts that I missed. What I do, briefly:

  • on the first dispatch of the thread, caml_c_thread_register()
  • when entering OCaml from a callback, caml_acquire_runtime_system() (and caml_release_runtime_system() on the way out)
  • additionally, my own semaphore lock and unlock to absolutely serialize OCaml access from my UI threads (since it didn’t look to me like caml_acquire_runtime_system() does that)
  • release and reacquire via the above system in a few “long running” external functions called from OCaml. So if an OCaml function, for example, accesses the filesystem via my external function, it will release the runtime and may trade off OCaml execution with another thread.

Trace output shows what looks like orderly access to the runtime.

A very brief look at the systhreads library turns up some special code for dealing with roots etc., but that’s specific to systhreads and not inherently useful for threads of external origin, am I right?

Is there some debug tracing option that can shed some light on potential garbage collection issues?

There is a debug runtime variant that I have found useful in a similar situation.

You can use ocamlopt -runtime-variant d to have it link libasmrund.a instead of libasmrun.a and that makes it do stricter checks on memory.

(I’m not sure what the dune stanza is or whether there is one for runtime variants at all)

This is the only documentation I could find for it:

Another thing to try is triggering collections in various places in your code where you suspect it would break your code. To trigger collections from C++ code, I think you can just call the corresponding C function that Gc.major or Gc.full_major calls (you may not have the header accessible, just declare it as extern and the linker will find it because it’s defined in the runtime).
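From the OCaml side, the same idea can be sketched as below. This is only a minimal illustration: with_gc_stress and build_labels are hypothetical names, not part of the standard library or of the original poster’s code. The helper forces a full major collection before and after the wrapped call, so a value that was not properly protected has a better chance of failing right there rather than much later.

```ocaml
(* Hypothetical helper: run [f x] with a forced full major collection
   on either side of the call. *)
let with_gc_stress f x =
  Gc.full_major ();
  let result = f x in
  Gc.full_major ();
  result

(* Stand-in for a callback body that allocates on the OCaml heap. *)
let build_labels n = List.init n (fun i -> "button-" ^ string_of_int i)

let () =
  let labels = with_gc_stress build_labels 1000 in
  Printf.printf "%d labels built under forced collections\n"
    (List.length labels)
```

Wrapping one suspect entry point at a time keeps the slowdown manageable and narrows down where a collection first causes trouble.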

Thanks, that runtime variant turned up, at least, a pattern, perhaps a clue. When my application crashes, it’s always right after the first Growing heap to 1472k bytes>! prints out.

I may want to look at how allocation should work on my platform. I suppose it’s possible that this is to be expected: the initial allocation has been used and reused enough in the runtime that a garbage collection error will be less reliably toxic than when it starts using fresh memory from the wild. But I have also seen some situations where, for example, subtle differences between our mmap() and the behavior on other POSIX platforms make a difference.
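For what it’s worth, heap-growth messages can also be requested from OCaml code through the verbose field of the Gc control record, without switching runtime variants. The sketch below assumes the bit assignment given in the Gc module documentation (0x004 for growth and shrinking of the heap); the helper name and the allocation sizes are made up for illustration, and the exact message text may differ between runtime versions.

```ocaml
(* Bit 0x004 of the verbose field asks the runtime to report growth and
   shrinking of the heap on standard error (per the Gc module docs). *)
let heap_growth_bit = 0x004

(* Hypothetical helper: turn the bit on without disturbing other flags. *)
let enable_heap_growth_messages () =
  let ctrl = Gc.get () in
  Gc.set { ctrl with Gc.verbose = ctrl.Gc.verbose lor heap_growth_bit }

let () =
  enable_heap_growth_messages ();
  (* Allocate enough long-lived data that the major heap may need to grow. *)
  let blocks =
    Array.init 64 (fun i -> Bytes.make (256 * 1024) (Char.chr (65 + (i mod 26))))
  in
  Printf.printf "allocated %d blocks\n" (Array.length blocks)
```

The same flag can be set from the environment with OCAMLRUNPARAM (for example v=0x04), which avoids touching the program at all.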

You can write a dune profile for it by hand. See this example in multicoretests:

(env
 (debug-runtime
  (link_flags :standard -runtime-variant=d)
  ...)
 (_
  ...)
)

With it, one can, e.g., dune build --profile=debug-runtime.
Come to think of it, --profile=d would be shorter and match -runtime-variant better…

It might help to run your code with the naked pointer checker enabled. It is a configure-time option for ocamlopt, and can be specified when creating an opam switch, look for nnpchecker among the switch option packages.

I have debugged crashes that were strongly correlated with growing the OCaml heap and that turned out to be caused by the OCaml heap growing into memory that was previously allocated by C code, where there were still pointers into it from (dead) OCaml values. So I would be worried if your code has anything like the following pattern:

  1. mallocs some memory,
  2. creates some OCaml values containing naked pointers into that memory,
  3. computes some pure OCaml value from them,
  4. concludes algorithmically that the malloced memory will never be used again and so,
  5. frees it,
  6. returns the pure OCaml value result and computes some more

If you do have such code, best to get rid of the naked pointers; they won’t work in OCaml 5 anyhow. If that is hard, in the meantime make sure you call Gc.full_major before freeing the memory from the C side, so that there are no dead OCaml objects that still point into the memory to be freed.

I hate to ask, but … what’s a “naked pointer?” I thought I’d seen a definition of this somewhere recently and concluded it doesn’t apply to me, but all I can find now is that it doesn’t have a black header, which isn’t really speaking my language.

Of course the interface has boatloads of memory allocated by malloc and whatever C++ uses, but it’s all in custom_operations structs with default functions. I’ll keep an eye out for any way some C++ trickery could be smuggling pointers without my knowledge.

A naked pointer refers to using an arbitrary C pointer directly as an OCaml value without storing it in a custom or abstract block (this used to be allowed before 5). I’m a bit lazy so I’m simply going to point you to this bit of the manual. HTH.

Then you don’t have naked pointers.

I got around to giving this a try. Having no idea where a collection would break anything, I just started more or less at the top, and ran Gc.full_major () before calling into the application dispatch system that would start up with the threads and callbacks. And just that seems to have cured the problem altogether!

In case it could be relevant, while I’m not aware of any C pointers in OCaml values, I do have a lot of OCaml pointers in C values. A callback function closure, as type value, is stored in the C++ instance that may call it via virtual function. I pass the location where it’s stored to caml_register_generational_global_root(), and there would be quite a few of those. The first batch of several functions were set up in that manner just before the Gc.full_major.

“And just that seems to have cured the problem altogether!”

ehh… maybe it did, but I feel like it’s more likely it just kicked the can on it until a certain size of heap or usage pattern.

“A callback function closure, as type value, is stored in the C++ instance that may call it via virtual function. I pass the location where it’s stored to caml_register_generational_global_root(), and there would be quite a few of those.”

I have never tried this myself, and I may be misreading the manual, but have you considered using the registration mechanism described here? I find it weird that you have to keep track of when the callbacks move, or have to prevent them from getting collected. If it works like I think it does, you could just do this registration in an init function that you call from your (presumably) C++ main before all the threads pop up and then let the OCaml runtime take care of their lifetime / management.

Note that I didn’t mark it as a solution. There must be a bug in there somewhere, and this just papered over it. But the fact that it could … there must be some insight to be gained there. Someday it may occur to me what it is.

I think maybe the reason I hated that global name registration is that it’s global. Say there’s a Button class with a callback, and I have a lot of Buttons with different callback functions? I guess I could set up some kind of unique ID to mangle the names with, and then the Button instance could grope around to see if it has any of its own callbacks, but it seems simpler to just poke the callbacks in as an array of value. If the caml_register_generational_global_root() isn’t causing any trouble.
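One way to keep named registration without one global name per Button might be to register a single dispatcher and keep the per-instance closures in an ordinary OCaml table, keyed by an integer id that the C++ instance stores instead of a value. The sketch below is only an illustration under that assumption: the table, add_callback, remove_callback, dispatch, and the name "button_dispatch" are all hypothetical, and the C++ side (which would look the dispatcher up once with caml_named_value and invoke it with caml_callback) is not shown.

```ocaml
(* Per-instance callbacks live in an OCaml table, so the GC sees them as
   ordinary reachable values and no C-side root is needed for each one. *)
let callbacks : (int, unit -> unit) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* Hypothetical: returns the id that would be handed to the C++ object. *)
let add_callback f =
  let id = !next_id in
  incr next_id;
  Hashtbl.replace callbacks id f;
  id

(* Hypothetical: to be called when the C++ object is destroyed. *)
let remove_callback id = Hashtbl.remove callbacks id

(* The single entry point; unknown ids are silently ignored. *)
let dispatch id =
  match Hashtbl.find_opt callbacks id with
  | Some f -> f ()
  | None -> ()

(* Only this one function is registered under a global name. *)
let () = Callback.register "button_dispatch" dispatch

let () =
  let clicks = ref 0 in
  let ok = add_callback (fun () -> incr clicks) in
  let cancel = add_callback (fun () -> print_endline "cancel pressed") in
  dispatch ok;
  dispatch cancel;
  remove_callback cancel;
  dispatch cancel; (* no-op: the id has been removed *)
  Printf.printf "ok clicked %d time(s)\n" !clicks
```

The trade-off is an extra table lookup per event and the need to call remove_callback when an instance goes away; in exchange, the C++ objects hold only integers, which the collector never has to track or update.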