Hi!
I’m trying out Multicore OCaml (4.12.0+domains) as a backend for our compiler, which generates OCaml code. The problem is that roughly every tenth bootstrap triggers a segfault in caml_shared_try_alloc. This is the top of the backtrace from lldb:
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x14ff)
* frame #0: 0x00000001001f084a mii`caml_shared_try_alloc [inlined] pool_allocate(local=0x0000000101808800, sz=<unavailable>) at shared_heap.c:353:18 [opt]
frame #1: 0x00000001001f044e mii`caml_shared_try_alloc(local=0x0000000101808800, wosize=6, tag=3, pinned=0) at shared_heap.c:392 [opt]
frame #2: 0x00000001001d4161 mii`oldify_one [inlined] alloc_shared(wosize=6, tag=3) at minor_gc.c:151:15 [opt]
frame #3: 0x00000001001d4145 mii`oldify_one(st_v=<unavailable>, v=68719644640, p=0x000000010a01d280) at minor_gc.c:313 [opt]
frame #4: 0x00000001001d47b7 mii`oldify_mopup(st=0x00007ffeefbff100, do_ephemerons=0) at minor_gc.c:449:9 [opt]
frame #5: 0x00000001001d3f90 mii`caml_empty_minor_heap_promote(domain=0x00000001004537c0, participating_count=<unavailable>, participating=0x00000001004603f8, not_alone=1) at minor_gc.c:676:3 [opt]
frame #6: 0x00000001001d4aa6 mii`caml_stw_empty_minor_heap_no_major_slice(domain=0x00000001004537c0, unused=<unavailable>, participating_count=9, participating=0x00000001004603f8) at minor_gc.c:740:3 [opt]
frame #7: 0x00000001001d4bf3 mii`caml_stw_empty_minor_heap(domain=0x00000001004537c0, unused=<unavailable>, participating_count=<unavailable>, participating=<unavailable>) at minor_gc.c:768:3 [opt]
frame #8: 0x00000001001f6efa mii`caml_try_run_on_all_domains_with_spin_work(handler=(mii`caml_stw_empty_minor_heap at minor_gc.c:767), data=0x0000000000000000, leader_setup=<unavailable>, enter_spin_callback=<unavailable>, enter_spin_data=<unavailable>) at domain.c:895:3 [opt]
frame #9: 0x00000001001d4c7d mii`caml_empty_minor_heaps_once [inlined] caml_try_stw_empty_minor_heap_on_all_domains at minor_gc.c:799:10 [opt]
frame #10: 0x00000001001d4c51 mii`caml_empty_minor_heaps_once at minor_gc.c:817 [opt]
frame #11: 0x00000001001f8193 mii`caml_poll_gc_work at domain.c:942:5 [opt]
frame #12: 0x00000001001d0735 mii`caml_garbage_collection at signals_nat.c:110:5 [opt]
frame #13: 0x00000001001f91e3 mii`caml_call_gc + 231
I haven’t been able to reproduce the error in a small program. I’ve only seen it while bootstrapping our compiler, which is about 250,000 lines of generated OCaml code. I understand this makes it hard to reason about what the error could be, but I wanted to ask here anyway, in case I’m missing something obvious or if someone recognises a known error.
I discovered the error while implementing a parallel task pool. Reducing the program a little, I found that spawning some domains that wait on a channel while the compiler is running triggers the error. In essence, this is what the generated code does:
let chan = Domainslib.Chan.make_unbounded () in
let tids = List.map (fun _ -> Domain.spawn (fun _ -> Domainslib.Chan.recv chan)) (List.init 10 (fun _ -> ())) in
(* Do compiler stuff here ... *)
(* segfaults while compiler stuff is running, if ever *)
List.iter (fun _ -> Domainslib.Chan.send chan 1) tids;
List.map Domain.join tids
Note that everything else in the compiler is sequential; the spawned domains are the only parallelism. The error also appears when the spawned domains call Thread.delay instead of waiting on a channel. However, I didn’t observe the error (in about 100 runs) when the domains compute something heavy (such as fibonacci 48) instead of waiting.
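For reference, the "busy" variant can be sketched roughly as below. This is only an illustration, not the generated code: the helper names with_background_domains and fib are made up for this sketch, the main-domain work is a stand-in for the compiler, and the fibonacci argument is much smaller than 48 so that it finishes quickly. It uses only Domain.spawn and Domain.join from the standard library.

```ocaml
(* Naive Fibonacci, used as CPU-bound work for the spawned domains. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

(* Hypothetical helper: spawn [n] domains that each run [body], run [main]
   on the main domain while they are alive, then join all of them.
   Returns the result of [main] together with the results of the domains. *)
let with_background_domains ~n ~body ~main =
  let tids = List.init n (fun _ -> Domain.spawn body) in
  let result = main () in
  let from_domains = List.map Domain.join tids in
  (result, from_domains)

let () =
  (* Busy variant: each domain computes instead of blocking.
     The idle variants would instead block in the body, for example on
     Domainslib.Chan.recv or Thread.delay (both need extra libraries). *)
  let result, from_domains =
    with_background_domains ~n:10
      ~body:(fun () -> fib 25)
      ~main:(fun () -> fib 20 (* stand-in for the sequential compiler work *))
  in
  Printf.printf "main: %d, domains joined: %d\n"
    result (List.length from_domains)
```

The only difference between the variants is the body passed to Domain.spawn, which is why I suspect the idle (blocked or sleeping) domains matter for triggering the crash.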
I’d be thankful for any thoughts you might have about this!