We are having trouble interrupting our parallel program in OCaml 5.2 (+flambda).
Our program runs many tasks via Domainslib.Task.parallel_for,
and we want to interrupt it with Ctrl-C and print the data found so far.
I can’t find any suitable API for interruption in Domainslib, so I installed an old-school handler:
Sys.(set_signal sigint) (Signal_handle (fun _ -> ....))
The handler does not touch Domainslib; it processes the data produced so far using OCaml code and some Z3 calls. Sometimes the interruption works, but sometimes it doesn’t:
- Sometimes the process hangs and does not print what the handler is supposed to print. We attached GDB to it; a backtrace is at the end of the post.
- Sometimes we get crashes like
IOT instruction (core dumped)
- Sometimes libc crashes with
double free or corruption (!prev)
- Sometimes
malloc(): unaligned tcache chunk detected
We tried changing the signal from SIGINT to SIGUSR1, and the latter appears to work (i.e. we have not observed any crashes with it).
Questions:
- How to properly interrupt tasks of Domainslib?
- Are we allowed to make external C calls in a signal handler?
- Any idea why SIGUSR1 may work better?
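To make the first question concrete, here is a minimal sketch of a cooperative alternative to doing the work inside the handler. It assumes the handler only sets an Atomic flag, the workers poll that flag, and the results are printed from the main domain after the workers have stopped. It uses plain Domain.spawn from the standard library rather than Domainslib, and stop_requested, worker and run_all are hypothetical names, not part of our real program.

```ocaml
(* Cooperative cancellation sketch: the signal handler only flips a flag. *)
let stop_requested = Atomic.make false

(* Hypothetical worker: scans indices [lo, hi) and polls the flag on every
   iteration, so it stops soon after the flag is set. *)
let worker lo hi =
  let found = ref [] in
  let i = ref lo in
  while !i < hi && not (Atomic.get stop_requested) do
    (* Stand-in for the real computation: keep multiples of 7. *)
    if !i mod 7 = 0 then found := !i :: !found;
    incr i
  done;
  List.rev !found

(* Split [0, total) into chunks, run one domain per chunk, and collect
   whatever each domain found before it finished or was asked to stop. *)
let run_all ~num_workers ~total =
  let chunk = (total + num_workers - 1) / num_workers in
  let domains =
    List.init num_workers (fun w ->
      let lo = w * chunk in
      let hi = min total (lo + chunk) in
      Domain.spawn (fun () -> worker lo hi))
  in
  List.concat_map Domain.join domains

let () =
  (* The handler does no allocation-heavy work and no C calls. *)
  Sys.set_signal Sys.sigint
    (Sys.Signal_handle (fun _ -> Atomic.set stop_requested true));
  let results = run_all ~num_workers:4 ~total:1_000 in
  (* Printing happens here, outside the handler, once all domains joined. *)
  Printf.printf "found %d items\n" (List.length results)
```

With this shape, any post-processing (including Z3 calls) would run after Domain.join in ordinary code rather than inside the signal handler. Whether the same polling can be placed inside the body of Task.parallel_for is part of what we are asking.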
GDB backtrace
#0 0x0000597213f035ef in frame_return_to_C (d=<optimized out>) at runtime/caml/frame_descriptors.h:80
#1 scan_stack_frames (fflags=(unknown: 0x90d42c00), gc_regs=<optimized out>, stack=0x597217064960, fdata=<optimized out>,
f=<optimized out>) at runtime/fiber.c:255
#2 caml_scan_stack (f=f@entry=0x597213f13980 <oldify_one>, fflags=fflags@entry=(unknown: 0x90d42c00),
fdata=fdata@entry=0x7ffe0ada17d0, stack=0x597217064960, stack@entry=0x597214d6bb40, gc_regs=0x597214d86980,
gc_regs@entry=0x74682da68010) at runtime/fiber.c:285
#3 0x0000597213f16e25 in caml_do_local_roots (f=f@entry=0x597213f13980 <oldify_one>, fflags=(unknown: 0x90d42c00),
fflags@entry=SCANNING_ONLY_YOUNG_VALUES, fdata=fdata@entry=0x7ffe0ada17d0, local_roots=<optimized out>,
current_stack=0x597214d6bb40, v_gc_regs=0x74682da68010) at runtime/roots.c:69
#4 0x0000597213f1451c in caml_empty_minor_heap_promote (domain=domain@entry=0x597214d6bb60,
participating_count=participating_count@entry=5, participating=participating@entry=0x59721426b4e0 <stw_request+64>)
at runtime/minor_gc.c:609
#5 0x0000597213f14876 in caml_stw_empty_minor_heap_no_major_slice (domain=0x597214d6bb60, participating_count=5,
participating=0x59721426b4e0 <stw_request+64>, unused=<optimized out>) at runtime/minor_gc.c:746
#6 0x0000597213eff580 in stw_handler (domain=0x597214d6bb60) at runtime/domain.c:1385
#7 handle_incoming (s=<optimized out>) at runtime/domain.c:340
#8 0x0000597213effdc0 in caml_handle_incoming_interrupts () at runtime/domain.c:353
#9 caml_handle_gc_interrupt () at runtime/domain.c:1774
#10 0x0000597213f1a58b in caml_do_pending_actions_exn () at runtime/signals.c:341
#11 caml_process_pending_actions_with_root_exn (root=<optimized out>, root@entry=1) at runtime/signals.c:386
#12 0x0000597213f1a632 in caml_process_pending_actions_with_root (root=1) at runtime/signals.c:395
#13 caml_process_pending_actions () at runtime/signals.c:406
#14 <signal handler called>
#15 0x0000597213e2d6e5 in camlStdlib__Set.mem_530 () at set.ml:251