How to interrupt Domainslib tasks?

We have trouble interrupting our parallel program in OCaml 5.2 (+flambda).
Our program runs many tasks via Domainslib.Task.parallel_for, and we want to interrupt it with Ctrl-C and print the data found so far.

I can’t find any suitable API for interruption in Domainslib, so I added an old-school
Sys.(set_signal sigint) (Signal_handle (fun _ -> ....)). The handler doesn’t touch Domainslib; it processes the data produced so far using OCaml code and some Z3 calls. Sometimes the interruption works, but sometimes it doesn’t:

  • Sometimes the process hangs and doesn’t print what the handler is supposed to print. We attached GDB to it; the backtrace is at the end of the post.
  • Sometimes we get crashes like IOT instruction (core dumped)
  • Sometimes libc crashes with double free or corruption (!prev)
  • Sometimes malloc(): unaligned tcache chunk detected

We tried changing the signal from SIGINT to SIGUSR1, and the latter seems to work (i.e. we didn’t observe any crashes).

Questions:

  1. How to properly interrupt tasks of Domainslib?
  2. Are we allowed to make external C calls in a signal handler?
  3. Any idea why SIGUSR1 may work better?

GDB backtrace:
#0  0x0000597213f035ef in frame_return_to_C (d=<optimized out>) at runtime/caml/frame_descriptors.h:80
#1  scan_stack_frames (fflags=(unknown: 0x90d42c00), gc_regs=<optimized out>, stack=0x597217064960, fdata=<optimized out>, 
    f=<optimized out>) at runtime/fiber.c:255
#2  caml_scan_stack (f=f@entry=0x597213f13980 <oldify_one>, fflags=fflags@entry=(unknown: 0x90d42c00), 
    fdata=fdata@entry=0x7ffe0ada17d0, stack=0x597217064960, stack@entry=0x597214d6bb40, gc_regs=0x597214d86980, 
    gc_regs@entry=0x74682da68010) at runtime/fiber.c:285
#3  0x0000597213f16e25 in caml_do_local_roots (f=f@entry=0x597213f13980 <oldify_one>, fflags=(unknown: 0x90d42c00), 
    fflags@entry=SCANNING_ONLY_YOUNG_VALUES, fdata=fdata@entry=0x7ffe0ada17d0, local_roots=<optimized out>, 
    current_stack=0x597214d6bb40, v_gc_regs=0x74682da68010) at runtime/roots.c:69
#4  0x0000597213f1451c in caml_empty_minor_heap_promote (domain=domain@entry=0x597214d6bb60, 
    participating_count=participating_count@entry=5, participating=participating@entry=0x59721426b4e0 <stw_request+64>)
    at runtime/minor_gc.c:609
#5  0x0000597213f14876 in caml_stw_empty_minor_heap_no_major_slice (domain=0x597214d6bb60, participating_count=5, 
    participating=0x59721426b4e0 <stw_request+64>, unused=<optimized out>) at runtime/minor_gc.c:746
#6  0x0000597213eff580 in stw_handler (domain=0x597214d6bb60) at runtime/domain.c:1385
#7  handle_incoming (s=<optimized out>) at runtime/domain.c:340
#8  0x0000597213effdc0 in caml_handle_incoming_interrupts () at runtime/domain.c:353
#9  caml_handle_gc_interrupt () at runtime/domain.c:1774
#10 0x0000597213f1a58b in caml_do_pending_actions_exn () at runtime/signals.c:341
#11 caml_process_pending_actions_with_root_exn (root=<optimized out>, root@entry=1) at runtime/signals.c:386
#12 0x0000597213f1a632 in caml_process_pending_actions_with_root (root=1) at runtime/signals.c:395
#13 caml_process_pending_actions () at runtime/signals.c:406
#14 <signal handler called>
#15 0x0000597213e2d6e5 in camlStdlib__Set.mem_530 () at set.ml:251

I would suggest Miou, which lets you use a pool of domains and run tasks in parallel with Miou.parallel. Miou also implements task cancellation, so Miou.call / Miou.cancel should cover your needs. It is also possible, as explained in this short tutorial, to handle signals such as Ctrl-C.
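
Independently of the library, one common pattern is cooperative cancellation: the signal handler only sets an atomic flag, the workers poll it, and the reporting happens on the main domain once every worker has stopped. Below is a minimal sketch using only the standard library (Domain and Atomic), not Domainslib or Miou. The names stop, process_item, worker and run_all are hypothetical and made up for illustration. I have not checked whether this avoids the crashes described above; it only keeps the handler from doing heavy work while other domains are still running.

```ocaml
(* Sketch: cooperative cancellation with an atomic flag (stdlib only). *)

(* Set by the signal handler, polled by the workers. *)
let stop = Atomic.make false

(* Hypothetical unit of work; stands in for the real per-item computation. *)
let process_item i = i * i

(* Process indices lo .. hi-1, checking the flag before every item.
   Returns the results computed so far, newest first. *)
let worker lo hi =
  let rec loop i acc =
    if i >= hi || Atomic.get stop then acc
    else loop (i + 1) (process_item i :: acc)
  in
  loop lo []

(* Spawn one domain per chunk, wait for all of them, and gather the results. *)
let run_all ~domains ~chunk =
  let spawned =
    List.init domains (fun d ->
        Domain.spawn (fun () -> worker (d * chunk) ((d + 1) * chunk)))
  in
  List.concat_map Domain.join spawned

let () =
  (* The handler only flips the flag; it does no printing and no solver calls. *)
  Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _ -> Atomic.set stop true));
  let results = run_all ~domains:4 ~chunk:1000 in
  (* Reporting happens here, after all workers have been joined. *)
  Printf.printf "collected %d results (interrupted: %b)\n"
    (List.length results) (Atomic.get stop)
```

With Domainslib, the same Atomic.get stop check could presumably go at the top of the body passed to Domainslib.Task.parallel_for, so that remaining iterations return immediately once the flag is set; the printing would then move to after parallel_for returns.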
