I was testing whether a program reports the termination status of a child correctly. So I thought I’d have the child process run this:
Unix.kill (Unix.getpid ()) Sys.sigsegv
But the child seemed unaffected and exited successfully with code 0.
It turns out the expected crash doesn’t occur with native code but occurs with bytecode:
$ cat crash.ml
Unix.kill (Unix.getpid ()) Sys.sigsegv;;
$ ocamlc -o crash.bytecode unix.cma crash.ml
$ ocamlopt -o crash.native unix.cmxa crash.ml
$ ./crash.bytecode
Segmentation fault (core dumped)
$ ./crash.native # <-- expected to die similarly
$ echo $?
0
The intent of the print_endline was to trigger an interrupt allowing the signal to be delivered (I may not understand those things correctly). I also tried adding a Unix.gettimeofday () but it changed nothing. The other signals I tried behave as expected: both sigterm and sigusr1 terminate the program in native code.
I’m on Linux. Is this some sort of unspecified system behavior, or is it a bug in ocaml, or something else?
The native-code runtime system catches SIGSEGV and (on some systems) SIGBUS in an attempt to detect stack overflows and recover from them. Better not mess with these signals. If you really need to kill a process reliably, use SIGKILL.
Since I am currently digging into that part of the runtime, I am wondering what your use case is. I also think there is a bug there.
Note that this is about OS signal handlers, not OCaml signal handlers, so there is no need to poll to trigger signal handlers.
As explained by Xavier, native OCaml has its own signal handler for segv. It goes as follows: the segv signal is processed by OCaml’s handler which checks whether the faulting address corresponds to a stack overflow. When it does not, OCaml removes its own signal handler and returns to re-start where the segfault happened, expecting the segfault to happen again, now handled by the default handler that aborts the program.
In your case, the segv signal is ignored the first time, and no second segv is sent. The bug is that the execution continues, but OCaml is now in an inconsistent state (no stack overflow detection henceforth). It would be better to treat the segv fatally in all cases. To fix this, one can raise the signal again in the signal handler, or one could directly abort, or call caml_fatal_error (which will call the user-supplied error hook if any before aborting). Only the first two options are explicitly signal-safe.
I was just curious. The use case, as I mentioned, was a test in which I wanted to make a child process crash on purpose without performing an actual illegal memory access. In the end it doesn’t matter: I can just raise another signal for testing purposes.
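For anyone wanting to set up the same kind of test, here is a minimal sketch of what I mean by raising another signal. It assumes a Unix-like system and the unix library; the names run_child_with_signal and describe are made up for this example and are not part of any library.

```ocaml
(* Build the same way as above, e.g.
   ocamlopt -o status.native unix.cmxa status.ml
   (newer compilers may also need -I +unix). *)

(* Fork a child that sends [signal] to itself, and return the
   termination status observed by the parent. *)
let run_child_with_signal signal =
  match Unix.fork () with
  | 0 ->
    (* Child: if the signal does not kill us, exit normally with code 0. *)
    Unix.kill (Unix.getpid ()) signal;
    exit 0
  | pid ->
    (* Parent: wait for that specific child and keep only its status. *)
    let _, status = Unix.waitpid [] pid in
    status

(* Human-readable rendering of a Unix.process_status value. Signal numbers
   are OCaml's own numbering, so they are compared by name, not printed. *)
let describe = function
  | Unix.WEXITED code -> Printf.sprintf "exited with code %d" code
  | Unix.WSIGNALED s when s = Sys.sigkill -> "killed by sigkill"
  | Unix.WSIGNALED s when s = Sys.sigusr1 -> "killed by sigusr1"
  | Unix.WSIGNALED _ -> "killed by another signal"
  | Unix.WSTOPPED _ -> "stopped by a signal"

let () =
  print_endline (describe (run_child_with_signal Sys.sigkill));
  print_endline (describe (run_child_with_signal Sys.sigusr1))
```

Passing Sys.sigsegv to the same function is a quick way to see the behavior discussed in this thread, although what it reports will presumably depend on the backend and the compiler version.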
Thank you both for the explanations!