EINTR and SA_RESTART support in the OCaml runtime

edwin · September 4, 2024, 1:34pm

After a signal handler has finished, an interrupted system call can be handled in two ways:

- it fails with EINTR, which is the default behaviour, or
- it is automatically restarted, if the handler was registered with SA_RESTART.

I don’t see how the latter could be achieved in OCaml: the runtime always calls sigaction with a hardcoded flags value of 0 (unless I write my own C binding to sigaction), and all OCaml signal handlers must be registered this way.

The former seems brittle, though: every call into the Unix module has to be wrapped in an EINTR handler (C applications have the same problem, and they are probably not very robust against signals either…), and so would every library that might call into the Unix module.
And I don’t see how I could guarantee at compile time that this has been done.
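A minimal sketch of the kind of wrapper this implies. restart_on_eintr is a hypothetical helper, not part of the Unix module: it re-runs a call for as long as it fails with Unix.Unix_error (Unix.EINTR, _, _). Nothing forces a caller or a library to actually use it, which is the compile-time problem described above.

```ocaml
(* Hypothetical helper: re-run [f x] as long as it fails with EINTR.
   Any other exception, including other Unix errors, propagates unchanged. *)
let rec restart_on_eintr f x =
  try f x with Unix.Unix_error (Unix.EINTR, _, _) -> restart_on_eintr f x

(* Example use: every Unix call site has to be wrapped explicitly. *)
let () =
  let r, w = Unix.pipe () in
  ignore (restart_on_eintr (Unix.write_substring w "hi" 0) 2);
  let buf = Bytes.create 2 in
  let n = restart_on_eintr (Unix.read r buf 0) 2 in
  Printf.printf "read %d bytes: %s\n" n (Bytes.sub_string buf 0 n);
  Unix.close r;
  Unix.close w
```

One caveat with this naive approach: restarting a call that takes a timeout (Unix.select, for instance) restarts the full timeout, so such calls would need the remaining time recomputed before each retry.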

I think it would make sense to have SA_RESTART support in the runtime, but in the meantime, is there perhaps an OCaml library that already provides this? I can’t be the first one to run into this problem.

Background:

I tried using POSIX virtual timers (Unix.setitimer) to shorten OCaml’s 50ms timeslice: the timer sends a signal periodically, and in its handler I call Thread.yield. That part works, and visibly decreases latency.
However, it then revealed that the application is not signal-safe at all (even though it already has signal handlers for logrotate, etc.): whatever system call happens to be in progress when the signal arrives fails with an error, and the application doesn’t actually handle EINTR in most places (there are very few places where it does).
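A minimal sketch reproducing this failure mode in a single-threaded program: an OCaml handler for SIGALRM is installed with Sys.set_signal, and a one-shot timer fires while Unix.read is blocked on an empty pipe. It uses ITIMER_REAL rather than ITIMER_VIRTUAL because a process blocked in read uses no CPU time, so a virtual timer would never expire in this demo; in a multi-threaded program, the signal from a virtual timer driven by a computing thread can be delivered to another thread that is blocked in a system call. read_is_interrupted is a hypothetical name.

```ocaml
(* Returns true if a blocking Unix.read is interrupted with EINTR by a
   SIGALRM whose OCaml handler was installed with Sys.set_signal. *)
let read_is_interrupted () =
  (* The OCaml handler does nothing; having one installed means SIGALRM no
     longer kills the process, and (as described above) the runtime
     registers it without SA_RESTART. *)
  Sys.set_signal Sys.sigalrm (Sys.Signal_handle (fun _ -> ()));
  (* Nothing is ever written to this pipe, so read blocks until interrupted. *)
  let r, w = Unix.pipe () in
  (* One-shot real-time timer: deliver SIGALRM after 50 ms. *)
  ignore
    (Unix.setitimer Unix.ITIMER_REAL { Unix.it_interval = 0.0; it_value = 0.05 });
  let buf = Bytes.create 1 in
  let interrupted =
    match Unix.read r buf 0 1 with
    | _ -> false
    | exception Unix.Unix_error (Unix.EINTR, _, _) -> true
  in
  Unix.close r;
  Unix.close w;
  interrupted

let () =
  Printf.printf "Unix.read failed with EINTR: %b\n" (read_is_interrupted ())
```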

xavierleroy · September 5, 2024, 1:03pm

The problem with SA_RESTART signal handling in POSIX is that you often have to run the code that reacts to the signal within the POSIX signal handler. (Just setting a flag in the handler and testing it when the syscall returns is not enough, as it may take arbitrarily long before the syscall returns.) And there’s very little that you can do safely in a POSIX signal handler.

Early versions of OCaml tried to run OCaml signal handlers from POSIX signal handlers in some cases, but this did not end well. The current EINTR-based signal handling in the OCaml runtime system is much more robust.

The stdlib I/O functions (from modules In_channel and Out_channel) catch EINTR and restart after signal handling, so you get the “restart” behavior you expect.

With the exception of Unix.sleep, the functions from the Unix module don’t try to restart after EINTR, because they are intended as direct wrappers around POSIX system calls. Every time we try to make Unix functions more clever than their POSIX counterpart, someone complains that they want the POSIX behavior like in Stevens’ books, and we end up with Unix.single_write (POSIX write syscall) in addition to Unix.write (the more usable version).
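To illustrate the difference, a sketch of a "write everything" loop built on Unix.single_write, which attempts only one write and, per its documentation, guarantees that no data has been written if it fails. Because each attempt is a single call, an EINTR can simply be retried without losing track of how many bytes were already written. really_write is a hypothetical name, not a Unix function.

```ocaml
(* Hypothetical helper: write bytes [ofs, ofs + len) of [buf] to [fd],
   one write(2) call per attempt, retrying attempts that fail with EINTR.
   Partial writes are handled by advancing the offset. *)
let really_write fd buf ofs len =
  let rec loop ofs len =
    if len > 0 then
      match Unix.single_write fd buf ofs len with
      | n -> loop (ofs + n) (len - n)
      | exception Unix.Unix_error (Unix.EINTR, _, _) -> loop ofs len
  in
  loop ofs len

let () =
  let r, w = Unix.pipe () in
  let msg = Bytes.of_string "hello, pipe" in
  really_write w msg 0 (Bytes.length msg);
  let buf = Bytes.create 64 in
  let n = Unix.read r buf 0 64 in
  print_endline (Bytes.sub_string buf 0 n);
  Unix.close r;
  Unix.close w
```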

My advice is to not use signals if at all possible, and if you must use signals, get ready to catch EINTR exceptions around every call to a function from the Unix module.

edwin · September 5, 2024, 1:49pm

> The problem with SA_RESTART signal handling in POSIX is that you often have to run the code that reacts to the signal within the POSIX signal handler. (Just setting a flag in the handler and testing it when the syscall returns is not enough, as it may take arbitrarily long before the syscall returns.) And there’s very little that you can do safely in a POSIX signal handler.

> Early versions of OCaml tried to run OCaml signal handlers from POSIX signal handlers in some cases, but this did not end well. The current EINTR-based signal handling in the OCaml runtime system is much more robust.

Thanks for explaining the tradeoffs.

> The stdlib I/O functions (from modules In_channel and Out_channel) catch EINTR and restart after signal handling, so you get the “restart” behavior you expect.

> With the exception of Unix.sleep, the functions from the Unix module don’t try to restart after EINTR, because they are intended as direct wrappers around POSIX system calls. Every time we try to make Unix functions more clever than their POSIX counterpart, someone complains that they want the POSIX behavior like in Stevens’ books, and we end up with Unix.single_write (POSIX write syscall) in addition to Unix.write (the more usable version).

Having both is useful, but you are right that this approach doesn’t scale: we don’t want to recursively duplicate most of the functions that use these.

> My advice is to not use signals if at all possible, and if you must use signals, get ready to catch EINTR exceptions around every call to a function from the Unix module.

Thanks, I’ll try to use signal masks to direct the signal to a C thread, and then do something similar to what the OCaml tick thread does (caml_record_signal on OCaml 4, an atomic store/interrupt on OCaml 5). IIUC these don’t deliver a real signal to the OCaml code; they just set a flag that is checked periodically.

(The advantage of using the virtual timer is that it only interrupts the process while it is using the CPU, and won’t interrupt it while it is entirely idle, so I should be able to use much shorter intervals without increasing idle background CPU usage.)
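A sketch of the virtual-timer setup described here: Unix.setitimer with ITIMER_VIRTUAL delivers SIGVTALRM after each interval of user-mode CPU time, and the OCaml handler runs on_tick. In the real application on_tick would be Thread.yield (which needs the threads library); here it only increments a counter so the example is self-contained. start_cpu_ticker, stop_cpu_ticker, burn_cpu and count_ticks are hypothetical names.

```ocaml
(* Hypothetical helper: run [on_tick] roughly every [interval] seconds of
   user-mode CPU time consumed by the process (ITIMER_VIRTUAL, SIGVTALRM). *)
let start_cpu_ticker ~interval ~on_tick =
  Sys.set_signal Sys.sigvtalrm (Sys.Signal_handle (fun _ -> on_tick ()));
  ignore
    (Unix.setitimer Unix.ITIMER_VIRTUAL
       { Unix.it_interval = interval; it_value = interval })

(* Disarm the timer; the OCaml handler is left installed. *)
let stop_cpu_ticker () =
  ignore
    (Unix.setitimer Unix.ITIMER_VIRTUAL
       { Unix.it_interval = 0.0; it_value = 0.0 })

(* Spin for about [seconds] of CPU time. The work allocates, and allocation
   is one of the points where the runtime runs pending OCaml handlers. *)
let burn_cpu seconds =
  let t0 = Sys.time () in
  let acc = ref 0 in
  while Sys.time () -. t0 < seconds do
    acc := !acc + List.length (List.init 1_000 Fun.id)
  done;
  ignore (Sys.opaque_identity !acc)

(* Count how many ticks arrive while [activity] runs. *)
let count_ticks ~interval activity =
  let ticks = ref 0 in
  start_cpu_ticker ~interval ~on_tick:(fun () -> incr ticks);
  activity ();
  stop_cpu_ticker ();
  !ticks

let () =
  let busy = count_ticks ~interval:0.005 (fun () -> burn_cpu 0.2) in
  let idle = count_ticks ~interval:0.005 (fun () -> Unix.sleepf 0.2) in
  Printf.printf "ticks while busy: %d, while idle: %d\n" busy idle
```

The busy run should record a number of ticks while the sleeping run records essentially none, which is the property that makes short intervals cheap when the process is idle.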

Or perhaps I’ll try something else that doesn’t rely on signals at all: e.g. keep the 50ms interval but check rusage, switch to a shorter interval if we have used a lot of CPU recently, and switch back when idle again. Or something more clever that measures the latency of, e.g., Unix read wake-ups and uses a feedback control loop to adjust the timer interval.
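A sketch of the rusage-based idea. The OCaml Unix module has no getrusage binding, but Unix.times reports the process's user and system CPU time, which is enough to estimate how busy the process was since the last sample. choose_interval, cpu_seconds, make_sampler and the threshold values are hypothetical; the returned interval would then be used to re-arm whatever timer drives the yields.

```ocaml
(* Hypothetical policy: pick a short interval when the process used at least
   [busy_fraction] of the elapsed wall-clock time on the CPU, else a long one. *)
let choose_interval ~busy_fraction ~short ~long ~cpu ~wall =
  if wall > 0.0 && cpu /. wall >= busy_fraction then short else long

(* CPU time (user + system) consumed by this process so far, in seconds. *)
let cpu_seconds () =
  let t = Unix.times () in
  t.Unix.tms_utime +. t.Unix.tms_stime

(* Returns a closure that measures CPU and wall time since its previous call
   and picks the next interval accordingly. *)
let make_sampler ~busy_fraction ~short ~long =
  let last_cpu = ref (cpu_seconds ()) and last_wall = ref (Unix.gettimeofday ()) in
  fun () ->
    let cpu = cpu_seconds () and wall = Unix.gettimeofday () in
    let interval =
      choose_interval ~busy_fraction ~short ~long
        ~cpu:(cpu -. !last_cpu) ~wall:(wall -. !last_wall)
    in
    last_cpu := cpu;
    last_wall := wall;
    interval

let () =
  let next_interval = make_sampler ~busy_fraction:0.5 ~short:0.005 ~long:0.05 in
  Unix.sleepf 0.1;
  Printf.printf "next interval after idling: %g s\n" (next_interval ())
```

With several threads or domains running in parallel, cpu /. wall can exceed 1.0; the threshold comparison still behaves sensibly in that case.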

I’ll do some experiments and report back on whether that works.

Best regards,
–Edwin

gadmm · September 7, 2024, 12:52pm

One can use the number of words allocated as a proxy for the amount of computation done. So you could call yield inside a memprof callback, which effectively tracks how many words are allocated. A low sampling rate (e.g. 10⁻⁴) should give a good approximation of the work done without affecting performance.
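A sketch of this suggestion, assuming the Gc.Memprof API of OCaml 4.14 or 5.3+ (it is not available in OCaml 5.0 to 5.2). With sampling rate r, each allocated word is sampled with probability r, so for small blocks the allocation callback runs roughly once per 1/r allocated words (about every 10,000 words at 10⁻⁴). start_alloc_ticker is a hypothetical name; in the scenario from this thread on_sample would be Thread.yield, while here it only counts samples.

```ocaml
(* Hypothetical helper: call [on_sample] whenever Memprof samples an
   allocation, using allocation as a proxy for work done. *)
let start_alloc_ticker ~sampling_rate ~on_sample =
  let on_alloc (_ : Gc.Memprof.allocation) =
    on_sample ();
    None (* don't track the sampled block any further *)
  in
  (* Gc.Memprof.start returns unit on 4.14 and a profile on 5.3+. *)
  ignore
    (Gc.Memprof.start ~sampling_rate
       { Gc.Memprof.null_tracker with
         Gc.Memprof.alloc_minor = on_alloc;
         alloc_major = on_alloc })

let () =
  let samples = ref 0 in
  start_alloc_ticker ~sampling_rate:1e-4 ~on_sample:(fun () -> incr samples);
  (* Several million words of allocation. *)
  let l = List.init 1_000_000 Fun.id in
  ignore (Sys.opaque_identity l);
  Gc.Memprof.stop ();
  Printf.printf "sampled allocations: %d\n" !samples
```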

gadmm · September 7, 2024, 12:55pm

By the way, I’ve been asked whether memprof-limits could be used to suspend/resume tasks instead of cancelling them, so maybe this is a commonly shared concern.
