OCaml custom signal numbers

When doing system programming with OCaml, it is sometimes annoying that the signal numbers used by the Sys and Unix modules do not correspond to the usual signal numbers.

I wonder what the reason for this is.
Also, would it make sense to make something like the caml_convert_signal_number function available in the Unix module?
WDYT?

The exact signal numbers used by the underlying operating system are not portable and vary across (e.g.) Linux and OpenBSD, and also within the same OS by CPU architecture. What are you trying to achieve with the underlying signal number?

Regardless of whether they are portable across architectures, it would still be useful to have them portable from one program to another on a given architecture. Imagine, for instance, that you want to communicate via some agreed signal with another process running on the same machine but not written in OCaml.

In my case I just want to log why some process has exited. Logging the signal received from Unix.wait currently logs garbage, because the codes used by OCaml are not the ones used by the rest of the system: they are integers and have the same names, yet the values differ. So I have to translate, which is error-prone because my translation code will break whenever the stdlib evolves.

File names are not portable across architectures either, yet the Unix module takes them unmodified as far as I know, so they can reliably be shared with other programs :slight_smile:

The translation can definitely be made to work, but we need to be careful with issues such as cross-compilation, as with any system-dependent variable. My response was addressing your question about why it is like this.

It’s convention to refer to signals by their name (e.g. SIGABRT in C or Sys.sigabrt in OCaml) and not by their direct value. All of the Unix and Sys module functions in OCaml should translate from the OS value and only ever expose the OCaml-signal-integer number, so in your Unix.waitpid case the return value should match one of the signals in the Sys.sig* values.

Are you seeing something different? caml_rev_convert_signal_number and caml_convert_signal_number are the two functions in the runtime which perform this translation, and as far as I can see the allocation of W_SIGNALED values does perform this conversion in the return of Unix.waitpid.


I hadn’t thought about cross-compilation. This indeed could explain the current design.

Regarding wait (and friends) translating from OS to OCaml numbers: yes, that is indeed what they do and what I observe. Let me rephrase my problem. My OCaml program runs some other programs, then waits for them and logs their exit status. If an exec'ed process was killed by a signal, my OCaml program logs this outcome with the signal number. This log is later analysed by another program not written in OCaml (or by a human), for whom the OCaml signal numbers do not make sense. So I have to translate them, in my OCaml program, from the OCaml custom numbers to the system numbers. Does that make more sense?
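To make the translation concrete, here is a minimal sketch of the kind of table I end up maintaining by hand. The name `system_signal_number` and the table are hypothetical, and the numeric values are an assumption that only holds for Linux on x86-64; other systems use different values for some signals (SIGUSR1, for example). As far as I can tell from runtime/signals.c, signals the runtime has no name for are passed through unchanged as positive integers, which the sketch relies on.

```ocaml
(* ASSUMPTION: these are the Linux x86-64 signal numbers.
   Other OSes and architectures may use different values. *)
let linux_x86_64_numbers : (int * int) list = [
  Sys.sighup, 1;   Sys.sigint, 2;   Sys.sigquit, 3;  Sys.sigill, 4;
  Sys.sigabrt, 6;  Sys.sigfpe, 8;   Sys.sigkill, 9;  Sys.sigusr1, 10;
  Sys.sigsegv, 11; Sys.sigusr2, 12; Sys.sigpipe, 13; Sys.sigalrm, 14;
  Sys.sigterm, 15;
]

(* Translate an OCaml signal number (as found in Unix.WSIGNALED) into
   the system number. Positive values are assumed to be system numbers
   already; negative values missing from the table yield None. *)
let system_signal_number (n : int) : int option =
  if n > 0 then Some n
  else List.assoc_opt n linux_x86_64_numbers
```

The table is deliberately partial: any signal added to Sys later silently falls into the `None` case, which is exactly the maintenance problem described above.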

That does make perfect sense. If you want the logs to be persistent and reliable, then logging the signal name is the only portable way to do this under POSIX-style systems. This will make your logs robust to (e.g.) running under ARM or x86 architectures.
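A rough sketch of what logging by name could look like, using only the Sys module. The function name `ocaml_signal_name` and the choice of signals in the table are illustrative assumptions, not something the stdlib provides; extend the table as needed.

```ocaml
(* Names for a subset of the signals defined in Sys.
   Illustrative only: the list is not exhaustive. *)
let known_signals : (int * string) list = [
  Sys.sigabrt, "SIGABRT"; Sys.sigalrm, "SIGALRM"; Sys.sigfpe, "SIGFPE";
  Sys.sighup, "SIGHUP";   Sys.sigill, "SIGILL";   Sys.sigint, "SIGINT";
  Sys.sigkill, "SIGKILL"; Sys.sigpipe, "SIGPIPE"; Sys.sigquit, "SIGQUIT";
  Sys.sigsegv, "SIGSEGV"; Sys.sigterm, "SIGTERM"; Sys.sigusr1, "SIGUSR1";
  Sys.sigusr2, "SIGUSR2"; Sys.sigchld, "SIGCHLD";
]

(* Map an OCaml signal number to its name, falling back to the raw
   number when the signal is not in the table. *)
let ocaml_signal_name (n : int) : string =
  match List.assoc_opt n known_signals with
  | Some name -> name
  | None -> Printf.sprintf "signal %d" n
```

With this, a `Unix.WSIGNALED n` status can be logged as `ocaml_signal_name n` and the log line reads the same whichever architecture produced it.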

> I hadn’t thought about cross-compilation. This indeed could explain the current design.

Actually no, it doesn’t: when cross-compiling, the signal numbers obtained from the target OS headers would be just as good as those used in runtime/signals.c by the caml_[rev_]convert_signal_number functions.

> This will make your logs robust to (e.g.) running under ARM or x86 architectures.

My logs are already robust regardless of the architecture (notwithstanding that I have to translate from OCaml custom integers to local ones), as long as it is assumed that the signal numbers logged are those corresponding to the local system (which is a very natural assumption).

Using names would actually be less robust in this use case, because other programs reading these logs (not under my control) expect numbers and, even if they could also understand names, there are no standardized names for signals, even considering only the local architecture.

Using names also suffers from the possible introduction of errors when upgrading the stdlib: if the Sys module adds new names and new conversions, my program will miss them without a compilation error, since signals are not defined as a sum type (why?).
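To illustrate what I mean by a sum type, here is a small sketch of a wrapper one could write on the user side. The type `signal` and the functions `of_ocaml_number` and `to_name` are hypothetical and cover only a handful of signals. It gives exhaustiveness checking inside my own code, but it does not solve the upgrade problem: a signal newly named in Sys would still end up in `Other` without any warning.

```ocaml
(* A user-side sum type covering a few signals; anything else is
   kept as its raw integer. *)
type signal =
  | Sigabrt
  | Sigint
  | Sigkill
  | Sigsegv
  | Sigterm
  | Other of int

(* Sys.sig* are ordinary values, not constructors, so they cannot
   appear in patterns; compare with equality instead. *)
let of_ocaml_number (n : int) : signal =
  if n = Sys.sigabrt then Sigabrt
  else if n = Sys.sigint then Sigint
  else if n = Sys.sigkill then Sigkill
  else if n = Sys.sigsegv then Sigsegv
  else if n = Sys.sigterm then Sigterm
  else Other n

(* The compiler checks this match for exhaustiveness. *)
let to_name : signal -> string = function
  | Sigabrt -> "SIGABRT"
  | Sigint -> "SIGINT"
  | Sigkill -> "SIGKILL"
  | Sigsegv -> "SIGSEGV"
  | Sigterm -> "SIGTERM"
  | Other n -> Printf.sprintf "signal %d" n
```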

Using names would indeed be more reliable if those logs were copied to another architecture and/or read by humans, but this is not my use case.