Does `Unix.execvp` on Windows (OCaml 5.3) fail to pass along the exit code?

In the long night of porting Camlp5 to Windows, I’ve encountered a couple of pretty weird and deep bugs. For the first one I have a fairly clean repro, though it needs a little more cleanup before I can submit a bug report. So before I do the extra work, I thought I’d ask:

Suppose:

(1) a shell

(2) invokes program #1,

(3) which calls Unix.execvp to run program #2,

(4) which exits with a nonzero exit code.

Will the shell in (1) receive that exit code? Instead of a shell, (1) could be make, in which case getting that exit code is actually crucial.

I ask because I have a clean repro where the exit code is not returned.

I also have a version of program #1 that uses Unix.system instead of Unix.execvp, reaps the exit code, and calls exit with it, so I know that the exit code itself is bona fide.
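For comparison, here is a minimal sketch of what such a Unix.system-based program #1 might look like. It is not the actual workaround program; `exit_code_of_status`, `run_via_system` and `main` are hypothetical names, and reporting abnormal termination as 255 is an assumption (OCaml signal numbers are not the raw OS signal numbers, so there is no single obvious mapping).

```ocaml
(* Hypothetical sketch, not the actual workaround program: run PROG ARGS...
   through Unix.system, reap its status, and exit with the same code. *)

(* Collapse a process status into one exit code to forward.
   Assumption: abnormal termination is reported as 255, because OCaml
   signal numbers are not the raw OS signal numbers. *)
let exit_code_of_status = function
  | Unix.WEXITED code -> code
  | Unix.WSIGNALED _ | Unix.WSTOPPED _ -> 255

(* Filename.quote_command builds a command line quoted for the platform
   shell, which is what Unix.system hands the string to. *)
let run_via_system prog args =
  exit_code_of_status (Unix.system (Filename.quote_command prog args))

(* Entry point for "launcher -- PROG ARGS..."; call it with [let () = main ()]. *)
let main () =
  match Array.to_list Sys.argv with
  | _ :: "--" :: prog :: args -> exit (run_via_system prog args)
  | _ ->
      prerr_endline "usage: launcher -- PROG [ARGS...]";
      exit 2
```

Because the launcher itself calls `exit` with the child's code, the caller sees that code on every platform, at the cost of an extra shell in between.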

==========

Concretely, in a Makefile I have:

```make
test::
	../../src/LAUNCH2 -- false || touch FAILED
	if [ ! -f FAILED ] ; then false; fi
	rm -f FAILED
```

“false” is a program that exits with code 1 (nonzero). LAUNCH2 runs its argument with Unix.execvp. Following the shell logic: FAILED is created only if the first arm of the `||` on the first line exits nonzero, and the second line fails if FAILED does not exist. So the target fails exactly when `LAUNCH2 -- false` exits with zero, that is, when the exit code of false is not properly received by make.
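To make the scenario concrete, here is a hypothetical minimal stand-in for LAUNCH2. It is not the actual LAUNCH2 source; `parse_command` and `main` are illustrative names. It only shows the shape being tested: find the program after `--` and replace the current process with it via Unix.execvp.

```ocaml
(* Hypothetical minimal stand-in for LAUNCH2 (not its actual source):
   "LAUNCH2 -- PROG ARGS..." replaces itself with PROG via Unix.execvp. *)

(* Find the first "--" after argv.(0) and return PROG together with the
   argument vector execvp expects (PROG itself as element 0). *)
let parse_command argv =
  let rec after_dashes = function
    | "--" :: prog :: rest -> Some (prog, Array.of_list (prog :: rest))
    | _ :: rest -> after_dashes rest
    | [] -> None
  in
  match Array.to_list argv with
  | _ :: rest -> after_dashes rest
  | [] -> None

(* Entry point; call it with [let () = main ()]. *)
let main () =
  match parse_command Sys.argv with
  | Some (prog, args) ->
      (* On Unix this never returns: the process image is replaced, so the
         shell or make that started LAUNCH2 sees PROG's exit status. *)
      Unix.execvp prog args
  | None ->
      prerr_endline "usage: LAUNCH2 -- PROG [ARGS...]";
      exit 2
```

With this shape, `LAUNCH2 -- false` should exit with status 1 on Unix, so the `|| touch FAILED` arm runs and the target succeeds.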

OK, so that’s the situation. It’s a pretty trivial program, but searching the ocaml/ocaml issue tracker, I can’t find that anybody has reported this. Have I missed something?

Again, only on Windows.

ETA: Just in case: I do understand that it isn’t program #1 that “passes along” the exit code from program #2; rather, the kernel does the job. I just don’t know how else to describe it. The question I’m asking is: when OCaml program #1 execs over itself with some arbitrary program #2, does the invoker of program #1 get the exit code of program #2?

ETA2: Here’s a repro in a Git repo (chetmurthy/ocaml-exec-bug on GitHub) with CI that reproduces the problem on Windows.
Of course, it works on other platforms, and the CI demonstrates that too.

As the documentation of execvp says, “On Windows: the CRT simply spawns a new process and exits the current one. This will have unwanted consequences if e.g. another process is waiting on the current one.” So your shell is actually waiting on LAUNCH2, not on false, and therefore does not get the exit code of false.

This is one of two bugs I encountered in the last 72 hours. The other is much more paradoxical, but happily it goes away if I replace the OCaml program (LAUNCH.ml) with the Perl program (LAUNCH.PL) I used before rewriting it in OCaml.

As you discovered, exec does not have the same semantics on Windows as on Un*x. If you want to port your software to Windows, you may want to use higher-level abstractions, such as Unix.create_process + Unix.waitpid.
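A minimal sketch of that approach, assuming the launcher only needs to run one child and report its exit code: spawn the child with inherited stdio, wait for it, and exit with its status. The names `run_and_wait`, `spawn_and_exit` and `waitpid_no_eintr` are hypothetical, and mapping abnormal termination to 255 is an assumption.

```ocaml
(* Retry waitpid if it is interrupted by a signal. *)
let rec waitpid_no_eintr pid =
  try snd (Unix.waitpid [] pid)
  with Unix.Unix_error (Unix.EINTR, _, _) -> waitpid_no_eintr pid

(* Run PROG (searched in the PATH by create_process) with argument vector
   ARGV, inheriting stdin/stdout/stderr, and return its exit code. *)
let run_and_wait prog argv =
  let pid = Unix.create_process prog argv Unix.stdin Unix.stdout Unix.stderr in
  match waitpid_no_eintr pid with
  | Unix.WEXITED code -> code
  (* Assumption: report death by signal as 255. WSTOPPED cannot occur
     without WUNTRACED, but the match must be exhaustive. *)
  | Unix.WSIGNALED _ | Unix.WSTOPPED _ -> 255

(* Stand-in for Unix.execvp in a launcher: never returns, and the caller
   sees the child's exit code on Windows as well as on Unix. *)
let spawn_and_exit prog argv = exit (run_and_wait prog argv)
```

In a launcher like LAUNCH2, calling `spawn_and_exit prog argv` where it previously called `Unix.execvp prog argv` keeps the exit code visible to make. The trade-off is that the launcher process stays alive until the child finishes, so it is not a true exec on any platform.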

Cheers,
Nicolas


That said, following on from some of the things I (re-)discovered in ocaml/ocaml#13879, I’m tempted to reimplement the Windows versions of the exec functions in terms of spawn, to give them rather less surprising semantics. They still wouldn’t be identical to the Unix semantics, but the functions would at least be useful on Windows, where at present they’re not.
