Huh, wonder if this is an OCaml Windows bug

(1) I have a program, BOING.ml, that uses Unix.system to invoke another program, “foobar.ml” (compiled, then copied to “yadda…” with various suffixes).
(2) When yadda.ml is compiled to “yadda.exe”, it works fine.
(3) When yadda.ml is compiled to “yadda.opt.exe”, and invoked as “yadda.opt -verbose”, I get a weird error (perhaps from cmd.exe, I got no idear, since I ain’t no Windows dude):

unknown option -verbose

I have as close as possible to a clean repro, in this repo: GitHub - chetmurthy/ci-sandbox: CI Sandbox
commithash: 2c6a9bc5ef921f66df36b7a8eedd931d09342340

The example is in the directory “system” and can be reproduced by running the Github CI action in the repo, or running:

make -C system all

Salient output below (lines with “HERE” inserted to highlight error):

ls -la
total 2085
drwxrwx---+ 1 Administrators None      0 Mar 27 03:36 .
drwxrwx---+ 1 Administrators None      0 Mar 27 03:35 ..
-rwxr-x---+ 1 Administrators None     42 Mar 27 03:34 .gitignore
-rwxr-x---+ 1 runneradmin    None   1100 Mar 27 03:36 BOING.cmi
-rwxr-x---+ 1 runneradmin    None   1933 Mar 27 03:36 BOING.cmo
-rwxr-x---+ 1 runneradmin    None 545419 Mar 27 03:36 BOING.exe
-rwxr-x---+ 1 Administrators None    616 Mar 27 03:34 BOING.ml
-rwxr-x---+ 1 Administrators None    568 Mar 27 03:34 Makefile
-rwxr-x---+ 1 runneradmin    None    688 Mar 27 03:36 foobar.cmi
-rwxr-x---+ 1 runneradmin    None    891 Mar 27 03:36 foobar.cmo
-rwxr-x---+ 1 runneradmin    None 516422 Mar 27 03:36 foobar.exe
-rwxr-x---+ 1 Administrators None     68 Mar 27 03:34 foobar.ml
-rwxr-x---+ 1 runneradmin    None 516422 Mar 27 03:36 yadda.exe
-rwxr-x---+ 1 runneradmin    None 516422 Mar 27 03:36 yadda.opt.exe
./BOING yadda -verbose foo || echo yadda failed
LAUNCH: command "yadda ^"-verbose^" ^"foo^""
["yadda"; "-verbose"; "foo"]
./BOING yadda.exe -verbose foo || echo yadda.exe failed
LAUNCH: command "yadda.exe ^"-verbose^" ^"foo^""
["yadda.exe"; "-verbose"; "foo"]
./BOING yadda.opt -verbose foo || echo yadda.opt failed
LAUNCH: command "yadda.opt ^"-verbose^" ^"foo^""
------------------ HERE ---------------------
unknown option -verbose
------------------ HERE ---------------------
yadda.opt failed
./BOING yadda.opt.exe -verbose foo || echo yadda.opt.exe failed
LAUNCH: command "yadda.opt.exe ^"-verbose^" ^"foo^""
["yadda.opt.exe"; "-verbose"; "foo"]

Notice the extremely terse error-message.
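For readers who don't want to clone the repo, here is a rough sketch of the shape of the two programs, reconstructed from the log output only. The function names (`format_argv`, `build_command`) are made up for illustration, and the real BOING.ml and foobar.ml may differ. The `LAUNCH: command "..."` lines are consistent with what `Filename.quote_command` produces on Windows, so the sketch assumes that is how the command string is built. The actual launcher then hands that string to Unix.system, which is left out here so the sketch only needs the standard library.

```ocaml
(* Callee side (foobar.ml in the repro, sketched): print argv as an OCaml
   list, which is what the ["yadda"; "-verbose"; "foo"] lines look like. *)
let format_argv (argv : string array) : string =
  let quoted = List.map (Printf.sprintf "%S") (Array.to_list argv) in
  "[" ^ String.concat "; " quoted ^ "]"

(* Launcher side (BOING.ml in the repro, sketched): build the command string.
   Assumption: the real launcher uses Filename.quote_command (OCaml >= 4.10);
   its output is platform-specific. *)
let build_command (prog : string) (args : string list) : string =
  Filename.quote_command prog args

let () =
  (* When run as a program, behave like the callee. *)
  print_endline (format_argv Sys.argv)
```

The point of the repro is that the callee's list should always be printed; in the yadda.opt case it never gets that far.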

I am neither a Windows programmer, admin, nor user. Literally have never had it on a machine I own. I’ve used Github’s CI somewhere between 100 and 1000 times, in order to narrow down this and a couple other Windows support bugs. So someone else will need to run with this, if they want it fixed.

For myself, I can attest that Perl’s equivalent, IPC::System::Simple::runx has never displayed such problematic errors.

P.S. I did read the doc-comments, and saw nothing to make me think that this was expected behaviour.

It is indeed a very old bug. You can see the same thing on Windows if you run ocamlc.byte -verbose vs ocamlc.byte.exe -verbose. I think I first spotted it working on Relocatable OCaml in 2021, but as it has always been wrong and no one else had apparently ever hit it, I made a private note to fix it and left it at that.

This only affects bytecode executables which have a . in the name. There are three workarounds - one is to use native code (!!); as you’ve seen, appending the .exe explicitly also fixes it; the other is to fully qualify the path (.\yadda.opt should work).
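The "append .exe explicitly" workaround can be applied on the calling side with a small helper. This is only a sketch: `with_exe_suffix` and `command_name` are hypothetical names, not part of the standard library, and it assumes the caller knows the target is an executable rather than a script.

```ocaml
(* Hypothetical helper: add ".exe" unless the name already ends with it
   (compared case-insensitively, since Windows file names are). *)
let with_exe_suffix (name : string) : string =
  if Filename.check_suffix (String.lowercase_ascii name) ".exe" then name
  else name ^ ".exe"

(* Only rewrite the name for native Windows builds of the caller;
   Sys.win32 is false on Linux, macOS and for the Cygwin port. *)
let command_name (name : string) : string =
  if Sys.win32 then with_exe_suffix name else name
```

With this, a launcher would pass `command_name "yadda.opt"` instead of `"yadda.opt"` as the program name.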

What you’re actually seeing is an error message from ocamlrun (i.e. from the interpreter).

The reason that is happening is because the Windows API function SearchPath has some very surprising semantics w.r.t. a dot in the filename.
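One documented SearchPath rule that fits the symptom is that the default extension is only appended when the given file name does not already end with an extension. The following is a simplified model of that one rule in OCaml, not the Windows implementation and not the code in the OCaml runtime. `name_searched` is a hypothetical name, and whether this is exactly the path taken by the bytecode executable header is an assumption.

```ocaml
(* Simplified model: which file name ends up being looked for when a
   default extension is supplied. A name that already contains an
   extension (anything after a dot) is searched for as-is. *)
let name_searched ?(default_ext = ".exe") (name : string) : string =
  if Filename.extension name = "" then name ^ default_ext else name
```

Under this model "yadda" is looked up as "yadda.exe", but "yadda.opt" is looked up as "yadda.opt", because ".opt" already counts as an extension.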

I have a fix in [Relocatable] 3f: Executable header fixes and tweaks by dra27 · Pull Request #190 · dra27/ocaml · GitHub - the change in behaviour in that commit intentionally fixes the bug you’re seeing, as well as another one with the Cygwin version of OCaml.

There was a different manifestation of what I think is the same bug, where the argument is a “-n”. Same sort of behaviour, IIRC. Hopefully your fix will fix that too.

Next bug. Or maybe not. Who knows, who knows.

Repro: ci-sandbox/scripts/hello1 at master · chetmurthy/ci-sandbox · GitHub
commithash: a06d93a3d05b991cc5465cc33400a67794e375f4

The CI in the repo reproduces the bug. Here is a run of the CI: CI-WINDOWS-FIXED · chetmurthy/ci-sandbox@a06d93a · GitHub

[I don’t know how long the logs are kept around for, but it should be straightforward to clone the repo and rerun the CI]

Description:

On Windows you can write scripts in Perl and Bash. Those scripts should be invokable by the standard Unix.system machinery, just as they are invokable by Perl’s IPC::System::Simple::runx. This repro tests that behaviour.

There are four cases, which correspond to:

(1) whether the launcher is written in (a) Perl, or (b) OCaml
(2) whether the invoked program is written in (a) Perl or (b) Bash.

  • actually, I didn’t need Bash in my work (only Perl). But since I was doing this testing anyway, I figured hey, let’s see what happens.

The CI runs through all 4 cases of the matrix. I attach the output of (1a)+(2a) and (1b)+(2a), that is, Perl-invokes-Perl and OCaml-invokes-Perl. Each case consists of a series of tests; each test does an invocation, and the output should contain either the output of the invoked program, or an ERROR/EXPECTED message. That is to say, either the invoker should successfully invoke the program, or there should be an error.

[Note: I do check that each launcher can invoke OCaml (by invoking “BOING true”).]

What -should not- happen, is that the invoker -thinks- it invoked the program, but the program did not actually run, silently. Needless to say, this is the same “BOING.ml” from before, so using Unix.system.

==== Perl invokes Perl ====
 ==== START ====
 ./hello.perl || echo "ERROR: cannot exec ./hello.perl directly"
 Hello world from ./hello.perl
 ===============
 HELLO.PL || echo "ERROR: cannot exec HELLO.PL (from bin)"
 Hello world from bin/HELLO.PL
 ===============
 ./LAUNCH.PL BOING true || echo "ERROR: launcher cannot exec BOING"
 ./LAUNCH.PL: BOING true
 D:\a\ci-sandbox\ci-sandbox\scripts\hello1\bin\BOING.exe: command "true "
 ===============
 ./LAUNCH.PL hello.perl || echo "EXPECTED: PATH does not contain . in ./LAUNCH.PL" 
 ./LAUNCH.PL: hello.perl
 "hello.perl" failed to start: "No such file or directory" at ./LAUNCH.PL line 21.
 EXPECTED: PATH does not contain . in ./LAUNCH.PL
 ===============
 ./LAUNCH.PL HELLO.PL || echo "ERROR: ./LAUNCH.PL cannot exec HELLO.PL (from bin)"
 ./LAUNCH.PL: HELLO.PL
 Hello world from bin/HELLO.PL
 ===============
 ./LAUNCH.PL hello.pl || echo "ERROR: ./LAUNCH.PL cannot exec hello.pl (from bin); Windows path-lookup should be case-insensitive"
 ./LAUNCH.PL: hello.pl
 Hello world from bin/hello.pl
 ===============

==== OCaml invokes Perl ====
 make[1]: Entering directory '/cygdrive/d/a/ci-sandbox/ci-sandbox/scripts/hello1'
 ==== START ====
 ./hello.perl || echo "ERROR: cannot exec ./hello.perl directly"
 Hello world from ./hello.perl
 ===============
 HELLO.PL || echo "ERROR: cannot exec HELLO.PL (from bin)"
 Hello world from bin/HELLO.PL
 ===============
 ./BOING BOING true || echo "ERROR: launcher cannot exec BOING"
 D:\a\ci-sandbox\ci-sandbox\scripts\hello1\BOING.exe: command "BOING ^"true^""
 BOING: command "true "
 ===============
 ./BOING hello.perl || echo "EXPECTED: PATH does not contain . in ./BOING" 
 D:\a\ci-sandbox\ci-sandbox\scripts\hello1\BOING.exe: command "hello.perl "
 ===============
 ./BOING HELLO.PL || echo "ERROR: ./BOING cannot exec HELLO.PL (from bin)"
 D:\a\ci-sandbox\ci-sandbox\scripts\hello1\BOING.exe: command "HELLO.PL "
 ===============
 ./BOING hello.pl || echo "ERROR: ./BOING cannot exec hello.pl (from bin); Windows path-lookup should be case-insensitive"
 D:\a\ci-sandbox\ci-sandbox\scripts\hello1\BOING.exe: command "hello.pl "
 ===============

Here is yet another bug.

Synopsis: it is a bug to silently reject Cygwin-style pathnames, while relying on Cygwin for CI. The rejection should be noisy, so that programmers are aware of the mistake.

(1) OCaml on Windows (perhaps correctly) does not prereq cygwin, nor depend on it in any way.

(2) Nevertheless, the Github CI environment uses Cygwin.

(3) For a UNIX programmer porting packages to Windows, this makes it trivially easy to end up using Cygwin-style paths in places where Windows-style or at least mixed-style should be used.

(4) When you do this in OCaml commands, you get errors, viz. ci-sandbox/.github/workflows/ci-cygwin-bug.yml at master · chetmurthy/ci-sandbox · GitHub

(5) If OCaml and its associated tooling would -check- at any place where a filename or directory-name came in (on cmdlines) for cygwin-style pathnames, and at least emitted a “this is a very, very bad idea” warning, that might help prevent UNIX programmers from making this mistake.
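To make (5) concrete, here is a minimal sketch of what such a check could look like. Nothing like this exists in OCaml or its tooling as far as this thread says; all three function names are hypothetical. It only recognises the `/cygdrive/<letter>/...` form seen in the CI logs, so other Cygwin paths (for example under `/usr` or `/home`) would pass unnoticed.

```ocaml
let cygdrive_prefix = "/cygdrive/"

(* True for paths of the form /cygdrive/<drive letter>[/...]. *)
let is_cygwin_style (path : string) : bool =
  let n = String.length cygdrive_prefix in
  String.length path > n
  && String.sub path 0 n = cygdrive_prefix
  && (match path.[n] with 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false)
  && (String.length path = n + 1 || path.[n + 1] = '/')

(* Rewrite /cygdrive/d/a/b as the mixed-style d:/a/b; other paths are
   returned unchanged. *)
let to_mixed_style (path : string) : string =
  if not (is_cygwin_style path) then path
  else
    let n = String.length cygdrive_prefix in
    let drive = String.make 1 path.[n] in
    let rest = String.sub path (n + 1) (String.length path - n - 1) in
    drive ^ ":" ^ (if rest = "" then "/" else rest)

(* Emit the noisy warning asked for above; prints to stderr. *)
let warn_if_cygwin_style (path : string) : unit =
  if is_cygwin_style path then
    Printf.eprintf
      "warning: %s looks like a Cygwin-style path; did you mean %s?\n"
      path (to_mixed_style path)
```

A tool would call `warn_if_cygwin_style` on each file or directory argument it receives on the command line.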

No you can’t! You’re running those scripts in Cygwin which is fundamentally not the same thing. With a native Windows perl installation (ActivePerl et al) you might get an association to be able to just about start .pl files from the Windows shells, but those shebang-style scripts are meaningless on Windows.

I get that it’s confusing, but in order to run those commands, you had to override the shell in each run step. The “Show what is cached” step is running in an actual Windows shell:

Run opam switch
  opam switch
  opam list
  opam pin list
  shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"

If you’re using Cygwin to run the native Windows stuff, you’re essentially cross-compiling from a host which happens to be able to run target executables. That either has to be taken into account (as various of the CIs for the bigger core projects in the GitHub ocaml org do), or the harnesses you’re using want to be native Windows as well (which various other projects do).

I’ve never actually done it, but if you want to write shell scripts which can actually be run on Windows, Linux and macOS… you want to write them in PowerShell :wink:
