Wildcard expansion on Windows

While implementing a small CLI tool, I ran into a seemingly undocumented feature of the OCaml compiler: on Windows, the programs it produces automatically expand wildcards in their command-line arguments before doing anything else. This proved to be a problem.

This post serves three different goals:

  • give some visibility, in case someone else runs into this issue in the future
  • present a possible workaround
  • ask the community if there is a better way™ to solve this

Context

My tool uses Cmdliner for CLI argument processing, and needs basic wildcard support for one of its options, e.g. it should handle mytool.exe -x *.ml.

If the wildcard is expanded before the tool sees it, this becomes mytool.exe -x a.ml b.ml c.ml, which Cmdliner cannot handle. Under any common Unix shell, this is not a problem: we just have to escape the star character, e.g. mytool.exe -x \*.ml, have mytool handle the expansion itself, and we’re all set. So far, so good.
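As a rough illustration of what "handling the expansion itself" could look like, here is a minimal sketch in OCaml. The names glob_match and expand are hypothetical (this is not mytool's actual code). It only supports * and ? in the file-name part of a pattern, and matching is case-sensitive, which may not be what you want on Windows:

```ocaml
(* Hypothetical helper: match [name] against [pattern], where '*' matches
   any run of characters (possibly empty) and '?' matches exactly one. *)
let glob_match pattern name =
  let plen = String.length pattern and nlen = String.length name in
  let rec go p n =
    if p = plen then n = nlen
    else
      match pattern.[p] with
      | '*' ->
        (* Either '*' matches nothing, or it swallows one more character. *)
        go (p + 1) n || (n < nlen && go p (n + 1))
      | '?' -> n < nlen && go (p + 1) (n + 1)
      | c -> n < nlen && name.[n] = c && go (p + 1) (n + 1)
  in
  go 0 0

(* Hypothetical helper: expand [pattern] against the entries of [dir].
   A pattern without wildcards is returned unchanged. *)
let expand ?(dir = Filename.current_dir_name) pattern =
  if not (String.contains pattern '*' || String.contains pattern '?') then
    [ pattern ]
  else
    Sys.readdir dir
    |> Array.to_list
    |> List.filter (glob_match pattern)
    |> List.sort String.compare
```

With something like this in place, the value received for -x can be passed through expand after Cmdliner has parsed the command line.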

Then came Windows. Whatever I did, there seemed to be no way of preventing the wildcard from being expanded. I learned that on Windows, the program itself is responsible for dealing with wildcards, not the shell. After some digging, I found the root cause of this behaviour in the OCaml runtime itself, in runtime/main.c:

int main_os(int argc, char_os **argv)
{
#ifdef _WIN32
  /* Expand wildcards and diversions in command line */
  caml_expand_command_line(&argc, &argv);
#endif

/* [...] */
}

After a bit of history digging, it turns out this behaviour dates back to the very early stages of the OCaml compiler: see this commit by Xavier Leroy from… 1996!

Workaround

The runtime/main.c file gives a hint on how to work around this:

/* Main entry point (can be overridden by a user-provided main()
   function that calls caml_main() later). */

So the most elegant workaround I could find was to create a copy of the main.c file inside the source tree of mytool and comment out the call to caml_expand_command_line. Then it was a matter of compiling and linking everything together. I use dune to build mytool.exe, and after a lot of trial and error, I found out it could handle this very easily with the foreign_stubs field:

(executable
 (name mytool)
 (foreign_stubs (language c) (names main))
 ; ...
)
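One way to check whether the custom entry point took effect is to have the program print its arguments exactly as it received them. A minimal sketch, assuming it is used as the mytool.ml of the stanza above:

```ocaml
(* Print each command-line argument exactly as the program received it. *)
let () =
  Array.iteri (fun i arg -> Printf.printf "argv[%d] = %S\n" i arg) Sys.argv
```

Going by the behaviour described in this post, running mytool.exe *.ml on Windows should list the matching file names with the stock entry point, and the literal *.ml with the custom one.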

Minimal working example

I created a GitHub repository containing a minimal project featuring a custom entry point, so that command-line argument expansion does not happen on Windows.

See: GitHub - benji-sb/ocaml-windows-argv

Open Questions

  • The root cause of this issue was introduced almost 30 years ago. How come no one on the Internets seems to have run into a similar issue?
  • Why was this behaviour introduced in the first place? I suspect it may have made it easier to set up a Windows toolchain back then, but that’s just wild speculation.
  • Is this behaviour still needed, or could we get rid of it?
  • Should this be more widely documented, and if so, where? The OCaml compiler docs and the dune docs could probably benefit from a small paragraph on how to override the default entry point.

Interesting question!

As you mentioned, under Windows, wildcard expansion is the responsibility of the program, not the calling shell. I guess that the rationale was to emulate some basic Un*x behaviour under Windows (in the same way that certain POSIX functions of the Unix module are emulated under Windows).

Another possibility is to modify the code so that a * which is preceded by a backslash is not interpreted.

I think so. The manual seems to be the most appropriate place, in chapter 15 (bytecode runtime system) and maybe also 16 (native-code compilation).

Also, you may want to repost this issue to the OCaml bug tracker (Issues · ocaml/ocaml · GitHub), where the maintainers of the system may be able to give you a more relevant answer.

Cheers,
Nicolas

But on Windows backslash is the directory separator :–)

Do you have any idea how/if the way Windows provides for doing this supports escaping at all?

It seems using this was considered at some point upstream in this issue.

Darn, I stand corrected!

No idea, but a random search seems to indicate that the answer is “no”: c++ - Is it possible to quote/escape command line arguments on Windows while linking setargv.obj? - Stack Overflow

Thanks for the reference!

Cheers,
Nicolas