Printing filesystem paths in OCaml

Ahoy!
I’m a bit dissatisfied with the different ways OCaml code prints filesystem paths for human consumption. I think we should settle once and for all on a method to print paths.

I expect that if a program prints a path, I should be able to copy-paste it as-is, in the general case, and give it to another system command. I may add quoting myself, but would prefer not to.

I often see %S being used to escape the path, as in:

Printf.printf "My path %S" path
  • S: convert a string argument to OCaml syntax (double quotes, escapes).

I think this is incorrect: paths shouldn’t be printed as OCaml strings. This leads to overzealous double-quoting, such as:

# let err = Printf.sprintf "My path %S" {|/home/Antonin/My Pictures|} in
  Printf.ksprintf failwith "error: %s" err;;
Exception: Failure "error: My path \"/home/Antonin/My Pictures\""

I’m sure users don’t care about the nested escaped quotes. I’m not sure what %S is good for unless you’re pretty-printing OCaml code.

I think that Filename.quote is better suited, as it:

Returns a quoted version of a file name, suitable for use as one argument in a command line, escaping all meta-characters.

# let err = Printf.sprintf "My path %s" (Filename.quote {|/home/Antonin/My Pictures|}) in
  Printf.ksprintf failwith "error: %s" err;;
Exception: Failure "error: My path '/home/Antonin/My Pictures'".

It does suffer from the fact that Filename is not parametrized by the type of shell, but depends on the platform (Windows, Cygwin, Unix), which makes it impossible out of the box to emit bash-compatible paths from a program built for Win32, or to emit Windows paths from Linux. See also issue #11940, “Filename.{Unix,Win32,Cygwin} should be exported”.

The elegant Fpath library has conversions to strings and pretty-printers. If I’m not mistaken it doesn’t do anything special with respect to the shell or human readers.

Then there’s the case of ANSI escape sequences embedded in file names, or Unicode characters for that matter. I would expect this ideal function to present escaped paths, as GNU ls does:

$ touch "\e[0;31mred\e[0m"
$ for file in *; do echo "$file"; done
red # actually shown in red
$ ls
\e[0;31mred\e[0m

Neither Filename.quote nor Fpath.pp escapes ANSI codes:

# print_string (Filename.quote "/october/\027[0;31mred\027[0m/bloup");;
'/october/red/bloup'- : unit = ()
# Fpath.pp Format.std_formatter (Fpath.v "/october/\027[0;31mred\027[0m/bloup");;
/october/red/bloup- : unit = ()

Don’t even get me started on Windows paths! I often see paths containing double backslashes, that I really can’t do anything with unless I painstakingly remove one of each, or change the whole quoting style. Sometimes paths start with backslashes and switch to forward slashes, which the system can deal with, but is not consistent for users.

In conclusion: please stop using %S for printing paths. A path isn’t an OCaml string. Someone please parametrize and fix the Filename module. Consider switching to Fpath.
Is there a function somewhere that can correctly print and escape paths? Maybe in compiler-libs? In Dune’s stdlib? In opam? If not, can we devise one and push it everywhere?


This is an excellent idea. Perl has a package that does something like this, “ShellQuote”, and I fricken’ love it! When I print a filename, I can cut-and-paste it right back into the shell, no worries, no crying. And it doesn’t matter if the filename has a quote or a double-quote: ShellQuote takes care of choosing the right quotes. I admit that non-ASCII chars stump it (or the shell, or my terminal emulator) and I need to figure that out.

This is something whose time came long ago, and we didn’t do anything about it. Thank you for bringing it up.


Unfortunately, the question as posed does not have a unique solution, as the best textual format for file system paths is a function of the path itself but also the expected consumer of the textual representation.

Concretely, file system paths need to be quoted in various ways depending on whether they will be consumed by humans (eg error messages), by CreateProcess on Windows, by the shell (eg /bin/sh, cmd.exe, Sys.command) or by an OCaml program (for which %S is well-suited).

One could argue that the double quoting stems not from the use of %S but from the fact that you are using the default exception handler to print the Failure _ exception. The default handler is not really meant to be particularly human-readable. If you provided your own handler, eg

let () = try ... with Failure s -> Printf.eprintf "Error: %s\n%!" s

there would be no double quoting in your message.

If you intend to pass your paths to CreateProcess on Windows, Filename.quote is probably your best bet. If your path will be passed to the shell you will want to use Filename.quote_command (which correctly handles the notorously tricky cmd.exe quoting rules; on Un*x systems, both functions are pretty similar).
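As a small illustration of the shell case, here is a sketch that builds a command line with Filename.quote_command (available since OCaml 4.10). The command and its arguments are made up for the example; the exact quoting of the result depends on the platform the program was compiled for.

```ocaml
(* Hypothetical command and arguments, chosen only for illustration. *)
let cmd = Filename.quote_command "ls" [ "-l"; "/home/Antonin/My Pictures" ]

(* Print the command line that would be handed to Sys.command. *)
let () = print_endline cmd
```

On a Unix build every word comes out single-quoted, so the printed line can be pasted into /bin/sh as-is.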

Cheers,
Nicolas


Even for pretty-printing OCaml code, %S is not that good, as it will escape your UTF-8 encoded strings. What you want is actually context dependent, so you’d like a family of formatters for pretty-printing OCaml string literals and quoted strings in user messages.
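The escaping is easy to observe with the standard Printf module alone. The path below is a made-up example; the point is only that %S rewrites every non-ASCII byte as a decimal escape.

```ocaml
(* Hypothetical path; e-acute is the two bytes 0xC3 0xA9 in UTF-8. *)
let path = "/home/Antonin/caf\xc3\xa9"

(* %S yields an OCaml literal in which both bytes are escaped. *)
let shown = Printf.sprintf "%S" path

let () =
  print_endline shown;  (* the literal, with \195\169 instead of the accent *)
  print_endline path    (* the raw bytes, which a UTF-8 terminal renders *)
```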

I started to try to devise better Unicode OCaml string literal formatters in the Fmt of b0.std (see the _literal functions here). I’m not fully happy with them yet though.


Now I have to fix both the path printing function and the exception handler! :stuck_out_tongue:

Thanks, that is useful. Having ranted out loud I think I’m really looking for something like ls, which escapes ANSI characters and quotes for printing. I may have a look at the algorithm it uses. Filename.quote_command won’t escape ANSI sequences, so I can’t print a path with it.


I’m not sure I’m getting what you are trying to achieve here. What is the property you are seeking exactly?

Ah, I think I know. It’s the function I feel is missing from the combinators I linked to: Fmt.text_string, but with UTF-8 decoding errors replaced by hex escapes rather than U+FFFD. That is slightly more general than specifically looking for ANSI escapes, and it is what I’d like to have to snapshot test TTYs.
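To make the idea concrete, here is a rough sketch using only the standard library (OCaml 4.14 or later for String.get_utf_8_uchar). It is not the b0 function: the name escape_invalid_utf8 is invented, and the sketch only deals with decoding errors, leaving valid control characters such as ESC untouched.

```ocaml
(* Hypothetical helper: copy valid UTF-8 through unchanged and write
   every byte of an invalid sequence as a backslash-x hex escape. *)
let escape_invalid_utf8 s =
  let b = Buffer.create (String.length s) in
  let rec loop i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      (* Number of bytes consumed by this decode, always at least 1. *)
      let n = Uchar.utf_decode_length d in
      if Uchar.utf_decode_is_valid d then Buffer.add_substring b s i n
      else
        for k = i to i + n - 1 do
          Printf.bprintf b "\\x%02X" (Char.code s.[k])
        done;
      loop (i + n)
    end
  in
  loop 0;
  Buffer.contents b
```

A lone 0xFF byte, which can never appear in valid UTF-8, comes out as the four characters \xFF, while a correctly encoded accent is left alone.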


It appears that the client doesn’t have a clear idea of what he wants! and is constantly redefining the requirements. :grinning_face_with_smiling_eyes:

I want to print a path for human consumption, say in a log trace. In most cases I’d like to be able to copy-paste the path and feed it to the shell, like when Dune prints the commands it executes. I should be able to re-use the commands and printed paths and tweak them. I understand and expect that it might not be possible if the path really is too weird, in which case I’m fine with a sane default.
I think this means that non-printable characters should be escaped, and something should be done with ANSI sequences. Maybe quotes, spaces need to be handled in some way too.
I’ll have a look at Fmt.text_string, thanks!
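One possible reading of these requirements, as a sketch only: print plain paths untouched, single-quote paths that contain spaces or shell metacharacters, and fall back to ANSI-C quoting when control bytes (including the ESC that starts ANSI sequences) are present. Every name here is invented, the target is assumed to be a bash-like shell that understands ANSI-C quoting, and invalid UTF-8 is not handled. It needs OCaml 4.13 or later for String.for_all and String.exists.

```ocaml
(* Hypothetical helpers: none of these names exist in the standard
   library, Fpath or Dune. *)
let is_control c = Char.code c < 0x20 || Char.code c = 0x7f

(* Characters that common shells do not treat specially. *)
let is_plain = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9'
  | '/' | '.' | '_' | '-' | '+' | ',' | ':' | '@' | '%' | '=' -> true
  | _ -> false

(* POSIX single quoting: close the quote, emit an escaped quote, reopen. *)
let single_quote s =
  "'" ^ String.concat "'\\''" (String.split_on_char '\'' s) ^ "'"

let display_path path =
  if path <> "" && String.for_all is_plain path then path
  else if not (String.exists is_control path) then single_quote path
  else begin
    (* ANSI-C quoting: a dollar sign followed by a single-quoted string
       in which backslash escapes are interpreted by the shell. *)
    let b = Buffer.create (String.length path + 8) in
    Buffer.add_string b "$'";
    String.iter
      (function
        | '\'' -> Buffer.add_string b "\\'"
        | '\\' -> Buffer.add_string b "\\\\"
        | c when is_control c -> Printf.bprintf b "\\x%02x" (Char.code c)
        | c -> Buffer.add_char b c)
      path;
    Buffer.add_char b '\'';
    Buffer.contents b
  end
```

With this sketch, /home/Antonin/My Pictures is shown single-quoted, and the red file name from the first post is shown with each ESC byte written as \x1b instead of colouring the terminal.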


This seems like a succinct problem-statement. I use Perl’s ShellQuote to achieve this, but since I’ve never really gotten unicode to work with my terminal emulator, it’s mostly confined to ASCII. But even at that, it’s incredibly valuable: when something gets printed out (usually, commands) being able to cut-and-paste it right back into the shell is so great. I notice that “ls” on modern Linux does this, too.

Especially with filenames containing unicode non-ascii chars, this is more and more a problem.

[OK, I could be completely wrong about this, but here goes …]

I re-read your post, and it seems to me that while you’re absolutely right about the need to have an ironclad way of printing filenames so they can be cut-and-pasted back into the (a?) shell, the example you’ve cited isn’t actually broken.

The code of Filename.quote only ever uses single-quotes: that’s why it appears better in strings:

# Filename.quote "foo";;
- : string = "'foo'"
# Filename.quote "foo'bar";;
- : string = "'foo'\\''bar'"

The minute you use a single-quote in a string, Filename.quote’s output in strings gets ugly too.

I think it’s a nice idea to want the contents of quoted filenames in OCaml strings output by the toplevel as part of pretty-printing values, to look “right” (cut-and-paste-able back into the shell). But that’s much, much less important than getting the output of OCaml programs that print output directly, to look right. And I say that, even though I use the toplevel religiously for debugging all the code I write.

I’d go further, and say that what this quoting mechanism should do, is to choose single-quotes when the content does not contain single-quotes, double-quotes when the content does not contain double-quotes, and … some default when both are present.
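A minimal sketch of that selection rule, assuming a POSIX-like shell. The function name is invented. One caveat is baked in: double quotes still let the shell interpret dollar signs, backticks, backslashes and (in interactive bash) exclamation marks, so the sketch only picks double quotes when none of those occur, and otherwise falls back to escaped single quotes as the default.

```ocaml
(* Hypothetical function, not part of Filename or any library. *)
let choose_quotes s =
  let has c = String.contains s c in
  if not (has '\'') then
    (* No single quote inside: plain single quoting is enough. *)
    "'" ^ s ^ "'"
  else if not (List.exists has [ '"'; '$'; '`'; '\\'; '!' ]) then
    (* Nothing that stays special inside double quotes. *)
    "\"" ^ s ^ "\""
  else
    (* Default: single quotes, closing and reopening around each quote. *)
    "'" ^ String.concat "'\\''" (String.split_on_char '\'' s) ^ "'"

let () =
  List.iter
    (fun s -> print_endline (choose_quotes s))
    [ "foo"; "foo'bar"; "it's $HOME" ]
```

The second case is where this differs from Filename.quote: foo'bar is printed inside double quotes rather than with the escaped-quote sequence.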