Exception UTF-8 inconsistently printing as an escaped string

With a test.ml that contains just failwith "тест", the output shown here lulled me into a false sense of security, and then in a real program the message prints as string escapes. What's responsible for that?

$ ocaml test.ml
Exception: Failure "тест".
$ ocaml -I +unix unix.cma test.ml
Exception: Failure "тест".
$ ocamlc test.ml && ./a.out
Fatal error: exception Failure("тест")
$ ocamlc -I +unix unix.cma test.ml && ./a.out
Fatal error: exception Failure("\209\130\208\181\209\129\209\130")

Unix registers a callback, but copying that into this test still doesn't break printing, so my guess is that some C stub changes the printing.

I’ve found a close workaround in

let () =
  Printexc.register_printer (function
    | Failure msg -> Some (Printf.sprintf "Failure(%s)" msg)
    | _ -> None)

What I'd like, sadly, is a locale, or %S escaping everything except for some graphemes that I consider to be printable, so I need register_printer in any case. I just wonder what's changing the printer above.

There are two handlers for uncaught exceptions. One, written in OCaml, lives in the standard library (module Printexc). The other, written in C, lives in the runtime and is used as a fallback if the former is not linked in. The one from the standard library is quite sophisticated, too clever for its own good, which causes it to mess with your UTF-8 string. The one from the runtime is much simpler and does not try to interpret the exception string in any way, so the string goes through unmodified.

When you link the Unix library, you implicitly cause the Printexc module to be loaded, which in turn messes with your exception.
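The Printexc side of this can be checked in isolation. The sketch below assumes that no custom printer for Failure has been registered yet; the names msg, via_printexc and via_format are only for illustration. It compares Printexc.to_string with plain %S formatting, which escapes every non-ASCII byte in decimal.

```ocaml
(* "тест" written as explicit UTF-8 bytes, so the example does not depend
   on the encoding of this source file. *)
let msg = "\209\130\208\181\209\129\209\130"

(* What the Printexc default printer produces for this exception
   (assuming no custom printer has been registered for Failure). *)
let via_printexc = Printexc.to_string (Failure msg)

(* %S quotes the string and escapes each non-ASCII byte as \ddd. *)
let via_format = Printf.sprintf "Failure(%S)" msg

let () =
  print_endline via_printexc;
  print_endline via_format
```

Both lines should show the same escaped form as the last line of the transcript in the question, which suggests the escaping comes from %S-style formatting inside Printexc rather than from a C stub in Unix.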

I suggest that, instead of letting the default handler take care of your exception, you register your own using Printexc.register_printer, so that you have complete control over the way the string is displayed.
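As a rough sketch of such a printer: the helper below (escape_keep_utf8 is a hypothetical name, not a standard library function) escapes only quotes, backslashes and ASCII control characters, and copies every byte at or above 0x80 through unchanged. It assumes the message is already valid UTF-8 and does not validate it, so it is only an approximation of "escape everything except the graphemes I consider printable".

```ocaml
(* Hypothetical helper: escape quotes, backslashes and ASCII control
   characters; bytes >= 0x80 (assumed to be UTF-8) pass through as-is. *)
let escape_keep_utf8 (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | '\t' -> Buffer.add_string b "\\t"
      | c when Char.code c < 0x20 || Char.code c = 0x7f ->
          (* Remaining control characters use the decimal \ddd form. *)
          Buffer.add_string b (Printf.sprintf "\\%03d" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Register a printer for Failure only; returning None for every other
   exception leaves it to other printers or the default one. *)
let () =
  Printexc.register_printer (function
    | Failure msg ->
        Some (Printf.sprintf "Failure(\"%s\")" (escape_keep_utf8 msg))
    | _ -> None)
```

Registering the printer also links Printexc explicitly, so the output no longer depends on whether some other library such as Unix happens to pull it in.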
