Deriving, Format-ting and unicode

I get a UTF-8 string from C++ and want to copy it to the OCaml side. Right now I’m using the standard string type, and these strings are not printed nicely.

I have heard that how a Unicode string is displayed depends on the destination I’m printing it to, but my understanding is vague and I’m not sure. Example:

main.ml

type t = A of string [@@deriving show]

let p = A "(\208\149\208\178\208\179\208\181\208\189\208\184\208\185)"

let () =
  let (A s) = p in
  print_endline s
;;

let () = Format.printf "%a\n%!" pp p

And I get

(Евгений)            
(Main.A "(\208\149\208\178\208\179\208\181\208\189\208\184\208\185)")

Clearly Format is not so fancy in printing.

Questions:

  1. What alternatives do we have for storing a UTF-8 string as an OCaml value?
  2. Is there a special ppx_deriving attribute that fixes the printing of Unicode? Or a Format-specific one?

The issue is not Format, which prints UTF-8-encoded strings just fine. Rather, it is the PPX-derived pretty-printer that prints the string in escaped form, as an OCaml string literal, so every byte outside the printable ASCII range is shown as a decimal escape (\ddd).
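To see the difference without any PPX involved, here is a minimal sketch that compares the two Format conversions. It assumes (as far as I know, but I have not checked the generated code here) that the derived printer formats strings the way "%S" does; the names raw and escaped are only for illustration.

```ocaml
(* The same UTF-8 bytes as in the original example. *)
let s = "(\208\149\208\178\208\179\208\181\208\189\208\184\208\185)"

(* "%s" copies the bytes through unchanged, so a UTF-8 terminal
   can render them as text. *)
let raw = Format.asprintf "%s" s

(* "%S" prints the string as an OCaml literal: quoted, with every byte
   outside printable ASCII written as a \ddd decimal escape.
   Assumption: this mirrors what the derived printer does. *)
let escaped = Format.asprintf "%S" s

let () =
  print_endline raw;
  print_endline escaped
```

So Format itself never touches the bytes; the escaping comes entirely from which conversion the printer chooses.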

Cheers,
Nicolas

Thanks! The right solution is

type t = A of (string[@printer fun fmt -> Format.fprintf fmt "%s"])
[@@deriving show]

Seconded. You could add [@printer Format.pp_print_string] to use the direct unescaped printer.
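For anyone who wants to check that the two printers behave the same, here is a small sketch that needs no PPX. The function pp_with is a hypothetical, hand-written stand-in for the derived printer; the exact layout ppx_deriving emits may differ, and this only shows where the string printer plugs in.

```ocaml
type t = A of string

(* Hypothetical stand-in for the derived pp, parameterised over the
   printer used for the string payload (the role of [@printer ...]). *)
let pp_with (pp_string : Format.formatter -> string -> unit) fmt (A s) =
  Format.fprintf fmt "(A %a)" pp_string s

let p = A "(\208\149\208\178\208\179\208\181\208\189\208\184\208\185)"

(* Printer written with fprintf and "%s", fully qualified. *)
let via_fprintf =
  Format.asprintf "%a" (pp_with (fun fmt -> Format.fprintf fmt "%s")) p

(* The shorter equivalent: Format.pp_print_string. *)
let via_pp_print_string =
  Format.asprintf "%a" (pp_with Format.pp_print_string) p

let () =
  print_endline via_fprintf;
  print_endline via_pp_print_string
```

Both variants emit the string bytes unescaped, so which one to use in the attribute is a matter of taste.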
