Hello! I am trying to understand how ppx_sexp_conv
handles Unicode characters. Consider the following type:
type token = TEXT of string [@@deriving sexp]
If I run the following commands in the toplevel, I get what I expect.
# #require "ppx_jane";;
# let x = TEXT "hello";;
# Sexp.to_string (sexp_of_token x);;
- : string = "(TEXT hello)"
Similarly, if I run the same commands but define the value
using Unicode escape sequences in the string literal, I get the same output.
# let y = TEXT "\u{0068}\u{0065}\u{006C}\u{006C}\u{006F}";;
# Sexp.to_string (sexp_of_token y);;
- : string = "(TEXT hello)"
However, if I put in the character あ (U+3042), I get output that I don’t really understand.
# let y = TEXT "\u{3042}";;
val y : token = TEXT "あ"
# Sexp.to_string (sexp_of_token y);;
- : string = "(TEXT\"\\227\\129\\130\")"
Maybe I am missing something really obvious, but I tried to decode those numbers as code points and they don’t resolve to the character I put in. I would really appreciate it if someone could shed some light on this!
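One check that might help narrow this down, assuming the three numbers are decimal byte values rather than code points: the sketch below uses only the OCaml standard library (no ppx_jane, and it needs OCaml 4.06 or later for `Buffer.add_utf_8_uchar`) to list the bytes of the string literal and to encode U+3042 as UTF-8. The helper names `utf8_bytes` and `encode_utf8` are made up for this illustration. It is meant to be compiled as a standalone file, not pasted into a toplevel where Base or Core is opened.

```ocaml
(* Hypothetical helper: the raw bytes of a string, as decimal integers. *)
let utf8_bytes (s : string) : int list =
  List.init (String.length s) (fun i -> Char.code s.[i])

(* Hypothetical helper: encode one Unicode code point as a UTF-8 string. *)
let encode_utf8 (cp : int) : string =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b (Uchar.of_int cp);
  Buffer.contents b

let () =
  let s = "\u{3042}" in
  (* The literal is stored as three bytes, not one code point. *)
  assert (utf8_bytes s = [227; 129; 130]);
  (* "\227" is OCaml's decimal byte escape, the same notation as in the sexp output. *)
  assert (encode_utf8 0x3042 = "\227\129\130");
  List.iter (Printf.printf "%d ") (utf8_bytes s);
  print_newline ()
```

If both assertions hold, the escaped output would be the UTF-8 bytes of あ (0xE3 0x81 0x82) written in decimal, which suggests the printer escapes non-ASCII bytes individually and does not change the string's contents.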