Help me pick a bikeshed color, er, print syntax for Uchar.t

perry · October 5, 2018, 12:26pm

Anyone want to help me figure out the color for a bikeshed?

Uchar.t currently lacks a default printer for ppx_deriving.show — indeed, it’s practically the only thing in the standard library that doesn’t have such a printer. I’d like to do a pull request to add one. Writing the code is easy, but figuring out what the default printed representation should look like is not obvious, because there’s no read syntax for Uchar.t in OCaml currently.

I’d like to pick something that might even be a good candidate for a read syntax someday, so thoughts on a good one are actively solicited. I don’t want what I pick to suck, inspire the addition of that syntax to the language itself, and then end up living for a long time even though it sucks.

So, any suggestions? One complexity is that Uchar.t can store things that don’t print out very well, like zero-width joiners, combining characters, etc.
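A minimal sketch of what such a default printer could look like. It assumes the hex-escape style and uses the `Format.formatter -> Uchar.t -> unit` shape that ppx_deriving.show printers use. The names `pp_uchar` and `show_uchar` are hypothetical, not an existing API. Always printing the hex code point sidesteps the rendering problem: a zero-width joiner comes out as a visible `\u{200D}` instead of nothing.

```ocaml
(* Hypothetical default printer: always emit the hex code point, so
   zero-width joiners and combining characters stay visible. *)
let pp_uchar (fmt : Format.formatter) (u : Uchar.t) : unit =
  Format.fprintf fmt "\\u{%X}" (Uchar.to_int u)

(* Convenience wrapper in the style of a derived show function. *)
let show_uchar (u : Uchar.t) : string = Format.asprintf "%a" pp_uchar u

let () =
  print_endline (show_uchar (Uchar.of_int 0x3C0));  (* GREEK SMALL LETTER PI *)
  print_endline (show_uchar (Uchar.of_int 0x200D))  (* ZERO WIDTH JOINER *)
```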

See also the discussion at https://github.com/ocaml-ppx/ppx_deriving/issues/174

perry · October 5, 2018, 6:51pm

One suggestion for an OCaml Uchar.t read syntax that’s evolved on the Discord channel:

let pi : Uchar.t = \u'π'

(for direct entry of Unicode chars in source)

and

let alsopi: Uchar.t = \u{3C0}

(for entry of chars by their hex codepoint.)

It’s gross, but finding something less gross seems hard…
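For comparison, here is roughly what the two proposed forms correspond to in today's OCaml. This sketch is not part of the proposal. The `\u{...}` spelling already exists as an escape inside string literals (since OCaml 4.06), but it produces UTF-8 bytes in a `string`, not a `Uchar.t`. The closest thing to a `Uchar.t` literal right now is `Uchar.of_int` applied to the hex code point.

```ocaml
(* Existing string-literal escape (OCaml >= 4.06): yields UTF-8 bytes. *)
let pi_bytes : string = "\u{3C0}"

(* Closest thing to a Uchar.t literal today. *)
let alsopi : Uchar.t = Uchar.of_int 0x3C0

let () =
  assert (pi_bytes = "\xCF\x80");
  assert (Uchar.to_int alsopi = 0x3C0);
  (* Surrogates are not Unicode scalar values, so any hex literal
     form for Uchar.t would have to reject them. *)
  assert (not (Uchar.is_valid 0xD800))
```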

cfcs · October 14, 2018, 2:43pm

Printing out π requires determining the user’s current locale and using that information to encode the code points in that locale’s encoding (you can’t just print UTF-8 and cross your fingers), at least if you want to support fun stuff like EBCDIC platforms or people running on ISO-8859 platforms.
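A small illustration of why the output bytes depend on the encoding. The helper names are hypothetical, and the sketch does no locale detection. It only shows the same code point encoded two ways. The UTF-8 path uses the stdlib's `Buffer.add_utf_8_uchar` (OCaml 4.06+), and the Latin-1 path is hand-rolled for code points below 0x100.

```ocaml
(* Hypothetical helpers: encode one code point for a UTF-8 or a Latin-1
   terminal. No locale detection is done here. *)
let utf_8_of_uchar (u : Uchar.t) : string =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b u;
  Buffer.contents b

(* Latin-1 can only represent U+0000..U+00FF, one byte each. *)
let latin_1_of_uchar (u : Uchar.t) : string option =
  let n = Uchar.to_int u in
  if n < 0x100 then Some (String.make 1 (Char.chr n)) else None

let () =
  let e_acute = Uchar.of_int 0xE9 and pi = Uchar.of_int 0x3C0 in
  (* Same code point, different bytes depending on the encoding. *)
  assert (utf_8_of_uchar e_acute = "\xC3\xA9");
  assert (latin_1_of_uchar e_acute = Some "\xE9");
  (* pi has a UTF-8 encoding but no Latin-1 one. *)
  assert (utf_8_of_uchar pi = "\xCF\x80");
  assert (latin_1_of_uchar pi = None)
```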

On both Latin-1 and UTF-8 platforms, I guess you could print out the encoded character provided it’s printable (0x20 <= t <= 0x7e or whatever).
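A sketch of that rule, with a hypothetical name. Printable ASCII is encoded identically in Latin-1 and UTF-8, so those characters can be shown directly. Here that is done with `Printf`'s `%C`, which prints a char in OCaml literal syntax with escaping. Everything else falls back to the hex code point.

```ocaml
(* Hypothetical printer: printable ASCII (0x20..0x7E) as an OCaml char
   literal, anything else as a hex code point. *)
let show_uchar_ascii (u : Uchar.t) : string =
  let n = Uchar.to_int u in
  if n >= 0x20 && n <= 0x7E then
    (* %C prints OCaml char-literal syntax, e.g. 'A' or '\'' *)
    Printf.sprintf "%C" (Char.chr n)
  else Printf.sprintf "0x%X" n

let () =
  print_endline (show_uchar_ascii (Uchar.of_int 0x41));   (* 'A' *)
  print_endline (show_uchar_ascii (Uchar.of_int 0x200D))  (* 0x200D *)
```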

An alternative would be to just print the codepoints, preferably in a format that would permit re-entry somehow. I.e., I’d prefer 0x3C0 over \u{3C0} because I can copy-paste the former.
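To make the re-entry point concrete: `0x3C0` is already a valid OCaml integer literal, so output in that form can be pasted straight into `Uchar.of_int`. At runtime, `int_of_string` accepts the same `0x` prefix. The names `to_hex` and `of_hex` are hypothetical.

```ocaml
(* Hypothetical round-trip helpers for the 0x... format. *)
let to_hex (u : Uchar.t) : string = Printf.sprintf "0x%X" (Uchar.to_int u)

(* int_of_string understands the 0x prefix; Uchar.of_int raises
   Invalid_argument for surrogates and out-of-range values. *)
let of_hex (s : string) : Uchar.t = Uchar.of_int (int_of_string s)

let () =
  let pi = Uchar.of_int 0x3C0 in  (* the printed form, pasted as source *)
  assert (to_hex pi = "0x3C0");
  assert (Uchar.equal (of_hex (to_hex pi)) pi)
```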

Not an easy problem.

perry · October 14, 2018, 5:14pm

That doesn’t really have much to do with this question, since we’re discussing read syntax.

cfcs · October 14, 2018, 5:21pm

Ah, I had never encountered the term “read syntax” before, but I take it that means the of_string syntax?

In that case let pi : Uchar.t = \u'π' seems to suggest we should parse the user’s input according to their locale, or assume UTF-8?
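If the answer is "assume UTF-8", an `of_string` could be a small decoder that accepts exactly one UTF-8-encoded scalar value. This sketch assumes OCaml 4.14 or later for `String.get_utf_8_uchar` and the `Uchar.utf_decode_*` accessors. `uchar_of_string` is a hypothetical name, and the function deliberately ignores the locale.

```ocaml
(* Hypothetical of_string: decode exactly one UTF-8 scalar value,
   rejecting empty input, malformed bytes, and trailing characters.
   Requires OCaml >= 4.14. *)
let uchar_of_string (s : string) : Uchar.t option =
  if s = "" then None
  else
    let d = String.get_utf_8_uchar s 0 in
    if Uchar.utf_decode_is_valid d
       && Uchar.utf_decode_length d = String.length s
    then Some (Uchar.utf_decode_uchar d)
    else None

let () =
  assert (uchar_of_string "\xCF\x80" = Some (Uchar.of_int 0x3C0));  (* "π" *)
  assert (uchar_of_string "ab" = None)  (* more than one character *)
```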

perry · October 14, 2018, 5:52pm

For source files, we will almost certainly be adopting Unicode eventually; pretty much all programming languages have.
