OCaml standard library Unicode support

If both the CLI application and the terminal use wcswidth to determine the display width of a string, the CLI application should be able to pad and align Unicode text more reliably. Correct or not, if they both call the same wcswidth the end result will at least be consistent.
Unfortunately this only works if the application runs locally and the terminal actually calls wcwidth/wcswidth (not all of them do; some have their own implementation).
If you SSH somewhere else, you have no hope of getting the same wcwidth or wcswidth implementation on both ends.

Yes, that might be one way.

Another way might be some kind of terminal (or Unicode) control sequence that says “this group of Unicode codepoints is X wide, pad it with spaces or truncate it to make it exactly that wide”. That way control over the width moves from the terminal to the application, so the output might look more consistent across different terminals and fonts.

This would be similar to what you already have to do for the shell to correctly calculate line length if you use Unicode or colors in your prompt, or the way you tell OCaml’s Format module about the 0 length for color control sequences, etc.
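
For the Format case, here is a minimal sketch of what I mean, using `Format.pp_print_as` to declare the color escapes as zero columns wide. The names `pp_red` and `render` are just made up for the example, and the escapes are the usual ANSI red/reset codes:

```ocaml
(* Print [s] in red. The escape sequences are passed through
   Format.pp_print_as with a declared width of 0, so Format's line
   breaking and alignment only count the visible text. *)
let pp_red ppf s =
  Format.pp_print_as ppf 0 "\027[31m";  (* ANSI: switch to red *)
  Format.pp_print_string ppf s;
  Format.pp_print_as ppf 0 "\027[0m"    (* ANSI: reset attributes *)

(* Helper for the example: render to a string instead of stdout. *)
let render s = Format.asprintf "%a" pp_red s
```

The proposed control sequence would play the same role as the `0` here, except the number would be interpreted by the terminal (or font layer) rather than by Format.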

That control sequence would only need to be inserted in more complicated situations (emojis, long grapheme clusters); where the width is unambiguous, the application wouldn’t have to emit it.
It could be implemented in the font rendering layer (and thus shared between many terminal implementations on a given OS) instead of in each terminal application, where you’d need to convince every terminal to implement it. We still don’t have full 24-bit color support on every terminal, so don’t underestimate how slowly new terminal features get adopted, especially on OSes that aren’t Linux.
Another advantage is that it avoids a round-trip between the application and the terminal. That may not matter if you run locally, but it can matter a lot remotely: even on a low-latency link of, say, ~10ms, if you need to synchronously wait 10ms to learn the width of each character you output, that’ll be a very slow application.
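
To make the idea a bit more concrete, the application side could look roughly like this. To be clear, this is purely hypothetical: no terminal implements such a sequence, the OSC number 7777 and the `cols;text` layout are invented for illustration, and treating plain ASCII as the "unambiguous" case is a simplification:

```ocaml
(* True if every byte of [s] is ASCII. Used here as a crude stand-in
   for "the width of this string is unambiguous". *)
let is_ascii s =
  let rec go i =
    i >= String.length s || (Char.code s.[i] < 0x80 && go (i + 1))
  in
  go 0

(* HYPOTHETICAL: wrap [s] in a made-up escape sequence (OSC 7777)
   asking the terminal to render it in exactly [cols] columns,
   padding or truncating as needed. ASCII text is emitted as-is. *)
let declare_width ~cols s =
  if is_ascii s then s
  else Printf.sprintf "\027]7777;%d;%s\027\\" cols s
```

Note that nothing here waits for a reply from the terminal, which is the point about avoiding the round-trip.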

On a related note, here is a fun IETF draft: draft-bormann-dispatch-modern-network-unicode-04 - Modern Network Unicode
