Unicode 13.0.0 update for Uucd, Uucp, Uunf and Uuseg

Hello,

Unicode 13.0.0 was released on the 10th of March.

It adds 5930 characters to the standard, including graphic symbols for legacy computing. If you were looking for characters representing seven-segment decimal digits, you now have them. For the curious, the encoding proposal has the motivation and source of these new symbols. For more information about all the other additions, see this page.

Accordingly, the libraries mentioned at the end of this message had to be updated; consult the individual release notes for details. Both Uucd and Uucp are incompatible releases since new script and block enumerants had to be added.

Uucp has a new Emoji module with the new emoji properties introduced in 13.0.0, which are now used by Uuseg to improve emoji segmentation. The overall compiled size of Uucp shrank a bit; here uucp.cmxs went from 7.8 MB to 4.6 MB. Further reduction can likely be achieved with more work. Thanks to David Kaloper Meršinjak for helping with this.

A periodic reminder: if Unicode still puzzles you, read the absolute minimal Unicode introduction and the OCaml Unicode tips on this page (also available via odig doc uucp).

Happy retro computing,

Daniel

P.S. The OCaml compiler detected an obsolete rule in the 13.0.0 update of the Unicode line breaking algorithm.


Uucd 13.0.0 Unicode character database decoder for OCaml.

http://erratique.ch/software/uucd

Uucp 13.0.0 Unicode character properties for OCaml.

http://erratique.ch/software/uucp

Uunf 13.0.0 Unicode text normalization for OCaml.

http://erratique.ch/software/uunf

Uuseg 13.0.0 Unicode text segmentation for OCaml.

http://erratique.ch/software/uuseg

Thank you very much for all of your work in this area, and for passing along the fun outcome re: the line breaking spec!