[ANN] Unicode 15.1.0 update for Uucd, Uucp, Uunf and Uuseg


Unicode 15.1.0 was released on September 12.

This is a point release for Unicode organisational reasons, but it still adds 627 new characters to the standard and changes a few rules in the segmentation standards. See the details on the announcement page.

Accordingly, the libraries mentioned at the end of this message had to be updated. Consult the individual release notes for details. Both Uucd and Uucp are incompatible releases since a block enumerant had to be added and some property values changed their type. A few new properties related to identifiers, CJK and Indic breaking were also added; see the Uucp release notes for details.

As mentioned last year, all the libraries and sample code have been changed to use the UTF decoders of the standard library rather than rely on the uutf package.

This has the following impact:

  1. These new versions are only available for OCaml >= 4.14.0
  2. The library name uunf.string is deprecated. The Uunf_string
    module is now simply part of the uunf library.
  3. The library name uuseg.string is deprecated. The Uuseg_string module
    is now simply part of the uuseg library.

Regarding points 2 and 3, the old library names still exist but generate an ocamlfind warning if they are used. They are empty and simply require the base library. They will be fully removed at some point.
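For readers who still depend on uutf in their own code, here is a minimal sketch of what decoding UTF-8 with the standard library decoders (available since OCaml 4.14) can look like. It is not code taken from the libraries themselves; the names fold_utf_8 and count_scalar_values are hypothetical and only serve the illustration.

```ocaml
(* Illustrative sketch only: fold_utf_8 and count_scalar_values are
   hypothetical names, not part of Uucp, Uunf or Uuseg. *)

(* Fold over the Unicode scalar values of a UTF-8 encoded string using
   only the Stdlib decoders available since OCaml 4.14. *)
let fold_utf_8 (f : 'a -> Uchar.t -> 'a) (acc : 'a) (s : string) : 'a =
  let rec loop acc i =
    if i >= String.length s then acc else
    let dec = String.get_utf_8_uchar s i in
    (* On an invalid byte sequence utf_decode_uchar returns Uchar.rep
       (U+FFFD) and utf_decode_length tells how many bytes to skip. *)
    let u = Uchar.utf_decode_uchar dec in
    loop (f acc u) (i + Uchar.utf_decode_length dec)
  in
  loop acc 0

(* Count scalar values rather than bytes. *)
let count_scalar_values (s : string) : int =
  fold_utf_8 (fun n _ -> n + 1) 0 s
```

This sketch replaces malformed input with U+FFFD. If you need to reject malformed input instead, check Uchar.utf_decode_is_valid on each decode.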

Two other less visible changes are:

  • After waiting too long to see whether intra-module link-time dead code elimination would make it into the compiler, I finally changed Uucp to use module aliases. This means that only the data modules you use get linked into your programs.

  • Also, after much reluctance, the repos now track generated data files. This gives better source traceability, allows sandboxed pinning, and makes it easier to dig the files out when their data generation strategy breaks the compiler.

A big thanks to the OCaml Software Foundation and to my donators for their funding.

I welcome and thank the new donator ahrefs.

And remember, OCaml :heart: Unicode.