[ANN] Confero 0.1.1 - Unicode Collation

Confero implements the Unicode Collation Algorithm (UCA), currently built for Unicode 15.0.0. It also provides the Default Unicode Collation Element Table (DUCET), which defines a language-agnostic collation order.

For most use cases, it should suffice to link with confero and confero.ducet and use the single entry point Confero.collate. For a drop-in replacement for String.compare, pass ~total:true; otherwise the result will disagree with (=), because strings that differ only in normalization collate as equal. If you don’t link with confero.ducet, the default collation will be based on Unicode code points. The API also lets you control which collation mapping is used and evaluate separate stages of the UCA, if needed.
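
To illustrate why bytewise comparison and a normalizing collation can disagree, here is a minimal sketch using only the OCaml standard library. It shows two canonically equivalent encodings of "é" that String.compare and (=) treat as different, and how bytewise sorting orders mixed-case and accented words. The closing comment about Confero.collate assumes it takes two strings and returns an int like String.compare; that signature is an assumption, not something checked against the API documentation.

```ocaml
(* Two encodings of "é": precomposed U+00E9, and "e" followed by the
   combining acute accent U+0301. They are canonically equivalent. *)
let precomposed = "\u{00E9}"
let decomposed = "e\u{0301}"

(* Bytewise comparison sees two different byte sequences. *)
let bytewise_equal = (precomposed = decomposed)
let bytewise_order = String.compare precomposed decomposed

(* Bytewise order sorts by UTF-8 bytes: uppercase ASCII first, then
   lowercase ASCII, then anything non-ASCII. *)
let words = ["zebra"; "\u{00E9}cole"; "apple"; "Zebra"]
let sorted_bytewise = List.sort String.compare words

(* Assumption: with confero and confero.ducet linked, and if
   Confero.collate has the shape string -> string -> int, the sort
   above could become:
     List.sort (Confero.collate ~total:true) words *)
```

Here sorted_bytewise is Zebra, apple, zebra, école, which is rarely what a reader expects from an alphabetical list.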

I haven’t looked into localizing collation, but it should be possible to create a custom mapping which calls the DUCET mapping as a fall-back. Note, however, that the collation elements are not stable across Unicode versions. CLDR should be of interest to those who want to look into this.
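
The fall-back idea can be sketched generically as follows. All types and names here (key, elements, mapping, with_fallback) are hypothetical stand-ins for illustration and do not mirror Confero's actual mapping interface; the integer "collation elements" are toy values, not DUCET data.

```ocaml
(* Hypothetical types, not Confero's real ones. *)
type key = int list          (* a sequence of code points *)
type elements = int list     (* stand-in for collation elements *)
type mapping = key -> elements option

(* Try the custom mapping first; defer to the default when it has no entry. *)
let with_fallback (custom : mapping) (default : mapping) : mapping =
  fun key ->
    match custom key with
    | Some _ as hit -> hit
    | None -> default key

(* Toy mappings: the default knows "a" and "b"; the custom one overrides "b". *)
let toy_default : mapping = function
  | [0x61] -> Some [1]
  | [0x62] -> Some [2]
  | _ -> None

let toy_custom : mapping = function
  | [0x62] -> Some [0]
  | _ -> None

let combined = with_fallback toy_custom toy_default
```

A tailored mapping built this way only needs entries for the sequences it reorders; everything else falls through to the default.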

The API documentation is not online yet, but I’ll post a link when it gets indexed on ocaml.org.
