How to access the module Uutf.String.UTF_8

I did not quite understand the tone of your comment … but anyway.

I understand that handling Unicode can be very complex, but someone needs to be practical. If most mainstream languages offer support at higher abstraction levels, either directly or through libraries, even with some flaws or limitations, why doesn't OCaml do the same? The choice is between opting for practicality and offering only low-level libraries, or nothing practical at all, until some perfect solution exists.

I always opt for practicality, but YMMV.

I’m going to agree with both sides here.

  1. Handling Unicode is really unpleasant. The complexities are enough to make you go bald from pulling your own hair out.
  2. Other languages seem to have worked out basic type-level support for the Unicode representations they need.
  3. Other languages seem to have standardized their library support for manipulating Unicode. It isn’t always great, but they’ve picked APIs they can live with.

Maybe “Unicode Regular Expressions”, Unicode Technical Standard #18, is a good starting point to discuss what a proper design of these operations (and a regexp lib) should be? @yoriyuki may want to comment on the decisions he made for Camomile.


As many people pointed out above, Unicode handling can be complex. However, code point level operations (like cutting a substring, pointing somewhere in the text…) should be easy and standardized. Introducing a type for Unicode strings and literals would greatly ease development of multilingualized OCaml programs.
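To make "code point level operations" concrete, here is a minimal sketch of counting code points and cutting a substring by code point index. It assumes OCaml 4.14 or later, where the standard library provides `String.get_utf_8_uchar` and `Uchar.utf_decode_length`. The names `byte_offset_of_cp`, `cp_length` and `cp_sub` are made up for this illustration; they are not part of the standard library, Uutf or Camomile.

```ocaml
(* Sketch only: requires OCaml >= 4.14 (String.get_utf_8_uchar).
   All function names below are hypothetical helpers. *)

(* Byte offset of the code point with index [n] in the UTF-8 string [s],
   or [String.length s] if [s] holds fewer than [n] code points.
   A malformed byte sequence is counted as one code point; the decoder
   always consumes at least one byte, so the loop terminates. *)
let byte_offset_of_cp s n =
  let len = String.length s in
  let rec go byte i =
    if i >= n || byte >= len then byte
    else
      let d = String.get_utf_8_uchar s byte in
      go (byte + Uchar.utf_decode_length d) (i + 1)
  in
  go 0 0

(* Number of code points in [s]. *)
let cp_length s =
  let len = String.length s in
  let rec go byte n =
    if byte >= len then n
    else
      let d = String.get_utf_8_uchar s byte in
      go (byte + Uchar.utf_decode_length d) (n + 1)
  in
  go 0 0

(* Substring of at most [count] code points starting at code point [pos].
   Out-of-range requests are clamped instead of raising. *)
let cp_sub s pos count =
  let start = byte_offset_of_cp s pos in
  let rest = String.sub s start (String.length s - start) in
  String.sub rest 0 (byte_offset_of_cp rest count)
```

For example, `cp_sub "h\xc3\xa9llo" 1 3` skips `h` and keeps the three code points `é`, `l`, `l`. Note that this works on code points, not on user-perceived characters: a base letter followed by a combining mark is still two code points here.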

Camomile is undergoing an overhaul now. My plan is to split Camomile into basic code point level operations, Unicode algorithms (casing etc.), modules for additional encodings, and locale data.
