FYI, I am interested in opinions on this pull request I've made. It would finally drop support for Latin-1 identifiers and the like, and state in the documentation that the official character set and encoding of OCaml source code are Unicode and UTF-8: https://github.com/ocaml/ocaml/pull/1802
Latin-1 identifiers have been deprecated for quite a while, so this is not a big change in itself: on a technical level, all it involves is disabling the residual support for Latin-1 identifiers in the lexer. Given that, nothing would prevent people from making tools like Tuareg mode assume UTF-8, or from freely using UTF-8 in comments and strings.
One might therefore ask why making this an official policy matters. Two good reasons are:
- It would mean that editors and other code management tools could be set to default to UTF-8 mode, which would be pleasant.
- It is a logical step along the path towards better Unicode support in OCaml as a whole.
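For anyone unsure what the Latin-1 versus UTF-8 distinction means at the byte level, here is a small sketch. It is not taken from the pull request; the names `latin1_cafe` and `utf8_cafe` are made up for illustration, and it assumes OCaml 4.14 or later, where `String.is_valid_utf_8` is available in the standard library.

```ocaml
(* "café" with the final letter encoded two different ways.
   These names are hypothetical and exist only for this illustration. *)

(* Latin-1: é is the single byte 0xE9. *)
let latin1_cafe = "caf\xE9"

(* UTF-8: é (U+00E9) is the two-byte sequence 0xC3 0xA9. *)
let utf8_cafe = "caf\xC3\xA9"

let () =
  (* A lone 0xE9 byte is a truncated multi-byte sequence in UTF-8,
     so the Latin-1 string is rejected by a UTF-8 validator. *)
  Printf.printf "latin1 valid UTF-8: %b\n" (String.is_valid_utf_8 latin1_cafe);
  Printf.printf "utf8 valid UTF-8:   %b\n" (String.is_valid_utf_8 utf8_cafe);
  (* String.length counts bytes, not characters. *)
  Printf.printf "byte lengths: %d and %d\n"
    (String.length latin1_cafe) (String.length utf8_cafe)
```

A tool that assumes UTF-8 would see the first string as malformed, which is roughly why a single declared encoding makes life easier for editors and other tooling.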