Should the official OCaml source charset be Unicode/UTF-8?

FYI, I am interested in opinions on a pull request I’ve made. It would finally drop support for Latin-1 identifiers and the like, and would state in the documentation that the official character set and encoding of OCaml source code is Unicode/UTF-8: https://github.com/ocaml/ocaml/pull/1802

Latin-1 identifiers have been deprecated for quite a while, so this is not such a big change in itself. On a technical level, all it involves is disabling the residual support for Latin-1 identifiers in the lexer. Given that, nothing would prevent people from making things like Tuareg mode presume UTF-8, and from freely using UTF-8 in comments and strings.

One might therefore ask why making this an official policy is of importance. Two good reasons are:

  1. It would mean that editors and other code management tools could be set to default to UTF-8 mode, which would be pleasant.
  2. It is a logical step along the path towards better Unicode support in OCaml as a whole.

Note: right now, that pull request ( https://github.com/ocaml/ocaml/pull/1802 ) does not include any actual check that source files are valid UTF-8. I would be interested in opinions on this topic, either here or on the pull request itself.
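
To make the question concrete, here is a rough sketch of what such a validity check could look like. This is not code from the pull request, and `is_valid_utf8` is a name made up for illustration. It follows the well-formedness rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF) and uses only the standard library. Recent OCaml releases (4.14 and later) also ship `String.is_valid_utf_8`, which would probably be the better choice where it is available.

```ocaml
(* Hypothetical helper, not part of the pull request: returns true iff [s]
   is well-formed UTF-8 in the sense of RFC 3629. *)
let is_valid_utf8 (s : string) : bool =
  let n = String.length s in
  let byte i = Char.code (String.get s i) in
  (* true iff index [i] exists and its byte lies in [lo, hi] *)
  let in_range i lo hi =
    i < n && (let b = byte i in b >= lo && b <= hi) in
  (* ordinary continuation byte: 10xxxxxx *)
  let cont i = in_range i 0x80 0xBF in
  let rec go i =
    if i >= n then true
    else
      let b = byte i in
      if b < 0x80 then go (i + 1)                       (* ASCII *)
      else if b >= 0xC2 && b <= 0xDF then               (* 2-byte sequence *)
        cont (i + 1) && go (i + 2)
      else if b = 0xE0 then                             (* 3-byte, reject overlong *)
        in_range (i + 1) 0xA0 0xBF && cont (i + 2) && go (i + 3)
      else if b = 0xED then                             (* 3-byte, reject surrogates *)
        in_range (i + 1) 0x80 0x9F && cont (i + 2) && go (i + 3)
      else if b >= 0xE1 && b <= 0xEF then               (* other 3-byte sequences *)
        cont (i + 1) && cont (i + 2) && go (i + 3)
      else if b = 0xF0 then                             (* 4-byte, reject overlong *)
        in_range (i + 1) 0x90 0xBF && cont (i + 2) && cont (i + 3) && go (i + 4)
      else if b >= 0xF1 && b <= 0xF3 then               (* other 4-byte sequences *)
        cont (i + 1) && cont (i + 2) && cont (i + 3) && go (i + 4)
      else if b = 0xF4 then                             (* 4-byte, cap at U+10FFFF *)
        in_range (i + 1) 0x80 0x8F && cont (i + 2) && cont (i + 3) && go (i + 4)
      else false        (* stray continuation byte, 0xC0, 0xC1, or 0xF5..0xFF *)
  in
  go 0
```

A Latin-1 encoded `é` is the single byte `0xE9`, which this function rejects unless it happens to be followed by two continuation bytes, so a check along these lines would catch most legacy Latin-1 source files while accepting plain ASCII unchanged.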
