Should the official OCaml source charset be Unicode/UTF-8?

perry · May 26, 2018, 6:49pm

FYI, I am interested in opinions on a pull request I’ve made, which would finally drop support for Latin-1 identifiers and the like, and state in the documentation that the official charset and encoding of OCaml source code is Unicode/UTF-8: https://github.com/ocaml/ocaml/pull/1802

Latin-1 identifiers have been deprecated for quite a while, so this is not such a big change in itself: on a technical level, all it involves is disabling the residual support for Latin-1 identifiers in the lexer. Once that is done, nothing would prevent people from making things like Tuareg mode presume UTF-8, or from freely using UTF-8 in comments and strings.

One might therefore ask why making this an official policy matters. Two good reasons are:

It would mean that editors and other code management tools could be set to default to UTF-8 mode, which would be pleasant.
It is a logical step along the path towards better Unicode support in OCaml as a whole.

perry · May 29, 2018, 3:10pm

Note: right now, that pull request ( https://github.com/ocaml/ocaml/pull/1802 ) does not include any actual check that source files are valid UTF-8. I would be interested in opinions on this, either here or on the pull request itself.
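
For anyone wondering what such a check might look like, here is a minimal sketch. It is not code from the pull request: `first_invalid_utf_8` is a hypothetical helper name, and it assumes OCaml 4.14 or later, where the standard library provides `String.get_utf_8_uchar`, `Uchar.utf_decode_is_valid` and `Uchar.utf_decode_length`.

```ocaml
(* Hypothetical helper, not part of the pull request.
   Returns the byte offset of the first invalid UTF-8 sequence in [s],
   or None if [s] is entirely valid UTF-8.
   Assumes OCaml >= 4.14 for String.get_utf_8_uchar. *)
let first_invalid_utf_8 (s : string) : int option =
  let len = String.length s in
  let rec loop i =
    if i >= len then None
    else
      let d = String.get_utf_8_uchar s i in
      if Uchar.utf_decode_is_valid d
      then loop (i + Uchar.utf_decode_length d)  (* skip the decoded scalar value *)
      else Some i
  in
  loop 0

let () =
  (* "é" is the two bytes C3 A9 in UTF-8, but the single byte E9 in Latin-1. *)
  let utf8 = "caf\xC3\xA9" and latin1 = "caf\xE9" in
  assert (first_invalid_utf_8 utf8 = None);
  assert (first_invalid_utf_8 latin1 = Some 3)
```

Returning an offset rather than a boolean would let a tool report where a Latin-1 file first goes wrong; if only a yes/no answer is needed, `String.is_valid_utf_8` (also 4.14 or later) should be enough.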
