Dear OCaml community,
For pedagogical purposes, I would like to modify a local copy of the ocaml/ocaml source code and change the parser so that the final ocaml (and hopefully ocamlopt) commands accept OCaml code where (some of) the keywords are translated into French. (Let’s just focus on French for the example, as it could easily be generalized to other Latin-based languages.)
For instance, I would like to be able to read, execute and compile a tiny program like this:
soit récursive factorielle (n : entier) =
  filtrer n avec
  | n quand n < 0 -> échoueavec "Erreur : n doit être >= 0 pour factorielle."
  | 0 -> 1
  | n -> n * (factorielle (n-1))
Accents are not important; if they are hard to support, I’ll settle for no accents. Compare with the real (English) version:
let rec fact (n : int) =
  match n with
  | n when n < 0 -> failwith "Error: n has to be >= 0 for fact."
  | 0 -> 1
  | n -> n * (fact (n-1))
I read the parsing hacking guide, but I don’t really understand what should be modified and where. I want the parser to accept both the English (real) and French (new) versions of the keywords. I don’t want to translate anything in the codebase; I just want this new parser to accept tiny files where all keywords are in French.
My main question is to know which file to modify in the parsing/ folder, and how to accept two versions of a keyword and map both to the same
From what I see in the guide, I should update parser.mly, but my old memories of mly/mll files tell me that a
%token LET "let"
%token LET "soit"
will complain that the LET token is not unique…
Thanks in advance!
You can modify parsing/lexer.mll to return the same token (e.g. LET) for both versions of the keyword. You should be able to use Latin-1 accents.
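To make the idea concrete, here is a minimal, self-contained sketch of a keyword table in which two spellings map to the same token. The token type, the table contents and the token_of_word helper are all simplified assumptions for illustration: the real token type is generated from parser.mly, and the actual table in lexer.mll is much larger. The French spellings are unaccented to keep the sketch independent of how the lexer handles accented identifiers.

```ocaml
(* Hypothetical, simplified token type; the real one comes from parser.mly. *)
type token = LET | REC | MATCH | WITH | WHEN | IDENT of string

(* Illustrative keyword table: several spellings may map to the same token. *)
let keyword_table : (string, token) Hashtbl.t =
  let tbl = Hashtbl.create 16 in
  List.iter (fun (word, tok) -> Hashtbl.replace tbl word tok)
    [ "let", LET;     "soit", LET;
      "rec", REC;     "recursive", REC;
      "match", MATCH; "filtrer", MATCH;
      "with", WITH;   "avec", WITH;
      "when", WHEN;   "quand", WHEN ];
  tbl

(* What a lexer action for a lowercase identifier could do:
   return the keyword token if the word is in the table,
   otherwise treat the word as an ordinary identifier. *)
let token_of_word (w : string) : token =
  match Hashtbl.find_opt keyword_table w with
  | Some tok -> tok
  | None -> IDENT w
```

With this shape the grammar in parser.mly does not change at all: the parser only ever sees LET, whichever spelling was in the source file.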
You could look at the source files of this April Fools’ version of Chamelle numéro 5.
Another point: if you ever want a localized version of the warnings and error messages, I could refresh my internationalization patch.
Thanks for pointing out Chamelle, it seems to be exactly what I am aiming at.
I’ve failed to build it a few times, I guess it’s not easy to use it…
Thanks @nojb, I was indeed able to obtain everything I wanted, just by modifying parsing/lexer.mll. It was easier than I had feared!
Of course, some short French keywords were used as variable names in a few places in the ocaml/*.ml files, but a simple rename worked (e.g.,
and in French) can be renamed
I have published my fork on my GitHub profile; honestly it’s more for personal use than anything else, but it’s there: https://github.com/Naereen/ocaml-mots-cles-en-francais. Full documentation of what I did and why is included, but only in French.
Thanks for your quick reply, that was an interesting tiny experiment!
Sorry @octachron, I don’t plan on spending more time on this.
If I remember correctly, ç supports OCaml (with Rust and C/C++) and it does exactly what you want.
For reference, the OCaml translation table for ç is here:
Thanks a lot to both of you for mentioning ç, it’s a very interesting project. Tiny but efficient and it does the job for C, OCaml and Rust!
It’s perfect for the C language (which I wanted to target tomorrow for a similar experiment), and quite good for the OCaml language.
It seems that their choice of localized keywords is not prefix-free and the translator program does only one pass of transformation, so it misses a keyword (and it’s not a corner case: it misses the do keyword!). I’ll try to fix it on my own before reaching out to the developers of ç.
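One way to sidestep the prefix problem in a single pass is to split the source into whole words first and only then look each word up, so that a keyword that is a prefix of another can never match in the middle of a longer word. The sketch below illustrates that idea only. The translation table is hypothetical (it is not the one used by ç), I have not checked how ç is actually implemented, and the code ignores string literals and comments, which a real translator would have to skip.

```ocaml
(* Hypothetical translation table, NOT the one shipped with ç. *)
let table = [ "faire", "do"; "fait", "done"; "pour", "for"; "a", "to" ]

(* Bytes that may appear inside a word. Bytes >= 128 are accepted so that
   UTF-8 or Latin-1 accented letters stay inside the same word. *)
let is_word_char c =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '\'' -> true
  | '\128'..'\255' -> true
  | _ -> false

(* Single pass: copy non-word characters verbatim, and replace each
   maximal word by its translation when the table has one. *)
let translate (src : string) : string =
  let n = String.length src in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      if is_word_char src.[i] then begin
        (* Find the end of the maximal word starting at position i. *)
        let j = ref i in
        while !j < n && is_word_char src.[!j] do incr j done;
        let w = String.sub src i (!j - i) in
        Buffer.add_string buf
          (match List.assoc_opt w table with Some k -> k | None -> w);
        go !j
      end else begin
        Buffer.add_char buf src.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf
```

Because the lookup is done on maximal words, "fait" and "faire" cannot interfere with each other, and an identifier such as fairepart is left untouched.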
I’m a bit late to the party but I had the same idea a while ago. To handle accents, I decided to remove support for ISO encoding.