Which files should I modify in the ocaml/ocaml source code to get a parser that accepts keywords in French (translating keywords into French)?

Dear OCaml community,
For pedagogical purposes, I would like to modify a local copy of the ocaml/ocaml source code and change the parser so that the resulting ocaml (and hopefully ocamlc and ocamlopt) commands can read OCaml code in which (some of) the keywords are translated into French. (Let’s focus on French for this example, as it would easily generalize to other Latin-based languages.)

For instance, I would like to be able to read, execute and compile a tiny program like this:

soit récursive factorielle (n : entier) =
    filtrer n avec
    | n quand n < 0 -> échoueavec "Erreur : n doit être >= 0 pour factorielle."
    | 0 -> 1
    | n -> n * (factorielle (n-1))

Accents are not important: if they are hard to support, I’ll settle for no accents. Compare with the real (English) version:

let rec fact (n : int) =
    match n with
    | n when n < 0 -> failwith "Error: n has to be >= 0 for fact."
    | 0 -> 1
    | n -> n * (fact (n-1))

I read the parsing Hacking guide, but I don’t really understand what should be modified and where. I want the parser to accept both the English (real) and French (new) versions of the keywords: I don’t want to translate anything in the codebase, I just want this new parser to accept tiny files where all keywords are in French.

My main question is which file to modify in the parsing/ folder, and how to accept two versions of a keyword and map both to the same token.
From what I see in the guide, I should update parser.mly, but my old memories of mly/mll files tell me that something like

%token LET                    "let"
%token LET                    "soit"

will complain that the LET token is not unique…

Thanks in advance!

You can modify parsing/lexer.mll so that it returns the same token (e.g. LET) for both versions of the keyword. You should be able to use Latin-1 accents.
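
To illustrate the idea, here is a minimal standalone sketch, not the actual contents of parsing/lexer.mll. The `token` type is a reduced, hypothetical stand-in for the one generated from parser.mly, `classify` is a made-up helper name, and the French spellings are only examples (written without accents). The point it tries to show is that the keyword lookup in the lexer, rather than the grammar, decides which spelling yields which token, so the %token declarations should not need to be duplicated.

```ocaml
(* Hypothetical, reduced token type; the real one is generated from
   parsing/parser.mly and has many more constructors. *)
type token = LET | REC | MATCH | WITH | WHEN | LIDENT of string

(* One table, two spellings per keyword: both map to the same token. *)
let keyword_table : (string, token) Hashtbl.t = Hashtbl.create 16

let () =
  List.iter
    (fun (kw, tok) -> Hashtbl.add keyword_table kw tok)
    [ "let", LET;     "soit", LET;
      "rec", REC;     "recursive", REC;
      "match", MATCH; "filtrer", MATCH;
      "with", WITH;   "avec", WITH;
      "when", WHEN;   "quand", WHEN ]

(* Hypothetical helper: what the lexer would do once it has read a
   lowercase identifier-like word. *)
let classify (word : string) : token =
  match Hashtbl.find_opt keyword_table word with
  | Some tok -> tok
  | None -> LIDENT word
```

With such a table, `classify "soit"` and `classify "let"` both give `LET`, while any other word (say `factorielle`) stays an ordinary identifier.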



You could look at the source files of this April Fools’ version of Chamelle numéro 5.


Another point: if you ever want a localized version of the warnings and error messages, I could refresh my internationalization patch.


Thanks for pointing out Chamelle, it seems to be exactly what I am aiming at.
I’ve failed to build it a few times, I guess it’s not easy to use it…

Thanks @nojb, I was indeed able to obtain everything I wanted just by modifying parsing/lexer.mll. It was easier than I had feared!
Of course, some short French keywords were used as variable names in a few places in the ocaml/*.ml files, but a simple rename worked (e.g., et, which means "and" in French, can be renamed ettt).

I have published my fork on my GitHub profile. Honestly, it’s more for personal use than anything else, but it’s there: https://github.com/Naereen/ocaml-mots-cles-en-francais. Full documentation of what I did and why is included, but only in French.

Thanks for your quick reply, that was an interesting tiny experiment!

Sorry @octachron, I don’t plan on spending more time on this.

If I remember correctly, ç supports OCaml (with Rust and C/C++) and it does exactly what you want.


For reference, the OCaml translation table for ç is here:


Thanks a lot to both of you for mentioning ç, it’s a very interesting project. Tiny but efficient and it does the job for C, OCaml and Rust!
It’s perfect for the C language (which I wanted to target tomorrow for a similar experiment), and quite good for the OCaml language.
It seems that their choice of localized keywords is not prefix-free (some keywords are prefixes of others) and the translator program makes only one pass of transformation, so it misses a keyword (and it’s not a corner case: it misses the do keyword!). I’ll try to fix it on my own before reaching out to the developers of ç.
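
To make the prefix problem concrete, here is a small sketch of one possible workaround: translating whole words instead of substrings. It is an assumption about how such a translator could be written, not a description of how ç works, and the table below is hypothetical (it is not ç's actual table). In it, `si` is a prefix of `sinon`, so a naive substring replacement of `si` could turn `sinon` into `ifnon`.

```ocaml
(* Hypothetical French-to-English table, NOT the one used by ç.
   Note that "si" is a prefix of "sinon". *)
let table = [ "si", "if"; "alors", "then"; "sinon", "else" ]

(* A byte is part of a word if it is an ASCII letter, digit, underscore
   or quote, or any byte >= 128 (so UTF-8 accented letters stay inside
   the word they belong to). *)
let is_word_char c =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '\'' -> true
  | c -> Char.code c >= 128

(* Only a complete word is looked up, never a fragment of it. *)
let translate_word w =
  match List.assoc_opt w table with
  | Some en -> en
  | None -> w

(* Copy the source to a buffer, replacing each maximal word by its
   translation and leaving every other character untouched. *)
let translate src =
  let n = String.length src in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      if is_word_char src.[i] then begin
        let j = ref i in
        while !j < n && is_word_char src.[!j] do incr j done;
        Buffer.add_string buf (translate_word (String.sub src i (!j - i)));
        go !j
      end else begin
        Buffer.add_char buf src.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf
```

Because each maximal word is matched as a whole, the order of the table entries no longer matters. This sketch still translates words inside string literals and comments, which a real tool would probably want to skip.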

I’m a bit late to the party but I had the same idea a while ago. To handle accents, I decided to remove support for ISO encoding.
