I don’t fully agree with you on this, I think.
Presumably yes, since neither you nor I know all current and future locales (at least I have not checked all of the currently existing ones). And yes, other languages have classifications for characters along the lines of v and w in Swedish: German with s and double s, or French with all the versions of a, o, u and e accented with `, ', ^, etc. So no, the actual bug is not in sv_SE.UTF-8; it is in the build system that OCaml and other software use.
If they want characters sorted and classified as in locale C for sorting and regular expressions, they should state that explicitly. That is what locale C is there for.
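As a minimal sketch of what "stating it" could look like inside a build script (the sample input is made up for illustration, and this is not taken from OCaml's actual build system), a single command can be pinned to C collation regardless of the user's locale:

```sh
# Force byte-order (C) collation for this one command only,
# whatever LC_ALL/LC_COLLATE/LANG the user has set.
sorted=$(printf 'w\nV\nv\nW\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints V, W, v, w on separate lines (plain ASCII byte order)
```

Under a locale such as sv_SE.UTF-8 the same input may come out in a different order, depending on the installed locale data, which is exactly what a build script should not depend on.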
So yes, there is potential for problems in other languages.
The proper way is to actually choose the locale that the build system really wants to use, which is C. So the bug is in OCaml (and possibly in many other projects too), and it needs to be addressed. No, I wouldn’t have been able to spot this either if I had created such a build system. So I am not blaming anyone.
I have now filed a bug against OCaml on GitHub.
Ocaml generat link error if compiling with LC_ALL/LC_COLLATE set to sv_SE.UTF-8 and not C #10332
In short: after lots of testing, with both opam(1) and the GitHub source, I have arrived at this minimal solution.
Running locale(1) will show that LC_ALL (and thus all LC_* locale categories) is set to sv_SE.UTF-8 when one has installed and uses the Swedish locale (probably also Swedish in Finland).
But running this in a Bourne shell
export LC_ALL=""
export LC_COLLATE="C"
will make sure that every later command in the shell and its subshells does what the build system expects: sorting and classifying characters as in locale C.
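If changing the whole session is not wanted, the same idea can be confined to a subshell. This is only a sketch: build_step is a hypothetical stand-in for the real command (make, opam, and so on), and here it just sorts a few letters so the effect is visible.

```sh
# Hypothetical stand-in for the real build command.
build_step() {
    printf 'b\nB\na\nA\n' | sort
}

# The $( ... ) subshell keeps the caller's locale settings untouched.
result=$(
    unset LC_ALL        # LC_ALL would override LC_COLLATE if left set
    LC_COLLATE=C
    export LC_COLLATE
    build_step
)
printf '%s\n' "$result"
```

LC_ALL has to be cleared first because it takes precedence over every individual LC_* variable; setting LC_COLLATE alone would have no effect while LC_ALL is still sv_SE.UTF-8.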
But anyway.
Thanks to everyone in this thread who helped me find this bug. It wouldn’t have been possible without your help. Now I am able to let my students use OCaml and opam(1) as they are supposed to be used. Thanks.