Problem Using OPAM to Create a First Switch Installation, Linking Fails

jaxon · April 6, 2021, 10:32pm

Thanks, I didn’t know about this error. It looks strange to a native speaker.

Anyway, I would argue that this isn’t a bug in the Swedish locale per se, but an error in the program: it doesn’t set the locale to C, even though C is obviously what the program wants to use in the regular expression.

This is just one example of it going wrong; it could go equally wrong in other locales too.
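To make the point concrete, here is a small shell illustration. It is not taken from the OCaml build system, and match_a_to_v is a hypothetical helper. Setting LC_ALL=C on the command itself makes grep interpret the bracket range [A-V] in C-locale (byte) order, so W is excluded whatever the user's own locale is; without that override, the meaning of the range can depend on the user's collation settings.

```shell
#!/bin/sh
# Hypothetical helper: print each argument that is a single letter in the range A-V.
# LC_ALL=C is set only for grep, so the bracket range [A-V] uses C-locale
# (ASCII byte) order: A=0x41 ... V=0x56, while W=0x57 falls outside it.
match_a_to_v() {
  printf '%s\n' "$@" | LC_ALL=C grep '^[A-V]$'
}

match_a_to_v A V W Z   # prints A and V, one per line
```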

silene · April 7, 2021, 6:34am

Presumably not. As far as I know, Swedish is the only locale that is (was) marking some single characters from the A-Z range as having only secondary differences. (That said, there might also be some oddities with Hawaiian, but this locale is not supported by Glibc, so it is hard to tell.)

Nonetheless, I agree that this is a bug that needs to be worked around, because it will take years for a compliant Glibc to propagate to all the users (assuming it is even fixed in the first place).

jaxon · April 11, 2021, 11:51pm

I don’t fully agree with you on this.

Presumably yes, as neither you nor I know all current and future locales (at least I have not checked all of the existing locales). And yes, other languages have classifications for characters along the lines of v and w in Swedish: German with s and double s, French with all the accented versions of a, o, u, and e (grave, acute, circumflex, etc.). So no, the actual bug is not in sv_SE.UTF-8; it is in the build system that OCaml and other software use.
If they want characters sorted and classified according to the C locale, in sorting and in regular expressions, they should state that explicitly. That is what the C locale is there for.

So yes, there is potential for problems in other languages.

The proper way is to explicitly choose the locale that the build system actually wants to use, which is C. So the bug is in OCaml (and possibly many other projects too), and it needs to be addressed. That said, I wouldn’t have been able to spot this either if I had created such a build system, so I am not blaming anyone.

I have now filed a bug report for OCaml on GitHub.

Ocaml generat link error if compiling with LC_ALL/LC_COLLATE set to sv_SE.UTF-8 and not C #10332

In short: after lots of testing, with opam(1) and the GitHub sources, I have arrived at this minimal workaround.

Running locale(1) shows that LC_ALL (and thus every LC_* category) is set to sv_SE.UTF-8 when the Swedish locale is installed and in use (probably also for Swedish in Finland).

But running this in a Bourne shell

export LC_ALL=""
export LC_COLLATE="C"

ensures that every later command in that shell and its subshells does what the build system expects: sorting and classifying characters as in the C locale.
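LC_ALL has to be cleared because of precedence: a non-empty LC_ALL overrides every LC_* variable, and a non-empty LC_COLLATE in turn overrides LANG (an empty value counts as unset). The sketch below encodes that lookup order for the collation category only. collation_locale is a hypothetical name; it takes the three values as arguments instead of reading the environment, so it can be tried without changing the shell's own locale, and it assumes the fallback default is C. It illustrates the rule and is not a replacement for locale(1).

```shell
#!/bin/sh
# Hypothetical helper: given the values of LC_ALL, LC_COLLATE and LANG (in that
# order), print the locale that governs collation, using the POSIX precedence:
# the first non-empty value wins. If all are empty, the default is
# implementation-defined; it is assumed to be C here.
collation_locale() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"
  elif [ -n "$3" ]; then
    printf '%s\n' "$3"
  else
    printf '%s\n' C
  fi
}

# A Swedish LC_ALL overrides LC_COLLATE=C:
collation_locale sv_SE.UTF-8 C sv_SE.UTF-8   # prints sv_SE.UTF-8

# With LC_ALL="" and LC_COLLATE="C", as in the commands above:
collation_locale "" C sv_SE.UTF-8            # prints C

# The current shell's setting:
collation_locale "${LC_ALL:-}" "${LC_COLLATE:-}" "${LANG:-}"
```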

Anyway, thanks to everyone in this thread who helped me find this bug. It wouldn’t have been possible without your help. Now I can let my students use OCaml and opam(1) as they are supposed to be used. Thanks.

silene · April 12, 2021, 3:16am

I did check all the current and future locales. (That is why I could say that Hawaiian would potentially be a strange one, once implemented in Glibc.) Obviously, I cannot predict what would happen with languages that have not even been considered by the Unicode committee, but hopefully they will not make the same mistake again. (And the reason I am saying “mistake” here is that they decided last year that the locale would no longer impact basic character ranges in regular expressions.)

I was very careful with my wording in my original reply. So, let me write it again: “Swedish is the only locale that is (was) marking some single characters from the A-Z range as having only secondary differences.” German and French do not.

For example, let me quote the introductory paragraph for German:

German based on EN13710 Annex E.4.
Principles from EN13710:2011-06 Annex E.4 are as follows
1). First level letters are A-Z only.
…

Swedish is the only language for which a letter in the A-Z range (namely W) was considered not to be a first-level letter. And let me stress “was”, because that was changed 15 years ago. Too bad Glibc was never fixed.

jaxon · April 13, 2021, 11:12pm

Don’t care. The proper approach is NOT to expect the user’s locale to give, by pure chance, the same results for comparisons and regular expressions as the C locale would. The proper approach is to set the locale to what is wanted, in this case the C locale. Then no future language will have this problem. That is the purpose of the C locale, and why it exists in the first place.

By the way, this problem is now fixed in OCaml by setting LC_ALL=C in the appropriate places.
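A minimal sketch of the same idea for one's own scripts (this is not the actual OCaml change, and with_c_locale is a hypothetical name): prefix the command with LC_ALL=C so that only that command runs in the C locale, leaving the user's interactive settings untouched.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with every locale category forced to C.
# A non-empty LC_ALL overrides LANG and all LC_* variables; when the command is
# an external program, the assignment only affects that program's environment.
with_c_locale() {
  LC_ALL=C "$@"
}

# In the C locale, sort compares bytes, so uppercase letters (0x41-0x5A)
# come before lowercase ones (0x61-0x7A).
printf 'b\nB\na\nA\n' | with_c_locale sort   # prints A, B, a, b (one per line)
```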

Case closed.

Thanks for all the help fixing this bug in OCaml.
