[ANN] iso639 - language codes

This is a new package which provides types that enumerate human languages and language groups according to the ISO 639 standard. The standard has different parts depending on whether one is dealing with individual languages and macrolanguages or with groups and families of languages, whether one uses two- or three-letter codes, and some historic quirks. See the project page and the API reference for further details.

This library is rather mundane, but I think it can help software dealing with multiple languages to make sure a language code is valid, and to make sure different language codes for the same language map to the same value. I think the API is more or less in its final form unless there are usability issues which need to be addressed.
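To illustrate the kind of normalization meant here, below is a minimal sketch. It is not the API of the iso639 package: the `lang` type, the three-entry table, and the functions `of_alpha2`, `of_alpha3`, `of_string` and `same_language` are all hypothetical and exist only for this example. The codes themselves are real: French and German each have two three-letter codes (`fra`/`fre` and `deu`/`ger`).

```ocaml
(* Hypothetical miniature table for illustration only;
   this is NOT the iso639 package's API. *)
type lang = English | French | German

(* Two-letter codes. *)
let of_alpha2 = function
  | "en" -> Some English
  | "fr" -> Some French
  | "de" -> Some German
  | _ -> None

(* Three-letter codes; French and German each have two variants. *)
let of_alpha3 = function
  | "eng" -> Some English
  | "fra" | "fre" -> Some French
  | "deu" | "ger" -> Some German
  | _ -> None

(* Dispatch on length; anything that is not 2 or 3 bytes is invalid. *)
let of_string s =
  match String.length s with
  | 2 -> of_alpha2 s
  | 3 -> of_alpha3 s
  | _ -> None

(* Two codes agree only if both are valid and denote the same language. *)
let same_language a b =
  match of_string a, of_string b with
  | Some x, Some y -> x = y
  | _ -> false
```

With this sketch, `same_language "fr" "fre"` holds while `of_string "xx"` is `None`, so validity and equivalence are both decided in one place.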


When I saw this announcement I thought “this could be a nice target for some Crowbar fuzz-testing”. I went to look at the testing code, and it turns out that you exhaustively test your properties of interest on every possible two- and three-byte string (there are not that many).


  (* Enumerate every byte value in each position. *)
  for i0 = 0 to 255 do
    let ch0 = Char.chr i0 in
    for i1 = 0 to 255 do
      let ch1 = Char.chr i1 in
      (* Check each two-byte candidate code. *)
      let alpha2 = sprintf "%c%c" ch0 ch1 in
      check_alpha2 p1_count alpha2;
      for i2 = 0 to 255 do
        let ch2 = Char.chr i2 in
        (* Check each three-byte candidate code. *)
        let alpha3 = sprintf "%c%c%c" ch0 ch1 ch2 in
        check_alpha3 p2_count p3_count p5_count alpha3
      done
    done
  done;

Well done.


Random testing was also my first thought. Crowbar has been on my agenda to try out. I usually write random tests by hand, but there are several good libraries available now. I recently used qcheck, which turned out to be a perfect fit for a library which is “algebraic” in nature.
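For readers who have not written one, a hand-rolled random test can be as small as the sketch below. It uses only the standard library. Everything in it is an assumption made for illustration: `accepts_alpha3` is a hypothetical stand-in for a real lookup (not the iso639 API), and the property checked is simply that any accepted code consists of lowercase ASCII letters.

```ocaml
(* Hypothetical stand-in for a real lookup; not the iso639 API. *)
let accepts_alpha3 = function
  | "eng" | "fra" | "fre" | "deu" | "ger" -> true
  | _ -> false

let is_lower c = 'a' <= c && c <= 'z'

(* True when every character of [s] is a lowercase ASCII letter. *)
let all_lower s =
  let ok = ref true in
  String.iter (fun c -> if not (is_lower c) then ok := false) s;
  !ok

(* Draw a string of [n] arbitrary bytes. *)
let random_bytes_string n =
  String.init n (fun _ -> Char.chr (Random.int 256))

(* Property: an accepted code consists solely of lowercase letters. *)
let prop s = (not (accepts_alpha3 s)) || all_lower s

(* Run [count] random cases from a fixed seed; return the first
   counterexample found, or [None] if the property held throughout. *)
let run_random_test ~seed ~count =
  Random.init seed;
  let rec loop i =
    if i >= count then None
    else
      let s = random_bytes_string 3 in
      if prop s then loop (i + 1) else Some s
  in
  loop 0
```

Fixing the seed keeps a failing run reproducible. What libraries such as qcheck add on top of a loop like this is mainly generator combinators and shrinking of counterexamples.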


Nice library, I will be able to improve this with it! :slight_smile:

Note that this does raise the question of languages without tags. But I can probably live without supporting Toki Pona.

The library does not include the English and French names of languages, and there is no plan to add them, as this would significantly increase the size of the library.

Why not include that as a sublibrary? I wouldn’t bloat the size of the core library. There are many cases where it would be very useful, and you already have the infrastructure to auto-generate the database.

Yes, maybe. I also thought it was somewhat out of scope to supply translations for just two languages in a library which would typically be used in a localized manner. But yes, I have the infrastructure, so if it is useful I could make a sublibrary.

Ah, now we know which software they were running in Little Britain.

And languages which don’t have the word “no”? Well, seems you included most of them.

Those languages don’t have a unique word for yes/no, or do not answer questions with yes/no, but they definitely have a translation for a “strong negative affirmation” (i.e., NOOO!), which is what I was going for. I’m happy to get PRs with corrections! I also need to do a positive follow-up. :wink: