Convert ISO 639-1 to ISO 639-2


I need to take a language tag such as en-GB (whose language part, en, is an ISO 639-1 code) and convert it into an ISO 639-2 code such as eng.

I have looked at the following libraries, but did not find a documented means to perform that transformation in any of them:

  • babelfish
  • langcodes
  • pycountry

Have I missed something? That is, is this possible with any of these libraries?


Best answer

You can use pycountry for this. Note that the reverse direction (ISO 639-2 to ISO 639-1) will not always work: every ISO 639-1 code has an ISO 639-2 counterpart, but many ISO 639-2 codes have no ISO 639-1 equivalent.
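The caveat about the reverse direction can be illustrated with a minimal, library-free sketch. It assumes a tiny hand-picked excerpt of the ISO 639-2 table (the `ISO_639_2_TO_1` dict and the `to_639_2` / `to_639_1` helpers are hypothetical, not part of pycountry): the forward lookup succeeds for every ISO 639-1 code in the table, while ISO 639-2 entries such as Asturian (`ast`) simply have no two-letter code to return.

```python
from typing import Dict, Optional

# Hypothetical excerpt of ISO 639-2 (terminology code -> ISO 639-1 code).
# None marks languages that exist only in ISO 639-2; the real list is far longer.
ISO_639_2_TO_1: Dict[str, Optional[str]] = {
    "eng": "en",
    "fra": "fr",
    "deu": "de",
    "ast": None,  # Asturian: no ISO 639-1 code
    "haw": None,  # Hawaiian: no ISO 639-1 code
}

# Forward table: invert only the rows that actually have an ISO 639-1 code.
ISO_639_1_TO_2: Dict[str, str] = {
    one: two for two, one in ISO_639_2_TO_1.items() if one is not None
}


def to_639_2(code_1: str) -> str:
    """ISO 639-1 -> ISO 639-2/T; raises KeyError for codes missing from the excerpt."""
    return ISO_639_1_TO_2[code_1.lower()]


def to_639_1(code_2: str) -> Optional[str]:
    """ISO 639-2 -> ISO 639-1; None if there is no two-letter code (or no entry)."""
    return ISO_639_2_TO_1.get(code_2.lower())


print(to_639_2("en"))   # eng
print(to_639_1("eng"))  # en
print(to_639_1("ast"))  # None
```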

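import pycountry

code = 'en-GB'

# The language part of a tag like en-GB is everything before the first '-';
# for ISO 639-1 languages it is a 2-letter code such as 'en'.
# This is safer than blindly taking the first two characters (thanks ivan_pozdeev)
lang_code = code[:code.index('-')] if '-' in code else code

# Current pycountry releases name the fields alpha_2 / alpha_3; older releases
# used iso639_1_code / iso639_2T_code / iso639_3_code instead.
lang = pycountry.languages.get(alpha_2=lang_code)
if lang is None:
    raise LookupError('Unknown ISO 639-1 code: ' + lang_code)

print("ISO 639-1 code: " + lang.alpha_2)
# pycountry's alpha_3 is the ISO 639-3 code, which matches the ISO 639-2/T
# code for languages listed in both standards
print("ISO 639-2 code: " + lang.alpha_3)
print("ISO 639-3 code: " + lang.alpha_3)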

The above should print out:

ISO 639-1 code: en
ISO 639-2 code: eng
ISO 639-3 code: eng
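Real-world inputs are not always as tidy as en-GB: locale names often use an underscore (en_GB), case varies, and some tags carry a script subtag (zh-Hant-TW). As a hedged sketch, a standalone extractor could normalize these and reject anything that is not a two-letter subtag before it is handed to pycountry. The function name `iso639_1_subtag` is hypothetical.

```python
import re

# Separators seen in practice: '-' (BCP 47 tags) and '_' (POSIX-style locales).
_SUBTAG_SEPARATORS = re.compile(r"[-_]")


def iso639_1_subtag(tag: str) -> str:
    """Hypothetical helper: return the primary language subtag, e.g. 'en' for 'en_GB'."""
    primary = _SUBTAG_SEPARATORS.split(tag.strip(), maxsplit=1)[0].lower()
    # ISO 639-1 codes are exactly two ASCII letters; a three-letter code
    # such as 'ast' has no ISO 639-1 form and is rejected here.
    if len(primary) != 2 or not primary.isascii() or not primary.isalpha():
        raise ValueError(f"no ISO 639-1 language subtag in {tag!r}")
    return primary


for tag in ("en-GB", "en_GB", "EN", "zh-Hant-TW"):
    print(tag, "->", iso639_1_subtag(tag))
```

The result can then be passed to `pycountry.languages.get(alpha_2=...)` as in the answer's code.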
Other answer

Wikipedia's List of ISO 639-2 codes has a table specifying the correspondence. Since the mapping is not one-to-one, the conversion is not always possible.
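One concrete way the correspondence stops being one-to-one: for a handful of languages ISO 639-2 defines both a bibliographic (B) code and a terminological (T) code, so a single ISO 639-1 code maps to two ISO 639-2 codes (fr to fre and fra). The sketch below assumes a small hand-copied excerpt of that table; `ISO_639_1_TO_2_BT`, `Iso6392` and the `variant` parameter of `to_639_2_variant` are hypothetical names chosen for illustration.

```python
from typing import Dict, NamedTuple


class Iso6392(NamedTuple):
    bibliographic: str  # ISO 639-2/B
    terminology: str    # ISO 639-2/T


# Hypothetical excerpt of the correspondence table, keyed by ISO 639-1 code.
ISO_639_1_TO_2_BT: Dict[str, Iso6392] = {
    "en": Iso6392("eng", "eng"),  # B and T coincide for most languages
    "fr": Iso6392("fre", "fra"),
    "de": Iso6392("ger", "deu"),
    "zh": Iso6392("chi", "zho"),
    "cy": Iso6392("wel", "cym"),
}


def to_639_2_variant(code_1: str, variant: str = "T") -> str:
    """Return the ISO 639-2 code for an ISO 639-1 code; variant is 'B' or 'T'."""
    entry = ISO_639_1_TO_2_BT[code_1.lower()]
    if variant == "B":
        return entry.bibliographic
    if variant == "T":
        return entry.terminology
    raise ValueError("variant must be 'B' or 'T'")


print(to_639_2_variant("fr"), to_639_2_variant("fr", "B"))  # fra fre
print(to_639_2_variant("en"), to_639_2_variant("en", "B"))  # eng eng
```

Defaulting to the T code mirrors the iso639_2T_code field used in the other answer; which variant you need depends on the system consuming the code.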

You did miss something: the conversion is quite possible with the libraries you listed.

  • babelfish has built-in language converters (alpha2, alpha3b, alpha3t, name, scope, type and opensubtitles):

>>> import babelfish
>>> language = babelfish.Language('por', 'BR')
>>> language.alpha2
'pt'
<...>
>>> babelfish.Language.fromalpha3b('fre')
<Language [fr]>
  • langcodes is tailored for a different task: recognizing and matching languages regardless of which standard their codes come from. You can extract all the codes related to your initial one, to varying extents, but it will not tell you which standard each code belongs to.

  • pycountry is similar to babelfish and is covered by the other answer.