Is there a list of BCP 47 language codes in R?


I'm running the fantastic pandoc from within an R package, relying on the LaTeX babel package for some typesetting niceties.

Pandoc expects a lang argument as a BCP 47 code (e.g. en-US), but babel expects its own language codes (e.g. american).

Pandoc, being as awesome as it is, maps between the two in this Haskell script.

In the spirit of defensive programming, I'd like to warn my users when they pass an invalid language code, and give them a definitive list of the acceptable BCP 47 codes.

Does such a list (or vector, or whatever) exist somewhere in R or a package for convenient use?

I'm trying to avoid manually typing up the mappings from the pandoc Haskell script.


Best answer

I needed to provide a convenient selectize input, so I had to have the available options ready in R and ended up hand-copying them (yikes).

In case anyone finds this useful, here are:

- language codes in short (lang_short),
- variant or locale (var_short),
- a longer version of the language (helpful for input) lang_long (possibly non-standard!),
- a longer version of the variant or locale (helpful for input) var_long (probably non-standard!),
- logical values for polyglossia and babel, indicating whether pandoc maps to one or both of these (might come in handy if you need to rely on only one of these LaTeX packages).

Remember that pandoc expects languages of the form en-US etc., so you need to paste columns 1 and 2 together, separated by a hyphen.

Remember that these are not all languages and variants under the BCP 47 standard; it's just the (small) subset mapped by pandoc.

(If anyone comes across a more definitive list of language codes in R, that would be great).


lang_short;var_short;lang_long;var_long;polyglossia;babel
ar;DZ;arabic;Algeria;TRUE;FALSE
ar;IQ;arabic;Iraq;TRUE;FALSE
ar;JO;arabic;Jordan;TRUE;FALSE
ar;LB;arabic;Lebanon;TRUE;FALSE
ar;LY;arabic;Libya;TRUE;FALSE
ar;MA;arabic;Morocco;TRUE;FALSE
ar;MR;arabic;Mauritania;TRUE;FALSE
ar;PS;arabic;Palestinian Territory;TRUE;FALSE
ar;SY;arabic;Syria;TRUE;FALSE
ar;TN;arabic;Tunisia;TRUE;FALSE
de;DE;german;;TRUE;TRUE
de;AT;german;Austria;TRUE;TRUE
de;CH;german;Switzerland;TRUE;TRUE
dsb;;lower sorbian;;TRUE;FALSE
hsb;;upper sorbian;;FALSE;TRUE
el;polyton;greek;polytonic;TRUE;TRUE
en;AU;english;Australia;TRUE;TRUE
en;CA;english;Canada;TRUE;TRUE
en;GB;english;Great Britain;TRUE;TRUE
en;NZ;english;New Zealand;TRUE;TRUE
en;UK;english;United Kingdom;TRUE;TRUE
en;US;english;United States;TRUE;TRUE
grc;ancient;greek;ancient;TRUE;TRUE
la;;latin;;TRUE;TRUE
sl;;slovenian;;TRUE;TRUE
fr;CA;french;Canada;FALSE;TRUE
pt;BR;portuguese;Brazil;TRUE;TRUE
af;;afrikaans;;TRUE;TRUE
am;;amharic;;TRUE;TRUE
ar;;arabic;;TRUE;TRUE
as;;assamese;;TRUE;TRUE
ast;;asturian;;TRUE;TRUE
bg;;bulgarian;;TRUE;TRUE
bn;;bengali;;TRUE;TRUE
bo;;tibetan;;TRUE;TRUE
br;;breton;;TRUE;TRUE
ca;;catalan;;TRUE;TRUE
cy;;welsh;;TRUE;TRUE
cs;;czech;;TRUE;TRUE
cop;;coptic;;TRUE;TRUE
da;;danish;;TRUE;TRUE
dv;;divehi;;TRUE;TRUE
el;;greek;;TRUE;TRUE
en;;english;;TRUE;TRUE
eo;;esperanto;;TRUE;TRUE
es;;spanish;;TRUE;TRUE
et;;estonian;;TRUE;TRUE
eu;;basque;;TRUE;TRUE
fa;;farsi;;TRUE;TRUE
fr;;french;;TRUE;TRUE
fur;;friulan;;TRUE;TRUE
ga;;irish;;TRUE;TRUE
gd;;scottish;;TRUE;TRUE
gez;;ethiopic;;TRUE;TRUE
gl;;galician;;TRUE;TRUE
he;;hebrew;;TRUE;TRUE
hi;;hindi;;TRUE;TRUE
hr;;croatian;;TRUE;TRUE
hu;;magyar;;TRUE;TRUE
hy;;armenian;;TRUE;TRUE
ia;;interlingua;;TRUE;TRUE
id;;indonesian;;TRUE;TRUE
is;;icelandic;;TRUE;TRUE
it;;italian;;TRUE;TRUE
km;;khmer;;TRUE;TRUE
kmr;;kurmanji;;TRUE;TRUE
kn;;kannada;;TRUE;TRUE
ko;;korean;;TRUE;TRUE
lo;;lao;;TRUE;TRUE
lt;;lithuanian;;TRUE;TRUE
lv;;latvian;;TRUE;TRUE
ml;;malayalam;;TRUE;TRUE
mn;;mongolian;;TRUE;TRUE
mr;;marathi;;TRUE;TRUE
nb;;norsk;;TRUE;TRUE
nl;;dutch;;TRUE;TRUE
nn;;nynorsk;;TRUE;TRUE
no;;norsk;;TRUE;TRUE
nqo;;nko;;TRUE;TRUE
oc;;occitan;;TRUE;TRUE
pa;;panjabi;;TRUE;TRUE
pms;;piedmontese;;TRUE;TRUE
pt;;portuguese;;TRUE;TRUE
rm;;romansh;;TRUE;TRUE
ro;;romanian;;TRUE;TRUE
ru;;russian;;TRUE;TRUE
sa;;sanskrit;;TRUE;TRUE
se;;samin;;TRUE;TRUE
sk;;slovak;;TRUE;TRUE
sq;;albanian;;TRUE;TRUE
sr;;serbian;;TRUE;TRUE
syr;;syriac;;TRUE;TRUE
ta;;tamil;;TRUE;TRUE
te;;telugu;;TRUE;TRUE
th;;thai;;TRUE;TRUE
ti;;ethiopic;;TRUE;TRUE
tk;;turkmen;;TRUE;TRUE
tr;;turkish;;TRUE;TRUE
uk;;ukrainian;;TRUE;TRUE
ur;;urdu;;TRUE;TRUE
vi;;vietnamese;;TRUE;TRUE
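
As a rough sketch of how the table above could be used from R: the snippet below inlines only a few of its rows for brevity, pastes the first two columns into tags of the form en-US, and checks a user-supplied value against them. The names `langs_csv`, `langs`, the `tag` column and the helper `check_lang()` are made up for this illustration and do not come from pandoc or any package.

```r
# A few rows copied from the table above; in practice the whole table
# would be read the same way (e.g. from a file shipped with the package).
langs_csv <- "lang_short;var_short;lang_long;var_long;polyglossia;babel
de;AT;german;Austria;TRUE;TRUE
en;US;english;United States;TRUE;TRUE
en;;english;;TRUE;TRUE
fr;CA;french;Canada;FALSE;TRUE"

# Read the first four columns as character so that empty variants stay "".
langs <- read.csv(
  text = langs_csv,
  sep = ";",
  colClasses = c(rep("character", 4), rep("logical", 2))
)

# Paste language and variant with a hyphen; keep the bare language code
# when there is no variant.
langs$tag <- ifelse(
  langs$var_short == "",
  langs$lang_short,
  paste(langs$lang_short, langs$var_short, sep = "-")
)

# Hypothetical helper: warn when 'lang' is not one of the known tags.
# Returns TRUE or FALSE invisibly.
check_lang <- function(lang, known = langs$tag) {
  ok <- lang %in% known
  if (!ok) {
    warning("Unknown 'lang' value '", lang, "'.", call. = FALSE)
  }
  invisible(ok)
}
```

The `tag` column can also serve directly as the choices of a selectize input, and `langs$tag[langs$babel]` restricts it to the values that pandoc maps to babel.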

Other answer

Since the R script doesn't have access to the Haskell code (which runs in its own process), getting the list directly from pandoc won't be possible. However, pandoc >2.0 emits a warning on STDERR when the lang value is invalid, in this case:

$ echo "foo" | pandoc -M lang=asdf -t latex
[WARNING] Invalid 'lang' value 'asdf'.
  Use an IETF language tag like 'en-US'.

There should be a way to catch this from R.
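
One possible way to do that, as an untested sketch: base R's `system2()` can capture both output streams as a character vector, which can then be filtered for lines starting with `[WARNING]`. The helper names `extract_warnings()` and `pandoc_lang_warnings()` are invented for this example, and it assumes a `pandoc` executable is on the PATH.

```r
# Keep only the lines that pandoc marks as warnings.
extract_warnings <- function(output) {
  output[grepl("^\\[WARNING\\]", output)]
}

# Hypothetical helper: convert a tiny dummy document with the given 'lang'
# value and return any warning lines pandoc printed.
pandoc_lang_warnings <- function(lang, pandoc = "pandoc") {
  out <- suppressWarnings(system2(
    pandoc,
    args = c("-M", shQuote(paste0("lang=", lang)), "-t", "latex"),
    input = "foo",    # fed to pandoc on stdin, like `echo "foo" |`
    stdout = TRUE,    # capture stdout ...
    stderr = TRUE     # ... and stderr in the returned character vector
  ))
  extract_warnings(out)
}

# Usage (requires pandoc to be installed):
# w <- pandoc_lang_warnings("asdf")
# if (length(w) > 0) warning(paste(w, collapse = "\n"), call. = FALSE)
```

Note that only the first line of pandoc's message carries the `[WARNING]` prefix, so the continuation line with the hint about IETF language tags is dropped by this filter.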