Convert country names into their native language using R?

Using spData, I'm able to get the world's country names - but only in English - e.g. Germany, China, Morocco.

I want to get those names in the country's own language - e.g. Deutschland, 中国, المملكة المغربية.

I know I can get their ISO code from spData (DE, CN, MA), but I can't find a package which will convert those to a country name in that country's script.

Packages like countrycode can convert a country name into another language using countryname("China", destination = 'cldr.short.zh') - but that requires knowing that the language code for China is zh.

Is there any way to go from CN → 中国, DE → Deutschland etc?


There are 2 best solutions below


Using the link suggested by @zx8754 in the comments, you can determine the appropriate destination code for each country and apply it row by row. Note that the link uses zh_hans, while the cldr destination codes use yue_hans for China, so that had to be swapped manually. This could be simplified, but the steps are shown separately for clarity.


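library(rvest)        # read_html(), html_element(), html_table()
library(dplyr)        # %>%, select(), filter(), left_join(), mutate(), rowwise()
library(tidyr)        # separate_wider_delim()
library(countrycode)  # countrycode(), cldr_examples

country_lang_url <- ""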

iso2c_lang <- 
  read_html(country_lang_url) %>% 
  html_element(css = "table.wikitable") %>% 
  html_table() %>% 
  separate_wider_delim(cols = 4, delim = ",", names = "lang", too_many = "drop") %>% 
  select(iso2c = 1, lang)

data.frame(iso2c = spData::world$iso_a2) %>% 
  filter(iso2c %in% countrycode::codelist$iso2c) %>% 
  left_join(iso2c_lang, by = join_by(iso2c)) %>% 
  mutate(lang = sub("-", "_", lang)) %>% 
  mutate(lang = if_else(lang == "zh_hans", "yue_hans", lang)) %>% 
  mutate(destination_code = paste0("", lang)) %>% 
  mutate(destination_code = if_else(destination_code %in% cldr_examples$Code, destination_code, "")) %>% 
  mutate(country_name_en = countrycode(iso2c, "iso2c", "")) %>% 
  rowwise() %>% 
  mutate(country_name = countrycode(iso2c, "iso2c", destination_code)) %>% 
  filter(country_name_en %in% c("Germany", "China", "Armenia", "Sri Lanka", "Morocco", "Russia", "Thailand", "United States"))
#> # A tibble: 8 × 5
#> # Rowwise: 
#>   iso2c lang     destination_code   country_name_en country_name  
#>   <chr> <chr>    <chr>              <chr>           <chr>         
#> 1 US    en       United States   United States 
#> 2 RU    ru       Russia          Россия        
#> 3 TH    th       Thailand        ไทย           
#> 4 AM    hy       Armenia         Հայաստան      
#> 5 DE    de       Germany         Deutschland   
#> 6 LK    si       Sri Lanka       ශ්‍රී ලංකාව      
#> 7 CN    yue_hans China           中华人民共和国
#> 8 MA    ar       Morocco         المغرب
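
For a handful of countries, the same idea can be sketched without scraping a table. This is only an illustration under stated assumptions: the lookup vector iso2c_to_lang and the helpers make_destination() and native_name() are hypothetical names, the three language pairs are hand-picked, and the "cldr.short." prefix is taken from the call quoted in the question. Whether a given destination column exists in countrycode::codelist is checked at run time rather than assumed.

```r
# Hypothetical hand-written lookup: ISO 3166-1 alpha-2 code -> language code.
# These three pairs are illustrative assumptions, not a complete mapping.
iso2c_to_lang <- c(DE = "de", CN = "zh", MA = "ar")

# Build the countrycode destination name for each country,
# e.g. "cldr.short.zh" (the form quoted in the question).
make_destination <- function(iso2c, lookup = iso2c_to_lang, prefix = "cldr.short.") {
  lang <- unname(lookup[iso2c])
  ifelse(is.na(lang), NA_character_, paste0(prefix, lang))
}

# Look up one name per country; NA when the language is unknown or the
# destination column is not present in countrycode::codelist.
native_name <- function(iso2c) {
  dest <- make_destination(iso2c)
  vapply(seq_along(iso2c), function(i) {
    if (is.na(dest[i]) || !dest[i] %in% names(countrycode::codelist)) {
      return(NA_character_)
    }
    as.character(countrycode::countrycode(iso2c[i], "iso2c", dest[i]))
  }, character(1))
}

codes <- c("DE", "CN", "MA", "XX")  # "XX" is deliberately not in the lookup
destinations <- make_destination(codes)
print(destinations)

# Only query countrycode if the package is installed.
if (requireNamespace("countrycode", quietly = TRUE)) {
  print(native_name(codes))
}
```

Each country gets its own destination, so the lookup has to run element by element, which is the same reason the pipeline above uses rowwise(). Countries with several official languages would need a list-valued lookup instead of a single code.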

You may use the Wikidata database through the API. The WikidataR package is very helpful for this.

With the example below:

  • get_native_name("Belgium") will return "België" "Belgique" "Belgien"
  • get_native_name("Poland") will return "Polska" "Польшча"
  • get_native_name("China") will return "中华人民共和国"

#' get_native_name
#' find name of a country in country's original language(s) based on Wikidata
#' @param name character : the name of the country
#' @param language character : the current language of the name of the country
#' @param limit integer : max number of items returned by wikidata
#' @import WikidataR
#' @return list of names

get_native_name <- function(name, language = "en", limit = 10) {
  # getting all Wikidata items matching the name
  lookup <- WikidataR::find_item(name, language, limit)
  # getting ids of those items
  id_list <- sapply(lookup, function(x) x$id)
  # initialising name list
  names_ <- c()
  # looping through id list
  for(id_ in id_list) {
    i_ <- WikidataR::get_item(id_)
    # checking if item is a country ('P31' property set to 'Q6256')
    if("Q6256" %in% i_[[1]]$claims[["P31"]]$mainsnak$datavalue$value$id) {
      # get the property "name in original language" if 'P1705' is filled
      # (P1705 is a monolingual text value, so the label is in 'text', not 'id')
      on_ <- i_[[1]]$claims[["P1705"]]$mainsnak$datavalue$value$text
      if(!is.null(on_))  names_<- c(names_, on_) else {
        # getting official language(s)
        l_ <- i_[[1]]$claims[["P37"]]$mainsnak$datavalue$value$id
        # getting current name of the country for each official language
        for(lo_id_ in l_) {
          # getting iso denomination of the language (using iso-639-1 - 'P218')
          iso_ <- WikidataR::get_item(lo_id_)[[1]]$claims[["P218"]]$mainsnak$datavalue$value
          # adding name to the list 
          if(!is.null(iso_)) names_<- c(names_, i_[[1]]$labels[[iso_]]$value)