How can I get R to display text in all human languages?


Can someone tell me how to get R to display text in all human languages correctly? My problem is that I have a data frame of news article headlines written in all the languages of the EU. Setting aside comments about poor database design, how can I get R to show each row in its own language?

I read this R-bloggers post, and changing the locale with Sys.setlocale() to one of the languages makes sense, but only the last call takes effect. Splitting the data manually into one bin per language and running the script once for each language is possible, but I would rather not do it.
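A minimal base-R sketch of why one Sys.setlocale() call cannot cover every row: the locale is a single process-wide setting, while each string carries its own encoded bytes. The headlines below are hypothetical examples written with \u escapes so the source stays ASCII. Normalising them with enc2utf8() keeps every language intact whatever the locale, although on a non-UTF-8 Windows console some characters may still be printed as <U+xxxx> escapes.

```r
# Hypothetical headlines in several EU languages, written with \u escapes.
headlines <- c(
  "Gr\u00fc\u00dfe aus Berlin",          # German
  "\u0391\u03b8\u03ae\u03bd\u03b1",      # Greek: Athina
  "\u0421\u043e\u0444\u0438\u044f",      # Bulgarian: Sofia
  "\u0141\u00f3d\u017a"                  # Polish: Lodz
)

# The locale is one process-wide value, which is why the most recent
# Sys.setlocale() call is the one that counts.
current_ctype <- Sys.getlocale("LC_CTYPE")

# Each string keeps its own bytes; converting all of them to UTF-8
# preserves every language independently of the locale.
headlines_utf8 <- enc2utf8(headlines)

print(all(validUTF8(headlines_utf8)))
print(nchar(headlines_utf8, type = "chars"))
```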

Thanks in advance!

Edit:

Link to base .xls document

R code to import:

library(data.table)
library(XLConnect)
library(stringr)
library(stringi)
library(dplyr)

# load .xls (first sheet, first row used as the header)
wb <- loadWorkbook('D:/MOMUT1/GIS_Workload/Other/alex/Book2_1.xls')
df <- readWorksheet(wb, sheet = 1, header = TRUE)

# remove rows that have no headline
df_final <- subset(df, !is.na(HEADLINE))

# take out the HEADLINE column to work on (named, instead of the default V1)
head_col <- data.table(HEADLINE = df_final$HEADLINE)
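One way to check whether the imported headlines are intact, independently of what the console can display, is to write them out as raw UTF-8 and open the file in a UTF-8-aware editor. This is a sketch under assumptions: the small data frame below is a hypothetical stand-in for df_final, and the temporary file path is arbitrary.

```r
# Hypothetical stand-in for df_final; the real data comes from the .xls file.
df_final <- data.frame(
  HEADLINE = c("\u0391\u03b8\u03ae\u03bd\u03b1", "\u0141\u00f3d\u017a", "Berlin"),
  stringsAsFactors = FALSE
)

# useBytes = TRUE writes the UTF-8 bytes as they are, instead of first
# translating them to a (possibly non-UTF-8) native encoding.
out_file <- tempfile(fileext = ".txt")
writeLines(enc2utf8(df_final$HEADLINE), out_file, useBytes = TRUE)

# Read the file back, declaring its contents as UTF-8.
round_trip <- readLines(out_file, encoding = "UTF-8")
print(identical(round_trip, enc2utf8(df_final$HEADLINE)))
```

If the round trip matches, the text itself is fine and only its display in the console is affected.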

Running on: Windows 10 Pro 1803 (64-bit), R 3.4.4 in RStudio

Best answer:

One solution when dealing with multiple languages is to run R on Linux, where UTF-8 is the standard encoding. Since you're on Windows 10 Pro, you can do this with the Windows Subsystem for Linux (WSL) without having to install a separate OS from scratch.

  1. Install WSL: https://learn.microsoft.com/en-us/windows/wsl/install-win10 (Ubuntu is probably the best choice of distro)
  2. Install R: http://sites.psu.edu/theubunturblog/installing-r-in-ubuntu/
  3. Install any packages you need with install.packages(). You may have to install system library dependencies yourself (e.g. with apt).
  4. Run your analysis.
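Once R is running under WSL, a quick check from inside the session shows whether it is actually using a UTF-8 locale. This is a minimal sketch using base R's l10n_info(); the message text is only illustrative.

```r
# l10n_info() reports properties of the current locale, including
# whether it uses UTF-8.
info <- l10n_info()
uses_utf8 <- isTRUE(info[["UTF-8"]])

if (uses_utf8) {
  message("UTF-8 locale: ", Sys.getlocale("LC_CTYPE"))
} else {
  message("Non-UTF-8 locale: ", Sys.getlocale("LC_CTYPE"))
}
```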

Caveat: I haven't actually tried this. Also, you'll be running R from the command line rather than in RStudio.