Umlaute (ö, ä, ü) in haven::read_spss()

I am trying to import a German SPSS data file (*.sav) using haven::read_spss(), but there seems to be a problem with umlauts ("Umlaute").

library(dplyr)  # provides %>% and group_by()
dat <- haven::read_sav("file.sav")
dat %>% group_by(variable)

This leads to the error message: Error in gsub("^\\s+|\\s+$", "", y) : input string 39 is invalid

When I opened the dataset in SPSS, I found that there was a German "ü" in the variable.

What can I do to read the file in correctly? (Correcting the umlauts in the dataset is not really an option, as the dataset is too big.)

There is 1 best solution below

If the issue really comes from the umlauts, you can specify the encoding of the file when reading it.

For example, I generated a file containing umlauts and tried to read it in as latin1 and as UTF-8.

library("haven")
dat <- mtcars
dat$contains_umlaute <- "äüö"
write_sav(dat, "dat.sav")

#does not work
dat_r <- read_sav("dat.sav", encoding = "latin1")

#works
dat_r <- read_sav("dat.sav", encoding = "UTF-8")
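
An error of the kind shown in the question usually points to strings whose bytes are not valid in the current encoding. As a rough sketch of how one could check for this after the import, and repair it if re-reading with a different encoding is not possible, the following uses only base R. The vector x is a made-up stand-in for a character column, and the assumption that the broken entries are latin1 is only a guess you would have to verify for your own file.

```r
# Hypothetical stand-in for an imported character column:
# the second entry contains a latin1 "ü" (byte 0xFC) that is not valid UTF-8
x <- c("Muenchen", "M\xfcnchen")

# validUTF8() returns FALSE for strings whose bytes are not valid UTF-8
ok <- validUTF8(x)

# Re-encode only the invalid entries, ASSUMING they are latin1
x_fixed <- x
x_fixed[!ok] <- iconv(x[!ok], from = "latin1", to = "UTF-8")

# After the conversion every entry should be valid UTF-8
all(validUTF8(x_fixed))
```

For a whole data frame, the same check could be applied to each character column, for example with lapply(). If many entries come back as invalid, passing the correct encoding to read_sav() is probably the cleaner fix.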

To find out the encoding of your file.sav, so that you can pass the correct value to the encoding argument, you can open it in Notepad++ and check the encoding shown there.