I am trying to import a German SPSS data file (*.sav) using haven::read_spss(), but there seems to be a problem with umlauts.
dat <- haven::read_sav("file.sav")
dat %>% group_by(variable)
leads to the error message: Error in gsub("^\\s+|\\s+$", "", y) : input string 39 is invalid
When I opened the dataset in SPSS, I found that there was a German "ü" in the variable.
What can I do to read the file in correctly? (Correcting the umlauts in the dataset is not really an option, as the dataset is too big.)
If the issue really comes from the umlauts, you could specify the encoding of the file when reading it.
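As a rough illustration of why an umlaut can produce an "input string ... is invalid" error, here is a base R sketch. It assumes the text was stored as latin1 but is being treated as UTF-8; the string "Müller" is made up for the demo and is not taken from the question's data.

```r
# "M\xfcller" contains the single latin1 byte 0xFC, which encodes "ü" in latin1.
x <- "M\xfcller"

# Interpreted as UTF-8, that byte sequence is not valid.
validUTF8(x)

# Declaring the real source encoding and converting gives a valid UTF-8 string.
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)
```

String functions such as gsub() tend to reject strings like x in a UTF-8 session, so the goal is to tell haven the right encoding at read time rather than to repair the strings afterwards.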
For example, I generated an example file containing umlauts and tried to read it in as latin1 and as UTF-8.
Now, to find out the encoding of your file.sav so that you can specify the correct one, you can open it in Notepad++ and check the encoding it reports.
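A minimal sketch of passing the encoding to haven could look like the following. It relies on the encoding argument of haven::read_sav(); the temporary demo file and the value "Müller" are stand-ins created only so the example runs on its own, and "latin1" for the real file.sav is an assumption that needs to be checked against the actual file.

```r
library(haven)

# Build a small demo file containing an umlaut (a stand-in for the real data).
demo_path <- tempfile(fileext = ".sav")
write_sav(data.frame(variable = "M\u00fcller", stringsAsFactors = FALSE), demo_path)

# Read it back while stating the encoding explicitly instead of trusting
# whatever encoding is recorded in the file header.
demo <- read_sav(demo_path, encoding = "UTF-8")
print(demo$variable)

# For the file from the question, the same argument would be used, e.g.
# (assuming the file turns out to be latin1):
# dat <- read_sav("file.sav", encoding = "latin1")
```

If one encoding still yields errors or garbled umlauts, trying the other candidate is cheap, since only the encoding argument changes.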