How do I create a loop to change the text encoding of the value labels in labelled variables in R?


I have imported a Stata file that has encoding problems in its value labels. After import, calling labelled::lookfor with any keyword returns this error:

Error in structure(as.character(x), names = names(x)) : 
  invalid multibyte string at '<e9>bec Solidaire'

Knowing the dataset, that string is almost certainly one of the value labels.

How do I loop through the dataset, fix the encoding of the value label names, and then write them back? I think I have found a way to fix the problematic characters, but I don't know how to replace the original names.

library(labelled)  # labelled(), val_labels()
library(dplyr)     # glimpse(), %>%
library(purrr)     # map()

v <- labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, "Bloc Qu\xe9b\xe9cois" = 3, "don't know" = 9))
x <- labelled(c(1,2,2,2,3,9,1,3,2,NA), c("Bloc Qu\xe9b\xe9cois" = 1, no = 3, "don't know" = 9))

mydat <- data.frame(v = v, x = x)

glimpse(mydat)
mydat %>% 
  map(., val_labels)
# This works on a single variable
iconv(names(val_labels(x)), from = "latin1", to = "UTF-8")
# And this seems to work looping over each variable, but how do I store the result?
mydat %>% 
  map(., function(x) iconv(names(val_labels(x)), from = "latin1", to = "UTF-8"))

There is 1 best solution below


This is a bit tough to do in one simple step, so here I use two helper functions:

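# Re-encode the names of a named vector from Latin-1 to UTF-8
conv_names <- function(x) {
  setNames(x, iconv(names(x), from="latin1", to="UTF-8"))
}
# Replace a variable's value labels with the re-encoded version
conv_val_labels <- function(x) {
  val_labels(x) <- conv_names(val_labels(x))
  x
}

# Apply to every column and bind the results back into a data frame
mydat <- map_dfc(mydat, conv_val_labels)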

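conv_names re-encodes the names of a named vector, and conv_val_labels uses it to replace the value labels of a single variable. We then map conv_val_labels over each column and assign the result back to mydat. Note that we use map_dfc so the converted columns are combined back into a data frame rather than returned as a list.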
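
For comparison, here is a minimal base R sketch of the same idea, which may be useful if you would rather not depend on map_dfc (the purrr documentation lists it as superseded as of purrr 1.0.0). It assumes the value labels live in the "labels" attribute of each column, which is where haven and labelled store them. The helper name conv_labels_attr and the toy data are hypothetical, and the source encoding is assumed to be Latin-1, as in the question.

```r
# Re-encode the names of a named vector from Latin-1 to UTF-8.
# Vectors without names are returned unchanged.
conv_names <- function(x) {
  if (is.null(names(x))) return(x)
  names(x) <- iconv(names(x), from = "latin1", to = "UTF-8")
  x
}

# Hypothetical helper: fix the names stored in a column's "labels"
# attribute (assumed location of value labels) and return the column.
# Columns without that attribute pass through untouched.
conv_labels_attr <- function(col) {
  labs <- attr(col, "labels", exact = TRUE)
  if (!is.null(labs)) attr(col, "labels") <- conv_names(labs)
  col
}

# Toy data: one plain column and one column carrying Latin-1 label names.
labs <- c(1, 3)
names(labs) <- c("yes", "Bloc Qu\xe9b\xe9cois")
mydat <- data.frame(id = 1:3)
mydat$v <- structure(c(1, 3, 1), labels = labs)

# mydat[] <- lapply(...) replaces every column in place while keeping
# mydat a data frame, so the result is stored rather than just printed.
mydat[] <- lapply(mydat, conv_labels_attr)

# Inspect the repaired label names.
names(attr(mydat$v, "labels"))
```

Running validUTF8() on the label names before and after the conversion is a quick way to confirm which variables were affected.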