I have a function like this, which I'm using to clean data, and it works correctly:
library(stringr)
# If x contains a number, move it into y; otherwise keep y (converted to numeric)
my_fun <- function(x, y) {
  ifelse(str_detect(x, "-*\\d+\\.*\\d*"),
         as.numeric(str_extract(x, "-*\\d+\\.*\\d*")),
         as.numeric(y))
}
It takes numbers that have been entered in the wrong column and reassigns them to the correct column. It is used as follows to clean the y variable:
df$y <- my_fun(df$x, df$y)
I have many columns/variables (more than 10) that are paired in the same format, something like this:
x_vars <- c("x_1", "x_2", "x_3", "x_4", "x_5", "x_6")
y_vars <- c("y_1", "y_2", "y_3", "y_4", "y_5", "y_6")
My question is: is there a way to apply this function across all the variables in my data set that need to be cleaned in the same way? I can easily do this with lapply when my cleaning function takes only one argument, but I'm struggling in this case.
I have tried mapply but could not get it to work; this might be because I'm still quite a novice in R. Any advice would be much appreciated.
We can use mapply/Map. We extract the columns by name, passing the 'x_vars' and 'y_vars' columns as arguments to Map, apply my_fun to each pair of extracted vectors, and assign the result back to the 'y_vars' columns in the original dataset. Or this can also be written as
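Both forms might look like the following minimal sketch. The small data frame `df` and its values are hypothetical, stringr is assumed to be installed, and this is an illustration of the Map technique rather than necessarily the exact code from the original answer. The second form, which passes column names instead of columns, is shown commented out.

```r
library(stringr)

# The asker's cleaning function
my_fun <- function(x, y) {
  ifelse(str_detect(x, "-*\\d+\\.*\\d*"),
         as.numeric(str_extract(x, "-*\\d+\\.*\\d*")),
         as.numeric(y))
}

# Hypothetical example data: numbers were sometimes typed into the x columns
df <- data.frame(
  x_1 = c("abc", "12.5", "n/a"),
  y_1 = c("1", NA, "3"),
  x_2 = c("-4", "def", "7 kg"),
  y_2 = c(NA, "5", NA),
  stringsAsFactors = FALSE
)

x_vars <- c("x_1", "x_2")
y_vars <- c("y_1", "y_2")

# Map() walks the x and y columns in parallel and returns a list of cleaned
# vectors, which is written back to the y columns in one assignment
df[y_vars] <- Map(my_fun, df[x_vars], df[y_vars])

# Equivalent form that passes the column names instead of the columns:
# df[y_vars] <- Map(function(xn, yn) my_fun(df[[xn]], df[[yn]]), x_vars, y_vars)
```

Because the assignment to `df[y_vars]` is positional, the y columns keep their names even though the list returned by `Map()` is named after the x columns.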
NOTE: Here, we are assuming that all the elements in 'x_vars' and 'y_vars' are columns in the original dataset. We would also state that using Map will be much faster and more efficient than reshaping the data to long format, doing the conversion, and reshaping back.
To provide a different approach, we can use melt from data.table. Then, again, we need to dcast it back to 'wide' format. So it requires more steps and is not as easy.
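For comparison, the reshape route might be sketched as follows with data.table's `melt()` and `dcast()`. The table `dt`, its values and the `id` column are hypothetical; an identifier column like `id` is assumed to exist (or be added) so that rows can be matched up again when casting back. data.table and stringr are assumed to be installed.

```r
library(data.table)
library(stringr)

my_fun <- function(x, y) {
  ifelse(str_detect(x, "-*\\d+\\.*\\d*"),
         as.numeric(str_extract(x, "-*\\d+\\.*\\d*")),
         as.numeric(y))
}

# Hypothetical example data; 'id' is an assumed row identifier
dt <- data.table(
  id  = 1:3,
  x_1 = c("abc", "12.5", "n/a"),
  y_1 = c("1", NA, "3"),
  x_2 = c("-4", "def", "7 kg"),
  y_2 = c(NA, "5", NA)
)

x_vars <- c("x_1", "x_2")
y_vars <- c("y_1", "y_2")

# Wide -> long: one row per (id, pair), paired columns stacked into 'x' and 'y'
long <- melt(dt, id.vars = "id",
             measure.vars = list(x_vars, y_vars),
             value.name = c("x", "y"))

# Clean every pair in a single vectorised call
long[, y := my_fun(x, y)]

# Long -> wide: 'variable' holds the pair index (1, 2, ...), giving x_1, x_2, y_1, y_2
wide <- dcast(long, id ~ variable, value.var = c("x", "y"))
```

The extra `id` column and the melt/dcast round trip are the additional steps mentioned above; note also that the cast-back table orders its columns as id, x_1, x_2, ..., y_1, y_2 rather than in their original interleaved order.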