I have a function like this, which I'm using to clean data, and it works correctly.
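library(stringr)  # provides str_detect() and str_extract()
my_fun <- function(x, y) {
  # if x contains a number, use that number; otherwise keep y
  ifelse(str_detect(x, "-*\\d+\\.*\\d*"),
         as.numeric(str_extract(x, "-*\\d+\\.*\\d*")),
         as.numeric(y))
}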
It takes numbers that have been entered in the wrong column and reassigns them to the correct column. It is used as follows to clean the y variable:
df$y <- my_fun(df$x, df$y)
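To make the behaviour concrete, here is a small sketch on made-up data (the data frame and its values are hypothetical, not my real data): wherever x contains a number, that number ends up in y, and otherwise y is kept.

```r
library(stringr)

my_fun <- function(x, y) {
  ifelse(str_detect(x, "-*\\d+\\.*\\d*"),
         as.numeric(str_extract(x, "-*\\d+\\.*\\d*")),
         as.numeric(y))
}

# Hypothetical data: "12.5" and "-3" were typed into x instead of y
df <- data.frame(x = c("a", "12.5", "b -3"),
                 y = c("1", NA, NA),
                 stringsAsFactors = FALSE)

df$y <- my_fun(df$x, df$y)
df$y  # y is now 1, 12.5, -3
```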
I have many columns/variables (more than 10) that are paired in the same format, something like this:
x_vars <- c("x_1", "x_2", "x_3", "x_4", "x_5", "x_6")
y_vars <- c("y_1", "y_2", "y_3", "y_4", "y_5", "y_6")
My question is: is there a way to apply this function across all the variables in my data set that need to be cleaned in the same way? I can easily do this in other instances, where my data cleaning function has only one argument, using lapply, but I am struggling in this case. I have tried mapply but could not get it to work; this might be because I'm still quite a novice in R. Any advice would be much appreciated.
We can use mapply/Map. We need to extract the columns based on the column names by passing 'x_vars' and 'y_vars' as arguments to Map, apply my_fun on the extracted vectors, and assign the result back to 'y_vars' in the original dataset. This can also be written more compactly by passing the two sets of columns to Map directly.
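A minimal sketch of both forms, assuming a small made-up data frame with only two x/y pairs (the data values and the names df1 and df2 are for illustration only):

```r
library(stringr)

my_fun <- function(x, y) {
  ifelse(str_detect(x, "-*\\d+\\.*\\d*"),
         as.numeric(str_extract(x, "-*\\d+\\.*\\d*")),
         as.numeric(y))
}

# Hypothetical toy data with two x/y pairs
df <- data.frame(x_1 = c("a", "12.5", "b"),
                 y_1 = c("1", NA, "3"),
                 x_2 = c("-4", "c", "d"),
                 y_2 = c(NA, "5", "6"),
                 stringsAsFactors = FALSE)
x_vars <- c("x_1", "x_2")
y_vars <- c("y_1", "y_2")

# Form 1: pass the column names and extract the columns inside the function
df1 <- df
df1[y_vars] <- Map(function(x, y) my_fun(df1[[x]], df1[[y]]), x_vars, y_vars)

# Form 2: pass the column subsets themselves; Map walks them pairwise
df2 <- df
df2[y_vars] <- Map(my_fun, df2[x_vars], df2[y_vars])
```

Both forms rely on 'x_vars' and 'y_vars' being in matching order, since Map pairs the first x column with the first y column, the second with the second, and so on.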
NOTE: Here, we are assuming that all the elements in 'x_vars' and 'y_vars' are columns in the original dataset. Using Map should also be faster and more efficient than reshaping the data to long format and then doing the conversion.
To provide a different approach, we can use melt from data.table to reshape the data to 'long' format. Then, again, we need to dcast it back to 'wide' format. So it requires more steps and is not as easy.
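For comparison, the reshaping route might look roughly like this. It is only a sketch: the toy data, the added id column, and the names long and wide are assumptions for illustration.

```r
library(data.table)
library(stringr)

my_fun <- function(x, y) {
  ifelse(str_detect(x, "-*\\d+\\.*\\d*"),
         as.numeric(str_extract(x, "-*\\d+\\.*\\d*")),
         as.numeric(y))
}

# Hypothetical toy data with two x/y pairs
dt <- data.table(x_1 = c("a", "12.5", "b"),
                 y_1 = c("1", NA, "3"),
                 x_2 = c("-4", "c", "d"),
                 y_2 = c(NA, "5", "6"))
dt[, id := .I]  # row identifier, needed to reshape back later

# Wide to long: all x_* columns go into "x", all y_* columns into "y"
long <- melt(dt, id.vars = "id",
             measure.vars = patterns("^x_", "^y_"),
             value.name = c("x", "y"))

# Clean y once on the long data
long[, y := my_fun(x, y)]

# Long back to wide: columns come out as x_1, x_2, y_1, y_2
wide <- dcast(long, id ~ variable, value.var = c("x", "y"))
```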