I am creating dummy variables where missing values are 1 and non-missing values are 0. The missing values are NA, e.g. a column looks like:
NA
NA
Positive
NA
Negative
My code for one variable at a time successfully created the dummy variable:
library(dplyr)
#create new dummy variable
df <- mutate(df, newvar = ifelse(is.na(var1), 1,0))
#check
sum(df$newvar == 1)
I have 4 string variables and want to create a single new dummy variable that is 1 if a row has a missing value in any of these variables, and 0 otherwise. I tried reusing the above code:
mylist <- c("var1", "var2", "var3", "var4")
for(i in mylist){
df <- mutate(df, newvar = ifelse(is.na(i), 1,0))
}
I know that I am using the for loop incorrectly, but is this the correct approach, or should I be doing something different?
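For reference, the loop itself can be made to work. is.na(i) tests the string "var1" rather than the column, and each pass overwrites newvar instead of combining the results. A minimal base R sketch, assuming a small hypothetical df (the var2 to var4 values are made up for illustration):

```r
# Hypothetical example data: four character columns with some NA values
df <- data.frame(
  var1 = c(NA, NA, "Positive", NA, "Negative"),
  var2 = c("Positive", NA, "Negative", "Positive", "Negative"),
  var3 = c("Negative", "Positive", "Negative", "Positive", NA),
  var4 = c("Positive", "Negative", "Positive", "Negative", "Positive"),
  stringsAsFactors = FALSE
)

mylist <- c("var1", "var2", "var3", "var4")

# Start at 0 (no missing value seen yet) and OR in each column's NA flags
df$newvar <- 0
for (i in mylist) {
  # df[[i]] extracts the column whose name is stored in i;
  # is.na(i) would only test the string itself, which is never NA
  df$newvar <- as.integer(df$newvar == 1 | is.na(df[[i]]))
}

df$newvar
```

Rows 1, 2 and 4 are flagged through var1, row 5 through var3, and row 3 has no missing values.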
We can use mutate with across (available from dplyr 1.0.0):
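A minimal sketch of the across() approach, assuming a recent dplyr (1.1 or later) and a small hypothetical df whose var2 to var4 values are made up for illustration. It keeps the original columns and adds one 0/1 indicator per variable; the "_missing" suffix is an arbitrary naming choice.

```r
library(dplyr)

# Hypothetical example data: four character columns with some NA values
df <- data.frame(
  var1 = c(NA, NA, "Positive", NA, "Negative"),
  var2 = c("Positive", NA, "Negative", "Positive", "Negative"),
  var3 = c("Negative", "Positive", "Negative", "Positive", NA),
  var4 = c("Positive", "Negative", "Positive", "Negative", "Positive"),
  stringsAsFactors = FALSE
)

mylist <- c("var1", "var2", "var3", "var4")

# One 0/1 indicator per listed column; .names adds new columns such as
# "var1_missing" instead of overwriting the originals
df <- df %>%
  mutate(across(all_of(mylist), ~ as.integer(is.na(.x)), .names = "{.col}_missing"))
```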
If we have an earlier version of dplyr, use mutate_at:
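A minimal sketch of the mutate_at() variant, assuming the same kind of hypothetical df (var2 to var4 values are made up). Passing mylist as a plain character vector avoids newer tidyselect helpers; because the function is given as a named list, dplyr creates new columns named "<column>_<function name>", e.g. "var1_missing", rather than replacing the originals.

```r
library(dplyr)

# Hypothetical example data: four character columns with some NA values
df <- data.frame(
  var1 = c(NA, NA, "Positive", NA, "Negative"),
  var2 = c("Positive", NA, "Negative", "Positive", "Negative"),
  var3 = c("Negative", "Positive", "Negative", "Positive", NA),
  var4 = c("Positive", "Negative", "Positive", "Negative", "Positive"),
  stringsAsFactors = FALSE
)

mylist <- c("var1", "var2", "var3", "var4")

# The list name "missing" becomes the suffix of each new indicator column
df <- df %>%
  mutate_at(mylist, list(missing = ~ as.integer(is.na(.))))
```

mutate_at() is superseded by across() in current dplyr but still works there.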
If we need to create a single new column that flags whether any of the columns in 'mylist' has a missing value:
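A minimal sketch of such a flag, assuming a dplyr version that provides if_any() (added in the 1.0.x series) and the same kind of hypothetical df (var2 to var4 values are made up). A base R equivalent using rowSums() is included for comparison; the newvar_base column name is only for illustration.

```r
library(dplyr)

# Hypothetical example data: four character columns with some NA values
df <- data.frame(
  var1 = c(NA, NA, "Positive", NA, "Negative"),
  var2 = c("Positive", NA, "Negative", "Positive", "Negative"),
  var3 = c("Negative", "Positive", "Negative", "Positive", NA),
  var4 = c("Positive", "Negative", "Positive", "Negative", "Positive"),
  stringsAsFactors = FALSE
)

mylist <- c("var1", "var2", "var3", "var4")

# if_any() is TRUE for a row when is.na() is TRUE in at least one listed column
df <- df %>%
  mutate(newvar = as.integer(if_any(all_of(mylist), is.na)))

# Base R equivalent: count NAs per row across the listed columns, flag if > 0
df$newvar_base <- as.integer(rowSums(is.na(df[mylist])) > 0)

sum(df$newvar == 1)
```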