I am trying to use rowSums in sparklyr to create an indicator variable that flags rows where all the variables are missing, but it seems that rowSums doesn't work in sparklyr.
As a workaround I have to write the name of every variable inside is.na(), as below, which is impractical since I have 100 variables.
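library(sparklyr)
library(dplyr)
y <- c(NA, 1, 2)
x <- c(NA, NA, 3)
z <- c(NA, NA, NA)
dt <- data.frame(x, y, z)
# sc is an existing Spark connection (e.g. created with spark_connect())
dt_tbl <- sdf_copy_to(sc, dt)
# Operate on the Spark table, not the local data frame
dt_tbl %>%
  mutate(new = ifelse(is.na(x) & is.na(y) & is.na(z), 1, 0))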
Is there any way to pass multiple variables to is.na() at once?
Create a string with all the variable names of interest. I am using all of them for simplicity; use a regex (e.g., grep) otherwise. Then parse the string with rlang.
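A minimal sketch of that idea, assuming dplyr and rlang are installed. It runs on a local copy of the question's data so it can be checked without a cluster; the names vars, cond and result are illustrative only. On Spark, the same mutate() call would presumably be applied to the table returned by sdf_copy_to(sc, dt), since is.na(), & and ifelse() are all translated to SQL.

```r
library(dplyr)
library(rlang)

# Local copy of the question's example data (assumption: used here instead of
# a Spark table so the sketch runs without a Spark connection).
dt <- data.frame(x = c(NA, NA, 3), y = c(NA, 1, 2), z = c(NA, NA, NA))

# All column names for simplicity; subset with grep() when only some are of interest.
vars <- colnames(dt)

# Build the text "is.na(x) & is.na(y) & is.na(z)" and parse it into an expression.
cond <- parse_expr(paste0("is.na(", vars, ")", collapse = " & "))

# !! injects the parsed expression, so mutate() sees ordinary is.na() calls.
result <- dt %>% mutate(new = ifelse(!!cond, 1, 0))
```

Because the condition is built from the column names, it scales to 100 variables without typing each is.na() call by hand.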