Create an indicator variable in sparklyr when all the variables are missing

I am trying to use rowSums() in sparklyr to create an indicator variable that flags rows where all the variables are missing, but rowSums() doesn't seem to work on Spark tables.

Instead I have to write the name of every variable inside is.na(), as below, which is impractical since I have 100 variables.

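library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

y <- c(NA,1,2)
x <- c(NA,NA,3)
z <- c(NA,NA,NA)
dt = data.frame(x,y,z)

# keep the Spark tbl returned by sdf_copy_to() so mutate() runs in Spark
dt_tbl <- sdf_copy_to(sc, dt, overwrite = TRUE)

dt_tbl %>% 
 mutate(new = ifelse(is.na(x) & is.na(y) & is.na(z), 1, 0))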

Is there any way to pass multiple variables to is.na() at once?
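For comparison, here is a minimal sketch of the rowSums() idea on a plain local data.frame (base R only). It works because is.na() applied to a whole data.frame returns a logical matrix held in R's memory; a Spark table has no such matrix, which is presumably why rowSums() can't be translated for sparklyr.

```r
# Base R only; this runs on a local data.frame, not on a Spark table.
y <- c(NA, 1, 2)
x <- c(NA, NA, 3)
z <- c(NA, NA, NA)
dt <- data.frame(x, y, z)

# is.na(dt) is a logical matrix; rowSums() counts the NAs in each row,
# so a row is "all missing" when that count equals the number of columns.
# The right-hand side is evaluated before the new column is added.
dt$new <- as.numeric(rowSums(is.na(dt)) == ncol(dt))
dt
```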

Best answer:
library(dplyr)  # provides %>% and mutate()
library(rlang)
library(glue)
  1. Create a string with all the variable names of interest. I use all of them here for simplicity; otherwise select a subset with a regex (e.g., grep()).

    cols_of_interest <- names(dt)
    
    # Builds "ifelse(is.na(x) & is.na(y) & is.na(z), yes = 1, no = 0)"
    test_string <- glue("ifelse({glue('is.na({cols_of_interest})') %>%
      glue_collapse(sep = ' & ')}, yes = 1, no = 0)")
    
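  2. Parse the string into an expression with rlang and splice it into mutate() with !!:

    # dt can equally be the Spark tbl returned by sdf_copy_to()
    dt %>% mutate(flag = !!rlang::parse_expr(test_string))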
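If you'd rather not build code as a string, the same condition can be assembled directly as an expression with rlang's syms() and expr() and folded together with base Reduce(). This is a swapped-in technique, not the one in the answer above. The sketch runs on the local dt; passing the Spark tbl from sdf_copy_to() instead should behave the same, since dbplyr receives the same is.na()/& expression, but that is an assumption here rather than something tested against Spark.

```r
library(dplyr)
library(rlang)

y <- c(NA, 1, 2)
x <- c(NA, NA, 3)
z <- c(NA, NA, NA)
dt <- data.frame(x, y, z)

cols_of_interest <- names(dt)

# One is.na(<col>) call per column, built as expressions instead of text.
na_checks <- lapply(syms(cols_of_interest), function(col) expr(is.na(!!col)))

# Fold the calls together with &: is.na(x) & is.na(y) & is.na(z).
# (Assumes at least one column; Reduce() on an empty list returns NULL.)
all_na <- Reduce(function(lhs, rhs) expr(!!lhs & !!rhs), na_checks)

result <- dt %>% mutate(flag = ifelse(!!all_na, 1, 0))
result
```

Because no string is parsed, column names with unusual characters are handled as symbols rather than needing manual backtick quoting.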