Tidyverse Rowwise sum of columns that may or may not exist

1k Views Asked by At

Consider the following tibble:

library(tidyverse)
data <- tibble(x = c(rnorm(5,2,n = 10)*1000,NA,1000),
               y = c(rnorm(1,1,n = 10)*1000,NA,NA))

Suppose I want to make a row-wise sum of "x" and "y", creating variable "z", like this:

data %>%
  rowwise() %>%
  mutate(z = sum(c(x,y), na.rm = T))

This works fine for what I want, but the problem is that my true dataset has many variables and I did not want to check before what variables I have and what I do not have. So, suppose I may have variables that do not exist among the elements of the sum:

data %>%
  rowwise() %>%
  mutate(k = sum(c(x,y,w), na.rm = T))

In this case, it will not run, because column "w" does not exist. How can I make it run anyway, ignoring the non-existence of "w" and summing over "x" and "y"?

PS: I prefer to do it without filtering the dataset before running the sum. I would like to somehow make the sum happen in any case, whether variables exist or not.

2

There are 2 best solutions below

3
On BEST ANSWER

if I understood your problem correctly this would be a solution (slight modification of @Duck's comment:

library(tidyverse)

data <- tibble(x = c(rnorm(5,2,n = 10)*1000,NA,1000),
               y = c(rnorm(1,1,n = 10)*1000,NA,NA),
               a = c(rnorm(1,1,n = 10)*1000,NA,NA))


wishlist <- c("x","y","w")


data %>% 
  dplyr::rowwise() %>% 
  dplyr::mutate(Sum=sum(c_across(colnames(data)[colnames(data) %in% wishlist]),na.rm=T))

       x      y       a   Sum
   <dbl>  <dbl>   <dbl> <dbl>
 1 3496.   439.   -47.7 3935.
 2 6046.   460.  2419.  6506.
 3 6364.   672.  1030.  7036.
 4 1068.  1282.  2811.  2350.
 5 2455.   990.   689.  3445.
 6 6477.  -612. -1509.  5865.
 7 7623.  1554.  2828.  9177.
 8 5120.   482.  -765.  5602.
 9 1547.  1328.   817.  2875.
10 5602. -1019.   695.  4582.
11   NA     NA     NA      0 
12 1000     NA     NA   1000 
0
On

Try this:

library(tidyverse)
data <- tibble(x = c(rnorm(5,2,n = 10)*1000,NA,1000),
               y = c(rnorm(1,1,n = 10)*1000,NA,NA))

data$k <- rowSums(as.data.frame(data[,which(c("x","y","w")%in%names(data))]),na.rm=TRUE)

Output:

# A tibble: 12 x 3
       x     y     k
   <dbl> <dbl> <dbl>
 1 3121.  934. 4055.
 2 6523. 1477. 8000.
 3 5538.  863. 6401.
 4 3099. 1344. 4443.
 5 4241.  284. 4525.
 6 3251. -448. 2803.
 7 4786. -291. 4495.
 8 4378.  910. 5288.
 9 5342.  653. 5996.
10 4772. 1818. 6590.
11   NA    NA     0 
12 1000    NA  1000