Rowwise logical operations with mutate() and filter() in R


I frequently need to mutate() or filter() data frames with conditions that apply to a subset of variables, but I can't do it consistently.

Toy example data frame:

library(dplyr)

data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

Operation 1: mutate() with a rowwise operation on logical variables
- I want to create a new column, abc_any, that is TRUE when any of a:c is TRUE.

I usually use:

data %>% mutate(abc_any = a | b | c)

      id     a     b     c num abc_any
1   John FALSE FALSE  TRUE   1    TRUE
2   Jane FALSE    NA FALSE   2      NA
3  Louis  TRUE  TRUE  TRUE   3    TRUE
4 Mirian  TRUE    NA  TRUE   4    TRUE
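Jane's NA comes from R's three-valued logic for |: TRUE | NA is TRUE (the answer is known whatever the missing value is), while FALSE | NA is NA. A minimal base-R check of that rule on the same toy data, with no dplyr involved; the variable name abc_any is just reused for illustration:

```r
# Same toy data as above
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# `|` is TRUE if either side is TRUE, even when the other side is NA
or_true <- TRUE | NA    # TRUE
or_false <- FALSE | NA  # NA

# Element-wise over the columns: only Jane (FALSE, NA, FALSE) ends up NA
abc_any <- with(data, a | b | c)
print(abc_any)  # TRUE NA TRUE TRUE
```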

But I can't do this without listing every variable from a to c in a chain of | comparisons.

I tried the following, which gave inconsistent results, and I don't know why:

data %>% mutate(abc_any = Reduce("|", a:c))

      id     a     b     c num abc_any
1   John FALSE FALSE  TRUE   1    TRUE
2   Jane FALSE    NA FALSE   2    TRUE
3  Louis  TRUE  TRUE  TRUE   3    TRUE
4 Mirian  TRUE    NA  TRUE   4    TRUE
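A likely explanation for the all-TRUE column: inside mutate(), a:c is not a column range but R's colon operator applied to the vectors a and c. The colon operator uses only the first element of each operand (which is what triggers the "numerical expression has 4 elements" warning), so a:c becomes 0:1, Reduce("|", 0:1) is a single TRUE, and mutate() recycles it to every row. A base-R sketch reproducing this outside dplyr; rng and collapsed are illustrative names:

```r
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# The colon operator looks only at a[1] = FALSE and c[1] = TRUE; column b is
# never consulted. suppressWarnings() hides the
# "numerical expression has 4 elements: only the first used" warning.
rng <- suppressWarnings(with(data, a:c))
print(rng)  # 0 1

# Folding `|` over c(0L, 1L) gives one TRUE, which mutate() recycles to all rows
collapsed <- Reduce("|", rng)
print(collapsed)  # TRUE
```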

This seems to work, but surprisingly coerces abc_any to integer:

data %>% rowwise() %>% mutate(abc_any = Reduce("|", a:c))

# A tibble: 4 x 6
# Rowwise: 
  id     a     b     c       num abc_any
  <fct>  <lgl> <lgl> <lgl> <int>   <int>
1 John   FALSE FALSE TRUE      1       1
2 Jane   FALSE NA    FALSE     2       0
3 Louis  TRUE  TRUE  TRUE      3       1
4 Mirian TRUE  NA    TRUE      4       1
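A plausible reading of the rowwise() result: per row, a:c is a[i]:c[i], so b is still ignored and the output only matches by coincidence. When a[i]:c[i] has length 1 (0:0 or 1:1), Reduce() returns that integer unchanged without ever calling |, while row 1 (0:1) does call | and returns a logical. Mixing one logical with integers yields an integer column, which is consistent with the <int> shown above. A base-R sketch of the per-row computation, using a plain loop in place of rowwise(); per_row and combined are illustrative names:

```r
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# Mimic rowwise(): evaluate a[i]:c[i] for each row; b never enters the calculation
per_row <- lapply(seq_len(nrow(data)), function(i) {
  Reduce("|", data$a[i]:data$c[i])
})
# Row 1: Reduce("|", 0:1) calls `|` and returns TRUE (logical).
# Rows 2-4: 0:0 or 1:1 has length 1, so Reduce() returns that integer as-is.

# Combining one logical with integers gives an integer vector
combined <- unlist(per_row)
print(combined)  # 1 0 1 1
```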

This does not work and throws several error messages:

data %>% rowwise() %>% mutate(abc_any = apply(a:c, 1, any))
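apply() needs an object with dimensions (a matrix or array), and a:c is a plain vector, which is a likely reason for the errors. A base-R sketch that first turns the three columns into a logical matrix and then applies any() row by row; note that any() follows the same NA rules as |, so any(FALSE, NA, FALSE) is NA. The names m and abc_any are illustrative:

```r
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# A 4 x 3 logical matrix: something apply() can iterate over by row
m <- as.matrix(data[, c("a", "b", "c")])
print(dim(m))  # 4 3

# MARGIN = 1 means "by row"; any() returns NA when no value is TRUE but some are NA
abc_any <- apply(m, 1, any)
print(abc_any)
```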

Operation 2: filter()
- I sometimes want to filter on a similar condition, but can't. The explicit chain

data %>% filter(a | b | c)

works fine.

This and several variations (rowwise() %>% Reduce(...) and others) failed with various error messages:

data %>% filter(rowwise(Reduce("|", a:c)))

This one does not filter at all and throws the warning "numerical expression has 4 elements: only the first used":

data %>% filter(Reduce("|", a:c))

Do I have to create a temporary column with mutate(), as above, and then filter on it?

With binary numeric variables:
- Now let's say these logical variables are coerced to numeric with data_2 <- data %>% mutate(across(where(is.logical), as.numeric)).

I tried to use rowSums(), but that failed too:

data_2 %>% rowwise() %>% mutate(abc_any = rowSums(a:c, na.rm = TRUE))

Error: Problem with `mutate()` input `abc_any`.
x 'x' must be an array of at least two dimensions
ℹ Input `abc_any` is `rowSums(a:c, na.rm = TRUE)`.
ℹ The error occured in row 1.

What are possible workarounds for these errors?


Accepted answer:

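Reduce() should be applied to a list. Inside mutate(), a:c is not a column selection: it is R's colon operator applied to the vectors a and c, which uses only their first elements (hence the "numerical expression has 4 elements" warning) and never looks at b. For the same reason rowSums(a:c) and apply(a:c, 1, any) fail: a:c is a plain vector, not a two-dimensional object. Instead, select the columns a:c and call Reduce() on the result, since a data.frame/tibble is also a list of columns: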

library(dplyr)
data %>% 
     mutate(abc_any = Reduce("|", select(., a:c)))
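The key point is that a data frame is a list of columns, so Reduce() can fold | across them. A base-R sketch of the same idea without dplyr, where data[c("a", "b", "c")] stands in for select(., a:c); cols and abc_any are illustrative names:

```r
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# Still a data frame, i.e. a list holding the three columns
cols <- data[c("a", "b", "c")]
print(is.list(cols))  # TRUE
print(length(cols))   # 3

# Reduce() computes (a | b) | c element-wise, matching a | b | c
abc_any <- Reduce("|", cols)
print(abc_any)  # TRUE NA TRUE TRUE
```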

Or, to filter:

data %>% 
    filter(Reduce(`|`, select(., a:c)))
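dplyr's filter() drops rows where the condition is NA, which is why Jane disappears. Plain [ indexing in base R behaves differently: an NA index returns an all-NA row instead of dropping it. A base-R sketch of the equivalent filter, using which() (or subset(), which treats NA as FALSE) to match filter()'s behaviour; keep, kept and kept2 are illustrative names:

```r
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# Row-wise "any of a, b, c" as a logical vector: TRUE NA TRUE TRUE
keep <- Reduce("|", data[c("a", "b", "c")])

# which() keeps only the TRUE positions, so the NA row (Jane) is dropped
kept <- data[which(keep), ]
print(as.character(kept$id))  # "John" "Louis" "Mirian"

# subset() also treats NA in the condition as FALSE
kept2 <- subset(data, keep)
```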

Another option is if_any() (available since dplyr 1.0.4):

data %>%
   filter(if_any(a:c))
#      id     a     b    c num
#1   John FALSE FALSE TRUE   1
#2  Louis  TRUE  TRUE TRUE   3
#3 Mirian  TRUE    NA TRUE   4

Or use reduce() from purrr:

library(purrr)
data %>%
      mutate(abc_any = select(., a:c) %>%
        reduce(`|`))
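Both base Reduce() and purrr's reduce() fold left to right by default, so the result is (a | b) | c. Passing accumulate = TRUE to base Reduce() exposes the intermediate steps, which can help when checking how NA propagates (purrr's counterpart is accumulate()). A base-R sketch; steps is an illustrative name:

```r
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# Each element is one step of the fold; the results are length 4, so they stay a list
steps <- Reduce("|", data[c("a", "b", "c")], accumulate = TRUE)
# steps[[1]] is a
# steps[[2]] is a | b        : FALSE NA TRUE TRUE (Jane turns NA here)
# steps[[3]] is (a | b) | c  : TRUE NA TRUE TRUE
str(steps)
```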

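Or use rowSums() on the numeric (0/1) version of the columns. Note that na.rm = TRUE treats NA as absent, so Jane's row gives FALSE here rather than NA: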

data %>%
   mutate(across(where(is.logical), as.numeric)) %>%
   mutate(abc_any = rowSums(select(., a:c), na.rm = TRUE) > 0)
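The choice of na.rm changes the answer for rows containing NA, and not only for Jane. A base-R sketch comparing three versions on the 0/1 data; the conversion line is a base-R stand-in for mutate(across(where(is.logical), as.numeric)), and the names lgl, abc, any_narm, any_na and any_or are illustrative:

```r
data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# Base-R version of the numeric conversion: logical columns become 0/1 doubles
data_2 <- data
lgl <- vapply(data_2, is.logical, logical(1))
data_2[lgl] <- lapply(data_2[lgl], as.numeric)
abc <- data_2[c("a", "b", "c")]

# na.rm = TRUE: NA counts as "not TRUE", so Jane becomes FALSE instead of NA
any_narm <- rowSums(abc, na.rm = TRUE) > 0
# No na.rm: any NA makes the row sum NA, even for Mirian, whose a is 1
any_na <- rowSums(abc) > 0
# For comparison, `|` is TRUE whenever some column is TRUE, NA otherwise if any NA
any_or <- Reduce("|", data[c("a", "b", "c")])
```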