Combining across and filter in groups

I'd like to filter x1, x2, and x3, keeping only the values between the 5th and 95th quantiles within each group (id). But I can't get across() to work with my variables (x1, x2, and x3). My example:

library(dplyr)

data <- tibble::tibble(
  id = paste0("sample_", rep(1:10, 10)),
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100)
)

data %>%
  group_by(id) %>%
  dplyr::filter(across(x1:x3, function(x) x > quantile(x, 0.05) 
                x < quantile(x, 0.95)))
#Error: Problem with `filter()` input `..1`.
#i Input `..1` is `across(...)`.
#i The error occurred in group 1: id = "sample_1".

Best answer:

Your code will run if you put & ("AND") between the two conditions:

data %>%
  group_by(id) %>%
  dplyr::filter(across(x1:x3, function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)))

You can also shorten the code with:

data %>%
  group_by(id) %>%
  filter(across(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))
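To see exactly which values this condition keeps, here is a small base-R check on 1:10, standing in for one column of one group (each id in the example data has 10 rows). With quantile()'s default method (type 7), the 5th and 95th percentiles land at 1.45 and 9.55, so the strict > and < comparisons drop only the smallest and the largest value:

```r
# 1:10 stands in for one column within a single group of 10 rows
x <- 1:10

# Default quantile() method (type = 7) interpolates between order statistics
bounds <- quantile(x, c(0.05, 0.95))
print(bounds)
#   5%  95% 
# 1.45 9.55 

# Same condition as in the answer, with the two bounds computed once
keep <- x > bounds[["5%"]] & x < bounds[["95%"]]
print(x[keep])
# [1] 2 3 4 5 6 7 8 9
```

With continuous data such as rnorm() there are no ties, so in every group of 10 each column loses exactly its minimum and its maximum.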

However, I think filter() is intended to be used with either if_all() or if_any() (introduced in dplyr 1.0.4), depending on whether you want all of the selected columns or at least one of them to satisfy the condition.

For example:

data %>%
  group_by(id) %>%
  filter(if_all(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))

data %>%
  group_by(id) %>%
  filter(if_any(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))
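The difference between the two is easiest to see on a tiny data frame. This is a made-up example with a simpler condition (.x > 0) instead of the quantile bounds: row 1 is positive in both columns, row 2 in only one, and row 3 in neither.

```r
library(dplyr)

# Made-up toy data, for illustration only
toy <- data.frame(a = c(1, 1, -1), b = c(1, -1, -1))

# if_all(): every selected column must satisfy the condition
all_rows <- toy %>% filter(if_all(a:b, ~ .x > 0))
print(all_rows)   # only row 1 (a = 1, b = 1)

# if_any(): at least one selected column must satisfy the condition
any_rows <- toy %>% filter(if_any(a:b, ~ .x > 0))
print(any_rows)   # rows 1 and 2
```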

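In your case, if_all and across give the same results: when across() is used inside filter(), its per-column results are combined with &, just as if_all() does. Note, however, that using across() in filter() was deprecated in dplyr 1.0.8 in favour of if_any() and if_all(), so if_all() is the safer choice.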
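For the grouped trim, if_all() joins the per-column conditions with &, so it should keep the same rows as writing the three conditions out by hand. The sketch below compares the two on seeded data; set.seed() and the helper name in_bounds are additions for illustration, not part of the answer.

```r
library(dplyr)

set.seed(42)  # seed added only so the comparison is reproducible
data <- tibble::tibble(
  id = paste0("sample_", rep(1:10, 10)),
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100)
)

# Hypothetical helper: TRUE where a value lies strictly between the
# 5th and 95th percentiles of its group
in_bounds <- function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)

# Version 1: if_all() over the selected columns
via_if_all <- data %>%
  group_by(id) %>%
  filter(if_all(x1:x3, in_bounds)) %>%
  ungroup()

# Version 2: the same three conditions joined with & by hand
via_and <- data %>%
  group_by(id) %>%
  filter(in_bounds(x1) & in_bounds(x2) & in_bounds(x3)) %>%
  ungroup()

identical(via_if_all, via_and)  # should print TRUE
```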

Second answer:

You forgot & between the two conditions:

library(dplyr)

data <- tibble::tibble(
  id = paste0("sample_", rep(1:10, 10)),
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100)
)

data %>%
  group_by(id) %>%
  dplyr::filter(across(.cols = x1:x3, function(x) x > quantile(x, 0.05) & 
                       x < quantile(x, 0.95)))

Example output (the data come from rnorm() without a seed, so your values will differ):

   id            x1      x2      x3
   <chr>      <dbl>   <dbl>   <dbl>
 1 sample_2 -0.0222 -1.17   -0.634 
 2 sample_4 -0.584   0.400  -1.01  
 3 sample_8 -0.462  -0.890   0.851 
 4 sample_1  1.39   -0.0418 -1.31  
 5 sample_2 -0.446   1.61   -0.0368
 6 sample_3  0.617  -0.148  -0.358 
 7 sample_4 -1.20    0.340   0.0903
 8 sample_6 -0.538  -1.10   -0.387 
 9 sample_9 -0.680   0.195  -1.51  
10 sample_5 -0.779   0.419   0.720
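Both answers keep a row only when all three columns pass, so you can predict roughly how many rows survive: with 10 continuous values per group, each column's strict 5th-95th percentile condition removes exactly that group's minimum and maximum (two rows), and the three columns together remove between 2 and 6 rows, leaving 4 to 8 per group. The sketch below checks this with a base-R version of the same grouped trim, using ave() instead of dplyr (a swapped-in technique, not either answer's code); the seed and the helper name in_bounds are illustrative assumptions.

```r
set.seed(1)  # seed added only so the sketch is reproducible
data <- data.frame(
  id = paste0("sample_", rep(1:10, 10)),
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100)
)

# Hypothetical helper: TRUE where a value lies strictly between the
# 5th and 95th percentiles of the vector it is given
in_bounds <- function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)

# ave() applies in_bounds within each id and returns results in row order;
# it stores them in a numeric vector (0/1), hence as.logical()
col_ok <- lapply(data[c("x1", "x2", "x3")], function(col) {
  as.logical(ave(col, data$id, FUN = in_bounds))
})

# Keep rows where all three columns pass, like if_all() / across() above
trimmed <- data[Reduce(`&`, col_ok), ]

# Rows kept per group: each count should be between 4 and 8
kept_per_group <- table(trimmed$id)
print(kept_per_group)
all(kept_per_group >= 4 & kept_per_group <= 8)
```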