How to speed up case_when/conditional mutate?

Note that case_when() does not evaluate the right-hand side of a formula only on the rows where the condition is met; it evaluates it on the whole tibble. For example:

library(dplyr)

picks = c(1:3)

a = tibble(id = c(1:4),
           k = NA)

a %>% mutate(
  k = case_when(id %in% picks ~ runif(length(picks)))
)

This is clear from the error:

Error: Problem with `mutate()` column `k`. i `k = case_when(id %in% picks ~ runif(length(picks)))`. x `id %in% picks ~ runif(length(picks))` must be length 4 or one, not 3.

An alternative would be rowwise() or group_by(id), but that would still be highly inefficient. I would probably still go with rowwise(), but since I only have to operate on 1% of the tibble, I want mutate to touch just that 1% and leave everything else untouched. Any suggestions for making R perform the minimal number of evaluations?

I thought about a combination of filter and join, but that would not work for a tidygraph object, for example: filtering the nodes would also filter out edges, so local_members would no longer work properly.

EDIT:

Also, in my experience, it seems that base::ifelse is faster than dplyr::case_when; is that expected?
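
A quick way to check this on a given machine is to time the three functions on the same input. The following is only a rough sketch: the vector `x`, the 1% condition `cond`, and the repetition count `reps` are made up for illustration, and `system.time()` gives coarse timings, so treat the numbers as indicative at best.

```r
library(dplyr)

# Synthetic input (assumption): 1% of the rows satisfy the condition.
x    <- seq_len(1e5)
cond <- x %% 100 == 0

# Repeat each call a few times so the timings are less dominated by noise.
reps <- 20
t_ifelse    <- system.time(for (i in seq_len(reps)) r1 <- ifelse(cond, x * 2, NA_real_))
t_if_else   <- system.time(for (i in seq_len(reps)) r2 <- if_else(cond, x * 2, NA_real_))
t_case_when <- system.time(for (i in seq_len(reps)) r3 <- case_when(cond ~ x * 2, TRUE ~ NA_real_))

# Elapsed seconds per function; the ranking may vary with dplyr version and data size.
print(c(ifelse    = t_ifelse[["elapsed"]],
        if_else   = t_if_else[["elapsed"]],
        case_when = t_case_when[["elapsed"]]))
```

All three calls compute `x * 2` for every row before selecting, so none of them avoids the full-length evaluation.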

There is 1 best solution below

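dplyr::if_else is faster than base::ifelse. In case_when you also need a fallback branch, TRUE ~ NA_real_, and the right-hand side must be sized with n() rather than length(picks). Note that runif(n()) still generates a value for every row; case_when only keeps the ones where the condition holds: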

picks = c(1:3)

a = tibble(id = c(1:4),
           k = NA)

a %>% mutate(
    k = case_when(
        id %in% picks ~ runif(n()),
        TRUE ~ NA_real_
    )
)
# A tibble: 4 x 2
     id       k
  <int>   <dbl>
1     1  0.0757
2     2  0.708 
3     3  0.255 
4     4 NA

# Would be faster with if_else:
a %>% mutate(
    k = if_else(id %in% picks,
        runif(n()),
        NA_real_
    )
)
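
If the goal is to call the expensive function only for the matching rows, one option is to index into the column and fill just those positions. This is a minimal sketch rather than the approach from the answer above. `fill_where` is a hypothetical helper introduced here for illustration, and it assumes the generator takes the number of values to produce as its first argument, as `runif` does.

```r
library(dplyr)

# Hypothetical helper: overwrite x[idx] with f(number of selected rows),
# leaving every other element of x untouched.
fill_where <- function(x, idx, f) {
  x[idx] <- f(sum(idx))
  x
}

picks <- 1:3
a <- tibble(id = 1:4, k = NA_real_)

# runif() is asked for 3 values here, not 4.
res <- a %>% mutate(k = fill_where(k, id %in% picks, runif))
```

Because this stays inside a single mutate() and never filters rows, it should in principle also apply to node data in a tidygraph object, though that has not been tested here.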