One can notice that case_when does not evaluate the right-hand side of a formula only on the subset of the tibble where the condition is met, but on the whole tibble. An example:
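library(dplyr)  # provides tibble(), mutate(), case_when() and %>%

picks <- 1:3
a <- tibble(id = 1:4,
            k = NA)
# Fails: the right-hand side has length 3, but the tibble has 4 rows
a %>% mutate(
  k = case_when(id %in% picks ~ runif(length(picks)))
)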
This is clear in the error:
Error: Problem with `mutate()` column `k`.
i `k = case_when(id %in% picks ~ runif(length(picks)))`.
x `id %in% picks ~ runif(length(picks))` must be length 4 or one, not 3.
An alternative would be to use rowwise() or group_by(id), but that would still be highly inefficient. I would probably still go with rowwise(), but since I have to perform operations on only 1% of the tibble, I just want a mutate within that 1%, with everything else left untouched. Any suggestions to make R perform the minimal number of evaluations?
I thought about a combination of filter and join, but, for example, that would not work for a tidygraph object, because filtering the nodes would filter out edges too, so local_members would no longer work properly.
EDIT:
Also, in my experience, it seems that base::ifelse is faster than dplyr::case_when; is that expected?
dplyr::if_else is faster than base::ifelse. You also need T ~ NA_real_ in the case_when, together with n():
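A minimal sketch of what this could look like, reusing the `picks` and `a` objects from the question. It assumes dplyr is installed; `set.seed()`, the `NA_real_` initial value for `k`, and the names `res`, `idx` and `b` are additions for illustration, not part of the original post. The second half is a hedged alternative using plain logical indexing, which draws random numbers only for the matching rows.

```r
library(dplyr)

set.seed(1)  # assumption: only here to make the sketch reproducible
picks <- 1:3
a <- tibble(id = 1:4, k = NA_real_)

# runif(n()) produces one value per row, so the right-hand side has the same
# length as the condition; TRUE ~ NA_real_ is the explicit fallback (a double
# NA) for rows that do not match.
res <- a %>%
  mutate(k = case_when(
    id %in% picks ~ runif(n()),
    TRUE ~ NA_real_
  ))

# Alternative sketch: evaluate runif() only for the matching rows by
# assigning into a logical index instead of going through case_when().
idx <- a$id %in% picks
b <- a
b$k[idx] <- runif(sum(idx))
```

Note that the case_when() version still generates n() random numbers and discards the ones for non-matching rows, whereas the indexed assignment generates only sum(idx) of them.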