I frequently have to mutate() or select() in data frames with conditions that apply to a subset of variables, but I can't do it consistently.
Toy example data frame:
data<-data.frame(id=c("John", "Jane", "Louis", "Mirian"),
a=c(FALSE, FALSE, TRUE, TRUE),
b=c(FALSE, NA, TRUE, NA),
c=c(TRUE, FALSE, TRUE, TRUE),
num=1:4)
Operation 1: mutate() with rowwise operation on logical variables:
- I want to create a new column "abc_any" that is TRUE when any of a:c is TRUE:
I usually use:
data%>%mutate(abc_any=a|b|c)
id a b c num abc_any
1 John FALSE FALSE TRUE 1 TRUE
2 Jane FALSE NA FALSE 2 NA
3 Louis TRUE TRUE TRUE 3 TRUE
4 Mirian TRUE NA TRUE 4 TRUE
But I can't do it without specifying all variables a:c in chained "|" comparisons.
I tried the following, which gave inconsistent results, and I don't know why:
data%>%mutate(abc_any=Reduce("|", a:c))
id a b c num abc_any
1 John FALSE FALSE TRUE 1 TRUE
2 Jane FALSE NA FALSE 2 TRUE
3 Louis TRUE TRUE TRUE 3 TRUE
4 Mirian TRUE NA TRUE 4 TRUE
This works, but surprisingly coerces abc_any to numeric:
data%>%rowwise()%>%mutate(abc_any=Reduce("|", a:c))
# A tibble: 4 x 6
# Rowwise:
id a b c num abc_any
<fct> <lgl> <lgl> <lgl> <int> <int>
1 John FALSE FALSE TRUE 1 1
2 Jane FALSE NA FALSE 2 0
3 Louis TRUE TRUE TRUE 3 1
4 Mirian TRUE NA TRUE 4 1
This does not work, and throws several error messages:
data%>%rowwise()%>%mutate(abc_any=apply(a:c, 1, any))
Operation 2: filter()
- I sometimes want to filter with a similar condition, but can't:
data%>%filter(a|b|c)
works all right.
This and several variations (rowwise()%>%Reduce(...) and others) failed with various error messages:
data%>%filter(rowwise(Reduce("|", a:c)))
This one did not filter at all, and threw a "numerical expression has 4 elements: only the first used" message:
data%>%filter(Reduce("|", a:c))
Do I necessarily have to create a new 'temp' column with mutate() as above, and then filter?
With binary numeric variables:
- Now let's say these logical variables are coerced to numeric with data_2<-data%>%mutate(across(where(is.logical), as.numeric)):
I tried to use rowSums(), but that failed too:
data_2%>%rowwise()%>%mutate(abc_any=rowSums(a:c, na.rm = TRUE))
Error: Problem with `mutate()` input `abc_any`.
x 'x' must be an array of at least two dimensions
ℹ Input `abc_any` is `rowSums(a:c, na.rm = TRUE)`.
ℹ The error occured in row 1.
What are possible workarounds for these errors?
The Reduce should be applied to a list: select the columns a:c and use Reduce on the result, since a data.frame/tibble is a list as well.
Or, if we need to filter, the same Reduce call can be used as the filter condition.
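A minimal sketch of both cases, assuming dplyr is attached and using the toy data from the question. Inside mutate() or filter(), a bare a:c is the base sequence operator applied to the columns rather than a column selection, which is why the attempts in the question only looked at the first elements of a and c. Wrapping the range in select(., a:c) hands Reduce the three columns as a list instead. The names with_any and kept are only illustrative.

```r
library(dplyr)

data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# select(., a:c) returns a data frame (a list of columns);
# Reduce() then combines the columns pairwise with `|`
with_any <- data %>%
  mutate(abc_any = Reduce(`|`, select(., a:c)))

# The same expression works directly as a filter condition,
# so no temporary column is needed
kept <- data %>%
  filter(Reduce(`|`, select(., a:c)))
```

Note that `|` propagates NA, so a row such as Jane's (FALSE, NA, FALSE) gets NA in abc_any and is dropped by filter().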
Or another option is if_any.
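A sketch of the if_any approach, assuming a dplyr version that provides if_any() (it was added in dplyr 1.0.4). Called with only a column selection, it uses the selected logical columns as they are. The name kept_if_any is only illustrative.

```r
library(dplyr)  # if_any() requires dplyr >= 1.0.4

data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# Keep rows where at least one of the columns a to c is TRUE;
# a:c is a tidyselect range here because it sits inside if_any()
kept_if_any <- data %>%
  filter(if_any(a:c))
```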
Or use reduce from purrr.
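A sketch of the purrr variant, assuming both dplyr and purrr are installed. purrr::reduce() takes the list first and the function second, the reverse of base Reduce(). The name with_any_purrr is only illustrative.

```r
library(dplyr)
library(purrr)

data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# reduce() folds the selected columns together with `|`,
# giving one logical value per row
with_any_purrr <- data %>%
  mutate(abc_any = reduce(select(., a:c), `|`))
```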
Or the code with rowSums.
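A sketch for the numeric case, assuming the data_2 conversion from the question. rowSums() needs a two-dimensional input, so it gets select(., a:c) rather than a:c, and rowwise() is not needed because rowSums() is already vectorised over rows. The final > 0 comparison is my assumption about the desired logical output; drop it to keep the raw counts. The name with_any_num is only illustrative.

```r
library(dplyr)

data <- data.frame(id = c("John", "Jane", "Louis", "Mirian"),
                   a = c(FALSE, FALSE, TRUE, TRUE),
                   b = c(FALSE, NA, TRUE, NA),
                   c = c(TRUE, FALSE, TRUE, TRUE),
                   num = 1:4)

# Coerce the logical columns to 0/1 as in the question
data_2 <- data %>%
  mutate(across(where(is.logical), as.numeric))

# Sum the selected columns per row, ignoring NA, and flag rows
# where the sum is positive (assumed definition of "any")
with_any_num <- data_2 %>%
  mutate(abc_any = rowSums(select(., a:c), na.rm = TRUE) > 0)
```

With na.rm = TRUE a missing value counts as 0, so Jane's row becomes FALSE here instead of NA as in the `|` based versions.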