I want to write a function that accepts column names both as bare symbols and as strings passed in a variable.
Let me show you an example:
The data:
> ( d <- data.frame(A=1:3, B=3:1) )
A B
1 1 3
2 2 2
3 3 1
Now my function:
fn <- function(data, cols) {
return(data %>% mutate(across({{cols}}, ~. * 2)))
}
It works well for:
A) symbolic names
> d %>% fn(cols = A)
A B
1 2 3
2 4 2
3 6 1
> d %>% fn(cols = B)
A B
1 1 6
2 2 4
3 3 2
> d %>% fn(cols = c(A, B))
A B
1 2 6
2 4 4
3 6 2
B) names passed as strings
> column <- "A"
> d %>% fn(cols = column)
A B
1 2 3
2 4 2
3 6 1
> d %>% fn(cols = c("A", "B"))
A B
1 2 6
2 4 4
3 6 2
So far, so good!
Now when I pass an external character vector naming more than one column, it prints a message:
> columns <- c("A", "B")
> d %>% fn(cols = columns)
Note: Using an external vector in selections is ambiguous.
i Use `all_of(columns)` instead of `columns` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
A B
1 2 6
2 4 4
3 6 2
So I added the all_of function, which works well for strings:
fn <- function(data, cols) {
return(data %>% mutate(across(all_of({{cols}}), ~. * 2)))
}
> d %>% fn(cols = columns)
A B
1 2 6
2 4 4
3 6 2
but throws an error when I pass the symbolic name:
> d %>% fn(cols = A)
Error: Problem with `mutate()` input `..1`.
x object 'A' not found
i Input `..1` is `across(all_of(A), ~. * 2)`.
Run `rlang::last_error()` to see where the error occurred.
> d %>% fn(cols = B)
> d %>% fn(cols = c(A, B))
Error: Problem with `mutate()` input `..1`.
x object 'A' not found
i Input `..1` is `across(all_of(c(A, B)), ~. * 2)`.
Run `rlang::last_error()` to see where the error occurred.
How can I fix this so that both approaches work and the warning goes away?
My suggestion would be to keep your original implementation and the warning that comes with it, because the situation really is ambiguous. Consider:
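As a hypothetical illustration (the data frame `d2` and its column named `columns` are made up for this example, and `fn` is the original version from the question), suppose the data contains a column with the same name as the external variable:

```r
library(dplyr)

# Original implementation from the question
fn <- function(data, cols) {
  data %>% mutate(across({{ cols }}, ~ .x * 2))
}

# Hypothetical data frame that happens to have a column called `columns`
d2 <- data.frame(A = 1:3, B = 3:1, columns = 4:6)
columns <- c("A", "B")

# The bare name matches a column in the data, so that column is selected
# and doubled; A and B are left untouched
ambiguous <- fn(d2, cols = columns)

# Wrapping in all_of() states explicitly that `columns` is an external
# vector of names, so A and B are doubled instead
explicit <- fn(d2, cols = all_of(columns))
```

The same call means two different things depending on what the data frame contains, which is why tidyselect asks for the explicit form.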
The users of your function can then resolve the ambiguity by using all_of() themselves, and you can document this in the function's help page.
EDIT: While I recommend the above approach, another way is to check whether the variable exists in the calling environment. If it does, assume that it contains column names and wrap it in all_of(); otherwise, assume that the column names are provided as is:
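A minimal sketch of that idea, assuming rlang is available (it is installed alongside dplyr). The helper names `cols_expr`, `caller`, and `is_external` are illustrative, and the check is deliberately limited to the case where a single bare name was passed:

```r
library(dplyr)

fn <- function(data, cols) {
  # Capture what the user typed without evaluating it
  cols_expr <- rlang::enexpr(cols)
  caller <- rlang::caller_env()

  # Treat the argument as an external vector only if it is a single bare
  # name that resolves to an object visible from the calling environment
  is_external <- rlang::is_symbol(cols_expr) &&
    exists(rlang::as_string(cols_expr), envir = caller)

  if (is_external) {
    data %>% mutate(across(all_of(cols), ~ .x * 2))
  } else {
    data %>% mutate(across({{ cols }}, ~ .x * 2))
  }
}

d <- data.frame(A = 1:3, B = 3:1)
columns <- c("A", "B")

by_symbol   <- fn(d, cols = A)            # bare column name
by_symbols  <- fn(d, cols = c(A, B))      # several bare names
by_variable <- fn(d, cols = columns)      # external character vector
by_strings  <- fn(d, cols = c("A", "B"))  # literal strings
```

Note that this guesses on the user's behalf: if a column shares its name with any object visible from the calling environment (including functions from attached packages), the all_of() branch is taken and the call will likely fail or select the wrong columns. That is the same ambiguity as before, only hidden.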