How to use string manipulation functions inside .names argument in dplyr::across


I searched for duplicates but could not find a similar question. (There is one that looks related, but it differs from what I need.)

My question is whether we can use a string manipulation function such as substr or stringr::str_remove inside the .names argument of dplyr::across. As a reproducible example, consider this:

library(dplyr)
iris %>%
  summarise(across(starts_with('Sepal'), mean, .names = '{.col}_mean'))

  Sepal.Length_mean Sepal.Width_mean
1          5.843333         3.057333

Now my problem is that I want to rename the output columns using, say, str_remove(.col, 'Sepal'), so that the output column names are just Length.mean and Width.mean. I am asking because the description of this argument states that:

.names
A glue specification that describes how to name the output columns. This can use {.col} to stand for the selected column name, and {.fn} to stand for the name of the function being applied. The default (NULL) is equivalent to "{.col}" for the single function case and "{.col}_{.fn}" for the case where a list is used for .fns.

I have tried many possibilities, including the following, but none of them works:

library(tidyverse)
library(glue)
iris %>%
  summarise(across(starts_with('Sepal'), mean, 
                   .names = glue('{xx}_mean', xx = str_remove(.col, 'Sepal'))))

Error: Problem with `summarise()` input `..1`.
x argument `str` should be a character vector (or an object coercible to)
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Run `rlang::last_error()` to see where the error occurred.


#OR
iris %>%
  summarise(across(starts_with('Sepal'), mean, 
                   .names = glue('{xx}_mean', xx = str_remove(glue('{.col}'), 'Sepal'))))

I know that this can be solved by adding another step using rename_with, so I am not looking for that answer.

Best answer

This works, though probably with a few caveats. You can call functions inside a glue specification, so you can clean up the strings that way. However, when I tried escaping the "." in the regular expression, I got an error, which I assume has something to do with how across parses the string. If you need something more dynamic, you might want to dig into the source code at that point.
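One possible way around the escaping problem, offered as a sketch rather than a definitive fix: write the literal dot as the character class "[.]", so that no backslash has to survive both R's string parsing and glue's parsing of the embedded expression. The pattern "^Sepal[.]" and the result name res are my own choices for illustration, not something from the documentation.

```r
library(dplyr)

# Assumption: "[.]" matches a literal dot without needing a backslash,
# which sidesteps the escaping issue inside the glue specification.
res <- iris %>%
  summarise(across(starts_with("Sepal"),
                   .fns = list(mean = mean, max = max),
                   .names = '{stringr::str_remove(.col, "^Sepal[.]")}_{.fn}'))

# Inspect the generated column names.
names(res)
```

Inside the braces, .col is the vector of selected column names, so any vectorised string function should work in the same position.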

In order to use the {.fn} helper, at least in conjunction with creating the glue string on the fly like this, the function needs a name; otherwise you get a number for the function's index in the .fns argument. I tested this out with a second function and using lst for automatic naming.

library(dplyr)
iris %>%
  summarise(across(starts_with('Sepal'), .fns = lst(mean, max), 
                   .names = '{stringr::str_remove(.col, "^[A-Za-z]+.")}_{.fn}'))
#>   Length_mean Length_max Width_mean Width_max
#> 1    5.843333        7.9   3.057333       4.4
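
To see the naming caveat in isolation, here is a small sketch that passes the same two functions as an unnamed list. As far as I can tell, {.fn} then falls back to the position of each function in .fns; the pattern "^Sepal[.]" and the name res_unnamed are illustrative choices of mine.

```r
library(dplyr)

# Same call as above, but with a plain unnamed list() instead of lst(),
# so {.fn} has no function name to insert.
res_unnamed <- iris %>%
  summarise(across(starts_with("Sepal"),
                   .fns = list(mean, max),
                   .names = '{stringr::str_remove(.col, "^Sepal[.]")}_{.fn}'))

# The suffixes are expected to be position indices rather than "mean"/"max".
names(res_unnamed)
```

Naming the list elements, either explicitly with list(mean = mean, max = max) or automatically with lst(mean, max), restores the readable suffixes.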