Grouped means in collapse package

I am trying to calculate grouped means using the collapse package. Below is an example of what I am trying to achieve with data.table.

library(data.table)
library(collapse)

data_1 <- as.data.table(airquality)
var_means <- c(
  "Ozone",
  "Solar.R",
  "Wind"
)
# .SDcols restricts .SD to the three columns being averaged
data_1[, paste0(var_means, "_mean") := lapply(.SD, mean, na.rm = TRUE), by = .(Month), .SDcols = var_means]

Answer by lotus (best answer)

There are at least a couple of ways. Using the dplyr-style syntax:

library(collapse)

var_means <- c(
  "Ozone",
  "Solar.R",
  "Wind"
)

airquality |>
  fgroup_by(Month) |>
  fmutate(across(var_means, fmean, .names = TRUE)) |>
  fungroup()

Or using ftransform():

ftransform(airquality,
           fmean(
             list(
               Ozone_mean = Ozone,
               Solar.R_mean = Solar.R,
               Wind_mean = Wind
             ),
             g = Month,
             TRA = 1
           ))   

Or if you want to pass a character vector of columns you need something like:

ftransform(airquality, 
           fmean(
             do.call(list, lapply(setNames(var_means, paste0(var_means, "_mean")), as.name)),
             g = Month,
             TRA = 1
           ))
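
As a sanity check on what g = Month with TRA = 1 does, the sketch below compares fmean() on a single column with a base R equivalent built from ave(). It assumes that TRA = 1 replaces every value, including missing ones, with its group statistic, and that fmean() skips missing values by default. The names oz_collapse and oz_base are only illustrative.

```r
library(collapse)

# Grouped mean of Ozone, broadcast back to every row of its Month group
oz_collapse <- fmean(airquality$Ozone, g = airquality$Month, TRA = 1)

# Base R counterpart: ave() fills each group with the value returned by FUN
oz_base <- ave(airquality$Ozone, airquality$Month,
               FUN = function(v) mean(v, na.rm = TRUE))

# Compare the two vectors
all.equal(oz_collapse, oz_base)
```

If the two agree, the ftransform() calls above should give the same values as the data.table version in the question.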
Answer by Sebastian

You already got a good answer from Ritchie. I would add that you can pass the function in a named list inside across(), so that the list name (here mean) is used as the suffix of the new columns:

airquality |>
  fgroup_by(Month) |>
  fmutate(across(var_means, list(mean = fmean), .names = TRUE)) |>
  fungroup()
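
To see which columns a given across() call adds, one option is to compare the names before and after. The sketch below uses the illustrative names res and new_cols. With the named list, the added columns are expected to carry the _mean suffix.

```r
library(collapse)

var_means <- c("Ozone", "Solar.R", "Wind")

res <- airquality |>
  fgroup_by(Month) |>
  fmutate(across(var_means, list(mean = fmean), .names = TRUE)) |>
  fungroup()

# Columns present in the result but not in the input
new_cols <- setdiff(names(res), names(airquality))
new_cols
```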

You could also use ftransform with compound pipes and the add_stub function:

library(magrittr)
airquality %>% ftransform(get_vars(., var_means) %>% fmean(Month, TRA = 1) %>% 
                          add_stub("_mean", pre = FALSE)) 

If you don't need to rename the columns, a simple approach is to use settransformv:

settransformv(airquality, var_means, fmean, Month, TRA = 1, apply = FALSE)

This comes very close to what you do with data.table, except that it overwrites the original columns. Here apply = FALSE ensures that fmean.data.frame is applied to the whole subset of the frame, so the grouping only has to be computed once.

A final, hybrid option is fcomputev combined with add_vars<- or ftransform<-. The latter is smarter (it replaces existing columns if executed again), but the former is faster.

add_vars(airquality) <- airquality |> 
    fcomputev(var_means, fmean, Month, TRA = 1, apply = FALSE) |> 
    add_stub("_mean", pre = FALSE)
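
The ftransform<- variant mentioned above is not shown, so here is a sketch of it. It works on a copy called aq (an illustrative name) so that airquality stays untouched. It assumes, as described above, that running the assignment a second time replaces the _mean columns instead of appending duplicates.

```r
library(collapse)

var_means <- c("Ozone", "Solar.R", "Wind")
aq <- airquality  # work on a copy

# Run the same assignment twice to see how repeated execution behaves
for (i in 1:2) {
  ftransform(aq) <- aq |>
    fcomputev(var_means, fmean, Month, TRA = 1, apply = FALSE) |>
    add_stub("_mean", pre = FALSE)
}

# Number of columns added relative to the original data
ncol(aq) - ncol(airquality)
```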