dplyr dynamically create lag and ma features

Question

580 Views Asked by takmers At 07 June 2025 at 09:52

I am trying to create a process that takes in a dataframe and creates additional lagged and rolling window features (e.g. moving average). This is what I have so far.

# dummy dataframe
n <- 20
set.seed(123)
foo <- data.frame(
  date = seq(as.Date('2020-01-01'),length.out = n, by = 'day'),
  var1 = sample.int(n),
  var2 = sample.int(n))

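library(dplyr)  # mutate_at(), lag()
library(zoo)    # rollmean()

# create lags, then build rolling average features on (some of) them
# (funs() is deprecated in current dplyr, so a list of lambdas is used instead)
foo %>% 
  mutate_at(vars(starts_with('var')),
            list(lag_1 = ~lag(.), lag_2 = ~lag(., 2))) %>% 
  mutate_at(vars(contains('lag_1')),
            list(ra_3 = ~rollmean(., k = 3, align = 'right', fill = NA)))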

The above chunk:

- creates lag_1 and lag_2 features for the selected variables;
- creates rolling average features from a subset of the newly created columns.

What I am now looking for is a way to create an arbitrary number of lagged features (e.g. lag3, lag6, lag9 and so on) as well as an arbitrary number of rolling average features of different window lengths (i.e. var1_lag_1_ra_3, var1_lag_1_ra_6, var2_lag_1_ra_3, var2_lag_1_ra_6). At the moment the settings to generate such features are hardcoded. Ideally I would have a couple of vectors to adjust the outcome, like so:

lag_features <- c(3,6,9)
ma_features <- c(12,15)

Lastly, it would be quite nice if there was a way to configure the names of the generated features dynamically. I've seen the {{ }}, !! and := operators, but I am not really in a position to tell the difference between them or how to use them.

I have also implemented the above using some readily available functions from the timetk package, but since I am looking for some additional flexibility, I was wondering how I could replicate such behavior myself.

library(timetk)
foo %>% 
  select(date,starts_with('var')) %>%
  tk_augment_lags(.value = starts_with("var"),
                  .lags = 1) %>% 
  tk_augment_slidify(.value   = ends_with("lag1"),
                     .period  = seq(0,24,3)[-1],
                     .f       = mean,
                     .align   = 'right', 
                     .partial = TRUE
  )

Any support would be really appreciated.

**Ronak Shah** · Answer 1

You can use purrr's map_dfc() to compute the lagged columns for an arbitrary vector of lags, and the .names argument of across() to name the new columns. The rolling means have to be computed in a second step, once the lagged columns exist in foo.

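library(dplyr)
library(purrr)
library(zoo)

lag_features <- c(3,6,9)
ma_features <- c(12,15)

# Step 1: add one set of lagged columns per entry in lag_features
foo <- bind_cols(foo, map_dfc(lag_features, function(n) foo %>% 
                         transmute(across(starts_with('var'), 
                                          ~lag(.x, n), .names = '{.col}_lag{n}'))))

# Step 2: rolling means of the lag3 columns; this must run after step 1,
# otherwise contains('lag3') matches nothing in foo
foo <- bind_cols(foo, map_dfc(ma_features, function(k) foo %>%
                         transmute(across(contains('lag3'), 
                                          ~rollmeanr(.x, k = k, fill = NA),
                                          .names = '{.col}_{k}'))))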
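On the naming part of the question, the three operators do different jobs: `:=` lets the left-hand side of an argument be computed, `!!` injects a value (for example a string holding the new name), and `{{ }}` forwards a column that was passed to your function unquoted. The following is only a minimal sketch under some assumptions: `add_lag_ma()`, `add_one_lag()` and `add_one_lag_str()` are hypothetical helper names (they are not part of dplyr or timetk), the `ra` prefix is an arbitrary choice, and `rollapplyr()` with `mean` is swapped in for `rollmeanr()` so that the leading NAs produced by lagging are handled by `mean()` directly.

```r
library(dplyr)
library(purrr)
library(zoo)

# Same dummy data as in the question
n <- 20
set.seed(123)
foo <- data.frame(
  date = seq(as.Date('2020-01-01'), length.out = n, by = 'day'),
  var1 = sample.int(n),
  var2 = sample.int(n))

# Hypothetical helper: lags `cols` by every value in `lags`, then takes
# right-aligned rolling means (one per value in `windows`) of every column
# whose name ends in "_lag<ma_on_lag>".
add_lag_ma <- function(data, cols, lags, windows, ma_on_lag = lags[1]) {
  # Named lists of functions; across() uses the names for {.fn}
  lag_fns <- map(lags, function(n) {
    force(n)
    function(v) dplyr::lag(v, n)
  }) %>% set_names(paste0("lag", lags))

  ma_fns <- map(windows, function(k) {
    force(k)
    function(v) zoo::rollapplyr(v, width = k, FUN = mean, fill = NA)
  }) %>% set_names(paste0("ra", windows))

  data %>%
    # {{ cols }} forwards the caller's tidyselect expression
    mutate(across({{ cols }}, lag_fns, .names = "{.col}_{.fn}")) %>%
    mutate(across(ends_with(paste0("_lag", ma_on_lag)), ma_fns,
                  .names = "{.col}_{.fn}"))
}

# `:=` with a glue-style name: {{ col }} becomes the column's name,
# {n} is looked up as an ordinary variable
add_one_lag <- function(data, col, n) {
  data %>% mutate("{{ col }}_lag{n}" := dplyr::lag({{ col }}, n))
}

# `!!` injects a name that was built as a plain string;
# .data[[col]] picks the column by its string name
add_one_lag_str <- function(data, col, n) {
  new_name <- paste0(col, "_lag", n)
  data %>% mutate(!!new_name := dplyr::lag(.data[[col]], n))
}

out <- add_lag_ma(foo, starts_with("var"),
                  lags = c(3, 6, 9), windows = c(12, 15))
names(out)
```

With these settings the result should contain columns such as var1_lag3, var1_lag6, var1_lag9 and var1_lag3_ra12, var1_lag3_ra15 (and the same for var2). For many columns the named-list form of across() is usually the more convenient one; `:=` with `{{ }}` or `!!` is mainly useful when a single column is created at a time.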
