Running, Rolling Sum in a specified window in R

Question

Running, Rolling Sum in a specified window in R

308 Views Asked by Bea At 26 April 2023 at 19:11

I want to determine the running total of a column within a specified date window.

id	date	value
7	2023-01-01	1
7	2023-01-03	1
7	2023-01-04	3
7	2023-01-05	2
7	2023-01-06	1
7	2023-01-07	5
7	2023-01-10	3
7	2023-01-14	2
15	2023-01-01	1
15	2023-01-02	1
15	2023-01-04	2
15	2023-01-07	2
15	2023-01-12	1
15	2023-01-13	1
15	2023-01-14	10

Given the table above, I need to determine the max rolling sum for any 7-day window for each id. (first window would 1/1 to 1/7, second 1/2-1/8, etc...)

In other words, for id 7, that's 14 for the 1/4-1/10 windows (3 + 2 + 1 + 5 + 3). For 15, that's 12 for the 1/8-1/14 window (1 + 1 + 10).

I was using the runner library, but I can't quite get what I should be getting.

Edit: This worked if the max is not in the first few windows:

library(runner)

df %>%
group_by(id) %>%
mutate(rsum = runner(value, k=7, idx=date, f=function(x) sum(x), na_pad=T)) %>%
filter(!is.na(rsum)) %>%
summrise(max_rsum = max(rsum))

Original Q&A

There are 2 best solutions below

L Tyrone On 26 April 2023 at 21:38

This approach assumes that each date value is unique within the group it falls in. It involves using complete() to add "dummy" records for each missing day/date that falls within the date range for each group. Doing so means 7 rows == 1 week. Rolling sums for each 7 day period (e.g. 7 rows) can then be calculated using roll_sum() from the RcppRoll package. As an example of the desired output was not provided, it is assumed a summary tibble is sufficient:

library(dplyr)
library(tidyr)
library(RcppRoll)

# Your example table as a dataframe
df <- read.table(text = "id     date    value
7   2023-01-01  1
7   2023-01-03  1
7   2023-01-04  3
7   2023-01-05  2
7   2023-01-06  1
7   2023-01-07  5
7   2023-01-10  3
7   2023-01-14  2
15  2023-01-01  1
15  2023-01-02  1
15  2023-01-04  2
15  2023-01-07  2
15  2023-01-12  1
15  2023-01-13  1
15  2023-01-14  10", header = T)
df$date <- as.Date(df$date)

df %>% 
  arrange(id, date) %>%
  group_by(id) %>%
  complete(date = full_seq(date, 1), fill = list(value = 0)) %>%
  mutate(max_7day_rollsum = roll_sum(value, 7, align = "right", fill = 0)) %>%
  filter(max_7day_rollsum == max(max_7day_rollsum)) %>%
  select(-value)

and the result:

# A tibble: 2 × 3
# Groups:   id [2]
     id date       max_7day_rollsum
  <int> <date>                <dbl>
1     7 2023-01-10               14
2    15 2023-01-14               12

**G. Grothendieck** · Accepted Answer · 2023-04-27T13:02:43.883000

rollapplyr takes as arguments the value, vector of widths (how many values to sum at each point) and the function to apply (sum). Run that for each group and then take the maximum.

library(dplyr, exclude = c("filter", "lag"))
library(zoo)

df %>%
  mutate(date = as.Date(date)) %>% # can omit if already Date class
  group_by(id) %>%
  mutate(max = rollapplyr(value, 1:n() - findInterval(date - 7, date), sum)) %>%
  slice_max(max) %>%
  ungroup %>%
  select(-value)

giving:

# A tibble: 2 × 3
# Groups:   id [2]
     id date         max
  <int> <date>     <int>
1     7 2023-01-10    14
2    15 2023-01-14    12

Note

The input in reproducible form

df <- structure(list(id = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 15L, 15L, 
15L, 15L, 15L, 15L, 15L), date = c("2023-01-01", "2023-01-03", 
"2023-01-04", "2023-01-05", "2023-01-06", "2023-01-07", "2023-01-10", 
"2023-01-14", "2023-01-01", "2023-01-02", "2023-01-04", "2023-01-07", 
"2023-01-12", "2023-01-13", "2023-01-14"), value = c(1L, 1L, 
3L, 2L, 1L, 5L, 3L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 10L)), 
class = "data.frame", row.names = c(NA, -15L))

Running, Rolling Sum in a specified window in R

There are 2 best solutions below

Note

Related Questions in R

Related Questions in RUNNER

Related Questions in ROLLING-SUM

Trending Questions

Popular # Hahtags

Popular Questions