I want to determine the running total of a column within a specified date window.
| id | date | value |
|---|---|---|
| 7 | 2023-01-01 | 1 |
| 7 | 2023-01-03 | 1 |
| 7 | 2023-01-04 | 3 |
| 7 | 2023-01-05 | 2 |
| 7 | 2023-01-06 | 1 |
| 7 | 2023-01-07 | 5 |
| 7 | 2023-01-10 | 3 |
| 7 | 2023-01-14 | 2 |
| 15 | 2023-01-01 | 1 |
| 15 | 2023-01-02 | 1 |
| 15 | 2023-01-04 | 2 |
| 15 | 2023-01-07 | 2 |
| 15 | 2023-01-12 | 1 |
| 15 | 2023-01-13 | 1 |
| 15 | 2023-01-14 | 10 |
Given the table above, I need to determine the max rolling sum for any 7-day window for each id. (first window would 1/1 to 1/7, second 1/2-1/8, etc...)
In other words, for id 7, that's 14 for the 1/4-1/10 windows (3 + 2 + 1 + 5 + 3). For 15, that's 12 for the 1/8-1/14 window (1 + 1 + 10).
I was using the runner library, but I can't quite get what I should be getting.
Edit: This worked if the max is not in the first few windows:
library(runner)
df %>%
group_by(id) %>%
mutate(rsum = runner(value, k=7, idx=date, f=function(x) sum(x), na_pad=T)) %>%
filter(!is.na(rsum)) %>%
summrise(max_rsum = max(rsum))
rollapplyrtakes as arguments the value, vector of widths (how many values to sum at each point) and the function to apply (sum). Run that for each group and then take the maximum.giving:
Note
The input in reproducible form