Calculating rolling average of the past ~365 days with non-continuous timeseries data

60 Views Asked by At

I have a dataframe like this

        date   values
1 2018-09-11 2.049659
2 2018-09-19 2.537952
3 2018-09-24 1.591455
4 2018-10-01 1.012579
5 2018-10-08 1.382486
6 2018-10-15 2.533606
...
193 2023-12-19 1.262344

I am looking to calculate the cumulative mean of values over the last (approximately) 365 days. The issue I am running into is that the entries are not evenly spaced; days between entries range from 2 and 35 days. The approach I have in my head is to somehow identify the date which is nearest to 365(+/-) days from each entry and calculate the mean of all values between that range, but I am struggling to implement this approach.

For example if the date is 2023-12-19, and there are dates of 2022-12-21 and 2022-12-14, I would want the mean to be calculated between 2023-12-19 and 2022-12-21.

The desired output dataframe would simply add a column to the existing dataframe and have a new column Annual Rolling Mean of values where there are NAs up to the first date ~365 days from 2018-09-11, and then a continuous mean through the last date of 2023-12-19.

The closest I have gotten is using slider::slide_dbl, but this is clearly not a correct approach with non-continuous timeseries data as it is not using the dates, just number of rows.

Reproducible Data

library(tidyverse)
library(slider)

dat <- data.frame(date = as.Date(
          c("2018-09-11", "2018-09-19", "2018-09-24", "2018-10-01", "2018-10-08", 
            "2018-10-15", "2018-10-23", "2018-10-31", "2018-11-05", "2018-11-13", 
            "2018-11-19", "2018-11-21", "2018-11-28", "2018-12-04", "2018-12-11", 
            "2018-12-18", "2018-12-26", "2019-01-02", "2019-01-08", "2019-01-15", 
            "2019-01-22", "2019-01-29", "2019-02-05", "2019-02-12", "2019-02-19", 
            "2019-02-26", "2019-03-05", "2019-03-12", "2019-03-19", "2019-03-26", 
            "2019-04-02", "2019-04-09", "2019-04-16", "2019-04-23", "2019-04-30", 
            "2019-05-07", "2019-05-14", "2019-05-21", "2019-05-28", "2019-06-04", 
            "2019-06-11", "2019-06-18", "2019-06-25", "2019-07-02", "2019-07-09", 
            "2019-07-16", "2019-07-30", "2019-08-13", "2019-08-27", "2019-09-10", 
            "2019-09-24", "2019-10-02", "2019-10-08", "2019-10-22", "2019-11-05", 
            "2019-11-19", "2019-12-03", "2019-12-17", "2019-12-31", "2020-01-14", 
            "2020-01-28", "2020-02-11", "2020-02-25", "2020-03-10", "2020-04-07", 
            "2020-05-05", "2020-06-02", "2020-07-07", "2020-08-05", "2020-08-18", 
            "2020-09-01", "2020-09-15", "2020-09-29", "2020-10-06", "2020-10-13", 
            "2020-10-20", "2020-10-27", "2020-11-10", "2020-11-24", "2020-12-08", 
            "2020-12-22", "2021-01-05", "2021-01-19", "2021-02-02", "2021-02-16", 
            "2021-03-02", "2021-03-16", "2021-03-30", "2021-04-13", "2021-04-27", 
            "2021-04-29", "2021-05-11", "2021-05-25", "2021-06-08", "2021-06-22", 
            "2021-07-07", "2021-07-20", "2021-08-03", "2021-08-17", "2021-08-31", 
            "2021-09-14", "2021-09-28", "2021-10-12", "2021-10-13", "2021-10-19", 
            "2021-10-20", "2021-10-26", "2021-11-09", "2021-11-23", "2021-12-07", 
            "2021-12-21", "2022-01-04", "2022-01-19", "2022-02-01", "2022-02-15", 
            "2022-03-01", "2022-03-15", "2022-03-29", "2022-04-12", "2022-04-26", 
            "2022-05-10", "2022-05-24", "2022-06-07", "2022-06-21", "2022-07-06", 
            "2022-07-19", "2022-08-02", "2022-08-16", "2022-08-30", "2022-09-13", 
            "2022-10-11", "2022-10-24", "2022-11-01", "2022-11-08", "2022-11-15", 
            "2022-11-22", "2022-11-29", "2022-12-06", "2022-12-13", "2022-12-20", 
            "2022-12-29", "2023-01-04", "2023-01-10", "2023-01-18", "2023-01-24", 
            "2023-01-31", "2023-02-07", "2023-02-14", "2023-02-21", "2023-02-28", 
            "2023-03-07", "2023-03-14", "2023-03-21", "2023-03-28", "2023-04-04", 
            "2023-04-11", "2023-04-18", "2023-04-25", "2023-05-02", "2023-05-09", 
            "2023-05-16", "2023-05-23", "2023-05-31", "2023-06-06", "2023-06-13", 
            "2023-06-20", "2023-06-27", "2023-07-05", "2023-07-11", "2023-07-18", 
            "2023-07-25", "2023-08-01", "2023-08-08", "2023-08-15", "2023-08-22", 
            "2023-08-29", "2023-09-05", "2023-09-12", "2023-09-19", "2023-09-26", 
            "2023-09-26", "2023-10-03", "2023-10-10", "2023-10-17", "2023-10-24", 
            "2023-10-31", "2023-11-07", "2023-11-14", "2023-11-21", "2023-11-28", 
            "2023-12-05", "2023-12-12", "2023-12-19")),
            values = runif(193, min = 0, max = 3))

## there is sometimes NA values

dat[75, 2] <- NA

## my closest attempt

rolling_mean <- dat %>%
  mutate(year = lubridate::year(date),
         days_since_last = date - lag(date, n = 1 )) %>%
  group_by(year, isna = is.na(values)) %>%
  mutate(`Annual Rolling Mean` = round(slider::slide_dbl(values, mean, .before = 52), 2)) %>%
  ungroup() %>%
  fill(`Annual Rolling Mean`)
2

There are 2 best solutions below

0
Jon Spring On

This will show the average value (ignoring NAs) for a rolling window including the 365 days up to and including the current date.

dat |>
  # arrange(date) |> # slider_index_* needs to receive dates sorted
  mutate(`Annual Rolling Mean` = round(slider::slide_index_dbl(
    values, date, mean, na.rm = TRUE, .before = 364), 2))
0
Adriano Mello On

The sample data contains two "2023-09-26". The dates are supposed to be unique, right? I guess this solution gives you more control over the window.

library(purrr)
library(lubridate)
# -----------------

dat$arm <- map_dbl(
  .x = dat$date, 
  .f = \(x) mean(dat$values[dat$date %in% (x-days(1:365))], na.rm = TRUE))