I have a dataframe like this
date values
1 2018-09-11 2.049659
2 2018-09-19 2.537952
3 2018-09-24 1.591455
4 2018-10-01 1.012579
5 2018-10-08 1.382486
6 2018-10-15 2.533606
...
193 2023-12-19 1.262344
I am looking to calculate the cumulative mean of values over the last (approximately) 365 days. The issue I am running into is that the entries are not evenly spaced; days between entries range from 2 and 35 days. The approach I have in my head is to somehow identify the date which is nearest to 365(+/-) days from each entry and calculate the mean of all values between that range, but I am struggling to implement this approach.
For example if the date is 2023-12-19, and there are dates of 2022-12-21 and 2022-12-14, I would want the mean to be calculated between 2023-12-19 and 2022-12-21.
The desired output dataframe would simply add a column to the existing dataframe and have a new column Annual Rolling Mean of values where there are NAs up to the first date ~365 days from 2018-09-11, and then a continuous mean through the last date of 2023-12-19.
The closest I have gotten is using slider::slide_dbl, but this is clearly not a correct approach with non-continuous timeseries data as it is not using the dates, just number of rows.
Reproducible Data
library(tidyverse)
library(slider)
dat <- data.frame(date = as.Date(
c("2018-09-11", "2018-09-19", "2018-09-24", "2018-10-01", "2018-10-08",
"2018-10-15", "2018-10-23", "2018-10-31", "2018-11-05", "2018-11-13",
"2018-11-19", "2018-11-21", "2018-11-28", "2018-12-04", "2018-12-11",
"2018-12-18", "2018-12-26", "2019-01-02", "2019-01-08", "2019-01-15",
"2019-01-22", "2019-01-29", "2019-02-05", "2019-02-12", "2019-02-19",
"2019-02-26", "2019-03-05", "2019-03-12", "2019-03-19", "2019-03-26",
"2019-04-02", "2019-04-09", "2019-04-16", "2019-04-23", "2019-04-30",
"2019-05-07", "2019-05-14", "2019-05-21", "2019-05-28", "2019-06-04",
"2019-06-11", "2019-06-18", "2019-06-25", "2019-07-02", "2019-07-09",
"2019-07-16", "2019-07-30", "2019-08-13", "2019-08-27", "2019-09-10",
"2019-09-24", "2019-10-02", "2019-10-08", "2019-10-22", "2019-11-05",
"2019-11-19", "2019-12-03", "2019-12-17", "2019-12-31", "2020-01-14",
"2020-01-28", "2020-02-11", "2020-02-25", "2020-03-10", "2020-04-07",
"2020-05-05", "2020-06-02", "2020-07-07", "2020-08-05", "2020-08-18",
"2020-09-01", "2020-09-15", "2020-09-29", "2020-10-06", "2020-10-13",
"2020-10-20", "2020-10-27", "2020-11-10", "2020-11-24", "2020-12-08",
"2020-12-22", "2021-01-05", "2021-01-19", "2021-02-02", "2021-02-16",
"2021-03-02", "2021-03-16", "2021-03-30", "2021-04-13", "2021-04-27",
"2021-04-29", "2021-05-11", "2021-05-25", "2021-06-08", "2021-06-22",
"2021-07-07", "2021-07-20", "2021-08-03", "2021-08-17", "2021-08-31",
"2021-09-14", "2021-09-28", "2021-10-12", "2021-10-13", "2021-10-19",
"2021-10-20", "2021-10-26", "2021-11-09", "2021-11-23", "2021-12-07",
"2021-12-21", "2022-01-04", "2022-01-19", "2022-02-01", "2022-02-15",
"2022-03-01", "2022-03-15", "2022-03-29", "2022-04-12", "2022-04-26",
"2022-05-10", "2022-05-24", "2022-06-07", "2022-06-21", "2022-07-06",
"2022-07-19", "2022-08-02", "2022-08-16", "2022-08-30", "2022-09-13",
"2022-10-11", "2022-10-24", "2022-11-01", "2022-11-08", "2022-11-15",
"2022-11-22", "2022-11-29", "2022-12-06", "2022-12-13", "2022-12-20",
"2022-12-29", "2023-01-04", "2023-01-10", "2023-01-18", "2023-01-24",
"2023-01-31", "2023-02-07", "2023-02-14", "2023-02-21", "2023-02-28",
"2023-03-07", "2023-03-14", "2023-03-21", "2023-03-28", "2023-04-04",
"2023-04-11", "2023-04-18", "2023-04-25", "2023-05-02", "2023-05-09",
"2023-05-16", "2023-05-23", "2023-05-31", "2023-06-06", "2023-06-13",
"2023-06-20", "2023-06-27", "2023-07-05", "2023-07-11", "2023-07-18",
"2023-07-25", "2023-08-01", "2023-08-08", "2023-08-15", "2023-08-22",
"2023-08-29", "2023-09-05", "2023-09-12", "2023-09-19", "2023-09-26",
"2023-09-26", "2023-10-03", "2023-10-10", "2023-10-17", "2023-10-24",
"2023-10-31", "2023-11-07", "2023-11-14", "2023-11-21", "2023-11-28",
"2023-12-05", "2023-12-12", "2023-12-19")),
values = runif(193, min = 0, max = 3))
## there is sometimes NA values
dat[75, 2] <- NA
## my closest attempt
rolling_mean <- dat %>%
mutate(year = lubridate::year(date),
days_since_last = date - lag(date, n = 1 )) %>%
group_by(year, isna = is.na(values)) %>%
mutate(`Annual Rolling Mean` = round(slider::slide_dbl(values, mean, .before = 52), 2)) %>%
ungroup() %>%
fill(`Annual Rolling Mean`)
This will show the average value (ignoring NAs) for a rolling window including the 365 days up to and including the current date.