Rolling mean across day of year

In my example below, I can calculate a centered 7-day rolling mean; however, the first three days and the last three days are NA. The rolling mean should take into account that day 365 is followed by day 1 and use that in the calculation. How can I calculate a 7-day rolling mean so that there are no NA values?

library(tidyverse)
library(zoo)

set.seed(321)

aa <- data.frame(
  doy = seq(1,365,1),
  value = round(rnorm(365,30,5))
)

bb <- aa %>%
  mutate(movingAVG = round(rollmean(value, k = 7, align = 'center', fill = NA)))

head(bb)
#>   doy value movingAVG
#> 1   1    39        NA
#> 2   2    26        NA
#> 3   3    29        NA
#> 4   4    29        31
#> 5   5    29        30
#> 6   6    31        31

tail(bb)
#>     doy value movingAVG
#> 360 360    24        30
#> 361 361    38        29
#> 362 362    30        29
#> 363 363    20        NA
#> 364 364    26        NA
#> 365 365    29        NA

Created on 2023-11-29 with reprex v2.0.2

Answer by jared_mamrot (best answer)

One potential option is to stack three copies of your "aa" dataframe (i.e. 1-365 + 1-365 + 1-365), calculate the rolling mean over all 1095 rows, then keep only the middle copy (index == 2), whose first and last days now have real neighbours on both sides, e.g.

library(tidyverse)
library(zoo)

set.seed(321)

aa <- data.frame(
  doy = seq(1,365,1),
  value = round(rnorm(365,30,5))
)

bb <- aa %>%
  bind_rows(aa, .id = "index") %>%
  bind_rows(aa) %>%
  mutate(movingAVG = round(rollmean(value, k = 7, align = 'center', fill = NA))) %>%
  filter(index == 2) %>%
  select(-index)

head(bb)
#>   doy value movingAVG
#> 1   1    39        28
#> 2   2    26        30
#> 3   3    29        30
#> 4   4    29        31
#> 5   5    29        30
#> 6   6    31        31
tail(bb)
#>     doy value movingAVG
#> 360 360    24        30
#> 361 361    38        29
#> 362 362    30        29
#> 363 363    20        29
#> 364 364    26        30
#> 365 365    29        28

Created on 2023-11-30 with reprex v2.0.2

Does that make sense?
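
A lighter variant of the same idea, as a minimal sketch: a centered window of width k only ever reaches k %/% 2 values past either end, so it is enough to pad with the last three and first three values instead of tripling the data. The helper name circular_rollmean is hypothetical (it is not part of zoo or dplyr), and the sketch assumes an odd window width and a series with no gaps.

```r
# Hypothetical helper: centered rolling mean on a circular (wrap-around) series.
# Assumes k is odd and x has no missing values.
circular_rollmean <- function(x, k = 7) {
  stopifnot(k %% 2 == 1, length(x) >= k)
  h <- k %/% 2                      # number of neighbours on each side
  n <- length(x)
  # last h values + full series + first h values
  padded <- c(x[(n - h + 1):n], x, x[1:h])
  # equal weights with sides = 2 give a centered moving average
  ma <- stats::filter(padded, rep(1 / k, k), sides = 2)
  # drop the padding again so the result lines up with x
  as.numeric(ma[(h + 1):(h + n)])
}

set.seed(321)
value <- round(rnorm(365, 30, 5))
mavg <- round(circular_rollmean(value, k = 7))

# Base R can also wrap the window itself via circular = TRUE
wrapped <- as.numeric(
  stats::filter(value, rep(1 / 7, 7), sides = 2, circular = TRUE)
)
stopifnot(isTRUE(all.equal(circular_rollmean(value, 7), wrapped)))
```

The stats:: prefix matters when the tidyverse is loaded, because dplyr masks filter() with its own row-filtering function.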

Answer by jay.sf

Simply reusing the same data for the preceding and following year doesn't make much sense, does it? Alternatively, you could pad value at both ends by the number of missing values the rolling average would generate and fill the padding by extrapolation (approx() with rule = 2 repeats the nearest observed value) using this function,

> f <- \(x, n) {
+   na <- replicate(2, rep_len(NA, floor(n/2)), simplify=FALSE)
+   if (n %% 2 == 0) {
+     na[[1]] <- `length<-`(na[[1]], n/2 - 1L)
+   }
+   u <- unlist(c(na, list(x))[c(1, 3, 2)])
+   approx(u, xout=seq_along(u), rule=2)$y
+ }

then calculate the rolling average and drop the NAs afterwards.

Here it is with data.table:

> n <- 7
> library(data.table)
> setDT(aa)[, mavg := approx(
+   round(na.omit(frollmean(f(value, n), n, align='c'))), 
+   xout=seq_len(nrow(aa)), rule=2)$y]
> aa
     doy value mavg
  1:   1    39   34
  2:   2    26   33
  3:   3    29   32
  4:   4    29   31
  5:   5    29   30
 ---               
361: 361    38   29
362: 362    30   29
363: 363    20   28
364: 364    26   29
365: 365    29   27

Of course, you could fit a model and predict the tails instead of using this simple extrapolation.
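
To make the padding step in f() concrete, here is a small base-R sketch on a toy vector (the values are made up for illustration). It shows what approx() does with the NA padding when rule = 2 is set: points outside the observed range take the value of the nearest observed point, so the tails are flat.

```r
x <- c(10, 20, 40)

# pad two NAs on each side, as f() would for n = 5
u <- c(NA, NA, x, NA, NA)

# with y omitted, approx() treats u as the y values at positions 1..length(u);
# NA pairs are dropped, and rule = 2 fills positions outside the data range
# with the value at the closest data extreme
filled <- approx(u, xout = seq_along(u), rule = 2)$y
filled
#> [1] 10 10 10 20 40 40 40
```

In other words, for day 1 the rolling window sees four copies of the first value plus days 2 to 4, which is why mavg starts at 34 in the output above.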