rolling function with variable width R


I need to summarize some data using a rolling window of variable width and shift. In particular, I need to apply a function (e.g. sum) over values recorded at irregular intervals.
Here is an example data frame:

library(tibble)

df <- tibble(days = c(0,1,2,3,1),
             value = c(5,7,3,4,2))
df
# A tibble: 5 x 2
   days value
  <dbl> <dbl>
1     0     5
2     1     7
3     2     3
4     3     4
5     1     2

The columns indicate:
days: how many days have elapsed since the previous observation. The first value is 0 because there is no previous observation.
value: the value I need to aggregate.

Now, let's assume that I need to sum the field value over every 4 days, shifting 1 day at a time.
I need something along these lines:

   days value roll_sum rows_to_sum
      0     5       15 1,2,3      
      1     7       10 2,3        
      2     3        3 3          
      3     4        6 4,5        
      1     2       NA NA 

The column rows_to_sum has been added to make it clear.
Here are more details:

  1. The first value (15) is the sum of the first 3 rows, because 0+1+2 = 3, which is less than the reference value 4, and adding the next row (with days = 3) would bring the total day count to 6, which is more than 4.
  2. The second value (10) is the sum of rows 2 and 3. Excluding the first row (since we are shifting one day), we only sum rows 2 and 3, because including row 4 would bring the total sum of days to 1+2+3 = 6, which is more than 4.
    ...

How can I achieve this? Thank you

Best answer

Here is one way:

library(dplyr)
library(purrr)

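df %>%
  mutate(roll_sum = map_dbl(row_number(), ~{
    # rows from .x onward whose cumulative days stay within the 4-day window
    i <- which(cumsum(days[.x:n()]) <= 4)
    # which() returns integer(0) when no row fits, so test the length
    if (length(i) == 0) NA_real_ else sum(value[.x:(.x + max(i) - 1)])
  }))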

#   days value roll_sum
#  <dbl> <dbl>    <dbl>
#1     0     5       15
#2     1     7       10
#3     2     3        3
#4     3     4        6
#5     1     2        2

The same calculation in base R:

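sapply(seq_len(nrow(df)), function(x) {
  # rows from x onward whose cumulative days stay within the 4-day window
  i <- which(cumsum(df$days[x:nrow(df)]) <= 4)
  # which() returns integer(0) when no row fits, so test the length
  if (length(i) == 0) NA_real_ else sum(df$value[x:(x + max(i) - 1)])
})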
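
If the window width needs to vary, or you also want the rows_to_sum column from the question, the same logic could be wrapped in a small function. This is only a sketch: roll_window_sum and its width argument are hypothetical names, not part of any package, and it assumes days is never negative. Like the code above, it returns 2 for the last row rather than the NA shown in the question.

```r
# Hypothetical helper: for each row, take the longest run of rows starting
# there whose cumulative `days` stays <= width, then sum `value` over that run.
roll_window_sum <- function(days, value, width = 4) {
  n <- length(days)
  roll_sum <- rep(NA_real_, n)
  rows_to_sum <- rep(NA_character_, n)
  for (start in seq_len(n)) {
    # positions (relative to `start`) that are still inside the window
    inside <- which(cumsum(days[start:n]) <= width)
    # no row fits (days[start] alone exceeds width): leave NA
    if (length(inside) == 0) next
    end <- start + max(inside) - 1
    roll_sum[start] <- sum(value[start:end])
    rows_to_sum[start] <- paste(start:end, collapse = ",")
  }
  data.frame(days, value, roll_sum, rows_to_sum)
}

df <- data.frame(days = c(0, 1, 2, 3, 1), value = c(5, 7, 3, 4, 2))
res <- roll_window_sum(df$days, df$value, width = 4)
print(res)
```

A plain loop is used instead of sapply so that both output columns can be filled in one pass; for long vectors a vectorised or compiled approach would likely be faster.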