Accumulate based on lagged values and values in different columns in R

44 Views Asked by At

I have the following dataset:

df <- data.frame(fractile = 1:10) %>% mutate(bracketaverage = fractile*100, topaverage = 10:19)

Which looks like:

 fractile bracketaverage topaverage
1         1            100         10
2         2            200         11
3         3            300         12
4         4            400         13
5         5            500         14
6         6            600         15
7         7            700         16
8         8            800         17
9         9            900         18
10       10           1000         19

I wish to modify the topaverage column by utilizing input from the remaining columns according to the specified formula:

topaverage = ( (100-lag(fractile) ) * lag(topaverage) + ( lag(fractile) - fractile ) * bracket_average ) / (100-fractile)

Hence, the formula calculates the topaverage recursively, meaning that each iteration relies on the topaverage from the preceding step.

1

There are 1 best solutions below

0
r2evans On BEST ANSWER

While many of purrr's functions have func2-versions for two arguments, two factors make it not easy to use them here: you need three variables, and you need both the lag and current version of at least one of them. Because of that, I think we can purrr::accumulate over the row indices themselves.

library(dplyr)
library(purrr) # accumulate
df |>
  mutate(
    val = accumulate(
      row_number(), .init = first(topaverage),
      .f = function(ta, i) if (i == 1) ta else {
        ( (100 - fractile[i-1]) * topaverage[i-1] +
            (fractile[i-1] - fractile[i]) * bracketaverage[i] ) /
          (100 - fractile[i])
      })[-1]
  )
#    fractile bracketaverage topaverage       val
# 1         1            100         10 10.000000
# 2         2            200         11  8.061224
# 3         3            300         12  8.020619
# 4         4            400         13  7.958333
# 5         5            500         14  7.873684
# 6         6            600         15  7.765957
# 7         7            700         16  7.634409
# 8         8            800         17  7.478261
# 9         9            900         18  7.296703
# 10       10           1000         19  7.088889

where fractile[i-1] is giving us lag(fractile) and fractile[i] is giving us the current-row fractile.

I assume that the first topaverage is retained unmodified. If that is not true, then this might need adjusting to account for aligning in the other direction ([i] and [i+1] instead, updating the leading conditional).