Calculating a rolling mean by group across repeated measurements?

1.3k Views Asked by At

I have a data.frame that has the the following vectors: NAME, JUMP.NUMBER, POWER. These variables are obtained by having performed repeated jumps from one to 20 from which I obtain power.

I want to determine the best rolling means for POWER from 1 to 20 jumps by each subject and then create a new data.frame that contains these values.

While it takes me a long time, I can subset my data by 'NAME', calculate rolling means using the rollmean() function in zoo, find the maximum values from each of these new data.frames, and then create a new data.set with these values. However, this is incredibly slow.

My code looks like this:

sample<-subset(JUMP.DATA, NAME=="Bob")
ROLLING1<-rollmean(sample,1)
ROLLING2<-rollmean(sample,2)
ROLLING3<-rollmean(sample,3)
ROLLING4<-rollmean(sample,4)
MAXROLLING4<- max(ROLLING4)
MAXROLLING1<- max(ROLLING1)
MAXROLLING2<- max(ROLLING2)
MAXROLLING3<- max(ROLLING3)
NUMBER=c(1, 2, 3, 4)
ROLLING.POWER=c(MAXROLLING1, MAXROLLING2, MAXROLLING3, MAXROLLING4)
BEST.ROLLING.MEAN <-cbind(NUMBER, ROLLING.POWER)

I'm sure there is a much more straightforward method to calculate a rolling mean ~ Group. Any help would be appreciated.

The original data.set would look like this:

NAME=c(Bob, Bob, Bob, Bob, John, John, John, John)
JUMP.NUMBER= c(1, 2, 3, 4, 1, 2, 3, 4)
POWER = c(3000, 2800, 2700, 2600, 3400, 3100, 2900, 2800)
JUMP.DATA= cbind(NAME, JUMP.NUMBER, POWER)
2

There are 2 best solutions below

7
On BEST ANSWER

Here's a data.table solution.

library(data.table)
library(zoo)
get.power <- function(p)sapply(1:length(p),function(k)max(rollmean(p,k)))
setDT(JUMP.DATA)[,ROLLING.POWER:=get.power(POWER),by=NAME]
JUMP.DATA
   NAME JUMP.NUMBER POWER ROLLING.POWER
1:  Bob           1  3000      3000.000
2:  Bob           2  2800      2900.000
3:  Bob           3  2700      2833.333
4:  Bob           4  2600      2775.000
5: John           1  3400      3400.000
6: John           2  3100      3250.000
7: John           3  2900      3133.333
8: John           4  2800      3050.000

Response to @Arun's comment

So here is a benchmark comparing the runmean(...) in caTools to rollmean(...) in zoo. The former is about 4 X faster. Note that the defaults are different though.

# 26 names, each with 20 jumps
set.seed(1)  # for reproducibility
JUMP.DATA   <- data.frame(NAME=rep(LETTERS,each=20),
                          JUMP.NUMBER=1:20,
                          POWER=100*rpois(20*26,10))
DT1 <- as.data.table(JUMP.DATA)
DT2 <- as.data.table(JUMP.DATA)
gp.zoo     <- function(p)sapply(1:length(p),function(k)max(rollmean(p,k)))
gp.caTools <- function(p)sapply(1:length(p),function(k)max(runmean(p,k,endrule="trim")))
f.zoo      <- function() DT1[,ROLLING.POWER:=gp.zoo(POWER),by=NAME]
f.caTools  <- function() DT2[,ROLLING.POWER:=gp.caTools(POWER),by=NAME]
library(microbenchmark)
microbenchmark(f.zoo(),f.caTools(),times=10)
# Unit: milliseconds
#         expr      min       lq   median       uq     max neval
#      f.zoo() 423.6837 427.5221 433.4347 435.8686 469.825    10
#  f.caTools() 102.6002 107.1747 108.7353 109.3258 111.931    10
identical(DT1,DT2)
# [1] TRUE
2
On

It's not totally clear to me what you are looking for here, but this seems like a good use of the dplyr package.

You can calculate rolling means by a particular group using the following, where JUMP.DATA is your dataset and NAME is the variable that you are grouping over. This will add a new column to your dataframe (which I have renamed "xx" so as not to edit your original one) called "ROLLING":

xx <- group_by(JUMP.DATA,NAME) %>% mutate(ROLLING=rollmean(POWER))

If you then want to find the maximum rolling mean within each of these groups you can then use the summarise, which will give you a single value for each group.

my.summary <- group_by(xx,NAME) %>% summarise(MY.MAX=max(ROLLING))