I have a very large data frame, and my goal is to compute the cumulative USD by user ID. The data frame looks like this, but it is much larger:
```r
dt    <- sample(seq(as.Date("2013-01-01"), as.Date("2013-05-01"), by = "days"), 10)
s     <- c(rep(5252525, 5), rep(1313131, 5))
usd   <- round(rnorm(10, 100), 2)
money <- data.frame(dt, s, usd)
money <- money[order(money$dt), ]
money$Cumulative <- NA
users <- unique(money$s)
```
I started with a for loop, but it was very slow:
```r
for (i in 1:length(users)) {
  temp <- which(money$s == users[i])
  money$Cumulative[temp] <- cumsum(money$usd[temp])
}
```
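To see how badly the loop scales, it can be timed on a larger simulated version of the data. This is a sketch, not part of the original question; the sizes (`n_users`, `n_rows`) are arbitrary:

```r
# Simulate a larger data set in the shape of the question's example
set.seed(1)
n_users <- 200
n_rows  <- 100000
money <- data.frame(
  dt  = sample(seq(as.Date("2013-01-01"), as.Date("2013-05-01"), by = "days"),
               n_rows, replace = TRUE),
  s   = sample(seq_len(n_users), n_rows, replace = TRUE),
  usd = round(rnorm(n_rows, 100), 2)
)
money <- money[order(money$dt), ]
money$Cumulative <- NA
users <- unique(money$s)

# Time the per-user loop from the question; each iteration scans the
# whole s column with which(), so cost grows with rows * users
system.time({
  for (i in 1:length(users)) {
    temp <- which(money$s == users[i])
    money$Cumulative[temp] <- cumsum(money$usd[temp])
  }
})
```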
I read on Stack Overflow that I could use data.table to improve overall speed, and this helped somewhat:
```r
library(data.table)
money <- data.table(money)
setkey(money, s)
for (i in 1:length(users)) {
  temp <- which(money$s == users[i])
  money$Cumulative[temp] <- cumsum(money$usd[temp])
}
```
I'd like to make this calculation even faster. What should I do next?
Since `money` is already ordered by the `dt` column, you can just use `ave()` to compute the grouped cumulative sum in one vectorized call. Or you can use `data.table`, which does the same grouped operation without an explicit loop.
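The answer's code blocks appear to have been lost in extraction; a minimal sketch of what the `ave()` and `data.table` solutions would look like, using the column and object names from the question (the small `money` data frame here is hypothetical example data so the snippet is self-contained):

```r
# Hypothetical example data in the shape of the question, already ordered by dt
money <- data.frame(
  dt  = as.Date("2013-01-01") + 0:5,
  s   = c(5252525, 1313131, 5252525, 1313131, 5252525, 1313131),
  usd = c(10, 20, 30, 40, 50, 60)
)

# Base R: ave() applies cumsum() to usd within each group defined by s,
# returning a vector aligned with the original rows
money$Cumulative <- ave(money$usd, money$s, FUN = cumsum)

# data.table: grouped cumulative sum assigned by reference, no explicit loop
library(data.table)
money_dt <- as.data.table(money[, c("dt", "s", "usd")])
money_dt[, Cumulative := cumsum(usd), by = s]
```

Both replace the per-user loop with a single grouped pass over the data, which is where the speedup over the `which()`-based loop comes from.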