Sum certain values from changing dataframe in R


I have a data frame that I would like to aggregate by adding certain values. Say I have six clusters. I then feed data from each cluster into some function that generates a value x which is then put into the output data frame.

   cluster year      lambda           v            e   x
1        1    1 -0.12160997 -0.31105287 -0.253391178  15
2        1    2 -0.12160997 -1.06313732 -0.300349972  10
3        1    3 -0.12160997 -0.06704185  0.754397069  40
4        2    1 -0.07378295 -0.31105287 -1.331764904   4
5        2    2 -0.07378295 -1.06313732  0.279413039  19
6        2    3 -0.07378295 -0.06704185 -0.004581941  23
7        3    1 -0.02809310 -0.31105287  0.239647063  28
8        3    2 -0.02809310 -1.06313732  1.284568047  38
9        3    3 -0.02809310 -0.06704185 -0.294881283  18
10       4    1  0.33479251 -0.31105287 -0.480496125  15
11       4    2  0.33479251 -1.06313732 -0.380251626  12
12       4    3  0.33479251 -0.06704185 -0.078851036  34
13       5    1  0.27953088 -0.31105287  1.435456851 100
14       5    2  0.27953088 -1.06313732 -0.795435607   0
15       5    3  0.27953088 -0.06704185 -0.166848530   0
16       6    1  0.29409366 -0.31105287  0.126647655  44
17       6    2  0.29409366 -1.06313732  0.162961658  18
18       6    3  0.29409366 -0.06704185 -0.812316265  13

To aggregate, I then add up the x value for cluster 1 across all three years with seroconv.cluster1=sum(data.all[c(1:3),6]) and repeat for each cluster.

Right now, every time I change the number of clusters I have to manually change which rows of x get added. I would like to define n.vec <- seq(6, 12, by=2), feed n.vec into the functions to get x, and have R add up the x values for each cluster regardless of how many clusters there are: first with 6 clusters, summing x per cluster, then with 8, and so on.

There are 2 best solutions below

Best answer

To get the sum of x for each cluster as a vector, you can use tapply:

tapply(df$x, df$cluster, sum)
#   1   2   3   4   5   6 
#  65  46  84  61 100  75 

If you instead want the output as a data frame, you can use aggregate:

aggregate(x ~ cluster, data = df, FUN = sum)
#   cluster   x
# 1       1  65
# 2       2  46
# 3       3  84
# 4       4  61
# 5       5 100
# 6       6  75
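
Both calls group by whatever values appear in the cluster column, so nothing has to be edited when the number of clusters changes. Here is a minimal sketch of looping over n.vec, assuming a hypothetical make_data(n) that stands in for whatever function actually generates x (the name, its arguments, and the random x values are made up for illustration):

```r
# Hypothetical stand-in for the asker's function: builds n clusters x 3 years
# with made-up x values. Replace with the real data-generating code.
make_data <- function(n, years = 3) {
  data.frame(
    cluster = rep(seq_len(n), each = years),
    year    = rep(seq_len(years), times = n),
    x       = rpois(n * years, lambda = 20)
  )
}

n.vec <- seq(6, 12, by = 2)

set.seed(1)  # only so the made-up data is reproducible
sums.by.n <- lapply(n.vec, function(n) {
  d <- make_data(n)
  tapply(d$x, d$cluster, sum)  # one sum per cluster, however many there are
})
names(sums.by.n) <- paste0("n", n.vec)

# number of per-cluster sums produced for each value in n.vec
lengths(sums.by.n)
```

Each element of sums.by.n is a named vector with one sum per cluster, so sums.by.n$n8 holds the eight cluster totals for the 8-cluster run.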

It seems you are asking for an easy way to split your data up, apply a function (sum in this case), and then combine the results. Split-apply-combine is a common data strategy, and R offers several ways to do it; popular options include ave in base R, the dplyr package, and the data.table package.

Here's an example for your data using dplyr:

library(dplyr)
# group by cluster only, so that x is summed across years
df %>% group_by(cluster) %>% summarise(x = sum(x))
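
For comparison, here is a base R sketch using ave, mentioned above. Unlike tapply or aggregate, ave returns one value per row, so the cluster total can be stored as a new column next to the original data. The data frame is rebuilt from the cluster, year and x columns in the question; cluster.total is just an illustrative column name.

```r
# cluster, year and x copied from the question's table
df <- data.frame(
  cluster = rep(1:6, each = 3),
  year    = rep(1:3, times = 6),
  x       = c(15, 10, 40, 4, 19, 23, 28, 38, 18,
              15, 12, 34, 100, 0, 0, 44, 18, 13)
)

# ave() applies FUN within each cluster and returns a vector aligned with the rows
df$cluster.total <- ave(df$x, df$cluster, FUN = sum)

# collapse to one row per cluster
unique(df[, c("cluster", "cluster.total")])
```

This works the same way for any number of clusters, since the groups are taken from the cluster column itself.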