I have a variable and would like to obtain its mean within each group. The group of each observation is listed in a column, and I have many such grouping columns. I would then like to associate each group mean with the appropriate observation, so that if I start with an m x n matrix of groupings (m observations, n different groupings) I obtain an m x n matrix of means. For example:
> var <- round(runif(10),digits=2)
> var
[1] 0.47 0.21 0.80 0.65 0.32 0.72 0.29 0.93 0.77 0.64
> groupings <- cbind(sample(c(1,2,3), 10, replace=TRUE),
+                    sample(c(1,2,3), 10, replace=TRUE),
+                    sample(c(1,2,3,5), 10, replace=TRUE))
> groupings
[,1] [,2] [,3]
[1,] 3 1 5
[2,] 1 1 5
[3,] 2 1 5
[4,] 3 2 3
[5,] 2 3 1
[6,] 1 1 1
[7,] 2 3 1
[8,] 1 2 1
[9,] 3 1 5
[10,] 1 3 2
I can obtain the means within each group separately with the following, for example:
> means.1 <- sapply(split(var, groupings[,1]), function(x) mean(x))
> means.2 <- sapply(split(var, groupings[,2]), function(x) mean(x))
> means.3 <- sapply(split(var, groupings[,3]), function(x) mean(x))
> means.1
1 2 3
0.625 0.470 0.630
> means.2
1 2 3
0.5940000 0.7900000 0.4166667
> means.3
1 2 3 5
0.5650 0.6400 0.6500 0.5625
But not only are these separate calls inefficient, they still don't give me what I want, which is the following:
[,1] [,2] [,3]
[1,] 0.630 0.5940000 0.5625
[2,] 0.625 0.5940000 0.5625
[3,] 0.470 0.5940000 0.5625
[4,] 0.630 0.7900000 0.6500
[5,] 0.470 0.4166667 0.5650
[6,] 0.625 0.5940000 0.5650
[7,] 0.470 0.4166667 0.5650
[8,] 0.625 0.7900000 0.5650
[9,] 0.630 0.5940000 0.5625
[10,] 0.625 0.4166667 0.6400
Another option: because you already have a matrix, you can use apply to loop through its columns (with MARGIN set to 2) and pass each column to the ave function as the grouping variable. You can either specify the FUN parameter explicitly as mean or leave it out, since mean is the default function used by ave. Or, with dplyr, you can convert the matrix to a data frame and use the mutate_all() function to apply ave to every column.
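The apply/ave approach might look roughly like the sketch below. This is not the original answer's code, only a minimal base R illustration. It hard-codes the example values of var and groupings from the question so the result is reproducible, and the names group_means and group_means_default are made up here.

```r
# Example data copied from the question (hard-coded instead of random draws).
var <- c(0.47, 0.21, 0.80, 0.65, 0.32, 0.72, 0.29, 0.93, 0.77, 0.64)
groupings <- cbind(c(3, 1, 2, 3, 2, 1, 2, 1, 3, 1),
                   c(1, 1, 1, 2, 3, 1, 3, 2, 1, 3),
                   c(5, 5, 5, 3, 1, 1, 1, 1, 5, 2))

# Loop over columns (MARGIN = 2); each column g is the grouping variable
# handed to ave(), which returns the group mean aligned to each observation.
# FUN must be passed by name because it comes after `...` in ave().
group_means <- apply(groupings, 2, function(g) ave(var, g, FUN = mean))

# Same call relying on ave()'s default, FUN = mean.
group_means_default <- apply(groupings, 2, function(g) ave(var, g))

print(group_means)
```

Because ave returns a vector the same length as var, apply binds the per-column results into an m x n matrix, which is the shape asked for in the question.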