How to scale a variable by group


I would really appreciate your help with this question. I have the following dataset and would like to create a new variable containing the standardized values (z-scores) for each level of a given factor variable.

x <- data.frame(gender = c("boy","boy","boy","girl","girl","girl"),
                values=c(1,2,3,6,7,8)) 
x

   gender values
1    boy      1
2    boy      2  
3    boy      3
4   girl      6
5   girl      7
6   girl      8

My aim is to create one new variable containing the z-values calculated separately for each factor level (for boys and for girls).

One more question: my main goal is a variable with the z-values, but would the approach be similar if I wanted to apply another function, for example to calculate quantiles per factor level?

Thank you for your help!


There are 2 best solutions below


You can use scale with ave and transform:

> transform(x, z_score=ave(values, gender, FUN=scale))
  gender values z_score
1    boy      1      -1
2    boy      2       0
3    boy      3       1
4   girl      6      -1
5   girl      7       0
6   girl      8       1
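Because scale returns a one-column matrix, it can be cleaner to pass ave a function that returns a plain vector. The following is only a sketch of that idea; the helper name z_score is made up for illustration and is not part of base R:

```r
x <- data.frame(gender = c("boy", "boy", "boy", "girl", "girl", "girl"),
                values = c(1, 2, 3, 6, 7, 8))

# Hypothetical helper: z-score of a numeric vector, returned as a plain vector
# (uses the sample standard deviation, as scale() does)
z_score <- function(v) (v - mean(v)) / sd(v)

# ave() applies the function within each gender and keeps the original row order
x$z_score <- ave(x$values, x$gender, FUN = z_score)
x
```

For the example data this should give the same -1, 0, 1 pattern per group as the transform call above.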

aggregate is also useful, although it returns one row per group rather than a column aligned with the original rows:

> aggregate(values ~ gender, scale, data=x)

There are also many other ways, using ddply from plyr, tapply, or data.table. Take a look at this post


The question of how to create z-scores has already been answered.

Here's a way to calculate quantiles for each factor level:

with(x, tapply(values, gender, FUN = quantile))
# $boy
#   0%  25%  50%  75% 100% 
#  1.0  1.5  2.0  2.5  3.0 
#
# $girl
#   0%  25%  50%  75% 100% 
#  6.0  6.5  7.0  7.5  8.0
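If what you want is a new column rather than a summary per group, one possible reading of "distribution in quantiles" is the within-group percentile of each value. Here is a minimal sketch under that assumption, using ave with ecdf; the column name pct is only an illustrative choice:

```r
x <- data.frame(gender = c("boy", "boy", "boy", "girl", "girl", "girl"),
                values = c(1, 2, 3, 6, 7, 8))

# ecdf(v) builds the empirical CDF of one group; evaluating it at v gives,
# for each value, the fraction of that group's values that are <= it
x$pct <- ave(x$values, x$gender, FUN = function(v) ecdf(v)(v))
x
```

The pattern is the same as for the z-scores: ave accepts any function that returns one value per input element, so only the FUN argument changes.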