SAS proc means equivalent in R

2.9k Views Asked by At

I have to translate someones SAS code into R as my company is moving away from SAS and was wondering if there were a comparable R function to proc means? For reference, proc means produces (or at least what I need it to produce) is the number of observations, the mean, min, max, and standard deviation. It is also important that this be able to be done by groups as well as weighted. The summary function in R produces this output but I don't think you can do it weighted. I would prefer an answer that uses base R but if that doesn't exist, a package would be okay.

Example:

df
   Temp V1  Weight
1  Hi    1  8 
2  Low   2  3 
3  Hi    3  9 
4  Low   4  9 

I need to produce various weighted summary statistics (median,min,max,sum,mean etc) of V1 by group as follows:

Group Min Max Mean Sum
Hi      1  3  2.06 35
Low     2  4  3.5  42
 
1

There are 1 best solutions below

0
On

Generally, aggregate in base R can serve as counterpart to SAS's proc means. Typically, this method runs a single simple function (that takes one input) on a single numeric column. Below is the formula version of the method:

aggregate(V1 ~ Temp, df, FUN=mean)

But aggregate can be extended for multiple columns and multiple functions:

agg_raw <- aggregate(cbind(V1, Weight) ~ Temp, df, 
                     FUN = function(x) c(count = length(x),
                              min = min(x), 
                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[2],
                              median = median(x),
                              mean = mean(x),
                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[4],
                              sum = sum(x))
                    )

agg_df <- do.call(data.frame, agg_raw)
agg_df
#   Temp V1.count V1.min V1.p.25. V1.median V1.mean V1.p.75. V1.sum Weight.count Weight.min Weight.p.25. Weight.median Weight.mean Weight.p.75. Weight.sum
# 1   Hi        2      1      1.5         2       2      2.5      4            2          8         8.25           8.5         8.5         8.75         17
# 2  Low        2      2      2.5         3       3      3.5      6            2          3         4.50           6.0         6.0         7.50         12

For weighted statistics, aggregate does not have a built-in facility, so a bit more nuance is needed where you compute the statistics before aggregation. R's ave allows calculation across groups:

df$V1_wtmean <- ave(df[c("V1", "Weight")], df$Temp, FUN=function(x) weighted.mean(x$V1, x$Weight))[[1]]
df$V1_wtsum <- ave(df[c("V1", "Weight")], df$Temp, FUN=function(x) sum(x$V1 * x$Weight))[[1]]

df
#   Temp V1 Weight V1_wtmean V1_wtsum
# 1   Hi  1      8  2.058824       35
# 2  Low  2      3  3.500000       42
# 3   Hi  3      9  2.058824       35
# 4  Low  4      9  3.500000       42

Once you have such columns, you can aggregate on min and max to return grouping needs:

agg_df <- do.call(data.frame,
                  aggregate(cbind(V1, V1_wtmean, V1_wtsum) ~ Temp, df, 
                            FUN=function(x) c(min=min(x), 
                                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[2],
                                              mean=mean(x),
                                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[4],
                                              max=max(x), 
                                              sum=sum(x)))
                  )
                          
agg_df <- setNames(agg_df[c("V1.min", "V1.max", "V1.mean", "V1.sum", "V1_wtmean.min", "V1_wtsum.min")],
                   c("Min", "Max","Simple.Mean", "Simple.Sum", "Wgt.Mean", "Wgt.Sum"))
                                
agg_df
            
#   Min Max Simple.Mean Simple.Sum Wgt.Mean Wgt.Sum
# 1   1   3           2          4 2.058824      35
# 2   2   4           3          6 3.500000      42     

Online Demo

See Also: