R calculate mean square for all sub groups in a subset

201 Views Asked by At

how do I calculate the mean square of all 2019_Preston_STD,2019_Preston_V1,2019_Preston_V2 etc using the Value column, then the adjmth1, adjmth3 columns

structure(list(IDX = c("2019_Preston_STD", "2019_Preston_V1", 
"2019_Preston_V2", "2019_Preston_V3", "2019_Preston_W1", "2019_Preston_W2"
), Value = c(3L, 2L, 3L, 2L, 3L, 5L), adjmth1 = c(2.87777777777778, 
1.85555555555556, 2.01111111111111, 1.77777777777778, 3.62222222222222, 
4.45555555555556), adjmth3 = c(2.9328763348507, 2.08651828334684, 
2.80282946626847, 2.15028039284054, 2.68766916156347, 4.51425274916654
), adjmth13 = c(2.81065411262847, 1.82585524933201, 1.81394057737959, 
1.40785681078568, 3.30989138378569, 4.7301083495049)), row.names = 29:34, class = "data.frame")
1

There are 1 best solutions below

1
On BEST ANSWER

This task can be done in many ways, as shown in the link that @r2evans pointed out. My favorite one is dplyr using summarize(across() because to me its syntax is easy to understand and easy to apply to many columns. It also presents the resulted numbers in nice format.

For example, from iris data I want to get the arithmetic mean of Sepal.Length, Petal.Length, and Petal.Width for each of species : setosa, versicolor, and virginica. Here is the head of the data:

head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

And here is how to get the mean in each species:

iris %>% group_by(Species) %>% 
         summarize(across(c(Sepal.Length, Petal.Length, Petal.Width), mean))
# A tibble: 3 x 4
# Species    Sepal.Length Petal.Length Petal.Width
# <fct>             <dbl>        <dbl>       <dbl>
# 1 setosa             5.01         1.46       0.246
# 2 versicolor         5.94         4.26       1.33 
# 3 virginica          6.59         5.55       2.03 

As for your task, first you need to define the function for the mean square (because its definition slightly varies in some references). Then, you apply it to your data frame using summarize(across()).

For example, you define the mean square function as follows:

meansq <- function(x) sum((x-mean(x))^2)/(length(x)-1)

Note: This definition requires that length(x) doesn't equal 1, or otherwise NaN will be produced.

You can apply it to your data frame newdata as follows:

newdata %>% group_by(IDX) %>% 
            summarize(across(c(Value, adjmth1, adjmth3), meansq)