Z-test for each group in a data frame against a single reference group?


I am trying to generate z-scores (and then p-values) by group, changing only one of the two groups each time, i.e. comparing each group to a single 'reference' group, so that I can run hypothesis tests to see whether they come from distinct distributions.

In the example below, I would like to perform z-tests on a, b and c, each against d:

z-test comparing a vs d

z-test comparing b vs d

z-test comparing c vs d

> df
group measurement 
a     1
a     2
b     6
b     7
b     9
c     4
c     5
c     4
d     8
d     8

so that my final data frame looks something like this:

> group_df 
group pvalue
a     0.005
b     0.3
c     0.001
d     1.000

So far I have something like this:

library(dplyr)

# d group stats
d_only <- df %>% filter(group == "d") %>% select(measurement)
d_mean <- mean(d_only$measurement)
d_n    <- nrow(d_only)


# generate values needed to calculate zscore
group_df <- df %>%
  group_by(group) %>%
  summarise(mean = mean(measurement), sd = sd(measurement), n = n())
group_df$sqrt_n   <- (group_df$n + d_n) %>% sqrt()
group_df$pop_mean <- (group_df$mean + d_mean) / 2


# calculate zscore
group_df$zscore <-
  (group_df$mean - group_df$pop_mean) / (group_df$sd / group_df$sqrt_n)


# lower-tail probability of -|z| (a one-sided p-value)
group_df$pvalue <- pnorm(-abs(group_df$zscore))

But I am getting some p-values that seem wrong, and it feels like there should be a more elegant way of doing this.
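
For comparison, here is a minimal sketch of one common way to set this up: a two-sample z statistic for each group against d, with the standard error built from both groups' variances. This is not a definitive answer. The helper name `z_vs_ref` is hypothetical, the sample variances are assumed to stand in for the unknown population variances, and the p-value is two-sided. It uses base R only, so it runs without extra packages.

```r
# Example data copied from the question
df <- data.frame(
  group = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d"),
  measurement = c(1, 2, 6, 7, 9, 4, 5, 4, 8, 8)
)

# Hypothetical helper: two-sample z statistic of x against a reference sample.
# Assumption: sample variances are used in place of known population variances.
z_vs_ref <- function(x, ref) {
  se <- sqrt(var(x) / length(x) + var(ref) / length(ref))
  (mean(x) - mean(ref)) / se
}

# Reference group and the groups to compare against it
ref    <- df$measurement[df$group == "d"]
groups <- setdiff(unique(df$group), "d")

# One z statistic per non-reference group
z <- sapply(groups, function(g) z_vs_ref(df$measurement[df$group == g], ref))

group_df <- data.frame(
  group  = groups,
  zscore = unname(z),
  pvalue = 2 * pnorm(-abs(unname(z)))  # two-sided p-value
)

print(group_df)
```

The reference group is left out of the result because comparing d with itself gives 0/0 here (both d values are identical, so its variance is zero). With samples this small the normal approximation is rough, so the resulting p-values should be read with caution.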
