The result has the same p-value for every row, whereas the p-value is different when I calculate each row separately.
I am trying to test the difference between the baseline and endline proportions. Here is the data:
group n_Baseline n_Endline sample_Baseline sample_Endline
<chr> <int> <int> <dbl> <dbl>
1 A 164 158 305 273
2 B 89 65 131 106
3 C 59 68 118 108
4 D 52 48 90 84
5 E 141 107 224 186
I tried the following:
df$P_Values <- apply(df, 1, function(x) prop.test(x = c(df$n_Baseline, df$n_Endline), n = c(df$sample_Baseline, df$sample_Endline))$p.value)
The outcome has the same p-value for each row:
group n_Baseline n_Endline sample_Baseline sample_Endline P_Values
<chr> <int> <int> <dbl> <dbl> <dbl>
1 A 164 158 305 273 0.109
2 B 89 65 131 106 0.109
3 C 59 68 118 108 0.109
4 D 52 48 90 84 0.109
5 E 141 107 224 186 0.109
However, when I do this separately for each row, the p-value is very different. For example, for the first row:
prop.test(x = c(164, 158), n = c(305, 273))
Output:
2-sample test for equality of proportions with continuity correction

data:  c(164, 158) out of c(305, 273)
X-squared = 0.82448, df = 1, p-value = 0.3639
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12552283  0.04342351
sample estimates:
   prop 1    prop 2
0.5377049 0.5787546
Why does this happen, and how do I get the correct p-value for each row instead of the same one?
The easiest way to do this is probably via rowwise calculations inside dplyr from the tidyverse.

If you want to stick to base R, you can use apply, but your apply syntax is not correct here. The function inside apply receives each row of your data frame as a vector, which it calls x. You then need to use the elements of x inside prop.test, but instead you are passing whole columns of your data frame to prop.test. Since you pass the same thing on every iteration, you get the same (wrong) p-value every time: with all five baseline counts and all five endline counts, prop.test runs one single test of whether all ten proportions are equal.

In addition, because your first column is a character vector, apply coerces each row into a character vector, so the maths won't work unless you skip the first column in your apply call by using df[-1].
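For the rowwise route, here is a minimal sketch, assuming the dplyr package is installed. The data frame df is retyped from the printout in the question so that the block runs on its own:

```r
library(dplyr)

# Data retyped from the question's printout
df <- tibble(
  group           = c("A", "B", "C", "D", "E"),
  n_Baseline      = c(164L, 89L, 59L, 52L, 141L),
  n_Endline       = c(158L, 65L, 68L, 48L, 107L),
  sample_Baseline = c(305, 131, 118, 90, 224),
  sample_Endline  = c(273, 106, 108, 84, 186)
)

# rowwise() makes mutate() evaluate prop.test() once per row,
# so each column name refers to a single value from that row
df <- df %>%
  rowwise() %>%
  mutate(P_Values = prop.test(
    x = c(n_Baseline, n_Endline),
    n = c(sample_Baseline, sample_Endline)
  )$p.value) %>%
  ungroup()  # drop the row-wise grouping afterwards

df
```

The p-value for group A should then match the 0.3639 from the single prop.test call in the question.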
The correct use of apply would be as follows, with the data from the question in reproducible format:
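A base R sketch, assuming the column names shown in the question's printout. The data frame is rebuilt with data.frame() so the block is self-contained:

```r
# Data from the question in reproducible format
df <- data.frame(
  group           = c("A", "B", "C", "D", "E"),
  n_Baseline      = c(164L, 89L, 59L, 52L, 141L),
  n_Endline       = c(158L, 65L, 68L, 48L, 107L),
  sample_Baseline = c(305, 131, 118, 90, 224),
  sample_Endline  = c(273, 106, 108, 84, 186)
)

# df[-1] drops the character column 'group', so each row x is a
# named numeric vector; the counts are then taken from x, not from df
df$P_Values <- apply(df[-1], 1, function(x) {
  prop.test(
    x = c(x["n_Baseline"], x["n_Endline"]),
    n = c(x["sample_Baseline"], x["sample_Endline"])
  )$p.value
})

df
```

Indexing x by name rather than by position keeps the function correct even if the column order changes. The first row should reproduce the p-value of 0.3639 from the single prop.test call in the question.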