How to do two proportion prop.test for each single row in a data frame?

Question

70 Views Asked by Kim Nguyen At 28 July 2025 at 01:22

My code produces the same p-value for every row, but the p-values differ when I run the test on each row separately.

I am trying to test the difference between the baseline and endline proportions. Here is the data:

  group n_Baseline n_Endline sample_Baseline sample_Endline
  <chr>      <int>     <int>           <dbl>          <dbl>
1 A            164       158             305            273
2 B             89        65             131            106
3 C             59        68             118            108
4 D             52        48              90             84
5 E            141       107             224            186

I tried the following:

df$P_Values <- apply(df, 1, function(x) prop.test(x = c(df$n_Baseline, df$n_Endline), n = c(df$sample_Baseline, df$sample_Endline))$p.value)

The outcome has the same p-value for each row:

  group n_Baseline n_Endline sample_Baseline sample_Endline P_Values
  <chr>      <int>     <int>           <dbl>          <dbl>    <dbl>
1 A            164       158             305            273    0.109
2 B             89        65             131            106    0.109
3 C             59        68             118            108    0.109
4 D             52        48              90             84    0.109
5 E            141       107             224            186    0.109

However, when I run the test separately for each row, the p-value is very different. For example, for the 1st row:

prop.test(x = c(164, 158), n = c(305, 273))

Output:

	2-sample test for equality of proportions with continuity correction

data:  c(164, 158) out of c(305, 273)
X-squared = 0.82448, df = 1, p-value = 0.3639
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12552283  0.04342351
sample estimates:
   prop 1    prop 2 
0.5377049 0.5787546 

Why does this happen, and how do I get the correct p-value for each row instead of the same one?

**Allan Cameron** · Accepted Answer

The easiest way to do this is probably via rowwise calculations with dplyr, from the tidyverse:

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(pval = prop.test(x = c(n_Baseline, n_Endline), 
                          n = c(sample_Baseline, sample_Endline))$p.value)
#> # A tibble: 5 x 6
#> # Rowwise: 
#>   group n_Baseline n_Endline sample_Baseline sample_Endline   pval
#>   <chr>      <int>     <int>           <int>          <int>  <dbl>
#> 1 A            164       158             305            273 0.364 
#> 2 B             89        65             131            106 0.355 
#> 3 C             59        68             118            108 0.0676
#> 4 D             52        48              90             84 1.00  
#> 5 E            141       107             224            186 0.310

If you want to stick to base R, then you can use apply, but your syntax for apply is not correct here. The function in apply takes each row of your data frame as a vector and calls it x. You then need to use the vector x as the elements inside prop.test, but instead you are passing whole columns from your data frame to prop.test. Since you are passing the same thing each time, you get the same (wrong) p value each time.

In addition, because your first column is a character vector, each row will be coerced into a character vector, so the maths won't work unless you skip the first column in your apply call by using df[-1].
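
Both problems can be checked directly. The following is a minimal sketch, assuming the data frame from the question (rebuilt here so the block runs on its own); the names pooled and row_classes are only illustrative.

```r
# Data from the question, rebuilt so this block is self-contained
df <- data.frame(
  group           = c("A", "B", "C", "D", "E"),
  n_Baseline      = c(164L, 89L, 59L, 52L, 141L),
  n_Endline       = c(158L, 65L, 68L, 48L, 107L),
  sample_Baseline = c(305L, 131L, 118L, 90L, 224L),
  sample_Endline  = c(273L, 106L, 108L, 84L, 186L)
)

# Problem 1: the original function ignores its argument x and passes whole
# columns, so prop.test compares all ten proportions in a single test
pooled <- prop.test(x = c(df$n_Baseline, df$n_Endline),
                    n = c(df$sample_Baseline, df$sample_Endline))
pooled$parameter  # degrees of freedom: 10 groups - 1
pooled$p.value    # the one value that was repeated on every row

# Problem 2: with the character column included, apply() converts the data
# frame to a character matrix, so every row reaches the function as character
row_classes <- apply(df, 1, class)
unique(row_classes)
```

The pooled call is a legitimate ten-sample test of equal proportions; it simply answers a different question from the five separate two-sample tests that were intended.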

The correct use of apply would be:

df$pval <- apply(df[-1], 1, \(x) prop.test(x = x[1:2], n = x[3:4])$p.value)

df
#>   group n_Baseline n_Endline sample_Baseline sample_Endline       pval
#> 1     A        164       158             305            273 0.36387328
#> 2     B         89        65             131            106 0.35495949
#> 3     C         59        68             118            108 0.06758474
#> 4     D         52        48              90             84 1.00000000
#> 5     E        141       107             224            186 0.30960338
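
Another base R option, not part of the original answer, is mapply, which walks several columns in parallel and therefore never touches the character column. This is a sketch under the assumption that the column names match those in the question; the argument names xb, xe, nb and ne are made up for illustration.

```r
# Data from the question, rebuilt so this block is self-contained
df <- data.frame(
  group           = c("A", "B", "C", "D", "E"),
  n_Baseline      = c(164L, 89L, 59L, 52L, 141L),
  n_Endline       = c(158L, 65L, 68L, 48L, 107L),
  sample_Baseline = c(305L, 131L, 118L, 90L, 224L),
  sample_Endline  = c(273L, 106L, 108L, 84L, 186L)
)

# mapply() takes the i-th element of each column on the i-th call,
# so each prop.test sees exactly one row's counts and sample sizes
df$pval <- mapply(
  function(xb, xe, nb, ne) prop.test(x = c(xb, xe), n = c(nb, ne))$p.value,
  df$n_Baseline, df$n_Endline, df$sample_Baseline, df$sample_Endline
)

df
```

Referring to columns by name rather than by position (x[1:2], x[3:4]) means the result does not silently change if the column order does.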

Data from the question in reproducible format:

df <- structure(list(group = c("A", "B", "C", "D", "E"), n_Baseline = c(164L, 
89L, 59L, 52L, 141L), n_Endline = c(158L, 65L, 68L, 48L, 107L
), sample_Baseline = c(305L, 131L, 118L, 90L, 224L), sample_Endline = c(273L, 
106L, 108L, 84L, 186L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
