Obtaining proportions and confidence intervals with prop.test() for multiple groups in R

90 Views Asked by Mando At 02 September 2023 at 00:15

I need to get the positivity rate by county for multiple drugs and multiple counties. The real data includes more drugs and more counties. The NA values mean that the test was not ordered so they should not be counted as the total number of samples.

Also, I need to use prop.test(x, y, conf.level = 0.95) to obtain the proportions and the confidence interval for each county.

Here is a mock data:

county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, "Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", "Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data <- data.frame(county, drug1, drug2)

The problem is that I cannot use a simple groupby and summarise to calculate the rate, like shown below. I have to use the prop.test() and get a rate and a confidence interval for each county.

data |> 
  group_by(county) |> 
  summarise(drug1 = sum(drug1=="Positive", na.rm = TRUE)/
              sum(!is.na(drug1)),
            drug2 = sum(drug2=="Positive", na.rm = TRUE)/
              sum(!is.na(drug2))
            ) |> 
  ungroup()

Original Q&A

There are 1 best solutions below

AkselA On 02 September 2023 at 18:21 BEST ANSWER

county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange", 
  "Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, 
  "Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", 
  "Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data1 <- data.frame(county, drug1, drug2)

It's better to keep the data in entirely "long" format. You can convert it by using f.ex. melt from reshape2.

library(reshape2)

data1 <- melt(data1, id="county", variable.name="drug")

Then you split into groups, by specifying factors, or by using a formula (as shown). lapply over the groups, sending a table of the values column to prop.test.

prop.res <- lapply(split(data1, ~ county+drug), function(x) prop.test(table(x$value)))

The result is a list of htest objects which you can extract values from by again using lapply. Either one at a time…

lapply(prop.res, `[[`, "p.value")
# $Erie.drug1
# Negative 
#    0.375 
# 
# $Orange.drug1
# Negative 
#  0.21875 
# 
# $Erie.drug2
# Negative 
#   0.6875 
# 
# $Orange.drug2
# [1] 1
#

Or multiple.

lapply(prop.res, `[`, c("p.value", "conf.int"))
# $Erie.drug1
# $Erie.drug1$p.value
# Negative 
#    0.375 
# 
# $Erie.drug1$conf.int
# [1] 0.0050507634 0.7164179361
# attr(,"conf.level")
# [1] 0.95
# 
# 
# $Orange.drug1
# $Orange.drug1$p.value
# Negative 
#  0.21875 
# 
# $Orange.drug1$conf.int
# [1] 0.35876542 0.99578926
# attr(,"conf.level")
# [1] 0.95
# 
# 
# $Erie.drug2
# $Erie.drug2$p.value
# Negative 
#   0.6875 
# …

Obtaining proportions and confidence intervals with prop.test() for multiple groups in R

There are 1 best solutions below

Related Questions in R

Related Questions in STATISTICS

Related Questions in CONFIDENCE-INTERVAL

Related Questions in RATE

Related Questions in PROPORTIONS

Trending Questions

Popular # Hahtags

Popular Questions