I need to get the positivity rate by county for multiple drugs and multiple counties. The real data includes more drugs and more counties. An NA value means that the test was not ordered, so it should not be counted toward the total number of samples.
Also, I need to use prop.test(x, y, conf.level = 0.95) to obtain the proportions and the confidence interval for each county.
Here is some mock data:
county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, "Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", "Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data <- data.frame(county, drug1, drug2)
The problem is that I cannot use a simple group_by and summarise to calculate the rate, as shown below. I have to use prop.test() and get a rate and a confidence interval for each county.
library(dplyr)
data |>
  group_by(county) |>
  summarise(
    # NA results are excluded from both the numerator and the denominator
    drug1 = sum(drug1 == "Positive", na.rm = TRUE) / sum(!is.na(drug1)),
    drug2 = sum(drug2 == "Positive", na.rm = TRUE) / sum(!is.na(drug2))
  ) |>
  ungroup()
It's better to keep the data entirely in "long" format. You can convert it by using, for example, melt from reshape2. Then you split into groups, either by specifying factors or by using a formula, and lapply over the groups, sending a table of the values column to prop.test.
The result is a list of htest objects, which you can extract values from by again using lapply, either one at a time or several at once.
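The steps described above could be sketched roughly as follows. This is not the answerer's original code. It is a minimal illustration that builds the long format with base R rather than reshape2's melt so that it runs without extra packages, and the names long, groups, tests and res are made up for the example. It assumes every county/drug group has at least one non-NA result, since prop.test fails on an empty group.

```r
# Mock data from the question
county <- rep(c("Erie", "Orange"), times = 6)
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, "Negative",
           "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", "Positive",
           "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data <- data.frame(county, drug1, drug2)

# Long format: one row per county/drug/result
# (base R stand-in for reshape2::melt; assumption, not the original answer's code)
long <- data.frame(
  county = rep(as.character(data$county), times = 2),
  drug   = rep(c("drug1", "drug2"), each = nrow(data)),
  result = c(as.character(data$drug1), as.character(data$drug2))
)

# Put "Positive" first so prop.test treats it as the "success" count
long$result <- factor(long$result, levels = c("Positive", "Negative"))

# Split into county x drug groups by specifying factors
groups <- split(long, list(long$county, long$drug))

# table() drops NA by default, so untested samples are not counted
tests <- lapply(groups, function(g) prop.test(table(g$result), conf.level = 0.95))

# Extract one value at a time ...
rates <- sapply(tests, function(t) unname(t$estimate))

# ... or several at once, as one row per group
res <- do.call(rbind, lapply(names(tests), function(nm) {
  t <- tests[[nm]]
  data.frame(group = nm,
             rate  = unname(t$estimate),
             lower = t$conf.int[1],
             upper = t$conf.int[2])
}))
res
```

With groups this small, prop.test will warn that the chi-squared approximation may be incorrect; the estimates and intervals are still returned.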