How can I run chi-square tests more efficiently for over 10 outcomes and 5 variables?

data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)

In this data frame, sex and married are the subgroup variables, and the rest are outcomes. My real data has more than 10 outcome variables and 5 subgroup variables.

At first, I wrote the following code:

chisq_test <- function(data, var1, var2) {
  contingency_table <- table(data[[var1]], data[[var2]])
  test_result <- chisq.test(contingency_table)
  return(test_result)
}

chisq_test(data = data, var1 = "sex", var2 = "cagv_typ")

However, it is still very time-consuming to enter each outcome and subgroup variable manually, one pair at a time. Is there a better approach that runs all of the chi-square tests with less effort?

Thank you in advance.

Best wishes

Allan Cameron On BEST ANSWER

You can use expand.grid to get all the combinations you are looking for:

combos <- expand.grid(x = names(data)[c(1, 3)], y = names(data)[-c(1, 3)])

combos
#>         x        y
#> 1     sex    ageid
#> 2 married    ageid
#> 3     sex cagv_typ
#> 4 married cagv_typ
#> 5     sex    sq5_1
#> 6 married    sq5_1
#> 7     sex    sq5_2
#> 8 married    sq5_2

We can then use apply to iterate over the rows of this data frame, applying your chisq_test function to each combination of variables and extracting the p-value. This returns a vector of 8 p-values, one per chi-square test:

combos$pval <- apply(combos, 1, function(x) chisq_test(data, x[1], x[2])$p.value)

combos
#>         x        y      pval
#> 1     sex    ageid 0.2231302
#> 2 married    ageid 0.2231302
#> 3     sex cagv_typ 0.6650055
#> 4 married cagv_typ 0.6650055
#> 5     sex    sq5_1 0.5637029
#> 6 married    sq5_1 0.5637029
#> 7     sex    sq5_2 0.6650055
#> 8 married    sq5_2 0.6650055

This will easily scale up to five x variables and 10 y variables using the same code.
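As a rough sketch of how this might look when scaling up, the columns can be selected by name rather than by position, which tends to be less fragile with many variables. The names subgroup_vars and outcome_vars are introduced here purely for illustration, and the warnings are suppressed only because the three-row toy data triggers the small-sample warning from chisq.test:

```r
# Toy data from the question (only 3 rows, so the tests are illustrative)
data <- data.frame(
  sex = factor(c("M", "F", "M")),
  ageid = factor(c(8, 6, 7)),
  married = factor(c(2, 1, 2)),
  cagv_typ = factor(c("non-primary", "primary", "non-primary")),
  sq5_1 = factor(c(1, 1, 1)),
  sq5_2 = factor(c(0, 1, 0))
)

# Hypothetical helper names: list the subgroup variables once, by name;
# every other column is treated as an outcome
subgroup_vars <- c("sex", "married")
outcome_vars <- setdiff(names(data), subgroup_vars)

# One row per (subgroup, outcome) pair
combos <- expand.grid(x = subgroup_vars, y = outcome_vars,
                      stringsAsFactors = FALSE)

# Run one chi-square test per row and keep only the p-value.
# suppressWarnings() hides the small-sample warning for this toy data;
# with real data it is probably better to let the warnings show.
combos$pval <- mapply(
  function(x, y) suppressWarnings(
    chisq.test(table(data[[x]], data[[y]]))$p.value
  ),
  combos$x, combos$y,
  USE.NAMES = FALSE
)

print(combos)
```

With the real data, only subgroup_vars should need editing to list the five subgroup columns.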

Please remember that if you carry out 50 chi-square tests, the unadjusted p-values cannot be taken at face value because of multiple hypothesis testing. You will need a Bonferroni correction or similar, since at a 5% significance level you would expect 2 or 3 "significant" results purely by chance with this many tests.
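One possible way to apply such a correction is base R's p.adjust. The sketch below reuses the eight p-values printed above for the toy data; the variable names p_bonf and p_holm are just illustrative:

```r
# p-values copied from the toy-data output above (illustration only)
pval <- c(0.2231302, 0.2231302, 0.6650055, 0.6650055,
          0.5637029, 0.5637029, 0.6650055, 0.6650055)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(pval, method = "bonferroni")

# Holm: a step-down alternative that is never more conservative
# than Bonferroni
p_holm <- p.adjust(pval, method = "holm")

print(data.frame(pval, p_bonf, p_holm))
```

In the data frame from the answer, the same idea would be combos$p_adj <- p.adjust(combos$pval, method = "bonferroni"), where p_adj is a new, hypothetical column name.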

Created on 2023-09-12 with reprex v2.0.2