This is a 'many models' problem.
I have a set of 16 binary candidate variables. I want to see the 'contribution' of each to a set of outcome variables (n=15), where the other covariates change depending on the candidate to be used.
Candidates = c("C1","C2",...)
covariates = c("age","sex"...)
outcomes = c("Cancer","Heart disease",...)
First I nested the data by outcome:
ndat <- data |> group_by(outcome) |> nest()
The model to be used for each candidate is written like this...
lm(value ~ C1 + age + sex, data = df)}
model2 = function(df){
lm(value ~ C2 + sex + ethnicity, data = df)}
I then apply each model to the nested data like this:
ndat <- ndat |>
mutate(m1 = map(data, model1),
m2 = map(data, model2),
...
)
ndat is now a grouped tibble, the first column is a vector of the outcomes. Each column is then a model (model1, model2 etc), and the elements in each of these columns are lists for the linear models for each outcome.
What I need is to now extract the Wald p-value of the candidate variable from each model. In each column the candidate is different, and each column is a list of lists (I think). How can I do this cleanly?
I've tried using broom::glance but this returns the overall model metrics, p-values etc. I've also tried all manner of subscripting the ndat tibble, and although I can extract a single Wald value for a candidate in the first model with summary(ndat$model1[[1]])$coefficients[,4][[1]], I can't scale this up to apply to all the models in all the model columns.
What I'm hoping for, is for each ndat model column to turn into a vector of the p-value of the candidate variable for that model, on the row corresponding to the correct outcome.
Here's a reprex using mtcars.
data <- mtcars |>
mutate(c1 = rbinom(nrow(mtcars),prob=0.05, size = 1),
c2 = rbinom(nrow(mtcars), prob = 0.1, size =1),
c3 = rbinom(nrow(mtcars), prob = 0.5, size = 1))
#> Error in mutate(mtcars, c1 = rbinom(nrow(mtcars), prob = 0.05, size = 1), : could not find function "mutate"
candidates <- c("c1","c2","c3")
covars <- c("disp","hp","drat","wt")
outcomes <- c("mpg","qsec")
outcome_cols <- names(data)[names(data) %in% outcomes]
dat_long <- data |>
pivot_longer(cols=all_of(outcome_cols), names_to = "outcome", values_to = "value")
#> Error in pivot_longer(data, cols = all_of(outcome_cols), names_to = "outcome", : could not find function "pivot_longer"
dat_n <- dat_long |>
group_by(cyl) |>
nest()
#> Error in nest(group_by(dat_long, cyl)): could not find function "nest"
c_models <- c("c1_mod","c2_mod","c3_mod")
c1_mod <- function(df){
lm(value ~ c1 + disp + hp, data = df)
}
c2_mod <- function(df){
lm(value ~ c2 + disp + drat, data = df)
}
c3_mod <- function(df){
lm(value ~ c3 + drat + wt, data = df)
}
dat_n <- dat_n |>
mutate(c1 = map(data, c1_mod),
c2 = map(data, c2_mod),
c3 = map(data, c3_mod))
#> Error in mutate(dat_n, c1 = map(data, c1_mod), c2 = map(data, c2_mod), : could not find function "mutate"
#this works for 1 model
glancec1 <- dat_n |>
mutate(glance = map(c1, broom::glance)) |>
unnest(glance) |>
select(cyl, BIC, adj.r.squared) |>
mutate(model = "c1")
#> Error in mutate(select(unnest(mutate(dat_n, glance = map(c1, broom::glance)), : could not find function "mutate"
#but this does not (when trying to extend the above for all the models)
glances <- data.frame()
for (i in c_models) {
print(i)
glance <- dat_n |>
mutate(glance = map(i, broom::glance)) |>
unnest(glance) |>
select(cyl, BIC, adj.r.squared) |>
mutate(model = as.character(i))
glances <- bind_rows(glances, glance)
}
#> [1] "c1_mod"
#> Error in mutate(select(unnest(mutate(dat_n, glance = map(i, broom::glance)), : could not find function "mutate"
Created on 2023-06-23 with reprex v2.0.2
A colleague stepped in here and gave me a lovely answer, posting for future reference!