How to extract the Wald p-value from a specific set of linear models with nested data

35 Views Asked by At

This is a 'many models' problem.

I have a set of 16 binary candidate variables. I want to see the 'contribution' of each to a set of outcome variables (n=15), where the other covariates change depending on the candidate to be used.

Candidates = c("C1","C2",...)
covariates = c("age","sex"...)
outcomes = c("Cancer","Heart disease",...)

First I nested the data by outcome:

ndat <- data |> group_by(outcome) |> nest()

The model to be used for each candidate is written like this...

lm(value ~ C1 + age + sex, data = df)}

model2 = function(df){
lm(value ~ C2 + sex + ethnicity, data = df)}

I then apply each model to the nested data like this:

ndat <- ndat |>
mutate(m1 = map(data, model1),
m2 = map(data, model2),
...
)

ndat is now a grouped tibble, the first column is a vector of the outcomes. Each column is then a model (model1, model2 etc), and the elements in each of these columns are lists for the linear models for each outcome.

What I need is to now extract the Wald p-value of the candidate variable from each model. In each column the candidate is different, and each column is a list of lists (I think). How can I do this cleanly?

I've tried using broom::glance but this returns the overall model metrics, p-values etc. I've also tried all manner of subscripting the ndat tibble, and although I can extract a single Wald value for a candidate in the first model with summary(ndat$model1[[1]])$coefficients[,4][[1]], I can't scale this up to apply to all the models in all the model columns.

What I'm hoping for, is for each ndat model column to turn into a vector of the p-value of the candidate variable for that model, on the row corresponding to the correct outcome.

Here's a reprex using mtcars.

data <- mtcars |>
  mutate(c1  = rbinom(nrow(mtcars),prob=0.05, size = 1),
         c2 = rbinom(nrow(mtcars), prob = 0.1, size =1),
         c3 = rbinom(nrow(mtcars), prob = 0.5, size = 1))
#> Error in mutate(mtcars, c1 = rbinom(nrow(mtcars), prob = 0.05, size = 1), : could not find function "mutate"


candidates <- c("c1","c2","c3")
covars <- c("disp","hp","drat","wt")
outcomes <- c("mpg","qsec")

outcome_cols <- names(data)[names(data) %in% outcomes]

dat_long <- data |>
  pivot_longer(cols=all_of(outcome_cols), names_to = "outcome", values_to = "value")
#> Error in pivot_longer(data, cols = all_of(outcome_cols), names_to = "outcome", : could not find function "pivot_longer"

dat_n <- dat_long |>
  group_by(cyl) |>
  nest()
#> Error in nest(group_by(dat_long, cyl)): could not find function "nest"

c_models <- c("c1_mod","c2_mod","c3_mod")

c1_mod <- function(df){
  lm(value ~ c1 + disp + hp, data = df)
}

c2_mod <- function(df){
  lm(value ~ c2 + disp + drat, data = df)
}

c3_mod <- function(df){
  lm(value ~ c3 + drat + wt, data = df)
}

dat_n <- dat_n |>
  mutate(c1 = map(data, c1_mod),
         c2 = map(data, c2_mod),
         c3 = map(data, c3_mod))
#> Error in mutate(dat_n, c1 = map(data, c1_mod), c2 = map(data, c2_mod), : could not find function "mutate"

#this works for 1 model
glancec1 <- dat_n |>
  mutate(glance = map(c1, broom::glance)) |>
  unnest(glance) |>
  select(cyl, BIC, adj.r.squared) |>
  mutate(model = "c1")
#> Error in mutate(select(unnest(mutate(dat_n, glance = map(c1, broom::glance)), : could not find function "mutate"

#but this does not (when trying to extend the above for all the models)

glances <- data.frame()


for (i in c_models) {
  print(i)
  glance <- dat_n |>
    mutate(glance = map(i, broom::glance)) |>
    unnest(glance) |>
    select(cyl, BIC, adj.r.squared) |>
    mutate(model = as.character(i))
  
  glances <- bind_rows(glances, glance)
}
#> [1] "c1_mod"
#> Error in mutate(select(unnest(mutate(dat_n, glance = map(i, broom::glance)), : could not find function "mutate"

Created on 2023-06-23 with reprex v2.0.2

1

There are 1 best solutions below

1
Claire Welsh On

A colleague stepped in here and gave me a lovely answer, posting for future reference!

    dat_n |>
        mutate(glance = map(.data[[cmod]], broom::glance)) |>
        unnest(glance) |>
        select(cyl, BIC, adj.r.squared) |>
        mutate(model = cmod)
}
glancec1 <- glance_fun("c1", dat_n)
c_models <- c("c1","c2","c3")
glances <- bind_rows(map(c_models, glance_fun, dat_n))```