I want to calculate confidence intervals on a distribution of slope estimates from bootstrapped linear regression models AND extract regression summary statistics (e.g., r.squared) for each of the bootstrapped models on grouped data. I figured out how to calculate the confidence intervals on these data using dplyr::group_modify()
and rsample::int_pctl
(related question here), but I can't figure out how to keep the regression summary stats (e.g., using broom::glance()
) with each model at the same time that I calculate the confidence intervals.
library(tidyverse)
library(tidymodels)
set.seed(27)
# Here are the data
dat <-
structure(list(site = c("mb", "mb", "mb", "mb", "mb", "mb", "mb",
"mb", "sp", "sp", "sp", "sp", "sp", "sp", "sp", "sp"), year = c(2015,
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015), yday = c(15, 15, 35, 35, 48, 48, 69,
69, 15, 15, 37, 37, 49, 49, 69, 69), samp_depth_cat2 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Mid-2",
"Bottom"), class = "factor"), analyte = c("NO3", "NO3", "NO3",
"NO3", "NO3", "NO3", "NO3", "NO3", "NH4", "NH4", "NH4", "NH4",
"NH4", "NH4", "NH4", "NH4"), conc = c(44.8171069465267, 44.7775358035268,
33.3678662097523, 33.0710828871279, 25.8427604055115, 26.9309658742058,
23.7585524380667, 17.5240386949382, 8.35832733633183, 9.29280745341615,
10.0797380595417, 10.2322058970515, 13.7930668951239, 15.6226805882773,
25.3003042764332, 16.8723637466981)), row.names = c(NA, -16L), class = c("tbl_df",
"tbl", "data.frame"))
# This is how I bootstrap the regressions and calculate confidence intervals:
lm_boot_CI <-
dat %>%
group_by(site, year, samp_depth_cat2, analyte) %>%
group_modify(
~ bootstraps(., times = 100, apparent = TRUE) %>%
mutate(
model = map(splits, ~ lm(conc ~ yday, data = .)),
coefs = map(model, tidy)
) %>%
int_pctl(coefs)
)
# I am able to keep the row of summary stats for each model and unnest it
# IF I do a separate bootstrapping routine on the same data:
lm_boot_R2 <-
dat %>%
group_by(site, year, samp_depth_cat2, analyte) %>%
group_modify(
~ bootstraps(., times = 100, apparent = TRUE) %>%
mutate(
model = map(splits, ~ lm(conc ~ yday, data = .)),
coefs = map(model, tidy),
glanced = map(model, glance)
)
) %>%
unnest(glanced)
So the question is, how do I integrate these two code chunks to accomplish both at once on the same bootstrapped models (i.e., use int_pctl()
and glance()
)?
I don't know if you can do this all in one go, because the "shape" of the output you are looking for is so different. I think this makes
group_modify()
not a great option since it is mostly for "dataframe in, dataframe out".One way to approach this would be to start out by training all the models, using
map()
two layers deep:That
boots
column has the models plus the coefficients. Now you can compute the bootstrap confidence intervals, and get out the per-modelglance()
results.Created on 2021-09-08 by the reprex package (v2.0.1)
These are quite different "shapes", though, like I said, so I'm not sure what you want to do from here.