I want to calculate confidence intervals on a distribution of slope estimates from bootstrapped linear regression models AND extract regression summary statistics (e.g., r.squared) for each of the bootstrapped models on grouped data. I figured out how to calculate the confidence intervals on these data using dplyr::group_modify() and rsample::int_pctl (related question here), but I can't figure out how to keep the regression summary stats (e.g., using broom::glance()) with each model at the same time that I calculate the confidence intervals.

library(tidyverse)
library(tidymodels)

set.seed(27)

# Here are the data
dat <- 
  structure(list(site = c("mb", "mb", "mb", "mb", "mb", "mb", "mb", 
  "mb", "sp", "sp", "sp", "sp", "sp", "sp", "sp", "sp"), year = c(2015, 
  2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
  2015, 2015, 2015, 2015), yday = c(15, 15, 35, 35, 48, 48, 69, 
  69, 15, 15, 37, 37, 49, 49, 69, 69), samp_depth_cat2 = structure(c(1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Mid-2", 
  "Bottom"), class = "factor"), analyte = c("NO3", "NO3", "NO3", 
  "NO3", "NO3", "NO3", "NO3", "NO3", "NH4", "NH4", "NH4", "NH4", 
  "NH4", "NH4", "NH4", "NH4"), conc = c(44.8171069465267, 44.7775358035268, 
  33.3678662097523, 33.0710828871279, 25.8427604055115, 26.9309658742058, 
  23.7585524380667, 17.5240386949382, 8.35832733633183, 9.29280745341615, 
  10.0797380595417, 10.2322058970515, 13.7930668951239, 15.6226805882773, 
  25.3003042764332, 16.8723637466981)), row.names = c(NA, -16L), class = c("tbl_df", 
  "tbl", "data.frame"))

# This is how I bootstrap the regressions and calculate confidence intervals:
  lm_boot_CI <-
    dat %>% 
    group_by(site, year, samp_depth_cat2, analyte) %>% 
    group_modify(
      ~ bootstraps(., times = 100, apparent = TRUE) %>%
        mutate(
          model = map(splits, ~ lm(conc ~ yday, data = .)),
          coefs = map(model, tidy)
        ) %>%
        int_pctl(coefs)
    )

# I am able to keep the row of summary stats for each model and unnest it 
# IF I do a separate bootstrapping routine on the same data:
  lm_boot_R2 <-
    dat %>% 
    group_by(site, year, samp_depth_cat2, analyte) %>% 
    group_modify(
      ~ bootstraps(., times = 100, apparent = TRUE) %>%
        mutate(
          model = map(splits, ~ lm(conc ~ yday, data = .)),
          coefs = map(model, tidy),
          glanced = map(model, glance)
        )
    ) %>% 
    unnest(glanced)

So the question is, how do I integrate these two code chunks to accomplish both at once on the same bootstrapped models (i.e., use int_pctl() and glance())?

1

There are 1 best solutions below

2
On

I don't know if you can do this all in one go, because the "shape" of the output you are looking for is so different. I think this makes group_modify() not a great option since it is mostly for "dataframe in, dataframe out".

One way to approach this would be to start out by training all the models, using map() two layers deep:

library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

set.seed(27)

# Here are the data
dat <- 
    structure(list(site = c("mb", "mb", "mb", "mb", "mb", "mb", "mb", 
                            "mb", "sp", "sp", "sp", "sp", "sp", "sp", "sp", "sp"), 
                   year = c(2015, 
                            2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
                            2015, 2015, 2015, 2015), yday = c(15, 15, 35, 35, 48, 48, 69, 
                                                              69, 15, 15, 37, 37, 49, 49, 69, 69), 
                   samp_depth_cat2 = structure(c(1L, 
                                                 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
                                               .Label = c("Mid-2", 
                                                          "Bottom"), class = "factor"), 
                   analyte = c("NO3", "NO3", "NO3", 
                               "NO3", "NO3", "NO3", "NO3", "NO3", "NH4", "NH4", "NH4", "NH4", 
                               "NH4", "NH4", "NH4", "NH4"), conc = c(44.8171069465267, 44.7775358035268, 
                                                                     33.3678662097523, 33.0710828871279, 25.8427604055115, 26.9309658742058, 
                                                                     23.7585524380667, 17.5240386949382, 8.35832733633183, 9.29280745341615, 
                                                                     10.0797380595417, 10.2322058970515, 13.7930668951239, 15.6226805882773, 
                                                                     25.3003042764332, 16.8723637466981)), 
              row.names = c(NA, -16L), class = c("tbl_df", 
                                                 "tbl", "data.frame"))

lm_boot_mods <- 
    dat %>%
    nest(data = c(conc, yday)) %>%
    mutate(boots = map(
        data,  
        ~ bootstraps(., times = 100, apparent = TRUE) %>%
            mutate(model = map(.$splits, ~ lm(conc ~ yday, data = .)),
                   coefs = map(model, tidy),
                   glanced = map(model, glance))
    ))

lm_boot_mods
#> # A tibble: 2 × 6
#>   site   year samp_depth_cat2 analyte data             boots                 
#>   <chr> <dbl> <fct>           <chr>   <list>           <list>                
#> 1 mb     2015 Mid-2           NO3     <tibble [8 × 2]> <bootstraps [101 × 5]>
#> 2 sp     2015 Bottom          NH4     <tibble [8 × 2]> <bootstraps [101 × 5]>

That boots column has the models plus the coefficients. Now you can compute the bootstrap confidence intervals, and get out the per-model glance() results.

lm_boot_ci <-
    lm_boot_mods %>%
    mutate(pctl = map(boots, ~ int_pctl(., coefs)),
           glanced = map(boots, "glanced"))
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `(Intercept)`, `yday`.

#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `(Intercept)`, `yday`.

lm_boot_ci
#> # A tibble: 2 × 8
#>   site   year samp_depth_cat2 analyte data             boots     pctl    glanced
#>   <chr> <dbl> <fct>           <chr>   <list>           <list>    <list>  <list> 
#> 1 mb     2015 Mid-2           NO3     <tibble [8 × 2]> <bootstr… <tibbl… <list …
#> 2 sp     2015 Bottom          NH4     <tibble [8 × 2]> <bootstr… <tibbl… <list …

lm_boot_ci %>%
    select(pctl, glanced)
#> # A tibble: 2 × 2
#>   pctl             glanced     
#>   <list>           <list>      
#> 1 <tibble [2 × 6]> <list [101]>
#> 2 <tibble [2 × 6]> <list [101]>

Created on 2021-09-08 by the reprex package (v2.0.1)

These are quite different "shapes", though, like I said, so I'm not sure what you want to do from here.