I have the following code, which performs bootstrapping and calculates percentile confidence intervals.
library(rsample)
library(broom)
library(dplyr)
library(purrr)
library(tibble)
lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}
set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))
int_pctl(car_rs, results) # this is important
It produces
> int_pctl(car_rs, results)
# A tibble: 3 × 6
  term         .lower .estimate   .upper .alpha .method
  <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>
1 (Intercept) 27.7      31.0    34.1       0.05 percentile
2 disp        -0.0431   -0.0295 -0.0123    0.05 percentile
3 hp          -0.0643   -0.0281 -0.00930   0.05 percentile
But it runs very slowly. How can I speed it up with parallelization? Note that the parallelized output must still be usable by int_pctl().
I tried this but failed:
library(parallel)
# set the number of cores to use for parallelization
cores <- detectCores() - 1
cl <- makeCluster(cores)
# use mcmapply to parallelize the bootstrapping process
car_rs$results <- mcmapply(lm_est, car_rs$splits, mc.cores = cores, mc.preschedule = TRUE)
stopCluster(cl)
There are parallel versions of the purrr::map*() functions in the furrr package that you can use.
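As a minimal sketch of how that could look, assuming furrr and future are installed: replace map() with future_map() and choose a parallel plan first. The worker count of 2 and the packages option are illustrative assumptions, not something taken from the question. The resamples are still drawn in the main session, so set.seed() governs them exactly as before.

```r
library(rsample)
library(broom)
library(dplyr)
library(future)
library(furrr)

lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

# Start background R sessions; 2 workers is an arbitrary choice for this sketch
plan(multisession, workers = 2)

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(
    results = future_map(
      splits,
      lm_est,
      # Assumption: attach these packages on each worker explicitly so that
      # analysis(), tidy() and %>% are available there
      .options = furrr_options(packages = c("rsample", "broom", "dplyr"))
    )
  )

# Return to sequential processing and shut down the workers
plan(sequential)

# results is a list-column of tidy tibbles, as with purrr::map()
int_pctl(car_rs, results)
```

future_map() returns a list just as map() does, so the results column keeps the shape int_pctl() expects. By contrast, mcmapply() simplifies its result by default (SIMPLIFY = TRUE) and relies on forking rather than on the cluster created with makeCluster(), which is probably why the attempt in the question did not work. For a model as small as this one, the overhead of sending data to the workers may outweigh the gain, so it is worth timing both versions.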