Want to preface this with heaps of appreciate for gtsummary
-- wonderful package.
After using tidymodels
, GLM
, and gtsummary
for a while, I've been trying to understand gtsummary
's computations for GLM
model performance and confidence intervals.
Can the anyone and/or Dr. Sjoberg + gtsummary team explain the following questions 1 & 2
Question 1: Why are standard errors different when using broom::tidy() vs. parameters::model_parameters() functions to extract model residual data? (Bolded text in print outs shows differences)
library(gtsummary)
library(parameters)
library(rsample)
library(broom)
trial2 <- trial %>% select(age, grade, response, trt) %>%
drop_na()
model_trial2 <- glm(response ~ age + grade + trt,
data = trial2,
family=binomial(link="logit"))
broom::tidy(model_trial2, exponentiate = TRUE)
# # A tibble: 5 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 0.184 **0.630** -2.69 0.00715
# 2 age 1.02 0.0114 1.67 0.0952
# 3 gradeII 0.852 **0.395** -0.406 0.685
# 4 gradeIII 1.01 0.385 0.0199 0.984
# 5 trtDrug B 1.13 **0.321** 0.387 0.699
preadmission_model_parameters <- model_trial2 %>% parameters::model_parameters(exponentiate = TRUE)
preadmission_model_parameters
# Parameter | Odds Ratio | SE | 95% CI | z | p
# ---------------------------------------------------------------
# (Intercept) | 0.18 | **0.12** | [0.05, 0.61] | -2.69 | 0.007
# age | 1.02 | 0.01 | [1.00, 1.04] | 1.67 | 0.095
# grade [II] | 0.85 | **0.34** | [0.39, 1.85] | -0.41 | 0.685
# grade [III] | 1.01 | 0.39 | [0.47, 2.15] | 0.02 | 0.984
# trt [Drug B] | 1.13 | **0.36** | [0.60, 2.13] | 0.39 | 0.699
Question 2: (a) What method does gtsummary use to produce confidence intervals? (b) can the user define (stratified or unstratified) k-fold cross-validation or bootstraps to produce confidence intervals?
(Bolded differences in confidence intervals for the reg_intervals() bootstrapped confidence intervals and the unknown method gtsummary tbl_regression() confidence intervals.)
library(gtsummary)
library(parameters)
library(rsample)
library(broom)
trial2 <- trial %>% select(age, grade, response, trt) %>%
drop_na()
bootstraps(trial2, times = 10)
trial_bootrapped_confidence_intervals <- reg_intervals(response ~ age + grade + trt,
data = trial2,
model_fn = "glm",
keep_reps = TRUE,
family=binomial(link="logit"))
trial_bootrapped_confidence_intervals_exp <- trial_bootrapped_confidence_intervals %>%
select(term:.alpha) %>%
mutate(across(.cols = c(.lower, .estimate, .upper), ~exp(.))) %>%
as_tibble()
trial_bootrapped_confidence_intervals_exp
# # A tibble: 4 × 5
# term .lower .estimate .upper .alpha
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 age 0.997 1.02 1.04 0.05
# 2 gradeII **0.400** 0.846 **1.86** 0.05
# 3 gradeIII 0.473 1.01 2.10 0.05
# 4 trtDrug B 0.600 1.14 2.22 0.05
model_trial2_tbl_regression <-
glm(response ~ age + grade + trt,
data = trial2,
family=binomial(link="logit")) %>%
tbl_regression(
exponentiate = T
) %>%
add_global_p()
model_trial2_tbl_regression_metrics <- model_trial2_tbl_regression$table_body %>%
select(
label,
estimate,
std.error,
statistic,
conf.low ,
conf.high,
p.value
)
model_trial2_tbl_regression_metrics
# A tibble: 8 × 7
# label estimate std.error statistic conf.low conf.high p.value
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Age 1.02 0.0114 1.67 0.997 1.04 0.0909
# 2 Grade NA NA NA NA NA 0.894
# 3 I NA NA NA NA NA NA
# 4 II 0.852 0.395 -0.406 **0.389** **1.85** NA
# 5 III 1.01 0.385 0.0199 0.472 2.15 NA
# 6 Chemotherapy Treatment NA NA NA NA NA 0.699
# 7 Drug A NA NA NA NA NA NA
# 8 Drug B 1.13 0.321 0.387 0.603 2.13 NA
The issue is with the exponentiation (applied as the family is binomial). Broom::tidy does not exponentiate the standard errors but parameters does. You can see this with
broom::tidy(model_trial2, exponentiate = TRUE)
andbroom::tidy(model_trial2, exponentiate = FALSE)
, which return the same standard errors.parameters::model_parameters(exponentiate = TRUE)
andparameters::model_parameters(exponentiate = FALSE)
return different standard errors. When exponentiate is FALSE for parameters, the standard errors match. This is discussed in Check exponentiate behavior in tidy methods #422To create a custom tidier for gtsummary, see FAQ + Gallery