How to calculate SMD between 3 groups or more?


I'm interested in calculating pairwise standardized mean differences (SMDs) across the levels of one stratifying variable. Usually the SMD is calculated between two groups, but can it be calculated when there are 3 or more groups?

P.S. I'm a big fan of gtsummary package, so I attempted to do this analysis using example 2 from this amazing package as follows:

library(tidyverse)
library(gtsummary)
#> #BlackLivesMatter
add_difference_ex2 <-
  trial %>%
  mutate(trt=ifelse(age<40,"Drug C", trt)) %>% 
  select(trt, age, marker, grade, stage) %>%
  tbl_summary(
    by = trt,
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no",
    include = c(age, marker, trt)
  ) %>%
  add_n() %>%
  add_difference(adj.vars = c(grade, stage))
#> 11 observations missing `trt` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `trt` column before passing to `tbl_summary()`.
#> Error: 'tbl_summary'/'tbl_svysummary' object must have a `by=` value with exactly two levels

Created on 2021-10-27 by the reprex package (v2.0.1)

There are 2 answers below.

Best answer:

To add pairwise standardized mean differences (SMDs), first define a function that calculates and returns the pairwise SMD estimates, then add it to the gtsummary table with the generic function add_stat(). The example below uses the smd package for the estimates:

library(gtsummary)
library(tidyverse)

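# function to calculate pairwise smd by leaving out one group at a time
# (this yields two-group comparisons only when `by` has exactly three levels)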
pairwise_smd <- function(data, variable, by, ...) {
  data <- 
    dplyr::select(data, all_of(c(variable, by))) %>%
    rlang::set_names(c("variable", "by")) %>%
    dplyr::filter(complete.cases(.)) %>%
    arrange(desc(.data$by))
  
  tibble(exclude = unique(data$by)) %>%
    mutate(
      include = map_chr(.data$exclude, ~unique(data$by) %>% setdiff(.x) %>% paste(collapse = " vs. ")),
      data_subset = 
        map(
          .data$exclude, 
          ~data %>%
            filter(!.data$by  %in% .x) %>%
            mutate(by = factor(.data$by))
        ),
      smd = map_dbl(.data$data_subset, ~smd::smd(.x$variable, .x$by)$estimate)
    ) %>%
    select(include, smd) %>%
    spread(include, smd)
}

tbl <-
  trial %>%
  select(age, grade, stage) %>%
  tbl_summary(
    by = grade,
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no"
  ) %>%
  add_stat(fns = everything() ~ pairwise_smd)

Created on 2021-10-27 by the reprex package (v2.0.1)
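Because the function above forms its pairs by dropping one group at a time, it is tied to a three-level `by` variable. For more groups, one option is to enumerate the pairs directly. The following is a minimal base R sketch for a numeric variable, not part of gtsummary or smd; the name `pairwise_smd_sketch` is hypothetical, and it assumes the SMD is defined as the difference in means divided by the square root of the average of the two group variances.

```r
# Minimal sketch: pairwise SMDs for a numeric variable across any number of groups.
# Assumption: SMD = (mean_a - mean_b) / sqrt((var_a + var_b) / 2).
pairwise_smd_sketch <- function(x, g) {
  # drop observations with a missing value or missing group
  keep <- !is.na(x) & !is.na(g)
  x <- x[keep]
  g <- droplevels(factor(g[keep]))

  # every unordered pair of group levels
  pairs <- combn(levels(g), 2, simplify = FALSE)

  est <- vapply(pairs, function(p) {
    a <- x[g == p[1]]
    b <- x[g == p[2]]
    (mean(a) - mean(b)) / sqrt((var(a) + var(b)) / 2)
  }, numeric(1))

  names(est) <- vapply(pairs, paste, character(1), collapse = " vs. ")
  est
}

# Toy data with four groups (made up for illustration)
x <- c(1, 2, 3, 2, 3, 4, 5, 6, 7, 9, 10, 11)
g <- rep(c("A", "B", "C", "D"), each = 3)
res <- pairwise_smd_sketch(x, g)
```

With four groups this returns six named estimates, one per pair. Categorical variables need a different SMD definition and are not handled here.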

Second answer:

UPDATE: The function is now incorporated into a package, which can be loaded with:

install.packages("devtools")
devtools::install_github("zheer-kejlberg/Z.gtsummary.addons")
library(Z.gtsummary.addons)

I've written some code (heavily inspired by the answer of @DanielD.Sjoberg) to do just this with gtsummary tables at https://github.com/zheer-kejlberg/gtsummary-SMDs.

Piping a table into add_SMD() gives pairwise SMDs between any number of groups. Setting the argument ref_group = TRUE limits the SMDs to comparisons between the first group and each of the other groups.
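To illustrate what a reference-group comparison means, here is a minimal base R sketch for a numeric variable. It is not the package's implementation: the name `ref_group_smd_sketch` is hypothetical, the first factor level is assumed to be the reference, and the SMD is assumed to be the difference in means divided by the square root of the average of the two group variances. The sign convention in the package may differ.

```r
# Minimal sketch: SMD of the first group versus each other group.
# Assumption: SMD = (mean_ref - mean_other) / sqrt((var_ref + var_other) / 2).
ref_group_smd_sketch <- function(x, g) {
  g <- factor(g)
  ref <- levels(g)[1]      # reference = first level
  others <- levels(g)[-1]  # all remaining levels

  smd_one <- function(a, b) {
    (mean(a, na.rm = TRUE) - mean(b, na.rm = TRUE)) /
      sqrt((var(a, na.rm = TRUE) + var(b, na.rm = TRUE)) / 2)
  }

  # which() drops rows where the group itself is missing
  est <- vapply(
    others,
    function(lv) smd_one(x[which(g == ref)], x[which(g == lv)]),
    numeric(1)
  )
  names(est) <- paste(ref, "vs.", others)
  est
}

# Toy data with four groups (made up for illustration)
x <- c(1, 2, 3, 2, 3, 4, 5, 6, 7, 9, 10, 11)
g <- rep(c("A", "B", "C", "D"), each = 3)
res_ref <- ref_group_smd_sketch(x, g)
```

With k groups this gives k - 1 estimates instead of the k(k - 1)/2 produced by all pairwise comparisons.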

The function works on both tbl_summary and tbl_svysummary objects (the latter with weighting).

Another neat thing is that it can supply both overall SMDs (like add_difference("smd") does) and the per-level-of-categorical-variables SMD -- or even both simultaneously. This is specified via the location argument.

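See an example of its use here:

library(dplyr)
library(gtsummary)
library(WeightIt) # provides weightit()
library(Z.gtsummary.addons) # provides add_SMD()

trial %>% mutate(
  w = weightit(grade ~ age + stage + trt, data = ., estimand = "ATT", focal = "I")$weights) %>% # create ATT weights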
  survey::svydesign(~1, data = ., weights = ~w) %>% # create the svydesign object
  tbl_svysummary(by = grade, include = c(trt, age, stage)) %>%
  add_SMD(location = "both")
