How to subset a survey object in R and use the tbl_svysummary with the add_difference function

Question

How to subset a survey object in R and use the tbl_svysummary with the add_difference function

144 Views Asked by r_rabbit At 07 February 2024 at 19:51

I have a survey that I need to filter to include only two specific categories for comparison. A simulation of it can be reproduced with this code:

library(survey)
library(gtsummary)
data(api)

svy <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) 

svy <- subset(svy, stype %in% c("E", "H"))

And I am trying to use this code to create a table with the differences:

svy %>%
  tbl_svysummary(by = stype, include = c(api00, stype),
                 statistic = list(all_categorical() ~ "{p}%",
                     all_continuous() ~ "{mean}"),
    digits = ~ 2,
    missing = "no") %>% 
    add_difference()

And getting this error: Error: 'tbl_summary'/'tbl_svysummary' object must have a by= value with exactly two levels.

How could I drop the level to remain just 2 and therefore use add_difference?

I use the filter in the dataframe, but I'm afraid it will cause me to lose the lifting weights.

Editing this post from the answer from @astra bellow, I've made a small example with my data, but it not works. Works only with the example above

install.packages("PNSIBGE")
library(PNSIBGE)
library(tidyverse)
pns <- get_pns(year = 2019, labels = T)
pns.2 <- subset(pns, C009  %in% c("Branca", "Preta")) 
pns.2$variables$C009 <- droplevels(pns.2$variables$C009)
pns.2 %>%
    tbl_svysummary(by = C009, include = c(C006)) %>% 
    add_difference()

Original Q&A

There are 3 best solutions below

**Alex Strashny** · Answer 1 · 2024-02-07T21:55:14.917000

Alex Strashny On 07 February 2024 at 21:55

Use the droplevels() function:

svy$variables$stype %<>% droplevels

**Thomas Lumley** · Answer 2 · 2024-02-09T01:16:50.603000

This is your problem:

> str(pns.2$variables$C009)
 Factor w/ 6 levels "Branca","Preta",..: 1 4 2 2 4 2 1 1 4 4 ...
> table(pns.2$variables$C009)

  Branca    Preta  Amarela    Parda Indígena Ignorado 
   99019    28304     1698   148273     2064       24 
> svytable(~C009,pns.2)
C009
  Branca    Preta  Amarela    Parda Indígena Ignorado 
91037722 21786515        0        0        0        0

The issue is that a survey design object isn't just a data frame. In particular, your one has post-stratification information in it. In that situation (and some others) the survey package removes observations by setting the weight to zero rather than by dropping rows (dropping rows is a computational shortcut that isn't always available).

So, you can't just use droplevels, because all the levels are (correctly) still there. If you're working with functions from the survey package you can use factor with an explicit set of levels

> svytotal(~factor(C009,levels=c("Branca","Preta")), pns.2,na.rm=TRUE)
                                                     total     SE
factor(C009, levels = c("Branca", "Preta"))Branca 91037722 686604
factor(C009, levels = c("Branca", "Preta"))Preta  21786515 338021

or equivalently

> pns.2<-update(pns.2, C009new=factor(C009,levels=c("Branca","Preta")))
> svytotal(~C009new, pns.2,na.rm=TRUE)
                 total     SE
C009newBranca 91037722 686604
C009newPreta  21786515 338021

but that doesn't seem to work with tbl_svysummary.

**Joseph Larmarange** · Answer 3 · 2024-02-09T17:50:28.990000

First of all, the default test used by add_difference() is a "t.test" and such test is not available for a table produced by tbl_svysummary(). You should use "smd" method. See https://www.danieldsjoberg.com/gtsummary/reference/tests.html#tbl-svysummary-add-difference-

Second, to manipulate survey objects, it is often easier to use srvyr package which allows to use dplyr verbs.

library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#> 
#> Attachement du package : 'survey'
#> L'objet suivant est masqué depuis 'package:graphics':
#> 
#>     dotchart
library(srvyr)
#> 
#> Attachement du package : 'srvyr'
#> L'objet suivant est masqué depuis 'package:stats':
#> 
#>     filter
library(gtsummary)
library(forcats)

data(api)

svy <- apiclus1 |> 
  as_survey_design(id = dnum, weights = pw, fpc = fpc) |> 
  filter(stype %in% c("E", "H")) |> 
  mutate(stype = fct_drop(stype))

svy  |> 
  tbl_svysummary(
    by = stype,
    include = api00,
    statistic = all_continuous() ~ "{mean} ({sd})"
  ) |> 
  add_difference(test = all_continuous() ~ "smd") |> 
  as_kable()

Characteristic	E, N = 4,874	H, N = 474	Difference	95% CI
api00	649 (106)	619 (97)	0.30	0.21, 0.40

^{Created on 2024-02-09 with reprex v2.0.2}

How to subset a survey object in R and use the tbl_svysummary with the add_difference function

There are 3 best solutions below

Related Questions in R

Related Questions in SURVEY

Related Questions in GTSUMMARY

Trending Questions

Popular # Hahtags

Popular Questions