Rounding in frequency tables in R

I was wondering if anyone proficient in R/RMarkdown could help me with an issue. I want to generate a frequency table, and so far I have been using tableby from the arsenal package, as it is easy and convenient to integrate into an RMarkdown docx/html document. However, I have been asked to provide rounded frequencies (to the nearest 5 or 10) and have not found a way to do it.

I have generated a simple fake dataset, as I cannot share my data for confidentiality reasons, and this is how I would produce a normal table.

set.seed(1234)

library(dplyr)
library(arsenal)

x1 <- c(rep("Man",40),rep("Woman",60)) %>% as.factor()
x2 <- sample(c("Sick","Healthy"),100,replace=TRUE) %>% as.factor()

df <- data.frame(x1,x2)

Control_notrounded <- tableby.control(digits=0,digits.pct=2,cat.stats=c("countpct","Nmiss2"))

table <- tableby(x1~x2,control=Control_notrounded,data=df)
print(summary(table))

However, although a traditional rounding function rounds to the nearest 10 when passed digits=-1, this does not work with tableby.control: I get a warning that digits must be >= 0.

Control_rounded <- tableby.control(digits=-1,digits.pct=2,cat.stats=c("countpct","Nmiss2"))
table2 <- tableby(x1~x2,control=Control_rounded,data=df)
print(summary(table2))
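
For reference, the base R rounding alluded to above can be sketched as follows. round_to() is a hypothetical helper introduced only for illustration; it is not part of arsenal. Note that R rounds exact halves to the even digit, so a value sitting exactly between two multiples may not round the way one expects.

```r
# Hypothetical helper: round values to the nearest multiple of `base`
round_to <- function(x, base = 5) {
  round(x / base) * base
}

counts <- c(3, 12, 17, 44, 58)

round(counts, digits = -1)   # nearest 10 via a negative digits argument
round_to(counts, base = 5)   # nearest 5
round_to(counts, base = 10)  # nearest 10, matching digits = -1 for these values
```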

Is there any way to do this? Otherwise, can anyone suggest an alternative package that makes it relatively straightforward to create frequency tables with rounded values?

BEST ANSWER

I recommend using the gtsummary package for baseline tables instead, together with the round_5_gtsummary() function from the Z.gtsummary.addons package on GitHub:

set.seed(1234)
library(dplyr)
library(gtsummary)
library(stringr)

x1 <- c(rep("Man",40),rep("Woman",60)) %>% as.factor()
x2 <- sample(c("Sick","Healthy"),100,replace=TRUE) %>% as.factor()
df <- data.frame(x1,x2)

install.packages("devtools")
devtools::install_github("zheer-kejlberg/Z.gtsummary.addons")
library(Z.gtsummary.addons)

df %>% tbl_summary(by = "x1") %>% 
  add_overall(last = TRUE) %>% 
  round_5_gtsummary()  %>%
  add_p()

WEIGHTED VERSION

# Create IPT weights
library(WeightIt)
df$w <- weightit(x1~x2, data = df, estimand = "ATT", focal = "Man")$weights

Use survey to create a svydesign object. Then apply tbl_svysummary() to that:

library(survey)
df %>% survey::svydesign(~1, data = ., weights = ~w) %>%
  tbl_svysummary(by = "x1", include=c(x2)) %>%
  add_overall(last = TRUE) %>%
  round_5_gtsummary() %>%
  add_p()

ALTERNATIVE WAY:

To use the built-in tbl_summary(digits=) argument to separately round the counts and percentages, you can do:

library(gtsummary)
library(dplyr)
set.seed(1234)

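round_5 <- function(vec) {
  # Round a single value to the nearest 5. Values below 1 are assumed to be
  # proportions and are converted to percentages first.
  # Caveat: a proportion of exactly 1 (100%) is treated as a count.
  fun <- function(x) {
    if (x < 1) {
      round(x * 100 / 5) * 5
    } else {
      round(x / 5) * 5
    }
  }
  # Apply element-wise and return the rounded vector visibly
  purrr::map_vec(vec, .f = fun)
}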

df <- data.frame(
  x1 = c(rep("Man", 40), rep("Woman", 60)) %>% as.factor(),
  x2 = sample(c("Sick", "Healthy"), 100, replace = TRUE) %>% as.factor()
)

df %>% 
  tbl_summary(
    by = "x1",
    digits = all_categorical() ~ round_5
  ) %>% 
  add_overall(last = TRUE) %>% 
  add_p()

Note that this version does not recalculate the percentages from the rounded counts; it simply rounds the counts and the percentages separately.
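
If the percentages should instead be recomputed from the rounded counts, one option is to do the arithmetic in base R before building the table. This is a minimal sketch, assuming the same simulated data as above; the object names (counts, rounded, pct, out) are illustrative, and none of this is part of gtsummary or Z.gtsummary.addons.

```r
set.seed(1234)

df <- data.frame(
  x1 = factor(c(rep("Man", 40), rep("Woman", 60))),
  x2 = factor(sample(c("Sick", "Healthy"), 100, replace = TRUE))
)

# Raw cross-tabulation: rows are x2 levels, columns are x1 groups
counts <- table(x2 = df$x2, x1 = df$x1)

# Round every cell count to the nearest 5
rounded <- round(counts / 5) * 5

# Recompute column percentages from the rounded counts, so the percentages
# are consistent with the counts that are actually displayed
pct <- prop.table(rounded, margin = 2) * 100

# Combine into "n (p%)" strings, keeping the table's shape and labels
out <- matrix(
  sprintf("%d (%.1f%%)", as.integer(rounded), as.vector(pct)),
  nrow = nrow(rounded),
  dimnames = dimnames(rounded)
)
print(out, quote = FALSE)
```

Because the percentages are derived from the rounded counts, each column still sums to 100%, although the rounded column totals may differ slightly from the true group sizes.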