For starters, the data come from the us_contagious_diseases dataset in the dslabs package, and the other packages I'm using are tidyverse and ggpubr:
library(dslabs)
library(ggpubr)
library(tidyverse)
data("us_contagious_diseases")
I modified this dataset via the code below:
sdf <- us_contagious_diseases %>%
  filter(disease %in% c('Rubella', 'Mumps')) %>%
  select(disease, count, population, state)
Then I created a boxplot comparing the number of Rubella and Mumps cases in each state:
sdf_plot <- ggplot(sdf, mapping = aes(x = disease, y = count)) +
  geom_boxplot(outlier.shape = NA) +
  facet_wrap('state', scales = 'free') +
  stat_compare_means(method = 't.test', label.y.npc = 0.8)
The thing is, there are FIFTY ONE plots in this figure!!! That's way too many to include in my report. More importantly, many of these comparisons don't have significant p-values. Is there a way I can pull out just those plots that have a p-value less than 0.01?
I guess you need to pre-calculate the p-values:
Then plot using this filter:
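Something along these lines might work for both steps. Treat it as a sketch rather than a tested solution: `welch_p`, `pvals`, `sig_states`, and `sdf_sig_plot` are names I made up for illustration. I'm also assuming that a plain `t.test()` per state (Welch by default) is close enough to what `stat_compare_means(method = 't.test')` computes; the numbers should agree, but check a state or two by eye.

```r
library(dslabs)
library(tidyverse)
library(ggpubr)

data("us_contagious_diseases")

# Same subset as in the question.
sdf <- us_contagious_diseases %>%
  filter(disease %in% c("Rubella", "Mumps")) %>%
  select(disease, count, population, state)

# Hypothetical helper: p-value of a Welch t-test of count by disease.
# Returns NA if the test cannot be run for a state (e.g. only one disease
# present, or constant data), so one bad state does not stop the pipeline.
welch_p <- function(count, disease) {
  tryCatch(t.test(count ~ disease)$p.value, error = function(e) NA_real_)
}

# Step 1: pre-calculate one p-value per state.
pvals <- sdf %>%
  group_by(state) %>%
  summarise(p = welch_p(count, disease), .groups = "drop")

# Step 2: keep only states below the 0.01 cutoff (NA p-values are dropped).
sig_states <- pvals %>%
  filter(p < 0.01) %>%
  pull(state)

# Same plot as in the question, restricted to the significant states.
sdf_sig_plot <- sdf %>%
  filter(state %in% sig_states) %>%
  ggplot(aes(x = disease, y = count)) +
  geom_boxplot(outlier.shape = NA) +
  facet_wrap("state", scales = "free") +
  stat_compare_means(method = "t.test", label.y.npc = 0.8)

sdf_sig_plot
```

Keeping `pvals` as its own table also makes it easy to try a different cutoff, or to adjust for running 51 tests at once (for example with `p.adjust()`), before deciding which panels to show.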