Colors in grouped boxplots in R


I am creating grouped boxplots for a stratified sample using ggplot. As a made-up example, the y-axis shows the value (say, test scores) and the x-axis shows the categories (White, Black, and Asian). But I am also grouping each category by another variable (say, male vs. female), so there will be 6 boxes.
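Neither the question nor the answer shows the data, but all of the answer's code expects a data frame `df` with columns `Race`, `Score`, and `Sex`. Here is a made-up stand-in for trying the snippets. It assumes 30 normally distributed scores per Race/Sex cell, clipped to the 0 to 6 range, because the answer's `scale_y_continuous(limits = c(0, 6))` would drop anything outside it. The column names and level spellings are assumptions chosen to match that code.

```r
set.seed(42)

# Hypothetical data: every Race x Sex combination, 30 rows each.
# expand.grid turns the character vectors into factors whose levels
# keep the order given here (White, Black, Asian; Male, Female).
df <- expand.grid(Race = c("White", "Black", "Asian"),
                  Sex  = c("Male", "Female"),
                  id   = 1:30)

# Made-up test scores, clipped so they stay inside the plotted 0-6 range
df$Score <- pmin(pmax(rnorm(nrow(df), mean = 3, sd = 1), 0), 6)

table(df$Race, df$Sex)  # 30 observations in each of the 6 cells
```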

I want the categories to have the colors red, blue, and purple, but I want the sex grouping variable to make those colors light and dark (e.g., male Asian = light purple and female Asian = dark purple).

Is this possible? Thanks!

I have searched pretty extensively online and not found a solution. I am sure I can't be the first person doing this, but I haven't found an explanation for how to do this yet.


There are a couple of different ways to do this. You could use the alpha aesthetic:

library(ggplot2)

# fill picks the hue per Race; alpha = Sex makes one sex's boxes lighter
ggplot(df, aes(Race, Score, fill = Race, alpha = Sex,
               group = interaction(Sex, Race))) +
  geom_boxplot(outlier.colour = NA) +
  geom_point(size = 0.5, position = position_jitterdodge(0.1), alpha = 0.2) +
  scale_fill_manual(NULL, values = c("deepskyblue", "purple", "red")) +
  scale_y_continuous(limits = c(0, 6), expand = c(0, 0)) +
  theme_minimal(base_size = 20) +
  labs(x = NULL) +
  # unnamed values: the first Sex level gets alpha 0.5, the second alpha 1
  scale_alpha_manual(values = c(0.5, 1)) +
  guides(alpha = guide_legend(override.aes = list(fill = "gray"))) +
  theme(panel.grid.major.x = element_blank())

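Because the `values` above are unnamed, ggplot2 assigns them in factor-level order, which is alphabetical for character columns. The hues therefore go to Asian, Black, and White in that order, not necessarily to the categories the question intended. A variation of the alpha approach passes named vectors instead, so each color and transparency is tied to a specific level. The sketch below assumes the question's assignment (White = red, Black = blue, Asian = purple, female = dark, male = light), and it also assumes the `Sex` levels are spelled exactly "Male" and "Female". It builds its own made-up `df` so it runs on its own.

```r
library(ggplot2)

# Hypothetical data (assumption): 30 made-up scores per Race x Sex cell
set.seed(1)
df <- expand.grid(Race = c("White", "Black", "Asian"),
                  Sex  = c("Male", "Female"),
                  id   = 1:30)
df$Score <- pmin(pmax(rnorm(nrow(df), mean = 3, sd = 1), 0), 6)

p <- ggplot(df, aes(Race, Score, fill = Race, alpha = Sex,
                    group = interaction(Sex, Race))) +
  geom_boxplot(outlier.colour = NA) +
  # Named values are matched by level name, not by position
  scale_fill_manual(NULL, values = c(White = "red", Black = "blue",
                                     Asian = "purple")) +
  # Opaque (dark) boxes for females, half-transparent (light) for males
  scale_alpha_manual(values = c(Male = 0.5, Female = 1)) +
  guides(alpha = guide_legend(override.aes = list(fill = "gray"))) +
  theme_minimal()

p
```

On a white background, half transparency reads as a lighter tint of the same hue, which is what makes this approach work. On a dark or gridded background, the tint would change.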

Or fill by the interaction of Sex and Race:

ggplot(df, aes(Race, Score, fill = interaction(Sex, Race, sep = " "), 
               group = interaction(Sex, Race))) +
  geom_boxplot(alpha = 0.7, outlier.colour = NA) +
  geom_point(size = 0.5, position = position_jitterdodge(0.1), alpha = 0.2) +
  scale_fill_manual(NULL, values = c("deepskyblue3", "deepskyblue",
                                     "purple3", "purple",
                                     "red3", "red")) +
  scale_y_continuous(limits = c(0, 6), expand = c(0, 0)) +
  theme_minimal(base_size = 20) +
  labs(x = NULL) +
  theme(panel.grid.major.x = element_blank())

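The six hand-picked colors above depend on the level order of `interaction(Sex, Race, sep = " ")`, where the first factor varies fastest ("Female Asian", "Male Asian", "Female Black", and so on). One way to avoid hard-coding that order is to derive a light and a dark shade from each base hue and name them after the interaction levels. The sketch below uses base R's `colorRampPalette` to blend each hue halfway toward white (light) and halfway toward black (dark). The base hues and the female = dark / male = light choice are assumptions taken from the question, and `light_dark` is a hypothetical helper.

```r
# Hypothetical base hues following the question's description (assumption)
race_cols <- c(White = "red", Black = "blue", Asian = "purple")

# Hypothetical helper: a 5-step ramp white -> col -> black, so step 2 is
# halfway to white (light shade) and step 4 is halfway to black (dark shade)
light_dark <- function(col) {
  ramp <- grDevices::colorRampPalette(c("white", col, "black"))(5)
  c(light = ramp[2], dark = ramp[4])
}

# Named palette whose names match interaction(Sex, Race, sep = " ") levels,
# e.g. "Male Asian" (light) and "Female Asian" (dark)
pal <- unlist(lapply(names(race_cols), function(race) {
  shades <- light_dark(race_cols[[race]])
  setNames(c(shades[["light"]], shades[["dark"]]),
           paste(c("Male", "Female"), race))
}))

pal
```

Because `pal` is named, passing it as `scale_fill_manual(NULL, values = pal)` matches colors to levels by name, so reordering the factors will not reshuffle the shades.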

Personally, I find this use of color gratuitous and confusing. When presenting data, color should be used to identify groupings clearly, but there is no need for it when the groupings are already apparent from the axis. I think the following is cleaner, classier, and easier to understand at a glance (as well as being colorblind-safe):

ggplot(df, aes(Race, Score, fill = Sex, group = interaction(Sex, Race))) +
  geom_boxplot(alpha = 0.5, outlier.colour = NA, width = 0.3,
               position = position_dodge(0.5)) +
  geom_point(size = 0.5, position = position_jitterdodge(0.05, 0, 0.5), 
             alpha = 0.2) +
  scale_fill_manual(NULL, values = c("gold", "deepskyblue4")) +
  scale_y_continuous(limits = c(0, 6), expand = c(0, 0)) +
  theme_minimal(base_size = 20) +
  labs(x = NULL) +
  theme(panel.grid.major.x = element_blank(), legend.position = "top")
