ggplot scatterplot for 2 categorical variables, 1 categorical variable by color


I like how easily ggboxplot (from the ggpubr package) separates data into different series: the x-axis labels stay easy to read while a second categorical variable is shown as adjacent, colored series.

library(ggpubr)  # provides ggboxplot(), grids(), border() and ggpar(); also attaches ggplot2

p <- ggboxplot(df_dummy, x="Trt_Amend", y="Carbon_percent", color="Trt_CC",
               palette=c("red", "blue"),
               main="Great Plot Title",
               xlab="1st Categorical Variable",
               ylab="Continuous Variable") +
  theme(plot.title = element_text(hjust = 0.5)) + # Center plot title.
  grids(linetype="dashed") +
  border("black")
ggpar(p, x.text.angle=45,
      legend.title="2nd Categorical Variable",
      font.main=14,
      ylim=c(0.6, 1.6))


Boxplots aren't always appropriate, though, for example when each group has only a few observations (< 20). Can someone help me figure out how to do the same thing in ggplot using geom_point?

library(ggplot2)

# How to separate colored series using geom_point?
ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color=Trt_CC)) +
  geom_point()


Thanks for reading!

Best answer:

The first step is to dodge your points with position = position_dodge(.75), or to dodge them and add some jitter with position_jitterdodge(), as I do below. As with ggpubr::ggboxplot, the rest of the code is just styling.

Using some fake random example data:

set.seed(123)

# data.frame() recycles the shorter columns up to the 80 values of Carbon_percent.
df_dummy <- data.frame(
  Trt_Amend = paste0("Group", 1:5),
  Trt_CC = rep(factor(0:1), each = 5),
  Carbon_percent = rnorm(80, mean = 1, sd = .1)
)
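Since the question's rule of thumb is about how many observations each group has (< 20), it can help to check the group sizes before choosing between boxplots and points. A minimal sketch, rebuilding df_dummy exactly as above and cross-tabulating the two categorical variables with base R's table(); the name counts is just for illustration. With 80 rows spread over 5 x 2 combinations, every cell should hold 8 observations, which is the small-n situation where showing the raw points is worthwhile.

```r
# Rebuild the fake data exactly as in the answer.
set.seed(123)
df_dummy <- data.frame(
  Trt_Amend = paste0("Group", 1:5),
  Trt_CC = rep(factor(0:1), each = 5),
  Carbon_percent = rnorm(80, mean = 1, sd = .1)
)

# Count observations per Trt_Amend x Trt_CC combination.
counts <- table(df_dummy$Trt_Amend, df_dummy$Trt_CC)
print(counts)
```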

library(ggplot2)

ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color = Trt_CC)) +
  # Boxes for context; hide outliers because every point is drawn below anyway.
  geom_boxplot(width = .6, outlier.shape = NA) +
  # Dodge the two Trt_CC series side by side and jitter points within each series.
  geom_point(
    position = position_jitterdodge(jitter.width = .3)
  ) +
  scale_color_manual(values = c("red", "blue")) +
  labs(
    title = "Great Plot Title",
    x = "1st Categorical Variable",
    y = "Continuous Variable",
    color = "2nd Categorical Variable"
  ) +
  ylim(0.6, 1.6) + 
  theme_bw(base_size = 14) +  # white background with a panel border, 14 pt base font
  theme(
    plot.title = element_text(hjust = 0.5),              # center the title
    panel.grid = element_line(linetype = "dashed"),      # dashed grid lines
    axis.text.x = element_text(angle = 45, hjust = 1),   # rotate x labels 45 degrees
    legend.position = "top"
  )
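The answer mentions plain position_dodge(.75) as the first step but only shows the jittered version. A minimal sketch of the no-jitter variant, reusing the same fake data, could look like this; width = 0.75 is chosen to match the default dodge.width of position_jitterdodge(). Without jitter, points with nearly equal values can overlap, which is one reason to prefer jitter for raw data.

```r
library(ggplot2)

# Same fake data as in the answer.
set.seed(123)
df_dummy <- data.frame(
  Trt_Amend = paste0("Group", 1:5),
  Trt_CC = rep(factor(0:1), each = 5),
  Carbon_percent = rnorm(80, mean = 1, sd = .1)
)

# Place the two Trt_CC series side by side at each x position, no jitter.
p <- ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color = Trt_CC)) +
  geom_point(position = position_dodge(width = 0.75)) +
  scale_color_manual(values = c("red", "blue"))

print(p)
```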

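EDIT: You could use two stat_summary layers to add error bars and the mean. With fun.args = list(mult = 1), mean_sdl gives the mean ± 1 standard deviation; mean_sdl wraps a function from the Hmisc package, so Hmisc must be installed: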

ggplot(df_dummy, aes(Trt_Amend, Carbon_percent, color = Trt_CC)) +
  geom_point(
    position = position_jitterdodge(jitter.width = .3)
  ) +
  # Error bars: mean +/- 1 SD per group, dodged to line up with the points.
  stat_summary(
    fun.data = "mean_sdl", fun.args = list(mult = 1), position = position_dodge(width = .75),
    geom = "errorbar",
    width = .3
  ) +
  # Large point marking each group mean.
  stat_summary(
    fun = "mean", position = position_dodge(width = .75),
    geom = "point", size = 4
  ) + 
  scale_color_manual(values = c("red", "blue")) +
  labs(
    title = "Great Plot Title",
    x = "1st Categorical Variable",
    y = "Continuous Variable",
    color = "2nd Categorical Variable"
  ) +
  # ylim() removes out-of-range points before stat_summary() runs;
  # coord_cartesian(ylim = ...) would only zoom.
  ylim(0.6, 1.6) + 
  theme_bw(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5),
    panel.grid = element_line(linetype = "dashed"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top"
  )
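If you'd rather not depend on Hmisc, one alternative is to compute the per-group mean and standard deviation yourself and draw them from a small summary data frame. This is a sketch under assumptions, not the answer's method: it uses base R's aggregate() and merge(), and the names summary_df, c_mean and c_sd are hypothetical. Both summary layers share one position_dodge(width = .75), so they line up with each other and with the jitter-dodged points (whose default dodge.width is also 0.75).

```r
library(ggplot2)

# Same fake data as in the answer.
set.seed(123)
df_dummy <- data.frame(
  Trt_Amend = paste0("Group", 1:5),
  Trt_CC = rep(factor(0:1), each = 5),
  Carbon_percent = rnorm(80, mean = 1, sd = .1)
)

# Per-group mean and SD (hypothetical column names c_mean / c_sd).
means <- aggregate(Carbon_percent ~ Trt_Amend + Trt_CC, data = df_dummy, FUN = mean)
sds <- aggregate(Carbon_percent ~ Trt_Amend + Trt_CC, data = df_dummy, FUN = sd)
names(means)[names(means) == "Carbon_percent"] <- "c_mean"
names(sds)[names(sds) == "Carbon_percent"] <- "c_sd"
summary_df <- merge(means, sds, by = c("Trt_Amend", "Trt_CC"))

# One dodge shared by both summary layers so bars and means align.
dodge <- position_dodge(width = .75)

p <- ggplot(df_dummy, aes(Trt_Amend, color = Trt_CC)) +
  # Raw observations, dodged by Trt_CC and jittered (seed fixed for reproducibility).
  geom_point(
    aes(y = Carbon_percent),
    position = position_jitterdodge(jitter.width = .3, seed = 1)
  ) +
  # Mean +/- 1 SD error bars from the precomputed summary.
  geom_errorbar(
    data = summary_df,
    aes(ymin = c_mean - c_sd, ymax = c_mean + c_sd),
    position = dodge, width = .3
  ) +
  # Large point for each group mean.
  geom_point(data = summary_df, aes(y = c_mean), position = dodge, size = 4) +
  scale_color_manual(values = c("red", "blue")) +
  # Zoom to the same range without dropping any data.
  coord_cartesian(ylim = c(0.6, 1.6))

print(p)
```

Using coord_cartesian(ylim = ...) instead of ylim() zooms the view without removing data, so nothing outside the range is silently dropped before plotting.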
