R: Make a box plot using different shapes for data points with no overlapping data points using geom_dotplot()


I have a dataset that looks like this. It's saved as a file called "prescriptivism_scores.csv":

Usage_Guide  Usage_Problem  Prescriptivism_Index
Book 1       who/whom       2
Book 2       who/whom       2
Book 3       who/whom       2.5
Book 4       who/whom       4
Book 5       who/whom       2
Book 6       who/whom       1.5
Book 7       who/whom       3
Book 8       who/whom       2
Book 9       who/whom       4
Book 10      who/whom       4
Book 11      who/whom       2
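
For anyone who wants to reproduce this without the CSV file, the same rows can be built inline. This is a minimal sketch that assumes the column names in the file match the header shown above exactly and that the guide and problem columns are read as character strings.

```r
# Rebuild the rows shown above as a data frame (no CSV needed).
# Column names are assumed to match the CSV header exactly.
df <- data.frame(
  Usage_Guide = paste("Book", 1:11),
  Usage_Problem = "who/whom",
  Prescriptivism_Index = c(2, 2, 2.5, 4, 2, 1.5, 3, 2, 4, 4, 2),
  stringsAsFactors = FALSE
)

str(df)
```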

I used this code:

library(ggplot2)

df <- read.csv(file = 'prescriptivism_scores.csv')
ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem)) +
  geom_boxplot(color = "#838383") +
  
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5, color = "#00853E", fill = "#C4D600", stackratio = 1.5) +
 
  scale_x_discrete(name = "Usage Problem", 
                   breaks = c("different_to_than_from", "I_for_me", "lay_lie", "less_fewer", "none", "singular_they", "split_infinitive", "who_whom"),
                   labels = c("DIFF TO/THAN/FROM", "I FOR ME", "LAY/LIE", "LESS/FEWER", "NONE", "SG THEY", "SPLIT INF", "WHO/WHOM")) +
  
  ylab("Prescriptivism Index") +
  
  stat_summary(fun = mean, geom = "point", shape = 3, size = 3, color = "#EF4B81") +
 
  theme(panel.background = element_blank()) +
  
  geom_hline(yintercept = 1:4, linetype = "dashed", linewidth = 0.25, color = "darkgray")

to create this box plot

Box plot with no overlapping data points

I'm happy with everything about this box plot except for one thing: I want each dot in the plot to be a different shape to represent the different books in the "Usage_Guide" column of my data. I want to do this so I know which data points correspond to which books.

I've tried adding "shape = Usage_Guide" to the aes() function.

ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem, shape = Usage_Guide)) +

But when I do, the dots don't actually change shape, and the dotplot changes so that the dots are overlapping. It also adds dashes to the plot:

Box plot with no box, overlapping data points, and dashes added
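
A likely explanation, offered as a sketch rather than a confirmed diagnosis: ggplot2 sets the default group to the interaction of all discrete variables mapped in the plot, so mapping Usage_Guide in ggplot() would split the single who/whom group into one group per book. Each box is then computed from one observation (which would explain the dashes) and the dots are binned and stacked per book (which would explain the overlap). The documented aesthetics of geom_dotplot() also do not include shape, so the dots stay circular either way. The code below counts the groups both ways and overrides the grouping explicitly; the inline data frame and the names n_groups_with_shape, n_groups_default, and p are only there for illustration.

```r
library(ggplot2)

# Same rows as in the question, built inline for reproducibility.
df <- data.frame(
  Usage_Guide = paste("Book", 1:11),
  Usage_Problem = "who/whom",
  Prescriptivism_Index = c(2, 2, 2.5, 4, 2, 1.5, 3, 2, 4, 4, 2)
)

# Number of groups when Usage_Guide is mapped globally: one per book.
n_groups_with_shape <- nlevels(
  interaction(df$Usage_Problem, df$Usage_Guide, drop = TRUE)
)
# Number of groups with only Usage_Problem on the x axis.
n_groups_default <- nlevels(factor(df$Usage_Problem))

# Setting group explicitly overrides the default interaction, so the box
# and the dot stacking are computed per usage problem again. The shape
# mapping is still not drawn by geom_dotplot(), and ggplot2 may warn
# that an aesthetic was dropped.
p <- ggplot(df, aes(x = Usage_Problem, y = Prescriptivism_Index,
                    shape = Usage_Guide, group = Usage_Problem)) +
  geom_boxplot(color = "#838383") +
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5,
               color = "#00853E", fill = "#C4D600", stackratio = 1.5)
```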

If I try to change the color of the dots instead of the shape, I get closer to my end goal, but strange things also happen.

For example, adding a color call to the aes() function and changing the fill to white in the geom_dotplot() function, as shown in the code below, changes the colors of the dots and maintains the box plots, but it causes the data points to overlap.

ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem, color = Usage_Guide)) +
  geom_boxplot(color = "#838383") +
  
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5, fill = "#FFFFFF", stackratio = 1.5) +

Box plot with colored overlapping data points

But simply reversing the two, so that the fill call is in the aes() function and the color call is in the geom_dotplot() function, breaks something, and the box plots no longer show.

ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem, fill = Usage_Guide)) +
  geom_boxplot(color = "#838383") +
  
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5, color = "#FFFFFF", stackratio = 1.5) +

Box plot with no box and overlapping colored data points

How can I maintain the look of my original box plot, but with different shapes for each data point?
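
One possible direction, sketched under assumptions and not equivalent to geom_dotplot(): since geom_point() does support the shape aesthetic, the centered stacking could be computed by hand and the points drawn with geom_point() on top of the box plot. The helper stack_offsets(), the x_pos column, and the spacing value are hypothetical names and settings introduced only for this sketch. Ties are detected by exact equality of the scores rather than by binning, and shapes 0 to 10 are hollow or line shapes, so only color (not fill) applies to them.

```r
library(ggplot2)

# Same rows as in the question, built inline for reproducibility.
df <- data.frame(
  Usage_Guide = paste("Book", 1:11),
  Usage_Problem = "who/whom",
  Prescriptivism_Index = c(2, 2, 2.5, 4, 2, 1.5, 3, 2, 4, 4, 2)
)
# Keep the legend in book order instead of alphabetical order.
df$Usage_Guide <- factor(df$Usage_Guide, levels = unique(df$Usage_Guide))

# Hypothetical helper: horizontal offsets that center tied points.
# Points sharing the same problem and score are spread symmetrically
# around zero, `spacing` x-axis units apart.
stack_offsets <- function(problem, score, spacing = 0.05) {
  key <- interaction(problem, score, drop = TRUE)
  idx <- ave(seq_along(key), key, FUN = seq_along)  # rank within the tie
  n   <- ave(seq_along(key), key, FUN = length)     # size of the tie
  (idx - (n + 1) / 2) * spacing
}

offsets <- stack_offsets(df$Usage_Problem, df$Prescriptivism_Index)

# Discrete x positions are 1, 2, ... in factor-level order.
df$x_pos <- as.integer(factor(df$Usage_Problem)) + offsets

p <- ggplot(df, aes(x = Usage_Problem, y = Prescriptivism_Index)) +
  geom_boxplot(color = "#838383") +
  # shape is mapped only in this layer, so the box plot grouping is untouched
  geom_point(aes(x = x_pos, shape = Usage_Guide),
             color = "#00853E", size = 2.5) +
  # the default shape palette covers only 6 values; supply 11 manually
  scale_shape_manual(values = 0:10) +
  ylab("Prescriptivism Index")
```

The remaining layers from the original plot (stat_summary(), theme(), geom_hline(), scale_x_discrete()) could be added to p unchanged.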
