I have a dataset that looks like this. It's saved as a file called "prescriptivism_scores.csv":
Usage_Guide | Usage_Problem | Prescriptivism_Index |
---|---|---|
Book 1 | who/whom | 2 |
Book 2 | who/whom | 2 |
Book 3 | who/whom | 2.5 |
Book 4 | who/whom | 4 |
Book 5 | who/whom | 2 |
Book 6 | who/whom | 1.5 |
Book 7 | who/whom | 3 |
Book 8 | who/whom | 2 |
Book 9 | who/whom | 4 |
Book 10 | who/whom | 4 |
Book 11 | who/whom | 2 |
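For anyone who wants to reproduce this without the CSV file, the rows above can be rebuilt directly as a data frame. This is only a sketch of the sample shown here (the who/whom rows), not the full file:

```r
# Rebuild the sample rows from the table above (who/whom rows only).
df <- data.frame(
  Usage_Guide = paste("Book", 1:11),
  Usage_Problem = "who/whom",
  Prescriptivism_Index = c(2, 2, 2.5, 4, 2, 1.5, 3, 2, 4, 4, 2)
)

# Quick look at the structure: 11 rows, 3 columns.
str(df)
```

Using this `df` in place of the `read.csv()` call should give the same who/whom panel of the plot.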
I used this code:
library(ggplot2)
df <- read.csv(file = 'prescriptivism_scores.csv')
ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem)) +
geom_boxplot(color = "#838383") +
geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5, color = "#00853E", fill = "#C4D600", stackratio = 1.5) +
scale_x_discrete(name = "Usage Problem",
breaks = c("different_to_than_from", "I_for_me", "lay_lie", "less_fewer", "none", "singular_they", "split_infinitive", "who_whom"),
labels = c("DIFF TO/THAN/FROM", "I FOR ME", "LAY/LIE", "LESS/FEWER", "NONE", "SG THEY", "SPLIT INF", "WHO/WHOM")) +
ylab("Prescriptivism Index") +
stat_summary(fun = mean, geom = "point", shape = 3, size = 3, color = "#EF4B81") +
theme(panel.background = element_blank()) +
geom_hline(yintercept = 1:4, linetype = "dashed", linewidth = 0.25, color = "darkgray")
to create this box plot:
[Image: box plot with no overlapping data points]
I'm happy with everything about this box plot except for one thing: I want each dot in the plot to be a different shape to represent the different books in the "Usage_Guide" column of my data. I want to do this so I know which data points correspond to which books.
I've tried adding "shape = Usage_Guide" to the aes() function:
ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem, shape = Usage_Guide)) +
But when I do, the dots don't actually change shape, and the dotplot changes so that the dots are overlapping. It also adds dashes to the plot:
[Image: box plot with no box, overlapping data points, and dashes added]
If I try to change the color of the dots instead of the shape, I get closer to my end goal, but strange things also happen.
For example, adding a color call to the aes() function and changing the fill to white in the geom_dotplot() function, as shown in the code below, changes the colors of the dots and maintains the box plots, but it causes the data points to overlap.
ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem, color = Usage_Guide)) +
geom_boxplot(color = "#838383") +
geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5, fill = "#FFFFFF", stackratio = 1.5) +
[Image: box plot with colored, overlapping data points]
But simply reversing the fill and outline, so that the fill call is in the aes() function and the color call is in the geom_dotplot() function, breaks something, and the box plots no longer show.
ggplot(df, aes(y = Prescriptivism_Index, x = Usage_Problem, fill = Usage_Guide)) +
geom_boxplot(color = "#838383") +
geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5, color = "#FFFFFF", stackratio = 1.5) +
[Image: box plot with no box and overlapping colored data points]
How can I maintain the look of my original box plot, but with different shapes for each data point?
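For reference, here is a minimal sketch of one direction that might work. It is not the original dot plot: it swaps geom_dotplot() for geom_point() with a small horizontal jitter, because geom_dotplot() has no shape aesthetic. The shape mapping is placed only in the point layer, since mapping a per-book variable in ggplot() splits the data into one group per book, which is likely why the box collapses into dashes. The jitter width, the seed, and the eleven shape codes are assumptions chosen for illustration, and the data frame is rebuilt inline from the sample rows above.

```r
library(ggplot2)

# Sample rows from the question (who/whom only), rebuilt inline.
df <- data.frame(
  Usage_Guide = paste("Book", 1:11),
  Usage_Problem = "who/whom",
  Prescriptivism_Index = c(2, 2, 2.5, 4, 2, 1.5, 3, 2, 4, 4, 2)
)

p <- ggplot(df, aes(x = Usage_Problem, y = Prescriptivism_Index)) +
  geom_boxplot(color = "#838383") +
  # Map shape only in this layer so the box plot is not split per book.
  # Horizontal jitter (no vertical jitter) keeps the y values exact.
  geom_point(aes(shape = Usage_Guide),
             position = position_jitter(width = 0.15, height = 0, seed = 1),
             size = 2.5, color = "#00853E", fill = "#C4D600") +
  # The default shape palette only covers 6 levels, so supply 11 by hand.
  scale_shape_manual(name = "Usage Guide",
                     values = c(0, 1, 2, 5, 6, 15, 16, 17, 18, 21, 22)) +
  stat_summary(fun = mean, geom = "point", shape = 3, size = 3, color = "#EF4B81") +
  ylab("Prescriptivism Index") +
  theme(panel.background = element_blank())

print(p)
```

The trade-off is that the points are jittered rather than stacked symmetrically as in the dot plot, so the layout will not match the original exactly.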