I'm trying to make a box plot showing the expression of GeneA in types A and B, and I would like to color the points by the List column of the following data frame, "df":
Samples Type List GeneA
Sample1 B Other -4.778968547
Sample2 B Other -4.63232938
Sample3 B Other -5.695251042
Sample4 A Sample4 2.820003188
Sample5 A Other 7.487856546
Sample6 A Other 2.290055318
Sample7 A Other -1.183807203
Sample8 B Other -4.534681343
Sample9 A Other -5.140540608
Sample10 B Other -5.695251042
Sample11 B Other -5.695251042
Sample12 B Other -5.695251042
Sample13 A Other -5.071179371
Sample14 A Other 1.117824251
Sample15 A Other 4.42672296
Sample16 B Other -2.607036764
Sample17 B Other -4.154979727
Sample18 A Other -4.773270932
Sample19 B Other -5.695251042
Sample20 A Other 0.472999278
Sample21 A Other -0.12535742
Sample22 A Other -4.32895912
Sample23 A Other 0.342990853
Sample24 B Sample24 -5.169967041
Sample25 B Other -4.628633712
Sample26 A Other 0.18030665
Sample27 B Other -5.695251042
Sample28 A Sample28 3.274762509
Sample29 B Other 1.133797461
Sample30 B Other -0.489134592
Sample31 A Other -0.580311566
Sample32 A Other -0.801258402
Sample33 B Other -5.695251042
Sample34 B Sample34 -5.695251042
Sample35 B Other -3.627831566
Sample36 B Other -5.126528687
Sample37 B Other -3.658755234
Sample38 B Other -3.563236707
I plotted the data as follows:
library(ggplot2)

q <- ggplot(df, aes(Type, GeneA))
q + geom_boxplot() + geom_jitter(width = 0.2, aes(colour = List)) +
  labs(y = "GeneA expression (logCPM)")
The legend shows the colors of the points as expected, but strangely I also see two black points for Type B. What's wrong here?
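geom_boxplot plots outliers as black points. Because geom_jitter then draws every sample again, each outlier appears twice: once in black from the boxplot layer and once as a jittered, colored point. You can disable the boxplot's outlier points with the outlier.shape argument (for example, outlier.shape = NA).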
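As a rough sketch of that fix, the plot from the question could be redrawn with the boxplot's outlier points turned off, so that only the jittered, colored points remain. The small data frame below is just a handful of rows copied from the question to keep the example self-contained, and the variable name p is arbitrary; with the real data you would keep using the full "df".

```r
library(ggplot2)

# A few rows copied from the question, only so the sketch runs on its own;
# in practice use the full "df".
df <- data.frame(
  Samples = c("Sample1", "Sample4", "Sample5", "Sample24", "Sample29", "Sample30"),
  Type    = c("B", "A", "A", "B", "B", "B"),
  List    = c("Other", "Sample4", "Other", "Sample24", "Other", "Other"),
  GeneA   = c(-4.778968547, 2.820003188, 7.487856546,
              -5.169967041, 1.133797461, -0.489134592)
)

p <- ggplot(df, aes(Type, GeneA)) +
  # outlier.shape = NA hides the boxplot's own outlier points
  geom_boxplot(outlier.shape = NA) +
  # every sample is still drawn once here, colored by List
  geom_jitter(width = 0.2, aes(colour = List)) +
  labs(y = "GeneA expression (logCPM)")

print(p)
```

Hiding the outliers only changes what the boxplot layer draws. The box and whiskers are still computed from all of the data, and no sample is lost from the plot because geom_jitter draws each one.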