Consider the following code, which should produce two identical bar charts apart from the axis labels. The data.frame dat is representative of the kind of data I have: just colour responses. It seems a good idea to use those colours in plotting.
library(ggplot2)

colOrder <- c('green', 'blue', 'red', 'orange', 'purple')
dat <- data.frame(q = rep(c('red', 'orange', 'green', 'blue', 'purple'), c(1:5)))
# using geom_bar
ggplot(dat, aes(x = factor(q, levels = colOrder))) +
  geom_bar(fill = colOrder)
# using geom_col
dc <- as.data.frame(table(dat$q))
names(dc) <- c('q', 'n')
ggplot(dc, aes(x = factor(q, levels = colOrder), y = n)) +
  geom_col(fill = colOrder)
Why are these plots not identical? geom_col seems to fill using the colour order in dc$q, while geom_bar correctly takes colOrder as requested. Am I missing something in the geom_col documentation? Is this a bug?
The issue is that you pass the fill colors as an argument to geom_bar/geom_col. When you do that, the colors are assigned to the categories in the order in which the categories appear in the dataset used by the geom. While geom_col uses the data as is, geom_bar first computes the counts via stat = "count" and reorders the data according to the variable mapped on x. As a result, when the vector of fill colors is added to these datasets, you get different colors for some categories.

You can see this clearly by using layer_data(), which retrieves the data used by a geom layer under the hood. For geom_bar (the first plot) the data is ordered according to colOrder after the statistical transformation, whereas for geom_col (the second plot) the data is not reordered.

If you want the same or consistent colors, map a variable on the fill aesthetic and set your colors via scale_fill_manual() (or, if you have a column with color names as in your example, via scale_fill_identity()).
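
To check the ordering claim yourself, a minimal sketch along these lines rebuilds both plots from the question and pulls the data behind each layer with layer_data(). The object names p_bar, p_col, ld_bar and ld_col are only illustrative. Compare the x column (the position on the axis) with the fill column (the colour that row received): for geom_bar the rows should come out in colOrder order, while for geom_col they should keep the alphabetical order produced by table().

```r
library(ggplot2)

colOrder <- c("green", "blue", "red", "orange", "purple")
dat <- data.frame(q = rep(c("red", "orange", "green", "blue", "purple"), 1:5))

# table() sorts the categories alphabetically, not in colOrder order
dc <- as.data.frame(table(dat$q))
names(dc) <- c("q", "n")

# The two plots from the question, stored so they can be inspected
p_bar <- ggplot(dat, aes(x = factor(q, levels = colOrder))) +
  geom_bar(fill = colOrder)
p_col <- ggplot(dc, aes(x = factor(q, levels = colOrder), y = n)) +
  geom_col(fill = colOrder)

# layer_data() returns the data frame the i-th layer is drawn from
ld_bar <- layer_data(p_bar, 1)
ld_col <- layer_data(p_col, 1)

# Compare axis position (x) with the colour assigned to that row (fill)
print(ld_bar[, c("x", "y", "fill")])
print(ld_col[, c("x", "y", "fill")])
```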
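
The fix described above could look roughly like the following sketch. It assumes the colour names in q are valid R colour names; the named vector pal and the plot names are hypothetical helpers introduced here. Because fill is now mapped inside aes(), each bar's colour follows its category rather than its row position, so the row order of the data no longer matters.

```r
library(ggplot2)

colOrder <- c("green", "blue", "red", "orange", "purple")
dat <- data.frame(q = rep(c("red", "orange", "green", "blue", "purple"), 1:5))

# Keep q as character so it can be used directly as a colour name
dc <- as.data.frame(table(dat$q), stringsAsFactors = FALSE)
names(dc) <- c("q", "n")

# Option 1: map q to fill and give scale_fill_manual() a named palette
pal <- setNames(colOrder, colOrder)  # category name -> colour
p_bar <- ggplot(dat, aes(x = factor(q, levels = colOrder), fill = q)) +
  geom_bar() +
  scale_fill_manual(values = pal, guide = "none")

# Option 2: the values of q already are colour names, so use them as is
p_col <- ggplot(dc, aes(x = factor(q, levels = colOrder), y = n, fill = q)) +
  geom_col() +
  scale_fill_identity()

print(p_bar)
print(p_col)
```

Either option can be used with either geom; they are paired here only to show both scales.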