How to set a fixed fill in geom_col equivalent to geom_bar in ggplot


Consider the following code, which should produce two identical bar charts apart from the axis labels. The data.frame dat is representative of the kind of data I have: just colour responses. It seems a good idea to use those colours in the plot.

library(ggplot2)

colOrder = c('green', 'blue', 'red', 'orange', 'purple')
dat <- data.frame(q = rep(c('red', 'orange', 'green', 'blue', 'purple'), c(1:5)))

# using geom_bar
ggplot(dat, aes(x = factor(q, levels = colOrder) )) +
    geom_bar(fill = colOrder)

# using geom_col
dc <- as.data.frame(table(dat$q))
names(dc) <- c('q', 'n')
ggplot(dc, aes(x = factor(q, levels = colOrder), y = n )) +
    geom_col(fill = colOrder)

Why are these plots not identical? geom_col seems to fill using the colour order in dc$q, while geom_bar correctly takes colOrder as requested. Am I missing something in the geom_col documentation, or is this a bug?

There is 1 best solution below


The issue is that you pass the fill colors as an argument to geom_bar/geom_col instead of mapping them as an aesthetic. When you do that, the colors are assigned to the categories positionally, in the row order of the dataset each layer ends up drawing. geom_col uses the data as is, whereas geom_bar first computes the counts via stat="count" and returns them ordered by the variable mapped on x. As a result, when the vector of fill colors is added to these datasets, some categories get different colors in the two plots.
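The row-order effect can be reproduced without ggplot2. The following is only a rough base-R sketch: it approximates the ordering of the counted data with table() on the factor, and the names col_rows and bar_rows are made up for illustration.

```r
colOrder <- c('green', 'blue', 'red', 'orange', 'purple')
dat <- data.frame(q = rep(c('red', 'orange', 'green', 'blue', 'purple'), 1:5))

# Rows as geom_col receives them: table() sorts the categories alphabetically
dc <- as.data.frame(table(dat$q))
names(dc) <- c('q', 'n')
col_rows <- data.frame(category = as.character(dc$q), fill = colOrder)

# Approximation of what geom_bar draws: counts ordered by the x factor levels
counts <- table(factor(dat$q, levels = colOrder))
bar_rows <- data.frame(category = names(counts), fill = colOrder)

# Pairing the same fill vector row by row gives different results
print(col_rows)
print(bar_rows)
```

In bar_rows every category lines up with its own color, while in col_rows the alphabetical row order shifts the pairing.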

You can see this clearly with layer_data(), which lets you retrieve the data that a geom layer actually uses under the hood:

colOrder = c('green', 'blue', 'red', 'orange', 'purple')
dat <- data.frame(q = rep(c('red', 'orange', 'green', 'blue', 'purple'), 1:5))

library(ggplot2)

p1 <- ggplot(dat, aes(x = factor(q, levels = colOrder) )) +
  geom_bar(fill = colOrder)

dc <- as.data.frame(table(dat$q))
names(dc) <- c('q', 'n')
p2 <- ggplot(dc, aes(x = factor(q, levels = colOrder), y = n )) +
  geom_col(fill = colOrder)

layer_data(p1, 1)[c("x", "y", "group", "fill")]
#>   x y group   fill
#> 1 1 3     1  green
#> 2 2 4     2   blue
#> 3 3 1     3    red
#> 4 4 2     4 orange
#> 5 5 5     5 purple

layer_data(p2, 1)[c("x", "y", "group", "fill")]
#>   x y group   fill
#> 1 2 4     2  green
#> 2 1 3     1   blue
#> 3 4 2     4    red
#> 4 5 5     5 orange
#> 5 3 1     3 purple

As the output shows, for geom_bar (the first plot) the rows are ordered by x, i.e. according to colOrder, after the statistical transformation, whereas for geom_col (the second plot) the rows are not reordered and keep the order of dc.

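If you want the same colors in both plots, map q to the fill aesthetic and set the colors via scale_fill_manual() (or, since your column already holds color names, use scale_fill_identity()). Keep in mind that an unnamed values vector is matched in order to the scale's limits, here the alphabetically sorted values of q, so name the vector, e.g. setNames(colOrder, colOrder), if each bar should be drawn in the color it is named after. With the unnamed vector, both plots at least agree with each other: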

p3 <- ggplot(dat, aes(x = factor(q, levels = colOrder) )) +
  geom_bar(aes(fill = q)) +
  scale_fill_manual(values = colOrder)

p4 <- ggplot(dc, aes(x = factor(q, levels = colOrder), y = n )) +
  geom_col(aes(fill = q)) +
  scale_fill_manual(values = colOrder)

library(patchwork)

(p1 + p2) / (p3 + p4)
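For completeness, here is a hedged sketch of the scale_fill_identity() route and of a named values vector, either of which should draw each bar in the color it is named after. The plot names p5 to p7 and the stringsAsFactors = FALSE argument are additions for this illustration, not part of the original code.

```r
library(ggplot2)

colOrder <- c('green', 'blue', 'red', 'orange', 'purple')
dat <- data.frame(q = rep(c('red', 'orange', 'green', 'blue', 'purple'), 1:5))

# Keep q as character so the identity scale sees plain color names
dc <- as.data.frame(table(dat$q), stringsAsFactors = FALSE)
names(dc) <- c('q', 'n')

# Identity scale: the values of q are used directly as fill colors
p5 <- ggplot(dat, aes(x = factor(q, levels = colOrder), fill = q)) +
  geom_bar() +
  scale_fill_identity()

p6 <- ggplot(dc, aes(x = factor(q, levels = colOrder), y = n, fill = q)) +
  geom_col() +
  scale_fill_identity()

# Named vector: colors are matched to categories by name, not by position
p7 <- ggplot(dc, aes(x = factor(q, levels = colOrder), y = n, fill = q)) +
  geom_col() +
  scale_fill_manual(values = setNames(colOrder, colOrder))
```

Because the colors are tied to the values of q rather than to row positions, the result no longer depends on how the data happens to be ordered.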