I have the following dataframe df:
df <- structure(list(ID = c(1, 1, 2, 2, 3, 3, 4, 4), Group = c("A",
"B", "A", "B", "A", "B", "B", "A"), sumGp = c(1L, 0L, 162L, 32L,
9L, 2L, 0L, 0L), n = c(2L, 30L, 181L, 60L, 27L, 17L, 33L, 3L),
pct = c(0.5, 0, 0.895027624309392, 0.533333333333333, 0.333333333333333,
0.117647058823529, 0, 0)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
I want to visualize this with dodged geom_col, adding labels to it only when sumGp is not zero. My trick is to color the text white when sumGp == 0 (I understand this might not be the best way to label the numbers, but let's just do it this way to reproduce my main problem in this question).
Let's set up a ggplot function to make sure my problem is not caused by accidentally changed code:
library(ggplot2)
library(dplyr)
library(scales)
plot_geom_text_with_dodge_geom_col <- function(df){
  ggplot(df |> mutate(label_pct = paste0(sumGp, "/", n)),
         aes(as.character(ID), pct, fill = Group, label = label_pct)) +
    geom_col(position = "dodge", col = "black") +
    # labels starting with "0/" (sumGp == 0) are drawn in white to hide them
    geom_text(aes(color = grepl("^0/", label_pct)), position = position_dodge(0.9), vjust = -0.5,
              show.legend = FALSE) +
    scale_fill_manual(name = "Identity", values = c("A" = "#4A765e", "B" = "orange3")) +
    scale_color_manual(name = "Identity", values = c("black", "white")) +
    scale_y_continuous(labels = scales::percent) +
    labs(x = "ID") +
    theme_bw() +
    theme(panel.grid = element_blank(),
          panel.border = element_rect(color = "black", linewidth = 1),
          plot.tag = element_text(face = "bold"),
          legend.title = element_text(face = "bold", size = 15),
          legend.text = element_text(size = 12),
          axis.title = element_text(face = "bold", size = 15),
          axis.text = element_text(size = 12, face = "bold"),
          legend.position = "top")
}
Everything goes well with this code:
plot_geom_text_with_dodge_geom_col(df)

But when the values of "A" and "B" are flipped, geom_text behaves strangely: position_dodge no longer seems to place the text on the correct dodged bar if one of the pair is zero:
df2 <- df |> mutate(Group = case_match(Group, "A" ~ "B", "B" ~ "A"))
plot_geom_text_with_dodge_geom_col(df2)

Does anyone know what is happening? Someone suggested adding aes(group = Group) to fix the problem, but that does not explain why plotting with df works without explicit grouping. It also does not explain why, in df2, only the pair with sumGp == 0 was affected without grouping.


The issue is the grouping, and it can be fixed by mapping the group aesthetic explicitly. I haven't dug deeper into your code, but especially when several variables and aesthetics are involved, I would recommend mapping aesthetics locally and/or explicitly mapping the group aesthetic so that bars, text, and any other layers are dodged by the same variable.
Why does this happen?
The underlying issue is the grouping, or more precisely how the group variable is set internally by ggplot2. As documented in several places, e.g. in the docs, the group variable is set using plyr::id() based on all discrete variables mapped to aesthetics, with the variables mapped to the label aesthetic (and the faceting variables) being the only exceptions. Additionally, it's important to note that the value set for the group variable also depends on the order of the aesthetics inside aes().
Before I go on: to show clearly what's going on, I slightly changed the setup of your example, i.e. I added label_pct and a color column to the original dataset outside of your plotting function.
For the geom_col, the group is set according to ID and Group, i.e. the variables mapped to x and fill. For your example this also means that each observation is assigned to its own group. In contrast, for the geom_text the group also accounts for the variable mapped to color. As a consequence, the grouping already differs between the geom_col and the geom_text for df. This can be seen by calling plyr::id() and checked using e.g. layer_data().
That said, even for df the labels are assigned to the right columns only by coincidence.
Now, when looking at df2, we see that for the geom_col the values assigned to group simply get swapped when swapping A and B. Hence, the bars are swapped too.
However, this is not the case for the geom_text layer. Here, the values assigned to group are swapped for all rows except the first two. As a result, the grouping for the first two rows (labels) is the same as in the case of df, and the labels are no longer aligned with the correct bars.
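To make the group ids described above concrete, here is a rough base-R emulation. It is a sketch, not ggplot2's actual code: it assumes the group id behaves like a dense rank over the combinations of the discrete variables, with the first variable varying slowest, and that the layer-level color mapping comes before x and fill for the text layer. The helper names emulate_group() and emulated_groups() are hypothetical.

```r
# Data from the question (only the columns needed for grouping)
df <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3, 4, 4),
  Group = c("A", "B", "A", "B", "A", "B", "B", "A"),
  sumGp = c(1L, 0L, 162L, 32L, 9L, 2L, 0L, 0L)
)
# df2 from the question: same data with the Group labels flipped
df2 <- transform(df, Group = ifelse(Group == "A", "B", "A"))

# Hypothetical helper: one dense integer id per observed combination,
# ordered so that the first variable varies slowest (assumed to mimic plyr::id())
emulate_group <- function(...) {
  as.integer(interaction(..., lex.order = TRUE, drop = TRUE))
}

emulated_groups <- function(d) {
  x    <- as.character(d$ID)  # variable mapped to x
  zero <- d$sumGp == 0        # variable mapped to color in geom_text
  data.frame(
    ID         = d$ID,
    Group      = d$Group,
    bar_group  = emulate_group(x, d$Group),        # x, fill
    text_group = emulate_group(zero, x, d$Group)   # color, x, fill (assumed order)
  )
}

emulated_groups(df)
emulated_groups(df2)
```

Under these assumptions, for df the text ids still put A before B within every ID, so the labels happen to line up with the bars. For df2 the text ids for ID 1 are 1 for B and 6 for A, while the bar ids are 2 for B and 1 for A, so the two labels are dodged in the opposite order to their bars.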
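For completeness, a minimal sketch of the recommended fix, i.e. mapping the group aesthetic explicitly so that both layers are dodged by Group only. The theme and fill scale from the question are left out for brevity, and the named values in scale_colour_manual() are my own choice rather than part of the original code.

```r
library(ggplot2)

# df2 from the question: the original data with the Group labels flipped
df2 <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3, 4, 4),
  Group = c("B", "A", "B", "A", "B", "A", "A", "B"),
  sumGp = c(1L, 0L, 162L, 32L, 9L, 2L, 0L, 0L),
  n     = c(2L, 30L, 181L, 60L, 27L, 17L, 33L, 3L)
)
df2$pct       <- df2$sumGp / df2$n   # matches the pct column in the question
df2$label_pct <- paste0(df2$sumGp, "/", df2$n)

p_fixed <- ggplot(
  df2,
  # group = Group is set at the plot level, so every layer inherits it
  aes(as.character(ID), pct, fill = Group, label = label_pct, group = Group)
) +
  geom_col(position = "dodge", colour = "black") +
  geom_text(
    aes(colour = sumGp == 0),
    position = position_dodge(width = 0.9), vjust = -0.5, show.legend = FALSE
  ) +
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "white")) +
  labs(x = "ID")

# Inspect the group ids that each layer actually uses
bar_groups  <- sort(unique(layer_data(p_fixed, 1)$group))
text_groups <- sort(unique(layer_data(p_fixed, 2)$group))
```

With the explicit mapping, the color variable no longer takes part in the grouping, so the bars and the labels should be dodged in the same order regardless of which rows have sumGp == 0.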