Prevent overlapping of labels in stacked bar chart (geom_text)


I am trying to create a stacked bar chart with percentages in R. The problem is that some of the labels overlap inside the bar (see the two 3% labels in the bottom bar):

[Image: stacked bar chart with overlapping labels]

This is my reproducible code:

library(dplyr)
library(ggplot2)
library(forcats)  # for fct_rev()

Position <- rep(c("LeiterIn", "AssistentIn", "ElementarpädagogIn"), each = 30)
Subjektiver_Gesundheitszustand <- c(1,3,4,5,2,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,
                                    3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5, 
                                    3,4,5,3,4,5)
dat <-  data.frame(Position, Subjektiver_Gesundheitszustand)

dat %>%
  dplyr::count(Position, Subjektiver_Gesundheitszustand) %>%
  group_by(Position) %>%
  mutate(Pct = n / sum(n)) %>%
  ggplot(aes(fill = factor(Subjektiver_Gesundheitszustand), x = Pct, y = fct_rev(Position))) +
  geom_bar(position = position_fill(reverse = TRUE), stat = "identity") +
  geom_text(aes(label = paste0(sprintf("%1.0f", Pct * 100), "%")), 
            position = position_stack(vjust = 0.5,  reverse =TRUE), size = 5)+
  scale_x_continuous(labels = scales::percent) +
  facet_wrap(vars(Position), ncol = 1, scales = "free_y") +
  scale_fill_manual(labels = c("sehr gut", "gut", "mittelmäßig", "schlecht", "sehr schlecht"), 
                    values = c("#A3E6B4","#86BD94", "#658F70","#4C6B54", "#2F4234")) +
  labs(title = "", y = "Position", x = "Percentage") +
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.y = element_blank(),
        strip.background = element_rect(fill = NA),  # remove facet strip background
        strip.text = element_text(hjust = 0, size = 30))

I tried geom_text_repel(), but I don't like that it breaks the horizontal alignment of the labels. In my opinion, it just looks chaotic:

[Image: the same chart labelled with geom_text_repel()]

I'd like to move only the small labels (i.e. the 3% labels) outside of the bar (ideally with a line/arrow connecting the label with the corresponding section in the bar) and keep all other labels aligned. Any idea how this is possible? Thanks in advance!

Answer by L Tyrone:

I made an attempt using facet_wrap() but struggled to find an adequate solution, so here is a workflow I use in these situations. It involves:

  • creating a 'dummy' integer variable for what becomes the y-axis
  • calculating x-axis and y-axis label locations using df data and cumulative sum of values
  • alternating y-axis label placement for values < 5% based on odd/even row_number() to reduce likelihood of adjacent low value labels overlapping
  • setting coord_flip(clip = "off") to switch axes
  • adding geom_rect() to improve label readability (omit if not wanted)
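The label-location steps in the list above can be sketched on a single toy bar. This is a minimal illustration, assuming made-up percentages; the names `seg`, `grp`, `level`, `small` and `side` are hypothetical and do not appear in the full code further down.

```r
library(dplyr)

# Toy percentages for one bar (hypothetical values, not the question's data)
seg <- tibble(
  grp   = "A",
  level = 1:4,
  Pct   = c(3, 3, 44, 50)
)

seg <- seg %>%
  group_by(grp) %>%
  mutate(
    Pctcum  = cumsum(Pct),                              # far edge of each segment
    ylabloc = (lag(Pctcum, default = 0) + Pctcum) / 2,  # midpoint of each segment
    small   = Pct < 5,                                  # too narrow for an inside label
    side    = if_else(row_number() %% 2 == 0, -1, 1)    # alternate sides by odd/even row
  ) %>%
  ungroup()

print(seg)
```

Each midpoint is the average of the previous and the current cumulative total, so a label placed there sits in the centre of its segment. The `side` column is only used for the small segments, to push neighbouring small labels to opposite sides of the bar.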

Note that because coord_flip(clip = "off") swaps the axes, setting the locations for the labels will seem 'backwards': x values are drawn on the vertical axis and y values on the horizontal axis. To keep the workflow intuitive, I have named the label location variables so they mirror where they are declared in aes(). For example, xlabloc and dummyX are mapped to x in aes() but end up on the vertical axis of the finished plot.

It can take a lot of fine-tuning to get the label placements how you want them, but an advantage of this method is that it is fully customisable. I have arbitrarily aligned the labels < 5% to the centre of their associated bar segment. If this doesn't suit, there's nothing stopping you from using the same workflow principles to shift them somewhere more suitable.

Plot aesthetics are dependent on output plot dimensions. As such, some values such as ymin and ymax in geom_rect() will need adjusting if you change your ggsave() dimensions. The example images below use width = 6 and height = 5. It is possible to create and use a plot scale variable e.g. scalevar <- 6 / 5 and use it to scale ggplot() aesthetics and this can reduce the amount of fine-tuning required.
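The code further down does not include the geom_rect() layer or the scale variable, so here is one possible sketch of both, on toy data. Everything in it is an assumption for illustration: the data, the names `toy`, `box_half_x` and `box_half_y`, the box sizes, and in particular how `scalevar` enters the calculation (dividing the box extent by it is a guess that still has to be checked by eye).

```r
library(ggplot2)
library(dplyr)

plot_width  <- 6    # inches, as in the example images
plot_height <- 5
scalevar <- plot_width / plot_height

# Toy summary data for a single bar (hypothetical values)
toy <- tibble(dummyX = 1, level = factor(1:3), Pct = c(4, 46, 50)) %>%
  mutate(Pctlab  = paste0(Pct, "%"),
         Pctcum  = cumsum(Pct),
         ylabloc = (lag(Pctcum, default = 0) + Pctcum) / 2,
         xlabloc = if_else(Pct < 5, dummyX + 0.4, dummyX))

# Hypothetical half-sizes of the label box, tuned by eye
box_half_x <- 0.07           # in dummyX units (vertical after the flip)
box_half_y <- 3 / scalevar   # in percentage points (horizontal after the flip)

p <- ggplot(toy) +
  geom_col(aes(x = dummyX, y = Pct, group = dummyX, fill = level),
           width = 0.5) +
  # Semi-transparent white box drawn behind each label
  geom_rect(aes(xmin = xlabloc - box_half_x, xmax = xlabloc + box_half_x,
                ymin = ylabloc - box_half_y, ymax = ylabloc + box_half_y),
            fill = "white", alpha = 0.6) +
  geom_text(aes(x = xlabloc, y = ylabloc, label = Pctlab), size = 4) +
  coord_flip(clip = "off")
```

Saving with ggsave(width = plot_width, height = plot_height) keeps the box sizes tied to the same two numbers, so changing the output dimensions only requires editing them in one place.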

OPTION 1: values < 5% outside bar:

library(ggplot2)
library(dplyr)

Position <- rep(c("LeiterIn", "AssistentIn", "ElementarpädagogIn"), each = 30) 
Subjektiver_Gesundheitszustand <- c(1,3,4,5,2,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,
                                    3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5, 
                                    3,4,5,3,4,5)

dat <-  data.frame(Position, Subjektiver_Gesundheitszustand) %>%
  count(Position, Subjektiver_Gesundheitszustand) %>%
  group_by(Position) %>%
  mutate(Pct = n / sum(n) * 100,
         Pctlab = paste0(sprintf("%1.0f", Pct), "%"),
         Pctcum = cumsum(Pct),
         dummyX = case_when(Position == "AssistentIn" ~ 3,
                            Position == "ElementarpädagogIn" ~ 2,
                            TRUE ~ 1),
         # Labels for values < 5% alternate around the bar centre:
         # even rows sit 0.3 below dummyX, odd rows 0.3 above
         xlabloc = case_when(Pct < 5 & row_number() %% 2 == 0 ~ dummyX - 0.3,
                             Pct < 5 ~ dummyX + 0.3,
                             TRUE ~ dummyX),
         ylabloc = (lag(Pctcum, default = 0) + Pctcum) / 2)

ggplot() +
  geom_bar(data = dat, aes(x = dummyX,
                           y = Pct,
                           group = dummyX,
                           fill = factor(Subjektiver_Gesundheitszustand)),
           stat = "identity",
           width = 0.5) +
  coord_flip(clip = "off") +
  geom_text(data = dat,
            aes(x = xlabloc, 
                y = ylabloc,
                label = Pctlab),
            size = 4) +
  annotate("text", 
           x = unique(dat$dummyX) + 0.435, 
           y = 0,
           label = unique(dat$Position),
           hjust = 0,
           colour = "black",
           size = 8) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  scale_fill_manual(labels = c("sehr gut", "gut", "mittelmäßig", "schlecht", "sehr schlecht"), 
                    values = c("#A3E6B4","#86BD94", "#658F70","#4C6B54", "#2F4234")) +
  labs(y = "Position", x = "Percentage") +
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid = element_blank(),
        panel.background = element_blank())

[Image: result of Option 1]

OPTION 2: values < 5% inside bar: note that this uses the same ggplot() code as above, but geom_bar(width = 0.6):

# Modified xlabloc locations
dat <-  dat %>%
  # Same alternation as before, but only 0.2 from the bar centre,
  # which keeps the labels inside a bar of width 0.6
  mutate(xlabloc = case_when(Pct < 5 & row_number() %% 2 == 0 ~ dummyX - 0.2,
                             Pct < 5 ~ dummyX + 0.2,
                             TRUE ~ dummyX))

[Image: result of Option 2]
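The question also asks for a line connecting each outside label to its bar segment, which neither option draws. One way this might be added is a geom_segment() layer restricted to the small values. The sketch below uses toy data shaped like dat; the names `toy`, `small`, `side` and `xedge`, and all the offsets, are hypothetical and would need tuning to your plot dimensions.

```r
library(ggplot2)
library(dplyr)

# Toy summary data for a single bar (hypothetical values)
toy <- tibble(
  dummyX = 1,
  level  = factor(1:5),
  Pct    = c(3, 3, 34, 30, 30)
) %>%
  group_by(dummyX) %>%
  mutate(
    Pctlab  = paste0(Pct, "%"),
    Pctcum  = cumsum(Pct),
    ylabloc = (lag(Pctcum, default = 0) + Pctcum) / 2,   # segment midpoint
    small   = Pct < 5,
    side    = if_else(row_number() %% 2 == 0, -1, 1),    # alternate sides
    xlabloc = if_else(small, dummyX + side * 0.4, dummyX),
    xedge   = dummyX + side * 0.25                       # bar edge for width = 0.5
  ) %>%
  ungroup()

p <- ggplot(toy) +
  geom_col(aes(x = dummyX, y = Pct, group = dummyX, fill = level),
           width = 0.5) +
  # Connector from the bar edge to just short of the label
  # (the linewidth argument needs ggplot2 >= 3.4.0)
  geom_segment(data = dplyr::filter(toy, small),
               aes(x = xedge, xend = xlabloc - side * 0.06,
                   y = ylabloc, yend = ylabloc),
               linewidth = 0.3) +
  geom_text(aes(x = xlabloc, y = ylabloc, label = Pctlab), size = 4) +
  coord_flip(clip = "off")
```

Because the connector shares ylabloc with its label, it stays centred on the segment, and all labels for values of 5% or more remain aligned on the bar centre.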