How to add direct labels to a bar chart in ggplot for numeric x axis

I am trying to create a bar chart in ggplot where the width of each bar is given by a variable, Cost$Average.of.FS_Annual_P_Reduction_Kg. I am using the aesthetic width = Average.of.FS_Annual_P_Reduction_Kg to set the bar widths from that variable.

I want to add direct labels to the chart to label each bar, similar to the image below. I would also like x axis labels that correspond to the values passed to width = Average.of.FS_Annual_P_Reduction_Kg. Any help would be greatly appreciated. I am aware of ggrepel but haven't been able to get the desired effect so far.

[Image: example of a graph with direct labels and a numerical x axis]

I have used the following code:

library(ggplot2)
library(ggrepel)   # geom_label_repel()
library(stringr)   # str_wrap()

# Plot the data
P1 <- ggplot(Cost,
       aes(x = Row.Labels,
           y = Average.of.Cost_Per_Kg_P_Removal.undiscounted..LOW_Oncost,
           width = Average.of.FS_Annual_P_Reduction_Kg, label = Row.Labels)) +
  geom_col(fill = "grey", colour = "black") + 
  geom_label_repel(
    arrow = arrow(length = unit(0.03, "npc"), type = "closed", ends = "first"),
    force = 10,
    xlim  = NA) +
  facet_grid(~reorder(Row.Labels, 
                      Average.of.Cost_Per_Kg_P_Removal.undiscounted..LOW_Oncost), 
             scales = "free_x", space = "free_x") +
  labs(x = "Measure code and average P reduction (kg/P/yr)",
       y = "Mean annual TOTEX (£/kg) of P removal (thousands)") +
  coord_cartesian(expand = FALSE) +     # remove spacing within each facet
  theme_classic() +
  theme(strip.text = element_blank(),   # hide facet title (since it's same as x label anyway)
        panel.spacing = unit(0, "pt"),  # remove spacing between facets
        plot.margin = unit(c(rep(5.5, 3), 10), "pt"), # more space on left for axis label
        axis.title=element_text(size=14),
        axis.text.y = element_text(size=12),
        axis.text.x = element_text(size=12, angle=45, vjust=0.2, hjust=0.1)) + 
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))

# Show y axis values in thousands
P1 <- P1 + scale_y_continuous(labels = function(x) format(x / 1000))
P1

The example data table can be reproduced with the following code:

> dput(Cost)
structure(list(Row.Labels = structure(c(1L, 2L, 6L, 9L, 4L, 3L, 
5L, 7L, 8L), .Label = c("Change the way P is applied", "Improve management of manure", 
"In channel measures to slow flow", "Keep stock away from watercourses", 
"No till trial ", "Reduce runoff from tracks and gateways", "Reversion to different vegetation", 
"Using buffer strips to intercept pollutants", "Water features to intercept pollutants"
), class = "factor"), Average.of.FS_Annual_P_Reduction_Kg = c(0.11, 
1.5425, 1.943, 3.560408144, 1.239230769, 18.49, 0.091238043, 
1.117113762, 0.11033263), Average.of.FS_._Change = c(0.07, 0.975555556, 
1.442, 1.071692763, 1.212307692, 8.82, 0.069972352, 0.545940711, 
0.098636339), Average.of.Cost_Per_Kg_P_Removal.undiscounted..LOW_Oncost = c(2792.929621, 
2550.611429, 964.061346, 9966.056875, 2087.021801, 57.77580744, 
165099.0425, 20682.62962, 97764.80805), Sum.of.Total_._Cost = c(358.33, 
114310.49, 19508.2, 84655, 47154.23, 7072, 21210, 106780.34, 
17757.89), Average.of.STW_Treatment_Cost_BASIC = c(155.1394461, 
155.1394461, 155.1394461, 155.1394461, 155.1394461, 155.1394461, 
155.1394461, 155.1394461, 155.1394461), Average.of.STW_Treatment_Cost_HIGH = c(236.4912345, 
236.4912345, 236.4912345, 236.4912345, 236.4912345, 236.4912345, 
236.4912345, 236.4912345, 236.4912345), Average.of.STW_Treatment_Cost_INTENSIVE = c(1023.192673, 
1023.192673, 1023.192673, 1023.192673, 1023.192673, 1023.192673, 
1023.192673, 1023.192673, 1023.192673)), class = "data.frame", row.names = c(NA, 
-9L))

There are 2 answers below.

Best answer

I think it will be easier to do a bit of data prep so you can put all the boxes in one facet with a shared x-axis. For instance, we can calculate the cumulative sum of the P reduction (kg) and use it to define the starting x position of each box.

EDIT: added ylim = c(0, NA), xlim = c(0, NA) to keep the ggrepel::geom_text_repel text within the positive range of the plot.

library(ggplot2)
library(ggrepel)
library(stringr) 
library(dplyr)

Cost %>%
  arrange(desc(Average.of.Cost_Per_Kg_P_Removal.undiscounted..LOW_Oncost)) %>%
  mutate(Row.Labels = forcats::fct_inorder(Row.Labels),
         cuml_reduc = cumsum(Average.of.FS_Annual_P_Reduction_Kg),
         bar_start  = cuml_reduc - Average.of.FS_Annual_P_Reduction_Kg,
         bar_center = cuml_reduc - 0.5*Average.of.FS_Annual_P_Reduction_Kg) %>%
  ggplot(aes(xmin = bar_start, xmax = cuml_reduc,
             ymin = 0, ymax = Average.of.Cost_Per_Kg_P_Removal.undiscounted..LOW_Oncost)) +
  geom_rect(fill = "grey", colour = "black") +
  geom_text_repel(aes(x = bar_center, 
                      y = Average.of.Cost_Per_Kg_P_Removal.undiscounted..LOW_Oncost,
                      label = str_wrap(Row.Labels, 15)), 
                  ylim = c(0, NA), xlim = c(0, NA),  ## EDIT
                  size = 3, nudge_y = 1E4, nudge_x = 2, lineheight = 0.7, 
                  segment.alpha = 0.3) +
  scale_y_continuous(labels = function(x) scales::comma(x / 1000)) +  # thousands, to match the y label
  labs(x = "Measure code and average P reduction (kg/P/yr)",
       y = "Mean annual TOTEX (£/kg) of P removal (thousands)")
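
The data prep above can be sketched on its own with base R. This is only an illustration under assumptions: the `toy` data frame and its column names are made up, not the asker's `Cost` table. It shows how the cumulative sum gives the left edge, right edge and midpoint of each bar on a shared x axis.

```r
# Toy data (hypothetical values, not the asker's Cost table)
toy <- data.frame(
  measure   = c("A", "B", "C"),
  reduction = c(2, 0.5, 4),     # bar widths
  cost      = c(300, 900, 100)  # bar heights
)

# Tallest bar first, mirroring arrange(desc(...)) in the answer
toy <- toy[order(-toy$cost), ]

# Right edge, left edge and midpoint of each bar on a shared x axis
toy$bar_end    <- cumsum(toy$reduction)
toy$bar_start  <- toy$bar_end - toy$reduction
toy$bar_center <- toy$bar_end - 0.5 * toy$reduction

# Bar boundaries, starting at zero
edges <- c(0, toy$bar_end)

print(toy)
print(edges)
```

If x axis ticks at the bar boundaries are wanted, a vector like `edges` could probably be passed to `scale_x_continuous(breaks = ...)` in the ggplot version, although with very narrow bars the tick labels may overlap.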


Second answer

You could experiment with scaling the values a little, e.g. by taking logarithms. Since I prefer base plots over ggplot2, I'll show a base R solution using barplot.

First, we turn the first column into row names and delete it.

cost <- `rownames<-`(Cost[-1], Cost[,1])

Defining widths in barplot is quite straightforward, since it has a width= argument to which we pass the log-transformed values of the relevant variable. For the bar labels we need to calculate the positions and use text; to get line wraps we can use strwrap. A label can conveniently be left out and placed by hand if it's a hard case (as #6 in the example). Finally we use (headless) arrows.

# log-transform the values
w <- log(w1 <- cost$Average.of.Cost_Per_Kg_P_Removal.undiscounted..LOW_Oncost)
# flag which labels go inside the bars (TRUE) or outside (FALSE); best done by hand
inside <- as.logical(c(0, 1, 0, 1, 1, 0, 1, 1, 1))
# calculate `x0` values of labels
x0 <- w / 2 + c(0, cumsum(w)[- length(w)])
# define y values of labels
y0 <- ifelse(inside, colSums(t(cost)) / 2, 1.5e5)
# make labels using 'strwrap' 
labs <- mapply(paste, strwrap(rownames(cost), 15, simplify=FALSE), collapse="\n")
# define nine colors
colores <- hcl.colors(9, "Spectral", alpha=.7)

# the actual plot
b <- barplot(cs <- colSums(t(cost)), width=w, space=0, ylim=c(1, 2e5), 
             xlim=c(-1, 80), xaxt="n", xaxs="i", col=colores, border=NA,
             xlab="Measure code and average P reduction (kg/P/yr)",
             ylab="Mean annual TOTEX (£/kg) of P removal (thousands)")

# place labels, leave out # 6
text(x0[-6], y0[-6], labels=labs[-6], cex=.7)
# arrows
arrows(x0[c(1, 3)], 1.35e5, x0[c(1, 3)], cs[c(1, 3)], length=0)
# label # 6
text(40, 1e5, labs[6], cex=.7)
# arrow # 6
arrows(40, 8.4e4, x0[6], cs[6], length=0)
# make x axis
axis(1, c(0, cumsum(log(seq(0, 1e5, 1e4)[-1]))), 
     labels=format(c(0, cumsum(seq(0, 1e5, 1e4)[-1])), format="d"), tck=-.02)
# put it in a box
box()

I hope I got the x axis values right.

You may need to spend a little time working out how any functions that are new to you behave, but that is quite easy with the help files, e.g. type ?barplot.
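
As a small check of the label positions used in this answer, here is a minimal sketch with made-up heights and widths (`h`, `w` and `long` are hypothetical, not the asker's data). It assumes `space = 0`, as in the plot above, and compares the hand-computed `x0` with the bar midpoints that `barplot` itself returns. It also shows the `strwrap` step for multi-line labels.

```r
# Hypothetical bar heights and widths (not the asker's data)
h <- c(5, 3, 8)
w <- c(1, 2.5, 0.5)

# Label x positions as computed in the answer: half the bar's own width
# plus the total width of all bars to its left
x0 <- w / 2 + c(0, cumsum(w)[-length(w)])

# barplot() returns the bar midpoints; plot = FALSE skips the drawing
mids <- barplot(h, width = w, space = 0, plot = FALSE)

print(x0)
print(all.equal(as.numeric(mids), x0))

# Wrap long labels onto several lines, joining the pieces with newlines
long <- c("Improve management of manure", "No till trial")
labs <- vapply(strwrap(long, 15, simplify = FALSE),
               paste, character(1), collapse = "\n")
print(labs)
```

With a non-zero `space` the midpoints shift, so in that case it is safer to use the values returned by `barplot` directly rather than recomputing them.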