How to annotate the top of the stacked bar with the greatest height


I have a stacked bar chart and I want to annotate the bar with the highest value. The problem is that the x-axis has text labels rather than numeric coordinates.


There are 2 best solutions below

  • Bar tick locations are usually 0-indexed, especially if the tick labels are categorical.
  • The easiest option is to use .pivot_table to aggregate the mean for each group, and create a separate variable, tot, for the maximum total bar height across the index.
    • The pivot_table index will be the x-axis and the column headers will be the bar groups.
  • pandas.DataFrame.plot with kind='bar' and stacked=True offers the easiest option for plotting stacked bars. pandas uses matplotlib as the default plotting backend.
  • Use .bar_label as explained in this answer and this answer, to annotate the bars.
    • The fmt parameter accepts a callable, such as a lambda expression, which is used to filter the labels to match tot. This works from matplotlib v3.7; otherwise custom labels must be passed with the labels parameter, as shown in the linked answers.
    • The segments for each color group are in ax.containers, where ax.containers[0] is the bottom segments and ax.containers[1] is the top segments.
    • label_type='edge' is the default. It places the label at the end of the bar and displays the position of that end-point, which for the top segments is the sum of the stacked bar heights.
  • If the months are not ordered on the x-axis, then the 'month' column can be set with pd.Categorical and ordered.
    • from calendar import month_abbr to get an ordered list of abbreviated month names.
    • df.month = pd.Categorical(values=df.month, categories=month_abbr[1:], ordered=True)
  • Tested in python 3.12.0, pandas 2.1.2, matplotlib 3.8.1, seaborn 0.13.0
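
The month-ordering bullets can be sketched in isolation as follows. This is a minimal illustration, not part of the tested example: the frame demo and its values are made up, and the month abbreviations from calendar depend on the active locale (English is assumed here).

```python
from calendar import month_abbr

import pandas as pd

# Hypothetical sample frame with the months deliberately out of order
demo = pd.DataFrame({'month': ['Mar', 'Jan', 'Feb', 'Jan'],
                     'passengers': [132, 112, 118, 115]})

# month_abbr[0] is an empty string, so slice from 1 to get Jan..Dec
demo['month'] = pd.Categorical(values=demo['month'], categories=month_abbr[1:], ordered=True)

# observed=True keeps only the months that occur in the data;
# the result is sorted in calendar order, not alphabetically
means = demo.groupby('month', observed=True)['passengers'].mean()
print(means)
```

Because the column is an ordered categorical, any later pivot_table or groupby on it should follow calendar order on the x-axis.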
import seaborn as sns  # seaborn is only used for the sample data, but pandas and matplotlib are imported as dependencies
import numpy as np  # for sample data

# sample data: this is a pandas.DataFrame
df = sns.load_dataset('flights')[['month', 'passengers']]
np.random.seed(2023)
df['Gender'] = np.random.choice(['Male', 'Female'], size=len(df))

# pivot and aggregate the mean
pt = df.pivot_table(index='month', columns='Gender', values='passengers', aggfunc='mean')

# calculate the max value by the index
tot = pt.sum(axis=1).max()

# plot the stacked bars
ax = pt.plot(kind='bar', stacked=True, rot=0, figsize=(7, 5), xlabel='Month',
             ylabel='Mean Number of Passengers', title='Annotation Demonstration')

# annotate the top group of bars
ax.bar_label(ax.containers[1], fmt=lambda x: f'{x:0.0f}' if x == tot else '')

# move the legend: cosmetics
ax.legend(title='Gender', bbox_to_anchor=(1, 0.5), loc='center left', frameon=False)

# remove the top and right spines: cosmetics
ax.spines[['top', 'right']].set_visible(False)


df.head()

  month  passengers  Gender
0   Jan         112  Female
1   Feb         118  Female
2   Mar         132    Male
3   Apr         129  Female
4   May         121  Female

pt

Gender      Female        Male
month                         
Jan     233.000000  259.250000
Feb     209.428571  270.800000
Mar     282.375000  245.750000
Apr     289.000000  245.166667
May     238.571429  318.400000
Jun     264.000000  378.400000
Jul     336.166667  366.500000
Aug     343.500000  358.666667
Sep     274.400000  322.428571
Oct     340.333333  192.833333
Nov     191.333333  274.333333
Dec     252.833333  270.833333

pt.sum(axis=1)

month
Jan    492.250000
Feb    480.228571
Mar    528.125000
Apr    534.166667
May    556.971429
Jun    642.400000
Jul    702.666667
Aug    702.166667
Sep    596.828571
Oct    533.166667
Nov    465.666667
Dec    523.666667
dtype: float64

tot

702.6666666666667
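
For matplotlib versions before 3.7, where fmt does not accept a callable, the same result can probably be reached by passing an explicit list through the labels parameter of .bar_label. The following is a minimal sketch under that assumption; the small frame pt and its values are invented for illustration, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical pivoted data: the index is the x-axis, the columns are the stacked groups
pt = pd.DataFrame({'Female': [233.0, 209.0, 336.0], 'Male': [259.0, 271.0, 367.0]},
                  index=['Jan', 'Feb', 'Jul'])

totals = pt.sum(axis=1)    # total height of each stacked bar
tallest = totals.idxmax()  # index label of the tallest bar

ax = pt.plot(kind='bar', stacked=True, rot=0)

# one label per bar: empty everywhere except the tallest bar
labels = [f'{v:0.0f}' if k == tallest else '' for k, v in totals.items()]

# ax.containers[-1] holds the top segments, so the label sits on top of the stack
texts = ax.bar_label(ax.containers[-1], labels=labels)
```

Selecting the bar by its index label with idxmax also avoids comparing floats for equality inside a formatter.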

The data and example below demonstrate how to label the tallest bar. However, the example assumes that the bars were drawn directly with matplotlib and that the data is a numpy array. If you produced your plot with pandas or some other plotting library, the approach below would need to be modified.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


#Synthetic data
np.random.seed(0)

month_names = pd.Series(
    pd.date_range(start='2023-01', periods=12, freq='MS')  # 'MS' = month start; the 'M' alias is deprecated in recent pandas
).dt.month_name().to_list()
month_names = [name[:3].upper() for name in month_names]

disturbances = np.stack([
    np.random.randn(12) * 4 + 50,  #orange bars
    np.random.randn(12) * 6 + 50], #blue bars
    axis=0
)
totals = disturbances.sum(axis=0) #total per month

#Plot
f, ax = plt.subplots(figsize=(10, 4))

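bottom = np.zeros(12)
for dist in disturbances:
    # after the loop, bars refers to the top (last-drawn) segments
    bars = ax.bar(month_names, dist, bottom=bottom, label='')
    # the extra 1.5 leaves a small visual gap between the stacked segments
    bottom += dist + 1.5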

ax.xaxis.set_tick_params(rotation=45)
ax.set_xlabel('Month')
ax.set_ylabel('Disturbances')

#Make labels
#All labels empty except the largest value
labels = [''] * 12
labels[totals.argmax()] = totals[totals.argmax()].round()
ax.bar_label(bars, labels=labels);
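
As an alternative to .bar_label, the text labels on the x-axis can be addressed by their integer position, since matplotlib places categorical ticks at 0, 1, 2, and so on. The sketch below uses ax.annotate with that position; the month names and values are made up for illustration, and the Agg backend is chosen only so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: two stacked segments per month
months = ['JAN', 'FEB', 'MAR', 'APR']
lower = np.array([50.0, 48.0, 53.0, 51.0])
upper = np.array([47.0, 55.0, 49.0, 60.0])
totals = lower + upper

fig, ax = plt.subplots()
ax.bar(months, lower)
ax.bar(months, upper, bottom=lower)

# position of the tallest stack; categorical ticks sit at 0..n-1
idx = int(totals.argmax())

# annotate at (tick position, stack height), nudged 3 points upward
note = ax.annotate(f'{totals[idx]:.0f}', xy=(idx, totals[idx]),
                   xytext=(0, 3), textcoords='offset points',
                   ha='center', va='bottom')
```

This route gives full control over placement and styling, at the cost of computing the stack heights yourself.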