I have a Pandas DataFrame with a column called "AXLES", which can take an integer value between 3-12. I am trying to use Seaborn's countplot() option to achieve the following plot:
- left y axis shows the frequencies of these values occurring in the data. The axis extends are [0%-100%], tick marks at every 10%.
- right y axis shows the actual counts, values correspond to tick marks determined by the left y axis (marked at every 10%.)
- x axis shows the categories for the bar plots [3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
- Annotation on top of the bars show the actual percentage of that category.
The following code gives me the plot below, with actual counts, but I could not find a way to convert them into frequencies. I can get the frequencies using df.AXLES.value_counts()/len(df.index)
but I am not sure about how to plug this information into Seaborn's countplot()
.
I also found a workaround for the annotations, but I am not sure if that is the best implementation.
Any help would be appreciated!
Thanks
plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')
for p in ax.patches:
ax.annotate('%{:.1f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
EDIT:
I got closer to what I need with the following code, using Pandas' bar plot, ditching Seaborn. Feels like I'm using so many workarounds, and there has to be an easier way to do it. The issues with this approach:
- There is no
order
keyword in Pandas' bar plot function as Seaborn's countplot() has, so I cannot plot all categories from 3-12 as I did in the countplot(). I need to have them shown even if there is no data in that category. The secondary y-axis messes up the bars and the annotation for some reason (see the white gridlines drawn over the text and bars).
plt.figure(figsize=(12,8)) plt.title('Distribution of Truck Configurations') plt.xlabel('Number of Axles') plt.ylabel('Frequency [%]') ax = (dfWIM.AXLES.value_counts()/len(df)*100).sort_index().plot(kind="bar", rot=0) ax.set_yticks(np.arange(0, 110, 10)) ax2 = ax.twinx() ax2.set_yticks(np.arange(0, 110, 10)*len(df)/100) for p in ax.patches: ax.annotate('{:.2f}%'.format(p.get_height()), (p.get_x()+0.15, p.get_height()+1))
You can do this by making a
twinx
axes for the frequencies. You can switch the two y axes around so the frequencies stay on the left and the counts on the right, but without having to recalculate the counts axis (here we usetick_left()
andtick_right()
to move the ticks andset_label_position
to move the axis labelsYou can then set the ticks using the
matplotlib.ticker
module, specificallyticker.MultipleLocator
andticker.LinearLocator
.As for your annotations, you can get the x and y locations for all 4 corners of the bar with
patch.get_bbox().get_points()
. This, along with setting the horizontal and vertical alignment correctly, means you don't need to add any arbitrary offsets to the annotation location.Finally, you need to turn the grid off for the twinned axis, to prevent grid lines showing up on top of the bars (
ax2.grid(None)
)Here is a working script: