How to separately normalize each distribution group

Let's say I have a dataframe such as:

CATEGORY  Value
a          v1
a          v2
a          v3
a          v4
a          v5
b          v6
b          v7
b          v8

Now, if I want to plot these distributions by category, I could use something like:

sns.histplot(data, x="Value", hue="CATEGORY", stat="percent")

The problem with this is that category "a" represents 5/8 of the sample and "b" represents 3/8, and the histograms reflect this. I want to plot them so that each histogram has an area of 1, instead of 5/8 and 3/8.
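To make the 5/8 versus 3/8 issue concrete, here is a minimal sketch of the arithmetic in plain Python. The numeric values standing in for v1..v8 and the bin width of 1.0 are assumptions made up for illustration, and the code only mimics the normalization; it is not what seaborn does internally.

```python
from collections import Counter

# Hypothetical numeric stand-ins for v1..v8 (the question gives no real values).
rows = [("a", 1.0), ("a", 2.0), ("a", 2.5), ("a", 3.0), ("a", 4.5),
        ("b", 1.5), ("b", 3.5), ("b", 4.0)]


def bin_index(value, width=1.0):
    """Index of the fixed-width bin that contains value (width is an assumption)."""
    return int(value // width)


# counts[(category, bin)] -> number of rows that fall in that bin
counts = Counter((cat, bin_index(val)) for cat, val in rows)
group_sizes = Counter(cat for cat, _ in rows)
total = len(rows)

# Shared normalization: every bar is divided by the total sample size.
common = {key: 100 * n / total for key, n in counts.items()}
# Separate normalization: every bar is divided by its own group's size.
per_group = {key: 100 * n / group_sizes[key[0]] for key, n in counts.items()}


def group_sum(percentages, category):
    """Sum of the bar heights (in percent) that belong to one category."""
    return sum(p for (cat, _), p in percentages.items() if cat == category)


for cat in sorted(group_sizes):
    print(cat, round(group_sum(common, cat), 1), round(group_sum(per_group, cat), 1))
```

With shared normalization the bars of "a" add up to 62.5% and those of "b" to 37.5%; with separate normalization each category adds up to 100%, which is the behaviour being asked for.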

Below is an example of how it looks now:

enter image description here

But each of those areas should be one.

I thought of maybe iterating by category and plotting them one by one.

There is 1 best solution below

On BEST ANSWER

As per this answer to the duplicate question, use common_norm=False, which normalizes each hue group independently instead of over the whole dataset.

Also see "seaborn histplot and displot output doesn't match".

This is not specific to stat='percent'. Other options are 'frequency', 'probability', and 'density'.
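Since the question asks for each histogram to have an area of 1, stat='density' combined with common_norm=False is probably the closest fit: each group's bar heights multiplied by the bin width then sum to 1. A rough sketch of that arithmetic in plain Python follows; the values, the bin width, and the helper names density_histogram and area are all hypothetical, and this is not seaborn's actual implementation.

```python
from collections import Counter

# Hypothetical values per category; the bin width is an arbitrary choice.
groups = {"a": [1.0, 2.0, 2.5, 3.0, 4.5], "b": [1.5, 3.5, 4.0]}
BIN_WIDTH = 0.5


def density_histogram(values, width):
    """Map each occupied bin's left edge to its density: count / (n * width)."""
    counts = Counter(int(v // width) for v in values)
    n = len(values)
    return {idx * width: c / (n * width) for idx, c in sorted(counts.items())}


def area(hist, width):
    """Total area of the bars: sum of height * width."""
    return sum(height * width for height in hist.values())


for name, values in groups.items():
    hist = density_histogram(values, BIN_WIDTH)
    # Each group is normalized by its own size, so each area comes out as 1
    # (up to floating-point rounding) regardless of how many rows it has.
    print(name, round(area(hist, BIN_WIDTH), 6))
```

With stat='percent' the bar heights of each group sum to 100 instead, so the area depends on the bin width.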

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')

fig, axes = plt.subplots(nrows=2, figsize=(20, 10), tight_layout=True)

sns.histplot(data=tips, x='total_bill', hue='day', stat='percent', multiple='dodge', bins=30, common_norm=True, ax=axes[0])
sns.histplot(data=tips, x='total_bill', hue='day', stat='percent', multiple='dodge', bins=30, common_norm=False, ax=axes[1])

axes[0].set_title('common_norm=True', fontweight='bold')
axes[1].set_title('common_norm=False', fontweight='bold')

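# Grab the legend handles before the per-axes legends are removed below.
# Note: Legend.legend_handles and a callable fmt in bar_label need matplotlib >= 3.7.
handles = axes[1].get_legend().legend_handles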

for ax in axes:
    for c in ax.containers:
        ax.bar_label(c, fmt=lambda x: f'{x:0.2f}%' if x > 0 else '', rotation=90, padding=3, fontsize=8, fontweight='bold')
    ax.margins(y=0.15)
    ax.spines[['top', 'right']].set_visible(False)
    ax.get_legend().remove()

_ = fig.legend(title='Day', handles=handles, labels=tips.day.cat.categories.tolist(), bbox_to_anchor=(1, 0.5), loc='center left', frameon=False)

sns.displot

g = sns.displot(data=tips, kind='hist', x='total_bill', hue='day', stat='percent', multiple='dodge', bins=30, common_norm=False, height=5, aspect=4)

ax = g.axes.flat[0]  # ax = g.axes[0][0] also works

for c in ax.containers:
    ax.bar_label(c, fmt=lambda x: f'{x:0.2f}%' if x > 0 else '', rotation=90, padding=3, fontsize=8, fontweight='bold')
