KDE plots of multiple features of grouped data


I previously made a poorly worded post with no reproducible example, which understandably failed to get a proper answer.

This time I am providing a minimal reproducible example, and I hope it leads to a better answer.

I have a pandas DataFrame with 9 features. Each record belongs to one of several categories, stored in the column "CRG_mean". For simplicity, the example below includes only 2 of these categories (1 and 4):

data = {'total_nr_trx_mean': {1: 77.34375, 2: 123.46875, 3: 118.6875, 13: 130.875},
 'nr_debit_trx_mean': {1: 47.90625, 2: 43.78125, 3: 68.25, 13: 85.9375},
 'volume_debit_trx_mean': {1: 4366543.875,
  2: 16487596.3125,
  3: 15126816.25,
  13: 4066733.09375},
 'nr_credit_trx_mean': {1: 29.4375, 2: 79.6875, 3: 50.4375, 13: 44.9375},
 'volume_credit_trx_mean': {1: 3981240.15625,
  2: 16384595.6875,
  3: 14826997.84375,
  13: 4098643.40625},
 'min_balance_mean': {1: -8024608.28125,
  2: -6247504.75,
  3: -51456047.96875,
  13: -7997062.75},
 'max_balance_mean': {1: 7918533.59375,
  2: 27815917.28125,
  3: -43278203.84375,
  13: 139045.3125},
 'credit_application_mean': {1: 0.0, 2: 0.0, 3: 0.125, 13: 0.1875},
 'nr_credit_applications_mean': {1: 0.0, 2: 0.0, 3: 0.15625, 13: 0.1875},
 'CRG_mean': {1: 1.0, 2: 1.0, 3: 4.0, 13: 4.0}}

I used the following code:

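import warnings
warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

selected = pd.DataFrame(data)
# assumed: mean_group_cols holds the 9 feature columns (everything except CRG_mean)
mean_group_cols = [c for c in selected.columns if c != 'CRG_mean']

crg1 = selected.loc[selected['CRG_mean'] == 1]
crg4 = selected.loc[selected['CRG_mean'] == 4]

sns.set_style('whitegrid')
fig, ax = plt.subplots(3, 3, figsize=(18, 18))

for j, feature in enumerate(mean_group_cols):
    plt.subplot(3, 3, j + 1)
    # one kdeplot call per category (bw_method replaces the deprecated bw)
    sns.kdeplot(crg1[feature], bw_method=0.5, label="Class 1")
    sns.kdeplot(crg4[feature], bw_method=0.5, label="Class 4")
    plt.xlabel(feature, fontsize=12)
    plt.tick_params(axis='both', which='major', labelsize=12)
plt.show()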

to produce this plot:

[Figure: 3×3 grid of KDE plots, one per feature]

Two things bother me: inside the loop I have to write a separate kdeplot line for each category, and none of the subplots shows a legend.

@JohanC kindly suggested in a comment replacing the repeated lines with

sns.kdeplot(c_agg_filtered[feature], bw_method=0.5, hue='Class', common_norm=False)

However, this raises the following error:

ValueError: The following variable cannot be assigned with wide-form data: hue

So my questions are:

  1. How can I draw the KDE curves for all categories (the full data has 7) with a single line inside the loop?

  2. How can I add a legend? Either a legend on each subplot or a single legend for the whole figure is fine, since the categories are the same in all 9 KDE plots.
