Why doesn't the MinMaxScaler change the sns.pairplot of the dataset?

78 Views Asked by mvinegret At 26 May 2022 at 15:40

I'm trying to create a pairplot of my dataset, whose variables are on very different scales (some lie in the 0-1 range, while others, such as age and MonthlyIncome, can be much larger). I want to rescale the variables that exceed 1 to the 0-1 range using the following code:

scale_vars=['MonthlyIncome','age','NumberOfTime30-59DaysPastDueNotWorse','DebtRatio','NumberOfOpenCreditLinesAndLoans',
            'NumberOfTimes90DaysLate','NumberRealEstateLoansOrLines','NumberOfTime60-89DaysPastDueNotWorse',
            'NumberOfDependents']
scaler=MinMaxScaler(copy=False)
train2[scale_vars]=scaler.fit_transform(train2[scale_vars])

My problem is that after scaling the variables and creating the pairplot again, it doesn't change at all. What might be causing this? Here is the code I use to create the pairplot:

g=sns.pairplot(train2, hue='SeriousDlqin2yrs', diag_kws={'bw':0.2})

where SeriousDlqin2yrs is the Y variable.

Arne On 26 May 2022 at 21:19 BEST ANSWER

The plots are expected to look almost the same: only the tick labels should differ. MinMaxScaler applies a linear transformation to each column, x' = (x - min) / (max - min), and seaborn chooses the axis limits based on the range of the values, so the arrangement of points in the scatter plots does not change.
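As a rough numeric check of this argument, the sketch below scales a single made-up column and compares each point's position as a fraction of the data range before and after scaling. The values and the relative_position helper are hypothetical, chosen only for illustration, and the formula assumes the scaler's default feature_range of (0, 1).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical income-like values, made up for illustration only
x = np.array([[1200.0], [3500.0], [5000.0], [9800.0]])

# Scale with the default feature_range=(0, 1)
scaled = MinMaxScaler().fit_transform(x)

# The same linear map written out by hand: x' = (x - min) / (max - min)
manual = (x - x.min()) / (x.max() - x.min())


def relative_position(values):
    """Position of each value as a fraction of the data range (hypothetical helper)."""
    return (values - values.min()) / (values.max() - values.min())


# The scaler matches the hand-written formula ...
same_as_formula = np.allclose(scaled, manual)
# ... and each point sits at the same fraction of its axis range as before
same_layout = np.allclose(relative_position(x), relative_position(scaled))

print(same_as_formula, same_layout)
```

Because the axis limits follow the data range, identical relative positions mean an identical picture; only the numbers printed on the ticks change.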

Since I do not have your data, here is the same effect with Ronald Fisher's classic iris dataset:

import pandas as pd
import seaborn as sns; sns.set()
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Load the iris data as a DataFrame and add the class labels as a column
iris_dict = load_iris(as_frame=True)
iris = iris_dict['data']
iris['species'] = iris_dict['target']

# Pairplot of the original (unscaled) data
g = sns.pairplot(iris, hue='species', diag_kws={'bw_method':0.2})

# Rescale the four measurement columns to the 0-1 range
scale_vars = ['sepal length (cm)', 'sepal width (cm)',
              'petal length (cm)', 'petal width (cm)']
scaler = MinMaxScaler(copy=False)
iris[scale_vars] = scaler.fit_transform(iris[scale_vars])

# Pairplot of the scaled data: same point layout, different tick labels
g = sns.pairplot(iris, hue='species', diag_kws={'bw_method':0.2})

Note that the column names should really be changed after scaling, because the values are no longer in centimeters.
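One possible way to do that renaming is sketched below. The ' (scaled)' suffix is an arbitrary choice made for this illustration, not something the answer prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Work on a copy of the iris measurements
iris = load_iris(as_frame=True)['data'].copy()
scale_vars = list(iris.columns)

# Rescale every measurement column to the 0-1 range
iris[scale_vars] = MinMaxScaler().fit_transform(iris[scale_vars])

# Replace the unit in each column name; ' (scaled)' is a made-up label
new_names = {col: col.replace(' (cm)', ' (scaled)') for col in scale_vars}
renamed = iris.rename(columns=new_names)

print(list(renamed.columns))
```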
