I have scaled my dataset using the MinMaxScaler from sklearn like this:
from sklearn.preprocessing import MinMaxScaler
# create a MinMaxScaler object
self.scaler = MinMaxScaler(feature_range=(0, 1))
# fit the scaler to the dataset
self.scaler.fit(self.X_org)
# transform dataset using the scaler
self.X_scalled = pd.DataFrame(self.scaler.transform(self.X_org), columns=self.X_org.columns)
return self.X_scalled
However, I am now using the last 10% of the entire dataset for a validation run, and I scale that data with the scaler fitted on the training dataset, like so:
X_input_val_data_scalled = pd.DataFrame(self.scaler.transform(X_input_val_data), columns=X_input_val_data.columns)
Now my challenge:
In the training set X_org I get a nicely scaled dataset ranging from 0 to 1. In the scaled validation X dataset, however, I get completely weird data ranging from 7.5 to 8...
What am I doing wrong?
That is actually how it is supposed to be. A min-max scaler scales the data as below:
$$\text{scaled} = \frac{\text{data} - \min(x)}{\max(x) - \min(x)}$$
where x is the data the min-max scaler was fitted on. If data belongs to x, then data - min(x) is a non-negative number no larger than max(x) - min(x), hence the ratio lies between 0 and 1. Otherwise, which is the case for your validation data, the ratio doesn't have to be between 0 and 1.
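To make this concrete, here is a minimal sketch with made-up numbers (the arrays X_train and X_val are illustrative assumptions, not your data). The scaler is fitted on training values between 0 and 10, so validation values of 75 and 80 land at 7.5 and 8.0, well outside the 0 to 1 range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: the training feature spans 0..10,
# the validation feature lies far outside that range.
X_train = np.array([[0.0], [5.0], [10.0]])
X_val = np.array([[75.0], [80.0]])

# Fit on the training data only, as in the question.
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)

train_scaled = scaler.transform(X_train)  # stays inside [0, 1]
val_scaled = scaler.transform(X_val)      # uses the training min and max

# The same computation by hand, using the training min and max.
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)
manual_val = (X_val - train_min) / (train_max - train_min)

print(train_scaled.ravel())
print(val_scaled.ravel())
print(manual_val.ravel())
```

So values far outside 0 to 1 in the validation set suggest that the last 10% of your data lies in a different range than the data the scaler was fitted on, not that the scaler is being applied incorrectly.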