Some of my X_test values fall outside the range I specified in the normalization function. Why is this happening, and how can I solve it? (I use the slice [:,14:] on X_train and X_test because the numerical values in my dataset start at that column.)
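from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
# Learn the per-column min/max from the numerical training columns and scale them
X_train[:, 14:] = scaler.fit_transform(X_train[:, 14:])
# Reuse the training min/max for the test columns (no refitting)
X_test[:, 14:] = scaler.transform(X_test[:, 14:])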
Plotting X_train and X_test shows that all values in X_train lie within the range, while some values in X_test fall outside it.
[Figure: X_train plot]
[Figure: X_test plot]
Why is this happening?
You are doing everything right; this is the expected behavior.
Let's look at the official docs to see what is going on. The only difference is that the docs use feature_range=(0, 1) instead of (-1, 1).
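What happened here? The training data is transformed by:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where min and max are the bounds of feature_range.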
So the transformed training data always lies in the range 0 to 1.
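A minimal NumPy sketch of the formula above, applied to a small illustrative training array (the numbers are made up for demonstration). It reimplements the documented formula by hand instead of calling scikit-learn, so treat it as an approximation of what MinMaxScaler does internally, not its actual code.

```python
import numpy as np

# Illustrative training set: 4 samples, 2 numerical features (made-up values).
X_train = np.array([[-1.0, 2.0],
                    [-0.5, 6.0],
                    [0.0, 10.0],
                    [1.0, 18.0]])

feature_min, feature_max = 0.0, 1.0  # corresponds to feature_range=(0, 1)

# "Fitting" amounts to remembering the per-column min and max of the training data.
data_min = X_train.min(axis=0)
data_max = X_train.max(axis=0)

# Min-max formula: map each column to [0, 1], then stretch/shift to feature_range.
X_std = (X_train - data_min) / (data_max - data_min)
X_train_scaled = X_std * (feature_max - feature_min) + feature_min

print(X_train_scaled)
```

Each column's training minimum maps to 0 and its training maximum maps to 1, so every training value ends up inside [0, 1] by construction.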
Now we transform new test data without fitting the scaler again, just as you do in your case.
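The output can then fall outside the range as well. This happens because the test data is scaled with the minimum and maximum learned from the training data: X_std becomes negative for any test value smaller than the training minimum of its column, and greater than 1 for any test value larger than the training maximum, so the result lands below min or above max of feature_range.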
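The sketch below, again a hand-written NumPy version of the formula with made-up numbers, fits the statistics on the training data only and then transforms a test array containing values beyond the training minimum and maximum. The last step shows one possible remedy if a downstream step really requires the bounds: clipping the transformed values into feature_range with np.clip. As far as I know, scikit-learn 0.24 and later offers the same behavior via MinMaxScaler(clip=True). Whether clipping is appropriate depends on your model, because it collapses every out-of-range test value onto the boundary.

```python
import numpy as np

# Illustrative data (made-up values): the test set exceeds the training min/max.
X_train = np.array([[-1.0, 2.0],
                    [-0.5, 6.0],
                    [0.0, 10.0],
                    [1.0, 18.0]])
X_test = np.array([[2.0, 2.0],     # 2.0 is above the training max (1.0) of column 0
                   [0.0, -6.0]])   # -6.0 is below the training min (2.0) of column 1

feature_min, feature_max = -1.0, 1.0  # same feature_range as in the question

# Statistics come from the training data only (fit), never from the test data.
data_min = X_train.min(axis=0)
data_max = X_train.max(axis=0)

def scale(X):
    """Apply the min-max formula using the training statistics (transform)."""
    X_std = (X - data_min) / (data_max - data_min)
    return X_std * (feature_max - feature_min) + feature_min

X_train_scaled = scale(X_train)  # stays within [-1, 1]
X_test_scaled = scale(X_test)    # can leave [-1, 1]

# Optional remedy: force the scaled test values back into feature_range.
X_test_clipped = np.clip(X_test_scaled, feature_min, feature_max)

print(X_test_scaled)
print(X_test_clipped)
```

Here the test value 2.0 in column 0 scales to 2.0 and the value -6.0 in column 1 scales to -2.0, both outside [-1, 1], while clipping maps them to 1.0 and -1.0.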