I am trying to combine over-sampling and under-sampling using RandomUnderSampler() and SMOTE().
I am working on the loan_status dataset.
I have done the following split.
X = df.drop(['Loan_Status'], axis=1).values  # independent features
y = df['Loan_Status'].values  # dependent variable
This is what my training data's distribution looks like.
This is the code snippet that I tried to execute for class balancing.
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import make_pipeline
over = SMOTE(sampling_strategy=0.1)
under = RandomUnderSampler(sampling_strategy=0.5)
pipeline = make_pipeline(over,under)
x_sm,y_sm = pipeline.fit_resample(X_train,y_train)
It gave me a ValueError with the following traceback:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_64588/3438707951.py in <module>
4 pipeline = make_pipeline(over,under)
5
----> 6 x_copy,y_copy = pipeline.fit_resample(x_train_copy,y_train_copy)
~\Anaconda3\lib\site-packages\imblearn\pipeline.py in fit_resample(self, X, y, **fit_params)
351 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
352 if hasattr(last_step, "fit_resample"):
--> 353 return last_step.fit_resample(Xt, yt, **fit_params_last_step)
354
355 @if_delegate_has_method(delegate="_final_estimator")
~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
77 X, y, binarize_y = self._check_X_y(X, y)
78
---> 79 self.sampling_strategy_ = check_sampling_strategy(
80 self.sampling_strategy, y, self._sampling_type
81 )
~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in check_sampling_strategy(sampling_strategy, y, sampling_type, **kwargs)
532 return OrderedDict(
533 sorted(
--> 534 _sampling_strategy_float(sampling_strategy, y, sampling_type).items()
535 )
536 )
~\Anaconda3\lib\site-packages\imblearn\utils\_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
391 ]
392 ):
--> 393 raise ValueError(
394 "The specified ratio required to generate new "
395 "sample in the majority class while trying to "
ValueError: The specified ratio required to generate new sample in the majority class while trying to remove samples. Please increase the ratio.
You have to increase the sampling strategy for SMOTE, because (y_train==0).sum() / (y_train==1).sum() is higher than 0.1. Your starting imbalance ratio looks like about 0.4 (by eye), so try a SMOTE sampling strategy above that value.
Finally, you probably want an equal final ratio (after the under-sampling), so you should set the sampling strategy to 1.0 for the RandomUnderSampler.
Try it this way, and if you run into other problems, let me know.