When I use the undersampling code below, it drops samples from the majority class until it has the same number of samples as the minority class (50% vs 50%).
I want the majority class to make up 70% of the data and the minority class 30%.
How can I do this, and which parameter sets the balance between the majority and minority classes?
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
sampler = RandomUnderSampler(ratio={1: 1000, 0: 65})
X_rs, y_rs = sampler.fit_sample(X, y)
print('Random undersampling {}'.format(Counter(y_rs)))
Answer
Basically, RandomUnderSampler(sampling_strategy=X) resamples so that the minority class ends up X times the size of the majority class. If you choose X = 1, you get a result similar to 'auto', which makes the two classes 100% balanced. If you choose 0.9, the minority class will be 90% of the size of the majority class.
Therefore, if you want your total set to be split 70%-30%, you need to do a little math to turn that split into a ratio.
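The conversion between "share of the whole set" and "minority/majority ratio" can be sketched in a few lines. This is only an illustration of the arithmetic; the two helper names are made up for this example and are not part of imbalanced-learn.

```python
def ratio_for_minority_share(minority_share: float) -> float:
    """Hypothetical helper: turn a target minority share of the whole
    set (e.g. 0.3) into a minority/majority ratio."""
    if not 0 < minority_share <= 0.5:
        raise ValueError("minority_share must be in (0, 0.5]")
    return minority_share / (1 - minority_share)


def minority_share_for_ratio(ratio: float) -> float:
    """Hypothetical helper: the inverse, from a minority/majority
    ratio back to the minority share of the whole set."""
    return ratio / (1 + ratio)


# Print the ratio for a 50/50 split, a 1/3 vs 2/3 split, and a 30/70 split.
for share in (0.5, 1 / 3, 0.3):
    print("share {:.3f} -> ratio {:.3f}".format(share, ratio_for_minority_share(share)))
```

The value returned for a share of 0.3 is the float you would pass as sampling_strategy.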
With sampling_strategy = 0.5, the majority class is double the size of the minority class, so you get 1/3 vs 2/3, which is approximately what you are looking for.
TL;DR: sampling_strategy takes the minority-to-majority ratio, and you want a 30/70 split. Therefore, you have to pass it 3/7, which is approximately 0.4286.