In random undersampling, how can I define the drop ratio?


When I use the undersampling code below, it drops majority-class samples until the majority class is the same size as the minority class (50% vs. 50%).

I want the majority class to make up 70% of the data and the minority class 30%.

How can I do this, and which parameter sets the balance between the majority and minority classes?

from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
sampler = RandomUnderSampler(ratio={1: 1000, 0: 65})
X_rs, y_rs = sampler.fit_sample(X, y)
print('Random undersampling {}'.format(Counter(y_rs)))


Answer

So basically, RandomUnderSampler(sampling_strategy=X) with a float X resamples so that the minority class ends up X times the size of the majority class, that is, X = n_minority / n_majority after resampling. (The float form only works for binary problems.) Therefore, if you choose X = 1, you get the same result as 'auto': the two classes end up the same size (50/50).

Now if you choose 0.9, the minority class will be 90% of the size of the majority class.
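To make the arithmetic concrete, here is a minimal sketch of how a float ratio translates into class sizes. It does not call imbalanced-learn; the function name majority_kept, the count of 100 minority samples, and the truncation to a whole number are assumptions made for illustration only.

```python
def majority_kept(n_minority: int, ratio: float) -> int:
    """Majority samples left when minority / majority should equal `ratio`."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1] for undersampling")
    # Truncate to a whole number of samples (an assumption of this sketch).
    return int(n_minority / ratio)


n_minority = 100  # hypothetical minority-class count
for ratio in (1.0, 0.9, 0.5):
    kept = majority_kept(n_minority, ratio)
    share = n_minority / (n_minority + kept)
    print(f"ratio={ratio}: majority kept={kept}, minority share={share:.1%}")
```

With 100 minority samples, a ratio of 1 keeps 100 majority samples (a 50/50 split), while a ratio of 0.5 keeps 200, which gives a one-third vs. two-thirds split.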

Therefore, if you want your total set to be split 70%-30%, you will need to do some math (jokes).

With sampling_strategy = 0.5, the majority class ends up double the size of the minority class, so the split is 1/3 vs. 2/3, which is approximately what you are looking for.

TLDR

sampling_strategy takes the minority-to-majority ratio, and you want a 30/70 split. Therefore, you have to pass it 3/7, which is about 0.429.
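If you would rather not do the conversion by hand, the following sketch derives the value from the minority share you want. The helper strategy_for_share and the label counts (1000 vs. 300) are hypothetical, and it assumes a binary problem. It returns both the float ratio and the equivalent per-class counts.

```python
from collections import Counter
from fractions import Fraction


def strategy_for_share(y, minority_share):
    """Return (float ratio, per-class counts) for a target minority share."""
    # Work with exact fractions so 0.3 becomes 3/10 rather than a binary float.
    share = Fraction(minority_share).limit_denominator(1000)
    ratio = share / (1 - share)  # minority / majority, e.g. 3/7 for 30/70
    # Assumes exactly two classes: the larger one is the majority.
    (major, n_major), (minor, n_minor) = Counter(y).most_common(2)
    # Undersampling can only remove samples, so cap at what is available.
    keep = min(n_major, int(n_minor / ratio))
    return float(ratio), {major: keep, minor: n_minor}


y = [0] * 1000 + [1] * 300  # hypothetical labels
ratio, per_class = strategy_for_share(y, 0.3)
print(round(ratio, 3), per_class)
```

Either value should be usable as sampling_strategy, for example RandomUnderSampler(sampling_strategy=per_class).fit_resample(X, y). The dict form states the exact number of samples to keep per class, so it sidesteps any rounding of the float.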