I want to perform oversampling with the SMOTE algorithm in Python, using the imblearn.over_sampling module. My input data has four target classes. I don't want to oversample every minority class to match the majority class; I want to oversample each minority class by a different amount.
When I use SMOTE(sampling_strategy=1, k_neighbors=2, random_state=1000), I get the following error:
ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.
Then, as per the error, I used a dictionary for "sampling_strategy" as follows,
SMOTE(sampling_strategy={'1.0':70,'3.0':255,'2.0':50,'0.0':150},k_neighbors=2,random_state = 1000)
But it gives the following error:
ValueError: The {'2.0', '1.0', '0.0', '3.0'} target class is/are not present in the data.
Does anyone know how we can define a dictionary to oversample the data differently using SMOTE?
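You have to specify the number of samples you want for each class and pass this dictionary to the SMOTE object. The keys are the class labels exactly as they appear in your target, and each value is the total number of samples that class should have after resampling (not the number of samples to add).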
Code:
Output:
Code:
Output:
For more information see the documentation here.
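The error you are getting is because the labels specified in the dictionary and the actual labels don't match. Your dictionary keys are strings such as '1.0', while the labels in your data are most likely numeric (for example 1.0), so SMOTE cannot find any of the requested classes.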
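As a rough sketch of how to avoid the mismatch, you can build the dictionary from the labels that are actually present in y instead of typing the keys by hand. The helper names build_sampling_strategy and oversample below are made up for this illustration and are not part of imblearn, and the sketch assumes your labels are numeric. The class counts in the example are invented too; only the requested totals come from the question.

```python
from collections import Counter


def build_sampling_strategy(y, desired_counts):
    """Return a dict keyed by the labels as they actually appear in y.

    Hypothetical helper (not part of imblearn). Assumes numeric labels:
    keys in desired_counts are matched by float value, so '1.0', 1 and 1.0
    all refer to the same class.
    """
    present = Counter(y)
    by_value = {float(label): label for label in present}
    strategy = {}
    for key, n_samples in desired_counts.items():
        value = float(key)
        if value not in by_value:
            raise ValueError(f"class {key!r} is not present in y")
        label = by_value[value]
        # Over-sampling can only add samples, so the requested total
        # must not be smaller than the current count.
        if n_samples < present[label]:
            raise ValueError(
                f"class {label!r} already has {present[label]} samples, "
                f"cannot resample it down to {n_samples}"
            )
        strategy[label] = n_samples
    return strategy


def oversample(X, y, desired_counts, k_neighbors=2, random_state=1000):
    """Hypothetical wrapper: run SMOTE with a per-class target count."""
    # Imported here so the helper above can be used without imblearn installed.
    from imblearn.over_sampling import SMOTE

    smote = SMOTE(
        sampling_strategy=build_sampling_strategy(y, desired_counts),
        k_neighbors=k_neighbors,
        random_state=random_state,
    )
    return smote.fit_resample(X, y)


# Invented class counts, for illustration only.
y_example = [0.0] * 150 + [1.0] * 40 + [2.0] * 20 + [3.0] * 255
requested = {'1.0': 70, '3.0': 255, '2.0': 50, '0.0': 150}
print(build_sampling_strategy(y_example, requested))
# {1.0: 70, 3.0: 255, 2.0: 50, 0.0: 150}
```

With your own data you would call oversample(X, y, requested) and then check Counter(y_resampled) to confirm the new class distribution. Note that k_neighbors=2 needs at least three samples in every class that gets oversampled.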