I have a highly imbalanced dataset:
import numpy as np
unique1, counts1 = np.unique(labels_ds, return_counts=True)
class_counts = dict(zip(unique1, counts1))  # {label: count}
print('Original dataset shape {}'.format(counts1))
#returns
#Original dataset shape [ 353 88 6656 1 1757 52 5480 226 452 125 5992 559 497 6134 29747]
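In this case I can't simply use RandomOverSampler, because with its default sampling strategy it would replicate the single sample of the rarest class until that class reaches 29747 samples (the size of the majority class). Nor can I use SMOTE, because even with k_neighbors=1 it needs at least two samples in every class it oversamples, and one of my classes has only one.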
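The SMOTE constraint can be turned into a simple count. A minimal sketch, assuming imbalanced-learn's `SMOTE`, whose `k_neighbors` defaults to 5, so every class it oversamples needs at least `k_neighbors + 1` samples; `copies_needed` is a hypothetical helper that reports the replication factor each too-small class would need:

```python
import numpy as np

def copies_needed(labels, k_neighbors=5):
    """For each class with fewer than k_neighbors + 1 samples, return the
    replication factor that brings it to at least k_neighbors + 1 samples."""
    unique, counts = np.unique(labels, return_counts=True)
    min_count = k_neighbors + 1
    # Integer ceiling division: (a + b - 1) // b == ceil(a / b)
    return {int(c): int((min_count + n - 1) // n)
            for c, n in zip(unique, counts) if n < min_count}

# Hypothetical toy labels: class 0 x10, class 2 x4, class 6 x1
toy = np.array([0] * 10 + [2] * 4 + [6])
print(copies_needed(toy))                 # {2: 2, 6: 6}
print(copies_needed(toy, k_neighbors=1))  # {6: 2}
```

With the default `k_neighbors=5`, a class with a single sample needs six copies; with `k_neighbors=1`, two copies are already enough.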
Therefore, I want to first duplicate the sample of the minority class and then apply SMOTE.
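Duplicating before SMOTE has a side effect worth checking: SMOTE builds each synthetic sample as x + u * (x_nn - x), with u drawn uniformly from [0, 1) and x_nn one of the k nearest neighbours of x within the same class. If every sample of a class is a copy of one pixel, x_nn - x is zero and each synthetic sample is that same pixel again. The sketch below is a simplified NumPy re-implementation of that interpolation step (not imbalanced-learn's actual code; `smote_interpolate` is a hypothetical name) that makes this visible:

```python
import numpy as np

def smote_interpolate(X_class, n_new, k_neighbors=5, seed=0):
    """Simplified SMOTE step for one class: new = x + u * (neighbour - x)."""
    if len(X_class) < k_neighbors + 1:
        raise ValueError("need at least k_neighbors + 1 samples in the class")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between samples of this class
    d = np.linalg.norm(X_class[:, None, :] - X_class[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # push each sample to the end of its own neighbour list
    neighbours = np.argsort(d, axis=1)[:, :k_neighbors]
    base = rng.integers(0, len(X_class), size=n_new)          # random seed samples
    nb = neighbours[base, rng.integers(0, k_neighbors, size=n_new)]
    u = rng.random((n_new, 1))                                # interpolation weights
    return X_class[base] + u * (X_class[nb] - X_class[base])

x = np.array([1.0, 2.0, 3.0])               # hypothetical single rare pixel
X_class = np.repeat(x[None, :], 6, axis=0)  # six identical copies
synthetic = smote_interpolate(X_class, n_new=10)
print(np.allclose(synthetic, x))  # True
```

So for the single-sample class, duplicate-then-SMOTE behaves like random oversampling of that class, while classes with several distinct samples still get genuinely interpolated points.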
Note: my features and labels are in separate arrays, of shape (27, 28, 500, 500, 15) and (27, 500, 500, 1) respectively.
idx = np.where(labels == 6)
print(idx)
# returns (array([23]), array([459]), array([429]), array([0]))
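`np.where` on the 4-D label array returns one index array per axis: cell, row, column and the trailing singleton channel. A sketch, assuming the label at (cell, row, col) describes the pixel at the same cell, row and column of the feature array for all 28 time steps, of how those indices pull out that pixel's features (small toy sizes stand in for 27, 28, 500, 500, 15):

```python
import numpy as np

# Hypothetical toy sizes: 3 cells, 4 time steps, 5x5 pixels, 2 features
n_cells, n_times, h, w, n_feat = 3, 4, 5, 5, 2
rng = np.random.default_rng(0)
features = rng.random((n_cells, n_times, h, w, n_feat))
labels = np.zeros((n_cells, h, w, 1), dtype=int)
labels[2, 3, 1, 0] = 6  # a single pixel of the rare class

cell, row, col, _ = np.where(labels == 6)  # one index array per label axis
# Label axes (cell, row, col, channel) map onto feature axes (cell, time, row, col, feature).
# Because the index arrays are separated by a slice, NumPy puts the match axis first:
# the result has shape (n_matches, n_times, n_feat).
pixel_features = features[cell, :, row, col, :]
print(pixel_features.shape)  # (1, 4, 2)
```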
So my question is: how can I replicate both the features and the labels of the minority class six times (perhaps using the index returned for the minority class label)?
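One way to do the replication, assuming the data has already been flattened into a 2-D table with one row per pixel (the `(n_samples, n_features)` layout SMOTE expects), is to select the rows of the rare class with a boolean mask and append extra copies with `np.repeat`. `replicate_class` is a hypothetical helper, and the toy arrays are illustrative only:

```python
import numpy as np

def replicate_class(X, y, cls, factor):
    """Append (factor - 1) extra copies of every row labelled `cls`,
    so that class ends up with `factor` times its original count."""
    mask = (y == cls)
    extra_X = np.repeat(X[mask], factor - 1, axis=0)
    extra_y = np.repeat(y[mask], factor - 1, axis=0)
    # Features and labels are extended with the same rows, so they stay aligned
    return np.concatenate([X, extra_X]), np.concatenate([y, extra_y])

X = np.arange(12, dtype=float).reshape(6, 2)  # hypothetical: 6 pixels, 2 features
y = np.array([0, 0, 1, 0, 6, 1])
X_dup, y_dup = replicate_class(X, y, cls=6, factor=6)
print(X_dup.shape, int((y_dup == 6).sum()))  # (11, 2) 6
```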
Edit: my entire image is divided into a 9x3 grid, giving 27 "cells". Each cell has 28 images (satellite images; 28 is the time dimension), 500x500 is the height and width, 15 is the number of features, and the last dimension is 1 because this is pixel-based classification, so I have "an image of labels", so to speak.
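Given that layout, one possible way to get the 2-D table SMOTE needs is to treat every pixel as a sample whose 28 x 15 time series becomes a single feature vector. This is an assumed design choice, not the only option; `to_pixel_table` is a hypothetical helper and toy sizes replace the real ones:

```python
import numpy as np

def to_pixel_table(features, labels):
    """Flatten (cells, time, H, W, F) features and (cells, H, W, 1) labels
    into X of shape (cells*H*W, time*F) and y of shape (cells*H*W,)."""
    n_cells, n_times, h, w, n_feat = features.shape
    # Move time next to the feature axis so each pixel's time series is one row
    X = features.transpose(0, 2, 3, 1, 4).reshape(n_cells * h * w, n_times * n_feat)
    # C-order flattening visits (cell, row, col) in the same order as X's rows
    y = labels.reshape(-1)
    return X, y

rng = np.random.default_rng(0)
features = rng.random((3, 4, 5, 5, 2))           # hypothetical toy sizes
labels = rng.integers(0, 15, size=(3, 5, 5, 1))
X, y = to_pixel_table(features, labels)
print(X.shape, y.shape)  # (75, 8) (75,)
```

At full size this table has 27 x 500 x 500 = 6,750,000 rows and 28 x 15 = 420 columns, roughly 11.3 GB in float32, and the reshape after the transpose makes a copy, so it may be necessary to keep, for example, only a subset of pixels before calling SMOTE.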