Duplicating samples of time series


I have a highly imbalanced dataset:

import numpy as np

unique1, counts1 = np.unique(labels_ds, return_counts=True)
class_counts = dict(zip(unique1, counts1))  # label -> number of samples
print('Original dataset shape {}'.format(counts1))

#returns
#Original dataset shape [  353    88  6656     1  1757    52  5480   226   452   125  5992   559 497  6134 29747]

In this case I can't simply use RandomOverSampler, as it returns n=29747*1 samples, nor SMOTE, because it requires at least k_neighbors=1 and the smallest class has only one sample. Therefore, I want to duplicate the sample of the minority class and then apply SMOTE. Note: my features and labels are in separate arrays, of shape (27, 28, 500, 500, 15) and (27, 500, 500, 1) respectively.

idx = np.where(labels == 6)
print(idx)
# returns (array([23]), array([459]), array([429]), array([0]))

So my question is, how can I duplicate (x6) both features and labels of the minority class (perhaps based on the index of the minority class label)?

Edit: my entire image is gridded as 9x3 = 27 cells. Each cell has 28 images (satellite images; the 28 is the time dimension), 500x500 is the height and width, and 15 is the number of features. The trailing 1 in the label shape is there because it's a pixel-based classification, so I have an image of labels, so to speak.
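
For reference, here is a minimal sketch of one way the duplication could be done, assuming each pixel is treated as one sample whose features are its whole time series (28 x 15 values). The helper names `flatten_pixels` and `duplicate_class` are hypothetical, and the small random arrays only stand in for the real (27, 28, 500, 500, 15) and (27, 500, 500, 1) arrays. It also assumes that `copies` means extra copies appended on top of the original sample.

```python
import numpy as np

def flatten_pixels(features, labels):
    """features: (cells, time, H, W, n_feat); labels: (cells, H, W, 1)."""
    c, t, h, w, f = features.shape
    # Put time next to the feature axis so every pixel owns a (time, n_feat) series,
    # then flatten to one row per pixel. Row order is (cell, row, col) for X and y.
    X = features.transpose(0, 2, 3, 1, 4).reshape(c * h * w, t * f)
    y = labels.reshape(c * h * w)
    return X, y

def duplicate_class(X, y, target, copies):
    """Append `copies` extra copies of every sample whose label equals `target`."""
    idx = np.flatnonzero(y == target)   # row indices of the minority class
    extra = np.repeat(idx, copies)      # each index repeated `copies` times
    return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])

# Toy data standing in for the real arrays (assumption: same axis order).
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 3, 4, 4, 5))
labels = np.zeros((2, 4, 4, 1), dtype=int)
labels[1, 2, 3, 0] = 6                  # a single minority pixel

X, y = flatten_pixels(features, labels)
X_dup, y_dup = duplicate_class(X, y, target=6, copies=6)
print(X.shape, X_dup.shape, np.count_nonzero(y_dup == 6))
```

The flattened `X_dup` and `y_dup` have the 2-D shape that samplers such as SMOTE expect. Note that the transpose plus reshape makes a copy of the feature array, and that interpolating between exact duplicates can only yield more identical points, so this mainly serves to get past the neighbor-count check.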
