I am trying to train a neural network on an imbalanced dataset in Colab. I found this code on Kaggle, which uses the Keras ImageDataGenerator for augmentation and SMOTE to oversample the data:
Augmentation:
ZOOM = [.99, 1.01]
BRIGHT_RANGE = [0.8, 1.2]
HORZ_FLIP = True
FILL_MODE = "constant"
DATA_FORMAT = "channels_last"
work_dr = ImageDataGenerator(rescale = 1./255, brightness_range=BRIGHT_RANGE, zoom_range=ZOOM, data_format=DATA_FORMAT, fill_mode=FILL_MODE, horizontal_flip=HORZ_FLIP)
train_data_gen = work_dr.flow_from_directory(directory=WORK_DIR, target_size=DIM, batch_size=6500, shuffle=False)
Then the author calls next() on the iterator to load the images:
train_data, train_labels = train_data_gen.next()
print(train_data.shape, train_labels.shape)
Which gives the following output:
(6400, 176, 176, 3) (6400, 4)
At this point it has already consumed about 70% of my RAM on Colab, not to mention the time taken to load the images. Note that the batch size is set to 6500, which is very large, but if I set it to something like 32 or 64, only the first batch is loaded when I call next().
Then, to oversample the data, the author uses SMOTE:
#Performing over-sampling of the data, since the classes are imbalanced
sm = SMOTE(random_state=42)
train_data, train_labels = sm.fit_resample(train_data.reshape(-1, IMG_SIZE * IMG_SIZE * 3), train_labels)
train_data = train_data.reshape(-1, IMG_SIZE, IMG_SIZE, 3)
print(train_data.shape, train_labels.shape)
This should give the following output:
(12800, 176, 176, 3) (12800, 4)
But instead it overloads my memory and Colab crashes due to a RAM shortage. I am not very good at coding, so I am having difficulty implementing what I want: to feed batches of augmented and oversampled data to my neural network without loading the entire dataset at once, and thus save memory. Is there a way to do this? If so, could you please show me how?
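For reference, here is a rough estimate of how large the arrays with the shapes printed above are. It is only a back-of-the-envelope sketch: it assumes the pixels are stored as float32 (4 bytes per value), and the helper name array_gib is made up for this illustration.

```python
# Rough memory estimate for the arrays in the question.
# Assumption: pixels are stored as float32 (4 bytes each).
BYTES_PER_VALUE = 4
IMG_SIZE = 176
CHANNELS = 3

def array_gib(num_images):
    """Size in GiB of a (num_images, IMG_SIZE, IMG_SIZE, CHANNELS) float32 array."""
    num_values = num_images * IMG_SIZE * IMG_SIZE * CHANNELS
    return num_values * BYTES_PER_VALUE / 2**30

before_smote = array_gib(6400)    # the single batch returned by next()
after_smote = array_gib(12800)    # the resampled array SMOTE should return

print(f"before SMOTE: {before_smote:.2f} GiB")
print(f"after SMOTE:  {after_smote:.2f} GiB")
```

These figures cover the two arrays only. Both exist at the same time while the resampling runs, and any intermediate copies come on top of that.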
I came across the same problem. If you copy the code into a Kaggle notebook and run it there, it runs smoothly. Hope this helps!
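If you need to stay on Colab, one possible direction is to oversample indices instead of pixel data, so that only one batch of images is ever in memory. The sketch below is not SMOTE: it uses plain random oversampling, which repeats minority-class samples rather than synthesizing new ones. The function name balanced_index_batches and the toy labels are hypothetical, and the image loading and augmentation are left out on purpose.

```python
import random
from collections import defaultdict

def balanced_index_batches(labels, batch_size, num_batches, seed=42):
    """Yield lists of sample indices in which every class is equally likely.

    labels: one integer class label per sample (e.g. taken from the file list).
    Minority-class indices get repeated (random oversampling), so only
    indices are held in memory, never the full image array.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    classes = sorted(by_class)

    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            label = rng.choice(classes)                # pick a class uniformly
            batch.append(rng.choice(by_class[label]))  # then one sample of that class
        yield batch

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
batches = list(balanced_index_batches(labels, batch_size=32, num_batches=50))
minority_share = sum(labels[i] for b in batches for i in b) / (32 * 50)
print(f"minority share in batches: {minority_share:.2f}")
```

In practice, each index batch would be used to load and augment just those images, for example inside the __getitem__ of a keras.utils.Sequence subclass. Because the augmentation is random, repeated minority images would still look slightly different each time they are drawn.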