Image augmentation with SMOTE oversampling as batches without running out of RAM

967 Views Asked by Rubaiyat Riddhee At 17 August 2025 at 18:26

I am trying to train a neural network on an unbalanced dataset. I am using Colab. I found this code on Kaggle, which uses the Keras ImageDataGenerator for augmentation and SMOTE to oversample the data:

Augmentation:

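from tensorflow.keras.preprocessing.image import ImageDataGenerator

# WORK_DIR (image folder) and DIM (target image size) are defined elsewhere in the notebook.
ZOOM = [.99, 1.01]
BRIGHT_RANGE = [0.8, 1.2]
HORZ_FLIP = True
FILL_MODE = "constant"
DATA_FORMAT = "channels_last"

# Rescale pixels to [0, 1] and apply light random augmentation.
work_dr = ImageDataGenerator(rescale=1./255, brightness_range=BRIGHT_RANGE, zoom_range=ZOOM, data_format=DATA_FORMAT, fill_mode=FILL_MODE, horizontal_flip=HORZ_FLIP)

# batch_size=6500 is larger than the dataset, so one batch holds every image.
train_data_gen = work_dr.flow_from_directory(directory=WORK_DIR, target_size=DIM, batch_size=6500, shuffle=False)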

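Then he calls next() on the iterator to load the images:

# The built-in next() is used here; on newer Keras versions the .next() method may be missing.
train_data, train_labels = next(train_data_gen)
print(train_data.shape, train_labels.shape)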

Which gives the following output:

(6400, 176, 176, 3) (6400, 4)

At this point it has already consumed about 70% of my RAM on Colab, not to mention the time taken to load the images. Notice that the batch size is set to 6500, which is larger than the 6400 images in the dataset, so a single call returns everything. If I set it to something like 32 or 64 instead, only the first batch is loaded when I call next().

Then, to oversample the data, he uses SMOTE:

from imblearn.over_sampling import SMOTE

# Performing over-sampling of the data, since the classes are imbalanced
sm = SMOTE(random_state=42)

# SMOTE expects 2-D input, so each image is flattened into one row
train_data, train_labels = sm.fit_resample(train_data.reshape(-1, IMG_SIZE * IMG_SIZE * 3), train_labels)

# Restore the original image shape
train_data = train_data.reshape(-1, IMG_SIZE, IMG_SIZE, 3)

print(train_data.shape, train_labels.shape)

This should give the following output:

(12800, 176, 176, 3) (12800, 4)

But instead it overloads my memory and Colab crashes due to a RAM shortage. I am not very good at coding, so I am having difficulty implementing what I want: to feed batches of augmented and oversampled data to my neural network without loading the entire dataset at once, thus saving memory. Is there a way to do this? If so, could you please show me how?
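For a sense of scale, the array sizes can be estimated from the shapes quoted above. This is only a back-of-the-envelope sketch: it assumes each value is a 4-byte float32, and it ignores any temporary copies made by the reshape calls or by SMOTE's nearest-neighbour search, so real usage is likely higher. The helper name array_gib is made up for illustration.

```python
from math import prod

def array_gib(shape, bytes_per_value=4):
    """Approximate size in GiB of a dense array with the given shape."""
    return prod(shape) * bytes_per_value / 2**30

# Shapes taken from the printed output before and after oversampling.
before_smote = array_gib((6400, 176, 176, 3))
after_smote = array_gib((12800, 176, 176, 3))

print(f"before SMOTE: {before_smote:.2f} GiB")
print(f"after SMOTE:  {after_smote:.2f} GiB")
print(f"both at once: {before_smote + after_smote:.2f} GiB")
```

Under these assumptions the loaded images take a little over 2 GiB and the resampled array twice that, and both exist at the same time while fit_resample runs, which is consistent with a Colab session running out of RAM.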


There is 1 best solution below

Suru On 14 May 2022 at 15:12

I came across the same problem. You can run the code by copying it into Kaggle itself, where it runs very smoothly. Hope this helps!
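If moving to Kaggle is not an option, one possible way to keep memory low is to balance the classes per batch instead of resampling the whole dataset up front. Note that the sketch below is not SMOTE: it is plain random oversampling of sample indices, so minority images are repeated (and would then be varied by the augmentation step) rather than synthesised. The function name balanced_batches and the toy labels are hypothetical, and the actual image loading is left out.

```python
import random
from collections import defaultdict

def balanced_batches(labels, batch_size, num_batches, seed=42):
    """Yield lists of sample indices in which every class is equally likely.

    labels: one integer class label per sample (for example, per image file).
    Minority-class indices are drawn more often, i.e. random oversampling.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    classes = sorted(by_class)
    for _ in range(num_batches):
        # Pick a class uniformly at random, then a sample from that class.
        yield [rng.choice(by_class[rng.choice(classes)])
               for _ in range(batch_size)]

# Toy imbalanced labels: 90 samples of class 0 and 10 of class 1.
labels = [0] * 90 + [1] * 10
batches = list(balanced_batches(labels, batch_size=32, num_batches=50))

# Share of minority-class samples across all generated batches.
drawn = [labels[i] for batch in batches for i in batch]
minority_share = sum(drawn) / len(drawn)
print(f"minority share in batches: {minority_share:.2f}")
```

Each yielded list holds only batch_size indices, so only those images would need to be read from disk and augmented before being passed to the network. The integer labels could come from wherever the file list is built, for example the directory iterator's classes attribute if your Keras version provides it.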
