I'm trying to augment the ISIC 2019 dataset images with 9 classes. The 'NV' class is overrepresented (12876 of a total of 25331 images) so I'd like to exclude it from the augmentation process but later on recombine the augmented images and the unchanged 'NV' images.
I'd like to have an ImageDataGenerator object like this with a training/validation split so that I can use it for on-the-fly augmentation.
I wasn't able to combine two ImageDataGenerators, as suggested elsewhere on the Internet (here).
I tried the following code but can't figure out how to write an on-the-fly data generator. It doesn't work for me: it doesn't even save any images. Also, converting the images and saving them for later use will probably take too much time (I'm using Google Drive with only 15 GB of storage).
from keras.preprocessing.image import ImageDataGenerator
import os
import shutil
# Define data folders
data_directory = "/content/dataset" # Path to the ISIC 2019 images
target_directory = "/content/dataset_aug" # Path to the augmented images already sorted in subfolders
# Create dataset_aug folder
os.makedirs(target_directory, exist_ok=True)
# Class names
class_names = ["AK", "BCC", "BKL", "DF", "MEL", "NV", "SCC", "UNK", "VASC"]
# Augmentation parameters
datagen_aug = ImageDataGenerator(
    rescale=1/255.,
    rotation_range=180,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)
# Iterate over all classes and do an augmentation
for class_name in class_names:
    class_directory = os.path.join(data_directory, class_name)
    target_class_directory = os.path.join(target_directory, class_name)
    # Create target directory
    if not os.path.exists(target_class_directory):
        os.makedirs(target_class_directory)
    if class_name == "NV":
        # Copy all NV images unchanged to the target directory
        image_files = os.listdir(class_directory)
        for image_file in image_files:
            source_path = os.path.join(class_directory, image_file)
            target_path = os.path.join(target_class_directory, image_file)
            shutil.copy(source_path, target_path)
    else:
        # Do the augmentation for the other classes and save them in their target directories
        image_generator = datagen_aug.flow_from_directory(
            class_directory,
            target_size=(224, 224),
            batch_size=32,
            class_mode=None,
            save_to_dir=target_class_directory,
            save_prefix='aug_',
            save_format='png'
        )
        num_augmented_images = 9200 # Number of augmented images per class
        for i in range(num_augmented_images):
            batch = next(image_generator)
            if (i + 1) % 100 == 0:
                print(f"Generated {i+1} augmented images for class {class_name}")
print("Data augmentation completed.")
There are a few ways you can approach this:
1. Use two separate ImageDataGenerator instances: one that augments the other classes and one without augmentation for the "NV" class. Then concatenate or merge their outputs when loading the data.
2. Customize the augmentation logic per class, for example by subclassing ImageDataGenerator or wrapping the iterators returned by flow and flow_from_directory, so that the class of each sample is checked and a different augmentation is applied depending on it.
3. Manually augment the images of the other classes first to create more samples. Then combine the augmented images with the unchanged "NV" originals and pass the full dataset through an ImageDataGenerator.
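For the offline route that the question attempts, one likely reason no images are saved is that flow_from_directory expects the folder it is given to contain one subfolder per class, so pointing it at a single class folder finds zero images. Below is a minimal sketch of a workaround, assuming the folder layout from the question: pass the parent folder and select the class with the classes argument. The helper names batches_needed and augment_class_to_disk are made up for illustration, and the Keras import sits inside the function so the batch arithmetic can be checked without Keras installed.

```python
import math
import os


def batches_needed(num_images, batch_size):
    """Number of next() calls needed to produce at least num_images images."""
    return math.ceil(num_images / batch_size)


def augment_class_to_disk(data_directory, target_directory, class_name,
                          num_images, batch_size=32):
    """Write roughly num_images augmented copies of one class to disk."""
    # Imported here so the helper above can be used without Keras installed.
    from keras.preprocessing.image import ImageDataGenerator

    # Same augmentation settings as in the question; rescaling is left to
    # the generator that later feeds the model.
    datagen = ImageDataGenerator(
        rotation_range=180,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True,
        vertical_flip=True,
        fill_mode="nearest",
    )
    out_dir = os.path.join(target_directory, class_name)
    os.makedirs(out_dir, exist_ok=True)

    flow = datagen.flow_from_directory(
        data_directory,         # parent folder holding the class subfolders
        classes=[class_name],   # restrict the generator to this one class
        class_mode=None,
        target_size=(224, 224),
        batch_size=batch_size,
        save_to_dir=out_dir,
        save_prefix="aug",
        save_format="png",
    )
    # Each next() call writes one batch of up to batch_size images.
    for _ in range(batches_needed(num_images, batch_size)):
        next(flow)
```

Note that every next() call yields a whole batch, so looping 9200 times with batch_size=32, as in the question, would request up to 294,400 images per class rather than 9200; batches_needed(9200, 32) gives 288 calls instead.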
Here is an example of approach 1:
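A minimal sketch, assuming the folder layout and augmentation settings from the question. One generator augments the eight non-"NV" classes, a second one only rescales "NV", and a small Python generator merges one batch from each. Because flow_from_directory renumbers labels when it is given a subset of classes, the labels are mapped back to the full 9-class index before one-hot encoding. The names combine_generators, to_global_index and build_train_generator, the 0.2 validation split, and the half-and-half batch composition are illustrative choices, not part of Keras.

```python
import numpy as np

CLASS_NAMES = ["AK", "BCC", "BKL", "DF", "MEL", "NV", "SCC", "UNK", "VASC"]


def to_global_index(subset_classes, all_classes=CLASS_NAMES):
    """Lookup array: local label index of a generator -> global class index."""
    return np.array([all_classes.index(name) for name in subset_classes])


def combine_generators(gen_aug, gen_plain, aug_classes, plain_classes,
                       all_classes=CLASS_NAMES, seed=None):
    """Endlessly yield shuffled (images, one-hot labels) batches that merge
    one augmented and one non-augmented batch. Both iterators must yield
    (images, integer labels), e.g. Keras iterators with class_mode='sparse'."""
    rng = np.random.default_rng(seed)
    aug_map = to_global_index(aug_classes, all_classes)
    plain_map = to_global_index(plain_classes, all_classes)
    eye = np.eye(len(all_classes), dtype="float32")
    while True:
        x_aug, y_aug = next(gen_aug)
        x_plain, y_plain = next(gen_plain)
        x = np.concatenate([x_aug, x_plain], axis=0)
        y = np.concatenate([aug_map[y_aug.astype(int)],
                            plain_map[y_plain.astype(int)]])
        order = rng.permutation(len(x))  # mix both sources within the batch
        yield x[order], eye[y[order]]


def build_train_generator(data_directory, batch_size=32, seed=42):
    """Wire two Keras generators into one on-the-fly training generator."""
    # Imported here so combine_generators itself only needs NumPy.
    from keras.preprocessing.image import ImageDataGenerator

    aug_classes = [c for c in CLASS_NAMES if c != "NV"]
    datagen_aug = ImageDataGenerator(
        rescale=1 / 255., rotation_range=180, width_shift_range=0.1,
        height_shift_range=0.1, zoom_range=0.1, horizontal_flip=True,
        vertical_flip=True, fill_mode="nearest", validation_split=0.2)
    datagen_plain = ImageDataGenerator(rescale=1 / 255., validation_split=0.2)

    common = dict(target_size=(224, 224), class_mode="sparse",
                  subset="training", seed=seed)
    gen_aug = datagen_aug.flow_from_directory(
        data_directory, classes=aug_classes,
        batch_size=batch_size // 2, **common)
    gen_plain = datagen_plain.flow_from_directory(
        data_directory, classes=["NV"],
        batch_size=batch_size // 2, **common)
    return combine_generators(gen_aug, gen_plain, aug_classes, ["NV"], seed=seed)
```

The combined generator never stops, so steps_per_epoch has to be passed to model.fit. The ratio of the two batch sizes controls how large the "NV" share of each training batch is. A validation generator can be built the same way with subset="validation" and no augmentation.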
The key idea is to generate the augmented and non-augmented images separately, and then concatenate them when loading the data.
2nd approach:
This approach augments the dataset while excluding the 'NV' class:
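A minimal sketch, assuming the folder layout from the question. Instead of subclassing ImageDataGenerator, it wraps a plain (rescale-only) iterator and applies random_transform from a second ImageDataGenerator to every image whose label is not "NV". Since a single iterator reads all nine classes, the label indices and the training/validation split stay consistent. The names augment_except and build_generators and the 0.2 split are illustrative choices, not part of Keras.

```python
import numpy as np

CLASS_NAMES = ["AK", "BCC", "BKL", "DF", "MEL", "NV", "SCC", "UNK", "VASC"]


def augment_except(batches, transform, skip_index):
    """Yield (images, one-hot labels) batches in which `transform` has been
    applied to every image whose class index differs from `skip_index`."""
    for x, y in batches:
        x = np.array(x, copy=True)        # leave the caller's batch untouched
        labels = np.argmax(y, axis=1)
        for i in np.flatnonzero(labels != skip_index):
            x[i] = transform(x[i])
        yield x, y


def build_generators(data_directory, batch_size=32, seed=42):
    """Return (train generator, validation iterator, steps per epoch)."""
    # Imported here so augment_except itself only needs NumPy.
    from keras.preprocessing.image import ImageDataGenerator

    # Loads and rescales images and provides the train/validation split.
    plain = ImageDataGenerator(rescale=1 / 255., validation_split=0.2)
    # Only used for its random_transform method, applied per image.
    augmenter = ImageDataGenerator(
        rotation_range=180, width_shift_range=0.1, height_shift_range=0.1,
        zoom_range=0.1, horizontal_flip=True, vertical_flip=True,
        fill_mode="nearest")

    common = dict(target_size=(224, 224), batch_size=batch_size,
                  classes=CLASS_NAMES, class_mode="categorical", seed=seed)
    train_flow = plain.flow_from_directory(
        data_directory, subset="training", **common)
    val_flow = plain.flow_from_directory(
        data_directory, subset="validation", shuffle=False, **common)

    train = augment_except(train_flow, augmenter.random_transform,
                           CLASS_NAMES.index("NV"))
    return train, val_flow, len(train_flow)
```

The returned training generator can be passed to model.fit together with steps_per_epoch set to the third return value, since the wrapped Keras iterator loops indefinitely. Nothing is written to disk, which avoids the storage limit mentioned in the question. Note that this only leaves "NV" unaugmented; it does not change how often each class is drawn.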