The validation_split parameter lets ImageDataGenerator split the data it reads from a folder into 2 disjoint sets. Is there any way to use it to create 3 sets: training, validation, and evaluation?
I am thinking about splitting the dataset into 2 datasets, then splitting the 2nd dataset into another 2 datasets:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(validation_split=0.5, rescale=1./255)
train_generator = datagen.flow_from_directory(
    TRAIN_DIR,
    subset='training'
)
val_generator = datagen.flow_from_directory(
    TRAIN_DIR,
    subset='validation'
)
Here I am thinking about splitting the validation dataset into 2 sets using val_generator: one for validation and the other for evaluation. How should I do it?
I like working with the flow_from_dataframe() method of ImageDataGenerator, where I interact with a simple Pandas DataFrame (perhaps containing other features), not with the directory. But you can easily change my code if you insist on flow_from_directory().

So this is my go-to function, e.g. for a regression task, where we try to predict a continuous y:

[...]

Things to notice:
- [...] train_test_split), which are used to filter the DataFrame index.
- [...] validation_split parameter for the training generator
- images_df is a DataFrame somewhere in global memory with proper columns like img_file and y.
- [...] shuffle validation and test generators

This can be further generalized for multiple outputs, classification, what have you.
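
For illustration, here is a minimal sketch of the idea described above: split the DataFrame index into three disjoint sets with train_test_split, then build one flow_from_dataframe() generator per set. It is not the original function from this answer. The names split_index, make_generators and img_dir, the 70/15/15 fractions, the seed, target_size and batch_size are all hypothetical choices made for the example, and images_df is passed in as an argument instead of living in global memory. Turning shuffling off for the validation and test generators is also an assumption of this sketch; it keeps predictions aligned with the row order.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_index(df, val_size=0.15, test_size=0.15, seed=42):
    """Return three disjoint arrays of index labels: train, validation, test."""
    # First cut: hold out the validation and test rows together.
    train_idx, holdout_idx = train_test_split(
        df.index.to_numpy(),
        test_size=val_size + test_size,
        random_state=seed,
    )
    # Second cut: divide the held-out rows into validation and test.
    val_idx, test_idx = train_test_split(
        holdout_idx,
        test_size=test_size / (val_size + test_size),
        random_state=seed,
    )
    return train_idx, val_idx, test_idx


def make_generators(images_df, img_dir, x_col="img_file", y_col="y",
                    target_size=(224, 224), batch_size=32):
    """Build train/validation/test generators from one DataFrame (hypothetical helper)."""
    # Imported lazily so split_index can be used without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_idx, val_idx, test_idx = split_index(images_df)

    # No validation_split here: the split was already done on the index.
    datagen = ImageDataGenerator(rescale=1.0 / 255)

    def flow(idx, shuffle):
        return datagen.flow_from_dataframe(
            dataframe=images_df.loc[idx],  # keep only the rows of this subset
            directory=img_dir,
            x_col=x_col,
            y_col=y_col,
            class_mode="raw",  # continuous target, i.e. regression
            target_size=target_size,
            batch_size=batch_size,
            shuffle=shuffle,
        )

    # Assumption of this sketch: shuffle only the training generator.
    return flow(train_idx, True), flow(val_idx, False), flow(test_idx, False)
```

With a DataFrame that has img_file and y columns, a call such as train_gen, val_gen, test_gen = make_generators(images_df, "path/to/images") would return the three generators; the directory path here is only a placeholder.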