The validation_split parameter lets ImageDataGenerator split the data it reads from a folder into 2 disjoint sets. Is there any way to use it to create 3 sets: training, validation, and evaluation?
I am thinking about splitting the dataset into 2 datasets, then splitting the 2nd dataset into another 2 datasets:
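from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hold out half of the images; rescale pixel values to [0, 1].
datagen = ImageDataGenerator(validation_split=0.5, rescale=1./255)
train_generator = datagen.flow_from_directory(
    TRAIN_DIR,
    subset='training'
)
val_generator = datagen.flow_from_directory(
    TRAIN_DIR,
    subset='validation'
)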
Here I am thinking about splitting the validation dataset produced by val_generator into 2 sets, one for validation and the other for evaluation. How should I do it?
I have mostly been splitting data 80/10/10 for training, validation, and test respectively.
When working with Keras I favor the tf.data API, as it provides a good abstraction for complex input pipelines. It does not provide a simple tf.data.Dataset.split functionality, though. I have this function (which I found in someone's code and whose source I can no longer trace) that I use consistently.
First read your dataset and get its size (with the cardinality method), then pass both into the function and you're good to go!
The function can be given a flag to shuffle the original dataset before creating the splits, which is useful for getting more realistic validation and test metrics.
The seed for shuffling is fixed so that we can run the same function and the splits remain the same, which we want for consistent results.
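The function itself is not shown above, so here is a minimal sketch of what such a helper could look like. It is not the original code. The names get_dataset_partitions and split_sizes, the default fractions (taken from the 80/10/10 split mentioned earlier), and the shuffle_size and seed defaults are all assumptions made for illustration. It relies only on the shuffle, take, and skip methods of tf.data.Dataset.

```python
def split_sizes(ds_size, train_split=0.8, val_split=0.1):
    """Return (train, val, test) element counts; test gets the remainder."""
    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)
    return train_size, val_size, ds_size - train_size - val_size


def get_dataset_partitions(ds, ds_size, train_split=0.8, val_split=0.1,
                           shuffle=True, shuffle_size=10000, seed=12):
    """Split a dataset into disjoint train/val/test parts (hypothetical helper).

    `ds` is expected to behave like a tf.data.Dataset, i.e. to offer
    shuffle(), take() and skip().
    """
    if shuffle:
        # A fixed seed keeps the splits identical between runs.
        # reshuffle_each_iteration=False keeps the order fixed across epochs,
        # so elements cannot drift from one split into another.
        # For a uniform shuffle, shuffle_size should be >= ds_size.
        ds = ds.shuffle(shuffle_size, seed=seed,
                        reshuffle_each_iteration=False)

    train_size, val_size, _ = split_sizes(ds_size, train_split, val_split)

    train_ds = ds.take(train_size)                 # first 80% by default
    val_ds = ds.skip(train_size).take(val_size)    # next 10%
    test_ds = ds.skip(train_size + val_size)       # whatever remains
    return train_ds, val_ds, test_ds


# Example usage (assumes TensorFlow is installed):
#   import tensorflow as tf
#   ds = tf.data.Dataset.range(100)
#   ds_size = int(ds.cardinality().numpy())
#   train_ds, val_ds, test_ds = get_dataset_partitions(ds, ds_size)
```

One design note: the test split takes the remainder rather than its own fraction, so rounding never drops elements and the three parts always add up to ds_size.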