I have read that you can increase the steps per epoch by 2 or 3 times when applying image augmentation. I have done that in the past with no issue, but this time I got an error telling me I have run out of training data.
Another issue is that the steps per epoch Keras reports is always the same number, even when I change the batch size.
Here are some specs and sample code:
training data size: 6149
validation data size: 1020
batch sizes tried: 32 and 64
augmentation and preprocessing steps:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet import preprocess_input
datagen = ImageDataGenerator(rotation_range=90,
                             horizontal_flip=True,
                             vertical_flip=True,
                             # featurewise_center=True,
                             zoom_range=0.3,
                             shear_range=0.5,
                             preprocessing_function=preprocess_input)
train_generator = datagen.flow_from_directory('data/train', seed=42)
validation_generator = datagen.flow_from_directory('data/validation', seed=42)
>Found 6149 images belonging to 102 classes.
>Found 1020 images belonging to 102 classes.
fitting:
batch_size = 64  # or 32
model.fit(train_generator,
          epochs=100,
          batch_size=batch_size,
          # steps_per_epoch=3*training_size//batch_size,  - gives error
          validation_data=validation_generator,
          callbacks=[early_stopping])
example output with batch_size 32 or 64:
Epoch 1/100
193/193 [==============================] - 3284s 17s/step - loss: 0.3660 - categorical_accuracy: 0.5357 - val_loss: 0.4022 - val_categorical_accuracy: 0.5176
error shown when I specified steps_per_epoch:
Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 57600 batches). You may need to use the repeat() function when building your dataset.
Summary:
- Why is my steps per epoch always 193 regardless of the batch size (when steps_per_epoch is not specified)?
- Specifying steps_per_epoch resulted in a "ran out of data" error.
- Could there be a problem with my generator code?
- Side question: I have a large number of classes (102); should my batch size be larger than that? I always overfit on this data.
You have not defined the batch_size at the dataset generator, so the model has taken the default batch_size=32 and divided the training data size by it to calculate the number of steps per epoch: steps_per_epoch = training_data_size / batch_size. With 6149 training images and the default batch size of 32, that gives the 193 steps per epoch you see.
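A quick sanity check of that formula against the numbers in the question (the variable name here is just for illustration):

import math

training_size = 6149                   # images reported by flow_from_directory
print(math.ceil(training_size / 32))   # 193 -> the step count Keras shows with the default batch size
print(math.ceil(training_size / 64))   # 97  -> what you would see with batch_size=64 set on the generator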
You should not specify batch_size in model.fit() if the input is a dataset, a generator, or a keras.utils.Sequence instance, since these already generate batches. Because you have not defined a batch size in the generator, the default batch_size=32 is used, and it does not change when you pass batch_size=64 to model.fit(); see the model.fit() documentation for more details. You can specify any batch_size when creating the generator, and that is what will be reflected in the steps per epoch during training. Please have a look at this replicated gist.
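To make that concrete, here is a minimal sketch of the question's own pipeline with the batch size moved into flow_from_directory. The paths, seed, augmentation settings, model, and early_stopping callback are reused from the question; the step counts in the comments assume the 6149 training images reported there:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet import preprocess_input

batch_size = 64  # define the batch size where the batches are actually built

datagen = ImageDataGenerator(rotation_range=90,
                             horizontal_flip=True,
                             vertical_flip=True,
                             zoom_range=0.3,
                             shear_range=0.5,
                             preprocessing_function=preprocess_input)

train_generator = datagen.flow_from_directory('data/train',
                                              batch_size=batch_size,
                                              seed=42)
validation_generator = datagen.flow_from_directory('data/validation',
                                                   batch_size=batch_size,
                                                   seed=42)

# No batch_size here: the generators already yield batches of 64,
# so Keras now reports ceil(6149 / 64) = 97 steps per epoch instead of 193.
model.fit(train_generator,
          epochs=100,
          validation_data=validation_generator,
          callbacks=[early_stopping])

If you do later decide to set steps_per_epoch, compute it from the generator's batch size rather than from a value passed to model.fit(), since only the former determines how many batches an epoch actually consumes.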