Steps per epoch (with image augmentation) not changing with batch size


I have read that one can increase steps_per_epoch by 2 or 3 times when applying image augmentation. I have done this in the past with no issue, but this time I get an error telling me I have run out of training data.

Another issue is that the steps per epoch Keras reports is always the same number, even when I change the batch size.

Here are some specs and sample code:

training data size: 6149

validation data size: 1020

batch sizes tried: 32 and 64

augmentation and preprocessing steps:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet import preprocess_input
datagen = ImageDataGenerator(rotation_range=90,
                             horizontal_flip=True,
                             vertical_flip=True,
                             # featurewise_center=True,
                             zoom_range=0.3,
                             shear_range=0.5,
                             preprocessing_function=preprocess_input)
train_generator = datagen.flow_from_directory('data/train', seed=42)
validation_generator = datagen.flow_from_directory('data/validation', seed=42)
>Found 6149 images belonging to 102 classes.
>Found 1020 images belonging to 102 classes.

fitting:

batch_size = 64  #or 32
model.fit(train_generator, 
          epochs=100,
          batch_size=batch_size,
          # steps_per_epoch=3*training_size//batch_size, - gives error
          validation_data=validation_generator,
          callbacks=[early_stopping])

example output with batch_size 32 or 64:

Epoch 1/100
193/193 [==============================] - 3284s 17s/step - loss: 0.3660 - categorical_accuracy: 0.5357 - val_loss: 0.4022 - val_categorical_accuracy: 0.5176

error shown when I specified steps_per_epoch:

Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 57600 batches). You may need to use the repeat() function when building your dataset

Summary:

  1. Why is my steps per epoch always 193, regardless of the batch size (without specifying steps_per_epoch)?
  2. Specifying steps_per_epoch results in a "not enough data" error.
  3. Could there be a problem with my generator code?
  4. Side question: I have a large number of classes (102); should my batch size be larger than that? I always overfit on this data.

1 Answer


You have not defined the batch_size in the dataset generator, so it uses the default batch_size=32, and the model divides the training data size by that to calculate the number of steps per epoch:

steps_per_epoch = ceil(training_size / batch_size) = ceil(6149 / 32) = 193

You should not specify batch_size in model.fit() when the input is a dataset, a generator, or a keras.utils.Sequence instance, since these already generate batches. Because you have not defined a batch size in the generator, the default batch_size=32 is used, and it does not change when you pass batch_size=64 to model.fit(). Please refer to the tf.keras Model.fit documentation for more details.

You can specify any number for the batch_size at the dataset generation, which will show effect in steps per epochs while model training:

train_generator = datagen.flow_from_directory('data/train', batch_size=64, seed=42)
validation_generator = datagen.flow_from_directory('data/validation', batch_size=64, seed=42)
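As a sanity check (a minimal sketch; the 6149 figure and the directory layout are taken from the question), the iterator's length is exactly the default steps per epoch, so with batch_size=64 it should now report ceil(6149 / 64) = 97 instead of 193:

import math

train_generator = datagen.flow_from_directory('data/train', batch_size=64, seed=42)

# len() of a DirectoryIterator is ceil(num_samples / batch_size),
# which is the steps-per-epoch count model.fit() prints.
print(len(train_generator))    # 97
print(math.ceil(6149 / 64))    # 97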

Please have a look at this replicated gist.
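For question 2 (running two or three passes of augmented data per epoch), the hint in the error message about repeat() applies: wrap the iterator in a tf.data.Dataset that repeats indefinitely, after which steps_per_epoch can exceed one pass over the data. A minimal sketch, assuming TF 2.4+ and the flow_from_directory defaults of target_size=(256, 256) and class_mode='categorical' (one-hot labels over the 102 classes):

import tensorflow as tf

batch_size = 64

# Wrap the Keras iterator so it never runs out of batches; the shapes
# assume the flow_from_directory defaults: 256x256 RGB, one-hot labels.
train_ds = tf.data.Dataset.from_generator(
    lambda: train_generator,
    output_signature=(
        tf.TensorSpec(shape=(None, 256, 256, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 102), dtype=tf.float32),
    ),
).repeat()

model.fit(train_ds,
          epochs=100,
          steps_per_epoch=3 * 6149 // batch_size,  # three augmented passes
          validation_data=validation_generator,
          callbacks=[early_stopping])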