Why does my validation loss/accuracy fluctuate even though manual tests show good results?


I am training an EfficientNet Lite (from scratch) on a dataset of ~10,000,000 images (128x128x1) with ~6,500 classes. My training loss and training accuracy are both converging. However, my validation/test loss and accuracy keep fluctuating. When I test the CNN manually on some inputs, the results look very good and it recognizes (nearly) everything correctly. Because my GPU has only 8 GB of memory, I am training with a batch size of 256 and fp16 calculations.

Now my question is: why is the validation loss/accuracy fluctuating so much, and is there something I can do to correct for it?

[loss plot]

[accuracy plot]

Here are some (maybe) important details:

Loading the dataset:

import tensorflow as tf

# training subset; bs, img_size and val_split are defined elsewhere
tr_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    DATA_PATH,
    labels="inferred",
    label_mode="categorical",
    interpolation="bilinear",
    color_mode="grayscale",
    batch_size=bs,
    image_size=img_size,
    shuffle=True,
    seed=123,
    validation_split=val_split,
    subset="training"
)
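
Not shown in the question, but the matching validation subset would be created the same way; a minimal sketch, assuming the same bs, img_size, val_split and seed (the identical seed is what keeps the two subsets disjoint):

va_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    DATA_PATH,
    labels="inferred",
    label_mode="categorical",
    interpolation="bilinear",
    color_mode="grayscale",
    batch_size=bs,
    image_size=img_size,
    shuffle=True,
    seed=123,                      # same seed as the training call
    validation_split=val_split,
    subset="validation"
)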

My model (using the official TF implementation):

def instantiate_char_cnn(include_augmentation=False, name=NAME):
    eff_net_lite = EfficientNetLiteB0(
        include_top=True,
        weights=None,
        input_shape=(img_size[0], img_size[1], 1),
        classes=len(ls),
        pooling="avg",
        classifier_activation="softmax",
    )

    if include_augmentation:
        # preprocessing + augmentation are part of the model graph
        model = tf.keras.Sequential([
            tf.keras.layers.InputLayer(input_shape=(None, None, 1)),
            PreprocessTFLayer(),
            img_augmentation,
            eff_net_lite,
        ],
        name=name)
    else:
        # same model without the augmentation layers
        model = tf.keras.Sequential([
            tf.keras.layers.InputLayer(input_shape=(None, None, 1)),
            PreprocessTFLayer(),
            eff_net_lite,
        ],
        name=name)

    return model
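
A minimal usage sketch (illustrative, not the asker's code; the optimizer, learning rate and epoch count are placeholders, and va_dataset is the validation subset sketched above):

model = instantiate_char_cnn(include_augmentation=True)
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(tr_dataset, validation_data=va_dataset, epochs=10)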

The custom layer for preprocessing:

@tf.function
def preprocess_tf(x):
    """
    Preprocessing for TF Lite.
    
    Args:
        x : a Tensor(batch_size, height, width, channels) of images to preprocess
        
    Return: 
        normalized and resized Tensor of images
    """
    
    batch, height, width, channels = x.shape
    
    # resize images
    x = tf.image.resize(x, img_size, method=tf.image.ResizeMethod.BILINEAR)
    
    # normalize into [0, 1]; note that reduce_max() here is the maximum over the whole batch
    x = tf.math.divide(x, tf.math.reduce_max(x))

    return x


class PreprocessTFLayer(tf.keras.layers.Layer):
    def __init__(self, name="preprocess_tf", **kwargs):
        super(PreprocessTFLayer, self).__init__(name=name, **kwargs)
        self.preprocess = preprocess_tf

    def call(self, input):
        return self.preprocess(input)

    def get_config(self):
        config = super(PreprocessTFLayer, self).get_config()
        return config
    
    def get_prunable_weights(self):
        return [] 

The keras layers for image augmentation:

from tensorflow.keras.layers.experimental.preprocessing import Resizing, Rescaling, RandomZoom, RandomRotation, RandomTranslation

img_augmentation = tf.keras.Sequential(
    [
        # RandomErasing is a custom/third-party layer, not part of tf.keras
        RandomErasing.RandomErasing(probability=0.4),

        # random data augmentation
        RandomZoom(height_factor=(-0.2, 1.0), width_factor=(-0.2, 1.0),
            fill_mode='constant', interpolation='bilinear', fill_value=0.0
        ),
        RandomTranslation(0.2, 0.2, fill_mode="constant"),
        RandomRotation(factor=(-0.1, 0.1), fill_mode='constant', interpolation='bilinear'),
    ],
    name="img_augmentation"
)
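
Note that the built-in random layers (RandomZoom, RandomTranslation, RandomRotation) are only active when called in training mode; at inference time they pass inputs through unchanged. A quick sanity check of the preprocessing + augmentation pipeline on one batch (assuming tr_dataset and the layers above are defined) could look like:

for images, labels in tr_dataset.take(1):
    preprocessed = PreprocessTFLayer()(images)                 # resize + normalize to [0, 1]
    augmented = img_augmentation(preprocessed, training=True)  # force the random layers on
    print(augmented.shape, float(tf.reduce_min(augmented)), float(tf.reduce_max(augmented)))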

There is 1 answer below.

There could be many reasons behind this phenomenon, and human error could be involved. The key is how you troubleshoot; manually inspecting a few predictions will often not give you useful hints.

  1. Try your implementation on a simpler dataset, such as ImageNet or CIFAR-100, to see whether the same phenomenon reproduces. This helps you make sure you don't have a bug in your evaluation code.
  2. Randomly shuffle and split your dataset into train, validation, and test sets, then train your model again to see whether the same phenomenon reproduces. This helps you make sure that the distributions of the train, validation, and test sets are close and that the fluctuation is not caused by a test-set distribution mismatch.
  3. Turn off FP16 and reduce the batch size to see whether the same phenomenon reproduces (a minimal sketch follows after this list). This helps you make sure FP16 is not causing numerical issues.
  4. Use a more established implementation, such as the official PyTorch or TensorFlow implementations (it could be a different network, such as ResNet), for your task to see whether the same phenomenon reproduces. This helps you make sure your model (EfficientNet) implementation is not the problem.
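
For point 3, a minimal sketch, assuming FP16 was enabled through the Keras mixed-precision API (if it was enabled some other way, the switch-off will look different). The global policy has to be set before the model is built:

from tensorflow.keras import mixed_precision

# switch back to full precision, then rebuild and retrain the model
mixed_precision.set_global_policy("float32")

model = instantiate_char_cnn(include_augmentation=True)
# compile and fit as before, using a smaller batch size in the dataset pipeline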