Dropout with densely connected layer

1.2k Views Asked by At

Iam using a densenet model for one of my projects and have some difficulties using regularization.

Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.

So I decided to use dropout to avoid overfitting. When using Dropout, both validation and training loss decrease to about 0.13 during the first epoch and remain constant for about 10 epochs.

After that both loss functions decrease in the same way as without dropout, resulting in overfitting again. The final loss value is in about the same range as without dropout.

So for me it seems like dropout is not really working.

If I switch to L2 regularization though, Iam able to avoid overfitting, but I would rather use Dropout as a regularizer.

Now Iam wondering if anyone has experienced that kind of behaviour?

I use dropout in both the dense block (bottleneck layer) and in the transition block (dropout rate = 0.5):

def bottleneck_layer(self, x, scope):
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=4 * self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch2')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[3,3], layer_name=scope+'_conv2')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        return x

def transition_layer(self, x, scope):
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)
        x = Average_pooling(x, pool_size=[2,2], stride=2)

        return x
2

There are 2 best solutions below

2
On BEST ANSWER

Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.

This is not overfitting.

Overfitting starts when your validation loss starts increasing, while your training loss continues decreasing; here is its telltale signature:

enter image description here

The image is adapted from the Wikipedia entry on overfitting - diferent things may lie in the horizontal axis, e.g. depth or number of boosted trees, number of neural net fitting iterations etc.

The (generally expected) difference between training and validation loss is something completely different, called the generalization gap:

An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution.

where, practically speaking, validation data is unseen data indeed.

So for me it seems like dropout is not really working.

It can very well be the case - dropout is not expected to work always and for every problem.

1
On

Interesting problem,
I would recommend plotting the validation loss and the training loss to see if it is really overfitting. if you see that the validation loss didn't change while the training loss dropped ( you will also probably see a large gap between them ) then it is overfitting.

If it is overfitting then try to reduce the number of layers or the number of nodes ( also play a little with the Dropout rate after you do that). Reducing the number of epochs could also be helpful.

If you would like to use a different method instead of dropout I would recommend using the Gaussian Noise layer.
Keras - https://keras.io/layers/noise/
TensorFlow - https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianNoise