What layers are affected by dropout layer in Tensorflow?

4.9k Views Asked by At

Consider transfer learning in order to use a pretrained model in keras/tensorflow. For each old layer, trained parameter is set to false so that its weights are not updated during training whereas the last layer(s) have been substituted with new layers and these must be trained. Particularly two fully connected hidden layers with 512 and 1024 neurons and and relu activation function have been added. After these layers a Dropout layer is used with rate 0.2. This means that during each epoch of training 20% of the neurons are randomly discarded.

What layers does this dropout layer affect? Does it affect all the network including also the pretrained layers for which layer.trainable=false has been set or does it affect only the newly added layers? Or does it affect only the previous layer (i.e., the one with 1024 neurons)?

In other words, which layer(s) do the neurons that are turned off during each epoch by the dropout belong to?

import os

from tensorflow.keras import layers
from tensorflow.keras import Model
  
from tensorflow.keras.applications.inception_v3 import InceptionV3

local_weights_file = 'weights.h5'

pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                include_top = False, 
                                weights = None)

pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
  layer.trainable = False
  
# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add two fully connected layers with 512 and 1,024 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense  (1, activation='sigmoid')(x)           

model = Model( pre_trained_model.input, x) 

model.compile(optimizer = RMSprop(lr=0.0001), 
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])
2

There are 2 best solutions below

0
On BEST ANSWER

The dropout layer will affect the output of the previous layer.

If we look at the specific part of your code:

x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense  (1, activation='sigmoid')(x)  

In your case, 20% of the output of the layer defined by x = layers.Dense(1024, activation='relu')(x) will be dropped at random, before being passed to the final Dense layer.

2
On

Only the previous layer's neurons are "turned off", but all layers are "affected" in terms of backprop.

  • Later layers: Dropout's output is input to the next layer, so next layer's outputs will change, and so will next-next's, etc.
  • Previous layers: as the "effective output" of the pre-Dropout layer is changed, so will gradients to it, and thus any subsequent gradients. In the extreme case of Dropout(rate=1), zero gradient will flow.

Also, note that whole neurons are only dropped if input to Dense is 2D (batch_size, features); Dropout applies a random uniform mask to all dimensions (equivalent to dropping whole neurons in 2D case). To drop whole neurons, set Dropout(.2, noise_shape=(batch_size, 1, features)) (3D case). To drop same neurons across all samples, use noise_shape=(1, 1, features) (or (1, features) for 2D).