I have a deep neural network made of a combination of modules, such as an encoder, a decoder, etc. Before training, I load part of its parameters from a pretrained model, but only for a subset of the modules. For instance, I could load a pretrained encoder. I then want to freeze the parameters of the pretrained modules so that they are not trained along with the rest. In PyTorch:
for param in submodel.parameters():
    param.requires_grad = False
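For reference, the partial loading I mean could look roughly like this (the module names and sizes are just placeholders):

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(16, 16)
        self.decoder = nn.Linear(16, 4)

pretrained, model = MyModel(), MyModel()
# copy only the encoder weights from the pretrained model into the new one
model.encoder.load_state_dict(pretrained.encoder.state_dict())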
Now, should I keep applying dropout to these frozen modules while training, or should I deactivate it (see example below)? Why?
class MyModel(nn.Module):
    ...
    def forward(self, x):
        if self.freeze_submodule:
            self.submodule.eval()  # disable dropout while submodule is frozen
        x = self._forward(x)
        if self.freeze_submodule:
            self.submodule.train()
        return x
Freezing module
You can freeze parameters by setting requires_grad_(False), which is less verbose:
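submodel.requires_grad_(False)  # sets requires_grad to False for every parameter of submodel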
This will freeze all submodel parameters. You could also use the with torch.no_grad context manager over the submodel forward pass, but it is less common indeed.
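A rough sketch of that no_grad variant, where the Linear layers and the head attribute are placeholders standing in for the real modules:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.submodule = nn.Linear(16, 16)  # stands in for the frozen, pretrained part
        self.head = nn.Linear(16, 4)        # hypothetical trainable part

    def forward(self, x):
        with torch.no_grad():               # no graph is recorded for the frozen submodule
            x = self.submodule(x)
        return self.head(x)                 # gradients only reach the trainable head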
eval
Running submodule.eval() puts certain layers in evaluation mode (BatchNorm or Dropout). For Dropout (inverted dropout, actually) you can check how it works in this answer.
Should you keep dropout active in the frozen submodule? No, as the frozen weights will be unable to compensate for dropout's effect, which is one of its goals (to make the network more robust and spread information flow across more paths). They will be unable to do so because they are untrainable.
On the other hand, leaving dropout on would add more noise and error to the architecture and might force the trainable part of your network to compensate for it; I'd go for experimenting to see which option works better.
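To make the Dropout behaviour concrete, here is a tiny standalone check (the toy tensor and p=0.5 are arbitrary choices):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
print(drop(x))  # inverted dropout: surviving entries are scaled by 1 / (1 - p) = 2, the rest are zeroed
drop.eval()
print(drop(x))  # identity: dropout is inactive in eval mode, output is all ones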
Whether you should freeze the pretrained modules at all also depends; the fastai community uses smaller learning rates for pretrained modules while still leaving them trainable (see this blog post for an example), which makes intuitive sense: your task's distribution is somewhat different from the one your backbone was pretrained on, hence it's reasonable to assume the weights need to be adjusted by some (possibly small) amount as well.
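A minimal sketch of that per-module learning-rate idea with plain torch.optim parameter groups (module names, sizes, and learning rates are made up):

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(16, 16)  # stands in for the pretrained backbone
        self.decoder = nn.Linear(16, 4)   # stands in for the freshly initialised part

model = MyModel()
optimizer = torch.optim.Adam([
    {"params": model.encoder.parameters(), "lr": 1e-5},  # small lr for the pretrained module
    {"params": model.decoder.parameters(), "lr": 1e-3},  # larger lr for the rest
])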