I have a question about the procedure for fine-tuning a pre-trained object detection model with GluonCV, described in this tutorial.
As far as I understand, the described procedure modifies all the weight values in the model. I would like to fine-tune only the fully connected layer at the end of the network and freeze the rest of the weights.
I assume that I should specify which parameters I want to modify when creating the Trainer:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001, 'wd': 0.0005, 'momentum': 0.9})
so, instead of net.collect_params(), I should list only the parameters I'm interested in training, and run the rest of the process as usual. However, I don't know how to isolate these parameters precisely. I tried printing:
params = net.collect_params()
print(params)
but, out of this list, I don’t know which ones correspond to the final FC layers. Any suggestions?
Let's say we have a pretrained Gluon model for a classification task:
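A minimal sketch of such a model (a toy, freshly initialized network standing in for a real pretrained one; the blocks are created outside a name_scope, so their prefixes default to conv0_, conv1_, dense0_, etc.):

import mxnet as mx
from mxnet.gluon import nn

# toy convolutional classifier: conv blocks followed by a final Dense layer
net = nn.HybridSequential()
net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'))
net.add(nn.MaxPool2D(pool_size=2, strides=2))
net.add(nn.Conv2D(channels=16, kernel_size=3, activation='relu'))
net.add(nn.MaxPool2D(pool_size=2, strides=2))
net.add(nn.Dense(10))
net.initialize(mx.init.Xavier())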
To fine-tune this convolutional network, we want to freeze all the blocks except the Dense one. First, recall that the collect_params method accepts a regexp string to choose specific block parameters by their names (or prefixes; the prefix parameter of Conv2D, Dense, or any other Gluon (hybrid) block). By default, the prefixes are the class names, i.e. if a block is a Conv2D then its prefix is conv0_, conv1_, etc. Moreover, collect_params returns an instance of mxnet.gluon.parameter.ParameterDict, which has a setattr method.

Solution:
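A sketch of one way to do it, assuming the default conv0_/dense0_ prefixes of the toy model above (the negative-lookahead regexp selects every parameter whose name does not start with dense):

# freeze everything that is not a dense parameter
net.collect_params('^(?!dense).*$').setattr('grad_req', 'null')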
or simply, since in this toy model all the frozen parameters share the conv prefix:
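# equivalent here: select only the convolutional parameters by their prefix
net.collect_params('conv.*').setattr('grad_req', 'null')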
Here we exclude all the parameters matching dense to keep only the conv blocks, and set their grad_req attributes to 'null'. Now, training the model net with mxnet.gluon.Trainer will update only the dense parameters.

It is more convenient to have a pretrained model with separate attributes indicating specific blocks, e.g. the features block, the anchor generators, etc. In our case, we have a convolutional network that extracts features and passes them to an output block:
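One possible declaration (a sketch; the class name ConvNet and the attribute names features and output are just for illustration):

import mxnet as mx
from mxnet.gluon import nn

class ConvNet(nn.HybridBlock):
    """A convnet with a dedicated feature extractor and an output Dense block."""
    def __init__(self, num_classes=10, **kwargs):
        super(ConvNet, self).__init__(**kwargs)
        with self.name_scope():
            # feature extractor: convolutional blocks only
            self.features = nn.HybridSequential()
            self.features.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'))
            self.features.add(nn.MaxPool2D(pool_size=2, strides=2))
            self.features.add(nn.Conv2D(channels=16, kernel_size=3, activation='relu'))
            self.features.add(nn.MaxPool2D(pool_size=2, strides=2))
            # output block: the fully connected classifier
            self.output = nn.Dense(num_classes)

    def hybrid_forward(self, F, x):
        return self.output(self.features(x))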
With this convnet declaration, we don't have to use regexps to access required blocks:
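For example (a sketch; in practice the network would carry pretrained weights rather than a fresh initialization):

net = ConvNet()
net.initialize(mx.init.Xavier())

# freeze the feature extractor; only the output Dense block will be trained
net.features.collect_params().setattr('grad_req', 'null')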
GluonCV models follow exactly this pattern. See the documentation of the desired model and choose the blocks you would like to freeze. If the docs are empty, run collect_params to see all the parameter names, filter out with a regexp the ones you want to fine-tune, and set the returned parameters' grad_req to 'null'.
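For instance, to freeze the backbone of a pretrained GluonCV SSD detector (a sketch; the features attribute holds the backbone for SSD models, but block names differ between architectures, so check the model you actually use):

from gluoncv import model_zoo

net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)

# freeze the feature extractor; the class/box prediction heads stay trainable
net.features.collect_params().setattr('grad_req', 'null')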