I am trying to finetune a pretrained model in mxnet: ResNet50_v1. This model does not have dropout, and I would like to add it to avoid overfitting and to make the last layers look similar to those of I3D_Resnet50_v1_Kinetics400. I tried the following, but I get an error when training:
Last layers of original network (ResNet50_v1):
...
(8): GlobalAvgPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True, global_pool=True, pool_type=avg, layout=NCHW)
)
(output): Dense(2048 -> 1000, linear)
My attempt:
classes = 2
model_name = 'ResNet50_v1'
finetune_net = get_model(model_name, pretrained=True)
with finetune_net.name_scope():
    finetune_net.output = nn.Dense(2048, in_units=2048)
    finetune_net.head = nn.HybridSequential()
    finetune_net.head.add(nn.Dropout(0.95))
    finetune_net.head.add(nn.Dense(2, in_units=2048))
    finetune_net.fc = nn.Dense(2, in_units=2048)
finetune_net.output.initialize(init.Xavier(), ctx=ctx)
finetune_net.head.initialize(init.Xavier(), ctx=ctx)
finetune_net.fc.initialize(init.Xavier(), ctx=ctx)
finetune_net.collect_params().reset_ctx(ctx)
finetune_net.hybridize()
Last layers of the modified network (ResNet50_v1):
...
(8): GlobalAvgPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True, global_pool=True, pool_type=avg, layout=NCHW)
)
(output): Dense(2048 -> 2048, linear)
(head): HybridSequential(
  (0): Dropout(p = 0.95, axes=())
  (1): Dense(2048 -> 2, linear)
)
(fc): Dense(2048 -> 2, linear)
)
Last layers of I3D_Resnet50_v1_Kinetics400:
...
(st_avg): GlobalAvgPool3D(size=(1, 1, 1), stride=(1, 1, 1), padding=(0, 0, 0), ceil_mode=True, global_pool=True, pool_type=avg, layout=NCDHW)
(head): HybridSequential(
  (0): Dropout(p = 0.8, axes=())
  (1): Dense(2048 -> 2, linear)
)
(fc): Dense(2048 -> 2, linear)
This is what the params of the modified network look like:
Parameter resnetv10_dense1_weight (shape=(2048, 2048), dtype=float32) write
Parameter resnetv10_dense1_bias (shape=(2048,), dtype=float32) write
Parameter resnetv10_dense2_weight (shape=(2, 2048), dtype=float32) write
Parameter resnetv10_dense2_bias (shape=(2,), dtype=float32) write
Parameter resnetv10_dense3_weight (shape=(2, 2048), dtype=float32) write
Parameter resnetv10_dense3_bias (shape=(2,), dtype=float32) write
Error when training:
/usr/local/lib/python3.7/dist-packages/mxnet/gluon/block.py:825: UserWarning: Parameter resnetv10_dense3_bias, resnetv10_dense3_weight, resnetv10_dense2_bias, resnetv10_dense2_weight is not used by any computation. Is this intended? out = self.forward(*args)
UserWarning: Gradient of Parameter resnetv10_dense2_bias
on context gpu(0) has not been updated by backward since last step
. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient
dense2 and dense3, the ones I added as new dense layers, are not being updated. dense1 was already in the model; I just changed its output from 1000 to 2048.
Any help would be very much appreciated, as I am quite stuck ...
Since you assign the new layers to the model as plain attributes, the network's forward pass never calls them (which is why their gradients are stale). You need to reimplement the hybrid_forward (or forward) method so they are included in the computation: