RetinaNet feature maps dimensional issue


I've been reading a lot about object detection, and specifically about RetinaNet, but one part of the implementation is not clear to me.

It's said that the feature maps from all pyramid levels are passed to weight-shared sub-networks for classification and bounding-box regression.

But how is this possible when the weights of the sub-networks are shared across all pyramid levels? The outputs would have different dimensions, because, from my understanding, the last layer of each sub-network is fully connected to the output, if I'm not mistaken. The original paper doesn't clarify this. Is there some zero-padding happening here?

In Faster R-CNN architectures, an RoI pooling layer is applied to address this dimensional issue, but in this case I'm lost.

1 Answer

All the sub-networks are fully convolutional (with standard zero-padding), so they don't care about the spatial dimensions (height and width) of the feature maps. There is no fully connected layer at the end; the final layer is a convolution that produces a prediction at every spatial location.

The channel dimension is kept the same (256) at every level of the FPN structure, which is what makes weight sharing possible. The FPN itself, which produces those maps, is not weight-shared.
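To make this concrete, here is a minimal PyTorch sketch of a RetinaNet-style classification head (not the reference implementation). The `ClassificationHead` class and its exact layer counts follow the paper's description (four 3x3 conv layers with 256 filters, 9 anchors, 80 classes for COCO), but treat it as an illustrative assumption rather than the official code:

```python
import torch
import torch.nn as nn

# Hypothetical minimal sketch of a RetinaNet-style classification head.
# Assumes every FPN level outputs 256 channels; the head is fully
# convolutional, so the same weights apply to any spatial size.
class ClassificationHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=9, num_classes=80):
        super().__init__()
        layers = []
        for _ in range(4):  # four 3x3 conv layers, as in the paper
            layers += [nn.Conv2d(in_channels, in_channels, 3, padding=1),
                       nn.ReLU()]
        self.tower = nn.Sequential(*layers)
        # Final conv predicts num_anchors * num_classes per spatial location
        self.cls_logits = nn.Conv2d(in_channels, num_anchors * num_classes,
                                    3, padding=1)

    def forward(self, x):
        return self.cls_logits(self.tower(x))

head = ClassificationHead()  # ONE head, shared across all pyramid levels
# FPN levels P3..P7 for a 512x512 input: same channels, different H/W
fpn_maps = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8, 4)]
for p in fpn_maps:
    out = head(p)  # works at every level; output keeps that level's H/W
    print(tuple(out.shape))  # (1, 9*80, H, W): per-location predictions
```

Because every layer is a convolution, the same weights slide over a 64x64 map from P3 just as well as over a 4x4 map from P7. Only the output's spatial size differs, and the loss is computed per anchor, so no fixed output dimension (and no RoI pooling) is needed.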