I am using Detectron2 (a Mask R-CNN model) and set the following test-time input config:
_C.INPUT.MIN_SIZE_TEST = (800, 832, 864, 896)
_C.INPUT.MAX_SIZE_TEST = 1333
How is it possible to have different input image sizes? How are they fed into the model, and shouldn't the model require a consistent input size?
I tried to check the documentation but didn't find a clear answer.
Convolutional layers can process any input size: with a given kernel size and stride, they simply produce a feature map whose spatial dimensions scale with the input.
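
To make this concrete, here is a minimal sketch in plain PyTorch (which Detectron2 is built on); the layer configuration and the input resolutions are arbitrary choices for illustration:

import torch
import torch.nn as nn

# One conv layer with a fixed kernel size and stride: its weights are shared
# across spatial positions, so it accepts any input resolution and only the
# output feature-map size changes.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, padding=1)

for h, w in [(800, 1333), (832, 1216), (896, 1024)]:
    x = torch.randn(1, 3, h, w)  # one image at this resolution
    y = conv(x)
    print(tuple(x.shape), "->", tuple(y.shape))  # same weights, different output sizes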
The subsequent fully connected layers, however, do require a fixed-length input vector. This is where Mask R-CNN uses RoIAlign, which resamples each region proposal to a fixed spatial size before it reaches the downstream heads. It serves the same purpose as RoI pooling in Fast R-CNN.
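
Here is a sketch of that fixed-size resampling using torchvision.ops.roi_align (Detectron2's RoIAlign layer works the same way); the feature-map size, stride, and box coordinates below are made up for illustration:

import torch
from torchvision.ops import roi_align

# Backbone feature map of arbitrary spatial size (e.g. roughly what an
# 800x1333 image gives after a stride-8 backbone stage).
features = torch.randn(1, 256, 100, 167)

# Two region proposals, one per row: (batch_index, x1, y1, x2, y2),
# given in input-image coordinates.
boxes = torch.tensor([
    [0.0,  10.0,  20.0, 200.0, 150.0],
    [0.0, 300.0,  40.0, 500.0, 400.0],
])

# Whatever size each proposal is, RoIAlign bilinearly resamples it onto a
# fixed 7x7 grid, so the fully connected head always receives the same
# vector length (256 * 7 * 7 here).
pooled = roi_align(features, boxes, output_size=(7, 7), spatial_scale=1.0 / 8)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])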
Hope this explains why the input size need not be fixed.