I'm reading a paper about using a CNN (convolutional neural network) for object detection:
Rich feature hierarchies for accurate object detection and semantic segmentation
Here is a quote about the receptive field:
The pool5 feature map is 6x6x256 = 9216 dimensional. Ignoring boundary effects, each pool5 unit has a receptive field of 195x195 pixels in the original 227x227 pixel input. A central pool5 unit has a nearly global view,
while one near the edge has a smaller, clipped support.
My questions are:
- What is the definition of a receptive field?
- How do they compute the size and location of the receptive field?
- How can we compute the bounding rect of the receptive field using caffe/pycaffe?
1) It is the area of input pixels that affects the output of a single unit in the last convolution/pooling layer.
2) For each convolution and pooling operation you can compute the size of the output from the size of the input. Now invert this: find the input size that results in an output size of 1x1. That is the size of the receptive field (see the first sketch after this list).
3) You don't need to use a library to do it. For every 2x2 pooling the output size is halved along each dimension. For strided convolutions, you also divide the size of each dimension by the stride. You may have to shave off part of a dimension depending on whether you use padding for your convolutions. The simplest case is to use padding = floor(kernel_size / 2), so that a convolution does not change the output size. A way to get the bounding rect of a specific unit is shown in the second sketch below.
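
Here is a minimal sketch of point 2, assuming the standard CaffeNet/AlexNet (kernel, stride) values used by R-CNN (the layer list is my assumption, not something read out of pycaffe). Walking backwards from a single 1x1 pool5 unit recovers the 195x195 figure from the quote; padding does not change the size, only the location of the field.

    # (name, kernel_size, stride), listed from the input up to pool5
    # Assumed CaffeNet/AlexNet values, not queried from caffe/pycaffe.
    layers = [
        ("conv1", 11, 4),
        ("pool1", 3, 2),
        ("conv2", 5, 1),
        ("pool2", 3, 2),
        ("conv3", 3, 1),
        ("conv4", 3, 1),
        ("conv5", 3, 1),
        ("pool5", 3, 2),
    ]

    # Start from a single 1x1 output unit and invert each layer:
    # an output of size r needs an input of size (r - 1) * stride + kernel.
    rf = 1
    for name, kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
        print(f"receptive field after inverting {name}: {rf}x{rf}")

    print(f"pool5 receptive field: {rf}x{rf}")  # -> 195x195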
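
And a sketch for point 3, again without caffe/pycaffe. The idea is to track three numbers forward through the network: the receptive field size r, the effective stride ("jump") j between neighbouring units, and the centre of the first unit's field. The (kernel, stride, pad) values are the assumed CaffeNet ones, and the function name is just for illustration.

    # (name, kernel, stride, pad) from the 227x227 input up to pool5
    # Assumed CaffeNet/AlexNet values, not queried from caffe/pycaffe.
    LAYERS = [
        ("conv1", 11, 4, 0),
        ("pool1", 3, 2, 0),
        ("conv2", 5, 1, 2),
        ("pool2", 3, 2, 0),
        ("conv3", 3, 1, 1),
        ("conv4", 3, 1, 1),
        ("conv5", 3, 1, 1),
        ("pool5", 3, 2, 0),
    ]

    def pool5_receptive_field(row, col, input_size=227):
        """Return (y0, x0, y1, x1) of the receptive field of pool5 unit (row, col)."""
        r, j, start = 1, 1, 0.0  # one input pixel: field 1, stride 1, centre at pixel 0
        for _, k, s, p in LAYERS:
            r = r + (k - 1) * j                     # field grows by (k - 1) input strides
            start = start + ((k - 1) / 2 - p) * j   # centre of the first unit shifts
            j = j * s                               # effective stride multiplies
        cy, cx = start + row * j, start + col * j
        half = (r - 1) / 2
        # Clip to the image: units near the border have a smaller, clipped support.
        y0, y1 = max(0, cy - half), min(input_size - 1, cy + half)
        x0, x1 = max(0, cx - half), min(input_size - 1, cx + half)
        return int(y0), int(x0), int(y1), int(x1)

    print(pool5_receptive_field(0, 0))  # corner unit: clipped field
    print(pool5_receptive_field(3, 3))  # central unit: nearly the whole image

This reproduces the behaviour described in the quote: a corner pool5 unit gets a clipped rect of roughly 131x131 pixels, while a central unit covers nearly the full 227x227 input.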