I would like to know how positive/negative boxes are sampled for the loss calculation in the RPN of a Faster R-CNN with an FPN backbone. Is the sampling done once over all layers (i.e. if N=256, we sample only N boxes in total), or layerwise (i.e. if N=256, we sample N*num_fpn_layers boxes in total)? In other words, do we apply the loss layerwise or once over all layers?
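To make the two alternatives concrete, here is a minimal sketch of what I mean. It is plain Python and not taken from any library; the function names, the label convention (1 = positive, 0 = negative, -1 = ignored) and the 0.5 positive fraction are my own assumptions for illustration.

```python
import random


def sample_indices(labels, num_samples, positive_fraction=0.5, rng=None):
    """Pick up to num_samples anchor indices, capping positives at positive_fraction.

    labels: list with 1 = positive, 0 = negative, -1 = ignored (assumed convention).
    """
    rng = rng or random.Random(0)
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(pos), int(num_samples * positive_fraction))
    # Fill the remaining slots with negatives when positives are scarce.
    num_neg = min(len(neg), num_samples - num_pos)
    return rng.sample(pos, num_pos) + rng.sample(neg, num_neg)


def sample_over_all_layers(labels_per_layer, n):
    # Option A: concatenate the anchors of all FPN layers, then sample N once.
    flat = [label for layer in labels_per_layer for label in layer]
    return sample_indices(flat, n)


def sample_layerwise(labels_per_layer, n):
    # Option B: sample N anchors independently on every FPN layer.
    return [sample_indices(layer, n) for layer in labels_per_layer]


# Toy input: 5 FPN layers, each with 10 positive and 990 negative anchors.
labels_per_layer = [[1] * 10 + [0] * 990 for _ in range(5)]

total_a = len(sample_over_all_layers(labels_per_layer, 256))
total_b = sum(len(s) for s in sample_layerwise(labels_per_layer, 256))
print(total_a, total_b)
```

With this toy input, option A yields N boxes in total and option B yields N*num_fpn_layers boxes in total; my question is which of the two the RPN loss actually uses.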
Best,
Simon