Detections are way off and mAP is always zero when training Mask R-CNN

I'm trying to apply Matterport's Mask R-CNN implementation to my own data. Despite the many impressive detections I've seen elsewhere, I can't get results that are even remotely promising, so I suspect I'm overlooking something fundamental in my setup.

My dataset consists of aerial RGB shots of a city, with two classes: tree and background.

Image info: aerial RGB photos, all 512x512; 324 training images and 36 validation images; training uses random 128x128 crops.

~46 trees per image on average.

Each training session ends up with something looking pretty similar to this:

[image: sample detection output]

Testing on the validation set without cropping, using inspect_model.ipynb as a guide, gives roughly the following stats:

Original image shape:  [512 512   3]
Processing 1 images
image                    shape: (512, 512, 3)         min:   23.00000  max:  255.00000  uint8
molded_images            shape: (1, 512, 512, 3)      min:   23.00000  max:  255.00000  uint8
image_metas              shape: (1, 14)               min:    0.00000  max:  512.00000  int64
anchors                  shape: (1, 65280, 4)         min:   -0.17712  max:    1.11450  float32
gt_class_id              shape: (12,)                 min:    1.00000  max:    1.00000  int32
gt_bbox                  shape: (12, 4)               min:   20.00000  max:  512.00000  int32
gt_mask                  shape: (512, 512, 12)        min:    0.00000  max:    1.00000  float64
AP @0.50:    0.000
AP @0.55:    0.000
AP @0.60:    0.000
AP @0.65:    0.000
AP @0.70:    0.000
AP @0.75:    0.000
AP @0.80:    0.000
AP @0.85:    0.000
AP @0.90:    0.000
AP @0.95:    0.000
AP @0.50-0.95:   0.000
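For context, an AP of zero at every threshold means that no predicted instance reaches an IoU of 0.5 with any ground-truth instance. Below is a minimal sketch of box IoU, assuming boxes in (y1, x1, y2, x2) pixel order; `box_iou` is a hypothetical helper written for illustration (not the repo's function), and the boxes are made up.

```python
def box_iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2), with y2/x2 exclusive."""
    # Corners of the intersection rectangle.
    y1 = max(a[0], b[0])
    x1 = max(a[1], b[1])
    y2 = min(a[2], b[2])
    x2 = min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes, for illustration only.
gt = (300, 200, 340, 240)         # a tree near the image centre
pred_top = (0, 200, 40, 240)      # a detection stuck at the top of the image
pred_near = (310, 200, 350, 240)  # a detection shifted down by 10 px

print(box_iou(gt, pred_top))   # no overlap at all
print(box_iou(gt, pred_near))  # partial overlap
```

A detection clustered at the top of the image scores zero against a tree elsewhere no matter how confident it is, which matches the symptom described here.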

I keep getting the same results (seemingly high confidence with zero or near-zero IoU, and detections generally clustered at the top of the image), even after applying advice for small datasets that I found in the Mask-RCNN repo: training only the heads, initializing with COCO weights but not training for too long, adjusting my anchor scales to match the typical sizes and aspect ratios of the annotations, and so on.
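One way to check whether the anchor scales really match the annotations is to bin the ground-truth boxes by the anchor scale closest to their size. This is a sketch only: it assumes an anchor scale is the side length in pixels of a square anchor, `nearest_scale_counts` is a hypothetical helper, and the example boxes are made up rather than taken from my dataset.

```python
import math

RPN_ANCHOR_SCALES = (16, 32, 64, 128)  # from the config in this post


def nearest_scale_counts(boxes, scales):
    """Count how many (y1, x1, y2, x2) boxes lie closest to each anchor scale.

    Box size is the square root of the box area; closeness is measured
    on a log scale so that 2x too big and 2x too small count equally.
    """
    counts = {s: 0 for s in scales}
    for y1, x1, y2, x2 in boxes:
        side = math.sqrt((y2 - y1) * (x2 - x1))
        if side <= 0:
            continue  # degenerate box, skip it
        nearest = min(scales, key=lambda s: abs(math.log(side / s)))
        counts[nearest] += 1
    return counts


# Made-up boxes, for illustration only.
example_boxes = [
    (20, 30, 38, 50),
    (100, 100, 130, 135),
    (200, 10, 212, 24),
    (0, 0, 60, 70),
]
print(nearest_scale_counts(example_boxes, RPN_ANCHOR_SCALES))
```

If most real boxes pile up in one bin, or sit below the smallest scale, the anchor settings probably deserve another look.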

So far I'm questioning:

  • Is my dataset simply too small for the complexity of a ResNet-101 backbone?
  • Is something wrong with my annotations?
  • Am I getting a fundamental aspect of my config wrong?
  • Unknown unknowns
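To rule out the annotation question, one simple check is to derive each bounding box from its mask and compare it with the stored box. The sketch below assumes masks of shape (H, W, N) and boxes as (y1, x1, y2, x2) with exclusive y2/x2 (consistent with gt_bbox reaching 512 on a 512-pixel image in the log above); `bbox_from_mask` and `check_annotations` are hypothetical helpers and the arrays are made up.

```python
import numpy as np


def bbox_from_mask(mask):
    """Return (y1, x1, y2, x2) for one 2-D boolean mask, or None if empty.

    y2 and x2 are exclusive, i.e. one past the last foreground pixel.
    """
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size == 0:
        return None
    return (int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1)


def check_annotations(masks, boxes):
    """Return indices of instances whose mask is empty or disagrees with its box."""
    bad = []
    for i in range(masks.shape[-1]):
        derived = bbox_from_mask(masks[:, :, i] > 0)
        if derived is None or derived != tuple(int(v) for v in boxes[i]):
            bad.append(i)
    return bad


# Made-up example: instance 0 is consistent, instance 1 has a misplaced box.
masks = np.zeros((64, 64, 2), dtype=np.float64)
masks[10:20, 5:15, 0] = 1
masks[30:40, 30:40, 1] = 1
boxes = np.array([[10, 5, 20, 15], [0, 0, 8, 8]])
print(check_annotations(masks, boxes))
```

Boxes that sit in the wrong place relative to their masks (for example with x and y swapped) would show up here immediately.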

Looking at the losses, what stands out is the high overall loss (epoch_loss), which increases with each training stage (just heads -> ResNet 4+ -> all layers).

My config:

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     8
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.5
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 8
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  128
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  128
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              crop
IMAGE_SHAPE                    [128 128   3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               101
MEAN_PIXEL                     [107.  105.2 101.5]
MINI_MASK_SHAPE                (56, 56)
NAME                           tree
NUM_CLASSES                    2
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 1.5]
RPN_ANCHOR_SCALES              (16, 32, 64, 128)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.9
RPN_TRAIN_ANCHORS_PER_IMAGE    64
STEPS_PER_EPOCH                500
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  False
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.005
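
The anchor count in the inspection log (65280 for a 512x512 image) can be reproduced from this config by plain arithmetic, which is a cheap way to confirm which settings were active at inference time. This sketch assumes one pyramid level per anchor scale, so that four scales use only the first four of the five BACKBONE_STRIDES; `anchor_count` is a hypothetical helper, not part of the library.

```python
import math


def anchor_count(image_size, strides, scales, ratios, anchor_stride=1):
    """Number of anchors for a square image, one pyramid level per scale."""
    total = 0
    # Assumption: levels are paired with scales in order, so zip() stops
    # at the shorter of the two lists.
    for _scale, stride in zip(scales, strides):
        cells = math.ceil(image_size / stride)        # feature-map side
        positions = math.ceil(cells / anchor_stride)  # anchor centres per side
        total += positions * positions * len(ratios)
    return total


BACKBONE_STRIDES = [4, 8, 16, 32, 64]
RPN_ANCHOR_SCALES = (16, 32, 64, 128)
RPN_ANCHOR_RATIOS = [0.5, 1, 1.5]

# Inference on full 512x512 images, then training on 128x128 crops.
print(anchor_count(512, BACKBONE_STRIDES, RPN_ANCHOR_SCALES, RPN_ANCHOR_RATIOS))
print(anchor_count(128, BACKBONE_STRIDES, RPN_ANCHOR_SCALES, RPN_ANCHOR_RATIOS))
```

Under these assumptions the 512x512 case gives 65280, matching the log, whereas five scales would give 65472. So the logged anchors are consistent with the four scales listed here, even though BACKBONE_STRIDES has five entries.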