I am trying to compute a confusion matrix for my object detection model, but I keep running into pitfalls. My current approach is to compare each predicted box with each ground-truth box. If their IoU exceeds some threshold, I record the prediction in the confusion matrix, then delete it from the predictions list and move on to the next element.
Because I also want the misclassified proposals to be inserted in the confusion matrix, I treat the elements with IoU lower than the threshold as confusion with the background. My current implementation looks like this:

    def insert_into_conf_m(true_labels, predicted_labels, true_boxes, predicted_boxes):
        matched_gts = []
        for i in range(len(true_labels)):
            j = 0
            while len(predicted_labels) != 0:
                if j >= len(predicted_boxes):
                    break
                if bb_intersection_over_union(true_boxes[i], predicted_boxes[j]) >= 0.7:
                    conf_m[true_labels[i]][predicted_labels[j]] += 1
                    del predicted_boxes[j]
                    del predicted_labels[j]
                else:
                    j += 1
            matched_gts.append(true_labels[i])
            if len(predicted_labels) == 0:
                break
        # if there are ground-truth boxes that are not matched by any proposal
        # they are treated as if the model classified them as background
        if len(true_labels) > len(matched_gts):
            true_labels = [i for i in true_labels if not i in matched_gts or matched_gts.remove(i)]
            for i in range(len(true_labels)):
                conf_m[true_labels[i]][0] += 1
        # all detections that have no IoU with any groundtruth box are treated
        # as if the groundtruth label for this region was Background (0)
        if len(predicted_labels) != 0:
            for j in range(len(predicted_labels)):
                conf_m[0][predicted_labels[j]] += 1

The row-normalized matrix looks like this:
[0.0, 0.36, 0.34, 0.30]
[0.0, 0.29, 0.30, 0.41]
[0.0, 0.20, 0.47, 0.33]
[0.0, 0.23, 0.19, 0.58]
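For reference, row normalization here means dividing each row by its total, so that every non-empty row sums to 1. A minimal sketch, assuming the matrix is a list of lists of counts; the helper name `row_normalize` and the example counts are made up for illustration:

```python
def row_normalize(matrix):
    """Divide each row by its sum; rows that sum to zero stay all zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # Avoid division by zero for classes that never occur as ground truth.
            normalized.append([0.0] * len(row))
        else:
            normalized.append([value / total for value in row])
    return normalized


# Illustrative counts only, not the real data behind the matrix above.
counts = [[0, 2, 2], [0, 3, 1], [0, 0, 0]]
normalized = row_normalize(counts)
```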
Is there a better way to generate the confusion matrix for an object detection system? Or any other metric that is more suitable?
Here is a script to compute the confusion matrix from the detections.record file generated by the TensorFlow Object Detection API. Here is the article explaining how this script works.
In summary, here is the outline of the algorithm from the article:
You can also take a look at the script for more information.
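Independently of the linked script, the general idea of one-to-one matching can be sketched as below. This is not the script itself, only a rough illustration under several assumptions: boxes are `[x1, y1, x2, y2]`, class 0 is the background as in the question, the IoU threshold of 0.5 is an arbitrary default, and the names `iou` and `update_confusion_matrix` are hypothetical. Each ground-truth box is paired with at most one detection (highest IoU first), so a single object cannot be counted several times, and the input lists are never modified.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [x1, y1, x2, y2] (assumed format)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def update_confusion_matrix(conf_m, true_labels, true_boxes,
                            pred_labels, pred_boxes, iou_threshold=0.5):
    """Update conf_m[true][predicted] in place for one image."""
    # Collect every ground-truth/detection pair that overlaps enough.
    candidates = []
    for i, gt_box in enumerate(true_boxes):
        for j, pred_box in enumerate(pred_boxes):
            overlap = iou(gt_box, pred_box)
            if overlap >= iou_threshold:
                candidates.append((overlap, i, j))

    # Greedy one-to-one matching: best overlaps are assigned first.
    candidates.sort(reverse=True)
    matched_gt, matched_pred = set(), set()
    for overlap, i, j in candidates:
        if i in matched_gt or j in matched_pred:
            continue
        matched_gt.add(i)
        matched_pred.add(j)
        conf_m[true_labels[i]][pred_labels[j]] += 1

    # Missed objects: ground truth predicted as background (column 0).
    for i, label in enumerate(true_labels):
        if i not in matched_gt:
            conf_m[label][0] += 1

    # Spurious detections: background predicted as some class (row 0).
    for j, label in enumerate(pred_labels):
        if j not in matched_pred:
            conf_m[0][label] += 1
    return conf_m


# Two ground-truth objects, three detections (illustrative values).
conf_m = [[0] * 3 for _ in range(3)]
update_confusion_matrix(
    conf_m,
    true_labels=[1, 2],
    true_boxes=[[0, 0, 10, 10], [20, 20, 30, 30]],
    pred_labels=[1, 2, 1],
    pred_boxes=[[0, 0, 10, 9], [100, 100, 110, 110], [1, 1, 10, 10]],
)
```

In this example the first ground-truth box is matched only to its best-overlapping detection; the duplicate detection and the far-away detection land in the background row, and the undetected second object lands in the background column.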