I think I am making myself really confused with the generation of precision and recall curves. Ultimately the purpose is to get an idea of the quality of my detection network, taking into account false positives and false negatives. My detection task only involves one instance of one class (cat) per frame, so it is a very simplified situation: in each frame there is either exactly one cat or no cat at all. For each frame I compute the IoU (Intersection over Union) of the GT (ground-truth) and ES (estimated) bounding box. I also count all the frames where there was a ground-truth box but no accompanying detection; these are my false negatives (fn). I then sweep over a range of thresholds and, for each threshold, count the number of IoUs above and below it; these are my tp (true positive) and fp (false positive) counts respectively. Precision and recall are then: precision = tp / (tp + fp) and recall = tp / (tp + fn).
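To make the per-frame bookkeeping explicit, this is roughly the logic I apply at a single threshold (the function name and the None-for-missed-detection encoding are just placeholders to illustrate, not my actual code):
def counts_at_threshold(frame_ious, thresh):
    # frame_ious holds one entry per frame that has a GT box:
    # either the IoU of the detection against the GT box, or None
    # when the GT box had no detection at all
    tp = fp = fn = 0
    for iou in frame_ious:
        if iou is None:
            # GT box present but no detection -> false negative
            fn += 1
        elif iou >= thresh:
            # detection overlaps the GT box well enough -> true positive
            tp += 1
        else:
            # detection overlaps the GT box poorly -> false positive
            fp += 1
    return tp, fp, fn

tp, fp, fn = counts_at_threshold([0.8, 0.4, None, 0.7], thresh=0.5)  # -> (2, 1, 1)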
Now the problem with this formulation is that precision and recall both get smaller as the threshold gets larger, when really you would expect them to trade off, with recall dropping while precision rises. So I don't know where I am going wrong.
To make this less confusing, here is a toy example with synthetic IoU values in Python:
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic data
# Suppose there are 1000 frames in the dataset
num_samples = 1000
# The false negative count is the sum of the number of frames where we did have a GT box, but the estimate was empty
num_false_negative = 300
# The number of frames that produced a detection (these will become the tp + fp samples)
num_pos = num_samples - num_false_negative
# Some fake IoU samples, clipped to the valid [0, 1] range
ious = np.clip(np.random.normal(0.6, 0.3, num_pos), 0, 1)
thresholds = np.linspace(0, 1, 10)
# Compute P/R for each threshold, by comparing the iou to the threshold
precisions, recalls = [], []
for thresh in thresholds:
    tp = np.sum(ious >= thresh)
    fp = np.sum(ious < thresh)
    # precision is 'proportion of true positives over all detections (predicted positives)'
    precision = tp / (tp + fp)
    # recall is 'proportion of true positives over all ground-truth positives'
    recall = tp / (tp + num_false_negative)
    # but notice that in my formulation, the recall increases as the precision increases. Which means I am wrong!
    precisions.append(precision)
    recalls.append(recall)
plt.plot(recalls, precisions)
plt.show()
The output of this code looks something like this:
Compared to the closed shape of a 'normal' precision/recall curve:
I'm fairly sure the problem has to do with how I count false-negatives.