I'm performing a binary image segmentation task on medical data using a fully convolutional network. To compare the ground truth with my predictions and measure performance, I use the Dice coefficient. I selected a test image that contains no true positives in order to test the model's propensity to predict false positives. This particular prediction contains a small number of false-positive pixels, and since there are no true positives in the image/ground-truth annotation, the result is an extremely small Dice coefficient of 0.0001. This makes sense mathematically given the following definition of the Dice coefficient:

D = 2TP/(2TP+FP+FN)
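
To make the behaviour concrete, here is a minimal sketch (an assumed implementation, not my exact training code) of how the score is typically computed with a small smoothing term; the `smooth` value and image sizes are placeholders. With TP = 0, the score collapses to roughly `smooth / (FP + smooth)`, i.e. near zero, no matter how many true negatives there are:

```python
import numpy as np

def dice_coefficient(pred, target, smooth=1e-4):
    """Dice = 2*TP / (2*TP + FP + FN), with a small smoothing term to avoid 0/0."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    return (2 * tp + smooth) / (2 * tp + fp + fn + smooth)

# All-negative ground truth with a handful of false-positive predictions:
target = np.zeros((512, 512), dtype=np.uint8)  # no true positives exist
pred = np.zeros((512, 512), dtype=np.uint8)
pred[:10, :10] = 1                             # 100 false-positive pixels

print(dice_coefficient(pred, target))          # ~1e-6, effectively zero
```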

However, I find it slightly counterintuitive that the vast majority of pixels in the mask are predicted correctly as true negatives, yet the overall prediction mask is scored very badly because of a small number of false positives. Is there a sensible way around this, such as weighting the score by the pixel-class frequencies of each test image? Or another measure that takes true negatives into account?
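
For concreteness, one example of the kind of alternative measure I mean is plain pixel accuracy, which does count true negatives and therefore scores the same prediction almost perfectly (again just an illustrative sketch, using the same hypothetical arrays as above):

```python
import numpy as np

def pixel_accuracy(pred, target):
    """(TP + TN) / total pixels."""
    return (pred.astype(bool) == target.astype(bool)).mean()

target = np.zeros((512, 512), dtype=np.uint8)  # no true positives exist
pred = np.zeros((512, 512), dtype=np.uint8)
pred[:10, :10] = 1                             # 100 false-positive pixels

print(pixel_accuracy(pred, target))            # ~0.9996, despite Dice ~ 0
```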

Why does the Dice score not consider true negatives?
