why is perfcurve() matlab function giving me straight lines and not a normal curve as expected?

841 Views Asked by At

I am trying to build a receiver operating characteristic (ROC) curves to evaluate the discriminating ability of my classifier to correctly classify diseased and non-diseased subjects.

I understand that the closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test. My experiments gave me quite desirable value of area under curve (auc), i.e. 0.86458. However, the ROC curve (in which I included the cut-off points for tracing purposes) seems quite strange as it gave me straight lines as below:

enter image description here

... and not a curve I expected and as I normally see from any references like this:

enter image description here

Does it hav something to do with the number of observations used? (in this case I only have 50 samples). Or is this just fine as long as the the auc value is high and that the 'curve' comes above the 45-degree diagonal of the ROC space? I would be glad if someone can share their thoughts about it. Thank you!

By the way, I used the perfcurve() function in matlab:

% ROC comparison between the proposed approach and the baseline
[X1,Y1,T1,auc1,OPTROCPT1,SUBY5,SUBYNAMES1] = perfcurve(testLabel,predlabel_prop,1); 
[X2,Y2,T2,auc2,OPTROCPT2,SUBY2,SUBYNAMES2] = perfcurve(testLabel,predLabel_base,1);

figure;
plot(X1,Y1,'-r*',X2,Y2,'--ko');
legend('proposed approach','baseline','Location','east');
xlabel('False positive rate'); ylabel('True positive rate')
title('ROC comparison of the proposed approach and the baseline')

text(0.6,0.3,{'* - proposed method',strcat('Area Under Curve = ',...
    num2str(auc1))},'EdgeColor','r');

text(0.6,0.15,{'o - baseline',strcat('Area Under Curve = ',num2str(auc2))},'EdgeColor','k');
2

There are 2 best solutions below

0
On

Shape of your curve is just a result of high explanatory power of your model and limited number of observations (e.g. take a look at the example here http://nl.mathworks.com/help/stats/perfcurve.html).

0
On

You probably have too litte data.

You curve indicates your data set has 13 negative and 5 positive examples (in your test set?)

Furthermore, all but 4 have exactly the same score (maybe 0)? Or is that your cutoff?

Given this small sample size, I would not accept the hypothesis that your proposed method is better than the baseline, but accept the alternative - the methods perform as good as the other: the difference of 0.04 is much too small for this tiny sample size, the results are virtually identical. Any variation within the cut-off area (the diagonal part) can be much larger than this 0.04... On a different run, a different test set, the results may be the other way around.