I'm having difficulty understanding the RFECV example in the current documentation. The plot's axis is labeled "nb of misclassifications", so I expect lower to be better. But in the example plot, the best point is chosen at the highest cross-validation score, so I would naturally expect the score to be something related to accuracy (the code sets scoring to accuracy anyway). But then how can it be higher than 1?
I am a bit confused about how to interpret these results. I would appreciate any help with this.
Thanks!
RFECV has a useful `verbose` option. Running with `verbose=2`, you can see that for a 2-fold cross-validation, as in the example, `grid_scores_` holds the sum of both folds' scores. In general, for an n-fold cross-validation, `grid_scores_` is the sum of the fold scores divided by `n-1` (see the code). This seems to be a bug; see the somewhat relevant issue on the tracker.
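To make the arithmetic concrete, here is a minimal sketch in plain Python. It does not call scikit-learn; the function names and the fold accuracies are made up for illustration, and the formula is only the one described above (sum of fold scores divided by `n-1`), not a copy of the library code.

```python
def reported_score(fold_scores):
    """Sum of per-fold scores divided by (n - 1), as described above (hypothetical helper)."""
    n = len(fold_scores)
    return sum(fold_scores) / (n - 1)


def mean_score(fold_scores):
    """Ordinary mean of the per-fold scores, which always stays in [0, 1] for accuracy."""
    return sum(fold_scores) / len(fold_scores)


# Made-up accuracies for a 2-fold run: dividing by n - 1 = 1 leaves the plain sum.
two_fold = [0.75, 0.5]
print(reported_score(two_fold))  # 1.25, higher than 1
print(mean_score(two_fold))      # 0.625

# Made-up accuracies for a 5-fold run: the value is inflated by a factor n / (n - 1).
five_fold = [0.75, 0.75, 0.75, 0.75, 0.75]
print(reported_score(five_fold))  # 0.9375
print(mean_score(five_fold))      # 0.75
```

Under that assumption, multiplying a reported value by `(n-1)/n` gives back the mean accuracy. The ranking of feature counts is unaffected, since every value is scaled by the same factor.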