I am seeking some clarity surrounding the numbers in selector.grid_scores_ in RFECV.
I have used the following:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFECV
estimator_RFECV = ExtraTreesClassifier(random_state=0)
estimator_RFECV = RFECV(estimator_RFECV, min_features_to_select=20, step=1, cv=5, scoring='accuracy', verbose=1, n_jobs=-1)
estimator_RFECV = estimator_RFECV.fit(X_train, y_train)
Using estimator_RFECV.ranking_, 27 features are selected through CV. However, when I look at estimator_RFECV.grid_scores_, the value (accuracy) at 27 is not the highest. Am I interpreting grid_scores_ incorrectly, and should I not expect 27 to have the highest accuracy?
estimator_RFECV.ranking_ gives the ranking of the features, which you can read as their relative importance: the selected features are assigned rank 1.
And yes, a model with fewer features can have higher accuracy, because some of the features we included may turn out to be irrelevant.
Also, the official RFECV documentation could be helpful.
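One more thing worth checking in the question's setup: with min_features_to_select=20, the first entry of grid_scores_ should correspond to 20 features, not 1, so the score for 27 features would sit at index 7 rather than at position 27. The sketch below reproduces that mapping in plain Python. The helper name subset_sizes and the total of 50 features are made up for illustration (the question does not give the real number), and it assumes an integer step. In newer scikit-learn releases grid_scores_ is replaced by cv_results_, but the ordering of the scores should be the same.

```python
def subset_sizes(n_features, min_features_to_select=1, step=1):
    """Feature counts in the order the RFECV scores are stored (smallest first).

    Hypothetical helper for illustration; assumes `step` is an integer >= 1.
    """
    sizes = [n_features]
    while sizes[-1] > min_features_to_select:
        # Each round drops `step` features, but never goes below the minimum.
        sizes.append(max(sizes[-1] - step, min_features_to_select))
    # Elimination runs from all features down; the scores are stored reversed.
    return sizes[::-1]


# 50 is an assumed total feature count, not taken from the question.
sizes = subset_sizes(n_features=50, min_features_to_select=20, step=1)

print(sizes[0])         # feature count behind the first score
print(sizes.index(27))  # index of the score for 27 features
```

If the highest value in grid_scores_ is at index 7, that would agree with 27 features being selected.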