I am creating a logistic regression model using Python in a Jupyter notebook. I have reached the stage of doing feature selection using RFE. I specify 15 features; however, the code is outputting most or all of my features, not just 15.
The code I am running is as follows:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Starting with 15 features selected by RFE
logreg = LogisticRegression()
rfe = RFE(estimator=logreg, n_features_to_select=15)
rfe = rfe.fit(X_train, y_train)
list(zip(X_train.columns, rfe.support_, rfe.ranking_))
X_train contains all the independent variables, and y_train contains the dependent variable. In total, I have around 100 variables.
I am not sure whether I have typed something incorrectly or am reading my output incorrectly.
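The list(zip(...)) call lists every column in X_train together with its support_ flag and its ranking_, so seeing all of your features in the output is expected. If you print
sum(rfe.support_)
this will be 15 in your case, with a True flag for the top 15 features. To reduce and only keep those, use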
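One way to keep only the selected columns is to index the DataFrame with the boolean mask in rfe.support_. The following is a minimal sketch rather than the only approach; the synthetic data from make_classification, the feature_N column names, and max_iter=1000 are assumptions standing in for the real X_train and y_train.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (assumption): 100 numeric features.
X, y = make_classification(
    n_samples=300, n_features=100, n_informative=10, random_state=0
)
X_train = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
y_train = pd.Series(y)

# Same RFE setup as in the question; max_iter is raised only to help convergence.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=15)
rfe = rfe.fit(X_train, y_train)

# support_ is a boolean mask over ALL columns: True for the 15 kept features.
selected_columns = X_train.columns[rfe.support_]

# Keep only the selected columns.
X_train_rfe = X_train[selected_columns]

print(int(rfe.support_.sum()))  # number of features flagged True
print(X_train_rfe.shape)        # (rows, 15)
```

The selected features are also the ones with ranking_ equal to 1; every other feature gets a higher rank according to when it was eliminated. Calling rfe.transform(X_train) should give the same 15 columns, returned as an array by default rather than a DataFrame.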