Feature Selection using RFE Not Returning Correct Number of Features


I am building a logistic regression model in Python in a Jupyter notebook. I have reached the feature selection stage, which I am doing with RFE. I ask for 15 features, but the output lists most or all of my features, not just 15.

The code I am running is as follows:

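# Starting with 15 features selected by RFE
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
rfe = RFE(estimator=logreg, n_features_to_select=15)
rfe = rfe.fit(X_train, y_train)

# One (column, selected?, rank) tuple per column; selected columns have rank 1
list(zip(X_train.columns, rfe.support_, rfe.ranking_))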

X_train holds all the independent variables, and y_train holds the dependent variable. In total, I have around 100 variables.

I am not sure whether I have typed something incorrectly or am misreading my output.

seralouk On

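The output lists every column because zip pairs each column name with its entry in rfe.support_ and rfe.ranking_. Only the 15 selected features have support_ set to True (and a ranking_ of 1). If you print sum(rfe.support_), it will be 15 in your case.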

To reduce X_train and keep only those features, use:

X_train_selected = rfe.transform(X_train)
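
As a quick way to check this end to end, here is a minimal sketch on synthetic data. The dataset from make_classification, the column names f0 to f99, and max_iter=1000 are assumptions made for illustration and are not from the question; only the RFE calls mirror the original code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the asker's data (assumption): 100 columns, binary target.
X, y = make_classification(
    n_samples=300, n_features=100, n_informative=10, random_state=0
)
X_train = pd.DataFrame(X, columns=[f"f{i}" for i in range(100)])
y_train = pd.Series(y)

# max_iter is raised only to avoid convergence warnings on this toy data.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=15)
rfe.fit(X_train, y_train)

# support_ is a boolean mask over all 100 columns; exactly 15 entries are True.
n_selected = int(rfe.support_.sum())

# Names of the selected columns, recovered by indexing with the mask.
selected_columns = X_train.columns[rfe.support_].tolist()

# transform drops the unselected columns, leaving 15.
X_train_selected = rfe.transform(X_train)

print(n_selected, X_train_selected.shape)  # 15 (300, 15)
```

Indexing X_train.columns with rfe.support_ keeps the column names, which rfe.transform does not return by default.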