As suggested in many other posts e.g.,
there are ways of extracting the relevant feature names. However, how do I make sure that the feature names are in the same order as model.coef_?
My pipeline is structured like this:
num1_pre = Pipeline([ ... ])
num2_pre = Pipeline([ ... ])
cat1_pre = Pipeline([("cat_encode", OneHotEncoder())])
cat2_pre = Pipeline([("cat_encode", OrdinalEncoder())])
preprocessor = ColumnTransformer([
    ("num1_pre", num1_pre, num_col1),
    ("num2_pre", num2_pre, num_col2),
    ("cat1_pre", cat1_pre, cat_col1),
    ("cat2_pre", cat2_pre, cat_col2)
])
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("poly_features", PolynomialFeatures(interaction_only=True, include_bias=False)),
    ("scaler", StandardScaler()),
    ("model", LinearRegression())
])
My first attempt was to use get_feature_names_out(). I used cross-validation (a grid search) to obtain a good model:
model = grid_search.best_estimator_
model[:-2].get_feature_names_out()
but the last line fails with the error: input_features is not equal to feature_names_in_