In sklearn, does a fitted pipeline reapply every transform?

Asked by konel on 27 July 2025 at 13:26

Apologies if this is obvious but I couldn't find a clear answer to this:

Say I've used a pretty typical pipeline:

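from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
# RandomizedLogisticRegression is only available in scikit-learn < 0.21
from sklearn.linear_model import RandomizedLogisticRegression
from sklearn.pipeline import Pipeline

feat_sel = RandomizedLogisticRegression()
clf = RandomForestClassifier()
pl = Pipeline([('preprocessing', preprocessing.StandardScaler()),
               ('feature_selection', feat_sel),
               ('classification', clf)])
pl.fit(X, y)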

Now when I apply pl on a new set,

pl.predict(X_classify);

is RandomizedLogisticRegression going to be reapplied, or are the columns that were selected in training going to be used in the new data? If not, is there a way for the pipeline to differentiate between feature selectors and feature extractors/scalers/other transforms that should be applied on the new input? Until I'm sure, I'm skipping the pipeline feature and just doing each step manually and maintaining state.

Thanks!

Andreas Mueller On 22 June 2015 at 14:30 BEST ANSWER

The pipeline calls transform on the preprocessing and feature selection steps if you call pl.predict. That means that the features selected in training will be selected from the test data (the only thing that makes sense here).

It is unclear what you mean by "apply" here. Nothing new is learned when you call "predict": every step before the last one is applied to the new data with its already-fitted "transform", and the final estimator then makes the prediction.
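To check this for yourself, here is a minimal sketch, assuming a recent scikit-learn release. RandomizedLogisticRegression was removed in scikit-learn 0.21, so SelectKBest stands in as the feature selector; the synthetic data and the variable names are made up for illustration. It compares the selector's column mask before and after pl.predict, then reproduces the prediction by calling transform on each fitted step by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): 150 rows to fit on, 50 "new" rows.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)
X_train, y_train, X_new = X[:150], y[:150], X[150:]

pl = Pipeline([
    ("preprocessing", StandardScaler()),
    # SelectKBest stands in for the removed RandomizedLogisticRegression.
    ("feature_selection", SelectKBest(f_classif, k=3)),
    ("classification", RandomForestClassifier(n_estimators=20, random_state=0)),
])
pl.fit(X_train, y_train)

# Snapshot the columns chosen during fit.
selector = pl.named_steps["feature_selection"]
mask_before = selector.get_support().copy()

pipeline_pred = pl.predict(X_new)

# If predict() had refit the selector, this mask could have changed.
mask_unchanged = np.array_equal(mask_before, selector.get_support())

# Reproduce predict() by hand: transform with every step but the last,
# then call predict on the final estimator.
Xt = X_new
for _name, step in pl.steps[:-1]:
    Xt = step.transform(Xt)
manual_pred = pl.steps[-1][1].predict(Xt)
same_predictions = np.array_equal(pipeline_pred, manual_pred)

print(mask_unchanged, same_predictions)
```

If both flags come out true, the pipeline is doing the same thing as the manual bookkeeping described in the question, so keeping state by hand should not be necessary.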
