How to correctly use model explainer with unseen data?

Question

165 Views Asked by Alex Nikitin At 18 August 2025 at 00:12

I trained my classifier using a pipeline:

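import category_encoders as ce
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler
# ColumnSelector is not defined in the question (assumed to be a custom or third-party transformer).

# Hyperparameter grid; the 'classifier__' prefix targets the 'classifier' pipeline step.
param_tuning = {
    'classifier__learning_rate': [0.01, 0.1],
    'classifier__max_depth': [3, 5, 7, 10],
    'classifier__min_child_weight': [1, 3, 5],
    'classifier__subsample': [0.5, 0.7],
    'classifier__n_estimators': [100, 200, 500],
}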

cat_pipe = Pipeline(
    [
        ('selector', ColumnSelector(categorical_features)),
        ('encoder', ce.one_hot.OneHotEncoder())
    ]
)

num_pipe = Pipeline(
    [
        ('selector', ColumnSelector(numeric_features)),
        ('scaler', StandardScaler())
    ]
)

preprocessor = FeatureUnion(
    transformer_list=[
        ('cat', cat_pipe),
        ('num', num_pipe)
    ]
)

xgb_pipe = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('classifier', xgb.XGBClassifier())
    ]
)

grid = GridSearchCV(xgb_pipe, param_tuning, cv=5, n_jobs=-1, scoring='accuracy')

xgb_model = grid.fit(X_train, y_train)

The training data contain categorical features, so the transformed training data have shape (x, 100). After that, I try to explain the model's prediction on unseen data. Since I pass a single unseen example, it is preprocessed into shape (1, 15), because a single observation does not contain every category of the categorical features.

eli5.show_prediction(xgb['classifier'], xgb['preprocessor'].fit_transform(df), columns=xgb['classifier'].get_booster().feature_names)

And I got:

ValueError: Shape of passed values is (1, 15), indices imply (1, 100).

This occurs because the model was trained on the whole preprocessed dataset with shape (x, 100), but I pass the explainer a single observation with shape (1, 15). How do I correctly pass a single unseen observation to the explainer?

**desertnaut** · Accepted Answer

We never use .fit_transform() on unseen data; the correct way is to use the .transform() method of the preprocessor already fitted on your training data (here xgb['preprocessor']). That way, we ensure that the (transformed) unseen data have the same features as the (transformed) training data, and so are compatible with the model built on the latter.
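To see why the column count changes, here is a minimal sketch using a toy encoder. `ToyOneHotEncoder` is a hypothetical class written only for illustration; it is not the `category_encoders` encoder from the question, and the colour values are made up. It only shows the general idea that fitting decides which columns exist, while transforming reuses them.

```python
class ToyOneHotEncoder:
    """Toy stand-in for a one-hot encoder (illustration only)."""

    def fit(self, values):
        # Remember the distinct categories present in the data given to fit().
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # Emit one column per category remembered at fit time.
        return [[1 if v == c else 0 for c in self.categories_] for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


train = ["red", "green", "blue", "green"]
unseen = ["green"]

# Fitted on the training data: the unseen row keeps all three training columns.
encoder = ToyOneHotEncoder().fit(train)
ok = encoder.transform(unseen)
print(ok)     # [[0, 1, 0]]

# Refitted on the single unseen row: only one category is known, so one column.
refit = ToyOneHotEncoder().fit_transform(unseen)
print(refit)  # [[1]]
```

The second result is the toy analogue of the (1, 15) versus (1, 100) mismatch: the encoder relearned its categories from the one row it was given.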

So, you should replace .fit_transform(df) here:

eli5.show_prediction(xgb['classifier'], xgb['preprocessor'].fit_transform(df), columns=xgb['classifier'].get_booster().feature_names)

with .transform(df):

eli5.show_prediction(xgb['classifier'], xgb['preprocessor'].transform(df), columns=xgb['classifier'].get_booster().feature_names)
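If the pipeline was tuned with GridSearchCV as in the question, the fitted pipeline is presumably the one stored on the search object's `best_estimator_` attribute (available when `refit=True`, the default). The sketch below is a small stand-in, not the question's setup: it assumes scikit-learn's own `OneHotEncoder` and a `LogisticRegression` in place of the `category_encoders` encoder and XGBoost, with made-up data, and only shows how to reach the fitted preprocessor and compare `.transform()` against a fresh `.fit_transform()`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training set: one categorical column with four distinct values.
X_train = np.array([["red"], ["green"], ["blue"], ["grey"]] * 5, dtype=object)
y_train = np.array([0, 1, 0, 1] * 5)

pipe = Pipeline(
    steps=[
        ("preprocessor", OneHotEncoder(handle_unknown="ignore")),
        ("classifier", LogisticRegression()),
    ]
)
grid = GridSearchCV(pipe, {"classifier__C": [0.1, 1.0]}, cv=2, scoring="accuracy")
grid.fit(X_train, y_train)

# The pipeline refitted on all training data lives on best_estimator_.
fitted_pipe = grid.best_estimator_
fitted_pre = fitted_pipe.named_steps["preprocessor"]

x_new = np.array([["green"]], dtype=object)  # a single unseen observation

# Correct: reuse the categories learned from the training data.
good = fitted_pre.transform(x_new)

# Incorrect: an unfitted copy relearns its categories from the single row.
bad = clone(fitted_pre).fit_transform(x_new)

print(good.shape)  # (1, 4)
print(bad.shape)   # (1, 1)
```

Only the `good` matrix has the column count the classifier was trained on, so only it can be passed on to the classifier or to an explainer.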
