Partial dependence plot - model developed using scaled data, how to unscale for PDP?


I have built a Random Forest Classifier model in Python and now want to make a partial dependence plot (PDP). I used scaled data for training and testing the model, and made the PDP like this: PartialDependenceDisplay.from_estimator(best_clf, X_test_final, best_features). However, the x-axis values are on the scaled range, which limits interpretability.

Unscaling X_test_final before calling PartialDependenceDisplay does not work. Any suggestions on how I can change the x-axis values from scaled to unscaled? I scaled my data using StandardScaler().


There is 1 best solution below

Téo On

Unscaling standardised data is straightforward. To standardise data you do: X' = (X - mean(X)) / std(X) so to unscale it, you just do X = (X' * std(X)) + mean(X), using the same mean and standard deviation that were used for scaling.
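
As a quick check of that round trip, here is a minimal sketch using only the Python standard library. The helper names (standardise, unstandardise) and the sample values are made up for illustration. It uses the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Return (scaled values, mean, std), using the population std (ddof=0)."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values], mean, std


def unstandardise(scaled, mean, std):
    """Invert the standardisation: X = (X' * std) + mean."""
    return [(s * std) + mean for s in scaled]


# Hypothetical feature values, for illustration only
ages = [20.0, 30.0, 40.0, 50.0]

scaled, mean, std = standardise(ages)
restored = unstandardise(scaled, mean, std)
```

Here restored matches ages up to floating-point rounding. With a fitted StandardScaler, the per-feature mean and standard deviation are stored in its mean_ and scale_ attributes, so you do not need to recompute them.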

If you want to just change the tick labels so that you can interpret the results in the original scale of the data, then you just need to do something like:

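import matplotlib.pyplot as plt

# Assumes `ax` is the PDP axis for one feature (e.g. disp.axes_[0, 0] from
# the PartialDependenceDisplay) and `x` is a NumPy array of that feature's
# unscaled values, as passed to StandardScaler.fit().

# Get the tick positions and limits on the current axis
x_ticks = ax.get_xticks()
x_lim = ax.get_xlim()

# Un-standardise the tick values (NumPy's std() uses ddof=0, like
# StandardScaler; pandas' std() defaults to ddof=1)
xt_unscaled = [(xt * x.std()) + x.mean() for xt in x_ticks]

# Format the ticks to strings (here to 1 d.p.)
xt_unscaled = [f'{xt:.1f}' for xt in xt_unscaled]

# Fix the tick positions, then assign the unscaled values as labels.
# This retains the original tick positions etc, but
# lets you interpret them the way you want.
ax.set_xticks(x_ticks)
ax.set_xticklabels(xt_unscaled)
ax.set_xlim(x_lim)

plt.show()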
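
An alternative to writing fixed label strings is a tick formatter, which converts each tick value at draw time, so the labels stay correct if the ticks move. The sketch below is only the conversion function. The name make_unscale_formatter and the example mean and scale values are assumptions for illustration, not part of any library.

```python
def make_unscale_formatter(mean, scale, decimals=1):
    """Build a callable mapping a standardised tick value to an unscaled label.

    The returned function takes (tick_value, pos), which is the call
    signature matplotlib uses for function-based tick formatters.
    """
    def fmt(tick_value, pos=None):
        unscaled = (tick_value * scale) + mean
        return f"{unscaled:.{decimals}f}"
    return fmt


# Hypothetical mean and standard deviation for one feature
fmt = make_unscale_formatter(mean=35.0, scale=10.0)
labels = [fmt(t) for t in (-1.0, 0.0, 1.5)]
```

To use it on a plot, wrap the callable in matplotlib.ticker.FuncFormatter and pass it to ax.xaxis.set_major_formatter(). For a fitted StandardScaler, the mean and scale for feature i would come from scaler.mean_[i] and scaler.scale_[i].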