Logistic Regression Deviance Variance Across Numerical and Categorical Variables


I fitted a logistic regression model to a customer churn dataset, with the following results:

Logit Results

I tested this model with a validation set and calculated the ROC AUC score, which was approximately 0.85 – quite good. However, I still need to look into the deviance, which was calculated in the following way:

deviance_residuals = -2 * ((y_test * np.log(y_pred)) + ((1 - y_test) * np.log(1 - y_pred)))
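For reference, here is a minimal runnable sketch of that calculation. The formula above gives each observation's contribution to the deviance; what is usually called the deviance residual is the signed square root of that contribution. The function names, the `eps` clipping value and the demo arrays are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

def unit_deviance(y_true, y_prob, eps=1e-12):
    """Per-observation deviance contribution for a binary outcome."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so np.log never returns -inf.
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -2.0 * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def signed_deviance_residual(y_true, y_prob, eps=1e-12):
    """Signed square root of the contribution: positive when y=1, negative when y=0."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return np.sign(y - p) * np.sqrt(unit_deviance(y, p, eps))

# Made-up labels and predicted probabilities, only to exercise the functions.
y_demo = np.array([1, 1, 0, 0])
p_demo = np.array([0.9, 0.2, 0.1, 0.7])
d = unit_deviance(y_demo, p_demo)
r = signed_deviance_residual(y_demo, p_demo)
print(d)
print(r)
```

With real data, `y_demo` and `p_demo` would be replaced by `y_test` and `y_pred`.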

Now, looking at the deviance I noticed some weird patterns:

As the scatterplot below shows, for churners the deviance is almost a linear function of 'Tenure'.

Tenure vs Deviance scatterplot

This makes me suspect that I am missing something. I tried log-transforming the tenure variable, but the pattern did not go away.
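One possible reading of the tenure pattern: for a churner (y = 1) the deviance contribution reduces to -2 * log(p), and p depends on tenure only through the linear predictor, so a smooth trend against tenure is expected from the formula itself. The sketch below illustrates this with hypothetical coefficients; `intercept` and `beta_tenure` are made up and are not taken from the fitted model, and the other covariates are ignored.

```python
import numpy as np

# Hypothetical coefficients, NOT the ones from the fitted model in the question.
intercept, beta_tenure = -1.0, -0.05

tenure = np.arange(0, 73, 12)            # tenure in months: 0, 12, ..., 72
eta = intercept + beta_tenure * tenure   # linear predictor (log-odds of churn)
p_churn = 1.0 / (1.0 + np.exp(-eta))     # predicted churn probability

# For a churner (y = 1), only the first term of the deviance formula survives.
dev_churner = -2.0 * np.log(p_churn)

# Same quantity written in terms of the linear predictor.
dev_closed_form = 2.0 * np.log1p(np.exp(-eta))

for t, dv in zip(tenure, dev_churner):
    print(t, round(float(dv), 3))
```

When the predicted probability is small, 2 * log(1 + exp(-eta)) is close to -2 * eta, which is linear in tenure under these assumptions. A near-linear scatter among churners would therefore be consistent with the deviance formula and does not by itself indicate a missing term.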

Something similar happens with some categorical variables. As shown in the image below, churners on two-year contracts have much higher deviance than churners on one-year or month-to-month contracts, so the model fits these particular observations poorly.

Contract Type vs Deviance boxplot
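A rough way to put numbers on the boxplot is to summarise, for each contract type, the observed churn rate, the mean predicted probability and the mean deviance contribution. This is only a sketch: the helper `deviance_by_group` and the demo arrays are hypothetical, and the values are made up and not drawn from the dataset.

```python
import numpy as np

def deviance_by_group(groups, y_true, y_prob, eps=1e-12):
    """Summarise observed rate, mean prediction and mean deviance per group."""
    groups = np.asarray(groups)
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    dev = -2.0 * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    summary = {}
    for g in np.unique(groups):
        mask = groups == g
        summary[str(g)] = {
            "n": int(mask.sum()),
            "observed_rate": float(y[mask].mean()),
            "mean_predicted": float(p[mask].mean()),
            "mean_deviance": float(dev[mask].mean()),
        }
    return summary

# Made-up example: two customers per contract type.
contract = np.array(["Month-to-month", "Month-to-month", "Two year", "Two year"])
y_demo = np.array([1, 0, 1, 0])
p_demo = np.array([0.6, 0.4, 0.05, 0.05])
summary = deviance_by_group(contract, y_demo, p_demo)
for name, stats in summary.items():
    print(name, stats)
```

Since a churner contributes -2 * log(p), a high deviance for two-year churners means the model assigned them a low churn probability. Comparing `observed_rate` with `mean_predicted` within each group may help tell apart a group that is rare but well calibrated from one that is systematically mispredicted.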

How can I solve it?
