I am working on a customer churn model and want to confirm whether certain features cause target leakage. Please find the details below:
Customer base: Live as of Dec'22
Response (Target) definition: Customers whose bank balance drops by 60% or more in the quarter Feb Mar Apr (FMA'23) compared with the same quarter of the previous year, Feb Mar Apr (FMA'22)
For example, if a customer's AQB (average quarterly balance) was Rs. 1,00,000 in Feb Mar Apr (FMA'22) and dropped to Rs. 40,000 in Feb Mar Apr (FMA'23), a 60% drop, then the customer is flagged as a target (Target = 1).
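To make the definition concrete, here is a minimal sketch of the target rule in Python. The function name `churn_target`, its parameter names, and the handling of a zero or negative FMA'22 balance are assumptions added for illustration; they are not part of the definition above.

```python
def churn_target(aqb_prev: float, aqb_curr: float, drop_threshold: float = 0.60) -> int:
    """Return 1 if AQB fell by drop_threshold or more year over year, else 0.

    aqb_prev: AQB in the baseline quarter (FMA'22)
    aqb_curr: AQB in the outcome quarter (FMA'23)
    """
    if aqb_prev <= 0:
        # Assumption: with no positive baseline balance a relative drop
        # is undefined, so the customer is not flagged.
        return 0
    relative_drop = (aqb_prev - aqb_curr) / aqb_prev
    return int(relative_drop >= drop_threshold)


# The example from the post: Rs. 1,00,000 falling to Rs. 40,000 is a 60% drop.
print(churn_target(100_000, 40_000))  # 1
# A fall to Rs. 50,000 is only a 50% drop, so it is not flagged.
print(churn_target(100_000, 50_000))  # 0
```

Written this way, it is easy to see that the target depends on both the FMA'22 and the FMA'23 balance, which is the reason for the question below.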
Problem Statement:
While fitting a model for prediction, should I use the FMA'22 AQB as a feature, or will that cause target leakage (since it is used in the calculation of the target)?
I fitted a model in both situations:
- Using FMA'22 AQB (Recall: 90%)
- Not using FMA'22 AQB (Recall: 78%)
Recall drops by 12 percentage points (from 90% to 78%) when I leave the PRE-AQB (FMA'22) out of the input features.
Could you help me understand whether this is a case of target leakage or not?
Have you tried computing a heat map of feature correlations, first with the chi-squared test and then with Cramér's V? Please go through it and try training again after dropping some of the highly correlated features (> 0.30).
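As a rough sketch of that check, Cramér's V for one pair of categorical columns can be computed from the chi-squared statistic of their contingency table. The function name `cramers_v` and the toy data are made up for illustration. This version applies no bias correction, and it assumes continuous features such as AQB have already been binned into categories.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two equal-length sequences of categorical labels."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_counts = Counter(x)
    y_counts = Counter(y)
    pair_counts = Counter(zip(x, y))
    r, c = len(x_counts), len(y_counts)
    if n == 0 or min(r, c) < 2:
        # Convention chosen here: a constant column has no measurable association.
        return 0.0
    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x_val, n_x in x_counts.items():
        for y_val, n_y in y_counts.items():
            expected = n_x * n_y / n
            observed = pair_counts.get((x_val, y_val), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(r, c) - 1)))


# Toy data: the two columns move together perfectly, so V is 1.0.
segment = ["low", "low", "high", "high"]
product = ["basic", "basic", "premium", "premium"]
print(cramers_v(segment, product))  # 1.0
```

Running this over every pair of features gives the matrix for the heat map; pairs above the 0.30 cut-off are the candidates for dropping one of the two features.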