I want to build a model that predicts customer churn from monthly time-series data. My data has no explicit churn labels, so I am considering using revenue as a proxy: if a customer's revenue is zero for 3 consecutive months, I assume the customer has churned. For example:
| Date | Revenue | Churned |
|---|---|---|
| Jan_2019 | 20 | 1 |
| Feb_2019 | 0 | no prediction |
| Mar_2019 | 0 | no prediction |
| Apr_2019 | 0 | no prediction |
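
To make the labeling rule concrete, here is a minimal pandas sketch of how I would derive the `Churned` column for the example customer. The column names (`customer_id`, `month`, `revenue`, `churned`) and the `HORIZON` constant are my own placeholders, and I am assuming one row per customer per month with no gaps.

```python
import pandas as pd

HORIZON = 3  # months of zero revenue that I treat as churn

# The example customer from the table; "customer_id" is a hypothetical column.
df = pd.DataFrame({
    "customer_id": ["A"] * 4,
    "month": pd.period_range("2019-01", periods=4, freq="M"),
    "revenue": [20, 0, 0, 0],
})
df = df.sort_values(["customer_id", "month"]).reset_index(drop=True)

# Revenue of the next 1..HORIZON months, looked up separately for each customer.
g = df.groupby("customer_id")["revenue"]
future = pd.concat({k: g.shift(-k) for k in range(1, HORIZON + 1)}, axis=1)

all_zero = (future == 0).all(axis=1)   # every one of the next months has zero revenue
complete = future.notna().all(axis=1)  # the whole look-ahead window is observed

# 1.0 = churned, 0.0 = not churned, NaN = window incomplete ("no prediction")
df["churned"] = all_zero.astype(float).where(complete)
print(df)
```

The `future` frame is used only to build the label; it would not be passed to the model as a feature.
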
The main challenge is that I want the model to predict churn from a given month's data alone, with no insight into future months. My concern is that the model will simply learn the labeling rule itself: if revenue is zero over the following 3 months, the customer has churned. Those future values are not known at prediction time, so the model must not "see" them.
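
Here is a rough sketch of the separation I think I need, under my own assumptions: features are computed only from the current and earlier months, the label is the only thing that looks ahead, and the train/test split is by time so that no training label window reaches into the test period. The toy data, the `month_idx` column (0 = first month), the feature names, and the `CUTOFF` value are all hypothetical.

```python
import pandas as pd

HORIZON = 3  # the label looks this many months ahead
CUTOFF = 5   # hypothetical last month index whose data may be used for training

# Toy revenue histories for three made-up customers, 12 months each.
revenue = {
    "A": [20, 18, 22, 19, 21, 17, 15, 12, 0, 0, 0, 0],
    "B": [30, 25, 28, 31, 27, 29, 30, 26, 28, 27, 31, 29],
    "C": [10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
df = pd.DataFrame(
    [(c, t, r) for c, revs in revenue.items() for t, r in enumerate(revs)],
    columns=["customer_id", "month_idx", "revenue"],
)
g = df.groupby("customer_id")["revenue"]

# Features: only backward-looking shifts and windows (current month and earlier).
df["rev_prev_1"] = g.shift(1)
df["rev_change"] = df["revenue"] - df["rev_prev_1"]
df["rev_mean_3"] = g.transform(lambda s: s.rolling(3, min_periods=1).mean())

# Label: the only place where future months are read.
future = pd.concat({k: g.shift(-k) for k in range(1, HORIZON + 1)}, axis=1)
df["churned"] = (future == 0).all(axis=1).astype(float).where(future.notna().all(axis=1))

# Keep rows with a complete label window where the customer is still active.
usable = df[df["churned"].notna() & (df["revenue"] > 0)]

# Time-based split: every training label window ends at or before CUTOFF,
# and every test row starts after it.
train = usable[usable["month_idx"] + HORIZON <= CUTOFF]
test = usable[usable["month_idx"] > CUTOFF]

feature_cols = ["revenue", "rev_prev_1", "rev_change", "rev_mean_3"]
X_train, y_train = train[feature_cols], train["churned"]
X_test, y_test = test[feature_cols], test["churned"]
```

Dropping the rows where revenue is already zero mirrors the "no prediction" rows in my table; I am not sure whether that is the right choice.
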
Which model would be a good fit for this problem, and how can I avoid data leakage?