What to do when one feature has very large importance/weight?


I am new to data science and am trying to predict customer churn for a company that offers subscription-based booking management software. Its customers are gyms. I have a small, imbalanced historical dataset (False: 670, True: 230) with two numerical predictors, age (days since subscription) and number of active days in the last month (days on which a customer, i.e. a gym, had bookings), and one categorical predictor, logo (boolean: whether the customer uploaded a logo in the software).

The predictors have the following negative correlations with churn:

  • logo: 0.65
  • num_active_days_last_month: 0.40
  • age: 0.3

The feature importances look similar, with logo having the most weight.

When I predict, the model (logistic regression) classifies customers without a logo as churners, even though they are quite active.

For example the following two customers have almost the same probability to churn:

Customer 1:

  • logo: True
  • num_active_days_last_month: 1
  • age: 30 days

Customer 2:

  • logo: False
  • num_active_days_last_month: 22
  • age: 250 days
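
A minimal sketch of how this can happen in a logistic regression, using hypothetical coefficients rather than the ones from the fitted model. The intercept, the coefficient values, and the helper name `churn_probability` are all assumptions made for illustration; only the feature names and the two customers come from the question. It shows that a large enough negative weight on logo can offset 21 extra active days and 220 extra days of age.

```python
import math

def churn_probability(coefs, intercept, features):
    """Logistic regression prediction: sigmoid of the linear score."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, chosen only to illustrate the effect (not fitted to real data).
INTERCEPT = 3.2
COEFS = {
    "logo": -3.0,                        # one large weight on a 0/1 feature
    "num_active_days_last_month": -0.1,  # small weight per active day
    "age": -0.004,                       # small weight per day of subscription
}

customer_1 = {"logo": 1, "num_active_days_last_month": 1, "age": 30}
customer_2 = {"logo": 0, "num_active_days_last_month": 22, "age": 250}

p1 = churn_probability(COEFS, INTERCEPT, customer_1)
p2 = churn_probability(COEFS, INTERCEPT, customer_2)
print(f"customer 1: {p1:.3f}, customer 2: {p2:.3f}")
```

With these made-up numbers the two linear scores nearly coincide, so the predicted probabilities are almost equal. Note that raw coefficients also depend on the scale of each feature, so they are only comparable across features after the numerical predictors are standardized.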

I understand that this is what the model learned from the dataset, but it doesn't make sense to me that something like logo is assigned such strong importance. Is there a way to avoid excluding logo from the predictors completely, for example by somehow decreasing its importance?

Thank you in advance for any help or suggestions.
