I have a dataset for binary classification; to keep things simple, I use only one feature.
- I made sure the feature is unique per sample (no two samples share a value for this feature, so a tree that classifies every training sample correctly does exist)
- I set the regularization terms to (near) zero: 'lambda_l1': 0, 'lambda_l2': 0.1, 'min_gain_to_split': 0
- I have 267 samples (number of positive: 98, number of negative: 169)
- I made the trees much bigger: 'num_leaves': 8, 'max_depth': 20, 'max_bin': 500
- I allow plenty of boosting rounds: num_boost_round=5000
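As a dependency-free sanity check of the first bullet: with a unique-valued feature, a fully grown tree is just a lookup table from feature value to label, and memorized predictions give train AUC 1.0. The feature values and label order below are made up for illustration; only the counts (98 positive, 169 negative) match the question:

```python
import random

# Synthetic stand-in for the question's data: one feature, unique per sample.
random.seed(0)
features = random.sample(range(10_000), 267)   # 267 distinct values
labels = [1] * 98 + [0] * 169                  # 98 positive, 169 negative
random.shuffle(labels)

# A "tree" with one leaf per sample is just a lookup table on the feature.
lookup = dict(zip(features, labels))
scores = [lookup[x] for x in features]         # perfect train predictions

def auc(y_true, y_score):
    """Rank-based AUC (Mann-Whitney U statistic), no external libraries."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):                      # average ranks over tied scores
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos_ranks = [ranks[i] for i, y in enumerate(y_true) if y == 1]
    n_pos, n_neg = len(pos_ranks), len(y_true) - len(pos_ranks)
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc(labels, scores))  # 1.0 — memorizing the train set is achievable
```

So the ceiling of 1.0 is reachable in principle; the question is why the boosted ensemble does not get there.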
But I still get an AUC of only 0.869 on the train set. What am I missing?
I have played with every available parameter and still cannot get the train-set AUC to 1.0.
My full params dictionary is:

params = {
    'boosting': 'gbdt',
    'objective': 'binary',
    'metric': 'AUC',
    'num_leaves': 8,
    'max_depth': 20,
    'learning_rate': 0.001,
    'feature_fraction': 1,
    'bagging_fraction': 1,
    'bagging_freq': 0,
    'verbose': 1,
    'is_unbalance': 'true',
    'max_bin': 500,
    'min_data_in_leaf': 1,
    'lambda_l1': 0,
    'lambda_l2': 0.1,
    'min_gain_to_split': 0,
}
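One quick check on this configuration: even though each tree is small, raw capacity should not be the limit. On a single feature, each tree contributes at most num_leaves - 1 threshold splits, so the number of trees needed to carve the line into 267 distinct intervals is tiny compared with 5000 rounds. A back-of-envelope sketch (the "splits per tree" assumption is mine, not from the question):

```python
# Capacity check: on ONE feature, a tree with L leaves is a piecewise-constant
# function defined by at most L - 1 thresholds. Stacking trees adds thresholds.
num_leaves = 8
num_boost_round = 5000
n_samples = 267

thresholds_per_tree = num_leaves - 1               # 7 splits per tree
# To isolate 267 unique values we need 266 boundaries between them.
trees_needed = -(-(n_samples - 1) // thresholds_per_tree)  # ceil division
print(trees_needed)  # 38 — far fewer than the 5000 rounds allowed
```

So with 5000 rounds available, the ensemble has more than enough splitting capacity; whatever caps the train AUC at 0.869 is coming from somewhere else (binning, stopping conditions, or other constraints), which is exactly the puzzle.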
EDIT: with 3 features I am able to overfit the train set. That doesn't change the original question, though: the trees should have overfit easily on a single feature.
I was able to achieve 77% on Yelp data predicting a popular/non-popular binary label. I used GridSearchCV to find the best parameters: {'learning_rate': 0.1, 'max_depth': 15, 'n_estimators': 100, 'num_leaves': 500}
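For context, GridSearchCV does nothing more exotic than enumerating every combination of the listed values and cross-validating each one. A dependency-free sketch of that enumeration (the extra grid values below are illustrative; only the quoted best combination comes from the answer above):

```python
from itertools import product

# Hypothetical search grid in the spirit of the one quoted above.
grid = {
    'learning_rate': [0.01, 0.1],
    'max_depth': [10, 15],
    'n_estimators': [100],
    'num_leaves': [100, 500],
}

# GridSearchCV scores the Cartesian product of all value lists.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(candidates))  # 8 parameter settings, each scored by cross-validation
```

The quoted best setting is one of these 8 candidates; GridSearchCV simply keeps whichever scores highest under cross-validation.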