xgb.cv
and sklearn.model_selection.cross_validate
do not produce the same mean train/test error even though I set the same seed/random_state and make sure both methods use the same folds. The code at the bottom reproduces my issue. (Early stopping is off by default.)
I found that this issue is caused by the subsample
parameter (both methods produce the same result if it is set to 1), but I cannot find a way to make both methods subsample in the same way. In addition to setting seed/random_state as shown in the code at the bottom, I also tried explicitly adding:
import random
random.seed(1)
np.random.seed(1)
at the beginning of my file, but this does not resolve the issue either. Any ideas?
import numpy as np
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.model_selection import cross_validate, StratifiedKFold

X = np.random.randn(100, 20)
y = np.random.randint(0, 2, 100)
dtrain = xgb.DMatrix(X, label=y)

params = {'eta': 0.3,
          'max_depth': 4,
          'gamma': 0.1,
          'silent': 1,
          'objective': 'binary:logistic',
          'seed': 1,
          'subsample': 0.8}

# Native API. With shuffle=False the folds are already deterministic, so
# random_state must stay None (recent sklearn raises a ValueError otherwise).
cv_results = xgb.cv(params, dtrain, num_boost_round=99, seed=1,
                    folds=StratifiedKFold(5, shuffle=False),
                    early_stopping_rounds=10)
print(cv_results, '\n')

# sklearn API with the same hyperparameters; n_estimators is the number of
# rounds xgb.cv actually ran (one row of cv_results per boosting round).
xgbc = XGBClassifier(learning_rate=0.3,
                     max_depth=4,
                     gamma=0.1,
                     silent=1,
                     objective='binary:logistic',
                     subsample=0.8,
                     random_state=1,
                     n_estimators=len(cv_results))
scores = cross_validate(xgbc, X, y,
                        cv=StratifiedKFold(5, shuffle=False),
                        return_train_score=True)
print('train-error-mean = {} test-error-mean = {}'.format(
    1 - scores['train_score'].mean(), 1 - scores['test_score'].mean()))
Output:
train-error-mean train-error-std test-error-mean test-error-std
0 0.214981 0.030880 0.519173 0.129533
1 0.140039 0.018552 0.549549 0.034696
2 0.105100 0.017420 0.510501 0.040517
3 0.092474 0.012587 0.450977 0.075866
train-error-mean = 0.06994061572120636 test-error-mean = 0.4706015037593986
Output when subsample is set to 1:
train-error-mean train-error-std test-error-mean test-error-std
0 0.180043 0.013266 0.491504 0.093246
1 0.117381 0.021328 0.488070 0.097733
2 0.074972 0.030605 0.530075 0.091446
3 0.044907 0.032232 0.519073 0.130802
4 0.032438 0.021816 0.481027 0.080622
train-error-mean = 0.032438271604938285 test-error-mean = 0.4810275689223057
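One sanity check worth running (a minimal sketch, reusing params and dtrain from the code above; that the two runs match is an assumption about single-machine CPU determinism, not something the original post verified): confirm that xgb.cv itself is deterministic run-to-run for a fixed seed, so the gap really is between the two APIs rather than noise within one of them.

# Two identical xgb.cv calls with the same seed and folds; if the returned
# DataFrames are equal, the native API is deterministic, and the mismatch
# with cross_validate must come from elsewhere (e.g. separate RNG streams).
run_a = xgb.cv(params, dtrain, num_boost_round=99, seed=1,
               folds=StratifiedKFold(5, shuffle=False))
run_b = xgb.cv(params, dtrain, num_boost_round=99, seed=1,
               folds=StratifiedKFold(5, shuffle=False))
print(run_a.equals(run_b))  # expected True on a CPU build with a fixed seed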
I know this for sure for LightGBM, and from a quick look at the XGBoost code (here) it seems to behave similarly, so I assume the answer applies here too.
The trick is in the early stopping. The native xgb.cv picks a single stopping iteration at which the mean CV score (or something close to the mean, I forget the exact aggregate by now :)) reaches a plateau, while in sklearn cross-validation the models in each fold are trained independently, so early stopping fires at a different iteration in each fold. So, if you want identical results, disable early stopping (which is problematic, since you can over- or under-fit without being aware of it). If you want to use early stopping, there is no way to get identical results, due to the difference in implementations.
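A minimal sketch of the first option (the round count here is hypothetical, not from the original post): fix num_boost_round up front, pass no early_stopping_rounds to xgb.cv, and give XGBClassifier the same count via n_estimators, so every fold in both APIs trains exactly the same number of trees.

import numpy as np
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.model_selection import cross_validate, StratifiedKFold

X = np.random.randn(100, 20)
y = np.random.randint(0, 2, 100)

N_ROUNDS = 20  # hypothetical fixed number of boosting rounds

params = {'eta': 0.3, 'max_depth': 4,
          'objective': 'binary:logistic', 'seed': 1}

# Native API: no early_stopping_rounds, so every fold runs all N_ROUNDS.
cv_results = xgb.cv(params, xgb.DMatrix(X, label=y),
                    num_boost_round=N_ROUNDS,
                    folds=StratifiedKFold(5, shuffle=False), seed=1)

# sklearn API: pin n_estimators to the same round count.
clf = XGBClassifier(learning_rate=0.3, max_depth=4,
                    objective='binary:logistic',
                    n_estimators=N_ROUNDS, random_state=1)
scores = cross_validate(clf, X, y,
                        cv=StratifiedKFold(5, shuffle=False),
                        return_train_score=True)

print(cv_results.iloc[-1])
print(1 - scores['train_score'].mean(), 1 - scores['test_score'].mean())

As the question observes, with subsample < 1 the two APIs may still disagree even at a fixed round count, since nothing guarantees their row-sampling RNG streams line up.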