I trained an XGBoost classifier using grid search with the params below:
params = {
    'max_depth': [5, 6],
    'min_child_weight': [1, 2, 3],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0]
}
xgb = XGBClassifier(device="cuda", learning_rate=0.02, n_estimators=1000,
                    objective='binary:logistic', verbosity=0, tree_method="gpu_hist")
skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=1001)
grid_search = GridSearchCV(estimator=xgb, param_grid=params, scoring='roc_auc', n_jobs=-1,
                           cv=skf.split(x_train, y_train), verbose=100, return_train_score=True)
grid_search.fit(x_train, y_train)
Then I saved the best model as shown below:
from joblib import dump
dump(grid_search.best_estimator_, 'xgboost_grid_search.joblib')
When I load the model again, predict_proba gives different results. This is how I load the model to get predictions:
import joblib
model = joblib.load("xgboost_grid_search.joblib")
model.predict_proba(x_test)
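For reference, one minimal way to check whether the reloaded estimator itself diverges from the in-memory one (this is only a sketch; it assumes the original grid_search object is still available in the same session and the same x_test is used):

import numpy as np
import joblib

# predictions from the in-memory best estimator
proba_original = grid_search.best_estimator_.predict_proba(x_test)

# predictions from the estimator reloaded from disk
reloaded = joblib.load("xgboost_grid_search.joblib")
proba_reloaded = reloaded.predict_proba(x_test)

# the two arrays should match up to floating-point tolerance
print(np.allclose(proba_original, proba_reloaded, atol=1e-6))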
Here x_train and x_test contain numerical features, and y_train and y_test are binary labels (either 0 or 1).
After reading through quite a few blogs, articles, and Stack Overflow answers, I have made sure the conditions below are met in both environments (see the sketch after the list):
1. Same Python version - 3.11.5
2. Same joblib and xgboost pip versions - 1.2.0 and 2.0.0 respectively
3. Same feature order in x_test as in x_train and model.feature_names_in_
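A minimal sketch of how checks 1-3 can be verified in each environment (it assumes x_train and x_test are pandas DataFrames with named columns):

import sys
import joblib
import xgboost

# 1. Python version and 2. library versions must match across environments
print(sys.version)
print("joblib:", joblib.__version__, "xgboost:", xgboost.__version__)

# 3. feature ordering: the columns of x_test must match what the model was fitted on
model = joblib.load("xgboost_grid_search.joblib")
print(list(model.feature_names_in_) == list(x_test.columns))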
However, I should point out that the OS of the two environments is different: macOS (M1) and Ubuntu (not sure if this is an issue).
Any help is appreciated and please let me know if I am doing something wrong.
Thanks in advance!
To reproduce the results you also need to set the random seed, i.e. the random_state argument of xgboost.XGBClassifier.
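A minimal sketch of what that could look like (the seed value 42 is arbitrary; the other arguments simply mirror the ones in your question):

from xgboost import XGBClassifier

xgb = XGBClassifier(
    device="cuda",
    learning_rate=0.02,
    n_estimators=1000,
    objective='binary:logistic',
    verbosity=0,
    tree_method="gpu_hist",
    random_state=42,  # fixed seed so the subsample/colsample_bytree draws are reproducible
)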