In the docs there is an EarlyStopping example for XGBClassifier:

```
import xgboost
from sklearn.datasets import load_digits

es = xgboost.callback.EarlyStopping(
    rounds=2,
    min_delta=1e-3,
    save_best=True,
    maximize=False,
    data_name="validation_0",
    metric_name="mlogloss",
)
clf = xgboost.XGBClassifier(tree_method="hist", device="cuda", callbacks=[es])

X, y = load_digits(return_X_y=True)
clf.fit(X, y, eval_set=[(X, y)])
```

But how does "validation_0" refer to the eval_set passed to clf.fit, so that EarlyStopping knows which dataset and metric to evaluate?
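My working assumption is that the sklearn wrapper auto-names the eval_set entries positionally as validation_0, validation_1, and so on, which seems to match the keys that evals_result() returns after fitting. A minimal check under that assumption:

```
# Assumption: eval_set tuples are auto-named "validation_0", "validation_1", ...
# in order, so data_name="validation_0" points at the first (X, y) tuple above.
results = clf.evals_result()
print(list(results.keys()))                      # ['validation_0']
print(results["validation_0"]["mlogloss"][:3])   # per-round metric history
```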

I then tried to apply it to XGBRegressor:

```
import xgboost as xgb
from sklearn.model_selection import cross_val_predict, KFold
import numpy as np

# Subclass that only changes the default arguments of EarlyStopping.
class CustomEarlyStopping(xgb.callback.EarlyStopping):
    def __init__(self, rounds=2, min_delta=1e-3, save_best=True,
                 maximize=False, data_name="validation_0", metric_name="rmse"):
        super().__init__(rounds=rounds, min_delta=min_delta, save_best=save_best,
                         maximize=maximize, data_name=data_name,
                         metric_name=metric_name)

# TRAIN MODEL (10x10-fold CV)
cvx = KFold(n_splits=10, shuffle=True, random_state=239)
es = CustomEarlyStopping()

model = xgb.XGBRegressor(colsample_bytree=0.3, learning_rate=0.1, max_depth=10,
                         alpha=10, n_estimators=500, n_jobs=-1,
                         random_state=239, callbacks=[es])
model.set_params(tree_method='approx', device="cpu")

cv_preds = []
for i in range(10):
    cv_preds.append(cross_val_predict(model, np.asarray(X_train),
                                      np.asarray(y_train), cv=cvx,
                                      method='predict', n_jobs=1, verbose=2))
```

I set data_name="validation_0" in the EarlyStopping __init__ without naming a validation set in each CV fold. What is wrong with the behavior of this code? Thanks.

The XGBRegressor code raised this error:

```
ValueError: Must have at least 1 validation dataset for early stopping.
```

What should happen is that cv_preds gets filled with 10 ndarrays of predicted y values.
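My current suspicion is that cross_val_predict never forwards an eval_set to fit, so the callback has no "validation_0" dataset to monitor. Below is a minimal sketch of the manual loop I think would be needed instead, reusing cvx and CustomEarlyStopping from above (untested against my real data, and note it reuses the held-out fold for both early stopping and prediction, which leaks information; a separate tuning split would be cleaner):

```
# A fresh callback per fold avoids carrying early-stopping state across fits.
X_arr, y_arr = np.asarray(X_train), np.asarray(y_train)
oof_preds = np.empty(len(y_arr), dtype=float)
for train_idx, val_idx in cvx.split(X_arr):
    fold_model = xgb.XGBRegressor(
        colsample_bytree=0.3, learning_rate=0.1, max_depth=10, alpha=10,
        n_estimators=500, n_jobs=-1, random_state=239,
        callbacks=[CustomEarlyStopping()],
        tree_method="approx", device="cpu",
    )
    # eval_set gives the callback its "validation_0" dataset to watch.
    fold_model.fit(X_arr[train_idx], y_arr[train_idx],
                   eval_set=[(X_arr[val_idx], y_arr[val_idx])],
                   verbose=False)
    oof_preds[val_idx] = fold_model.predict(X_arr[val_idx])
```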
