XGBoost iterative training: Not having all 0,...,C labels in minibatch without erroring

182 Views Asked by Julian L At 17 August 2025 at 05:36

When training XGBoost iteratively on data too large to fit in memory, one may want to train on "batches". The problem, however, is that a given batch may not contain all of the labels 0,...,C. This leads to the error: ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class-1]

Is there a way to train XGBoost on a batch that contains only a subset of the labels, possibly without label zero?

The code has structure similar to this:

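import xgboost as xgb
from sklearn.metrics import accuracy_score

train = module.trainloader
test = module.valloader

# num_class is fixed for the whole dataset rather than inferred per batch.
# C is the largest label, so labels 0,...,C give C + 1 classes.
base_params = {
    'objective': 'multi:softmax',
    'num_class': C + 1,
    'learning_rate': 0.007,
}
# 'process_type': 'update' only modifies existing trees, so it is applied
# only after an initial model has been built.
update_params = {**base_params, 'process_type': 'update', 'updater': 'refresh'}

# Train on one minibatch to get started
sample = next(iter(train))
X = xgb.DMatrix(sample[0].numpy(), label=sample[1].numpy())

# Get initial model training (xgb.train defaults to 10 boosting rounds)
model = xgb.train(base_params, dtrain=X)

for i, (trainsample, valsample) in enumerate(zip(train, test)):
    X_train, y_train = trainsample
    X_test, y_test = valsample

    # The DMatrix keyword is `label`, not `labels`
    X_train = xgb.DMatrix(X_train.numpy(), label=y_train.numpy())
    X_test = xgb.DMatrix(X_test.numpy())

    # Refresh the existing trees using the new batch
    model = xgb.train(update_params, dtrain=X_train, xgb_model=model)

    # multi:softmax returns class indices as floats
    y_pred = model.predict(X_test).astype(int)
    accuracy = accuracy_score(y_test.numpy(), y_pred)

    print(accuracy)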
