XGBoost iterative training: Not having all 0,...,C labels in minibatch without erroring


When training XGBoost iteratively on data too large to fit in memory, one may want to use "batches". The problem, however, is that a given batch may not contain every label in 0,...,C. This leads to the error ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class-1].

Is there a way to train XGBoost where we just have some subset of the labels, which may not contain zero?

The code has structure similar to this:

import xgboost as xgb
from sklearn.metrics import accuracy_score

train = module.trainloader
test = module.valloader

# Train on one minibatch to get started
sample = next(iter(train))
X = xgb.DMatrix(sample[0].numpy(), label=sample[1].numpy())

params = {
    'learning_rate': 0.007,
    'updater':'refresh',
    'process_type': 'update',
}

# Get initial model training 
model = xgb.train(params, dtrain=X)

for i, (trainsample, valsample) in enumerate(zip(train, test)):
    X_train, y_train = trainsample
    X_test, y_test = valsample
    
    X_train = xgb.DMatrix(X_train, label=y_train)
    
    X_test = xgb.DMatrix(X_test)

    model = xgb.train(params, dtrain=X_train, xgb_model=model)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    print(accuracy)
