XGBClassifier enable_categorical parameter does not seem to be working

I was under the impression that the enable_categorical parameter allows me to skip any manual label encoding, but the error I'm getting seems to contradict that (I think).

The error seems to be triggered by calling the "fit" method on my "reg" object. Here is the error:

ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2 3 4 5], got ['Not Approved' 'Resolved-Approved' 'Resolved-Cancelled' 'Resolved-Not Approved' 'Resolved-Partially Approved' 'Resolved-Withdrawn']


import xgboost as xgb

FEATURES = ['Type', 'DivisionName', 'DepartmentName', 'WarehouseName', 'CategoryDesc']
TARGET = 'ClaimStatus'


X_train = train[FEATURES].astype('category')
y_train = train[TARGET].astype('category')


X_test = test[FEATURES].astype('category')
y_test = test[TARGET].astype('category')


reg = xgb.XGBClassifier(base_score=0.5, booster='gbtree',
                        n_estimators=1000,
                        early_stopping_rounds=50,
                        enable_categorical=True,
                        max_depth=5,
                        learning_rate=0.01)



reg.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        verbose=100)


1 Answer

Answered by Ben Reiniger:

enable_categorical doesn't affect the target type; it's for performing bipartition splits of categorical features:
https://xgboost.readthedocs.io/en/release_2.0.0/tutorials/categorical.html

You may use sklearn's LabelEncoder to encode the target; XGBClassifier, being specifically a classifier, will treat the resulting integers just as class labels. (I'm a little surprised it's needed though; all sklearn classifiers handle that internally. I thought I remembered some extra parameter to use/not use a label encoder, but I can't find it now...)
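
For example, a rough sketch of that approach (untested; it assumes the same train/test DataFrames and FEATURES/TARGET names from the question; le, clf, y_train_enc, y_test_enc, and pred_labels are placeholder names introduced here, and tree_method='hist' is added because the hist-based tree builders are the ones documented to support categorical splits):

from sklearn.preprocessing import LabelEncoder
import xgboost as xgb

# Encode the string target into integer class labels 0..n_classes-1
le = LabelEncoder()
y_train_enc = le.fit_transform(train[TARGET])
y_test_enc = le.transform(test[TARGET])

clf = xgb.XGBClassifier(n_estimators=1000,
                        early_stopping_rounds=50,
                        enable_categorical=True,  # still applies to the category-dtype feature columns
                        tree_method='hist',
                        max_depth=5,
                        learning_rate=0.01)

clf.fit(X_train, y_train_enc,
        eval_set=[(X_train, y_train_enc), (X_test, y_test_enc)],
        verbose=100)

# Map integer predictions back to the original ClaimStatus strings
pred_labels = le.inverse_transform(clf.predict(X_test))

The feature columns stay as pandas category dtype, since enable_categorical only governs how those columns are split; the target is the only thing that needs to become integers, and le.inverse_transform recovers the original status strings from the predictions.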