LGBMClassifier + Unbalanced data + GridSearchCV()


The dependent variable is binary, the classes are unbalanced at about 1:10, the dataset has 70k rows, and the scoring metric is ROC AUC. I'm trying to use LGBMClassifier with GridSearchCV to get a model. However, I'm struggling with the parameters: sometimes they are not recognized, even when I use them as the documentation shows:

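from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values to try for each LGBMClassifier parameter
params = {'num_leaves': [10, 12, 14, 16],
          'max_depth': [4, 5, 6, 8, 10],
          'n_estimators': [50, 60, 70, 80],
          'is_unbalance': [True]}

# X_train and y_train are my training features and binary target (not shown)
best_classifier = GridSearchCV(LGBMClassifier(), params, cv=3, scoring="roc_auc")
best_classifier.fit(X_train, y_train)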

So:

  • What is the difference between putting the parameters in GridSearchCV() and putting them in params?
  • As the data is unbalanced, I'm trying to use ROC AUC as the scoring metric, since it is a metric that copes with unbalanced data. Should I pass it as the argument scoring="roc_auc", or put it in the params argument?

There is 1 answer below


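The difference between putting parameters in GridSearchCV() and putting them in params is described in the GridSearchCV docs. Arguments of GridSearchCV() itself (such as cv and scoring) configure the search. Whatever you put in params (the param_grid argument) is the set of estimator settings to try: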

Dictionary with parameters names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
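To make the quoted description concrete, here is a small sketch using scikit-learn's ParameterGrid, which enumerates the combinations a parameter grid spans. It reuses the grid from the question; the two sub-grids in the list-of-dictionaries example are made up for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Grid from the question: each key is an estimator parameter name,
# each value is the list of settings to try for it.
params = {
    "num_leaves": [10, 12, 14, 16],
    "max_depth": [4, 5, 6, 8, 10],
    "n_estimators": [50, 60, 70, 80],
    "is_unbalance": [True],
}

grid = ParameterGrid(params)
n_candidates = len(grid)       # 4 * 5 * 4 * 1 combinations
n_cv_fits = n_candidates * 3   # with cv=3: one fit per fold per candidate

print(n_candidates, n_cv_fits)

# A list of dictionaries spans several independent grids.
# These sub-grids are illustrative only, not taken from the question.
split_grid = ParameterGrid([
    {"num_leaves": [10, 12], "max_depth": [4, 5]},
    {"num_leaves": [14, 16], "max_depth": [8]},
])
n_split = len(split_grid)      # 2 * 2 + 2 * 1 combinations
print(n_split)
```

Since is_unbalance has a single value here, listing it in the grid does not add candidates; it only fixes that setting for every combination.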

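As for scoring: it is an argument of GridSearchCV() itself, as in your code (scoring="roc_auc"), not a key of params. Keys in params are treated as estimator parameters and handed to LGBMClassifier, so a "scoring" entry there would not select the evaluation metric.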