Imblearn Pipeline and HyperOpt Issue

Currently I am trying to oversample with SMOTE and then run my XGBClassifier in the Pipeline. For some reason I cannot get HyperOpt to play nice with the Pipeline.

The two examples below both run properly:

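import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, tpe
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# Example 1: SMOTE + XGBClassifier in a Pipeline, plain cross-validation
smote = SMOTE(random_state=42)
model = XGBClassifier(random_state=42)
pipe = Pipeline([('smote', smote),
                 ('model', model)])

cv = StratifiedKFold(n_splits=5)

score = cross_val_score(pipe, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()

print(score)

# Example 2: HyperOpt tuning the bare XGBClassifier (no Pipeline)
model = XGBClassifier(random_state=42)

def objective_pipe(params):
    model.set_params(**params)

    cv = StratifiedKFold(n_splits=5)

    score = cross_val_score(model, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()

    return {'loss': -score, 'params': params, 'status': STATUS_OK}

# `params` passed as `space` is the HyperOpt search space (definition not shown)
trials = Trials()
best = fmin(fn=objective_pipe, space=params, algo=tpe.suggest, max_evals=10, trials=trials, rstate=np.random.RandomState(42))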

However, the moment I put the Pipeline inside the objective function, the score comes back as NaN.

smote = SMOTE(random_state=42)
model = XGBClassifier(random_state=42)
pipe = Pipeline([('smote', smote),
                 ('model', model)])

def objective_pipe(params):
    pipe.set_params(**params)

    cv = StratifiedKFold(n_splits=5)

    score = cross_val_score(pipe, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()

    return {'loss': -score, 'params': params, 'status': STATUS_OK}

trials = Trials()
best = fmin(fn=objective_pipe, space=params, algo=tpe.suggest, max_evals=10, trials=trials, rstate=np.random.RandomState(42))

Maybe I am just missing something really simple, but I am not sure how to get past this issue. Any suggestions, help, or resources are welcome.

There is 1 solution below

I'm not exactly sure why, but I had a similar issue and it went away after setting n_jobs=1. I think it has to do with SMOTE's inability to run in parallel.
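For reference, here is a minimal sketch of what that workaround might look like in the objective function. It rests on a few assumptions that are not confirmed by the question: the helper names prefix_params and make_objective are hypothetical, and the search space is assumed to use bare XGBClassifier parameter names (for example max_depth), which a Pipeline only accepts in the form model__max_depth. The error_score="raise" argument is an extra that is not part of the workaround itself. It makes cross_val_score re-raise the exception from a failed fit rather than recording NaN, which may help pin down the actual cause.

```python
def prefix_params(params, step="model"):
    """Map {'max_depth': 3} to {'model__max_depth': 3} for Pipeline.set_params."""
    return {f"{step}__{name}": value for name, value in params.items()}


def make_objective(pipe, X, y, n_splits=5):
    """Build a HyperOpt objective that cross-validates `pipe` serially."""
    # Imported lazily so prefix_params stays usable without the ML stack installed.
    from hyperopt import STATUS_OK
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def objective(params):
        # Assumption: `params` holds bare estimator names, so they are
        # re-addressed as '<step>__<param>' before reaching the Pipeline.
        pipe.set_params(**prefix_params(params))
        cv = StratifiedKFold(n_splits=n_splits)
        scores = cross_val_score(
            pipe, X, y, cv=cv, scoring="roc_auc",
            n_jobs=1,             # the workaround: run the folds serially
            error_score="raise",  # re-raise fit errors instead of scoring NaN
        )
        return {"loss": -scores.mean(), "params": params, "status": STATUS_OK}

    return objective
```

With the names from the question, this would be passed as fn=make_objective(pipe, X_train, y_train) in the fmin call. If the search space already uses model__-prefixed keys, the prefix_params step should be dropped.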