I am trying to oversample with SMOTE and then fit an XGBClassifier inside a Pipeline, but I cannot get HyperOpt to play nicely with the Pipeline.
The two examples below both run properly:
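import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, tpe
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # assumed: imblearn's Pipeline, which accepts samplers such as SMOTE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

smote = SMOTE(random_state=42)
model = XGBClassifier(random_state=42)
pipe = Pipeline([('smote', smote),
                 ('model', model)])
cv = StratifiedKFold(n_splits=5)
score = cross_val_score(pipe, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()
print(score)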
model = XGBClassifier(random_state=42)
def objective_pipe(params):
    model.set_params(**params)
    cv = StratifiedKFold(n_splits=5)
    score = cross_val_score(model, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()
    return {'loss': -score, 'params': params, 'status': STATUS_OK}
trials = Trials()
best = fmin(fn=objective_pipe, space=params, algo=tpe.suggest, max_evals=10, trials=trials, rstate=np.random.RandomState(42))
However, as soon as I use the Pipeline inside the objective function, the score comes back as NaN.
smote = SMOTE(random_state=42)
model = XGBClassifier(random_state=42)
pipe = Pipeline([('smote', smote),
                 ('model', model)])
def objective_pipe(params):
    pipe.set_params(**params)
    cv = StratifiedKFold(n_splits=5)
    score = cross_val_score(pipe, X_train, y_train, cv=cv, scoring='roc_auc', n_jobs=-1).mean()
    return {'loss': -score, 'params': params, 'status': STATUS_OK}
trials = Trials()
best = fmin(fn=objective_pipe, space=params, algo=tpe.suggest, max_evals=10, trials=trials, rstate=np.random.RandomState(42))
Maybe I am just missing something really simple, but I am not sure how to get past this issue. Any suggestions, help, or resources are welcome.
I'm not exactly sure why, but I had a similar issue and it went away after setting n_jobs=1 (instead of n_jobs=-1). I suspect it has to do with SMOTE not running well in parallel.
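If it helps with debugging, here is a rough sketch of how the objective could be restructured. It is not a confirmed fix. It runs the cross-validation serially, as suggested above, and passes error_score="raise" so that a failing fit raises its real exception instead of being silently scored as NaN. It also rewrites the search-space keys into the step__parameter form that Pipeline.set_params expects, on the assumption that the space uses plain XGBClassifier parameter names and that the estimator step is called 'model' as in the question. The names prefix_params and make_objective are hypothetical helpers, not part of any library.

```python
def prefix_params(params, step="model"):
    """Return a copy of params with keys rewritten to 'step__name' form.

    Keys that already contain '__' are left untouched, so a search space
    can mix plain estimator parameters with ones aimed at other steps.
    """
    return {
        (key if "__" in key else f"{step}__{key}"): value
        for key, value in params.items()
    }


def make_objective(pipe, X, y, step="model", n_splits=5):
    """Build a hyperopt objective around a pipeline (hypothetical helper)."""
    # Imported lazily so prefix_params stays usable without these packages.
    from hyperopt import STATUS_OK
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def objective(params):
        # Route each sampled value to the named pipeline step.
        pipe.set_params(**prefix_params(params, step))
        cv = StratifiedKFold(n_splits=n_splits)
        scores = cross_val_score(
            pipe, X, y, cv=cv, scoring="roc_auc",
            n_jobs=1,             # serial, as in the workaround above
            error_score="raise",  # surface the real error instead of NaN
        )
        # hyperopt minimizes the loss, so negate the mean ROC AUC.
        return {"loss": -scores.mean(), "params": params, "status": STATUS_OK}

    return objective
```

It would then be used as fmin(fn=make_objective(pipe, X_train, y_train), space=params, ...). Once the underlying exception is visible and dealt with, n_jobs can be raised again to check whether parallelism was really the culprit.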