TPOT best pipeline has no predict_proba() - how to prevent falling over?


I am running 5-fold cross-validation on a dataset using TPOT from a Jupyter notebook:

scores = []
preds = []
actual_labels = []
# Initialise the 5-fold cross-validation
kf = StratifiedKFold(n_splits=5,shuffle=True)
for train_index, test_index in kf.split(X, y):
    # Generate the training and test partitions of X and y for each iteration of CV
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # TPOT is an AutoML system that automatically searches for the best pipeline for the task
    estimator = TPOTClassifier(generations=5, population_size=50, cv=5, random_state=42, verbosity=2, n_jobs=10)
 
    # As TPOT is an AutoML system, it does its own tuning rather than using grid search
    estimator.fit(X_train, y_train)

    # Predicting the test data with the optimised models
    predictions = estimator.predict(X_test)
    score = metrics.f1_score(y_test, predictions)
    scores.append(score)

    # Extract the predicted probabilities of the 2nd class, which will be used to generate the PR curve
    probs = estimator.predict_proba(X_test)[:,1]
    preds.extend(probs)
    actual_labels.extend(y_test)

In one of the 5 runs the best pipeline is:

Best pipeline: SGDClassifier(ZeroCount(input_matrix), alpha=0.001, eta0=0.01, fit_intercept=False, l1_ratio=0.0, learning_rate=invscaling, loss=squared_hinge, penalty=elasticnet, power_t=1.0)

Because the loss is 'squared_hinge', the fitted SGDClassifier has no predict_proba() method, and the whole process falls over. If I were building the classifier by hand, I understand that I'd need to change the loss to e.g. 'modified_huber', but how can I prevent TPOT from falling over because of this?
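One possible workaround, offered as a sketch rather than a confirmed fix: check whether the winning pipeline exposes predict_proba() and, if it does not, fall back to decision_function(). The helper name positive_class_scores is hypothetical, and the code assumes the classic TPOT API, where the fitted TPOTClassifier stores the winning scikit-learn pipeline in fitted_pipeline_. It also assumes a binary problem with the positive class in column 1, as in the loop above.

```python
def positive_class_scores(estimator, X):
    """Return (scores, kind) with one positive-class score per row of X.

    kind is "proba" when predict_proba was used and "margin" when the
    helper had to fall back to decision_function.
    """
    # Assumption: a fitted TPOTClassifier keeps the winning pipeline in
    # fitted_pipeline_. A plain scikit-learn estimator is used as-is.
    model = getattr(estimator, "fitted_pipeline_", None)
    if model is None:
        model = estimator

    if hasattr(model, "predict_proba"):
        # Column 1 holds the probability of the second (positive) class.
        return [row[1] for row in model.predict_proba(X)], "proba"

    if hasattr(model, "decision_function"):
        # Signed distances to the decision boundary, NOT probabilities.
        return list(model.decision_function(X)), "margin"

    raise TypeError("model has neither predict_proba nor decision_function")
```

Inside the loop, the predict_proba line would then become probs, kind = positive_class_scores(estimator, X_test). Margins rank samples just as well as probabilities for a single precision-recall curve, but they are on a different scale, so pooling "proba" folds and "margin" folds into one preds list is probably not meaningful. Computing one PR curve per fold, or calibrating the margin-based pipeline (for example with scikit-learn's CalibratedClassifierCV), may be safer.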
