Define kernel in scikit GaussianProcessRegressor using BayesSearchCV


Question: How do I define the kernel of a Gaussian Process Regressor using BayesSearchCV?

I'm trying to optimize the hyperparameters of a Gaussian process model using BayesSearchCV from skopt. I seem to be defining the kernel incorrectly, because I get a TypeError:

TypeError: Cannot clone object ''rbf'' (type <class 'str'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.

Minimal example:

from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern

X,y = make_regression(100,10)

estimator = GaussianProcessRegressor()

param = {
    'kernel': ['rbf','matern'],
    'n_restarts_optimizer': (5,10),
    'alpha': (1e-5, 1e-2,'log-uniform')
}

opt = BayesSearchCV(
    estimator=estimator,
    search_spaces=param,
    cv=3,
    scoring="r2",
    random_state=42,
    n_iter=3,
    verbose=1,
)   

opt.fit(X, y)

Answer:

First, GaussianProcessRegressor does not seem to support string aliases for kernels, at least in the current release; the kernel parameter expects a kernel object such as RBF(). That raises another issue, however: if you pass a list of kernel objects as the search space for the kernel parameter, skopt is unable to process it (unhashable type). As far as I'm aware this is still an open issue, though a workaround is proposed at the bottom of the issue page.
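
The two failure modes described above can be checked without skopt. The following is a minimal sketch using only scikit-learn; the data set size and the variable names are arbitrary choices for illustration, and the exact exception class and message may differ between scikit-learn versions. The hashability check only prints what it finds, since that behaviour is what skopt appears to trip over but is not confirmed here.

```python
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Small synthetic data set, just to have something to fit.
X, y = make_regression(n_samples=20, n_features=3, random_state=0)

# 1) A string alias such as "rbf" is rejected when the model is fitted.
string_error = None
try:
    GaussianProcessRegressor(kernel="rbf").fit(X, y)
except TypeError as exc:  # exact exception subclass may vary by version
    string_error = exc
print("string kernel rejected:", string_error is not None)

# 2) A kernel object is accepted.
gpr = GaussianProcessRegressor(kernel=RBF()).fit(X, y)
print("fitted kernel:", gpr.kernel_)

# 3) Report whether a kernel object can be hashed on this installation.
try:
    hash(RBF())
    kernel_hashable = True
except TypeError:
    kernel_hashable = False
print("RBF() hashable:", kernel_hashable)
```

If the last line reports False, that would be consistent with the "unhashable type" error when kernel objects are placed directly in a skopt search space.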

Another possible workaround is to construct several base estimators, each with a specific kernel, wrap the model in a one-step pipeline, and let the search choose among the estimators:

from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern
from sklearn.pipeline import Pipeline
from skopt import BayesSearchCV
from skopt.space import Categorical

X, y = make_regression(100, 10)

# One candidate estimator per kernel; the search picks among these objects.
estimator_list = [GaussianProcessRegressor(kernel=RBF()),
                  GaussianProcessRegressor(kernel=Matern())]

# Wrap the model in a single-step pipeline so the whole step can be swapped.
pipe = Pipeline([('estimator', GaussianProcessRegressor())])

param = {
    'estimator': Categorical(estimator_list),         # which kernel to use
    'estimator__n_restarts_optimizer': (5, 10),       # integer range
    'estimator__alpha': (1e-5, 1e-2, 'log-uniform'),  # real range, log scale
}

opt = BayesSearchCV(
    estimator=pipe,
    search_spaces=param,
    cv=3,
    scoring="r2",
    random_state=42,
    n_iter=3,
    verbose=1,
)

opt.fit(X, y)
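
To see why the pipeline trick works, it may help to apply one point of the search space by hand. The sketch below uses only scikit-learn and assumes the same step name, 'estimator', as the code above; the parameter values are arbitrary. It illustrates that the 'estimator' key replaces the whole pipeline step, and that the 'estimator__...' keys are then set on the replacement.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.pipeline import Pipeline

pipe = Pipeline([("estimator", GaussianProcessRegressor())])

# Mimic a single sampled point: swap in the Matern-based estimator,
# then set the nested hyperparameters on that new step.
pipe.set_params(
    estimator=GaussianProcessRegressor(kernel=Matern()),
    estimator__alpha=1e-3,
    estimator__n_restarts_optimizer=5,
)

step = pipe.named_steps["estimator"]
print(type(step.kernel).__name__, step.alpha, step.n_restarts_optimizer)
```

After a search has finished, the selected estimator (and therefore its kernel) should be available under the 'estimator' key of opt.best_params_.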