I am trying to create a subclass of sklearn.svm.LinearSVC for use as an estimator in sklearn.model_selection.GridSearchCV. The child class has an extra method which, in this example, doesn't do anything. However, when I run this I end up with an error that I just can't seem to debug. If you copy-paste the code and run it, it should reproduce the full error, which ends with ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
Once I get this working, I hope to add more functionality to the method transform_this().
Can someone please tell me where I have gone wrong? Based on this, I first thought it was due to some issue with my data. However, since I've reproduced it using a built-in sklearn dataset, that seems not to be the case. I also believe I'm subclassing properly, based on the response I got to my previous question here. Finally, I learnt that GridSearchCV seems to initialise the estimator in its own way (somehow it first uses the default arguments, as I see from this post).
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

RANDOM_STATE = 123


class LinearSVCSub(LinearSVC):

    def __init__(self, penalty='l2', loss='squared_hinge', additional_parameter1=1, additional_parameter2=100,
                 dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1,
                 class_weight=None, verbose=0, random_state=None, max_iter=1000):
        super(LinearSVCSub, self).__init__(penalty=penalty, loss=loss, dual=dual, tol=tol,
                                           C=C, multi_class=multi_class, fit_intercept=fit_intercept,
                                           intercept_scaling=intercept_scaling, class_weight=class_weight,
                                           verbose=verbose, random_state=random_state, max_iter=max_iter)
        self.additional_parameter1 = additional_parameter1
        self.additional_parameter2 = additional_parameter2

    def fit(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        super(LinearSVCSub, self).fit(X, y, sample_weight)

    def predict(self, X):
        X = self.transform_this(X)
        super(LinearSVCSub, self).predict(X)

    def score(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        super(LinearSVCSub, self).score(X, y, sample_weight)

    def decision_function(self, X):
        X = self.transform_this(X)
        super(LinearSVCSub, self).decision_function(X)

    def transform_this(self, X):
        return X


if __name__ == '__main__':
    data = load_breast_cancer()
    X, y = data.data, data.target

    # Parameter tuning with custom LinearSVC
    param_grid = {'C': [0.00001, 0.0001, 0.0005],
                  'dual': (True, False), 'random_state': [RANDOM_STATE],
                  'additional_parameter1': [0.90, 0.80, 0.60, 0.30],
                  'additional_parameter2': [20, 30]}
    gs_model = GridSearchCV(estimator=LinearSVCSub(), verbose=1, param_grid=param_grid,
                            scoring='roc_auc', n_jobs=-1)
    gs_model.fit(X, y)
You've got a couple of problems:
LinearSVC
As soon as you correct for those you're fine to go:
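A minimal sketch of a corrected version, assuming the main issue is that the overridden methods call the parent implementation without returning its result (so fit() returns None instead of self, and decision_function() hands None to the roc_auc scorer). The run_search helper and its n_jobs default are illustrative and not part of the original code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

RANDOM_STATE = 123


class LinearSVCSub(LinearSVC):

    def __init__(self, penalty='l2', loss='squared_hinge', additional_parameter1=1,
                 additional_parameter2=100, dual=True, tol=0.0001, C=1.0,
                 multi_class='ovr', fit_intercept=True, intercept_scaling=1,
                 class_weight=None, verbose=0, random_state=None, max_iter=1000):
        super(LinearSVCSub, self).__init__(
            penalty=penalty, loss=loss, dual=dual, tol=tol, C=C,
            multi_class=multi_class, fit_intercept=fit_intercept,
            intercept_scaling=intercept_scaling, class_weight=class_weight,
            verbose=verbose, random_state=random_state, max_iter=max_iter)
        self.additional_parameter1 = additional_parameter1
        self.additional_parameter2 = additional_parameter2

    def fit(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        super(LinearSVCSub, self).fit(X, y, sample_weight)
        return self  # scikit-learn expects fit() to return the estimator

    def predict(self, X):
        X = self.transform_this(X)
        return super(LinearSVCSub, self).predict(X)

    def score(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        return super(LinearSVCSub, self).score(X, y, sample_weight)

    def decision_function(self, X):
        X = self.transform_this(X)
        return super(LinearSVCSub, self).decision_function(X)

    def transform_this(self, X):
        # Placeholder, as in the question: returns X unchanged.
        return X


def run_search(X, y, n_jobs=None):
    """Hypothetical helper: runs the question's grid search and returns it fitted."""
    param_grid = {'C': [0.00001, 0.0001, 0.0005],
                  'dual': (True, False), 'random_state': [RANDOM_STATE],
                  'additional_parameter1': [0.90, 0.80, 0.60, 0.30],
                  'additional_parameter2': [20, 30]}
    gs_model = GridSearchCV(estimator=LinearSVCSub(), verbose=1,
                            param_grid=param_grid, scoring='roc_auc',
                            n_jobs=n_jobs)
    return gs_model.fit(X, y)


if __name__ == '__main__':
    data = load_breast_cancer()
    gs_model = run_search(data.data, data.target)
    print(gs_model.best_params_)
    print(gs_model.best_score_)
```

One design point to keep in mind once transform_this() does real work: in scikit-learn's linear classifiers, predict() calls decision_function() internally, and score() calls predict(), so with all four overrides the transform would be applied more than once in a single call. With the identity transform that is harmless; otherwise, overriding only fit() and decision_function() may be enough.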