I'm using a StackingClassifier in sklearn, where I want the component models to be custom classifiers. In order to do this, I wanted to test it out with some dummy code where the custom classifier is the exact same as an already existing model (KNN, in this example). However this throws an error, and I'm not sure I understand why, and looking for help with this. It's probably something fairly obvious (I'm new to trying to write custom classifiers and using ClassiferMixIn), but I can't seem to figure out what I'm missing:
Code -- the baseline example without my custom class (works):
from sklearn.ensemble import StackingClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = StackingClassifier(estimators=[
('tree', Pipeline([('tree', DecisionTreeClassifier(random_state=42))])),
('knn', Pipeline([('knn', KNeighborsClassifier())])),
])
model.fit(X, y)
Code -- the with my custom class (doesn't work):
class MyOwnClassifier(ClassifierMixin):
def __init__(self,classifier):
self.classifier = classifier
def fit(self, X, y):
self.classifier.fit(X,y)
return self
def predict(self, X):
return self.classifier.predict(X)
def predict_proba(self, X):
return self.classifier.predict_proba(X)
model = StackingClassifier(estimators=[
('tree', Pipeline([('tree', DecisionTreeClassifier(random_state=42))])),
('knn', Pipeline([('knn', MyOwnClassifier(KNeighborsClassifier()))])),
])
model.fit(X, y)
returns the error
AttributeError: 'MyOwnClassifier' object has no attribute 'classes_'
What really puzzles me about this is that in this answer, an identity transform could be used as part of the pipeline, and I can't imagine that object had 'classes_' either.
You've got 3 problems with your code:
StackingClassifier
expects an attributeclasses_
to be available on a fitted classifier, which is clearly stated in the error message. The linked example does have it, whereas yours doesn't. It can be checked if you run likedir(MyOwnClassifier(KNeighborsClassifier()).fit(X,y))
.BaseEstimator
is missing from your class definition (you can do without it, but its presence makes life easier)Pipelines
in you code are extraneous clutter that are not necessary to debug your code and only complicating debugging.Once you correct these problems you have a working code: