Stacking classifier: Using custom classifier returns error


I'm using a StackingClassifier in sklearn, and I want the component models to be custom classifiers. To test this, I wrote some dummy code in which the custom classifier simply wraps an existing model (KNN, in this example). However, this throws an error, and I don't understand why. It's probably something fairly obvious (I'm new to writing custom classifiers and using ClassifierMixin), but I can't figure out what I'm missing:

Code -- the baseline example without my custom class (works):

from sklearn.ensemble import StackingClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

model = StackingClassifier(estimators=[
        ('tree', Pipeline([('tree', DecisionTreeClassifier(random_state=42))])),
        ('knn', Pipeline([('knn', KNeighborsClassifier())])),
    ])

model.fit(X, y)

Code -- with my custom class (doesn't work):

from sklearn.base import ClassifierMixin

class MyOwnClassifier(ClassifierMixin):
    def __init__(self,classifier):
        self.classifier = classifier
    
    def fit(self, X, y):
        self.classifier.fit(X,y)
        return self 
    
    def predict(self, X):
        return self.classifier.predict(X)
    
    def predict_proba(self, X):
        return self.classifier.predict_proba(X)

model = StackingClassifier(estimators=[
        ('tree', Pipeline([('tree', DecisionTreeClassifier(random_state=42))])),
        ('knn', Pipeline([('knn', MyOwnClassifier(KNeighborsClassifier()))])),
    ])

model.fit(X, y)

returns the error

AttributeError: 'MyOwnClassifier' object has no attribute 'classes_'

What really puzzles me about this is that in this answer, an identity transform could be used as part of the pipeline, and I can't imagine that object had 'classes_' either.

Best answer:

Your code has three problems:

  1. StackingClassifier expects a fitted classifier to expose a classes_ attribute, as the error message states. The linked example has it, whereas yours doesn't. You can check this by running dir(MyOwnClassifier(KNeighborsClassifier()).fit(X,y)).

  2. BaseEstimator is missing from your class definition (you can do without it, but its presence makes life easier: it supplies get_params and set_params, which scikit-learn relies on when cloning estimators).

  3. The Pipelines in your code are extraneous clutter: they are not needed to reproduce the problem and only complicate debugging.
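To make the first point concrete, here is a minimal sketch of the classes_ check. The class name WrapperWithoutClasses is hypothetical (a stand-in for the question's wrapper), and the pipeline with an identity FunctionTransformer is only an assumed approximation of the linked example, which is not reproduced here. A Pipeline exposes the classes_ of its final step, which is why an identity transform inside it does not need the attribute itself.

```python
from sklearn.base import ClassifierMixin
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

X, y = load_breast_cancer(return_X_y=True)


class WrapperWithoutClasses(ClassifierMixin):
    """Hypothetical stand-in for the question's wrapper: fit() never sets classes_."""

    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y):
        self.classifier.fit(X, y)
        return self


# The question's situation: the inner KNN is fitted, but the wrapper exposes nothing.
fitted_wrapper = WrapperWithoutClasses(KNeighborsClassifier()).fit(X, y)

# Assumed shape of the linked example: identity transform followed by a classifier.
fitted_pipeline = Pipeline([
    ("identity", FunctionTransformer()),  # no func given, so it passes X through
    ("knn", KNeighborsClassifier()),
]).fit(X, y)

wrapper_has_classes = hasattr(fitted_wrapper, "classes_")
pipeline_has_classes = hasattr(fitted_pipeline, "classes_")
inner_has_classes = hasattr(fitted_wrapper.classifier, "classes_")

print(wrapper_has_classes, pipeline_has_classes, inner_has_classes)
```

The wrapper is the only object of the three that lacks classes_, even though the classifier it wraps has it.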

Once you correct these problems, you have working code:

from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.base import ClassifierMixin, BaseEstimator

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

class MyOwnClassifier(ClassifierMixin, BaseEstimator):
    
    def __init__(self,classifier):
        self.classifier = classifier
        
    def fit(self, X, y):
        self.classifier.fit(X,y)
        # expose the fitted classes; StackingClassifier looks for this attribute
        self.classes_ = self.classifier.classes_
        return self
    
    def predict(self, X):
        return self.classifier.predict(X)
    
    def predict_proba(self, X):
        return self.classifier.predict_proba(X)

model = StackingClassifier(estimators=[
        ('tree', DecisionTreeClassifier(random_state=42)),
        ('knn', MyOwnClassifier(KNeighborsClassifier()))])

model.fit(X,y)
StackingClassifier(estimators=[('tree',
                                DecisionTreeClassifier(random_state=42)),
                               ('knn',
                                MyOwnClassifier(classifier=KNeighborsClassifier()))])
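
As an optional variation (not part of the answer above), the wrapper could fit a clone of the wrapped estimator and store it under a trailing-underscore name, which follows the usual scikit-learn convention of leaving constructor arguments untouched. The names CloningWrapper and classifier_ are hypothetical, and this is only a sketch under that assumption:

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)


class CloningWrapper(ClassifierMixin, BaseEstimator):
    """Hypothetical variant of MyOwnClassifier that fits a clone of its argument."""

    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y):
        # Fit a fresh copy so the object passed to __init__ stays unfitted.
        self.classifier_ = clone(self.classifier).fit(X, y)
        # Expose the classes seen during fit, as StackingClassifier expects.
        self.classes_ = self.classifier_.classes_
        return self

    def predict(self, X):
        return self.classifier_.predict(X)

    def predict_proba(self, X):
        return self.classifier_.predict_proba(X)


model = StackingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("knn", CloningWrapper(KNeighborsClassifier())),
])
model.fit(X, y)

proba = model.predict_proba(X[:5])       # one row per sample, one column per class
fitted_knn = model.named_estimators_["knn"]
print(proba.shape, fitted_knn.classes_)
```

Because BaseEstimator supplies get_params and set_params, StackingClassifier can clone the wrapper directly, without the Pipeline around it.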