Not able to fit metaclassifier with StackingCVClassifier


I'm trying to perform text classification with stacking. I'm new to ML, so apologies if this is a silly question. I want to train the same algorithm, LogisticRegression, on different textual features to create several classifiers, and then use a meta-classifier (also LogisticRegression) to combine them. The features I'm using are the words in the text and the corresponding part-of-speech (POS) tags.

The classifier that uses words as a feature is defined with the following pipeline:

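from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

lr = LogisticRegression()

words = make_pipeline(ColumnSelector(column='text'),
                      CountVectorizer(analyzer='word', token_pattern=r'\w{1,}', max_features=5000),
                      lr)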

The classifier that uses POS as a feature is defined with the following pipeline:

pos = make_pipeline(ColumnSelector(column='pos'),
                    CountVectorizer(binary=True, ngram_range=(2, 3), max_features=5000),
                    lr)

Finally, the metaclassifier is defined this way:

sclf = StackingCVClassifier(classifiers=[words, pos], 
                            meta_classifier=lr)

The problem comes when I try to train the classifier:

classifiers = {"Words": words,
               "POS": pos,
               "Stack": sclf}
for name, classifier in classifiers.items():
    classifier.fit(X_train, Y_train)

Words and POS are fitted without problems, but fitting the Stack classifier fails with the following error:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

X_train is a DataFrame with a column "text" that contains the raw text and a column "pos" that contains the raw POS tags; that's why I apply the necessary transformations through the pipelines.
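For reference, this exact IndexError is what NumPy raises when a plain array is indexed with a column name. The sketch below reproduces the message on toy data. It assumes, without this being confirmed anywhere in the question, that the DataFrame is turned into a NumPy array somewhere before the column selector runs; the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_train: same column names, invented contents.
df = pd.DataFrame({"text": ["a cat sat", "dogs bark"],
                   "pos": ["DT NN VBD", "NNS VBP"]})

# Selecting by name works on the DataFrame itself.
text_column = df["text"]

# After conversion to a plain array, the column names are gone.
arr = df.to_numpy()

try:
    arr["text"]  # string index on an ndarray
except IndexError as exc:
    message = str(exc)
    print(message)
```

If the printed message matches the one above, the name-based column lookup is likely being applied to an array rather than to the original DataFrame.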

When I do the same with the StackingClassifier class, I don't have this problem. Any idea what's going wrong?
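One possible direction, sketched under assumptions: select columns by integer position instead of by name, so the selector behaves the same whether it receives a DataFrame or a plain array. `PositionalColumnSelector` is a hypothetical helper written for this illustration, not part of scikit-learn or mlxtend, and it has not been checked against any particular mlxtend version.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class PositionalColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: return one column, chosen by integer position."""

    def __init__(self, index=0):
        self.index = index

    def fit(self, X, y=None):
        # Nothing to learn; present so the class works as a pipeline step.
        return self

    def transform(self, X):
        # np.asarray accepts DataFrames, arrays and nested lists alike,
        # so the lookup no longer depends on column names.
        return np.asarray(X, dtype=object)[:, self.index]


# Usage on a plain array with the same layout as the question's data:
# column 0 holds the text, column 1 holds the POS tags.
X = np.array([["a cat sat", "DT NN VBD"],
              ["dogs bark", "NNS VBP"]], dtype=object)
pos_tags = PositionalColumnSelector(index=1).transform(X)
```

In the pipelines above, this would take the place of ColumnSelector(column='text') and ColumnSelector(column='pos'), with index=0 and index=1 respectively, assuming the columns appear in that order in X_train.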

Thanks!
