I'm trying to perform text classification with stacking. I'm new to ML, so apologies if this is a silly question. I'm trying to train the same algorithm, LogisticRegression, on different textual features to create different classifiers, and then use a meta-classifier (also LogisticRegression) to combine them. The features I'm using are the words in the text and the corresponding part-of-speech (POS) tags.
The classifier that uses words as a feature is defined with the following pipeline:
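    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    lr = LogisticRegression()
    words = make_pipeline(ColumnSelector(column='text'),
                          CountVectorizer(analyzer='word', token_pattern=r'\w{1,}', max_features=5000),
                          lr)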
The classifier that uses POS as a feature is defined with the following pipeline:
    pos = make_pipeline(ColumnSelector(column='pos'),
                        CountVectorizer(binary=True, ngram_range=(2, 3),
                                        max_features=5000),
                        lr)
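The definition of ColumnSelector is not shown above. As a minimal sketch, assuming it simply returns one DataFrame column by name, the words pipeline could be exercised on toy data like this. The class body, `toy_X`, and `toy_y` are hypothetical stand-ins for illustration, not the actual code or data:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: returns one DataFrame column by name."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        # Nothing to learn; the selector is stateless.
        return self

    def transform(self, X):
        # Lookup by name, which assumes X is a DataFrame (or dict-like).
        return X[self.column]


# Made-up toy data with the same two columns as X_train.
toy_X = pd.DataFrame({
    "text": ["good movie", "bad movie", "great film", "awful film"],
    "pos": ["JJ NN", "JJ NN", "JJ NN", "JJ NN"],
})
toy_y = [1, 0, 1, 0]

words = make_pipeline(
    ColumnSelector(column="text"),
    CountVectorizer(analyzer="word", token_pattern=r"\w{1,}", max_features=5000),
    LogisticRegression(),
)
words.fit(toy_X, toy_y)
predictions = words.predict(toy_X)
```

Note that a selector written this way depends on its input still being a DataFrame when `transform` is called.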
Finally, the metaclassifier is defined this way:
    from mlxtend.classifier import StackingCVClassifier

    sclf = StackingCVClassifier(classifiers=[words, pos],
                                meta_classifier=lr)
The problem comes when I try to train the classifier:
    classifiers = {"Words": words,
                   "POS": pos,
                   "Stack": sclf}
    for key, classifier in classifiers.items():
        classifier.fit(X_train, Y_train)
Words and POS are fitted successfully, but the Stack classifier is not, and I get the following error:

    IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
X_train is a DataFrame with a column "text" that contains the raw text and a column "pos" that contains the raw POS tags, which is why I apply the necessary transformations inside the pipelines.
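For reference, here is a minimal sketch of what such a DataFrame looks like, using made-up rows since the real data is not shown. It also shows that the same IndexError message appears when a plain NumPy array is indexed by column name. Whether StackingCVClassifier actually performs such a conversion internally is an assumption I have not verified:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for X_train.
X_train = pd.DataFrame({
    "text": ["the cat sat", "dogs bark loudly"],
    "pos": ["DT NN VBD", "NNS VBP RB"],
})

# Selecting a column by name works on the DataFrame itself.
text_column = X_train["text"]

# After conversion to a plain NumPy array the column names are gone,
# so the same lookup by name fails.
X_array = X_train.to_numpy()
try:
    X_array["text"]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

If the message printed here matches the one in the traceback, that would suggest the column lookup is being applied to an array rather than to the DataFrame.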
When I try the same with StackingClassifier, I don't have this problem. Any idea what's going wrong?
Thanks!