My data is tabular, with samples in rows and features in columns.
I built a binary classification ensemble with PyCaret, combining the "rf", "knn", "nb", "lr", and "et" models, and I want to obtain sample-specific feature importance with LIME for a few selected samples, not for all of them.
My model's pipeline is as follows.
<bound method Pipeline.get_params of Pipeline(memory=FastMemory(location=/tmp/joblib),
steps=[('numerical_imputer',
TransformerWrapper(include=['Neurons_xcell', 'HSC_xcell',
'Th2.cells_xcell',
'Neutr...
VotingClassifier(estimators=[('Extra Trees Classifier',
ExtraTreesClassifier(n_jobs=20,
random_state=42)),
('Random Forest Classifier',
RandomForestClassifier(n_jobs=20,
random_state=42)),
('K Neighbors Classifier',
KNeighborsClassifier(n_jobs=20)),
('Naive Bayes', GaussianNB()),
('Logistic Regression',
LogisticRegression(max_iter=1000,
random_state=42))],
n_jobs=20, voting='soft'))])>
And here is my code.
import sys, os
import lime
import lime.lime_tabular
from pycaret.classification import *
from pycaret.classification import load_model
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
within_model = load_model("best_model")
df = pd.read_csv("my_input_data.csv", index_col=0)
train,test = train_test_split(df, test_size=0.2, random_state=62, stratify=df["resistance"])
def prob(data):
    return np.array(list(zip(1 - within_model.predict(data), within_model.predict(data))))
explainer = lime.lime_tabular.LimeTabularExplainer(train.values,feature_names = train.columns, class_names=['resistance', 'non-resistance'], kernel_width=5, mode="classification")
i = 1
exp = explainer.explain_instance(test.values[i], prob, num_features = len(test.columns))
However, I get an error at the explainer step: KeyError: "None of [Index([''], dtype='object')] are in the [columns]"
Other posts attribute this error to spaces in the column names, but my column names contain no spaces.
Additionally, when I trained a plain sklearn.ensemble.RandomForestClassifier instead of the PyCaret ensemble model, no error occurred.
Could you please explain why this error occurs with the PyCaret ensemble and how to resolve it so that I can obtain the feature importance?
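A likely explanation, offered as a sketch rather than a confirmed diagnosis: LIME passes a plain NumPy array to the prediction function, while the first pipeline step (TransformerWrapper with include=[...]) selects columns by name, so the names are lost before the pipeline sees the data. A bare RandomForestClassifier accepts arrays directly, which would be consistent with it not failing. The example uses a small scikit-learn pipeline as a stand-in for the loaded PyCaret model; the toy data, the column names f1 and f2, and the helper predict_proba_for_lime are all hypothetical. The helper rebuilds a named DataFrame from the array and returns predict_proba output, since LIME's classification mode expects class probabilities rather than the labels returned by predict.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real feature table.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(60, 2)), columns=["f1", "f2"])
y = (X["f1"] + X["f2"] > 0).astype(int)

# Stand-in for the PyCaret pipeline (an assumption, not the real model):
# the first step selects columns by name, so it needs a DataFrame.
model = Pipeline([
    ("numerical_imputer",
     ColumnTransformer([("imp", SimpleImputer(), ["f1", "f2"])])),
    ("clf", VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
            ("lr", LogisticRegression(max_iter=1000, random_state=42)),
        ],
        voting="soft")),
])
model.fit(X, y)

feature_names = list(X.columns)

def predict_proba_for_lime(data):
    """Restore column names on LIME's NumPy input and return class probabilities."""
    frame = pd.DataFrame(np.atleast_2d(data), columns=feature_names)
    return model.predict_proba(frame)

# LIME would call the helper with a 2-D array of perturbed samples like this.
probs = predict_proba_for_lime(X.values[:5])
```

With the real model, the same kind of wrapper would be passed as the second argument of explainer.explain_instance, assuming the loaded pipeline exposes predict_proba. The explainer would also need to be built from the feature columns only, for example train.drop(columns="resistance"), because train.values in the code above still contains the resistance target column.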
Alternatively, could you recommend another method to extract sample-specific feature importance from my model, besides LIME?
I've also tried the fixes suggested in other posts, but the same error persists.
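As one possible alternative to LIME, a simple occlusion-style check can give a rough per-sample importance: replace one feature at a time with its mean over a background set and record how much the predicted probability changes. This is only a minimal sketch, not a substitute for LIME or for SHAP's KernelExplainer (another commonly used option); the function occlusion_importance and the toy model toy_predict_proba are hypothetical names introduced for illustration.

```python
import numpy as np

def occlusion_importance(predict_proba, row, background, class_index=1):
    """Drop in P(class) when each feature of `row` is replaced by its background mean."""
    row = np.asarray(row, dtype=float)
    baseline = np.asarray(background, dtype=float).mean(axis=0)
    reference = predict_proba(row[None, :])[0, class_index]
    # One perturbed copy of the row per feature, with only that feature replaced.
    perturbed = np.tile(row, (row.size, 1))
    idx = np.arange(row.size)
    perturbed[idx, idx] = baseline
    return reference - predict_proba(perturbed)[:, class_index]

# Hypothetical toy model: the probability depends on the first feature only.
def toy_predict_proba(data):
    p = 1.0 / (1.0 + np.exp(-2.0 * data[:, 0]))
    return np.column_stack([1.0 - p, p])

background = np.zeros((10, 3))
scores = occlusion_importance(toy_predict_proba, [1.0, 5.0, -3.0], background)
```

In this toy case only the first score is non-zero, because the toy model ignores the other two features. For a pipeline that selects columns by name, predict_proba would need to be a wrapper that restores the column names before calling the model.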