LIME, KeyError: "None of [Index([''], dtype='object')] are in the [columns]"

Asked by tony on 15 January 2024 at 08:11 (38 views)

My data is tabular, with samples in rows and features in columns.

I built a binary classification ensemble with PyCaret, combining five models ("rf", "knn", "nb", "lr", "et"), and I want to use LIME to obtain feature importances for specific samples (not for all samples).

My model's pipeline is as follows.

<bound method Pipeline.get_params of Pipeline(memory=FastMemory(location=/tmp/joblib),
         steps=[('numerical_imputer',
                 TransformerWrapper(include=['Neurons_xcell', 'HSC_xcell',
                                             'Th2.cells_xcell',
                                             'Neutr...
                 VotingClassifier(estimators=[('Extra Trees Classifier',
                                               ExtraTreesClassifier(n_jobs=20,
                                                                    random_state=42)),
                                              ('Random Forest Classifier',
                                               RandomForestClassifier(n_jobs=20,
                                                                      random_state=42)),
                                              ('K Neighbors Classifier',
                                               KNeighborsClassifier(n_jobs=20)),
                                              ('Naive Bayes', GaussianNB()),
                                              ('Logistic Regression',
                                               LogisticRegression(max_iter=1000,
                                                                  random_state=42))],
                                  n_jobs=20, voting='soft'))])>

And here is my code.

import lime.lime_tabular
import numpy as np
import pandas as pd
from pycaret.classification import load_model
from sklearn.model_selection import train_test_split


within_model = load_model("best_model")
df = pd.read_csv("my_input_data.csv", index_col=0)

train,test = train_test_split(df, test_size=0.2, random_state=62, stratify=df["resistance"])

def prob(data):
    return np.array(list(zip(1-within_model.predict(data), within_model.predict(data))))
    
explainer = lime.lime_tabular.LimeTabularExplainer(
    train.values,
    feature_names=train.columns,
    class_names=['resistance', 'non-resistance'],
    kernel_width=5,
    mode="classification",
)

i = 1
exp = explainer.explain_instance(test.values[i], prob, num_features = len(test.columns))

However, the explainer step fails with the following error: KeyError: "None of [Index([''], dtype='object')] are in the [columns]"

Other posts attribute this error to stray spaces in column names, but my column names contain no spaces.

Additionally, when I trained a plain sklearn.ensemble.RandomForestClassifier instead of the PyCaret ensemble, no error occurred.

Could you please explain why this issue is occurring in such cases and how to resolve it to obtain feature importance?

Alternatively, could you recommend another method to extract sample-specific feature importance from my model, besides LIME?

I've checked other posts and attempted to resolve it, but the same error persists.
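
A possible explanation, offered as an assumption rather than a confirmed diagnosis: LIME perturbs the row and passes plain NumPy arrays to the prediction function, while the pipeline shown above selects columns by name (the include=[...] list), so the column names are lost on the way in. A bare RandomForestClassifier accepts NumPy arrays, which would be consistent with it working. Two further things stand out in the code above: prob calls predict, which returns hard labels, whereas LIME in classification mode expects one probability per class; and train.values still contains the "resistance" target column. The sketch below illustrates the usual workaround on toy data. The scikit-learn pipeline, the columns f1 and f2, and the names predict_fn and explain_row are hypothetical stand-ins, not the original model or data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in for the real data: two features plus the "resistance" target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["f1", "f2"])
df["resistance"] = (df["f1"] + df["f2"] > 0).astype(int)

# Keep the target out of the features handed to the model and to LIME.
feature_names = [c for c in df.columns if c != "resistance"]
X = df[feature_names]
y = df["resistance"]

# Stand-in pipeline that selects columns by name, so it needs a DataFrame
# (assumed to mimic the name-based steps of the PyCaret pipeline).
model = Pipeline([
    ("imputer", ColumnTransformer([("num", SimpleImputer(), feature_names)])),
    ("clf", LogisticRegression()),
])
model.fit(X, y)


def predict_fn(data):
    """Rebuild a named DataFrame from LIME's NumPy array and return class probabilities."""
    frame = pd.DataFrame(data, columns=feature_names)
    return model.predict_proba(frame)  # shape: (n_samples, n_classes)


def explain_row(i):
    """Explain row i with LIME; lime is imported here so the rest runs without it."""
    from lime.lime_tabular import LimeTabularExplainer

    explainer = LimeTabularExplainer(
        X.values,
        feature_names=feature_names,
        class_names=["class 0", "class 1"],  # order must match the model's class order
        mode="classification",
    )
    return explainer.explain_instance(
        X.values[i], predict_fn, num_features=len(feature_names)
    )
```

Applied to the code in the question, this would mean dropping "resistance" from train and test before calling LimeTabularExplainer, and having prob wrap its input in pd.DataFrame(data, columns=...) and return within_model.predict_proba(...), assuming the loaded pipeline exposes predict_proba (the soft-voting classifier at its end suggests it does). The class_names order should also be checked against how "resistance" is encoded.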
