I am trying to calculate SHAP values for a random forest text classifier. Here is my code for model training and evaluation:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# split the dataset into train and test
train_text, val_text, train_labels, val_labels = train_test_split(
    messages["text"].tolist(), messages["label"].tolist(),
    test_size=0.3, random_state=42)
# Vectorize the text data
print('starting tfidf vectorizer')
vectorizer = TfidfVectorizer(min_df=2, max_df=0.5, ngram_range=(1,2))
X_train_vec = vectorizer.fit_transform(train_text).toarray()
X_val_vec = vectorizer.transform(val_text).toarray()
# class balance: total labels vs. truthy (positive) labels
print(len(train_labels), len([t for t in train_labels if t]))
print(len(val_labels), len([t for t in val_labels if t]))
# Train model on the training set
rand_fore_max_feat = 'sqrt'
rand_fore_n_est = 1000
RandomForestClassifier_model = RandomForestClassifier(
    max_features=rand_fore_max_feat, n_estimators=rand_fore_n_est)
# rename model for ease of use
model = RandomForestClassifier_model
# fit model
print('starting model fit')
model.fit(X_train_vec, train_labels)
print('finished model fit')
# make predictions on the validation set
val_pred = model.predict(X_val_vec)
# display a classification report (y_true comes first in sklearn's API)
print(classification_report(val_labels, val_pred))
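For context, the two .toarray() calls above materialize fully dense matrices, and with unigrams plus bigrams the vocabulary gets large. A quick way to see the actual memory footprint (a sketch using numpy's nbytes attribute on the arrays above):
# how much memory do the dense matrices actually take?
print(X_train_vec.shape, X_val_vec.shape)
print(f"dense train matrix: {X_train_vec.nbytes / 1e9:.2f} GB")
print(f"dense val matrix: {X_val_vec.nbytes / 1e9:.2f} GB")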
When I run the code below to calculate SHAP values, the kernel crashes.
import shap

feature_names = vectorizer.get_feature_names_out()
subset_size = 10
try:
    explainer = shap.Explainer(model, X_train_vec, feature_names=feature_names)
    shap_values = explainer(X_val_vec[:subset_size])
    print(shap_values.values.shape)
except Exception as e:
    print(f"An error occurred: {e}")
I tried running each line of code in the try block individually, and "shap_values = explainer(X_val_vec[:subset_size])" is the line that seems to make the kernel crash.
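To rule out the size of the result itself, the expected output array can be estimated up front (a sketch; the factor of 2 assumes the explainer returns one set of values per class for a binary model, which is an assumption on my part):
# rough estimate of the SHAP output array size
n_samples, n_features = X_val_vec[:subset_size].shape
est_mb = n_samples * n_features * 2 * 8 / 1e6  # float64 values, 2 classes assumed
print(f"expected shap_values size: ~{est_mb:.1f} MB")
With a vocabulary in the tens of thousands (judging by the indices in the sparse output below), this comes out to only a few megabytes, so I suspect the crash happens during the computation rather than when storing the result.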
I tried switching my Python version to 3.9.10 and updating Jupyter, ipywidgets, and ipykernel. I also tried reducing the subset size from 10 to 5 to 1, so I don't think the issue is the sample size.
I also tried truncating each text to 10 words and using a subset size of 3 with the following code:
# truncate each text to its first 10 words
X_val_text_subset = [" ".join(text.split()[:10]) for text in messages["text"]]
# Vectorize the modified text data
X_val_vec_subset = vectorizer.transform(X_val_text_subset)
# Choose a subset of instances (let's say 3)
subset_size = 3
X_val_vec_subset = X_val_vec_subset[:subset_size]
print(X_val_vec_subset)
This code outputs the sparse matrix representation:
  (0, 56681)  0.26027819722272955
  (0, 56625)  0.1740880007667988
  (0, 55480)  0.18384870639744058
  (0, 55457)  0.15617249149572865
  (0, 47186)  0.3415585648445225
  (0, 47183)  0.2530088534451547
  (0, 42503)  0.12703787267515596
  (0, 23941)  0.24448002523220852
  ...
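As far as I can tell, truncating the texts does not actually shrink the feature space: transform() still produces one column per vocabulary term learned during fit. A quick check of that (a sketch on the variables above):
# the column count should match the full vocabulary, regardless of document length
print(X_val_vec_subset.shape)
print(len(vectorizer.get_feature_names_out()))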
Any ideas as to what I can do to stop this code from crashing?
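Would something along these lines, using TreeExplainer with a much smaller background sample, be the right direction? This is an untested sketch; shap.sample, the data argument, and check_additivity=False are my assumptions about the shap API, so please correct me if they don't apply here.
import shap

# untested sketch: use a small background sample instead of the full
# training matrix (shap.sample and check_additivity are assumptions)
background = shap.sample(X_train_vec, 100, random_state=42)
explainer = shap.TreeExplainer(model, data=background)
shap_values = explainer.shap_values(X_val_vec[:subset_size], check_additivity=False)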