Using the iris dataset, a knn classifier was tuned with iterative search for multiclass classification. However, using `loss_accuracy` as the loss function in `DALEX::model_parts()` for variable importance gives empty results.
I would appreciate any ideas. Thank you so much for your support!
library(tidyverse)
library(tidymodels)
library(DALEXtra)
tidymodels_prefer()
df <- iris
# split
set.seed(2023)
splits <- initial_split(df, strata = Species, prop = 4/5)
df_train <- training(splits)
df_test <- testing(splits)
# workflow
df_rec <- recipe(Species ~ ., data = df_train)
knn_model <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")
df_wflow <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(df_rec)
# cross-validation
set.seed(2023)
knn_res <-
  df_wflow %>%
  tune_bayes(
    metrics = metric_set(accuracy),
    resamples = vfold_cv(df_train, strata = "Species", v = 2),
    control = control_bayes(verbose = TRUE, save_pred = TRUE))
# fit
best_k <- knn_res %>%
  select_best(metric = "accuracy")
knn_mod <- df_wflow %>%
  finalize_workflow(best_k) %>%
  fit(df_train)
# variable importance
knn_exp <- explain_tidymodels(extract_fit_parsnip(knn_mod),
                              data = df_rec %>% prep() %>% bake(new_data = NULL, all_predictors()),
                              y = df_train$Species)
set.seed(2023)
vip <- model_parts(knn_exp, type = "variable_importance", loss_function = loss_accuracy)
plot(vip) # empty plot
You are getting `0` for all your results because the model type according to {DALEX} is `"multiclass"`. These calculations would have worked if the type were `"classification"`.

Because the type is `"multiclass"`, the prediction that happens will be the predicted probabilities (here we get 1s and 0s because the model is quite overfit).
When you use `loss_accuracy()` as your loss function, the loss is computed by comparing `observed` with `predicted` and taking the mean of the result. We can see why this becomes an issue if we do the calculation step by step.

First, we define `observed` as the outcome factor. Since `observed` is a factor vector and `predicted` is a numeric matrix, comparing them gives back a logical matrix of `FALSE`, because the values are never the same.

So when we take the mean of this, we get the expected `0`.
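The steps above can be reproduced in base R with a small made-up example, followed by one possible workaround. This is only a sketch: the toy `observed` and `predicted` objects stand in for the real iris predictions, `loss_multiclass_error()` is a hypothetical helper (not part of {DALEX}), and it assumes the columns of the probability matrix are in the same order as `levels(observed)`.

```r
# Toy stand-ins (made-up data, not the fitted kknn model from the question)
observed <- factor(c("setosa", "versicolor", "virginica", "setosa"))
predicted <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, levels(observed))
)

# Factor vs. numeric matrix: no element is ever equal, so the mean is 0
naive <- observed == predicted
mean(naive)

# Hypothetical replacement loss (not part of DALEX): pick the most probable
# class in each row, then return 1 - accuracy so that larger means worse.
# Assumes the columns of `predicted` follow the order of levels(observed).
loss_multiclass_error <- function(observed, predicted) {
  best_col <- max.col(as.matrix(predicted), ties.method = "first")
  predicted_class <- levels(observed)[best_col]
  1 - mean(as.character(observed) == predicted_class)
}

loss_multiclass_error(observed, predicted)
```

A function like this could then be passed as `loss_function = loss_multiclass_error` in the `model_parts()` call from the question, after checking that the column-order assumption holds for the explainer's predictions. {DALEX} also ships `loss_cross_entropy()`, which is intended for probability matrices and may be the simpler choice.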