I have built a random forest model with tidymodels, very similar to what Julia Silge did in this video. I also plan to show variable importance plots based on the permutation method, but I would like to show box plots or violin plots rather than points.
Here is an example, following Julia's code:
Data and Model Building
# DATA
library(tidyverse)
water_raw <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv")
# Data prep
water <- water_raw %>%
  filter(
    country_name == "Sierra Leone",
    lat_deg > 0, lat_deg < 15, lon_deg < 0,
    status_id %in% c("y", "n")
  ) %>%
  mutate(pay = case_when(
    str_detect(pay, "^No") ~ "no",
    str_detect(pay, "^Yes") ~ "yes",
    is.na(pay) ~ pay,
    TRUE ~ "it's complicated"
  )) %>%
  select(-country_name, -status, -report_date) %>%
  mutate_if(is.character, as.factor)
library(tidymodels)
set.seed(123)
water_split <- initial_split(water, strata = status_id)
water_train <- training(water_split)
water_test <- testing(water_split)
set.seed(234)
water_folds <- vfold_cv(water_train, strata = status_id)
water_folds
# Model building
library(themis)
ranger_recipe <-
  recipe(formula = status_id ~ ., data = water_train) %>%
  update_role(row_id, new_role = "id") %>%
  step_unknown(all_nominal_predictors()) %>%
  step_other(all_nominal_predictors(), threshold = 0.03) %>%
  step_impute_linear(install_year) %>%
  step_downsample(status_id)
ranger_spec <-
  rand_forest(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("ranger")
ranger_workflow <-
  workflow() %>%
  add_recipe(ranger_recipe) %>%
  add_model(ranger_spec)
doParallel::registerDoParallel()
set.seed(74403)
ranger_rs <-
  fit_resamples(ranger_workflow,
    resamples = water_folds,
    control = control_resamples(save_pred = TRUE)
  )
Here is Julia's VIP code:
library(vip)
imp_data <- ranger_recipe %>%
  prep() %>%
  bake(new_data = NULL) %>%
  select(-row_id)
ranger_spec %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(status_id ~ ., data = imp_data) %>%
  vip(geom = "point")
My Attempt:
ranger_spec %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(status_id ~ ., data = imp_data) %>%
  vip(pred_wrapper = predict, geom = "boxplot", nsim = 10, keep = TRUE)
However it continues to return this error:
Error: To construct boxplots for permutation-based importance scores you must specify keep = TRUE in the call vi() or vi_permute(). Additionally, you also need to set nsim >= 2.
Because I have done all of those things, I assume my error is with pred_wrapper, but I'm not sure. What am I doing wrong here?
Thanks, y'all!
                        
First, you may be interested in a resampling approach to estimating variable importance, where you yourself control the resampling and what gets extracted.
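A minimal sketch of that resampling idea, assuming the `ranger_spec`, `ranger_workflow`, and `water_folds` objects from the question are in scope. The helper name `get_ranger_imp` and the object names `ranger_imp_spec` and `ranger_imp_rs` are made up for illustration. Each fold contributes one importance score per variable, so the spread across folds can be drawn as box plots:

```r
library(tidymodels)
library(vip)

# Same model as in the question, but asking ranger to store permutation importance
ranger_imp_spec <- ranger_spec %>%
  set_engine("ranger", importance = "permutation")

# Hypothetical helper: pull importance scores out of each fitted workflow
get_ranger_imp <- function(x) {
  x %>%
    extract_fit_parsnip() %>%
    vi()
}

set.seed(74403)
ranger_imp_rs <- ranger_workflow %>%
  update_model(ranger_imp_spec) %>%
  fit_resamples(
    resamples = water_folds,
    control = control_resamples(extract = get_ranger_imp)
  )

# One row per fold and variable, then a box plot of the fold-to-fold spread
ranger_imp_rs %>%
  select(id, .extracts) %>%
  unnest(.extracts) %>%
  unnest(.extracts) %>%
  ggplot(aes(Importance, reorder(Variable, Importance))) +
  geom_boxplot() +
  labs(y = NULL)
```

Note that the spread here reflects variation across resamples, not across repeated permutations of a single fit, so it answers a slightly different question than `nsim` in vip.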
Second, I think something is not working quite right with method = "permutation" for a tidymodels model. I can't get it to work, but I can get the permutation importance for the underlying model:
Created on 2022-09-02 with reprex v2.0.2
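The code from the original reprex is not preserved here, so the following is only a sketch of one way to get permutation importance box plots from an underlying ranger model, not the author's code. It assumes `imp_data` from the question is in scope and fits ranger directly, which is a simplification. The names `rf_fit` and `pfun` are made up for illustration. Box plots need repeated permutation scores computed by vip itself, which is what `method = "permute"` requests. The importance that ranger stores with `importance = "permutation"` is a single value per variable.

```r
library(ranger)
library(vip)

# Assumption: fit ranger directly on the baked training data from the question
set.seed(345)
rf_fit <- ranger(status_id ~ ., data = imp_data, num.trees = 1000)

# Prediction wrapper for vip: must return a vector of predicted classes.
# A ranger classification forest fit this way returns a factor in $predictions.
pfun <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}

set.seed(456)
vip(
  rf_fit,
  method = "permute",      # let vip do the permuting
  train = imp_data,
  target = "status_id",
  metric = "accuracy",
  pred_wrapper = pfun,
  nsim = 10,               # repeated permutations give the box plot its spread
  keep = TRUE,
  geom = "boxplot"
)
```

Argument names have shifted a little between vip releases, so check `?vi_permute` for the installed version if this call errors.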
Here is another resource for how to use vip, but you may want to look into using DALEX for permutation variable importance.