How to specify variable_splits for DALEX::model_profile()?


How do you specify variable_splits in DALEX::model_profile()?

I'm trying to get accumulated local effects for a random forest model with the above function. When doing so, I get this warning:

In FUN(X[[i]], ...) : Variable: < MAP > has more than 201 unique values and all of them will be used as variable splits in calculating variable profiles. Use the variable_splits parameter to mannualy change this behaviour. If you believe this warning to be a false positive, raise issue at https://github.com/ModelOriented/ingredients/issues.

To my knowledge, the DALEX documentation does not explain how to specify this, but I think it should be a list giving the split points for each variable, as you would get from ingredients::calculate_variable_split. However, when I try to run that function I get a curious error, and I'm not sure what's going on there:

Error: 'calculate_variable_split' is not an exported object from 'namespace:ingredients'

Here is something to play with that reproduces the warning:

library(tidyverse)
library(ranger)
library(DALEX)
data("iris")
glimpse(iris)


set.seed(4) # make the sampled column reproducible
data <- rbind(iris, iris) %>% # double it to get >201 obs
  mutate(Species = as.factor(ifelse(Species == "setosa", 1, 0))) %>% # factor species 0,1
  mutate(toomany = sample(1:1000, 300, replace = FALSE)) # variable with >201 unique values

mod<-ranger(Species~.,data,keep.inbag =TRUE,importance='impurity',seed=4,probability=TRUE)#random forest probability model

ex<-DALEX::explain(model=mod,
                   data=data[,-5],
                   y=as.numeric(as.character(data$Species)),
                   label="Random Forest")#explainer

#works but see warning about variable_splits
ale<-model_profile(explainer=ex,type="accumulated")#get accumulated local effects

#doesn't work
#needs a list I think...but not sure exactly what it's looking for
ale<-model_profile(explainer=ex,type="accumulated",variable_splits=20)


Kevin On

It is looking for a named list with one vector of split points per covariate, so something like this would work:

# 5%, 10%, ..., 100% quantiles of every predictor as a named list (reframe() needs dplyr >= 1.1.0)
quant <- data[, -5] %>%
  reframe(across(everything(), ~ quantile(.x, seq(5, 100, by = 5) / 100))) %>%
  as.list()

ale<-model_profile(explainer=ex,type="accumulated",variable_splits=quant)
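If you would rather not depend on reframe(), the same kind of list can be put together in base R. This is only a minimal sketch: it uses the built-in iris columns as a stand-in for the predictors, the names predictors, probs and splits are made up for illustration, and dropping duplicated quantiles with unique() is my own addition rather than something the package is known to require.

```r
# Stand-in for the predictor columns given to explain(); in the question this would be data[, -5]
predictors <- iris[, 1:4]

# Probabilities 0.05, 0.10, ..., 1.00 (20 levels, as in the quantile call above)
probs <- seq(5, 100, by = 5) / 100

# One numeric vector of split points per column.
# lapply() over a data frame returns a list named after the columns.
# unique() drops repeated quantiles (an assumption: duplicates are probably harmless, but add nothing).
splits <- lapply(predictors, function(x) {
  unique(unname(quantile(x, probs = probs)))
})

# Inspect the structure: a named list of numeric vectors
str(splits)
```

Built from data[, -5] instead of iris, splits can be passed in the same way as quant above, i.e. variable_splits = splits. As for the error in the question: :: only reaches objects that a package exports, so a function that exists only internally would have to be called with ::: instead, which is fragile because internals can change between versions.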