Averaging results from 'rr_profile' for multiple iterations of statistical models

78 Views Asked by At

Does anyone have any experience obtained averaged results from multiple model_profile outputs from DALEX?

EDIT

To clarify, the model_profile function outputs partial dependence plots for one model. However, I have a list of 500 models that have been generated using different testing/training splits, and downsampled data.

Averaging results for model_profile with variable_type = 'categorical' was relatively straight forward, as each category was extracted as one value from $agr_profiles, which could be merged together into a large data frame.

However, I am not sure how exactly to go about averaging the entire profile 500 times to generate averaged values for the model_profile with variable_type = 'continuous'.

Reproducible example

library("ranger") 
library(DALEX)

trainIdx_1 <- sample(nrow(titanic_imputed), 2/3 * nrow(titanic_imputed)) 
trainData_1 <- titanic_imputed[trainIdx_1, ]

trainIdx_2 <- sample(nrow(titanic_imputed), 2/3 * nrow(titanic_imputed)) 
trainData_2 <- titanic_imputed[trainIdx_2, ]

titanic_ranger_model_1 <- ranger(survived~., data = trainData_1, num.trees = 50,
                               probability = TRUE) 
titanic_ranger_model_2 <- ranger(survived~., data = trainData_2, num.trees = 50,
                                 probability = TRUE)

exp_1 <- explain(titanic_ranger_model_1, data = trainData_1) 
exp_2 <- explain(titanic_ranger_model_2, data = trainData_2)

model_profile_1 <- model_profile(exp_1, type = "partial") 
model_profile_2 <- model_profile(exp_2, type = "partial")

How could I obtain the average profile from model_profile_1 and model_profile_2?

2

There are 2 best solutions below

0
Jason Connelly On BEST ANSWER

I was able to get a decent looking mean line by joining both profiles data as one dataset, and plotting a line using ggplot's stat_summary_bin. It's not perfect, and may require some playing around with bin numbers for best results, but it works for me. Example code and output shown below. It looks better if more iterations are used

agrData = rbind(model_profile_1$agr_profiles, model_profile_2$agr_profiles)
plot(agrData) + stat_summary_bin(mapping = aes(y = `_yhat_`), fun = "mean", geom = "line")

Black line is mean plot

2
Qasim Alhammad On

thanks for sharing a reproducible example, I guess what you are trying to achieve here is to find the average prediction for each model profiler.

In your example you need to use below code

plot(model_profile_1$agr_profiles, model_profile_2$agr_profiles)

In you case you will get exactly the same results because you are using the exact same model to train the exact same data to do prediction. so no matter how many profiles you create you will get again the same results.

In order to get different results you need to build different models then only you can compare the models you built to compare models together and decide which model is working best.

if you look at the example mentioned in the DALEX package page xai-tools.drwhy.ai/DALEX.html, you will see that first model was built with ranger and second model was built with GBM.

if you don't want to use another model than ranger, you can create multiple models with rangers but change the setting for each model, like num.trees. only then you will get a different result.

I hope this answers your question.