Compute Gini Index on a nested/rsplit object


I used the rsample::bootstraps() function to create a nested object as follows:

Sampled_Data=bootstraps(credit_data,times = 2,strata="Home",apparent = TRUE)

This is what I get:

  splits                id        
  <list>                <chr>     
1 <split [34338/12635]> Bootstrap1
2 <split [34338/12592]> Bootstrap2
3 <split [34338/34338]> Apparent  

I would like to compute the Gini index from the "Status" and "Expenses" columns for each of the bootstrapped data frames, like this:

library(pROC)
2*auc(credit_data$Status,credit_data$Expenses)-1

The problem is that I don't know how to do this without unnesting and writing a for loop.

It seems the purrr package could be useful here, but I'm not familiar with it.

What I would like to have:

  splits                id          Gini
  <list>                <chr>      <dbl>
1 <split [34338/12635]> Bootstrap1     x
2 <split [34338/12592]> Bootstrap2     y
3 <split [34338/34338]> Apparent       z

Any help?

Thanks

On BEST ANSWER

I'll assume that you want to bootstrap this to get confidence intervals.

You would only need apparent = TRUE for some types of intervals (not the percentile intervals used below), so I'll omit it here.

library(tidymodels)
tidymodels_prefer()

data("credit_data")

# See ?int_pctl and
# https://www.tidymodels.org/learn/statistics/bootstrap
# for more info. 
get_gini <- function(split) {
  dat <- analysis(split)
  roc_res <- roc_auc(dat, truth = Status, Expenses)
  # Convert to gini stat
  roc_res %>% 
    mutate(
      .metric = "gini",
      .estimate = 2 * .estimate - 1
    ) %>% 
    # now use the same format as `tidy()`
    select(estimate = .estimate, term = .metric)
}

set.seed(1)
# Set times higher for bootstrap intervals
bts <- 
  bootstraps(credit_data, times = 50) %>% 
  mutate(gini = map(splits, get_gini))

int_pctl(bts, gini)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for term
#> `gini`.
#> # A tibble: 1 × 6
#>   term   .lower .estimate .upper .alpha .method   
#>   <chr>   <dbl>     <dbl>  <dbl>  <dbl> <chr>     
#> 1 gini  -0.0463  -0.00173 0.0377   0.05 percentile

Created on 2023-07-17 with reprex v2.0.2
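
If what you want is the numeric Gini column from the question rather than an interval, a minimal sketch along the same lines could use purrr::map_dbl() with yardstick's vector interface. The names gini_value and bts_gini are made up for illustration, the data is the credit_data set from modeldata rather than the asker's own data, and the strata argument is left out for simplicity.

```r
library(rsample)
library(yardstick)
library(dplyr)
library(purrr)

data("credit_data", package = "modeldata")

# Hypothetical helper: returns one number per split (Gini = 2 * AUC - 1)
gini_value <- function(split) {
  dat <- analysis(split)
  2 * roc_auc_vec(truth = dat$Status, estimate = dat$Expenses) - 1
}

set.seed(1)
# apparent = TRUE adds a row for the original data, as in the question
bts_gini <-
  bootstraps(credit_data, times = 2, apparent = TRUE) %>%
  mutate(Gini = map_dbl(splits, gini_value))

bts_gini
```

Note that the values may not match 2 * pROC::auc(...) - 1 exactly: as far as I know, pROC picks the comparison direction automatically by default, while yardstick treats the first factor level as the event, so the sign of the Gini can differ between the two.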