I am new to R and have created a classification model using tidymodels. Below is the result of collect_predictions(model):

collect_predictions(members_final) %>% print()

# A tibble: 19,126 x 6
   id               .pred_died .pred_survived  .row .pred_class died    
   <chr>                 <dbl>          <dbl> <int> <fct>       <fct>   
 1 train/test split      0.285          0.715     5 survived    survived
 2 train/test split      0.269          0.731     6 survived    survived
 3 train/test split      0.298          0.702     7 survived    survived
 4 train/test split      0.276          0.724     8 survived    survived
 5 train/test split      0.251          0.749    10 survived    survived
 6 train/test split      0.124          0.876    18 survived    survived
 7 train/test split      0.127          0.873    21 survived    survived
 8 train/test split      0.171          0.829    26 survived    survived
 9 train/test split      0.158          0.842    30 survived    survived
10 train/test split      0.150          0.850    32 survived    survived
# … with 19,116 more rows

It works with yardstick functions:

collect_predictions(members_final) %>%
  conf_mat(died, .pred_class)

          Truth
Prediction  died survived
  died       196     7207
  survived    90    11633

But when I pipe collect_predictions() into caret::confusionMatrix(), it doesn't work:

collect_predictions(members_final) %>% 
  caret::confusionMatrix(as.factor(died), as.factor(.pred_class))

############## output #################
Error: `data` and `reference` should be factors with the same levels.
Traceback:

1. collect_predictions(members_final) %>% caret::confusionMatrix(as.factor(died), 
 .     as.factor(.pred_class))

2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))

3. eval(quote(`_fseq`(`_lhs`)), env, env)

4. eval(quote(`_fseq`(`_lhs`)), env, env)

I am not sure what's wrong here. How can I fix it so that I can use caret's evaluation?

The purpose of using caret's evaluation is to find out which class is treated as positive and which as negative.

Is there any other way to find out the positive/negative classes? Is levels(df$class) the correct way to find the positive class used by the model?

Best answer

If you have predictions, like your output of collect_predictions(), then you don't want to pipe them into a function from caret. Unlike the yardstick functions, confusionMatrix() doesn't take a data frame as its first argument, so the piped tibble ends up in the `data` argument, where a factor is expected. Instead, pass in the predictions and the truth as vectors:

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
data("two_class_example", package = "yardstick")

confusionMatrix(two_class_example$predicted, two_class_example$truth)
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Class1 Class2
#>     Class1    227     50
#>     Class2     31    192
#>                                           
#>                Accuracy : 0.838           
#>                  95% CI : (0.8027, 0.8692)
#>     No Information Rate : 0.516           
#>     P-Value [Acc > NIR] : <2e-16          
#>                                           
#>                   Kappa : 0.6749          
#>                                           
#>  Mcnemar's Test P-Value : 0.0455          
#>                                           
#>             Sensitivity : 0.8798          
#>             Specificity : 0.7934          
#>          Pos Pred Value : 0.8195          
#>          Neg Pred Value : 0.8610          
#>              Prevalence : 0.5160          
#>          Detection Rate : 0.4540          
#>    Detection Prevalence : 0.5540          
#>       Balanced Accuracy : 0.8366          
#>                                           
#>        'Positive' Class : Class1          
#> 

Created on 2020-10-21 by the reprex package (v0.3.0.9001)
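
On the positive/negative class question: per caret's documentation, when there are only two factor levels, confusionMatrix() uses the first level as the positive class unless you set the `positive` argument. So checking levels() on the truth column tells you the default. A minimal sketch with the same example data (the object name `cm` is just chosen here for illustration):

```r
library(caret)
data("two_class_example", package = "yardstick")

# The first factor level is what caret treats as "positive" by default
levels(two_class_example$truth)
#> [1] "Class1" "Class2"

# Override the default by naming the positive level explicitly
cm <- confusionMatrix(
  data      = two_class_example$predicted,
  reference = two_class_example$truth,
  positive  = "Class2"
)

# The returned object records which level was used as positive
cm$positive
#> [1] "Class2"
```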

It looks like your column names will be died and .pred_class; you'll need to save the data frame containing the predictions as an object so you can access those columns.
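
For your own results, something along these lines should work. This is only a sketch: the small data frame below is a made-up stand-in for collect_predictions(members_final), and `preds` and `cm` are names chosen here for illustration. Note that caret expects the predictions first (`data`) and the truth second (`reference`), which is the reverse of the order in your call.

```r
library(caret)

# Made-up stand-in for the real predictions; in your case this would be
#   preds <- collect_predictions(members_final)
preds <- data.frame(
  died = factor(
    c("died", "survived", "survived", "died", "survived", "survived"),
    levels = c("died", "survived")
  ),
  .pred_class = factor(
    c("survived", "survived", "survived", "died", "died", "survived"),
    levels = c("died", "survived")
  )
)

# Pass the columns as vectors: predictions as `data`, truth as `reference`
cm <- confusionMatrix(data = preds$.pred_class, reference = preds$died)

# Rows of the table are predictions, columns are the reference (truth)
cm$table
```

Both columns are already factors with the same levels in the collect_predictions() output, so the as.factor() calls shouldn't be needed.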