I am new to R and have created a classification model using tidymodels. Below is the result of `collect_predictions(model)`:
```r
collect_predictions(members_final) %>% print()
```

```
# A tibble: 19,126 x 6
   id               .pred_died .pred_survived  .row .pred_class died
   <chr>                 <dbl>          <dbl> <int> <fct>       <fct>
 1 train/test split      0.285          0.715     5 survived    survived
 2 train/test split      0.269          0.731     6 survived    survived
 3 train/test split      0.298          0.702     7 survived    survived
 4 train/test split      0.276          0.724     8 survived    survived
 5 train/test split      0.251          0.749    10 survived    survived
 6 train/test split      0.124          0.876    18 survived    survived
 7 train/test split      0.127          0.873    21 survived    survived
 8 train/test split      0.171          0.829    26 survived    survived
 9 train/test split      0.158          0.842    30 survived    survived
10 train/test split      0.150          0.850    32 survived    survived
# … with 19,116 more rows
```
It works with yardstick functions:
```r
collect_predictions(members_final) %>%
  conf_mat(died, .pred_class)
```

```
          Truth
Prediction died survived
  died      196     7207
  survived   90    11633
```
But when I pipe the output of `collect_predictions()` into `caret::confusionMatrix()`, it doesn't work:
```r
collect_predictions(members_final) %>%
  caret::confusionMatrix(as.factor(died), as.factor(.pred_class))
```

Output:

```
Error: `data` and `reference` should be factors with the same levels.
Traceback:

1. collect_predictions(members_final) %>% caret::confusionMatrix(as.factor(died), 
 .     as.factor(.pred_class))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
```
I am not sure what's wrong here. How can I fix it so that I can use caret's evaluation?

The purpose of using caret's evaluation is to find out which class is treated as positive and which as negative. Is there any other way to find out the positive/negative classes? Is `levels(df$class)` the correct way to find out which class the model treats as positive?
If you have predictions, like your output of `collect_predictions()`, then you don't want to pipe them into a function from caret. caret's `confusionMatrix()` doesn't take the data as its first argument the way the yardstick functions do; its first argument, `data`, expects a factor of predicted classes. Instead, pass in the arguments as vectors.

Created on 2020-10-21 by the reprex package (v0.3.0.9001)
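The failure comes from how `%>%` works: magrittr inserts the left-hand side as the first argument of the call on its right, so `x %>% f(a, b)` runs `f(x, a, b)`. In the question's call, the whole predictions tibble therefore lands in `data`, which is likely what triggers caret's "should be factors" check. The sketch below illustrates this with a hypothetical toy function, `show_args()` (not part of caret), that has the same first two arguments as `caret::confusionMatrix(data, reference)` and only reports the class of what it receives. It assumes the magrittr package is installed (it is pulled in as a dependency of tidymodels), and the data are made up.

```r
library(magrittr)

# Hypothetical toy function (not part of caret) with the same first two
# arguments as caret::confusionMatrix(data, reference).
# It only reports the class of each argument it receives.
show_args <- function(data, reference) {
  c(data = class(data)[1], reference = class(reference)[1])
}

# Made-up predictions using the same column names as in the question
toy_preds <- data.frame(
  .pred_class = factor(c("died", "survived")),
  died        = factor(c("died", "survived"))
)

# %>% puts the data frame into the FIRST argument, so this runs
# show_args(toy_preds, toy_preds$died): `data` gets the whole data frame.
piped <- toy_preds %>% show_args(toy_preds$died)
print(piped)   # data = "data.frame", reference = "factor"

# Passing the columns as vectors puts a factor in each slot.
direct <- show_args(toy_preds$.pred_class, toy_preds$died)
print(direct)  # data = "factor", reference = "factor"
```

With the tibble returned by `collect_predictions()`, `data` would report `"tbl_df"` rather than `"data.frame"`, but either way it is not the factor caret expects.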
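It looks like your variable names will be `died` and `.pred_class`; you'll need to save the data frame containing the predictions as an object to access them, then pass `.pred_class` (the predictions) as `data` and `died` (the true classes) as `reference`.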
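A minimal sketch of that approach, assuming the predictions have been saved to an object (called `preds` here, a hypothetical name) and that caret is installed; depending on the caret version, `confusionMatrix()` may also need the e1071 package. A small made-up data frame stands in for the real output of `collect_predictions(members_final)`. For a two-class problem with the default `positive = NULL`, caret takes the first level of `reference` as the positive class, so `levels()` on the outcome column does tell you which class is treated as positive.

```r
library(caret)

# Hypothetical stand-in for the saved output of collect_predictions(members_final).
# Only the two columns used below are included; the values are made up.
preds <- data.frame(
  .pred_class = factor(c("died", "survived", "survived", "died", "survived", "survived"),
                       levels = c("died", "survived")),
  died        = factor(c("died", "survived", "died", "survived", "survived", "survived"),
                       levels = c("died", "survived"))
)

# Pass the columns as vectors: predicted classes go to `data`,
# true classes go to `reference`.
cm <- confusionMatrix(data = preds$.pred_class, reference = preds$died)
print(cm)

# With two classes and positive = NULL (the default), caret uses the
# first level of `reference` as the positive class.
print(levels(preds$died))  # "died" "survived"
print(cm$positive)         # "died"

# To treat "survived" as the positive class instead, name it explicitly.
cm_surv <- confusionMatrix(data = preds$.pred_class, reference = preds$died,
                           positive = "survived")
print(cm_surv$positive)    # "survived"
```

If you would rather keep the pipe, wrapping the call in braces stops magrittr from inserting the data as the first argument and binds it to `.` instead, e.g. `collect_predictions(members_final) %>% { caret::confusionMatrix(.$.pred_class, .$died) }`. yardstick likewise treats the first factor level as the event of interest by default (see its `event_level` argument).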