Different PR AUC values from different R packages


I get different numeric values for the Area Under the Precision-Recall Curve (PRAUC) on the dataset I am working on, depending on which of two R packages I use to compute it: yardstick or caret.
Unfortunately I was not able to reproduce this mismatch with synthetic data, only with my own dataset (which is strange in itself).

To make this reproducible, I am sharing the prediction output of my model. You can download it here: https://drive.google.com/open?id=1LuCcEw-RNRcdz6cg0X5bIEblatxH4Rdz (don't worry, it's a small csv).
The csv contains a data frame with 4 columns:
yes: probability estimate of being in class yes
no: 1 - yes
obs: actual class label
pred: predicted class label (with a 0.5 threshold)

Here is the code that produces the two PRAUC values:

require(data.table)
require(yardstick)
require(caret)
pr <- fread('pred_sample.csv')
# transform to factors
# put the positive class in the first level
pr[, obs := factor(obs, levels = c('yes', 'no'))] 
pr[, pred := factor(pred, levels = c('yes', 'no'))] # this is actually not needed

# compute yardstick PRAUC
pr_auc(pr, obs, yes) # 0.315

# compute caret PRAUC
prSummary(pr, lev = c('yes', 'no')) # 0.2373

I could understand a small difference caused by how the area is approximated (how the curve is interpolated between points), but this gap seems far too large.
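
To get a feel for how much the integration convention alone can matter, here is a minimal base-R sketch on toy data. The scores and labels are made up for illustration (they are not the dataset above), and the two conventions shown are generic ones; I am not claiming that either package uses exactly these. Convention A applies the trapezoidal rule to a curve anchored at (recall = 0, precision = 1), while convention B is a plain step-wise sum, often called average precision.

```r
# Toy scores and labels (synthetic, NOT the data from the question)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4)
labels <- c(1, 0, 1, 0, 0, 1)   # 1 = positive class

# Sort by decreasing score and accumulate true/false positives
ord <- order(scores, decreasing = TRUE)
tp <- cumsum(labels[ord] == 1)
fp <- cumsum(labels[ord] == 0)
recall <- tp / sum(labels == 1)
precision <- tp / (tp + fp)

# Convention A: trapezoidal rule, with an assumed anchor point
# at (recall = 0, precision = 1)
r <- c(0, recall)
p <- c(1, precision)
auc_trapezoid <- sum(diff(r) * (head(p, -1) + tail(p, -1)) / 2)

# Convention B: step-wise sum (average precision), no interpolation
auc_step <- sum(diff(c(0, recall)) * precision)

print(c(trapezoid = auc_trapezoid, step = auc_step))
```

Even with only six observations the two numbers disagree, so some gap between packages is expected. Whether that accounts for a gap as large as the one on my data is exactly what I cannot tell.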

I even tried a third package, PRROC, and the result is different again: around 0.26.
