I am using the confusionMatrix function from the caret library in R to evaluate the performance of a couple of methods (such as elastic net from the glmnet library, Gaussian processes from kernlab, and random forest) on two-class data.
For some of the methods, I sometimes get the following warning:
Warning message: In confusionMatrix.default(pred, truth) : Levels are not in the same order for reference and data. Refactoring data to match.
and the performance is, for example, 65%; however, if I relabel the levels of the predictions (reverse their order; pred in the example above) based on truth, the performance becomes 25%.
I constructed the following toy data.
library(caret)
pred = c("a", "a", "a", "b")
pred = as.factor(pred)
levels(pred) = rev(levels(pred))  # with or without this line, I get either 25% or 75%
truth = c("a", "a", "b", "b")
truth = as.factor(truth)
confusionMatrix(pred, truth)
I understand this is intuitive, since the data has only two classes. However, I wonder whether I may do this in my favour: that is, if the performance is 25%, can I simply flip the labels and accept it as 75%?
See ?caret::confusionMatrix, specifically the positive parameter.
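To see why the toy example flips between 25% and 75%, it may help to separate two operations that look alike. The following is a minimal base-R sketch (the helper `accuracy` and the names `relabeled`, `reordered`, and `aligned` are mine, not from the question): assigning to `levels()` renames the predicted classes, whereas rebuilding the factor with `factor(..., levels = ...)` only changes the order of the levels and leaves every prediction's label intact. The final call assumes caret is installed and is skipped otherwise.

```r
truth <- factor(c("a", "a", "b", "b"))
pred  <- factor(c("a", "a", "a", "b"))

# Plain accuracy, compared on the labels so that level order cannot matter.
accuracy <- function(p, t) mean(as.character(p) == as.character(t))

# Relabeling: every predicted "a" becomes "b" and vice versa.
relabeled <- pred
levels(relabeled) <- rev(levels(relabeled))

# Reordering: the labels stay the same, only the level order changes.
reordered <- factor(pred, levels = rev(levels(pred)))

accuracy(pred, truth)       # 0.75
accuracy(relabeled, truth)  # 0.25
accuracy(reordered, truth)  # 0.75

# Assumption: caret is available. `positive` selects which class the
# class-specific statistics (sensitivity, specificity, ...) refer to.
if (requireNamespace("caret", quietly = TRUE)) {
  aligned <- factor(as.character(pred), levels = levels(truth))
  print(caret::confusionMatrix(aligned, truth, positive = "a"))
}
```

Under this reading, the relabeled factor describes a different set of predictions, so swapping 25% for 75% would only be defensible if the labels were genuinely mismatched between the predictions and the truth in the first place.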
On a second note, unless your classes are roughly 50-50, you should probably evaluate your results with something other than a confusion matrix.
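As a rough illustration of the class-balance point, here is a small base-R sketch on made-up data (the 9-to-1 split and the always-predict-"a" classifier are hypothetical, chosen only to make the effect visible). It compares plain accuracy with sensitivity, specificity, and their average (balanced accuracy), treating "a" as the positive class.

```r
# Hypothetical imbalanced data: 9 of 10 cases belong to class "a".
truth <- factor(c(rep("a", 9), "b"), levels = c("a", "b"))
# A degenerate classifier that always predicts the majority class.
pred  <- factor(rep("a", 10), levels = c("a", "b"))

tab <- table(pred, truth)  # rows: predicted class, columns: true class

acc         <- sum(diag(tab)) / sum(tab)
sensitivity <- tab["a", "a"] / sum(tab[, "a"])  # "a" treated as positive
specificity <- tab["b", "b"] / sum(tab[, "b"])
balanced    <- (sensitivity + specificity) / 2

c(accuracy = acc, sensitivity = sensitivity,
  specificity = specificity, balanced_accuracy = balanced)
```

Here the accuracy is 0.9 even though the classifier never identifies class "b", while the balanced accuracy of 0.5 reflects that it does no better than guessing.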