How to get a confusion matrix with relative values from kNN models with more than two factor levels?


I built a simple kNN model with the packages mlr3 and mlr3learners, using the diabetes data set from the mclust package. I am trying to use the kNN model to predict the class category based on the three available numeric features (glucose, insulin, sspg), and to evaluate its performance using measures from the mlr3measures package and a confusion matrix, as I am interested in assessing whether some classes get misclassified more often than others. My question is: how can I obtain a confusion matrix with relative values?

Using the following code I get a confusion matrix with absolute values.

Code Example

# load packages
library(mlr3)
library(mlr3learners) # for classif.kknn
library(mlr3measures) # for confusion_matrix()
library(mclust) # for data(diabetes)

# load data
data(diabetes, package = "mclust")
diabetes <- as.data.table(diabetes)

# define task
diabetes_task <- as_task_classif(diabetes, 
                                 target = "class", 
                                 id = "diabetes")

# define ML algorithm
knn_model <- lrn('classif.kknn')

# partition data
splits <- partition(diabetes_task) 

# train model
knn_model$train(diabetes_task, 
                row_ids = splits$train)

# test model 
prediction <- knn_model$predict(diabetes_task, 
                                row_ids = splits$test)

# evaluate performance
prediction$confusion 

Confusion Matrix

          truth
response   Chemical Normal Overt
  Chemical       10      2     0
  Normal          2     23     0
  Overt           0      0    11

Instead of this matrix, I'd like a confusion matrix with relative values. I just found out that the confusion_matrix() function from the mlr3measures package has an argument for relative values (relative = TRUE), but that function only works when truth and response have exactly two factor levels. Apparently, this was rather straightforward to obtain in the old mlr package. Sorry if the question is a little basic, but is there a simple way to obtain the confusion matrix with relative values?


1 Answer


As pointed out in the comments, you can simply divide by the sum of all values in the matrix:

prediction$confusion / sum(prediction$confusion)

          truth
response     Chemical     Normal      Overt
  Chemical 0.25000000 0.06250000 0.04166667
  Normal   0.00000000 0.45833333 0.00000000
  Overt    0.00000000 0.00000000 0.18750000
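
Since prediction$confusion is an ordinary R table (response in rows, truth in columns), base R's prop.table() with its margin argument should also work if you want proportions per class rather than proportions of the grand total, e.g.:

# share of each true class assigned to each predicted class
# (columns sum to 1, i.e. how often each true class gets misclassified)
prop.table(prediction$confusion, margin = 2)

# share of each predicted class coming from each true class
# (rows sum to 1)
prop.table(prediction$confusion, margin = 1)

The margin = 2 variant is probably closest to what you describe, since it shows the misclassification rate separately for each true class.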