Sklearn - Multi-class confusion matrix for ordinal data

I've written a model that makes predictions on ordinal data. At the moment I'm evaluating it with the quadratic Cohen's kappa. I'm looking for a way to visualize the results with a confusion matrix and then calculate recall, precision and F1 score while taking the prediction distance into account.

For example, predicting 2 when the actual class was 1 is better than predicting 3 when the actual class was 1.

I've written the following code to plot and calculate the results:

import seaborn as sns
from sklearn.metrics import confusion_matrix, recall_score, precision_score, f1_score

def plot_cm(df, ax):
    # df.x holds the actual labels, df.y the predicted labels
    cf_matrix = confusion_matrix(df.x, df.y, normalize='true', labels=[0, 1, 2, 3, 4, 5, 6, 7, 8])

    ax = sns.heatmap(cf_matrix, linewidths=1, annot=True, ax=ax, fmt='.2f')
    ax.set_ylabel('Actual')
    ax.set_xlabel('Predicted')

    print('Recall score:', recall_score(df.x, df.y, average='weighted', zero_division=0))
    print('Precision score:', precision_score(df.x, df.y, average='weighted', zero_division=0))
    print('F1 score:', f1_score(df.x, df.y, average='weighted', zero_division=0))

[Image: normalized confusion matrix heatmap produced by plot_cm]

Recall score: 0.53505
Precision score: 0.5454783454981732
F1 score: 0.5360650278722704

The visualization is fine; however, the calculation ignores predictions that were "almost" right, e.g. predicting 8 when the actual class was 9.

Is there a way to calculate Recall, Precision and F1 taking into account the ordinal behavior of the data?

There is 1 answer below.


Regular precision (per class) is calculated as the ratio of true positives to everything predicted as that class. Usually a true-positive detection is defined in a binary fashion: you either correctly detected the class or you did not. There is no restriction whatsoever against making the TP detection score for sample i fuzzy (in other words, lightly penalizing close-to-class detections and making the penalty more severe as the difference grows):

TP(i) = max(0, (1 - abs(detected_class(i) - true_class(i))/penalty_factor) )

where TP(i) is the value of the "true positive detection" for sample i, a number in the range [0, 1]. It is reasonable to make penalty_factor equal to the number of classes (it should be larger than 1). By changing it you control how heavily "distant" classes are penalized. For example, if you decide that a difference of more than 3 is enough to consider the sample "not detected", set it to 3; if you set it to 1, you get back the "regular" precision formulation. The max() makes sure the TP score never becomes negative.
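A minimal sketch of this score, assuming integer-encoded ordinal labels (fuzzy_tp is just an illustrative name, not an sklearn function):

import numpy as np

def fuzzy_tp(y_true, y_pred, penalty_factor):
    # Per-sample fuzzy true-positive score in [0, 1]: 1 for an exact match,
    # decaying linearly with the class distance and clipped at 0 once the
    # distance reaches penalty_factor.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.maximum(0, 1 - np.abs(y_pred - y_true) / penalty_factor)

# e.g. fuzzy_tp([1, 1, 1], [1, 2, 5], penalty_factor=3) -> [1.0, 0.667, 0.0]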

Now, to get the denominator right, you need to set it to the count of samples with TP(i) > 0. That is, if you have 100 samples in total, and out of those 5 were detected with a TP score of 1 and 6 got a TP score of 0.5, your precision would be (5 + 6*0.5)/(5 + 6).
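A sketch of that denominator rule, applied to per-sample scores such as those from fuzzy_tp above; the 100-sample split mirrors the worked example, and fuzzy_precision is again only an illustrative name:

import numpy as np

def fuzzy_precision(tp_scores):
    # Sum of the fuzzy TP scores divided by the number of samples
    # whose score is strictly positive.
    tp_scores = np.asarray(tp_scores, dtype=float)
    denom = np.count_nonzero(tp_scores > 0)
    return tp_scores.sum() / denom if denom else 0.0

# 100 samples: 5 with score 1, 6 with score 0.5, the rest with score 0
scores = np.array([1.0] * 5 + [0.5] * 6 + [0.0] * 89)
print(fuzzy_precision(scores))  # (5 + 6*0.5) / (5 + 6) ≈ 0.727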

One issue here is that "precision per class" becomes less meaningful, because every prediction is now somewhat relevant to all classes. If you need an overall precision "weighted" by class (for the unbalanced-class case), you have to factor the weight into the TP score according to the true class of sample i.
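One possible reading of that, shown only as a hedged sketch: weight each sample's TP score by the relative support of its true class (class_weighted_scores is a hypothetical helper, and this is just one way to define the weights):

import numpy as np

def class_weighted_scores(tp_scores, y_true):
    # Multiply each sample's fuzzy TP score by the relative frequency
    # (support) of its true class.
    y_true = np.asarray(y_true)
    tp_scores = np.asarray(tp_scores, dtype=float)
    classes, counts = np.unique(y_true, return_counts=True)
    support = dict(zip(classes, counts / len(y_true)))
    weights = np.array([support[c] for c in y_true])
    return tp_scores * weights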

Employing the same logic, the Recall would be the sum of TP scores over the relevant population, i.e.

R = (sum of (weighted) TP scores) / (total number of samples)
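A sketch under the same assumptions (fuzzy_recall is an illustrative name); with the 100-sample example above this gives (5 + 6*0.5)/100 = 0.08:

import numpy as np

def fuzzy_recall(tp_scores, weights=None):
    # (Optionally weighted) sum of the fuzzy TP scores divided by the
    # total number of samples.
    tp_scores = np.asarray(tp_scores, dtype=float)
    if weights is not None:
        tp_scores = tp_scores * np.asarray(weights, dtype=float)
    return tp_scores.sum() / len(tp_scores)

# Same 100-sample example as above
scores = np.array([1.0] * 5 + [0.5] * 6 + [0.0] * 89)
print(fuzzy_recall(scores))  # 0.08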

And finally, F1 is the harmonic mean of Precision and Recall.
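Putting it together with the example numbers from above (P ≈ 0.727, R = 0.08); fuzzy_f1 is again just an illustrative helper:

def fuzzy_f1(precision, recall):
    # Harmonic mean of the fuzzy precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(fuzzy_f1(8 / 11, 0.08))  # ≈ 0.144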