Low inter-annotator agreement using Krippendorff's alpha or Fleiss' kappa


I have 3 categories, and each item is rated by 3 annotators. In 52% of the cases all 3 annotators agreed on the same category, in 43% two annotators agreed on one category, and in only 5% of the cases each annotator chose a different category.

I calculated Fleiss' kappa and Krippendorff's alpha, but the Krippendorff value is much lower than the Fleiss value: 0.032 versus 0.49.

Isn't the agreement too low, especially according to Krippendorff's alpha?


There is 1 answer below.


Fleiss' kappa and Krippendorff's alpha implementations expect the input data in specific formats (rows × columns)!

Fleiss: (subjects, n_categories)

Krippendorff: (raters, subjects)

To get there from a (subjects, raters) array:

For Fleiss, use the aggregate_raters() function from statsmodels.

For Krippendorff, transpose the array (see the sketch after this list).
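A minimal sketch of the two conversions, assuming the ratings are integer category codes in a NumPy array of shape (subjects, raters) and that the statsmodels and krippendorff packages are installed (the random data here is only a placeholder for your own ratings):

    import numpy as np
    import krippendorff
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Hypothetical ratings: one row per subject, one column per annotator,
    # values are category codes 0..2 (3 categories, 3 annotators).
    rng = np.random.default_rng(0)
    ratings = rng.integers(0, 3, size=(100, 3))   # shape (subjects, raters)

    # Fleiss: statsmodels wants a (subjects, n_categories) table of counts.
    table, _ = aggregate_raters(ratings)
    kappa = fleiss_kappa(table)

    # Krippendorff: the krippendorff package wants (raters, subjects),
    # so transpose, and tell it the categories are nominal.
    alpha = krippendorff.alpha(reliability_data=ratings.T,
                               level_of_measurement="nominal")

    print(f"Fleiss' kappa:        {kappa:.3f}")
    print(f"Krippendorff's alpha: {alpha:.3f}")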

If used correctly, these functions will give very similar values. If they still don't, make sure Krippendorff 'knows' what kind of scale (nominal, ordinal, etc.) it is dealing with by passing the appropriate argument.
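For example, with the krippendorff package (whose default level of measurement is, if I remember correctly, 'interval'), treating unordered category codes as interval data changes the distance metric and therefore the alpha value:

    import numpy as np
    import krippendorff

    # Hypothetical (raters, subjects) data with nominal category codes 0..2.
    rng = np.random.default_rng(0)
    reliability_data = rng.integers(0, 3, size=(3, 100))

    # Same data, different assumed scale -> different alpha.
    print(krippendorff.alpha(reliability_data=reliability_data,
                             level_of_measurement="interval"))
    print(krippendorff.alpha(reliability_data=reliability_data,
                             level_of_measurement="nominal"))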

Also see the longer answers:

Inter-rater reliability calculation for multi-raters data

Is fleiss kappa a reliable measure for interannotator agreement? The following results confuses me, are there any involved assumptions while using it?