Calculate Krippendorff Alpha for Multi-label Annotation


How can I calculate Krippendorff's alpha for multi-label annotations? For multi-class annotation (say three coders have annotated four texts, each with one of three labels: a, b, c), I first construct the reliability data matrix, then the coincidence matrix, and from the coincidences I can calculate alpha:

[Image: reliability data matrix and coincidence matrix for the multi-class case]
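As a side note, for this multi-class case nltk's agreement module can compute alpha directly from (coder, item, label) triples, without building the coincidence matrix by hand. A minimal sketch with made-up data (the labels below are illustrative, not your actual annotations):

```python
from nltk import agreement

# Hypothetical data: 3 coders, 4 texts, labels a/b/c.
# Each triple is (coder, item, label).
triples = [
    ("c1", "t1", "a"), ("c2", "t1", "a"), ("c3", "t1", "a"),
    ("c1", "t2", "b"), ("c2", "t2", "b"), ("c3", "t2", "c"),
    ("c1", "t3", "c"), ("c2", "t3", "c"), ("c3", "t3", "c"),
    ("c1", "t4", "a"), ("c2", "t4", "b"), ("c3", "t4", "b"),
]

# Default distance is binary (0 if labels match, 1 otherwise),
# which corresponds to nominal-level alpha.
task = agreement.AnnotationTask(data=triples)
print("Alpha:", task.alpha())
```

This only covers the single-label case; the multi-label question remains as stated below.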

The question is: how can I prepare the coincidence matrix and calculate alpha for a multi-label classification problem like the following?

[Image: multi-label annotation example]

A Python implementation, or even an Excel one, would be appreciated.

1 Answer


Came across your question while looking for similar information. We used the code below, with nltk.agreement for the metrics and pandas_ods_reader to read the data from a LibreOffice spreadsheet. Our data has two annotators, and some items can have two labels (for instance, one coder may assign a single label while the other assigns two).

The spreadsheet screencap below shows the structure of the input data. The column for annotation items is called annotItems, and the annotation columns are called coder1 and coder2. The separator when there is more than one label is a pipe, unlike the comma in your example.

The code is inspired by this SO post: Low alpha for NLTK agreement using MASI distance

[Spreadsheet screencap]

from nltk import agreement
from nltk.metrics.distance import masi_distance
from nltk.metrics.distance import jaccard_distance

import pandas_ods_reader as pdreader

annotfile = "test-iaa-so.ods"

df = pdreader.read_ods(annotfile, "Sheet1")

annots = []


def create_annot(an):
    """
    Create frozensets with the unique label
    or with both labels splitting on pipe.
    Unique label has to go in a list so that
    frozenset does not split it into characters.
    """
    if "|" in str(an):
        # cast to str before splitting, in case the
        # reader returned a numeric type
        an = frozenset(str(an).split("|"))
    else:
        # single label has to go in a list;
        # whether to cast via int depends on your data
        an = frozenset([str(int(an))])
    return an


for idx, row in df.iterrows():
    annot_id = row.annotItem + str(idx).zfill(3)
    annot_coder1 = ['coder1', annot_id, create_annot(row.coder1)]
    annot_coder2 = ['coder2', annot_id, create_annot(row.coder2)]
    annots.append(annot_coder1)
    annots.append(annot_coder2)

# based on https://stackoverflow.com/questions/45741934/
jaccard_task = agreement.AnnotationTask(distance=jaccard_distance)
masi_task = agreement.AnnotationTask(distance=masi_distance)
tasks = [jaccard_task, masi_task]
for task in tasks:
    task.load_array(annots)
    print("Statistics for dataset using {}".format(task.distance))
    print("C: {}\nI: {}\nK: {}".format(task.C, task.I, task.K))
    print("Pi: {}".format(task.pi()))
    print("Kappa: {}".format(task.kappa()))
    print("Multi-Kappa: {}".format(task.multi_kappa()))
    print("Alpha: {}".format(task.alpha()))

For the data in the screencap linked from this answer, this would print:

Statistics for dataset using <function jaccard_distance at 0x7fa1464b6050>
C: {'coder1', 'coder2'}
I: {'item3002', 'item1000', 'item6005', 'item5004', 'item2001', 'item4003'}
K: {frozenset({'1'}), frozenset({'0'}), frozenset({'0', '1'})}
Pi: 0.1818181818181818
Kappa: 0.35714285714285715
Multi-Kappa: 0.35714285714285715
Alpha: 0.02941176470588236

Statistics for dataset using <function masi_distance at 0x7fa1464b60e0>
C: {'coder1', 'coder2'}
I: {'item3002', 'item1000', 'item6005', 'item5004', 'item2001', 'item4003'}
K: {frozenset({'1'}), frozenset({'0'}), frozenset({'0', '1'})}
Pi: 0.09181818181818181
Kappa: 0.2864285714285714
Multi-Kappa: 0.2864285714285714
Alpha: 0.017962466487935425
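The gap between the Jaccard-based and MASI-based figures comes from how each distance scores partially overlapping label sets: Jaccard uses only the intersection-over-union ratio, while MASI multiplies that ratio by a monotonicity factor (1 for identical sets, 0.67 for a subset relation, 0.33 for partial overlap, 0 for disjoint sets), so partial matches are penalized more heavily. A quick sketch with the same frozenset-style labels as above:

```python
from nltk.metrics.distance import jaccard_distance, masi_distance

a = frozenset({"0"})
b = frozenset({"0", "1"})

print(jaccard_distance(a, a))  # 0.0 (identical sets)
print(jaccard_distance(a, b))  # 0.5
print(masi_distance(a, b))     # 0.665, i.e. 1 - (1/2) * 0.67
```

Since the MASI distance is never smaller than the Jaccard distance for the same pair, it tends to shift the resulting agreement statistics, which is why the two runs above report different alphas for the same annotations.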