AssertionError Fleiss' Kappa for IAA


I have the following data frame:

import numpy as np
import pandas as pd

data = {'Query' : ['Carpal Ts', 'Dermallxde'],
        'Items' : [['Sir chucks', 'Oga Mathew', 'Tambe Charlse'],['Man Ond', 'kolofata Hil', 'Haruna mendy']],
        'Raters' : [[[0,0,0,1,1,0,0,1,0,1,0,0,0],[0,1,0,1,0,0,0,1,1,1,0,0,1],[1,1,1,1,1,1,1,1,1,1,1,1,1]],[[1,1,1,1,1,1,1,1,1,1,1,1,1],[1,1,1,0,1,0,1,1,0,1,1,1,0],[1,1,1,1,1,1,1,1,1,1,1,1,1]]]}

results_df = pd.DataFrame(data)

For each query, the raters rate the Items as bad (0) or good (1).

I tried to calculate Fleiss' Kappa by building a matrix from the Raters column, so that each row of the data frame gives the Fleiss' Kappa for that particular Query, but I get an AssertionError. Please see my attempt below:

from statsmodels.stats.inter_rater import fleiss_kappa

# create an empty list to hold the Fleiss' Kappa values
kappas = []

# iterate over each index in the results dataframe
for idx, row in results_df.iterrows():
    data = row['Raters'] # list of rating lists for this query
    users = row['Items'] # the Items that go with those rating lists

    # check that there are exactly three rating lists of equal length
    if len(data) == 3 and len(set([len(d) for d in data])) == 1:
        kappa = fleiss_kappa(np.array(data).T)
        print(kappa)
        kappas.append({
            'Query': idx,
            'Items': users,
            'Fleiss Kappa': kappa
        })

# create a new dataframe from the list of Fleiss' Kappa values
kappa_df = pd.DataFrame(kappas)

Any help with this calculation would be much appreciated.

There is 1 best solution below

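Had the same issue. If I understand your data correctly, fleiss_kappa() does not take the raw ratings: it expects a table of shape (subject, cat_counts), where each row holds how many raters put that subject in each category. Your array has shape (subject, rater), so you need to convert it first, as described here. The aggregate_raters() function does that conversion; it returns the count table together with the category labels, hence the [0] below. Hope this helps!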

import numpy as np
import pandas as pd
from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters

# create an empty list to hold the Fleiss' Kappa values
kappas = []

# iterate over each index in the results dataframe
for idx, row in results_df.iterrows():
    data = row['Raters'] # list of rating lists for this query
    users = row['Items'] # the Items that go with those rating lists

    # check that there are exactly three rating lists of equal length
    if len(data) == 3 and len(set([len(d) for d in data])) == 1:
        kappa = fleiss_kappa(aggregate_raters(np.array(data).T)[0])
        print(kappa)
        kappas.append({
            'Query': row['Query'],
            'Items': users,
            'Fleiss Kappa': kappa
        })

# create a new dataframe from the list of Fleiss' Kappa values
kappa_df = pd.DataFrame(kappas)
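
To make the shape conversion concrete, here is a minimal pure-Python sketch of what the (subject, cat_counts) table looks like for the first query, and of the textbook Fleiss' kappa formula applied to it. It is not statsmodels' code: to_category_counts and fleiss_kappa_from_counts are hypothetical helper names, and I am assuming each inner list in Raters is one rater scoring the same 13 subjects. As far as I can tell, the AssertionError comes from fleiss_kappa() checking that every row adds up to the same number of ratings, which raw 0/1 labels do not; the ValueError below mimics that check.

```python
# Ratings for the first query in the question.
# Assumption: one inner list per rater (3 raters), 13 subjects each.
raters = [
    [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]


def to_category_counts(ratings_by_rater, categories=(0, 1)):
    """Hypothetical helper: (rater, subject) labels -> (subject, cat_counts)."""
    by_subject = zip(*ratings_by_rater)  # transpose: one tuple of labels per subject
    return [[labels.count(c) for c in categories] for labels in by_subject]


def fleiss_kappa_from_counts(table):
    """Textbook Fleiss' kappa for a (subject, cat_counts) table."""
    n_sub = len(table)
    n_rat = sum(table[0])
    # Every subject must have been rated the same number of times.
    if any(sum(row) != n_rat for row in table):
        raise ValueError("rows must all sum to the number of raters")
    n_total = n_sub * n_rat
    # Share of all ratings that fell in each category.
    p_cat = [sum(col) / n_total for col in zip(*table)]
    # Observed agreement per subject, then averaged over subjects.
    p_sub = [(sum(c * c for c in row) - n_rat) / (n_rat * (n_rat - 1))
             for row in table]
    p_mean = sum(p_sub) / n_sub
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    return (p_mean - p_exp) / (1 - p_exp)


table = to_category_counts(raters)
print(table[:4])        # [[2, 1], [1, 2], [2, 1], [0, 3]]
kappa = fleiss_kappa_from_counts(table)
print(round(kappa, 4))  # -0.0598
```

Each row of the table now sums to 3 (the number of raters), which is the property the raw (subject, rater) array lacks. If the inner lists are actually per-item rather than per-rater, the transpose should be dropped in both this sketch and the loop above.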