ROUGE-L score much lower than reported in papers


I use the Hugging Face evaluate library to compute ROUGE scores for summarization outputs. My ROUGE-1 and ROUGE-2 scores look reasonable, but my ROUGE-L score is far lower than the results reported in papers. For example, on the eLife dataset the lead-k baseline is reported at 34.12 / 6.73 / 32.06 (ROUGE-1 / ROUGE-2 / ROUGE-L), while I get 37.18 / 7.97 / 15.05. Apparently, something is wrong with my calculation.
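For reference, ROUGE-L scores a candidate Y of n words against a reference X of m words using the length of their longest common subsequence (LCS). There is also a summary-level variant that sums, over the u reference sentences r_i, the union LCS of each r_i with all candidate sentences C. As far as I know, evaluate exposes the first as "rougeL" and the second as "rougeLsum", and reports the balanced F1 form shown here.

```latex
% Sentence-level ROUGE-L: X and Y are each treated as one token sequence
R_{lcs} = \frac{\mathrm{LCS}(X, Y)}{m}, \qquad
P_{lcs} = \frac{\mathrm{LCS}(X, Y)}{n}, \qquad
F_{lcs} = \frac{2\,P_{lcs}\,R_{lcs}}{P_{lcs} + R_{lcs}}

% Summary-level ROUGE-L: X has sentences r_1, ..., r_u, C is the set of candidate
% sentences, and LCS_\cup(r_i, C) counts the union of LCS matches of r_i with every sentence in C
R_{lcs} = \frac{\sum_{i=1}^{u} \mathrm{LCS}_{\cup}(r_i, C)}{m}, \qquad
P_{lcs} = \frac{\sum_{i=1}^{u} \mathrm{LCS}_{\cup}(r_i, C)}{n}
```

F_lcs is then formed from these P_lcs and R_lcs exactly as in the sentence-level case.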

Here is my code:

import evaluate
import nltk
from datasets import load_dataset

# nltk.sent_tokenize needs the Punkt sentence model; newer NLTK releases look for 'punkt_tab'
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

rouge = evaluate.load('rouge')

elife = load_dataset('tomasg25/scientific_lay_summarisation', 'elife')
print(elife)
# Unused alternative dataset:
# lexsum = load_dataset('allenai/multi_lexsum')
# print(lexsum)
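# Reference lay summaries for the test split
refs = list(elife['test']['summary'])

predicts_lead3 = []
predicts_leadk = []
for text in elife['test']['article']:
    # Lead-3: the first three sentences of the article, joined with spaces
    predicts_lead3.append(' '.join(nltk.sent_tokenize(text)[:3]))
    # Lead-k: the first 383 space-separated tokens of the article
    predicts_leadk.append(' '.join(text.split(' ')[:383]))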

result_3 = rouge.compute(predictions=predicts_lead3, references=refs)
print("lead 3 results:")
print(result_3)

result_k = rouge.compute(predictions=predicts_leadk, references=refs)
print("lead k results:")
print(result_k)
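One plausible cause (an assumption, not confirmed here) is the difference between the rougeL and rougeLsum keys that rouge.compute returns. rougeL treats each prediction and reference as a single token sequence, so one LCS must respect word order across the whole text, which is harsh for long, multi-sentence summaries like these. rougeLsum computes the summary-level (union) LCS over sentences, which it expects to be separated by newlines; since predicts_lead3 and predicts_leadk are joined with spaces, each text counts as a single sentence. Many summarization papers report the summary-level score under the name ROUGE-L. The pure-Python sketch below is a simplified re-implementation written only to show the effect: the helpers tokenize, rouge_l and rouge_lsum are hypothetical, the tokenizer is cruder than the real one, and there is no stemming.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase and keep alphanumeric runs only
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_table(a, b):
    # Dynamic-programming table: t[i][j] is the LCS length of a[:i] and b[:j]
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t


def lcs_indices(ref, cand):
    # Backtrack through the table to recover the positions in ref of one LCS
    t = lcs_table(ref, cand)
    i, j, idx = len(ref), len(cand), []
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            idx.append(i - 1)
            i -= 1
            j -= 1
        elif t[i - 1][j] >= t[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return idx[::-1]


def f1(hits, m, n):
    # Balanced F-measure from hit count, reference length m and candidate length n
    if hits == 0:
        return 0.0
    p, r = hits / n, hits / m
    return 2 * p * r / (p + r)


def rouge_l(reference, candidate):
    # Sentence-level ROUGE-L: each text is one long token sequence
    ref, cand = tokenize(reference), tokenize(candidate)
    return f1(lcs_table(ref, cand)[-1][-1], len(ref), len(cand))


def rouge_lsum(reference, candidate):
    # Summary-level ROUGE-L: each non-empty line is treated as one sentence
    ref_sents = [tokenize(s) for s in reference.split("\n") if s.strip()]
    cand_sents = [tokenize(s) for s in candidate.split("\n") if s.strip()]
    m = sum(len(s) for s in ref_sents)
    n = sum(len(s) for s in cand_sents)
    ref_counts = Counter(tok for s in ref_sents for tok in s)
    cand_counts = Counter(tok for s in cand_sents for tok in s)
    hits = 0
    for r in ref_sents:
        # Union of LCS positions between this reference sentence and every candidate sentence
        union = sorted(set().union(*(lcs_indices(r, c) for c in cand_sents)))
        for tok in (r[k] for k in union):
            # Clip so a token is never credited more often than it occurs in either text
            if ref_counts[tok] > 0 and cand_counts[tok] > 0:
                hits += 1
                ref_counts[tok] -= 1
                cand_counts[tok] -= 1
    return f1(hits, m, n)


# Same two sentences, in swapped order
reference = "The cat sat on the mat.\nThe dog ran in the park."
candidate = "The dog ran in the park.\nThe cat sat on the mat."

print(rouge_l(reference, candidate))     # 0.5: one LCS cannot cover both sentences
print(rouge_lsum(reference, candidate))  # 1.0: each reference sentence is matched separately
# Joining sentences with spaces (as in the question) collapses rougeLsum back to rougeL
print(rouge_lsum(reference.replace("\n", " "), candidate.replace("\n", " ")))  # 0.5
```

If this is indeed the cause, the analogous change in the question's code would be to put one sentence per line in both predictions and references (for example '\n'.join(nltk.sent_tokenize(text))) and to read result['rougeLsum'] rather than result['rougeL']. Whether the paper also used stemming (use_stemmer=True in rouge.compute) is another setting worth checking.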
