I want to calculate a BLEU score for a corpus.
Due to some problems, I had to use sentence-level BLEU rather than corpus-level BLEU. I used nltk.translate.bleu_score.sentence_bleu
in Python to compute a BLEU score for each sentence, and now I'd like to assign a single score to the whole corpus.
How should I combine the per-sentence scores? Should I use the geometric mean or something else?
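
To make the question concrete, here is a minimal sketch of the two aggregations I am considering. The values in sentence_scores are made-up placeholders standing in for the per-sentence outputs of sentence_bleu, and the helper names arithmetic_mean and geometric_mean are my own, not part of NLTK.

```python
import math


def arithmetic_mean(scores):
    """Plain average of the per-sentence scores."""
    return sum(scores) / len(scores)


def geometric_mean(scores):
    """Geometric mean of the per-sentence scores.

    Returns 0.0 if any score is 0, since log(0) is undefined
    and the product of the scores would be 0 anyway.
    """
    if any(s == 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Placeholder values (assumption): in practice each entry would be the
# result of one sentence_bleu call on a (references, hypothesis) pair.
sentence_scores = [0.5, 0.25, 1.0]

corpus_arith = arithmetic_mean(sentence_scores)
corpus_geo = geometric_mean(sentence_scores)
print(corpus_arith, corpus_geo)
```

One thing I noticed is that a single sentence with a score of 0 drives the geometric mean to 0, while the arithmetic mean is barely affected, so the two choices can give quite different corpus scores.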
Any help would be appreciated.