I’m trying to use the BLEU score from NLTK to evaluate machine translation quality. To check my code, I compared two identical sentences. I’m using method1 as the smoothing function because I’m comparing two sentences, not corpora. I set 4-grams with weights of 0.25 (1/4) each. But the result is 0.0088308. What am I doing wrong? Two identical sentences should get a score of 1.0. I'm coding in Python 3 on Windows 7, in PyCharm.
My code:
import nltk
from nltk import word_tokenize
from nltk.translate.bleu_score import SmoothingFunction
ref = 'You know that it would be untrue You know that I would be a liar If I was to say to you Girl, we couldnt get much higher.'
cand = 'You know that it would be untrue You know that I would be a liar If I was to say to you Girl, we couldnt get much higher.'
smoothie = SmoothingFunction().method1
reference = word_tokenize(ref)
candidate = word_tokenize(cand)
weights = (0.25, 0.25, 0.25, 0.25)
BLEUscore = nltk.translate.bleu_score.sentence_bleu(reference, candidate, weights, smoothing_function=smoothie)
print(BLEUscore)
My result:
0.008830895300928163
Process finished with exit code 0
BLEU compares a candidate against a set of references, so sentence_bleu expects the references as a list of tokenized sentences, that is, a list of lists of tokens. In other words, even if you have only one reference, it must still be wrapped in a list (in my example, reference should be passed as [reference]).
When I put reference in [], I got 1.0.
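Here is a minimal sketch of the corrected call, assuming NLTK is installed. It uses str.split() instead of word_tokenize (my substitution, so that it runs without downloading the punkt tokenizer data). Everything else follows the code from the question.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

ref = ('You know that it would be untrue You know that I would be a liar '
       'If I was to say to you Girl, we couldnt get much higher.')
cand = ref  # identical candidate, as in the question

# Assumption: whitespace splitting stands in for word_tokenize here.
reference = ref.split()
candidate = cand.split()

weights = (0.25, 0.25, 0.25, 0.25)
smoothie = SmoothingFunction().method1

# The first argument is a list of references, each of which is a list of tokens,
# so a single reference is wrapped as [reference].
score = sentence_bleu([reference], candidate, weights, smoothing_function=smoothie)
print(score)
```

As far as I can tell, passing the flat token list makes sentence_bleu treat every token string as a separate reference and iterate over its characters, so the candidate's word n-grams almost never match. That is why the original score was close to zero.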