BLEU scores: can I use nltk.translate.bleu_score.sentence_bleu to calculate BLEU scores for Chinese?

11.3k Views Asked by tktktk0711 At 27 September 2017 at 09:46

If I have a Chinese word list such as reference = ['我', '是', '好', '人'] and hypothesis = ['我', '是', '善良的', '人'], can I use nltk.translate.bleu_score.sentence_bleu(references, hypothesis) for Chinese translation? Does it work the same way as for English? What about Japanese? I mean the case where the Chinese or Japanese text is already a word list, as with English. Thanks!


alvas On 27 September 2017 at 10:39 BEST ANSWER

TL;DR

Yes.

In Long

BLEU measures n-gram overlap and is agnostic to language, but it depends on the sentences being split into tokens. So yes, it can compare Chinese/Japanese...
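To make the n-gram point concrete, here is a minimal pure-Python sketch of the clipped ("modified") n-gram precision that BLEU is built on, applied to the word lists from the question. The helper names ngrams and modified_precision are made up for this illustration and are not NLTK's API; the character-level split at the end is only an assumed fallback for text that has not been word-segmented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Credit each hypothesis n-gram at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return overlap / total if total else 0.0


ref = ['我', '是', '好', '人']
hyp = ['我', '是', '善良的', '人']

# Nothing here is specific to Chinese: the tokens are just opaque strings.
precisions = [modified_precision(ref, hyp, n) for n in range(1, 5)]
print(precisions)

# Assumed fallback for unsegmented text: treat every character as a token.
char_tokens = list('我是好人')
print(char_tokens)
```

For this pair the unigram precision is 3/4 and the bigram precision is 1/3, while the 3-gram and 4-gram precisions are both 0, because no 3-gram of the hypothesis appears in the reference. Note that word-level and character-level tokens give different scores, so the same tokenization has to be used for the references and the hypotheses.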

Note the caveats of using BLEU at the sentence level. BLEU was never designed with sentence-level comparison in mind; here's a nice discussion: https://github.com/nltk/nltk/issues/1838

Most probably, you'll see a warning when you have really short sentences, e.g.

>>> from nltk.translate import bleu
>>> ref = '我 是 好 人'.split()
>>> hyp = '我 是 善良的 人'.split()
>>> bleu([ref], hyp)
/usr/local/lib/python2.7/site-packages/nltk/translate/bleu_score.py:490: UserWarning: 
Corpus/Sentence contains 0 counts of 3-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  warnings.warn(_msg)
0.7071067811865475

You can use the smoothing functions in https://github.com/alvations/nltk/blob/develop/nltk/translate/bleu_score.py#L425 to overcome the problem with short sentences.

>>> from nltk.translate.bleu_score import SmoothingFunction
>>> smoothie = SmoothingFunction().method4
>>> bleu([ref], hyp, smoothing_function=smoothie)
0.2866227639866161
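Since BLEU was not designed for single sentences, another option is to score a whole set of sentence pairs at once with nltk.translate.bleu_score.corpus_bleu. The following is a rough sketch, assuming the text is already tokenized; the second sentence pair is invented purely for illustration and is not from the question.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One entry per sentence; each entry is a LIST of references, because a
# sentence may have several acceptable translations.
list_of_references = [
    [['我', '是', '好', '人']],
    [['今天', '天气', '很', '好']],   # invented example pair
]
hypotheses = [
    ['我', '是', '善良的', '人'],
    ['今天', '天气', '很', '好'],     # invented example pair
]

# n-gram counts are pooled over the whole corpus before the precisions are
# computed, so one short sentence without 3-gram matches matters less.
score = corpus_bleu(
    list_of_references,
    hypotheses,
    smoothing_function=SmoothingFunction().method1,
)
print(score)
```

The input has the same shape as for sentence_bleu, only with one more level of nesting: a list of reference lists and a list of hypotheses, aligned by position.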
