I'm comparing the chrF++ scores computed by the Evaluate and NLTK libraries. Here is the Evaluate implementation:
import evaluate
metric_chrf = evaluate.load("chrf")  # load the chrF metric
reference1 = "犯人受到了嚴密的監控。"
hypothesis1 = "犯人受到嚴密監視。"
chrf = metric_chrf.compute(predictions=[hypothesis1], references=[reference1], word_order=2, lowercase=True)
print("CHRF:", chrf["score"])
It returns CHRF: 21.93821771592929.
NLTK's chrF implementation, however, has no word_order parameter, only min_len, max_len, and beta.
Here is the NLTK code:
from nltk.translate.chrf_score import sentence_chrf
print("CHRF:", sentence_chrf(reference1, hypothesis1) * 100)
It returns CHRF: 37.60954653811797.
My question is: how can I obtain the same value as the Evaluate library using NLTK?
I tried different values of min_len, max_len, and beta, but I could not find a combination that reproduces it.
# Try every (min_len, max_len) pair with min_len < max_len <= 6 for each beta
for beta in (1, 2, 3, 4):
    print(f"Beta = {beta}")
    for min_len in range(1, 6):
        for max_len in range(min_len + 1, 7):
            score = sentence_chrf(reference1, hypothesis1, min_len=min_len, max_len=max_len, beta=float(beta))
            print("CHRF:", score * 100)
which outputs:
Beta = 1
CHRF: 62.22222222222222
CHRF: 49.81481481481482
CHRF: 40.932539682539684
CHRF: 32.74603174603175
CHRF: 27.288359788359788
CHRF: 34.72222222222222
CHRF: 27.91005291005291
CHRF: 20.932539682539687
CHRF: 16.74603174603175
CHRF: 19.642857142857146
CHRF: 13.095238095238102
CHRF: 9.821428571428578
CHRF: 7.1428571428571495
CHRF: 4.76190476190477
CHRF: 1e-14
Beta = 2
CHRF: 58.569182389937104
CHRF: 46.79805957778753
CHRF: 38.38801836755117
CHRF: 30.71041469404093
CHRF: 25.592012245034113
CHRF: 32.46124031007752
CHRF: 26.026791785665715
CHRF: 19.520093839249288
CHRF: 15.616075071399433
CHRF: 18.206854345165237
CHRF: 12.137902896776827
CHRF: 9.103427172582624
CHRF: 6.5789473684210575
CHRF: 4.385964912280709
CHRF: 1e-14
Beta = 3
CHRF: 57.44520030234316
CHRF: 45.872557777319685
CHRF: 37.60954653811797
CHRF: 30.087637230494373
CHRF: 25.07303102541198
CHRF: 31.77179962894248
CHRF: 25.454704026132596
CHRF: 19.09102801959945
CHRF: 15.27282241567956
CHRF: 17.773892773892776
CHRF: 11.849261849261854
CHRF: 8.886946386946393
CHRF: 6.410256410256415
CHRF: 4.2735042735042805
CHRF: 1e-14
Beta = 4
CHRF: 56.99485199485199
CHRF: 45.502086760364904
CHRF: 37.298206861318455
CHRF: 29.838565489054762
CHRF: 24.86547124087897
CHRF: 31.496373383790598
CHRF: 25.226437977253436
CHRF: 18.91982848294008
CHRF: 15.135862786352067
CHRF: 17.601561727784915
CHRF: 11.734374485189946
CHRF: 8.800780863892463
CHRF: 6.343283582089558
CHRF: 4.228855721393042
CHRF: 1e-14
None of these outputs matches Evaluate's result.
NLTK's source code mentions chrF++ only briefly:
https://www.nltk.org/_modules/nltk/translate/chrf_score.html#sentence_chrf
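For context on what a word_order setting adds on top of character n-grams, here is a minimal pure-Python sketch. It is a simplified illustration, not the code used by Evaluate or NLTK: the function names (toy_chrf, char_ngrams, word_ngrams, f_beta), the whitespace tokenization, and the plain averaging of per-order F-scores are all assumptions made for the example, so its numbers are not expected to match either library.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (assumed simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams using plain whitespace tokenization (assumption)."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def f_beta(ref_counts, hyp_counts, beta):
    """F-beta between two n-gram multisets; 0.0 when nothing overlaps."""
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


def toy_chrf(reference, hypothesis, char_order=6, word_order=0, beta=2.0):
    """Average per-order F-beta over character and optional word n-grams."""
    scores = [
        f_beta(char_ngrams(reference, n), char_ngrams(hypothesis, n), beta)
        for n in range(1, char_order + 1)
    ]
    scores += [
        f_beta(word_ngrams(reference, n), word_ngrams(hypothesis, n), beta)
        for n in range(1, word_order + 1)
    ]
    return 100 * sum(scores) / len(scores)


reference1 = "犯人受到了嚴密的監控。"
hypothesis1 = "犯人受到嚴密監視。"

chars_only = toy_chrf(reference1, hypothesis1, word_order=0)
with_words = toy_chrf(reference1, hypothesis1, word_order=2)
print("characters only:", chars_only)
print("with word n-grams:", with_words)
```

Because the two sentences contain no spaces, whitespace tokenization treats each one as a single word, so in this sketch both word-level terms are zero and the averaged score drops once word_order is greater than 0. Real implementations tokenize and aggregate differently, which is one reason a search over min_len, max_len, and beta alone may not reproduce a score that includes word n-grams.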