Why does the KenLM language model keep returning the same score for different words?


Why is the kenlm model returning the same values? I have tried it with a 4-gram ARPA file as well, with the same issue.

import kenlm
model = kenlm.Model('lm/test.arpa')  # unigram model

print( [f'{x[0]:.2f}, {x[1]}, {x[2]}' for x in model.full_scores('this is a sentence', bos=False, eos=False)])
print( [f'{x[0]:.2f}, {x[1]}, {x[2]}' for x in model.full_scores('this is a sentence1', bos=False, eos=False)])
print( [f'{x[0]:.2f}, {x[1]}, {x[2]}' for x in model.full_scores('this is a devil', bos=False, eos=False)])

Result:

['-2.00, 1, True', '-21.69, 1, False', '-1.59, 1, False', '-2.69, 1, True']

['-2.00, 1, True', '-21.69, 1, False', '-1.59, 1, False', '-2.69, 1, True']

['-2.00, 1, True', '-21.69, 1, False', '-1.59, 1, False', '-2.69, 1, True']

Answer:
Figured it out by myself.

The True/False in the output tells you whether a word is OOV (out of vocabulary) or not. The KenLM model assigns a single fixed probability to all such words. In the example in the question, the last word of each sentence ("sentence", "sentence1", "devil") is OOV, so every sentence ends with the same fixed <unk> score, which is why all three prints produce identical output.
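To illustrate the mechanism, here is a minimal toy sketch (not KenLM itself; the vocabulary and log-probabilities are made up): like KenLM, it falls back to one fixed <unk> log-probability for any word not in its vocabulary, so every OOV word receives an identical score.

```python
# Toy unigram scorer illustrating KenLM's OOV behaviour.
# Vocabulary and scores are invented for the example.
LOG_PROBS = {"this": -1.2, "is": -0.9, "a": -0.7, "<unk>": -2.69}

def score(word):
    """Return (log10 probability, is_oov) for a single word."""
    if word in LOG_PROBS and word != "<unk>":
        return LOG_PROBS[word], False
    # Every out-of-vocabulary word maps to the same <unk> entry.
    return LOG_PROBS["<unk>"], True

print(score("sentence"))   # (-2.69, True)
print(score("sentence1"))  # (-2.69, True)
print(score("devil"))      # (-2.69, True)
```

With the real library, the kenlm Python bindings also let you test vocabulary membership directly with `'word' in model`, which is an easy way to confirm which words are being scored as <unk>.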