How to find perplexity of bigram if probability of given bigram is 0


Given the formulas for the perplexity of a bigram model and for the bigram probability with add-1 smoothing,

    PP(W) = (prod_{i=1..N} 1 / P(w_i | w_{i-1}))^(1/N)

    P(w_i | w_{i-1}) = (count(w_{i-1} w_i) + 1) / (count(w_{i-1}) + V)

where V is the vocabulary size.

How does one proceed when one of the probabilities of the word per in the sentence to predict is 0?

# just examples, don't mind the counts
corpus_bigram = {'<s> now': 2, 'now is': 1, 'is as': 6, 'as one': 1, 'one mordant': 1, 'mordant </s>': 5}
word_dict = {'<s>': 2, 'now': 1, 'is': 6, 'as': 1, 'one': 1, 'mordant': 5, '</s>': 5}

test_bigram = {'<s> now': 2, 'now <UNK>': 1, '<UNK> as': 6, 'as </s>': 5}

n = 1 # Add one smoothing
probabilities = {}
for bigram in test_bigram:
    if bigram in corpus_bigram:
        value = corpus_bigram[bigram]
        first_word = bigram.split()[0]
        probabilities[bigram] = (value + n) / (word_dict.get(first_word) + (n * len(word_dict)))
    else:
        # unseen bigrams get probability 0, which later causes ZeroDivisionError
        probabilities[bigram] = 0

If for instance, the probabilities of the test_bigram come out as

# Again just dummy probability values
probabilities = {'<s> now': 0.35332322, 'now <UNK>': 0, '<UNK> as': 0, 'as </s>': 0.632782318}

perplexity = 1
for key in probabilities:
    # when probabilities[key] == 0 ????
    perplexity = perplexity * (1 / probabilities[key])

N = len(sentence)  # number of bigrams in the test sentence
perplexity = pow(perplexity, 1 / N)

ZeroDivisionError: division by zero


There are 2 best solutions below


The common solution is to assign a small probability, e.g. 1/N, to words that don't occur, with N being the total number of words. You pretend that a word that didn't occur in your data occurred once; that introduces only a minor error, but prevents divisions by zero.

So in your case, probabilities[bigram] = 1 / <sum of all bigram frequencies>
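As a minimal sketch of that fallback (assuming N is the sum of all bigram frequencies in the corpus, as suggested above):

```python
corpus_bigram = {'<s> now': 2, 'now is': 1, 'is as': 6, 'as one': 1,
                 'one mordant': 1, 'mordant </s>': 5}

total = sum(corpus_bigram.values())  # sum of all bigram frequencies (here 16)

def bigram_probability(bigram):
    """Relative frequency for seen bigrams; 1/total as a floor for unseen ones."""
    if bigram in corpus_bigram:
        return corpus_bigram[bigram] / total
    return 1 / total  # small non-zero probability, never 0

print(bigram_probability('is as'))      # 6/16
print(bigram_probability('now <UNK>'))  # 1/16 fallback
```

Note this is not a proper probability distribution (it no longer sums to 1), but for avoiding the ZeroDivisionError in a homework-style perplexity computation it is the standard quick fix.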


When you have to smooth, smooth, don't talk. Translation: you have added smoothing, but only for existing bigrams, while the whole point of smoothing is to handle bigrams/words that do not exist in the corpus.

corpus_bigram = {'<s> now': 2, 'now is': 1, 'is as': 6, 'as one': 1, 'one mordant': 1, 'mordant </s>': 5}
word_dict = {'<s>': 2, 'now': 1, 'is': 6, 'as': 1, 'one': 1, 'mordant': 5, '</s>': 5}

test_bigram = {'<s> now': 2, 'now <UNK>': 1, '<UNK> as': 6, 'as </s>': 5}

n = 1 # Add one smoothing
probabilities = {}
for bigram in test_bigram:
    # .get with a default of 0 covers unseen bigrams and unknown first words,
    # so every bigram gets the smoothed (count + 1) / (count + V) estimate
    value = corpus_bigram.get(bigram, 0)
    first_word = bigram.split()[0]
    probabilities[bigram] = (value + n) / (word_dict.get(first_word, 0) + (n * len(word_dict)))
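With every bigram smoothed, the perplexity loop from the question runs without error. A self-contained sketch (repeating the setup, and taking N as the number of test bigrams):

```python
corpus_bigram = {'<s> now': 2, 'now is': 1, 'is as': 6, 'as one': 1,
                 'one mordant': 1, 'mordant </s>': 5}
word_dict = {'<s>': 2, 'now': 1, 'is': 6, 'as': 1, 'one': 1,
             'mordant': 5, '</s>': 5}
test_bigram = {'<s> now': 2, 'now <UNK>': 1, '<UNK> as': 6, 'as </s>': 5}

n = 1  # add-one smoothing
probabilities = {}
for bigram in test_bigram:
    value = corpus_bigram.get(bigram, 0)
    first_word = bigram.split()[0]
    probabilities[bigram] = (value + n) / (word_dict.get(first_word, 0) + n * len(word_dict))

perplexity = 1.0
for key in probabilities:
    perplexity *= 1 / probabilities[key]  # every probability is now > 0

N = len(probabilities)
perplexity = perplexity ** (1 / N)
print(perplexity)  # roughly 6.05 for these toy counts
```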