Log probability in the Viterbi algorithm (handling zero probabilities)


I am coding a probabilistic part-of-speech tagger in Python using the Viterbi algorithm. In this context, the Viterbi probability at time t is the product of three terms: the Viterbi path probability from the previous time step t-1, the transition probability from the previous POS tag to the current POS tag, and the emission probability of the observed word given the POS tag. I compute this by, at each word in the sentence, looking at each tag/state that is possible at this time (based on the observed training data) and then, for each possible tag, computing the most likely path to this state using the Viterbi probability described above. A straightforward pseudo-code implementation can be found here.
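
For concreteness, here is a minimal sketch of that recursion (a sketch only: transition_prob, emission_prob, and possible_tags are hypothetical lookups assumed to be built from the training counts, and "<s>" is an assumed start-of-sentence tag):

    def viterbi(sentence, possible_tags, transition_prob, emission_prob):
        # trellis[t][tag] = probability of the best path ending in `tag` at time t
        trellis = [{}]
        backpointer = [{}]
        for tag in possible_tags(sentence[0]):
            trellis[0][tag] = transition_prob["<s>"][tag] * emission_prob[tag][sentence[0]]
            backpointer[0][tag] = None
        for t in range(1, len(sentence)):
            trellis.append({})
            backpointer.append({})
            for tag in possible_tags(sentence[t]):
                # Best path into `tag`: previous Viterbi probability times the
                # transition probability, then times the emission probability.
                best_prev, best_score = max(
                    ((prev, trellis[t - 1][prev] * transition_prob[prev][tag])
                     for prev in trellis[t - 1]),
                    key=lambda pair: pair[1],
                )
                trellis[t][tag] = best_score * emission_prob[tag][sentence[t]]
                backpointer[t][tag] = best_prev
        # Trace the best final state back through the backpointers.
        best_last = max(trellis[-1], key=trellis[-1].get)
        path = [best_last]
        for t in range(len(sentence) - 1, 0, -1):
            path.append(backpointer[t][path[-1]])
        return list(reversed(path))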

One practical problem with repeatedly multiplying probabilities is that it can lead to numerical underflow. One solution often proposed in the literature is to use log probabilities. As far as I understand, you should do:

current_probability = math.log(emission_probability) + math.log(transition_probability) + previous_probability

instead of

current_probability = emission_probability * transition_probability * previous_probability

given that the three probabilities are stored in the corresponding variables (and that previous_probability already holds a log probability in the log-space version).
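
A quick demonstration of the underflow problem that the log version avoids: multiplying a long chain of small probabilities eventually rounds to exactly 0.0 in double precision, while the corresponding sum of logs stays finite.

    import math

    p = 0.01                      # a typical small transition/emission probability
    prob, log_prob = 1.0, 0.0
    for _ in range(200):
        prob *= p                 # eventually underflows to exactly 0.0
        log_prob += math.log(p)   # stays finite

    print(prob)      # 0.0 -- the path score is lost
    print(log_prob)  # -921.034..., i.e. 200 * log(0.01)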

One problem I am finding hard to grasp, however, is what to do when the emission probability or the transition probability is 0. Michael Collins writes: "One issue with the new algorithm is that log 0 = −∞. Some care is required to deal with probability values equal to 0. A simple solution is to set log 0 = −B where B is a large number."
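
One way to implement Collins's suggestion is a small wrapper around math.log; the constant LOG_ZERO and the function name below are my own placeholders, not part of his formulation:

    import math

    LOG_ZERO = -1e10  # Collins's "-B": more negative than any real log probability

    def safe_log(p):
        # math.log(0) raises ValueError in Python, so substitute -B for log 0
        return math.log(p) if p > 0.0 else LOG_ZERO

    # usage: safe_log(emission_probability) + safe_log(transition_probability) + previous_probability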

What large number should I use? Can I just use -math.inf?

1 Answer


One way to handle this is to add smoothing, for example add-one (Laplace) smoothing of the emission probability (https://i.stack.imgur.com/LdNgP.png):

P(word|TAG) = (word_tag + 1) / (tag_total + V)

where

word_tag : count of the word occurring with the given TAG in the training corpus
tag_total : total number of words tagged as TAG, so that word_tag / tag_total is the unsmoothed estimate of P(word|TAG)
V : size of the vocabulary
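
A sketch of that smoothed estimate in Python, assuming the counts are stored in dictionaries built from the training corpus (the names here, including vocab_size for V, are placeholders):

    def smoothed_emission(word, tag, word_tag_counts, tag_counts, vocab_size):
        # Add-one (Laplace) smoothing: an unseen (word, tag) pair gets a small
        # nonzero probability instead of 0, so its log is always defined.
        word_tag = word_tag_counts.get((word, tag), 0)  # count of word with this tag
        tag_total = tag_counts.get(tag, 0)              # total words tagged as tag
        return (word_tag + 1) / (tag_total + vocab_size)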