Given bigram probabilities for words in a text, how would one compute trigram probabilities?
For example, if we know that P(dog cat) = 0.3 and P(cat mouse) = 0.2, how do we find P(dog cat mouse)?
Thank you!
In the following I consider a trigram as three random variables A, B, C. So "dog cat horse" would be A=dog, B=cat, C=horse.

Using the chain rule: P(A,B,C) = P(A,B) * P(C|A,B). Now you're stuck if you want to stay exact.

What you can do is assume that C is independent of A given B. Then it holds that P(C|A,B) = P(C|B), and P(C|B) = P(B,C) / P(B), which you should be able to compute from your bigram frequencies. Note that in your case P(C|B) should really be the probability of C following a B, so it's the probability of the bigram BC divided by the probability of a B*, that is, a B followed by any word.

So to sum it up, when using the conditional independence assumption:

P(A,B,C) = P(A,B) * P(B,C) / P(B*)

And to compute P(B*) you have to sum up the probabilities of all bigrams beginning with B.
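
As a rough illustration of the approximation described above, here is a minimal Python sketch. The bigram table is hypothetical: only P(dog cat) = 0.3 and P(cat mouse) = 0.2 come from the question, and the other two entries are made up so that the table sums to 1. The function names `prefix_prob` and `approx_trigram_prob` are likewise just illustrative.

```python
# Hypothetical joint bigram probabilities. Only the first two values come
# from the question; the remaining entries are invented for this example.
bigram_prob = {
    ("dog", "cat"): 0.3,
    ("cat", "mouse"): 0.2,
    ("cat", "dog"): 0.2,
    ("mouse", "cat"): 0.3,
}


def prefix_prob(bigrams, first):
    """P(first *): summed probability of all bigrams that start with `first`."""
    return sum(p for (a, _), p in bigrams.items() if a == first)


def approx_trigram_prob(bigrams, a, b, c):
    """Approximate P(a b c) as P(a b) * P(b c) / P(b *).

    This assumes c is conditionally independent of a given b, so the result
    is an approximation, not the exact trigram probability.
    """
    p_b_star = prefix_prob(bigrams, b)
    if p_b_star == 0.0:
        # b never starts a bigram, so no continuation can be estimated.
        return 0.0
    return bigrams.get((a, b), 0.0) * bigrams.get((b, c), 0.0) / p_b_star


print(approx_trigram_prob(bigram_prob, "dog", "cat", "mouse"))
```

With this made-up table, P(cat *) = 0.2 + 0.2 = 0.4, so the estimate for "dog cat mouse" comes out at roughly 0.3 * 0.2 / 0.4 = 0.15. A different table would give a different P(cat *) and therefore a different result, which is why the two probabilities in the question are not enough on their own.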