In order to understand cyk algorithm I've worked through example on : https://www.youtube.com/watch?v=VTH1k-xiswM&feature=youtu.be .
The result of which is :
How do I extract the probabilities associated with each parse and extract the most likely parse tree ?
These are two distinct problems for PCFG:
The CKY algorithm video linked in the question only deals with the recognition problem. To perform the parsing problem simultaneously, we need to (i) maintain the score of each parsing item and (ii) keep track of the hierarchical relationships (e.g. if we use the rule S -> NP VP: we must keep track of which NP and which VP are used to predict S).
Notations:
[X, i, j]: s
means that there is a node labelled X spanning token i (included) to j (excluded) with score s. The score is the log probability of the subtree rooted inX
.w_1 w_2 ... w_n
.At an abstract level, PCFG parsing can be formulated as a set of deduction rules:
Scan Rule (read tokens)
Gloss: if there is a rule
X -> w_i
in the grammar, then we can add the item[X, i, i+1]
in the chart.Complete Rule (create a new constituent bottom up)
Gloss: if there are 2 parsing items
[X, i, k]
and[Y, k, j]
in the chart, and a ruleZ -> X Y
in the grammar, then we can add[Z, i, j]
to the chart.The goal of weighted parsing is to deduce the parsing item
[S, 0, n]:s
(S
: axiom,n
: length of sentence) with the highest scores
.Pseudo code for whole algorithm
Some notes: