The production rules of a context free grammar are formalised as pairs, just a set of relations...
(α,β) ∈ R
where α is a non-terminal and β is either a terminal or a non-terminal.
thus S → A could be written as (S,A) ∈ R
But when parsing tagged natural language trees for probabilitic CFG's. Many of there rules are of the form:
NP → NNP POS
that is, the right hand side is not always a single terminal or non-terminal
Is there a way of formalising these production rules? As I can't see the relation method working...
unless they were perhaps more like (NP → NNP) → POS
Or is it that they are not the exact production rules,
A context-free grammar is defined by a four-tuple
(V, T, P, S)
:v → ω
wherev ∈ V
andω ∈ (V ⋃ T)*
V
, the start symbolTechnically, you could derive
V
andT
fromP
. However, everyone does roughly as above (with some variation of names, and occasionally usingV
andV ⋃ T
as primitives instead ofV
andT
).The important point (in bold above) is that the right-hand side of a production is not "a terminal or a non-terminal" but rather "an element of
(V ⋃ T)*
". If you couldn't expand a non-terminal into more than one symbol, your language would only consist of single-element strings.