In NLTK, how to generate a sample of sentences from a PCFG, respecting the probabilities

NLTK has a generate function which enumerates sentences for a given CFG. It also has a PCFG class for probabilistic context-free grammars. Is it possible to generate a sample of sentences according to the probabilities defined in the PCFG?

For example, if I try to generate sentences for a single production rule with probabilities, I simply get an exhaustive list where each sentence is unique:

from nltk import PCFG
from nltk.parse.generate import generate
pcfg = PCFG.fromstring("S -> 'a' [0.7] | 'b' [0.3]")
list(generate(pcfg, n=10))

Out: [['a'], ['b']]

However, what I would like to get is something like this:

list(sample(pcfg, n=10))

Out: [['a'], ['a'], ['a'], ['a'], ['a'], ['a'], ['a'], ['b'], ['b'], ['b']]

Obviously, this example is contrived. But with sufficiently complex grammars, such a method would be very useful for sampling natural language utterances.
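To make the desired behaviour concrete, here is a minimal sketch of what such a sampler could look like. It is not a built-in NLTK feature: the names sample and sample_sentence are hypothetical, and the approach assumed here is plain top-down expansion, where each nonterminal is rewritten by one of its productions drawn at random with the weight given by prob(). Only the grammar accessors start(), productions(lhs=...), rhs() and prob() come from NLTK.

```python
import random

from nltk import PCFG
from nltk.grammar import Nonterminal


def sample_sentence(grammar, symbol=None, rng=random):
    """Expand `symbol` top-down, picking each production by its probability.

    Hypothetical helper, not part of NLTK.
    """
    if symbol is None:
        symbol = grammar.start()
    # Anything that is not a Nonterminal is a terminal and is emitted as-is.
    if not isinstance(symbol, Nonterminal):
        return [symbol]
    productions = grammar.productions(lhs=symbol)
    weights = [p.prob() for p in productions]
    # Weighted draw of a single production for this nonterminal.
    chosen = rng.choices(productions, weights=weights, k=1)[0]
    tokens = []
    for child in chosen.rhs():
        tokens.extend(sample_sentence(grammar, child, rng))
    return tokens


def sample(grammar, n, seed=None):
    """Draw n independent sentences (hypothetical helper, not part of NLTK)."""
    rng = random.Random(seed)
    return [sample_sentence(grammar, rng=rng) for _ in range(n)]


if __name__ == "__main__":
    pcfg = PCFG.fromstring("S -> 'a' [0.7] | 'b' [0.3]")
    print(sample(pcfg, n=10))
```

With the toy grammar above, roughly 70% of the sampled sentences should be ['a'] over many draws, though any single run of 10 will vary. Because the expansion is recursive, a grammar whose recursive rules carry high probability may produce very long sentences or hit Python's recursion limit, so a depth cap might be needed in practice.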
