In NLTK, how to generate a sample of sentences from a PCFG, respecting the probabilities

Asked by Albert Gevorgyan on 22 July 2023 at 13:54 (84 views)

NLTK has a generate function which enumerates sentences for a given CFG. It also has a PCFG class for probabilistic context-free grammars. Is it possible to generate a sample of sentences that respects the probabilities defined in the PCFG?

For example, if I try to generate sentences for a single production rule with probabilities, I simply get an exhaustive list where each sentence is unique:

from nltk import PCFG
from nltk.parse.generate import generate
pcfg = PCFG.fromstring("S -> 'a' [0.7] | 'b' [0.3]")
list(generate(pcfg, n=10))

Out: [['a'], ['b']]

However, what I would like to get is something like this:

list(sample(pcfg, n=10))

Out: [['a'], ['a'], ['a'], ['a'], ['a'], ['a'], ['a'], ['b'], ['b'], ['b']]

Obviously, this example is contrived, and a random sample would only approximate the 7:3 split shown above. But with sufficiently complex grammars, such a method would be very useful for sampling natural language utterances.
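
For illustration, here is a minimal sketch of what such a sampler could look like. It is not an NLTK function: sample and sample_sentence are hypothetical helpers, and the seed parameter is an assumption added for reproducibility. The idea is to expand the start symbol top-down and, at each nonterminal, pick one of its productions at random, weighted by the probability stored in the PCFG.

```python
import random

from nltk import PCFG
from nltk.grammar import Nonterminal


def sample_sentence(grammar, symbol=None, rng=random):
    """Expand `symbol` top-down, choosing each production by its probability.

    Hypothetical helper, not part of NLTK.
    """
    if symbol is None:
        symbol = grammar.start()
    # Terminals (plain strings here) are emitted as-is.
    if not isinstance(symbol, Nonterminal):
        return [symbol]
    productions = grammar.productions(lhs=symbol)
    weights = [p.prob() for p in productions]
    chosen = rng.choices(productions, weights=weights, k=1)[0]
    words = []
    for child in chosen.rhs():
        words.extend(sample_sentence(grammar, child, rng))
    return words


def sample(grammar, n, seed=None):
    """Yield n independently sampled sentences (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(n):
        yield sample_sentence(grammar, rng=rng)


pcfg = PCFG.fromstring("S -> 'a' [0.7] | 'b' [0.3]")
sentences = list(sample(pcfg, n=10, seed=0))
print(sentences)
```

Each sentence is drawn independently, so the proportion of ['a'] only approaches 0.7 as n grows. For heavily recursive grammars the expansion can get deep, so a depth limit or an iterative rewrite may be needed.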
