I have created the following code to recognize a grammar consisting of a verb followed by one or more determiners and then one or more nouns. The grammar does not recognize a second noun as belonging to the same chunk (example phrase: "monitoring a parking space"):
Testing sentence in grammar: monitoring a parking space
Grammar Chunk:
(S (MT monitoring/VBG a/DT parking/NN) (MT space/NN))
False
Here is the code used in Python 3.5.6:
import nltk

def extractMT(sent):
    grammar = r"""
    MT:
        {<VBG|VBZ|VB>?<DT>?<NN|NNS>}
    """
    chunker = nltk.RegexpParser(grammar)
    ne = set()
    chunk = chunker.parse(nltk.pos_tag(nltk.word_tokenize(sent)))
    print("Grammar Chunk: ")
    print(chunk)
    for tree in chunk.subtrees(filter=lambda t: t.label() == 'MT'):
        returnList = []
        for child in tree.leaves():
            returnList.append(child[0])
        ne.add(' '.join(returnList))
    return ne

testSentence1 = "monitoring a parking space"
print("Testing sentence in grammar: " + testSentence1)
print("Is sentence in grammar?: " + testSentence1 in extractMT(testSentence1))
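Like in standard regex, to get many elements you need `+` (which means one or more) or `*` (which means zero or more). You can also use `{,2}` to get 0, 1 or 2 elements, `{1,2}` to get 1 or 2 elements, or `{2}` to get exactly 2 elements. In your grammar, that means writing `<NN|NNS>+` instead of `<NN|NNS>`, so that consecutive nouns end up in the same chunk.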
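
To see the effect of these quantifiers without installing NLTK or its tagger models, here is a minimal sketch that uses only Python's standard `re` module. It is an analogy, not NLTK's own code: the tag string `TAGS` is written by hand from the chunk output in the question, and the names `first_match`, `VERB`, `DET` and `NOUN` are made up for this illustration.

```python
import re

# Hand-written tag sequence for "monitoring a parking space",
# copied from the chunk output shown in the question (assumption:
# the tagger returns VBG, DT, NN, NN for this phrase).
TAGS = "<VBG><DT><NN><NN>"

# Illustrative building blocks mirroring the pieces of the MT rule.
VERB = r"(<VBG>|<VBZ>|<VB>)"
DET = r"(<DT>)"
NOUN = r"(<NN>|<NNS>)"


def first_match(pattern, tags=TAGS):
    """Return the part of the tag string matched from its start, or None."""
    m = re.match(pattern, tags)
    return m.group(0) if m else None


# Same shape as the original rule: exactly one noun is allowed,
# so the second <NN> is left outside the match.
one_noun = first_match(VERB + "?" + DET + "?" + NOUN)

# "+" means one or more nouns, so both <NN> tags are consumed.
many_nouns = first_match(VERB + "?" + DET + "?" + NOUN + "+")

# "{2}" means exactly two nouns.
exactly_two = first_match(VERB + "?" + DET + "?" + NOUN + "{2}")

# "{,2}" means zero, one or two nouns (greedy, so it takes both here).
up_to_two = first_match(VERB + "?" + DET + "?" + NOUN + "{,2}")

print(one_noun)     # <VBG><DT><NN>
print(many_nouns)   # <VBG><DT><NN><NN>
print(exactly_two)  # <VBG><DT><NN><NN>
print(up_to_two)    # <VBG><DT><NN><NN>
```

The first result corresponds to the `(MT monitoring/VBG a/DT parking/NN)` chunk in the question, with `space/NN` left over for a chunk of its own. With `+` on the noun, the whole phrase is covered by a single match.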