spaCy pattern: exception case based on verb form


I'm trying to write a spaCy pattern that recognizes a noun followed by an adjective. This is what I have:

pattern = [{'POS':'NOUN'}, {'POS':'ADJ'}]

However, I want to make an exception for adjectives that are participle forms of verbs. My examples are in Spanish, so I apologize. For example, I want to find and retokenize phrases like 'institución educativa' but not 'institución comprometida', because 'comprometida' has VerbForm_part=True in its tag.
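To make the problem concrete, here is a minimal sketch of the pattern above with spaCy's Matcher. It assumes spaCy v3 and builds the Doc by hand from a blank Spanish pipeline with manually set POS tags, so no trained model is needed; the match label NOUN_ADJ and the helper matched_phrases are hypothetical names. The bare pattern matches both phrases, which is exactly what the question wants to avoid.

```python
# Sketch assuming spaCy v3. POS tags are set by hand, so no trained model is needed.
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("es")  # blank Spanish pipeline: tokenizer only, no tagger
matcher = Matcher(nlp.vocab)
# The pattern from the question: any NOUN immediately followed by any ADJ
matcher.add("NOUN_ADJ", [[{"POS": "NOUN"}, {"POS": "ADJ"}]])


def matched_phrases(words, pos):
    """Build a hand-annotated Doc (hypothetical helper) and return every match's text."""
    doc = Doc(nlp.vocab, words=words, pos=pos)
    return [doc[start:end].text for _, start, end in matcher(doc)]


print(matched_phrases(["institución", "educativa"], ["NOUN", "ADJ"]))
# ['institución educativa']
print(matched_phrases(["institución", "comprometida"], ["NOUN", "ADJ"]))
# ['institución comprometida']
```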

I tried adding the following, but it made the pattern stop working altogether, even in cases like 'institución educativa': pattern = [{'POS':'NOUN'}, {'OP':'!', 'TAG':'VerbForm_part'}, {'POS':'ADJ'}]

I also tried: pattern = [{'POS':'NOUN'}, {'POS':'ADJ', 'TAG': not 'VerbForm_part'}]
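The second attempt cannot behave as intended: `not` is Python's boolean operator, so `not 'VerbForm_part'` is evaluated before spaCy ever sees the pattern, and because a non-empty string is truthy the result is simply False. (Separately, a TAG value is compared against the token's whole fine-grained tag string for equality, not as a substring.) A plain-Python check of what the pattern actually contains:

```python
# `not` is applied to the string when the dict is built; spaCy only receives the result.
pattern = [{'POS': 'NOUN'}, {'POS': 'ADJ', 'TAG': not 'VerbForm_part'}]
print(pattern[1])  # {'POS': 'ADJ', 'TAG': False}
```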

In summary, I need to group together nouns followed by adjectives, but only some types of adjectives, excluding others based on their TAG attribute 'VerbForm_part'.
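One way to group nouns with only some adjectives is to keep the simple NOUN + ADJ pattern and apply the exception in Python to each match before merging. This is a sketch assuming spaCy v3, where morphological features live on `token.morph` in UD format (the participle feature appears as `VerbForm=Part`, not the v2 tag-map key `VerbForm_part`). Tokens, POS tags and features are set by hand on a blank Spanish pipeline, and `merge_noun_adj` is a hypothetical helper name.

```python
# Sketch assuming spaCy v3; tokens, POS tags and features are set by hand.
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("es")
matcher = Matcher(nlp.vocab)
matcher.add("NOUN_ADJ", [[{"POS": "NOUN"}, {"POS": "ADJ"}]])


def merge_noun_adj(doc):
    """Merge NOUN + ADJ pairs unless the adjective is a participle (hypothetical helper)."""
    spans = []
    for _, start, end in matcher(doc):
        adj = doc[end - 1]
        # morph.get returns a list of values: ["Part"] for VerbForm=Part, [] if absent
        if "Part" not in adj.morph.get("VerbForm"):
            spans.append(doc[start:end])
    # Merge all kept spans in a single retokenize block so the indices stay valid
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)
    return doc


doc = Doc(
    nlp.vocab,
    words=["institución", "educativa", "institución", "comprometida"],
    pos=["NOUN", "ADJ", "NOUN", "ADJ"],
    morphs=[
        "Gender=Fem|Number=Sing",
        "Gender=Fem|Number=Sing",
        "Gender=Fem|Number=Sing",
        "Gender=Fem|Number=Sing|VerbForm=Part",
    ],
)
print([t.text for t in merge_noun_adj(doc)])
# ['institución educativa', 'institución', 'comprometida']
```

With a trained Spanish pipeline the POS tags and features would come from the model instead of being set by hand; the filtering and merging logic stays the same.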

Is there any way to do this in spaCy? Do its patterns support exceptions?

Thank you!

Answer

I found a solution: I defined my own matcher function and used it to retokenize whenever it found a match:

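import spacy
from spacy.tokens import Span

# spaCy 2.x only: morphology.tag_map no longer exists in v3. The model name is an assumption.
model = spacy.load('es_core_news_sm')

def my_matcher(doc):
    # Collect the spans first: merging inside the loop shifts the token indices
    spans = []
    for i in range(len(doc) - 1):
        token = doc[i]
        token2 = doc[i + 1]
        if token.pos_ == 'NOUN' and token2.pos_ == 'ADJ':
            # Skip adjectives whose tag-map entry marks them as participles
            if 'VerbForm_part' not in model.vocab.morphology.tag_map[token2.tag_]:
                spans.append(Span(doc, i, i + 2))
    with doc.retokenize() as retokenizer:
        for span in spans:
            print(span)
            retokenizer.merge(span)

doc = model('institución educativa')
my_matcher(doc)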

If anyone can improve upon this, or tell me whether spaCy supports this directly, it would be greatly appreciated!
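spaCy v3 can also express the exception inside the pattern itself: Matcher patterns accept custom extension attributes under the "_" key, so a getter-based token attribute can flag participles and the pattern can require it to be False. This is a sketch assuming spaCy v3 with hand-set annotations on a blank Spanish pipeline; the attribute name is_participle and the match label are hypothetical.

```python
# Sketch assuming spaCy v3; "is_participle" is a hypothetical extension name.
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc, Token

# Getter-based extension: True when the token's features include VerbForm=Part
Token.set_extension(
    "is_participle",
    getter=lambda token: "Part" in token.morph.get("VerbForm"),
    force=True,  # allow re-running this snippet without an "already exists" error
)

nlp = spacy.blank("es")
matcher = Matcher(nlp.vocab)
# The exception lives in the pattern: the ADJ must not be a participle
pattern = [{"POS": "NOUN"}, {"POS": "ADJ", "_": {"is_participle": False}}]
matcher.add("NOUN_ADJ_NO_PART", [pattern])

doc = Doc(
    nlp.vocab,
    words=["institución", "educativa", "institución", "comprometida"],
    pos=["NOUN", "ADJ", "NOUN", "ADJ"],
    morphs=[
        "Gender=Fem|Number=Sing",
        "Gender=Fem|Number=Sing",
        "Gender=Fem|Number=Sing",
        "Gender=Fem|Number=Sing|VerbForm=Part",
    ],
)
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)  # ['institución educativa']
```

Extensions are registered on the Token class, so is_participle becomes available on every Doc in the process; the matches can then be merged with doc.retokenize() as in the answer.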