How to make a spacy matcher pattern using verb tense/mood?


I've been trying to build a spaCy Matcher pattern based on verb tenses and moods.
I found out how to access the morphological features of words parsed with spaCy using model.vocab.morphology.tag_map[token.tag_], which prints something like this when the verb is in the subjunctive mood (the mood I am interested in):

{'Mood_sub': True, 'Number_sing': True, 'Person_three': True, 'Tense_pres': True, 'VerbForm_fin': True, 74: 100}

However, I would like to use a pattern like this one to retokenize specific verb phrases: pattern = [{'TAG':'Mood_sub'}, {'TAG':'VerbForm_ger'}]

In the case of a Spanish phrase like 'Que siga aprendiendo', 'siga' has 'Mood_sub' = True in its tag, and 'aprendiendo' has 'VerbForm_ger' = True in its tag. However, the matcher does not detect this match.

Can anyone tell me why this is and how I could fix it? This is the code I am using:

import spacy
from spacy.matcher import Matcher

model = spacy.load('es_core_news_md')
matcher = Matcher(model.vocab)
text = 'Que siga aprendiendo de sus alumnos'
doc = model(text)
pattern = [{'TAG':'Mood_sub'}, {'TAG':'VerbForm_ger'}]
matcher.add(1, None, pattern)
matches = matcher(doc)
for i, start, end in matches:
    span = doc[start:end]
    if len(span) > 0:
        with doc.retokenize() as retokenizer:
            retokenizer.merge(span)

There is 1 best solution below

Morphology support isn't fully implemented in spaCy v2, so this is not possible using the direct morph values like Mood_sub.

Instead, I think the best option with the Matcher is to use REGEX over the combined/extended TAG values. It's not going to be particularly elegant, but it should work:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('es_core_news_sm')
doc = nlp("Que siga aprendiendo de sus alumnos")
assert doc[1].tag_ == "AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"
matcher = Matcher(nlp.vocab)
matcher.add("MOOD_SUB", [[{"TAG": {"REGEX": ".*Mood=Sub.*"}}]])
assert matcher(doc) == [(513366231240698711, 1, 2)]
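
The same idea should extend to the two-token pattern from the question. The sketch below is a pure-Python stand-in for the Matcher, so the regexes can be checked without loading a model. The helper name tags_match is hypothetical, the tag strings for 'Que' and 'aprendiendo' are assumed (only the one for 'siga' comes from the code above), and it assumes the REGEX operator behaves like re.search, which would make the leading and trailing ".*" optional.

```python
import re

# Two-token pattern in the Matcher's REGEX syntax, applied to combined TAG values.
pattern = [
    {"TAG": {"REGEX": r"Mood=Sub"}},      # subjunctive verb, e.g. 'siga'
    {"TAG": {"REGEX": r"VerbForm=Ger"}},  # gerund, e.g. 'aprendiendo'
]

def tags_match(tags, pattern):
    """Hypothetical stand-in for the Matcher: return (start, end) spans where
    consecutive tag strings satisfy each REGEX in the pattern, in order."""
    regexes = [re.compile(tok["TAG"]["REGEX"]) for tok in pattern]
    spans = []
    for start in range(len(tags) - len(regexes) + 1):
        window = tags[start:start + len(regexes)]
        if all(rx.search(tag) for rx, tag in zip(regexes, window)):
            spans.append((start, start + len(regexes)))
    return spans

tags = [
    "SCONJ___",                                                    # 'Que' (assumed tag)
    "AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",  # 'siga' (from the answer)
    "VERB__VerbForm=Ger",                                          # 'aprendiendo' (assumed tag)
]
spans = tags_match(tags, pattern)
print(spans)
```

If the real tags look like the assumed ones, passing the same pattern to the Matcher with matcher.add("SUB_GER", [pattern]) should yield a match over tokens 1 to 3, which can then be merged inside a single doc.retokenize() block. Check doc[2].tag_ first, since the exact tag string depends on the model.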