How to get a word's suffix after lemmatizing it (Python)?

I need to get the suffix of a word after lemmatizing it. Is there a way to "subtract" the lemma from the word and get the suffix as a result? I've tried re.sub, but it only works when the lemma appears intact inside the word, so it fails for "dancing", "ladies", etc. Is there a better way to do this?

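import re
from nltk.stem import WordNetLemmatizer  # assumption: lemmatizer_ is NLTK's WordNetLemmatizer

lemmatizer_ = WordNetLemmatizer()
word = "produced"
lemma = lemmatizer_.lemmatize(word, "v")
suffix = re.sub(lemma, "", word)  # only removes the lemma if it occurs verbatim in word
suffix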

There is 1 solution below

How about...

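def get_suffix(word, lemma):
    # Count the leading characters shared by word and lemma,
    # stopping at the first mismatch.
    cnt = 0
    for w, l in zip(word, lemma):
        if w != l:
            break
        cnt += 1
    return word[cnt:]
print('suffix:', get_suffix('dancing', 'dance'))
print('suffix:', get_suffix('ladies',  'lady'))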

>> suffix: ing
>> suffix: ies

This won't work with irregular forms (for "went" and "go" no leading characters are shared, so the whole word comes back as the "suffix"), and there are probably other corner cases, but it should work for the basic endings.
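
For what it's worth, the same idea can be sketched with the standard library's os.path.commonprefix, which compares plain strings character by character despite its name. The function name split_inflection is made up for illustration, and the lemmas are hard-coded here rather than produced by a lemmatizer, so treat this as a rough sketch and not a complete solution.

```python
import os.path

def split_inflection(word, lemma):
    """Return (stem, suffix), where stem is the longest prefix shared by word and lemma.

    split_inflection is a hypothetical helper name, not part of any library.
    """
    # commonprefix works on any list of strings, not only on file paths.
    stem = os.path.commonprefix([word, lemma])
    return stem, word[len(stem):]

# Lemmas are written out by hand here; in practice they would come from a lemmatizer.
pairs = [("dancing", "dance"), ("ladies", "lady"), ("produced", "produce"), ("went", "go")]
for word, lemma in pairs:
    stem, suffix = split_inflection(word, lemma)
    print(f"{word}: stem={stem!r}, suffix={suffix!r}")
```

Returning the shared stem alongside the suffix makes the irregular case easy to spot: when the stem is empty (as with "went" and "go"), the "suffix" is just the whole word and probably shouldn't be trusted.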