lemmatize plural nouns using nltk and wordnet

8.8k Views Asked by User0 At 24 June 2015 at 02:22

I want to lemmatize using

from nltk import word_tokenize, sent_tokenize, pos_tag
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet

lmtzr = WordNetLemmatizer()
# text is a list of tokens, e.g. the output of word_tokenize
POS = pos_tag(text)

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag to the WordNet POS constant the lemmatizer understands
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to noun, which is also the lemmatizer's default
        return wordnet.NOUN

# Lemmatize each token with the POS derived from its tag
for i in range(len(text)):
    lmtzr.lemmatize(text[i], get_wordnet_pos(POS[i][1]))

The issue is that the POS tagger tags "procaspases" as 'NNS', but how do I map NNS to a WordNet POS? As it stands, "procaspases" still keeps its trailing "s" after lemmatization.


There are 2 best solutions below

Charles J. Daniels On 03 July 2015 at 17:46

I can easily lemmatize things using wordnet.morphy:

>>> from nltk.corpus import wordnet
>>> wordnet.morphy('cats')
u'cat'

Note that "procaspases" is not in WordNet ("caspases" is, however, and morphy gives "caspase" as its lemma), so your lemmatizer most likely just doesn't recognize it. If other words lemmatize without problems, this one is probably just missing from WordNet.
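If you still need a lemma for words WordNet doesn't know, one option is a fallback that only kicks in when the lemmatizer hands the word back unchanged and the tagger says it is a plural noun. This is just a rough sketch: `lemmatize_with_fallback` and its suffix rules are my own hypothetical helper, not part of NLTK, and the rules are naive English heuristics that will get some words wrong (e.g. "analyses").

```python
def lemmatize_with_fallback(word, treebank_tag, lemmatize):
    """Try the supplied lemmatizer first; if it returns a plural noun
    unchanged, fall back to naive suffix stripping (heuristic only)."""
    lemma = lemmatize(word)
    # Trust the lemmatizer if it changed the word, or if the tag is not a plural noun
    if lemma != word or treebank_tag not in ("NNS", "NNPS"):
        return lemma
    lower = word.lower()
    if lower.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # hobbies -> hobby
    if lower.endswith(("ches", "shes", "xes", "zes", "sses")):
        return word[:-2]                # boxes -> box
    if lower.endswith("s") and not lower.endswith("ss"):
        return word[:-1]                # procaspases -> procaspase
    return word
```

With the question's setup you would pass something like `lambda w: lmtzr.lemmatize(w, wordnet.NOUN)` as the `lemmatize` argument, together with the tag from `pos_tag`.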

justhelping On 08 December 2016 at 00:41

NLTK handles most plurals, and not just by stripping a trailing 's'.

import nltk
from nltk.stem.wordnet import WordNetLemmatizer

Lem = WordNetLemmatizer()

phrase = 'cobblers ants women boys needs finds binaries hobbies busses wolves'

words = phrase.split()
for word in words:
    lemword = Lem.lemmatize(word)
    print(lemword)

Output: cobbler ant woman boy need find binary hobby bus wolf
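To see why this works for common words but not for "procaspases", here is a toy sketch of the general idea behind WordNet-style lemmatization as I understand it: check an exception list for irregular forms, then try suffix rules and keep a candidate only if it is a known word. The tiny `EXCEPTIONS`, `VOCABULARY` and `SUFFIX_RULES` below are made-up stand-ins, not the real WordNet data or NLTK's code.

```python
# Toy stand-ins for illustration only; the real WordNet data is far larger.
EXCEPTIONS = {"women": "woman", "wolves": "wolf"}
VOCABULARY = {"cat", "hobby", "caspase", "woman", "wolf"}
SUFFIX_RULES = [("ies", "y"), ("ses", "s"), ("s", "")]

def toy_morphy(word):
    """Return a lemma if one can be found in the toy vocabulary, else None."""
    if word in VOCABULARY:
        return word
    # Irregular plurals come from the exception list
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Regular plurals: apply each suffix rule, keep only known words
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCABULARY:
                return candidate
    return None

for w in ["cats", "hobbies", "women", "caspases", "procaspases"]:
    print(w, toy_morphy(w))
```

In this sketch "procaspases" yields None because neither candidate is in the vocabulary, which is the same kind of failure as in the question: the rules are fine, the word is simply unknown.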
