I want to lemmatize text using NLTK's WordNetLemmatizer:
from nltk import word_tokenize, sent_tokenize, pos_tag
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet
lmtzr = WordNetLemmatizer()
POS = pos_tag(text)  # text is assumed to be a list of tokens, e.g. from word_tokenize

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag to the POS constant the WordNet lemmatizer understands
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to noun for every other tag
        return wordnet.NOUN

# Lemmatize each token using its mapped POS
lemmas = [lmtzr.lemmatize(token, get_wordnet_pos(tag)) for token, tag in POS]
The issue is that the POS tagger correctly tags "procaspases" as 'NNS', but how do I convert NNS to a WordNet POS? As it stands, "procaspases" is still "procaspaseS" after lemmatization.
I can easily lemmatize words using wordnet.morphy.
Note that "procaspases" is not in WordNet ("caspases" is, however, and morphy gives "caspase" as its lemma), so your lemmatizer most likely just doesn't recognize it. If you are not having issues lemmatizing other words, the tag mapping is fine and this word is simply outside WordNet's vocabulary.
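If you still need a lemma for words WordNet does not know, one possible workaround is to fall back to a crude plural-stripping rule whenever the lookup fails. The following is only a minimal sketch under stated assumptions: `naive_singular` and `lemmatize_with_fallback` are hypothetical helpers, not part of NLTK, and `lookup` is assumed to behave like `wordnet.morphy`, returning None for unknown words. The small `known` dictionary is a made-up stand-in for WordNet so the example runs without any corpus download.

```python
def naive_singular(word):
    """Crude English plural stripping (hypothetical helper, not an NLTK function)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # studies -> study
    if word.endswith(("ches", "shes", "sses", "xes", "zes")):
        return word[:-2]                # boxes -> box
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # procaspases -> procaspase
    return word


def lemmatize_with_fallback(word, treebank_tag, lookup):
    """Try the dictionary lookup first; strip the plural only if it fails.

    lookup is assumed to return the lemma, or None for unknown words.
    """
    lemma = lookup(word)
    if lemma is not None:
        return lemma
    # Only plural noun tags get the crude rule; everything else is left alone
    if treebank_tag in ("NNS", "NNPS"):
        return naive_singular(word)
    return word


# Stand-in for WordNet: a made-up mini dictionary, used only for this demo
known = {"caspases": "caspase"}

print(lemmatize_with_fallback("caspases", "NNS", known.get))     # caspase
print(lemmatize_with_fallback("procaspases", "NNS", known.get))  # procaspase
```

With NLTK installed and the WordNet corpus downloaded, `wordnet.morphy` could be passed as `lookup` in place of `known.get`. The suffix rule is deliberately simple and will get irregular plurals wrong, so treat it as a last resort for domain terms rather than a replacement for the lemmatizer.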