I have a list of sentences in a CSV file. I need to lemmatize these sentences and then extract the ones that contain certain keywords.
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import word_tokenize
import pandas as pd
import csv
# define lemmatizer
lemmatizer = WordNetLemmatizer()
result = []
# define wordlist
wordlist = [wordlist]
def get_wordnet_pos(word):
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)
with open(filepath) as f:
    for line in f:
        lemmatized = [[lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in word_tokenize(s)]
                      for s in line]
So far it works, and I get lemmatized sentences. But when I search them for the words in my word list and try to extract the matching sentences, I don't get the expected results:
if any(x in lemmatized for x in wordlist):
    result.append(line)
print(result)
Is there something wrong with the loop?
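Two things in the loop above would explain the missing matches. `line` is a string, so `for s in line` iterates over its characters, and each "sentence" handed to `word_tokenize` is a single character. On top of that, `lemmatized` is a list of lists, so `x in lemmatized` compares the keyword string with whole inner lists and can never be true. Below is a minimal sketch of one way to restructure it, assuming each CSV row holds one sentence in its first column and that the keywords in `wordlist` are already in lemma form. `lemmatize_with_nltk`, `extract_matching` and `read_sentences` are hypothetical helper names; NLTK is imported lazily inside `lemmatize_with_nltk` so the matching logic itself does not depend on it.

```python
import csv
from typing import Callable, Iterable, List


def lemmatize_with_nltk(sentence: str) -> List[str]:
    """Tokenize one sentence and lemmatize each token using its WordNet POS.

    NLTK is imported here (lazily) so the other helpers work without it.
    Requires NLTK's tokenizer, POS-tagger and WordNet data to be downloaded.
    """
    import nltk
    from nltk.corpus import wordnet
    from nltk.stem import WordNetLemmatizer

    tag_dict = {"J": wordnet.ADJ, "N": wordnet.NOUN,
                "V": wordnet.VERB, "R": wordnet.ADV}
    lemmatizer = WordNetLemmatizer()
    tokens = nltk.word_tokenize(sentence)
    # Tag the whole sentence at once; unknown tag prefixes fall back to NOUN.
    return [lemmatizer.lemmatize(tok, tag_dict.get(tag[0].upper(), wordnet.NOUN))
            for tok, tag in nltk.pos_tag(tokens)]


def extract_matching(sentences: Iterable[str],
                     lemmatize: Callable[[str], List[str]],
                     keywords: Iterable[str]) -> List[str]:
    """Return the original sentences whose lemmas include any keyword."""
    keyword_set = {k.lower() for k in keywords}
    result = []
    for sentence in sentences:
        # One flat set of lemmas per sentence, not a list of lists.
        lemmas = {lemma.lower() for lemma in lemmatize(sentence)}
        # Non-empty intersection means at least one keyword occurs as a lemma.
        if lemmas & keyword_set:
            result.append(sentence)
    return result


def read_sentences(path: str, column: int = 0) -> List[str]:
    """Read one sentence per CSV row from `column` (assumed file layout)."""
    with open(path, newline="", encoding="utf-8") as f:
        # csv.reader handles quoting and strips the trailing newline.
        return [row[column] for row in csv.reader(f) if row]
```

With NLTK and its data installed (the exact resource names for the tokenizer and tagger vary between NLTK versions), the call would be `result = extract_matching(read_sentences(filepath), lemmatize_with_nltk, wordlist)`. The sketch tags each sentence as a whole rather than word by word as `get_wordnet_pos` does, because the tagger can use the surrounding words; the per-word version can be swapped back in if preferred. Comparing sets also makes the check a whole-lemma match, so a keyword `run` will not match the lemma `runner`.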