How to lemmatize a list of sentences and extract sentences that contain certain words?

I have a list of sentences in a CSV file. I need to lemmatize these sentences and extract the ones that contain certain keywords.

    import wordnet, nltk
    from nltk.stem import WordNetLemmatizer
    from nltk.corpus import wordnet
    from nltk import word_tokenize
    import pandas as pd
    import csv

    # define lemmatizer
    lemmatizer = WordNetLemmatizer()
    result = []

    # define wordlist
    wordlist = [wordlist]


    def get_wordnet_pos(word):
        tag = nltk.pos_tag([word])[0][1][0].upper()
        tag_dict = {"J": wordnet.ADJ,
                    "N": wordnet.NOUN,
                    "V": wordnet.VERB,
                    "R": wordnet.ADV}
        return tag_dict.get(tag, wordnet.NOUN)


    with open(filepath) as f:
        for line in f:
            lemmatized = [[lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in word_tokenize(s)]
                          for s in line]

So far this works and I get lemmatized sentences. But when I search the result against my word list to extract the matching sentences, I don't get the expected results.

    if any(x in lemmatized for x in wordlist):
        result.append(line)
        print(result)

Is there something wrong with the loop?
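
One possible reading of the problem, offered as a sketch rather than a confirmed diagnosis: `line` is a string, so `for s in line` iterates over its characters, and `lemmatized` ends up as a list of lists with one single-character token each. The check `x in lemmatized` then compares a keyword string against those inner lists, which can never be equal. The helper below, `filter_sentences`, is a hypothetical name, and its default `tokenize` and `lemmatize` arguments are plain-Python stand-ins chosen so the example runs without NLTK data.

```python
# Iterating over a string yields characters, not sentences or words.
line = "Dogs bark"
print([s for s in line][:4])   # ['D', 'o', 'g', 's']

# A string is never equal to a list, so membership in a nested list fails.
print("dog" in [["dog"]])      # False


def filter_sentences(lines, keywords, tokenize=str.split, lemmatize=str.lower):
    """Return the lines whose lemmatized tokens include any keyword.

    `tokenize` and `lemmatize` default to simple stand-ins (assumptions
    for this sketch); pass NLTK-based callables for real lemmatization.
    Keywords are expected to be in lemma form already.
    """
    keywords = set(keywords)
    matches = []
    for line in lines:
        # Tokenize the whole line once and keep a flat set of lemmas.
        lemmas = {lemmatize(tok) for tok in tokenize(line)}
        # Set intersection compares strings with strings.
        if lemmas & keywords:
            matches.append(line.strip())
    return matches


lines = ["The cats are running\n", "Dogs bark loudly\n"]
print(filter_sentences(lines, ["dogs"]))   # ['Dogs bark loudly']
```

With the objects from the question, the call might look like `filter_sentences(f, wordlist, tokenize=word_tokenize, lemmatize=lambda w: lemmatizer.lemmatize(w, get_wordnet_pos(w)))`. Note that WordNet lemmas keep the casing of the input token, so lowercasing the tokens first may be needed if the word list is lowercase.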
