Finding the nouns in a sentence given the context in Python


How can I find the nouns in a sentence, taking the context into account? I am using the nltk library as follows:

import nltk  # requires the NLTK tokenizer and tagger data (see nltk.download)

text = 'I bought a vintage car.'
tokens = nltk.word_tokenize(text)
result = nltk.pos_tag(tokens)
result = [i for i in result if i[1] == 'NN']

# result = [('vintage', 'NN'), ('car', 'NN')]

The problem with this script is that it tags "vintage" as a noun. That can be correct in other sentences, but in this context it is an adjective.

How can I get tags that respect the context?

Appendix: Using textblob, we get "vintage car" as the noun:

!python -m textblob.download_corpora
from textblob import TextBlob
txt = "I bought a vintage car."
blob = TextBlob(txt)
print(blob.noun_phrases) #['vintage car']

There are 2 answers below.

Best answer:

Using spaCy might solve your task. Try this:

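import spacy

# The model must be installed first: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

def analyze(text):
    # Print each token with its coarse-grained part-of-speech tag
    doc = nlp(text)
    for token in doc:
        print(token.text, token.pos_)

analyze("I bought a vintage car.")
print()
analyze("This old wine is a vintage.")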

Output

I PRON
bought VERB
a DET
vintage ADJ <- correctly identified as adjective
car NOUN
. PUNCT

This DET
old ADJ
wine NOUN
is AUX
a DET
vintage NOUN  <- correctly identified as noun
. PUNCT
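
To collect the nouns instead of printing every tag, the same idea can be wrapped in a small helper. This is only a minimal sketch: `extract_nouns` is a hypothetical name, and treating `PROPN` (proper nouns) as nouns alongside `NOUN` is an assumption you may want to change. It accepts any objects that expose `.text` and `.pos_`, which is what spaCy tokens provide.

```python
from typing import Iterable, List


def extract_nouns(tokens: Iterable, noun_tags=frozenset({"NOUN", "PROPN"})) -> List[str]:
    """Return the text of every token whose coarse POS tag is in noun_tags.

    tokens: token-like objects exposing .text and .pos_ (e.g. a spaCy Doc).
    noun_tags: which tags count as nouns (an assumption, adjust as needed).
    """
    return [tok.text for tok in tokens if tok.pos_ in noun_tags]


# Usage with spaCy (assumes a model such as en_core_web_lg is installed):
#   import spacy
#   nlp = spacy.load("en_core_web_lg")
#   extract_nouns(nlp("I bought a vintage car."))
```

With the tags shown in the output above, the first sentence would yield only "car" and the second would include "vintage", although the exact tags depend on the model you load.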
Another answer:

You can use spaCy's noun_chunks to separate out the nouns:

import spacy  # tested with version 3.6.1
nlp = spacy.load('en_core_web_sm')
doc = nlp('I bought a vintage car.')
nouns = []
for chunk in doc.noun_chunks:
    for tok in chunk:
        # keep only the tokens tagged as nouns inside each noun chunk
        if tok.pos_ == "NOUN":
            nouns.append(tok.text)
print(nouns)

which prints out:

['car']

However, if you wish to extract both the nouns and their adjectives, you can try this:

noun_adj_pairs = {}
for chunk in doc.noun_chunks:
    adj = []
    noun = ""
    for tok in chunk:
        if tok.pos_ == "NOUN":
            noun = tok.text
        if tok.pos_ in ("ADJ", "CCONJ"):  # keep adjectives and the conjunctions joining them
            adj.append(tok.text)
    if noun:
        noun_adj_pairs[noun] = " ".join(adj)
print(noun_adj_pairs)

which prints out:

{'car': 'vintage'}
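
One thing to watch with the dictionary above: if the same noun appears in two chunks, the later entry overwrites the earlier one. A possible variant, sketched here under the assumption that you want every adjective kept, accumulates them in a list per noun. The name `collect_noun_adjectives` is hypothetical, and the function works on any token-like objects that expose `.text` and `.pos_`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_noun_adjectives(chunks: Iterable[Iterable]) -> Dict[str, List[str]]:
    """Map each noun to all adjectives found in the same chunk.

    chunks: iterables of token-like objects exposing .text and .pos_
    (for example, the spans yielded by spaCy's doc.noun_chunks).
    """
    pairs: Dict[str, List[str]] = defaultdict(list)
    for chunk in chunks:
        tokens = list(chunk)
        adjectives = [tok.text for tok in tokens if tok.pos_ == "ADJ"]
        for tok in tokens:
            if tok.pos_ == "NOUN":
                # extend rather than overwrite, so a repeated noun
                # keeps the adjectives from its earlier chunks
                pairs[tok.text].extend(adjectives)
    return dict(pairs)


# Usage with spaCy (assumes `doc` was created as in the answer above):
#   collect_noun_adjectives(doc.noun_chunks)
```

Nouns without any adjective still appear as keys, mapped to an empty list.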