I am working on a dataset that requires extracting all the words that are adjectives, verbs, and adverbs from each sentence of a data frame column.
This is a sample I was working on to figure out how I could get the desired output.
import nltk

list1 = ['good', 'excellent', 'was', 'not']
for i in list1:
    # Tag each word on its own; pos_tag expects a list of tokens
    x = nltk.pos_tag([i])
    # print(x)
    # Keep only the adjective, adverb, and verb tags of interest
    if x[0][1] in ("JJ", "JJS", "RB", "VB", "RBR", "RBS", "VBN", "VBP"):
        print(x)
The output it is giving me is:
[('good','JJ')]
[('not','RB')]
The output I need to get is something like this:
good not
Can anyone please help?
You'll have to be a little more specific about what you really want to extract, but here's an attempt.
It seems you're trying to extract verb phrases with adjectives/adverbs. If so, you can try:
But that outputs "is not" and "is not good"!!
Hmmm, in that case, do you want to extract "not good" or "is not good"?
If it's the "is not good" trigram, then try:
What if I just want "not good"?
Maybe try removing the verbs? E.g.
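One way the "removing the verbs" idea could look is sketched below. It is only an illustration, not the code originally posted here: it filters on Penn Treebank tag prefixes rather than matching trigrams, the helper name `keep_by_tag` and the example sentence are made up, and the tagged tokens are hard-coded (with assumed tags) so it runs without the NLTK tagger models.

```python
# Tagged tokens in the shape nltk.pos_tag() returns: a list of (word, tag) pairs.
# Hard-coded here (sentence and tags are assumptions) so the sketch runs without
# NLTK; in practice nltk.pos_tag(nltk.word_tokenize(sentence)) would produce them.
tagged = [("The", "DT"), ("food", "NN"), ("is", "VBZ"), ("not", "RB"), ("good", "JJ")]


def keep_by_tag(tagged_tokens, prefixes):
    """Join the words whose tag starts with one of the given prefixes.

    `keep_by_tag` is a hypothetical helper, not part of NLTK. A prefix such as
    "JJ" also covers "JJR" and "JJS"; "RB" covers "RBR" and "RBS"; "VB" covers
    every verb tag ("VBZ", "VBN", "VBP", ...).
    """
    # str.startswith accepts a tuple of prefixes
    return " ".join(word for word, tag in tagged_tokens if tag.startswith(prefixes))


# Adjectives, adverbs, and verbs
with_verbs = keep_by_tag(tagged, ("JJ", "RB", "VB"))
# Same filter with the verb tags removed
without_verbs = keep_by_tag(tagged, ("JJ", "RB"))

print(with_verbs)      # is not good
print(without_verbs)   # not good
```

For a data frame column, the same helper could be applied to each sentence after tokenizing and tagging it, though the exact tags you get back depend on the tagger and on the sentence context.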