StanfordNLP POS giving mixed results


I was testing the Stanford NLP POS tagger and I am getting mixed results:

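```java
// StanfordNLP.getPOSMap is the asker's own wrapper around the tagger; judging by the
// printed output it returns a map from each POS tag to the words that received it.
System.out.println(StanfordNLP.getInstance().getPOSMap("WHEAT flour(whole)".toLowerCase()));
System.out.println(StanfordNLP.getInstance().getPOSMap("Whole wheat flour".toLowerCase()));
```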

This gives me the following output:

```
{NN=[wheat, flour, whole]}
{JJ=[whole], NN=[wheat, flour]}
```

How can I deal with problems like these? It's actually the same words, just rearranged.
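One way to see exactly which words change is to diff the two printed maps. This is a minimal sketch assuming `getPOSMap` returns something like a `Map<String, List<String>>` from tag to words, which is what the printed output suggests; the wrapper's real return type is not shown, and `TagDiff` and its methods are hypothetical names.

```java
import java.util.*;

/** Diffs two tag -> words maps and reports words whose tag differs between them. */
final class TagDiff {
    private TagDiff() {}

    /** Inverts {NN=[wheat, flour]} into {flour=NN, wheat=NN}. */
    static Map<String, String> wordToTag(Map<String, List<String>> tagToWords) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : tagToWords.entrySet()) {
            for (String word : e.getValue()) {
                result.put(word, e.getKey()); // assumes each word appears under one tag
            }
        }
        return result;
    }

    /** Returns word -> "TAG_A/TAG_B" for words present in both maps with different tags. */
    static Map<String, String> tagDisagreements(Map<String, List<String>> a,
                                                Map<String, List<String>> b) {
        Map<String, String> tagsA = wordToTag(a);
        Map<String, String> tagsB = wordToTag(b);
        Map<String, String> diff = new TreeMap<>();
        for (Map.Entry<String, String> e : tagsA.entrySet()) {
            String other = tagsB.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                diff.put(e.getKey(), e.getValue() + "/" + other);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // The two outputs from the question, typed in by hand.
        Map<String, List<String>> first = Map.of("NN", List.of("wheat", "flour", "whole"));
        Map<String, List<String>> second = Map.of("JJ", List.of("whole"), "NN", List.of("wheat", "flour"));
        System.out.println(tagDisagreements(first, second)); // {whole=NN/JJ}
    }
}
```

Only "whole" flips, which matches the two outputs above.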

EDIT

Maybe I should explain the problem in more detail.

I want to compare two sentences. My approach is to POS-tag both strings, then compare and score the nouns, adjectives, and verbs from the two strings separately.
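The per-category scoring described above might look something like this. It is a sketch under assumptions: the question does not say how each category is scored, so Jaccard similarity (shared words divided by all distinct words) is used as a stand-in, and `PerTagScore` is a hypothetical name. Fed the two outputs from the question, it scores NN at 2/3 and JJ at 0, which is exactly the failure being described.

```java
import java.util.*;

/** Scores each POS tag separately with Jaccard similarity over the words carrying that tag. */
final class PerTagScore {
    private PerTagScore() {}

    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0; // nothing to compare: treat as identical
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** Returns tag -> Jaccard score for every tag that occurs in either map. */
    static Map<String, Double> scoreByTag(Map<String, List<String>> a,
                                          Map<String, List<String>> b) {
        Set<String> tags = new TreeSet<>(a.keySet());
        tags.addAll(b.keySet());
        Map<String, Double> scores = new TreeMap<>();
        for (String tag : tags) {
            Set<String> wa = new HashSet<>(a.getOrDefault(tag, List.of()));
            Set<String> wb = new HashSet<>(b.getOrDefault(tag, List.of()));
            scores.put(tag, jaccard(wa, wb));
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, List<String>> first = Map.of("NN", List.of("wheat", "flour", "whole"));
        Map<String, List<String>> second = Map.of("JJ", List.of("whole"), "NN", List.of("wheat", "flour"));
        System.out.println(scoreByTag(first, second)); // {JJ=0.0, NN=0.6666666666666666}
    }
}
```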

But because the tagging is fuzzy (as @Elliott also pointed out) and depends on word order, my ranking fails in some cases. Can someone suggest a workaround?
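One possible workaround, sketched here with a hypothetical grouping, is to collapse the fine-grained Penn Treebank tags into coarser buckets before scoring, so that a word drifting between `NN` and `JJ` inside a noun phrase still lands in the same bucket. The grouping in `coarse` is an assumption chosen for this example, not a standard mapping, and the class name is hypothetical.

```java
import java.util.*;

/** Scores two tag -> words maps after collapsing tags into coarse groups. */
final class CoarseTagScore {
    private CoarseTagScore() {}

    // Hypothetical grouping: nouns and adjectives share one bucket because
    // premodifiers such as "whole" can flip between NN and JJ.
    static String coarse(String tag) {
        if (tag.startsWith("NN") || tag.startsWith("JJ")) return "NOMINAL";
        if (tag.startsWith("VB")) return "VERB";
        return "OTHER";
    }

    /** Re-buckets a tag -> words map under the coarse groups. */
    static Map<String, Set<String>> regroup(Map<String, List<String>> byTag) {
        Map<String, Set<String>> out = new TreeMap<>();
        byTag.forEach((tag, words) ->
                out.computeIfAbsent(coarse(tag), k -> new TreeSet<>()).addAll(words));
        return out;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** Averages Jaccard similarity over the coarse groups present in either map. */
    static double score(Map<String, List<String>> a, Map<String, List<String>> b) {
        Map<String, Set<String>> ga = regroup(a);
        Map<String, Set<String>> gb = regroup(b);
        Set<String> groups = new TreeSet<>(ga.keySet());
        groups.addAll(gb.keySet());
        if (groups.isEmpty()) return 1.0;
        double total = 0.0;
        for (String g : groups) {
            total += jaccard(ga.getOrDefault(g, Set.of()), gb.getOrDefault(g, Set.of()));
        }
        return total / groups.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> first = Map.of("NN", List.of("wheat", "flour", "whole"));
        Map<String, List<String>> second = Map.of("JJ", List.of("whole"), "NN", List.of("wheat", "flour"));
        System.out.println(regroup(second)); // {NOMINAL=[flour, wheat, whole]}
        System.out.println(score(first, second)); // 1.0
    }
}
```

With the two maps from the question, both strings end up in the single bucket `NOMINAL=[flour, wheat, whole]` and score 1.0. The trade-off is that a genuine noun/adjective difference no longer counts against a match.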

Are there classification statistics that give the probability of, say, a noun being tagged as an adjective or a verb, which I could use as weights in my scoring algorithm?
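Such statistics can be estimated: run the tagger over text that already has hand-assigned tags and count how often each gold tag comes out as each predicted tag. The sketch below computes the relative-frequency estimate of P(predicted | gold) from such pairs and derives a partial-credit weight from it. The class name and the weighting rule in `matchWeight` are hypothetical, and the counts in `main` are toy values for illustration, not real tagger statistics.

```java
import java.util.*;

/** Estimates P(predicted tag | gold tag) from (gold, predicted) pairs. */
final class TagConfusion {
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();
    private final Map<String, Integer> goldTotals = new TreeMap<>();

    /** Records one token whose correct tag is gold and whose tagger output is predicted. */
    void add(String gold, String predicted) {
        counts.computeIfAbsent(gold, k -> new TreeMap<>()).merge(predicted, 1, Integer::sum);
        goldTotals.merge(gold, 1, Integer::sum);
    }

    /** Relative-frequency estimate of P(predicted | gold); 0 if gold was never seen. */
    double probability(String gold, String predicted) {
        int total = goldTotals.getOrDefault(gold, 0);
        if (total == 0) return 0.0;
        int c = counts.get(gold).getOrDefault(predicted, 0);
        return (double) c / total;
    }

    /**
     * Hypothetical weighting: full credit when the tags agree, otherwise the larger
     * of the two confusion probabilities, so frequent confusions cost less.
     */
    double matchWeight(String tagA, String tagB) {
        if (tagA.equals(tagB)) return 1.0;
        return Math.max(probability(tagA, tagB), probability(tagB, tagA));
    }

    public static void main(String[] args) {
        // Toy pairs for illustration only, not real tagger statistics.
        TagConfusion cm = new TagConfusion();
        cm.add("NN", "NN"); cm.add("NN", "NN"); cm.add("NN", "NN"); cm.add("NN", "JJ");
        cm.add("JJ", "JJ"); cm.add("JJ", "NN");
        System.out.println(cm.probability("NN", "JJ")); // 0.25
        System.out.println(cm.matchWeight("NN", "JJ")); // 0.5
    }
}
```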

Thanks, Chahat

Answer 1

POS taggers will always give results like this, because POS tagging is contextual: the same word can be a noun, an adjective, or a verb depending on its context. A statistical tagger such as Stanford's decides how to tag each word from features of the surrounding words, so changing the word order can change the tags.
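To make the point about word order concrete, the sketch below lists the previous-word/next-word window around each token, the kind of local context a statistical tagger conditions on (the real Stanford models use richer features than this). It is illustration only and does no tagging itself; the input is assumed to be tokenized already, with spaces around the parentheses, and `ContextWindow` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists the (previous word, next word) window around each token. */
final class ContextWindow {
    private ContextWindow() {}

    static List<String> contexts(String tokenized) {
        String[] tokens = tokenized.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String prev = i > 0 ? tokens[i - 1] : "<S>";                 // sentence start marker
            String next = i + 1 < tokens.length ? tokens[i + 1] : "</S>"; // sentence end marker
            out.add(prev + " [" + tokens[i] + "] " + next);
        }
        return out;
    }

    public static void main(String[] args) {
        // "whole" is seen as "<S> [whole] wheat" here...
        contexts("whole wheat flour").forEach(System.out::println);
        // ...but as "( [whole] )" here, so the tagger has different evidence.
        contexts("wheat flour ( whole )").forEach(System.out::println);
    }
}
```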

Answer 2

The Stanford POS Tagger is pretty good. However, if you want an easy side-by-side comparison with the standard NLTK tagger and another good-quality tagger called Senna, you could try this: https://github.com/StealthyK/TaggerTimer
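If you do compare taggers side by side, a simple summary is the fraction of tokens on which they agree. The sketch below assumes both outputs have already been converted to space-separated `word_TAG` tokens over the same tokenization; the class name is hypothetical and the example strings are made up, not actual output of either tagger.

```java
/** Fraction of tokens on which two taggers assign the same tag. */
final class TaggerAgreement {
    private TaggerAgreement() {}

    static double agreement(String taggedA, String taggedB) {
        String[] a = taggedA.trim().split("\\s+");
        String[] b = taggedB.trim().split("\\s+");
        if (a.length != b.length) {
            throw new IllegalArgumentException("outputs use different tokenizations");
        }
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            // Split on the last '_' so words containing underscores still work.
            int sa = a[i].lastIndexOf('_');
            int sb = b[i].lastIndexOf('_');
            if (sa < 0 || sb < 0) {
                throw new IllegalArgumentException("expected word_TAG tokens");
            }
            if (!a[i].substring(0, sa).equals(b[i].substring(0, sb))) {
                throw new IllegalArgumentException("token mismatch at position " + i);
            }
            if (a[i].substring(sa + 1).equals(b[i].substring(sb + 1))) {
                same++;
            }
        }
        return (double) same / a.length;
    }

    public static void main(String[] args) {
        // Hypothetical outputs from two taggers for the same three tokens.
        System.out.println(agreement("whole_JJ wheat_NN flour_NN",
                                     "whole_NN wheat_NN flour_NN")); // 0.6666666666666666
    }
}
```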