I'm using nltk to find the parts of speech for each word in a sentence. It returns abbreviations that I can neither fully intuit nor find good documentation for.
Running:
import nltk
sample = "There is no spoon."
tokenized_words = nltk.word_tokenize(sample)
tagged_words = nltk.pos_tag(tokenized_words)
print(tagged_words)
Returns:
[('There', 'EX'), ('is', 'VBZ'), ('no', 'DT'), ('spoon', 'NN'), ('.', '.')]
In the above example, I'm looking for what DT, EX, and the rest mean.
The best I have so far is to search for mentions of the abbreviations in question in Natural Language Processing with Python, but there has to be something better. I did also find a few literature-based resources, but I don't know how to tell which tagset nltk is using.
The link that you have already mentioned has two different tagsets.
In this particular example, the tags come from the Penn Treebank tagset.
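To make that concrete, here is a minimal sketch that attaches Penn Treebank descriptions to the output from the question. The glossary is hand-written and covers only the five tags in that output, and the names PENN_TAG_GLOSSARY and describe_tags are made up for illustration; none of this is part of nltk itself.

```python
# Hand-written glossary for only the tags that appear in the question's output.
# Illustrative assumption: this dict is NOT provided by nltk.
PENN_TAG_GLOSSARY = {
    "EX": "existential 'there'",
    "VBZ": "verb, 3rd person singular present",
    "DT": "determiner",
    "NN": "noun, singular or mass",
    ".": "sentence-final punctuation",
}


def describe_tags(tagged_words, glossary=PENN_TAG_GLOSSARY):
    """Return (word, tag, description) triples; unknown tags get a placeholder."""
    return [
        (word, tag, glossary.get(tag, "unknown tag"))
        for word, tag in tagged_words
    ]


# The tagged output from the question, copied verbatim.
tagged_words = [('There', 'EX'), ('is', 'VBZ'), ('no', 'DT'), ('spoon', 'NN'), ('.', '.')]

for word, tag, meaning in describe_tags(tagged_words):
    print("{:<6} {:<4} {}".format(word, tag, meaning))
```

Any tag missing from the glossary is reported as "unknown tag", so the sketch still runs on other sentences; it just won't explain tags it has no entry for.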
You can also read about these tags by: