Abbreviation Reference for NLTK Parts of Speech


I'm using nltk to find the part of speech for each word in a sentence. It returns abbreviations that I can neither fully intuit nor find good documentation for.

Running:

import nltk
sample = "There is no spoon."
tokenized_words = nltk.word_tokenize(sample)
tagged_words = nltk.pos_tag(tokenized_words)
print(tagged_words)

Returns:

[('There', 'EX'), ('is', 'VBZ'), ('no', 'DT'), ('spoon', 'NN'), ('.', '.')]

In the above example, I'm looking for what DT, EX, and the rest mean.

The best I've found so far is to search Natural Language Processing with Python for mentions of each abbreviation, but there has to be something better. I also found a few literature-based resources, but I can't tell which tagset nltk is using.

Best answer

The link that you have already mentioned has two different tagsets.

For tagset documentation, see nltk.help.upenn_tagset() and nltk.help.brown_tagset().

In this particular example, the tags come from the Penn Treebank tagset.

You can also look up individual tags with:

nltk.help.upenn_tagset('DT')
nltk.help.upenn_tagset('EX')
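
For quick reference, here is a minimal sketch that annotates the output from the question with a short gloss per tag. The glosses are hand-written paraphrases of the Penn Treebank tag meanings, and PENN_GLOSSES and describe() are hypothetical names made up for this example, not part of nltk. For the authoritative wording, use the nltk.help.upenn_tagset() calls above.

```python
# Hand-written glosses for the tags in the question's output, paraphrased
# from the Penn Treebank tagset. PENN_GLOSSES and describe() are
# hypothetical helpers for illustration; they are not part of nltk.
PENN_GLOSSES = {
    "EX": "existential there",
    "VBZ": "verb, present tense, 3rd person singular",
    "DT": "determiner",
    "NN": "noun, common, singular or mass",
    ".": "sentence terminator",
}


def describe(tagged_words):
    """Pair each (word, tag) with its gloss, or 'unknown tag' if not listed."""
    return [
        (word, tag, PENN_GLOSSES.get(tag, "unknown tag"))
        for word, tag in tagged_words
    ]


# Output copied from the question, so this runs without nltk installed.
tagged_words = [('There', 'EX'), ('is', 'VBZ'), ('no', 'DT'), ('spoon', 'NN'), ('.', '.')]

for word, tag, gloss in describe(tagged_words):
    print(f"{word:<6} {tag:<4} {gloss}")
```

The dictionary only covers the five tags in this sentence; anything else comes back as 'unknown tag'. In a real script you would pass the result of nltk.pos_tag() to describe() instead of the hard-coded list.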