Does anyone have a Python dict mapping from the Wiktionary part-of-speech tags to the 12 universal part-of-speech tags, along with a rationale for the mapping?
The 12 universal tags are:
VERB - verbs (all tenses and modes)
NOUN - nouns (common and proper)
PRON - pronouns
ADJ - adjectives
ADV - adverbs
ADP - adpositions (prepositions and postpositions)
CONJ - conjunctions
DET - determiners
NUM - cardinal numbers
PRT - particles or other function words
X - other: foreign words, typos, abbreviations
. - punctuation
More on the Universal Part-of-Speech Tagset can be found here.
The Wiktionary tags are:
Adjective
Adverb
Ambiposition
Article
Circumposition
Classifier
Conjunction
Contraction
Counter
Determiner
Ideophone
Interjection
Noun
Numeral
Participle
Particle
Postposition
Preposition
Pronoun
Proper noun
Verb
I looked at this question and did not find the mapping in nltk. Here is the mapping that I am using; however, there is ambiguity in several of the choices, and clarification on which mappings are best would be appreciated.
# Keys are Wiktionary part-of-speech tags; values are universal tags.
MAPPING = {
    "wiktionary_to_universal": {
        "Adjective": "ADJ",
        "Adverb": "ADV",
        "Ambiposition": "ADP",
        "Article": "DET",
        "Circumposition": "ADP",
        "Classifier": "ADJ",
        "Conjunction": "CONJ",
        "Contraction": "X",
        "Counter": "ADJ",
        "Determiner": "DET",
        "Ideophone": "X",
        "Interjection": "X",
        "Noun": "NOUN",
        "Numeral": "NUM",
        "Participle": "ADJ",
        "Particle": "PRT",
        "Postposition": "ADP",
        "Preposition": "ADP",
        "Pronoun": "PRON",
        "Proper noun": "NOUN",
        "Verb": "VERB"
    }
}
MAPPING['wiktionary_to_universal']['Noun']
Out[22]: 'NOUN'
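For what it's worth, a mapping like this is easy to get subtly wrong (a key that does not match the Wiktionary tag name, or a value that is not one of the 12 universal tags), so a small sanity check may help. The following is only a minimal sketch: `check_mapping` and `to_universal` are hypothetical helper names, not part of nltk, and falling back to X for unknown tags is an assumption rather than an established convention.

```python
# Hypothetical helpers; the names are illustrative and not from nltk.
UNIVERSAL_TAGS = {"VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
                  "CONJ", "DET", "NUM", "PRT", "X", "."}


def check_mapping(mapping, source_tags):
    """Return (missing, invalid).

    missing: source tags that have no entry in the mapping.
    invalid: mapping keys whose value is not a universal tag.
    """
    missing = sorted(t for t in source_tags if t not in mapping)
    invalid = sorted(k for k, v in mapping.items() if v not in UNIVERSAL_TAGS)
    return missing, invalid


def to_universal(tag, mapping, default="X"):
    """Look up a tag; unknown tags fall back to X (an assumed default)."""
    return mapping.get(tag, default)


# Small illustrative subset, not the full mapping from the question.
sample = {"Noun": "NOUN", "Proper noun": "NOUN", "CONJ": "CONJ"}
missing, invalid = check_mapping(sample, ["Noun", "Proper noun", "Conjunction"])
print(missing)   # ['Conjunction']
print(invalid)   # []
print(to_universal("Interjection", sample))  # X
```

Running `check_mapping` against the full list of Wiktionary tags would flag any tag whose key is misspelled or missing before the mapping is used for tagging.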