Mapping from Wiktionary part-of-speech tags to 12 universal part-of-speech tags

Does anyone have a Python dict mapping from the Wiktionary part-of-speech tags to the 12 universal part-of-speech tags, along with a rationale for the mapping?

The 12 universal tags are:

VERB - verbs (all tenses and modes)
NOUN - nouns (common and proper)
PRON - pronouns 
ADJ - adjectives
ADV - adverbs
ADP - adpositions (prepositions and postpositions)
CONJ - conjunctions
DET - determiners
NUM - cardinal numbers
PRT - particles or other function words
X - other: foreign words, typos, abbreviations
. - punctuation

More on the Universal Part-of-Speech Tagset can be found here.

The Wiktionary tags are:

Adjective
Adverb
Ambiposition
Article
Circumposition
Classifier
Conjunction 
Contraction
Counter
Determiner
Ideophone
Interjection  
Noun
Numeral
Participle
Particle
Postposition 
Preposition 
Pronoun
Proper noun
Verb

I looked at this question and did not find the mapping in nltk. Here is the mapping that I am using. However, several of the choices are ambiguous (for example Classifier, Counter, Contraction, Ideophone, Interjection and Participle), and I would appreciate clarification on how they should be mapped.

MAPPING = {
    "wiktionary_to_universal": {
        "Adjective": "ADJ",
        "Adverb": "ADV",
        "Ambiposition": "ADP",
        "Article": "DET",
        "Circumposition": "ADP",
        "Classifier": "ADJ",
        "Conjunction": "CONJ",
        "Contraction": "X",
        "Counter": "ADJ",
        "Determiner": "DET",
        "Ideophone": "X",
        "Interjection": "X",
        "Noun": "NOUN",
        "Numeral": "NUM",
        "Participle": "ADJ",
        "Particle": "PRT",
        "Postposition": "ADP",
        "Preposition": "ADP",
        "Pronoun": "PRON",
        "Proper noun": "NOUN",
        "Verb": "VERB"
    }
}
MAPPING['wiktionary_to_universal']['Noun']
Out[22]: 'NOUN'
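
To sanity-check a mapping like the one above, I have been using something along these lines. It is only a minimal sketch: `check_mapping` and `to_universal` are hypothetical helper names of my own (not part of nltk), the tag lists are copied from this question, and falling back to "X" for unknown tags is my assumption rather than an established convention.

```python
# The 12 universal tags listed in the question.
UNIVERSAL_TAGS = {
    "VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
    "CONJ", "DET", "NUM", "PRT", "X", ".",
}

# The Wiktionary tags listed in the question.
WIKTIONARY_TAGS = [
    "Adjective", "Adverb", "Ambiposition", "Article", "Circumposition",
    "Classifier", "Conjunction", "Contraction", "Counter", "Determiner",
    "Ideophone", "Interjection", "Noun", "Numeral", "Participle",
    "Particle", "Postposition", "Preposition", "Pronoun", "Proper noun",
    "Verb",
]


def check_mapping(mapping, source_tags=WIKTIONARY_TAGS, target_tags=UNIVERSAL_TAGS):
    """Hypothetical helper: report problems in a source-to-target tag mapping.

    Returns a tuple (missing, unexpected, invalid):
      missing    - source tags that have no entry in the mapping
      unexpected - mapping keys that are not source tags
      invalid    - entries whose value is not a target tag
    """
    missing = [tag for tag in source_tags if tag not in mapping]
    unexpected = [key for key in mapping if key not in source_tags]
    invalid = {key: value for key, value in mapping.items() if value not in target_tags}
    return missing, unexpected, invalid


def to_universal(tag, mapping, default="X"):
    """Hypothetical helper: look up a tag, falling back to `default` (assumed "X")."""
    return mapping.get(tag, default)


# Small demonstration with a deliberately inconsistent mapping:
# the key "CONJ" is not a Wiktionary tag, so "Conjunction" is left unmapped.
sample = {"Noun": "NOUN", "CONJ": "CONJ"}
missing, unexpected, invalid = check_mapping(sample, source_tags=["Noun", "Conjunction"])
print(missing)     # ['Conjunction']
print(unexpected)  # ['CONJ']
print(invalid)     # {}
```

Running `check_mapping(MAPPING["wiktionary_to_universal"])` should return three empty containers when every Wiktionary tag is keyed by its exact name and every value is one of the 12 universal tags.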