Find lowest common hypernym given multiple words in WordNet (Python)


If I have a list of words in Python such as:

words = ["blue", "red", "ball"]

Is there a way to programmatically produce the hypernyms for this group of words using WordNet?


First, see https://stackoverflow.com/a/29478711/610569 for the difference between a "sense" (synset/concept) and a "word" (in the context of WordNet, a lemma).

Given two synsets (NOT words), it is possible to find the lowest common hypernym between them:

>>> from nltk.corpus import wordnet as wn

# A word can represent multiple meanings (aka synsets)
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

>>> wn.synsets('cat')
[Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01')]

>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'

>>> wn.synsets('cat')[0].definition()
u'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'

>>> dog = wn.synsets('dog')[0]
>>> cat = wn.synsets('cat')[0]


>>> cat.lowest_common_hypernyms(dog)
[Synset('carnivore.n.01')]

See http://www.nltk.org/howto/wordnet_lch.html

Is lowest common hypernyms reliable?

WordNet is a hand-crafted resource, so how reliable the result is depends on why and how each synset was placed within the overall WordNet ontology.

Can I use this information for my NLP task?

Perhaps, but most probably it's not useful.

Can it compare more than 2 synsets?

Not exactly. You have to do multiple pairwise searches, e.g.

>>> mouse = wn.synsets('mouse')[0]
>>> cat = wn.synsets('cat')[0]
>>> dog = wn.synsets('dog')[0]

>>> dog.lowest_common_hypernyms(cat)
[Synset('carnivore.n.01')]
>>> cat.lowest_common_hypernyms(mouse)
[Synset('placental.n.01')]
>>> dog.lowest_common_hypernyms(mouse)
[Synset('placental.n.01')]

>>> placental = dog.lowest_common_hypernyms(mouse)[0]
>>> carnivore = dog.lowest_common_hypernyms(cat)[0]
>>> placental.lowest_common_hypernyms(carnivore)
[Synset('placental.n.01')]

But you can see how inefficient this is. It is easier to write your own code that traverses the WordNet ontology and finds the lowest common hypernym for N synsets directly, instead of doing it pairwise.
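A minimal sketch of that idea, not a definitive implementation: collect every ancestor of each synset, intersect the sets, and keep the deepest survivors. The function names `common_hypernyms_of_all` and `lowest_common_hypernyms_of_all` are made up for this example. It only assumes that each object provides `hypernym_paths()`, `max_depth()` and `name()`, which NLTK's `Synset` does, and it ranks candidates by `max_depth()` on the assumption that "lowest" means deepest in the hierarchy.

```python
def common_hypernyms_of_all(synsets):
    """Return the set of ancestors shared by every synset in `synsets`.

    Each synset counts as its own ancestor, because hypernym_paths()
    returns root-to-synset paths that end with the synset itself.
    """
    shared = None
    for synset in synsets:
        ancestors = {node for path in synset.hypernym_paths() for node in path}
        shared = ancestors if shared is None else shared & ancestors
        if not shared:
            break  # nothing in common (e.g. mixing nouns and verbs)
    return shared or set()


def lowest_common_hypernyms_of_all(synsets):
    """Return the deepest shared ancestors of N synsets, sorted by name.

    A list is returned because several shared ancestors can tie for depth.
    """
    shared = common_hypernyms_of_all(synsets)
    if not shared:
        return []
    depths = {node: node.max_depth() for node in shared}
    deepest = max(depths.values())
    return sorted(
        (node for node, depth in depths.items() if depth == deepest),
        key=lambda node: node.name(),
    )


# Hypothetical usage with NLTK (requires the WordNet corpus to be installed):
# from nltk.corpus import wordnet as wn
# synsets = [wn.synsets(w)[0] for w in ("dog", "cat", "mouse")]
# lowest_common_hypernyms_of_all(synsets)
```

For the dog, cat and mouse synsets above, this should agree with the pairwise result. Note that it still works on synsets, not words: for a list like ["blue", "red", "ball"] you first have to decide which sense of each word you mean, and picking `wn.synsets(word)[0]` is only a rough default.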