How to get the wordnet sense frequency of a synset in NLTK?

8.9k Views Asked by alvas At 21 March 2013 at 15:06

According to the documentation, I can load a sense-tagged corpus in NLTK as such:

>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')

I can also get the definition, pos, offset, examples as such:

>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.01').examples()
>>> wn.synset('dog.n.01').definition()

But how can I get the frequency of a synset from a corpus? To break down the question:

First, how do I count how many times a synset occurs in a sense-tagged corpus?
Then, the next step is to divide that count by the total number of occurrences of all synsets for the particular lemma.


There are 2 best solutions below

alvas On 21 March 2013 at 15:21 BEST ANSWER

I managed to do it this way.

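from nltk.corpus import wordnet as wn

word = "dog"
synsets = wn.synsets(word)

sense2freq = {}
for s in synsets:
  freq = 0
  # Sum the tagged counts of every lemma in this synset (in NLTK 3, lemmas() is a method).
  for lemma in s.lemmas():
    freq += lemma.count()
  # Key each sense as "<offset>-<pos>"; offset() returns an int, so convert it first.
  sense2freq[str(s.offset()) + "-" + s.pos()] = freq

for s in sense2freq:
  print(s, sense2freq[s])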
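
The loop above covers the first step of the question (raw counts). For the second step, dividing by the total for the lemma, a minimal sketch might look like the following. It assumes NLTK 3 (where `lemmas()`, `name()` and `count()` are methods), that the WordNet data has been downloaded, and that the word is given in its base form; the helper names `lemma_sense_counts` and `relative_frequencies` are made up for illustration. Unlike the loop above, it counts only the lemma that matches the query word instead of summing over every lemma in the synset.

```python
def lemma_sense_counts(word):
    """Map each synset name of `word` to the tagged count of the matching lemma."""
    # Imported here so the pure helper below can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    # WordNet lemma names use underscores for multiword expressions.
    target = word.lower().replace(" ", "_")
    counts = {}
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            # Assumption: `word` is already a base form, so it matches a lemma name.
            if lemma.name().lower() == target:
                counts[synset.name()] = lemma.count()
    return counts


def relative_frequencies(counts):
    """Divide each count by the total; return zeros if nothing was tagged."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}
```

For example, `relative_frequencies(lemma_sense_counts("dog"))` would return a dictionary keyed by synset name, with values that sum to roughly 1 whenever at least one sense has a nonzero count.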

alvitawa On 14 May 2019 at 13:21

If you only need to know what the most frequent sense is, you can use wn.synsets(word)[0], since WordNet generally ranks senses from most frequent to least frequent.

(source: Daniel Jurafsky's Speech and Language Processing 2nd edition)
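
As a rough way to check this ordering against the tagged counts, the sketch below compares the first synset with the one that has the highest summed lemma count. It assumes NLTK 3 with the WordNet data installed; `pick_most_frequent` and `first_vs_counted` are hypothetical helper names, not part of NLTK.

```python
def pick_most_frequent(names_and_counts):
    """Return the name with the highest count; ties go to the earliest entry."""
    best_name, best_count = None, -1
    for name, count in names_and_counts:
        if count > best_count:
            best_name, best_count = name, count
    return best_name


def first_vs_counted(word):
    """Return (first synset name, name of the synset with the highest count)."""
    from nltk.corpus import wordnet as wn  # needs the WordNet corpus downloaded

    pairs = []
    for synset in wn.synsets(word):
        # Sum counts over all lemmas of the synset, keeping WordNet's order.
        total = sum(lemma.count() for lemma in synset.lemmas())
        pairs.append((synset.name(), total))
    first = pairs[0][0] if pairs else None
    return first, pick_most_frequent(pairs)
```

The two values returned by `first_vs_counted` may differ for some words, for instance when `wn.synsets` is called without a part-of-speech filter and returns senses from several parts of speech.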
