NLTK .most_common(): what order are the results returned in?


I have found the frequency of bigrams in certain sentences using:

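import nltk
from nltk import ngrams

mydata = "xxxxx"  # placeholder for the actual text
mylist = mydata.split()  # split the text on whitespace
mybigrams = list(ngrams(mylist, 2))  # pairs of adjacent words
fd = nltk.FreqDist(mybigrams)  # count how often each bigram occurs
print(fd.most_common())  # (bigram, count) pairs, highest count first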

When I print the bigrams by frequency, one occurs 7 times, whereas the other 95 bigrams each occur only once. However, when I compare the output with my sentences, I can see no logical order in the way the bigrams of frequency 1 are printed. Does anyone know whether there is any logic to the order in which .most_common() returns bigrams of equal frequency, or is it random?

Thanks in advance


There is 1 best solution below


Short answer, based on the documentation of collections.Counter.most_common:

"Elements with equal counts are ordered arbitrarily"

In current versions of NLTK, nltk.FreqDist is based on nltk.compat.Counter. On Python 2.7 and 3.x, that name is simply collections.Counter imported from the standard library; on Python 2.6, NLTK provides its own implementation.

For details, look at the source code:
https://github.com/nltk/nltk/blob/develop/nltk/compat.py

In conclusion, unless you check every possible version configuration, you cannot rely on items with equal frequency being returned in any particular order.
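If you need a reproducible order among the ties, one option is to sort the counts yourself instead of relying on most_common(). The following is a minimal sketch, not part of NLTK: the helper name most_common_stable is made up for illustration, and it uses collections.Counter directly on the assumption (stated above) that nltk.FreqDist behaves like a Counter, so the same function should accept a FreqDist too. Ties are broken by comparing the bigrams themselves, which is just one possible choice.

```python
from collections import Counter


def most_common_stable(counts):
    """Return (item, count) pairs by descending count, ties broken by item.

    Hypothetical helper for illustration; `counts` is any Counter-like
    mapping of item -> count whose items can be compared with each other.
    """
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Build bigrams with zip so the sketch runs without NLTK installed.
tokens = "a b a b c d a b".split()
bigrams = list(zip(tokens, tokens[1:]))

fd = Counter(bigrams)
ranked = most_common_stable(fd)

for bigram, count in ranked:
    print(bigram, count)
```

Here the bigram ('a', 'b') comes first with a count of 3, and the four bigrams that occur once follow in alphabetical order, whatever order the underlying counter happens to store them in.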