NLTK .most_common(): what order are the results returned in?

1k Views Asked by RyanKilkelly At 07 June 2025 at 13:31

I have found the frequency of bigrams in certain sentences using:

import nltk
from nltk import ngrams

mydata = "xxxxx"  # placeholder for the actual text
mylist = mydata.split()
mybigrams = list(ngrams(mylist, 2))
fd = nltk.FreqDist(mybigrams)
print(fd.most_common())

When I print the bigrams by frequency, one occurs 7 times, whereas the other 95 bigrams each occur only once. However, when I compare the bigrams to my sentences, I can see no logical order in the way the bigrams of frequency 1 are printed. Does anyone know whether there is any logic to the order in which .most_common() returns the bigrams, or is it random?

Thanks in advance

There is 1 solution below

Stefanus On 15 May 2016 at 16:45

Short answer, based on the documentation of collections.Counter.most_common:

"Elements with equal counts are ordered arbitrarily."

In current versions of NLTK, nltk.FreqDist is based on nltk.compat.Counter. On Python 2.7 and 3.x, collections.Counter will be imported from the standard library. On Python 2.6, NLTK provides its own implementation.

For details, look at the source code:
https://github.com/nltk/nltk/blob/develop/nltk/compat.py

In conclusion, without checking every possible version configuration, you cannot rely on items with equal frequency being returned in any particular order.
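
If you need a reproducible order for the ties, one option is to sort the counts yourself instead of relying on most_common(). The following is only a minimal sketch: it uses collections.Counter from the standard library as a stand-in for nltk.FreqDist, the helper names bigrams and most_common_stable are hypothetical (they are not part of NLTK), and the sample sentence is made up. Newer Python versions document ties as being kept in first-encountered order, but sorting explicitly avoids depending on that.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs (stand-in for list(nltk.ngrams(tokens, 2)))."""
    return list(zip(tokens, tokens[1:]))


def most_common_stable(counts, n=None):
    """Hypothetical helper: rank by count (descending), then by the item itself.

    Ties are broken by sorting the items, so the result does not depend
    on insertion order or on the Python/NLTK version in use.
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if n is None else ranked[:n]


# Made-up sample text; replace with your own data.
tokens = "the cat sat on the mat and the cat slept".split()
fd = Counter(bigrams(tokens))  # assumption: a Counter behaves like FreqDist here

ranked = most_common_stable(fd)
for bigram, count in ranked:
    print(count, bigram)
```

Because the sort key is (-count, item), the most frequent bigram still comes first, and the frequency-1 bigrams then appear in alphabetical order. Passing an nltk.FreqDist instead of a Counter should work the same way as long as it exposes .items().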
