I'm writing a function in Python that takes the name of a text file (as a string) as input. The function should first determine how many times each word appears in the file. Later I will make a bar chart showing the frequencies of the ten most common words in the file, with a second bar next to each whose height is the frequency predicted by Zipf's Law. I already have some code for the graph, but I need help finding the most common words in the text file.
def zipf_graph(text_file):
    import string
    file = open(text_file, encoding='utf8')
    text = file.read()
    file.close()
    # the following strips out punctuation and makes the words lowercase
    punc = string.punctuation + '’”—⎬⎪“⎫'
    new_text = text
    for char in punc:
        new_text = new_text.replace(char, '')
    new_text = new_text.lower()
    text_split = new_text.split()
I'm stuck here: I'm trying to find the most common strings in a list, but I'm not sure where to go next. The following is what I tried:
    words = text_split
    most_common = max(words, key=words.count)  # only returns a single word, not a ranking
    # print(most_common)
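To count every word in one pass instead of rescanning the list with `words.count`, the standard library's `collections.Counter` is one common approach. A minimal sketch, using a made-up word list in place of the real `text_split`:

```python
from collections import Counter

# Hypothetical word list standing in for text_split from the function above
words = ['the', 'cat', 'and', 'the', 'dog', 'and', 'the', 'bird']

counts = Counter(words)            # maps each word to its frequency
top_ten = counts.most_common(10)   # list of (word, count), most frequent first

print(top_ten)  # [('the', 3), ('and', 2), ('cat', 1), ('dog', 1), ('bird', 1)]
```

`most_common(10)` already returns the `(word, frequency)` tuples in descending order, which is exactly the shape the sorting snippets below assume.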
I also want to add the following code, as it was suggested to help:
# Sorting a list by frequency
# Assumes you have your elements as (word, frequency) tuples
# (Useful for the zipf function)
words = [('the', 1), ('and', 1), ('test', 2)]
sorted(words, key=lambda x: x[1], reverse=True)

# "Sorting" a dictionary by frequency
# Assumes you have your elements as word:frequency
# (Useful for the zipf function)
words = dict()
words['the'] = 1
words['and'] = 1
words['test'] = 2

# This returns a list of just the most common words, without their frequencies
most_common_words = sorted(words, key=words.get, reverse=True)
# print(most_common_words)

# We can go back to the dictionary to get the frequencies
for word in most_common_words:
    print(word, words[word])
zipf_graph('fortune.txt') #name of the file I chose to use
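For the second set of bars, Zipf's Law predicts that the word of rank k appears roughly f1 / k times, where f1 is the frequency of the most common word. A sketch of how the predicted heights could be computed, with made-up counts in place of the real top ten:

```python
# Hypothetical (word, frequency) pairs standing in for the file's real top ten
top_ten = [('the', 100), ('of', 60), ('and', 45)]

f1 = top_ten[0][1]  # frequency of the most common word

# Zipf's Law: predicted frequency of the rank-k word is f1 / k (k starts at 1)
predicted = [f1 / k for k in range(1, len(top_ten) + 1)]

print(predicted)  # first two values: 100.0, 50.0
```

These predicted values can then be plotted alongside the observed frequencies as the paired bars described above.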
You can also use the nltk library: its FreqDist class counts the words for you and will give values in the format [('word', count), ...].
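One answer suggested nltk; its `FreqDist` class (a subclass of `collections.Counter`) does the counting directly. A minimal sketch, assuming nltk is installed and using a made-up word list in place of the real file's contents:

```python
from nltk import FreqDist

# Hypothetical word list standing in for text_split
words = ['the', 'cat', 'and', 'the', 'dog', 'and', 'the']

freq = FreqDist(words)
print(freq.most_common(3))  # [('the', 3), ('and', 2), ('cat', 1)]
```

Because `FreqDist` behaves like a `Counter`, `most_common(10)` gives the same `(word, count)` tuples that the plain-Counter approach would.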