Division by zero when calculating TF-IDF for keyword extraction


I wrote code based on the TF-IDF algorithm to extract keywords from a very large text. The problem is that I keep getting a division by zero error. When I step through my code in the debugger, everything works perfectly. As soon as I shorten the text so that it still contains the word that causes the problem, it works. So I assume it's a memory problem.

I thought I could read the big text file in chunks (1 KB) instead of reading the whole document at once. Unfortunately, that does not work either. What should I do? (I am using PyCharm on Windows.)
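For reference, a chunked reader of this kind might look like the sketch below. The body of `read_in_chunks` is an assumption made for illustration (the `chunk_size` parameter is hypothetical too), so treat it as one possible version rather than the exact function used here.

```python
import io


def read_in_chunks(file_object, chunk_size=1024):
    """Yield successive pieces of at most chunk_size characters.

    Assumed implementation: in text mode, read(n) returns up to n
    characters (not bytes) and an empty string at end of file.
    """
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


# Small demonstration on an in-memory text stream instead of a real file.
sample = io.StringIO("a" * 2500)
sizes = [len(chunk) for chunk in read_in_chunks(sample)]
print(sizes)  # [1024, 1024, 452]
```

Note that a fixed-size chunk can end in the middle of a word or sentence, so the word and sentence counts computed per chunk may differ from those of the whole text.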

I am a beginner in programming, Python, and NLP, so I would really appreciate any help.

if __name__ == "__main__":
    final_tf_idf = {}  # summed TF-IDF score per word across all chunks
    with open('spli.txt') as f:
        for piece in read_in_chunks(f):
            piece = piece.lower()
            no_punc_words, all_words = text_split(piece)
            no_punc_words, all_words = rm_stop_word(no_punc_words, all_words)
            no_punc_words_freq, all_words_freq = calc_freq(no_punc_words, all_words)
            tf_score = calc_tf_score(no_punc_words_freq)
            idf_score = calc_idf_score(no_punc_words_freq, all_words_freq, piece)
            # Combine TF and IDF for this chunk and add it to the running total.
            # (A dict cannot be extended with +=, so the scores are merged key by key.)
            for k in tf_score:
                final_tf_idf[k] = final_tf_idf.get(k, 0) + tf_score[k] * idf_score[k]
    print(final_tf_idf)
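Since `calc_idf_score` is not shown, the following is only a sketch of one common way to keep an IDF computation from dividing by zero. It assumes the IDF is a log ratio of the form "number of units (sentences or documents) divided by the number of units containing the word", which fails when that second count is 0. The function name `smoothed_idf` and its parameters are hypothetical.

```python
import math


def smoothed_idf(total_units, units_containing_word):
    """Return a smoothed IDF value that is defined even for a count of 0.

    Adding 1 to both the numerator and the denominator means the
    denominator can never be zero; the trailing + 1 keeps the score
    positive for words that occur in every unit.
    """
    return math.log((1 + total_units) / (1 + units_containing_word)) + 1


# A word that was counted in none of the 10 units no longer raises an error.
print(smoothed_idf(10, 0))
# A word that occurs in every unit gets the minimum score of 1.0.
print(smoothed_idf(10, 10))
```

If the real `calc_idf_score` uses a different formula, the same idea applies: check which count ends up as 0 for the failing word and either smooth it or skip that word.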