Infinite Loop not breaking out of nested for loop


My apologies for the length, and for what is, I'm sure, a simple problem.

I am carrying out a word search using the fuzzywuzzy and PyEnchant modules.

I am trying to convert a nested for loop method to one that uses a for loop and a while loop. My attempt fails with an infinite loop.

The basic method is as follows:

For a word in a sentence, check if the word exists in some dictionary.

If the word does not exist, generate word suggestions using Enchant.

For each suggestion, calculate the ratio of similarity between the generated word and the original word.

If the ratio is high enough, stop generating suggestions, and add the new word to a list.

Do this for all words in the sentence

For example, for the string: A = 'A lvels in Mths and Histtory'

The Output would be: ['A','levels','in','maths','and','history']

I have managed to get this to work in the following way:

# Imports:
from fuzzywuzzy import fuzz
import enchant
from enchant.tokenize import get_tokenizer
tknzr = get_tokenizer("en")
d = enchant.Dict("en")

A = 'A lvels in Mths and Histtory'

B = []

# Loop through all words in A
for word, position in tknzr(A):

    # Does word exist in Dictionary?
    if d.check(word) is False:

        # generate a suggestion and calculate the 'fuzz factor'
        for suggestion in d.suggest(word):
            ratio = fuzz.ratio(suggestion, word)

            # If ratio is high enough, add word to list B and move to next word in sentence
            if ratio > 75:
                B.append(suggestion)
                break

    # If word is in dictionary, just add
    else:
        B.append(word)

Out >>> ['A','levels','in','maths','and','history']

Good so far.

I would like to convert the above into one that uses a while and for loop. This would then be of the form: Generate new words, until you reach some threshold.

I tried the following:

for word,position in tknzr(A):

    if d.check(word) is False:

        ratio = 0
        while ratio <= 75:

            for suggestion in d.suggest(word):

                print "Suggestion: ", suggestion
                ratio = fuzz.ratio(suggestion, word)

                B.append(word)
    else:
        B.append(word)

This, however, gives me an infinite loop of suggestions for the word Histtory.

Out >>> Suggestion:  History
Out >>> Suggestion:  Historicity
Out >>> Suggestion:  Historic
Out >>> Suggestion:  Historian
Out >>> Suggestion:  Sophistry
Out >>> Suggestion:  Histrionic
Out >>> Suggestion:  Histogram
Out >>> The above forever

The problem is as follows:

The for suggestion in d.suggest(word): loop will always run to completion, before the higher while loop can check the ratio values.

This means that the final ratio value checked is that of the last word suggested; for Histtory, for example, it is the ratio of Histtory and Histogram. As this is below 75, the while loop condition is still true, so the loop repeats forever. I cannot, for the life of me, figure out how to fix it.
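One possible way to let the while condition see every ratio is to pull a single suggestion per iteration, instead of running a complete inner for loop each time. The following is only a minimal sketch under assumptions: similarity is a difflib-based stand-in for fuzz.ratio (its scores may differ from fuzzywuzzy's), the suggestion list is hard-coded rather than taken from d.suggest, and first_good_suggestion is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for fuzz.ratio (assumption): integer score from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Hypothetical suggestion list, in the order printed in the question.
SUGGESTIONS = ["History", "Historicity", "Historic", "Historian",
               "Sophistry", "Histrionic", "Histogram"]


def first_good_suggestion(word, suggestions, threshold=75):
    """Return the first suggestion scoring above threshold, or None."""
    candidates = iter(suggestions)
    ratio = 0
    chosen = None
    while ratio <= threshold:
        # Take exactly one suggestion, so the while condition is
        # re-evaluated after every single ratio calculation.
        chosen = next(candidates, None)
        if chosen is None:
            break  # suggestions exhausted: this is what prevents the infinite loop
        ratio = similarity(chosen, word)
    return chosen
```

With the hard-coded list, first_good_suggestion("Histtory", SUGGESTIONS) stops at the first candidate. Note that this does the same work as the for loop with break in the first example, so it is unlikely to be faster on its own.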

How can I change this code so that it works as in the first example? I should add that my goal here is speed: I will be evaluating tens of millions of sentences.
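On the speed goal: if the same misspellings recur across many sentences, caching the result per word may save more time than restructuring the loop. The sketch below illustrates that idea with functools.lru_cache. Everything in it is an assumption for illustration: KNOWN and FAKE_SUGGESTIONS are hypothetical stand-ins for d.check and d.suggest, the difflib ratio stands in for fuzz.ratio, and keeping the original word when no suggestion passes is a choice the first example does not make.

```python
from difflib import SequenceMatcher
from functools import lru_cache

# Hypothetical stand-ins for d.check and d.suggest, so the sketch
# runs without PyEnchant installed.
KNOWN = {"A", "in", "and"}
FAKE_SUGGESTIONS = {
    "lvels": ["levels", "lovels"],
    "Mths": ["Maths", "Moths"],
    "Histtory": ["History", "Historic"],
}


@lru_cache(maxsize=None)
def correct(word, threshold=75):
    """Return the word itself, or the first suggestion above threshold."""
    if word in KNOWN:
        return word
    for suggestion in FAKE_SUGGESTIONS.get(word, ()):
        # Stand-in for fuzz.ratio (assumption), scaled to 0..100.
        if 100 * SequenceMatcher(None, suggestion, word).ratio() > threshold:
            return suggestion
    return word  # assumption: keep the original if nothing is close enough


def correct_sentence(sentence):
    """Correct each whitespace-separated word, reusing cached results."""
    return [correct(word) for word in sentence.split()]
```

Each distinct word is looked up once, and repeats are served from the cache. For a very large vocabulary, a bounded maxsize would cap memory use.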

Many thanks for reading.
