Python - importing 127,000+ words to a list, but function only returning partial results


This function is meant to compare the length of each of the 127,000+ words imported from a dictionary file with a length entered by the user. It should then report how many words have that length. It does this, but only up to a point.

If I enter "15" it returns "0". If I enter "4" it returns "3078".

I am positive that there are words that are 15 characters long, but it returns "0" anyway. I should also mention that if I enter anything greater than 15, the result is still 0, even though there are words longer than 15 characters.
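
One way to check that claim independently of the filtering logic is to tally how many distinct words of each length the file holds. The sketch below assumes one word per line; length_histogram is a hypothetical helper, not part of the question's code.

```python
from collections import Counter
from typing import Iterable


def length_histogram(lines: Iterable[str]) -> Counter:
    """Map each word length to the number of distinct words of that length.

    Assumes one word per line; surrounding whitespace is stripped and
    blank lines are skipped.
    """
    words = {line.strip() for line in lines if line.strip()}
    return Counter(len(word) for word in words)


if __name__ == "__main__":
    # In-memory stand-in for the lines of dictionary.txt
    sample = ["cat\n", "dog\n", "cat\n", "uncopyrightable\n", "\n"]
    histogram = length_histogram(sample)
    for length in sorted(histogram):
        print(length, histogram[length])
```

Passing it the open dictionary file instead of the sample list gives a length-to-count mapping; a nonzero count for 15 would confirm that such words are present, which points the investigation at the filter rather than the file.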

try:
    dictionary = open("dictionary.txt")
except:
    print("Dictionary not found")
    exit()


def reduceDict():
    first_list = []

    for line in dictionary:
        line = line.rstrip()
        if len(line) == word_length:
            for letter in line:
                if len([ln for ln in line if line.count(ln) > 1]) == 0:
                    if first_list.count(line) < 1:
                        first_list.append(line)
                else:
                    continue
    if showTotal == 'y':
        print('|| The possible words remaing are: ||\n ',len(first_list))

Best answer:

My reading of:

if len([ln for ln in line if line.count(ln) > 1]) == 0:

is that the words in question can't have any repeated letters, which could explain why no words are being found: once you get up to 15 letters, repeated letters are quite common. Since this requirement wasn't mentioned in the question, if we drop it we can write:

def reduceDict(word_length, showTotal):
    first_list = []

    for line in dictionary:
        line = line.rstrip()

        if len(line) == word_length:
            if line not in first_list:
                first_list.append(line)

    if showTotal:
        print('The number of words of length {} is {}'.format(word_length, len(first_list)))
        print(first_list)

try:
    dictionary = open("dictionary.txt")
except FileNotFoundError:
    exit("Dictionary not found")

reduceDict(15, True)

This turns up about 40 words from my Unix words file. If we want to restore the unique-letters requirement:

import re

def reduceDict(word_length, showTotal):
    first_list = []

    for line in dictionary:
        line = line.rstrip()

        if len(line) == word_length and not re.search(r"(.).*\1", line):
            if line not in first_list:
                first_list.append(line)

    if showTotal:
        print('The number of words of length {} is {}'.format(word_length, len(first_list)))
        print(first_list)

This starts returning 0 results at around 13 letters, as one might expect.
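
Two refinements can be sketched on top of the code above, assuming the same one-word-per-line file. First, duplicates are tracked in a set instead of a list: with 127,000+ words, "line not in first_list" scans the whole list on every check, while a set lookup does not. Second, len(set(word)) == len(word) is used as a regex-free test for "no repeated characters"; for stripped single words it accepts exactly what the regex r"(.).*\1" rejects. The words are read into a list once because a file object such as dictionary is exhausted after one pass, so a second call iterating it directly would see no lines. The names load_words, has_unique_letters and words_of_length are hypothetical.

```python
from typing import Iterable, List


def load_words(lines: Iterable[str]) -> List[str]:
    """Strip trailing whitespace and keep non-empty lines, in input order."""
    return [line.rstrip() for line in lines if line.rstrip()]


def has_unique_letters(word: str) -> bool:
    """True when no character occurs more than once (case-sensitive)."""
    return len(set(word)) == len(word)


def words_of_length(words: Iterable[str], word_length: int,
                    unique_letters: bool = False) -> List[str]:
    """Return distinct words of exactly word_length characters, in input order.

    With unique_letters=True, words containing a repeated character are
    dropped as well, mirroring the filter in the question's comprehension.
    """
    seen = set()  # constant-time membership test instead of scanning a list
    result = []
    for word in words:
        if len(word) != word_length or word in seen:
            continue
        if unique_letters and not has_unique_letters(word):
            continue
        seen.add(word)
        result.append(word)
    return result


if __name__ == "__main__":
    sample = ["chemotherapeutic", "uncopyrightable", "uncopyrightable", "cat"]
    words = load_words(sample)
    print(len(words_of_length(words, 15)))                       # 1
    print(len(words_of_length(words, 16, unique_letters=True)))  # 0
```

Because load_words returns a list, words_of_length can be called any number of times with different lengths without reopening the file.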

Another answer:

In your code, you don't need this line:

for letter in line:

In your list comprehension, if your intention is to loop over all the words in the line, use this:

if len([ln for ln in line.split() if line.count(ln) > 1]) == 0:

In your code, the list comprehension loops over every character and checks whether that character appears more than once in line. So if your file contains "chemotherapeutic", it will not be added to first_list, because some of its letters appear more than once. Unless your file contains words longer than 14 letters in which every letter appears only once, your code will fail to find them.
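
The effect of that comprehension can be checked directly on single words. This small sketch wraps the question's expression in a hypothetical helper, repeated_letters, and compares "chemotherapeutic" with "uncopyrightable", a 15-letter word whose letters are all distinct (whether it appears in the asker's dictionary file is not known).

```python
def repeated_letters(line: str) -> list:
    """The question's list comprehension: every character of line that
    occurs more than once in line (each occurrence is listed)."""
    return [ln for ln in line if line.count(ln) > 1]


for word in ("chemotherapeutic", "uncopyrightable"):
    kept = len(repeated_letters(word)) == 0  # the question's filter condition
    print(word, len(word), "kept" if kept else "rejected")
# chemotherapeutic 16 rejected
# uncopyrightable 15 kept
```

Only words like the second one survive the filter, which is why long words almost never make it into first_list.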