I'm trying to create a function best_tiles which takes in the number of tiles in your hand and returns the set of tiles that allows you to produce the most number of unique English-valid words, assuming that you can only use each tile once.
For example, with the set of tiles in your hand (A, B, C) you can produce the words, CAB, BAC, AB, and BA (all of these are English words), so you can spell 4 unique words with that set. With (B, A, A), you can spell 5 words: ABA, BAA, AA, AB, and BA. The goal is to find the set of letters which allows you to spell the most number of English-Valid words (without replacement).
So if 5 was the maximum number of words that could be spelled with any combination of letters for N = 3, running best_tiles( n = 3 ) would return B, A, A.
I'm wondering how to implement this efficiently? My current approach doesn't scale well at all with number of letters.
I read in a wordlist. In this case, I'm using enable.txt here: https://www.wordgamedictionary.com/enable/
import os
path = "enable.txt"
words = []
with open(path , encoding='utf8') as f:
for values in f:
words.append(list(values.strip().upper()))
I create a function word_in_tiles h/t smack89 which returns whether it is possible to construct a word given a tile set:
def word_in_tiles(word, tiles):
tiles_counter = collections.Counter(tiles)
return all(tiles_counter.get(ch, 0) == cnt for ch,cnt in
collections.Counter(word).items())
I then create a function get_all_words which produces a list of all the possible words one can spell from a word list and a tile set.
def get_all_words(tile_set, words):
# words is a word list
return [i for i in words if word_in_tiles(i, tile_set)]
The extremely naive approach for identifying which tileset is the "best" for three letters is the following:
I first create a list of every possible combination for a given length. So for length 3, I'd do:
import string
import itertools
letters = string.ascii_lowercase
all_three_letter_combinations = list(itertools.combinations_with_replacement(letters, 3))
# Create a list of only words that are three letters are less
three_letter = [i for i in words if len(i) <= 3]
sequence_dict = dict()
for i in all_three_letter_combinations:
string_test = "".join(i).upper()
sequence_dict[i] = get_all_words(string_test, three_letter)
Then remove the values with no length and sort by the length of the result:
res = {k: v for k, v in sequence_dict.items() if len(v) >= 1}
def GetMaxLength(res):
return max((len(v), v, k) for k, v in res.items())[1:]
GetMaxLength(res)
You get that, for three letters, the tile-set that produces the most english valid words is T A E which can produce the following words ['AE', 'AT', 'ATE', 'EAT', 'ET', 'ETA', 'TA', 'TAE', 'TEA']
I'd like to be able to scale this up to as big as N = 15. What is the best procedure for doing this?

I think this is good enough!
Here is a log of my code running under PyPy:
The key improvements are these.
And here is the code.