How to generate all anagrams of a given word in Python

476 Views Asked by At

I want to code a function in Python returning all possible anagrams of a word given. Only words of the English language are considered as such.

However, up to now, I only managed to generate all permutations of a word.

import itertools

def anagrams(word):
    letters = list(word)
    perms = itertools.permutations(letters)
    return [''.join(p) for p in perms]

How can I efficiently check and filter for valid English words now?

3

There are 3 best solutions below

4
stroxx On

First you need an English dictionary in Python. I usually use nltk even though there might be better packages. You can install the dictionary of the package by

import nltk
nltk.download('words')

and then a slight adjustment of your code yields what you want:

from nltk.corpus import words
import itertools

# words.words() is list, for faster runtime
word_set = set(words.words())

def anagrams1(word):
    letters = list(word.lower())
    perms = itertools.permutations(letters)
    word_lst = [''.join(p) for p in perms]
    ana_lst = set(w for w in word_lst if w in word_set)
    return ana_lst

For example,

anagrams1('sink')
>>> {'inks', 'sink', 'skin'}

Edit thanks to Kelly Bundy: A far better runtime can be achieved by a different algorithm, that is checking for every correct word if it is an anagram of the input.

def anagrams2(word):
    word_sorted = sorted(word)
    ana_lst = set(w for w in words.words() if sorted(w)==word_sorted)
    return ana_lst
0
AudioBubble On

Your method is pretty inefficient for long words. Better precompute all anagrams:

  • Get an English dictionary and process it as follows:

    • For every word, sort the letters and add to a Python dictionary using the sorted word as the key, and append to the list of words with the same key;
  • For a given query word, sort the letters and lookup the Python dictionary.

E.g. the allowed words are "one", "two", "three", "neo". Store as

"eno": ["one", "neo"]
"otw": ["two"]
"eehrt": ["three"]

Now the anagrams of "eon" -> "eno" are "one", "neo".


Note that the preprocessed dictionary is only about twice as large as the initial set of words. And the preprocessing time will remain reasonable as all words have to be input to the set/dictionary anyway (sorting the letters can be done fairly efficiently by histogram sort).

6
RufusVS On

As I said in my comment, I'd just do one pass through the dictionary to see if the word had the same letters:

from collections import Counter
TARGET_WORD = "trace"
target_pattern = Counter(TARGET_WORD.lower())

for word in open("dictionary.txt","rt").read().splitlines():
    if Counter(word.lower()) == target_pattern:
        print(word)

If you were to process multiple words, you can preprocess your dictionary:

dictionary = [Counter(x.lower()) for x in open("dictionary.txt","rt").read().splitlines()]

for word in word_list:  # word_list is your list of target words.
    if Counter(word.lower()) in dictionary:
        print (word)

(Edit: My first preprocessing program tried to make the preprocessed dictionary a set, but tested the code and Counters aren't hashable as @KellyBundy caught right away. A set would probably be more efficient, so you could use tuple(sorted(Counter(x).items()) for the entries and the comparison, but I'll leave that as an exercise for the reader.)