Skip to content

Find words with the most anagrams efficiently using python

by mat on September 2nd, 2010

Following my previous post about 9 letter anagrams I am posting the final code I have created taking into account suggestions/snippets from Michael, Toby and Martin. Added two variables to make it nice and easy to modify what to look for.

Code

# -*- coding: utf-8 -*-
from time import time
from collections import defaultdict  

ag_len = 10 # Anagram word length
ag_min = 2  # Min # of anagrams
dictionary_path = '/usr/share/dict/british-english'
tic = time()

wd = defaultdict(set)
for l in open (dictionary_path, 'r'):
	l=l.strip()
	if ag_len==len(l):
		wd["".join(sorted(l))].add (l)

for ws, wl in wd.iteritems():
	if len ( wl ) >= ag_min:
		print " ".join ( wl )

toc = time()
print toc-tic,'s'

Explanation
The dictionary file is filtered by length into a dictionary. The key for the dictionary is the letter of the word sorted in order, IE:

"".join(sorted('arranging')) = 'aagginnrr'

With the value as the unsorted word. Because words that are an anagram of each other will be identical when sorted this means that using the add method with a dictionary will cause any anagram to share the same key. Eg:

When the dictionary gets to megatons it will create a new key in the dicitonary like so:
{'aegmnost': set(['megatons'])}

Then to magnetos
{'aegmnost': set(['magnetos', 'megatons'])}

Then to montages:
{'aegmnost': set(['magnetos', 'megatons', 'montages'])}

Then we loop over all the items in the dictionary we created and see if the length of the values is greater than the minimum value we are looking for.

All done, a very elegant and simple method to find words with several anagrams for a given word length.

Results

I was going to post the interesting 10 letter anagrams I found however I couldn’t find any with more than 2 anagrams with the dictionary I was using.

There is a 11 letter tripple anagram:

anthologies anthologise theologians

and some 8 letter with 4 or more anagrams:

painters pertains pantries repaints
resident nerdiest inserted trendies
salesmen lameness nameless maleness
strainer restrain terrains retrains trainers
altering triangle relating integral alerting
rangiest ingrates angriest gantries
parroted predator teardrop prorated
iterates teariest treatise treaties
trounces counters recounts construe
No comments yet

Leave a Reply

Note: I am currently writing my thesis so probably wont have time to reply to your comment
Note: XHTML is allowed. Your email address will never be published.

Subscribe to this comment feed via RSS