Archives for 2 Sep,2010


Find words with the most anagrams efficiently using python

Following my previous post about 9-letter anagrams, here is the final code, which takes into account suggestions and snippets from Michael, Toby and Martin. I have added two variables, ag_len and ag_min, to make it easy to change what to look for.

Code

# -*- coding: utf-8 -*-
from time import time
from collections import defaultdict

ag_len = 10  # Anagram word length
ag_min = 2   # Min # of anagrams
dictionary_path = '/usr/share/dict/british-english'
tic = time()

# Map each sorted-letter key to the set of words that share it
wd = defaultdict(set)
with open(dictionary_path, 'r') as f:
    for line in f:
        word = line.strip()
        if len(word) == ag_len:
            wd["".join(sorted(word))].add(word)

# Print every group containing at least ag_min words
for key, words in wd.items():
    if len(words) >= ag_min:
        print(" ".join(words))

toc = time()
print("%s s" % (toc - tic))

Explanation
The dictionary file is filtered by word length into a Python dictionary. The key is the letters of the word sorted into alphabetical order, i.e.:

"".join(sorted('arranging')) == 'aagginnrr'

The value is a set of the unsorted words. Because words that are anagrams of each other are identical once their letters are sorted, adding each word to the set stored under its sorted key means that all anagrams of a word end up under the same key. For example, with 8-letter words:

When the loop gets to megatons it creates a new key in the dictionary like so:
{'aegmnost': set(['megatons'])}

Then to magnetos:
{'aegmnost': set(['magnetos', 'megatons'])}

Then to montages:
{'aegmnost': set(['magnetos', 'megatons', 'montages'])}

We then loop over all the items in the dictionary and check whether the number of words in each set is at least the minimum we are looking for.

All done: a simple and elegant method for finding words with several anagrams for a given word length.
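
As a small self-contained illustration of the steps above, the sketch below runs the same grouping on a hard-coded word list instead of the dictionary file. The function name group_anagrams and the sample list are made up for this example; they are not part of the program above.

```python
from collections import defaultdict


def group_anagrams(words, length, minimum):
    """Group words of the given length by their sorted letters.

    Hypothetical helper for illustration only; returns the groups
    that contain at least `minimum` words.
    """
    groups = defaultdict(set)
    for word in words:
        if len(word) == length:
            # Anagrams share the same sorted-letter key
            groups["".join(sorted(word))].add(word)
    return {key: found for key, found in groups.items() if len(found) >= minimum}


# Made-up sample list; a real run would read the dictionary file instead
sample = ["megatons", "magnetos", "montages", "painters", "anagram"]
result = group_anagrams(sample, 8, 2)
print(result)
```

Only the 'aegmnost' group survives here: painters has no partner in the sample list, and anagram is the wrong length.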

Results

I was going to post the interesting 10-letter anagrams I found; however, I couldn’t find any sets of more than 2 anagrams with the dictionary I was using.

There is an 11-letter triple anagram:

anthologies anthologise theologians

and some 8-letter sets with 4 or more anagrams:

painters pertains pantries repaints
resident nerdiest inserted trendies
salesmen lameness nameless maleness
strainer restrain terrains retrains trainers
altering triangle relating integral alerting
rangiest ingrates angriest gantries
parroted predator teardrop prorated
iterates teariest treatise treaties
trounces counters recounts construe

9 letter words with several anagrams

While perusing the statistics of wordcube, I wondered how many 9-letter words have multiple anagrams (using all the letters in a single word) and what the maximum number of anagrams was. So I wrote a quick and dirty Python program to find out. I will show the results first, as they are interesting, followed by my code and the methods I used to improve its efficiency.

Results
Here are all the nine letter words with more than 2 anagrams:

  1. auctioned cautioned education
  2. beastlier bleariest liberates
  3. cattiness scantiest tacitness
  4. countries cretinous neurotics
  5. cratering retracing terracing
  6. dissenter residents tiredness
  7. earthling haltering lathering
  8. emigrants mastering streaming
  9. estranges greatness sergeants
  10. gnarliest integrals triangles
  11. mutilates stimulate ultimates
  12. reprising respiring springier

I only found 12 sets of 3; there may be more with a larger dictionary. I was also disappointed, though not entirely surprised, that there were no sets of 4. My personal favourite is number 10.

Which is your favorite?


Python

I recycled an anagram checking function that I have used before:

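# -*- coding: utf-8 -*-

# Anagram checking function
def anagramchk(word, chkword):
    # Words of different lengths can never be anagrams
    if len(word) != len(chkword):
        return False
    for letter in word:
        if letter in chkword:
            # Use up one occurrence of the matched letter
            chkword = chkword.replace(letter, '', 1)
        else:
            return False
    return True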
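
For comparison, here is a hedged sketch of two other ways the same check could be written; neither comes from the original program. The names is_anagram_sorted and is_anagram_counter are invented for this example. Both treat two words as anagrams only when they use exactly the same letters the same number of times.

```python
from collections import Counter


def is_anagram_sorted(word, other):
    # Two words are anagrams when their sorted letters match
    return sorted(word) == sorted(other)


def is_anagram_counter(word, other):
    # Counter compares how many times each letter occurs
    return Counter(word) == Counter(other)


print(is_anagram_sorted("auctioned", "education"))
print(is_anagram_counter("auctioned", "cautioned"))
print(is_anagram_sorted("auctioned", "countries"))
```

The first two calls compare words from the same anagram set and the third compares words from different sets.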

First program

First I wrote a quick and dirty program that loops through the 9-letter word dictionary, with a second loop nested inside that checks each word against every word in the dictionary again. This is a terribly inefficient method and produces duplicate results; I will follow it with a more efficient one.

with open('eng-9-letter', 'r') as g:
    for l in g:
        wordin = l.strip()

        # Re-read the whole word list for every word (slow)
        count = 0
        w = ""
        with open('eng-9-letter', 'r') as f:
            for line in f:
                line = line.strip()
                if anagramchk(line, wordin):
                    count += 1
                    w += " " + line
        if count > 2:
            print("%s %d ( %s )" % (wordin, count, w))

This program took 80.42s to find the 12 solutions. As a step towards better code I decided to load the dictionary into memory, which cut the run time by about 16.5s to 63.88s.

# Load dictionary into memory
dic = []
with open('eng-9-letter', 'r') as f:
    for line in f:
        dic.append(line.strip())

I then attempted to write a loop that removes words from the dictionary as it goes; however, I don’t know the correct way (if there is one?) of modifying the list being looped over from inside the loop without causing problems.

for word in dic:
	if ....:
		dic.remove(word)

If anyone knows a good method of doing this, please let me know! I did manage to hack something together using slices so that I could modify the dictionary each time; however, I imagine this is still quite inefficient.

for word in dic[:]:
    w = ""
    count = 0
    # Iterate over copies so that dic itself can be modified
    for word2 in dic[:]:
        if anagramchk(word, word2):
            count += 1
            dic.remove(word2)
            w += word2 + " "
    if count > 2:
        print(w)

Even so, this method now avoids duplicate results and completes in 31.87s (on a machine running at 3.15 GHz). Please let me know of any improvements you think can be made and I’ll happily benchmark them to see how much better they are.
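
On the question of removing items from a list while looping over it: calling remove on the list being iterated makes the loop skip whichever element slides into the freed slot. Two common patterns avoid this, sketched below with a made-up list and a made-up condition (words of length 3); this is an illustration, not a benchmarked replacement for the code above.

```python
words = ["cat", "act", "dog", "god", "bird"]

# Pattern 1: build a new list containing only the items to keep
kept = [word for word in words if len(word) != 3]

# Pattern 2: iterate over a copy and remove from the original
remaining = list(words)
for word in remaining[:]:
    if len(word) == 3:
        remaining.remove(word)

# Problem case: removing from the list being iterated skips items
skipped = list(words)
for word in skipped:
    if len(word) == 3:
        skipped.remove(word)

print(kept)
print(remaining)
print(skipped)
```

The first two patterns leave only bird, while the problem case also leaves act and god behind because the loop steps over them after each removal. The second pattern is the slice-copy approach used above; the first is often preferred when a new list is acceptable, since each remove call has to search the list again.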
