9 letter words with several anagrams
While perusing the statistics of wordcube, I was wondering how many 9 letter words have multiple anagrams (using all the letters in a single word) and what was the maximum number of anagrams. So I wrote a quick and dirty python program to find out. I will first show the results as they are interesting followed by my coding and methods to improve the efficiency of it.
Results
Here are all the nine letter words with more than 2 anagrams:
- 1. auctioned cautioned education
- 2. beastlier bleariest liberates
- 3. cattiness scantiest tacitness
- 4. countries cretinous neurotics
- 5. cratering retracing terracing
- 6. dissenter residents tiredness
- 7. earthling haltering lathering
- 8. emigrants mastering streaming
- 9. estranges greatness sergeants
- 10. gnarliest integrals triangles
- 11. mutilates stimulate ultimates
- 12. reprising respiring springier
I only found 12 sets of 3, there may be more with a larger dictionary. I was also disappointed that there were no words with 4 anagrams yet not entirely unsurprising. My personal favourite is number 10
Python
I recycled an anagram checking function that I have used before:
# -*- coding: utf-8 -*- # Anagram checking function def anagramchk(word,chkword): for letter in word: if letter in chkword: chkword=chkword.replace(letter, '', 1) else: return 0 return 1
First program
Firstly I created a dirty program that created a loop to cycle through the 9 letter word dictionary and another loop nested inside to check against every word in the dictionary again. This is a terrible and inefficient method and will create duplicates, I will follow with a more efficient method.
g=open('eng-9-letter', 'r')
for l in g:
wordin=l.strip()
f=open('eng-9-letter', 'r')
count=0
w=""
for line in f:
line=line.strip()
if anagramchk(line,wordin):
count+=1
w+=" "+line
f.close()
if count>2:
print wordin, count, "(",w,")"
g.close()
This program took 80.42s to find the 12 solutions. On the path to better coding I decided to load the dictionary into memory, this sped the code up about 20s to 63.88s.
# Load dictionary into memory
dic=[]
f=open('eng-9-letter', 'r')
for line in f:
dic.append(line.strip())
f.close()
I then attempted to create a method that loops over and removes words from the dictionary as it loops, however I don’t know the correct way (if there is one?) of modifying the loop variable while inside the loop without causing problems.
for word in dic: if ....: dic.remove(word)
If anyone knows a good method of doing this please let me know! I did managed to hack together something using slices so that I could modify the dictionary each time, however I imagine this is still quite inefficient.
for word in dic[:]: w="" count=0 for word2 in dic[:]: if anagramchk(word,word2): count+=1 dic.remove(word2) w+=word2+" " if count>2: print w
Even so this method now avoids duplication of results and completes in 31.87s (machine running at 3.15Ghz). Please let me know of any improvements you think can be made and I’ll happily benchmark to see how much better it is.


Timing:
0.63 real 0.57 user 0.03 sys
on a 2.8GHz core 2 duo.
from collections import defaultdict
d = defaultdict(set)
for i in open(‘/usr/share/dict/words’):
i = i.strip().lower()
if len(i) == 9:
d[tuple(sorted(i))].add(i)
for k,v in sorted(d.items(), key = lambda (k,v): (len(k), min(v))):
if len(v) > 2:
print ‘\t’.join(v)
Sorry about the formatting.
from collections import defaultdict d = defaultdict(set) for i in open('/usr/share/dict/words'): i = i.strip().lower() if len(i) == 9: d[tuple(sorted(i))].add(i) for k,v in sorted(d.items(), key = lambda (k,v): (len(k), min(v))): if len(v) > 2: print '\t'.join(v)Edited: Fixed formatting
I’ve got something similar to Toby’s. It’s a bit more verbose (not a bad thing) and includes handling for my word list, which includes apostrophes:
from collections import defaultdict src = open('/etc/dictionaries-common/words', 'r') d = defaultdict(set) for word in src: word = word.strip().lower() if len(word) == 9 and "'" not in word: key = tuple(sorted(word)) d[key].add(word) sortedSets = [sorted(x) for x in d.values() if len(x) > 2] for wordset in sorted(sortedSets): print(' '.join(wordset))That dictionary is what Debian sets up. I’ve got the American English Large dictionary installed. Using that, every set with more than 3 words is:
* erections neoterics resection secretion
* orientals orleanist relations tiroleans
Edited: Fixed formatting
Forgot to mention…
I didn’t bother to time it. But it ran under a second on an antique Athlon with Python 2.6.6.
import sys wd = {} for l in open ( sys.argv[1] ): w = l.strip() wl = [] for letter in w: wl.append ( letter ) wl.sort() ws = "".join ( wl ) try: wd[ws].append ( w ) except: wd[ws] = [ w ] for ws, wl in wd.iteritems(): if len ( wl ) > 2: print " ".join ( wl )using all 9 letter words in /usr/share/dict/words on ubuntu 10.04, takes o.25 seconds on AND Athlon X2 4000+.
estranges greatness sergeants
mutilates stimulate ultimates
cratering retracing terracing
reprising respiring springier
gnarliest integrals triangles
dissenter residents tiredness
cartesian ascertain sectarian
emigrants mastering streaming
beastlier bleariest liberates
redrawing rewarding wardering
resorting restoring rostering
earthling haltering lathering
enlisting listening tinseling
orientals orientals relations
auctioned cautioned education
dangering deranging gardening
countries cretinous neurotics
cattiness scantiest tacitness
@Toby , @Michael Wow thanks for the code snippets, that is such a clever way of doing it! I knew there had to be a snazzy way of getting this to work.
@Martin Elliott-Hunter Also good, as the other two have used the built in sorted method, I rewrote part of your code from:
To:
Which improved its performance significantly.
I preformed a quick benchmark on my laptop but will compare them properly when I get home.
Toby: 0.0978s
Michael: 0.0696s
Martin: 0.1049s
Martin (+sorted): 0.0624s
Thanks again for your coding skills, have learnt a valuable method.
Edited code to show correct indentation
if your code was as shown above, the line ‘wl.sort’ should not be indented; as shown the sort gets done for each letter. ‘sorted’ is better anyway. It didn’t occur me because I’m an old-fashioned programmer and I pre-date ‘sorted’. I like to write code that is as simple as possible so I can make sense of it later.
“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” Kernighan ?
On a VM with a 500 Mhz CPU running Fedora 13, after installing the “words” rpm (sudo yum install words), I ran Michael’s code and obtained 689 hits in 1.362s.
@Martin Elliott-Hunter Ah yeah, my incorrect copy of your code would have caused it to do 8 unnecessary sorts per word.
Doing a comparison on my home computer.
Toby: 0.0533s
Michael: 0.0369s
Martin (+correct code indentation): 0.0484s
Martin (+sorted): 0.0350s
I’m gonna post another article about 10 letter anagrams using some of the new code
Thanks again for your help!
Word Maker Scrabble works good for this type of thing.