## Python: Anagram solver

Finding the solutions to an anagram can be enjoyable and used as a sign of mental agility, with examples from long running television series (countdown) to online gambling skill games. It also makes an interesting tutorial for learning some simple python skills.

We start off by writing a function to check if a given word is an anagram of another word. This will be used to check a list of dictionary words against a given word in order to find its anagrams. The function accepts two string inputs, the word we are testing (word), and the given anagram to check it against (chkword).

The function loops over every letter in the word and if it is not present in chkword then we know that it cannot be an anagram of the chkword. Otherwise it is valid and that letter is removed from the chkword so that each letter may only be used once. For example checking “netting” with “testing” would fail on the second n.

def anagramchk(word,chkword):
for letter in word:
if letter in chkword:
chkword=chkword.replace(letter, '', 1)
else:
return 0
return 1

Now we need to get the anagram from the user. The following using raw_input to read what the user types in the console and assign it to a variable, with the optional argument used to print text so the user knows they need to do something.

wordin=raw_input('Enter anagram:\n')

We can now join both these together and loop over our dictionary file. We strip each line to remove any extra space and gumph at the end of the lines. If the length of a line is less than 4 then it is ignored,this can be changed to whatever minimum you like, so if you want it to be the same length as the input you could replace it with len(line)==len(wordin). We then use the function already written earlier to check if the word is a valid anagram and print the word if it is. Then we finish by closing the file.

f=open('english.txt', 'r')
for line in f:
line=line.strip()
if len(line)>=4:
if anagramchk(line,wordin):
print line
f.close()

Hopefully this has been helpful to someone. As ever if you have any suggestions or improvements then please leave a comment. Below is the complete version of the code

def anagramchk(word,chkword):
for letter in word:
if letter in chkword:
chkword=chkword.replace(letter, '', 1)
else:
return 0
return 1

wordin=raw_input('Enter anagram:')

f=open('english.txt', 'r')
for line in f:
line=line.strip()
if len(line)>=4:
if anagramchk(line,wordin):
print line
f.close()

The dictionary file I used was one I found in /usr/share/dict/british-english (most linux distro’s should have this or similar) it can also be found here

## Python: Factors of a number

Project Euler often requires the factors of a number to be found, here is the function I have written for that purpose.

def factors(n):
fact=[1,n]
check=2
rootn=sqrt(n)
while check<rootn:
if n%check==0:
fact.append(check)
fact.append(n/check)
check+=1
if rootn==check:
fact.append(check)
fact.sort()
return fact

As always any help in speeding up my function would be appreciated. I think it could be improved using a sieve method or by checking if its divisible by 2 first then using check+=2 starting at 3 to improve the speed of the loop by a factor of 2.

Below is a the commented version to help explain what’s going on if it’s not clear.

def factors(n):
# 1 and n are automatically factors of n
fact=[1,n]
# starting at 2 as we have already dealt with 1
check=2
# calculate the square root of n and use this as the
# limit when checking if a number is divisible as
# factors above sqrt(n) will already be calculated as
# the inverse of a lower factor IE. finding factors of
# 100 only need go up to 10 (sqrt(100)=10) as factors
# such as 25 can be found when 5 is found to be a
# factor 100/5=25
rootn=sqrt(n)
while check<rootn:
if n%check==0:
fact.append(check)
fact.append(n/check)
check+=1
# this line checks the sqrt of the number to see if
# it is a factor putting it here prevents it appearing
# twice in the above while loop and putting it outside
# the loop should save some time.
if rootn==check:
fact.append(check)
# return an array of factors sorted into numerial order.
fact.sort()
return fact

## Evolutionary Algorithms in python, queens example

So a friend of mine introduced me to Evolutionary Algorithms a while back and I got some lecture notes passed onto me explaining the basics and a simple example in pseudo-code.

After a bit of work I’ve written up the queens example as an evolutionary algorithm in python. Here goes:

The queens problem
The following is the quote from wikipedia regarding the problem.

The eight queens puzzle is the problem of putting eight chess queens on an 8×8 chessboard such that none of them is able to capture any other using the standard chess queen’s moves. The queens must be placed in such a way that no two queens would be able to attack each other. Thus, a solution requires that no two queens share the same row, column, or diagonal.

The above image is one of the many solutions to the problem. Hint: if you are looking for many solutions once you have found one, all of its 4 rotations (if unique) are solutions too.

Code
This is the current code with medium amount of comments, I have also provided a heavily commented version for people with less python experience (scroll to bottom).

# -*- coding: utf-8 -*-
##################################################################
##################################################################
# QUEENS EXAMPLE
# - Program Type: Demonstration of Evolutionary alogorithms
# - Authour: Matthew Rollings
# - Date: 14/09/09
##################################################################
##################################################################

# Import randint from the random module
from random import randint

# Creates a array I just use for convenience
board=[1,2,3,4,5,6,7,8]

##################################################################
# FUNCTIONS

# Function to check amount of attacking queens
def check(array):

# Collisions represent number of attacking queens, starts at zero obv.
collisions=0
for i in range(1,9):
if i not in array:
print "DUPLICATE NUMBER ERROR - KILLING STUPID GENE" # Never happen 😉
return 0

# Total Collisions
# For each queen in the array
for i in range(0, 8):
col=0
# For every other queen in the array
for j in range(0, 8):
# avoids checking self vs self
if i!=j:
# if queen i is on a diagonal from queen j, then the
# difference in magnitude x-cord will equal the difference in
# the magnitude of the y-coord
if abs(board[i]-board[j])==abs(array[j]-array[i]):
# Sets a variable to tell the next part that a collision was detected
col=1
# If a collision was detected add one to collisions
if col==1:
collisions+=1
# Return 8-colllisions, so that 0 is bad (8 attacking queens) and 8 is good (no attacking queens)
return 8-collisions

# The reproduce function, takes two arguements, the two parents
def reproduce(array1, array2):

# FIRST BABY (mother first then father)
# Takes first half of mothers gene
baby=array1[0:4]
# Can't just add two halves as I will get duplicate numbers
for i in array2:
# add fathers genes going in order of numbers left
if i not in baby:
baby.append(i)
# Add a little variation by giving a percentage chance of mutation
if randint(0,100)>20:
baby=mutate(baby)
# Add the baby to the population and add its corresponding fitness
population.append(baby)
fitness.append(check(baby))

# SECOND BABY (arrays just swapped around, Father first then mother)
baby=array2[0:4]
for i in array1:
if i not in baby:
baby.append(i)
if randint(0,100)>20:
baby=mutate(baby)
population.append(baby)
fitness.append(check(baby))

# Mutate the array given to the function
def mutate(array1):
#Chooses two random places (WARNING CAN BE SAME POSITION)
a=randint(0,7)
b=randint(0,7)
# Swaps the two over (temporary var must be used (c))
c=array1[a]
array1[a]=array1[b]
array1[b]=c
return array1

# Prints the population and corresponding fitness to the screen
# Only used for debugging
def printpop():
# Prints the group
for i in range(0, len(population)):
print population[i],fitness[i]

##################################################################
# VARIABLES
# The size of the population (how many genes, more = faster convergance but more cpu + memory, fewer = opp.)
popsize=40
# The starting variation of the group because I havn't created a better method
# for starting off. I just add arrays of 1,2,3,4,5,6,7,8 to the population and
# give them the number of mutations between the following two values to make
# psudo random starting group
variation=[4,14]
# How many of the genes to kill each time in percentage, each dead gene will be replaced by a new child 🙂
die=0.40
# Kill limit is calculated from the above percentage
kill_limit=die*popsize

##################################################################
# MAIN

population=[]
fitness=[]
for i in range(0,popsize):
population.append([1,2,3,4,5,6,7,8])
a=0
while a<randint(variation[0],variation[1]):
a+=1
population[i]=mutate(population[i])
fitness.append(check(population[i]))

maxi=0
generations=1

while maxi!=8:

# Picks top # in group to be parents, kills rest
killed=0
# starting at the lowest fitness (0 and increasing untill kill limit reached)
x=0
while killed<kill_limit:
for i in range(0,popsize):
# Try is here to catch any errors
try:
if fitness[i]==x:
# This bit removes the crappy gene from the population and from fitness
population.pop(i)
fitness.pop(i)
# increases the kill counter
killed+=1
if killed==kill_limit:
break
except:
break
# increments fitness
x+=1

babies=0
cpop=len(population)-1 #current population
while babies<killed:
# produces two babies from two random parents (should prob give fittest parents preference)
reproduce(population[randint(0,cpop)],population[randint(0,cpop)])
babies+=2

generations+=1

# Looks for highest fitness in the group and checks if any have won!
maxi=0
for i in range(0,popsize):
if fitness[i]>maxi:
maxi=fitness[i]
if maxi==8:
print population[i]
break

print "Took",generations,"generations"

Quick Analysis
I did a very quick analysis keeping the kill off percentage at 0.4 and varying the population size to see the improvement on generations required before a solution is found. I’ve also added a link (scroll to bottom) to the .ods file I used to calculate this (and an excel .xls file for the windows chaps). Below is the graph I calculated for different population sizes and it looks like we would expect, decreasing as the population increases. I only did 10 averages for each point and there were some anomalies, which are to be expected with this method of programming as it essentially relies on random numbers so it is a bit noisey:

Note the following might be more desirable to copy and pasting the above as they use tabs rather than spaces.
queens.py – The normal version
queens_commented.py – The heavily commented version.

evolutionary.ods – Analysis in ods format (open office)
evolutionary.xls – Analysis in excel format

## Python: Making multi depth directories

If you try and write a file to something like ‘subfolder/file.txt’ in python it wont automatically create the subfolder for you and it will fail with:

IOError: [Errno 2] No such file or directory: 'test2/file.txt'

I imagine this is deliberate as you don’t want to be accidentally creating folders. So to make a single folder we import the os module, and use the mkdir command

import os
os.mkdir('newfolder')

This is good, but if we try and make a subdirectories using the single command then it will fail as this command will only try and create the last directory in the string you pass. IE: if you pass dir1/dir2/dir3 it will assume dir1 and dir2 already exist and try to make dir3, which will fail with:

OSError: [Errno 2] No such file or directory: 'dir1/dir2/dir3'

So in order for this to work we would need to make it recursivly create subdirectories, but thankfully someone already went to the trouble for us with ‘supermakedir’ or makedirs.

import os
os.makedirs('dir1/dir2/dir3')

So this is probably useful enough for most people, but I needed to create folders in order for a file to get dropped in it, however I was only given the full path for the file not seperated into filename and directory path.

So I made the following function that accepts an input such as ‘dir1/dir2/dir3/filename.txt’ and cuts it into the folder path using os.path.dirname(filename), and then checks if the folder exists using os.path.exists(folder), and then makes the folders if they don’t already exist.

def mkdirnotex(filename):
folder=os.path.dirname(filename)
if not os.path.exists(folder):
os.makedirs(folder)

## Python: Calculating pi using random numbers

During my undergraduate degree I wrote a program in fortran 95 to calculate pi using random numbers. My aim is to rewrite it efficiently in python. I know its a terrible way to calculate pi, and there are much better ways to do it but its fun!

First I’ll explain the maths so you can visualise what’s going on. As we should know _pi_ is the ratio of circle’s radius to its circumference, which is conveniently the same as the ratio of a circle’s area to the square of its radius (wiki…)

So what we are going to be doing is picking lots of random coordinates in an x-y grid and calculating if they are within the circle or the square.

We will assign the radius to be 1, because that makes it easy to work with. By default a random number in python ( random() ) will return a floating point number between 0 and 1. To test if a point is within a circle we simply use Pythagoras.

So if the sqrt(a**2+b**2)<=1 then the point lies inside the circle’s radius. In the diagram above we see that point A lies within the circle, and point B lies outside the circle.

We can really don’t need to use the whole circle as it has symmetry, so we can just take a quartre, which makes the generating of random numbers easier as you only need to use a random number for x and y between 0 and 1, rather than -1 and 1. It will look like the diagram below.

Now for a confusing bit of maths. We are calculating the ratio of the area of a circle to the area of a square.

# Area of circle
A=pi*r**2
# where r = 1
A = pi
# Area of square
A = l ** 2
# in this case (see diagram) our square's length is twice the radius
l=2*r
A=(1+1)**2 = 4

#Therefore our ratio will be pi : 4.
# Which means we must multiply our result by four to get pi.

Final version (efficient for using)

from random import *
from math import sqrt
inside=0
n=1000
for i in range(0,n):
x=random()
y=random()
if sqrt(x*x+y*y)<=1:
inside+=1
pi=4*inside/n
print pi

Below we can see the values it creates

n	calc	error
1	4.00000000	0.73686317
10	3.60000000	0.45840735
100	3.24000000	0.09840735
1000	3.06400000	-0.07759265
10000	3.16160000	0.02000735
100000	3.14140000	-0.00019265
1000000	3.14293600	0.00134335
10000000	3.14117920	-0.00041345

So we can see that the program quickly solves pi to about two decimal places, but it is a terribly inefficient method and will struggle to get much more accuracy than this.

Resources to check out:
This blog post – Solves pi via taylor series expansion
Super pi – Program that calculate pi often used for benchmarking

## Python: palindrome checking function

I created a reasonable palindrome checking function in python.

Method 1

def ispalindrome(num):
n=str(num)
while len(n)>1:
print n
if n[0]!=n[-1]:
return 0
n=n[1:-1]
return 1

I thought that it would be faster avoiding a string conversion, and to somehow use the modulus (modulo) function. However when I came to write it, I found it quite difficult to code, and I’m sure there must be a better way.

Method 2

def ispalindrome2(num):
l=1
while num/10**l>=1.0:
l+=1
r=0
d=[]
for i in range(1,l+1):
p=num%10**i-r
r+=p
p=p/10**(i-1)
d.append(p)

for i in range(0,l/2):
if d[i]!=d[-i-1]:
return 0
return 1

Doing the speed tests show that Method 1 is over 5.2 times faster than Method 2.

Method 1
0.355437994003 Seconds elapsed
Method 2
1.85815691948 Seconds elapsed

Update: Mike of mikemeat sent me his method using slices (Method 3) and I adapted it slight (Method 4), then shortly after I realised it could be even more efficient by not needing to differentiate between even and odd strings (Method 5).

Method 3

def ispalindrome3(x):
z = str(x)
if len(z)%2 == 0 and z[:len(z)/2]==z[-len(z)/2:][::-1]:
return 1
if len(z)%2 != 0 and z[:(len(z)- 1)/2]==z[(-len(z) + 1)/2:][::-1]:
return 1
else:
return 0

Method 4

def ispalindrome4(x):
z = str(x)
if z[:len( z)/2]==z[len( z)/2+len( z)%2:][::-1]:
return 1
return 0

Method 5

def ispalindrome5(x):
z = str(x)
l=len(z)/2
if z[:l]==z[-l:][::-1]:
return 1
return 0
Method 1
0.357168912888 Seconds elapsed
Method 2
1.83943104744 Seconds elapsed
Method 3
0.179126977921 Seconds elapsed
Method 4
0.179482936859 Seconds elapsed
Method 5
0.149376153946 Seconds elapsed

## Python: sum of digits in a string

I have a function I wrote for a project euler that calculates the sum of the digits in a number. This is my first attempt which simply converts each letter to an integer and sums them.

Method 1:

def digitsum(x):
total=0
for letter in str(x):
total+=int(letter)

I thought that this could be improved using ord, which converts a letter into its decimal ascii number. Numbers ‘0’, ‘1’, ‘2’ … ‘9’ correspond to the ascii values of 48 – 57 and then took the moduli of this with 48 to give the integer value. I later realised that this was completely nonsensical and should have just subtracted 48, but I decided to include it for the purposes of the speed test.

Method 2:

def digitsum2(x):
total=0
for letter in str(x):
total+=ord(letter)%48

Method 3:

def digitsum3(x):
total=0
for letter in str(x):
total+=ord(letter)-48

Speed Test:
The test uses a long number and one million repetitions for each method.

from time import time

# .. functions go here

# Nice long number to sum
x=981234153134415646571899783156122451653

tic = time()
for i in range(0,1000000):
digitsum(x)
print time() - tic, 'Seconds elapsed'

tic = time()
for i in range(0,1000000):
digitsum2(x)
print time() - tic, 'Seconds elapsed'

tic = time()
for i in range(0,1000000):
digitsum3(x)
print time() - tic, 'Seconds elapsed'

Results:

#Method 1
29.3496568203 Seconds elapsed
#Method 2
12.185685873 Seconds elapsed
#Method 3
9.59367895126 Seconds elapsed

So we can see that the first method is much slower, avoiding the integer conversion by using ord speeds it up the function by ~60% and that using subtraction rather than modulus (a division based operation) saves a further ~20% on top of this.

## Python: crossword solver + dictionary file

This is a quick and dirty crossword solver that I wrote in python:

word=raw_input('Crossword Solver \nuse * as a wildcard: ')
f=open('dic.txt', 'r')
for line in f:
line=line.strip()
if len(line)==len(word):
good=1
pos=0
for letter in word:
if not letter=='*':
if not letter==line[pos]:
good=0
pos+=1
if good==1:
print line
f.close()

Example usage:

Crossword Solver
use * as a wildcard: *arn*val
carnival

The dictionary file I used is 608.2Kb with 80,368 english words and avaliable here

## Python: diskspace

I wanted a simple function to use in a program I am writing to ensure that the disk isn’t getting full, after a quick search I found a blog post on thinkhole.org with a great solution:

# os module required
import os

# retrieves information for the harddrive where root is mounted
# in windows replace this with "C:\" or the relevant drive letter
disk = os.statvfs("/")

# Information is recieved in numbers of blocks free
# so we need to multiply by the block size to get the space free in bytes
capacity = disk.f_bsize * disk.f_blocks
available = disk.f_bsize * disk.f_bavail
used = disk.f_bsize * (disk.f_blocks - disk.f_bavail)

# print information in bytes
print used, available, capacity

# print information in Kilobytes
print used/1024, available/1024, capacity/1024

# print information in Megabytes
print used/1.048576e6, available/1.048576e6, capacity/1.048576e6

# print information in Gigabytes
print used/1.073741824e9, available/1.073741824e9, capacity/1.073741824e9

You can argue about if they should be KiB or KB if you want, but i take them as 1024 bytes in a kilobyte

## acid3 tests firefox chrome konqueror arora on ubuntu x64 karmic

I preformed some acid3 tests on the browsers I currently had installed.

Chrome: It is sexy (only thing holding it back in Linux is the lack of flash support)
Score: 100/100
Version: 4.0.203.2

Opera: I used to use opera however not really any more, just installed it for this test.
Score: 100/100
Version: 10.00 (build 4585)

Arora: I don’t tend to use Arora, however I’ve heard alot of good things about it. It is a light-weight browser based on web-kit (as are chrome, safari and konqueror) and excels at its speed, highly recommended for notebooks which don’t need the bloat of firefox.