# Archive for September, 2009

So a friend of mine introduced me to **Evolutionary Algorithms** a while back and I got some lecture notes passed onto me explaining the basics and a simple example in pseudo-code.

After a bit of work I’ve written up the queens example as an evolutionary algorithm in python. Here goes:

**The queens problem**

The following is the quote from wikipedia regarding the problem.

The eight queens puzzle is the problem of putting eight chess queens on an 8×8 chessboard such that none of them is able to capture any other using the standard chess queen’s moves. The queens must be placed in such a way that no two queens would be able to attack each other. Thus, a solution requires that no two queens share the same row, column, or diagonal.

The above image is one of the many solutions to the problem. Hint: if you are looking for many solutions once you have found one, all of its 4 rotations (if unique) are solutions too.

Check out the wikipedia article for more information

**Code**

This is the current code with medium amount of comments, I have also provided a heavily commented version for people with less python experience (scroll to bottom).

# -*- coding: utf-8 -*- ################################################################## ################################################################## # QUEENS EXAMPLE # - Program Type: Demonstration of Evolutionary alogorithms # - Authour: Matthew Rollings # - Date: 14/09/09 ################################################################## ################################################################## # Import randint from the random module from random import randint # Creates a array I just use for convenience board=[1,2,3,4,5,6,7,8] ################################################################## # FUNCTIONS # Function to check amount of attacking queens def check(array): # Collisions represent number of attacking queens, starts at zero obv. collisions=0 for i in range(1,9): if i not in array: print "DUPLICATE NUMBER ERROR - KILLING STUPID GENE" # Never happen return 0 # Total Collisions # For each queen in the array for i in range(0, 8): col=0 # For every other queen in the array for j in range(0, 8): # avoids checking self vs self if i!=j: # if queen i is on a diagonal from queen j, then the # difference in magnitude x-cord will equal the difference in # the magnitude of the y-coord if abs(board[i]-board[j])==abs(array[j]-array[i]): # Sets a variable to tell the next part that a collision was detected col=1 # If a collision was detected add one to collisions if col==1: collisions+=1 # Return 8-colllisions, so that 0 is bad (8 attacking queens) and 8 is good (no attacking queens) return 8-collisions # The reproduce function, takes two arguements, the two parents def reproduce(array1, array2): # FIRST BABY (mother first then father) # Takes first half of mothers gene baby=array1[0:4] # Can't just add two halves as I will get duplicate numbers for i in array2: # add fathers genes going in order of numbers left if i not in baby: baby.append(i) # Add a little variation by giving a percentage chance of mutation if randint(0,100)>20: baby=mutate(baby) # Add the baby to the population and add its corresponding fitness population.append(baby) fitness.append(check(baby)) # SECOND BABY (arrays just swapped around, Father first then mother) baby=array2[0:4] for i in array1: if i not in baby: baby.append(i) if randint(0,100)>20: baby=mutate(baby) population.append(baby) fitness.append(check(baby)) # Mutate the array given to the function def mutate(array1): #Chooses two random places (WARNING CAN BE SAME POSITION) a=randint(0,7) b=randint(0,7) # Swaps the two over (temporary var must be used (c)) c=array1[a] array1[a]=array1[b] array1[b]=c return array1 # Prints the population and corresponding fitness to the screen # Only used for debugging def printpop(): # Prints the group for i in range(0, len(population)): print population[i],fitness[i] ################################################################## # VARIABLES # The size of the population (how many genes, more = faster convergance but more cpu + memory, fewer = opp.) popsize=40 # The starting variation of the group because I havn't created a better method # for starting off. I just add arrays of 1,2,3,4,5,6,7,8 to the population and # give them the number of mutations between the following two values to make # psudo random starting group variation=[4,14] # How many of the genes to kill each time in percentage, each dead gene will be replaced by a new child die=0.40 # Kill limit is calculated from the above percentage kill_limit=die*popsize ################################################################## # MAIN population=[] fitness=[] for i in range(0,popsize): population.append([1,2,3,4,5,6,7,8]) a=0 while a<randint(variation[0],variation[1]): a+=1 population[i]=mutate(population[i]) fitness.append(check(population[i])) maxi=0 generations=1 while maxi!=8: # Picks top # in group to be parents, kills rest killed=0 # starting at the lowest fitness (0 and increasing untill kill limit reached) x=0 while killed<kill_limit: for i in range(0,popsize): # Try is here to catch any errors try: if fitness[i]==x: # This bit removes the crappy gene from the population and from fitness population.pop(i) fitness.pop(i) # increases the kill counter killed+=1 if killed==kill_limit: break except: break # increments fitness x+=1 babies=0 cpop=len(population)-1 #current population while babies<killed: # produces two babies from two random parents (should prob give fittest parents preference) reproduce(population[randint(0,cpop)],population[randint(0,cpop)]) babies+=2 generations+=1 # Looks for highest fitness in the group and checks if any have won! maxi=0 for i in range(0,popsize): if fitness[i]>maxi: maxi=fitness[i] if maxi==8: print population[i] break print "Took",generations,"generations"

**Quick Analysis**

I did a very quick analysis keeping the kill off percentage at 0.4 and varying the population size to see the improvement on generations required before a solution is found. I’ve also added a link (scroll to bottom) to the .ods file I used to calculate this (and an excel .xls file for the windows chaps). Below is the graph I calculated for different population sizes and it looks like we would expect, decreasing as the population increases. I only did 10 averages for each point and there were some anomalies, which are to be expected with this method of programming as it essentially relies on random numbers so it is a bit noisey:

**Links**

Note the following might be more desirable to copy and pasting the above as they use tabs rather than spaces.

queens.py – The normal version

queens_commented.py – The heavily commented version.

evolutionary.ods – Analysis in ods format (open office)

evolutionary.xls – Analysis in excel format

If you have any corrections or suggestions please contact me.

## Python: Making multi depth directories

If you try and write a file to something like ‘subfolder/file.txt’ in python it wont automatically create the subfolder for you and it will fail with:

IOError: [Errno 2] No such file or directory: 'test2/file.txt'

I imagine this is deliberate as you don’t want to be accidentally creating folders. So to make a single folder we import the os module, and use the mkdir command

import os os.mkdir('newfolder')

This is good, but if we try and make a subdirectories using the single command then it will fail as this command will only try and create the last directory in the string you pass. IE: if you pass dir1/dir2/dir3 it will assume dir1 and dir2 already exist and try to make dir3, which will fail with:

OSError: [Errno 2] No such file or directory: 'dir1/dir2/dir3'

So in order for this to work we would need to make it recursivly create subdirectories, but thankfully someone already went to the trouble for us with ‘supermakedir’ or makedirs.

import os os.makedirs('dir1/dir2/dir3')

So this is probably useful enough for most people, but I needed to create folders in order for a file to get dropped in it, however I was only given the full path for the file not seperated into filename and directory path.

So I made the following function that accepts an input such as ‘dir1/dir2/dir3/filename.txt’ and cuts it into the folder path using os.path.dirname(filename), and then checks if the folder exists using os.path.exists(folder), and then makes the folders if they don’t already exist.

def mkdirnotex(filename): folder=os.path.dirname(filename) if not os.path.exists(folder): os.makedirs(folder)

During my undergraduate degree I wrote a program in fortran 95 to calculate pi using random numbers. My aim is to rewrite it efficiently in python. I know its a terrible way to calculate pi, and there are much better ways to do it but its fun!

First I’ll explain the maths so you can visualise what’s going on. As we should know _pi_ is the ratio of circle’s radius to its circumference, which is conveniently the same as the ratio of a circle’s area to the square of its radius (wiki…)

So what we are going to be doing is picking lots of random coordinates in an x-y grid and calculating if they are within the circle or the square.

We will assign the radius to be 1, because that makes it easy to work with. By default a random number in python ( random() ) will return a floating point number between 0 and 1. To test if a point is within a circle we simply use Pythagoras.

So if the sqrt(a**2+b**2)<=1 then the point lies inside the circle’s radius. In the diagram above we see that point A lies within the circle, and point B lies outside the circle.

We can really don’t need to use the whole circle as it has symmetry, so we can just take a quartre, which makes the generating of random numbers easier as you only need to use a random number for x and y between 0 and 1, rather than -1 and 1. It will look like the diagram below.

Now for a confusing bit of maths. We are calculating the ratio of the area of a circle to the area of a square.

# Area of circle A=pi*r**2 # where r = 1 A = pi # Area of square A = l ** 2 # in this case (see diagram) our square's length is twice the radius l=2*r A=(1+1)**2 = 4 #Therefore our ratio will be pi : 4. # Which means we must multiply our result by four to get pi.

**Final version (efficient for using)**

from random import * from math import sqrt inside=0 n=1000 for i in range(0,n): x=random() y=random() if sqrt(x*x+y*y)<=1: inside+=1 pi=4*inside/n print pi

Below we can see the values it creates

n calc error 1 4.00000000 0.73686317 10 3.60000000 0.45840735 100 3.24000000 0.09840735 1000 3.06400000 -0.07759265 10000 3.16160000 0.02000735 100000 3.14140000 -0.00019265 1000000 3.14293600 0.00134335 10000000 3.14117920 -0.00041345

So we can see that the program quickly solves pi to about two decimal places, but it is a terribly inefficient method and will struggle to get much more accuracy than this.

Resources to check out:

This blog post – Solves pi via taylor series expansion

Super pi – Program that calculate pi often used for benchmarking

## Python: palindrome checking function

I created a reasonable palindrome checking function in python.

**Method 1**

def ispalindrome(num): n=str(num) while len(n)>1: print n if n[0]!=n[-1]: return 0 n=n[1:-1] return 1

I thought that it would be faster avoiding a string conversion, and to somehow use the modulus (modulo) function. However when I came to write it, I found it quite difficult to code, and I’m sure there must be a better way.

**Method 2**

def ispalindrome2(num): l=1 while num/10**l>=1.0: l+=1 r=0 d=[] for i in range(1,l+1): p=num%10**i-r r+=p p=p/10**(i-1) d.append(p) for i in range(0,l/2): if d[i]!=d[-i-1]: return 0 return 1

Doing the speed tests show that Method 1 is over 5.2 times faster than Method 2.

Method 1 0.355437994003 Seconds elapsed Method 2 1.85815691948 Seconds elapsed

**Update: Mike of mikemeat sent me his method using slices (Method 3) and I adapted it slight (Method 4), then shortly after I realised it could be even more efficient by not needing to differentiate between even and odd strings (Method 5).**

**Method 3**

def ispalindrome3(x): z = str(x) if len(z)%2 == 0 and z[:len(z)/2]==z[-len(z)/2:][::-1]: return 1 if len(z)%2 != 0 and z[:(len(z)- 1)/2]==z[(-len(z) + 1)/2:][::-1]: return 1 else: return 0

**Method 4**

def ispalindrome4(x): z = str(x) if z[:len( z)/2]==z[len( z)/2+len( z)%2:][::-1]: return 1 return 0

**Method 5**

def ispalindrome5(x): z = str(x) l=len(z)/2 if z[:l]==z[-l:][::-1]: return 1 return 0

Method 1 0.357168912888 Seconds elapsed Method 2 1.83943104744 Seconds elapsed Method 3 0.179126977921 Seconds elapsed Method 4 0.179482936859 Seconds elapsed Method 5 0.149376153946 Seconds elapsed

## Python: sum of digits in a string

I have a function I wrote for a project euler that calculates the sum of the digits in a number. This is my first attempt which simply converts each letter to an integer and sums them.

**Method 1:**

def digitsum(x): total=0 for letter in str(x): total+=int(letter) return total

I thought that this could be improved using ord, which converts a letter into its decimal ascii number. Numbers ‘0’, ‘1’, ‘2’ … ‘9’ correspond to the ascii values of 48 – 57 and then took the moduli of this with 48 to give the integer value. I later realised that this was completely nonsensical and should have just subtracted 48, but I decided to include it for the purposes of the speed test.

**Method 2:**

def digitsum2(x): total=0 for letter in str(x): total+=ord(letter)%48 return total

**Method 3:**

def digitsum3(x): total=0 for letter in str(x): total+=ord(letter)-48 return total

**Speed Test:**

The test uses a long number and one million repetitions for each method.

from time import time # .. functions go here # Nice long number to sum x=981234153134415646571899783156122451653 tic = time() for i in range(0,1000000): digitsum(x) print time() - tic, 'Seconds elapsed' tic = time() for i in range(0,1000000): digitsum2(x) print time() - tic, 'Seconds elapsed' tic = time() for i in range(0,1000000): digitsum3(x) print time() - tic, 'Seconds elapsed'

**Results:**

#Method 1 29.3496568203 Seconds elapsed #Method 2 12.185685873 Seconds elapsed #Method 3 9.59367895126 Seconds elapsed

So we can see that the first method is much slower, avoiding the integer conversion by using ord speeds it up the function by ~60% and that using subtraction rather than modulus (a division based operation) saves a further ~20% on top of this.

This is a quick and dirty crossword solver that I wrote in python:

word=raw_input('Crossword Solver \nuse * as a wildcard: ') f=open('dic.txt', 'r') for line in f: line=line.strip() if len(line)==len(word): good=1 pos=0 for letter in word: if not letter=='*': if not letter==line[pos]: good=0 pos+=1 if good==1: print line f.close()

Example usage:

Crossword Solver

use * as a wildcard: *arn*val

carnival

The dictionary file I used is 608.2Kb with 80,368 english words and avaliable here

## Python: diskspace

I wanted a simple function to use in a program I am writing to ensure that the disk isn’t getting full, after a quick search I found a blog post on thinkhole.org with a great solution:

# os module required import os # retrieves information for the harddrive where root is mounted # in windows replace this with "C:\" or the relevant drive letter disk = os.statvfs("/") # Information is recieved in numbers of blocks free # so we need to multiply by the block size to get the space free in bytes capacity = disk.f_bsize * disk.f_blocks available = disk.f_bsize * disk.f_bavail used = disk.f_bsize * (disk.f_blocks - disk.f_bavail) # print information in bytes print used, available, capacity # print information in Kilobytes print used/1024, available/1024, capacity/1024 # print information in Megabytes print used/1.048576e6, available/1.048576e6, capacity/1.048576e6 # print information in Gigabytes print used/1.073741824e9, available/1.073741824e9, capacity/1.073741824e9

You can argue about if they should be KiB or KB if you want, but i take them as 1024 bytes in a kilobyte

I preformed some acid3 tests on the browsers I currently had installed.

**Chrome**: It is sexy (only thing holding it back in Linux is the lack of flash support)

**Score: 100/100**

**Version: 4.0.203.2**

**Opera**: I used to use opera however not really any more, just installed it for this test.

**Score: 100/100**

**Version: 10.00 (build 4585)**

**Arora**: I don’t tend to use Arora, however I’ve heard alot of good things about it. It is a light-weight browser based on web-kit (as are chrome, safari and konqueror) and excels at its speed, highly recommended for notebooks which don’t need the bloat of firefox.

**Score: 100/100 Linktest failed.**

**Version:0.8.0**

**Firefox**: It seems to handle flash better (without crashing) and has lots of useful plugins.

**Score: 93/100**

**Version: 3.5.2**

**Konqueror**:

**Score: 89/100 Linktest failed.**

**Version: 4.3.1**

Google’s chrome comes first (with chrome having smoother and faster animation) followed by Opera and closely followed by Arora only failing on the link test, then firefox and trailing is konqueror. But I still love konqueror as its so convenient have filemanager and web-browser in one.

I’ve been doing a lot of problem solving on Project Euler recently. If your not aware of Project Euler (PE) it is a website with lots of maths / programming based puzzles to solve. Many of the problems involve using or checking prime numbers, in many cases a sieve method would be applicable (see: Sieve of Eratosthenes for a clear explanation / animation).

However sometimes this is not necessary if you only require one prime number, or if memory limitations mean a sieve would be inconceivable. The following is the code I wrote to check if a number is prime or not in python. It tests upwards checking if the number is perfectly divisble, and as it does this it lowers the maximum number need to reach to stop as we know that if a number is not divisible by 2 then it will not be uniquely divisible (a repatition of a divislbe may exist) by anything greater than N/2

def isprime(number): if number<=1: return 0 check=2 maxneeded=number while check<maxneeded+1: maxneeded=number/check if number%check==0: return 0 check+=1 return 1

I hope that this made sense, and is useful to someone. If anyone has any more efficient methods I would be happy to hear from you.

Check out my profile to see which problems I’ve completed. If you’ve completed any I haven’t please get in contact and give me some tips.

**Update:**

Mike sent a suggestion that I could speed up the program by ignoring all factors of 2, it could also propbably be sped up by looking at 3,4,5 .. etc until certain point.

def isprime(number): if number<=1 or number%2==0: return 0 check=3 maxneeded=number while check<maxneeded+1: maxneeded=number/check if number%check==0: return 0 check+=2 return 1

So lets test the speed difference by doing 1,000 checks of the number 982,451,653 (known to be prime from this usefulsite)

Original Code: 9.99917411804 Seconds elapsed

Improved Code: 5.2977039814 Seconds elapsed

That’s approximately a factor of two speed increase, but this makes me think that a combination of the sieve method with this one may lead to an even further increase.

## Stealthcopter in the wild

Whilst shamelessly searching my own domain name in the search engines I found another stealthcopter in the wild: