python

Python: Cryptography decoding a Caesar shift (frequency analysis)

Due to the simple nature of the Caesar cipher, it could easily be brute forced by trying all possible 25 keys and then looking by eye to see if the plaintext was revealed (this too can be automated by checking for common English words to see if the solution was probable). However the much more elegant method of frequency analysis can be used.

Below is a table of the frequency of letters in the English language:

Letter Frequency (percent) Frequency (decimal) Normalised Frequency
a 8.17% 0.08167 0.64297
b 1.49% 0.01492 0.11746
c 2.78% 0.02782 0.21902
d 4.25% 0.04253 0.33483
e 12.70% 0.12702 1.00000
f 2.23% 0.02228 0.17541
g 2.02% 0.02015 0.15864
h 6.09% 0.06094 0.47977
i 6.97% 0.06966 0.54842
j 0.15% 0.00153 0.01205
k 0.77% 0.00772 0.06078
l 4.03% 0.04025 0.31688
m 2.41% 0.02406 0.18942
n 6.75% 0.06749 0.53133
o 7.51% 0.07507 0.59101
p 1.93% 0.01929 0.15187
q 0.10% 0.00095 0.00748
r 5.99% 0.05987 0.47134
s 6.33% 0.06327 0.49811
t 9.06% 0.09056 0.71296
u 2.76% 0.02758 0.21713
v 0.98% 0.00978 0.07700
w 2.36% 0.02360 0.18580
x 0.15% 0.00150 0.01181
y 1.97% 0.01974 0.15541
z 0.07% 0.00074 0.00583

And shown graphically:

English Letter Frequency

English Letter Frequency

Using the following code we can use frequency analysis to find the solution to ciphertext created using the Caesar shift demonstrated previously (see Caesar shift and Caesar shift using makestrans).

from Numeric import *
from string import maketrans

def translator(text,alphabet,key):
	trantab = maketrans(alphabet,key)
	return text.translate(trantab)

def caesar_decode(ciphertext,s):
	alpha="abcdefghijklmnopqrstuvwxyz"
	return translator(ciphertext,alpha,alpha[-s:]+alpha[:-s])




class frequency_analysis:
	def __init__(self, ciphertext):
		self.cor=[0.64297,0.11746,0.21902,0.33483,1.00000,0.17541,
		0.15864,0.47977,0.54842,0.01205,0.06078,0.31688,0.18942,
		0.53133,0.59101,0.15187,0.00748,0.47134,0.49811,0.71296,
		0.21713,0.07700,0.18580,0.01181,0.15541,0.00583]
		self.ciphertext=ciphertext.lower()
		self.freq()
		self.min_error()
		self.key=self.minimum[0]
		self.solution=caesar_decode(self.ciphertext,self.minimum[0])

	def freq(self):
		self.arr=zeros(26,Float64)
		for l in self.ciphertext:
			x=ord(l)
			if (x>=97 and x<=122):
				self.arr[x-97]+=1.0
		self.arr/=max(self.arr)

	def error(self):
		e=0
		for i in range(0,len(self.arr)):
			e+=abs(self.arr[i]-self.cor[i])**2
		return e

	def min_error(self):
		self.minimum=[0,10000]
		for rot in range(0,25):
			e=self.error()
			print rot,e
			if e<self.minimum[1]:
				self.minimum[1]=e
				self.minimum[0]=rot
			x=self.arr[-1]
			del self.cor[-1]
			self.cor.insert(0,x)

ciphertext="ymjwj fwj ybt ydujx tk jshwduynts: tsj ymfy bnqq "+\
"uwjajsy dtzw xnxyjw kwtr wjfinsl dtzw infwd fsi tsj ymfy bnqq "+\
"uwjajsy dtzw ltajwsrjsy. ymnx nx f ajwd nrutwyfsy qjxxts yt "+\
"wjrjgjwjxujhnfqqd ktw fyyfhpx zxnsl kwjvzjshd fsfqdxnx bmnhm "+\
"wjvznwj qtsljw ufxxflj tk yjcy ns twijw yt fhmnjaj gjyyjw wjxzqyx."
FA=frequency_analysis(ciphertext)
print FA.solution

This code will calculate the error in statistical frequency for each letter squared to generate an error for each possible rotation. Using a sufficiently long piece of ciphertext this code should accurately reveal the Caesar rotation use. The table below shows the error for each rotation:

Rotation Error
0 4.11797847386
1 3.05305477067
2 3.70059678828
3 3.66330931218
4 3.5078619579
5 0.361318100755
6 3.17289666386
7 3.66072641654
8 3.39769855873
9 1.74854802027
10 2.92550921273
11 2.67524757297
12 2.86847189573
13 3.06980318397
14 2.56886153328
15 2.17180117031
16 2.24503724763
17 2.95579718798
18 1.74002183444
19 1.83328601011
20 1.74779021766
21 2.71332097813
22 1.5409364067
23 1.83209213494
24 1.54904808883

The lowest error is for 5 rotations (correctly so) with an error of 0.361318100755, the next lowest error is 22 rotations with an error of 1.5409364067. This is ~4.3x difference, which gives a very large degree of confidence to our solution and below is the deciphered text.

there are two types of encryption: one that will prevent your sister from reading your diary and one that will prevent your government. this is a very important lesson to remeberespecially for attacks using frequency analysis which require longer passage of text in order to achieve better results.

Future
The frequency analysis presented here can be used along with some other techniques in order to crack the viginere cipher.

Read More

Python: Cryptograph using maketrans for substitution and Caesar ciphers

I’ve rewritten the functions in my previous two posts (caesar and substitution ciphers) using the maketrans function from the strings module in python (With thanks to wumzi for pointing this out).

maketrans takes an input alphabet and an output alphabet and can then be used on a string. This can be used to greatly simplify the two ciphers I produced previously. Example below:

input_alphabet="abcde"
output_alphabet="12345"
trantab = maketrans(input_alphabet,output_alphabet)
text="translate abcdefg"
print text.translate(trantab)
# This will output:
# tr1nsl1t5 12345fg

This nows means my code can be rewritten in just a fraction of what it was before:

from random import shuffle
from string import maketrans

alphabet="abcdefghijklmnopqrstuvwxyz" + \
	 "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"+ \
         ":.;,?!@#$%&()+=-*/_<> []{}`~^"

def translator(text,alphabet,key):
	trantab = maketrans(alphabet,key)
	return text.translate(trantab)

def caesar_encode(plaintext,s,alphabet):
	return translator(plaintext,alphabet,alphabet[s:]+alphabet[:s])

def caesar_decode(ciphertext,s,alphabet):
	return translator(ciphertext,alphabet,alphabet[-s:]+alphabet[:-s])

def substitution_encode(plaintext,alphabet):
	randarray=range(0,len(alphabet))
	shuffle(randarray)
	key=""
	for i in range(0,len(alphabet)):
		key+=alphabet[randarray[i]]

	return translator(plaintext,alphabet,key),key

def substitution_decode(ciphertext,key,alphabet):
	return translator(ciphertext,key,alphabet)


# Example useage
plaintext="The wheels on the bus go round and round. round" + \
" and round. round and round. The wheels on the bus go round and"+ \
" round, all through the town!"


print 
print "SUBSTITUTION"
ciphertext,key=substitution_encode(plaintext,alphabet)
print "Key: ", key
print "Plaintext:", plaintext
print "Cipertext:", ciphertext
print "Decoded  :", substitution_decode(ciphertext,key,alphabet)

print 
print "CAESAR SHIFT"
ciphertext=caesar_encode(plaintext,5,alphabet)
print "Key: ", 5
print "Plaintext:", plaintext
print "Cipertext:", ciphertext
print "Decoded  :", caesar_decode(ciphertext,5,alphabet)

This will output the following

SUBSTITUTION
Key: &fywQ.%!lmx_sRGu:{<(5jqAXvMFgk]SIY[4iK+3P8pcV$2da@DT/ZnOJ*E>-r},BH16zUb#L?N`e7C ~9t)oW^=;h0
Plaintext: The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town!
Cipertext: O!Q)q!QQ_<)GR)(!Q)f5<)%G){G5Rw)&Rw){G5Rw,){G5Rw)&Rw){G5Rw,{G5Rw)&Rw){G5Rw,)O!Q)q!QQ_<)GR)(!Q)f5<)%G){G5Rw)&Rw){G5RwH)&__)(!{G5%!)(!Q)(GqR6 Decoded : The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town! CAESAR SHIFT Key: 5 Plaintext: The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town! Cipertext: Ymj`2mjjqx`ts`ymj`gzx`lt`wtzsi`fsi`wtzsi@`wtzsi`fsi`wtzsi@wtzsi`fsi`wtzsi@`Ymj`2mjjqx`ts`ymj`gzx`lt`wtzsi`fsi`wtzsi$`fqq`ymwtzlm`ymj`yt2s& Decoded : The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town!

Conclusion
maketrans is awesome 🙂

Read More

Python: Cryptography Substitution Cipher improving on the Caesar cipher

This post builds upon the Caesar shift presented previously; converting it to a full substitution cipher. The substitution cipher will practically remove bruteforce style methods of defeating the encryption and provide a basis for more complicated ciphers.

Subsitution Cipher
Because a Caesar shift only rotates the alphabet there are only 25 possible unique solutions, this leaves the cipher quite vulnerable to brute force. If rather than just rotating the alphabet and keeping it ‘linear’ we can shuffle it to create a substitution cipher. This improves the number of possible solutions to a shocking
(26! – 1) = 4.03291461126605635584e26 (unfortunatly, the substitution cipher is alot weaker than it seems as it is vunerable to several different cryptanalysis attacks).

Substitution Cipher Diagram

Substitution Cipher Diagram

We start with the simple Caesar shift function with the following changes:

  • A new array is introduced filled with the numbers 0 – the length of the alphabet. This array is shuffled using shuffle from the random module. The alphabet substitution dictionary is created using this array to decide which letters go where (this is probably clearer in the code than in my explanation).
  • With the Caesar shift we only needed to know the number of rotations in order to decrypt the text, now we need a full list of the letter substitutions. This is stored as the key and will be needed in order to decode our substitution cipher.
  • We also now store the alphabet outside of the function so that it can be used in a decode function. This function was added along with some example usage to make the full process more understandable
from random import shuffle

alphabet="abcdefghijklmnopqrstuvwxyz"

def substitution(alphabet,plaintext):

	# Create array to use to randomise alphabet position
	randarray=range(0,len(alphabet))
	shuffle(randarray)

	key=""

	#Create our substitution dictionary
	dic={}
	for i in range(0,len(alphabet)):
		key+=alphabet[randarray[i]]
		dic[alphabet[i]]=alphabet[randarray[i]]

	#Convert each letter of plaintext to the corrsponding
	#encrypted letter in our dictionary creating the cryptext
	ciphertext=""
	for l in plaintext:
		if l in dic:
			l=dic[l]
		ciphertext+=l
	for i in alphabet:
		print i,
	print
	for i in key:
		print i,
	print
	return ciphertext,key


# This function decodes the ciphertext using the key and creating
# the reverse of the dictionary created in substitution to retrieve
# the plaintext again
def decode(alphabet,ciphertext,key):
	
	dic={}
	for i in range(0,len(key)):
		dic[key[i]]=alphabet[i]

	plaintext=""
	for l in ciphertext:
		if l in dic:
			l=dic[l]
		plaintext+=l

	return plaintext

# Example useage
plaintext="the cat sat on the mat"
ciphertext,key=substitution(plaintext)
print "Key: ", key
print "Plaintext:", plaintext
print "Cipertext:", ciphertext
print "Decoded  :", decode(ciphertext,key)

Running this will output the following (This will be different on each run due to the use of random to generate the key).

Key: miylbsowutgdkfvjepqhazrncx
Plaintext: the cat sat on the mat
Cipertext: hwb ymh qmh vf hwb kmh
Decoded : the cat sat on the mat

Improvement by using additional characters
Adding additional characters into the substitution will it more difficult to solve. For example if we change our alphabet from:

alphabet=”abcdefghijklmnopqrstuvwxyz”

If we include capital letters, numbers from 0 -9 and special characters:

alphabet=”ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890″+ \
“:.;,?!@#$%&()+=-*/_<> []{}`~^”+ \
“abcdefghijklmnopqrstuvwxyz”

We increase the number of possible solutions from (26!-1) to (63!-1) which is 1.98260831540444006412e87. This also has the added benefit of making the encrypted text alot cooler and harder to guess at by eye (unfortunately still very easy to see what character represents space).

Key: MXjAarLzWqFePI7E botO5f1kym29RZd3Sh8JTiQGKVDp6YBCsU4nucNgwlx0vH
Plaintext: this plaintext will be much more of a challenge to decode compared to the Caesar shift cipher
Cipertext: tzWoHEeMWIta1tHfWeeHXaHPOjzHP7baH7rHMHjzMeeaILaHt7HAaj7AaHj7PEMbaAHt7HtzaHjMaoMbHozWrtHjWEzab
Decoded : this plaintext will be much more of a challenge to decode compared to the caesar shift cipher

Future
The substitution cipher is a lot more secure than Caesar shift cipher but unfortunately is very insecure towards frequency analysis. In future posts I will address using frequency analysis and methods to prevent this type of attack as well as improving on this cipher by creating multiple-dicitionary based ciphers to create Vigenère style ciphers.

I imagine most people reading this will enjoy the simple challenge of solving some encrypted text. I have used this code to make some ciphertext, try and decode it! (extra points for knowing where it is from):

^’VtuBbtv3vut1u-w.G^’Vt&vnu-tZBtZnuIvwvtwn
-qbtuB6GN3vutbqBS-qt.BSt&wB~vtV.tqv1wbG}u5t
~nDDv5tVvG}u5tbBwvtVvtbBt@nvIvZG}u5tbqwv6tv
3vw.t@nvIvtnubBt1tUnwvG}Ztbqv.t&Swuv5tnbtqS
wbt&vI1SZvt^t61ZtZBtq1@@.tUBwt.BSm)B6tbqvZv
t@BnubZtBUt51b1tV1~vt1t&v1SbnUSDtDnuvG}u5t6
v’wvtBSbtBUt&vb1GQv’wvtwvDv1Znu-tButbnVvG

(note: newlines were placed to make it fit and do not represent a character)

Update: Made code a little cleaner by moving dictionary outside of functions.

Read More

Python: Cryptography Caesar shift encryption (shift cipher)

I have always had a keen interest in cryptography and rather than give a brief history of cryptography I will recommend reading Simon Singh’s The code book or for a modern and hands on approach Applied Cryptography by Bruce Schneier (Who also made a brilliant book on security, more of descriptive approach but very interesting Secrets and Lies: Digital Security in a Networked World).

This post aims to detail the creation (in python) of one of the simplest forms of encryption; the simple Caesar shift (or shift cipher). The Caesar shift takes the normal alphabet and maps it to a an identical alphabet with a rotation. The cipher will be written in such a way that it can be easily expanded on to create more complex encryption schemes with little modification.

Caesar shift alphabet

Caesar shift alphabet diagram

The above image shows a diagrammatic representation of a Caesar shift of 3 (alphabet transposed onto a rotation of itself with a displacement of 3). Below shows the entire alphabet in plaintext and in ciphertext, followed by a simple sentence in plaintext and in cipher text.

Plaintext: abcdefghijklmnopqrstuvwxyz
Ciphertext: defghijklmnopqrstuvwxyzabc
Plaintext: the cat sat on the mat
Ciphertext: wkh fdw vdw rq wkh pdw

Below is the code to convert plaintext into ciphertext along with an example of the usage:

def caesar(plaintext,shift):

	alphabet=["a","b","c","d","e","f","g","h","i","j","k","l",
	"m","n","o","p","q","r","s","t","u","v","w","x","y","z"]

	#Create our substitution dictionary
	dic={}
	for i in range(0,len(alphabet)):
		dic[alphabet[i]]=alphabet[(i+shift)%len(alphabet)]

	#Convert each letter of plaintext to the corrsponding
	#encrypted letter in our dictionary creating the cryptext
	ciphertext=""
	for l in plaintext.lower():
		if l in dic:
			l=dic[l]
		ciphertext+=l

	return ciphertext

#Example useage
plaintext="the cat sat on the mat"
print "Plaintext:", plaintext
print "Cipertext:",caesar(plaintext,3)
#This will result in:
#Plaintext: the cat sat on the mat
#Cipertext: wkh fdw vdw rq wkh pdw

Explanation
Here we have written a function that is given the plaintext and an arbitary rotation (including negative) and returns a ciphertext. The function creates a dictionary, mapping each letter to another letter that is ‘shift’ letters away from it in the alphabet (the modulus (%) is used so to wrap the alphabet back to the start if it falls off the end). It then converts each letter of the plain text into the corresponding encrypted letter using the dictionary.

Future
In the next post I will discuss methods to improve the Caesar shift and how to turn it into a full substitution cipher (where the alphabet is shuffled rather than rotated). The Caesar shift is very susceptible to brute forcing and frequency analysis which I shall explain and create a program to defeat these encryptions in a future post.

Read More

Python: Wordwheel / WordCube solver

Often in newspapers there is a wordwheel or some variant, whereby you have to find as many words greater than 3 letters long, containing the centre word and using the letters no more than once. I have created a webpage that generates a “WordCube” daily for people to peruse at their leisure (www.stealthcopter.com/wordcube). This post contains the code and explanation of the solutions to wordcube’s (and all other word<insert shape here>).

WordCube from http://www.stealthcopter.com/wordcube for 12/12/2009
Example WordCube image for the 12th December 2009 from www.stealthcopter.com/wordcube/2009/12/12

Below is a function I wrote to check if an input was a valid anagram (or partial anagram, as it isn’t essential to use every letter). The function works by cycling over each letter of word we are testing (word), and checks if the letter is valid (checked against chkword). If the letter is valid then it removes the letter from the original word and moves to the next letter until we run out of letters (returns True) or if the letter is invalid (returns False).

def anagramchk(word,chkword):
	for letter in word:
		if letter in chkword:
			chkword=chkword.replace(letter, '', 1)
		else:
			return False
	return True

f=open('english', 'r')
word=raw_input('Input letters (starting with mandatory letter) :')
minlen=4
count=0
for l in f:
	l=l.strip()
	if len(l)<=len(word) and len(l)>=minlen and word[0] in l and anagramchk(l,word):
		if len(l)==len(word):
			print l,'  <-- Long word'
		else:
			print l
		count+=1
f.close()
print count

This will output a list of the possible words, along with a total. The results can be seen for the WordCube in the example above here (To prevent spoiling it if you’d like to have a go at it yourself).

As always I’d be interested to see if anyone knows any faster methods or any other general improvements or comments.

The dictionary file can be found here (not perfect):
here

Read More

Python: Highest common factor of two numbers

Finding the highest common factor is useful in mathematics and cryptography.

It would be computationally expensive to calculate all the common factors of two numbers and then compare to find the highest factor in common. Instead (as usual) there is a mathematical theory that can be used to speed up this process, in this case it’s the Euclidean algorithm.

The Euclidean algorithm subtracts the smaller number from the larger number, then using this new number and the smaller number repeats this process until the numbers are equal and hence the subtraction goes to zero. This is the highest common factor.

Example:
Find the highest common factor of 252 and 105

252-105=147
147-105=42
105-42=63
63-42=21
42-21=21
21-21=0

Below is the python code to preform this

def hcf(no1,no2):
	while no1!=no2:
		if no1>no2:
			no1-=no2
		elif no2>no1:
			no2-=no1
	return no1

This method is quite quick even and scales well for larger numbers. If you know of a faster method or can see any improvements to make then please let me know.

Read More

Python: Anagram solver

Finding the solutions to an anagram can be enjoyable and used as a sign of mental agility, with examples from long running television series (countdown) to online gambling skill games. It also makes an interesting tutorial for learning some simple python skills.

We start off by writing a function to check if a given word is an anagram of another word. This will be used to check a list of dictionary words against a given word in order to find its anagrams. The function accepts two string inputs, the word we are testing (word), and the given anagram to check it against (chkword).

The function loops over every letter in the word and if it is not present in chkword then we know that it cannot be an anagram of the chkword. Otherwise it is valid and that letter is removed from the chkword so that each letter may only be used once. For example checking “netting” with “testing” would fail on the second n.

def anagramchk(word,chkword):
	for letter in word:
		if letter in chkword:
			chkword=chkword.replace(letter, '', 1)
		else:
			return 0
	return 1

Now we need to get the anagram from the user. The following using raw_input to read what the user types in the console and assign it to a variable, with the optional argument used to print text so the user knows they need to do something.

wordin=raw_input('Enter anagram:\n')

We can now join both these together and loop over our dictionary file. We strip each line to remove any extra space and gumph at the end of the lines. If the length of a line is less than 4 then it is ignored,this can be changed to whatever minimum you like, so if you want it to be the same length as the input you could replace it with len(line)==len(wordin). We then use the function already written earlier to check if the word is a valid anagram and print the word if it is. Then we finish by closing the file.

f=open('english.txt', 'r')
for line in f:
	line=line.strip()
	if len(line)>=4:
		if anagramchk(line,wordin):
			print line
f.close()

Hopefully this has been helpful to someone. As ever if you have any suggestions or improvements then please leave a comment. Below is the complete version of the code

def anagramchk(word,chkword):
	for letter in word:
		if letter in chkword:
			chkword=chkword.replace(letter, '', 1)
		else:
			return 0
	return 1

wordin=raw_input('Enter anagram:')

f=open('english.txt', 'r')
for line in f:
	line=line.strip()
	if len(line)>=4:
		if anagramchk(line,wordin):
			print line
f.close()

The dictionary file I used was one I found in /usr/share/dict/british-english (most linux distro’s should have this or similar) it can also be found here

Read More

Python: Factors of a number

Project Euler often requires the factors of a number to be found, here is the function I have written for that purpose.

def factors(n):
	fact=[1,n]
	check=2
	rootn=sqrt(n)
	while check<rootn:
		if n%check==0:
			fact.append(check)
			fact.append(n/check)
		check+=1
	if rootn==check:
		fact.append(check)
        fact.sort()
	return fact

As always any help in speeding up my function would be appreciated. I think it could be improved using a sieve method or by checking if its divisible by 2 first then using check+=2 starting at 3 to improve the speed of the loop by a factor of 2.

Below is a the commented version to help explain what’s going on if it’s not clear.

def factors(n):
	# 1 and n are automatically factors of n
	fact=[1,n]
	# starting at 2 as we have already dealt with 1
	check=2
	# calculate the square root of n and use this as the 
	# limit when checking if a number is divisible as 
	# factors above sqrt(n) will already be calculated as 
	# the inverse of a lower factor IE. finding factors of 
	# 100 only need go up to 10 (sqrt(100)=10) as factors 
	# such as 25 can be found when 5 is found to be a 
	# factor 100/5=25
	rootn=sqrt(n)
	while check<rootn:
		if n%check==0:
			fact.append(check)
			fact.append(n/check)
		check+=1
	# this line checks the sqrt of the number to see if 
	# it is a factor putting it here prevents it appearing 
	# twice in the above while loop and putting it outside
	# the loop should save some time.
	if rootn==check:
		fact.append(check)
	# return an array of factors sorted into numerial order.
        fact.sort()
	return fact
Read More

Evolutionary Algorithms in python, queens example

So a friend of mine introduced me to Evolutionary Algorithms a while back and I got some lecture notes passed onto me explaining the basics and a simple example in pseudo-code.

After a bit of work I’ve written up the queens example as an evolutionary algorithm in python. Here goes:

The queens problem
The following is the quote from wikipedia regarding the problem.

The eight queens puzzle is the problem of putting eight chess queens on an 8×8 chessboard such that none of them is able to capture any other using the standard chess queen’s moves. The queens must be placed in such a way that no two queens would be able to attack each other. Thus, a solution requires that no two queens share the same row, column, or diagonal.

queens problem chess maths python
The above image is one of the many solutions to the problem. Hint: if you are looking for many solutions once you have found one, all of its 4 rotations (if unique) are solutions too.

Check out the wikipedia article for more information

Code
This is the current code with medium amount of comments, I have also provided a heavily commented version for people with less python experience (scroll to bottom).

# -*- coding: utf-8 -*-
##################################################################
##################################################################
# QUEENS EXAMPLE
# - Program Type: Demonstration of Evolutionary alogorithms
# - Authour: Matthew Rollings
# - Date: 14/09/09
##################################################################
##################################################################


# Import randint from the random module
from random import randint

# Creates a array I just use for convenience
board=[1,2,3,4,5,6,7,8]

##################################################################
# FUNCTIONS

# Function to check amount of attacking queens
def check(array):
 
 # Collisions represent number of attacking queens, starts at zero obv.
 collisions=0
 for i in range(1,9):
  if i not in array:
   print "DUPLICATE NUMBER ERROR - KILLING STUPID GENE" # Never happen 😉
   return 0

 # Total Collisions
 # For each queen in the array
 for i in range(0, 8):
  col=0
  # For every other queen in the array
  for j in range(0, 8):
   # avoids checking self vs self
   if i!=j:
    # if queen i is on a diagonal from queen j, then the
    # difference in magnitude x-cord will equal the difference in 
    # the magnitude of the y-coord
    if abs(board[i]-board[j])==abs(array[j]-array[i]):
     # Sets a variable to tell the next part that a collision was detected
     col=1
  # If a collision was detected add one to collisions
  if col==1:
   collisions+=1
 # Return 8-colllisions, so that 0 is bad (8 attacking queens) and 8 is good (no attacking queens)
 return 8-collisions


# The reproduce function, takes two arguements, the two parents
def reproduce(array1, array2):
 
 # FIRST BABY (mother first then father)
 # Takes first half of mothers gene
 baby=array1[0:4]
 # Can't just add two halves as I will get duplicate numbers
 for i in array2:
  # add fathers genes going in order of numbers left
  if i not in baby:
   baby.append(i)
 # Add a little variation by giving a percentage chance of mutation
 if randint(0,100)>20:
  baby=mutate(baby)
 # Add the baby to the population and add its corresponding fitness
 population.append(baby)
 fitness.append(check(baby))
 
 # SECOND BABY (arrays just swapped around, Father first then mother)
 baby=array2[0:4]
 for i in array1:
  if i not in baby:
   baby.append(i)
 if randint(0,100)>20:
  baby=mutate(baby)
 population.append(baby)
 fitness.append(check(baby)) 
 

# Mutate the array given to the function
def mutate(array1):
 #Chooses two random places (WARNING CAN BE SAME POSITION)
 a=randint(0,7)
 b=randint(0,7)
 # Swaps the two over (temporary var must be used (c))
 c=array1[a]
 array1[a]=array1[b]
 array1[b]=c
 return array1

# Prints the population and corresponding fitness to the screen
# Only used for debugging
def printpop():
 # Prints the group
 for i in range(0, len(population)):
  print population[i],fitness[i]


##################################################################
# VARIABLES 
# The size of the population (how many genes, more = faster convergance but more cpu + memory, fewer = opp.)
popsize=40
# The starting variation of the group because I havn't created a better method
# for starting off. I just add arrays of 1,2,3,4,5,6,7,8 to the population and 
# give them the number of mutations between the following two values to make 
# psudo random starting group
variation=[4,14]  
# How many of the genes to kill each time in percentage, each dead gene will be replaced by a new child 🙂
die=0.40 
# Kill limit is calculated from the above percentage
kill_limit=die*popsize



##################################################################
# MAIN

population=[]
fitness=[]
for i in range(0,popsize):
 population.append([1,2,3,4,5,6,7,8])
 a=0
 while a<randint(variation[0],variation[1]):
  a+=1
  population[i]=mutate(population[i])
 fitness.append(check(population[i]))


maxi=0
generations=1

while maxi!=8:
 
 # Picks top # in group to be parents, kills rest
 killed=0
 # starting at the lowest fitness (0 and increasing untill kill limit reached)
 x=0
 while killed<kill_limit:
  for i in range(0,popsize):
   # Try is here to catch any errors
   try:
    if fitness[i]==x:
     # This bit removes the crappy gene from the population and from fitness
     population.pop(i)
     fitness.pop(i)
     # increases the kill counter
     killed+=1
    if killed==kill_limit:
     break
   except:
    break
  # increments fitness
  x+=1
 
 babies=0 
 cpop=len(population)-1 #current population
 while babies<killed:
  # produces two babies from two random parents (should prob give fittest parents preference)
  reproduce(population[randint(0,cpop)],population[randint(0,cpop)])
  babies+=2 

 generations+=1

 # Looks for highest fitness in the group and checks if any have won!
 maxi=0
 for i in range(0,popsize):
  if fitness[i]>maxi:
   maxi=fitness[i]
   if maxi==8:
    print population[i]
    break

print "Took",generations,"generations"

Quick Analysis
I did a very quick analysis keeping the kill off percentage at 0.4 and varying the population size to see the improvement on generations required before a solution is found. I’ve also added a link (scroll to bottom) to the .ods file I used to calculate this (and an excel .xls file for the windows chaps). Below is the graph I calculated for different population sizes and it looks like we would expect, decreasing as the population increases. I only did 10 averages for each point and there were some anomalies, which are to be expected with this method of programming as it essentially relies on random numbers so it is a bit noisey:

eight queens problem python

Links
Note the following might be more desirable to copy and pasting the above as they use tabs rather than spaces.
queens.py – The normal version
queens_commented.py – The heavily commented version.

evolutionary.ods – Analysis in ods format (open office)
evolutionary.xls – Analysis in excel format

If you have any corrections or suggestions please contact me.

Read More

Python: Making multi depth directories

If you try and write a file to something like ‘subfolder/file.txt’ in python it wont automatically create the subfolder for you and it will fail with:

IOError: [Errno 2] No such file or directory: 'test2/file.txt'

I imagine this is deliberate as you don’t want to be accidentally creating folders. So to make a single folder we import the os module, and use the mkdir command

import os
os.mkdir('newfolder')

This is good, but if we try and make a subdirectories using the single command then it will fail as this command will only try and create the last directory in the string you pass. IE: if you pass dir1/dir2/dir3 it will assume dir1 and dir2 already exist and try to make dir3, which will fail with:

OSError: [Errno 2] No such file or directory: 'dir1/dir2/dir3'

So in order for this to work we would need to make it recursivly create subdirectories, but thankfully someone already went to the trouble for us with ‘supermakedir’ or makedirs.

import os
os.makedirs('dir1/dir2/dir3')

So this is probably useful enough for most people, but I needed to create folders in order for a file to get dropped in it, however I was only given the full path for the file not seperated into filename and directory path.

So I made the following function that accepts an input such as ‘dir1/dir2/dir3/filename.txt’ and cuts it into the folder path using os.path.dirname(filename), and then checks if the folder exists using os.path.exists(folder), and then makes the folders if they don’t already exist.

def mkdirnotex(filename):
	folder=os.path.dirname(filename)
	if not os.path.exists(folder):
		os.makedirs(folder)
Read More