Skip to content

Posts tagged ‘computational’

Find words with the most anagrams efficiently using python

Sep 2 10
by mat

Following my previous post about 9 letter anagrams I am posting the final code I have created taking into account suggestions/snippets from Michael, Toby and Martin. Added two variables to make it nice and easy to modify what to look for.

Code

# -*- coding: utf-8 -*-
from time import time
from collections import defaultdict  

ag_len = 10 # Anagram word length
ag_min = 2  # Min # of anagrams
dictionary_path = '/usr/share/dict/british-english'
tic = time()

wd = defaultdict(set)  
for l in open (dictionary_path, 'r'):
	l=l.strip()
	if ag_len==len(l):
		wd["".join(sorted(l))].add (l)
	
for ws, wl in wd.iteritems():
	if len ( wl ) >= ag_min:
		print " ".join ( wl )


toc = time()
print toc-tic,'s' 

Explanation
The dictionary file is filtered by length into a dictionary. The key for the dictionary is the letter of the word sorted in order, IE:

"".join(sorted('arranging')) = 'aagginnrr'

With the value as the unsorted word. Because words that are an anagram of each other will be identical when sorted this means that using the add method with a dictionary will cause any anagram to share the same key. Eg:

When the dictionary gets to megatons it will create a new key in the dicitonary like so:
{'aegmnost': set(['megatons'])}

Then to magnetos
{'aegmnost': set(['magnetos', 'megatons'])}

Then to montages:
{'aegmnost': set(['magnetos', 'megatons', 'montages'])}

Then we loop over all the items in the dictionary we created and see if the length of the values is greater than the minimum value we are looking for.

All done, a very elegant and simple method to find words with several anagrams for a given word length.

Results

I was going to post the interesting 10 letter anagrams I found however I couldn’t find any with more than 2 anagrams with the dictionary I was using.

There is a 11 letter tripple anagram:

anthologies anthologise theologians

and some 8 letter with 4 or more anagrams:

painters pertains pantries repaints
resident nerdiest inserted trendies
salesmen lameness nameless maleness
strainer restrain terrains retrains trainers
altering triangle relating integral alerting
rangiest ingrates angriest gantries
parroted predator teardrop prorated
iterates teariest treatise treaties
trounces counters recounts construe

9 letter words with several anagrams

Sep 2 10
by mat

While perusing the statistics of wordcube, I was wondering how many 9 letter words have multiple anagrams (using all the letters in a single word) and what was the maximum number of anagrams. So I wrote a quick and dirty python program to find out. I will first show the results as they are interesting followed by my coding and methods to improve the efficiency of it.

Results
Here are all the nine letter words with more than 2 anagrams:

  • 1. auctioned cautioned education
  • 2. beastlier bleariest liberates
  • 3. cattiness scantiest tacitness
  • 4. countries cretinous neurotics
  • 5. cratering retracing terracing
  • 6. dissenter residents tiredness
  • 7. earthling haltering lathering
  • 8. emigrants mastering streaming
  • 9. estranges greatness sergeants
  • 10. gnarliest integrals triangles
  • 11. mutilates stimulate ultimates
  • 12. reprising respiring springier

I only found 12 sets of 3, there may be more with a larger dictionary. I was also disappointed that there were no words with 4 anagrams yet not entirely unsurprising. My personal favourite is number 10

Which is your favorite?

View Results

Loading ... Loading ...

Python

I recycled an anagram checking function that I have used before:

# -*- coding: utf-8 -*-

# Anagram checking function
def anagramchk(word,chkword):
	for letter in word:
		if letter in chkword:
			chkword=chkword.replace(letter, '', 1)
		else:
			return 0
	return 1 

First program

Firstly I created a dirty program that created a loop to cycle through the 9 letter word dictionary and another loop nested inside to check against every word in the dictionary again. This is a terrible and inefficient method and will create duplicates, I will follow with a more efficient method.

g=open('eng-9-letter', 'r')
for l in g:
	
	wordin=l.strip()

	f=open('eng-9-letter', 'r')
	count=0
	w=""
	for line in f:
		line=line.strip()
		if anagramchk(line,wordin):
			count+=1
			w+=" "+line
	f.close()
	if count>2:
		print wordin, count, "(",w,")"

g.close()

This program took 80.42s to find the 12 solutions. On the path to better coding I decided to load the dictionary into memory, this sped the code up about 20s to 63.88s.

# Load dictionary into memory
dic=[]
f=open('eng-9-letter', 'r')
for line in f:
	dic.append(line.strip())
f.close()

I then attempted to create a method that loops over and removes words from the dictionary as it loops, however I don’t know the correct way (if there is one?) of modifying the loop variable while inside the loop without causing problems.

for word in dic:
	if ....:
		dic.remove(word)

If anyone knows a good method of doing this please let me know! I did managed to hack together something using slices so that I could modify the dictionary each time, however I imagine this is still quite inefficient.

for word in dic[:]:
	w=""
	count=0
	for word2 in dic[:]:
		if anagramchk(word,word2):
			count+=1
			dic.remove(word2)
			w+=word2+" "
	if count>2:
		print w

Even so this method now avoids duplication of results and completes in 31.87s (machine running at 3.15Ghz). Please let me know of any improvements you think can be made and I’ll happily benchmark to see how much better it is.

Cracking MD5 hashes (or passwords) ultra-fast with GPU acceleration

Jun 24 10
by mat

Do you want to crack MD5 hashes in at a rate of ~300MHash/s without a massive rainbow table? Do you have a CUDA enabled GFX card? If you said yes or maybe to these questions then read on for a brief introduction on how to compile and run a CUDA accelerated MD5 cracker (coded by Benjamin “Titan” Vernoux ).

Pre-Requisites and Downloading

Building in Ubuntu 10.04

Extract the archive and do a make on the source code. When doing this I came across two problems that can be fixed by modifying the common.mk file.

Problem 1: (cannot be declared weak)

$ make
/usr/include/string.h:43: error: inline function ‘void* memcpy(void*, const void*, size_t)’ cannot be declared weak
/usr/include/string.h:64: error: inline function ‘void* memset(void*, int, size_t)’ cannot be declared weak
/usr/include/bits/string3.h:49: error: inline function ‘void* memcpy(void*, const void*, size_t)’ cannot be declared weak
/usr/include/bits/string3.h:78: error: inline function ‘void* memset(void*, int, size_t)’ cannot be declared weak
/opt/cuda/bin/../include/common_functions.h:59: error: inline function ‘void* memset(void*, int, size_t)’ cannot be declared weak
/opt/cuda/bin/../include/common_functions.h:62: error: inline function ‘void* memcpy(void*, const void*, size_t)’ cannot be declared weak
/opt/cuda/bin/../include/math_functions.h:422: error: inline function ‘int __signbit(double)’ cannot be declared weak
/opt/cuda/bin/../include/math_functions.h:427: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/opt/cuda/bin/../include/math_functions.h:440: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbit(double)’ cannot be declared weak
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
/usr/include/bits/mathinline.h:36: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/usr/include/bits/mathinline.h:42: error: inline function ‘int __signbit(double)’ cannot be declared weak
/usr/include/bits/mathinline.h:48: error: inline function ‘int __signbitl(long double)’ cannot be declared weak

Solution 1

# Debug/release configuration
ifeq ($(dbg),1)
COMMONFLAGS += -g
NVCCFLAGS += -D_DEBUG
BINSUBDIR := debug
LIBSUFFIX := D
else
##############Change the following line to have -O0 instead of -O2
COMMONFLAGS += -O0
BINSUBDIR := release
LIBSUFFIX :=
NVCCFLAGS += –compiler-options -fno-strict-aliasing
CXXFLAGS += -fno-strict-aliasing
CFLAGS += -fno-strict-aliasing
endif

Problem 2: (lcudart)

$ make
/usr/bin/ld: skipping incompatible /opt/cuda/lib/libcudart.so when searching for -lcudart
/usr/bin/ld: skipping incompatible /opt/cuda/lib/libcudart.so when searching for -lcudart
/usr/bin/ld: cannot find -lcudart
collect2: ld returned 1 exit status
make: *** [bin/linux/release/gpu_md5_crack_0.2.3] Error 1

Solution 2

############## Change lib to lib64 if using a 64 bit operating system
LIB := -L$(CUDA_INSTALL_PATH)/lib64 -L$(LIBDIR) -L$(COMMONDIR)/lib64/$(OSLOWER) -L$(NVIDIA_SDK_PATH)/lib

Remember that you should “make clean” in-between each attempt to compile.

Benchmarking

Once it has compiled nicely you can give it a testdrive with its build in benchmark (with an NVIDIA 260 GFX card). Just run with the -b option:

./gpu_md5_crack_0.2.3 -b
GPU_MD5_Crack v0.2.3 09 July 2009 LGPL for BackTrack 4.
Copyright (C) 2009 TitanMKD (titanmkd@gmail.com).

Benchmark Start
Using default CUDA GPU device:0
Cuda device ID:0, Device name:GeForce GTX 260, supporting CUDA:1.3,
multiProcessorCount:27, clockRate:1466.00 MHz, TotalMem:895.31 MB
******* Test 0 Start *******
Expected Password: 1234567890
MD5 Hash:e807f1fcf82d132f9bb018ca6738a19f, Start Password:1200000000, Total pwd to check:1000000000
Charset used 0:0123456789
MD5 brute force started

MD5 Cracked pwd=1234567890 hash=e807f1fcf82d132f9bb018ca6738a19f
Instant 200.02 Mhash/s(40.00 ms)
Average 190.49 Mhash/s, Total Time:0.21s(210.00 ms)
MD5 brute force finished
******* Test 0 End *******

******* Test 1 Start *******
Expected Password: azerty
MD5 Hash:ab4f63f9ac65152575886860dde480a1, Start Password:, Total pwd to check:1000000000
Charset used 1:abcdefghijklmnopqrstuvwxyz
MD5 brute force started

MD5 Cracked pwd=azerty hash=ab4f63f9ac65152575886860dde480a1
Instant 200.02 Mhash/s(40.00 ms)
Average 240.02 Mhash/s, Total Time:0.10s(100.00 ms)
MD5 brute force finished
******* Test 1 End *******

******* Test 2 Start *******
Expected Password: azer09
MD5 Hash:41b9cabe6033932eb3037fc933060adc, Start Password:, Total pwd to check:1000000000
Charset used 2:abcdefghijklmnopqrstuvwxyz0123456789
MD5 brute force started
Progress 5%, Pwd:6lmea, Instant 280.02 Mhash/s(28.57 ms)
MD5 Cracked pwd=azer09 hash=41b9cabe6033932eb3037fc933060adc
Instant 266.69 Mhash/s(30.00 ms)
Average 287.20 Mhash/s, Total Time:0.39s(390.00 ms)
MD5 brute force finished
******* Test 2 End *******

******* Test 3 Start *******
Expected Password: AZBVSD
MD5 Hash:fd049008572788d60140aaead79336cc, Start Password:, Total pwd to check:1000000000
Charset used 3:ABCDEFGHIJKLMNOPQRSTUVWXYZ
MD5 brute force started

MD5 Cracked pwd=AZBVSD hash=fd049008572788d60140aaead79336cc
Instant 266.69 Mhash/s(30.00 ms)
Average 240.02 Mhash/s, Total Time:0.10s(100.00 ms)
MD5 brute force finished
******* Test 3 End *******

******* Test 4 Start *******
Expected Password: AZ09AA
MD5 Hash:7a552dd9cdd49acc5320bad9c29c9722, Start Password:, Total pwd to check:1000000000
Charset used 4:ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
MD5 brute force started
Progress 5%, Pwd:6LMEA, Instant 266.69 Mhash/s(30.00 ms)
MD5 Cracked pwd=AZ09AA hash=7a552dd9cdd49acc5320bad9c29c9722
Instant 266.69 Mhash/s(30.00 ms)
Average 280.02 Mhash/s, Total Time:0.40s(400.00 ms)
MD5 brute force finished
******* Test 4 End *******

******* Test 5 Start *******
Expected Password: zaZAab
MD5 Hash:aef49f70bb7b923b8bc0a018f916ef64, Start Password:zCAAAA, Total pwd to check:1000000000
Charset used 5:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
MD5 brute force started
Progress 17%, Pwd:zaDpoA, Instant 280.02 Mhash/s(28.57 ms)
MD5 Cracked pwd=zaZAab hash=aef49f70bb7b923b8bc0a018f916ef64
Instant 266.69 Mhash/s(30.00 ms)
Average 283.10 Mhash/s, Total Time:0.65s(650.00 ms)
MD5 brute force finished
******* Test 5 End *******

******* Test 6 Start *******
Expected Password: za0ZA9
MD5 Hash:062cc3b1302759722f48ac0b95b75803, Start Password:zaAAAA, Total pwd to check:1000000000
Charset used 6:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
MD5 brute force started

MD5 Cracked pwd=za0ZA9 hash=062cc3b1302759722f48ac0b95b75803
Instant 266.69 Mhash/s(30.00 ms)
Average 266.69 Mhash/s, Total Time:0.06s(60.00 ms)
MD5 brute force finished
******* Test 6 End *******

******* Test 7 Start *******
Expected Password: a^-*|
MD5 Hash:cf7dcf4c3eeb6255668393242fcce273, Start Password:a0000, Total pwd to check:1000000000
Charset used 7: !”#$%&’()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
MD5 brute force started

MD5 Cracked pwd=a^-*| hash=cf7dcf4c3eeb6255668393242fcce273
Instant 266.69 Mhash/s(30.00 ms)
Average 266.69 Mhash/s, Total Time:0.15s(150.00 ms)
MD5 brute force finished
******* Test 7 End *******

Benchmark End

So from the benchmark you can see that we are getting between 200 and 300 Mhash/s, that is about 250,000,000 hash attempts per second! AMAZING!!!

Number of combinations for different alphabets

Length 0-9 a-z a-z0-9 a-zA-Z a-zA-Z0-9
1 10 26 36 52 62
2 100 676 1,296 2,704 3,844
3 1,000 17,576 46,656 140,608 238,328
4 10,000 456,976 1,679,616 7,311,616 14,776,336
5 100,000 11,881,376 60,466,176 380,204,032 916,132,832
6 1,000,000 308,915,776 2,176,782,336 19,770,609,664 56,800,235,584
7 10,000,000 8,031,810,176 78,364,164,096 1,028,071,702,528 3,521,614,606,208
8 100,000,000 208,827,064,576 2,821,109,907,456 53,459,728,531,456 218,340,105,584,896
9 1,000,000,000 5,429,503,678,976 101,559,956,668,416 2,779,905,883,635,710 13,537,086,546,263,600
10 10,000,000,000 141,167,095,653,376 3,656,158,440,062,980 144,555,105,949,057,000 839,299,365,868,340,000

Estimated time (in seconds) to crack (at 250MHash/s)

Length 0-9 a-z a-z0-9 a-zA-Z a-zA-Z0-9
1 0.00 0.00 0.00 0.00 0.00
2 0.00 0.00 0.00 0.00 0.00
3 0.00 0.00 0.00 0.00 0.00
4 0.00 0.00 0.00 0.01 0.03
5 0.00 0.02 0.12 0.76 1.83
6 0.00 0.62 4.35 39.54 113.60
7 0.02 16.06 156.73 2,056.14 7,043.23
8 0.20 417.65 5,642.22 106,919.46 436,680.21
9 2.00 10,859.01 203,119.91 5,559,811.77 27,074,173.09
10 20.00 282,334.19 7,312,316.88 289,110,211.90 1,678,598,731.74

Full calculations avaliable here: MD5 hash cracking time using GPU accelerated brute forcing

What now?
Well you can crack MD5′s at an extremely accelerated rate, so enjoy doing so responsibly (let your morals guide you :P ). You could also explore the source code and make additions as you see fit, I am planning on modifying it to allow an extra parameter so that prefixes can be added if you already know how the password starts. This can be the case when someone has prefixed the password with a known salt.

Compiling and running CUDA 2.3 SDK and toolkit on ubuntu 9.10 x64 (64-bit)

Feb 20 10
by mat

I’ve heard a lot about CUDA, such as how it is 10,000% faster at cracking wireless passwords over a conventional program/hardware, but never really got around to testing it out before now. This post details the steps required to compile and setup CUDA 2.3 SDK and toolkit on ubuntu 9.10.

Downloads
You are required to have an Nvidia graphics driver (relatively new version) already installed. First download the CUDA toolkit and CUDA sdk from the Nvidia CUDA 2.3 download page.

Install the toolkit

# Make file executable
chmod +x cudatoolkit_2.3_linux_64_ubuntu9.04.run
# Run it as superuser
sudo ./cudatoolkit_2.3_linux_64_ubuntu9.04.run

You now need to edit your .bashrc file in your home directory to include the paths (so your CUDA binaries can be found by the system)

export PATH=${PATH}:/usr/local/cuda/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64

Note if you are using 32bit then “lib64″ should be replaced with just “lib”

Install the SDK

# Make file executable
chmod +x cudasdk_2.3_linux.run
# Run it as normal user
./cudasdk_2.3_linux.run

You should now have a NVIDIA_GPU_Computing_SDK folder in your home directory. Change directory into the C folder inside this one.

cd NVIDIA_GPU_Computing_SDK/C

In this folder is a make file which will compile all the Nvidia SDK and all the demos, in order for this to work in ubuntu 9.10 (x64) you will need to install several dependencies. By installing these before attempting to make will save you a lot of time, if you are getting errors please scroll down to the problems section to see if they are already covered.

# Install the necessary libraries
sudo apt-get install freeglut3 freeglut3-dev libx11-dev mesa-common-dev libxmu6

Making and running demos

You can then run the make command, once this is ran all of the executables will be placed in NVIDIA_GPU_Computing_SDK/C/bin/linux/released . We can check that our computer has an useable CUDA device install by running the deviceQuery program:

cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/released
./deviceQuery

This should output something similar to the following:

# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: "GeForce GTX 260"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         3
  Total amount of global memory:                 938803200 bytes
  Number of multiprocessors:                     27
  Number of cores:                               216
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.47 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

Test PASSED

Now that we can see CUDA is successfully installed and a suitable device is found we can run some of nvidia’s more ascetically pleasing demos:

./fluidsGL

CUDA SDK example fluidsGL on ubuntu 9.10 x64

CUDA SDK example fluidsGL on ubuntu 9.10 x64

./smokeParticles

CUDA SDK example smokeparticles on ubuntu 9.10 x64

CUDA SDK example smokeparticles on ubuntu 9.10 x64

./particles

CUDA SDK example particles on ubuntu 9.10 x64

CUDA SDK example particles on ubuntu 9.10 x64

./postProcessGL

CUDA SDK example postProcessGL on ubuntu 9.10 x64 (teapot)

CUDA SDK example postProcessGL on ubuntu 9.10 x64 (teapot)

Problems


libxi (Nvidia forum link)

make[1]: Leaving directory `/home/mat/NVIDIA_GPU_Computing_SDK/C/common'
make[1]: Entering directory `/home/mat/NVIDIA_GPU_Computing_SDK/C/common'
In file included from ./../common/inc/paramgl.h:24,
                 from src/paramgl.cpp:19:
./../common/inc/GL/glut.h:60:20: error: GL/glu.h: No such file or directory
make[1]: *** [obj/release/paramgl.cpp.o] Error 1
make[1]: Leaving directory `/home/mat/NVIDIA_GPU_Computing_SDK/C/common'
make: *** [lib/libparamgl.so] Error 2
sudo apt-get install freeglut3 freeglut3-dev libx11-dev mesa-common-dev
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
/usr/include/bits/mathinline.h:36: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/usr/include/bits/mathinline.h:42: error: inline function ‘int __signbit(double)’ cannot be declared weak
/usr/include/bits/mathinline.h:48: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
/usr/local/cuda/bin/../include/math_functions.h:442: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
make[1]: *** [obj/release/particleSystem.cu.o] Error 255
make[1]: Leaving directory `/home/mat/NVIDIA_GPU_Computing_SDK/C/src/particles'
make: *** [src/particles/Makefile.ph_build] Error 2

The problem is due to having gcc 4.4 installed rather than 4.3, it is possible to install the older version of this compiler but it is simpler to modify common/common.mk and add the following extra flag (Nvidia forum link):

# Change:
NVCCFLAGS += --compiler-options -fno-strict-aliasing
# To:
NVCCFLAGS += --compiler-options -fno-strict-aliasing --compiler-options -fno-inline

and change the -O2

# Change:
COMMONFLAGS += -O2
# To: 
COMMONFLAGS += -O0

The two remaining errors you may encounter are very similar and arrise from missing libraries:

libxi (Nvidia forum link)

/usr/bin/ld: cannot find -lXi
collect2: ld returned 1 exit status
make[1]: *** [../../bin/linux/release/particles] Error 1
sudo apt-get install libxi-dev

libxmu (Nvidia forum link)

/usr/bin/ld: cannot find -lXmu
collect2: ld returned 1 exit status
make[1]: *** [../../bin/linux/release/particles] Error 1
sudo apt-get install libxmu-dev libxmu6

Python: Cryptography Caesar shift encryption (shift cipher)

Dec 21 09
by mat

I have always had a keen interest in cryptography and rather than give a brief history of cryptography I will recommend reading Simon Singh’s The code book or for a modern and hands on approach Applied Cryptography by Bruce Schneier (Who also made a brilliant book on security, more of descriptive approach but very interesting Secrets and Lies: Digital Security in a Networked World).

This post aims to detail the creation (in python) of one of the simplest forms of encryption; the simple Caesar shift (or shift cipher). The Caesar shift takes the normal alphabet and maps it to a an identical alphabet with a rotation. The cipher will be written in such a way that it can be easily expanded on to create more complex encryption schemes with little modification.

Caesar shift alphabet

Caesar shift alphabet diagram

The above image shows a diagrammatic representation of a Caesar shift of 3 (alphabet transposed onto a rotation of itself with a displacement of 3). Below shows the entire alphabet in plaintext and in ciphertext, followed by a simple sentence in plaintext and in cipher text.

Plaintext: abcdefghijklmnopqrstuvwxyz
Ciphertext: defghijklmnopqrstuvwxyzabc
Plaintext: the cat sat on the mat
Ciphertext: wkh fdw vdw rq wkh pdw

Below is the code to convert plaintext into ciphertext along with an example of the usage:

def caesar(plaintext,shift):

	alphabet=["a","b","c","d","e","f","g","h","i","j","k","l",
	"m","n","o","p","q","r","s","t","u","v","w","x","y","z"]

	#Create our substitution dictionary
	dic={}
	for i in range(0,len(alphabet)):
		dic[alphabet[i]]=alphabet[(i+shift)%len(alphabet)]

	#Convert each letter of plaintext to the corrsponding
	#encrypted letter in our dictionary creating the cryptext
	ciphertext=""
	for l in plaintext.lower():
		if l in dic:
			l=dic[l]
		ciphertext+=l

	return ciphertext

#Example useage
plaintext="the cat sat on the mat"
print "Plaintext:", plaintext
print "Cipertext:",caesar(plaintext,3)
#This will result in:
#Plaintext: the cat sat on the mat
#Cipertext: wkh fdw vdw rq wkh pdw

Explanation
Here we have written a function that is given the plaintext and an arbitary rotation (including negative) and returns a ciphertext. The function creates a dictionary, mapping each letter to another letter that is ‘shift’ letters away from it in the alphabet (the modulus (%) is used so to wrap the alphabet back to the start if it falls off the end). It then converts each letter of the plain text into the corresponding encrypted letter using the dictionary.

Future
In the next post I will discuss methods to improve the Caesar shift and how to turn it into a full substitution cipher (where the alphabet is shuffled rather than rotated). The Caesar shift is very susceptible to brute forcing and frequency analysis which I shall explain and create a program to defeat these encryptions in a future post.

Python: Highest common factor of two numbers

Dec 8 09
by mat

Finding the highest common factor is useful in mathematics and cryptography.

It would be computationally expensive to calculate all the common factors of two numbers and then compare to find the highest factor in common. Instead (as usual) there is a mathematical theory that can be used to speed up this process, in this case it’s the Euclidean algorithm.

The Euclidean algorithm subtracts the smaller number from the larger number, then using this new number and the smaller number repeats this process until the numbers are equal and hence the subtraction goes to zero. This is the highest common factor.

Example:
Find the highest common factor of 252 and 105

252-105=147
147-105=42
105-42=63
63-42=21
42-21=21
21-21=0

Below is the python code to preform this

def hcf(no1,no2):
	while no1!=no2:
		if no1>no2:
			no1-=no2
		elif no2>no1:
			no2-=no1
	return no1

This method is quite quick even and scales well for larger numbers. If you know of a faster method or can see any improvements to make then please let me know.

Python: Anagram solver

Nov 18 09
by mat

Finding the solutions to an anagram can be enjoyable and used as a sign of mental agility, with examples from long running television series (countdown) to online gambling skill games. It also makes an interesting tutorial for learning some simple python skills.

We start off by writing a function to check if a given word is an anagram of another word. This will be used to check a list of dictionary words against a given word in order to find its anagrams. The function accepts two string inputs, the word we are testing (word), and the given anagram to check it against (chkword).

The function loops over every letter in the word and if it is not present in chkword then we know that it cannot be an anagram of the chkword. Otherwise it is valid and that letter is removed from the chkword so that each letter may only be used once. For example checking “netting” with “testing” would fail on the second n.

def anagramchk(word,chkword):
	for letter in word:
		if letter in chkword:
			chkword=chkword.replace(letter, '', 1)
		else:
			return 0
	return 1

Now we need to get the anagram from the user. The following using raw_input to read what the user types in the console and assign it to a variable, with the optional argument used to print text so the user knows they need to do something.

wordin=raw_input('Enter anagram:\n')

We can now join both these together and loop over our dictionary file. We strip each line to remove any extra space and gumph at the end of the lines. If the length of a line is less than 4 then it is ignored,this can be changed to whatever minimum you like, so if you want it to be the same length as the input you could replace it with len(line)==len(wordin). We then use the function already written earlier to check if the word is a valid anagram and print the word if it is. Then we finish by closing the file.

f=open('english.txt', 'r')
for line in f:
	line=line.strip()
	if len(line)>=4:
		if anagramchk(line,wordin):
			print line
f.close()

Hopefully this has been helpful to someone. As ever if you have any suggestions or improvements then please leave a comment. Below is the complete version of the code

def anagramchk(word,chkword):
	for letter in word:
		if letter in chkword:
			chkword=chkword.replace(letter, '', 1)
		else:
			return 0
	return 1

wordin=raw_input('Enter anagram:')

f=open('english.txt', 'r')
for line in f:
	line=line.strip()
	if len(line)>=4:
		if anagramchk(line,wordin):
			print line
f.close()

The dictionary file I used was one I found in /usr/share/dict/british-english (most linux distro’s should have this or similar) it can also be found here

Python: Factors of a number

Nov 10 09
by mat

Project Euler often requires the factors of a number to be found, here is the function I have written for that purpose.

def factors(n):
	fact=[1,n]
	check=2
	rootn=sqrt(n)
	while check<rootn:
		if n%check==0:
			fact.append(check)
			fact.append(n/check)
		check+=1
	if rootn==check:
		fact.append(check)
        fact.sort()
	return fact

As always any help in speeding up my function would be appreciated. I think it could be improved using a sieve method or by checking if its divisible by 2 first then using check+=2 starting at 3 to improve the speed of the loop by a factor of 2.

Below is a the commented version to help explain what’s going on if it’s not clear.

def factors(n):
	# 1 and n are automatically factors of n
	fact=[1,n]
	# starting at 2 as we have already dealt with 1
	check=2
	# calculate the square root of n and use this as the 
	# limit when checking if a number is divisible as 
	# factors above sqrt(n) will already be calculated as 
	# the inverse of a lower factor IE. finding factors of 
	# 100 only need go up to 10 (sqrt(100)=10) as factors 
	# such as 25 can be found when 5 is found to be a 
	# factor 100/5=25
	rootn=sqrt(n)
	while check<rootn:
		if n%check==0:
			fact.append(check)
			fact.append(n/check)
		check+=1
	# this line checks the sqrt of the number to see if 
	# it is a factor putting it here prevents it appearing 
	# twice in the above while loop and putting it outside
	# the loop should save some time.
	if rootn==check:
		fact.append(check)
	# return an array of factors sorted into numerial order.
        fact.sort()
	return fact

Evolutionary Algorithms in python, queens example

Sep 14 09
by mat

So a friend of mine introduced me to Evolutionary Algorithms a while back and I got some lecture notes passed onto me explaining the basics and a simple example in pseudo-code.

After a bit of work I’ve written up the queens example as an evolutionary algorithm in python. Here goes:

The queens problem
The following is the quote from wikipedia regarding the problem.

The eight queens puzzle is the problem of putting eight chess queens on an 8×8 chessboard such that none of them is able to capture any other using the standard chess queen’s moves. The queens must be placed in such a way that no two queens would be able to attack each other. Thus, a solution requires that no two queens share the same row, column, or diagonal.

queens problem chess maths python
The above image is one of the many solutions to the problem. Hint: if you are looking for many solutions once you have found one, all of its 4 rotations (if unique) are solutions too.

Check out the wikipedia article for more information

Code
This is the current code with medium amount of comments, I have also provided a heavily commented version for people with less python experience (scroll to bottom).

# -*- coding: utf-8 -*-
##################################################################
##################################################################
# QUEENS EXAMPLE
# - Program Type: Demonstration of Evolutionary alogorithms
# - Authour: Matthew Rollings
# - Date: 14/09/09
##################################################################
##################################################################


# Import randint from the random module
from random import randint

# Creates a array I just use for convenience
board=[1,2,3,4,5,6,7,8]

##################################################################
# FUNCTIONS

# Function to check amount of attacking queens
def check(array):
 
 # Collisions represent number of attacking queens, starts at zero obv.
 collisions=0
 for i in range(1,9):
  if i not in array:
   print "DUPLICATE NUMBER ERROR - KILLING STUPID GENE" # Never happen ;) 
   return 0

 # Total Collisions
 # For each queen in the array
 for i in range(0, 8):
  col=0
  # For every other queen in the array
  for j in range(0, 8):
   # avoids checking self vs self
   if i!=j:
    # if queen i is on a diagonal from queen j, then the
    # difference in magnitude x-cord will equal the difference in 
    # the magnitude of the y-coord
    if abs(board[i]-board[j])==abs(array[j]-array[i]):
     # Sets a variable to tell the next part that a collision was detected
     col=1
  # If a collision was detected add one to collisions
  if col==1:
   collisions+=1
 # Return 8-colllisions, so that 0 is bad (8 attacking queens) and 8 is good (no attacking queens)
 return 8-collisions


# The reproduce function, takes two arguements, the two parents
def reproduce(array1, array2):
 
 # FIRST BABY (mother first then father)
 # Takes first half of mothers gene
 baby=array1[0:4]
 # Can't just add two halves as I will get duplicate numbers
 for i in array2:
  # add fathers genes going in order of numbers left
  if i not in baby:
   baby.append(i)
 # Add a little variation by giving a percentage chance of mutation
 if randint(0,100)>20:
  baby=mutate(baby)
 # Add the baby to the population and add its corresponding fitness
 population.append(baby)
 fitness.append(check(baby))
 
 # SECOND BABY (arrays just swapped around, Father first then mother)
 baby=array2[0:4]
 for i in array1:
  if i not in baby:
   baby.append(i)
 if randint(0,100)>20:
  baby=mutate(baby)
 population.append(baby)
 fitness.append(check(baby)) 
 

# Mutate the array given to the function
def mutate(array1):
 #Chooses two random places (WARNING CAN BE SAME POSITION)
 a=randint(0,7)
 b=randint(0,7)
 # Swaps the two over (temporary var must be used (c))
 c=array1[a]
 array1[a]=array1[b]
 array1[b]=c
 return array1

# Prints the population and corresponding fitness to the screen
# Only used for debugging
def printpop():
 # Prints the group
 for i in range(0, len(population)):
  print population[i],fitness[i]


##################################################################
# VARIABLES 
# The size of the population (how many genes, more = faster convergance but more cpu + memory, fewer = opp.)
popsize=40
# The starting variation of the group because I havn't created a better method
# for starting off. I just add arrays of 1,2,3,4,5,6,7,8 to the population and 
# give them the number of mutations between the following two values to make 
# psudo random starting group
variation=[4,14]  
# How many of the genes to kill each time in percentage, each dead gene will be replaced by a new child :) 
die=0.40 
# Kill limit is calculated from the above percentage
kill_limit=die*popsize



##################################################################
# MAIN

population=[]
fitness=[]
for i in range(0,popsize):
 population.append([1,2,3,4,5,6,7,8])
 a=0
 while a<randint(variation[0],variation[1]):
  a+=1
  population[i]=mutate(population[i])
 fitness.append(check(population[i]))


maxi=0
generations=1

while maxi!=8:
 
 # Picks top # in group to be parents, kills rest
 killed=0
 # starting at the lowest fitness (0 and increasing untill kill limit reached)
 x=0
 while killed<kill_limit:
  for i in range(0,popsize):
   # Try is here to catch any errors
   try:
    if fitness[i]==x:
     # This bit removes the crappy gene from the population and from fitness
     population.pop(i)
     fitness.pop(i)
     # increases the kill counter
     killed+=1
    if killed==kill_limit:
     break
   except:
    break
  # increments fitness
  x+=1
 
 babies=0 
 cpop=len(population)-1 #current population
 while babies<killed:
  # produces two babies from two random parents (should prob give fittest parents preference)
  reproduce(population[randint(0,cpop)],population[randint(0,cpop)])
  babies+=2 

 generations+=1

 # Looks for highest fitness in the group and checks if any have won!
 maxi=0
 for i in range(0,popsize):
  if fitness[i]>maxi:
   maxi=fitness[i]
   if maxi==8:
    print population[i]
    break

print "Took",generations,"generations"

Quick Analysis
I did a very quick analysis keeping the kill off percentage at 0.4 and varying the population size to see the improvement on generations required before a solution is found. I’ve also added a link (scroll to bottom) to the .ods file I used to calculate this (and an excel .xls file for the windows chaps). Below is the graph I calculated for different population sizes and it looks like we would expect, decreasing as the population increases. I only did 10 averages for each point and there were some anomalies, which are to be expected with this method of programming as it essentially relies on random numbers so it is a bit noisey:

eight queens problem python

Links
Note the following might be more desirable to copy and pasting the above as they use tabs rather than spaces.
queens.py – The normal version
queens_commented.py – The heavily commented version.

evolutionary.ods – Analysis in ods format (open office)
evolutionary.xls – Analysis in excel format

If you have any corrections or suggestions please contact me.