Archive

Posts Tagged ‘python’

Find words with the most anagrams efficiently using python

September 2nd, 2010 mat No comments

Following my previous post about 9 letter anagrams I am posting the final code I have created taking into account suggestions/snippets from Michael, Toby and Martin. Added two variables to make it nice and easy to modify what to look for.

Code

# -*- coding: utf-8 -*-
from time import time
from collections import defaultdict  

ag_len = 10 # Anagram word length
ag_min = 2  # Min # of anagrams
dictionary_path = '/usr/share/dict/british-english'
tic = time()

wd = defaultdict(set)
for l in open (dictionary_path, 'r'):
	l=l.strip()
	if ag_len==len(l):
		wd["".join(sorted(l))].add (l)

for ws, wl in wd.iteritems():
	if len ( wl ) >= ag_min:
		print " ".join ( wl )

toc = time()
print toc-tic,'s'

Explanation
The dictionary file is filtered by length into a dictionary. The key for the dictionary is the letter of the word sorted in order, IE:

"".join(sorted('arranging')) = 'aagginnrr'

With the value as the unsorted word. Because words that are an anagram of each other will be identical when sorted this means that using the add method with a dictionary will cause any anagram to share the same key. Eg:

When the dictionary gets to megatons it will create a new key in the dicitonary like so:
{'aegmnost': set(['megatons'])}

Then to magnetos
{'aegmnost': set(['magnetos', 'megatons'])}

Then to montages:
{'aegmnost': set(['magnetos', 'megatons', 'montages'])}

Then we loop over all the items in the dictionary we created and see if the length of the values is greater than the minimum value we are looking for.

All done, a very elegant and simple method to find words with several anagrams for a given word length.

Results

I was going to post the interesting 10 letter anagrams I found however I couldn’t find any with more than 2 anagrams with the dictionary I was using.

There is a 11 letter tripple anagram:

anthologies anthologise theologians

and some 8 letter with 4 or more anagrams:

painters pertains pantries repaints
resident nerdiest inserted trendies
salesmen lameness nameless maleness
strainer restrain terrains retrains trainers
altering triangle relating integral alerting
rangiest ingrates angriest gantries
parroted predator teardrop prorated
iterates teariest treatise treaties
trounces counters recounts construe
Categories: python Tags: , ,

9 letter words with several anagrams

September 2nd, 2010 mat 10 comments

While perusing the statistics of wordcube, I was wondering how many 9 letter words have multiple anagrams (using all the letters in a single word) and what was the maximum number of anagrams. So I wrote a quick and dirty python program to find out. I will first show the results as they are interesting followed by my coding and methods to improve the efficiency of it.

Results
Here are all the nine letter words with more than 2 anagrams:

  • 1. auctioned cautioned education
  • 2. beastlier bleariest liberates
  • 3. cattiness scantiest tacitness
  • 4. countries cretinous neurotics
  • 5. cratering retracing terracing
  • 6. dissenter residents tiredness
  • 7. earthling haltering lathering
  • 8. emigrants mastering streaming
  • 9. estranges greatness sergeants
  • 10. gnarliest integrals triangles
  • 11. mutilates stimulate ultimates
  • 12. reprising respiring springier

I only found 12 sets of 3, there may be more with a larger dictionary. I was also disappointed that there were no words with 4 anagrams yet not entirely unsurprising. My personal favourite is number 10

Which is your favorite?

View Results

Loading ... Loading ...

Python

I recycled an anagram checking function that I have used before:

# -*- coding: utf-8 -*-

# Anagram checking function
def anagramchk(word,chkword):
	for letter in word:
		if letter in chkword:
			chkword=chkword.replace(letter, '', 1)
		else:
			return 0
	return 1 

First program

Firstly I created a dirty program that created a loop to cycle through the 9 letter word dictionary and another loop nested inside to check against every word in the dictionary again. This is a terrible and inefficient method and will create duplicates, I will follow with a more efficient method.

g=open('eng-9-letter', 'r')
for l in g:

	wordin=l.strip()

	f=open('eng-9-letter', 'r')
	count=0
	w=""
	for line in f:
		line=line.strip()
		if anagramchk(line,wordin):
			count+=1
			w+=" "+line
	f.close()
	if count>2:
		print wordin, count, "(",w,")"

g.close()

This program took 80.42s to find the 12 solutions. On the path to better coding I decided to load the dictionary into memory, this sped the code up about 20s to 63.88s.

# Load dictionary into memory
dic=[]
f=open('eng-9-letter', 'r')
for line in f:
	dic.append(line.strip())
f.close()

I then attempted to create a method that loops over and removes words from the dictionary as it loops, however I don’t know the correct way (if there is one?) of modifying the loop variable while inside the loop without causing problems.

for word in dic:
	if ....:
		dic.remove(word)

If anyone knows a good method of doing this please let me know! I did managed to hack together something using slices so that I could modify the dictionary each time, however I imagine this is still quite inefficient.

for word in dic[:]:
	w=""
	count=0
	for word2 in dic[:]:
		if anagramchk(word,word2):
			count+=1
			dic.remove(word2)
			w+=word2+" "
	if count>2:
		print w

Even so this method now avoids duplication of results and completes in 31.87s (machine running at 3.15Ghz). Please let me know of any improvements you think can be made and I’ll happily benchmark to see how much better it is.

Installing PyCuda on ubuntu 10.04

July 8th, 2010 mat No comments

Using graphics cards to hand massively parallel tasks can now be realised in python, thanks to PyCuda a module for python which allows interaction with the CUDA libaries/binaries provided by Nvidia. To get this working in ubuntu 10.04 lucid lynx I followed this guide http://wiki.tiker.net/PyCuda/Installation/Linux/Ubuntu.

Prep / Installation:

First install CUDA, most places will tell you that CUDA is incompatible with gcc-4.3 but this is not true if you make a few small changes to the configuration.mk file, please see this post about installing CUDA in ubuntu

Install the required packages (or use package manager):

sudo apt-get install python-numpy -y
sudo apt-get install build-essential python-dev python-setuptools libboost-python-dev -y

Download pycuda from: http://pypi.python.org/pypi/pycuda

Untar the achive:

tar xzvf pycuda-0.94rc.tar.gz

Configure, make and install:

cd pycuda-0.94rc
./configure.py –cuda-root=/usr/local/cuda –cudadrv-lib-dir=/usr/lib –boost-inc-dir=/usr/include –boost-lib-dir=/usr/lib –boost-python-libname=boost_python-mt –boost-thread-libname=boost_thread-mt
make -j 4
sudo python setup.py install

Problems

I had some problems with the compliation because it was complaining about pytools missing, this was resolved by removing and reinstalling python-setuptools:

sudo apt-get remove python-setuptools
sudo apt-get install python-setuptools

Running the example

You should now be able to run the hellogpu.py demo in the examples folder and use the download-examples-from-wiki.py to download further demos from the pycuda wiki.

Next
I plan to learn how to use PyCuda and aim to make a MD5 cracker as discussed previously (here and here)

Categories: python Tags: ,

Cracking real world salted MD5 passwords in python with several dictionaries

June 28th, 2010 mat 14 comments

Recently a friend (who will remain unnamed for obvious reasons) asked me to penetration test a website he created. I found a very simple exploit where I could upload an avatar but the file was not checked to ensure it was an image, so I uploaded a php script I wrote an began exploring the server. I printed out all of the usernames, passwords and salts from the database to see how many of the 1,109 passwords could be easily cracked.

The passwords were stored as MD5 hashes with a random 6 character alphanumeric salt. To create the MD5 hash of the password the salt was prefixed to the password and then the combination was hashed. Thanks to this method we can employ a simple bruteforce/dictionary attack on the passwords. I will start with the wordlists creation, then results I obtained to keep your interest, and finally show my python code.

Creating wordlists
I already has two reasnoble sized dictionaries that I use for different things like wordcube. I used john the ripper on my double sized dictionary to create lots of common permutations on words, such as captial first letter, and a number affixed to the end. To do this you run john with the following parameters, where dic.txt is the input dictionary and dic_plus_rules.txt is the output from john with all of the additions it has made.

john –wordlist=dic.txt –rules –stdout > dic_plus_rules.txt

I also download two wordlists from openwall, one which is a list of ~3100 common passwords, and one labelled ALL that has a large amount of words (~4 million) in various languages. Because of the highly compressible nature of text the files are available in small gzip files. ALL is 11.5Mb which unzips to 41.4Mb and password 12kb which unzips to 21.8kb. There are also more wordlists avaliable for different languages, but the ALL file includes these.

The size of all of the wordlists I used is shown below:

Dictionary Combinations
English 42,987
Double-English 80,368
Double+john-rules 3,986,706
Openwall Common Passwords 3,158
Openwall ALL 3,917,116

Results

Dictionary Cracked Percentage Time
English 60 5.41% 80s
Double-English 65 5.86% 170s
Double+john-rules 116 10.46% 2.5hrs (8393s)
Openwall Common Passwords 112 10.10% 7s
Openwall All 210 18.94% 2.45hrs (8829s)
Total Passwords Obtained 254 22.90% ~5hrs

Comical passwords

Here are some of the more amusingly bad passwords, the number in brackets shows the frequency of the password.

Crap passwords: 123456 (18), password (4), 1234567 (4), 123456789 (3) 12345678 (2), 12345 (2), abc123 (2), asdfgh (2), nintendo (2), 123123, abcd1234, abcdefg, qwerty
Self-describing passwords: catholic, cowboy, creator, doger, ginger, killer, maggot, player, princess, skater, smallcock, smooth, super, superman, superstar, tester, veggie, winner, wolverine
Some other passwords:bananas, cheese, cinnamon, hampster ,DRAGON, dribble1, poopie, poopoo

Python Program

# -*- coding: utf-8 -*-
#pymd5cracker.py
import hashlib, sys
from time import time

# Change to commandline swtiches when you have the time!
hash = ""
hash_file = "hash2.csv"
wordlist = "mass_rules.txt"; 

# Read the hash file entered
try:
	hashdocument = open(hash_file,"r")
except IOError:
	print "Invalid file."
	raw_input()
	sys.exit()
else:
	# Read the csv values seperated by colons into an array
	hashes=[]
	for line in hashdocument:
		line=line.replace("\n","")
		inp = line.split(":")
		if (line.count(":")<2):
			inp.append("")
		hashes.append(inp)
	hashdocument.close();

# Read wordlist in
try:
	wordlistfile = open(wordlist,"r")
except IOError:
	print "Invalid file."
	raw_input()
	sys.exit()
else:
	pass

tested=0
cracked=0
tic = time()
for line in wordlistfile:

	line = line.replace("\n","")
	tested+=1
	for i in range(0,len(hashes)):

		m = hashlib.md5()
		m.update(hashes[i][2]+line)
		word_hash = m.hexdigest()
		if word_hash==hashes[i][1]:
			toc = time()
			cracked+=1
			hashes[i].append(line)
			print hashes[i][0]," : ", line, "\t(",time()-tic,"s)"

	# Show progress evey 1000 passwords tested
	if tested%1000==0:
		print "Cracked: ",cracked," (",tested,") ", line

# Save the output of this program so we can use again
# with another program/dictionary adding the password
# to each line we have solved.
crackout = open("pycrackout.txt","w")
for i in hashes:
	s=""
	for j in i:
		if s!="":
			s+=":"
		s+=j
	s+="\n"
	crackout.write(s)
crackout.close()

print "Passwords found: ",cracked,"/",len(hashes)
print "Wordlist Words :", test
print "Hashes computed: ",len(hashes)*tested
print "Total time taken: ",time()-tic,'s'

Next

  • Play with more dictionaries
  • Speed up code:
    • Add multi-threading: My experience with multi-threading in python is that it doesn't work well for cpu intensive tasks, if you know otherwise please let me know.
    • Have a look at PyCUDA to see if I can use my graphics card to speed up the code significantly (another type of mutli-threading really...) without having to change language like in my previous post of CUDA MD5 cracking
  • Remove hash once found to stop pointless checking
  • Add command line switches to all it to be used like a real program

How to create a cryptogram in python (random substitution cipher)

April 20th, 2010 mat 3 comments

Cryptograms are enjoyable puzzles created from a saying or phrase encrypted with a substitutional cipher. They can be fun to decipher by hand by looking for common letter combinations, doublets, guesswork, and other flaws in this encryption mechanism.

I wrote a quick python script which will accept an input text and create a random substitutional cipher and encrypt it. It then outputs the cipher alphabet and the encrypted text.

Source code:

# -*- coding: utf-8 -*-
import sys
from random import randint
from string import maketrans

if (len(sys.argv)>1):
	# Normal alphabet
	alphabet="abcdefghijklmnopqrstuvwxyz"

	# Randomly create a new cipherbet
	cipherbet=""
	left=alphabet
	for i in range(0,len(alphabet)):
		x=randint(0,len(left)-1)
		cipherbet+=left[x]
		left=left[:x]+left[x+1:]

	# Get input text to translate
	text=sys.argv[1].lower()

	trantab = maketrans(alphabet,cipherbet)
	text=text.translate(trantab)

	# Replace unused letters in cipherbet with _'s
	for i in cipherbet:
		if i not in text:
			cipherbet=cipherbet.replace(i,"_")

	# Print cipherbet (solution) and the text (cryptogram)
	print cipherbet
	print text

Example usage

python create_cipher.py “The Science gets done. And you make a neat gun. For the people who are still alive.”
b_lpievrm_acqxuj_fzdgwn_o_
dri zlmixli vidz puxi. bxp oug qbai b xibd vgx. euf dri jiujci nru bfi zdmcc bcmwi.

Categories: cryptography, python Tags: ,

Saving settings in python with YAML (an XML alternative)

February 25th, 2010 mat 2 comments

When transferring information or settings between computers and programs is is useful to use something simple and program independent. For a recent project I need to send a settings file between a PHP script and a python program, I needed something that would support trees so an INI file was out of the question, and I naturally thought of using XML, however when looking for more information I stumbled upon YAML (seemingly forgotten alternative to XML).

YAML is a recursive acronym that stands for YAML Ain’t Markup Language and below is a simplified example taken from the YAML site. The layout is similar to python with indents representing the nested layers. Semicolons are used to seperate variable names and value pairs.

name   : Joe Bloggs
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.5

To use YAML in python you will need the PyYAML library avaliable here or in via your package manager in linux. (I.E. “sudo apt-get install http://pyyaml.org/wiki/PyYAML” in ubuntu/debian).

The code for then using YAML is very simple, first we import the yaml library, then we open our settings file we have created and then use yaml’s load feature to load our settings into a python dictionary, and then finally printing it out so we can see what it contains

import yaml
f=open('settings.cfg')
settings=yaml.load(f)
print settings

Using the example YAML file above saved to settings.cfg this python script will output the following when run:

{'product': [{'sku': 'BL394D', 'price': 450.0, 'description':
'Basketball', 'quantity': 4}, {'sku': 'BL4438H', 'price': 2392.0,
'description': 'Super Hoop', 'quantity': 1}], 'total': 4443.5, 'tax':
251.41999999999999, 'name': 'Joe Bloggs'}

We can now access all the information in the file very easily (I.E. settings['name'] will give us the name Joe Bloggs).

YAML supports much much more than what is discussed here and further information can be found on the YAML website.

Categories: python Tags: , ,

Advancing PyMan (using python’s pygame to recreate pacman)

February 18th, 2010 mat No comments

A while back I followed a few tutorials on creating a pacman game in python using pygame. I took the resulting code from these tutorials and added several enhancements. Unfortunately I haven’t had enough time to continue with the code, so hopefully this code is of use to someone learning python to play with and further enhance (Who knows might end up feature complete!).

Screen shot of PyMan in action

Screen shot of PyMan in action

Enhancements over original code

  • start menu
  • sounds
  • multiple ghosts
  • not having to hold down a directional button constantly
  • can press a direction before a turn and will turn at next opertunity (like real pacman)
  • probably more stuff that I’ve forgotten about by now

Source
Source Files – The source and files, to run the game just run “python PyMan.py”

References
I believe the site where I got the original code from was learningpython.com (not 100% sure) and the following links to the tutorials:
Tutorial 1
Tutorial 2
Tutorial 3

Categories: python Tags: ,

Python: interfacing with an arduino

February 3rd, 2010 mat 3 comments

So what is an arduino?
An arduino is an open source open hardware programmable controller with several inputs and outputs. The image below shows an Ardunio Dicemella.

Ardunio Dicemella Annotated Photo

Ardunio Dicemella Annotated Photo

It (Arduino Dicemella) has 14 digital input/output pins (of which 6 can be used as PWM outputs), 6 analog inputs, a 16 MHz crystal oscillator, a USB connection, a power jack, an ICSP header, and a reset button. It contains everything needed to support the microcontroller; simply connect it to a computer with a USB cable or power it with a AC-to-DC adapter or battery to get started.

They are very useful for people who know how to program but have little experience with hardware interaction.

Programming the arduino
This post will not contain in-depth detail on how to program the arduino, instead focussing briefly on setting up serial (over serial or usb cable) communications in order to talk to a python script. The arduino can be programmed via a IDE provided by the creators in a C-style hardware language.

Code example

int ledPin = 13;            // choose the pin for the LED
int inputPin = 2;          // choose the input pin (for a pushbutton)
int val = 0;                // variable for reading the pin status

void setup() {
  pinMode(ledPin, OUTPUT);      // declare LED as output
  pinMode(inputPin, INPUT);     // declare pushbutton as input
}

void loop(){
  val = digitalRead(inputPin);  // read input value
  if (val == HIGH) {            // check if the input is HIGH
    digitalWrite(ledPin, HIGH);  // turn LED ON
  } else {
    digitalWrite(ledPin, LOW); // turn LED OFF
  }
}
Arduino LED switch circuit off

Arduino LED switch circuit off

Arduino LED switch circuit on

Arduino LED switch circuit on

Now we add a few lines to enable the writing of information from our arduino over the serial connection. We first need to set up the transfer speed in our setup (Serial.begin(9600);). Then we can simply send messages over serial using Serial.print(“message\n”);. You can choose between print and println with the difference been that the latter automatically appends the newline char, so we would use the former to write multiple things to the same line. Below is our modified code:

Serial write example

int ledPin = 13;           // choose the pin for the LED
int inputPin = 2;         // choose the input pin (for a pushbutton)
int val = 0;               // variable for reading the pin status

void setup() {
  pinMode(ledPin, OUTPUT);      // declare LED as output
  pinMode(inputPin, INPUT);     // declare pushbutton as input
  Serial.begin(9600);
  Serial.print("Program Initiated\n");
}

void loop(){
  val = digitalRead(inputPin);  // read input value
  if (val == HIGH) {            // check if the input is HIGH
    digitalWrite(ledPin, HIGH);  // turn LED ON
    Serial.print("LED Activated\n");
  } else {
    digitalWrite(ledPin, LOW); // turn LED OFF
  }
}

We now add into this code the ability to receive information via serial. Below is the modified example which removes the action of the button and replaces it by activating the LED when ‘Y’ is sent via serial.

Serial read example

int ledPin = 13;  // choose the pin for the LED
int val = 0;      // variable for reading the pin status
char msg = '  ';   // variable to hold data from serial

void setup() {
  pinMode(ledPin, OUTPUT);      // declare LED as output
  Serial.begin(9600);
  Serial.print("Program Initiated\n");
}

void loop(){
        // While data is sent over serial assign it to the msg
	while (Serial.available()>0){
		msg=Serial.read();
	}

  // Turn LED on/off if we recieve 'Y'/'N' over serial
  if (msg=='Y') {
    digitalWrite(ledPin, HIGH);  // turn LED ON
    Serial.print("LED Activated\n");
    msg=' ';
  } else if (msg=='N') {
    digitalWrite(ledPin, LOW); // turn LED OFF
  }
}

Interaction with python

First we import the serial library to python in order to communicate with the arduino (this includes talking over usb).

import serial

We then attempt to connect to our arduino on /dev/ttyUSB0, using try and except to catch an exception if we are unable to find the arduino on USB0. The 9600 corresponds to the baud rate (speed of communication) that we are using with the arduino and should be the same as set in the program on the arduino otherwise your communication may appear garbled.

try:
	arduino = serial.Serial('/dev/ttyUSB0', 9600)
except:
	print "Failed to connect on /dev/ttyUSB0"

The address will be /dev/ttyUSB# where # is replaced by a number for arduinos connected via usb and /dev/ttyS# where # is replaced by a number for arduinos connected via serial. If you are not sure of the location of your arduino, it can be found in the arduino IDE or you can write some python to scroll through possible locations until a response is found

locations=['/dev/ttyUSB0','/dev/ttyUSB1','/dev/ttyUSB2','/dev/ttyUSB3',
'/dev/ttyS0','/dev/ttyS1','/dev/ttyS2','/dev/ttyS3']

for device in locations:
	try:
		arduino = serial.Serial(device, 9600)
	except:
		print "Failed to connect on",device

You may need to be careful as other devices can be connected. For example if I try to connect to /dev/ttyS0 I will connect to the wacom tablet on my laptop.

Once you have connected to your arduino successfully you can write information to it using write and read information sent from it using read (you will need to import time to use the sleep function). If your arduino does not send any messages via serial then attempting to readline will result in your program hanging until it receives a message.

try:
	arduino.write('Y')
	time.sleep(1)
	print arduino.readline()
except:
	print "Failed to send!"

So the python code should now look like the following and we should be able to control the LED over serial.

import serial
import time

locations=['/dev/ttyUSB0','/dev/ttyUSB1','/dev/ttyUSB2','/dev/ttyUSB3',
'/dev/ttyS0','/dev/ttyS1','/dev/ttyS2','/dev/ttyS3']  

for device in locations:
	try:
		print "Trying...",device
		arduino = serial.Serial(device, 9600)
		break
	except:
		print "Failed to connect on",device   

try:
    arduino.write('Y')
    time.sleep(1)
    print arduino.readline()
except:
    print "Failed to send!"

The above will send the character ‘Y’ (Y for Yes please turn on the LED) to the arduino wait for 1 second and then read from the arduino which will have hopefully posted a response to our ‘Y’. Using the program on this should turn the LED on, and report LED Activated back via serial to our python program. This should be enough for people to get started with ardunios and communicating with them in python.

References

  • Arduino – The arduino website with everything you are likely to need (programming examples and reference guide, and hardware information)
  • Arduino tutorial – a basic and easy to understand tutorial on programming the arduino
  • Python port of arduino-serial.c – By John Wiseman from which I based my program.
  • original arduino-serial.c – by Tod E. Kurt.
  • Sparkfun – Here is a good place to purchase ardunio and other electronics parts. Try coolcomponents if your from the uk like me
  • Dealextreme – Hong Kong based retailer that sells a lot of cheap DIY electronics and also has worldwide free delivery with no min spend (crazy). Does take about two weeks to arrive though (uk).
Categories: arduino, python Tags: , ,

Python: Cryptography decoding a Caesar shift (frequency analysis)

January 4th, 2010 mat No comments

Due to the simple nature of the Caesar cipher, it could easily be brute forced by trying all possible 25 keys and then looking by eye to see if the plaintext was revealed (this too can be automated by checking for common English words to see if the solution was probable). However the much more elegant method of frequency analysis can be used.

Below is a table of the frequency of letters in the English language:

Letter Frequency (percent) Frequency (decimal) Normalised Frequency
a 8.17% 0.08167 0.64297
b 1.49% 0.01492 0.11746
c 2.78% 0.02782 0.21902
d 4.25% 0.04253 0.33483
e 12.70% 0.12702 1.00000
f 2.23% 0.02228 0.17541
g 2.02% 0.02015 0.15864
h 6.09% 0.06094 0.47977
i 6.97% 0.06966 0.54842
j 0.15% 0.00153 0.01205
k 0.77% 0.00772 0.06078
l 4.03% 0.04025 0.31688
m 2.41% 0.02406 0.18942
n 6.75% 0.06749 0.53133
o 7.51% 0.07507 0.59101
p 1.93% 0.01929 0.15187
q 0.10% 0.00095 0.00748
r 5.99% 0.05987 0.47134
s 6.33% 0.06327 0.49811
t 9.06% 0.09056 0.71296
u 2.76% 0.02758 0.21713
v 0.98% 0.00978 0.07700
w 2.36% 0.02360 0.18580
x 0.15% 0.00150 0.01181
y 1.97% 0.01974 0.15541
z 0.07% 0.00074 0.00583

And shown graphically:

English Letter Frequency

English Letter Frequency

Using the following code we can use frequency analysis to find the solution to ciphertext created using the Caesar shift demonstrated previously (see Caesar shift and Caesar shift using makestrans).

from Numeric import *
from string import maketrans

def translator(text,alphabet,key):
	trantab = maketrans(alphabet,key)
	return text.translate(trantab)

def caesar_decode(ciphertext,s):
	alpha="abcdefghijklmnopqrstuvwxyz"
	return translator(ciphertext,alpha,alpha[-s:]+alpha[:-s])

class frequency_analysis:
	def __init__(self, ciphertext):
		self.cor=[0.64297,0.11746,0.21902,0.33483,1.00000,0.17541,
		0.15864,0.47977,0.54842,0.01205,0.06078,0.31688,0.18942,
		0.53133,0.59101,0.15187,0.00748,0.47134,0.49811,0.71296,
		0.21713,0.07700,0.18580,0.01181,0.15541,0.00583]
		self.ciphertext=ciphertext.lower()
		self.freq()
		self.min_error()
		self.key=self.minimum[0]
		self.solution=caesar_decode(self.ciphertext,self.minimum[0])

	def freq(self):
		self.arr=zeros(26,Float64)
		for l in self.ciphertext:
			x=ord(l)
			if (x>=97 and x<=122):
				self.arr[x-97]+=1.0
		self.arr/=max(self.arr)

	def error(self):
		e=0
		for i in range(0,len(self.arr)):
			e+=abs(self.arr[i]-self.cor[i])**2
		return e

	def min_error(self):
		self.minimum=[0,10000]
		for rot in range(0,25):
			e=self.error()
			print rot,e
			if e<self.minimum[1]:
				self.minimum[1]=e
				self.minimum[0]=rot
			x=self.arr[-1]
			del self.cor[-1]
			self.cor.insert(0,x)

ciphertext="ymjwj fwj ybt ydujx tk jshwduynts: tsj ymfy bnqq "+\
"uwjajsy dtzw xnxyjw kwtr wjfinsl dtzw infwd fsi tsj ymfy bnqq "+\
"uwjajsy dtzw ltajwsrjsy. ymnx nx f ajwd nrutwyfsy qjxxts yt "+\
"wjrjgjwjxujhnfqqd ktw fyyfhpx zxnsl kwjvzjshd fsfqdxnx bmnhm "+\
"wjvznwj qtsljw ufxxflj tk yjcy ns twijw yt fhmnjaj gjyyjw wjxzqyx."
FA=frequency_analysis(ciphertext)
print FA.solution

This code will calculate the error in statistical frequency for each letter squared to generate an error for each possible rotation. Using a sufficiently long piece of ciphertext this code should accurately reveal the Caesar rotation use. The table below shows the error for each rotation:


Rotation Error
0 4.11797847386
1 3.05305477067
2 3.70059678828
3 3.66330931218
4 3.5078619579
5 0.361318100755
6 3.17289666386
7 3.66072641654
8 3.39769855873
9 1.74854802027
10 2.92550921273
11 2.67524757297
12 2.86847189573
13 3.06980318397
14 2.56886153328
15 2.17180117031
16 2.24503724763
17 2.95579718798
18 1.74002183444
19 1.83328601011
20 1.74779021766
21 2.71332097813
22 1.5409364067
23 1.83209213494
24 1.54904808883

The lowest error is for 5 rotations (correctly so) with an error of 0.361318100755, the next lowest error is 22 rotations with an error of 1.5409364067. This is ~4.3x difference, which gives a very large degree of confidence to our solution and below is the deciphered text.

there are two types of encryption: one that will prevent your sister from reading your diary and one that will prevent your government. this is a very important lesson to remeberespecially for attacks using frequency analysis which require longer passage of text in order to achieve better results.

Future
The frequency analysis presented here can be used along with some other techniques in order to crack the viginere cipher.

Categories: cryptography, python Tags: ,

Python: Cryptograph using maketrans for substitution and Caesar ciphers

December 22nd, 2009 mat 4 comments

I’ve rewritten the functions in my previous two posts (caesar and substitution ciphers) using the maketrans function from the strings module in python (With thanks to wumzi for pointing this out).

maketrans takes an input alphabet and an output alphabet and can then be used on a string. This can be used to greatly simplify the two ciphers I produced previously. Example below:

input_alphabet="abcde"
output_alphabet="12345"
trantab = maketrans(input_alphabet,output_alphabet)
text="translate abcdefg"
print text.translate(trantab)
# This will output:
# tr1nsl1t5 12345fg

This nows means my code can be rewritten in just a fraction of what it was before:

from random import shuffle
from string import maketrans

alphabet="abcdefghijklmnopqrstuvwxyz" + \
	 "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"+ \
         ":.;,?!@#$%&()+=-*/_<> []{}`~^"

def translator(text,alphabet,key):
	trantab = maketrans(alphabet,key)
	return text.translate(trantab)

def caesar_encode(plaintext,s,alphabet):
	return translator(plaintext,alphabet,alphabet[s:]+alphabet[:s])

def caesar_decode(ciphertext,s,alphabet):
	return translator(ciphertext,alphabet,alphabet[-s:]+alphabet[:-s])

def substitution_encode(plaintext,alphabet):
	randarray=range(0,len(alphabet))
	shuffle(randarray)
	key=""
	for i in range(0,len(alphabet)):
		key+=alphabet[randarray[i]]

	return translator(plaintext,alphabet,key),key

def substitution_decode(ciphertext,key,alphabet):
	return translator(ciphertext,key,alphabet)

# Example useage
plaintext="The wheels on the bus go round and round. round" + \
" and round. round and round. The wheels on the bus go round and"+ \
" round, all through the town!"

print
print "SUBSTITUTION"
ciphertext,key=substitution_encode(plaintext,alphabet)
print "Key: ", key
print "Plaintext:", plaintext
print "Cipertext:", ciphertext
print "Decoded  :", substitution_decode(ciphertext,key,alphabet)

print
print "CAESAR SHIFT"
ciphertext=caesar_encode(plaintext,5,alphabet)
print "Key: ", 5
print "Plaintext:", plaintext
print "Cipertext:", ciphertext
print "Decoded  :", caesar_decode(ciphertext,5,alphabet)

This will output the following

SUBSTITUTION
Key: &fywQ.%!lmx_sRGu:{<(5jqAXvMFgk]SIY[4iK+3P8pcV$2da@DT/ZnOJ*E>-r},BH16zUb#L?N`e7C ~9t)oW^=;h0
Plaintext: The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town!
Cipertext: O!Q)q!QQ_<)GR)(!Q)f5<)%G){G5Rw)&Rw){G5Rw,){G5Rw)&Rw){G5Rw,{G5Rw)&Rw){G5Rw,)O!Q)q!QQ_<)GR)(!Q)f5<)%G){G5Rw)&Rw){G5RwH)&__)(!{G5%!)(!Q)(GqR6
Decoded : The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town!

CAESAR SHIFT
Key: 5
Plaintext: The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town!
Cipertext: Ymj`2mjjqx`ts`ymj`gzx`lt`wtzsi`fsi`wtzsi@`wtzsi`fsi`wtzsi@wtzsi`fsi`wtzsi@`Ymj`2mjjqx`ts`ymj`gzx`lt`wtzsi`fsi`wtzsi$`fqq`ymwtzlm`ymj`yt2s&
Decoded : The wheels on the bus go round and round. round and round.round and round. The wheels on the bus go round and round, all through the town!

Conclusion
maketrans is awesome :)

Categories: cryptography, python Tags: ,
// unused langs // // // //