Archives for 4 Jan,2010

You are browsing the site archives by date.

Python: Cryptography decoding a Caesar shift (frequency analysis)

Due to the simple nature of the Caesar cipher, it could easily be brute forced by trying all possible 25 keys and then looking by eye to see if the plaintext was revealed (this too can be automated by checking for common English words to see if the solution was probable). However the much more elegant method of frequency analysis can be used.

Below is a table of the frequency of letters in the English language:

``` Letter Frequency (percent) Frequency (decimal) Normalised Frequency a 8.17% 0.08167 0.64297 b 1.49% 0.01492 0.11746 c 2.78% 0.02782 0.21902 d 4.25% 0.04253 0.33483 e 12.70% 0.12702 1.00000 f 2.23% 0.02228 0.17541 g 2.02% 0.02015 0.15864 h 6.09% 0.06094 0.47977 i 6.97% 0.06966 0.54842 j 0.15% 0.00153 0.01205 k 0.77% 0.00772 0.06078 l 4.03% 0.04025 0.31688 m 2.41% 0.02406 0.18942 n 6.75% 0.06749 0.53133 o 7.51% 0.07507 0.59101 p 1.93% 0.01929 0.15187 q 0.10% 0.00095 0.00748 r 5.99% 0.05987 0.47134 s 6.33% 0.06327 0.49811 t 9.06% 0.09056 0.71296 u 2.76% 0.02758 0.21713 v 0.98% 0.00978 0.07700 w 2.36% 0.02360 0.18580 x 0.15% 0.00150 0.01181 y 1.97% 0.01974 0.15541 z 0.07% 0.00074 0.00583 ```

And shown graphically:

English Letter Frequency

Using the following code we can use frequency analysis to find the solution to ciphertext created using the Caesar shift demonstrated previously (see Caesar shift and Caesar shift using makestrans).

```from Numeric import *
from string import maketrans

def translator(text,alphabet,key):
trantab = maketrans(alphabet,key)
return text.translate(trantab)

def caesar_decode(ciphertext,s):
alpha="abcdefghijklmnopqrstuvwxyz"
return translator(ciphertext,alpha,alpha[-s:]+alpha[:-s])

class frequency_analysis:
def __init__(self, ciphertext):
self.cor=[0.64297,0.11746,0.21902,0.33483,1.00000,0.17541,
0.15864,0.47977,0.54842,0.01205,0.06078,0.31688,0.18942,
0.53133,0.59101,0.15187,0.00748,0.47134,0.49811,0.71296,
0.21713,0.07700,0.18580,0.01181,0.15541,0.00583]
self.ciphertext=ciphertext.lower()
self.freq()
self.min_error()
self.key=self.minimum[0]
self.solution=caesar_decode(self.ciphertext,self.minimum[0])

def freq(self):
self.arr=zeros(26,Float64)
for l in self.ciphertext:
x=ord(l)
if (x>=97 and x<=122):
self.arr[x-97]+=1.0
self.arr/=max(self.arr)

def error(self):
e=0
for i in range(0,len(self.arr)):
e+=abs(self.arr[i]-self.cor[i])**2
return e

def min_error(self):
self.minimum=[0,10000]
for rot in range(0,25):
e=self.error()
print rot,e
if e<self.minimum[1]:
self.minimum[1]=e
self.minimum[0]=rot
x=self.arr[-1]
del self.cor[-1]
self.cor.insert(0,x)

ciphertext="ymjwj fwj ybt ydujx tk jshwduynts: tsj ymfy bnqq "+\
"uwjajsy dtzw xnxyjw kwtr wjfinsl dtzw infwd fsi tsj ymfy bnqq "+\
"uwjajsy dtzw ltajwsrjsy. ymnx nx f ajwd nrutwyfsy qjxxts yt "+\
"wjrjgjwjxujhnfqqd ktw fyyfhpx zxnsl kwjvzjshd fsfqdxnx bmnhm "+\
"wjvznwj qtsljw ufxxflj tk yjcy ns twijw yt fhmnjaj gjyyjw wjxzqyx."
FA=frequency_analysis(ciphertext)
print FA.solution
```

This code will calculate the error in statistical frequency for each letter squared to generate an error for each possible rotation. Using a sufficiently long piece of ciphertext this code should accurately reveal the Caesar rotation use. The table below shows the error for each rotation:

``` Rotation Error 0 4.11797847386 1 3.05305477067 2 3.70059678828 3 3.66330931218 4 3.5078619579 5 0.361318100755 6 3.17289666386 7 3.66072641654 8 3.39769855873 9 1.74854802027 10 2.92550921273 11 2.67524757297 12 2.86847189573 13 3.06980318397 14 2.56886153328 15 2.17180117031 16 2.24503724763 17 2.95579718798 18 1.74002183444 19 1.83328601011 20 1.74779021766 21 2.71332097813 22 1.5409364067 23 1.83209213494 24 1.54904808883 ```

The lowest error is for 5 rotations (correctly so) with an error of 0.361318100755, the next lowest error is 22 rotations with an error of 1.5409364067. This is ~4.3x difference, which gives a very large degree of confidence to our solution and below is the deciphered text.

there are two types of encryption: one that will prevent your sister from reading your diary and one that will prevent your government. this is a very important lesson to remeberespecially for attacks using frequency analysis which require longer passage of text in order to achieve better results.

Future
The frequency analysis presented here can be used along with some other techniques in order to crack the viginere cipher.