Skip to content

Python: Cryptography decoding a Caesar shift (frequency analysis)

by mat on January 4th, 2010

Due to the simple nature of the Caesar cipher, it could easily be brute forced by trying all possible 25 keys and then looking by eye to see if the plaintext was revealed (this too can be automated by checking for common English words to see if the solution was probable). However the much more elegant method of frequency analysis can be used.

Below is a table of the frequency of letters in the English language:

Letter Frequency (percent) Frequency (decimal) Normalised Frequency
a 8.17% 0.08167 0.64297
b 1.49% 0.01492 0.11746
c 2.78% 0.02782 0.21902
d 4.25% 0.04253 0.33483
e 12.70% 0.12702 1.00000
f 2.23% 0.02228 0.17541
g 2.02% 0.02015 0.15864
h 6.09% 0.06094 0.47977
i 6.97% 0.06966 0.54842
j 0.15% 0.00153 0.01205
k 0.77% 0.00772 0.06078
l 4.03% 0.04025 0.31688
m 2.41% 0.02406 0.18942
n 6.75% 0.06749 0.53133
o 7.51% 0.07507 0.59101
p 1.93% 0.01929 0.15187
q 0.10% 0.00095 0.00748
r 5.99% 0.05987 0.47134
s 6.33% 0.06327 0.49811
t 9.06% 0.09056 0.71296
u 2.76% 0.02758 0.21713
v 0.98% 0.00978 0.07700
w 2.36% 0.02360 0.18580
x 0.15% 0.00150 0.01181
y 1.97% 0.01974 0.15541
z 0.07% 0.00074 0.00583

And shown graphically:

English Letter Frequency

English Letter Frequency

Using the following code we can use frequency analysis to find the solution to ciphertext created using the Caesar shift demonstrated previously (see Caesar shift and Caesar shift using makestrans).

from Numeric import *
from string import maketrans

def translator(text,alphabet,key):
	trantab = maketrans(alphabet,key)
	return text.translate(trantab)

def caesar_decode(ciphertext,s):
	alpha="abcdefghijklmnopqrstuvwxyz"
	return translator(ciphertext,alpha,alpha[-s:]+alpha[:-s])




class frequency_analysis:
	def __init__(self, ciphertext):
		self.cor=[0.64297,0.11746,0.21902,0.33483,1.00000,0.17541,
		0.15864,0.47977,0.54842,0.01205,0.06078,0.31688,0.18942,
		0.53133,0.59101,0.15187,0.00748,0.47134,0.49811,0.71296,
		0.21713,0.07700,0.18580,0.01181,0.15541,0.00583]
		self.ciphertext=ciphertext.lower()
		self.freq()
		self.min_error()
		self.key=self.minimum[0]
		self.solution=caesar_decode(self.ciphertext,self.minimum[0])

	def freq(self):
		self.arr=zeros(26,Float64)
		for l in self.ciphertext:
			x=ord(l)
			if (x>=97 and x<=122):
				self.arr[x-97]+=1.0
		self.arr/=max(self.arr)

	def error(self):
		e=0
		for i in range(0,len(self.arr)):
			e+=abs(self.arr[i]-self.cor[i])**2
		return e

	def min_error(self):
		self.minimum=[0,10000]
		for rot in range(0,25):
			e=self.error()
			print rot,e
			if e<self.minimum[1]:
				self.minimum[1]=e
				self.minimum[0]=rot
			x=self.arr[-1]
			del self.cor[-1]
			self.cor.insert(0,x)

ciphertext="ymjwj fwj ybt ydujx tk jshwduynts: tsj ymfy bnqq "+\
"uwjajsy dtzw xnxyjw kwtr wjfinsl dtzw infwd fsi tsj ymfy bnqq "+\
"uwjajsy dtzw ltajwsrjsy. ymnx nx f ajwd nrutwyfsy qjxxts yt "+\
"wjrjgjwjxujhnfqqd ktw fyyfhpx zxnsl kwjvzjshd fsfqdxnx bmnhm "+\
"wjvznwj qtsljw ufxxflj tk yjcy ns twijw yt fhmnjaj gjyyjw wjxzqyx."
FA=frequency_analysis(ciphertext)
print FA.solution

This code will calculate the error in statistical frequency for each letter squared to generate an error for each possible rotation. Using a sufficiently long piece of ciphertext this code should accurately reveal the Caesar rotation use. The table below shows the error for each rotation:

Rotation Error
0 4.11797847386
1 3.05305477067
2 3.70059678828
3 3.66330931218
4 3.5078619579
5 0.361318100755
6 3.17289666386
7 3.66072641654
8 3.39769855873
9 1.74854802027
10 2.92550921273
11 2.67524757297
12 2.86847189573
13 3.06980318397
14 2.56886153328
15 2.17180117031
16 2.24503724763
17 2.95579718798
18 1.74002183444
19 1.83328601011
20 1.74779021766
21 2.71332097813
22 1.5409364067
23 1.83209213494
24 1.54904808883

The lowest error is for 5 rotations (correctly so) with an error of 0.361318100755, the next lowest error is 22 rotations with an error of 1.5409364067. This is ~4.3x difference, which gives a very large degree of confidence to our solution and below is the deciphered text.

there are two types of encryption: one that will prevent your sister from reading your diary and one that will prevent your government. this is a very important lesson to remeberespecially for attacks using frequency analysis which require longer passage of text in order to achieve better results.

Future
The frequency analysis presented here can be used along with some other techniques in order to crack the viginere cipher.

2 Comments
  1. Jean permalink

    Hi!
    That works on your example but not at all on this one…

    ciphertext=”IKNHQHNWKZHTHNHPZKTPKAZYASNKOOAVHNPSAETKOHQHNCH\
    HZSKBZRHYKBRCBRNHIBOHYRKCHXZKSXHYKBRAZYIKNHQHNWK\
    ZHETKTAOORBVCFHYCBRORKKYNPDTRCASXBLAZYIKNHQHNWK\
    ZHETKEKNXOTANYAZYZHQHNDPQHOBLRTPOKZHPOIKNWKBWKB\
    XZKEETARRTHWOAWAOKTPKDKHOOKDKHORTHZARPKZEHFFRT\
    POZARPKZOSKVPZDCASXAZYOKPORTPOSAVLAPDZRTHLHKLFHK\
    IKTPKTAQHOAPYPRFKBYFWAZYSFHANFWEHNHDKPZDKZEHNHDK\
    PZDORNKZDAZYEHNHDKPZDAFFRTHEAWWKBXZKERTHWSAFFKT\
    PKACHFFEHRTHNORARHPROACARRFHDNKBZYORARHPROAORA\
    RHRTARXZKEOTKERKLPSXALNHOPYHZRAZYZKSAZYPYARHPZNH\
    SHZRTPORKNWYHVKSNARKNNHLBCFPSAZTAOEKZRTHETPRHTK\
    BOHEPRTKBREPZZPZDRTHKTPKLNPVANW”

    Thanks nonetheless

  2. I like the helpful information you provide in your articles.
    I’ll bookmark your blog and check again here frequently. I am quite
    certain I’ll learn lots of new stuff right here! Best
    of luck for the next!

Leave a Reply

Note: I am currently writing my thesis so probably wont have time to reply to your comment
Note: XHTML is allowed. Your email address will never be published.

Subscribe to this comment feed via RSS