Due to the simple nature of the Caesar cipher, it could easily be brute forced by trying all possible 25 keys and then looking by eye to see if the plaintext was revealed (this too can be automated by checking for common English words to see if the solution was probable). However the much more elegant method of frequency analysis can be used.

Below is a table of the frequency of letters in the English language:

```
```
Letter
Frequency (percent)
Frequency (decimal)
Normalised Frequency
a
8.17%
0.08167
0.64297
b
1.49%
0.01492
0.11746
c
2.78%
0.02782
0.21902
d
4.25%
0.04253
0.33483
e
12.70%
0.12702
1.00000
f
2.23%
0.02228
0.17541
g
2.02%
0.02015
0.15864
h
6.09%
0.06094
0.47977
i
6.97%
0.06966
0.54842
j
0.15%
0.00153
0.01205
k
0.77%
0.00772
0.06078
l
4.03%
0.04025
0.31688
m
2.41%
0.02406
0.18942
n
6.75%
0.06749
0.53133
o
7.51%
0.07507
0.59101
p
1.93%
0.01929
0.15187
q
0.10%
0.00095
0.00748
r
5.99%
0.05987
0.47134
s
6.33%
0.06327
0.49811
t
9.06%
0.09056
0.71296
u
2.76%
0.02758
0.21713
v
0.98%
0.00978
0.07700
w
2.36%
0.02360
0.18580
x
0.15%
0.00150
0.01181
y
1.97%
0.01974
0.15541
z
0.07%
0.00074
0.00583

And shown graphically:

Using the following code we can use frequency analysis to find the solution to ciphertext created using the Caesar shift demonstrated previously (see Caesar shift and Caesar shift using makestrans).

from Numeric import * from string import maketrans def translator(text,alphabet,key): trantab = maketrans(alphabet,key) return text.translate(trantab) def caesar_decode(ciphertext,s): alpha="abcdefghijklmnopqrstuvwxyz" return translator(ciphertext,alpha,alpha[-s:]+alpha[:-s]) class frequency_analysis: def __init__(self, ciphertext): self.cor=[0.64297,0.11746,0.21902,0.33483,1.00000,0.17541, 0.15864,0.47977,0.54842,0.01205,0.06078,0.31688,0.18942, 0.53133,0.59101,0.15187,0.00748,0.47134,0.49811,0.71296, 0.21713,0.07700,0.18580,0.01181,0.15541,0.00583] self.ciphertext=ciphertext.lower() self.freq() self.min_error() self.key=self.minimum[0] self.solution=caesar_decode(self.ciphertext,self.minimum[0]) def freq(self): self.arr=zeros(26,Float64) for l in self.ciphertext: x=ord(l) if (x>=97 and x<=122): self.arr[x-97]+=1.0 self.arr/=max(self.arr) def error(self): e=0 for i in range(0,len(self.arr)): e+=abs(self.arr[i]-self.cor[i])**2 return e def min_error(self): self.minimum=[0,10000] for rot in range(0,25): e=self.error() print rot,e if e<self.minimum[1]: self.minimum[1]=e self.minimum[0]=rot x=self.arr[-1] del self.cor[-1] self.cor.insert(0,x) ciphertext="ymjwj fwj ybt ydujx tk jshwduynts: tsj ymfy bnqq "+\ "uwjajsy dtzw xnxyjw kwtr wjfinsl dtzw infwd fsi tsj ymfy bnqq "+\ "uwjajsy dtzw ltajwsrjsy. ymnx nx f ajwd nrutwyfsy qjxxts yt "+\ "wjrjgjwjxujhnfqqd ktw fyyfhpx zxnsl kwjvzjshd fsfqdxnx bmnhm "+\ "wjvznwj qtsljw ufxxflj tk yjcy ns twijw yt fhmnjaj gjyyjw wjxzqyx." FA=frequency_analysis(ciphertext) print FA.solution

This code will calculate the error in statistical frequency for each letter squared to generate an error for each possible rotation. Using a sufficiently long piece of ciphertext this code should accurately reveal the Caesar rotation use. The table below shows the error for each rotation:

```
```
Rotation
Error
0
4.11797847386
1
3.05305477067
2
3.70059678828
3
3.66330931218
4
3.5078619579
5
0.361318100755
6
3.17289666386
7
3.66072641654
8
3.39769855873
9
1.74854802027
10
2.92550921273
11
2.67524757297
12
2.86847189573
13
3.06980318397
14
2.56886153328
15
2.17180117031
16
2.24503724763
17
2.95579718798
18
1.74002183444
19
1.83328601011
20
1.74779021766
21
2.71332097813
22
1.5409364067
23
1.83209213494
24
1.54904808883

The lowest error is for 5 rotations (correctly so) with an error of 0.361318100755, the next lowest error is 22 rotations with an error of 1.5409364067. This is ~4.3x difference, which gives a very large degree of confidence to our solution and below is the deciphered text.

there are two types of encryption: one that will prevent your sister from reading your diary and one that will prevent your government. this is a very important lesson to remeberespecially for attacks using frequency analysis which require longer passage of text in order to achieve better results.

**Future**

The frequency analysis presented here can be used along with some other techniques in order to crack the viginere cipher.

Hi!

That works on your example but not at all on this one…

ciphertext=”IKNHQHNWKZHTHNHPZKTPKAZYASNKOOAVHNPSAETKOHQHNCH\

HZSKBZRHYKBRCBRNHIBOHYRKCHXZKSXHYKBRAZYIKNHQHNWK\

ZHETKTAOORBVCFHYCBRORKKYNPDTRCASXBLAZYIKNHQHNWK\

ZHETKEKNXOTANYAZYZHQHNDPQHOBLRTPOKZHPOIKNWKBWKB\

XZKEETARRTHWOAWAOKTPKDKHOOKDKHORTHZARPKZEHFFRT\

POZARPKZOSKVPZDCASXAZYOKPORTPOSAVLAPDZRTHLHKLFHK\

IKTPKTAQHOAPYPRFKBYFWAZYSFHANFWEHNHDKPZDKZEHNHDK\

PZDORNKZDAZYEHNHDKPZDAFFRTHEAWWKBXZKERTHWSAFFKT\

PKACHFFEHRTHNORARHPROACARRFHDNKBZYORARHPROAORA\

RHRTARXZKEOTKERKLPSXALNHOPYHZRAZYZKSAZYPYARHPZNH\

SHZRTPORKNWYHVKSNARKNNHLBCFPSAZTAOEKZRTHETPRHTK\

BOHEPRTKBREPZZPZDRTHKTPKLNPVANW”

Thanks nonetheless

I wrote a small shell script to find the starting characters of titles in wikipedia. The interesting thing (to me) is that this is quite a different shape than the frequency list of English text in general (eg, your graph above).

Here is my graph:

http://misc.k5-stats.org/wikipedia-title-start-chars.png

And side-by-side with yours:

http://misc.k5-stats.org/text-frequencies-vs-title-frequencies.png