# Python: Cryptography Substitution Cipher improving on the Caesar cipher

This post builds upon the Caesar shift presented previously, converting it to a full substitution cipher. The substitution cipher makes brute-force attacks on the encryption impractical and provides a basis for more complicated ciphers.

**Substitution Cipher**

Because a Caesar shift only rotates the alphabet, there are only 25 possible unique solutions, which leaves the cipher quite vulnerable to brute force. If, rather than just rotating the alphabet and keeping it 'linear', we shuffle it, we create a substitution cipher. This improves the number of possible solutions to a shocking

(26! – 1) ≈ 4.03291461126605635584e26 (unfortunately, the substitution cipher is a lot weaker than it seems, as it is vulnerable to several different cryptanalysis attacks).
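To make the difference concrete, here is a minimal sketch of the brute-force attack on a Caesar shift next to the key count for a shuffled alphabet. The helper name `caesar_shift` and the shift of 3 are my own choices for illustration and are not taken from the previous post.

```python
import math
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(text, shift):
    """Rotate each lowercase letter by `shift` places; leave other characters alone."""
    rotated = ALPHABET[shift:] + ALPHABET[:shift]
    return text.translate(str.maketrans(ALPHABET, rotated))


message = "the cat sat on the mat"
ciphertext = caesar_shift(message, 3)

# Brute force: there are only 25 non-trivial rotations, so try every one.
candidates = [caesar_shift(ciphertext, s) for s in range(1, 26)]
print(message in candidates)  # True: one of the 25 shifts undoes the cipher

# A shuffled alphabet has 26! orderings; subtract 1 for the unshuffled one.
substitution_keys = math.factorial(26) - 1
print(substitution_keys)
```

Reading through 25 candidate plaintexts takes seconds, while trying every shuffled alphabet one by one is out of the question.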

We start with the simple Caesar shift function with the following changes:

- A new array is introduced, filled with the numbers from 0 up to (but not including) the length of the alphabet. This array is shuffled using shuffle from the random module. The alphabet substitution dictionary is created using this array to decide which letters go where (this is probably clearer in the code than in my explanation).
- With the Caesar shift we only needed to know the number of rotations in order to decrypt the text; now we need a full list of the letter substitutions. This is stored as the key and will be needed in order to decode our substitution cipher.
- We also now store the alphabet outside of the function so that it can be used in a decode function. This function was added along with some example usage to make the full process more understandable.

```python
# Written for Python 3
from random import shuffle

alphabet = "abcdefghijklmnopqrstuvwxyz"


def substitution(plaintext):
    # Create array to use to randomise alphabet position
    randarray = list(range(0, len(alphabet)))
    shuffle(randarray)

    key = ""

    # Create our substitution dictionary
    dic = {}
    for i in range(0, len(alphabet)):
        key += alphabet[randarray[i]]
        dic[alphabet[i]] = alphabet[randarray[i]]

    # Convert each letter of plaintext to the corresponding
    # encrypted letter in our dictionary creating the ciphertext
    ciphertext = ""
    for l in plaintext:
        if l in dic:
            l = dic[l]
        ciphertext += l

    # Show the alphabet with the key lined up underneath it
    for i in alphabet:
        print(i, end=" ")
    print()
    for i in key:
        print(i, end=" ")
    print()
    return ciphertext, key


# This function decodes the ciphertext using the key and creating
# the reverse of the dictionary created in substitution to retrieve
# the plaintext again
def decode(ciphertext, key):
    dic = {}
    for i in range(0, len(key)):
        dic[key[i]] = alphabet[i]

    plaintext = ""
    for l in ciphertext:
        if l in dic:
            l = dic[l]
        plaintext += l

    return plaintext


# Example usage
plaintext = "the cat sat on the mat"
ciphertext, key = substitution(plaintext)
print("Key:", key)
print("Plaintext:", plaintext)
print("Ciphertext:", ciphertext)
print("Decoded :", decode(ciphertext, key))
```

Running this will output something like the following (it will be different on each run because random is used to generate the key).

Key: miylbsowutgdkfvjepqhazrncx

Plaintext: the cat sat on the mat

Ciphertext: hwb ymh qmh vf hwb kmh

Decoded : the cat sat on the mat
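As a cross-check on the run above, the same mapping can be sketched with Python's built-in `str.maketrans` and `str.translate` instead of a hand-built dictionary. This is an alternative way of writing it, not the code used for the post, and the names `encode_with_key` and `decode_with_key` are hypothetical. The key is the one printed in the sample output.

```python
import string

ALPHABET = string.ascii_lowercase


def encode_with_key(plaintext, key):
    """Replace each letter of ALPHABET with the letter at the same position in key."""
    return plaintext.translate(str.maketrans(ALPHABET, key))


def decode_with_key(ciphertext, key):
    """Apply the reverse mapping: key letters back to ALPHABET letters."""
    return ciphertext.translate(str.maketrans(key, ALPHABET))


key = "miylbsowutgdkfvjepqhazrncx"  # key from the sample run above
message = "the cat sat on the mat"

ciphertext = encode_with_key(message, key)
print(ciphertext)                        # hwb ymh qmh vf hwb kmh
print(decode_with_key(ciphertext, key))  # the cat sat on the mat
```

Characters that are not in the alphabet, such as the space, pass through unchanged in both directions, just as they do in the dictionary version.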

**Improvement by using additional characters**

Adding additional characters into the substitution will make it more difficult to solve. For example, we can change our alphabet from:

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"
```

to one that includes capital letters, the numbers 0-9 and special characters:

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890" + \
           ":.;,?!@#$%&()+=-*/_<> []{}`~^" + \
           "abcdefghijklmnopqrstuvwxyz"
```

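We increase the number of possible solutions. A 63-character alphabet (lower-case and capital letters, the digits 0-9 and the space, which is the length of the sample key below) takes it from (26! - 1) to (63! - 1), which is 1.98260831540444006412e87; the alphabet as listed above has 91 characters, which would take it further, to (91! - 1), roughly 1.35e140. This also has the added benefit of making the encrypted text look a lot cooler and harder to guess at by eye (unfortunately it is still very easy to see which character represents the space).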

Key: MXjAarLzWqFePI7E botO5f1kym29RZd3Sh8JTiQGKVDp6YBCsU4nucNgwlx0vH

Plaintext: this plaintext will be much more of a challenge to decode compared to the caesar shift cipher

Ciphertext: tzWoHEeMWIta1tHfWeeHXaHPOjzHP7baH7rHMHjzMeeaILaHt7HAaj7AaHj7PEMbaAHt7HtzaHjMaoMbHozWrtHjWEzab

Decoded : this plaintext will be much more of a challenge to decode compared to the caesar shift cipher
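The point about the space being easy to spot can be checked with a quick character count. This is only a rough sketch using `collections.Counter` on the ciphertext shown above; the variable names are mine. It assumes the most common character in ordinary English text is the space, which holds for this sample but is not guaranteed for every message.

```python
from collections import Counter

# Ciphertext copied from the sample run above
ciphertext = ("tzWoHEeMWIta1tHfWeeHXaHPOjzHP7baH7rHMHjzMeeaILaHt7HAaj7AaH"
              "j7PEMbaAHt7HtzaHjMaoMbHozWrtHjWEzab")

# Count every character and take the most frequent one as the likely space
counts = Counter(ciphertext)
likely_space, occurrences = counts.most_common(1)[0]
print(likely_space, occurrences)  # H 16

# Splitting on that character exposes the word lengths of the plaintext
words = ciphertext.split(likely_space)
print([len(w) for w in words])
```

Once the word boundaries are known, short words such as the one-letter `M` give an attacker an easy place to start guessing.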

**Future**

The substitution cipher is a lot more secure than the Caesar shift cipher but unfortunately is very vulnerable to frequency analysis. In future posts I will address using frequency analysis and methods to prevent this type of attack, as well as improving on this cipher by using multiple dictionaries to create Vigenère style ciphers.

I imagine most people reading this will enjoy the simple challenge of solving some encrypted text. I have used this code to make some ciphertext, so try to decode it! (Extra points for knowing where it is from.)

^’VtuBbtv3vut1u-w.G^’Vt&vnu-tZBtZnuIvwvtwn

-qbtuB6GN3vutbqBS-qt.BSt&wB~vtV.tqv1wbG}u5t

~nDDv5tVvG}u5tbBwvtVvtbBt@nvIvZG}u5tbqwv6tv

3vw.t@nvIvtnubBt1tUnwvG}Ztbqv.t&Swuv5tnbtqS

wbt&vI1SZvt^t61ZtZBtq1@@.tUBwt.BSm)B6tbqvZv

t@BnubZtBUt51b1tV1~vt1t&v1SbnUSDtDnuvG}u5t6

v’wvtBSbtBUt&vb1GQv’wvtwvDv1Znu-tButbnVvG

(note: newlines were placed to make it fit and do not represent a character)

Update: Made code a little cleaner by moving dictionary outside of functions.

Thanks for the fun words games to play, and a further reference to something I keep hearing about and need to experience myself.