Skip to content

Python: Cryptography Substitution Cipher improving on the Caesar cipher

by mat on December 22nd, 2009

This post builds upon the Caesar shift presented previously; converting it to a full substitution cipher. The substitution cipher will practically remove bruteforce style methods of defeating the encryption and provide a basis for more complicated ciphers.

Subsitution Cipher
Because a Caesar shift only rotates the alphabet there are only 25 possible unique solutions, this leaves the cipher quite vulnerable to brute force. If rather than just rotating the alphabet and keeping it ‘linear’ we can shuffle it to create a substitution cipher. This improves the number of possible solutions to a shocking
(26! – 1) = 4.03291461126605635584e26 (unfortunatly, the substitution cipher is alot weaker than it seems as it is vunerable to several different cryptanalysis attacks).

Substitution Cipher Diagram

Substitution Cipher Diagram

We start with the simple Caesar shift function with the following changes:

  • A new array is introduced filled with the numbers 0 – the length of the alphabet. This array is shuffled using shuffle from the random module. The alphabet substitution dictionary is created using this array to decide which letters go where (this is probably clearer in the code than in my explanation).
  • With the Caesar shift we only needed to know the number of rotations in order to decrypt the text, now we need a full list of the letter substitutions. This is stored as the key and will be needed in order to decode our substitution cipher.
  • We also now store the alphabet outside of the function so that it can be used in a decode function. This function was added along with some example usage to make the full process more understandable
from random import shuffle

alphabet="abcdefghijklmnopqrstuvwxyz"

def substitution(alphabet,plaintext):

	# Create array to use to randomise alphabet position
	randarray=range(0,len(alphabet))
	shuffle(randarray)

	key=""

	#Create our substitution dictionary
	dic={}
	for i in range(0,len(alphabet)):
		key+=alphabet[randarray[i]]
		dic[alphabet[i]]=alphabet[randarray[i]]

	#Convert each letter of plaintext to the corrsponding
	#encrypted letter in our dictionary creating the cryptext
	ciphertext=""
	for l in plaintext:
		if l in dic:
			l=dic[l]
		ciphertext+=l
	for i in alphabet:
		print i,
	print
	for i in key:
		print i,
	print
	return ciphertext,key


# This function decodes the ciphertext using the key and creating
# the reverse of the dictionary created in substitution to retrieve
# the plaintext again
def decode(alphabet,ciphertext,key):
	
	dic={}
	for i in range(0,len(key)):
		dic[key[i]]=alphabet[i]

	plaintext=""
	for l in ciphertext:
		if l in dic:
			l=dic[l]
		plaintext+=l

	return plaintext

# Example useage
plaintext="the cat sat on the mat"
ciphertext,key=substitution(plaintext)
print "Key: ", key
print "Plaintext:", plaintext
print "Cipertext:", ciphertext
print "Decoded  :", decode(ciphertext,key)

Running this will output the following (This will be different on each run due to the use of random to generate the key).

Key: miylbsowutgdkfvjepqhazrncx
Plaintext: the cat sat on the mat
Cipertext: hwb ymh qmh vf hwb kmh
Decoded : the cat sat on the mat

Improvement by using additional characters
Adding additional characters into the substitution will it more difficult to solve. For example if we change our alphabet from:

alphabet=”abcdefghijklmnopqrstuvwxyz”

If we include capital letters, numbers from 0 -9 and special characters:

alphabet=”ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890″+ \
“:.;,?!@#$%&()+=-*/_<> []{}`~^”+ \
“abcdefghijklmnopqrstuvwxyz”

We increase the number of possible solutions from (26!-1) to (63!-1) which is 1.98260831540444006412e87. This also has the added benefit of making the encrypted text alot cooler and harder to guess at by eye (unfortunately still very easy to see what character represents space).

Key: MXjAarLzWqFePI7E botO5f1kym29RZd3Sh8JTiQGKVDp6YBCsU4nucNgwlx0vH
Plaintext: this plaintext will be much more of a challenge to decode compared to the Caesar shift cipher
Cipertext: tzWoHEeMWIta1tHfWeeHXaHPOjzHP7baH7rHMHjzMeeaILaHt7HAaj7AaHj7PEMbaAHt7HtzaHjMaoMbHozWrtHjWEzab
Decoded : this plaintext will be much more of a challenge to decode compared to the caesar shift cipher

Future
The substitution cipher is a lot more secure than Caesar shift cipher but unfortunately is very insecure towards frequency analysis. In future posts I will address using frequency analysis and methods to prevent this type of attack as well as improving on this cipher by creating multiple-dicitionary based ciphers to create Vigenère style ciphers.

I imagine most people reading this will enjoy the simple challenge of solving some encrypted text. I have used this code to make some ciphertext, try and decode it! (extra points for knowing where it is from):

^’VtuBbtv3vut1u-w.G^’Vt&vnu-tZBtZnuIvwvtwn
-qbtuB6GN3vutbqBS-qt.BSt&wB~vtV.tqv1wbG}u5t
~nDDv5tVvG}u5tbBwvtVvtbBt@nvIvZG}u5tbqwv6tv
3vw.t@nvIvtnubBt1tUnwvG}Ztbqv.t&Swuv5tnbtqS
wbt&vI1SZvt^t61ZtZBtq1@@.tUBwt.BSm)B6tbqvZv
t@BnubZtBUt51b1tV1~vt1t&v1SbnUSDtDnuvG}u5t6
v’wvtBSbtBUt&vb1GQv’wvtwvDv1Znu-tButbnVvG

(note: newlines were placed to make it fit and do not represent a character)

Update: Made code a little cleaner by moving dictionary outside of functions.

2 Comments
  1. cybersol permalink

    Thanks for the fun words games to play, and a further reference to something I keep hearing about and need to experience myself.

Trackbacks & Pingbacks

  1. burglar alarm leeds

Leave a Reply

Note: I am currently writing my thesis so probably wont have time to reply to your comment
Note: XHTML is allowed. Your email address will never be published.

Subscribe to this comment feed via RSS