
Cracking real world salted MD5 passwords in python with several dictionaries

by mat on June 28th, 2010

Recently a friend (who will remain unnamed for obvious reasons) asked me to penetration test a website he created. I found a very simple exploit: I could upload an avatar, but the file was not checked to ensure it was an image, so I uploaded a PHP script I wrote and began exploring the server. I printed out all of the usernames, passwords and salts from the database to see how many of the 1,109 passwords could be easily cracked.

The passwords were stored as MD5 hashes with a random 6-character alphanumeric salt. To create the MD5 hash of a password, the salt was prefixed to the password and the combination was hashed. Thanks to this method we can employ a simple bruteforce/dictionary attack on the passwords. I will start with creating the wordlists, then present the results I obtained to keep your interest, and finally show my Python code.
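The scheme is easy to reproduce in a few lines. Here is a Python 3 sketch (the salt and password below are made-up examples, not values from the real database):

```python
import hashlib

def salted_md5(salt, password):
    """Hash the way the site does: MD5(salt + password)."""
    return hashlib.md5((salt + password).encode()).hexdigest()

# Hypothetical example: a 6-character alphanumeric salt and a weak password
salt, password = "a1B2c3", "123456"
stored = salted_md5(salt, password)
print(len(stored))  # 32 hex characters
```

Because the salt is stored alongside the hash, cracking a row is just a matter of prefixing the known salt to each candidate word and comparing digests.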

Creating wordlists
I already had two reasonably sized dictionaries that I use for different things like wordcube. I used john the ripper on my double-sized dictionary to create lots of common permutations on words, such as a capitalised first letter or a number affixed to the end. To do this you run john with the following parameters, where dic.txt is the input dictionary and dic_plus_rules.txt is the output from john with all of the additions it has made.

john --wordlist=dic.txt --rules --stdout > dic_plus_rules.txt
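To give a rough idea of what those rules produce, here is a toy Python 3 approximation of a couple of them (john's real rule engine is far richer than this):

```python
def mangle(word):
    """Yield a few john-the-ripper-style permutations of a word.
    A toy approximation -- john's real rule engine does much more."""
    yield word
    yield word.capitalize()        # capital first letter
    yield word.upper()
    for digit in "0123456789":     # number affixed to the end
        yield word + digit

candidates = list(mangle("monkey"))
print(candidates[:3])  # ['monkey', 'Monkey', 'MONKEY']
```

Each input word expands into a dozen or so candidates, which is why the rule-expanded dictionary below is roughly fifty times the size of the original.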

I also downloaded two wordlists from openwall: one is a list of ~3,100 common passwords, and one labelled ALL has a large number of words (~4 million) in various languages. Because of the highly compressible nature of text the files are available as small gzip files: ALL is 11.5MB which unzips to 41.4MB, and the password list is 12KB which unzips to 21.8KB. There are also more wordlists available for different languages, but the ALL file includes these.

The size of all of the wordlists I used is shown below:

Dictionary Combinations
English 42,987
Double-English 80,368
Double+john-rules 3,986,706
Openwall Common Passwords 3,158
Openwall ALL 3,917,116

Results

Dictionary Cracked Percentage Time
English 60 5.41% 80s
Double-English 65 5.86% 170s
Double+john-rules 116 10.46% 2.5hrs (8393s)
Openwall Common Passwords 112 10.10% 7s
Openwall All 210 18.94% 2.45hrs (8829s)
Total Passwords Obtained 254 22.90% ~5hrs

Comical passwords

Here are some of the more amusingly bad passwords, the number in brackets shows the frequency of the password.

Crap passwords: 123456 (18), password (4), 1234567 (4), 123456789 (3), 12345678 (2), 12345 (2), abc123 (2), asdfgh (2), nintendo (2), 123123, abcd1234, abcdefg, qwerty
Self-describing passwords: catholic, cowboy, creator, doger, ginger, killer, maggot, player, princess, skater, smallcock, smooth, super, superman, superstar, tester, veggie, winner, wolverine
Some other passwords: bananas, cheese, cinnamon, hampster, DRAGON, dribble1, poopie, poopoo

Python Program

# -*- coding: utf-8 -*-
#pymd5cracker.py
import hashlib, sys
from time import time

# Change to command line switches when you have the time!
hash = ""
hash_file = "hash2.csv"
wordlist = "mass_rules.txt"


# Read the hash file entered
try:
	hashdocument = open(hash_file,"r")
except IOError:
	print "Invalid file."
	raw_input()
	sys.exit()
else:
	# Read the csv values separated by colons into an array
	hashes=[]
	for line in hashdocument:
		line=line.replace("\n","")
		inp = line.split(":")
		if (line.count(":")<2):
			inp.append("")
		hashes.append(inp)
	hashdocument.close()


# Read wordlist in
try:
	wordlistfile = open(wordlist,"r")
except IOError:
	print "Invalid file."
	raw_input()
	sys.exit()
else:
	pass

tested=0
cracked=0
tic = time()
for line in wordlistfile:
	
	line = line.replace("\n","")
	tested+=1
	for i in range(0,len(hashes)):
	
		m = hashlib.md5()
		m.update(hashes[i][2]+line)
		word_hash = m.hexdigest()
		if word_hash==hashes[i][1]:
			cracked+=1
			hashes[i].append(line)
			print hashes[i][0]," : ", line, "\t(",time()-tic,"s)"

	# Show progress every 1000 passwords tested
	if tested%1000==0:
		print "Cracked: ",cracked," (",tested,") ", line


# Save the output of this program so we can use again 
# with another program/dictionary adding the password 
# to each line we have solved.
crackout = open("pycrackout.txt","w")
for i in hashes:
	s=""
	for j in i:
		if s!="":
			s+=":"
		s+=j
	s+="\n"
	crackout.write(s)
crackout.close()

print "Passwords found: ",cracked,"/",len(hashes)
print "Wordlist Words: ", tested
print "Hashes computed: ",len(hashes)*tested
print "Total time taken: ",time()-tic,'s' 

Next

  • Play with more dictionaries
  • Speed up code:
    • Add multi-threading: my experience with multi-threading in Python is that it doesn't work well for CPU-intensive tasks; if you know otherwise please let me know.
    • Have a look at PyCUDA to see if I can use my graphics card to speed up the code significantly (another type of multi-threading really...) without having to change language like in my previous post on CUDA MD5 cracking
  • Remove hash once found to stop pointless checking
  • Add command line switches to allow it to be used like a real program
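The "remove hash once found" idea could look something like this (a Python 3 sketch; the function and the demo row are my own made-up examples, not the original script):

```python
import hashlib

def crack(words, rows):
    """rows: list of (user, hex_digest, salt) tuples.
    Drops a hash from the search as soon as it is cracked."""
    remaining = set(range(len(rows)))
    found = {}
    for word in words:
        for i in list(remaining):
            user, digest, salt = rows[i]
            if hashlib.md5((salt + word).encode()).hexdigest() == digest:
                found[user] = word
                remaining.discard(i)  # stop pointless re-checking
        if not remaining:
            break  # everything cracked; no need to finish the wordlist
    return found

# Made-up demo row
rows = [("bob", hashlib.md5(b"QwE987qwerty").hexdigest(), "QwE987")]
print(crack(["letmein", "qwerty", "dragon"], rows))  # {'bob': 'qwerty'}
```

Since most hashes are never cracked the saving is modest on average, but it also gives you early exit for free when every hash has been found.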
17 Comments
  1. CPU intensive tasks see no direct speed benefit in most cases in python due to the global interpreter lock. That said, if you’re using python 2.6+ you can (in most cases) easily swap the thread usage with the multiprocessing module’s Process class.

  2. Nice article – it’s always good to see real code relating to “difficult” areas like crypto and penetration testing. The more daylight these topics see the more people will feel comfortable building sensible security into their lives.

    Jesse’s right about multiprocessing. When that module was first released I took a sample threaded program and /almost/ converted it by “import multiprocessing as threading” (there are a few differences, but if you have working threaded code it’s a cinch!)

  3. Mike Lowe permalink

    Jesse is correct in his assertion about multiprocessing. If you check pypi it’s been back ported to 2.4 so there shouldn’t be any obstacles to using it.

  4. @Jesse Noller @Steve Holden @Mike Lowe Thanks guys I will give the multiprocessing python modules a go, hopefully I can get a good speed up without significantly increasing the complexity (unlike with CUDA).
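For anyone curious, splitting the wordlist across a pool of worker processes could be sketched like this (Python 3; the function and demo data are my own invention, not the final code):

```python
import hashlib
from multiprocessing import Pool

def crack_chunk(args):
    """Try every word in this chunk against every (user, digest, salt) row."""
    words, rows = args
    hits = []
    for word in words:
        for user, digest, salt in rows:
            if hashlib.md5((salt + word).encode()).hexdigest() == digest:
                hits.append((user, word))
    return hits

if __name__ == "__main__":
    # Made-up demo data
    rows = [("alice", hashlib.md5(b"XyZ123password").hexdigest(), "XyZ123")]
    words = ["letmein", "password", "qwerty", "dragon"]
    chunks = [(words[i::4], rows) for i in range(4)]  # 4 roughly equal chunks
    with Pool(4) as pool:
        results = [hit for part in pool.map(crack_chunk, chunks) for hit in part]
    print(results)
```

Because each word is independent of every other, the wordlist partitions cleanly and the workers need no shared state beyond the (small) hash table.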

  5. Roger permalink

    I’ve used multiprocessing for similar “process huge data sets” work. The way I structured the code is that the main thread splits the work into chunks and posts each chunk to a multiprocessing.Queue. (The queue is created with a bounded size as it is pointless having too much queued up.)

    Workers are started (multiprocessing.cpu_count()) and they read from the queue. Depending on what is going on, they either do something directly with their results (eg updating a database) or post results to another Queue where a worker waiting on that does whatever is needed.

    Something to be very careful about is that on Unix multiprocessing creates new processes using fork. Many objects are not fork safe such as database connections, network connections, files being written to etc and either error when used by both parent and child after forking or corrupt things. It is far better to allocate those kind of things in each worker.

    Debugging is a little more painful with multiple processes since ‘import pdb ; pdb.set_trace()’ only stops the running process and not the others. What I do is something like this:

    if __debug__:
       import threading as parcomp
       import Queue
       queueclass=Queue.Queue
       workerclass=parcomp.Thread
       NUMWORKERS=1
    else:
       import multiprocessing as parcomp
       queueclass=parcomp.Queue
       workerclass=parcomp.Process
       NUMWORKERS=parcomp.cpu_count()
    

    Now you can easily create queues and workers and in __debug__ mode it is easy to actually debug.

  6. @Roger Thanks Roger, that’s some good advice I’ll take into account :) I’ll probably post up a multiprocessing version of my code in a few days and then people can tell me how wrong I’ve done things :P

  7. George permalink

    As someone with no php experience, the most interesting part for me was the first step of the exploit. Even without checking that the avatars are indeed images, I would expect that the php “images” would fail to render or at most render as random garbage. The fact that a user-uploaded file can be executable is mind-boggling!

  8. Brian permalink

    @George
    I had the same reaction as George when I read the article. Can someone explain how a php file, uploaded as an image, could be executed. How would the php interpreter ever get started?

  9. ridiculous permalink

    This is a joke? cracking in python? you are wasting most of the CPU, you should use C or Java. Python ?! I cannot believe lamerism in 2010…

  10. @George @Brian Sorry I think you misunderstood, perhaps I didn’t phrase it very well (in fact I didn’t really explain it at all).

    1. Created a malicious php file to list database contents
    2. Uploaded the php file as an avatar image
    3. The php file now resides on the server. When my avatar is requested to be displayed it will, as you rightly pointed out, not be rendered as an image. However, as the script still resides on the server, all you need to do is find its location and then simply access it via its URL. To get this URL you can simply view the page source to see where the html img src points to.
    4. Enter this URL and steal all the usernames/passwords/salts
    5. Crack

    Hope this makes more sense for you now :)

    @ridiculous I’d like to see some benchmark differences, I know python will be slower but not sure by how much. The reason I use python is because it’s very fast to code, very simple to spot errors in, has lovely syntax, and I hate using strings in C. In a similar fashion you could complain about the languages you state and ask why not code things 100% in assembly (for which the answer is obvious).

  11. If you want speed and you need to pass a lot of data, instead of using the multiprocessing module it is much faster to communicate directly via a pipe. For your task, however, it would be much simpler to launch a single instance of the program for each password and handle the number of concurrently running processes using ‘make’ :)

  12. @Aigars Mahinovs I still need to learn all the multiprocessing stuff with python, does he (or anyone else) know any good tutorials, as all I can find is documentation for the methods. My aim is to learn how to use PyCUDA, which would improve the speed significantly.

  13. Gregory P. Smith permalink

    The most recent versions of the hashlib module (in 2.6? if not it’s in 2.7, I’ve forgotten when I checked that in) will release the GIL during hash computation but only on data that is sufficiently large. A salted password hash is not sufficiently large. (“large” is hardcoded at the moment to a few K).

    Use multiprocessing as Jesse and others have suggested.

    Random other optimizations… pass your data directly to hashlib.md5(data). Time it with and without the hashes[i][2]+line string concatenation. It is faster to call m = hashlib.md5(hashes[i][2]); m.update(line); in sequence to avoid the string concatenation data copy and new string object creation… etc. You also reference hashes[i] three times within your inner loop; assigning that to a local might help. Try things and benchmark. [/ end random perf tips]
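Gregory’s concatenation tip, sketched out (Python 3 syntax; both forms produce the same digest, the second just avoids building an intermediate string):

```python
import hashlib

salt, word = "a1B2c3", "password"

# Concatenating first builds an extra intermediate string:
digest_concat = hashlib.md5((salt + word).encode()).hexdigest()

# Feeding the parts separately avoids that copy:
m = hashlib.md5(salt.encode())
m.update(word.encode())
digest_update = m.hexdigest()

print(digest_concat == digest_update)  # True
```

Whether the saving is measurable for 6-byte salts is exactly the kind of thing to benchmark rather than assume.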

  14. @Gregory P. Smith Thanks for the tips, it’s very helpful to get feedback from people who know what they’re talking about :)


