Analyzing the XKCD Passphrase Comic

I rarely see any discussion of password strength without seeing the XKCD comic below brought up to illustrate that a long pass phrase is better than a shorter random jumble of characters. Since this is something I have been arguing for fifteen years, I do agree with it, although adding a little more randomness and complexity is still necessary.

XKCD: Password Strength

(XKCD: Password Strength - Creative Commons Attribution-NonCommercial 2.5 License.)

In 2006 I wrote Pafwert, a random but smart password generator, to illustrate this concept. Pass phrases are easier to remember, easier to type (we type in whole words), and are generally much stronger passwords. My philosophy has always been that length is more important than any other factor for password strength.

But not everyone agrees. Most often the argument against the pass phrase technique is that since the password is made up of 4 whole words, basically this isn’t that much different than a 4-character password, you just need to adjust the brute-force tools to work with whole words instead. While this is somewhat true, it doesn’t take much to turn this technique into something extremely effective.

How Strong are Pass Phrases?

To determine password strength, we generally determine how many passwords have similar characteristics. In other words, if finding a password is like finding a needle in a haystack, the critical question is how big is that haystack?

To do the math on this, we need to determine how large a set of words the average English-speaking user would likely choose from. Some English language dictionaries include well over 150,000 words but most linguists agree that the average-intelligence English speaker has a vocabulary of somewhere between 7,000 and 15,000 words.

What is misleading about these numbers is that dictionary words are only a small part of our vocabulary. Consider these other non-dictionary words:

  1. Proper nouns such as McDonalds, Lady Gaga, Instagram, JQuery, and possibly hundreds of thousands of other words that are part of our daily vocabulary.
  2. Domain names like facebook.com, flickr.com, and thousands of others.
  3. Popular slang and social jargon (see your average Facebook post).
  4. Alternate spellings, leetspeak, etc.
  5. Acronyms such as WWW, CISPA, SSN, WWII, and SMS.
  6. Words from other languages
  7. Programming language elements and function names
  8. And don’t forget written-out numbers. You will never find “1,276,209” in a dictionary and there are millions of those.

Forget dictionary words, our vocabularies are HUGE.

So how many actual words do we know? It is impossible to say but a very conservative estimate would be a minimum of about 25,000 words. Realistically this number is much higher than this but we will use 25,000 here just for illustration.

Now if we are picking 4 random words from a set of 25,000 words the number of possible combinations is 25,000^4 or 390,625,000,000,000,000 (noted as #1 on the table below) which is about the strength of a 9-10 character alphanumeric password (see this chart). But passwords are case-sensitive and we often capitalize one of the words so realistically we are talking about 50,000 words or 50,000^4 or 6,250,000,000,000,000,000 possible combinations (noted by #2 on the table below) which is about as strong as a 10-11 character alphanumeric password.

What’s interesting to note is that even a 3-word phrase results in 125,000,000,000,000 possibilities so even that would be roughly equivalent to a 7-8 character alphanumeric password which is the most commonly-seen password.
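These figures are easy to check. The sketch below recomputes them and estimates the equivalent length of a random alphanumeric password, assuming "alphanumeric" means the 62 characters a-z, A-Z, and 0-9 (my assumption, since the chart is not reproduced here). The function names are made up for illustration.

```python
import math

ALPHANUMERIC = 62  # assumption: a-z, A-Z, 0-9

def phrase_keyspace(vocabulary_size, word_count):
    """Number of possible phrases when each word is picked at random."""
    return vocabulary_size ** word_count

def equivalent_length(keyspace, alphabet_size=ALPHANUMERIC):
    """Length of a random password over alphabet_size symbols with the same keyspace."""
    return math.log(keyspace) / math.log(alphabet_size)

for vocab, words in [(25_000, 4), (50_000, 4), (50_000, 3)]:
    space = phrase_keyspace(vocab, words)
    print(f"{vocab:,}^{words} = {space:,} "
          f"(about {equivalent_length(space):.1f} alphanumeric characters)")
```

Under that 62-character assumption the three cases land between 9 and 10, between 10 and 11, and between 7 and 8 characters respectively, which matches the ranges quoted above.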

 

Making Them Even Stronger

Now most people have already developed techniques to make passwords stronger by adding some numbers or otherwise mutating that word so that it would not appear in a dictionary. That is why we often see passwords like dr@gon or freddy2000. Now these are very weak passwords by themselves but if you use this same technique in a pass phrase you can make them much stronger.

Remember, we are dealing with numbers that grow exponentially so a technique that is mediocre with a short password is incredibly effective with a long password.

Now consider the following pass phrase:  Picking at 200 p1ckles

Or this one:  I’m alway sthe first

Or this one:  How bout the 0xFC?

It’s a simple technique and a minor change but by doing this we have greatly expanded our 50,000 words. Many password cracking tools are very good at generating word permutations and can very quickly create and try hundreds of variants of a single dictionary word. But when you multiply that times 4 words, the numbers grow very fast.

Say, for example that for each of our original 25,000 words there are approximately 100 different mutations. That means we now potentially have a vocabulary of 2,500,000 words. And 2,500,000^4 equals 39,062,500,000,000,000,000,000,000 possible combinations of 4-word phrases (shown as #3 on the table above) which is stronger than a 14-character alphanumeric password.
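As a quick sanity check on that number, here is the same calculation in Python. The 100 mutations per word is the rough figure used in the text, and the comparison assumes a 62-character alphanumeric alphabet (a-z, A-Z, 0-9), which is my assumption.

```python
BASE_VOCABULARY = 25_000    # words, from the estimate in the text
MUTATIONS_PER_WORD = 100    # assumed average, as in the text
WORDS_IN_PHRASE = 4

mutated_vocabulary = BASE_VOCABULARY * MUTATIONS_PER_WORD
phrase_space = mutated_vocabulary ** WORDS_IN_PHRASE

# Assumption: "alphanumeric" means 62 symbols (a-z, A-Z, 0-9).
alnum_14 = 62 ** 14
alnum_15 = 62 ** 15

print(f"{mutated_vocabulary:,} words -> {phrase_space:,} phrases")
print("stronger than 14 alphanumeric characters:", phrase_space > alnum_14)
print("stronger than 15 alphanumeric characters:", phrase_space > alnum_15)
```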

So yeah, the XKCD recommendation is valid. And all you have to do is add a few simple mutations to make that method dramatically stronger.

My Password is 4.hub.route.edu.

Password security has always been a hot issue but events in the last few years have made it an even more pressing issue to a greater number of people. When I hear receptionists in a doctor’s office sharing strategies for creating secure passwords I know this is now beyond the realm of network administrators and security professionals.

But one thing I have noticed is that many people don’t truly understand why one password can be so much stronger than another so I thought I would walk through the process of cracking a password. In this case, I decided to use as an example the very password that (until I wrote this) I used for the admin account on this blog.

So like I said in the title, my password is 4.hub.route.edu.

That isn’t the best password I have come up with but it is still fairly strong. It is 15 characters long, contains a number, letters, and some periods. It took me just a couple logins to actually memorize that password. The word components are fast to type because we are trained to type in whole words. And there are four parts, each one ending with a period. The repetition of the period helps the memory process.

Chances are that no one would be able to go to my admin page (which itself is protected by a different password) and just guess that, no matter how much they knew about me and no matter how many of my other passwords they knew because I have never used that password anywhere else. As of writing this article, I can do a Google search for “4.hub.route.edu” and there will be no results.

But the real risk isn’t someone being able to keep trying to guess my password via the admin page, the real risk is someone finding a new 0-day exploit that allows them to dump the users table in my database and get the hash of my password (which happens to be $P$9YCJ/QwbFcgbo7OtfWGYYE8sVJBxtF/). If someone can get your hash, they can now try millions of password combinations without you ever knowing it.

Cracking a password hash is a lot like trying keys in a lock. A hash is a string of characters derived from your password that is calculated in such a way that it is nearly impossible to work backwards to discover the original password so it is relatively safe to store. When you log in to a system, it will run the password you enter through this same complex formula and the result should be the same.

So when I first created my password on this blog I entered 4.hub.route.edu. WordPress ran it through these formulas and came up with the hash $P$9YCJ/QwbFcgbo7OtfWGYYE8sVJBxtF/ which it saved in the database. The next time I log in, I enter my 4.hub.route.edu password, WordPress runs the same formula on that password and it comes up with $P$9YCJ/QwbFcgbo7OtfWGYYE8sVJBxtF/ which matches the hash it has stored so it knows that I am using the correct password even though WordPress never stored my actual password. Now what is special about these formulas is that it is extremely rare that any two passwords will create the exact same hash (a concept known as collision).
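To make the store-then-compare idea concrete, here is a minimal sketch. It uses plain SHA-256 purely as a stand-in for "the formula"; that is my simplification, not what WordPress actually does, and an unsalted fast hash like this is not a good way to store real passwords. The function names are hypothetical.

```python
import hashlib
import hmac

def hash_password(password):
    """Stand-in for the site's hashing formula (NOT the real WordPress scheme)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# At signup, only the hash is saved; the password itself is never stored.
stored_hash = hash_password("4.hub.route.edu")

def check_login(attempt, stored):
    """Hash the attempt with the same formula and compare it to the stored hash."""
    return hmac.compare_digest(hash_password(attempt), stored)

print(check_login("4.hub.route.edu", stored_hash))  # True
print(check_login("4.hub.route.com", stored_hash))  # False
```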

So if someone is able to obtain my hash, they can’t directly get my password from that, but they can try millions or even billions of different passwords and run each one through the formula until they find one that produces that exact same hash. It is a lot like having a lock, you can’t easily create a key from it but you can try a bunch of keys until you find the one that works.

Now when it comes to passwords there are actually hundreds of trillions of possible passwords someone might choose. Even with a cluster of powerful computers it could take decades to try every possible password. Fortunately for hackers, most people aren’t that clever with their passwords, and there are a number of strategies hackers use that can drastically reduce the number of passwords they need to test to crack a password. Below are those strategies.

1. Hash Lookup

First, an attacker will check to see if someone else has cracked the password before, using either a local database or an online database such as onlinehashcrack.com or hash-database.net or one of the hundreds of other similar sites. In the past few years there have been many large sites that have been hacked and their passwords leaked. If your password was ever one of these, chances are it will appear in one of these databases. Likewise, if you select a common password that others may also be using, it also might be on this list.

In the case of WordPress, the hashes are created using PHPASS but for the sake of this example, let’s just assume they use MD5 hashes like many other systems use. The MD5 hash for my password 4.hub.route.edu is 7914881ba9b78fa307db6ef0db675e29. You can search any online databases for my hash and you will not find it listed anywhere (at least at the time of writing). If your password is one that you have never used before and others likely have not used, you should be safe (try googling one of your passwords, you may be surprised how many results you get).

If your password hash does not appear in one of these databases, there are also rainbow tables which are massive databases of precomputed hashes consisting of every possible password up to 8-10 characters in length, depending on the algorithm. If your password is less than eight characters long, your password surely will be cracked at this stage. However, you will not find 7914881ba9b78fa307db6ef0db675e29 in any of those databases so I am safe so far.
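A hash lookup is really just a dictionary lookup keyed by hash. The sketch below builds a tiny table from a made-up list of leaked passwords, using unsalted MD5 as assumed in this example. Real rainbow tables are far larger and use hash chains to save space, so treat this as the simplest possible illustration rather than how those tables are actually built.

```python
import hashlib

def md5_hex(text):
    """Unsalted MD5 hex digest, as assumed in this example."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical stand-in for a database of previously cracked passwords.
known_passwords = ["password", "123456", "dragon", "NCC-1701"]
lookup = {md5_hex(p): p for p in known_passwords}

def lookup_hash(target_hash):
    """Return the known password for this hash, or None if it is not listed."""
    return lookup.get(target_hash.lower())

print(lookup_hash("5f4dcc3b5aa765d61d8327deb882cf99"))  # password
print(lookup_hash(md5_hex("4.hub.route.edu")))          # None
```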

The lesson here is to never use a password less than ten characters long. Never use the same password on multiple systems. Don’t try to be clever with your password, that never works (NCC-1701 is a very common password).

2. The Word List

Since most passwords consist of dictionary words or something similar, checking every word in a dictionary or a specialized wordlist http://svn.isdpodcast.com/wordlists/ is a quick way to find a weak password. Most hackers will use lists of the most common passwords such as this because chances are very high that someone will be using one of those passwords. It normally doesn’t take more than a minute to go through even a gigantic list of words.

In my case, even a Google search for my password turns up nothing so even if you had the massive list of words that Google has indexed you still wouldn’t be able to crack my password.
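In code, a wordlist attack is nothing more than a loop: hash each candidate and compare. Here is a rough sketch, again assuming unsalted MD5 for simplicity. The five-word list is a made-up miniature; real lists hold millions of entries.

```python
import hashlib

def md5_hex(text):
    """Unsalted MD5 hex digest (a simplifying assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def wordlist_attack(target_hash, candidates):
    """Hash each candidate in turn and return the first one that matches."""
    for word in candidates:
        if md5_hex(word) == target_hash:
            return word
    return None

# Hypothetical miniature wordlist.
wordlist = ["letmein", "monkey", "dragon", "password", "qwerty"]

print(wordlist_attack(md5_hex("dragon"), wordlist))           # dragon
print(wordlist_attack(md5_hex("4.hub.route.edu"), wordlist))  # None
```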

Considering this, you can see why so many systems simply prohibit any password that is a dictionary word.

3. Rules and Patterns

If a dictionary or wordlist check fails, the next step is to try some of the common (albeit ineffective) tricks people use to make a password more complex. If you asked me what I thought was the most common password pattern I would say a proper noun (such as a name) followed by 2-3 numbers. So it would be smart for a hacker to take each word in a wordlist and add every possible number from 1 through 999. If that doesn’t work, you could try reversing each word or doing simple substitutions like using the number 3 instead of the letter e. It really does not take much effort for a cracking program to try hundreds of different patterns.

For example, a dictionary word may be “password” so a rules-based attack may try PASSWORD, dRowssap, P@SSW0RD, p@ssW0rd, dr0Wss@p, passwordpassword, @ssW0rdp, dp@ssW0r, p@9sW0rd, 1p@ssW0rd, p@$$W0rd, ppp@ssW0rd, and thousands of other variants of the word. Depending on the number of rules and the size of the wordlist, this step may take only five to ten minutes and will crack a great number of passwords.
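To give a feel for how cheaply these variants are produced, here is a small sketch of a rule engine. The rules (uppercase, capitalize, reverse, a simple symbol substitution, doubling, rotation, and numbers 1 through 999 on either end) are my own hypothetical selection, not the rule set of any particular cracking tool.

```python
def leet(word):
    """Swap a few letters for look-alike symbols (one possible substitution set)."""
    return word.translate(str.maketrans({"a": "@", "o": "0", "e": "3", "s": "$"}))

def rotate(word, n=1):
    """Move the first n characters to the end (negative n rotates the other way)."""
    return word[n:] + word[:n]

def mutate(word):
    """Yield rule-based variants of a single dictionary word."""
    bases = {word, word.upper(), word.capitalize(), word[::-1],
             leet(word), word + word, rotate(word), rotate(word, -1)}
    for base in sorted(bases):
        yield base
        for number in range(1, 1000):  # the "word plus 1 through 999" pattern
            yield f"{base}{number}"
            yield f"{number}{base}"

variants = set(mutate("password"))
print(len(variants), "variants from one word")
print("PASSWORD" in variants, "p@$$w0rd" in variants, "password123" in variants)
```

Even this handful of rules turns one word into thousands of candidates, which is why dressing up a single word buys so little.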

If an attacker has sufficient processing power, another effective strategy is to try two dictionary words together with various delimiters between them (such as dashes or periods). If you had a wordlist of 100,000 words and tried every combination of two words that means you would have ten billion possible combinations. Trying different delimiters between the words would make it a little bit harder but not much.

You probably wouldn’t want to try three-word combinations because that would take you up to a quadrillion (1,000,000,000,000,000) possible combinations which would not be an effective strategy. In the case of my password there is a number and three other words that would likely appear in a dictionary but testing for four-word combinations would mean there are 100 quintillion (100,000,000,000,000,000,000) possible passwords, so the odds are my password would still be pretty safe.
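The growth from two to three to four words is easy to see in a few lines of Python. The sketch also shows how a combination attack with delimiters could be generated; the three delimiters and the four-item sample list are hypothetical choices for illustration.

```python
from itertools import product

WORDLIST_SIZE = 100_000  # wordlist size used in the text

for words in (2, 3, 4):
    print(f"{words} words: {WORDLIST_SIZE ** words:,} combinations")

def combine(words, count, delimiters=("", "-", ".")):
    """Yield every phrase of `count` words joined by a single delimiter."""
    for delimiter in delimiters:
        for combo in product(words, repeat=count):
            yield delimiter.join(combo)

# Tiny sample list so the output stays small.
sample = ["4", "hub", "route", "edu"]
phrases = list(combine(sample, 2))
print(len(phrases))  # 3 delimiters * 4^2 pairs = 48
```

Note that each extra delimiter only multiplies the total by a small constant, while each extra word multiplies it by the size of the whole wordlist.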

The lesson here is that a strong password is not a matter of being clever, it is a matter of beating the numbers. Passwords should always contain three or more words or other sequences.

4. Brute Force

If a password hash doesn’t show up in a database or hasn’t been cracked before, does not show up in a list of common passwords or dictionary words (even after trying hundreds of common variants), the only method left is to simply brute-force the password. This means trying every possible combination of letters until you find the password. It would be like trying to crack a simple bicycle lock, you would start with 000 and try 001, 002, 003, and so on until you got to 999.

In the case of passwords you would need to try every combination of lowercase letters, uppercase letters, numbers, and punctuation symbols. In other words, imagine a bicycle lock where each dial contains abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!@#$%^&*()_-+={[}]|:;"'.?/ and there are eight or more dials. This is why so many systems require that you use a variety of characters because using different types of characters is like making each dial larger. And making your password longer is like adding more dials.
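The dial analogy translates directly into code: the number of combinations is the dial size raised to the number of dials. In the sketch below I assume the dial holds the 94 printable ASCII characters other than space, which is slightly more than the set listed above.

```python
from itertools import product
import string

# Assumption: 94 printable ASCII characters (letters, digits, punctuation), no space.
ALPHABET = (string.ascii_lowercase + string.ascii_uppercase
            + string.digits + string.punctuation)

def keyspace(dial_size, dials):
    """Total combinations for a lock with `dials` dials of `dial_size` symbols each."""
    return dial_size ** dials

def brute_force(alphabet, length):
    """Yield every possible string of the given length, in order."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)

print(keyspace(10, 3))                        # the bicycle lock: 000 through 999
print(f"{keyspace(len(ALPHABET), 8):,}")      # eight dials of 94 symbols
print(list(brute_force("01", 3)))             # a tiny two-symbol example
```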

 

 

Now brute-force attacks are much smarter nowadays using techniques such as mask-based attacks. These types of attacks basically use knowledge about passwords to make the brute-force process much smarter. For example, if you look at this chart http://xato.net/img/UpperCaseLettersLarge.jpg you will see that uppercase letters are very likely to show up in position 1 but are extremely rare after position 8. Knowing this, it would be more effective to not even bother looking for uppercase letters after the first few characters. Now if you look at the distribution of all character sets in this graph http://xato.net/img/CharacterDistributionByPositionLarge.jpg you can see that much can be done to optimize the brute-force process. Nevertheless, these rules become less and less effective the longer and more complex your password gets.
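A mask is just a separate character set for each position. Here is a rough sketch comparing a mask against plain brute force of the same length. The mask itself (one uppercase letter, four lowercase letters, two digits) is a hypothetical example, and the 94-character figure for full brute force is my assumption.

```python
from itertools import product
from math import prod
import string

# Hypothetical mask: one uppercase letter, four lowercase letters, two digits.
mask = [string.ascii_uppercase] + [string.ascii_lowercase] * 4 + [string.digits] * 2

def mask_keyspace(mask):
    """Multiply the size of the character set allowed at each position."""
    return prod(len(charset) for charset in mask)

def mask_candidates(mask):
    """Yield every string that fits the mask."""
    for combo in product(*mask):
        yield "".join(combo)

full = 94 ** len(mask)   # assumption: 94 printable characters at every position
masked = mask_keyspace(mask)

print(f"full brute force: {full:,}")
print(f"mask attack:      {masked:,}")
print(next(mask_candidates(mask)))  # first candidate: Aaaaa00
```

The catch, as noted above, is that the mask only helps when the password actually fits it. A long password with an unusual shape falls outside every cheap mask.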

The big secret here is if you can force a hacker to have to use a brute-force attack and you have a password that is at least 15 characters long, chances are that you have won. Eventually computing power will catch up so that even 15 characters might not be enough but the good thing is that these numbers grow exponentially so a 16-character password is almost 100 times stronger than a 15-character password and a 17-character password is more than 9,000 times stronger!
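Those multipliers fall out of simple exponentiation. The sketch below assumes an alphabet of 95 printable ASCII characters (including space), which is my assumption but is consistent with the figures quoted.

```python
ALPHABET_SIZE = 95  # assumption: all printable ASCII characters, including space

def strength_ratio(longer, shorter, alphabet_size=ALPHABET_SIZE):
    """How many times larger the keyspace is at the longer length."""
    return alphabet_size ** (longer - shorter)

print(strength_ratio(16, 15))  # 95
print(strength_ratio(17, 15))  # 9025
```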

So What Makes a Password Strong?

Your password must be unique and one that you have never used before. In fact it should be so unique that if you did a Google search for it, there would never be any results. You can’t just take a word and dress it up a bit, you need 3-4 words or other sequences to make a password strong. And finally it has to be long. It helps to throw in some numbers and punctuation but most importantly it has to be long.

The Worst Password Tips

 

Because I have always been so fascinated with passwords, I always like to hear different tips people have for creating strong passwords. However, I have to admit that most of the tips I run across are actually kind of lame and really are not very secure. Unfortunately, some of these tips are quite popular and get passed around way too much. In fact, I rarely see any advice besides these I have listed.