Fingerprints and Passwords: A Guide for Non-Security Experts

Today Apple announced that the iPhone 5S will have a fingerprint scanner. Many of us in the security community are highly sceptical of this feature, while others see it as a smart security move. Then of course there are the journalists who see fingerprints as the ultimate password killer. Clearly there is some disagreement here. I thought I’d lay this out for those of you who need to better understand the implications of using fingerprints instead of, or in addition to, passwords.

Biometrics, like usernames and passwords, are a way to identify and authenticate yourself to a system. We all know that passwords can be weak and difficult to manage, which makes it tempting to call every new authentication product a password killer. But despite their flaws, passwords must always play some role in authentication.

The fact is that while passwords do have their flaws, they also have their strengths. The same is true with biometrics. You can’t just replace passwords with fingerprints and say you’ve solved the problem because you have introduced a few new problems.

To clarify this, below is a table that compares the characteristics of biometrics vs passwords, with check marks where one method has a clear advantage:

Passwords | Biometrics
Difficult to remember | Don’t have to remember ✓
Requires unique passwords for each system | Can be used on every system ✓
Nothing else to carry around | Nothing else to carry around
Take time to type | Easy to swipe/sense ✓
Prone to typing errors | Prone to sensor or algorithm errors
Immune to false positives ✓ | Susceptible to false positives
Easy to enroll ✓ | Some effort to enroll
Easy to change ✓ | Impossible to change
Can be shared among users [1] ✓ | Cannot be shared ✓
Can be used without your knowledge | Less likely to be used without your knowledge ✓
Cheap to implement ✓ | Requires hardware sensors
Work anywhere including browsers & mobile ✓ | Require separate implementation
Mature security practice ✓ | Still evolving
Non-proprietary ✓ | Proprietary
Susceptible to physical observation | Susceptible to public observation
Susceptible to brute force attacks | Resistant to brute force attacks ✓
Can be stored as hashes by untrusted third party ✓ | Third party must have access to raw data
Cannot personally identify you ✓ | Could identify you in the real world
Allow for multiple accounts ✓ | Cannot use to create multiple accounts
Can be forgotten; password dies with a person | Susceptible to injuries, aging, and death
Susceptible to replay attacks | Susceptible to replay attacks
Susceptible to weak implementations | Susceptible to weak implementations
Not universally accessible to everyone | Not universally accessible to everyone
Susceptible to poor user security practices | Not susceptible to poor practices ✓
Lacks non-repudiation | Moderate non-repudiation ✓
[1] Can be both a strength and a weakness

 

What Does This Tell Us?

As you can see, biometrics clearly are not the best replacement for passwords, which is why so many security experts cringe when every biometrics company’s press release proclaims its product the ultimate password killer. Biometrics do have some clear advantages over passwords, but they also have numerous disadvantages; each can be weak and each can be strong, depending on the situation. The list above is not weighted, and certainly some of the items are more important than others, but the point here is that you can’t simply compare passwords to biometrics and say that one is better than the other.

However, one thing you can say is that when you use passwords together with biometrics, you have something that is significantly stronger than either of the two alone. This is because you get the advantages of both techniques and only a few of the disadvantages. For example, we all know that you can’t change your fingerprint if compromised, but pair it with a password and you can change that password. Using these two together is referred to as two-factor authentication: something you know plus something you are.

It’s not clear, however, whether the Apple implementation will allow you to use both a fingerprint and a password (or PIN) together.

Now specifically talking about the iPhone’s implementation of a fingerprint sensor, there are some interesting points to note. First, building it into the phone makes up for some of the usual biometric disadvantages: enrollment, the need for special hardware sensors, and (because the data is stored only locally) privacy. Another interesting fact is that the phone itself is actually a third factor of authentication: something you possess. When combined with the other two factors it becomes an extremely reliable form of identification for use with other systems. A compromise would require being in physical possession of your phone, having your fingerprint, and knowing your PIN.

Ultimately, the security of the fingerprint scanner largely depends on the implementation, but even if it isn’t perfect, it is better than those millions of phones with no protection at all.

There is also the concern that some have brought up: is this just a method for the NSA to build a master fingerprint database? Apple’s implementation encrypts the fingerprint data and stores it locally using trusted hardware. Whether this is actually secure remains to be seen, but keep in mind that your fingerprints aren’t really that private: you literally leave them on everything you touch.

 

 

So What Exactly Did The US Government Ask Lavabit to Do?

The recent shutdown of Lavabit’s email services prompted a flurry of reporting and speculation about the extent of US Government spying, mostly due to the mysterious statement by Lavabit founder Ladar Levison:

Most of us saw this as yet another possibly overhyped government spying issue and didn’t really think too much of it. Much of the media coverage is already starting to die down, but there is still some question as to exactly what the government required of Levison that left him with only one option: shutting down the entire business he had built from the ground up. I wondered if there were enough clues out there to get some more insight into this case. I started by looking at exactly what Lavabit offered and how that all worked behind the scenes.

Lavabit Encryption

Lavabit claimed they had “developed a system so secure that it prevents everyone, including us, from reading the e-mail of the people that use it.” This is a bold claim and one that surely was a primary selling point for their services.

The way it worked is relatively simple: Lavabit encrypted all incoming mail with the user’s public key before storing the message on their servers. Only the user, with the private key and password could decrypt messages. Normally with encrypted email, users store private keys on their own computers, but it appears that in the case of Lavabit, they stored the users’ private keys, each encrypted with a hash of that user’s password. This is by no means the most secure way of doing this, but it dramatically increases transparency and usability for the user. By doing this, for example, users do not need to worry about private keys and they still have access to their email from any computer.

So let’s break this down: a user logs in with their password. This login might occur via POP3, IMAP4, or through the web interface (which in turn connected internally via IMAP). Because Lavabit used the user’s password to encrypt the private key, the server needed the original plaintext password, which means it could not support any of the more secure authentication methods that avoid sending the password itself. In other words, all clients had to send passwords using AUTH PLAIN or AUTH LOGIN with nothing more than base64 encoding. The webmail interface appears to have been available as both SSL and non-SSL, and the POP3, IMAP4, and SMTP interfaces all seem to have accepted connections with or without SSL. All SSL connections terminated at the application tier.
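To show how little protection base64 gives on its own, here is a minimal Python sketch of an AUTH PLAIN argument being built and then reversed. The function names and the sample credentials are made up for illustration, and the sketch assumes the standard SASL PLAIN layout with an empty authorization identity; it is not taken from Lavabit’s code.

```python
import base64


def auth_plain_token(username: str, password: str) -> str:
    """Build the base64 argument for SASL PLAIN (authorization id left empty)."""
    raw = b"\0" + username.encode("utf-8") + b"\0" + password.encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_auth_plain(token: str) -> tuple[str, str]:
    """Recover the username and password from a captured token."""
    _authzid, username, password = base64.b64decode(token).split(b"\0")
    return username.decode("utf-8"), password.decode("utf-8")


# Hypothetical credentials, for illustration only.
token = auth_plain_token("alice@example.com", "correct horse")
print(token)
print(decode_auth_plain(token))
```

Anyone who can see the token, whether on an unencrypted connection or on the server after SSL is terminated, gets the password back with a single decode call.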

Once a user sends a password, the Lavabit servers create SHA-512 hashes explained as follows:

… Lavabit combines the password with the account name and a cryptographic salt. This combined string is then hashed three consecutive times, with the former iteration’s output being used as the input value of the next iteration. The output of the first hash iteration is used as the secret passphrase for AES [encryption of the private key]. The third iteration is stored in our password database and is used to verify that users entered their password correctly.

The process they describe produces two hashes: one for decrypting the user’s private key and, after two more hashing iterations, one to store in the database for user authentication. While this is a fairly secure process, given strong user passwords, it does weaken Lavabit’s claim that even their administrators couldn’t read your email. In reality, all it would take is a few lines of code to log the user’s original password, which allows you to decrypt the private key, which in turn allows you to receive and send mail as that user as well as access any stored messages.
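The hash chain Lavabit describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quoted description does not say in what order the password, account name, and salt are combined or how they are encoded, so the concatenation below is a guess, and the function name is hypothetical.

```python
import hashlib


def lavabit_style_hashes(account: str, password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Return (key_passphrase, stored_verifier) from three chained SHA-512 rounds."""
    # Assumption: simple concatenation of password, account name, and salt.
    combined = password.encode("utf-8") + account.encode("utf-8") + salt
    first = hashlib.sha512(combined).digest()   # used as the AES passphrase
    second = hashlib.sha512(first).digest()     # intermediate round
    third = hashlib.sha512(second).digest()     # stored to verify logins
    return first, third


# Hypothetical account, password, and salt.
passphrase, verifier = lavabit_style_hashes("alice", "correct horse", b"example-salt")
print(passphrase.hex())
print(verifier.hex())
```

The stored third-round value cannot be run backwards to the first, which is what protects the private key at rest. The server still computes the first-round value from the plaintext password at every login, and that is the moment where a few added lines of logging would defeat the scheme.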


It is important to note that the scope of Lavabit’s encryption was limited to storage on its own servers. The public keys were for internal use and not something you published for others to use. Full protection would require employing PGP or S/MIME and having untapped SSL connections between all intermediate servers. On the other hand, if an email was sent through Lavabit already using PGP or S/MIME encryption, they would never be able to intercept or read those emails.

The question here is what exactly did the government request Levison to do that was so bad that he’d rather shut down his entire business? What information could Lavabit even produce that would be of interest to a government agency? The candidates are unencrypted emails, customer IP addresses, customer payment methods, and customer passwords. Based on media statements, it appears that he would be required to provide unencrypted copies of all emails going through his system.

Let’s look at some quotes Levison has given to various media outlets. First, here are some quotes from an interview with CNET:

“We’ve had a couple of dozen court orders served to us over the past 10 years, but they’ve never crossed the line.”

“Philosophically, I put myself in a position that I was comfortable turning over the information that I had. I built Lavabit in a reaction to the original Patriot Act.”

“Where the government would hypothetically cross the line is to violate the privacy of all of my users. This is not about protecting a single person or persons, it’s about protecting all my users. What level of access to this nation does the government have?”

“Why should I collect that info if I didn’t need it? [That philosophy] also governed what kind of information I logged.”

“Unfortunately, what’s become clear is that there’s no protections in our current body of law to keep the government from compelling us to provide the information necessary to decrypt those communications in secret.”

“If you knew what I know about e-mail, you might not use it either.”

In an article from NBC News, we have this:

Levison stressed that he has complied with “upwards of two dozen court orders” for information in the past that were targeted at “specific users” and that “I never had a problem with that.” But without disclosing details, he suggested that the order he received more recently was markedly different, requiring him to cooperate in broadly based surveillance that would scoop up information about all the users of his service. He likened the demands to a requirement to install a tap on his telephone. Those demands apparently began about the time that Snowden surfaced as one of his customers, apparently triggering a secret legal battle between Levison and federal prosecutors.

And finally in an interview with RT he said:

“I think the amount of information that they’re collecting on people that they have no right to collect information on is the most alarming thing,” he told RT. “I mean, the Fourth Amendment is supposed to guarantee that our government will only conduct surveillance on people in which it has a probable suspicion or evidence that they are committing some crime, and that that evidence has been reviewed by a judge and signed off by a judge before that surveillance begins. And if there’s anything alarming, it’s that now that’s all being done after the fact. Everything’s being recorded, and then a judge can after the fact say it’s okay to go look at the information.”

Given the above information, let’s analyze some of the facts we know:

  • The government asked Lavabit to do something which Levison considered to be a crime against the American people.
  • Levison was comfortable with, and had complied with, warrants requesting information on specific users.
  • Levison told Forbes that “This is about protecting all of our users, not just one in particular.”
  • Levison is not even able to reveal some details to his own attorney or employees.
  • Shutting down operations was an option to circumvent compliance, although there was a veiled threat he could be arrested for doing so.
  • He did not delete customer data and still has it in his possession, so this was a request for ongoing surveillance.
  • This was a court order, which Levison is fighting through the US Court of Appeals for the Fourth Circuit.
  • Levison compared the request to installing a tap on his telephone.

Apparently what made Levison uncomfortable with the request was the fact that it collected information about all users, without regard to a warrant. Presumably law enforcement wanted to collect all the data now and view it retroactively, as necessary, once they had a warrant. The two issues here are that the Government wanted to collect information on innocent users (including Levison himself) and Levison would be out of the loop completely, taking away his control over what information he provided to law enforcement. These were the lines the Government crossed.

What’s interesting here is that Lavabit terminated the SSL connections right on the application servers themselves. These are the servers that also performed the encryption of email messages. Because of that, a regular network tap would be ineffective. The only way to perform the broad surveillance Levison objected to would be (in order of likelihood):

  1. Force Lavabit to provide their private SSL keys and route all their traffic through a government machine that performed a man-in-the-middle style data collection;
  2. Change their software to subvert Lavabit’s own security measures and log emails after SSL decryption but before encrypting with the users’ public keys; or
  3. Require Lavabit to install malicious code to infect their own customers with government-supplied malware.

Sure, this could have been a simple request to put a black box on Lavabit’s network and Levison is just overreacting, but the evidence doesn’t seem to indicate that. Regardless of which of the requests the Government made, they would all make Levison’s entire business a lie; all efforts to encrypt messages would be pointless. Surely there were some heated words spoken when the Department of Justice heard about Levison’s decision, but this is not an act of civil disobedience on Levison’s part; his personal integrity was on the line. Compliance would make his very reason for running Lavabit a deception, a government-sponsored fraud.

While Lavabit initially had quite a bit of media coverage over this issue, the hype seems to be a casualty of our frenzied news cycle. But after looking closely at the facts here, I now see that this is a monumentally important issue, one that the media needs to once again address. The message here is that US courts can force a business to subvert their own security measures and lie to their customers, deliberately giving them a false sense of security. Companies can say what they want about security on their web sites; it means nothing. If they did it to Lavabit, how many hundreds or thousands of other US companies already participate in this deception?

If the courts can force a business to lie, we can never again trust the security claims of any US company. The reason so many businesses specifically rely on US services is the sense of stability and trust. How sad that an overreaching and panicked pursuit of a whistleblower has thrown that all away.

This issue is so much more than a simple civil liberties dispute, it is the integrity of a nation at stake. We walked with the devil in a time of need–that is a legacy we must live with–but at what point do we sever that relationship and return to the integrity required to lead the world through respect and not by fear?

 

Lavabit, 2004-2013

 

UPDATE: Since this post was published, a Wired article has revealed that Lavabit was in fact required to supply their private SSL keys, as suspected above.

 

Should You Ditch LastPass?

Steve Thomas, aka Sc00bz, has brought up some very interesting issues about the LastPass password manager that are causing some confusion, so I thought I’d give another perspective on the issue.

Summary of Steve’s points:

  1. When you use the LastPass web site to log in to your account, your web browser will first send a hash with a single iteration, no matter how many iterations you have set for your account. It isn’t until this hash fails that the server tells the browser the correct number of iterations to use.
  2. LastPass has a default setting of 500 iterations (at least at that time; it now recommends 5,000 iterations).
  3. The extension should warn you if it is going to send a hash with fewer iterations than what you have set.
  4. LastPass does not encrypt the URLs of sites stored in your password database.

LastPass hashes your password rather than sending the plain text to the server when you log in. The algorithm it uses is sha256(sha256(email + password) + password). This hash, while not necessarily insecure, can be cracked in a reasonable amount of time with ordinary hardware, unless the user has a relatively strong password. It isn’t until after this single-iteration hash is sent that the LastPass server responds and tells the browser exactly how many iterations it should use; the hash is then sent again using the correct number of iterations. More iterations means it will take much more time to crack your password. A good minimum number of iterations is 5,000. If you go too high with the number of iterations, some clients such as mobile phones may be very slow logging in.
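As a rough illustration, the single-iteration formula quoted above could be written in Python like this. The formula does not say how the inner hash is encoded before the password is appended, so the hex encoding here is an assumption, as are the function name and the sample credentials.

```python
import hashlib


def single_iteration_login_hash(email: str, password: str) -> str:
    """sha256(sha256(email + password) + password), inner hash assumed to be hex text."""
    inner = hashlib.sha256((email + password).encode("utf-8")).hexdigest()
    return hashlib.sha256((inner + password).encode("utf-8")).hexdigest()


# Hypothetical credentials, for illustration only.
print(single_iteration_login_hash("alice@example.com", "correct horse"))
```

Each guess costs an attacker only two SHA-256 operations, which is why a captured single-iteration hash is so much cheaper to attack than one produced with thousands of iterations.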


The mitigating factors here are:

  1. You are logging in via SSL so the primary threats here are a MitM attack with spoofed SSL certificates, a government warrant, or a government spy agency.
  2. They still need to crack your hash so if you have a very strong password, even a single iteration hash could provide a reasonable amount of protection.
  3. A second factor of authentication, country restrictions, blocking Tor logins, restricting mobile access, and other settings still protect your account from unauthorized logins, unless the attacker is able to obtain your stored hashes through hacking, warrant, or spying.

One thing I might also add is that the server is telling the client how many iterations it expects, so this does make an attack much easier if someone acquires your hash.

My opinion is that this is an issue that certainly should be addressed, but it is not serious enough to warrant abandoning LastPass altogether unless your individual threat model includes the NSA or other government agencies. The LastPass plugin should identify when someone is logging in to LastPass via the web login and provide the client-side script with the correct number of iterations. The server should never respond with this at all. However, if someone is logging in through a web browser that doesn’t have LastPass with their account data installed, the server sending the number of iterations is the only option.

The only proper solution here is to have your primary login different than the decryption login, at least for accessing the web interface if not everywhere. That way, the number of iterations is never publicly revealed and sending a single iteration hash would be unnecessary. Other companies such as RoboForm use this method. I have always wanted this feature and I would highly recommend LastPass implement this if it is feasible.

As for the other points, the default iteration count mentioned in number 2 has been addressed, and the warning mentioned in number 3 would be a good thing to add if it hasn’t been already, although this would not be possible when using a web browser without your LastPass account installed.


As for the unencrypted URLs mentioned in number 4, LastPass’s response was that leaving them unencrypted is necessary to grab favicons. Although unencrypted URLs may not be an issue, there certainly are scenarios where you would want these encrypted. LastPass should make this an option for the user.

LastPass does provide strong security controls, although there clearly is room for improvement. If you do not find LastPass to be secure enough, the only reasonable alternative I would recommend is KeePass, which puts you in complete control over your data while still being quite usable. I would not recommend ditching LastPass, but I would recommend that LastPass address these issues. I would also recommend that Steve Thomas keep up the great research he provides to the community.

I have not heard a recent response from LastPass on these issues but would love to hear from them. I will update this post if and when I do.


Disclosure: I am a LastPass user and I get a free month of premium service whenever someone clicks on banners located on this site. I have no affiliation with LastPass and receive no other compensation from the company. 

93% of the Top 10,000 in the LinkedIn List

I would like to welcome LinkedIn to the not-so-exclusive club of major web sites that have experienced major password leaks. As with any other major leak, it is hard to visit any forum or tech blog without seeing some mention of it. And as with any other leak, my inbox is starting to fill up with press requests for comments.

But what is interesting here is that there’s nothing interesting here. It’s the same thing we have seen so many times in the past and surely will continue to see.

One thing that highlights this, brought up to me by blogger Johnvey Hwang, is that 93% of the passwords in my Top 10,000 passwords list appear in the LinkedIn hashes dump. Here it is in his words:

I was curious as to what percentage of the most common passwords were present in this dump, as a proxy for gauging the password choices for a supposedly more professional population. A quick search led me to security guy Mark Burnett, who maintains a list of the top 10,000 most used passwords across the internet. He admits to some skew caused by a significant amount of sourcing from adult websites, but I don’t think it really matters.

The fact that such a large number of the LinkedIn passwords appear on the top 10,000 list certainly does help validate my data but more importantly it shows that despite all we have learned, very little has ever changed.
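A check like the one Johnvey describes can be sketched in Python as follows. This assumes the dump consists of unsalted SHA-1 hashes, which the text above does not state, and it uses a tiny made-up dump and list in place of the real data; the function names are hypothetical.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Unsalted SHA-1 of a password, as lowercase hex (assumed dump format)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def fraction_present(common_passwords, dumped_hashes) -> float:
    """Fraction of the common passwords whose hash appears in the dump."""
    dumped = {h.lower() for h in dumped_hashes}
    hits = [p for p in common_passwords if sha1_hex(p) in dumped]
    return len(hits) / len(common_passwords)


# Toy stand-ins for the leaked hashes and the common-password list.
dump = [sha1_hex(p) for p in ["linkedin", "password1", "monkey"]]
top_list = ["password1", "monkey", "dragon", "letmein"]
print(fraction_present(top_list, dump))  # 0.5
```

Because unsalted hashes are the same for every user who picks the same password, a single pass over the list is enough to test it against the entire dump.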

Here are some other interesting facts Johnvey discovered about the list:

  • 7,142 of the most common passwords were present
  • 546 of the most common passwords were not present
  • 2,312 of the most common passwords were too short for LinkedIn’s 6 character minimum

I think that 93% is an amazing number, yet again, the biggest story here is that nothing really has changed.
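The three counts above add up to exactly 10,000, and the 93% figure appears to be measured against only those passwords long enough to be allowed on LinkedIn. That reading is my assumption rather than something Johnvey states; a few lines of Python show the arithmetic.

```python
present = 7_142     # common passwords found in the dump
absent = 546        # common passwords not found
too_short = 2_312   # shorter than LinkedIn's 6 character minimum

total = present + absent + too_short
eligible = present + absent   # passwords that could have been used at all

print(total)                          # 10000
print(f"{present / eligible:.1%}")    # 92.9%
print(f"{present / total:.1%}")       # 71.4%
```

So roughly 93% of the common passwords that LinkedIn would have accepted were in use by at least one person in the dump.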

 

Sidenote:

I personally have three LinkedIn accounts that I maintain. None of those three passwords appear on the list. Apparently the list is not complete, but the question now is what criteria put those particular passwords on the list.

My Password is 4.hub.route.edu.

Password security has always been a hot issue but events in the last few years have made it an even more pressing issue to a greater number of people. When I hear receptionists in a doctor’s office sharing strategies for creating secure passwords I know this is now beyond the realm of network administrators and security professionals.

But one thing I have noticed is that many people don’t truly understand why one password can be so much stronger than another, so I thought I would walk through the process of cracking a password. In this case, I decided to use as an example the very password that (until I wrote this) I used for the admin account on this blog.

So like I said in the title, my password is 4.hub.route.edu.

That isn’t the best password I have come up with but it is still fairly strong. It is 15 characters long, contains a number, letters, and some periods. It took me just a couple logins to actually memorize that password. The word components are fast to type because we are trained to type in whole words. And there are four parts, each one ending with a period. The repetition of the period helps the memory process.

Chances are that no one would be able to go to my admin page (which itself is protected by a different password) and just guess that, no matter how much they knew about me and no matter how many of my other passwords they knew because I have never used that password anywhere else. As of writing this article, I can do a Google search for “4.hub.route.edu” and there will be no results.

But the real risk isn’t someone being able to keep trying to guess my password via the admin page, the real risk is someone finding a new 0-day exploit that allows them to dump the users table in my database and get the hash of my password (which happens to be $P$9YCJ/QwbFcgbo7OtfWGYYE8sVJBxtF/). If someone can get your hash, they can now try millions of password combinations without you ever knowing it.

Cracking a password hash is a lot like trying keys in a lock. A hash is a string of characters derived from your password that is calculated in such a way that it is nearly impossible to work backwards to discover the original password so it is relatively safe to store. When you log in to a system, it will run the password you enter through this same complex formula and the result should be the same.

So when I first created my password on this blog I entered 4.hub.route.edu. WordPress ran it through these formulas and came up with the hash $P$9YCJ/QwbFcgbo7OtfWGYYE8sVJBxtF/ which it saved in the database. The next time I log in, I enter my 4.hub.route.edu password, WordPress runs the same formula on that password and it comes up with $P$9YCJ/QwbFcgbo7OtfWGYYE8sVJBxtF/ which matches the hash it has stored, so it knows that I am using the correct password even though WordPress never stored my actual password. Now what is special about these formulas is that it is extremely rare that any two passwords will create the exact same hash (an event known as a collision).
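The store-then-compare flow looks roughly like this in Python. WordPress uses PHPASS, which is not in Python’s standard library, so this sketch swaps in PBKDF2 from hashlib as a stand-in; the function names, salt size, and iteration count are all illustrative choices, not what WordPress does.

```python
import hashlib
import hmac
import os


def make_record(password: str, iterations: int = 10_000):
    """Hash a new password; only the salt, iteration count, and digest are stored."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def check_password(candidate: str, record) -> bool:
    """Re-run the same formula on the login attempt and compare the results."""
    salt, iterations, stored = record
    digest = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(digest, stored)


record = make_record("4.hub.route.edu")
print(check_password("4.hub.route.edu", record))  # True
print(check_password("4.hub.route.com", record))  # False
```

The password itself is never written anywhere; the site only ever keeps the result of the formula.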

So if someone is able to obtain my hash, they can’t directly get my password from that, but they can try millions or even billions of different passwords and run each one through the formula until they find one that produces that exact same hash. It is a lot like having a lock, you can’t easily create a key from it but you can try a bunch of keys until you find the one that works.

Now when it comes to passwords there are actually hundreds of trillions of possible passwords someone might choose. Even with a cluster of powerful computers it could take decades to try every possible password. Fortunately for hackers, most people aren’t that clever with their passwords. There are a number of strategies hackers use that can drastically reduce the number of passwords they need to test to crack a password. Below are those strategies.

1. Hash Lookup

First, an attacker will check to see if someone else has cracked the password before, using either a local database or an online database such as onlinehashcrack.com or hash-database.net or one of the hundreds of other similar sites. In the past few years there have been many large sites that have been hacked and their passwords leaked. If your password was ever one of these, chances are it will appear in one of these databases. Likewise, if you select a common password that others may also be using, it also might be on this list.

In the case of WordPress, the hashes are created using PHPASS but for the sake of this example, let’s just assume they use MD5 hashes like many other systems use. The MD5 hash for my password 4.hub.route.edu is 7914881ba9b78fa307db6ef0db675e29. You can search any online databases for my hash and you will not find it listed anywhere (at least at the time of writing). If your password is one that you have never used before and others likely have not used, you should be safe (try googling one of your passwords, you may be surprised how many results you get).

If your password hash does not appear in one of these databases, there are also rainbow tables which are massive databases of precomputed hashes consisting of every possible password up to 8-10 characters in length, depending on the algorithm. If your password is less than eight characters long, your password surely will be cracked at this stage. However, you will not find 7914881ba9b78fa307db6ef0db675e29 in any of those databases so I am safe so far.
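A hash lookup is conceptually nothing more than a dictionary keyed by hash. The Python sketch below builds one from a handful of known passwords; it is a plain lookup table rather than a true rainbow table, which uses a more compact chained structure, and it follows the article’s simplification of using MD5. The function names and the tiny password list are made up for illustration.

```python
import hashlib


def md5_hex(password: str) -> str:
    """MD5 of a password as lowercase hex."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def build_lookup(known_passwords):
    """Precompute hash -> password for every password already known to attackers."""
    return {md5_hex(p): p for p in known_passwords}


# Toy stand-in for a database of previously cracked or leaked passwords.
lookup = build_lookup(["123456", "password", "NCC-1701"])

print(lookup.get(md5_hex("password")))         # password
print(lookup.get(md5_hex("4.hub.route.edu")))  # None
```

The expensive hashing work is done once, in advance, so checking a stolen hash against the table is effectively instant.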

The lesson here is to never use a password less than ten characters long. Never use the same password on multiple systems. Don’t try to be clever with your password, that never works (NCC-1701 is a very common password).

2. The Word List

Since most passwords consist of dictionary words or something similar, checking every word in a dictionary or a specialized wordlist (http://svn.isdpodcast.com/wordlists/) is a quick way to find a weak password. Most hackers will use lists of the most common passwords such as these because chances are very high that someone will be using one of those passwords. It normally doesn’t take more than a minute to go through even a gigantic list of words.
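A wordlist attack is just a loop: hash each word and compare. Here is a minimal Python sketch, again assuming unsalted MD5 for simplicity and using a short in-memory list where a real attack would read a large file; the names are hypothetical.

```python
import hashlib


def md5_hex(password: str) -> str:
    """MD5 of a password as lowercase hex."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def wordlist_attack(target_hash: str, words):
    """Return the first word whose hash matches the target, or None."""
    for word in words:
        candidate = word.strip()  # wordlist files usually have one word per line
        if md5_hex(candidate) == target_hash:
            return candidate
    return None


# Toy wordlist standing in for a large dictionary file.
words = ["monkey\n", "dragon\n", "letmein\n"]

print(wordlist_attack(md5_hex("dragon"), words))           # dragon
print(wordlist_attack(md5_hex("4.hub.route.edu"), words))  # None
```

Since each test is a single hash and a comparison, even a list with millions of entries is exhausted quickly.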

In my case, even a Google search for my password turns up nothing so even if you had the massive list of words that Google has indexed you still wouldn’t be able to crack my password.

Considering this, you can see why so many systems simply prohibit any password that is a dictionary word.

3. Rules and Patterns

If a dictionary or wordlist check fails, the next step is to try some of the common (albeit ineffective) tricks people use to make a password more complex. If you asked me what I thought was the most common password pattern I would say a proper noun (such as a name) followed by 2-3 numbers. So it would be smart for a hacker to take each word in a wordlist and add every possible number from 1 through 999. If that doesn’t work, you could try reversing each word or doing simple substitutions like using the number 3 instead of the letter e. It really does not take much effort for a cracking program to try hundreds of different patterns.

For example, a dictionary word may be “password” so a rules-based attack may try PASSWORD, dRowssap, P@SSW0RD, p@ssW0rd, dr0Wss@p, passwordpassword, @ssW0rdp, dp@ssW0r, p@9sW0rd, 1p@ssW0rd, p@$$W0rd, ppp@ssW0rd, and thousands of other variants of the word. Depending on the number of rules and the size of the wordlist, this step may take only five to ten minutes and will crack a great number of passwords.
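A handful of such rules can be expressed in a few lines of Python. The rule set below (case changes, reversal, doubling, a small substitution table, and numeric suffixes) is only a small illustrative sample chosen for this sketch; real cracking tools ship with far larger rule sets, and the function name is hypothetical.

```python
# Illustrative substitution table; real rule sets are much larger.
LEET = str.maketrans({"a": "@", "o": "0", "e": "3", "s": "$"})


def mangle(word: str, max_suffix: int = 999) -> set[str]:
    """Generate common variants of a single dictionary word."""
    variants = {
        word,
        word.upper(),          # PASSWORD
        word.capitalize(),     # Password
        word[::-1],            # drowssap
        word + word,           # passwordpassword
        word.translate(LEET),  # p@$$w0rd
    }
    # The "word followed by 1 through 999" pattern.
    for n in range(1, max_suffix + 1):
        variants.add(f"{word}{n}")
    return variants


candidates = mangle("password")
print(len(candidates))
print("p@$$w0rd" in candidates)  # True
```

Every variant is then hashed and compared exactly as in a plain wordlist attack, so each added rule multiplies the work by only a small factor.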

If an attacker has sufficient processing power, another effective strategy is to try two dictionary words together with various delimiters between them (such as dashes or periods). If you had a wordlist of 100,000 words and tried every combination of two words that means you would have ten billion possible combinations. Trying different delimiters between the words would make it a little bit harder but not much.

You probably wouldn’t want to try three-word combinations because that would take you up to a quadrillion (1,000,000,000,000,000) possible combinations which would not be an effective strategy. In the case of my password there is a number and three other words that would likely appear in a dictionary but testing for four-word combinations would mean there are 100 quintillion (100,000,000,000,000,000,000) possible passwords, so the odds are my password would still be pretty safe.

The lesson here is that a strong password is not a matter of being clever; it is a matter of beating the numbers. Passwords should always contain three or more words or other sequences.
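
The arithmetic behind those figures is worth seeing once. This short sketch uses the 100,000-word list from the example above; the function name and the choice of three delimiters are assumptions for illustration.

```python
def combinations(wordlist_size, words, delimiters=1):
    # Each word slot has wordlist_size choices, and each gap between
    # words has `delimiters` choices (1 means a single fixed delimiter).
    return wordlist_size ** words * delimiters ** (words - 1)

WORDLIST_SIZE = 100_000

for words in (2, 3, 4):
    print(words, f"{combinations(WORDLIST_SIZE, words):,}")
# 2 10,000,000,000
# 3 1,000,000,000,000,000
# 4 100,000,000,000,000,000,000

# Trying three different delimiters only triples the two-word search.
print(f"{combinations(WORDLIST_SIZE, 2, delimiters=3):,}")   # 30,000,000,000
```

Adding a delimiter choice multiplies the work by a small constant, while adding a word multiplies it by the size of the whole wordlist.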

4. Brute Force

If a password hash doesn’t show up in a database, hasn’t been cracked before, and does not show up in a list of common passwords or dictionary words (even after trying hundreds of common variants), the only method left is to simply brute-force the password. This means trying every possible combination of characters until you find the password. It would be like trying to crack a simple bicycle lock: you would start with 000 and try 001, 002, 003, and so on until you got to 999.

In the case of passwords you would need to try every combination of lowercase letters, uppercase letters, numbers, and punctuation symbols. In other words, imagine a bicycle lock where each dial contains abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!@#$%^&*()_-+={[}]|:;"'.?/ and there are eight or more dials. This is why so many systems require that you use a variety of characters, because using different types of characters is like making each dial larger. And making your password longer is like adding more dials.

Brute-force attacks are much smarter nowadays, using techniques such as mask-based attacks. These attacks use knowledge about how passwords are typically built to narrow the search. For example, if you look at this chart (http://xato.net/img/UpperCaseLettersLarge.jpg) you will see that uppercase letters are very likely to show up in position 1 but are extremely rare after position 8. Knowing this, it would be more effective to not even bother looking for uppercase letters after the first few characters. And if you look at the distribution of all character sets in this graph (http://xato.net/img/CharacterDistributionByPositionLarge.jpg) you can see that much can be done to optimize the brute-force process. Nevertheless, these rules become less and less effective the longer and more complex your password gets.
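
Here is a rough sketch of what a mask buys the attacker. The one-letter mask notation (u for uppercase, l for lowercase, d for digit) is made up for this example and is not the syntax of any particular cracking tool; the point is only that fixing a character class per position shrinks the search.

```python
import itertools
import string

# Hypothetical mask alphabet: one class letter per password position.
MASK_SETS = {
    "u": string.ascii_uppercase,
    "l": string.ascii_lowercase,
    "d": string.digits,
}

def mask_size(mask):
    # Number of candidates the mask covers.
    size = 1
    for symbol in mask:
        size *= len(MASK_SETS[symbol])
    return size

def mask_candidates(mask):
    # Lazily generate every candidate that fits the mask.
    pools = [MASK_SETS[symbol] for symbol in mask]
    for combo in itertools.product(*pools):
        yield "".join(combo)

# Uppercase first, lowercase in the middle, digits at the end.
print(f"{mask_size('ulldd'):,}")          # 1,757,600
print(f"{95 ** 5:,}")                     # 7,737,809,375 for a blind 5-character search
print(next(mask_candidates("ulldd")))     # Aaa00
```

For five characters the mask covers about 1.8 million candidates instead of the 7.7 billion a blind search over 95 printable characters would need (the 95 is an assumption about the character set).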

The big secret here is that if you can force a hacker to use a brute-force attack and you have a password that is at least 15 characters long, chances are that you have won. Eventually computing power will catch up so that even 15 characters might not be enough, but the good thing is that these numbers grow exponentially, so a 16-character password is almost 100 times stronger than a 15-character password and a 17-character password is more than 9,000 times stronger!
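
The "almost 100 times" and "more than 9,000 times" figures fall out of simple exponentiation. The sketch below assumes a character set of 95 printable ASCII characters; that number is my assumption, chosen because it is consistent with the figures above.

```python
CHARSET_SIZE = 95   # assumed: printable ASCII characters, including the space

def keyspace(length, charset_size=CHARSET_SIZE):
    # Every position (dial) can hold any character in the set.
    return charset_size ** length

# The bicycle lock: three dials with ten digits each.
print(keyspace(3, charset_size=10))          # 1000

# Each extra character multiplies the work by the size of the character set.
print(keyspace(16) // keyspace(15))          # 95
print(keyspace(17) // keyspace(15))          # 9025
```

This is also why length beats complexity: a bigger character set makes the base larger, but an extra character adds to the exponent.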

So What Makes a Password Strong?

Your password must be unique and one that you have never used before. In fact it should be so unique that if you did a Google search for it, there would never be any results. You can’t just take a word and dress it up a bit; you need 3-4 words or other sequences to make a password strong. And finally it has to be long. It helps to throw in some numbers and punctuation but most importantly it has to be long.

How I Collect Passwords

Some of you out there know that I have been collecting passwords for quite some time. Since 1998 to be exact. Originally I did it just to have big wordlists for password cracking, then I started gathering them for research on my Perfect Passwords book, and finally it became like a big ball of string where you just keep doing it because it makes no sense to stop now. My list currently contains about 6 million unique username/password combinations (not counting those from public lists from Gawker, RockYou, and others).

So I thought that some people might be interested in how I collect these passwords. Note that all of these passwords have already been made public and can easily be found by anyone. There are no passwords on my list that have not already been made public. Also note that so far I have never shared this list with anyone.

  1. I use tools such as Athena, which does massive Google searches for and collects passwords in the format “http://user:password@example.com/members”. This tool can easily gather 200,000 combos in a day but the majority of these are already in my database. I run this about once a month.
  2. I have a script that nightly leeches from a huge list of well-known password sharing web sites.
  3. I use a number of Google alerts that watch for common keylogger log formats. This is just one of many that I use. There are a surprisingly huge number of these logs that can be found via Google, although it is sometimes difficult to parse the passwords from the content.
  4. I use Google alerts to watch for SQL database dumps of forum and other common software databases.
  5. I also use Google alerts to look for passwords on pastebin.com and other related sites.
  6. I use a script that grabs all the Google alerts as RSS feeds and parses out URLs, then another script visits each site and leeches the passwords.
  7. I use RSS feeds from filestube.com to watch for and download password lists that might show up on a number of file sharing sites.
  8. I use RSS feeds from various torrent searches that I put into uTorrent to download automatically.
  9. I use a number of IRC bots that hang out in a large number of IRC channels where password sharing happens. These aren’t as effective as they once were but I still use them occasionally.
  10. I use a script to automatically download posts from various Usenet newsgroups, although most of those are just spam nowadays.
  11. I visit a number of public and private hacking-related forums to get wordlists and hacked passwords. I often pay for VIP memberships (usually the lifetime ones) so that I can access premium content areas. Leeching from forums has to be done manually, because you often have to comment on posts to be able to download the lists, but occasionally I will spend half a day leeching from these forums. Some forums will let you subscribe to posts and will include the entire post contents in the email. This bypasses the often-used “hide hack” and I can just use another script to save that inbox to local files.
  12. I use various FTP search engines to watch for interesting filenames that might show up on FTP sites.
  13. In the past I have used various P2P networks (such as LimeWire) to search for files but those don’t produce many results nowadays.
  14. Every once in a while someone will send me a big dump of their own lists they have collected.

As these scripts collect data, it is all dumped into a directory on my hard drive, and I regularly run a program I wrote that parses all the data looking for passwords in common formats.

Here are some examples of what the program recognizes:

http://www.example.com/members/ L:user1 P:password1
http://www.example.com/members login:user1 password:password1
http://www.example.com/members user: user1 pass:  password1
Login: user1 passw:password1
L:user1 P:password1
username:user1 password:password1
http://www.example.com/members L: user1 P:  password1
username = user1  password= password1
u=user1 p=password1
username    user1  password    password1
login id: user1 password: password1

It grabs the username/password combos and saves them into a text log file. After a while these files accumulate and I merge them into my master database. In the database I perform cleanup steps such as removing passwords from well-known password hackers (such as pr0test) and other junk that might appear. I also strip domain names off usernames that are email addresses.
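
To give an idea of what such a parser involves, here is a minimal sketch that handles the sample formats listed above. It is not the actual program, only an illustration: the regular expression, the function names, and the decision to take one combo per line are all assumptions.

```python
import re

# Username label, separator, username, then password label, separator, password.
COMBO_RE = re.compile(
    r"\b(?:username|user|login\s+id|login|l|u)"   # username labels seen in the samples
    r"(?:\s*[:=]\s*|\s+)"                          # ':' or '=' or plain whitespace
    r"(?P<user>\S+)"
    r"\s+"
    r"(?:password|passw|pass|p)"                   # password labels seen in the samples
    r"(?:\s*[:=]\s*|\s+)"
    r"(?P<password>\S+)",
    re.IGNORECASE,
)

def strip_domain(username):
    # Turn an email-style username into its local part.
    return username.split("@", 1)[0]

def extract_combos(text):
    """Return (username, password) pairs, at most one per line."""
    combos = []
    for line in text.splitlines():
        match = COMBO_RE.search(line)
        if match:
            combos.append((strip_domain(match.group("user")), match.group("password")))
    return combos

sample = """http://www.example.com/members/ L:user1 P:password1
username = user1  password= password1
login id: user1 password: password1"""
print(extract_combos(sample))
```

A pattern this loose will also pick up junk from ordinary text, which is one reason a cleanup pass in the database is still needed afterwards.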

What is interesting about all this is how difficult it is to find new username/password combos that aren’t already on my list. These scripts can easily collect 100,000 unique username/password combos every day, but only a few thousand of those are not already on my list.

After 12+ years of collecting passwords, I have found a few interesting facts:

  • Although my list contains about 6 million username/password combos, the list only contains about 1,300,000 unique passwords.
  • Of those, approximately 300,000 passwords are used by more than one person; about 1,000,000 only appear once (and a good portion of those are obviously generated by a computer).
  • The list of the top 20 passwords rarely changes and 1 out of every 50 people uses one of these passwords.
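
Statistics like these are straightforward to compute from a combo list. The sketch below uses a four-entry made-up sample and hypothetical function names; it shows the counting, not my actual data or tooling.

```python
from collections import Counter

def password_stats(combos):
    """Summarize a list of (username, password) pairs."""
    counts = Counter(password for _user, password in combos)
    shared = sum(1 for n in counts.values() if n > 1)
    return {
        "combos": len(combos),
        "unique_passwords": len(counts),
        "shared": shared,                      # used by more than one account
        "used_once": len(counts) - shared,
    }

def top_share(combos, top_n=20):
    # Fraction of accounts that use one of the top_n most common passwords.
    counts = Counter(password for _user, password in combos)
    covered = sum(n for _password, n in counts.most_common(top_n))
    return covered / len(combos)

# Made-up sample data for illustration only.
sample = [("ann", "123456"), ("bob", "123456"), ("cat", "qwerty"), ("dan", "k3!vT9zq")]
print(password_stats(sample))
print(top_share(sample, top_n=1))   # 0.5
```

On a real list, a top-20 share of 0.02 is the same thing as 1 out of every 50 people.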

There are a few flaws with my list that I should point out:

  • Many of these passwords have been cracked from hashes so a good percentage of them would by nature be crackable, skewing the statistics some.
  • These passwords are largely dominated by passwords from adult web sites, which are the ones mostly publicly shared. This results in a higher percentage of adult-related and obscene passwords.
  • These passwords are usually from web sites that often do not enforce the strong password policies that a private organization might. This is bad because this data doesn’t truly reflect all passwords, but on the other hand it shows the kind of passwords users will select if a password policy is not enforced.
  • My scripts only grab usernames and passwords between 3 and 30 characters long, all others are thrown out.
  • None of the passwords contain a colon, because that is the delimiter used to separate usernames and passwords in the combo lists my scripts generate.
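
The last two limitations come straight from how a colon-delimited combo list works. Here is a minimal sketch of the filtering, with hypothetical function names; splitting on the last colon when reading a line back is my own choice, and it is safe only because passwords containing a colon are never written.

```python
MIN_LEN, MAX_LEN = 3, 30

def to_combo_line(user, password):
    """Return 'user:password', or None if the pair should be thrown out."""
    for field in (user, password):
        if not MIN_LEN <= len(field) <= MAX_LEN:
            return None
    if ":" in password:
        # A colon in the password would make the line ambiguous.
        return None
    return f"{user}:{password}"

def from_combo_line(line):
    # The password never contains a colon, so the last colon is the delimiter.
    user, _sep, password = line.rpartition(":")
    return user, password

print(to_combo_line("user1", "password1"))   # user1:password1
print(to_combo_line("user1", "pa:ss"))       # None
print(from_combo_line("user1:password1"))    # ('user1', 'password1')
```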

So that is how I collect my passwords. Maybe someday I will share the list itself.

Incidentally, the one tool I really wish I had time to build is either a proxy server or a Greasemonkey script that will automatically parse and log usernames and password combos from web pages that you visit. That would be extremely helpful!

Update (4/25/12): Google has recently changed things in ways that broke several of the tools listed here. Now I collect many of my passwords using Google alerts and custom searches turned into RSS feeds and automatically added into a private WordPress blog via AutoBlogged. Before each post is added it runs through a tool I have developed (which I will share eventually) that returns just the username/password combos. I can then use the RSS feed from that private blog as a raw combos list to merge into my master list.

Another Strange Password Policy

It still amazes me that after all the education over the years there are still so many poor password policies out there. Anyone who has ever filled out a web form has likely run into these overly complex and frustrating password policies.

But sometimes a password policy is an indication of a bigger problem. For example, today I was setting up an account and entered a very strong password and was presented with the following error message:

Apparently what caused the error is that I used a period in my password and this policy only allows for numbers and letters. But the bigger question here is why doesn’t the policy allow for punctuation? Why does the password have to start with a letter? And why is there a limit of 20 characters?

The reason these concern me is that they sound more like technical limitations than rules motivated by strict password security. Normally when you store a password, you first create a hash of the password and then store the hash. The nice thing about hashes is that, because they are hexadecimal values, you don’t have to worry about the security risks of special metacharacters and symbols. You also don’t care about maximum length because the hash is always a fixed length, whether your password is 10 characters or 100 characters.
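
A few lines of Python show both properties. SHA-256 is used here only as a convenient illustration; real systems generally add a per-user salt and use a deliberately slow password-hashing function, and nothing here reflects what this particular site does.

```python
import hashlib
import string

def hash_password(password):
    # Illustration only: a bare, unsalted SHA-256 is not a recommended
    # way to store passwords, but it shows the shape of the output.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

short_hash = hash_password("abc.def-10")
long_hash = hash_password("x" * 100 + "'; <script> & other symbols")

# Same length no matter how long the password was.
print(len(short_hash), len(long_hash))               # 64 64

# Only hexadecimal characters, whatever symbols went in.
print(set(long_hash) <= set(string.hexdigits))       # True
```

Since the stored value never contains the user’s punctuation and never grows with the password, a site that hashes has no storage reason to ban periods or cap the length at 20.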

However, if you are worrying about non-alphanumeric characters and the password length, that sounds suspiciously like the password itself is being stored, not a hash of the password. For a healthcare organization this is a big deal. And although we like to think that most big companies have security teams that prevent things like this, the recent announcement that Sony stored passwords in plaintext tells us otherwise.

Secret Questions
A little lower on the page I ran across another problem: they are letting the users select the secret question as shown here:

The problem with this is that most users are not qualified to come up with quality secret questions. At best you will see questions asking for their favorite color (how many colors are there really?), their dog’s name (just look on their Facebook profile), or where they live. To make matters worse, a shockingly high number of people will actually put the answer in the question itself, as a hint. If you disagree with me on this and have a site that lets users set their own secret questions, check your database and you will be surprised how bad they can be and how often they reveal the answer in the question itself. This is one of those little secrets that hackers have known for years.

Finally, the worst offense is the way in which this site lets you recover your password:

If you click on the Forgot Password link, you are shown the above form which lets you set a new password if you know the answer to the secret question. The problem with this is that it makes the secret question as powerful as the password itself, because just knowing the answer lets you set a new password. Normally, answering a secret question will initiate a process whereby the registered email account receives a message that a password reset was initiated and that the user needs to click a link to finish the process.

We do this because secret questions are not secure. The information in a secret question is something that is easily discoverable, has a limited number of possible answers, and is a fact that will never change. We can only partially compensate for these problems by sending an e-mail notification to the user and requiring a click through.

In this particular case, the user sets their own secret question which has a high chance of being insecure and then all that is needed to set a new password is to be able to guess that answer, no email access required.
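
For contrast, the normal two-step flow can be sketched in a few lines. Everything here is hypothetical: the function names, the in-memory storage, the one-hour lifetime, and the example.com link are placeholders, and a real site would send actual mail and keep tokens in its database.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 3600     # hypothetical lifetime of a reset link
pending_resets = {}          # token -> (username, expiry timestamp)
outbox = []                  # stand-in for a real mail system

def start_reset(username, email, answer_is_correct):
    """Step 1: a correct answer only triggers an email; it never sets a password."""
    if not answer_is_correct:
        return False
    token = secrets.token_urlsafe(32)    # unguessable random token
    pending_resets[token] = (username, time.time() + TOKEN_TTL_SECONDS)
    outbox.append((email, f"https://example.com/reset?token={token}"))
    return True

def finish_reset(token, new_password, set_password):
    """Step 2: only someone who can read the registered inbox has the token."""
    username, expires = pending_resets.pop(token, (None, 0))
    if username is None or time.time() > expires:
        return False
    set_password(username, new_password)
    return True
```

With this split, guessing the dog’s name gets an attacker nothing more than an email sent to the real owner, who is also alerted that someone tried.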

The sad thing is this company has already appeared in this database once before. I hope it doesn’t happen again, especially not with my medical information.

Does Windows Server 2003 Even Need Hardening?

Many people tell me they are surprised with how much effort I put into hardening Windows Server 2003–the last hardening document I wrote for a client was 112 pages long. That’s not 112 pages of writing, policy, and how-to’s, that’s 112 pages of nothing but settings. The process itself involves the modification, removal, or locking down of over 5,000 Registry keys and system files.

Long passwords are strong passwords

I noticed that Schneier wrote a bit on choosing passwords and gets into some detail on how to secure a password based on some of the techniques used to crack passwords.

His specific advice is:

“…if you want your password to be hard to guess, you should choose something not on any of the root or appendage lists. You should mix upper and lowercase in the middle of your root. You should add numbers and symbols in the middle of your root, not as common substitutions. Or drop your appendage in the middle of your root. Or use two roots with an appendage in the middle.”

While I certainly do agree with the validity of this advice, if you are an administrator, I wouldn’t recommend telling users to “drop their appendages in the middle of their roots.” Here’s some more practical advice: tell them to choose long passwords.