A Glimpse Into the World of Internet Password Dumps

Reading the comments around the internet about my release of 10 million passwords, I realized that perhaps some people don’t quite grasp how bad the situation really is. It’s really bad. The target audience of my original article was IT security professionals and network administrators who see this stuff on a daily basis, but the news of my data release has reached far beyond that audience, which has brought to my attention some misunderstanding of the context of the release. I thought it might be helpful to give people a glimpse into what I see as I collect passwords.

My main source of passwords in the last few years has been Pastebin and similar sites. Pastebin is a web site where you can paste text data, anonymously if you wish, to share with others. There are Twitter bots and web sites that monitor new pastes and look for hackers leaking (or dumping) sensitive data they have stolen. On a typical day it is common to see around a hundred of these leaks; about half of them contain both usernames and unencrypted passwords (often referred to as combos), which I collect.

Dump Monitor Twitter Bot

Because the data may show up in various formats, I have to parse it, which I do with a tool I wrote named Hurl. This tool recognizes many different dump formats and parses out the usernames and passwords. Here is an example of the types of formats it recognizes.
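Hurl itself isn’t shown here, but the core of such a parser is easy to sketch: try a list of known delimiters and layouts until one yields a plausible username/password pair. The delimiter list and function names below are my own illustration, not Hurl’s actual code:

```python
# Common dump layouts: "user:pass", "user;pass", "email<TAB>pass", "user pass".
# Order matters: try the most specific delimiters first.
DELIMITERS = [":", ";", "\t", " "]

def parse_combo(line):
    """Try each known delimiter; return (username, password) or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    for delim in DELIMITERS:
        if delim in line:
            user, _, password = line.partition(delim)
            user, password = user.strip(), password.strip()
            if user and password:
                return user, password
    return None

def parse_dump(text):
    """Parse a whole paste, skipping lines that match no known format."""
    combos = []
    for line in text.splitlines():
        combo = parse_combo(line)
        if combo:
            combos.append(combo)
    return combos
```

A real parser also has to handle SQL fragments, CSV exports, and hash suffixes, but the try-formats-until-one-fits approach is the same.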

What’s interesting about Pastebin is how many scrapers are really out there. Within less than a minute of posting this paste there were 81 views. After a few more minutes there were 173 views, as shown below. As you can see, there are more than a few people monitoring this stuff. Want to set up your own scraper? The Dumpmon source code is available, although I personally prefer Pystemon.

Pastebin Scraper Views

Pastebin scrapers only catch pastes that include the actual data in the paste itself. Sometimes a paste is just a link to another file, so it is important to monitor links as well. Further, by taking those pastes and seeing who links to them, you can find sources, such as Twitter accounts, that announce these things. I monitor those as well.
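Following those secondary links can be automated. Here is a small sketch of pulling URLs out of paste text and keeping only those pointing at likely file hosts; the host list is illustrative, not my actual watch list:

```python
import re

# Matches http/https URLs up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")

# Illustrative substrings of hosts where dump files often end up.
INTERESTING_HOSTS = ("pastebin", "mega", "anonfiles", "mediafire")

def extract_links(paste_text):
    """Return URLs found in a paste, keeping only likely file-hosting links."""
    links = URL_RE.findall(paste_text)
    return [u for u in links if any(h in u.lower() for h in INTERESTING_HOSTS)]
```

Feeding every new paste through a filter like this surfaces the second-hop files that a plain keyword scraper would miss.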

After Pastebin there are several sites I keep up with that post leaks and stolen databases. Below is a screenshot of one of these sites.

Database Dumps

I then take the names of those files and set up Google Alerts (and sometimes Pastebin alerts) for them. This often leads me to file collections such as the two below:

A collection of password files

Another collection of password files

I also alert on combos that certain hackers frequently use to create accounts, such as Cucum01:Ber02, zolushka:natasha, and many others. These combos are so common in password lists that they always lead to more passwords. Take a look at this Google search and you’ll see how prevalent these are; the alerts keep my inbox full.
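Alerting on marker combos is as simple as a substring check over each new paste. The set below contains just the two markers mentioned above; the real list is longer:

```python
# Combos that, in my experience, reliably mark recycled password lists.
MARKER_COMBOS = {"Cucum01:Ber02", "zolushka:natasha"}

def looks_like_password_list(paste_text):
    """True if the paste contains any known marker combo."""
    return any(marker in paste_text for marker in MARKER_COMBOS)
```

Because these combos get copied from list to list, a hit on one of them almost always means the rest of the paste is worth parsing.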

Furthermore, there are hundreds of forums that share passwords. Perhaps a few screenshots are the best way to see just how many passwords people are sharing:

Password Sharing Forum

Of course, this is only a small sampling of forums. There are more than any one person could ever monitor.

There are also hundreds of thousands of web sites that share hacked passwords for gaming, video, porn, and file sharing sites. These don’t always produce the best quality passwords, but I do have scripts to scrape a number of these sites. In a single day those scripts can produce well over a million passwords.

If you were shocked by my releasing password data, take an hour exploring the internet and you will see that 10 million passwords really is a drop in a bucket, even a drop in a thousand buckets. Keep in mind that a big part of the effort in producing my data was getting it all the way down to 10 million in a balanced manner (I couldn’t just remove millions from the end of the file). It took me about three weeks to whittle down and then sanitize the data.

What I have shown here is only a small number of sources available out there. Most of the forums listed above provide “VIP” access for a monthly payment. If you want to spend a little money you have access to tens of millions more passwords than the freebies shared publicly. There are also IRC channels, Usenet groups, torrents, file sharing sites, and of course a number of hidden sources on Tor.

Now, not all of these passwords are plaintext. Many dumps include passwords in a hashed format that requires you to crack them yourself. But that’s no problem: there are tools such as Hashcat and John the Ripper, as well as plenty of wordlists, that make this a trivial task.
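At its core, cracking an unsalted hash dump just means hashing candidate words and comparing digests; that is what Hashcat and John the Ripper do, only vastly faster and smarter. A toy sketch of the idea using MD5, a common format in older dumps (the function name is mine):

```python
import hashlib

def crack_md5(target_hash, wordlist):
    """Hash each candidate word and compare against the leaked digest."""
    for word in wordlist:
        if hashlib.md5(word.encode()).hexdigest() == target_hash:
            return word
    return None
```

Real crackers add GPU acceleration, mangling rules, and masks, but a dictionary pass like this is already enough to recover a depressing share of typical user passwords.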

If this isn’t already overwhelming, keep in mind that this is just the stuff that certain hackers have decided to make public. Surely the troves of accounts that have been hacked over the years completely dwarf what has been publicly shared. There could be billions, or tens of billions, more accounts that have been hacked. If you are worried that the data I released contains your password, you still aren’t worried enough. There is a very good chance your passwords have been hacked; go change them.

So who besides me collects these passwords? Use your imagination.




Today I Am Releasing Ten Million Passwords

Frequently I get requests from students and security researchers to get a copy of my password research data. I typically decline to share the passwords but for quite some time I have wanted to provide a clean set of data to share with the world. A carefully-selected set of data provides great insight into user behavior and is valuable for furthering password security. So I built a data set of ten million usernames and passwords that I am releasing to the public domain.
But recent events have made me question the prudence of releasing this information, even for research purposes. The arrest and aggressive prosecution of Barrett Brown had a marked chilling effect on both journalists and security researchers. Suddenly even linking to data was an excuse to get raided by the FBI and potentially face serious charges. Even more concerning is that Brown linked to data that was already public and that others had already linked to.

“This is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution”

In 2011 and 2012, news stories about Anonymous, WikiLeaks, LulzSec, and other groups were increasing daily, and the FBI was looking more and more incompetent to the public. With these groups becoming bolder and more boastful and pressure on the FBI building, it wasn’t too surprising to see Brown arrested. He was close to Anonymous and was in fact their spokesman. The FBI took advantage of him linking to a data dump to initiate charges of identity theft and trafficking in authentication features. Most of us expected those charges would be dropped, and some were, although they still influenced his sentence.

At Brown’s sentencing, Judge Lindsay was quoted as saying “What took place is not going to chill any 1st Amendment expression by Journalists.” But he was so wrong. Brown’s arrest and prosecution had a substantial chilling effect on journalism. Some journalists have simply stopped reporting on hacks for fear of retribution, and others who still do are forced to employ extraordinary measures to protect themselves from prosecution.

Which brings me back to these ten million passwords.

Why the FBI Shouldn’t Arrest Me

Although researchers typically only release passwords, I am releasing usernames with the passwords. Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone. Most researchers are afraid to publish usernames and passwords together because combined they become an authentication feature. If simply linking to already released authentication features in a private IRC channel was considered trafficking, surely the FBI would consider releasing the actual data to the public a crime.

But is it against the law? There are several statutes that the government used against Brown, as summarized by the Digital Media Law Project:

Count One: Traffic in Stolen Authentication Features, 18 U.S.C. §§ 1028(a)(2), (b)(1)(B), and (c)(3)(A); Aid and Abet, 18 U.S.C. § 2: Transferring the hyperlink to stolen credit card account information from one IRC channel to his own (#ProjectPM), thereby making stolen information available to other persons without Stratfor or the card holders’ knowledge or consent; aiding and abetting in the trafficking of this stolen data.

Count Two: Access Device Fraud, 18 U.S.C. §§ 1029(a)(3) and (c)(1)(A)(i); Aid and Abet, 18 U.S.C. § 2: Aiding and abetting the possession of at least fifteen unauthorized access devices with intent to defraud by possessing card information without the card holders’ knowledge and authorization.

Counts Three Through Twelve: Aggravated Identity Theft, 18 U.S.C. § 1028A(a)(1); Aid and Abet, 18 U.S.C. § 2: Ten counts of aiding and abetting identity theft, for knowingly and without authorization transferring identification documents by transferring and possessing means of identifying ten individuals in Texas, Florida, and Arizona, in the form of their credit card numbers and the corresponding CVVs for authentication as well as personal addresses and other contact information.

While these particular indictments refer to credit card data, the laws do also reference authentication features. Two of the key points here are knowingly and with intent to defraud.

In the case of me releasing usernames and passwords, the intent is certainly not to defraud, facilitate unauthorized access to a computer system, steal the identities of others, aid any crime, or harm any individual or entity. The sole intent is to further research with the goal of making authentication more secure, and therefore protecting against fraud and unauthorized access.

To ensure that these logins cannot be used for illegal purposes, I have:

  1. Limited identifying information by removing the domain portion from email addresses.
  2. Combined data samples from thousands of global incidents from the last five years, with other data mixed in going back an additional ten years, so the accounts cannot be tied to any one company.
  3. Removed any keywords, such as company names, that might indicate the source of the login information.
  4. Manually reviewed much of the data to remove information that might be particularly linked to an individual.
  5. Removed information that appeared to be a credit card or financial account number.
  6. Where possible, removed accounts belonging to employees of any government or military sources. [Note: although I can identify government or military logins when they include full email addresses, sometimes these logins get posted without the domains, without mentioning the source, or aggregated into other lists, and therefore it is impossible to know if I have removed all references.]
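Two of these steps are mechanical enough to sketch in code. The following is my own illustration of the first and fifth steps, not the actual sanitization scripts, assuming combos have already been parsed into username/password pairs:

```python
import re

# Crude shape of a credit-card or financial account number: 13-16 digits.
CARD_RE = re.compile(r"\b\d{13,16}\b")

def sanitize(username, password):
    """Drop the email domain; drop records that look like financial data."""
    username = username.split("@")[0]            # remove the domain portion
    if CARD_RE.search(username) or CARD_RE.search(password):
        return None                              # discard card-like numbers
    return username, password
```

The manual review and source-keyword scrubbing are far harder to automate, which is part of why trimming the set down took weeks.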

Furthermore, I believe these are primarily dead passwords, which cannot be defined as authentication features because dead passwords will not allow you to authenticate. The likelihood that any authentication information included is still valid is low, and therefore this data is largely useless for illegal purposes. To my knowledge, these passwords are dead because:

  1. All data currently is or was at one time generally available to anyone and discoverable via search engines in a plaintext (unhashed and unencrypted) format, and is therefore already widely available to those with an intent to defraud or gain unauthorized access to computer systems.
  2. The data has been publicly available long enough (up to ten years) for companies to reset passwords and notify users. In fact, I would consider any organization grossly negligent if it were unaware of these leaks and still had not changed user passwords after they had been publicly visible for such a long period of time.
  3. The data is collected by numerous web sites such as haveibeenpwned or pwnedlist and others where users can check and be notified if their own accounts have been compromised.
  4. Many companies, such as Facebook, also monitor public data dumps to identify user accounts in their user base that may have been compromised and proactively notify users.
  5. A portion of users, either on their own or required by policy, change their passwords on a regular basis regardless of being aware of compromised login information.
  6. Many organizations, particularly in some industries, actively identify unusual login patterns and automatically disable accounts or notify account owners.

Ultimately, to the best of my knowledge these passwords are no longer valid, and I have taken extraordinary measures to make this data ineffective in targeting particular users or organizations. This data is extremely valuable for academic and research purposes and for furthering authentication security, and this is why I have released it to the public domain.

Having said all that, I think it is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution or legal harassment. I had wanted to write an article about the data itself, but I will have to do that later because I had to write this lame thing trying to convince the FBI not to raid me.

I could have released this data anonymously like everyone else does, but why should I have to? I clearly have no criminal intent here. It is beyond all reason that any researcher, student, or journalist should have to be afraid of the law enforcement agencies that are supposed to be protecting us, agencies instead trying to find ways to use the laws against us.

Slippery Slopes

For now the laws are on my side, because there has to be intent to commit or facilitate a crime. However, the White House has proposed some disturbing changes to the Computer Fraud and Abuse Act that would make things much worse. Of particular note is 18 U.S.C. § 1030(a)(6):

(6) knowingly [struck: and with intent to defraud] willfully traffics (as defined in section 1029) in any password or similar information, or any other means of access, knowing or having reason to know that a protected computer would be accessed or damaged without authorization in a manner prohibited by this section as the result of such trafficking;

The key change here is the removal of intent to defraud and its replacement with willfully; it would be illegal to share this information as long as you have any reason to know someone else might use it for unauthorized computer access.

It is troublesome to consider the unintended consequences resulting from this small change. I wrote about something back in 2007 that I’d like to say again:

…it reminds me of IT security best practices. Based on experience and the lessons we have learned in the history of IT security, we have come up with some basic rules that, when followed, go a long way to preventing serious problems later.

So many of us security professionals have made recommendations to software companies about potential security threats and often the response is that they don’t see why that particular threat is a big deal. For example, a bug might reveal the physical path to a web content directory. The software company might just say “so what?” because they cannot see how that would result in a compromise. Unfortunately, many companies have learned “so what” the hard way.

The fact is that it doesn’t matter if you can see the threat or not, and it doesn’t matter if the flaw ever leads to a vulnerability. You just always follow the core rules and everything else seems to fall into place.

This principle equally applies to the laws of our country; we should never violate basic rights even if the consequences aren’t immediately evident. As serious leaks become more common, surely we can expect tougher laws. But these laws are also making it difficult for those of us who wish to improve security by studying actual data. For years we have fought increasingly restrictive laws but the government’s argument has always been that it would only affect criminals.

The problem is that the laws themselves change the very definition of a criminal and put many innocent professionals at risk.

The Download Link

Again, this is stupid that I have to do this, but:


Torrent (84.7 MB): Magnet link

For more information on this data, please see this FAQ.

As a final note, be aware that if your password is not on this list, that means nothing. This is a random sampling of thousands of dumps consisting of upwards of a billion passwords. Please see the links in the article for a more thorough check to see if your password has been leaked. Or you could just Google it.

If you wish to discuss analysis of this data, you may do so at http://reddit.com/r/passwords

Is 123456 Really The Most Common Password?

2014 Top 10 Passwords

I recently worked with SplashData to compile their 2014 Worst Passwords List and yes, 123456 tops the list. In the data set of 3.3 million passwords I used for SplashData, almost 20,000 of those were in fact 123456. But how often do you really see people using that, or the second most common password, password in real life? Are people still really that careless with their passwords?

While 123456 is indeed the most common password, that statistic is a bit misleading. Although 0.6% of all users on my list used that password, it’s important to remember that 99.4% of the users on my list didn’t use that password. What is noteworthy here is that while the top passwords are still the top passwords, the number of people using those passwords has dramatically decreased.

The fact is that the top passwords are always going to be the top passwords; it’s just that the percentage of users actually using them will, at least we hope, continually get smaller. This year, for example, a hacker using the top 10 password list would statistically be able to guess 16 out of every 1,000 passwords.
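That 16-in-1,000 figure is just the combined share of the top 10 passwords. Computing it from a list of passwords is straightforward with a Counter (a sketch, not my actual analysis scripts):

```python
from collections import Counter

def top_n_coverage(passwords, n=10):
    """Fraction of all users covered by the n most common passwords."""
    counts = Counter(passwords)
    top = counts.most_common(n)
    return sum(count for _, count in top) / len(passwords)
```

Running this over the 3.3 million passwords with n=10 is what yields the roughly 1.6% figure above.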

Getting a true picture of user passwords is surprisingly difficult. Even though password is #2 on the list, I don’t know that I have seen anyone actually use that password in years. Part of the problem is how we collect and analyze password data. Because we typically can’t just go to some company and ask for all their user passwords, we have to go with the data that is available to us. And that data does have problems.

Anomalies are More Prominent

As we saw, user passwords are improving but as percentages of common passwords decrease, anomalies begin to float to the top. There was a time that I didn’t worry too much about minor flaws in the data because as my data set grew those tended to fall to the bottom of the list. Now, however, those anomalies are becoming a problem.

For example, when I first ran my stats for 2014, the password lonen0 ranked #7 in the list. Looking through the data, I saw that all of these passwords came from a single source: the Belgian company EASYPAY GROUP, whose data was leaked in November of 2014. Looking through the raw data, it appears that lonen0 was a default password that 10% of their users failed to change to something stronger. It’s just 10% of users from one company, but that was enough to push it to the #7 most common password in my data set.
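Anomalies like lonen0 can be caught automatically by checking whether a password’s occurrences are dominated by a single source dump. A sketch of that check (the names and thresholds are illustrative):

```python
from collections import Counter, defaultdict

def single_source_anomalies(records, threshold=0.9, min_count=100):
    """records: (password, source) pairs. Flag passwords where one source
    accounts for more than `threshold` of all occurrences."""
    by_password = defaultdict(Counter)
    for password, source in records:
        by_password[password][source] += 1
    flagged = []
    for password, sources in by_password.items():
        total = sum(sources.values())
        if total >= min_count and max(sources.values()) / total > threshold:
            flagged.append(password)
    return flagged
```

A genuinely common password like 123456 shows up across thousands of sources, so it passes; a company default like lonen0 concentrates in one dump and gets flagged.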

In 2014, all it takes for a password to get on the top 1000 list is to be used by just 0.0044% of all users.

Single Sets of Data vs Aggregated Data

There are numerous variables that affect which passwords users choose, and therefore many people like to analyze sets of passwords dumped from a single source. There are two problems with this: first, we don’t really know all the variables that determine how users choose passwords. Second, data is always skewed when you analyze a single company, as we saw with EASYPAY GROUP. Another example: look at the password dump from Adobe and you will see that the word adobe appears in many of the passwords.

On the other hand, if we aggregate all the data from multiple dumps and analyze it together, we may get the wrong picture. Doing this gives us no control variables and we end up with passwords like 123456 on the top of the list. If we had enough aggregated data that wouldn’t be an issue, but what exactly is enough data?

Cracked Passwords are Crackable; Hacked Companies are Hackable

Since most of the data we are looking at comes from password leaks, it is possible that 123456 tops the list simply because it is the easiest password to crack. Perhaps some hacker checked tens of thousands of email accounts to see if the password was 123456 and dumped all positive matches on the internet. In fact, part of the reason I only analyzed 3.3 million passwords this year is the large number of mail.ru, yandex.com, and other Russian accounts that had unusual passwords such as qwerty and other keyboard patterns. Here is the top 10 list including all the Russian email accounts:

1. qwerty
2. 123456
3. qwertyuiop
4. 123456789
5. password
6. 12345678
7. 12345
8. 111111
9. 1qaz2wsx
10. qwe123

While these are common passwords, the Russian data was highly skewed which made me suspect that these were either fake accounts or hacked by checking only certain passwords. So while 3.3 million passwords isn’t a huge dataset to analyze, it is a clean set of data that seems to accurately reflect results I have seen in the past.

The other problem is that when a company gets hacked, often it is because they have not properly secured their data. If they have poor security practices, this could affect password policies and user training which might result in poor quality passwords.

Unfortunately, we do not know to what extent crackable passwords and hackable companies affect the quality of the password data we have to analyze.

No Indication of Source

When we work with publicly leaked passwords, we often don’t know the source of the data. We don’t know if the passwords are from some corporation with strict password policies, or if they come from hacked adult sites where many users are choosing passwords such as boobies, or if they are hacked Minecraft accounts where a large chunk of the users are kids or teenagers. We don’t know if the data came from keyloggers or phishing or password hashes.

We also don’t know when users set these passwords. When Adobe had 150 million user accounts leaked, clearly those passwords were from accounts and passwords created years ago. We do know that users are slowly getting better with their passwords, but if we don’t know when they set these passwords it is impossible for us to gauge that progress.

These are all significant variables, and they make it impossible for us to get an accurate picture of which passwords people truly are using, where, and when.

No Indication of User Attitude

The source of the data also strongly affects user attitude towards security. Many users have several common passwords they use, which typically include a strong one for banking and other sensitive accounts and another for casual or one-time-use accounts, such as a flower company shopping cart. The data we have gives no indication of the users’ attitude when they selected their password.

In the fifteen years I have been collecting passwords, I have seen just one of my own passwords publicly leaked. It was in the Yahoo! Voices password dump. I actually remember setting this password: I was on my phone at the time, researching possible ways to syndicate my writing. I set up various accounts on different sites I was checking out, all with the same password, because I was on my phone and security for these sites wasn’t particularly a concern, this being just casual research.

My password was October38, a throwaway password I occasionally used around that time. Although it is a decent password, it doesn’t represent the type of password I normally use. None of my other passwords show up in public dumps, and there is no indication of how often I used this particular password or how it compares to the rest of my passwords.

So where are the passwords coming from and how does this affect user attitude? Are they PayPal accounts or a quick login someone created to comment on a small web site? Are they computer accounts that you can’t manage with a password manager and therefore users must memorize? We just do not know for much of the data.

Bad Data

Finally, the biggest problem when dealing with public password dumps is that sometimes you just get bad data and sometimes good data is ruined through poor parsing or conversion. When dealing with tens of millions of passwords and hundreds of gigabytes of files, bad data will make its way in there, and it is usually hard to spot this data without a manual review. While I do manually glance over most data I include, it is impossible to catch everything.

Here are some examples of bad data that I sometimes catch manually but that are extremely difficult to identify with my automated parsing scripts.

Yet another problem is that since my goal is to identify user-selected passwords, I need to be able to spot data that isn’t real user data. Here is an example of a dump where both the usernames and passwords follow an obvious pattern and are clearly machine-generated. I don’t want that type of data.

My parsing script performs dozens of checks on each username and password, but ultimately I still have to manually review the data, and I still don’t catch everything.
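A couple of those checks can be illustrated. For example, flagging dumps where the usernames all follow one rigid, machine-generated pattern; the heuristic and threshold here are my own simplification, not the actual script:

```python
import re

def looks_machine_generated(combos, threshold=0.95):
    """Heuristic: if nearly every username matches one rigid pattern
    (lowercase letters followed by digits), the dump is probably synthetic."""
    if not combos:
        return False
    pattern = re.compile(r"[a-z]+\d+")
    matches = sum(1 for user, _ in combos if pattern.fullmatch(user))
    return matches / len(combos) > threshold
```

Real user bases are messy: mixed case, dots, birth years in odd places. When a dump is suspiciously uniform, it’s almost always generated data that would pollute the statistics.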

As you can see, there are many variables that can affect the data, and therefore we can’t truly say which passwords users have set for the stuff that really matters. Nevertheless, the statistics are consistent, and the same passwords show up at the top year after year. What it comes down to is this: if you are one of those people using 123456 or password, please stop.

Use a Password Manager

So what do we do about bad passwords? As I have said before, you need to use a password manager. When I analyzed passwords for my book Perfect Passwords in 2005, I published a list of the Top 500 Passwords. Later, in 2011, I published a list of the top 10,000 passwords. Now, with the latest analysis in 2014, we can clearly see that the list really doesn’t change much over the years. But the threats are increasing much faster than we can keep up.

The only solution is to stop trying to create and remember your own passwords. You just can’t create strong, unique passwords for each account you have and keep it all in your head. You cannot consider yourself secure on the internet unless you are using some tool to manage your passwords. Password managers let you generate strong passwords and manage them in a central location, protected by a single strong password.

SplashData, who worked with me on this analysis, is the developer of the password manager SplashID Safe. Other password managers I use or have tested include LastPass, KeePass, 1Password, and Dashlane. I would recommend any of these products.

And because I know I will be asked, the following articles will be coming soon:

The New Top 10,000 Password List

How I Collect and Process Passwords




Today is Password Day, Go Change Five Passwords Now

I have always been a big fan of password days. While it is always important to regularly change your passwords, there is a specific benefit to changing a large number of passwords all at once. To understand why this is so effective, it is important to understand how hackers work.

Security intrusions are typically the result of a chain of failures. It’s usually not one big mistake that lets the hackers in, it is a series of smaller mistakes that eventually lead to compromise. Furthermore, the intrusion is rarely full admin access with the first exploit, it is an incremental process where each step leads to getting deeper and deeper into the target. The process involves compromising passwords for numerous accounts along the way.

Here’s the problem: changing one password or patching one hole rarely locks out the hacker, because at that point they have probably collected a good list of passwords and have many ways to get back in if needed. In fact, if someone has access to several key accounts on several key servers, the chances of locking them out completely are pretty slim. Often the only effective way to lock them out is through a massive undertaking involving patching all holes and changing all account passwords in as small an amount of time as possible.

91% of all user passwords sampled appear on the list of just the top 1,000 passwords.

While organizations with many servers or a large number of employees have a huge attack surface that will require a massive effort in the event of an intrusion, individual users have a bit of an advantage in that we only need to change a dozen or so passwords.

At least, that used to be the case. Nowadays it is not uncommon for a heavy internet user to have hundreds or even over a thousand accounts. Clearly, changing all your passwords in a single day is nearly impossible for many people.

My strategy now is to take one day each month (I do the last Saturday of the month) and go through and change 5-10 passwords. While I’m at it, I go through and check the privacy and security settings for the account and add two-factor authentication if it is available.

So today, being World Password Day, now would be a good time to go and change some of your passwords. Here are a few links for some common web sites to get you started:

Change Password | Account Settings | Review App Authorizations | Activate Hardware Token

Change Password | Privacy Settings | Enable Two Factor Auth via SMS

Change Password | Enable Two Factor Auth | Review App Authorizations

Change Password | Enable Two Factor Auth | Review App Authorizations

Change Password & Enable Two Factor Auth 

The single most effective tool users have is a password manager such as LastPass or KeePass. A password manager gives you a list of all your accounts and usually shows the age of each password. Password managers also include password generators to create strong, unique passwords for each site.
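The generator half of a password manager is simple to illustrate. Here is a minimal sketch using Python’s secrets module, which is designed for cryptographically secure random choices (this is my own example, not the code any of these products actually use):

```python
import secrets
import string

def generate_password(length=16):
    """Generate a strong random password from letters, digits, and symbols."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The hard part of a password manager isn’t generation, it’s safely encrypting and syncing the vault, which is exactly why using an established product beats rolling your own.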

The Pathetic Reality of Adobe Password Hints

The leak of 150 million Adobe passwords in October of this year is perhaps the most epic security leak we have ever seen. It was huge, not just because of the sheer volume of passwords, but also because it’s such a large dump from a single site, allowing for a much better analysis than earlier sets. But there’s something unique about the Adobe dump that makes it even more insightful: about 44 million password hints are included. Even though we still haven’t decrypted the passwords, the data is extremely useful.

One thing I have pondered over the years in analyzing passwords is trying to figure out *what* the password is. I can determine if the password contains a noun or a common name, but I can’t always determine what that noun or name means to the user.

For example, if the password is Fred2000, is that a dog’s name and a date? An uncle and his anniversary? The user’s own name and the year they set up the account? Once we know the significance of a password we gain a huge insight into how users select passwords. But I have never been able to come up with a method to even remotely measure this factor. Then came the Adobe dump.

The sheer amount of data in the Adobe dump makes it a bit overwhelming and somewhat difficult to work with. But if you remove the least common and least useful hints the data becomes a bit more manageable. Using a trimmed down set of about 10 million passwords, I was able to better work with the data to come up with some interesting insights.

Just glancing at the top one hundred hints, several patterns immediately become clear. In fact, what we learn is that a large percentage of the passwords are the name of a person, the name of a pet, the name of a place, or an important date.

Take dates for example. Consider the following list of top date-related hints:

Hint Total Note
birthday 29425
bday 17697
date 15272
birth 14956
DOB 13109
niver 9484 Spanish: Anniversary (short for aniversario)
fecha 8899 Spanish: Date
naissance 7892 French: Birth
anniversary 6959

In all, there are about 420,000 passwords with a date-related hint, which represents about 3.6% of the passwords in the working set.

We see similar trends with dog names, which account for 375,000 passwords or 3.2% of the total (plus another 120,000 that mention “pet”):

Hint Total Note
dog 70550
dogs name 13559
my dog 9780
dog’s name 8191
dog name 8187
perro 8000 Spanish
hund 7185 German, Danish, Swedish, Norwegian
first dog 5653
chien 5542 French
doggy 5184

One interesting insight offered here is something we already know but find difficult to measure: password reuse. Surely a large percentage of these users have the same password across multiple sites, but it is interesting to see that about 361,000 users (or 3.11%) state this fact in their password hints:

Hint Total Note
same 44565
password 14634
always 13329
la de siempre 8559 Spanish: as always or the usual
same as always 8289
usual password 5277
same old 5111
siempre 4163 Spanish: always
normal password 3898
my password 3022

Keep in mind that these are just those passwords that admit to reuse in the hint. The number of passwords actually in use across multiple sites certainly is much greater than this.

Looking at the three lists above, we see that nearly 10% of the passwords fall into just these three categories. Adding names of people and places will likely account for 10% more.
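Tallies like these can be approximated with a simple keyword classifier. The sketch below is hypothetical: the category keyword lists are drawn from the top hints shown above, not from the actual methodology used for the full analysis.

```python
from collections import Counter

# Hypothetical keyword lists, based on the top hints in the tables above.
CATEGORIES = {
    "date": ["birthday", "bday", "date", "birth", "dob", "niver", "fecha",
             "naissance", "anniversary"],
    "dog": ["dog", "perro", "hund", "chien", "doggy"],
    "reuse": ["same", "always", "usual", "siempre", "password"],
}

def categorize(hint):
    """Return the first category whose keywords appear in the hint, else None."""
    h = hint.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in h for k in keywords):
            return category
    return None

def tally(hints):
    """Count hints per category, ignoring hints that match no category."""
    counts = Counter()
    for hint in hints:
        cat = categorize(hint)
        if cat:
            counts[cat] += 1
    return counts
```

Substring matching like this over-counts (e.g. “update” contains “date”), which is one reason to trim the working set before drawing conclusions.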

So what did we learn by analyzing these hints? First, that you should never use password hints. If users forget their password, they should use the password reset process. Second, that decades of user education have completely failed. No matter how much we advise people not to use dates, family names, or pet names in their passwords, and no matter how much we tell them not to use the same passwords on multiple sites, you people will just do it anyway.

This is why we can’t have nice password policies.



9 Ways to Restrain the NSA

Keith Alexander

With U.S. Government surveillance being a hot news issue lately, several members of Congress have stepped up and started working on bills to place limits on NSA powers. Although these are admirable attempts, most proposals likely won’t have much effect on NSA operations. So I thought I’d propose some points that, at a minimum, any surveillance bill should cover.

1. No backdoors or deliberate weakening of security

The single most damaging aspect of recent NSA revelations is that they have deliberately weakened cryptography and caused companies to bypass their own security measures. If we can’t trust the security of our own products, everything falls apart. Although this has had the side-effect of causing the internet community to fill that void, we still need to trust basic foundations such as crypto algorithms.

Approaching a company to even suggest they weaken security should be a crime.

Related issue: the mass collection of 0-day exploits. I have mixed feelings on how to limit this, but we at least need limits. The fact is that government law enforcement and military organizations are sitting on tens of thousands of security flaws that put us all at risk. Rather than reporting these flaws to vendors to get them fixed and make us all secure, they set them aside for years waiting for the opportunity to exploit them. There are many real threats we all face out there, and it is absurd to think that others can’t discover these same flaws to exploit us. By sitting on 0-days, our own government is treating us all as its personal cyberwar pawns.

2. Create rules for collection as well as searches

We saw how the NSA exploited semantics to get away with gathering personal records without actually calling it a search. They got away with it once; we should never allow that excuse again. Any new laws should clearly define both searches and collection and apply strict rules to both.

3. Clear definition of national security

Since the Patriot Act, law enforcement agencies have stretched and abused the definitions of national security and terrorism so much that almost anything can fall under those terms. National security should only refer to imminent or credible domestic threats from foreign entities. Drug trafficking is not terrorism. Hacking a school computer is not terrorism. Copyright infringement is not terrorism.

4. No open-ended gag orders

Gag orders make sense for ongoing investigations or perhaps to protect techniques used in other investigations but there has to be a limit. Once an investigation is over, there is no valid reason to indefinitely prevent someone from revealing basic facts about court orders. That is, there’s no reason to hide this fact unless your investigations are perhaps stretching the laws.

5. No lying to Congress or the courts

“There is only one way to ensure compliance with the law: strong whistleblower protection. We need insiders to let us know when the NSA or other agencies make a habit of letting the rules slide.”

It’s disconcerting that I would even need to say this, but giving false information to protect classified information should be a crime. The NSA can simply decline to answer certain questions like everyone else does when it comes to sensitive information. Or there’s always the 5th amendment if the answer to a question would implicate them in a crime.

6. Indirect association is not justification

Including direct contacts in surveillance may be justified, but including friends of friends of friends is really pushing it and includes just about everyone. So there’s that.

7. No using loopholes

The NSA is not supposed to spy on Americans, but it can legally spy on other countries. The same goes for other countries: they can spy on the US. So if the NSA needs info on Americans, it can simply go to its spying partners to bypass any legal restrictions. Restrictions on spying on Americans must also cover information obtained from spy partners.

And speaking of loopholes, many of the surveillance abuses we have seen recently are due to loopholes or creative interpretation of the laws. Allowing the Government to keep these interpretations secret is setting the system up for abuse. We need transparency for loopholes and creative interpretations.

8. No forcing companies to lie

Again, do I even have to say this? The NSA and FBI will ultimately destroy the credibility of US companies unless the law specifically ensures that when someone like Mark Zuckerberg says his company doesn’t give the US government secret access, that statement is true.

9. Strong whistleblower immunity

We saw how self-regulation, court supervision, and congressional oversight have overwhelmingly failed to protect us from law enforcement abuses. There is only one way to ensure compliance with the law: strong whistleblower protection. We need insiders to let us know when the NSA or other agencies make a habit of letting the rules slide.

Whistleblowers need non-governmental, anonymous, third-party protection. We need to exempt these whistleblowers from prosecution and provide them legal yet powerful alternatives to going public. You’d think that even the NSA would prefer fighting this battle in a court over having to face leaks of highly confidential documents. In fact, I think the only reason to oppose these laws is if you actually have something to hide. The NSA’s fear of transparency should be a blaring alarm that something is horribly wrong.

The NSA thinks that public response has been unfair and will severely limit their ability to protect us. What they don’t seem to understand is the reasons we have these limits in the first place. When the NSA can only focus on foreign threats, they have no interest in domestic law enforcement. Suspicionless spying is incompatible with domestic law enforcement and justice systems.

The greatest concern, however, is the unchecked executive and military power. The fact that there has been so much for Snowden to reveal demonstrates the level of abuse. Unfortunately, the capabilities are already in place so even legal limits are largely superficial and self-enforced. It would be trivial to ignore those laws in a national security emergency.

I cringe at the thought of becoming one of those people warning others to be afraid, but that is why we put limits on the government, so we know we don’t ever have to be afraid. We solve the little problems now so we don’t have to face the big problems later. We understand the need for surveillance, we just need to know when the cameras point at us.



Fingerprints and Passwords: A Guide for Non-Security Experts

Today Apple announced that the iPhone 5S will have a fingerprint scanner. Many of us in the security community are highly skeptical of this feature, while others see it as a smart security move. Then of course there are the journalists who see fingerprints as the ultimate password killer. Clearly there is some disagreement here. I thought I’d lay this out for those of you who need to better understand the implications of using fingerprints instead of, or in addition to, passwords.

Biometrics, like usernames and passwords, are a way to identify and authenticate yourself to a system. We all know that passwords can be weak and difficult to manage, which makes it tempting to call every new authentication product a password killer. But despite their flaws, passwords must always play some role in authentication.

The fact is that while passwords do have their flaws, they also have their strengths. The same is true of biometrics. You can’t just replace passwords with fingerprints and say you’ve solved the problem, because doing so introduces a few new problems.

To clarify this, below is a table that compares the characteristics of biometrics vs passwords, with check marks where one method has a clear advantage:

Passwords | Biometrics
Difficult to remember | Don’t have to remember ✓
Requires unique passwords for each system | Can be used on every system ✓
Nothing else to carry around | Nothing else to carry around
Take time to type | Easy to swipe/sense ✓
Prone to typing errors | Prone to sensor or algorithm errors
Immune to false positives ✓ | Susceptible to false positives
Easy to enroll ✓ | Some effort to enroll
Easy to change ✓ | Impossible to change
Can be shared among users ✓ ¹ | Cannot be shared ✓
Can be used without your knowledge | Less likely to be used without your knowledge ✓
Cheap to implement ✓ | Requires hardware sensors
Work anywhere including browsers & mobile ✓ | Require separate implementation
Mature security practice ✓ | Still evolving
Non-proprietary ✓ | Proprietary
Susceptible to physical observation | Susceptible to public observation
Susceptible to brute force attacks | Resistant to brute force attacks ✓
Can be stored as hashes by untrusted third party ✓ | Third party must have access to raw data
Cannot personally identify you ✓ | Could identify you in the real world
Allow for multiple accounts ✓ | Cannot use to create multiple accounts
Can be forgotten; password dies with a person | Susceptible to injuries, aging, and death
Susceptible to replay attacks | Susceptible to replay attacks
Susceptible to weak implementations | Susceptible to weak implementations
Not universally accessible to everyone | Not universally accessible to everyone
Susceptible to poor user security practices | Not susceptible to poor practices ✓
Lacks non-repudiation | Moderate non-repudiation ✓
¹ Can be both a strength and a weakness


What Does This Tell Us?

As you can see, biometrics clearly are not the best replacement for passwords, which is why so many security experts cringe when every biometrics company’s press release claims to be the ultimate password killer. Biometrics do have some clear advantages over passwords, but they also have numerous disadvantages; both can be weak and yet each can be strong, depending on the situation. Now, the list above is not weighted (certainly some of the items are more important than others), but the point here is that you can’t simply compare passwords to biometrics and say that one is better than the other.

However, one thing you can say is that when you use passwords together with biometrics, you have something that is significantly stronger than either of the two alone. This is because you get the advantages of both techniques and only a few of the disadvantages. For example, we all know that you can’t change your fingerprint if compromised, but pair it with a password and you can change that password. Using these two together is referred to as two-factor authentication: something you know plus something you are.

It’s not clear, however, if the Apple implementation will allow for you to use both a fingerprint and password (or PIN) together.

Now, speaking specifically of the iPhone’s implementation of a fingerprint sensor, there are some interesting points to note. First, building the sensor into the phone makes up for some of the usual biometric disadvantages: enrollment is simple, no separate hardware sensor is needed, and storing the fingerprint data only on the device limits the privacy concerns. Another interesting fact is that the phone itself is actually a third factor of authentication: something you possess. When combined with the other two factors, it becomes an extremely reliable form of identification for use with other systems. A compromise would require being in physical possession of your phone, having your fingerprint, and knowing your PIN.

Ultimately, the security of the fingerprint scanner largely depends on the implementation, but even if it isn’t perfect, it is better than those millions of phones with no protection at all.

Then there is the privacy concern some have brought up: is this just a method for the NSA to build a master fingerprint database? Apple’s implementation encrypts the fingerprint data and stores it locally using trusted hardware. Whether this is actually secure remains to be seen, but keep in mind that your fingerprints aren’t really that private: you literally leave them on everything you touch.



8 Ways to Prepare for CSP

Cross-Site Scripting (XSS) is a critical threat that, despite widespread training, still plagues a large number of web sites. Preventing XSS attacks can get complicated but even a small effort can go a long way–a small effort that nevertheless seems to evade us. Still, developers are getting better at input filtering and output escaping which means we are at least headed in the right direction.

Handling input and output aren’t the only strategies available to us. Content Security Policy (CSP) is an HTTP response header that, when correctly implemented, significantly reduces exposure to XSS attacks. CSP is exactly what its name implies: a security policy for your web content.


CSP not only allows you to whitelist browser content features on a per-resource basis, but also lets you whitelist those features on a per-host basis. For example, you might tell the browser it can load scripts for a page, but only if they come from a specific directory on your own web server. CSP allows you to set restrictions on scripts, styles, images, XHR, WebSocket, EventSource, fonts, embedded objects, audio, video, and frames.

One powerful feature of CSP is that by default it blocks inline scripts, inline styles, the eval() function, and data: URI schemes, all common XSS vectors. The only problem is that this is also where it starts breaking existing code, which could be a major obstacle to its widespread adoption. This is an all-too-common problem with frameworks, code libraries, plugins, and open source applications. If you write code that many other people use and don’t start getting it ready for CSP, you kind of hold us all back. CSP does allow you to re-enable blocked features, but that defeats the purpose of implementing content security policies.

So getting to my point, here are some things developers can do to their code to at least get ready for CSP:

  1. Remove inline scripts and styles. Surely you already know that it’s good practice to separate code from presentation; now’s a good time for us all to stop being lazy and separate our code.
  2. Ditch the eval() function. Another thing we’ve all known to avoid for quite some time, but it still seems to show up. Most importantly, if you are working with JSON, make sure you parse it instead of eval’ing it. It’s rare to find a situation where there’s no secure alternative to eval(); you might be surprised how creative you can be.
  3. Don’t rely on data: schemes. Most often used for embedding icons into your code, data: URIs are a powerful XSS vector. The problem here isn’t your own use of them, which normally is safe; it’s that attackers might use them, so the best solution is to disable them altogether. On pages that don’t work with user input in any form, you are probably safe to keep data: URIs enabled.
  4. Create an organized, isolated directory structure. Scripts with scripts, images with images. Keeping content separate makes fine-grained CSP policies so much easier.
  5. Document features needed for each file. A good place to document features required for each file is in the source code itself, such as in a PHPDoc comment. By doing this, when you implement CSP you can start with the most restrictive policy and add only necessary features.
  6. Centralize your HTTP response headers code. A centralized function to send required headers makes it easier to keep track of it all and avoid hard-to-debug errors later.
  7. Eliminate unnecessary and superfluous scripts. It’s sometimes hard to give up cool features for security, but good discipline here can pay off. This is a business decision based on your threat model, but it’s always a good question to ask when adding new stuff.
  8. Mention it whenever possible. Yes, you should be that person who talks about CSP; so much that people simply stop inviting you to meetings.
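As a sketch of points 5 and 6, here is one hypothetical way to centralize CSP header generation, shown in Python in a WSGI style; the directive values, hosts, and paths are assumptions you would adapt to your own directory layout.

```python
# A minimal sketch of a centralized CSP header function.
# The policy values below are hypothetical examples, not recommendations.
def build_csp(policy):
    """Serialize a policy dict into a Content-Security-Policy header value."""
    return "; ".join(
        f"{directive} {' '.join(sources)}" for directive, sources in policy.items()
    )

POLICY = {
    "default-src": ["'none'"],                   # start with the most restrictive policy
    "script-src": ["https://example.com/js/"],   # scripts only from one directory
    "style-src": ["https://example.com/css/"],
    "img-src": ["https://example.com/img/"],
}

def send_headers(start_response, status="200 OK"):
    """Centralized header function: every response goes through here."""
    headers = [
        ("Content-Type", "text/html"),
        ("Content-Security-Policy", build_csp(POLICY)),
    ]
    start_response(status, headers)
```

Because every response flows through one function, tightening or loosening the policy later is a one-line change instead of a hunt through the codebase.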

And if you aren’t quite sure about how CSP works, here is some recommended reading:

An Introduction to Content Security Policy

Preventing XSS with Content Security Policy Presentation

Content Security Policy 1.0 Spec

Browser Support


CSP Playground

CSP Readiness



So What Exactly Did The US Government Ask Lavabit to Do?

The recent shutdown of Lavabit’s email services prompted a flurry of reporting and speculation about the extent of US Government spying, mostly due to a mysterious statement by Lavabit founder Ladar Levison.

Most of us saw this as yet another possibly overhyped government spying issue and didn’t really think too much of it. Much of the media coverage is already starting to die down, but there is still some question as to exactly what the government required of Levison that left him with only one option: shutting down the entire business he built from the ground up. I wondered if there were enough clues out there to get more insight into this case. I started by looking at exactly what Lavabit offered and how it all worked behind the scenes.

Lavabit Encryption

Lavabit claimed they had “developed a system so secure that it prevents everyone, including us, from reading the e-mail of the people that use it.” This is a bold claim and one that surely was a primary selling point for their services.

The way it worked is relatively simple: Lavabit encrypted all incoming mail with the user’s public key before storing the message on their servers. Only the user, with the private key and password could decrypt messages. Normally with encrypted email, users store private keys on their own computers, but it appears that in the case of Lavabit, they stored the users’ private keys, each encrypted with a hash of that user’s password. This is by no means the most secure way of doing this, but it dramatically increases transparency and usability for the user. By doing this, for example, users do not need to worry about private keys and they still have access to their email from any computer.

So let’s break this down: a user logs in with their password. This login might occur via POP3, IMAP4, or through the web interface (which in turn connected internally via IMAP). Because Lavabit used the user’s password to encrypt the private key, it needed the original plaintext password, which means it could not support any secure challenge-response authentication methods. In other words, all clients had to send passwords using AUTH PLAIN or AUTH LOGIN with nothing more than base64 encoding. The webmail interface appears to have been available as both SSL and non-SSL, and the POP3, IMAP4, and SMTP interfaces all seem to have accepted connections with or without SSL. All SSL connections terminated at the application tier.
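To see why AUTH PLAIN offers no secrecy on its own, here is a short Python sketch (the credentials are hypothetical): the SASL PLAIN token is just base64-encoded, trivially reversible by anyone who can observe an unencrypted connection.

```python
import base64

def auth_plain(username, password):
    """Encode credentials as the SASL PLAIN initial response (RFC 4616)."""
    return base64.b64encode(f"\0{username}\0{password}".encode()).decode()

# Hypothetical credentials for illustration only.
token = auth_plain("alice@lavabit.com", "hunter2")

# Anyone on the wire can trivially reverse it; no key or cracking required.
recovered = base64.b64decode(token).split(b"\0")
```

This is why password-equivalent material reached Lavabit’s servers in the clear (inside SSL at best), and why the server-side hashing described next mattered so much.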

Once a user sends a password, the Lavabit servers create SHA-512 hashes explained as follows:

… Lavabit combines the password with the account name and a cryptographic salt. This combined string is then hashed three consecutive times, with the former iteration’s output being used as the input value of the next iteration. The output of the first hash iteration is used as the secret passphrase for AES [encryption of the private key]. The third iteration is stored in our password database and is used to verify that users entered their password correctly.

The process they describe produces two hashes: one for decrypting the user’s private key and, after two more hashing iterations, a hash to store in the database for user authentication. While this is a fairly secure process, given strong user passwords, it does weaken Lavabit’s claim that even their administrators couldn’t read your email. In reality, all it would take is a few lines of code to log the user’s original password, which allows you to decrypt the private key, which in turn allows you to receive and send mail as that user as well as access any stored messages.
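A minimal Python sketch of the derivation described above, assuming hex output at each step; the salt value and exact concatenation order are assumptions, since Lavabit did not publish them.

```python
import hashlib

def derive(password, account, salt):
    """Three chained SHA-512 hashes, per Lavabit's description.

    h1 is the AES passphrase protecting the private key; h3 is what gets
    stored in the password database for authentication.
    """
    h1 = hashlib.sha512((salt + account + password).encode()).hexdigest()
    h2 = hashlib.sha512(h1.encode()).hexdigest()
    h3 = hashlib.sha512(h2.encode()).hexdigest()
    return h1, h3

# Hypothetical inputs for illustration.
key_passphrase, stored_hash = derive("correct horse", "user@lavabit.com", "salt123")
```

Note the design property this gives: the stored hash (h3) cannot be run backwards to recover h1, so a database dump alone does not unlock private keys, but anyone who sees the plaintext password gets everything.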

The message here is that US courts can force a business to subvert their own security measures and lie to their customers, deliberately giving them a false sense of security.

It is important to note that the scope of Lavabit’s encryption was limited to storage on its own servers. The public keys were for internal use, not something you published for others to use. Full protection would require employing PGP or S/MIME and having untapped SSL connections between all intermediate servers. On the other hand, if an email sent through Lavabit was already encrypted with PGP or S/MIME, Lavabit would never be able to intercept or read it.

The question here is what exactly did the government request Levison to do that was so bad that he’d rather shut down his entire business? What information could Lavabit even produce that would be of interest to a government agency? Unencrypted emails, customer IP addresses, customer payment methods, and customer passwords. Based on media statements, it appears that he would be required to provide unencrypted copies of all emails going through his system.

Let’s look at some quotes Levison has given to various media outlets. First, here are some from an interview with CNET:

“We’ve had a couple of dozen court orders served to us over the past 10 years, but they’ve never crossed the line.”

“Philosophically, I put myself in a position that I was comfortable turning over the information that I had. I built Lavabit in a reaction to the original Patriot Act.”

“Where the government would hypothetically cross the line is to violate the privacy of all of my users. This is not about protecting a single person or persons, it’s about protecting all my users. What level of access to this nation does the government have?”

“Why should I collect that info if I didn’t need it? [That philosophy] also governed what kind of information I logged.”

“Unfortunately, what’s become clear is that there’s no protections in our current body of law to keep the government from compelling us to provide the information necessary to decrypt those communications in secret.”

“If you knew what I know about e-mail, you might not use it either.”

In an article from NBC News, we have this:

Levison stressed that he has complied with “upwards of two dozen court orders” for information in the past that were targeted at “specific users” and that “I never had a problem with that.” But without disclosing details, he suggested that the order he received more recently was markedly different, requiring him to cooperate in broadly based surveillance that would scoop up information about all the users of his service. He likened the demands to a requirement to install a tap on his telephone. Those demands apparently began about the time that Snowden surfaced as one of his customers, apparently triggering a secret legal battle between Levison and federal prosecutors.

And finally in an interview with RT he said:

“I think the amount of information that they’re collecting on people that they have no right to collect information on is the most alarming thing,” he told RT. “I mean, the Fourth Amendment is supposed to guarantee that our government will only conduct surveillance on people in which it has a probable suspicion or evidence that they are committing some crime, and that that evidence has been reviewed by a judge and signed off by a judge before that surveillance begins. And if there’s anything alarming, it’s that now that’s all being done after the fact. Everything’s being recorded, and then a judge can after the fact say it’s okay to go look at the information.”

Given the above information, let’s analyze some of the facts we know:

  • The government asked Lavabit to do something which Levison considered to be a crime against the American people.
  • Levison was comfortable and had complied with warrants requesting information on specific users.
  • Levison told Forbes that “This is about protecting all of our users, not just one in particular.”
  • Levison is not even able to reveal some details with his own attorney or employees.
  • Shutting down operations was an option to circumvent compliance, although there was a veiled threat he could be arrested for doing so.
  • He did not delete customer data, he still has that in his possession so this was a request for ongoing surveillance.
  • This was a court order, which Levison is fighting through the US Court of Appeals for the Fourth Circuit.
  • Levison compared the request to installing a tap on his telephone.

Apparently what made Levison uncomfortable with the request was the fact that it collected information about all users, without regard to a warrant. Presumably law enforcement wanted to collect all data that they would later retroactively view as necessary once they had a warrant. The two issues here are that the Government wanted to collect information on innocent users (including Levison himself) and that Levison would be out of the loop completely, taking away his control over what information he provided to law enforcement. These were the lines the Government crossed.

What’s interesting here is that Lavabit terminated the SSL connections right on the application servers themselves. These are the servers that also performed the encryption of email messages. Because of that, a regular network tap would be ineffective. The only ways to perform the broad surveillance Levison objected to would be (in order of likelihood):

  1. Force Lavabit to provide their private SSL keys and route all their traffic through a government machine that performed a man-in-the-middle style data collection;
  2. Change their software to subvert Lavabit’s own security measures and log emails after SSL decryption but before encrypting with the users’ public keys; or
  3. Require Lavabit to install malicious code to infect their own customers with government-supplied malware.

Sure, this could have been a simple request to put a black box on Lavabit’s network and Levison is just overreacting, but the evidence doesn’t seem to indicate that. Regardless of which of these requests the Government made, any of them would make Levison’s entire business a lie; all efforts to encrypt messages would be pointless. Surely there were some heated words spoken when the Department of Justice heard about Levison’s decision, but this is not an act of civil disobedience on Levison’s part; his personal integrity was on the line. Compliance would make his very reason for running Lavabit a deception; a government-sponsored fraud.

While Lavabit initially had quite a bit of media coverage over this issue, the hype seems to be a casualty of our frenzied news cycle. But after looking closely at the facts here, I now see that this is a monumentally important issue, one that the media needs to once again address. The message here is that US courts can force a business to subvert their own security measures and lie to their customers, deliberately giving them a false sense of security. They can say what they want about security on their web sites, it means nothing. If they did it to Lavabit, how many hundreds or thousands of other US companies already participate in this deception?

If the courts can force a business to lie, we can never again trust the security claims of any US company. The reason so many businesses specifically rely on US services is the sense of stability and trust. How sad that an overreaching and panicked pursuit of a whistleblower has thrown that all away.

This issue is so much more than a simple civil liberties dispute, it is the integrity of a nation at stake. We walked with the devil in a time of need–that is a legacy we must live with–but at what point do we sever that relationship and return to the integrity required to lead the world through respect and not by fear?





UPDATE: Since publishing this post, a Wired article has revealed that Lavabit was in fact required to supply its private SSL keys, as suspected above.


Should You Ditch LastPass?

Steve Thomas, aka Sc00bz, has brought up some very interesting issues about the LastPass password manager that are causing some confusion, so I thought I’d give another perspective on the issue.

Summary of Steve’s points:

  1. When you use the LastPass web site to log in to your account, your web browser will first send a hash with a single iteration, no matter how many iterations you have set for your account. It isn’t until this hash fails that the server tells the browser the correct number of iterations to use.
  2. LastPass has a default setting of 500 iterations (at least at that time, now it recommends 5000 iterations).
  3. The extension should warn you if it is going to send a hash with fewer iterations than what you have set.
  4. LastPass does not encrypt the URLs of sites stored in your password database

LastPass hashes your password rather than sending the plain text to the server when you log in. The algorithm it uses is sha256(sha256(email + password) + password). This hash, while not necessarily insecure, can be cracked in a reasonable amount of time with ordinary hardware unless the user has a relatively strong password. It isn’t until after this single-iteration hash is sent that the LastPass server responds and tells the browser exactly how many iterations it should use; the hash is then sent again using the correct number of iterations. More iterations means it will take much more time to crack your password. A good minimum number of iterations is 5,000. If you go too high, some clients such as mobile phones may be very slow logging in.
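Here is a rough Python sketch of the single-iteration hash Steve describes, plus an iterated variant to show why the iteration count matters. Note that the real LastPass key derivation is PBKDF2-SHA256, so the iteration loop below is only an illustration of the principle, not the actual client algorithm.

```python
import hashlib

def single_iteration_hash(email, password):
    """The weak first-round hash: sha256(sha256(email + password) + password)."""
    inner = hashlib.sha256((email + password).encode()).hexdigest()
    return hashlib.sha256((inner + password).encode()).hexdigest()

def iterated_hash(email, password, iterations):
    """Illustrative only: chain the hash to multiply an attacker's work.

    Each extra iteration multiplies the cost of every candidate guess,
    which is why 5,000 iterations is so much stronger than 1.
    """
    h = single_iteration_hash(email, password)
    for _ in range(iterations - 1):
        h = hashlib.sha256((h + password).encode()).hexdigest()
    return h
```

The point of the exercise: an attacker who captures the single-iteration value pays the cost of one SHA-256 pair per guess, while the iterated value costs thousands of hashes per guess.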

This is an issue that certainly should be addressed, but it is not serious enough to warrant abandoning LastPass

The mitigating factors here are:

  1. You are logging in via SSL so the primary threats here are a MitM attack with spoofed SSL certificates, a government warrant, or a government spy agency.
  2. They still need to crack your hash so if you have a very strong password, even a single iteration hash could provide a reasonable amount of protection.
  3. A second factor of authentication, country restrictions, blocking Tor logins, restricting mobile access, and other settings still protect your account from unauthorized logins, unless the attacker is able to obtain your stored hashes through hacking, a warrant, or spying.

One thing I might also add: because the server tells the client how many iterations it expects, anyone who acquires your hash also learns exactly which parameters to use, which makes an attack that much easier.

My opinion is that this is an issue that certainly should be addressed, but it is not serious enough to warrant abandoning LastPass altogether unless your individual threat model includes the NSA or other government agencies. The LastPass plugin should detect when someone is logging in to LastPass via the web login and provide the client-side script with the correct number of iterations; the server should never respond with this at all. However, if someone is logging in through a web browser that doesn't have LastPass with their account data installed, the server sending the number of iterations is the only option.

The only proper solution here is to have your primary login different from the decryption login, at least for accessing the web interface if not everywhere. That way, the number of iterations is never publicly revealed and sending a single-iteration hash would be unnecessary. Other companies such as RoboForm use this method. I have always wanted this feature and I would highly recommend LastPass implement it if feasible.
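The separation described above can be sketched as follows. This is purely an illustration of the idea, not RoboForm's or LastPass's actual scheme, and the function names are my own: the web login uses an entirely separate password, while the vault key is derived locally with an iteration count that never leaves the device.

```python
import hashlib

def login_verifier(login_password: str, email: str) -> str:
    # Verifier for the web login, derived from a *separate* login password.
    # Because this password never protects the vault, the server can reveal
    # its hashing parameters to any client without endangering stored data.
    return hashlib.sha256((email + login_password).encode()).hexdigest()

def vault_key(master_password: str, email: str, iterations: int = 5000) -> bytes:
    # Client-only vault encryption key; the iteration count is applied
    # locally, so the server never needs to disclose it.
    return hashlib.pbkdf2_hmac("sha256", master_password.encode(),
                               email.encode(), iterations)
```

Under this split, compromising the login verifier (or learning its parameters) tells an attacker nothing about the key that actually encrypts the password database.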

As for the other points: the default iteration count mentioned in number 2 has been addressed, and the warning mentioned in number 3 would be a good addition if it hasn't been made already, though this would not be possible in a web browser without your LastPass account data installed.


As for the unencrypted URLs mentioned in number 4, LastPass's response was that leaving them unencrypted is necessary to fetch favicons. Although unencrypted URLs may not always be a problem, there certainly are scenarios where you would want them encrypted. LastPass should make this an option for the user.

LastPass does provide strong security controls, although there clearly is room for improvement. If you do not find LastPass to be secure enough, the only reasonable alternative I would recommend is KeePass, which puts you in complete control over your data while still being quite usable. I would not recommend ditching LastPass, but I would recommend that LastPass address these issues. I would also recommend that Steve Thomas keep up the great research he provides to the community.

I have not heard a recent response from LastPass on these issues but would love to hear from them. I will update this post if and when I do.

Disclosure: I am a LastPass user and I get a free month of premium service whenever someone clicks on banners located on this site. I have no affiliation with LastPass and receive no other compensation from the company.