Password Stock Photos

I’m getting a little tired of seeing the same stock photos over and over on password articles, so I thought I’d share a set I made for my blog that are at least a little different.

If you find these photos useful they are available for use royalty free with the following options:

Non-Commercial: You may use these photos for non-commercial uses with attribution to Mark Burnett (xato.net). You are also welcome to donate!

Commercial: You may use these photos for commercial purposes with a payment of $10 per photo per use. Your payment receipt is your license. Attribution to Mark Burnett (xato.net) is optional. For resale use, please contact me.

 

The base word clouds in these photos are the top passwords from my 10 million passwords list (made SFW). They were made using Processing with the WordCram library (and Photoshop to fill in the gaps).

Click on the thumbnails below to view/download the full resolution photos.

 

[Thumbnail gallery: sixteen password stock photos]

 

 

A Glimpse Into the World of Internet Password Dumps

Reading the comments around the internet about my release of 10 million passwords, I realized that perhaps some people don’t quite grasp how bad the situation really is. It’s really bad. The target audience of my original article was IT security professionals and network administrators who see this stuff on a daily basis, but news of the release has reached far beyond that audience, and with it has come some misunderstanding of the context. I thought it would be helpful for people to get a glimpse into what I see as I collect passwords.

My main source for passwords in the last few years has been Pastebin and similar sites. Pastebin is a web site where you can paste text data to share with others. You can also do so anonymously. There are Twitter bots and web sites that monitor new pastes and look for hackers leaking (or dumping) sensitive data they have stolen. On a typical day it is common to see around a hundred of these leaks. About half of those contain both usernames and unencrypted passwords, often referred to as combos, which I collect.

Dump Monitor Twitter Bot


 

Because the data may show up in various formats, I have to parse it, which I do with a tool I wrote named Hurl. This tool recognizes many different dump formats and parses out the usernames and passwords. Here is an example of the types of formats it recognizes.
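To give a feel for what this kind of parsing involves, here is a minimal Python sketch. This is not Hurl itself: the two patterns, the function name parse_combo, and the sample formats are assumptions made up for illustration, and real dumps are far messier than this.

```python
import re
from typing import Optional, Tuple

# Labeled form, e.g. "Username: alice Password: hunter2" (hypothetical format).
LABELED = re.compile(
    r"(?:user(?:name)?|login|email)\s*[:=]\s*(?P<user>\S+)\s+"
    r"(?:pass(?:word)?|pwd)\s*[:=]\s*(?P<password>\S+)",
    re.IGNORECASE,
)

# Delimited form, e.g. "alice:hunter2", "alice|hunter2", "alice;hunter2",
# or tab-separated. Only the first delimiter splits the line, so a password
# may itself contain a colon.
DELIMITED = re.compile(r"^(?P<user>[^\s:;|]+)[:;|\t](?P<password>\S+)$")


def parse_combo(line: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) if the line looks like a combo, else None."""
    line = line.strip()
    for pattern in (LABELED, DELIMITED):
        match = pattern.search(line)
        if match:
            return match.group("user"), match.group("password")
    return None
```

Calling parse_combo on each line of a paste and discarding the None results gives a first rough cut. A real parser would need many more patterns plus sanity checks on what it extracts.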

What’s interesting about Pastebin is how many scrapers there really are out there. Less than a minute after I posted this paste there were 81 views. After a few more minutes there were 173 views, as shown below. As you can see, there are more than a few people monitoring this stuff. Want to set up your own scraper? The Dumpmon source code is available, although I personally prefer Pystemon.

Pastebin Scraper Views


Pastebin scrapers only catch pastes when they include the actual data in the paste. Sometimes a paste is just a link to another file, so it is important to monitor links as well. Further, by taking those pastes and seeing who links to them you can find sources, such as Twitter accounts, that announce these things. I monitor those as well.

After Pastebin there are several sites I keep up with that post leaks and stolen databases. Below is a screenshot of one of these sites.

Database Dumps


I then take the names of those files and set up Google Alerts (and sometimes Pastebin alerts) for them. This often leads me to file collections such as the two below:

A collection of password files

Another collection of password files

I also alert on combos that certain hackers frequently use to create accounts, such as Cucum01:Ber02, zolushka:natasha, and many others. These combos are so common in password lists that they always lead to more passwords. Take a look at this Google search and you’ll see how prevalent these are. The alerts keep my inbox full.

Furthermore, there are hundreds of forums that share passwords. Perhaps a few screenshots are the best way to see just how many passwords people are sharing:

[Screenshots: seven password sharing forums]

Of course, this is only a small sampling of forums. There are more than any one person could ever monitor.

There are also hundreds of thousands of web sites that share hacked passwords for gaming, video, porn, and file sharing sites. These don’t always produce the best quality passwords, but I do have scripts to scrape a number of these sites. In a single day those scripts can produce well over a million passwords.

If you were shocked by my releasing password data, take an hour exploring the internet and you will see that 10 million passwords really is a drop in a bucket, even a drop in a thousand buckets. Keep in mind that a big part of the effort in producing my data was getting it all the way down to 10 million in a balanced manner (I couldn’t just remove millions from the end of the file). It took me about three weeks to whittle down and then sanitize the data.

What I have shown here is only a small number of sources available out there. Most of the forums listed above provide “VIP” access for a monthly payment. If you want to spend a little money you have access to tens of millions more passwords than the freebies shared publicly. There are also IRC channels, Usenet groups, torrents, file sharing sites, and of course a number of hidden sources on Tor.

Now, not all of these passwords are plaintext. Many dumps include passwords in a hashed format that requires you to crack them yourself. But that’s no problem: there are tools such as Hashcat and John the Ripper, as well as wordlists, that make this a trivial task.

If this isn’t already overwhelming, keep in mind that this is just the stuff that certain hackers have decided to make public. Surely the troves of accounts that have been hacked over the years completely dwarf what has been publicly shared. There could be billions, or tens of billions, more accounts that have been hacked. If you are worried that the data I released contains your password, you still aren’t worried enough. There is a very good chance your passwords have been hacked. Go change them.

So who besides me collects these passwords? Use your imagination.

 

 

 

Ten Million Passwords FAQ

In response to my recent release of 10 million passwords, I thought I would address some of the questions I am getting.

Where are the passwords from?

These are old passwords that have already been released to the public; none of these passwords are new leaks. They all are or were at one time completely available to anyone in an uncracked format. I have not included passwords that required cracking, payment, exclusive forum access, or anything else not available to the general public. You should still be able to find a large number of these passwords via a Google search.

The passwords were compiled by taking samples from thousands of password dumps, mostly from the last five years, although the set also includes much older data. I wanted to mix data from multiple sources to normalize inconsistencies and skewed data due to the type of web site, its users, and its security policies. (See this article for problems with password data.)

The size of the sample from each site was determined by the data itself. Since the top 100 passwords have been very consistent over the last 20 years, I was able to use that to determine the quality of the source data. Some dumps contained so much bad data that I had to limit how much of it I included.
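One way to picture that kind of quality check is the sketch below. It is only an assumed approach, not the actual method used for this data set: the function name top_overlap and the idea of scoring a dump by how many of its most common passwords appear in a known reference list are illustrative.

```python
from collections import Counter


def top_overlap(sample_passwords, reference_top, n=100):
    """Fraction of the sample's n most common passwords found in reference_top.

    A dump of real user passwords would be expected to score high, while a
    dump full of junk or generated data would score low. The cutoff for
    "good enough" is left to the caller.
    """
    sample_top = [pw for pw, _ in Counter(sample_passwords).most_common(n)]
    if not sample_top:
        return 0.0
    reference = set(reference_top)
    return sum(pw in reference for pw in sample_top) / len(sample_top)
```

A low score does not prove a dump is bad, but it is a cheap signal for deciding how much of it to include and what to review by hand.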

What is bad data?

Here are some examples of bad password data: http://pastebin.com/v6HCVDHN

How was the data collected?

I have been collecting passwords for about 15 years. In the past I have used a number of scripts to scrape the web, forums, IRC, Usenet, and P2P sources to get even 1,000 new passwords per day. In fact, it took me almost 10 years to collect just 6 million unique username/password combos (and at the time I thought that was huge).

However, in the last 5 years things have changed tremendously. I am now able to manually collect 10-20 million unique passwords per year simply from paste sites and forums. There have also been a number of very large password dumps with tens of millions of passwords in a single dump. Anyone could easily gather several hundred million passwords without much effort.

Why did you release this data?

The primary purpose is to get good, clean, and consistent data out in the world so others can find new ways to explore and gain knowledge from it. The data isn’t perfect and there are a few anomalies, but it should provide good insight into user password selection.

Really, why did you release this data?

I’m a bit obsessed with passwords.

Won’t this help hackers?

If a hacker needs this list to hack someone, they probably aren’t much of a threat.

What should I do if my password is on the list?

If your password is on this list, that means it has already been publicly available for some time. You should change your password and enable two-factor authentication if available. Several of my own passwords are on the list as well; I left them there because they are already in many places on the web.

What if my password is not on the list?

It doesn’t mean you are safe. This is a tiny sample of the hundreds of millions of accounts that have been publicly dumped and doesn’t even include the hundreds of millions more that have never been made public.

Is it unethical to release these passwords?

Although I have justified the release of these passwords, I have to admit it is at least close to the line. I have considered releasing this data for a number of years and have put much thought into the ethics involved; it is not something I take lightly. I could have replaced all the usernames with random numbers or hashes, but I felt like the usernames just had to be included. I did make sure to remove domain names from email addresses and other identifiers so that they couldn’t be directly linked to specific accounts. I also aggregated data from many sources so that this data could not be used to target any particular site. The thing to remember here, though, is that I am not releasing new data; I have just aggregated and cleaned up already public data.

How can I monitor my accounts to know if they have been leaked?

I would suggest the following:

  1. Create a Google alert for your email address, username, and domain if you have one.
  2. Create a Pastebin account and set alerts for your email address, username, and domain if you have one.
  3. Sign up for account monitoring at haveibeenpwned.com, pwnedlist.com, breachalarm.com, canary.pw, or a similar site (feel free to add similar sites in the comments if you know of others).

Can I have your raw data?

No. Actually, I have shared portions of this data with companies who notify users of account leaks. Over the years I have gotten pretty good at finding passwords that others miss. But if you just want the raw data for any purpose other than protecting users, the answer is no.

Today I Am Releasing Ten Million Passwords

Frequently I get requests from students and security researchers to get a copy of my password research data. I typically decline to share the passwords but for quite some time I have wanted to provide a clean set of data to share with the world. A carefully-selected set of data provides great insight into user behavior and is valuable for furthering password security. So I built a data set of ten million usernames and passwords that I am releasing to the public domain.

But recent events have made me question the prudence of releasing this information, even for research purposes. The arrest and aggressive prosecution of Barrett Brown had a marked chilling effect on both journalists and security researchers. Suddenly even linking to data was an excuse to get raided by the FBI and potentially face serious charges. Even more concerning is that Brown linked to data that was already public and others had already linked to.


In 2011 and 2012, news stories about Anonymous, WikiLeaks, LulzSec, and other groups were increasing daily, and the FBI was looking more and more incompetent to the public. With these groups becoming bolder and more boastful and pressure on the FBI building, it wasn’t too surprising to see Brown arrested. He was close to Anonymous and was in fact their spokesman. The FBI took advantage of him linking to a data dump to initiate charges of identity theft and trafficking of authentication features. Most of us expected that those charges would be dropped, and some were, although they still influenced his sentence.

At Brown’s sentencing, Judge Lindsay was quoted as saying “What took place is not going to chill any 1st Amendment expression by Journalists.” But he was so wrong. Brown’s arrest and prosecution had a substantial chilling effect on journalism. Some journalists have simply stopped reporting on hacks out of fear of retribution, and others who still do are forced to employ extraordinary measures to protect themselves from prosecution.

Which brings me back to these ten million passwords.

Why the FBI Shouldn’t Arrest Me

Although researchers typically only release passwords, I am releasing usernames with the passwords. Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone. Most researchers are afraid to publish usernames and passwords together because combined they become an authentication feature. If simply linking to already released authentication features in a private IRC channel was considered trafficking, surely the FBI would consider releasing the actual data to the public a crime.

But is it against the law? There are several statutes that the government used against Brown, as summarized by the Digital Media Law Project:

Count One: Traffic in Stolen Authentication Features, 18 U.S.C. §§ 1028(a)(2), (b)(1)(B), and (c)(3)(A); Aid and Abet, 18 U.S.C. § 2: Transferring the hyperlink to stolen credit card account information from one IRC channel to his own (#ProjectPM), thereby making stolen information available to other persons without Stratfor or the card holders’ knowledge or consent; aiding and abetting in the trafficking of this stolen data.

Count Two: Access Device Fraud, 18 U.S.C. §§ 1029(a)(3) and (c)(1)(A)(i); Aid and Abet, 18 U.S.C. § 2: Aiding and abetting the possession of at least fifteen unauthorized access devices with intent to defraud by possessing card information without the card holders’ knowledge and authorization.

Counts Three Through Twelve: Aggravated Identity Theft, 18 U.S.C. § 1028A(a)(1); Aid and Abet, 18 U.S.C. § 2: Ten counts of aiding and abetting identity theft, for knowingly and without authorization transferring identification documents by transferring and possessing means of identifying ten individuals in Texas, Florida, and Arizona, in the form of their credit card numbers and the corresponding CVVs for authentication as well as personal addresses and other contact information.

While these particular indictments refer to credit card data, the laws do also reference authentication features. Two of the key points here are knowingly and with intent to defraud.

In the case of me releasing usernames and passwords, the intent is certainly not to defraud, facilitate unauthorized access to a computer system, steal the identity of others, aid any crime, or harm any individual or entity. The sole intent is to further research with the goal of making authentication more secure and therefore protecting against fraud and unauthorized access.

To ensure that these logins cannot be used for illegal purposes, I have:

  1. Limited identifying information by removing the domain portion from email addresses.
  2. Combined data samples from thousands of global incidents from the last five years with other data mixed in going back an additional ten years so the accounts cannot be tied to any one company.
  3. Removed any keywords, such as company names, that might indicate the source of the login information.
  4. Manually reviewed much of the data to remove information that might be particularly linked to an individual.
  5. Removed information that appeared to be a credit card or financial account number.
  6. Where possible, removed accounts belonging to employees of any government or military sources [Note: although I can identify government or military logins when they include full email addresses, sometimes these logins get posted without the domains, without mentioning the source, or aggregated on other lists and therefore it is impossible to know if I have removed all references.]
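Two of the steps above lend themselves to a short illustration. The Python sketch below shows one plausible way to drop the domain from an email-style username and to spot values that look like payment card numbers. The function names are hypothetical, and the Luhn checksum is a swapped-in, commonly used heuristic rather than a description of the checks actually run on this data.

```python
import re


def strip_domain(username: str) -> str:
    """Keep only the part before the first '@' of an email-style username."""
    return username.split("@", 1)[0]


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card_number(value: str) -> bool:
    """True for 13 to 19 digits (spaces or dashes allowed) that pass Luhn."""
    digits = re.sub(r"[ -]", "", value)
    return digits.isdigit() and 13 <= len(digits) <= 19 and luhn_valid(digits)
```

A check like this only catches well-formed card numbers. Other financial account numbers have no checksum to lean on, which is part of why a manual review is still needed.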

Furthermore, I believe these are primarily dead passwords, which cannot be defined as authentication features because dead passwords will not allow you to authenticate. The likelihood of any authentication information included still being valid is low and therefore this data is largely useless for illegal purposes. To my knowledge, these passwords are dead because:

  1. All data currently is or was at one time generally available to anyone and discoverable via search engines in a plaintext (unhashed and unencrypted) format, and is therefore already widely available to those with an intent to defraud or to gain unauthorized access to computer systems.
  2. The data has been publicly available long enough (up to ten years) for companies to reset passwords and notify users. In fact, I would consider any organization grossly negligent if it were unaware of these leaks and still had not changed user passwords after they had been publicly visible for such a long period of time.
  3. The data is collected by numerous web sites such as haveibeenpwned or pwnedlist and others where users can check and be notified if their own accounts have been compromised.
  4. Many companies, such as Facebook, also monitor public data dumps to identify user accounts in their user base that may have been compromised and proactively notify users.
  5. A portion of users, either on their own or required by policy, change their passwords on a regular basis regardless of being aware of compromised login information.
  6. Many organizations, particularly in some industries, actively identify unusual login patterns and automatically disable accounts or notify account owners.

Ultimately, to the best of my knowledge these passwords are no longer valid, and I have taken extraordinary measures to make this data ineffective in targeting particular users or organizations. This data is extremely valuable for academic and research purposes and for furthering authentication security, and this is why I have released it to the public domain.

Having said all that, I think it is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution or legal harassment. I had wanted to write an article about the data itself, but I will have to do that later because I had to write this lame thing trying to convince the FBI not to raid me.

I could have released this data anonymously like everyone else does, but why should I have to? I clearly have no criminal intent here. It is beyond all reason that any researcher, student, or journalist should have to be afraid of law enforcement agencies that are supposed to be protecting us instead of trying to find ways to use the laws against us.

Slippery Slopes

For now the laws are on my side because there has to be intent to commit or facilitate a crime. However, the White House has proposed some disturbing changes to the Computer Fraud and Abuse Act that will make things much worse. Of particular note is 18 U.S.C. § 1030(a)(6):

(6) knowingly and with intent to defraud willfully traffics (as defined in section 1029) in any password or similar information, or any other means of access, knowing or having reason to know that a protected computer would be accessed or damaged without authorization in a manner prohibited by this section as the result of such trafficking;

The key change here is the removal of an intent to defraud and replacing it with willfully; it will be illegal to share this information as long as you have any reason to know someone else might use it for unauthorized computer access.

It is troublesome to consider the unintended consequences resulting from this small change. I wrote about something back in 2007 that I’d like to say again:

…it reminds me of IT security best practices. Based on experience and the lessons we have learned in the history of IT security, we have come up with some basic rules that, when followed, go a long way to preventing serious problems later.

So many of us security professionals have made recommendations to software companies about potential security threats and often the response is that they don’t see why that particular threat is a big deal. For example, a bug might reveal the physical path to a web content directory. The software company might just say “so what?” because they cannot see how that would result in a compromise. Unfortunately, many companies have learned “so what” the hard way.

The fact is that it doesn’t matter if you can see the threat or not, and it doesn’t matter if the flaw ever leads to a vulnerability. You just always follow the core rules and everything else seems to fall into place.

This principle equally applies to the laws of our country; we should never violate basic rights even if the consequences aren’t immediately evident. As serious leaks become more common, surely we can expect tougher laws. But these laws are also making it difficult for those of us who wish to improve security by studying actual data. For years we have fought increasingly restrictive laws but the government’s argument has always been that it would only affect criminals.

The problem is that the laws themselves change the very definition of a criminal and put many innocent professionals at risk.

The Download Link

Again, it is stupid that I have to do this, but:

BY DOWNLOADING THIS AUTHENTICATION DATA YOU AGREE NOT TO USE IT IN ANY MANNER WHICH IS UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL, OR IN CONNECTION WITH ANY UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL PURPOSE OR ACTIVITY INCLUDING BUT NOT LIMITED TO FRAUD, IDENTITY THEFT, OR UNAUTHORIZED COMPUTER SYSTEM ACCESS. THIS DATA IS ONLY MADE AVAILABLE FOR ACADEMIC AND RESEARCH PURPOSES.

Torrent (84.7 MB): Magnet link

For more information on this data, please see this FAQ.

As a final note, be aware that if your password is not on this list, that means nothing. This is a random sampling of thousands of dumps consisting of upwards of a billion passwords. Please see the links in the article for a more thorough check to see if your password has been leaked. Or you could just Google it.

If you wish to discuss analysis of this data, you may do so at http://reddit.com/r/passwords

 

 

Is 123456 Really The Most Common Password?

2014 Top 10 Passwords


I recently worked with SplashData to compile their 2014 Worst Passwords List and yes, 123456 tops the list. In the data set of 3.3 million passwords I used for SplashData, almost 20,000 of those were in fact 123456. But how often do you really see people using that, or the second most common password, password, in real life? Are people still really that careless with their passwords?

While 123456 is indeed the most common password, that statistic is a bit misleading. Although 0.6% of all users on my list used that password, it’s important to remember that 99.4% of the users on my list didn’t use that password. What is noteworthy here is that while the top passwords are still the top passwords, the number of people using those passwords has dramatically decreased.

The fact is that the top passwords are always going to be the top passwords; it’s just that the percentage of users actually using them will, at least we hope, continually get smaller. This year, for example, a hacker using the top 10 password list would statistically be able to guess 16 out of 1000 passwords.
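Both figures are simple share calculations: almost 20,000 out of 3.3 million is roughly 0.6%, and 16 out of 1000 is 1.6%. As a rough sketch of how a number like the second one could be computed from a list of passwords, assuming nothing more than a count of occurrences (the function name is illustrative):

```python
from collections import Counter


def top_n_coverage(passwords, n=10):
    """Share of all accounts whose password is one of the n most common."""
    counts = Counter(passwords)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total
```

A result of 0.016 for n=10 would correspond to the 16 out of 1000 figure.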

Getting a true picture of user passwords is surprisingly difficult. Even though password is #2 on the list, I don’t think I have seen anyone actually use that password in years. Part of the problem is how we collect and analyze password data. Because we typically can’t just go to some company and ask for all their user passwords, we have to go with the data that is available to us. And that data does have problems.

Anomalies are More Prominent

As we saw, user passwords are improving but as percentages of common passwords decrease, anomalies begin to float to the top. There was a time that I didn’t worry too much about minor flaws in the data because as my data set grew those tended to fall to the bottom of the list. Now, however, those anomalies are becoming a problem.

For example, when I first ran my stats for 2014, the password lonen0 ranked as #7 in the list. Looking through the data I saw that all of these passwords came from a single source, the Belgian company EASYPAY GROUP, which had their data leaked in November of 2014. Looking through the raw data, it appears that lonen0 was a default password that 10% of their users failed to set to something stronger. It’s just 10% of users from one company, but that was enough to push it to the #7 most common password in my data set.
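An anomaly like this can be flagged automatically if each password is stored along with the dump it came from. The sketch below assumes records are (source, password) pairs. The function name and the 90% cutoff are illustrative choices, not part of my actual tooling.

```python
from collections import Counter, defaultdict


def single_source_outliers(records, top_n=10, max_share=0.9):
    """Flag top passwords whose occurrences come almost entirely from one source.

    records: iterable of (source, password) pairs.
    Returns a list of (password, dominant_source, share) tuples.
    """
    totals = Counter()
    by_source = defaultdict(Counter)
    for source, password in records:
        totals[password] += 1
        by_source[password][source] += 1

    flagged = []
    for password, total in totals.most_common(top_n):
        source, count = by_source[password].most_common(1)[0]
        if count / total >= max_share:
            flagged.append((password, source, count / total))
    return flagged
```

A password like 123456 shows up across many dumps and passes the check, while a site-specific default such as lonen0 would be flagged for review.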

In 2014, all it takes for a password to get on the top 1000 list is to be used by just 0.0044% of all users.

Single Sets of Data vs Aggregated Data

There are numerous variables that affect which passwords users choose, and therefore many people like to analyze sets of passwords dumped from a single source. There are two problems with this: first, we don’t really know all the variables that determine how users choose passwords. Second, data is always skewed when you analyze a single company, as we saw with EASYPAY GROUP. Another example: if you look at the password dump from Adobe, you will see that the word adobe appears in many of the passwords.

On the other hand, if we aggregate all the data from multiple dumps and analyze it together, we may get the wrong picture. Doing this gives us no control variables and we end up with passwords like 123456 on the top of the list. If we had enough aggregated data that wouldn’t be an issue, but what exactly is enough data?

Cracked Passwords are Crackable; Hacked Companies are Hackable

Since most of the data we are looking at comes from password leaks, it is possible that 123456 tops the list simply because it is the easiest password to crack. Perhaps some hacker checked tens of thousands of email accounts to see if the password was 123456 and dumped all positive matches on the internet. In fact, part of the reason I only analyzed 3.3 million passwords this year is due to a large number of mail.ru, yandex.com and other Russian accounts that had unusual passwords such as qwerty and other keyboard patterns. Here is the top 10 list including all the Russian email accounts:

1. qwerty
2. 123456
3. qwertyuiop
4. 123456789
5. password
6. 12345678
7. 12345
8. 111111
9. 1qaz2wsx
10. qwe123

While these are common passwords, the Russian data was highly skewed which made me suspect that these were either fake accounts or hacked by checking only certain passwords. So while 3.3 million passwords isn’t a huge dataset to analyze, it is a clean set of data that seems to accurately reflect results I have seen in the past.

The other problem is that when a company gets hacked, often it is because they have not properly secured their data. If they have poor security practices, this could affect password policies and user training which might result in poor quality passwords.

Unfortunately, we do not know to what extent crackable passwords and hackable companies affect the quality of the password data we have to analyze.

No Indication of Source

When we work with publicly leaked passwords, we often don’t know the source of the data. We don’t know if the passwords are from some corporation with strict password policies, or if they come from hacked adult sites where many users are choosing passwords such as boobies, or if they are hacked Minecraft accounts where a large chunk of the users are kids or teenagers. We don’t know if the data came from keyloggers or phishing or password hashes.

We also don’t know when users set these passwords. When Adobe had 150 million user accounts leaked, clearly those passwords were from accounts and passwords created years ago. We do know that users are slowly getting better with their passwords, but if we don’t know when they set these passwords it is impossible for us to gauge that progress.

These are all significant variables, and they make it impossible for us to get an accurate picture of which passwords people truly are using, where, and when.

No Indication of User Attitude

The source of the data also strongly affects user attitude towards security. Many users have several common passwords they use, typically including a strong one for bank and other sensitive accounts and another one for casual or one-time-use accounts such as a flower company shopping cart. The data we have gives no indication of the users’ attitude when they selected their password.

In the fifteen years I have been collecting passwords I have seen just one of my passwords publicly leaked. It was in the Yahoo! Voices password dump. I actually remember setting this password: I was on my phone at the time, researching possible ways to syndicate my writing. I set up various accounts on different sites I was checking out, all with the same password, because I was on my phone and security for these sites wasn’t particularly a concern; this was just casual research.

My password was October38, a throwaway password I occasionally used around that time. Although it is a decent password, it doesn’t represent the type of password I normally use. None of my other passwords show up in public dumps, and there is no indication of how often I used this particular password or how it compares to the rest of my passwords.

So where are the passwords coming from and how does this affect user attitude? Are they PayPal accounts or a quick login someone created to comment on a small web site? Are they computer accounts that you can’t manage with a password manager and therefore users must memorize? We just do not know for much of the data.

Bad Data

Finally, the biggest problem when dealing with public password dumps is that sometimes you just get bad data and sometimes good data is ruined through poor parsing or conversion. When dealing with tens of millions of passwords and hundreds of gigabytes of files, bad data will make its way in there, and it is usually hard to spot this data without a manual review. While I do manually glance over most data I include, it is impossible to catch everything.

Here are some examples of bad data that I sometimes catch manually but that are extremely difficult to identify with my automated parsing scripts.

Yet another problem is that since my goal is to identify user-selected passwords, I need to be able to spot data that isn’t real user data. Here is an example of a dump where both the usernames and passwords follow an obvious pattern and are clearly machine-generated. I don’t want that type of data.

My parsing script performs dozens of checks on each username and password, but ultimately I still have to review data manually, and I still don’t catch everything.
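To make the idea of spotting machine-generated data concrete, here is a minimal Python sketch of one possible check. It is not taken from the actual parsing script: the function names, the "shape" encoding, and the 0.9 threshold are all assumptions made for illustration. The idea is to reduce every username and password to its character-class shape and flag a dump when a single shape pair dominates.

```python
from collections import Counter
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse class: digit, letter, or symbol."""
    if ch.isdigit():
        return "d"
    if ch.isalpha():
        return "a"
    return "s"


def shape(text):
    """Run-length encode the character classes, e.g. 'user0042' becomes 'a4d4'."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(text, key=char_class)
    )


def looks_machine_generated(combos, threshold=0.9):
    """Flag a dump when one (username shape, password shape) pair dominates.

    combos is a list of (username, password) tuples. The threshold is a
    hypothetical value and would need tuning against real data.
    """
    if not combos:
        return False
    shapes = Counter((shape(user), shape(pw)) for user, pw in combos)
    _, top_count = shapes.most_common(1)[0]
    return top_count / len(combos) >= threshold


generated = [(f"user{i:04d}", f"pass{i:04d}") for i in range(100)]
human = [
    ("alice", "October38"),
    ("bob99", "password"),
    ("carol.s", "123456"),
    ("dave", "letmein!"),
]
print(looks_machine_generated(generated))
print(looks_machine_generated(human))
```

A check like this only catches the most obvious patterns, which is consistent with the need for a manual review afterwards.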

As you can see there are many variables that can affect the data and therefore we can’t truly say which passwords users have set for stuff that really matters. Nevertheless, the statistics are consistent and the same passwords show up on the top year after year. What it comes down to is if you are one of those people who are using 123456 or password, please stop.

Use a Password Manager

So what do we do about bad passwords? As I have said before, you need to use a password manager. When I analyzed passwords in my book Perfect Passwords in 2005 I published a list of the Top 500 Passwords. Later in 2011 I published a list of the top 10,000 passwords. Now with the latest analysis in 2014 we can clearly see that list really doesn’t change much over the years. But the threats are increasing much faster than we can keep up.

The only solution is to stop trying to create and remember your own passwords. You just can’t create strong, unique passwords for each account you have and keep it all in your head. You cannot consider yourself secure on the internet unless you are using some tool to manage your passwords. Password managers let you generate strong passwords and manage them in a central location, protected by a single strong password.
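For a sense of what "generate strong passwords" means in practice, here is a minimal Python sketch using the standard library's `secrets` module. It is not how any particular password manager works; the alphabet and the default length of 20 are assumptions chosen for the example.

```python
import secrets
import string

# Assumed alphabet: letters, digits, and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

Nobody can memorize dozens of strings like this, which is exactly why the manager, not your head, has to store them.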

SplashData, who worked with me on this analysis, is the developer of the password manager SplashID Safe. Other password managers I use or have tested include LastPass, KeePass, 1Password, and Dashlane. I would recommend any of these products.

And because I know I will be asked, the following articles will be coming soon:

The New Top 10,000 Password List

How I Collect and Process Passwords

 

 

 

The Pathetic Reality of Adobe Password Hints

The leak of 150 million Adobe passwords in October this year is perhaps the most epic security leak we have ever seen. It was huge. Not just because of the sheer volume of passwords, but also because it’s such a large dump from a single site, allowing for a much better analysis than earlier sets. But there’s something unique about the Adobe dump that makes it even more insightful: it includes about 44 million password hints. Even though we still haven’t decrypted the passwords, the data is extremely useful.

One thing I have pondered over the years in analyzing passwords is trying to figure out *what* the password is. I can determine if the password contains a noun or a common name, but I can’t always determine what that noun or name means to the user.

For example, if the password is Fred2000, is that a dog’s name and a date? An uncle and his anniversary? The user’s own name and the year they set up the account? Once we know the significance of a password we gain a huge insight into how users select passwords. But I have never been able to come up with a method to even remotely measure this factor. Then came the Adobe dump.

The sheer amount of data in the Adobe dump makes it a bit overwhelming and somewhat difficult to work with. But if you remove the least common and least useful hints the data becomes a bit more manageable. Using a trimmed down set of about 10 million passwords, I was able to better work with the data to come up with some interesting insights.

Just glancing at the top one hundred hints, several patterns immediately become clear. In fact, what we learn is that a large percentage of the passwords are the name of a person, the name of a pet, the name of a place, or an important date.

Take dates for example. Consider the following list of top date-related hints:

Hint Total Note
birthday 29425
bday 17697
date 15272
birth 14956
DOB 13109
niver 9484 Portuguese: Birthday (short for aniversário)
fecha 8899 Spanish: Date
naissance 7892 French: Birth
anniversary 6959

In all, there are about 420,000 passwords with a date-related hint which represents about 3.6% of the passwords in the working set.

We see similar trends with dog names, which account for 375,000 passwords or 3.2% of the total (plus another 120,000 that mention “pet”):

Hint Total Note
dog 70550
dogs name 13559
my dog 9780
dog’s name 8191
dog name 8187
perro 8000 Spanish
hund 7185 German, Danish, Swedish, Norwegian
first dog 5653
chien 5542 French
doggy 5184

One interesting insight offered here is something we already know but find difficult to measure: password reuse. Surely a large percentage of these users have the same password across multiple sites, but it is interesting to see that about 361,000 users (or 3.11%) state this fact in their password hints:

Hint Total Note
same 44565
password 14634
always 13329
la de siempre 8559 Spanish: as always or the usual
same as always 8289
usual password 5277
same old 5111
siempre 4163 Spanish: always
normal password 3898
my password 3022

Keep in mind that these are just those passwords that admit to reuse in the hint. The number of passwords actually in use across multiple sites certainly is much greater than this.

Looking at the three lists above, we see that nearly 10% of the passwords fall into just these 3 categories. Adding names of people and places will likely account for 10% more.
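The tallies above can be approximated with a simple keyword bucket count. The Python sketch below is an assumption about how such a count could be done, not the method actually used for this analysis: the keyword sets and function names are hypothetical, and the sample rows are a handful of entries copied from the three tables above.

```python
from collections import Counter

# Hypothetical keyword sets, loosely based on the hints listed above.
CATEGORY_KEYWORDS = {
    "date": {"birthday", "bday", "date", "birth", "dob", "fecha",
             "naissance", "anniversary"},
    "dog": {"dog", "dogs", "perro", "hund", "chien", "doggy"},
    "reuse": {"same", "always", "siempre", "usual"},
}

# Percentages stated in the text for each category.
STATED_SHARES = {"date": 3.6, "dog": 3.2, "reuse": 3.11}


def categorize(hint):
    """Return the first category whose keywords appear in the hint, else None."""
    words = set(hint.lower().replace("’", " ").replace("'", " ").split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return None


def tally(hint_counts):
    """Sum the counts of (hint, count) rows per category."""
    totals = Counter()
    for hint, count in hint_counts:
        category = categorize(hint)
        if category is not None:
            totals[category] += count
    return totals


sample = [
    ("birthday", 29425), ("bday", 17697),
    ("dog", 70550), ("perro", 8000),
    ("same", 44565), ("la de siempre", 8559),
]
print(tally(sample))
print(round(sum(STATED_SHARES.values()), 2))  # combined share of the three categories
```

Summing the three stated shares gives 9.91%, which is where the "nearly 10%" figure comes from.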

So what did we learn by analyzing these hints? First, that you should never use password hints. If users forget their password, they should use the password reset process. Second, that decades of user education have completely failed. No matter how much we advise not to use dates, family names or pet names in your passwords and no matter how much we tell people not to use the same passwords on multiple sites, you people will just do it anyway.

This is why we can’t have nice password policies.

 

 

Fingerprints and Passwords: A Guide for Non-Security Experts

Today Apple announced that the iPhone 5S will have a fingerprint scanner. Many of us in the security community are highly sceptical of this feature, while others see this as a smart security move. Then of course there are the journalists who see fingerprints as the ultimate password killer. Clearly there is some disagreement here. I thought I’d lay this out for those of you who need to better understand the implications of using fingerprints instead of, or in addition to, passwords.

Biometrics, like usernames and passwords, are a way to identify and authenticate yourself to a system. We all know that passwords can be weak and difficult to manage, which makes it tempting to call every new authentication product a password killer. But despite their flaws, passwords must always play some role in authentication.

The fact is that while passwords do have their flaws, they also have their strengths. The same is true with biometrics. You can’t just replace passwords with fingerprints and say you’ve solved the problem because you have introduced a few new problems.

To clarify this, below is a table that compares the characteristics of biometrics vs passwords, with check marks where one method has a clear advantage:

Passwords | Biometrics
Difficult to remember | Don’t have to remember ✓
Requires unique passwords for each system | Can be used on every system ✓
Nothing else to carry around | Nothing else to carry around
Take time to type | Easy to swipe/sense ✓
Prone to typing errors | Prone to sensor or algorithm errors
Immune to false positives ✓ | Susceptible to false positives
Easy to enroll ✓ | Some effort to enroll
Easy to change ✓ | Impossible to change
Can be shared among users ✓ (1) | Cannot be shared ✓
Can be used without your knowledge | Less likely to be used without your knowledge ✓
Cheap to implement ✓ | Requires hardware sensors
Work anywhere including browsers & mobile ✓ | Require separate implementation
Mature security practice ✓ | Still evolving
Non-proprietary ✓ | Proprietary
Susceptible to physical observation | Susceptible to public observation
Susceptible to brute force attacks | Resistant to brute force attacks ✓
Can be stored as hashes by untrusted third party ✓ | Third party must have access to raw data
Cannot personally identify you ✓ | Could identify you in the real world
Allow for multiple accounts ✓ | Cannot use to create multiple accounts
Can be forgotten; password dies with a person | Susceptible to injuries, aging, and death
Susceptible to replay attacks | Susceptible to replay attacks
Susceptible to weak implementations | Susceptible to weak implementations
Not universally accessible to everyone | Not universally accessible to everyone
Susceptible to poor user security practices | Not susceptible to poor practices ✓
Lacks non-repudiation | Moderate non-repudiation ✓
(1) Can be both a strength and a weakness

 

What Does This Tell Us?

As you can see, biometrics clearly are not the best replacement for passwords, which is why so many security experts cringe when every biometrics company claims in its press releases to be the ultimate password killer. Biometrics do have some clear advantages over passwords, but they also have numerous disadvantages; both can be weak and yet each can be strong, depending on the situation. The list above is not weighted (certainly some of the items are more important than others), but the point here is that you can’t simply compare passwords to biometrics and say that one is better than the other.

However, one thing you can say is that when you use passwords together with biometrics, you have something that is significantly stronger than either of the two alone. This is because you get the advantages of both techniques and only a few of the disadvantages. For example, we all know that you can’t change your fingerprint if compromised, but pair it with a password and you can change that password. Using these two together is referred to as two-factor authentication: something you know plus something you are.

It’s not clear, however, whether the Apple implementation will allow you to use both a fingerprint and a password (or PIN) together.

Now specifically talking about the iPhone’s implementation of a fingerprint sensor, there are some interesting points to note. First, including it on the phone makes up for some of the usual biometric disadvantages, such as enrollment effort and the need for special hardware sensors, and it eases privacy concerns because the data is only stored locally. Another interesting fact is that the phone itself is actually a third factor of authentication: something you possess. When combined with the other two factors it becomes an extremely reliable form of identification for use with other systems. A compromise would require being in physical possession of your phone, having your fingerprint, and knowing your PIN.

Ultimately, the security of the fingerprint scanner largely depends on the implementation, but even if it isn’t perfect, it is better than those millions of phones with no protection at all.

There is also the security concern that some have brought up: is this just a method for the NSA to build a master fingerprint database? Apple’s implementation encrypts the fingerprint data and stores it locally using trusted hardware. Whether this is actually secure remains to be seen, but keep in mind that your fingerprints aren’t really that private: you literally leave them on everything you touch.

 

 

So What Exactly Did The US Government Ask Lavabit to Do?

The recent shutdown of Lavabit’s email services prompted a flurry of reporting and speculation about the extent of US Government spying, mostly due to the mysterious statement by Lavabit founder Ladar Levison:

Most of us saw this as yet another possibly overhyped government spying issue and didn’t really think too much of it. Much of the media coverage is already starting to die down, but there is still some question as to exactly what the government required of Levison that left him with only one option: shutting down the entire business he built from the ground up. I wondered if there were enough clues out there to get some more insight into this case. I started by looking at exactly what Lavabit offered and how that all worked behind the scenes.

Lavabit Encryption

Lavabit claimed they had “developed a system so secure that it prevents everyone, including us, from reading the e-mail of the people that use it.” This is a bold claim and one that surely was a primary selling point for their services.

The way it worked is relatively simple: Lavabit encrypted all incoming mail with the user’s public key before storing the message on their servers. Only the user, with the private key and password, could decrypt messages. Normally with encrypted email, users store private keys on their own computers, but it appears that Lavabit stored the users’ private keys itself, each encrypted with a hash of that user’s password. This is by no means the most secure way of doing this, but it dramatically increases transparency and usability for the user. By doing this, for example, users do not need to worry about private keys and they still have access to their email from any computer.

So let’s break this down: a user logs in with their password. This login might occur via POP3, IMAP4, or through the web interface (which in turn connected internally via IMAP). Because Lavabit used the user’s password to encrypt the private key, they needed the original plaintext password, which means they could not support any authentication method that avoids sending the password itself. In other words, all clients had to send passwords using AUTH PLAIN or AUTH LOGIN with nothing more than base64 encoding. The webmail interface appears to have been available both with and without SSL, and the POP3, IMAP4, and SMTP interfaces all seem to have accepted connections with or without SSL. All SSL connections terminated at the application tier.
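To show why base64 offers no protection by itself, here is a short Python sketch of the SASL PLAIN initial response, which is the authorization identity, the username, and the password joined by NUL bytes and then base64 encoded. The function names and the example credentials are made up for illustration.

```python
import base64


def encode_auth_plain(username, password, authzid=""):
    """Build the base64 payload a client sends for AUTH PLAIN."""
    raw = f"{authzid}\0{username}\0{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_auth_plain(token):
    """Recover the username and password from an AUTH PLAIN payload."""
    _authzid, username, password = (
        base64.b64decode(token).decode("utf-8").split("\0", 2)
    )
    return username, password


token = encode_auth_plain("alice@example.com", "October38")
print(token)
print(decode_auth_plain(token))
```

Anyone who can see the payload, whether on an unencrypted connection or on the server after SSL termination, recovers the plaintext password with a single decode call.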

Once a user sends a password, the Lavabit servers create SHA-512 hashes explained as follows:

… Lavabit combines the password with the account name and a cryptographic salt. This combined string is then hashed three consecutive times, with the former iteration’s output being used as the input value of the next iteration. The output of the first hash iteration is used as the secret passphrase for AES [encryption of the private key]. The third iteration is stored in our password database and is used to verify that users entered their password correctly.

The process they describe produces two hashes: one for decrypting the user’s private key and, after two more hashing iterations, a hash to store in the database for user authentication. While this is a fairly secure process, given strong user passwords, it does weaken Lavabit’s claim that even their administrators couldn’t read your email. In reality all it would take is a few lines of code to log the user’s original password, which allows you to decrypt the private key, which in turn allows you to receive and send mail as that user as well as access any stored messages.
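The scheme quoted above can be sketched in a few lines of Python. This is an illustration only, not Lavabit’s code: the order in which the account name, salt, and password are concatenated, the byte encoding, and the function names are all assumptions, since the description does not specify them.

```python
import hashlib
import hmac


def derive_hashes(account, password, salt):
    """Return (first, third) SHA-512 iterations of the combined string.

    The concatenation order below is an assumption; the source only says
    the password is combined with the account name and a salt.
    """
    combined = account.encode("utf-8") + salt + password.encode("utf-8")
    first = hashlib.sha512(combined).digest()   # passphrase for the private key
    second = hashlib.sha512(first).digest()
    third = hashlib.sha512(second).digest()     # value stored for login checks
    return first, third


def check_login(account, password, salt, stored_hash):
    """Verify a login and, on success, return the key passphrase."""
    key_passphrase, third = derive_hashes(account, password, salt)
    if not hmac.compare_digest(third, stored_hash):
        return None
    return key_passphrase


salt = b"example-salt"
_, stored = derive_hashes("alice", "October38", salt)
print(check_login("alice", "October38", salt, stored) is not None)
print(check_login("alice", "wrong-password", salt, stored))
```

Note that the stored third hash cannot be reversed into the first one, but the server still handles the plaintext password on every login, so it can always compute the passphrase that unlocks the private key.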

It is important to note that the scope of Lavabit’s encryption was limited to storage on its own servers. The public keys were for internal use and not something you published for others to use. Full protection would require employing PGP or S/MIME and having untapped SSL connections between all intermediate servers. On the other hand, if an email was sent through Lavabit already using PGP or S/MIME encryption, they would never be able to intercept or read those emails.

The question here is what exactly did the government request Levison to do that was so bad that he’d rather shut down his entire business? What information could Lavabit even produce that would be of interest to a government agency? Unencrypted emails, customer IP addresses, customer payment methods, and customer passwords. Based on media statements, it appears that he would be required to provide unencrypted copies of all emails going through his system.

Let’s look at some quotes Levison has given to various media outlets. First, here are some quotes from an interview with CNET:

“We’ve had a couple of dozen court orders served to us over the past 10 years, but they’ve never crossed the line.”

“Philosophically, I put myself in a position that I was comfortable turning over the information that I had. I built Lavabit in a reaction to the original Patriot Act.”

“Where the government would hypothetically cross the line is to violate the privacy of all of my users. This is not about protecting a single person or persons, it’s about protecting all my users. What level of access to this nation does the government have?”

“Why should I collect that info if I didn’t need it? [That philosophy] also governed what kind of information I logged.”

“Unfortunately, what’s become clear is that there’s no protections in our current body of law to keep the government from compelling us to provide the information necessary to decrypt those communications in secret.”

“If you knew what I know about e-mail, you might not use it either.”

In an article from NBC News, we have this:

Levison stressed that he has complied with “upwards of two dozen court orders” for information in the past that were targeted at “specific users” and that “I never had a problem with that.” But without disclosing details, he suggested that the order he received more recently was markedly different, requiring him to cooperate in broadly based surveillance that would scoop up information about all the users of his service. He likened the demands to a requirement to install a tap on his telephone. Those demands apparently began about the time that Snowden surfaced as one of his customers, apparently triggering a secret legal battle between Levison and federal prosecutors.

And finally in an interview with RT he said:

“I think the amount of information that they’re collecting on people that they have no right to collect information on is the most alarming thing,” he told RT. “I mean, the Fourth Amendment is supposed to guarantee that our government will only conduct surveillance on people in which it has a probable suspicion or evidence that they are committing some crime, and that that evidence has been reviewed by a judge and signed off by a judge before that surveillance begins. And if there’s anything alarming, it’s that now that’s all being done after the fact. Everything’s being recorded, and then a judge can after the fact say it’s okay to go look at the information.”

Given the above information, let’s analyze some of the facts we know:

  • The government asked Lavabit to do something which Levison considered to be a crime against the American people.
  • Levison was comfortable and had complied with warrants requesting information on specific users.
  • Levison told Forbes that “This is about protecting all of our users, not just one in particular.”
  • Levison is not even able to reveal some details to his own attorney or employees.
  • Shutting down operations was an option to circumvent compliance, although there was a veiled threat he could be arrested for doing so.
  • He did not delete customer data; he still has that in his possession, so this was a request for ongoing surveillance.
  • This was a court order, which Levison is fighting through the US Court of Appeals for the Fourth Circuit.
  • Levison compared the request to installing a tap on his telephone.

Apparently what made Levison uncomfortable with the request was the fact that it collected information about all users, without regard to a warrant. Presumably law enforcement wanted to collect all data so that they could later view whatever they needed retroactively once they had a warrant. The two issues here are that the Government wanted to collect information on innocent users (including Levison himself) and Levison would be out of the loop completely, taking away his control over what information he provided to law enforcement. These were the lines the Government crossed.

What’s interesting here is that Lavabit terminated the SSL connections right on the application servers themselves. These are the servers that also performed the encryption of email messages. Because of that, a regular network tap would be ineffective. The only way to perform the broad surveillance Levison objected to would be (in order of likelihood):

  1. Force Lavabit to provide their private SSL keys and route all their traffic through a government machine that performed a man-in-the-middle style data collection;
  2. Change their software to subvert Lavabit’s own security measures and log emails after SSL decryption but before encrypting with the users’ public keys; or
  3. Require Lavabit to install malicious code to infect their own customers with government-supplied malware.

Sure, this could have been a simple request to put a black box on Lavabit’s network and Levison is just overreacting, but the evidence doesn’t seem to indicate that. Regardless of which of the requests the Government made, they would all make Levison’s entire business a lie; all efforts to encrypt messages would be pointless. Surely there were some heated words spoken when the Department of Justice heard about Levison’s decision, but this is not an act of civil disobedience on Levison’s part; his personal integrity was on the line. Compliance would make his very reason for running Lavabit a deception; a government-sponsored fraud.

While Lavabit initially had quite a bit of media coverage over this issue, the hype seems to be a casualty of our frenzied news cycle. But after looking closely at the facts here, I now see that this is a monumentally important issue, one that the media needs to once again address. The message here is that US courts can force a business to subvert their own security measures and lie to their customers, deliberately giving them a false sense of security. They can say what they want about security on their web sites; it means nothing. If they did it to Lavabit, how many hundreds or thousands of other US companies already participate in this deception?

If the courts can force a business to lie, we can never again trust the security claims of any US company. The reason so many businesses specifically rely on US services is the sense of stability and trust. How sad that an overreaching and panicked pursuit of a whistleblower has thrown that all away.

This issue is so much more than a simple civil liberties dispute, it is the integrity of a nation at stake. We walked with the devil in a time of need–that is a legacy we must live with–but at what point do we sever that relationship and return to the integrity required to lead the world through respect and not by fear?

 

2004-2013

 

UPDATE: Since I published this post, a Wired article has revealed that Lavabit was in fact required to supply their private SSL keys, as suspected above.

 

Pafwert: Now Open Source

More than 15 years ago I started working on a unique password generator that eventually evolved into a small program I now call Pafwert.

Pafwert is a unique tool to help you select strong passwords that are easy to remember. Using strong entropy, tens of thousands of seed words, more than a hundred patterns with endless variations, and following password best practices, Pafwert can help you to select very strong passwords that are surprisingly easy to memorize. We have all seen random password generators, but Pafwert is very different.

Of course, while I still recommend using a password manager and generating completely random passwords, there are plenty of passwords we need to remember that we just aren’t able to save in a password manager. That is where Pafwert comes in.

Pafwert uses familiar patterns and a variety of memorization techniques to help you create strong passwords that are also easy to remember. Keep in mind that you don’t have to use the passwords exactly as it spits them out, you can use it simply as a tool to spark your own imagination when creating your passwords.

Pafwert is actually much more complex than it appears on the surface and generates passwords based on patterns and wordlists that you can customize. It then runs these passwords through a number of filters to obscure them just enough to make them unique. Yes, I probably wasted many thousands of hours overthinking this thing. Nevertheless, over the years it has gotten buried on my web site and largely forgotten (although I still use it myself every day).
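For readers who want a feel for the general idea of pattern-based generation, here is a minimal Python sketch. It is not Pafwert’s code and does not use Pafwert’s pattern syntax (see the Pattern Guide for that): the `{token}` placeholders, the tiny wordlists, and the single filter are all hypothetical stand-ins.

```python
import re
import secrets

# Tiny illustrative wordlists; a real tool would use far larger ones.
WORDLISTS = {
    "adjective": ["purple", "quiet", "rusty", "clever"],
    "noun": ["otter", "violin", "canyon", "lantern"],
    "digit": list("0123456789"),
    "symbol": list("!@#$%&*"),
}

# Hypothetical patterns: each {token} is replaced by a random wordlist entry.
PATTERNS = [
    "{adjective}-{noun}{digit}{digit}{symbol}",
    "{digit}{digit}{noun}.{adjective}{symbol}",
]


def fill(pattern):
    """Replace every {token} in the pattern with a random wordlist entry."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: secrets.choice(WORDLISTS[match.group(1)]),
        pattern,
    )


def capitalize_one_letter(password):
    """Example filter: uppercase one randomly chosen lowercase letter."""
    positions = [i for i, ch in enumerate(password) if ch.islower()]
    if not positions:
        return password
    i = secrets.choice(positions)
    return password[:i] + password[i].upper() + password[i + 1:]


def generate():
    """Pick a pattern, fill it in, then run the result through the filter."""
    return capitalize_one_letter(fill(secrets.choice(PATTERNS)))


print(generate())
```

With wordlists this small the output is far too predictable for real use; the strength of this approach depends on having many patterns and very large wordlists.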

I thought it was about time to update this tool and open source it (under the Apache license) to share it with the community. I would like to see it updated with new features and maybe even ported to PHP, but for now the code is there for anyone to play with. Note that I began work on this version of the code in 1999 so it is written in Visual Basic 6. That means that few of you will have the tools to do anything with the program itself (although I do have a complete dev environment in a VM if someone is serious enough about working on it).

If you would simply like to download the latest compiled version to install yourself, you can always grab it at http://xato.net/pafwert or you can check out the source code at GitHub.

If you want to get a taste for the complexity of this tool, you may want to spend a few minutes and read the Pattern Guide.

Hopefully someone can find this useful; if you do, let me know!


Pafwert – Smart Password Generator
https://github.com/m8urnett/pafwert


 

Email: The Security Industry’s Single Biggest Failure

I still remember so clearly the frustration I felt back in the 90’s when starting in the security industry and trying to sell my services. It was so difficult trying to emphasize just how much at risk potential clients were and then get them to pay me to fix their stuff. Too often I came off like the paranoid conspiracy theorist: their sky wasn’t falling and they saw no wolf.

I remember one particular conference call at the peak of my frustration where a network administrator confidently bragged to me and the managers on the call just how secure their network really was. What the managers didn’t know at the time was that as we were all talking, the network administrator was scrambling to lock things down as I was furiously trying to break in. Being that I was pretty good at that stuff at the time, I was able to quickly drop a little program called cdtray.exe onto a number of computers, including the admin’s own PC, and used the at command to schedule all of their CD trays to open in one minute. I started asking the admin some questions and could hardly contain my amusement sixty seconds later as he suddenly seemed distracted. Then I went in for the kill: “are you convinced now you need more security?” I asked.

I didn’t get that job.

Nor did I get any work from Bank of America when I notified them of a glaring security flaw that exposed their global.asa file, which contained their database username and password. That was over a decade ago but I still remember the password: superchicken.