Password Stock Photos

I’m getting a little tired of seeing the same stock photos over and over on password articles, so I thought I’d share a set I made for my blog that is at least a little different.

If you find these photos useful they are available for use royalty free with the following options:

Non-Commercial: You may use these photos for non-commercial uses with attribution to Mark Burnett (xato.net). You are also welcome to donate!

Commercial: You may use these photos for commercial purposes with a payment of $10 per photo per use. Your payment receipt is your license. Attribution to Mark Burnett (xato.net) is optional. For resale use please contact me.

The base word clouds in these photos are the top passwords from my 10 million passwords list (made SFW) and were made using Processing with the WordCram library (and Photoshop to fill in the gaps).

Click on the thumbnails below to view/download the full resolution photos.

 

[Thumbnails: Passwords stock photos]

A Glimpse Into the World of Internet Password Dumps

Reading the comments around the internet about my release of 10 million passwords, I realized that some people don’t quite grasp how bad the situation really is. It’s really bad. The target audience of my original article was IT security professionals and network administrators who see this stuff on a daily basis, but news of the release has reached far beyond that audience, and that has brought to my attention some misunderstanding of its context. I thought it would be helpful for people to get a glimpse into what I see as I collect passwords.

My main source for passwords in the last few years has been Pastebin and similar sites. Pastebin is a web site where you can paste text data to share with others. You can also do so anonymously. There are Twitter bots and web sites that monitor new pastes and look for hackers leaking (or dumping) sensitive data they have stolen. On a typical day it is common to see around a hundred of these leaks, and about half of those contain both usernames and unencrypted passwords (often referred to as combos), which I collect.

Dump Monitor Twitter Bot


Because the data may show up in various formats, I have to parse it, which I do with a tool I wrote named Hurl. This tool recognizes many different dump formats and parses out the usernames and passwords. Here is an example of the types of formats it recognizes.
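To give a rough idea of what parsing a dump involves, here is a minimal Python sketch. It is not Hurl, whose internals are not described here; the two line layouts it handles and the names parse_combo and parse_dump are assumptions made for this example. Real dumps come in many more formats.

```python
import re

# Hypothetical patterns for illustration only; not the formats Hurl supports.
_COMBO_PATTERNS = [
    # Key/value lines such as "username: bob password: secret"
    re.compile(
        r"^user(?:name)?\s*[:=]\s*(?P<user>\S+)\s+"
        r"pass(?:word)?\s*[:=]\s*(?P<password>\S+)$",
        re.IGNORECASE,
    ),
    # Delimited lines such as user:pass, user;pass, user|pass or user<TAB>pass
    re.compile(r"^(?P<user>[^\s:;|]+)\s*[:;|\t]\s*(?P<password>\S+)$"),
]


def parse_combo(line):
    """Return (username, password) if the line looks like a combo, else None."""
    line = line.strip()
    for pattern in _COMBO_PATTERNS:
        match = pattern.match(line)
        if match:
            return match.group("user"), match.group("password")
    return None


def parse_dump(text):
    """Collect every recognizable combo from a block of pasted text."""
    combos = []
    for line in text.splitlines():
        combo = parse_combo(line)
        if combo is not None:
            combos.append(combo)
    return combos
```

Lines that match neither pattern are skipped rather than guessed at, which keeps noise out at the cost of missing unusual formats.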

What’s interesting about Pastebin is the number of scrapers there really are out there. Within less than a minute of posting this paste there were 81 views. After a few more minutes there were 173 views as shown below. As you can see there are more than a few people monitoring this stuff. Want to set up your own scraper? The Dumpmon source code is available although I personally prefer Pystemon.

Pastebin Scraper Views

Pastebin scrapers only catch pastes when they include the actual data in the paste. Sometimes the paste is just a link to another file, so it is important to monitor links as well. Further, by taking those pastes and seeing who links to them you can find sources, such as Twitter accounts, that announce these things. I monitor those as well.

After Pastebin there are several sites I keep up with that post leaks and stolen databases. Below is a screenshot of one of these sites.

Database Dumps

I then take the names of those files and set up Google Alerts (and sometimes Pastebin alerts) for them. This often leads me to file collections such as the two below:

A collection of password files

Another collection of password files

I also alert on combos that certain hackers frequently use to create accounts, such as Cucum01:Ber02, zolushka:natasha, and many others. These combos are so common in password lists that they always lead to more passwords. Take a look at this Google search and you’ll see how prevalent these are; the alerts keep my inbox full.

Furthermore, there are hundreds of forums that share passwords. Perhaps a few screenshots are the best way to see just how many passwords people are sharing:

[Screenshots: password sharing forums]

Of course, this is only a small sampling of forums. There are more than any one person could ever monitor.

There are also hundreds of thousands of web sites that share hacked passwords for gaming, video, porn, and file sharing sites. These don’t always produce the best quality passwords, but I do have scripts to scrape a number of these sites. In a single day those scripts can produce well over a million passwords.

If you were shocked by my releasing password data, take an hour exploring the internet and you will see that 10 million passwords really is a drop in a bucket, even a drop in a thousand buckets. Keep in mind that a big part of the effort in producing my data was getting it all the way down to 10 million in a balanced manner (I couldn’t just remove millions from the end of the file). It took me about three weeks to whittle down and then sanitize the data.

What I have shown here is only a small number of sources available out there. Most of the forums listed above provide “VIP” access for a monthly payment. If you want to spend a little money you have access to tens of millions more passwords than the freebies shared publicly. There are also IRC channels, Usenet groups, torrents, file sharing sites, and of course a number of hidden sources on Tor.

Now, not all of these passwords are plaintext. Many dumps include passwords in a hashed format that requires you to crack them yourself. But that’s no problem: there are tools such as Hashcat and John the Ripper, as well as wordlists, that make this a trivial task.

If this isn’t already overwhelming, keep in mind that this is just the stuff that certain hackers have decided to make public. Surely the troves of accounts that have been hacked over the years completely dwarf what has been publicly shared. There could be billions, or tens of billions, more accounts that have been hacked. If you are worried that the data I released contains your password, you still aren’t worried enough. There is a very good chance your passwords have been hacked; go change them.

So who besides me collects these passwords? Use your imagination.

 

 

 

Ten Million Passwords FAQ

In response to my recent release of 10 million passwords, I thought I would address some of the questions I am getting.

Where are the passwords from?

These are old passwords that have already been released to the public; none of these passwords are new leaks. They all are or were at one time completely available to anyone in an uncracked format. I have not included passwords that required cracking, payment, exclusive forum access, or anything else not available to the general public. You should still be able to find a large number of these passwords via a Google search.

The passwords were compiled by taking samples from thousands of password dumps, mostly from the last five years, although the set also includes much older data. I wanted to mix data from multiple sources to normalize inconsistencies and skewed data due to the type of web site, its users, and its security policies. (See this article for problems with password data.)

The size of the sample from each site was determined by the data itself. Since the top 100 passwords have been very consistent over the last 20 years, I was able to use that to determine the quality of the source data. Some dumps contained so much bad data that I had to limit how much of it I included.

What is bad data?

Here are some examples of bad password data: http://pastebin.com/v6HCVDHN

How was the data collected?

I have been collecting passwords for about 15 years. In the past I have used a number of scripts to scrape the web, forums, IRC, Usenet, and P2P sources to get even 1,000 new passwords per day. In fact, it took me almost 10 years to collect just 6 million unique username/password combos (and at the time I thought that was huge).

However, in the last 5 years things have changed tremendously. I am now able to manually collect 10-20 million unique passwords per year simply from paste sites and forums. There have also been a number of very large password dumps with tens of millions of passwords in a single dump. Anyone could easily gather several hundreds of millions of passwords without much effort.

Why did you release this data?

The primary purpose is to get good, clean, and consistent data out in the world so others can find new ways to explore and gain knowledge from it. The data isn’t perfect and there are a few anomalies, but it should provide good insight into user password selection.

Really, why did you release this data?

I’m a bit obsessed with passwords.

Won’t this help hackers?

If a hacker needs this list to hack someone, they probably aren’t much of a threat.

What should I do if my password is on the list?

If your password is on this list, that means it has already been publicly available for some time. You should change your password and enable two-factor authentication if available. Several of my own passwords are on the list as well; I left them there because they are already in many places on the web.

What if my password is not on the list?

It doesn’t mean you are safe. This is a tiny sample of the hundreds of millions of accounts that have been publicly dumped and doesn’t even include the hundreds of millions more that have never been made public.

Is it unethical to release these passwords?

Although I have justified the release of these passwords, I have to admit it is at least close to the line. I have considered releasing this data for a number of years and have put much thought into the ethics involved; it is not something I take lightly. I could have replaced all the usernames with random numbers or hashes, but I felt like the usernames just had to be included. I did make sure to remove domain names from email addresses and other identifiers so that they couldn’t be directly linked to specific accounts. I also aggregated data from many sources so that this data could not be used to target any particular site. The thing to remember here, though, is that I am not releasing this data; I have just aggregated and cleaned up already public data.

How can I monitor my accounts to know if they have been leaked?

I would suggest the following:

  1. Create a Google alert for your email address, username, and domain if you have one.
  2. Create a Pastebin account and set alerts for your email address, username, and domain if you have one.
  3. Sign up for account monitoring at haveibeenpwned.com, pwnedlist.com, breachalarm.com, canary.pw, or a similar site (feel free to add similar sites in the comments if you know of others).

Can I have your raw data?

No. Actually, I have shared portions of this data with companies who notify users of account leaks. Over the years I have gotten pretty good at finding passwords that others miss. But if you just want the raw data for any purpose other than protecting users, the answer is no.

Today I Am Releasing Ten Million Passwords

Frequently I get requests from students and security researchers to get a copy of my password research data. I typically decline to share the passwords but for quite some time I have wanted to provide a clean set of data to share with the world. A carefully-selected set of data provides great insight into user behavior and is valuable for furthering password security. So I built a data set of ten million usernames and passwords that I am releasing to the public domain.
But recent events have made me question the prudence of releasing this information, even for research purposes. The arrest and aggressive prosecution of Barrett Brown had a marked chilling effect on both journalists and security researchers. Suddenly even linking to data was an excuse to get raided by the FBI and potentially face serious charges. Even more concerning is that Brown linked to data that was already public and others had already linked to.

“This is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution”

In 2011 and 2012 news stories about Anonymous, Wikileaks, LulzSec, and other groups were daily increasing and the FBI was looking more and more incompetent to the public. With these groups becoming more bold and boastful and pressure on the FBI building, it wasn’t too surprising to see Brown arrested. He was close to Anonymous and was in fact their spokesman. The FBI took advantage of him linking to a data dump to initiate charges of identity theft and trafficking of authentication features. Most of us expected that those charges would be dropped and some were, although they still influenced his sentence.

At Brown’s sentencing, Judge Lindsay was quoted as saying “What took place is not going to chill any 1st Amendment expression by Journalists.” But he was so wrong. Brown’s arrest and prosecution had a substantial chilling effect on journalism. Some journalists have simply stopped reporting on hacks from fear of retribution and others who still do are forced to employ extraordinary measures to protect themselves from prosecution.

Which brings me back to these ten million passwords.

Why the FBI Shouldn’t Arrest Me

Although researchers typically only release passwords, I am releasing usernames with the passwords. Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone. Most researchers are afraid to publish usernames and passwords together because combined they become an authentication feature. If simply linking to already released authentication features in a private IRC channel was considered trafficking, surely the FBI would consider releasing the actual data to the public a crime.

But is it against the law? There are several statutes that the government used against Brown, as summarized by the Digital Media Law Project:

Count One: Traffic in Stolen Authentication Features, 18 U.S.C. §§ 1028(a)(2), (b)(1)(B), and (c)(3)(A); Aid and Abet, 18 U.S.C. § 2: Transferring the hyperlink to stolen credit card account information from one IRC channel to his own (#ProjectPM), thereby making stolen information available to other persons without Stratfor or the card holders’ knowledge or consent; aiding and abetting in the trafficking of this stolen data.

Count Two: Access Device Fraud, 18 U.S.C. §§ 1029(a)(3) and (c)(1)(A)(i); Aid and Abet, 18 U.S.C. § 2: Aiding and abetting the possession of at least fifteen unauthorized access devices with intent to defraud by possessing card information without the card holders’ knowledge and authorization.

Counts Three Through Twelve: Aggravated Identity Theft, 18 U.S.C. § 1028A(a)(1); Aid and Abet, 18 U.S.C. § 2: Ten counts of aiding and abetting identity theft, for knowingly and without authorization transferring identification documents by transferring and possessing means of identifying ten individuals in Texas, Florida, and Arizona, in the form of their credit card numbers and the corresponding CVVs for authentication as well as personal addresses and other contact information.

While these particular indictments refer to credit card data, the laws do also reference authentication features. Two of the key points here are knowingly and with intent to defraud.

In the case of me releasing usernames and passwords, the intent here is certainly not to defraud, facilitate unauthorized access to a computer system, steal the identity of others, to aid any crime or to harm any individual or entity. The sole intent is to further research with the goal of making authentication more secure and therefore protect from fraud and unauthorized access.

To ensure that these logins cannot be used for illegal purposes, I have:

  1. Limited identifying information by removing the domain portion from email addresses.
  2. Combined data samples from thousands of global incidents from the last five years with other data mixed in going back an additional ten years so the accounts cannot be tied to any one company.
  3. Removed any keywords, such as company names, that might indicate the source of the login information.
  4. Manually reviewed much of the data to remove information that might be particularly linked to an individual.
  5. Removed information that appeared to be a credit card or financial account number.
  6. Where possible, removed accounts belonging to employees of any government or military sources [Note: although I can identify government or military logins when they include full email addresses, sometimes these logins get posted without the domains, without mentioning the source, or aggregated on other lists and therefore it is impossible to know if I have removed all references.]
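Two of the steps above lend themselves to a short illustration. The Python sketch below is not the actual cleanup script: it strips the domain from email-style usernames and drops combos that look like payment card numbers. Using the Luhn checksum to spot card numbers is an assumption made for this example, as are the function names.

```python
import re


def strip_email_domain(username):
    """Keep only the local part of an email-style username."""
    local, sep, _domain = username.partition("@")
    return local if sep else username


def luhn_valid(digits):
    """Standard Luhn checksum over a string of ASCII digits."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def looks_like_card_number(value):
    """Assumed heuristic: 13 to 19 digits (spaces/dashes allowed) passing Luhn."""
    digits = re.sub(r"[ -]", "", value)
    return bool(re.fullmatch(r"[0-9]{13,19}", digits)) and luhn_valid(digits)


def sanitize(combos):
    """Drop card-like entries and strip email domains from the rest."""
    cleaned = []
    for user, password in combos:
        if looks_like_card_number(user) or looks_like_card_number(password):
            continue
        cleaned.append((strip_email_domain(user), password))
    return cleaned
```

The length check matters: without it, short all-digit passwords such as 123456 would be candidates for removal even though they are exactly the user-chosen data worth keeping.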

Furthermore, I believe these are primarily dead passwords, which cannot be defined as authentication features because dead passwords will not allow you to authenticate. The likelihood of any authentication information included still being valid is low and therefore this data is largely useless for illegal purposes. To my knowledge, these passwords are dead because:

  1. All data currently is or was at one time generally available to anyone and discoverable via search engines in a plaintext (unhashed and unencrypted) format and therefore already widely available to those with an intent to defraud or to gain unauthorized access to computer systems.
  2. The data has been publicly available long enough (up to ten years) for companies to reset passwords and notify users. In fact, I would consider any organization grossly negligent if it were unaware of these leaks and still had not changed user passwords after they had been publicly visible for such a long period of time.
  3. The data is collected by numerous web sites such as haveibeenpwned or pwnedlist and others where users can check and be notified if their own accounts have been compromised.
  4. Many companies, such as Facebook, also monitor public data dumps to identify user accounts in their user base that may have been compromised and proactively notify users.
  5. A portion of users, either on their own or required by policy, change their passwords on a regular basis regardless of being aware of compromised login information.
  6. Many organizations, particularly in some industries, actively identify unusual login patterns and automatically disable accounts or notify account owners.

Ultimately, to the best of my knowledge these passwords are no longer valid and I have taken extraordinary measures to make this data ineffective in targeting particular users or organizations. This data is extremely valuable for academic and research purposes and for furthering authentication security, and this is why I have released it to the public domain.

Having said all that, I think this is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution or legal harassment. I had wanted to write an article about the data itself but I will have to do that later because I had to write this lame thing trying to convince the FBI not to raid me.

I could have released this data anonymously like everyone else does, but why should I have to? I clearly have no criminal intent here. It is beyond all reason that any researcher, student, or journalist should have to be afraid of law enforcement agencies that are supposed to be protecting us instead of trying to find ways to use the laws against us.

Slippery Slopes

For now the laws are on my side because there has to be intent to commit or facilitate a crime. However, the White House has proposed some disturbing changes to the Computer Fraud and Abuse Act that will make things much worse. Of particular note is 18 U.S.C. § 1030(a)(6):

(6) knowingly and with intent to defraud willfully traffics (as defined in section 1029) in any password or similar information, or any other means of access, knowing or having reason to know that a protected computer would be accessed or damaged without authorization in a manner prohibited by this section as the result of such trafficking;

The key change here is the removal of an intent to defraud and replacing it with willfully; it will be illegal to share this information as long as you have any reason to know someone else might use it for unauthorized computer access.

It is troublesome to consider the unintended consequences resulting from this small change. I wrote about something back in 2007 that I’d like to say again:

…it reminds me of IT security best practices. Based on experience and the lessons we have learned in the history of IT security, we have come up with some basic rules that, when followed, go a long way to preventing serious problems later.

So many of us security professionals have made recommendations to software companies about potential security threats and often the response is that they don’t see why that particular threat is a big deal. For example, a bug might reveal the physical path to a web content directory. The software company might just say “so what?” because they cannot see how that would result in a compromise. Unfortunately, many companies have learned “so what” the hard way.

The fact is that it doesn’t matter if you can see the threat or not, and it doesn’t matter if the flaw ever leads to a vulnerability. You just always follow the core rules and everything else seems to fall into place.

This principle equally applies to the laws of our country; we should never violate basic rights even if the consequences aren’t immediately evident. As serious leaks become more common, surely we can expect tougher laws. But these laws are also making it difficult for those of us who wish to improve security by studying actual data. For years we have fought increasingly restrictive laws but the government’s argument has always been that it would only affect criminals.

The problem is that the laws themselves change the very definition of a criminal and put many innocent professionals at risk.

The Download Link

Again, this is stupid that I have to do this, but:

BY DOWNLOADING THIS AUTHENTICATION DATA YOU AGREE NOT TO USE IT IN ANY MANNER WHICH IS UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL, OR IN CONNECTION WITH ANY UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL PURPOSE OR ACTIVITY INCLUDING BUT NOT LIMITED TO FRAUD, IDENTITY THEFT, OR UNAUTHORIZED COMPUTER SYSTEM ACCESS. THIS DATA IS ONLY MADE AVAILABLE FOR ACADEMIC AND RESEARCH PURPOSES.

Torrent (84.7 MB): Magnet link

For more information on this data, please see this FAQ.

As a final note, be aware that if your password is not on this list, that means nothing. This is a random sampling of thousands of dumps consisting of upwards of a billion passwords. Please see the links in the article for a more thorough check to see if your password has been leaked. Or you could just Google it.

If you wish to discuss analysis of this data, you may do so at http://reddit.com/r/passwords

 

 

Is 123456 Really The Most Common Password?

2014 Top 10 Passwords

I recently worked with SplashData to compile their 2014 Worst Passwords List and yes, 123456 tops the list. In the data set of 3.3 million passwords I used for SplashData, almost 20,000 of those were in fact 123456. But how often do you really see people using that, or the second most common password, “password,” in real life? Are people still really that careless with their passwords?

While 123456 is indeed the most common password, that statistic is a bit misleading. Although 0.6% of all users on my list used that password, it’s important to remember that 99.4% of the users on my list didn’t use that password. What is noteworthy here is that while the top passwords are still the top passwords, the number of people using those passwords has dramatically decreased.

The fact is that the top passwords are always going to be the top passwords; it’s just that the percentage of users actually using them will, at least we hope, continually get smaller. This year, for example, a hacker using the top 10 password list would statistically be able to guess 16 out of 1000 passwords.
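The figures quoted here are simple shares of the data set: almost 20,000 out of 3.3 million is about 0.6%, and 16 guesses per 1000 is a coverage of 1.6% for the top 10. A minimal Python sketch of that calculation follows; the function name and the toy data are assumptions made for the example, not the script used for the analysis.

```python
from collections import Counter


def top_n_coverage(passwords, n=10):
    """Fraction of all accounts whose password is among the n most common."""
    counts = Counter(passwords)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _password, count in counts.most_common(n))
    return covered / total


# Toy data: 5 of these 10 accounts use one of the two most common passwords.
sample = ["123456"] * 3 + ["password"] * 2 + ["a", "b", "c", "d", "e"]
print(top_n_coverage(sample, n=2))  # 0.5
```

A coverage of 0.016 on the real data would correspond to the 16 in 1000 figure above.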

Getting a true picture of user passwords is surprisingly difficult. Even though password is #2 on the list, I don’t know if I have seen someone actually use that password for years. Part of the problem is how we collect and analyze password data. Because we typically can’t just go to some company and ask for all their user passwords, we have to go with the data that is available to us. And that data does have problems.

Anomalies are More Prominent

As we saw, user passwords are improving but as percentages of common passwords decrease, anomalies begin to float to the top. There was a time that I didn’t worry too much about minor flaws in the data because as my data set grew those tended to fall to the bottom of the list. Now, however, those anomalies are becoming a problem.

For example, when I first ran my stats for 2014, the password lonen0 ranked as #7 in the list. Looking through the data I saw that all of these passwords came from a single source, the Belgian company EASYPAY GROUP, which had their data leaked in November of 2014. Looking through the raw data, it appears that lonen0 was a default password that 10% of their users failed to set to something stronger. It’s just 10% of users from one company, but that was enough to push it to the #7 most common password in my data set.
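An anomaly like lonen0 can be caught automatically if each password is stored with its source. The Python sketch below flags passwords that are frequent overall but come almost entirely from one dump. The record layout, the function name, and both thresholds are hypothetical choices for illustration, not the checks actually used here.

```python
from collections import Counter, defaultdict


def single_source_outliers(records, min_count=100, max_share=0.9):
    """Map each suspicious password to the source that dominates it.

    records: iterable of (source, password) pairs.
    A password is flagged when it occurs at least min_count times and
    at least max_share of those occurrences come from a single source.
    """
    per_password = defaultdict(Counter)
    for source, password in records:
        per_password[password][source] += 1

    flagged = {}
    for password, sources in per_password.items():
        total = sum(sources.values())
        top_source, top_count = sources.most_common(1)[0]
        if total >= min_count and top_count / total >= max_share:
            flagged[password] = top_source
    return flagged
```

A truly common password such as 123456 shows up across many sources, so no single source dominates it and it passes this check.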

In 2014, all it takes for a password to get on the top 1000 list is to be used by just 0.0044% of all users.
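To put that threshold in absolute terms, here is the arithmetic as a short Python sketch. It assumes the 0.0044% is measured against the 3.3 million passwords in the 2014 analysis, which the text does not state explicitly.

```python
def users_for_share(total_users, percent):
    """Number of users that a given percentage of the data set represents."""
    return total_users * percent / 100


# Assumed total: the 3.3 million passwords in the 2014 analysis.
print(round(users_for_share(3_300_000, 0.0044)))  # about 145 users
```

Under that assumption, roughly 145 accounts sharing a password is enough to reach the top 1000, which is why a single company’s default password can rank so high.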

Single Sets of Data vs Aggregated Data

There are numerous variables that affect which passwords users choose, and therefore many people like to analyze sets of passwords dumped from a single source. There are two problems with this: first, we don’t really know all the variables that determine how users choose passwords. Second, data is always skewed when you analyze a single company, as we saw with EASYPAY GROUP. Another example: if you look at the password dump from Adobe, you will see that the word adobe appears in many of the passwords.

On the other hand, if we aggregate all the data from multiple dumps and analyze it together, we may get the wrong picture. Doing this gives us no control variables and we end up with passwords like 123456 on the top of the list. If we had enough aggregated data that wouldn’t be an issue, but what exactly is enough data?

Cracked Passwords are Crackable; Hacked Companies are Hackable

Since most of the data we are looking at comes from password leaks, it is possible that 123456 tops the list simply because it is the easiest password to crack. Perhaps some hacker checked tens of thousands of email accounts to see if the password was 123456 and dumped all positive matches on the internet. In fact, part of the reason I only analyzed 3.3 million passwords this year is due to a large number of mail.ru, yandex.com and other Russian accounts that had unusual passwords such as qwerty and other keyboard patterns. Here is the top 10 list including all the Russian email accounts:

1. qwerty
2. 123456
3. qwertyuiop
4. 123456789
5. password
6. 12345678
7. 12345
8. 111111
9. 1qaz2wsx
10. qwe123

While these are common passwords, the Russian data was highly skewed which made me suspect that these were either fake accounts or hacked by checking only certain passwords. So while 3.3 million passwords isn’t a huge dataset to analyze, it is a clean set of data that seems to accurately reflect results I have seen in the past.

The other problem is that when a company gets hacked, often it is because they have not properly secured their data. If they have poor security practices, this could affect password policies and user training which might result in poor quality passwords.

Unfortunately, we do not know to what extent crackable passwords and hackable companies affect the quality of the password data we have to analyze.

No Indication of Source

When we work with publicly leaked passwords, we often don’t know the source of the data. We don’t know if the passwords are from some corporation with strict password policies, or if they come from hacked adult sites where many users are choosing passwords such as boobies, or if they are hacked Minecraft accounts where a large chunk of the users are kids or teenagers. We don’t know if the data came from keyloggers or phishing or password hashes.

We also don’t know when users set these passwords. When Adobe had 150 million user accounts leaked, clearly those passwords were from accounts and passwords created years ago. We do know that users are slowly getting better with their passwords, but if we don’t know when they set these passwords it is impossible for us to gauge that progress.

These are all significant variables that make it impossible for us to get an accurate picture of which passwords people truly are using, where, and when.

No Indication of User Attitude

The source of the data also strongly affects user attitude towards security. Many users have several common passwords they use which typically includes a strong one for bank and other sensitive accounts and another one for casual or one-time-use accounts such as a flower company shopping cart. The data we have gives no indication of the users’ attitude when they selected their password.

In the fifteen years I have been collecting passwords I have seen just one of my passwords publicly leaked. It was in the Yahoo! Voices password dump. I actually remember setting this password. I was on my phone at the time, researching possible ways to syndicate my writing. I set up accounts on the different sites I was checking out, all with the same password, because I was on my phone and security for these sites wasn’t a particular concern; this was just casual research.

My password was October38, a throwaway password I occasionally used around that time. Although it is a decent password, it doesn’t represent the type of password I normally use. None of my other passwords show up in public dumps, and the data gives no indication of how often I used this particular password or how it compares to the rest of my passwords.

So where are the passwords coming from and how does this affect user attitude? Are they PayPal accounts or a quick login someone created to comment on a small web site? Are they computer accounts that you can’t manage with a password manager and therefore users must memorize? We just do not know for much of the data.

Bad Data

Finally, the biggest problem when dealing with public password dumps is that sometimes you just get bad data and sometimes good data is ruined through poor parsing or conversion. When dealing with tens of millions of passwords and hundreds of gigabytes of files, bad data will make its way in there, and it is usually hard to spot this data without a manual review. While I do manually glance over most data I include, it is impossible to catch everything.

Here are some examples of bad data that I sometimes catch manually but that are extremely difficult to identify with my automated parsing scripts.

Yet another problem is that since my goal is to identify user-selected passwords, I need to be able to spot data that isn’t real user data. Here is an example of a dump where both the usernames and passwords follow an obvious pattern and are clearly machine-generated. I don’t want that type of data.
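One simple way to spot such a dump is sketched below in Python. It assumes the pattern is a fixed stem followed by a changing number (user0001, user0002, and so on), which is only a guess since the linked example is not reproduced here. The function names and thresholds are hypothetical, and this is not one of the actual checks in my parsing script.

```python
import re
from collections import Counter


def stem(value):
    """Strip a trailing run of digits, e.g. 'user0042' -> 'user'."""
    return re.sub(r"\d+$", "", value)


def looks_machine_generated(values, threshold=0.8, min_size=10):
    """Flag a column when one stem accounts for most of its values.

    Note: a column of purely numeric values shares the empty stem and is
    therefore flagged as well.
    """
    values = list(values)
    if len(values) < min_size:
        return False  # too little data to judge
    stems = Counter(stem(value) for value in values)
    _common_stem, count = stems.most_common(1)[0]
    return count / len(values) >= threshold
```

Running the check on usernames and passwords separately, and flagging a dump only when both columns trip it, would lower the chance of discarding real user data.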

My parsing script performs dozens of checks on each username and password, but ultimately I still have to manually review the data and still don’t catch everything.
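
One such check could be sketched as follows. This is only an illustration, not the actual parsing script: the function names `shape` and `looks_machine_generated`, the idea of collapsing each character to its class, and the 80% threshold are all assumptions chosen for the example.

```python
from collections import Counter


def shape(value: str) -> str:
    """Collapse a string to its character-class shape, e.g. 'user0042' -> 'aaaa0000'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("0")
        elif ch.isalpha():
            out.append("a")
        else:
            out.append("#")
    return "".join(out)


def looks_machine_generated(passwords, threshold=0.8):
    """Return True when one shape accounts for at least `threshold` of the passwords.

    Hypothetical heuristic: user-chosen passwords tend to vary in length and
    layout, while generated ones often share a single template.
    """
    if not passwords:
        return False
    counts = Counter(shape(p) for p in passwords)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(passwords) >= threshold
```

A heuristic like this would still flag some legitimate dumps (for example, a site that forces a fixed password format), which is one reason a manual review remains necessary.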

As you can see there are many variables that can affect the data and therefore we can’t truly say which passwords users have set for stuff that really matters. Nevertheless, the statistics are consistent and the same passwords show up at the top year after year. What it comes down to is if you are one of those people who are using 123456 or password, please stop.

Use a Password Manager

So what do we do about bad passwords? As I have said before, you need to use a password manager. When I analyzed passwords in my book Perfect Passwords in 2005 I published a list of the Top 500 Passwords. Later in 2011 I published a list of the top 10,000 passwords. Now with the latest analysis in 2014 we can clearly see that list really doesn’t change much over the years. But the threats are increasing much faster than we can keep up.

The only solution is to stop trying to create and remember your own passwords. You just can’t create strong, unique passwords for each account you have and keep it all in your head. You cannot consider yourself secure on the internet unless you are using some tool to manage your passwords. Password managers let you generate strong passwords and manage them in a central location, protected by a single strong password.
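
To give a sense of what the generator in a password manager does, here is a minimal sketch using Python's standard `secrets` module. The function name `generate_password`, the default length of 20, and the choice of alphabet are assumptions for illustration, not the behavior of any particular product.

```python
import secrets
import string

# Assumed alphabet: letters, digits, and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn uniformly from ALPHABET using the OS CSPRNG."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Nobody is expected to memorize output like this, which is the point: the manager remembers it, and you remember only the one strong password that protects the manager.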

SplashData, who worked with me on this analysis, is the developer of the password manager SplashID Safe. Other password managers I use or have tested include LastPass, KeePass, 1Password, and Dashlane. I would recommend any of these products.

And because I know I will be asked, the following articles will be coming soon:

The New Top 10,000 Password List

How I Collect and Process Passwords

 

 

 

Today is Password Day, Go Change Five Passwords Now

I have always been a big fan of password days. While it is always important to regularly change your passwords, there is a specific benefit to changing a large number of passwords all at once. To understand why this is so effective, it is important to understand how hackers work.

Security intrusions are typically the result of a chain of failures. It’s usually not one big mistake that lets the hackers in, it is a series of smaller mistakes that eventually lead to compromise. Furthermore, the intrusion is rarely full admin access with the first exploit, it is an incremental process where each step leads to getting deeper and deeper into the target. The process involves compromising passwords for numerous accounts along the way.

Here’s the problem: changing one password or patching one hole rarely locks out the hacker because at that point they probably have collected a good list of passwords and have many ways to get back in if needed. In fact, if someone is at the point where they have access to several key accounts on several key servers, the chances of locking them out completely are pretty slim. Often the only effective way to lock them out is through a massive undertaking involving patching all holes and changing all account passwords in as short a time as possible.

91% of all user passwords sampled appear on the list of just the top 1,000 passwords.

While organizations with many servers or a large number of employees have a huge attack surface that will require a massive effort in the event of an intrusion, individual users have a little more of an advantage in that we only need to change a dozen or so passwords.

At least that used to be the case. Nowadays, it is not uncommon for heavy internet users to have hundreds or even over a thousand accounts. Clearly, changing all your accounts in a single day is nearly impossible for many people.

My strategy now is to take one day each month (I do the last Saturday of the month) and go through and change 5-10 passwords. While I’m at it, I go through and check the privacy and security settings for the account and add two-factor authentication if it is available.
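
A monthly routine like this is easy to script. The sketch below is an assumption-laden illustration: the function names `last_saturday` and `pick_oldest` are hypothetical, and it presumes you can export a mapping of account names to the date each password was last changed (many password managers show this, but the export format here is made up).

```python
import calendar
from datetime import date


def last_saturday(year: int, month: int) -> date:
    """Return the date of the last Saturday in the given month."""
    weeks = calendar.monthcalendar(year, month)
    # Each week is a list of 7 day numbers, Monday first; 0 marks days outside the month.
    day = max(week[calendar.SATURDAY] for week in weeks)
    return date(year, month, day)


def pick_oldest(last_changed: dict, count: int = 5) -> list:
    """Return the `count` account names whose passwords were changed longest ago."""
    return sorted(last_changed, key=last_changed.get)[:count]
```

Working from the oldest passwords first means that, over time, every account gets rotated without ever needing a single marathon session.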

So today, being World Password Day, now would be a good time to go and change some of your passwords. Here are a few links for some common web sites to get you started:

eBay
Change Password | Account Settings | Review App Authorizations | Activate Hardware Token

LinkedIn
Change Password | Privacy Settings | Enable Two Factor Auth via SMS

GitHub
Change Password | Enable Two Factor Auth | Review App Authorizations

WordPress
Change Password | Enable Two Factor Auth | Review App Authorizations

DropBox
Change Password & Enable Two Factor Auth 

The single most effective tool users have is a password manager such as LastPass or KeePass. A password manager gives you a list of all your accounts and usually shows the age of each password. Password managers also include password generators to create strong, unique passwords for each site.

Should You Ditch LastPass?

Steve Thomas, aka Sc00bz, has brought up some very interesting issues about the LastPass password manager that are causing some confusion, so I thought I’d give another perspective on the issue.

Summary of Steve’s points:

  1. When you use the LastPass web site to log in to your account, your web browser will first send a hash with a single iteration, no matter how many iterations you have set for your account. It isn’t until this hash fails that the server tells the browser the correct number of iterations to use.
  2. LastPass has a default setting of 500 iterations (at least at that time, now it recommends 5000 iterations).
  3. The extension should warn you if it is going to send a hash with fewer iterations than what you have set.
  4. LastPass does not encrypt the URLs of sites stored in your password database

LastPass hashes your password rather than sending the plain text to the server when you log in. The algorithm it uses is sha256(sha256(email + password) + password). This hash, while not necessarily insecure, can be cracked in a reasonable amount of time with ordinary hardware, unless the user has a relatively strong password. It isn’t until after this single iteration hash is sent that the LastPass server responds and tells the browser exactly how many iterations it should use; the hash is then sent again using the correct number of iterations. More iterations means it will take much more time to crack your password. A good minimum number of iterations is 5,000. If you go too high with the number of iterations, some clients such as mobile phones may be very slow logging in.
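
The difference between the two hashes can be sketched in Python. The first function follows the formula quoted above; the UTF-8 encoding and the hex-string concatenation are assumptions, since the text does not spell them out. The second function is only a stand-in: the text does not specify the exact iterated construction, so this sketch uses PBKDF2-HMAC-SHA256 from the standard library, salted with the email, purely to show how the iteration count enters the picture.

```python
import hashlib


def single_iteration_hash(email: str, password: str) -> str:
    """sha256(sha256(email + password) + password), as described in the text.

    Assumption: strings are UTF-8 encoded and the inner digest is
    concatenated as lowercase hex.
    """
    inner = hashlib.sha256((email + password).encode("utf-8")).hexdigest()
    return hashlib.sha256((inner + password).encode("utf-8")).hexdigest()


def iterated_hash(email: str, password: str, iterations: int = 5000) -> str:
    """Stand-in for the iterated hash: PBKDF2-HMAC-SHA256 salted with the email."""
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), email.encode("utf-8"), iterations
    )
    return key.hex()
```

An attacker testing guesses against the first hash pays for two SHA-256 calls per guess; against the second, the cost per guess grows roughly in proportion to the iteration count. That is why leaking the single iteration hash matters most for weak passwords.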

The mitigating factors here are:

  1. You are logging in via SSL so the primary threats here are a MitM attack with spoofed SSL certificates, a government warrant, or a government spy agency.
  2. They still need to crack your hash so if you have a very strong password, even a single iteration hash could provide a reasonable amount of protection.
  3. A second factor of authentication, country restrictions, blocking Tor logins, restricting mobile access, and other settings still protect your account from unauthorized logins, unless the attacker is able to obtain your stored hashes through hacking, warrant, or spying.

One thing I might also add is that the server is telling the client how many iterations it expects, so this does make an attack much easier if someone acquires your hash.

My opinion is that this is an issue that certainly should be addressed, but it is not serious enough to warrant abandoning LastPass altogether unless your individual threat model includes the NSA or other government agencies. The LastPass plugin should identify when someone is logging in to LastPass via the web login and provide the client-side script with the correct number of iterations. The server should never respond with this at all. However, if someone is logging in through a web browser that doesn’t have LastPass with their account data installed, the server sending the number of iterations is the only option.

The only proper solution here is to have your primary login different than the decryption login, at least for accessing the web interface if not everywhere. That way, the number of iterations is never publicly revealed and sending a single iteration hash would be unnecessary. Other companies such as RoboForm use this method. I have always wanted this feature and I would highly recommend LastPass implement this if it is feasible.

As for the other points, the default iteration count mentioned in number 2 has been addressed, and the warning mentioned in number 3 would be a good thing to add if it hasn’t been already, but this would not be possible when using a web browser without your LastPass account installation.

As for the unencrypted URLs mentioned in number 4, LastPass’s response was that leaving them unencrypted is necessary to grab favicons. Although unencrypted URLs may not be an issue, there certainly are scenarios where you would want these encrypted. LastPass should make this an option for the user.

LastPass does provide strong security controls, although there clearly is room for improvement. If you do not find LastPass to be secure enough, the only reasonable alternative I would recommend is KeePass, which puts you in complete control over your data while still being quite usable. I would not recommend ditching LastPass, but I would recommend that LastPass address these issues. I would also recommend that Steve Thomas keep up the great research he provides to the community.

I have not heard a recent response from LastPass on these issues but would love to hear from them. I will update this post if and when I do.


Disclosure: I am a LastPass user and I get a free month of premium service whenever someone clicks on banners located on this site. I have no affiliation with LastPass and receive no other compensation from the company. 

Pafwert: Now Open Source

More than 15 years ago I started working on a unique password generator that eventually evolved into a small program I now call Pafwert.

Pafwert is a unique tool to help you select strong passwords that are easy to remember. Using strong entropy, tens of thousands of seed words, more than a hundred patterns with endless variations, and following password best practices, Pafwert can help you select very strong passwords that are surprisingly easy to memorize. We have all seen random password generators, but Pafwert is very different.

Of course, while I still recommend using a password manager and generating completely random passwords, there are plenty of passwords we need to remember that we just aren’t able to save in a password manager. That is where Pafwert comes in.

Pafwert uses familiar patterns and a variety of memorization techniques to help you create strong passwords that are also easy to remember. Keep in mind that you don’t have to use the passwords exactly as it spits them out, you can use it simply as a tool to spark your own imagination when creating your passwords.

Pafwert is actually much more complex than it appears on the surface and generates passwords based on patterns and wordlists that you can customize. It then runs these passwords through a number of filters to obscure them just enough to make them unique. Yes, I probably wasted many thousands of hours overthinking this thing. Nevertheless, over the years it has gotten buried on my web site and largely forgotten (although I still use it myself every day).
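
The general idea of pattern-based generation can be sketched in a few lines of Python. To be clear, this is not Pafwert's code or its pattern syntax (the Pattern Guide describes that); the token names, the tiny word lists, and the `generate` function are all hypothetical and exist only to illustrate the approach.

```python
import secrets

# Tiny illustrative word lists; a real tool would use far larger ones.
WORDS = {
    "adjective": ["purple", "sleepy", "rusty", "brave"],
    "noun": ["otter", "piano", "cactus", "rocket"],
}
SYMBOLS = "!@#$%&*?"


def fill(token: str) -> str:
    """Expand one pattern token into a randomly chosen piece of text."""
    if token in WORDS:
        # Capitalize words so the boundaries stay readable.
        return secrets.choice(WORDS[token]).capitalize()
    if token == "digit":
        return str(secrets.randbelow(10))
    if token == "symbol":
        return secrets.choice(SYMBOLS)
    return token  # anything else is kept as literal text


def generate(pattern: str) -> str:
    """Build a password from a space-separated pattern of tokens."""
    return "".join(fill(token) for token in pattern.split())
```

Calling `generate("adjective noun digit digit symbol")` yields something shaped like an adjective, a noun, two digits, and a symbol. With word lists this small the result is far too predictable for real use; the strength of this approach depends on large word lists and many patterns.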

I thought it was about time to update this tool and open source it (under the Apache license) to share it with the community. I would like to see it updated with new features and maybe even ported to PHP, but for now the code is there for anyone to play with. Note that I began work on this version of the code in 1999 so it is written in Visual Basic 6. That means that few of you will have the tools to do anything with the program itself (although I do have a complete dev environment in a VM if someone is serious enough about working on it).

If you would simply like to download the latest compiled version to install yourself, you can always grab it at http://xato.net/pafwert or you can check out the source code at GitHub.

If you want to get a taste for the complexity of this tool, you may want to spend a few minutes and read the Pattern Guide.

Hopefully someone can find this useful; if you do, let me know!


Pafwert – Smart Password Generator
https://github.com/m8urnett/pafwert


 

Now eBay Wants in on Password Patents

I wrote a couple months ago about the many attempts to patent various methods of checking passwords. Now eBay wants in on the game with United States Patent Application 20120284783. Here’s their summary:

A proposed password is decomposed into basic components to determine and score transitions between the basic components and create a password score that measures the strength of the proposed password based on rules, such as concatenation, insertion, and replacement. The proposed password is scored against all known words, such as when a user is first asked to create a password for an account or access. The proposed password can also be scored against one or more previous passwords for the user, such as when the user is asked to change the user’s previous password, to determine similarity between the two passwords.

Reading through the claims, this is by no means novel or innovative and there certainly is plenty of prior art for this. Want to help prevent yet another abuse of the patent system? You can post any evidence of prior art on this Ask Patents post.

 

RSA’s Distributed Credential Protection: Yeah They Are Overselling it a Bit.

RSA recently announced their new Distributed Credential Protection (DCP) product which they proudly tout as a “revolutionary” way to secure user credentials. But looking closer (especially at that $160,000 per license price tag), I’m not so sure this product will do much to protect anyone’s credentials.

But let me say this first, the technology itself is absolutely brilliant. Without getting into the details of threshold cryptography (there’s an excellent article by Peter S. Gemmell on page 7 of this PDF), what it does is allow you to split up a secret into any number of parts but you only need a specified number of parts to reproduce the data.

It’s kind of like how you see nuclear missile launches in movies: two people have to insert and turn their keys at the same time to initiate the launch. But threshold cryptography is even more advanced: it would be like handing out 5 keys where you only need any 2 of them to fire the missile. What makes the technology so cool is that it gives you redundancy, integrity, and secrecy but no single piece is useful for obtaining the secret. This technology has many uses in cryptography (it would be perfect for Bitcoin) but I think that RSA’s claim that it will revolutionize password protection is greatly overstated.
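
The "any 2 of 5 keys" idea can be made concrete with Shamir's secret sharing, a well-known threshold scheme. The sketch below is a textbook illustration in Python, not RSA's DCP implementation (whose internals are not described here); the function names, the choice of prime, and the integer encoding of the secret are assumptions for the example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this


def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(s, 2, 5)` the secret is the intercept of a random line and each share is one point on it. Any two points determine the line, while a single point is consistent with every possible intercept, so one share alone reveals nothing about the secret.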

The problem is that yes, you are splitting up credentials into multiple parts, but all of those parts are components of the same system. It would be like handing both missile launch keys to the same person. Yes, someone would have to steal both keys, but if they can steal one from you, couldn’t they just steal the other?

Now one of the claims RSA makes is that if you suspect that an attacker has compromised one of the databases, you can immediately randomize and rescramble the pieces so when they grab the second database the data is useless. So yeah if you happen to catch an attack right after an attacker grabs the first bundle of data but before they grab the second bundle, and you are able to immediately identify all points of intrusion and lock out the attacker so they can’t go back in and re-grab the first bundle, then yes this will work. What are the chances of that happening? Slim to none.

Splitting the databases into two locations is not particularly helpful because both must be accessible to the web server, which is usually the point of entry in these types of attacks, and therefore if an attacker can access one database they can likely access them both. Again, it’s like handing both keys to the same person.

The thing is that RSA’s DCP product is addressing the wrong problem with the wrong solution. The reason most companies get their data leaked is that they have poorly secured their public-facing servers and applications and that they don’t follow best practices for storing user credentials. Both of these problems already have solutions and any organization would be better off spending their money on some code audits and pen-testing.

The fact is that if you have problems with hackers getting into your databases, I think you will still have problems even after shelling out $160,000 for DCP. If you don’t have that problem because you have proper security controls and practices already in place, chances are you don’t even need DCP.

To be fair I have to mention that I have not seen or reviewed this implementation in depth so I could in fact be completely wrong with my criticisms. Perhaps this system could be deployed in such a way that it is much more resilient than I am supposing. And certainly RSA acknowledges that this product is just one layer in a multi-layered defense-in-depth strategy. But I still come back to the fact that you are giving both keys to the same person.

What I would like to see is this technology implemented in a much smarter manner, for example by distributing credentials across multiple distinct trust authorities. It would also be a great way to overcome many of the weaknesses and distribution issues we see with SSL certificates. Having multiple holders of a secret not only better protects the secrets but upholds integrity in the case a small number of authorities are compromised. This technology could be helpful for preventing insider attacks and would be useful if you have your servers at third-party data centers that you may not completely trust. There are also some legal advantages with having databases distributed across multiple jurisdictions. And hey, if this technology prevented just one attack, in the absence of other attacks it would probably be worth the expense.

There are many other areas that could greatly benefit from threshold cryptography, but splitting credential storage within an organization is probably not one of them. The concept of a black box authentication appliance (although this one is VM-based) is a great direction to be going, considering how many organizations simply don’t implement credential storage correctly, but RSA seems to be overselling (and overpricing) what this product really can accomplish.