2014 Top 10 Passwords
I recently worked with SplashData to compile their 2014 Worst Passwords List and yes, 123456 tops the list. In the data set of 3.3 million passwords I used for SplashData, almost 20,000 of those were in fact 123456. But how often do you really see people using that, or the second most common password, password, in real life? Are people still really that careless with their passwords?
While 123456 is indeed the most common password, that statistic is a bit misleading. Although 0.6% of all users on my list used that password, it’s important to remember that 99.4% of the users on my list didn’t use that password. What is noteworthy here is that while the top passwords are still the top passwords, the number of people using those passwords has dramatically decreased.
The fact is that the top passwords are always going to be the top passwords; it’s just that the percentage of users actually using them will, we hope, keep getting smaller. This year, for example, a hacker using the top 10 password list would statistically be able to guess 16 out of 1000 passwords.
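As a rough sketch of how a figure like "16 out of 1000" can be computed: count every password, take the N most common, and divide their combined count by the total. The function name and the toy data below are made up for illustration and are not taken from the actual analysis.

```python
from collections import Counter

def top_n_coverage(passwords, n=10):
    """Return the n most common passwords and the share of all accounts they cover."""
    counts = Counter(passwords)
    total = sum(counts.values())
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    return top, (covered / total if total else 0.0)

# Toy data, invented for illustration: 1000 accounts, 9 of them using a top-2 password.
sample = ["123456"] * 6 + ["password"] * 3 + [f"unique-{i}" for i in range(991)]

top, coverage = top_n_coverage(sample, n=2)
print(top)                 # [('123456', 6), ('password', 3)]
print(f"{coverage:.1%}")   # 0.9%
```

An attacker trying only those two passwords against this toy set would get into 9 of the 1000 accounts.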
Getting a true picture of user passwords is surprisingly difficult. Even though password is #2 on the list, I don’t think I have seen anyone actually use that password in years. Part of the problem is how we collect and analyze password data. Because we typically can’t just go to some company and ask for all their user passwords, we have to go with the data that is available to us. And that data does have problems.
Anomalies are More Prominent
As we saw, user passwords are improving but as percentages of common passwords decrease, anomalies begin to float to the top. There was a time that I didn’t worry too much about minor flaws in the data because as my data set grew those tended to fall to the bottom of the list. Now, however, those anomalies are becoming a problem.
For example, when I first ran my stats for 2014, the password lonen0 ranked as #7 in the list. Looking through the data I saw that all of these passwords came from a single source, the Belgian company EASYPAY GROUP, which had their data leaked in November of 2014. Looking through the raw data, it appears that lonen0 was a default password that 10% of their users failed to change to something stronger. It’s just 10% of users from one company, but that was enough to push it to the #7 most common password in my data set.
In 2014, all it takes for a password to get on the top 1000 list is to be used by just 0.0044% of all users. In a set of 3.3 million passwords, that works out to roughly 145 accounts.
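One way to catch an anomaly like lonen0 automatically is to check how concentrated each popular password is in a single source. The sketch below assumes each record is tagged with the dump it came from, which is often not the case with public leaks. The function name, thresholds and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def single_source_passwords(records, min_count=100, max_share=0.9):
    """records: iterable of (source, password) pairs.

    Return passwords seen at least min_count times where a single
    source supplies at least max_share of all occurrences.
    """
    per_password = defaultdict(Counter)
    for source, password in records:
        per_password[password][source] += 1

    flagged = {}
    for password, sources in per_password.items():
        total = sum(sources.values())
        top_source, top_count = sources.most_common(1)[0]
        if total >= min_count and top_count / total >= max_share:
            flagged[password] = (top_source, top_count / total)
    return flagged

# Invented sample: lonen0 appears in one dump only, 123456 is spread across three.
records = (
    [("dump_a", "lonen0")] * 150
    + [("dump_a", "123456")] * 60
    + [("dump_b", "123456")] * 70
    + [("dump_c", "123456")] * 50
)

print(single_source_passwords(records))  # {'lonen0': ('dump_a', 1.0)}
```

A flagged password is not necessarily bad data, but it is a candidate for manual review before it ends up in a top 10 list.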
Single Sets of Data vs Aggregated Data
There are numerous variables that affect which passwords users choose and therefore many people like to analyze sets of passwords dumped from a single source. There are two problems with this: first, we don’t really know all the variables that determine how users choose passwords. Second, data is always skewed when you analyze a single company, as we saw with EASYPAY GROUP. Another example: if you look at the password dump from Adobe, you will see that the word adobe appears in many of the passwords.
On the other hand, if we aggregate all the data from multiple dumps and analyze it together, we may get the wrong picture. Doing this gives us no control variables and we end up with passwords like 123456 at the top of the list. If we had enough aggregated data that wouldn’t be an issue, but what exactly is enough data?
Cracked Passwords are Crackable; Hacked Companies are Hackable
Since most of the data we are looking at comes from password leaks, it is possible that 123456 tops the list simply because it is the easiest password to crack. Perhaps some hacker checked tens of thousands of email accounts to see if the password was 123456 and dumped all positive matches on the internet. In fact, part of the reason I only analyzed 3.3 million passwords this year is due to a large number of mail.ru, yandex.com and other Russian accounts that had unusual passwords such as qwerty and other keyboard patterns. Here is the top 10 list including all the Russian email accounts:
While these are common passwords, the Russian data was highly skewed which made me suspect that these were either fake accounts or hacked by checking only certain passwords. So while 3.3 million passwords isn’t a huge dataset to analyze, it is a clean set of data that seems to accurately reflect results I have seen in the past.
The other problem is that when a company gets hacked, often it is because they have not properly secured their data. If they have poor security practices, this could affect password policies and user training which might result in poor quality passwords.
Unfortunately, we do not know to what extent crackable passwords and hackable companies affect the quality of the password data we have to analyze.
No Indication of Source
When we work with publicly leaked passwords, we often don’t know the source of the data. We don’t know if the passwords are from some corporation with strict password policies, or if they come from hacked adult sites where many users are choosing passwords such as boobies, or if they are hacked Minecraft accounts where a large chunk of the users are kids or teenagers. We don’t know if the data came from keyloggers or phishing or password hashes.
We also don’t know when users set these passwords. When Adobe had 150 million user accounts leaked, clearly many of those accounts and passwords had been created years earlier. We do know that users are slowly getting better with their passwords, but if we don’t know when they set these passwords it is impossible for us to gauge that progress.
These are all significant variables, and they make it impossible for us to get an accurate picture of which passwords people truly are using, where and when.
No Indication of User Attitude
The source of the data also strongly affects user attitude towards security. Many users have several common passwords they use, which typically include a strong one for bank and other sensitive accounts and another one for casual or one-time-use accounts such as a flower company shopping cart. The data we have gives no indication of the users’ attitude when they selected their password.
In the fifteen years I have been collecting passwords I have seen just one of my passwords publicly leaked. It was in the Yahoo! Voices password dump. I actually remember setting this password. I was on my phone at the time, researching possible ways to syndicate my writing. I set up various accounts on different sites I was checking out, all with the same password, because I was on my phone and security for these sites wasn’t particularly a concern, this just being casual research.
My password was October38, a throwaway password I occasionally used around that time. Although it is a decent password, it doesn’t represent the type of password I normally use. None of my other passwords show up on public dumps, and there is no indication of how often I used this particular password or how it compares to the rest of my passwords.
So where are the passwords coming from and how does this affect user attitude? Are they PayPal accounts or a quick login someone created to comment on a small web site? Are they computer accounts that you can’t manage with a password manager and therefore users must memorize? We just do not know for much of the data.
Finally, the biggest problem when dealing with public password dumps is that sometimes you just get bad data and sometimes good data is ruined through poor parsing or conversion. When dealing with tens of millions of passwords and hundreds of gigabytes of files, bad data will make its way in there, and it is usually hard to spot this data without a manual review. While I do manually glance over most data I include, it is impossible to catch everything.
Here are some examples of bad data that I sometimes catch manually but that are extremely difficult to identify with my automated parsing scripts.
Yet another problem is that since my goal is to identify user-selected passwords, I need to be able to spot data that isn’t real user data. Here is an example of a dump where both the usernames and passwords follow an obvious pattern and are clearly machine-generated. I don’t want that type of data.
My parsing script performs dozens of checks on each username and password, but ultimately I still have to review data manually, and I still don’t catch everything.
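The actual checks in that parsing script are not shown here. As one hypothetical example of what such a check could look like, the sketch below collapses every run of digits to a placeholder and flags a dump when a single username/password "shape" dominates it. The function names and the 50% threshold are assumptions for illustration only.

```python
import re
from collections import Counter

DIGITS = re.compile(r"\d+")

def shape(value):
    """Collapse each run of digits to '#', so user0001 and user0002 share a shape."""
    return DIGITS.sub("#", value)

def looks_machine_generated(pairs, threshold=0.5):
    """pairs: list of (username, password) tuples.

    Return True when one (username shape, password shape) combination
    accounts for at least `threshold` of the dump.
    """
    if not pairs:
        return False
    shapes = Counter((shape(user), shape(pw)) for user, pw in pairs)
    _, top_count = shapes.most_common(1)[0]
    return top_count / len(pairs) >= threshold

# Invented examples: a sequentially generated dump and a handful of human-looking accounts.
generated = [(f"user{i:04d}", f"pw{i:04d}x") for i in range(50)]
human = [("alice", "123456"), ("bob", "password"), ("carol99", "qwerty"), ("dave", "October38")]

print(looks_machine_generated(generated))  # True
print(looks_machine_generated(human))      # False
```

A heuristic like this only catches the most obvious patterns, which is consistent with the point above: a manual review is still needed.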
As you can see there are many variables that can affect the data and therefore we can’t truly say which passwords users have set for stuff that really matters. Nevertheless, the statistics are consistent and the same passwords show up on the top year after year. What it comes down to is if you are one of those people who are using 123456 or password, please stop.
Use a Password Manager
So what do we do about bad passwords? As I have said before, you need to use a password manager. When I analyzed passwords in my book Perfect Passwords in 2005 I published a list of the Top 500 Passwords. Later, in 2011, I published a list of the top 10,000 passwords. Now, with the latest analysis in 2014, we can clearly see that the list really doesn’t change much over the years. But the threats are increasing much faster than we can keep up.
The only solution is to stop trying to create and remember your own passwords. You just can’t create strong, unique passwords for each account you have and keep it all in your head. You cannot consider yourself secure on the internet unless you are using some tool to manage your passwords. Password managers let you generate strong passwords and manage them in a central location, protected by a single strong password.
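To illustrate what "generate strong passwords" means in practice, here is a minimal sketch using Python's standard secrets module, which draws from a cryptographically secure random source. It is not how any particular password manager works internally. The function name, default length and character set are assumptions chosen for the example.

```python
import secrets
import string

# Character set assumed for this example: letters, digits and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length=20):
    """Return a random password of `length` characters drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_password())  # different on every run
```

Nobody is expected to memorize output like this, which is exactly why it belongs in a password manager rather than in your head.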
SplashData, who worked with me on this analysis, is the developer of the password manager SplashID Safe. Other password managers I use or have tested include LastPass, KeePass, 1Password, and Dashlane. I would recommend any of these products.
And because I know I will be asked, the following articles will be coming soon:
The New Top 10,000 Password List
How I Collect and Process Passwords