The leak of 150 million Adobe passwords in October this year is perhaps the most epic security leak we have ever seen. It was huge, not just because of the sheer volume of passwords, but also because it’s such a large dump from a single site, which allows a much better analysis than earlier sets. But there’s something unique about the Adobe dump that makes it even more insightful: it includes about 44 million password hints. Even though we still haven’t decrypted the passwords, the data is extremely useful.
One thing I have pondered over my years of analyzing passwords is how to figure out *what* the password is. I can determine whether a password contains a noun or a common name, but I can’t always determine what that noun or name means to the user.
For example, if the password is Fred2000, is that a dog’s name and a date? An uncle and his anniversary? The user’s own name and the year they set up the account? Once we know the significance of a password, we gain huge insight into how users select passwords. But I have never been able to come up with a method to even remotely measure this factor. Then came the Adobe dump.
The sheer amount of data in the Adobe dump makes it a bit overwhelming and somewhat difficult to work with. But if you remove the least common and least useful hints, the data becomes more manageable. Using a trimmed-down set of about 10 million passwords, I was able to work with the data more easily and come up with some interesting insights.
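The exact trimming rule isn't spelled out here. As a minimal sketch, assuming the dump has already been parsed into (encrypted password, hint) pairs and that "least common" means hints occurring fewer than some cutoff number of times, the step could look like this. The pair format, the `trim_hints` name, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter

def trim_hints(records, min_count=100):
    """Keep only the (password, hint) pairs whose hint is common enough.

    `records` is an iterable of (encrypted_password, hint) pairs. Both the
    pair format and the default `min_count` cutoff are hypothetical.
    """
    # Normalize case and surrounding whitespace so "Birthday" and "birthday "
    # count as the same hint, and drop empty hints, which carry no information.
    normalized = [
        (pw, hint.strip().lower())
        for pw, hint in records
        if hint and hint.strip()
    ]
    # Count how often each normalized hint appears across the whole dump.
    counts = Counter(hint for _, hint in normalized)
    # Keep only the pairs whose hint meets the frequency cutoff.
    return [(pw, hint) for pw, hint in normalized if counts[hint] >= min_count]


sample = [("a", "Birthday"), ("b", "birthday "), ("c", "dog"), ("d", "")]
print(trim_hints(sample, min_count=2))
# [('a', 'birthday'), ('b', 'birthday')]
```

Normalizing before counting matters: without it, trivially different spellings of the same hint would each fall below the cutoff and be discarded.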
Just by glancing at the top one hundred hints, you can see several clear patterns: a large percentage of the passwords are the name of a person, the name of a pet, the name of a place, or an important date.
Take dates, for example. Consider the following top date-related hints:
| Hint | Count | Notes |
|------|-------|-------|
| niver | 9,484 | Spanish: Anniversary (short for *aniversario*) |
In all, there are about 420,000 passwords with a date-related hint, which represents about 3.6% of the passwords in the working set.
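A share like this is simply the number of matching hints divided by the size of the working set. Below is a rough sketch of that count, assuming a hint belongs to a category when it contains one of a hand-picked list of keywords as a whole word. The `DATE_KEYWORDS` set and the `category_share` function are hypothetical illustrations, not the list or method actually used; only *niver* comes from the list above.

```python
import re

# Hypothetical keyword list; only "niver" is taken from the hints above.
DATE_KEYWORDS = {"birthday", "anniversary", "niver"}

def category_share(hints, keywords):
    """Return (matches, share): how many hints contain at least one keyword
    as a whole word, and what fraction of all hints that is."""
    hints = list(hints)
    matches = sum(
        1 for h in hints
        # Split each lowercased hint into words and check for any overlap.
        if keywords & set(re.findall(r"\w+", h.lower()))
    )
    share = matches / len(hints) if hints else 0.0
    return matches, share


hints = ["my birthday", "Niver", "dog's name", "same as always"]
matches, share = category_share(hints, DATE_KEYWORDS)
print(matches, f"{share:.1%}")  # 2 50.0%
```

Whole-word matching avoids false hits where a keyword happens to appear inside an unrelated word, at the cost of missing misspellings and compound forms.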
We see similar trends with dog names, which account for 375,000 passwords, or 3.2% of the total (plus another 120,000 that mention “pet”):
| Hint | Count | Notes |
|------|-------|-------|
| hund | 7,185 | German, Danish, Swedish, Norwegian |
One interesting insight offered here is something we already know but find difficult to measure: password reuse. Surely a large percentage of these users have the same password across multiple sites, but it is interesting to see that about 361,000 users (or 3.11%) state this fact in their password hints:
| Hint | Count | Notes |
|------|-------|-------|
| la de siempre | 8,559 | Spanish: as always, or the usual |
| same as always | 8,289 | |
Keep in mind that these are only the passwords whose hints admit to reuse. The number of passwords actually reused across multiple sites is certainly much greater.
Looking at the three lists above, we see that nearly 10% of the passwords fall into just these three categories. Adding the names of people and places would likely account for another 10%.
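The "nearly 10%" figure is just the sum of the three shares quoted above. The quick check below adds them up. It assumes no hint is counted in more than one category, and it leaves out the additional hints that mention "pet".

```python
# Category shares as quoted in the text, in percent of the working set.
shares = {
    "date-related hints": 3.6,
    "dog names": 3.2,
    "admitted reuse": 3.11,
}

# Round to two decimals to hide floating-point noise from the addition.
total = round(sum(shares.values()), 2)
print(f"{total}% of the working set")  # 9.91% of the working set
```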
So what did we learn by analyzing these hints? First, you should never use password hints; if users forget their password, they should use the password reset process. Second, decades of user education have completely failed. No matter how much we advise against using dates, family names, or pet names in passwords, and no matter how often we tell people not to use the same password on multiple sites, you people will just do it anyway.
This is why we can’t have nice password policies.