Password Stock Photos

I’m getting a little tired of seeing the same stock photos over and over on passwords articles so I thought I’d share a set I had made to use on my blog that are at least a little different.

If you find these photos useful they are available for use royalty free with the following options:

Non-Commercial Commercial
You may use these photos for non-commercial uses with attribution to Mark Burnett (xato.net). You are also welcome to donate! You may use these photos for commercial purposes with a payment of $10 per photo per use. Your payment receipt is your license.. Attribution to Mark Burnett (xato.net) is optional. For resell use please contact me.

 

The base word clouds in these photos are the top passwords from my 10 million passwords list (made SFW) and were made using Processing with the WordCram library (and Photoshop to fill in the gaps)

Click on the thumbnails below to view/download the full resolution photos.

 

Passwords stock photo Passwords stock photo Passwords stock photo Passwords stock photo
Passwords stock photo Passwords stock photo Passwords stock photo Passwords stock photo
Passwords stock photo Passwords stock photo Passwords stock photo Passwords stock photo
Passwords stock photo Passwords stock photo Passwords stock photo Passwords stock photo

 

 

A Glimpse Into the World of Internet Password Dumps

Reading the comments around the internet about my release of 10 million passwords, I realized that perhaps some people don’t quite grasp how bad the situation really is. It’s really bad. The target audience of my original article was IT security professionals and network administrators who see this stuff on a daily basis, but the news of my data release has reached far beyond that audience which has brought to my attention some misunderstanding of the context of my data release. I thought maybe it would be helpful for people to get a glimpse into what I see as I collect passwords.

My main source for passwords in the last few years has been Pastebin and similar sites. Pastebin is a web site where you can paste text data to share with others. You can also do so anonymously. There are twitter bots and web sites that monitor new pastes and look for hackers leaking—or dumping—sensitive data they have stolen. On a typical day it is common to see around a hundred of these leaks, about half of those contain both usernames and unencrypted passwords—often referred to as combos—that I collect.

Dump Monitor Twitter Bot

Dump Monitor Twitter Bot

 

Because the data may show up in various formats, I have to parse it which I do with a tool I wrote named Hurl. This tool recognizes many different dump formats and parses out the usernames and passwords. Here is an example of the types of formats it recognizes.

What’s interesting about Pastebin is the number of scrapers there really are out there. Within less than a minute of posting this paste there were 81 views. After a few more minutes there were 173 views as shown below. As you can see there are more than a few people monitoring this stuff. Want to set up your own scraper? The Dumpmon source code is available although I personally prefer Pystemon.

Pastebin Scraper Views

Pastebin Scraper Views

Pastebin scrapers only catch pastes when they include the actual data in the paste. Sometimes it is just a link to another file so it is important to monitor links as well. Further, by taking those pastes and seeing who links to them you can find sources, such as twitter accounts, that announce these things. I monitor those as well.

After Pastebin there are several sites I keep up with that post leaks and stolen databases. Below is a screenshot of one of these sites.

Database Dumps

Database Dumps

I then take the names of those files and set up google alerts (and sometimes Pastebin alerts) for them. This often leads me to file collections such as the two below:

A collection of password files

A collection of password files

A collection of password files

Another collection of password files

I also alert on combos that certain hackers frequently use to create accounts such as Cucum01:Ber02, zolushka:natasha, and many others. These combos are so common in password lists they always lead to more passwords. Take a look at this Google search and you’ll see how prevalent these are, the alerts keep my inbox full.

Furthermore, there are hundreds of forums that share passwords. Perhaps a few screenshots are the best way to see just how many passwords people are sharing:

forum7

forum5

forum4

forum3

forum2

forum1

Password Sharing Forum

Of course, this is only a small sampling of forums. There are more than any one person could ever monitor.

There are also hundreds of thousands of web sites that share hacked passwords for gaming, video, porn, and file sharing sites. These don’t always produce the best quality passwords, but I do have scripts to scrape a number of these sites. In a single day those scripts can produce well over a million passwords.

If you were shocked by my releasing password data, take an hour exploring the internet and you will see that 10 million passwords really is a drop in a bucket, even a drop in a thousand buckets. Keep in mind that a big part of the effort in producing my data was getting it all the way down to 10 million in a balanced manner (I couldn’t just remove millions from the end of the file). It took me about three weeks to whittle down and then sanitize the data.

What I have shown here is only a small number of sources available out there. Most of the forums listed above provide “VIP” access for a monthly payment. If you want to spend a little money you have access to tens of millions more passwords than the freebies shared publicly. There are also IRC channels, Usenet groups, torrents, file sharing sites, and of course a number of hidden sources on Tor.

Now not all of these passwords are plaintext. Many dumps include passwords in a hashed format that requires you to crack them yourself. But that’s no problem, there are tools such as Hashcat and John the Ripper as well as wordlists out there that make this a trivial task.

If this isn’t already overwhelming, keep in mind that this is just the stuff that certain hackers have decided to make public. Surely the troves of accounts that have been hacked over the years completely dwarf what has been publicly shared. There could be billions, or tens of billions more accounts that have been hacked. If you are worried that the data I released contains your password, you still aren’t worried enough. There is a very good chance your passwords have been hacked, go change them.

So who besides me collects these passwords? Use your imagination.

 

 

 

Ten Million Passwords FAQ

Common PasswordsIn response to my recent release of 10 million passwords, I thought I would address some of the questions I am getting.

Where are the passwords from?

These are old passwords that have already been released to the public; none of these passwords are new leaks. They all are or were at one time completely available to anyone in an uncracked format. I have not included passwords that required cracking, payment, exclusive forum access, or anything else not available to the general public. You should still be able to find a large number of these passwords via a Google search.

The passwords were compiled by taking samples from thousands of password dumps, mostly from the last five years although it also includes much older data. I wanted to mix data from multiple sources to normalize inconsistencies and skewed data due to the type of web site, it’s users, and it’s security policies. (see this article for problems with password data)

The size of the samples from each site were determined by the data itself. Since the top 100 passwords have been very consistent over the last 20 years, I was able to use that to determine the quality of the source data. Some dumps contained so much bad data that I had to limit how much of it I included.

What is bad data?

Here are some examples of bad password data: http://pastebin.com/v6HCVDHN

How was the data collected?

I have been collecting passwords for about 15 years. In the past I have used a number of scripts to scrape the web, forums, IRC, Usenet, and P2P sources to get even 1,000 new passwords per day. In fact, it took me almost 10 years to collect just 6 million unique username/password combos (and at the time I thought that was huge).

However, in the last 5 years things have changed tremendously. I am now able to manually collect 10-20 million unique passwords per year simply from paste sites and forums. There have also been a number of very large password dumps with tens of millions of passwords in a single dump. Anyone could easily gather several hundreds of millions of passwords without much effort.

Why did you release this data?

The primary purpose is to get good, clean, and consistent data out in the world so others can find new ways to explore and gain knowledge from it. The data isn’t perfect and there are a few anomalies, but it should provide good insight into user password selection.

Really, why did you release this data?

I’m a bit obsessed with passwords.

Won’t this help hackers?

If a hacker needs this list to hack someone, they probably aren’t much of a threat.

What should I do if my password is on the list?

If your password is on this list that means it has already been publicly available for some time. You should change your password and enable two-factor authentication if available. Several of my own passwords are on the list as well, I left them there because they are already many places on the web.

What if my password is not on the list?

It doesn’t mean you are safe. This is a tiny sample of the hundreds of millions of accounts that have been publicly dumped and doesn’t even include the hundreds of millions more that have never been made public.

Is this unethical to release these passwords?

Although I have justified the release of these passwords, I have to admit it is at least close to the line. I have considered releasing this data for a number of years and have put much thought into the ethics involved; it is not something I take lightly. I could have replaced all the usernames with random numbers or hashes, but I felt like the usernames just had to be included. I did make sure to remove domain names from email addresses and other identifiers so that they couldn’t be directly linked to specific accounts. I also aggregated data from many sources so that this data could not be used to target any particular site. The thing to remember here though is that I am not releasing this data, I have just aggregated and cleaned up already public data.

How can I monitor my accounts to know if they have been leaked?

I would suggest the following:

  1. Create a Google alert for your email address, username, and domain if you have one.
  2. Create a Pastebin account and set alerts for your email address, username, and domain if you have one.
  3. Sign up for account monitoring at haveibeenpwned.com, pwnedlist.com, breachalarm.com, canary.pw, or a similar site (feel free to add similar sites in the comments if you know of others).

Can I have your raw data?

No. Actually I have shared portions this data with companies who notify users of account leaks. Over the years I have gotten pretty good at finding passwords that others miss. But if you just want the raw data for any purpose than protecting users the answer is no.

Today I Am Releasing Ten Million Passwords

Frequently I get requests from students and security researchers to get a copy of my password research data. I typically decline to share the passwords but for quite some time I have wanted to provide a clean set of data to share with the world. A carefully-selected set of data provides great insight into user behavior and is valuable for furthering password security. So I built a data set of ten million usernames and passwords that I am releasing to the public domain.
Common PasswordsBut recent events have made me question the prudence of releasing this information, even for research purposes. The arrest and aggressive prosecution of Barrett Brown had a marked chilling effect on both journalists and security researchers. Suddenly even linking to data was an excuse to get raided by the FBI and potentially face serious charges. Even more concerning is that Brown linked to data that was already public and others had already linked to.

“This is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution”

In 2011 and 2012 news stories about Anonymous, Wikileaks, LulzSec, and other groups were daily increasing and the FBI was looking more and more incompetent to the public. With these groups becoming more bold and boastful and pressure on the FBI building, it wasn’t too surprising to see Brown arrested. He was close to Anonymous and was in fact their spokesman. The FBI took advantage of him linking to a data dump to initiate charges of identity theft and trafficking of authentication features. Most of us expected that those charges would be dropped and some were, although they still influenced his sentence.

At Brown’s sentencing, Judge Lindsay was quoted as saying “What took place is not going to chill any 1st Amendment expression by Journalists.” But he was so wrong. Brown’s arrest and prosecution had a substantial chilling effect on journalism. Some journalists have simply stopped reporting on hacks from fear of retribution and others who still do are forced to employ extraordinary measures to protect themselves from prosecution.

Which brings me back to these ten million passwords.

Why the FBI Shouldn’t Arrest Me

Although researchers typically only release passwords, I am releasing usernames with the passwords. Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone. Most researchers are afraid to publish usernames and passwords together because combined they become an authentication feature. If simply linking to already released authentication features in a private IRC channel was considered trafficking, surely the FBI would consider releasing the actual data to the public a crime.

But is it against the law? There are several statutes that the government used against brown as summarized by the Digital Media Law Project:

Count One: Traffic in Stolen Authentication Features, 18 U.S.C. §§ 1028(a)(2), (b)(1)(B), and (c)(3)(A); Aid and Abet, 18 U.S.C. § 2: Transferring the hyperlink to stolen credit card account information from one IRC channel to his own (#ProjectPM), thereby making stolen information available to other persons without Stratfor or the card holders’ knowledge or consent; aiding and abetting in the trafficking of this stolen data.

Count Two: Access Device Fraud, 18 U.S.C. §§ 1029(a)(3) and (c)(1)(A)(i); Aid and Abet, 18 U.S.C. § 2: Aiding and abetting the possession of at least fifteen unauthorized access devices with intent to defraud by possessing card information without the card holders’ knowledge and authorization.

Counts Three Through Twelve: Aggravated Identity Theft, 18 U.S.C. § 1028A(a)(1); Aid and Abet, 18 U.S.C. § 2: Ten counts of aiding and abetting identity theft, for knowingly and without authorization transferring identification documents by transferring and possessing means of identifying ten individuals in Texas, Florida, and Arizona, in the form of their credit card numbers and the corresponding CVVs for authentication as well as personal addresses and other contact information.

While these particular indictments refer to credit card data, the laws do also reference authentication features. Two of the key points here are knowingly and with intent to defraud.

In the case of me releasing usernames and passwords, the intent here is certainly not to defraud, facilitate unauthorized access to a computer system, steal the identity of others, to aid any crime or to harm any individual or entity. The sole intent is to further research with the goal of making authentication more secure and therefore protect from fraud and unauthorized access.

To ensure that these logins cannot be used for illegal purposes, I have:

  1. Limited identifying information by removing the domain portion from email addresses
  2. Combined data samples from thousands of global incidents from the last five years with other data mixed in going back an additional ten years so the accounts cannot be tied to any one company.
  3. Removed any keywords, such as company names, that might indicate the source of the login information.
  4. Manually reviewed much of the data to remove information that might be particularly linked to an individual
  5. Removed information that appeared to be a credit card or financial account number.
  6. Where possible, removed accounts belonging to employees of any government or military sources [Note: although I can identify government or military logins when they include full email addresses, sometimes these logins get posted without the domains, without mentioning the source, or aggregated on other lists and therefore it is impossible to know if I have removed all references.]

Furthermore, I believe these are primarily dead passwords, which cannot be defined as authentication features because dead passwords will not allow you to authenticate. The likelihood of any authentication information included still being valid is low and therefore this data is largely useless for illegal purposes. To my knowledge, these passwords are dead because:

  1. All data currently is or was at one time generally available to anyone and discoverable via search engines in a plaintext (unhashed and unencrypted) format and therefore already widely available to those with an intent to defraud or gained unauthorized access to computer systems.
  2. The data has been publicly available long enough (up to ten years) for companies to reset passwords and notify users. In fact, I would consider any organization to be grossly negligent to be unaware of these leaks and still have not changed user passwords after these being publicly visible for such a long period of time.
  3. The data is collected by numerous web sites such as haveibeenpwned or pwnedlist and others where users can check and be notified if their own accounts have been compromised.
  4. Many companies, such as Facebook, also monitor public data dumps to identify user accounts in their user base that may have been compromised and proactively notify users.
  5. A portion of users, either on their own or required by policy, change their passwords on a regular basis regardless of being aware of compromised login information.
  6. Many organizations, particularly in some industries, actively identify unusual login patterns and automatically disable accounts or notify account owners.

Ultimately, to the best of my knowledge these passwords are no longer be valid and I have taken extraordinary measures to make this data ineffective in targeting particular users or organizations. This data is extremely valuable for academic and research purposes and for furthering authentication security and this is why I have released it to the public domain.

Having said all that, I think this is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution or legal harassment. I had wanted to write an article about the data itself but I will have to do that later because I had to write this lame thing trying to convince the FBI not to raid me.

I could have released this data anonymously like everyone else does but why should I have to? I clearly have no criminal intent here. It is beyond all reason that any researcher, student, or journalist have to be afraid of law enforcement agencies that are supposed to be protecting us instead of trying to find ways to use the laws against us.

Slippery Slopes

For now the laws are on my side because there has to be intent to commit or facilitate a crime. However, the White House has proposed some disturbing changes to the Computer Fraud and Abuse act that will make things much worse. Of particular note is 18 U.S.C. § 1030. (a)(6):

(6) knowingly and with intent to defraud willfully traffics (as defined in section 1029) in any password or similar information, or any other means of access, knowing or having reason to know that a protected computer would be accessed or damaged without authorization in a manner prohibited by this section as the result of such trafficking;

The key change here is the removal of an intent to defraud and replacing it with willfully; it will be illegal to share this information as long as you have any reason to know someone else might use it for unauthorized computer access.

It is troublesome to consider the unintended consequences resulting from this small change. I wrote about something back in 2007 that I’d like to say again:

…it reminds me of IT security best practices. Based on experience and the lessons we have learned in the history of IT security, we have come up with some basic rules that, when followed, go a long way to preventing serious problems later.

So many of us security professionals have made recommendations to software companies about potential security threats and often the response is that they don’t see why that particular threat is a big deal. For example, a bug might reveal the physical path to a web content directory. The software company might just say “so what?” because they cannot see how that would result in a compromise. Unfortunately, many companies have learned “so what” the hard way.

The fact is that it doesn’t matter if you can see the threat or not, and it doesn’t matter if the flaw ever leads to a vulnerability. You just always follow the core rules and everything else seems to fall into place.

This principle equally applies to the laws of our country; we should never violate basic rights even if the consequences aren’t immediately evident. As serious leaks become more common, surely we can expect tougher laws. But these laws are also making it difficult for those of us who wish to improve security by studying actual data. For years we have fought increasingly restrictive laws but the government’s argument has always been that it would only affect criminals.

The problem is that it is that the laws themselves change the very definition of a criminal and put many innocent professionals at risk.

The Download Link

Again, this is stupid that I have to do this, but:

BY DOWNLOADING THIS AUTHENTICATION DATA YOU AGREE NOT TO USE IT IN ANY MANNER WHICH IS UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL, OR IN CONNECTION WITH ANY UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL PURPOSE OR ACTIVITY INCLUDING BUT NOT LIMITED TO FRAUD, IDENTITY THEFT, OR UNAUTHORIZED COMPUTER SYSTEM ACCESS. THIS DATA IS ONLY MADE AVAILABLE FOR ACADEMIC AND RESEARCH PURPOSES.

Torrent (84.7 mb): Magnet link

For more information on this data, please see this FAQ.

As a final note, be aware that if your password is not on this list that means nothing. This is a random sampling of thousands of dumps consisting of upwards to a billion passwords. Please see the links in the article for a more thorough check to see if your password has been leaked. Or you could just Google it.

If you wish to discuss analysis of this data, you may do so at http://reddit.com/r/passwords
Related Articles

A Glimpse Into the World of Internet Password Dumps

Is 123456 Really The Most Common Password?

 

 

Is 123456 Really The Most Common Password?

2014 Top 10 Passwords

2014 Top 10 Passwords

I recently worked with SplashData to compile their 2014 Worst Passwords List and yes, 123456 tops the list. In the data set of 3.3 million passwords I used for SplashData, almost 20,000 of those were in fact 123456. But how often do you really see people using that, or the second most common password, password in real life? Are people still really that careless with their passwords?

While 123456 is indeed the most common password, that statistic is a bit misleading. Although 0.6% of all users on my list used that password, it’s important to remember that 99.4% of the users on my list didn’t use that password. What is noteworthy here is that while the top passwords are still the top passwords, the number of people using those passwords has dramatically decreased.

The fact is that the top passwords are always going to be the top passwords, it’s just that the percentage of users actually using those will–at least we hope–continually get smaller. This year, for example, a hacker using the top 10 password list would statistically be able to guess 16 out of 1000 passwords.

Getting a true picture of user passwords is surprisingly difficult. Even though password is #2 on the list, I don’t know if I have seen someone actually use that password for years. Part of the problem is how we collect and analyze password data. Because we typically can’t just go to some company and ask for all their user passwords, we have to go with the data that is available to us. And that data does have problems.

Anomalies are More Prominent

As we saw, user passwords are improving but as percentages of common passwords decrease, anomalies begin to float to the top. There was a time that I didn’t worry too much about minor flaws in the data because as my data set grew those tended to fall to the bottom of the list. Now, however, those anomalies are becoming a problem.

For example, when I first ran my stats for 2014, the password lonen0 ranked as #7 in the list. Looking through the data I saw that all of these passwords came from a single source, the Belgium company EASYPAY GROUP, which had their data leaked in November of 2014. Looking through the raw data it appears that lonen0 was a default password that 10% of their users failed to set to something stronger. It’s just 10% of users from one company but that was enough to push it to the #7 most common password in my data set.

In 2014, all it takes for a password to get on the top 1000 list is to be used by just 0.0044% of all users.

Single Sets of Data vs Aggregated Data

There are numerous variables that affect which passwords users choose and therefore many people like to analyze sets of passwords dumped from a single source. There are two problems with this: first, we don’t really know all the variables that determine how users choose passwords. Second, data is always skewed when you analyze a single company as we saw with EASYPAY GROUP. Another example is if you look at password dump from Adobe you will see that the word adobe appears in many of the passwords.

On the other hand, if we aggregate all the data from multiple dumps and analyze it together, we may get the wrong picture. Doing this gives us no control variables and we end up with passwords like 123456 on the top of the list. If we had enough aggregated data that wouldn’t be an issue, but what exactly is enough data?

Cracked Passwords are Crackable; Hacked Companies are Hackable

Since most of the data we are looking at comes from password leaks, it is possible that 123456 tops the list simple because it is the easiest password to crack. Perhaps some hacker checked tens of thousands of email accounts to see if the password was 123456 and dumped all positive matches on the internet. In fact, part of the reason I only analyzed 3.3 million passwords this year is due to a large number of mail.ru, yandex.com and other Russian accounts that had unusual passwords such as qwerty and other keyboard patterns. Here is the top 10 list including all the Russian email accounts:

1. qwerty
2. 123456
3. qwertyuiop
4. 123456789
5. password
6. 12345678
7. 12345
8. 111111
9. 1qaz2wsx
10. qwe123

While these are common passwords, the Russian data was highly skewed which made me suspect that these were either fake accounts or hacked by checking only certain passwords. So while 3.3 million passwords isn’t a huge dataset to analyze, it is a clean set of data that seems to accurately reflect results I have seen in the past.

The other problem is that when a company gets hacked, often it is because they have not properly secured their data. If they have poor security practices, this could affect password policies and user training which might result in poor quality passwords.

Unfortunately, we do not know to what extent crackable passwords and hackable companies affect the quality of the password data we have to analyze.

No Indication of Source

When we work with publicly leaked passwords, we often don’t know the source of the data. We don’t know if the passwords are from some corporation with strict password policies, or if they come from hacked adult sites where many users are choosing passwords such as boobies, or if they are hacked Minecraft accounts where a large chunk of the users are kids or teenagers. We don’t know if the data came from keyloggers or phishing or password hashes.

We also don’t know when users set these passwords. When Adobe had 150 million user accounts leaked, clearly those passwords were from accounts and passwords created years ago. We do know that users are slowly getting better with their passwords, but if we don’t know when they set these passwords it is impossible for us to gauge that progress.

These are all significant variables and therefore makes it impossible for us to get an accurate picture of which passwords people truly are using where and when.

No Indication of User Attitude

The source of the data also strongly affects user attitude towards security. Many users have several common passwords they use which typically includes a strong one for bank and other sensitive accounts and another one for casual or one-time-use accounts such as a flower company shopping cart. The data we have gives no indication of the users’ attitude when they selected their password.

In the fifteen years I have been collecting passwords I have seen just one of my passwords publicly leaked. It was in the Yahoo! Voices password dump. I actually remember setting this password, I was on my phone at the time and was I was researching possible ways to syndicate my writing. I set up various accounts on different sites I was checking out, all with the same password because I was on my phone and security for these sites wasn’t particularly a concern, this just being casual research.

My password was October38, a throwaway password I occasionally used around that time. Although it is a decent password, it doesn’t represent the type of password I normally use. None of my other passwords show up on public dumps and there is no indication how often I used this particular password and no indication how often I used it or how it compares to the rest of my passwords.

So where are the passwords coming from and how does this affect user attitude? Are they PayPal accounts or a quick login someone created to comment on a small web site? Are they computer accounts that you can’t manage with a password manager and therefore users must memorize? We just do not know for much of the data.

Bad Data

Finally, the biggest problem when dealing with public password dumps is that sometimes you just get bad data and sometimes good data is ruined through poor parsing or conversion. When dealing with tens of millions of passwords and hundreds of gigabytes of files, bad data will make its way in there, and it is usually hard to spot this data without a manual review. While I do manually glance over most data I include, it is impossible to catch everything.

Here are some examples of bad data that I sometimes catch manually but is extremely difficult to identify with my automated parsing scripts.

Yet another problem is that since my goal is to identify user-selected passwords, I need to be able to spot data that isn’t real user data. Here is an example of a dump where both the usernames and passwords follow an obvious pattern and are clearly machine-generated. I don’t want that type of data.

My parsing script performs dozens of checks on each username and password but ultimately I have to still manually review data and still don’t catch everything.

As you can see there are many variables that can affect the data and therefore we can’t truly say which passwords users have set for stuff that really matters. Nevertheless, the statistics are consistent and the same passwords show up on the top year after year. What it comes down to is if you are one of those people who are using 123456 or password, please stop.

Use a Password Manager

So what do we do about bad passwords? As I have said before, you need to use a password manager. When I analyzed passwords in my book Perfect Passwords in 2005 I published a list of the Top 500 Passwords. Later in 2011 I published a list of the top 10,000 passwords. Now with the latest analysis in 2014 we can clearly see that list really doesn’t change much over the years. But the threats are increasing much faster than we can keep up.

The only solution is to stop trying to create and remember your own passwords. You just can’t create strong, unique passwords for each account you have and keep it all in your head. You cannot consider yourself secure on the internet unless you are using some tool to manage your passwords. Password managers let you generate strong passwords and manage them in a central location, protected by a single strong password.

SplashData, who worked with my on this analysis is the developer of the password manager SplashID Safe. Other password managers I user or have tested include LastPass, KeePass1Password, and Dashlane. I would recommend any of these products.

And because I know I will be asked, the following articles will be coming soon:

The New Top 10,000 Password List

How I Collect and Process Passwords

 

 

 

Today is Password Day, Go Change Five Passwords Now

passdayI have always been a big fan of password days. While it is always important to regularly change your passwords, there is a specific benefit to changing a large number of passwords all at once. To understand why this is so effective, it is important to understand how hackers work.

Security intrusions are typically the result of a chain of failures. It’s usually not one big mistake that lets the hackers in, it is a series of smaller mistakes that eventually lead to compromise. Furthermore, the intrusion is rarely full admin access with the first exploit, it is an incremental process where each step leads to getting deeper and deeper into the target. The process involves compromising passwords for numerous accounts along the way.

Here’s the problem, changing one password or patching one hole doesn’t rarely locks out the hacker because at that point they probably have collected a good list of passwords and have many ways to get back in if needed. In fact, if someone is at the point where they have access to to several key accounts on several key servers, the chances of locking them out completely are pretty slim. Often the only effective way to lock them out is through a massive undertaking involving patching all holes and changing all account passwords in as small amount of time possible.

91% of all user passwords sampled all appear on the list of just the top 1,000 passwords.

While organizations with many servers or a large number of employees have a huge attack surface that will require a massive effort in the event of an intrusion, individuals users have a little more of an advantage in that we only need to change a dozen or so passwords.

At least that used to be the case. Nowadays, it is not uncommon for a heavy internet users to have hundreds or even over a thousand accounts. Clearly, changing all your accounts in a single day is nearly impossible for many people.

My strategy now is to take one day each month (I do the last Saturday of the month) and go through and change 5-10 passwords. While I’m at it, I go through and check the privacy and security settings for the account and add two-factor authentication if it is available.

So today, being World Password Day, now would be a good time to go and change some of your passwords. Here are a few links for some common web sites to get you started:

eBay
Change Password | Account Settings | Review App Authorizations | Activate Hardware Token

LinkedIn
Change Password | Privacy Settings | Enable Two Factor Auth via SMS

GitHub
Change Password | Enable Two Factor Auth | Review App Authorizations

WordPress
Change Password | Enable Two Factor Auth | Review App Authorizations

DropBox
Change Password & Enable Two Factor Auth 

The single most effective tool users have is a password manager such as LastPass or KeePass. A password manager gives you a list of all your accounts and usually shows the age of each password. Password managers also include password generators to create strong, unique passwords for each site.

The Untapped Power of Security Headers

HTTP HeadersUser-targeted attacks, such as XSS, often involve manipulating the server or browser into behaving in a manner not originally intended. While this has always been a serious risk, new HTML5 technologies enhance browser capabilities enough to make client-side attacks an increasingly inviting attack vector.

For a web developer, protecting the client has traditionally been limited to preventing client-side injection and browser manipulation. Beyond that, there was not much more that could be done other than helping users to make smart security decisions. But with the new features of HTML5 comes new ways to improve client-side security through security headers.

Security headers are HTTP response headers that direct the browser’s behavior to enhance a web site’s security. To prevent attackers from manipulating browser behavior, security headers allow you to be specific about your intentions for content served from your web site.

Some of these header specifications are fairly new or browser-specific and they are not universally implemented across all browsers, but there is enough support that now is the time to start implementing these in some form.

If you aren’t already aware of the various security headers available, I thought I would do a quick summary:

 

Content Security Policy (CSP)

CSP is a least privilege approach to security that uses the Content-Security-Policy, X-Content-Security-Policy, or X-WebKit-CSP headers (depending on your browser) to define exactly which resources the browser is allowed to load for any particular page and from where they can be loaded. So if you want a certain document to only be able to load scripts from a specific server you can now specify that through the CSP header. CSP currently allows you to set restrictions for loading scripts, objects, styles, images, media, frames, and fonts. By default, CSP also blocks eval(), inline scripts, inline CSS styles, and data URIs. It further allows you to provide a reporting URL to help identity possible emerging threats which is especially helpful for DOM-Based XSS attacks that are normally difficult to detect.

At this point, CSP specifications vary between browsers and support in mobile browsers is somewhat limited. On top of that, many web sites use inline code and make use of the eval() function so initially full implementation is likely not possible. Still, even the most basic CSP implementation can cripple a large number of client-side attack vectors and is well worth the effort to implement.

In short, CSP is already a powerful tool for limiting client-side attacks and it looks to be an even more promising technique as the specification evolves.

Here is a demo page that shows how CSP can affect page elements (hint: use your browser’s developer panel to view messages about blocked elements).

Some tips for using CSP:

  • Use SSL to prevent tampering of CSP headers.
  • Set very specific origins for each type of resource rather than using the * wildcard.
  • The browser will fall back to a default source if none is provided so use the default-src directive to set a blanket policy.
  • When using a reporting URL, make sure you take precautions to prevent that from introducing new vulnerabilities.

Other useful CSP Resources:

 

Cross-Origin Resource Sharing (CORS)

CORS doesn’t directly increase security, it actually reduces it. But it allows you to do so in a controlled and granular manner to limit exposure to attack. Normally same-origin policy only allows a script to access resources from the same host. For those times when you need to access resources from other hosts, you can limit access for your specific needs. The CORS-related headers are:

Access-Control-Allow-Origin
Access-Control-Allow-Credentials
Access-Control-Expose-Headers
Access-Control-Max-Age
Access-Control-Allow-Methods
Access-Control-Allow-Headers
Access-Control-Request-Method
Access-Control-Request-Headers
Origin

Tips for using CORS:

  • Never allow GET or OPTIONS requests to modify data.
  • Use the Access-Control-Allow-Origin headers as needed on a per-resource basis, rather than for the entire domain.
  • Only use Access-Control-Allow-Origin: * for publicly-accessible static resources that do not include sensitive information or modify data.

 

HTTP Strict Transport Security (HSTS)

HSTS uses the Strict-Transport-Security header to tell the web browser to only server resources from your site using HTTPS. Once specified, the HSTS header will cause the browser to upgrade all HTTP requests to HTTPS. It also prevents users from overriding invalid certificate errors for your site.

Tips for using HSTS:

  • The browser only needs to see the header once so you generally only need to set it for documents, not for the resources loaded by those documents.
  • Because the browser will only honor HSTS headers delivered via HTTPS, you must still configure your site to redirect all resources to HTTPS so the browser gets the HSTS header.

 

X-Frame-Options

The X-Frame-Options header allows you to direct browsers not to embed a document in a frame or to only embed it from documents of the same origin. The purpose of this is to reduce exposure to clickjacking attacks. Although this header does not guarantee your content will never appear in a frame on all clients, all major web browsers will honor this and browsers are the primary mechanism for clickjacking attacks. Most web sites should implement this at least for documents of the same origin.

The X-Frame-Options header provides for two options: Deny, to block all frame embeds, or SameOrigin, to only allow the page to embed in frames from the same server.

 

IE-Specific Headers

Internet Explorer provides support for additional security headers not found in other web browsers. Although these protections only protect a portion of your user base, even a small decrease of attack surface is worth the effort.

X-Content-Type-Options: NoSniff

One IE-specific feature, the X-Content-Type-Options: nosniff header prevents the browser from improperly identifying MIME types. For example, it will not load a script unless the server specifically identifies it as a script through the Content-Type response header.

X-Download-Options: NoOpen

This header allows a web server to specify that a document should never be loaded in the browser and must be saved locally to view. This helps to prevent the browser from loading scripts and other potentially dangerous resources.

X-XSS-Protection

The X-XSS-Protection header,  allows you to control behavior of the IE XSS Filter which protects from a number of reflective XSS attacks. Normally XSS protection is enabled by default in Internet Explorer, but this header provides  you with two additional options to control this filter.

First, if the filter causes problems with your web site and your site is already well-tested for XSS vulnerabilities, you can disable the filter using this header:

X-XSS-Protection: 0

Second, you can make the filter more aggressive by completely blocking a page when XSS is detected rather than trying to simple filter out the attack. You can enable this with the header:

X-XSS-Protection: 1; mode=block

 

Implementation Strategy

The best thing about most of these security headers is that you generally do not need a full implementation to benefit from some of their protections. Although the ideal solution would be to coordinate security header implement between developers and web administrators, developers can implement many of these headers on a limited basis via code. Even if you are unable to implement security headers right now, it is important for developers to be aware of these features to lay the groundwork for future implementation. Moving away from inline scripts and styles, for example, is a major task that must be undertaken to prepare full CSP deployment. Start making those changes now because these security headers will make eventually play a major role in client security.


The Pathetic Reality of Adobe Password Hints

AdobeThe leak of 150 million Adobe passwords in October this year is perhaps the most epic security leak we have ever seen. It was huge. Not just because of the sheer volume of passwords, but also because it’s such a large dump from a single site, allowing for a much better analysis than earlier sets. But there’s something unique about the Adobe dump that makes it even more insightful–the fact that there are about 44 million password hints included in this dump. Even though we still haven’t decrypted the passwords, the data is extremely useful

One thing I have pondered over the years in analyzing passwords is trying to figure out *what* the password is. I can determine if the password contains a noun or a common name, but I can’t always determine what that noun or name means to the user.

For example, if the password is Fred2000, is that a dog’s name and a date? An uncle and his anniversary? The user’s own name and the year they set up the account? Once we know the significance of a password we gain a huge insight into how users select passwords. But I have never been able to come up with a method to even remotely measure this factor. Then came the Adobe dump.

The sheer amount of data in the Adobe dump makes it a bit overwhelming and somewhat difficult to work with. But if you remove the least common and least useful hints the data becomes a bit more manageable. Using a trimmed down set of about 10 million passwords, I was able to better work with the data to come up with some interesting insights.

Just glancing at the top one hundred hints, several patterns immediately become clear. In fact, what we learn is that a large percentage of the passwords are the name of a person, the name of a pet, the name of a place, or an important date.

Take dates for example. Consider the following list of top date-related hints:

Hint Total Note
birthday 29425
bday 17697
date 15272
birth 14956
DOB 13109
niver 9484 Spanish: Anniversary (short for aniversario)
fecha 8899 Spanish: Date
naissance 7892 French: Birth
anniversary 6959

In all, there are about 420,000 passwords with a date-related hint which represents about 3.6% of the passwords in the working set.

We see similar trends with dog names which account for 375,000 passwords or 3.2% of the total (plus another 120,00 that mention “pet”):

Hint Total Note
dog 70550
dogs name 13559
my dog 9780
dog’s name 8191
dog name 8187
perro 8000 Spanish
hund 7185 German, Danish, Swedish,  Norwegian
first dog 5653
chien 5542 French
doggy 5184

One interesting insight offered here is something we already know but find difficult to measure: password reuse. Surely a large percentage of these users have the same password across multiple sites, but it is interesting to see that about 361,000 users (or 3.11%) state this fact in their password hints:

Hint Total Note
same 44565
password 14634
always 13329
la de siempre 8559 Spanish: as always or the usual
same as always 8289
usual password 5277
same old 5111
siempre 4163 Spanish: always
normal password 3898
my password 3022

Keep in mind that these are just those passwords that admit to reuse in the hint. The number of passwords actually in use across multiple sites certainly is much greater than this.

Looking at the three lists above, we see that nearly 10% of the passwords fall into just these 3 categories. Adding names of people and places will likely account for 10% more.

So what did we learn by analyzing these hints?  First, that you should never use password hints. If users forget their password, they should use the password reset process. Second, that decades of user education has completely failed. No matter how much we advise not to use dates, family names or pet names in your passwords and no matter how much we tell people not to use the same passwords on multiple sites, you people will just do it anyway.

This is why we can’t have nice password policies.

 

 

9 Ways to Restrain the NSA

Keith Alexander

With U.S. Government surveillance being a hot news issue lately, several members of Congress have stepped up and started working on bills to place limits on NSA powers. Although these are admirable attempts, most proposals likely won’t have much affect on NSA operations. So of course I thought I’d propose some points that I think at a minimum any surveillance bill should cover.

1. No backdoors or deliberate weakening of security

The single most damaging aspect of recent NSA revelations is that they have deliberately weakened cryptography and caused companies to bypass their own security measures. If we can’t trust the security of our own products, everything falls apart. Although this has had the side-effect of causing the internet community to fill that void, we still need to trust basic foundations such as crypto algorithms.

Approaching a company to even suggest they weaken security should be a crime.

Related issue: the mass collecting of 0-day exploits. I have mixed feelings on how to limit this, but we at least need limits. The fact is that government law enforcement and military organizations are sitting on tens of thousands of security flaws that put us all at risk. Rather than reporting these flaws to vendors to get them fixed and make us all secure, they set these flaws aside for years waiting for the opportunity to exploit them. There are many real threats we all face out there and it is absurd to think that others can’t discover these same flaws to exploit us. By sitting on 0-days, our own government is treating us all as their personal cyberwar pawns.

2. Create rules for collection as well as searches

We saw how the NSA exploited semantics to get away with gathering personal records and not actually calling it a search. They got away with it once, we should never allow that excuse again. Any new laws should clearly define both searches and collection and have strict laws that apply to both.

3. Clear definition of national security

Since the Patriot Act, law enforcement agencies have stretched and abused the definitions of national security and terrorism so much that almost anything can fall under those terms. National security should only refer to imminent or credible domestic threats from foreign entities. Drug trafficking is not terrorism. Hacking a school computer is not terrorism. Copyright infringement is not terrorism.

4. No open-ended gag orders

Gag orders make sense for ongoing investigations or perhaps to protect techniques used in other investigations but there has to be a limit. Once an investigation is over, there is no valid reason to indefinitely prevent someone from revealing basic facts about court orders. That is, there’s no reason to hide this fact unless your investigations are perhaps stretching the laws.

5. No lying to Congress or the courts

“There is only one way to ensure compliance to laws: strong whistleblower protection. We need insiders to let us know when the NSA or other agencies make a habit of letting the rules slide.”

It’s disconcerting that I would even need to say this, but giving false information to protect classified information should be a crime. The NSA can simply decline to answer certain questions like everyone else does when it comes to sensitive information. Or there’s always the 5th amendment if the answer to a question would implicate them in a crime.

6. Indirect association is not justification

Including direct contacts in surveillance may be justified, but including friends of friends of friends is really pushing it and includes just about everyone. So there’s that.

7. No using loopholes

The NSA is not supposed to be spying on Americans but they can legally spy on other countries. The same goes for other countries, they can spy on the US. If the NSA needs info on Americans, they can just go to their spying partners to bypass any legal restrictions. Spying on Americans must include getting information from spy partners.

And speaking of loopholes, many of the surveillance abuses we have seen recently are due to loopholes or creative interpretation of the laws. Allowing the Government to keep these interpretations secret is setting the system up for abuse. We need transparency for loopholes and creative interpretations.

8. No forcing companies to lie

Again, do I even have to say this? The NSA and FBI will ultimately destroy the credibility of US companies unless the law specifically states that people like Mark Zuckerberg can’t come out and say they don’t give secret access to the US government.

9. Strong whistleblower immunity

We saw how self regulation, court supervision, and congressional oversight has overwhelmingly failed to protect us from law enforcement abuses. There is only one way to ensure compliance to laws: strong whistleblower protection. We need insiders to let us know when the NSA or other agencies make a habit of letting the rules slide.

Whistleblowers need non-governmental and anonymous third party protection. We need to exempt these whistleblowers from prosecution and provide them legal yet powerful alternatives to going public. You’d think that even the NSA would prefer fighting this battle in a court over having to face leaks of highly confidential documents. In fact, I think that the only reason to oppose these laws if you actually have something to hide. The NSA’s fear of transparency should be a blaring alarm that something is horribly wrong.

The NSA thinks that public response has been unfair and will severely limit their ability to protect us. What they don’t seem to understand is the reasons we have these limits in the first place. When the NSA can only focus on foreign threats, they have no interest in domestic law enforcement. Suspicionless spying is incompatible with domestic law enforcement and justice systems.

The greatest concern, however, is the unchecked executive and military power. The fact that there has been so much for Snowden to reveal demonstrates the level of abuse. Unfortunately, the capabilities are already in place so even legal limits are largely superficial and self-enforced. It would be trivial to ignore those laws in a national security emergency.

I cringe at the thought of becoming one of those people warning others to be afraid, but that is why we put limits on the government, so we know we don’t ever have to be afraid. We solve the little problems now so we don’t have to face the big problems later. We understand the need for surveillance, we just need to know when the cameras point at us.

 

 

Fingerprints and Passwords: A Guide for Non-Security Experts

iphoneToday Apple announced that the iPhone 5S will have a fingerprint scanner. Many of us in the security community are highly sceptical of this feature, while others saw this as a smart security move. Then of course there are the journalists who see fingerprints as the ultimate password killer. Clearly there is some disagreement here. I thought I’d lay this out for those of you who need to better understand the implications of using fingerprints vs or in addition to passwords.

Biometrics, like usernames and passwords, are a way to identify and authenticate yourself to a system. We all know that passwords can be weak and difficult to manage, which makes it tempting to call every new authentication product a password killer. But despite their flaws, passwords must always play some role in authentication.

The fact is that while passwords do have their flaws, they also have their strengths. The same is true with biometrics. You can’t just replace passwords with fingerprints and say you’ve solved the problem because you have introduced a few new problems.

To clarify this, below is a table that compares the characteristics of biometrics vs passwords, with check marks where one method has a clear advantage:

Passwords Biometrics
Difficult to remember Don’t have to remember 
Requires unique passwords for each system Can be used on every system 
Nothing else to carry around Nothing else to carry around
Take time to type Easy to swipe/sense 
Prone to typing errors Prone to sensor or algorithm errors
Immune to false positives  Susceptible to false positives
Easy to enroll  Some effort to enroll
Easy to change  Impossible to change
Can be shared among users 1  Cannot be shared 
Can be used without your knowledge Less likely to be used without your knowledge 
Cheap to implement  Requires hardware sensors
Work anywhere including browsers & mobile  Require separate implementation
Mature security practice  Still evolving
Non-proprietary  Proprietary
Susceptible to physical observation Susceptible to public observation
Susceptible to brute force attacks Resistant to brute force attacks 
Can be stored as hashes by untrusted third party  Third party must have access to raw data
Cannot personally identify you  Could identify you in the real world
Allow for multiple accounts  Cannot use to create multiple accounts
Can be forgotten; password dies with a person Susceptible to injuries, aging, and death
Susceptible to replay attacks Susceptible to replay attacks
Susceptible to weak implementations Susceptible to weak implementations
Not universally accessible to everyone Not universally accessible to everyone
Susceptible to poor user security practices Not susceptible to poor practices 
Lacks non-repudiation Moderate non-repudiation 
1 Can be both a strength and a weakness

 

What Does This Tell Us?

As you can see, biometrics clearly are not the best replacement for passwords, which is why so many security experts cringe when every biometrics company in their press releases claim themselves as the ultimate password killer. Biometrics do have some clear advantages over passwords, but they also have numerous disadvantages; they both can be weak and yet each can be strong, depending on the situation. Now the list above is not weighted–certainly some of the items are more important than others–but the point here is that you can’t simply compare passwords to biometrics and say that one is better than the other.

However, one thing you can say is that when you use passwords together with biometrics, you have something that is significantly stronger than either of the two alone. This is because you get the advantages of both techniques and only a few of the disadvantages. For example, we all know that you can’t change your fingerprint if compromised, but pair it with a password and you can change that password. Using these two together is referred to as two-factor authentication: something you know plus something you are.

It’s not clear, however, if the Apple implementation will allow for you to use both a fingerprint and password (or PIN) together.

Now specifically talking about the iPhone’s implementation of a fingerprint sensor, there are some interesting points to note. First, including it on the phone makes up for some of the usual biometric disadvantages such as enrollment, having special hardware sensors, and privacy issues due to only storing that data locally. Another interesting fact is that the phone itself is actually a third factor of authentication: something you possess. When combined with the other two factors it becomes an extremely reliable form of identification for use with other systems. A compromise would require being in physical possession of your phone, having your fingerprint, and knowing your PIN.

Ultimately, the security of the fingerprint scanner largely depends on the implementation, but even if it isn’t perfect, it is better than those millions of phones with no protection at all.

There is the issue of security that some have brought up: is this just a method for the NSA to build a master fingerprint database? Apple’s implementation encrypts and stores fingerprint locally using trusted hardware. Whether this is actually secure remains to be seen, but keep in mind that your fingerprints aren’t really that private: you literally leave them on everything you touch.