Filling out a web form without also having to pass a CAPTCHA test nowadays is pretty rare. CAPTCHAs weren’t really that annoying to me when they were more of a rare occurrence but I have been finding myself more and more bothered with them lately, especially because my success rate in entering the correct letters seems to be around 75%. There are some CAPTCHAs I have encountered lately that take me several tries to get right. And when I get annoyed at some security measure my first thought is to try to break it.

It turns out that while programmers have spent a lot of time making their CAPTCHAs visually complex, they have largely failed on the core security of these things. I think part of the problem is that web developers in general have a hard time correctly enforcing session state. I see the same mistakes being made over and over, so I thought I would take a few CAPTCHA implementations and analyze them.

But before I do that, let’s step back a bit and remind ourselves why we use CAPTCHAs. They basically protect a web site from automated signups, submissions, or scraping of data. But by protecting themselves, sites also protect the community as a whole, making it harder for people like spammers to exploit the rest of us.

When you build a CAPTCHA you need to take a string of characters, create a visually obscured picture of those characters, then have some way to verify that the user typed in the correct string. It sounds pretty easy but it is actually hard to do right.

When you pick a string of characters—the CAPTCHA code—you need to make sure it is unique for that one user session. Somehow you need to track that code on both the client and the server side, and you need to make sure the end user can’t do anything to influence what code you pick.
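As a minimal sketch of those requirements, assuming an in-memory store and Python's standard `secrets` module: the server picks the code, keys it to a session ID it issued itself, and throws the code away after one attempt. The names `new_session`, `issue_captcha`, and `verify_captcha` are hypothetical and do not come from any site discussed here.

```python
import secrets
import string

# Hypothetical in-memory store: server-issued session ID -> expected code.
_pending: dict[str, str] = {}


def new_session() -> str:
    """Issue a session ID the client cannot predict or choose."""
    return secrets.token_urlsafe(32)


def issue_captcha(session_id: str, length: int = 6) -> str:
    """Pick a fresh random code for this session and remember it server-side.

    The returned code is what the server would render into the image;
    it is never sent to the client as text.
    """
    code = "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))
    _pending[session_id] = code
    return code


def verify_captcha(session_id: str, answer: str) -> bool:
    """Check the answer once; the code is discarded whether or not it matched."""
    expected = _pending.pop(session_id, None)
    if expected is None:
        return False
    # Compare as bytes so non-ASCII input cannot raise inside compare_digest.
    return secrets.compare_digest(
        expected.encode("utf-8"), answer.strip().upper().encode("utf-8")
    )
```

Because the code is popped on every attempt, a second submission for the same session fails until the server issues a new image. Nothing the client sends decides which code is expected.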

Reddit.com

On reddit.com’s submit page, they build a string of six capital letters to use in their CAPTCHA. I actually like the reddit.com CAPTCHA because it’s just plain easier to read–I get it right 100% of the time. I doubt this CAPTCHA would stand up against an OCR attack, but on the human side it really works well.

What’s interesting is how they chose to implement the CAPTCHA. Looking at the HTML source, I see that the CAPTCHA image points to a static URL with a great (and telling) alt attribute:

<img alt="i wonder if these things even work" src="/captcha/HVXqqL4IRRTCmi1aiyRoWp1phn4wc5T9.png" />

What’s interesting about this is that if I browse to that file http://reddit.com/captcha/HVXqqL4IRRTCmi1aiyRoWp1phn4wc5T9.png in another browser or even on another pc, I see the exact same letters. This seems to be true for at least a few hours after initially viewing the image.

Now playing around, I found that I can browse to any filename in that directory, such as http://reddit.com/captcha/doesthisthingwork.png and still see a CAPTCHA with six upper-case letters. When I open the image on another system, I get the same letters, just warped differently. This all tells me that they dynamically generate these images based on the png filename. Since those letters are the same when viewed in any browser anywhere in the world, the CAPTCHA is not based on anything unique to a user’s session. That means that the same CAPTCHA image URL will always match the same string of letters.
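To illustrate the behavior described above, here is a hedged sketch of a deterministic mapping from a filename to six letters. It is not reddit.com's actual method, which is unknown; SHA-256 is only a stand-in, and `letters_for` is a hypothetical name.

```python
import hashlib
import string


def letters_for(filename: str, length: int = 6) -> str:
    """Map a filename to uppercase letters, identically on every request.

    Hypothetical: SHA-256 stands in for whatever derivation a site might use.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return "".join(string.ascii_uppercase[b % 26] for b in digest[:length])


# Any filename yields a code, and the same filename always yields the same one,
# no matter which browser, machine, or session asks for it.
print(letters_for("doesthisthingwork.png"))
print(letters_for("doesthisthingwork.png"))
```

The warping can differ from request to request, but if the letters depend only on the filename, nothing ties them to a particular user.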

This means two things:

1. Someone could build a database of the 308,915,776 (26 upper case letters to the power of 6) valid reddit.com CAPTCHAs and then use that to automate bypassing the CAPTCHA.
2. Someone could cryptanalyze the png filename and probably figure out the method they use to derive the string. Just looking at it, I see it is the same length as the hex representation of an MD5 sum, which was probably then encoded with a base-64-like algorithm. Having an abundant supply of plaintext certainly makes this easier.
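The two numbers behind those points can be checked with a few lines of arithmetic. This only confirms the keyspace size and the length comparison; it says nothing about how the filename is actually derived.

```python
import hashlib

ALPHABET_SIZE = 26  # upper-case letters only
CODE_LENGTH = 6

# Number of distinct six-letter codes.
keyspace = ALPHABET_SIZE ** CODE_LENGTH
print(f"{keyspace:,}")  # 308,915,776

# The filename from the image URL, without the .png extension.
filename_stem = "HVXqqL4IRRTCmi1aiyRoWp1phn4wc5T9"

# An MD5 digest is 128 bits, which is 32 hexadecimal characters.
md5_hex_length = len(hashlib.md5(b"").hexdigest())
print(len(filename_stem), md5_hex_length)  # 32 32
```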

The flaw with this CAPTCHA is that with a fixed URL, anyone could easily build a history of or pre-calculate all possible letter combinations. Would someone even want to pre-calculate the entire keyspace of reddit.com CAPTCHAs? Probably not. But that’s not the point. The point is that there are CAPTCHA implementations out there where someone certainly would be willing to go through that work to compromise it. Especially if it’s a commercial CAPTCHA implementation that many sites use.

OK, It Gets Worse

If you look a bit closer, you can see that you don’t even need to brute force the entire key space. In the source code for reddit.com’s submit page you see this:

<input name="iden" type="hidden" value="HVXqqL4IRRTCmi1aiyRoWp1phn4wc5T9"></input>

So you see, they know you entered the correct code because when you submit the form, it also tells the server which image it displayed to you. To exploit this, all you need to do is make a simple script that changes the client-side code to always submit the same value for “iden” every time—a CAPTCHA you already know the answer to.
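The replay can be simulated without touching any real site. In this sketch the answer table, the `captcha` field name, and both function names are assumptions made for illustration; only the `iden` field name comes from the page source above.

```python
# Hypothetical server-side table: image identifier -> letters in that image.
ANSWERS = {"known-iden": "ABCDEF", "other-iden": "QWERTY"}


def vulnerable_verify(form: dict[str, str]) -> bool:
    """Trust the client-supplied 'iden' field to say which CAPTCHA was shown."""
    expected = ANSWERS.get(form.get("iden", ""))
    return expected is not None and form.get("captcha", "").upper() == expected


def replayed_form() -> dict[str, str]:
    """What a script would submit: always the one identifier it already solved."""
    return {"iden": "known-iden", "captcha": "ABCDEF"}


# Every automated submission passes, because the client chooses the question.
print(all(vulnerable_verify(replayed_form()) for _ in range(1000)))  # True
```

The check itself is correct for the identifier it is given. The flaw is that the identifier arrives from the client and can be reused indefinitely.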

Now I certainly am not singling out reddit.com. In fact, it was harder to find a web site that did CAPTCHAs correctly than one that did them wrong. This is by no means a vulnerability report for reddit.com’s service. It’s a vulnerability report for a large portion of the web.

Look at the similarity of these CAPTCHA URLs:

https://www.paypal.com/cgi-bin/gs_web/dP9XQHKcLmhqkDmyQSfhkr7tq-XGgeyp1sqI6abRxREJjUswqcgzQMR0B1pKiZ0v63ZxAg/secret.jpeg

http://images.slashdot.org/hc/21/7cec28b0d39a.jpg

The way to test for this problem—a fixed CAPTCHA URL—is to copy a CAPTCHA URL and view it by itself in your browser. Hit refresh a few times and see if it changes. Then open up the same URL in another browser or on another computer. Do you get the same exact image? If so, that CAPTCHA is probably vulnerable to this type of attack.
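Part of that manual test can be scripted. The sketch below, with hypothetical function names, fetches a URL several times without cookies and compares the bytes. It is only a rough first check: identical bytes suggest a fixed image, but differing bytes prove nothing, since the same letters can be warped differently on each request, so a human still has to read the images.

```python
import hashlib
import urllib.request
from typing import Callable


def fetch_without_cookies(url: str) -> bytes:
    """Fetch a URL with the default opener, which sends no cookies."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def serves_identical_image(
    fetch: Callable[[str], bytes], url: str, tries: int = 3
) -> bool:
    """Return True if every fetch of the URL yields byte-identical content."""
    digests = {hashlib.sha256(fetch(url)).hexdigest() for _ in range(tries)}
    return len(digests) == 1
```

A call would look like `serves_identical_image(fetch_without_cookies, captcha_url)`, where `captcha_url` is the image address copied from the page you are testing.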

Even if you try the image in another browser and get a different result, the thing still might be vulnerable. Many CAPTCHAs use the session ID to determine the CAPTCHA string. In other words, they save the session ID and match it to a CAPTCHA value. Anyone who authenticates with that session ID will need to know the correct value. However, most web servers allow you to supply your own session ID (i.e., session fixation) which means that someone could make a script to always hit the same URL with the same session ID.
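A toy model makes the difference concrete. `SessionStore` is a hypothetical class, not any real web server's API; the `strict` flag stands in for whether the server adopts session IDs it never issued.

```python
import secrets
import string
from typing import Optional


class SessionStore:
    """Toy session store; 'strict' controls whether unknown IDs are adopted."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict
        self.captcha_by_session: dict[str, str] = {}

    def resolve(self, client_supplied_id: Optional[str]) -> str:
        """Return the session ID the server will use for this request."""
        if client_supplied_id in self.captcha_by_session:
            return client_supplied_id
        if client_supplied_id and not self.strict:
            # Permissive behaviour: adopt whatever ID the client sent.
            session_id = client_supplied_id
        else:
            # Strict behaviour: ignore it and mint an unpredictable ID.
            session_id = secrets.token_urlsafe(32)
        self.captcha_by_session[session_id] = "".join(
            secrets.choice(string.ascii_uppercase) for _ in range(6)
        )
        return session_id
```

With `strict=False`, a script that always sends the same ID always lands on the same stored code. Rejecting unknown IDs is only part of the fix, since a server-issued ID can be reused too; the stored code should also be discarded after a single attempt.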

For a CAPTCHA to be effective, you really don’t need to focus so much on how mangled your letters are; focus instead on building a web application that implements strong tokens, smart user binding, and overall effective session management.

Session management. I could write a whole book chapter on that.



http://xato.net/bl/2007/08/21/these-captchas-are-just-not-working-out/



11 Responses to “These CAPTCHAs are just not working out”

  1. […] two on what not to do with a CAPTCHA In my previous post on CAPTCHAs I mentioned that “…you need to make sure the end user can’t do anything to influence […]

  2. DrWiz on 24 Aug 2007 at 8:31 pm

    How about checking if the user actually used the keyboard? Won’t it work?

  3. Tim on 29 Aug 2007 at 2:25 am

    Actually you would not require a database as big as 308,915,776. Take this scenario: you would only really require a database of one CAPTCHA, but a maximum of 308,915,776 page refreshes until that one image has been ‘reached’. If you have a database with 2 randomly chosen CAPTCHAs, then you would only need approximately a maximum of half of 308,915,776 page refreshes, since the CAPTCHAs are displayed randomly. And so on.

  4. mb on 29 Aug 2007 at 7:28 am

    Excellent point, thanks.

  5. […] These CAPTCHAs are just not working out […]

  6. MustLive on 12 Oct 2007 at 8:43 am

    Nice article

  7. MustLive on 17 Oct 2007 at 5:39 am

    Captcha bypass test.

  8. MustLive on 17 Oct 2007 at 5:40 am

    Captcha bypass test 2.

  9. MustLive on 17 Oct 2007 at 5:40 am

    Captcha bypass test 3.

  10. MustLive on 17 Oct 2007 at 5:51 am

    Mark!

    Your captcha is vulnerable (as you can see from my Captcha bypass tests). Which is very ironical, because you wrote about captchas security in this article. So you need to find more secure captcha for yourself (without those weaknesses which you mentioned in this post). I’ll write you an email about this hole.

    This captcha (plugin) will be in my Month of Bugs in Captchas. The official announcement of my new project will be very soon.

  11. mb on 30 Nov 2007 at 9:37 am

    a
