CAPTCHA sox, let's fool it!

You know CAPTCHA sox!

It stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” There are various types, such as text-based questions, image-based questions, or even quick puzzles.

The idea is to ask something a machine cannot answer correctly to ensure it’s a human being trying to submit a form and not a spam robot.

Here is a basic CAPTCHA:

CAPTCHA is outdated

It’s a huge pain for users and terrible for accessibility, but, at least, it used to be effective at preventing programs from auto-filling forms. I say “used to be” because it’s not the case anymore.

With the rise of machine learning and computer vision, it’s relatively easy for spammers to pass such tests.

A pretty basic but still powerful approach would be to download hundreds of examples of images used in CAPTCHA, solve them manually, and train a model with the results to let the machine learn how to do it.

There are tons of free, open-source libraries that can generate a CAPTCHA, so getting the material is not the most challenging part. Besides, the code itself is pretty straightforward.

Of course, some CAPTCHAs are way more sophisticated than others, but the most widely used ones are often just letters, or a mix of digits and letters, so they’re even easier to replicate. You only need the alphabet “ABCDEFGHIJKLMNOPQRSTUVWXYZ” and the digits “0123456789”. Then you can generate thousands of images with random combinations.
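Here is a minimal Python sketch of that generation step, under a few assumptions of mine: the function names (random_label, make_labels, render) are made up for illustration, the 5-character length is arbitrary, and the optional rendering relies on the third-party captcha package, which is only one of many libraries you could pick.

```python
import random
import string

# The character set named in the text: uppercase letters plus digits.
ALPHABET = string.ascii_uppercase + string.digits


def random_label(length=5, rng=random):
    """Return one random CAPTCHA text of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def make_labels(count, length=5, seed=None):
    """Return `count` random labels; a seed makes the run reproducible."""
    rng = random.Random(seed)
    return [random_label(length, rng) for _ in range(count)]


def render(label, path):
    """Optional: draw the label as a distorted PNG image.

    Assumes the third-party `captcha` package is installed
    (pip install captcha). It is imported lazily so the rest of the
    sketch runs with the standard library alone.
    """
    from captcha.image import ImageCaptcha

    ImageCaptcha(width=160, height=60).write(label, path)


if __name__ == "__main__":
    for label in make_labels(3, seed=42):
        # Each label doubles as the ground truth for its image.
        print(label)
```

Because you generate the text yourself, every image comes with its answer for free, so there is nothing to solve by hand. With 36 characters and 5 positions there are 36**5 = 60,466,176 possible texts, so random samples rarely repeat.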

It’s a practical exercise you can try if you want to dive into deep learning, but let’s say you’re lazy or don’t have time for this. GitHub literally has hundreds of open-source CAPTCHA solvers.

Note that it’s not limited to our basic example with letters and numbers. Models can map complex images and more complex problems.

The success rate is not 100%, but it’s pretty high (~ 80% on average), including on reCAPTCHA. You can even use CAPTCHA-solving services such as Anti-Captcha, or browser extensions, to remove the hassle.
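To get a feel for what a rate like that means in practice, here is a quick back-of-the-envelope sketch. It assumes every attempt is independent and succeeds with the same probability, which is a simplification of mine and not something a real solver guarantees. The function names are made up for illustration.

```python
def pass_within(p, attempts):
    """Probability of at least one success in `attempts` independent tries."""
    return 1 - (1 - p) ** attempts


def expected_attempts(p):
    """Mean number of tries until the first success (geometric distribution)."""
    return 1 / p


if __name__ == "__main__":
    p = 0.8  # the rough per-attempt success rate quoted above
    for n in (1, 2, 3):
        print(n, round(pass_within(p, n), 3))
    print(round(expected_attempts(p), 2))
```

Under those assumptions, a solver that is right 80% of the time gets through within two tries about 96% of the time, and needs 1.25 attempts on average.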

Why fool CAPTCHA and why you should fail sometimes

You may wonder why the heck we would need to bypass such protection, but it’s not always illegal. While spammers and mass mailers love such techniques, there can be legitimate reasons to use a CAPTCHA solver, for example:

  • you’re not a spammer, and you hate CAPTCHA
  • you have no time for poorly implemented CAPTCHAs that fail two times out of three without reason
  • you mask your real IP with a VPN, and many online services, such as Google Search, block these IPs systematically

CAPTCHA solutions also face too many constraints. Because of cultural shifts and other significant differences from one country to another, the tests must work across cultures and languages, which is complex to implement.

Besides, it’s not that robots are so good, but rather that humans just suck at solving CAPTCHAs. Big companies such as Amazon, Google, or Facebook are making it worse, IMHO, because their CAPTCHAs are getting more and more sophisticated, to the point that they sometimes seem impossible to solve on purpose, like a reverse way to determine whether you are human or not.

Indeed, such CAPTCHA is supposed to be solved by robots only 😈.

See Also