What is CAPTCHA, and how does it work?

November 17, 2022

Author: Stytch Team

CAPTCHA flows have long provided an essential security tool for developers as they work to protect their platforms and users from cyber attacks. Specifically, modern CAPTCHAs can help apps and websites tell the difference between the good, human users they want to let in and the bad, non-human users (namely, bots) they want to keep out.

In this guide, we explain exactly what CAPTCHA is, how it works, and how modern authentication providers are adapting the CAPTCHA system to keep up with evolving cyber threats — and stop malevolent bots in their tracks.

What is a CAPTCHA?

CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. At the most basic level, a CAPTCHA is a computer-generated security flow meant to distinguish real users from automated web traffic like bots. It’s a form of challenge-response authentication, where a user must answer a question or solve a puzzle in order to be verified and gain access to an online account.

The “Turing” in CAPTCHA refers to Alan Turing, a renowned computer scientist and mathematician who, among other achievements, devised tests that could differentiate between the behavior of human beings and that of artificial intelligence (AI). CAPTCHA challenges are based on this work. They’re designed to elicit the kinds of sensory and cognitive responses that non-human bots could never reproduce.

How does CAPTCHA technology work?

Bots are programmed to carry out specific, repetitive tasks within a narrowly defined scope. For example, they might be set up to identify common customer questions and provide automated answers in an ecommerce chatbox or to comb the internet for the lowest price on a given product.

Simply put, bots cannot “think” or “feel” in any human sense. While AI has come a long way, resulting in sophisticated bots that can process complex images and sounds, those crawling the internet with malicious intent generally aren’t built with these abilities. Their pre-programmed nature prevents them from being able to spontaneously react to or reason through the sorts of challenges presented in a CAPTCHA flow.

In short, a CAPTCHA works by presenting these bots with a test or challenge that human users can solve, but bots cannot.

Let’s take a closer look at what such challenges involve.

Types of CAPTCHAs

CAPTCHA tests can take many different forms. Some of the most common challenges include:

  1. Text-based CAPTCHAs: Users are asked to type a word or a series of letters presented to them as distorted text. This distorted text may include letters and numbers written in a squiggly or strikethrough font or displayed against a grainy background. The idea is that web-crawling bots cannot “read” or recognize these figures among the visual noise.
  2. Image-recognition CAPTCHAs: Users are presented with an array of images and asked to pick out familiar objects like fire hydrants, crosswalks, or trains. Since automated bots cannot understand the concept of a crosswalk or train or what it references in the real world, they are unable to match the prompt to related pictures.
  3. Audio-based CAPTCHAs: Users type in a word, sequence, or phrase spoken by a muddled or distorted voice. The idea is that crawler bots cannot “listen” to distinguish between and interpret competing sounds. This can also be a good alternative CAPTCHA test for visually impaired users.
  4. Question-based CAPTCHAs: Users must solve a simple math or word problem that is intended to test comprehension, rather than outright intelligence.

Why these challenges work

While an automated, web-crawling bot could theoretically be engineered to exhibit or mimic these capabilities, it would be an incredibly difficult, time-consuming, and expensive undertaking for a hacker. What’s more, a CAPTCHA test cannot be reused once it’s solved. That means, even if a bot does correctly decipher a CAPTCHA puzzle, it has to repeat the process thousands of times to be effective, neutralizing the speed and automation that make bots so attractive in the first place.
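To make the single-use property concrete, here is a minimal sketch of a question-based challenge that is consumed on the first verification attempt. The class name ChallengeStore, its methods, and the in-memory dictionary are all assumptions made for illustration; a production system would also need expiry and shared storage.

```python
import random
import secrets


class ChallengeStore:
    """In-memory store of one-time arithmetic challenges (illustrative only)."""

    def __init__(self):
        self._answers = {}  # challenge_id -> expected answer as a string

    def issue(self):
        """Create a new challenge and return (challenge_id, question)."""
        a, b = random.randint(1, 9), random.randint(1, 9)
        challenge_id = secrets.token_urlsafe(16)  # unguessable identifier
        self._answers[challenge_id] = str(a + b)
        return challenge_id, f"What is {a} + {b}?"

    def verify(self, challenge_id, answer):
        """Check an answer. The challenge is consumed whether or not it is correct."""
        expected = self._answers.pop(challenge_id, None)
        if expected is None:
            return False  # unknown or already-used challenge
        return expected == answer.strip()
```

Because verify removes the challenge before comparing, replaying a previously correct answer fails, so each new attempt requires solving a fresh challenge.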

CAPTCHAs have been around for a while, first used and distributed commercially in the early 2000s. Since then, they’ve undergone a few adjustments and advancements — most notably, in the form of reCAPTCHAs.

What is a reCAPTCHA (or Google reCAPTCHA)?

Computer scientist Luis von Ahn, part of the original team that invented the CAPTCHA system, grew increasingly concerned with how much time internet users were spending on CAPTCHA puzzles. In response, he developed reCAPTCHA, which was ultimately sold to Google in 2009.

At first, reCAPTCHA tests were intended to make CAPTCHA efforts more productive — for example, to assist with the digitization of books and other printed texts. To that end, instead of the random words and characters used in text-based traditional CAPTCHAs, reCAPTCHAs use actual printed words and phrases that book-scanning computers are having trouble recognizing.

In a reCAPTCHA, a user types in these words just as they would in a CAPTCHA challenge, simultaneously verifying their humanity and identifying the letters in an unidentified word. Typically, a reCAPTCHA features two words side by side, one known and one unknown to the system. The known word serves as a screen to ensure accuracy, while the unknown word is passed to many different users to be solved — so they’re ultimately working together to confirm its identity.
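The two-word scheme described above can be sketched roughly as follows. This is an illustration of the idea only, not Google's implementation: the class name, the min_votes threshold, and the case-insensitive comparison are assumptions.

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Pair a known control word with an unknown scanned word (illustrative only)."""

    def __init__(self, min_votes=3):
        self.min_votes = min_votes  # assumed agreement threshold
        self.votes = defaultdict(Counter)  # unknown_id -> Counter of transcriptions

    def submit(self, control_expected, control_typed, unknown_id, unknown_typed):
        """Return True if the user passes; record their reading of the unknown word."""
        if control_typed.strip().lower() != control_expected.lower():
            return False  # failed the known word, so the other answer is discarded
        self.votes[unknown_id][unknown_typed.strip().lower()] += 1
        return True

    def consensus(self, unknown_id):
        """Return the agreed transcription once enough matching votes exist, else None."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.min_votes else None
```

The control word gates the submission, and only answers from users who passed it count toward the transcription of the unknown word.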

Over time, however, and with advances in optical character recognition (OCR) technology, reCAPTCHAs have become more focused on limiting levels of friction in the user experience.

For instance, Google rolled out what it calls “No CAPTCHA reCAPTCHA,” where users can prove they’re human without having to solve a CAPTCHA puzzle at all. The most common version involves a checkbox that allows users to confirm they’re not a robot with a single click.

Behind the scenes, the checkbox flow is actually analyzing the user’s IP address, plugins, browser history, behaviors (including typing speed, mouse clicks, and scrolls), and more to confirm that they’re not a bot.

More recently, Google further streamlined the verification process by removing challenges and checkboxes altogether. With invisible reCAPTCHAs, legitimate users are able to access their accounts without any disruption, while suspicious users and bots will still face a CAPTCHA or reCAPTCHA test.
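As a rough illustration of this kind of risk-based gating, the sketch below scores a request from a few behavioral signals and only challenges requests that look suspicious. Google does not publish how it weighs its signals, so the Signals fields, the point values, and the threshold here are entirely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical signals collected for one request."""
    ip_on_denylist: bool
    typing_interval_ms: float  # average gap between keystrokes
    mouse_events: int          # pointer events seen before submission


def risk_score(s):
    """Return a score from 0 to 100; higher means more bot-like (made-up weights)."""
    score = 0
    if s.ip_on_denylist:
        score += 50
    if s.typing_interval_ms < 20:  # faster than a person plausibly types
        score += 30
    if s.mouse_events == 0:        # no pointer activity at all
        score += 20
    return score


def decide(s, threshold=50):
    """Let low-risk traffic through silently and challenge the rest."""
    return "challenge" if risk_score(s) >= threshold else "allow"
```

For example, decide(Signals(False, 180.0, 42)) returns "allow", while a request with a denylisted IP, instant typing, and no mouse activity is sent to a challenge.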

Why are CAPTCHAs important?

Today, humans make up less than 40% of all internet traffic. The remaining 60%+ is bots. And while bots can be used benevolently — as in the ecommerce examples above — there are also malicious bots used by hackers to automate and amplify their cyber attacks. In fact, some studies estimate that more than half of the bots currently working online (representing about 39% of all web activity) are employed maliciously.

This can have disastrous consequences for individuals and businesses alike. For example, hackers can use bots to test stolen username and password pairs across many sites and take over users’ accounts, including bank accounts, in credential stuffing attacks. Hackers can also mobilize bots to, say, buy out concert tickets and resell them at inflated prices on the secondary market.

Given the prevalence of bots and the risks they can pose to legitimate users and organizations, it’s not surprising that users are often asked to prove their personhood when they want to access sensitive online accounts.

The problems with CAPTCHA

On the surface, CAPTCHA tools have a lot going for them. They’re relatively simple for users to solve, and they’re easily scalable. For sites and apps that are processing thousands (if not millions) of authentication attempts, computer-generated CAPTCHA tests offer a quick and affordable way for developers to weed out bots from user traffic.

But the CAPTCHA system also has its downsides. First of all, since it requires users to complete a specific action, it introduces substantial friction to sign-up and login flows, potentially driving heightened rates of dropoff and churn.

More concerningly, CAPTCHA and reCAPTCHA challenges aren’t foolproof from a security standpoint. For one thing, it seems AI has been able to solve basic CAPTCHA tests for years. What’s more, an entire cottage industry of human-based fraud has emerged that enables hackers (and their bots) to clear CAPTCHA puzzles and breach users’ accounts. We’ll cover that next.

Understanding CAPTCHA fraud

Unfortunately, hackers have found a convenient and reliable loophole in the CAPTCHA system. It stems from a critical design flaw in how these systems handle the public site key, the identifier embedded in a page to tie a CAPTCHA challenge to a specific site.

Basically, every major CAPTCHA system exposes its public key in the webpage source code. That means bots (and the hackers behind them) can easily scrape the site key value and submit it to a third party to be solved.
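To see how little effort this exposure requires, the sketch below reads the site key out of a page’s HTML using only Python’s standard library. It assumes the widget follows the reCAPTCHA checkbox markup, where the key sits in a data-sitekey attribute; the sample page and the EXAMPLE_SITE_KEY value are made up.

```python
from html.parser import HTMLParser


class SiteKeyFinder(HTMLParser):
    """Collect every data-sitekey attribute found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.site_keys = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.site_keys.append(value)


def find_site_keys(html):
    """Return all site keys visible in the given page source."""
    finder = SiteKeyFinder()
    finder.feed(html)
    return finder.site_keys


# Made-up login page containing a CAPTCHA widget placeholder.
page = """
<form action="/login" method="post">
  <input name="email">
  <div class="g-recaptcha" data-sitekey="EXAMPLE_SITE_KEY"></div>
</form>
"""
print(find_site_keys(page))  # prints ['EXAMPLE_SITE_KEY']
```

Anything a browser can read from the page source, a script can read too, which is why the site key cannot be treated as a secret.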

Dozens, if not hundreds or thousands, of CAPTCHA-solving-as-a-service companies — also known as CAPTCHA farms — employ low-paid workers around the world to manually solve CAPTCHA challenges one by one. Ultimately, they supply bots and hackers (their clients) with all the information they need to bypass the CAPTCHA system and gain access to users’ accounts.

These fraudulent tactics work because CAPTCHAs can be solved remotely. In other words, a CAPTCHA flow doesn’t require whoever is solving the puzzle to be working in the same browser that’s used to submit the solution.

Worst of all, CAPTCHA farm sites look and feel a lot like legitimate SaaS platforms, so it can be tough to tell the difference. In fact, they operate similarly to SaaS platforms using specialized APIs.

Let’s dive deeper into how CAPTCHA fraud typically plays out:

1. A bot scrapes the public site key from a CAPTCHA challenge and sends it to a CAPTCHA farm.

A CAPTCHA farm really only needs two parameters to solve a CAPTCHA challenge: the related web page’s URL and its public site key value. Since CAPTCHA systems publicly expose their site keys in the source code, it’s easy for bots to scrape the value and submit it along with the URL to a CAPTCHA farm’s API to kick off an account breach.

2. A human worker at the CAPTCHA farm solves the CAPTCHA challenge.

Any worker at the CAPTCHA farm can then use the URL and site key value sent by the bot to load and solve the CAPTCHA puzzle. From there, they receive a response code (in Google reCAPTCHA, the g-recaptcha-response value), which appears as a long token string. This lets their bot/hacker client know that the CAPTCHA challenge has been solved.

3. The bot uses the CAPTCHA response code to solve the puzzle and breach the user’s account.

Finally, the CAPTCHA farm sends the g-recaptcha-response token to the bot, which submits it in place of solving the CAPTCHA challenge itself. Since the bot has a code representing the correct solution, the CAPTCHA system recognizes it as human and grants it access.
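The last step works because of how the site’s backend checks the token. Below is a minimal sketch of that server-side check, assuming Google reCAPTCHA’s siteverify endpoint with its secret and response parameters; the helper names and the hostname comparison are illustrative choices, not a complete integration.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, token):
    """Build the POST request that asks the CAPTCHA provider to validate a token."""
    data = urllib.parse.urlencode({"secret": secret_key, "response": token}).encode()
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_token_accepted(response_body, expected_hostname):
    """Interpret the JSON reply: it must report success for the expected site."""
    result = json.loads(response_body)
    return bool(result.get("success")) and result.get("hostname") == expected_hostname


def verify(secret_key, token, expected_hostname):
    """Send the request and return whether the token is accepted (needs network access)."""
    request = build_verify_request(secret_key, token)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return is_token_accepted(resp.read(), expected_hostname)
```

Note what the reply does not contain: any indication of who solved the challenge or in which browser. A token obtained through a CAPTCHA farm passes this check the same way a token from a legitimate user does.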

How can developers fight CAPTCHA fraud?

With such vulnerable public-key architecture at its core, many developers wonder how they can improve on the CAPTCHA system to outsmart hackers, bots, and CAPTCHA farms and bolster their app’s security.

Some companies — like the crypto exchange platform Binance — have taken on substantial costs and workloads to avoid traditional CAPTCHA design flaws by building their own, customized CAPTCHA flows in-house.

But thanks to modern authentication strategies, development teams don’t have to invest nearly that amount of time, effort, and resources to fortify their flow and strike an effective balance between security, efficiency, and scalability. In fact, the best auth providers on the market today remove CAPTCHA security loopholes through easy-to-use, turnkey products.

Introducing Strong CAPTCHA

Solutions like Stytch’s Strong CAPTCHA puzzles eliminate the public-key problem by removing the public key altogether from client-side browser environments — leaving bots with nothing to scrape and send to a CAPTCHA farm.

Strong CAPTCHA is image-based and can be added via API or SDK to any auth flow in a matter of hours, and it works with any web-based or native mobile platform. Meanwhile, the end user’s journey and experience remain the same. Users encounter the same simple CAPTCHA puzzles they’re already familiar with, just without the standard-issue risks and vulnerabilities.

The bottom line

Conventional CAPTCHA flows can play a fundamental role in app security and form part of a robust authentication strategy — but they’re far from bulletproof. Not only have bots learned to solve basic CAPTCHA challenges outright, but proliferating CAPTCHA farms have also established viable, human-based fraud vectors that can outmaneuver the CAPTCHA system.

That’s where innovative solutions like Stytch’s Strong CAPTCHA come in. With such tech-forward auth tools, developers can avoid the design and security flaws that have historically put CAPTCHAs at risk — and cut hackers and fraudsters off at the roots.
