Chapter 5 · Part 3

The arms race

A CAPTCHA's security, we said in Chapter 2, comes entirely from machines being bad at something. That's a dangerous foundation, because machine vision did not stay bad. The 2010s deep-learning revolution — the CNNs from our image course — got astonishingly good at exactly the tasks CAPTCHAs relied on: reading warped text and recognizing objects in photos.

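So the gap that made CAPTCHAs work started to close, and from both directions: machines got better at the puzzles, and the harder puzzles built to stop them got worse for people.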

Early on, only humans could solve the puzzles — a wide, useful gap.

Machines got good at the test

The milestones piled up fast:

By the mid-2010s, neural networks could solve distorted-text CAPTCHAs with very high accuracy — by some measures better than humans. Distorted text was effectively dead as a defense.
Image recognition is the home turf of CNNs. "Find the traffic lights" is a textbook object-detection task, the kind of thing self-driving perception models do for a living.
Attackers didn't even need their own models. They could call cheap vision APIs, or route puzzles to human CAPTCHA-solving farms for a fraction of a cent each.

When the machine's ability to solve a puzzle approaches the human's, the test stops discriminating — passing it no longer says much about whether you're a person.
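
One way to make "stops discriminating" concrete is Bayes' rule: how likely is it that a visitor is human, given that they passed? The sketch below is only an illustration. The function name, the pass rates, and the 50/50 mix of humans and bots are all assumptions chosen for the example, not measurements.

```python
def prob_human_given_pass(human_pass_rate, bot_pass_rate, human_share=0.5):
    """Bayes' rule: P(human | passed the puzzle).

    All inputs are illustrative assumptions, not measured values.
    """
    passed_humans = human_pass_rate * human_share
    passed_bots = bot_pass_rate * (1.0 - human_share)
    return passed_humans / (passed_humans + passed_bots)


# Hypothetical scenarios: humans pass 90% of the time in both.
wide_gap = prob_human_given_pass(human_pass_rate=0.9, bot_pass_rate=0.01)
closed_gap = prob_human_given_pass(human_pass_rate=0.9, bot_pass_rate=0.9)

print(f"bots pass  1%: P(human | pass) = {wide_gap:.3f}")
print(f"bots pass 90%: P(human | pass) = {closed_gap:.3f}")
```

With these made-up numbers, a pass is strong evidence of a human while bots rarely succeed. Once the two pass rates are equal, the answer falls back to the assumed starting share of humans, so the puzzle has added no information at all.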

A test that can't rely on the puzzle

This is a genuine dead end for the old approach. You can keep inventing harder visual puzzles, but (a) harder puzzles annoy and exclude real humans, and (b) the next model learns to solve them anyway — often faster than people. Making the challenge harder mostly hurts the humans you're trying to admit.
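
The trade-off in (a) and (b) can be sketched with a quick tally. Everything here is hypothetical: the function name, the traffic of 1,000 humans and 1,000 bots, and the pass rates for the "easy" and "hard" puzzle are invented purely to show the shape of the problem.

```python
def gate_outcomes(humans, bots, human_pass_rate, bot_pass_rate):
    """Expected number of humans wrongly rejected and bots wrongly admitted."""
    humans_rejected = humans * (1.0 - human_pass_rate)
    bots_admitted = bots * bot_pass_rate
    return humans_rejected, bots_admitted


# Hypothetical pass rates: the harder puzzle costs humans far more than bots.
easy = gate_outcomes(1000, 1000, human_pass_rate=0.95, bot_pass_rate=0.90)
hard = gate_outcomes(1000, 1000, human_pass_rate=0.70, bot_pass_rate=0.85)

for name, (rejected, admitted) in [("easy", easy), ("hard", hard)]:
    print(f"{name}: {rejected:.0f} humans rejected, {admitted:.0f} bots admitted")
```

Under these assumed rates, switching to the hard puzzle keeps out about 50 more bots while turning away about 250 more real people. The real numbers will differ, but that lopsided exchange is the pattern the paragraph above describes.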

So the whole field had to change the question. If solving the puzzle no longer proves much, maybe you can learn from how someone behaves on the way to it, long before any puzzle appears. That shift is why you so rarely see a grid anymore. Finally: the invisible CAPTCHA.