Chapter 13 · Part 5

Where this shows up

We just built the whole machine: pixels become tensors, convolutions extract features, a network learns which features matter, and gradient descent tunes the weights. It's tempting to file that under "neat theory." But this exact pipeline — pixels → features → prediction — is quietly running all around you, right now, doing real work in hospitals, phones, factories and cars.

This closing chapter is a tour. The remarkable part isn't any single app; it's that they're all the same idea from the last twelve chapters, pointed at a different problem.

Here is where computer vision actually earns its keep.

Healthcare: a CNN scans X-rays, MRIs and retina photos, flagging possible tumors and signs of disease, sometimes earlier than a human reader would spot them.

The same pipeline, a different job

Look past the surface and every one of these is the staircase you climbed:

  • The input is pixels — a scan, a face, a frame from a camera, a satellite tile, a photo of a page.
  • Convolutions turn those pixels into features — edges, textures, parts, whole objects.
  • A final layer maps those features to whatever the task needs: a label ("tumor" / "healthy"), a box around each car, a match score, a line of text.

Swapping the task mostly means swapping the labels you train on and the shape of that last layer. The seeing machinery underneath barely changes. That's why one breakthrough in image models ripples out to medicine, farming and cars at once.
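
The staircase above can be sketched in a few lines of plain Python. This is an illustration under loud assumptions, not the course's implementation: the filters are hand-picked rather than learned, and the names `conv2d`, `extract_features`, `classify_head` and `match_head` are hypothetical. What it tries to show is that two different tasks can share one feature extractor and differ only in the last step.

```python
# Hypothetical sketch: one shared feature extractor, two task-specific heads.
# Filters are hand-picked for illustration; a real network would learn them.

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as most CNN libraries compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

VERTICAL_EDGE = [[-1, 1], [-1, 1]]    # responds to dark-to-bright, left to right
HORIZONTAL_EDGE = [[-1, -1], [1, 1]]  # responds to dark-to-bright, top to bottom

def extract_features(image):
    """Shared 'seeing machinery': convolve, apply ReLU, then average-pool."""
    features = []
    for kernel in (VERTICAL_EDGE, HORIZONTAL_EDGE):
        fmap = conv2d(image, kernel)
        activated = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(activated) / len(activated))
    return features  # [vertical-edge strength, horizontal-edge strength]

def classify_head(features):
    """Task A: one label for the whole image."""
    return "vertical" if features[0] > features[1] else "horizontal"

def match_head(features_a, features_b):
    """Task B: a similarity score between two images (higher means more alike)."""
    distance = sum((a - b) ** 2 for a, b in zip(features_a, features_b)) ** 0.5
    return 1.0 / (1.0 + distance)

# Dark left half, bright right half: a vertical edge.
left_right = [[0, 0, 1, 1] for _ in range(4)]
# Dark top half, bright bottom half: a horizontal edge.
top_bottom = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]

label = classify_head(extract_features(left_right))
same_score = match_head(extract_features(left_right), extract_features(left_right))
diff_score = match_head(extract_features(left_right), extract_features(top_bottom))
print(label, same_score, diff_score)
```

Both heads call the same `extract_features`; only the function applied to its output changes, which is the sense in which swapping the task mostly means swapping the last layer.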

Classify, detect, segment

Most vision applications are one of three flavors, all built on the same features:

  • Classification: one label for the whole image ("is this a melanoma?"). The plain CNN from Chapter 10.
  • Detection: where are the objects? The output is a box around each car or face. This is the backbone of self-driving perception.
  • Segmentation: a label for every pixel ("this pixel is road, that one is tumor"). Used heavily in medical imaging and mapping.
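
To make the three flavors concrete, here is a hedged sketch of what each one returns for the same tiny image. A brightness threshold stands in for a trained network (an assumption made purely so the example runs without a model), and the function names are hypothetical. The point is the shape of each output: one label, a list of boxes, or a label per pixel.

```python
# Hypothetical sketch: the same image, three output shapes.
# A brightness threshold stands in for a trained model.

THRESHOLD = 0.5

image = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

def segment(img):
    """Segmentation: one label per pixel (1 = object, 0 = background)."""
    return [[1 if v > THRESHOLD else 0 for v in row] for row in img]

def detect(img):
    """Detection: boxes as (top, left, bottom, right), inclusive.

    Simplification: assumes at most one object, so it returns zero or one box.
    """
    mask = segment(img)
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return []
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return [(min(rows), min(cols), max(rows), max(cols))]

def classify(img):
    """Classification: a single label for the whole image."""
    return "object" if detect(img) else "empty"

label = classify(image)  # one label
boxes = detect(image)    # a list of boxes
mask = segment(image)    # a grid the same size as the image
print(label, boxes, mask)
```

Real detectors and segmenters predict these outputs directly from learned features rather than from a threshold; the sketch only illustrates how the answer's shape grows from one value, to a few boxes, to a full grid.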

The other side of the coin

Because it's so capable and so cheap to deploy, computer vision also raises real questions worth keeping in view:

  • Face recognition enables convenient unlock — and pervasive surveillance.
  • Bias: a model is only as fair as its training images; groups that are underrepresented in the training data tend to get worse results, which matters a lot in medicine and policing.
  • Over-trust: these systems can report high confidence even when they are wrong, so high-stakes uses (diagnosis, driving) typically keep a human in the loop.
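
The over-trust point can be made concrete with softmax, the function commonly used to turn a classifier's raw scores into probabilities. The sketch below uses made-up scores and a hypothetical review threshold (`REVIEW_THRESHOLD`); none of the numbers come from a real model. It suggests why a reported confidence is not the same thing as being right.

```python
import math

# Hypothetical sketch: confidence is not correctness.
# Scores (logits), labels and the threshold are made up for illustration.

LABELS = ["healthy", "tumor"]
REVIEW_THRESHOLD = 0.90  # below this, send the case to a person

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract peak for stability
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most likely label and the probability assigned to it."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Each case: (model scores, true label).
cases = [
    ([4.0, -2.0], "healthy"),  # confident and correct
    ([0.3, 0.1], "tumor"),     # unsure, so it gets flagged
    ([5.0, -1.0], "tumor"),    # confident and wrong
]

results = []
for logits, truth in cases:
    label, confidence = predict(logits)
    results.append({
        "label": label,
        "confidence": confidence,
        "correct": label == truth,
        "flagged": confidence < REVIEW_THRESHOLD,
    })

for r in results:
    print(r)
```

In this toy run the threshold catches the unsure case but not the third one, where the model is both very confident and wrong. That gap is one reason a confidence cutoff alone does not replace human review in high-stakes settings.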

Understanding how the pipeline works — which is exactly what this course gave you — is what lets you judge where to trust it and where to be careful.

That's the course

You started with a single pixel and ended knowing how a machine turns a grid of numbers into medical diagnoses, self-driving cars and the photo search in your pocket. Same idea, all the way up.

If you enjoyed this, the other courses go deeper on the machines this chapter touched — how images are generated, how self-driving cars see, and how the chips that run all this actually work.