Chapter 6 · Part 3

Which chip when

Three chips, one spectrum. They differ along a single axis we've been circling the whole course: how specialized are you willing to be? Give up flexibility and you gain matrix-math throughput; keep flexibility and you can run anything. There's no winner — only a right tool for each job.

Figure: the three chips placed on a map of flexibility against matrix-math throughput, showing when to use each.

A rough rule of thumb

CPU — general application logic, orchestration, databases, serving small models, and anything branchy or latency-sensitive. It's always in the loop, even in an AI system, doing the non-matrix work.
GPU — the default for training and running most models. Massively parallel yet flexible enough for new architectures and research. If you're doing ML and unsure, it's almost always a GPU.
TPU (and other dedicated AI accelerators) — very large matrix workloads at scale, where squeezing out maximum throughput per watt and per dollar across a whole data center is worth giving up flexibility.

In practice, real systems mix them: a CPU coordinates the work and handles the logic, then hands the matrix mountain to a GPU or TPU.
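The rule of thumb above can be written down as a tiny decision function. This is only a sketch of the reasoning, not a real scheduling tool: the `Workload` fields and the `suggest_chip` name are made up for illustration, and real choices also depend on cost, availability, and software support.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a job; every field is an illustrative assumption."""
    matrix_heavy: bool         # dominated by large matrix multiplies?
    branchy: bool              # lots of data-dependent control flow?
    data_center_scale: bool    # running at very large scale, where per-watt cost matters?
    stable_architecture: bool  # is the model design unlikely to change?


def suggest_chip(w: Workload) -> str:
    """Encode the rough rule of thumb from the list above."""
    # Branchy or non-matrix work stays on the general-purpose chip.
    if w.branchy or not w.matrix_heavy:
        return "CPU"
    # Giving up flexibility pays off only at scale and with a settled design.
    if w.data_center_scale and w.stable_architecture:
        return "TPU"
    # The default for most ML work.
    return "GPU"


web_backend = Workload(matrix_heavy=False, branchy=True,
                       data_center_scale=False, stable_architecture=True)
research_model = Workload(matrix_heavy=True, branchy=False,
                          data_center_scale=False, stable_architecture=False)
production_llm = Workload(matrix_heavy=True, branchy=False,
                          data_center_scale=True, stable_architecture=True)

for name, job in [("web backend", web_backend),
                  ("research model", research_model),
                  ("production LLM", production_llm)]:
    print(name, "->", suggest_chip(job))
```

Note that the function returns a single answer per job, while a real system would still keep a CPU in the loop for everything around the matrix work.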

The bottleneck nobody mentions: moving data

Here's the twist that surprises people. Once a chip has enough arithmetic units, the limit usually isn't how many multiplications it can do. It's whether you can deliver the data fast enough to keep those units busy. Weights and activations have to travel from memory to the arithmetic units, and that movement is slow and power-hungry compared to the math itself. This is the memory wall.
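One common way to put numbers on the memory wall is arithmetic intensity: how many arithmetic operations you get per byte moved. The sketch below compares a big square matrix multiply with a matrix-vector multiply on an imaginary chip. The peak throughput and bandwidth figures are invented round numbers, not the specs of any real part, and the byte count assumes each matrix crosses the memory bus exactly once, which is the best case.

```python
def matmul_intensity(m: int, n: int, k: int, bytes_per_value: int = 2) -> float:
    """Operations per byte for C[m,n] = A[m,k] @ B[k,n].

    Assumes A and B are each read once and C is written once (best case).
    """
    flops = 2 * m * n * k  # one multiply plus one add per term
    bytes_moved = bytes_per_value * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Throughput is capped by either the math units or the memory system."""
    return min(peak_tflops, intensity * bandwidth_tb_s)


# Hypothetical chip, chosen only to make the arithmetic easy to follow.
PEAK_TFLOPS = 100.0    # assumed peak math throughput
BANDWIDTH_TB_S = 1.0   # assumed memory bandwidth

big = matmul_intensity(4096, 4096, 4096)   # large square matmul
small = matmul_intensity(1, 4096, 4096)    # one vector times a matrix

for label, intensity in [("4096 x 4096 x 4096", big), ("1 x 4096 x 4096", small)]:
    speed = attainable_tflops(intensity, PEAK_TFLOPS, BANDWIDTH_TB_S)
    print(f"{label}: {intensity:.1f} ops/byte, up to {speed:.1f} TFLOP/s")
```

On this made-up chip the square multiply does well over a thousand operations per byte and can run at full speed, while the matrix-vector case does about one operation per byte and is held to roughly 1% of peak by memory traffic alone.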

You now know the difference

Strip away the branding and the whole picture is one workload meeting three designs:

AI is mostly matrix math — trillions of multiply-adds.
Those multiply-adds are largely independent (each output value can be computed without waiting for the others), so parallelism is the way to go fast.
A CPU has a few brilliant, flexible cores (great latency, modest throughput).
A GPU has thousands of simple cores running one instruction on many data (SIMD-style; GPU vendors often call it SIMT).
A TPU hard-wires a systolic array that does almost nothing but matmul.
And across all three, moving the data is often the real fight.
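To see why the work parallelizes so well, here is a minimal pure-Python sketch (slow on purpose, and not how any real library does it). It computes the output cells of a matrix multiply in a shuffled order and gets the same answer, because no cell depends on any other. The function names are illustrative.

```python
import random


def matmul_cell(a, b, i, j):
    """One output value: a row of `a` dotted with a column of `b`."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul_any_order(a, b, seed=0):
    """Fill in the output cells in a random order.

    Each cell reads only the inputs, never another output, so the order
    (or which core handles which cell) does not change the result.
    """
    rows, cols = len(a), len(b[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    random.Random(seed).shuffle(cells)
    out = [[0] * cols for _ in range(rows)]
    for i, j in cells:
        out[i][j] = matmul_cell(a, b, i, j)
    return out


def multiply_adds(m, n, k):
    """Number of multiply-adds in an (m x k) times (k x n) matrix multiply."""
    return m * n * k


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(matmul_any_order(a, b, seed=1))  # [[19, 22], [43, 50]]
print(matmul_any_order(a, b, seed=2))  # same result, different order
print(multiply_adds(4096, 4096, 4096)) # tens of billions for a single layer-sized multiply
```

A parallel chip exploits exactly this: instead of visiting the cells one at a time in some order, it hands different cells to different arithmetic units at once.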

So the next time someone says AI "needs GPUs," you'll know exactly why: it's a mountain of identical, independent multiply-adds, and parallel chips with fast memory are simply the best way to climb it.

Thanks for reading. If you enjoyed this, the other courses cover how images, language, meaning, recommendations and self-driving cars work under the hood.