Chapter 4 · Part 2

The GPU: thousands of simple workers

The GPU (graphics processing unit) was built to draw video games: shading millions of pixels, where every pixel runs the same little calculation on different data. That is precisely the crowd-of-schoolchildren workload — and it turns out neural-network matrix math has the exact same shape. So the chip designed for Crysis became the chip that trains GPT.

Where a CPU has a few big cores, a GPU has thousands of small ones, all built to run the same instruction across a huge array of data at once.

Scroll to watch one instruction sweep across thousands of cores.

A GPU is a vast grid of simple cores — thousands of them.

scroll↓

SIMD: one instruction, many data

The GPU's core trick is SIMD — Single Instruction, Multiple Data. Instead of each core fetching and decoding its own instructions (expensive, CPU-style), one instruction is broadcast to a whole group of cores that apply it to different data simultaneously. You amortize all the control overhead across thousands of multiply-adds. For matrix math — the same operation on every element — it's a near-perfect fit.

Why GPUs won AI

Two things made the GPU the default AI chip:

The workload matched the hardware. Researchers realized training neural nets is dense matrix multiplication, and GPUs already did that better than anything affordable.
Memory bandwidth. Thousands of cores starve without data, so GPUs pair their cores with very high-bandwidth memory to keep them fed — often the deciding factor in real performance.

The result: a GPU can be tens to hundreds of times faster than a CPU on matrix math, while still being programmable enough to run almost any model and any new research idea. That flexibility is why it's the workhorse of the whole field.

Where we're headed

A GPU is wonderfully general for a parallel chip — it can do graphics, simulations, and AI. But if you knew your chip would do nothing but matrix multiply, you could specialize even harder and strip away everything else. That's the bet Google made. Next: the TPU.