Chapter 7 · Part 4

Where this shows up

Knowing why AI is a mountain of matrix math — and how CPUs, GPUs and TPUs attack it — isn't just trivia. It explains a chunk of the modern tech economy, and it's a genuinely practical lens the moment you build or budget for anything with AI in it.

Here's where the choice of chip matters.

Training giant models: GPU and TPU clusters run for weeks to train LLMs and diffusion models.


The same idea, many jobs

It all comes back to matrix math and the serial-vs-parallel tradeoff:

  • Training is the most compute-hungry job, so it lives on big GPU and TPU clusters.
  • Inference (serving) cares about latency and cost per request — sometimes a GPU, sometimes cheaper specialized chips, sometimes a CPU.
  • On-device AI runs on small NPUs (neural processing units), so your data can stay on the phone.
  • Graphics and scientific computing were parallel workloads long before AI: same hardware, and the same memory-bandwidth bottleneck.
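To make "it all comes back to matrix math" concrete, here is a minimal pure-Python sketch. The function names are made up for illustration. Each output cell of a matrix product is its own dot product that never depends on another output cell, and that independence is what parallel hardware exploits. The second function counts the work using the common convention of 2 FLOPs per multiply-add.

```python
# Minimal sketch: every cell of C = A @ B is an independent dot product.
# Function names are hypothetical, chosen only for this illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows (pure Python)."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    # Cell (i, j) reads only row i of a and column j of b. It never needs
    # another output cell, so all m * n cells could be computed at once.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def matmul_flops(m, k, n):
    """Work for an (m x k) @ (k x n) product: m * n cells, each needing
    k multiply-adds, counted as 2 FLOPs apiece by the usual convention."""
    return 2 * m * k * n

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(matmul_flops(4096, 4096, 4096))               # 137438953472
```

A single 4096 × 4096 product already needs 2^37 (about 137 billion) operations. A CPU has comparatively few lanes to spread them across, while a GPU or TPU has thousands.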

Why it's worth knowing

"Which chip, and how many?" drives real decisions about cost, speed and feasibility, and the scarcity of AI compute is a big part of why a chipmaker became one of the most valuable companies on earth. This lens turns AI hype into questions you can actually reason about.
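Here is a rough sketch of the arithmetic behind "Which chip, and how many?". It assumes the widely used rule of thumb that training takes about 6 × parameters × training tokens FLOPs. The model size, token count, per-chip peak throughput, utilization and deadline below are hypothetical round numbers, not the specs of any real model or chip.

```python
import math

# Back-of-envelope estimate. ALL inputs are hypothetical, for illustration:
#   - training compute ~= 6 * parameters * tokens (a common rule of thumb)
#   - each chip sustains peak_flops * utilization on average

SECONDS_PER_DAY = 86_400

def training_flops(params, tokens):
    """Rough total training compute using the 6 * N * D rule of thumb."""
    return 6 * params * tokens

def chip_days(total_flops, peak_flops_per_chip, utilization):
    """How many days a single chip would need at its sustained rate."""
    sustained = peak_flops_per_chip * utilization
    return total_flops / sustained / SECONDS_PER_DAY

def chips_needed(total_flops, peak_flops_per_chip, utilization, target_days):
    """Chips required to finish within target_days, rounded up."""
    days = chip_days(total_flops, peak_flops_per_chip, utilization)
    return math.ceil(days / target_days)

# Hypothetical 7B-parameter model trained on 1T tokens, on chips with a
# 1e15 FLOP/s peak running at 40% utilization, with a 30-day deadline.
total = training_flops(7_000_000_000, 1_000_000_000_000)
print(total)                                   # 42000000000000000000000
print(round(chip_days(total, 10**15, 0.4)))    # 1215
print(chips_needed(total, 10**15, 0.4, 30))    # 41
```

Halving utilization in this sketch roughly doubles the chip count, which is one reason sustained throughput matters more than the peak figure on a spec sheet.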

That's the course

You now know the whole picture: AI is matrix math, parallelism is how you go fast, and the CPU, GPU and TPU are three different bets on how to balance serial flexibility against parallel throughput.

If you enjoyed this, the other courses cover the models these chips actually run.