Chapter 1 · Part 1

Destroy, then rebuild

Models like Stable Diffusion, DALL·E and Midjourney conjure a detailed picture out of what looks like pure television static. It feels like magic. It isn't — it's one surprisingly simple idea, and this whole course is about it.

Here's the trick, stated backwards from how you'd expect. Instead of teaching a model to paint, we teach it to clean up. And to do that, we first have to make a mess on purpose.

Take a real photo and slowly stir in random noise — a little, then more, then more — until nothing of the original survives. Scroll through it below.

Here's an ordinary photo: a sky, a sun, some ground. Timestep 0 — no noise yet.

Why on earth would we destroy the image?

Because destroying it is easy and predictable, and that gives us something to learn from. At every step above we knew exactly what noise we added, and how much. So imagine freezing the animation at any point and asking a model one small question:

"Some noise was just added to this image. Can you guess what it was?"

If the model can answer that — predict the noise — then it can subtract it, nudging the image one step back toward the original. Do that over and over and you walk the whole animation in reverse: static → grain → picture.
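
To make "predict the noise, then subtract it" concrete, here is a minimal sketch. It borrows the weighted-blend formula that this chapter introduces further down, and it cheats by handing the "model" the true noise, so `eps_pred` is a hypothetical perfect guess rather than the output of any real network. The noise level `alpha_bar_t = 0.5` and the random 8x8 "image" are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # stand-in for a clean image
alpha_bar_t = 0.5                          # hypothetical noise level (1.0 = clean)

# Make the mess: blend the image with Gaussian noise.
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps

# Pretend a model guessed the noise perfectly (assumption for illustration).
eps_pred = eps

# Subtract the guessed noise and undo the scaling to get the image back.
x0_est = (xt - np.sqrt(1 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

print("max error after subtracting the noise:", np.abs(x0_est - x0).max())
```

With a perfect guess the original comes back in one go. A trained network's guess is only approximate, which is one reason the reverse walk is taken in many small steps rather than a single leap.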

That's the entire idea of a diffusion model:

  • Forward (this chapter): take a clean image and add noise, step by step, on a fixed schedule, until it's random static. No learning required — it's just arithmetic.
  • Reverse (the rest of the course): train a network to undo one step of noise at a time. Run it from pure static and an image appears.
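
The forward half really is just arithmetic, and a rough sketch shows it. The per-step schedule `betas` below is a hypothetical choice made for illustration (the text has not specified one at this point), and the 32x32 gradient is a toy stand-in for a photo. The loop adds a little noise at each step and tracks how much of the original survives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 32x32 left-to-right gradient with values in [-1, 1].
x0 = np.tile(np.linspace(-1.0, 1.0, 32), (32, 1))

# Hypothetical schedule (an assumption, not taken from the text):
# the per-step noise variance grows linearly over 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)

def similarity(a, b):
    """Pearson correlation between two images, flattened to 1-D."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

x = x0.copy()
history = {0: similarity(x0, x)}           # timestep 0: no noise yet
for t, beta in enumerate(betas, start=1):
    noise = rng.standard_normal(x.shape)
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise   # one small noising step
    if t in (100, 500, 1000):
        history[t] = similarity(x0, x)

for t, corr in history.items():
    print(f"t={t:4d}  correlation with original = {corr:+.3f}")
```

The correlation starts at 1 and drifts toward 0: by the last step the array is statistically indistinguishable from plain Gaussian static, and nothing was learned along the way.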

The clever part is that once a model knows how to denoise, you don't have to start from a real photo at all. Start from a fresh patch of random static — something no one has ever seen — and denoise that. What comes out is a brand new image. That's how these models generate.

The forward process, precisely

Adding the noise isn't haphazard. Each step mixes the image with a bit of random noise in a ratio fixed in advance by the schedule, and there's a neat shortcut: you can jump straight to the noise level of any timestep t in one shot, without simulating every step in between.
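
In symbols, the shortcut looks roughly like this. The notation is an assumption borrowed from the common way this process is written down, not something this text defines: \beta_s is the small noise variance added at step s, \alpha_s = 1 - \beta_s, and \bar{\alpha}_t is the running product of the \alpha_s, which plays the role of alpha_bar_t in forward.py.

```latex
% One small step: shrink the previous image slightly, add a little fresh noise.
x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}(0, I)

% Running product of the per-step factors.
\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s

% The one-shot jump: same distribution as applying steps 1..t in sequence.
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
```

The jump works because independent Gaussian noise terms add up to a single Gaussian whose variances sum, so t small doses of noise can be replaced by one larger dose with the same statistics.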

In code, one noising step — or a jump to any timestep — is just a weighted blend of the image and a patch of Gaussian noise:

forward.py — add noise to an image at timestep t
import numpy as np

def add_noise(x0, alpha_bar_t):
    """Mix a clean image x0 with random noise.

    alpha_bar_t in [0, 1]: 1.0 = clean, 0.0 = pure static.
    """
    eps = np.random.randn(*x0.shape)          # Gaussian noise, same shape
    xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps
    return xt, eps                            # eps is the "answer" to learn

# A noise schedule: alpha_bar shrinks from ~1 to ~0 over many steps.
alpha_bar = np.cos(np.linspace(0, 1, 1000) * np.pi / 2) ** 2

Notice the function hands back eps alongside the noisy image. That returned noise is the label — the right answer — we'll train the network to predict in Chapter 4.
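
As a preview of how that label might be used, here is a rough sketch of building one training example and scoring a guess against it. Several things are assumptions for illustration only: `dummy_model` is a hypothetical placeholder that always guesses "no noise", mean squared error is one common choice of score that Chapter 4 may or may not use, and `add_noise` is restated with an explicit random generator so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same cosine schedule as in forward.py.
alpha_bar = np.cos(np.linspace(0, 1, 1000) * np.pi / 2) ** 2

def add_noise(x0, alpha_bar_t, rng):
    """Blend x0 with Gaussian noise; return the noisy image and the noise."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps
    return xt, eps

def dummy_model(xt, t):
    """Hypothetical stand-in for the network: always guesses zero noise."""
    return np.zeros_like(xt)

x0 = rng.uniform(-1.0, 1.0, size=(16, 16))   # stand-in for a clean image
t = int(rng.integers(0, len(alpha_bar)))     # pick a random timestep
xt, eps = add_noise(x0, alpha_bar[t], rng)   # input and label in one call

eps_pred = dummy_model(xt, t)                # the guess
loss = np.mean((eps_pred - eps) ** 2)        # how wrong the guess was

print(f"timestep {t}: loss of the do-nothing guess = {loss:.3f}")
```

The point is only the shape of the data: the noisy image and timestep go in, a noise guess comes out, and the stored `eps` is what the guess gets compared against.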

Where we're headed

Hold onto the picture from the animation: a dial that runs an image all the way to static and, crucially, back. Everything else in this course is about building the one piece we're still missing — the network that runs the dial backwards.

Next we'll look closely at the raw material of all this: what random noise actually is, and why "just add randomness" has a surprisingly precise meaning.