Chapter 7 · Part 5
Where this shows up
We built the whole diffusion pipeline: add noise on a schedule, train a network to predict it, then run the process backwards to conjure an image from static — steered by a text prompt. That's not a lab curiosity. It's the engine behind a wave of tools you've almost certainly already used.
Here's where the "denoise from static" idea is actually earning its keep.
Text-to-image: type a prompt, get a picture — Midjourney, DALL·E and Stable Diffusion.
The same idea, many jobs
Every one of these is the reverse process you scrubbed through, pointed at a different goal:
- Art & design tools start from pure noise and denoise toward your prompt (text conditioning).
- Editing (inpainting, outpainting, generative fill) holds part of the image fixed and denoises the rest to match: same model, masked.
- Video and 3D apply diffusion across frames or volumes instead of a single 2D grid.
- Molecule and protein design run diffusion over structures rather than pixels — the noise-and-reverse recipe doesn't care what the data is.
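To make the "same model, masked" point concrete, here is a minimal sketch of masked sampling in NumPy. It is one common way to do this, not necessarily what any particular product does. Everything named here is an assumption for illustration: `make_schedule` uses a plain linear noise schedule, and `toy_predict_noise` is a hypothetical stand-in for the trained network that simply believes every clean image is flat mid-gray. A real tool would call its trained, text-conditioned network at that point.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Illustrative linear noise schedule (an assumption, not the course's exact one).
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_predict_noise(x_t, t, alpha_bars):
    # Hypothetical stand-in for the trained network: it assumes the clean
    # image is flat gray (0.5) and returns the noise implied by that guess.
    x0_guess = np.full_like(x_t, 0.5)
    return (x_t - np.sqrt(alpha_bars[t]) * x0_guess) / np.sqrt(1.0 - alpha_bars[t])

def inpaint(known, mask, T=1000, seed=0):
    """Masked reverse process. mask is 1 where pixels are kept, 0 where regenerated."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(known.shape)  # start from pure static
    for t in range(T - 1, -1, -1):
        # Ordinary reverse step: predict the noise, then step toward a cleaner image.
        eps = toy_predict_noise(x, t, alpha_bars)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x_generated = mean + np.sqrt(betas[t]) * noise

        # Kept pixels: the original image, noised to the level the sample has
        # after this step (no noise at all on the final step).
        if t > 0:
            ab_prev = alpha_bars[t - 1]
            x_known = (np.sqrt(ab_prev) * known
                       + np.sqrt(1.0 - ab_prev) * rng.standard_normal(x.shape))
        else:
            x_known = known

        # The mask decides which source each pixel comes from.
        x = mask * x_known + (1.0 - mask) * x_generated
    return x

# Keep the left half of a small gradient image, regenerate the right half.
known = np.linspace(0.0, 1.0, 16).reshape(4, 4)
mask = np.zeros((4, 4))
mask[:, :2] = 1.0
result = inpaint(known, mask)
```

With this toy predictor the regenerated half just settles on flat gray, which is the only thing it "knows". The part worth noticing is the last line of the loop: the kept pixels are re-imposed at every step, so the generated region is always denoised in the context of the real image around it.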
Powerful enough to need care
Because the output is photorealistic and cheap to produce, diffusion also raises real issues: deepfakes and misinformation, copyright and training-data questions, and built-in bias from what the model was trained on. That's why provenance standards, watermarking (visible and invisible) and consent are active areas of work, and worth knowing about if you use these tools seriously.
That's the course
From a clean photo to pure static and back, you now know how modern image generators work: the forward noising, the network that predicts the noise, sampling from static, and steering with text.
If you enjoyed this, the other courses cover how AI sees images, how it understands language and meaning, and the chips that make all of it run.