Chapter 7 · Part 4

Where this shows up

Strip it back and an LLM does one thing: predict the next token, over and over. But that single trick, at scale, turns out to be shockingly general — and it's now wired into tools used by hundreds of millions of people every day.

Here's where "autocomplete on steroids" is actually doing work.

Chatbots & assistants — ChatGPT, Claude, Gemini for questions, drafting and brainstorming.

The same idea, many jobs

Each of these is next-token prediction dressed differently:

Chat and writing are the loop you saw in Chapter 1 run to completion.
Coding copilots are the same model trained on code — code is just another language to predict.
RAG (retrieval-augmented generation) works around the context-window limit and reduces hallucination by pasting real, retrieved documents into the prompt. Finding the right documents is the job of the embeddings covered in the embeddings course.
Agents wrap the model in a loop that lets it call tools — search, a calculator, your calendar — turning prediction into action.
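
To make the last two items concrete, here is a minimal sketch in Python. Everything in it is a stand-in invented for illustration: `retrieve` ranks documents by shared words instead of real embeddings, `fake_model` is a hard-coded substitute for an actual LLM call, and the `CALL` / `FINAL` text format is a hypothetical convention, not any product's real protocol.

```python
import re

# --- RAG: paste retrieved documents into the prompt ---

def tokenize(text):
    """Lowercase word set. A crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=2):
    """Return the k documents that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(question, documents, k=2):
    """Put the retrieved documents ahead of the question in one prompt string."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        f"Answer using only these sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# --- Agent: a loop that lets the model call tools ---

def calculator(expression):
    """Toy tool: handles 'a + b' or 'a * b' for two integers."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    return str(a + b if op == "+" else a * b)

TOOLS = {"calculator": calculator}

def fake_model(transcript):
    """Hypothetical stand-in for an LLM: asks for the tool once, then answers."""
    if "TOOL RESULT:" not in transcript:
        return "CALL calculator: 17 * 3"
    result = transcript.rsplit("TOOL RESULT:", 1)[1].strip()
    return f"FINAL: The answer is {result}."

def run_agent(question, model=fake_model, max_steps=5):
    """Alternate between model output and tool calls until a final answer."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = model(transcript)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        # Parse "CALL <tool>: <argument>" and run the named tool.
        name, argument = reply[len("CALL "):].split(":", 1)
        output = TOOLS[name.strip()](argument.strip())
        # Feed the tool's output back so the next prediction can use it.
        transcript += f"\n{reply}\nTOOL RESULT: {output}"
    return "Stopped: step limit reached."

DOCS = [
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
]
print(build_rag_prompt("Where is the Eiffel Tower?", DOCS))
print(run_agent("What is 17 * 3?"))
```

In both halves the model itself still only predicts text. The surrounding code decides what goes into the prompt (RAG) or what to do with what comes out (agents).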

Use them well

Because the model optimizes for what sounds plausible, not for what is true, the practical rules follow directly: verify specifics, prefer tools or RAG for facts, lower the temperature when you need precision, and keep a human in the loop for anything high-stakes.
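
To see why a lower temperature helps with precision, here is a small sketch under made-up numbers. The three scores in `logits` are invented, and `softmax_with_temperature` is an illustrative helper rather than any library's API. It divides the scores by the temperature before turning them into probabilities.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (1.5, 1.0, 0.3):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature drops, more of the probability piles onto the top-scoring token, so sampling becomes more repeatable and less adventurous. It does not make the top token any more likely to be true, which is why the other rules still apply.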

That's the course

You now know the whole machine — tokens, probabilities, temperature, the context window, and why it makes things up — and where all of it shows up in the tools you use.

If you enjoyed this, the other courses cover how AI generates images, how it understands meaning, and the hardware that runs it all.