Chapter 1 · Part 1

It's just autocomplete

ChatGPT can write an essay, debug your code, or explain quantum physics. It feels like it must understand things. But underneath, it is doing something almost absurdly simple — the same thing your phone keyboard does when it suggests the next word, just done extraordinarily well.

Here's the whole trick: given some text, predict the next word. Add that word. Then predict again. Repeat a few hundred times and you have a paragraph. That loop — and nothing more exotic — is what turns your prompt into an answer.

Start with some text — here, a half-finished sentence. The model's only job: guess what comes next.

Predict, append, repeat

That loop has a name. The model is autoregressive: each new word is predicted from all the words so far, including the ones it just generated. No finished sentence exists anywhere in advance; the output is committed one word at a time.

  • Input: everything written so far (your prompt + whatever it's generated).
  • Output: a guess at the single next word.
  • Then: that word is glued on, and the whole thing is fed back in for the next guess.

This is why ChatGPT "types" its answer out left to right, and why it can be interrupted mid-sentence — there's literally nothing computed beyond the next word yet.
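
To make the "types it out" behavior concrete, here is a minimal sketch of the loop written as a Python generator, so each word is handed over the moment it is chosen. Everything in it is an assumption for illustration: `NEXT_WORD` is a hand-written lookup table standing in for a real model, and `toy_next_word` and `stream_words` are hypothetical names, not part of any real API.

```python
from typing import Iterator

# Hypothetical stand-in for a trained model: a fixed lookup from the
# last word of the text to the next word. A real model looks at the
# whole text and returns probabilities instead.
NEXT_WORD = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "a",
    "a": "mat",
    "mat": "<end>",
}


def toy_next_word(text: str) -> str:
    """Toy predictor: look up the next word from the last word."""
    last_word = text.split()[-1]
    return NEXT_WORD.get(last_word, "<end>")


def stream_words(prompt: str, max_words: int = 50) -> Iterator[str]:
    """Yield generated words one at a time, as soon as each is chosen."""
    text = prompt
    for _ in range(max_words):
        next_word = toy_next_word(text)
        if next_word == "<end>":
            return
        text += " " + next_word   # the new word becomes part of the input
        yield next_word


# Consume the stream and stop after three words, like pressing "stop".
received = []
for word in stream_words("the"):
    received.append(word)
    if len(received) == 3:
        break

print(received)
```

Because the generator only runs when the consumer asks for another word, breaking out of the `for` loop means the remaining words are never computed at all.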

"But that can't be all it's doing…"

It's a fair reaction. How does guessing the next word produce working code or a coherent argument? The answer is that to predict the next word really well, you have to learn an awful lot about the world.

To finish "The capital of France is ___" you need a fact. To finish "def add(a, b): return ___" you need syntax. To finish a mystery novel's last sentence you need to have tracked the whole plot. A model trained to predict the next word across billions of pages is forced to absorb grammar, facts, reasoning patterns, and style — all as a side effect of getting that one guess right.
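
One way to picture "trained to predict the next word" is to see how a single sentence turns into many small guessing exercises. The sketch below is an illustration under simplifying assumptions: `next_word_examples` is a hypothetical helper, and it splits on spaces, whereas real training pipelines work on tokens and handle all positions in bulk.

```python
def next_word_examples(text: str) -> list[tuple[str, str]]:
    """Turn one piece of text into (context, next word) training pairs."""
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        context = " ".join(words[:i])   # everything before position i
        target = words[i]               # the word the model should guess
        examples.append((context, target))
    return examples


pairs = next_word_examples("The capital of France is Paris")
for context, target in pairs:
    print(f"{context!r} -> {target!r}")
```

The last pair asks for the word after "The capital of France is", which is the fact-recall exercise described above; earlier pairs from the same sentence mostly exercise grammar.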

generate.py — the entire generation loop
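# `model` and `pick` are placeholders, not defined in this snippet.
text = "The cat sat on the"

for _ in range(50):                    # generate up to 50 words
    probs = model.predict_next(text)   # a probability for every possible word
    next_word = pick(probs)            # choose one (next chapters: how)
    if next_word == "<end>":           # the model signals it is finished
        break                          # stop before appending the marker
    text += " " + next_word            # append it

print(text)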

That's the real shape of it: a for loop around "predict the next word." The intelligence is all squeezed into that one predict_next step.
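
If you want to run that loop yourself, you need something to stand in for the model. The sketch below is an assumption for illustration, not how ChatGPT works: it swaps in a toy bigram counter that looks only at the last word, trained on a made-up two-sentence corpus. The names `predict_next` and `pick` mirror the listing above, but their bodies here are hypothetical, as are `train_bigram`, `generate`, and the choice to treat "." as a stopping point.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up training text; a real model sees billions of pages.
CORPUS = "the cat sat on the mat . the dog sat on the rug ."


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for each word, which words followed it in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: dict[str, Counter], text: str) -> dict[str, float]:
    """Return a probability for each candidate next word."""
    last_word = text.split()[-1]       # toy model: only the last word matters
    followers = counts.get(last_word)
    if not followers:
        return {"<end>": 1.0}          # never saw this word: stop
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


def pick(probs: dict[str, float], rng: random.Random) -> str:
    """Sample one word in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(counts: dict[str, Counter], prompt: str,
             max_words: int = 10, seed: int = 0) -> str:
    """Run the predict, pick, append loop until a stop word or the limit."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    text = prompt
    for _ in range(max_words):
        next_word = pick(predict_next(counts, text), rng)
        if next_word in ("<end>", "."):
            break
        text += " " + next_word
    return text


counts = train_bigram(CORPUS)
print(predict_next(counts, "the cat sat on the"))
print(generate(counts, "the cat sat on the"))
```

After "the", this toy model has seen "cat", "mat", "dog", and "rug" once each, so it gives each a probability of 0.25. The loop is the same shape as a real one; the only difference is how much is packed into `predict_next`.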

What's next

We've been saying "word," but models don't actually deal in words — they read text in chunks called tokens, and that detail explains a surprising amount (why they're bad at spelling, why they're priced the way they are). That's the next chapter.