Chapter 1 · Part 1

Meaning as a place in space

A computer doesn't know that cat and dog have anything to do with each other. To it, they're just two different strings of letters, as unrelated as cat and umbrella. Yet systems like ChatGPT and Google Search clearly act as if they understand that cat and dog are similar. How?

The trick is the single most important idea in modern AI, and this whole course is about it: represent each word as a list of numbers — a point in space — arranged so that words with similar meanings sit close together. That list of numbers is called an embedding, and once meaning becomes a location, machines can do arithmetic with it.

Scroll to watch a pile of plain words turn into a map of meaning.

Here are twelve words. To a computer they're just symbols — no order, no meaning.

A word becomes a vector

In the visual we used two dimensions so it fits on a screen. Real embeddings use many more — often hundreds or thousands — but the idea is identical: every word is a point, written as a list of coordinates called a vector.
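
To make "every word is a point" concrete, here is a minimal sketch with three words placed in two dimensions. The coordinates are invented for illustration (a real model would supply them), and the names toy_vectors and distance are just labels chosen for this example.

```python
import math

# Hypothetical 2-D coordinates, made up for this sketch.
# A real embedding model would produce these numbers for you.
toy_vectors = {
    "cat": (0.9, 0.8),
    "dog": (0.8, 0.9),
    "umbrella": (-0.7, 0.1),
}

def distance(a, b):
    """Straight-line (Euclidean) distance between two points."""
    return math.dist(a, b)

d_cat_dog = distance(toy_vectors["cat"], toy_vectors["dog"])
d_cat_umbrella = distance(toy_vectors["cat"], toy_vectors["umbrella"])

print(f"cat to dog:      {d_cat_dog:.2f}")
print(f"cat to umbrella: {d_cat_umbrella:.2f}")
```

With these made-up points, cat sits far closer to dog than to umbrella. Nothing changes in principle when the tuples grow from 2 numbers to several hundred.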

The extra dimensions are what let the space capture many kinds of similarity at once — an embedding can place cat near dog (both pets), near lion (both felines), and near kitten (same animal, different age), all at the same time, because it has room to spread those relationships across different directions.
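
A toy way to picture this is to pretend each dimension has a readable meaning. That is an assumption made purely for this sketch: the axes of a learned embedding are not labeled, and the axis names and numbers below are invented.

```python
# Hypothetical, hand-labeled axes. Real embedding dimensions have no such labels.
AXES = ("pet", "feline", "young")

toy_vectors = {
    "cat":    (0.9, 0.9, 0.2),
    "dog":    (0.9, 0.1, 0.2),
    "lion":   (0.1, 0.9, 0.2),
    "kitten": (0.9, 0.9, 0.9),
}

def differing_axes(a, b, tolerance=0.2):
    """Names of the axes along which two points are clearly apart."""
    return [name for name, x, y in zip(AXES, a, b) if abs(x - y) > tolerance]

for neighbor in ("dog", "lion", "kitten"):
    apart = differing_axes(toy_vectors["cat"], toy_vectors[neighbor])
    print(f"cat vs {neighbor}: differs along {apart}")
```

In this toy space cat is one step away from each neighbor, but each step runs along a different direction. That is the room extra dimensions buy.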

You don't build this by hand

You might imagine someone painstakingly assigning coordinates to every word. No one does. These vectors are learned automatically by a model that reads enormous amounts of text — we'll see exactly how in Chapter 3. In practice you just call an embedding model and get a vector back:

embed.py — turn words into vectors

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

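vec = model.encode("cat")        # one 384-number vector (a NumPy array)
print(vec.shape)                 # (384,)
print(vec[:5])                   # first five coordinates: small floats, exact values vary

# encode several words at once: one row per word,
# and similar words land in nearby parts of the space
vecs = model.encode(["cat", "dog", "umbrella"])
print(vecs.shape)                # (3, 384)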

Compare the vectors for cat, dog and umbrella and the first two come out close together, while umbrella lands far away: the same structure you saw take shape in the visual above.
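
To check that claim yourself you need a number for "close". One common choice is cosine similarity, which the next part covers properly. The sketch below is a minimal plain-Python version, not the library's own routine. The helper names cosine_similarity, compare_words and toy_encode are made up for this example, and encode stands for any function that turns a list of words into a list of vectors, such as model.encode from embed.py.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)

def compare_words(encode, words):
    """Score every pair of words using whatever encoder is passed in."""
    vectors = encode(words)
    return {
        (w1, w2): cosine_similarity(v1, v2)
        for (w1, v1), (w2, v2) in combinations(zip(words, vectors), 2)
    }

def toy_encode(words):
    """Stand-in encoder with invented 2-D vectors, so the sketch runs offline."""
    table = {"cat": [0.9, 0.8], "dog": [0.8, 0.9], "umbrella": [-0.7, 0.1]}
    return [table[w] for w in words]

scores = compare_words(toy_encode, ["cat", "dog", "umbrella"])
for pair, score in scores.items():
    print(pair, round(score, 3))

# To use the real model instead (downloads weights on first use):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   scores = compare_words(model.encode, ["cat", "dog", "umbrella"])
```

The encoder is passed in as an argument so you can swap the toy table for model.encode without touching the comparison logic.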

Where we're headed

Once meaning is a position, "do these two words mean similar things?" becomes a question about distance — something a computer can answer instantly. Next we'll make that precise: how to measure closeness in embedding space with cosine similarity, and why direction matters more than raw distance.