Chapter 4 · Part 2

Predicting what others do

Knowing where every car and pedestrian is still isn't enough to drive. By the time you've reacted to where something is, it has already moved. Good driving is mostly anticipation — reading that a car is drifting toward your lane, or that a pedestrian at the curb is about to step out. So the car forecasts the future of everything around it, a few seconds ahead.

Here's the scene: you, an oncoming car, and a pedestrian at the curb.

The future is multi-modal

The key idea: there is rarely one right answer. A car approaching an intersection might go straight, turn left, or stop, and those are genuinely different futures. So the predictor is multi-modal: for each agent it outputs several candidate trajectories, each with a probability, rather than a single guess. Collapsing that to one prediction would be dangerous. If the predictor gives "turns across you" a 30% chance and you keep only the most likely future, you'd plan as if that turn couldn't happen.
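
To make the shape of that output concrete, here is a minimal Python sketch of what a multi-modal prediction for one agent might look like. Everything in it is an illustrative assumption: the `Mode` class, the helper names, the waypoint coordinates, the 60/30/10 split, and the 5% cutoff are made up for this example and are not taken from any real predictor.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    """One candidate future for a single agent (hypothetical structure)."""
    label: str
    probability: float
    waypoints: list[tuple[float, float]]  # (x, y) in metres, one per time step


def most_likely(modes):
    """Collapse to a single guess: keep only the highest-probability mode."""
    return max(modes, key=lambda m: m.probability)


def modes_to_plan_for(modes, min_probability=0.05):
    """Keep every mode the planner should still take seriously.

    The 5% cutoff is an arbitrary illustrative threshold.
    """
    return [m for m in modes if m.probability >= min_probability]


# Made-up prediction for the oncoming car; probabilities sum to 1.
oncoming_car = [
    Mode("goes straight", 0.60, [(0.0, 30.0), (0.0, 20.0), (0.0, 10.0)]),
    Mode("turns across you", 0.30, [(0.0, 30.0), (-2.0, 22.0), (-6.0, 18.0)]),
    Mode("stops", 0.10, [(0.0, 30.0), (0.0, 27.0), (0.0, 26.0)]),
]

single_guess = most_likely(oncoming_car)
kept = modes_to_plan_for(oncoming_car)

print("single guess:", single_guess.label)
print("planned for: ", [m.label for m in kept])
```

With these numbers the single guess is "goes straight", so the 30% turn silently disappears from the plan, while the multi-modal list still carries all three futures forward.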

Interaction makes it hard

The reason this is so much harder than predicting a falling ball: people are reactive. The pedestrian's choice to cross depends on whether you slow down; the merging car's choice depends on whether you make space. Everyone is predicting everyone, which is why a tiny hesitation can cascade into a four-way standstill at a junction. Modern predictors model these interactions jointly rather than agent-by-agent.
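
A toy Python illustration of why the agent-by-agent view loses information. The probabilities and function names below are invented for the example, and a lookup table is only a stand-in: it is not how a real joint predictor is built, just the smallest case where the pedestrian's behaviour depends on yours.

```python
# Hypothetical numbers: how likely the pedestrian is to cross,
# depending on what our own car does.
P_CROSS_GIVEN_EGO = {
    "slow_down": 0.8,   # we yield, so they will probably step out
    "keep_speed": 0.2,  # we keep going, so they will probably wait
}


def p_cross_conditional(ego_action):
    """Interaction-aware view: the answer depends on our own action."""
    return P_CROSS_GIVEN_EGO[ego_action]


def p_cross_marginal(p_ego_action):
    """Agent-by-agent view: one number, averaged over whatever we might do."""
    return sum(p * P_CROSS_GIVEN_EGO[action] for action, p in p_ego_action.items())


# Assume, for illustration, we are equally likely to slow down or keep speed.
ego_prior = {"slow_down": 0.5, "keep_speed": 0.5}

print("if we slow down: ", p_cross_conditional("slow_down"))
print("if we keep speed:", p_cross_conditional("keep_speed"))
print("ignoring us:     ", p_cross_marginal(ego_prior))
```

The averaged figure sits between the two conditional ones and matches neither situation the car will actually be in, which is the information an interaction-aware predictor is trying to keep.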

Where we're headed

Now the car has, for the next few seconds, a probabilistic movie of what everyone around it might do. The remaining question is the one that actually moves the steering wheel: given all those possible futures, what should we do? Next: planning the path.