Chapter 2 · Part 1

Color & channels

In Chapter 1 every pixel was a single number: its brightness. That's enough for a grayscale image, but the world isn't grey. To store colour, one number per pixel won't do. We need three.

Scroll through the photo below. It looks like one picture, but watch it come apart into the layers it's secretly made of.

A colour photo — it looks like a single picture.


Three numbers per pixel

A colour pixel stores three values, not one. In the usual 8-bit format, each is a whole number from 0 (none) to 255 (full):

R — how much red, 0–255.
G — how much green, 0–255.
B — how much blue, 0–255.

So the bright sun pixel above isn't "yellow" to the computer — it's (255, 215, 70). Yellow is just a lot of red, a lot of green, and a little blue. Every colour you can see on a screen is some mix of these three.
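
To get a feel for how many mixes that allows, here is a small worked count. It assumes the usual 8-bit range of 0–255 per channel; the variable names are only for illustration.

```python
# One colour pixel as three numbers: (R, G, B).
sun = (255, 215, 70)   # the example sun pixel from the text
r, g, b = sun

# Each channel can take 256 different values (0 to 255),
# so the number of possible mixes is 256 * 256 * 256.
values_per_channel = 256
total_colours = values_per_channel ** 3

print(r, g, b)          # 255 215 70
print(total_colours)    # 16777216
```

That is a little under 17 million distinct colours from just three small numbers per pixel.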

Each channel, on its own, is exactly the kind of grayscale grid from Chapter 1. A colour image is just three of those grids stacked together.
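
As a rough sketch of that idea in plain Python, take a tiny 2×2 colour image written out by hand (the pixel values are made up for illustration) and pull out one number per pixel. What comes out is an ordinary grid of brightness values, one grid per channel.

```python
# A tiny 2x2 colour image: each pixel is an (R, G, B) triple.
# The values are invented for this example.
image = [
    [(255, 215, 70), (30, 120, 255)],
    [(0, 0, 0),      (255, 255, 255)],
]

# Take one number from every pixel to get a plain 2D grid per channel.
red_grid   = [[pixel[0] for pixel in row] for row in image]
green_grid = [[pixel[1] for pixel in row] for row in image]
blue_grid  = [[pixel[2] for pixel in row] for row in image]

print(red_grid)    # [[255, 30], [0, 255]]
print(green_grid)  # [[215, 120], [0, 255]]
print(blue_grid)   # [[70, 255], [0, 255]]
```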

Why adding the layers works

Screens make colour with light, and light adds. Start from black and add some red, some green, some blue, and the colours pile up toward white. That's why the three tinted layers in the animation merge back into the original photo when they overlap — you're literally adding the red, green and blue light back together.

Red + Green = Yellow
All three at full = White
All three at zero = Black
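
The same sums can be checked with NumPy. This is a minimal sketch, not the code behind the animation: the 1×2 image and the `tinted_layer` helper are made up for illustration. It first mixes pure colours, then splits an image into three tinted layers and adds them back together.

```python
import numpy as np

# Pure red, green and blue light at full strength, as [R, G, B].
red   = np.array([255, 0, 0], dtype=np.uint8)
green = np.array([0, 255, 0], dtype=np.uint8)
blue  = np.array([0, 0, 255], dtype=np.uint8)
black = np.zeros(3, dtype=np.uint8)

yellow = (red + green).tolist()
white = (red + green + blue).tolist()
print(yellow)          # [255, 255, 0]
print(white)           # [255, 255, 255]
print(black.tolist())  # [0, 0, 0]

# A made-up 1x2 colour image (height 1, width 2).
img = np.array([[[255, 215, 70], [30, 120, 255]]], dtype=np.uint8)

def tinted_layer(image, channel):
    """Keep one channel and set the other two to zero (hypothetical helper)."""
    layer = np.zeros_like(image)
    layer[:, :, channel] = image[:, :, channel]
    return layer

# One red-tinted, one green-tinted and one blue-tinted copy of the image.
layers = [tinted_layer(img, c) for c in range(3)]

# Adding the three tinted layers gives back the original pixels.
merged = layers[0] + layers[1] + layers[2]
layers_match = bool(np.array_equal(merged, img))
print(layers_match)    # True
```

None of these sums goes past 255 in any channel, so plain addition is safe here. Adding arbitrary 8-bit colours would need the result capped at 255, because `uint8` values wrap around instead of stopping at the top.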

In code

A grayscale image was a 2D array. A colour image is a 3D array — height, then width, then a length-3 stack of [R, G, B] per pixel:

channels.py — colour is three stacked grids

from PIL import Image
import numpy as np

img = np.array(Image.open("sky.png").convert("RGB"))

print(img.shape)        # (140, 140, 3)  ->  height, width, channels
print(img[20, 100])     # [255 215  70]  ->  one pixel (row 20, column 100): R, G, B

red   = img[:, :, 0]    # just the red layer  (a 2D grayscale grid)
green = img[:, :, 1]    # just the green layer
blue  = img[:, :, 2]    # just the blue layer
print(red.shape)        # (140, 140)
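
The split also works in reverse. The sketch below uses a small made-up array in place of a loaded photo, so it runs without any image file, and rebuilds the colour image by stacking the three 2D grids along a new last axis with `np.stack`.

```python
import numpy as np

# Stand-in for a loaded photo: a made-up colour image, height 2, width 3.
img = np.array([
    [[255, 215, 70], [30, 120, 255], [0, 0, 0]],
    [[255, 0, 0],    [0, 255, 0],    [0, 0, 255]],
], dtype=np.uint8)

# Split into three 2D grids, one per channel.
red   = img[:, :, 0]
green = img[:, :, 1]
blue  = img[:, :, 2]

# Stack the grids along a new last axis to rebuild the colour image.
rebuilt = np.stack([red, green, blue], axis=-1)
same_image = bool(np.array_equal(rebuilt, img))

print(red.shape)       # (2, 3)
print(rebuilt.shape)   # (2, 3, 3)
print(same_image)      # True
```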

That extra 3 on the end of the shape is a big deal. An image is no longer a flat grid — it's a small stack of grids. In the next chapter we'll give that stack its proper name and see why thinking of images this way is exactly what neural networks need.