Broadcasting¶
Broadcasting is how NumPy lets you do math between arrays of different shapes. Once you understand it, you can replace many explicit loops with a single expression.
The simplest case — scalar + array¶
The scalar 5 is broadcast to the same shape as a.
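The snippet for this case is not shown here. A minimal sketch, assuming a is a small 1D integer array (the values are illustrative, not from the original):

```python
import numpy as np

# Hypothetical array; the original values of `a` are not shown.
a = np.array([1, 2, 3])

# The scalar 5 behaves as if it were an array of shape (3,) filled with 5.
result = a + 5
print(result)        # [6 7 8]
print(result.shape)  # (3,)
```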
Adding a 1D row to every row of a 2D array¶
import numpy as np
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
row = np.array([10, 20, 30])
print(matrix + row)
Result:
[[11 22 33]
 [14 25 36]
 [17 28 39]]
NumPy stretched row from shape (3,) to (3, 3), adding it to every row.
Adding a column¶
To add a different value to each row (a column vector), reshape first:
import numpy as np
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
col = np.array([100, 200, 300])
# Without reshaping, (3,) lines up with the last axis, so col is added to every row. Not what we want here.
print(matrix + col)
# Reshape to a column
col2 = col.reshape(-1, 1) # shape (3, 1)
print(matrix + col2)
Output of matrix + col2:
[[101 102 103]
 [204 205 206]
 [307 308 309]]
The rules of broadcasting¶
NumPy compares shapes right-to-left, one axis at a time. If one shape is shorter, its missing leading axes are treated as size 1. Two dimensions are compatible if:
- They are equal, OR
- One of them is 1.
If any pair of dimensions is incompatible, NumPy raises a ValueError.
| shape A | shape B | Compatible? | Result shape |
|---|---|---|---|
| (3, 4) | (4,) | yes | (3, 4) |
| (3, 4) | (3, 1) | yes | (3, 4) |
| (3, 4) | (1, 4) | yes | (3, 4) |
| (3, 4) | (3, 4) | yes | (3, 4) |
| (3, 4) | (3, 2) | NO | error |
| (3, 1, 5) | (4, 5) | yes | (3, 4, 5) |
Walk-through for (3, 4) + (4,):
A: (3, 4)
B:    (4,)   ← align right
Last axis: 4 == 4, ok
First axis: B has none → treated as 1 → stretches to 3
Result: (3, 4), same as A
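The table above can be checked mechanically. A small sketch, assuming NumPy 1.20 or later, where np.broadcast_shapes is available:

```python
import numpy as np

# Each pair of shapes comes from the compatibility table.
pairs = [
    ((3, 4), (4,)),
    ((3, 4), (3, 1)),
    ((3, 4), (1, 4)),
    ((3, 4), (3, 4)),
    ((3, 4), (3, 2)),
    ((3, 1, 5), (4, 5)),
]

results = []
for shape_a, shape_b in pairs:
    try:
        # Computes the broadcast result shape without allocating any arrays.
        out = np.broadcast_shapes(shape_a, shape_b)
    except ValueError:
        out = None  # incompatible shapes
    results.append(out)
    print(shape_a, "+", shape_b, "->", out if out is not None else "error")
```

Checking shapes this way is cheap, so it can be handy as a quick sanity test before running an expensive computation.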
A practical example — feature scaling¶
For ML, you often need to standardize features: subtract each column's mean and divide by its standard deviation:
import numpy as np
# 5 samples, 3 features
data = np.array([
    [1.0, 2.0, 100.0],
    [2.0, 3.0, 200.0],
    [3.0, 4.0, 300.0],
    [4.0, 5.0, 400.0],
    [5.0, 6.0, 500.0],
])
# Per-column statistics — shape (3,)
mean = data.mean(axis=0)
std = data.std(axis=0)
print("mean:", mean)
print("std :", std)
# Broadcasting: (5,3) - (3,) → (5,3)
scaled = (data - mean) / std
print("\nscaled:")
print(scaled)
print("scaled.mean(axis=0):", scaled.mean(axis=0).round(4))
print("scaled.std(axis=0) :", scaled.std(axis=0).round(4))
Each column now has mean ≈ 0 and std ≈ 1. That's a one-liner thanks to broadcasting.
Outer product — every combination¶
A column of shape (M, 1) times a row of shape (1, N) gives an (M, N) table of all products:
import numpy as np
x = np.arange(1, 6) # shape (5,)
y = np.arange(1, 4) # shape (3,)
# Reshape to column × row
table = x[:, None] * y[None, :]
print(table)
print("shape:", table.shape) # (5, 3)
x[:, None] is shape (5, 1); y[None, :] is (1, 3). Broadcasting expands both → (5, 3).
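For 1D inputs, np.outer computes the same table. A quick check, reusing the x and y defined above:

```python
import numpy as np

x = np.arange(1, 6)  # shape (5,)
y = np.arange(1, 4)  # shape (3,)

via_broadcasting = x[:, None] * y[None, :]
via_outer = np.outer(x, y)

print(np.array_equal(via_broadcasting, via_outer))  # True
```

The broadcasting form has the advantage of working with any elementwise operation (for example + or a comparison), not only multiplication.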
Coordinate grids¶
import numpy as np
x = np.arange(-2, 3)
y = np.arange(-2, 3)
# Build all (x, y) combinations
xx, yy = np.meshgrid(x, y)
print("xx:")
print(xx)
print("yy:")
print(yy)
# Compute z = x² + y² at every grid point
z = xx**2 + yy**2
print("\nz:")
print(z)
Grids like this are useful for plotting surfaces, building image filters, and evaluating mathematical functions over a 2D domain.
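The same z can be computed with plain broadcasting, without building xx and yy first. A sketch, assuming the default indexing="xy" of np.meshgrid (rows follow y, columns follow x):

```python
import numpy as np

x = np.arange(-2, 3)
y = np.arange(-2, 3)

# Full grids from meshgrid.
xx, yy = np.meshgrid(x, y)
z_grid = xx**2 + yy**2

# Broadcasting version: x as a row (1, 5), y as a column (5, 1) -> (5, 5).
z_bcast = x[None, :]**2 + y[:, None]**2

print(np.array_equal(z_grid, z_bcast))  # True
```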
When broadcasting fails — visualize the shapes¶
import numpy as np
a = np.zeros((3, 4))
b = np.zeros((3, 2))
try:
    a + b
except ValueError as e:
    print("Error:", e)
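(3, 4) + (3, 2): the last dims 4 and 2 are not equal and neither is 1, so the operation fails.
No reshape can make these two shapes compatible, because broadcasting only stretches axes of size 1. When the mismatch comes from a missing axis instead, such as (3, 4) + (3,), fix it by aligning the shapes explicitly (often with reshape or [:, None]).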
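A sketch of a failure that explicit alignment does fix, using a hypothetical per-row vector of shape (3,):

```python
import numpy as np

a = np.zeros((3, 4))
per_row = np.array([1.0, 2.0, 3.0])  # hypothetical values, shape (3,)

try:
    a + per_row  # (3, 4) + (3,): last dims 4 and 3 clash
except ValueError as e:
    print("Error:", e)

# Add a trailing axis so the shapes align: (3, 4) + (3, 1) -> (3, 4)
fixed = a + per_row[:, None]
print(fixed.shape)  # (3, 4)
```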
More examples¶
Distance from each point to a center:
import numpy as np
points = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
])
center = np.array([0, 0])
# (4, 2) - (2,) → (4, 2)
diffs = points - center
print("diffs:")
print(diffs)
# Euclidean distance per point
distances = np.sqrt((diffs ** 2).sum(axis=1))
print("distances:", distances)
Multiplication table:
import numpy as np
n = 10
nums = np.arange(1, n + 1)
table = nums[:, None] * nums[None, :]
print(table)
Broadcasting and memory¶
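Broadcasting doesn't copy the smaller array to match the larger one. NumPy instead gives the stretched axis a stride of 0, so every position along that axis reads the same memory. The result of the operation is still a new full-size array, but no expanded copy of the input is ever built, which keeps broadcasting fast and memory-efficient.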
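The stride behaviour can be inspected directly with np.broadcast_to, which returns the stretched view without doing any arithmetic. A minimal sketch; the dtype is fixed to int64 here so the stride values are the same on every platform:

```python
import numpy as np

row = np.array([10, 20, 30], dtype=np.int64)

# broadcast_to returns a read-only view with the requested shape.
view = np.broadcast_to(row, (3, 3))

print(view.strides)                 # (0, 8): stride 0 means every row reuses the same bytes
print(np.shares_memory(row, view))  # True
print(view.flags.writeable)         # False
```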
Cheatsheet — common patterns¶
| Goal | Pattern |
|---|---|
| Add scalar to array | arr + 5 |
| Add row vector to every row | arr (M,N) + row (N,) |
| Add col vector to every col | arr (M,N) + col[:, None] (M,1) |
| Outer product | a[:, None] * b[None, :] |
| Normalize each column | (arr - arr.mean(axis=0)) / arr.std(axis=0) |
| Normalize each row | (arr - arr.mean(axis=1, keepdims=True)) / arr.std(axis=1, keepdims=True) |
keepdims=True is the trick — keeps the reduced dim as size 1 so it broadcasts back.
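A short sketch of the row-normalization pattern from the table, on a small made-up array:

```python
import numpy as np

# Hypothetical data, shape (2, 3)
arr = np.array([
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
])

row_mean = arr.mean(axis=1, keepdims=True)  # shape (2, 1)
row_std = arr.std(axis=1, keepdims=True)    # shape (2, 1)

# (2, 3) combined with (2, 1) broadcasts back to (2, 3)
normalized = (arr - row_mean) / row_std

print(row_mean.shape)                    # (2, 1)
print(normalized.mean(axis=1).round(4))  # each row mean is close to 0
print(normalized.std(axis=1).round(4))   # each row std is close to 1
```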
Common pitfalls¶
- ❗ Forgetting keepdims=True: arr.sum(axis=1) reduces shape from (M, N) to (M,). Then arr - that raises a ValueError when M ≠ N, or silently subtracts along the wrong axis when M = N. Use arr.sum(axis=1, keepdims=True) (shape (M, 1)) so it broadcasts back to (M, N).
- ❗ Adding a row instead of a column: always check the shapes. (M, N) + (M,) will FAIL or do the wrong thing. Reshape to (M, 1).
- ❗ Operator precedence in & / |: wrap parts in parens: (a > 1) & (a < 5), not a > 1 & a < 5.
- ❗ Mixing dtypes: int + float → float. Sometimes surprising.
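The keepdims pitfall is most dangerous for square arrays, where nothing raises an error. A sketch with an illustrative 3×3 array, dividing each row by its own sum:

```python
import numpy as np

# Square array (M == N), so the mistake does not raise an error.
arr = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
])

row_sums = arr.sum(axis=1)                    # shape (3,)
wrong = arr / row_sums                        # divides column j by row_sums[j]
right = arr / arr.sum(axis=1, keepdims=True)  # divides row i by its own sum

print(wrong.sum(axis=1))  # rows do not sum to 1
print(right.sum(axis=1))  # every row sums to 1 (up to rounding)
```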
Practice¶
What does this print?
Expected: [[11 22 33] [14 25 36]]
Add col to every column (not every row) of the matrix
Expected: [[101 102 103] [204 205 206] [307 308 309]]
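The starter snippets for these exercises are not shown here. For the second one, a possible solution sketch, assuming the 3×3 matrix and col = [100, 200, 300] used earlier on this page:

```python
import numpy as np

# Assumed inputs: the 3x3 matrix and col used earlier on this page.
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
col = np.array([100, 200, 300])

# Turn col into shape (3, 1) so each row gets its own offset.
result = matrix + col[:, None]
print(result)
```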
Quiz — Quick check¶
What you remember
Q1. Broadcasting (3, 4) + (4,) produces a result of shape…
- (3, 4)
- (3, 1)
- (4, 4)
- Error: shapes don't match
Why: NumPy aligns shapes right-to-left. (4,) becomes (1, 4), then stretches to (3, 4), the same shape as the first operand.
Q2. Why does (3, 4) + (3, 2) fail?
- Last dims (4 and 2) are neither equal nor 1
- First dims don't match
- NumPy doesn't broadcast at all
- You need np.broadcast
Why: Two dimensions can broadcast only if they're equal OR one is 1. Neither applies to 4 vs 2, so NumPy raises ValueError.
Q3. When normalizing columns of a (M, N) array with (arr - arr.mean(axis=0)) / arr.std(axis=0), which axis is correct?
- axis=0: collapses rows, gives per-column stats
- axis=1
- axis=-1
- No axis needed
Why: axis=0 reduces along the first axis (rows), producing per-column statistics. Then broadcasting expands the result back to (M, N).
Common doubts¶
Does broadcasting actually copy memory?
No — it uses strides to pretend the smaller array is bigger. The data isn't duplicated. That's why broadcasting is both fast and memory-efficient.
Why do people use keepdims=True so often?
Because reductions collapse a dimension. After arr.mean(axis=1) for shape (M, N), you get (M,). Subtracting that from the original raises a ValueError when M ≠ N and silently subtracts along the wrong axis when M = N. keepdims=True keeps the dimension as size 1 ((M, 1)), which broadcasts cleanly back to (M, N).
When should I reach for np.meshgrid vs broadcasting x[:, None] * y[None, :]?
They produce the same values when the axes are arranged the same way. np.meshgrid is more explicit and returns both xx and yy as full 2D arrays, which is convenient for plotting. The [:, None] trick is shorter, never builds the intermediate grids, and produces just the result. Use whichever is clearer in context.