Creating Arrays¶
NumPy has many ways to create an array. You'll use the same handful 90% of the time.
From a Python list — np.array()¶
From a list of lists → 2D array:
import numpy as np
b = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
print(b)
print("shape:", b.shape)
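The same call also accepts a flat list, which yields a 1D array. A minimal sketch (the variable names are only illustrative) showing how the nesting depth of the list determines the number of dimensions:

```python
import numpy as np

# A flat list gives a 1D array
a = np.array([1, 2, 3])
print(a.shape, a.ndim)

# A list of equal-length lists gives a 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.shape, b.ndim)
```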
np.zeros() — all zeros¶
import numpy as np
print(np.zeros(5)) # 1D, length 5
print(np.zeros((2, 3))) # 2D, 2 rows × 3 cols
print(np.zeros((2, 2, 2))) # 3D
np.ones() — all ones¶
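A minimal usage sketch, with illustrative variable names. The shape argument works the same way as for np.zeros():

```python
import numpy as np

ones_1d = np.ones(4)        # 1D, length 4, float64 by default
ones_2d = np.ones((2, 3))   # 2D, 2 rows x 3 cols
print(ones_1d)
print(ones_2d)
```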
np.full() — filled with any value¶
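A minimal usage sketch, with illustrative values. np.full() takes the shape first and the fill value second:

```python
import numpy as np

sevens = np.full((2, 3), 7)   # 2 rows x 3 cols, every element is 7
halves = np.full(4, 0.5)      # 1D, length 4, every element is 0.5
print(sevens)
print(halves)
```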
np.arange() — like Python's range()¶
import numpy as np
print(np.arange(10)) # 0 to 9
print(np.arange(1, 11)) # 1 to 10
print(np.arange(0, 20, 2)) # step of 2
print(np.arange(1, 0, -0.1)) # floats also work
np.linspace() — evenly spaced points¶
linspace(start, stop, n) — n points from start to stop, inclusive on both ends.
import numpy as np
print(np.linspace(0, 1, 5)) # 5 points from 0 to 1
print(np.linspace(0, 10, 11)) # 11 points from 0 to 10
print(np.linspace(-1, 1, 9)) # 9 points from -1 to 1
arange vs linspace:
- arange — give a step, get however many points fit.
- linspace — give the number of points, get the step automatically.
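To make the contrast concrete, here is a small sketch. It relies on linspace's retstep=True option, which returns the computed step alongside the array; the variable names are illustrative.

```python
import numpy as np

# arange: the step is fixed, the number of points follows from it
by_step = np.arange(0, 10, 2.5)
print(by_step, len(by_step))    # stop (10) is excluded

# linspace: the number of points is fixed, the step follows from it
by_count, step = np.linspace(0, 10, 5, retstep=True)
print(by_count, step)           # stop (10) is included
```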
np.eye() — identity matrix¶
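A minimal usage sketch. Besides the size, np.eye() accepts a column count, a diagonal offset k, and a dtype; the variable names here are illustrative.

```python
import numpy as np

identity = np.eye(3)             # 3x3, ones on the main diagonal
print(identity)

shifted = np.eye(3, k=1)         # ones on the diagonal just above the main one
print(shifted)

rect = np.eye(2, 3, dtype=int)   # non-square: 2 rows, 3 columns, integer entries
print(rect)
```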
np.empty() — uninitialized¶
Faster than zeros because it doesn't bother filling with zeros — but the contents are garbage (whatever was in memory). Only useful when you're about to overwrite the whole thing.
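A sketch of the intended pattern, assuming a simple loop that writes every slot before anything reads it:

```python
import numpy as np

n = 5
squares = np.empty(n)        # contents are arbitrary at this point
for i in range(n):
    squares[i] = i ** 2      # every element is overwritten before being read
print(squares)
```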
np.random — random arrays¶
import numpy as np
# Set the seed for reproducibility
rng = np.random.default_rng(seed=42)
print(rng.random(5)) # 5 floats in [0, 1)
print(rng.integers(0, 100, size=10)) # 10 ints in [0, 100)
print(rng.normal(loc=0, scale=1, size=5)) # 5 samples from N(0, 1)
print(rng.choice(["red", "green", "blue"], size=5))
We'll deep-dive into random in chapter 11.
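Seeding is what makes the output repeatable. A small sketch, assuming two generators created with the same seed, that also checks the exclusive upper bound of integers():

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draw_a = rng_a.random(3)
draw_b = rng_b.random(3)
print(np.array_equal(draw_a, draw_b))      # same seed, same sequence

ints = rng_a.integers(0, 100, size=10)
print(ints.min() >= 0, ints.max() < 100)   # upper bound is exclusive
```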
dtype — choose the number type¶
By default, integer lists → int64 (int32 on Windows with NumPy older than 2.0) and float lists → float64. You can pick a different type with the dtype argument:
import numpy as np
a = np.array([1, 2, 3], dtype=np.float32)
print(a, a.dtype)
b = np.array([1.5, 2.5, 3.5], dtype=np.int32)
print(b, b.dtype) # decimals dropped!
# Common dtypes
print(np.zeros(3, dtype=bool)) # all False
print(np.zeros(3, dtype=np.uint8)) # unsigned 8-bit ints, range 0-255
print(np.array(["a", "bb", "ccc"]).dtype) # '<U3' (Unicode str, 3 chars)
Smaller dtypes = less memory. A 1-million-element float64 array is 8 MB; float32 is 4 MB; int8 is 1 MB. For ML on big data, choosing the right dtype matters.
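These figures can be checked with the nbytes attribute. A short sketch (the sizes dictionary is just a helper for this example):

```python
import numpy as np

n = 1_000_000
sizes = {}
for dtype in (np.float64, np.float32, np.int8):
    arr = np.zeros(n, dtype=dtype)
    # nbytes = number of elements * bytes per element (itemsize)
    sizes[arr.dtype.name] = arr.nbytes
    print(arr.dtype.name, arr.itemsize, "byte(s) per element,", arr.nbytes / 1e6, "MB")
```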
_like versions — match the shape and dtype of another array¶
import numpy as np
original = np.array([[1, 2, 3], [4, 5, 6]])
print(np.zeros_like(original))
print(np.ones_like(original))
print(np.full_like(original, 9))
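Note that these functions copy the dtype as well as the shape. A sketch illustrating this, with a hypothetical fill value of 9.5 chosen to show what happens with an integer array:

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])   # integer array

z = np.zeros_like(original)
print(z.shape, z.dtype == original.dtype)     # same shape, same dtype

# The fill value is cast to the integer dtype, so the fraction is lost
f = np.full_like(original, 9.5)
print(f)

# Pass dtype explicitly to override the inherited one
g = np.full_like(original, 9.5, dtype=float)
print(g)
```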
Quick-reference¶
| Need | Function |
|---|---|
| From a list | np.array(lst) |
| All zeros | np.zeros(shape) |
| All ones | np.ones(shape) |
| Any constant | np.full(shape, value) |
| Range with step | np.arange(start, stop, step) |
| N evenly-spaced points | np.linspace(start, stop, n) |
| Identity matrix | np.eye(n) |
| Random | np.random.default_rng().random(shape) |
| Match shape of another | np.zeros_like(arr) |
Mini-exercise¶
Create a 5×5 array where the diagonal is 1, everything else is 0. Two ways:
import numpy as np
# Way 1
print(np.eye(5))
# Way 2 — manual
a = np.zeros((5, 5), dtype=int)
for i in range(5):
    a[i, i] = 1
print(a)
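The loop in Way 2 can also be replaced by a single call. A sketch of two loop-free variants, using np.fill_diagonal (which modifies the array in place) and np.diag (which builds a matrix from a 1D array of diagonal values):

```python
import numpy as np

# Variant A: start from zeros, then set the diagonal in place
a = np.zeros((5, 5), dtype=int)
np.fill_diagonal(a, 1)

# Variant B: build the matrix directly from its diagonal
b = np.diag(np.ones(5, dtype=int))

print(np.array_equal(a, b))
print(np.array_equal(a, np.eye(5, dtype=int)))
```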
Common pitfalls¶
- ❗ np.array(1, 2, 3) doesn't work: wrap the values in a list, as in np.array([1, 2, 3]).
- ❗ Mixed-type list: np.array([1, 2.0, "3"]) produces a string array.
- ❗ np.empty() for working data: it has garbage in it. Use zeros unless performance is critical.
- ❗ arange() with floats: float arithmetic isn't exact, so the element count can be off by one. np.arange(1, 1.3, 0.1) returns 4 elements, not 3, because (1.3 - 1) / 0.1 rounds to slightly more than 3. Use linspace for floats.
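A short sketch that reproduces these pitfalls. It uses np.arange(1, 1.3, 0.1) as an example where rounding adds an extra element; the variable names are illustrative.

```python
import numpy as np

# Mixed-type list: everything is converted to strings
mixed = np.array([1, 2.0, "3"])
print(mixed.dtype.kind)          # 'U' means Unicode string

# Float step: the element count can be one more than expected
stepped = np.arange(1, 1.3, 0.1)
print(len(stepped), stepped)

# linspace: the count is stated explicitly
counted = np.linspace(1, 1.2, 3)
print(len(counted), counted)

# Missing list brackets raise an error
bracket_error = None
try:
    np.array(1, 2, 3)
except TypeError as exc:
    bracket_error = exc
    print("TypeError:", exc)
```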
Practice¶
What does this print?
Expected: [0. 0.25 0.5 0.75 1. ]
Create exactly 11 evenly-spaced points from 0 to 10 (inclusive)
Expected: [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
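One possible solution to the second exercise, sketched with linspace since it takes the number of points directly:

```python
import numpy as np

points = np.linspace(0, 10, 11)   # 11 points, both ends included
print(points)
```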
Quiz — Quick check¶
What you remember
Q1. What's the shape of np.zeros((2, 3))?
- (6,)
- (2, 3): 2 rows, 3 columns
- (3, 2)
- 2
Why: Pass a tuple to specify a multi-dimensional shape. (2, 3) means 2 along the first axis (rows) and 3 along the second (columns).
Q2. Do np.linspace(0, 10, 5) and np.arange(0, 10, 2.5) give identical output?
- They're identical
- No: linspace includes the endpoint (10); arange excludes it
- arange is for floats only
- linspace is deprecated
Why: linspace(0, 10, 5) → [0, 2.5, 5, 7.5, 10] (5 points, endpoint included). arange(0, 10, 2.5) → [0, 2.5, 5, 7.5] (stops before 10).
Q3. Why prefer np.linspace over np.arange when working with floats?
- Float arithmetic with arange can produce unexpected element counts due to rounding
- linspace is faster
- arange only works on integers
- linspace uses less memory
Why: np.arange(1, 1.3, 0.1) returns 4 elements instead of the expected 3 because of float rounding. linspace lets you specify the count exactly.
Common doubts¶
Why do I need np.zeros((2, 3)) instead of np.zeros(2, 3)?
NumPy expects the shape as a single argument: a tuple. np.zeros((2, 3)) says "shape is (2, 3)". np.zeros(2, 3) passes two positional arguments, and zeros treats the second one as the dtype, so the call fails with a TypeError saying that 3 cannot be interpreted as a data type.
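A quick sketch of both calls. The exact wording of the error message depends on the NumPy version, so the example only captures it rather than matching on the text:

```python
import numpy as np

ok = np.zeros((2, 3))    # shape passed as one tuple
print(ok.shape)

message = None
try:
    np.zeros(2, 3)       # 3 is taken as the dtype argument
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)
```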
What's np.empty for if its contents are garbage?
Performance. np.empty doesn't waste time zeroing memory. Use it only when you'll immediately overwrite every element (e.g. filling in a loop). For most code, np.zeros is the safe default.
When should I use a smaller dtype like float32 instead of float64?
For ML on large datasets — float32 halves memory and on modern CPUs/GPUs is often faster than float64. For scientific computing where precision matters (e.g. accumulating sums of many small numbers), stay with float64.
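To illustrate the precision side of this trade-off, a small sketch: float32 has a 24-bit significand, so from 2**24 upward it can no longer represent every integer, while float64 still can.

```python
import numpy as np

big32 = np.float32(2 ** 24)                  # 16777216
lost32 = (big32 + np.float32(1)) == big32    # the +1 is rounded away in float32
print(lost32)

big64 = np.float64(2 ** 24)
lost64 = (big64 + np.float64(1)) == big64    # float64 keeps the +1
print(lost64)
```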