
Creating Arrays

NumPy has many ways to create an array, but a small handful covers most everyday work.

From a Python list — np.array()

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a)
print(type(a))

From a list of lists → 2D array:

import numpy as np

b = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
print(b)
print("shape:", b.shape)

np.zeros() — all zeros

import numpy as np

print(np.zeros(5))             # 1D, length 5
print(np.zeros((2, 3)))         # 2D, 2 rows × 3 cols
print(np.zeros((2, 2, 2)))      # 3D

np.ones() — all ones

import numpy as np

print(np.ones(4))
print(np.ones((3, 3)))

np.full() — filled with any value

import numpy as np

print(np.full(5, 7))                # five 7s
print(np.full((2, 3), 3.14))         # 2x3 of 3.14

np.arange() — like Python's range()

import numpy as np

print(np.arange(10))                 # 0 to 9
print(np.arange(1, 11))              # 1 to 10
print(np.arange(0, 20, 2))           # step of 2
print(np.arange(1, 0, -0.1))         # float and negative steps work (but see the float pitfall below)

np.linspace() — evenly spaced points

linspace(start, stop, n) returns n points from start to stop, inclusive on both ends by default.

import numpy as np

print(np.linspace(0, 1, 5))       # 5 points from 0 to 1
print(np.linspace(0, 10, 11))     # 11 points from 0 to 10
print(np.linspace(-1, 1, 9))      # 9 points from -1 to 1
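With n points there are n - 1 gaps, so the spacing linspace picks is (stop - start) / (n - 1). A small sketch that makes this visible using linspace's retstep and endpoint keyword arguments; the start, stop and n values are just illustrative:

```python
import numpy as np

start, stop, n = 0.0, 1.0, 5

# retstep=True also returns the spacing linspace chose
points, step = np.linspace(start, stop, n, retstep=True)
print(points)
print(step, (stop - start) / (n - 1))  # both are 0.25

# endpoint=False leaves stop out, so the n points span n equal gaps instead
half_open = np.linspace(start, stop, n, endpoint=False)
print(half_open)                       # 0.0 up to 0.8 in steps of 0.2
```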

arange vs linspace:
  • arange: give a step, get however many points fit.
  • linspace: give the number of points, get the step automatically.
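The NumPy documentation gives arange's length as ceil((stop - start) / step), so you can predict the count before calling it. A quick sketch checking that formula, with linspace shown going the other way (fix the count, derive the step); the numbers are arbitrary examples:

```python
import math
import numpy as np

start, stop, step = 0, 20, 3

# arange: the step is fixed, the count follows from it
a = np.arange(start, stop, step)
predicted = math.ceil((stop - start) / step)
print(a, len(a), predicted)           # 7 points: 0, 3, ..., 18

# linspace: the count is fixed, the step follows from it
n = 5
b = np.linspace(start, stop, n)
print(b, (stop - start) / (n - 1))    # step of 5.0
```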

np.eye() — identity matrix

import numpy as np

print(np.eye(4))            # 4x4 identity
print(np.eye(3, 5))         # 3x5 with diagonal of ones

np.empty() — uninitialized

It can be faster than zeros because it skips initializing the memory, but the contents are arbitrary (whatever happened to be in that memory). Only use it when you're about to overwrite every element.

import numpy as np

# Don't rely on the values!
print(np.empty((2, 3)))
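The safe pattern for np.empty is: allocate, then write every element before reading any of them. A minimal sketch, where the squares example is purely illustrative:

```python
import numpy as np

n = 5
squares = np.empty(n, dtype=np.int64)  # contents are arbitrary at this point
for i in range(n):
    squares[i] = i * i                 # every slot gets overwritten
print(squares)                         # [ 0  1  4  9 16]
```

In real code a vectorized expression such as np.arange(n) ** 2 would replace the loop; the loop is only there to show the fill-before-read rule.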

np.random — random arrays

import numpy as np

# Set the seed for reproducibility
rng = np.random.default_rng(seed=42)

print(rng.random(5))                       # 5 floats in [0, 1)
print(rng.integers(0, 100, size=10))       # 10 ints in [0, 100)
print(rng.normal(loc=0, scale=1, size=5))  # 5 samples from N(0, 1)
print(rng.choice(["red", "green", "blue"], size=5))
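What "reproducibility" means here: generators built from the same seed produce the same sequence, so reruns give identical arrays. A short sketch, with 42 and 7 chosen arbitrarily:

```python
import numpy as np

# Two generators built from the same seed produce the same stream
first = np.random.default_rng(seed=42).random(3)
second = np.random.default_rng(seed=42).random(3)

# A different seed gives a different stream
other = np.random.default_rng(seed=7).random(3)

print(np.array_equal(first, second))  # True
print(np.array_equal(first, other))   # False
```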

We'll deep-dive into random in chapter 11.

dtype — choose the number type

By default, integer lists → int64 and float lists → float64 (on Windows with NumPy older than 2.0, the default integer is int32). You can pick the dtype explicitly:

import numpy as np

a = np.array([1, 2, 3], dtype=np.float32)
print(a, a.dtype)

b = np.array([1.5, 2.5, 3.5], dtype=np.int32)
print(b, b.dtype)         # decimals dropped!

# Common dtypes
print(np.zeros(3, dtype=bool))            # all False
print(np.zeros(3, dtype=np.uint8))        # 0-255
print(np.array(["a", "bb", "ccc"]).dtype) # '<U3' (Unicode str, 3 chars)
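To change the dtype of an array that already exists, .astype() returns a converted copy. A short sketch; note that float-to-int conversion truncates toward zero rather than rounding, so -1.7 becomes -1:

```python
import numpy as np

floats = np.array([-1.7, 0.5, 2.9])
ints = floats.astype(np.int32)   # new array; `floats` is unchanged
print(ints, ints.dtype)          # [-1  0  2] int32

# Round first if you want nearest-integer behaviour
rounded = np.round(floats).astype(np.int32)
print(rounded)                   # [-2  0  3] (np.round sends .5 to the nearest even number)
```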

Smaller dtypes = less memory. A 1-million-element float64 array is 8 MB; float32 is 4 MB; int8 is 1 MB. For ML on big data, choosing the right dtype matters.
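You can confirm those sizes with the nbytes attribute, which equals the element count times itemsize (bytes per element). A quick check using the same one-million-element size:

```python
import numpy as np

n = 1_000_000
for dt in (np.float64, np.float32, np.int8):
    arr = np.zeros(n, dtype=dt)
    # nbytes = number of elements x bytes per element (itemsize)
    print(arr.dtype, arr.itemsize, "bytes each,", arr.nbytes / 1e6, "MB total")
```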

The _like versions: match the shape and dtype of another array

import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

print(np.zeros_like(original))
print(np.ones_like(original))
print(np.full_like(original, 9))
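Because the _like functions copy the template's dtype too, a fractional fill value on an integer template gets truncated. A sketch of the pitfall and the dtype override that avoids it:

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])   # integer dtype

# The fill value is cast to the template's dtype, so 2.5 becomes 2
same_dtype = np.full_like(original, 2.5)
print(same_dtype)

# Override dtype to keep the fraction
as_float = np.full_like(original, 2.5, dtype=float)
print(as_float)
```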

Quick-reference

Need                      Function
From a list               np.array(lst)
All zeros                 np.zeros(shape)
All ones                  np.ones(shape)
Any constant              np.full(shape, value)
Range with step           np.arange(start, stop, step)
N evenly-spaced points    np.linspace(start, stop, n)
Identity matrix           np.eye(n)
Random                    np.random.default_rng().random(shape)
Match shape of another    np.zeros_like(arr)

Mini-exercise

Create a 5×5 array where the diagonal is 1, everything else is 0. Two ways:

import numpy as np

# Way 1: built in (eye defaults to float; dtype=int makes it match Way 2)
print(np.eye(5, dtype=int))

# Way 2 — manual
a = np.zeros((5, 5), dtype=int)
for i in range(5):
    a[i, i] = 1
print(a)
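Two more built-ins produce the same matrix: np.identity (a square-only relative of eye) and np.fill_diagonal, which writes the diagonal in place. A quick sketch checking that all three approaches agree:

```python
import numpy as np

via_eye = np.eye(5, dtype=int)
via_identity = np.identity(5, dtype=int)

via_fill = np.zeros((5, 5), dtype=int)
np.fill_diagonal(via_fill, 1)     # modifies via_fill in place, returns None

print(np.array_equal(via_eye, via_identity), np.array_equal(via_eye, via_fill))  # True True
```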

Common pitfalls

  • np.array(1, 2, 3) doesn't work — must wrap in a list: np.array([1, 2, 3]).
  • Mixed-type lists: np.array([1, 2.0, "3"]) produces a string array.
  • np.empty() for working data — it has garbage in it. Use zeros unless performance is critical.
  • arange() with floats: float arithmetic isn't exact, so the element count can surprise you. np.arange(1, 1.3, 0.1) returns 4 elements, the last one about 1.3, even though the stop value is supposed to be excluded. Use linspace for floats.
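The string-promotion and float-arange pitfalls are easy to see directly. A small sketch; the 1 to 1.3 range is one well-known case where rounding lets the stop value in:

```python
import numpy as np

# Mixed types: everything is promoted to a string
mixed = np.array([1, 2.0, "3"])
print(mixed, mixed.dtype.kind)   # kind 'U' means Unicode string

# arange with a float step: rounding lets the stop value sneak in
surprise = np.arange(1, 1.3, 0.1)
print(len(surprise), surprise)   # 4 elements, the last one about 1.3

# linspace: you state the count, so there is no surprise
safe = np.linspace(1, 1.2, 3)
print(len(safe), safe)           # 3 elements
```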

Practice

What does this print?

Expected: [0. 0.25 0.5 0.75 1. ]

import numpy as np
print(np.linspace(0, 1, 5))

Create exactly 11 evenly-spaced points from 0 to 10 (inclusive)

Expected: [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]

import numpy as np
print(np.linspace(0, 10, 11))   # 11 points, 10 included; np.arange(0, 10) would stop at 9

Quiz — Quick check

What you remember

Q1. What's the shape of np.zeros((2, 3))?

  • (6,)
  • (2, 3) — 2 rows, 3 columns
  • (3, 2)
  • 2

Why: Pass a tuple to specify a multi-dimensional shape. (2, 3) means 2 in the first axis (rows), 3 in the second (cols).

Q2. Do np.linspace(0, 10, 5) and np.arange(0, 10, 2.5) give identical output?

  • They're identical
  • No — linspace includes the endpoint (10); arange excludes it
  • arange is for floats only
  • linspace is deprecated

Why: linspace(0, 10, 5) → [0, 2.5, 5, 7.5, 10] (5 points, endpoint included). arange(0, 10, 2.5) → [0, 2.5, 5, 7.5] (stops before 10).

Q3. Why prefer np.linspace over np.arange when working with floats?

  • Float arithmetic with arange can produce unexpected element counts due to rounding
  • linspace is faster
  • arange only works on integers
  • linspace uses less memory

Why: float rounding can change how many elements arange produces; np.arange(1, 1.3, 0.1) gives 4 elements, including a value of about 1.3 that should have been excluded. linspace lets you specify the count exactly.

Common doubts

Why do I need np.zeros((2, 3)) instead of np.zeros(2, 3)?

NumPy expects the shape as a single argument: a tuple. np.zeros((2, 3)) says "shape is (2, 3)". np.zeros(2, 3) passes two positional arguments, and zeros treats the second one as the dtype, so you get a confusing TypeError instead of a 2×3 array. (For 1D, a bare int such as np.zeros(5) is fine.)
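A quick sketch of both forms side by side; the exact error message differs between NumPy versions, so only the exception type is relied on here:

```python
import numpy as np

print(np.zeros((2, 3)).shape)   # (2, 3): the tuple is the shape
print(np.zeros(4).shape)        # (4,): a bare int is fine for 1D

try:
    np.zeros(2, 3)              # 3 lands in the dtype slot
except TypeError as err:
    print("TypeError:", err)
    raised = True
else:
    raised = False
```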

What's np.empty for if its contents are garbage?

Performance. np.empty doesn't waste time zeroing memory. Use it only when you'll immediately overwrite every element (e.g. filling in a loop). For most code, np.zeros is the safe default.

When should I use a smaller dtype like float32 instead of float64?

For ML on large datasets — float32 halves memory and on modern CPUs/GPUs is often faster than float64. For scientific computing where precision matters (e.g. accumulating sums of many small numbers), stay with float64.
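The precision gap is easy to demonstrate: float32 has a 24-bit significand, so above 2**24 it can no longer represent every integer, and small additions simply vanish. A short sketch:

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable value
print(np.finfo(np.float32).eps)   # about 1.2e-07
print(np.finfo(np.float64).eps)   # about 2.2e-16

# 2**24 + 1 cannot be stored exactly in float32, so adding 1 is lost
big32 = np.float32(2**24)
lost = (big32 + np.float32(1)) == big32
print(lost)                       # True

# float64 has enough significand bits to keep the + 1
big64 = np.float64(2**24)
print((big64 + 1) == big64)       # False
```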

What's next

Array Attributes — shape, dtype, ndim