Random Sampling¶
NumPy's random module powers simulations, sampling, ML data shuffling, and statistical experiments.
The modern way — default_rng¶
Always use np.random.default_rng() — it's the modern, properly-engineered API.
import numpy as np
rng = np.random.default_rng()
print(rng.random(5)) # 5 floats in [0, 1)
print(rng.integers(0, 10, 8)) # 8 ints in [0, 10)
print(rng.normal(0, 1, 5)) # 5 samples from N(0, 1)
The older
np.random.rand(),np.random.randn()still work but are deprecated for new code. Usedefault_rng().
Reproducibility — set a seed¶
For reproducible results (testing, debugging, papers), pass a seed:
import numpy as np
# Same seed = same sequence every time
rng1 = np.random.default_rng(seed=42)
print(rng1.random(3))
rng2 = np.random.default_rng(seed=42)
print(rng2.random(3)) # identical
Uniform distribution¶
.random(size=...) → floats in [0, 1). Scale and shift for any range:
import numpy as np
rng = np.random.default_rng(0)
print(rng.random(5)) # [0, 1)
print(rng.random((2, 3))) # 2D array
# Scale to [10, 20)
print(rng.uniform(10, 20, size=5))
# Just integers
print(rng.integers(0, 100, size=10)) # [0, 100)
print(rng.integers(0, 100, size=10, endpoint=True)) # [0, 100]
Normal (Gaussian) distribution¶
mean ± std — the bell curve.
import numpy as np
rng = np.random.default_rng(0)
# Standard normal — mean=0, std=1
samples = rng.standard_normal(10000)
print(f"mean = {samples.mean():.4f} (≈ 0)")
print(f"std = {samples.std():.4f} (≈ 1)")
# Custom mean and std
heights = rng.normal(loc=170, scale=10, size=5)
print(f"heights: {heights.round(1)}")
Other distributions¶
import numpy as np
rng = np.random.default_rng(0)
print("Exponential :", rng.exponential(scale=1.0, size=5))
print("Poisson :", rng.poisson(lam=3, size=10))
print("Binomial :", rng.binomial(n=10, p=0.5, size=10))
print("Beta :", rng.beta(a=2, b=5, size=5))
print("Gamma :", rng.gamma(shape=2, scale=2, size=5))
choice — pick from a set¶
import numpy as np
rng = np.random.default_rng(0)
cards = ["A", "K", "Q", "J", "10", "9", "8", "7"]
# Single
print(rng.choice(cards))
# 5 with replacement (default)
print(rng.choice(cards, size=5))
# 3 without replacement
print(rng.choice(cards, size=3, replace=False))
# Weighted — "K" is 5× more likely
weights = [0.5, 2.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
weights = np.array(weights) / sum(weights) # must sum to 1
print(rng.choice(cards, size=10, p=weights))
permutation and shuffle¶
import numpy as np
rng = np.random.default_rng(0)
# Returns a shuffled COPY
print(rng.permutation(np.arange(10)))
# Modifies in place
a = np.arange(10)
rng.shuffle(a)
print(a)
# For 2D — shuffles ROWS by default
matrix = np.arange(12).reshape(4, 3)
rng.shuffle(matrix)
print(matrix)
This is how you shuffle training data in ML.
Train/test split — DIY¶
import numpy as np
rng = np.random.default_rng(42)
# Imagine 100 samples
X = np.arange(100)
# Shuffle indices, take 80% for train
indices = rng.permutation(100)
train_idx = indices[:80]
test_idx = indices[80:]
X_train, X_test = X[train_idx], X[test_idx]
print(f"train ({len(X_train)}): {sorted(X_train.tolist())[:10]}...")
print(f"test ({len(X_test)}): {sorted(X_test.tolist())[:10]}...")
Monte Carlo simulation — estimate π¶
import numpy as np
rng = np.random.default_rng(0)
N = 1_000_000
# Random points in the unit square
x = rng.random(N)
y = rng.random(N)
# Distance from origin
inside_circle = (x**2 + y**2) < 1
pi_estimate = 4 * inside_circle.sum() / N
print(f"Estimated π: {pi_estimate:.5f}")
print(f"True π: {np.pi:.5f}")
print(f"Error : {abs(pi_estimate - np.pi):.5f}")
We threw 1M random points into a square, counted how many landed inside a quarter circle. Beautiful.
Sampling with custom probabilities¶
import numpy as np
rng = np.random.default_rng(0)
# Simulate a loaded die — 6 is twice as likely as anything else
outcomes = np.arange(1, 7)
prob = np.array([1, 1, 1, 1, 1, 2]) / 7
rolls = rng.choice(outcomes, size=10_000, p=prob)
# Count the outcomes
vals, counts = np.unique(rolls, return_counts=True)
for v, c in zip(vals, counts):
print(f" {v}: {c:5} ({c/10000*100:.1f}%)")
Bootstrap — estimate uncertainty¶
Resampling your data to estimate how stable a statistic is:
import numpy as np
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=50)
# Original mean
print(f"Original mean: {data.mean():.2f}")
# Bootstrap — sample with replacement many times, compute mean each time
n_bootstraps = 1000
means = []
for _ in range(n_bootstraps):
sample = rng.choice(data, size=len(data), replace=True)
means.append(sample.mean())
means = np.array(means)
print(f"Bootstrap mean of means: {means.mean():.2f}")
print(f"95% CI: ({np.percentile(means, 2.5):.2f}, {np.percentile(means, 97.5):.2f})")
A simple but powerful technique for ML evaluation.
Random arrays of any shape¶
import numpy as np
rng = np.random.default_rng(0)
img_batch = rng.random((4, 8, 8)) # 4 grayscale 8x8 "images"
print("batch shape:", img_batch.shape)
# Same for any distribution
weights = rng.normal(0, 0.01, size=(64, 10)) # NN weights
print("weights shape:", weights.shape, "stats:", weights.mean(), weights.std())
Cheatsheet¶
| Distribution | Function |
|---|---|
| Uniform [0, 1) | rng.random(size) |
| Uniform [a, b) | rng.uniform(a, b, size) |
| Integers [low, high) | rng.integers(low, high, size) |
| Standard normal | rng.standard_normal(size) |
| Normal(μ, σ) | rng.normal(mu, sigma, size) |
| Exponential | rng.exponential(scale, size) |
| Poisson | rng.poisson(lam, size) |
| Binomial | rng.binomial(n, p, size) |
| Pick from list | rng.choice(items, size) |
| Permutation | rng.permutation(arr) |
| Shuffle in place | rng.shuffle(arr) |
Always start with: rng = np.random.default_rng(seed=42).
Common pitfalls¶
- ❗ Using
np.random.seed()globally — the old API. Usedefault_rng(seed=...)perrngfor cleaner code. - ❗
rng.random()vsrng.uniform(0, 1)— same thing,randomis shorthand. - ❗
rng.integers(0, 10)is exclusive of 10 — passendpoint=Trueto include it. - ❗ Probabilities don't sum to 1 —
rng.choice(..., p=weights)raises ifweightsdoesn't sum to 1. - ❗ Shuffling 2D arrays — only the first axis —
rng.shuffle(matrix)shuffles rows, not cells. Userng.permutationindexing if you want more control.
Practice¶
What does this print?
Expected: [ 9 77 65 44 9]
Make this produce the same output every run (it currently changes)
Expected: [0.77395605 0.43887844 0.85859792]
Quiz — Quick check¶
What you remember
Q1. Why prefer np.random.default_rng() over np.random.rand()?
- It's required in NumPy 2.0
- It's the modern, properly-engineered API with better statistical properties and explicit seeds
- It's faster
-
np.random.randis removed
Why:
default_rng()returns aGeneratorobject you can seed and pass around. The legacynp.random.rand()and friends share global state, which is bug-prone in larger codebases.
Q2. Does rng.integers(0, 10, size=5) include 10?
- Yes — both ends inclusive
- No — high is exclusive by default (pass
endpoint=Trueto include) - Only sometimes
- Depends on the seed
Why: Default is half-open
[low, high)— same as Python'srange(). To include the endpoint, userng.integers(0, 10, size=5, endpoint=True).
Q3. What does seeding do?
- Makes results faster
- Makes random sequences reproducible — same seed → same numbers every run
- Removes bias from samples
- Required for the random module to work
Why: Random number generators are deterministic — they produce a sequence based on a seed. Same seed = same sequence. Essential for testing, debugging, and reproducible papers.
Common doubts¶
Should I use the same rng across the whole program?
Yes — that way you can seed once at startup and get reproducible results everywhere. Pass rng as an argument to functions that need randomness. Avoid creating new default_rng() calls scattered through code.
rng.choice with replace=False for a large array is slow — why?
replace=False requires NumPy to track which items have been picked. For huge arrays where you need a small unique sample, prefer rng.permutation(N)[:k] (shuffle indices, take the first k). Same result, often faster.
What's the difference between rng.shuffle and rng.permutation?
shuffle modifies the array in place and returns None. permutation returns a shuffled copy. Use permutation when you want the original preserved; use shuffle when you don't care about the original.