Time Series¶

Pandas was originally built for finance data — its date/time handling is excellent.

Creating datetimes¶

import pandas as pd

# Single date
print(pd.Timestamp("2025-01-15"))

# From components
print(pd.Timestamp(year=2025, month=1, day=15, hour=14, minute=30))

# Now
print(pd.Timestamp.now())

# Parse strings
print(pd.to_datetime("2025-01-15"))
print(pd.to_datetime("15/01/2025", dayfirst=True))
print(pd.to_datetime(["2025-01-15", "2025-02-20", "2025-03-10"]))

Date ranges — `pd.date_range`¶

import pandas as pd

# 7 daily dates
print(pd.date_range("2025-01-01", periods=7, freq="D"))
print()

# Until end date
print(pd.date_range("2025-01-01", "2025-01-10"))
print()

# Weekly
print(pd.date_range("2025-01-01", periods=5, freq="W"))
print()

# Business days (Mon-Fri)
print(pd.date_range("2025-01-01", periods=10, freq="B"))
print()

# Monthly (last day of each month)
print(pd.date_range("2025-01-01", periods=4, freq="ME"))

Common frequency codes:

Code	Meaning
`D`	Daily
`B`	Business day (Mon-Fri)
`W`	Weekly (Sunday)
`ME`	Month end
`MS`	Month start
`Q` / `QE`	Quarter end
`Y` / `YE`	Year end
`h`	Hourly
`min`	Minutely
`s`	Secondly

Setting a datetime index¶

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
n = 10

df = pd.DataFrame({
    "date":  pd.date_range("2025-01-01", periods=n, freq="D"),
    "sales": rng.integers(100, 500, size=n),
})

# Set date as index — gives us time-series superpowers
df = df.set_index("date")
print(df)

Date-based selection¶

With a datetime index, you can slice by date strings:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.integers(0, 100, size=60),
}, index=pd.date_range("2024-11-01", periods=60, freq="D"))

# All rows in December 2024
print(df.loc["2024-12"])
print()

# A date range
print(df.loc["2024-12-15":"2024-12-20"])
print()

# A specific date
print(df.loc["2024-11-15"])

`.dt` accessor — extract components¶

import pandas as pd

df = pd.DataFrame({
    "joined": pd.to_datetime(["2024-01-15", "2024-03-22", "2024-07-09", "2024-11-30"])
})

df["year"]    = df["joined"].dt.year
df["month"]   = df["joined"].dt.month
df["day"]     = df["joined"].dt.day
df["weekday"] = df["joined"].dt.day_name()
df["quarter"] = df["joined"].dt.quarter
df["is_weekend"] = df["joined"].dt.weekday.isin([5, 6])

print(df)

Time differences — timedelta¶

import pandas as pd

start = pd.Timestamp("2025-01-15")
end   = pd.Timestamp("2025-03-20")

diff = end - start
print(diff)                       # Timedelta('64 days 00:00:00')
print(diff.days)                  # 64
print(diff.total_seconds() / 3600) # in hours

# Add / subtract
print(start + pd.Timedelta(days=30))
print(start + pd.DateOffset(months=3))   # smart month addition

Resampling — change the frequency¶

Resample is the time-series version of groupby. Aggregate over time buckets.

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.integers(0, 100, size=90),
}, index=pd.date_range("2025-01-01", periods=90, freq="D"))

# Weekly averages
print("Weekly mean (first 5):")
print(df.resample("W").mean().head())
print()

# Monthly totals
print("Monthly sum:")
print(df.resample("ME").sum())

Frequency	Code
Hourly	`"h"`
Daily	`"D"`
Weekly	`"W"`
Monthly	`"ME"` (end) / `"MS"` (start)
Quarterly	`"QE"`
Yearly	`"YE"`

Multiple aggs at once¶

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.normal(100, 5, size=30),
}, index=pd.date_range("2025-01-01", periods=30, freq="D"))

# Weekly: open / high / low / close style
weekly = df.resample("W").agg(
    open  = ("price","first"),
    high  = ("price","max"),
    low   = ("price","min"),
    close = ("price","last"),
    avg   = ("price","mean"),
)
print(weekly.round(2))

Rolling windows — moving averages¶

The classic "smooth the noise" tool:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.normal(50, 5, size=20),
}, index=pd.date_range("2025-01-01", periods=20, freq="D"))

df["rolling_3"] = df["value"].rolling(window=3).mean()
df["rolling_7"] = df["value"].rolling(window=7).mean()
print(df.round(1))

The first window-1 rows are NaN (not enough data to compute the average).

Expanding window — running totals¶

Cumulative aggregation from the start:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "daily": rng.integers(10, 50, size=10),
}, index=pd.date_range("2025-01-01", periods=10, freq="D"))

df["cumulative"]   = df["daily"].cumsum()
df["running_avg"]  = df["daily"].expanding().mean()
print(df.round(1))

Time zones¶

import pandas as pd

ts = pd.Timestamp("2025-01-15 10:00:00")
print("Naive (no TZ):", ts)

# Localize to a time zone
ist = ts.tz_localize("Asia/Kolkata")
print("IST :", ist)

# Convert to another zone
utc = ist.tz_convert("UTC")
print("UTC :", utc)

ny = ist.tz_convert("America/New_York")
print("NYC :", ny)

Shifting — lag / lead values¶

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.normal(100, 5, size=8).round(2),
}, index=pd.date_range("2025-01-01", periods=8, freq="D"))

df["prev_day"]  = df["price"].shift(1)
df["next_day"]  = df["price"].shift(-1)
df["change"]    = df["price"] - df["prev_day"]
df["pct_chg"]   = df["price"].pct_change() * 100

print(df.round(2))

A complete time-series example — stock analysis¶

import pandas as pd
import numpy as np

rng = np.random.default_rng(42)
n = 90

# Fake daily price series
prices = 100 * (1 + rng.normal(0.0005, 0.02, size=n).cumsum())
df = pd.DataFrame({
    "price": prices.round(2),
}, index=pd.date_range("2025-01-01", periods=n, freq="B"))

# Daily return
df["return"] = df["price"].pct_change()

# 20-day rolling average ("trend")
df["ma20"]   = df["price"].rolling(20).mean()

# 20-day rolling volatility
df["vol20"]  = df["return"].rolling(20).std() * np.sqrt(252)

# Cumulative return
df["cum_return"] = (1 + df["return"]).cumprod() - 1

print(df.tail(10).round(4))

# Summary
print("\n--- Summary ---")
print(f"Total return : {df['cum_return'].iloc[-1]*100:.2f}%")
print(f"Avg daily ret: {df['return'].mean()*252*100:.2f}% annualized")
print(f"Avg vol      : {df['vol20'].mean()*100:.2f}% annualized")

Cheatsheet¶

Goal	Code
Parse date string(s)	`pd.to_datetime(s)`
Date range	`pd.date_range(start, periods=n, freq="D")`
Set as index	`df.set_index("date")`
Slice by date string	`df.loc["2025-01"]`
Extract year/month/etc	`df["c"].dt.year`, `dt.month`, `dt.day_name()`
Resample (group-by-time)	`df.resample("ME").sum()`
Moving average	`df["c"].rolling(window=7).mean()`
Cumulative sum	`df["c"].cumsum()`
Lag / lead	`df["c"].shift(1)` / `shift(-1)`
Percent change	`df["c"].pct_change()`
Add time delta	`ts + pd.Timedelta(days=7)`
Time zone localize	`ts.tz_localize("UTC")`
Time zone convert	`ts.tz_convert("Asia/Kolkata")`

Common pitfalls¶

❗ MS vs ME — MS is month start (1^st), ME is month end (28-31^st). Easy to confuse.
❗ Mixed time zones — operations between naive and tz-aware timestamps fail. Stick with one approach.
❗ rolling() returns NaN at the start — until the window is full. Use min_periods=1 to compute earlier.
❗ dt doesn't work on non-datetime columns — convert with pd.to_datetime() first.
❗ resample only works on a datetime INDEX — df.set_index("date") first.
❗ freq="M" warning — newer Pandas wants "ME" (month end). Update your code.

Practice¶

What does this print?

Expected: 2025

import pandas as pd
ts = pd.Timestamp("2025-03-15")
print(ts.year)

Get the day of the week as a name (e.g., 'Wednesday')

Expected: Wednesday

import pandas as pd
ts = pd.Timestamp("2025-01-15")
print(ts.day)            # bug: returns the day number, not the name

Quiz — Quick check¶

What you remember

Q1. What's the difference between df.resample("ME").sum() and df.groupby(df.index.month).sum()?

No difference
resample is time-aware (fills missing periods with zeros, handles edge dates); groupby(month) lumps the same month across all years together
resample is faster
resample only works on years

Why: resample("ME") produces one row per (year, month). groupby(month) would lump January 2024 and January 2025 together. Different operations.

Q2. Why does df["c"].rolling(window=7).mean() return NaN for the first 6 rows?

Not enough data to compute a 7-day average until row 7
Bug in Pandas
The first values were missing
Random sample

Why: A rolling window needs window values to compute the average. The first window-1 rows don't have enough preceding data. Pass min_periods=1 if you want partial averages from the start.

Q3. What does df["price"].pct_change() compute?

The absolute change
The percentage change from the previous row (current / prev - 1)
The price percentile
A rolling average

Why: pct_change is shorthand for (s - s.shift(1)) / s.shift(1). Heavily used in finance ("daily returns") and analytics ("week-over-week growth").

Common doubts¶

Why does freq=\"M\" give a deprecation warning?

Newer Pandas distinguishes month-end (ME) from month-start (MS). The old M was ambiguous. Update to ME (or MS). Same warnings exist for Q → QE, Y → YE.

When should I localize vs convert time zones?

tz_localize("UTC") attaches a timezone to a naive timestamp — declaring "this naive timestamp represents UTC time". tz_convert("US/Eastern") changes the timezone of an already-aware timestamp. Mixing the two is the most common bug.

How do I get business-day-aware date arithmetic?

Use pd.tseries.offsets: ts + pd.tseries.offsets.BDay(5) adds 5 business days (skipping weekends). For more complex calendars (holidays), see pd.tseries.holiday. Useful for finance, scheduling.

What's next¶

→ Real-World Examples