Skip to content

Time Series

Pandas was originally built for finance data — its date/time handling is excellent.

Creating datetimes

import pandas as pd

# Single date
print(pd.Timestamp("2025-01-15"))

# From components
print(pd.Timestamp(year=2025, month=1, day=15, hour=14, minute=30))

# Now
print(pd.Timestamp.now())

# Parse strings
print(pd.to_datetime("2025-01-15"))
print(pd.to_datetime("15/01/2025", dayfirst=True))
print(pd.to_datetime(["2025-01-15", "2025-02-20", "2025-03-10"]))

Date ranges — pd.date_range

import pandas as pd

# 7 daily dates
print(pd.date_range("2025-01-01", periods=7, freq="D"))
print()

# Until end date
print(pd.date_range("2025-01-01", "2025-01-10"))
print()

# Weekly
print(pd.date_range("2025-01-01", periods=5, freq="W"))
print()

# Business days (Mon-Fri)
print(pd.date_range("2025-01-01", periods=10, freq="B"))
print()

# Monthly (last day of each month)
print(pd.date_range("2025-01-01", periods=4, freq="ME"))

Common frequency codes:

Code Meaning
D Daily
B Business day (Mon-Fri)
W Weekly (Sunday)
ME Month end
MS Month start
Q / QE Quarter end
Y / YE Year end
h Hourly
min Minutely
s Secondly

Setting a datetime index

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
n = 10

df = pd.DataFrame({
    "date":  pd.date_range("2025-01-01", periods=n, freq="D"),
    "sales": rng.integers(100, 500, size=n),
})

# Set date as index — gives us time-series superpowers
df = df.set_index("date")
print(df)

Date-based selection

With a datetime index, you can slice by date strings:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.integers(0, 100, size=60),
}, index=pd.date_range("2024-11-01", periods=60, freq="D"))

# All rows in December 2024
print(df.loc["2024-12"])
print()

# A date range
print(df.loc["2024-12-15":"2024-12-20"])
print()

# A specific date
print(df.loc["2024-11-15"])

.dt accessor — extract components

import pandas as pd

df = pd.DataFrame({
    "joined": pd.to_datetime(["2024-01-15", "2024-03-22", "2024-07-09", "2024-11-30"])
})

df["year"]    = df["joined"].dt.year
df["month"]   = df["joined"].dt.month
df["day"]     = df["joined"].dt.day
df["weekday"] = df["joined"].dt.day_name()
df["quarter"] = df["joined"].dt.quarter
df["is_weekend"] = df["joined"].dt.weekday.isin([5, 6])

print(df)

Time differences — timedelta

import pandas as pd

start = pd.Timestamp("2025-01-15")
end   = pd.Timestamp("2025-03-20")

diff = end - start
print(diff)                       # Timedelta('64 days 00:00:00')
print(diff.days)                  # 64
print(diff.total_seconds() / 3600) # in hours

# Add / subtract
print(start + pd.Timedelta(days=30))
print(start + pd.DateOffset(months=3))   # smart month addition

Resampling — change the frequency

Resample is the time-series version of groupby. Aggregate over time buckets.

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.integers(0, 100, size=90),
}, index=pd.date_range("2025-01-01", periods=90, freq="D"))

# Weekly averages
print("Weekly mean (first 5):")
print(df.resample("W").mean().head())
print()

# Monthly totals
print("Monthly sum:")
print(df.resample("ME").sum())
Frequency Code
Hourly "h"
Daily "D"
Weekly "W"
Monthly "ME" (end) / "MS" (start)
Quarterly "QE"
Yearly "YE"

Multiple aggs at once

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.normal(100, 5, size=30),
}, index=pd.date_range("2025-01-01", periods=30, freq="D"))

# Weekly: open / high / low / close style
weekly = df.resample("W").agg(
    open  = ("price","first"),
    high  = ("price","max"),
    low   = ("price","min"),
    close = ("price","last"),
    avg   = ("price","mean"),
)
print(weekly.round(2))

Rolling windows — moving averages

The classic "smooth the noise" tool:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.normal(50, 5, size=20),
}, index=pd.date_range("2025-01-01", periods=20, freq="D"))

df["rolling_3"] = df["value"].rolling(window=3).mean()
df["rolling_7"] = df["value"].rolling(window=7).mean()
print(df.round(1))

The first window-1 rows are NaN (not enough data to compute the average).

Expanding window — running totals

Cumulative aggregation from the start:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "daily": rng.integers(10, 50, size=10),
}, index=pd.date_range("2025-01-01", periods=10, freq="D"))

df["cumulative"]   = df["daily"].cumsum()
df["running_avg"]  = df["daily"].expanding().mean()
print(df.round(1))

Time zones

import pandas as pd

ts = pd.Timestamp("2025-01-15 10:00:00")
print("Naive (no TZ):", ts)

# Localize to a time zone
ist = ts.tz_localize("Asia/Kolkata")
print("IST :", ist)

# Convert to another zone
utc = ist.tz_convert("UTC")
print("UTC :", utc)

ny = ist.tz_convert("America/New_York")
print("NYC :", ny)

Shifting — lag / lead values

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.normal(100, 5, size=8).round(2),
}, index=pd.date_range("2025-01-01", periods=8, freq="D"))

df["prev_day"]  = df["price"].shift(1)
df["next_day"]  = df["price"].shift(-1)
df["change"]    = df["price"] - df["prev_day"]
df["pct_chg"]   = df["price"].pct_change() * 100

print(df.round(2))

A complete time-series example — stock analysis

import pandas as pd
import numpy as np

rng = np.random.default_rng(42)
n = 90

# Fake daily price series
prices = 100 * (1 + rng.normal(0.0005, 0.02, size=n).cumsum())
df = pd.DataFrame({
    "price": prices.round(2),
}, index=pd.date_range("2025-01-01", periods=n, freq="B"))

# Daily return
df["return"] = df["price"].pct_change()

# 20-day rolling average ("trend")
df["ma20"]   = df["price"].rolling(20).mean()

# 20-day rolling volatility
df["vol20"]  = df["return"].rolling(20).std() * np.sqrt(252)

# Cumulative return
df["cum_return"] = (1 + df["return"]).cumprod() - 1

print(df.tail(10).round(4))

# Summary
print("\n--- Summary ---")
print(f"Total return : {df['cum_return'].iloc[-1]*100:.2f}%")
print(f"Avg daily ret: {df['return'].mean()*252*100:.2f}% annualized")
print(f"Avg vol      : {df['vol20'].mean()*100:.2f}% annualized")

Cheatsheet

Goal Code
Parse date string(s) pd.to_datetime(s)
Date range pd.date_range(start, periods=n, freq="D")
Set as index df.set_index("date")
Slice by date string df.loc["2025-01"]
Extract year/month/etc df["c"].dt.year, dt.month, dt.day_name()
Resample (group-by-time) df.resample("ME").sum()
Moving average df["c"].rolling(window=7).mean()
Cumulative sum df["c"].cumsum()
Lag / lead df["c"].shift(1) / shift(-1)
Percent change df["c"].pct_change()
Add time delta ts + pd.Timedelta(days=7)
Time zone localize ts.tz_localize("UTC")
Time zone convert ts.tz_convert("Asia/Kolkata")

Common pitfalls

  • MS vs MEMS is month start (1st), ME is month end (28-31st). Easy to confuse.
  • Mixed time zones — operations between naive and tz-aware timestamps fail. Stick with one approach.
  • rolling() returns NaN at the start — until the window is full. Use min_periods=1 to compute earlier.
  • dt doesn't work on non-datetime columns — convert with pd.to_datetime() first.
  • resample only works on a datetime INDEXdf.set_index("date") first.
  • freq="M" warning — newer Pandas wants "ME" (month end). Update your code.

Practice

What does this print?

Expected: 2025

import pandas as pd
ts = pd.Timestamp("2025-03-15")
print(ts.year)

Get the day of the week as a name (e.g., 'Wednesday')

Expected: Wednesday

import pandas as pd
ts = pd.Timestamp("2025-01-15")
print(ts.day)            # bug: returns the day number, not the name

Quiz — Quick check

What you remember

Q1. What's the difference between df.resample("ME").sum() and df.groupby(df.index.month).sum()?

  • No difference
  • resample is time-aware (fills missing periods with zeros, handles edge dates); groupby(month) lumps the same month across all years together
  • resample is faster
  • resample only works on years

Why: resample("ME") produces one row per (year, month). groupby(month) would lump January 2024 and January 2025 together. Different operations.

Q2. Why does df["c"].rolling(window=7).mean() return NaN for the first 6 rows?

  • Not enough data to compute a 7-day average until row 7
  • Bug in Pandas
  • The first values were missing
  • Random sample

Why: A rolling window needs window values to compute the average. The first window-1 rows don't have enough preceding data. Pass min_periods=1 if you want partial averages from the start.

Q3. What does df["price"].pct_change() compute?

  • The absolute change
  • The percentage change from the previous row (current / prev - 1)
  • The price percentile
  • A rolling average

Why: pct_change is shorthand for (s - s.shift(1)) / s.shift(1). Heavily used in finance ("daily returns") and analytics ("week-over-week growth").

Common doubts

Why does freq=\"M\" give a deprecation warning?

Newer Pandas distinguishes month-end (ME) from month-start (MS). The old M was ambiguous. Update to ME (or MS). Same warnings exist for QQE, YYE.

When should I localize vs convert time zones?

tz_localize("UTC") attaches a timezone to a naive timestamp — declaring "this naive timestamp represents UTC time". tz_convert("US/Eastern") changes the timezone of an already-aware timestamp. Mixing the two is the most common bug.

How do I get business-day-aware date arithmetic?

Use pd.tseries.offsets: ts + pd.tseries.offsets.BDay(5) adds 5 business days (skipping weekends). For more complex calendars (holidays), see pd.tseries.holiday. Useful for finance, scheduling.

What's next

Real-World Examples