Time Series¶
Pandas was originally built for finance data — its date/time handling is excellent.
Creating datetimes¶
import pandas as pd
# Single date
print(pd.Timestamp("2025-01-15"))
# From components
print(pd.Timestamp(year=2025, month=1, day=15, hour=14, minute=30))
# Now
print(pd.Timestamp.now())
# Parse strings
print(pd.to_datetime("2025-01-15"))
print(pd.to_datetime("15/01/2025", dayfirst=True))
print(pd.to_datetime(["2025-01-15", "2025-02-20", "2025-03-10"]))
Date ranges — pd.date_range¶
import pandas as pd
# 7 daily dates
print(pd.date_range("2025-01-01", periods=7, freq="D"))
print()
# Until end date
print(pd.date_range("2025-01-01", "2025-01-10"))
print()
# Weekly
print(pd.date_range("2025-01-01", periods=5, freq="W"))
print()
# Business days (Mon-Fri)
print(pd.date_range("2025-01-01", periods=10, freq="B"))
print()
# Monthly (last day of each month)
print(pd.date_range("2025-01-01", periods=4, freq="ME"))
Common frequency codes:
| Code | Meaning |
|---|---|
D |
Daily |
B |
Business day (Mon-Fri) |
W |
Weekly (Sunday) |
ME |
Month end |
MS |
Month start |
Q / QE |
Quarter end |
Y / YE |
Year end |
h |
Hourly |
min |
Minutely |
s |
Secondly |
Setting a datetime index¶
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
n = 10
df = pd.DataFrame({
"date": pd.date_range("2025-01-01", periods=n, freq="D"),
"sales": rng.integers(100, 500, size=n),
})
# Set date as index — gives us time-series superpowers
df = df.set_index("date")
print(df)
Date-based selection¶
With a datetime index, you can slice by date strings:
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
df = pd.DataFrame({
"value": rng.integers(0, 100, size=60),
}, index=pd.date_range("2024-11-01", periods=60, freq="D"))
# All rows in December 2024
print(df.loc["2024-12"])
print()
# A date range
print(df.loc["2024-12-15":"2024-12-20"])
print()
# A specific date
print(df.loc["2024-11-15"])
.dt accessor — extract components¶
import pandas as pd
df = pd.DataFrame({
"joined": pd.to_datetime(["2024-01-15", "2024-03-22", "2024-07-09", "2024-11-30"])
})
df["year"] = df["joined"].dt.year
df["month"] = df["joined"].dt.month
df["day"] = df["joined"].dt.day
df["weekday"] = df["joined"].dt.day_name()
df["quarter"] = df["joined"].dt.quarter
df["is_weekend"] = df["joined"].dt.weekday.isin([5, 6])
print(df)
Time differences — timedelta¶
import pandas as pd
start = pd.Timestamp("2025-01-15")
end = pd.Timestamp("2025-03-20")
diff = end - start
print(diff) # Timedelta('64 days 00:00:00')
print(diff.days) # 64
print(diff.total_seconds() / 3600) # in hours
# Add / subtract
print(start + pd.Timedelta(days=30))
print(start + pd.DateOffset(months=3)) # smart month addition
Resampling — change the frequency¶
Resample is the time-series version of groupby. Aggregate over time buckets.
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
df = pd.DataFrame({
"value": rng.integers(0, 100, size=90),
}, index=pd.date_range("2025-01-01", periods=90, freq="D"))
# Weekly averages
print("Weekly mean (first 5):")
print(df.resample("W").mean().head())
print()
# Monthly totals
print("Monthly sum:")
print(df.resample("ME").sum())
| Frequency | Code |
|---|---|
| Hourly | "h" |
| Daily | "D" |
| Weekly | "W" |
| Monthly | "ME" (end) / "MS" (start) |
| Quarterly | "QE" |
| Yearly | "YE" |
Multiple aggs at once¶
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
df = pd.DataFrame({
"price": rng.normal(100, 5, size=30),
}, index=pd.date_range("2025-01-01", periods=30, freq="D"))
# Weekly: open / high / low / close style
weekly = df.resample("W").agg(
open = ("price","first"),
high = ("price","max"),
low = ("price","min"),
close = ("price","last"),
avg = ("price","mean"),
)
print(weekly.round(2))
Rolling windows — moving averages¶
The classic "smooth the noise" tool:
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
df = pd.DataFrame({
"value": rng.normal(50, 5, size=20),
}, index=pd.date_range("2025-01-01", periods=20, freq="D"))
df["rolling_3"] = df["value"].rolling(window=3).mean()
df["rolling_7"] = df["value"].rolling(window=7).mean()
print(df.round(1))
The first window-1 rows are NaN (not enough data to compute the average).
Expanding window — running totals¶
Cumulative aggregation from the start:
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
df = pd.DataFrame({
"daily": rng.integers(10, 50, size=10),
}, index=pd.date_range("2025-01-01", periods=10, freq="D"))
df["cumulative"] = df["daily"].cumsum()
df["running_avg"] = df["daily"].expanding().mean()
print(df.round(1))
Time zones¶
import pandas as pd
ts = pd.Timestamp("2025-01-15 10:00:00")
print("Naive (no TZ):", ts)
# Localize to a time zone
ist = ts.tz_localize("Asia/Kolkata")
print("IST :", ist)
# Convert to another zone
utc = ist.tz_convert("UTC")
print("UTC :", utc)
ny = ist.tz_convert("America/New_York")
print("NYC :", ny)
Shifting — lag / lead values¶
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
df = pd.DataFrame({
"price": rng.normal(100, 5, size=8).round(2),
}, index=pd.date_range("2025-01-01", periods=8, freq="D"))
df["prev_day"] = df["price"].shift(1)
df["next_day"] = df["price"].shift(-1)
df["change"] = df["price"] - df["prev_day"]
df["pct_chg"] = df["price"].pct_change() * 100
print(df.round(2))
A complete time-series example — stock analysis¶
import pandas as pd
import numpy as np
rng = np.random.default_rng(42)
n = 90
# Fake daily price series
prices = 100 * (1 + rng.normal(0.0005, 0.02, size=n).cumsum())
df = pd.DataFrame({
"price": prices.round(2),
}, index=pd.date_range("2025-01-01", periods=n, freq="B"))
# Daily return
df["return"] = df["price"].pct_change()
# 20-day rolling average ("trend")
df["ma20"] = df["price"].rolling(20).mean()
# 20-day rolling volatility
df["vol20"] = df["return"].rolling(20).std() * np.sqrt(252)
# Cumulative return
df["cum_return"] = (1 + df["return"]).cumprod() - 1
print(df.tail(10).round(4))
# Summary
print("\n--- Summary ---")
print(f"Total return : {df['cum_return'].iloc[-1]*100:.2f}%")
print(f"Avg daily ret: {df['return'].mean()*252*100:.2f}% annualized")
print(f"Avg vol : {df['vol20'].mean()*100:.2f}% annualized")
Cheatsheet¶
| Goal | Code |
|---|---|
| Parse date string(s) | pd.to_datetime(s) |
| Date range | pd.date_range(start, periods=n, freq="D") |
| Set as index | df.set_index("date") |
| Slice by date string | df.loc["2025-01"] |
| Extract year/month/etc | df["c"].dt.year, dt.month, dt.day_name() |
| Resample (group-by-time) | df.resample("ME").sum() |
| Moving average | df["c"].rolling(window=7).mean() |
| Cumulative sum | df["c"].cumsum() |
| Lag / lead | df["c"].shift(1) / shift(-1) |
| Percent change | df["c"].pct_change() |
| Add time delta | ts + pd.Timedelta(days=7) |
| Time zone localize | ts.tz_localize("UTC") |
| Time zone convert | ts.tz_convert("Asia/Kolkata") |
Common pitfalls¶
- ❗
MSvsME—MSis month start (1st),MEis month end (28-31st). Easy to confuse. - ❗ Mixed time zones — operations between naive and tz-aware timestamps fail. Stick with one approach.
- ❗
rolling()returns NaN at the start — until the window is full. Usemin_periods=1to compute earlier. - ❗
dtdoesn't work on non-datetime columns — convert withpd.to_datetime()first. - ❗
resampleonly works on a datetime INDEX —df.set_index("date")first. - ❗
freq="M"warning — newer Pandas wants"ME"(month end). Update your code.
Practice¶
What does this print?
Expected: 2025
Get the day of the week as a name (e.g., 'Wednesday')
Expected: Wednesday
Quiz — Quick check¶
What you remember
Q1. What's the difference between df.resample("ME").sum() and df.groupby(df.index.month).sum()?
- No difference
-
resampleis time-aware (fills missing periods with zeros, handles edge dates);groupby(month)lumps the same month across all years together -
resampleis faster -
resampleonly works on years
Why:
resample("ME")produces one row per (year, month).groupby(month)would lump January 2024 and January 2025 together. Different operations.
Q2. Why does df["c"].rolling(window=7).mean() return NaN for the first 6 rows?
- Not enough data to compute a 7-day average until row 7
- Bug in Pandas
- The first values were missing
- Random sample
Why: A rolling window needs
windowvalues to compute the average. The firstwindow-1rows don't have enough preceding data. Passmin_periods=1if you want partial averages from the start.
Q3. What does df["price"].pct_change() compute?
- The absolute change
- The percentage change from the previous row (current / prev - 1)
- The price percentile
- A rolling average
Why:
pct_changeis shorthand for(s - s.shift(1)) / s.shift(1). Heavily used in finance ("daily returns") and analytics ("week-over-week growth").
Common doubts¶
Why does freq=\"M\" give a deprecation warning?
Newer Pandas distinguishes month-end (ME) from month-start (MS). The old M was ambiguous. Update to ME (or MS). Same warnings exist for Q → QE, Y → YE.
When should I localize vs convert time zones?
tz_localize("UTC") attaches a timezone to a naive timestamp — declaring "this naive timestamp represents UTC time". tz_convert("US/Eastern") changes the timezone of an already-aware timestamp. Mixing the two is the most common bug.
How do I get business-day-aware date arithmetic?
Use pd.tseries.offsets: ts + pd.tseries.offsets.BDay(5) adds 5 business days (skipping weekends). For more complex calendars (holidays), see pd.tseries.holiday. Useful for finance, scheduling.