Memory & Conversation State¶
1. Why this matters¶
Every chat model call is independent — the model has no idea what was said two turns ago. You have to:
- Save the user's message + the AI's reply somewhere.
- On the next turn, load history + new message and pass them all to the model.
Without this you get amnesia chatbots ("Hi, my name is Alice." → "Cool!" → "What's my name?" → "I don't know.").
2. Mental model¶
Two layers:
ChatMessageHistory— a storage interface.add_messages,messages,clear. Backed by an in-memory list, Redis, Postgres, DynamoDB, etc.RunnableWithMessageHistory— a wrapper around any chain that reads/writes history automatically based on asession_id.
flowchart LR
U[New user message] --> WMH[RunnableWithMessageHistory]
WMH -->|read by session_id| H[History Store<br/>memory/redis/postgres]
H --> P[Prompt with<br/>MessagesPlaceholder]
U --> P
P --> M[Chat Model]
M --> A[AI reply]
A -->|append to history| H
A --> R[Response to user]
The legacy ConversationBufferMemory / ConversationSummaryMemory / ConversationBufferWindowMemory classes still exist but they're deprecated in favor of this pattern.
3. Architecture / Flow¶
Memory types — what gets retained:
flowchart TD
subgraph Buffer [Buffer — keep everything]
B1[turn 1] --> B2[turn 2] --> B3[turn 3] --> B4[turn 4]
end
subgraph Window [BufferWindow — keep last k turns]
W1[turn 2] --> W2[turn 3] --> W3[turn 4]
end
subgraph Summary [Summary — LLM summarizes old turns]
S1["summary: 'user introduced themselves as Alice...'"] --> S2[turn 3] --> S3[turn 4]
end
subgraph Hybrid [BufferWindow + Summary]
H1[summary of old] --> H2[recent turns verbatim]
end
| Strategy | What it stores | When |
|---|---|---|
| Buffer | ALL messages verbatim | Short conversations |
| BufferWindow (last k) | Only last k messages | Cheap, lossy, simple |
| Summary | Running LLM-generated summary | Long conversations, but lossy |
| Buffer + Summary | Summary of old + recent verbatim | Best of both — long context budget |
| Vector | All messages embedded; retrieve relevant ones | "What did we discuss about pricing?" use cases |
| Entity | Tracks named entities (people, dates) separately | Long-running personalized agents |
4. Core concepts¶
BaseChatMessageHistory— the interface:messages,add_user_message,add_ai_message,clear.InMemoryChatMessageHistory— default, RAM-only. Lost on restart.RedisChatMessageHistory(session_id, url)— persistent, multi-process.PostgresChatMessageHistory,MongoDBChatMessageHistory, etc. — other backends.MessagesPlaceholder("history")— the slot in the prompt where history gets injected.RunnableWithMessageHistory— wraps a chain; on each.invoke(input, config={"configurable": {"session_id": ...}})it loads history → calls chain → appends new messages.
5. Code — minimal working example¶
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
# 1. Prompt with a slot for history
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
MessagesPlaceholder("history"),
("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
# 2. Per-session in-memory history
store = {}
def get_history(session_id: str):
if session_id not in store:
store[session_id] = InMemoryChatMessageHistory()
return store[session_id]
# 3. Wrap the chain
chat = RunnableWithMessageHistory(
chain,
get_history,
input_messages_key="input",
history_messages_key="history",
)
config = {"configurable": {"session_id": "alice-123"}}
print(chat.invoke({"input": "Hi, my name is Alice."}, config=config))
print(chat.invoke({"input": "What's my name?"}, config=config))
# → "Your name is Alice."
6. Code — real-world pattern¶
Redis-backed history (persistent, multi-process):
from langchain_community.chat_message_histories import RedisChatMessageHistory
def get_history(session_id):
return RedisChatMessageHistory(
session_id=session_id,
url="redis://localhost:6379/0",
ttl=60 * 60 * 24 * 7, # 7-day expiry
)
chat = RunnableWithMessageHistory(chain, get_history,
input_messages_key="input",
history_messages_key="history")
Trim history to fit a token budget (essential for long conversations):
from langchain_core.messages import trim_messages
trimmer = trim_messages(
max_tokens=2000,
strategy="last",
token_counter=ChatOpenAI(model="gpt-4o-mini"),
include_system=True,
allow_partial=False,
start_on="human",
)
# Apply inside the chain — before the prompt sees history
chain = (
{
"input": lambda x: x["input"],
"history": lambda x: trimmer.invoke(x["history"]),
}
| prompt
| ChatOpenAI(model="gpt-4o-mini")
| StrOutputParser()
)
Summarize-and-forget (rolling summary memory):
# Conceptually: when history > N messages, replace the oldest with an LLM-generated summary.
# For non-trivial cases, use LangGraph's built-in checkpointing + a summarization node
# — it's cleaner than rolling your own.
For anything beyond simple buffer/window patterns — switch to LangGraph. It has first-class state, checkpointing, and built-in patterns for summarization, entity tracking, and long-running agents.
7. Common pitfalls¶
- ❗ Using the deprecated
ConversationBufferMemoryetc. Still works in 0.3 but you'll get deprecation warnings and lose LCEL composability. UseRunnableWithMessageHistory. - ❗ In-memory history in production. Loses everything on process restart. Use Redis / Postgres / DynamoDB.
- ❗ No history trimming. Long chats blow past the model's context window. Always trim by token count.
- ❗ One global history for all users. Always key history by
session_id/user_id. Otherwise users see each other's conversations. - ❗ Storing PII forever. Apply TTL on the history store and have a
/clearflow that calls.clear(). - ❗ Forgetting that
MessagesPlaceholderinjects a LIST of messages. If you accidentally make it a single string variable, role information is lost.
8. When to use vs not use¶
| Strategy | When |
|---|---|
RunnableWithMessageHistory + Buffer |
Short chat sessions, simple chatbot |
+ Redis/Postgres history |
Production, multi-process, persistent |
+ trim_messages |
Conversations that grow long |
| Summary memory | Long sessions where exact wording matters less than the gist |
| LangGraph instead | Anything stateful: multi-step agents, durable workflows, checkpointing, human-in-the-loop |
| No memory at all | One-shot tools (translate, summarize a single doc) |
9. Cheatsheet¶
from langchain_core.chat_history import (
BaseChatMessageHistory,
InMemoryChatMessageHistory,
)
from langchain_community.chat_message_histories import (
RedisChatMessageHistory,
PostgresChatMessageHistory,
DynamoDBChatMessageHistory,
MongoDBChatMessageHistory,
FileChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.messages import (
HumanMessage, AIMessage, SystemMessage,
trim_messages,
)
from langchain_core.prompts import MessagesPlaceholder
# Wire up
chain_with_history = RunnableWithMessageHistory(
runnable=chain,
get_session_history=lambda sid: get_history(sid),
input_messages_key="input",
history_messages_key="history",
# Optional: multi-key configurable
history_factory_config=[
ConfigurableFieldSpec(id="user_id", annotation=str, ...),
],
)
# Invoke
chain_with_history.invoke(
{"input": "..."},
config={"configurable": {"session_id": "abc"}},
)
# Clear
get_history("abc").clear()
10. Q&A — recall test¶
-
Q: Why do LLMs need memory bolted on? A: Each API call is stateless. The model has no way to know what happened in previous calls unless you pass that history in the prompt.
-
Q: Difference between
BufferWindowandSummarymemory? A: Window keeps the last k messages verbatim (fast, lossy at boundaries). Summary uses an LLM to compress old turns into a summary (slower, smoother degradation as conversation grows). -
Q: When does memory belong in LangChain vs LangGraph? A: LangChain memory: simple per-session chat history. LangGraph: any state more complex than a list of messages — agent state, scratch pads, branching, checkpointing.
-
Q: What is
MessagesPlaceholderfor? A: It's a slot in aChatPromptTemplatewhere a list of prior messages can be injected — preserving roles (System/Human/AI). Unlike a string{history}variable. -
Q: Why session_id matter? A: It's the partition key. Without it, all users share one history blob. Always derive it from the authenticated user/session.
Practice¶
What does this print?
Expected: True
Use the session_id to partition each user's history (single shared key is wrong)
Expected: True
Quiz — Quick check¶
What you remember
Q1. What does conversation memory enable?
- The LLM "remembers" prior messages in the same chat session
- Faster responses
- Lower costs
- Better embeddings
Why: Without memory, every message is independent. With memory, the LLM can refer back: "as you mentioned earlier...", "your name is Alice". Essential for multi-turn chat UX.
Q2. Why summarize old messages instead of keeping them all?
- Context windows are limited; summarization keeps relevant info while shrinking tokens
- Required by LangChain
- Faster than passing all messages
- More accurate
Why: A 100-turn conversation has thousands of tokens of history. Summarizing the older turns ("the user is asking about refunds, they're frustrated, they prefer email contact") preserves what matters without filling the prompt.
Q3. What's the session_id for in chat memory?
- Partition key — separates different users' conversation histories
- Authentication
- Rate limiting
- Required by OpenAI
Why: Without
session_id, all users share one history. Always derive it from the authenticated user (or a cookie for anonymous sessions). Multi-tenant safety 101.
Common doubts¶
Where should I store chat history?
Development: in-memory dict (ChatMessageHistory). Production: Redis (fast, ephemeral), Postgres (persistent, queryable), MongoDB (flexible schema). LangChain has wrappers for all common stores.
How much history is enough?
Depends on the use case. Chatbots: last 10-20 turns + a summary of older context. Customer support: the full session, but compressed. Coding assistants: the file + last few interactions. Trade-off between context (more is better) and cost/latency (less is faster/cheaper).
Should agents have memory across sessions?
Yes, for personalization — "Alice prefers detailed answers", "Bob uses PostgreSQL". Store as user-level metadata, retrieve at the start of each session. This is long-term memory vs session memory; both are useful.