
Memory & Conversation State

1. Why this matters

Every chat model call is independent — the model has no idea what was said two turns ago. You have to:

  1. Save the user's message + the AI's reply somewhere.
  2. On the next turn, load history + new message and pass them all to the model.

Without this you get amnesia chatbots ("Hi, my name is Alice." → "Cool!" → "What's my name?" → "I don't know.").
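The two steps above can be sketched without any library. This is a minimal illustration. It assumes a hypothetical `fake_model` function that stands in for a real chat API. Like a stateless endpoint, the stub only sees the messages it is handed on that call.

```python
# Library-free sketch: the caller, not the model, carries the conversation.
# fake_model is a hypothetical stand-in for a stateless chat API.

def fake_model(messages: list[dict]) -> str:
    """Answers "What's my name?" only if an earlier message in THIS call states it."""
    last = messages[-1]["content"]
    if last == "What's my name?":
        for m in messages[:-1]:
            if m["role"] == "user" and m["content"].startswith("Hi, my name is "):
                name = m["content"].removeprefix("Hi, my name is ").rstrip(".")
                return f"Your name is {name}."
        return "I don't know."
    return "Cool!"


def chat_turn(history: list[dict], user_text: str) -> str:
    # Step 2: load history + new message and pass all of it to the model.
    messages = history + [{"role": "user", "content": user_text}]
    reply = fake_model(messages)
    # Step 1 (for the next turn): save the user's message and the reply.
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply})
    return reply


history: list[dict] = []
print(chat_turn(history, "Hi, my name is Alice."))  # Cool!
print(chat_turn(history, "What's my name?"))        # Your name is Alice.
print(chat_turn([], "What's my name?"))             # I don't know. (amnesia: no history passed)
```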

2. Mental model

Two layers:

  • BaseChatMessageHistory — the storage interface (messages, add_messages, clear). Implementations keep the messages in an in-memory list, Redis, Postgres, DynamoDB, etc.
  • RunnableWithMessageHistory — a wrapper around any chain that reads/writes history automatically based on a session_id.
flowchart LR
    U[New user message] --> WMH[RunnableWithMessageHistory]
    WMH -->|read by session_id| H[History Store<br/>memory/redis/postgres]
    H --> P[Prompt with<br/>MessagesPlaceholder]
    U --> P
    P --> M[Chat Model]
    M --> A[AI reply]
    A -->|append to history| H
    A --> R[Response to user]

The legacy ConversationBufferMemory / ConversationSummaryMemory / ConversationBufferWindowMemory classes still exist but they're deprecated in favor of this pattern.

3. Architecture / Flow

Memory types — what gets retained:

flowchart TD
    subgraph Buffer [Buffer — keep everything]
      B1[turn 1] --> B2[turn 2] --> B3[turn 3] --> B4[turn 4]
    end
    subgraph Window [BufferWindow — keep last k turns]
      W1[turn 2] --> W2[turn 3] --> W3[turn 4]
    end
    subgraph Summary [Summary — LLM summarizes old turns]
      S1["summary: 'user introduced themselves as Alice...'"] --> S2[turn 3] --> S3[turn 4]
    end
    subgraph Hybrid [BufferWindow + Summary]
      H1[summary of old] --> H2[recent turns verbatim]
    end
Strategy | What it stores | When to use
Buffer | All messages verbatim | Short conversations
BufferWindow (last k) | Only the last k messages | Cheap and simple, but lossy
Summary | A running LLM-generated summary | Long conversations where some loss of detail is acceptable
Buffer + Summary | Summary of older turns + recent turns verbatim | Long conversations on a limited context budget
Vector | All messages embedded; only the relevant ones are retrieved | Queries like "What did we discuss about pricing?"
Entity | Named entities (people, dates) tracked separately | Long-running personalized agents
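The Buffer, BufferWindow, and Buffer + Summary rows differ only in which messages reach the prompt. Below is a library-free sketch. It assumes each turn is one user message followed by one assistant message. It also assumes the summary text was produced elsewhere; the `summary` argument and all function names are hypothetical.

```python
Message = dict  # {"role": ..., "content": ...}


def buffer(messages: list[Message]) -> list[Message]:
    # Buffer: keep everything verbatim.
    return list(messages)


def window(messages: list[Message], k: int) -> list[Message]:
    # BufferWindow: keep the last k turns (2 messages per turn).
    # Guard k <= 0, because messages[-0:] would return the whole list.
    return messages[-2 * k:] if k > 0 else []


def summary_plus_window(messages: list[Message], k: int, summary: str) -> list[Message]:
    # Buffer + Summary: older turns collapse into one summary message,
    # and the most recent k turns stay verbatim.
    recent = window(messages, k)
    if len(recent) == len(messages):
        return recent  # nothing old enough to summarize yet
    note = {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
    return [note] + recent


turns: list[Message] = []
for i in range(1, 5):
    turns += [{"role": "user", "content": f"q{i}"}, {"role": "assistant", "content": f"a{i}"}]

print([m["content"] for m in window(turns, 2)])  # ['q3', 'a3', 'q4', 'a4']
```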

4. Core concepts

  • BaseChatMessageHistory — the interface: messages, add_user_message, add_ai_message, clear.
  • InMemoryChatMessageHistory — default, RAM-only. Lost on restart.
  • RedisChatMessageHistory(session_id, url) — persistent, multi-process.
  • PostgresChatMessageHistory, MongoDBChatMessageHistory, etc. — other backends.
  • MessagesPlaceholder("history") — the slot in the prompt where history gets injected.
  • RunnableWithMessageHistory — wraps a chain; on each .invoke(input, config={"configurable": {"session_id": ...}}) it loads history → calls chain → appends new messages.

5. Code — minimal working example

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# 1. Prompt with a slot for history
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),
    ("human", "{input}"),
])

chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

# 2. Per-session in-memory history
store = {}
def get_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# 3. Wrap the chain
chat = RunnableWithMessageHistory(
    chain,
    get_history,
    input_messages_key="input",
    history_messages_key="history",
)

config = {"configurable": {"session_id": "alice-123"}}

print(chat.invoke({"input": "Hi, my name is Alice."}, config=config))
print(chat.invoke({"input": "What's my name?"},        config=config))
# → e.g. "Your name is Alice." (exact wording varies by model)

6. Code — real-world pattern

Redis-backed history (persistent, multi-process):

from langchain_community.chat_message_histories import RedisChatMessageHistory

def get_history(session_id):
    return RedisChatMessageHistory(
        session_id=session_id,
        url="redis://localhost:6379/0",
        ttl=60 * 60 * 24 * 7,   # 7-day expiry
    )

chat = RunnableWithMessageHistory(chain, get_history,
                                  input_messages_key="input",
                                  history_messages_key="history")

Trim history to fit a token budget (essential for long conversations):

from langchain_core.messages import trim_messages

trimmer = trim_messages(
    max_tokens=2000,
    strategy="last",
    token_counter=ChatOpenAI(model="gpt-4o-mini"),
    include_system=True,
    allow_partial=False,
    start_on="human",
)

# Apply inside the chain — before the prompt sees history
chain = (
    {
        "input":   lambda x: x["input"],
        "history": lambda x: trimmer.invoke(x["history"]),
    }
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

Summarize-and-forget (rolling summary memory):

# Conceptually: when history > N messages, replace the oldest with an LLM-generated summary.
# For non-trivial cases, use LangGraph's built-in checkpointing + a summarization node
# — it's cleaner than rolling your own.
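The rolling-summary idea in the comment above can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain or LangGraph API. `summarize` is a hypothetical callable (in practice an LLM call) that merges the previous summary with the messages being evicted. The `max_messages` / `keep_recent` thresholds are made up.

```python
from typing import Callable

Message = dict  # {"role": ..., "content": ...}
Summarizer = Callable[[str, list[Message]], str]


def compact(summary: str, messages: list[Message], summarize: Summarizer,
            max_messages: int = 6, keep_recent: int = 4) -> tuple[str, list[Message]]:
    """When history exceeds max_messages, fold the oldest messages into the summary.

    keep_recent should be even so the verbatim window starts on a user turn.
    """
    if len(messages) <= max_messages:
        return summary, messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return summarize(summary, old), recent


def build_prompt(summary: str, messages: list[Message]) -> list[Message]:
    # The summary rides along as a system message ahead of the verbatim turns.
    prefix = [{"role": "system", "content": f"Conversation so far: {summary}"}] if summary else []
    return prefix + messages


def toy_summarize(prev: str, old: list[Message]) -> str:
    # Hypothetical stand-in for an LLM call: just joins the user lines it evicts.
    said = "; ".join(m["content"] for m in old if m["role"] == "user")
    return f"{prev} {said}".strip()


msgs: list[Message] = []
for i in range(1, 5):
    msgs += [{"role": "user", "content": f"q{i}"}, {"role": "assistant", "content": f"a{i}"}]

summary, msgs = compact("", msgs, toy_summarize)
print(summary)                       # q1; q2
print([m["content"] for m in msgs])  # ['q3', 'a3', 'q4', 'a4']
```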

For anything beyond simple buffer/window patterns — switch to LangGraph. It has first-class state, checkpointing, and built-in patterns for summarization, entity tracking, and long-running agents.

7. Common pitfalls

  • Using the deprecated ConversationBufferMemory etc. Still works in 0.3 but you'll get deprecation warnings and lose LCEL composability. Use RunnableWithMessageHistory.
  • In-memory history in production. Loses everything on process restart. Use Redis / Postgres / DynamoDB.
  • No history trimming. Long chats blow past the model's context window. Always trim by token count.
  • One global history for all users. Always key history by session_id / user_id. Otherwise users see each other's conversations.
  • Storing PII forever. Apply TTL on the history store and have a /clear flow that calls .clear().
  • Forgetting that MessagesPlaceholder injects a LIST of messages. If you accidentally make it a single string variable, role information is lost.
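Two of the storage pitfalls above are keying history by session and expiring PII with a TTL plus a clear flow. Both can be sketched with a plain-Python store. `ExpiringHistoryStore` is a hypothetical class for illustration only. In production the expiry would come from the backend itself, for example the `ttl` argument of `RedisChatMessageHistory`. The injectable `clock` exists only to make expiry easy to demonstrate.

```python
import time
from typing import Callable


class ExpiringHistoryStore:
    """Hypothetical store: one message list per session_id, dropped ttl seconds after the last write."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: dict[str, tuple[float, list[dict]]] = {}

    def _purge(self) -> None:
        now = self._clock()
        expired = [sid for sid, (ts, _) in self._data.items() if now - ts >= self._ttl]
        for sid in expired:
            del self._data[sid]

    def append(self, session_id: str, message: dict) -> None:
        self._purge()
        _, msgs = self._data.get(session_id, (0.0, []))
        # Each write refreshes the session's expiry time.
        self._data[session_id] = (self._clock(), msgs + [message])

    def messages(self, session_id: str) -> list[dict]:
        self._purge()
        return list(self._data.get(session_id, (0.0, []))[1])

    def clear(self, session_id: str) -> None:
        # What a user-facing "/clear" flow would call.
        self._data.pop(session_id, None)


fake_now = [0.0]
store = ExpiringHistoryStore(ttl_seconds=60, clock=lambda: fake_now[0])
store.append("alice-123", {"role": "user", "content": "My email is alice@example.com"})
fake_now[0] = 61.0
print(store.messages("alice-123"))  # []
```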

8. When to use vs not use

Strategy | When to use
RunnableWithMessageHistory + Buffer | Short chat sessions, simple chatbots
RunnableWithMessageHistory + Redis/Postgres history | Production: multi-process, persistent
RunnableWithMessageHistory + trim_messages | Conversations that grow long
Summary memory | Long sessions where the gist matters more than exact wording
LangGraph instead | Anything stateful: multi-step agents, durable workflows, checkpointing, human-in-the-loop
No memory at all | One-shot tools (translate, summarize a single doc)

9. Cheatsheet

from langchain_core.chat_history import (
    BaseChatMessageHistory,
    InMemoryChatMessageHistory,
)
from langchain_community.chat_message_histories import (
    RedisChatMessageHistory,
    PostgresChatMessageHistory,
    DynamoDBChatMessageHistory,
    MongoDBChatMessageHistory,
    FileChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.messages import (
    HumanMessage, AIMessage, SystemMessage,
    trim_messages,
)
from langchain_core.prompts import MessagesPlaceholder

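# Wire up (chain and get_history as defined in the examples above)
chain_with_history = RunnableWithMessageHistory(
    runnable=chain,
    get_session_history=get_history,
    input_messages_key="input",
    history_messages_key="history",
    # Optional: key history by several ids instead of session_id. If set,
    # get_session_history is called with these ids as keyword arguments and
    # every invoke must supply them in config["configurable"].
    # (from langchain_core.runnables import ConfigurableFieldSpec)
    # history_factory_config=[
    #     ConfigurableFieldSpec(id="user_id", annotation=str),
    #     ConfigurableFieldSpec(id="conversation_id", annotation=str),
    # ],
)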

# Invoke
chain_with_history.invoke(
    {"input": "..."},
    config={"configurable": {"session_id": "abc"}},
)

# Clear
get_history("abc").clear()

10. Q&A — recall test

  • Q: Why do LLMs need memory bolted on? A: Each API call is stateless. The model has no way to know what happened in previous calls unless you pass that history in the prompt.

  • Q: Difference between BufferWindow and Summary memory? A: Window keeps the last k messages verbatim (fast, lossy at boundaries). Summary uses an LLM to compress old turns into a summary (slower, smoother degradation as conversation grows).

  • Q: When does memory belong in LangChain vs LangGraph? A: LangChain memory: simple per-session chat history. LangGraph: any state more complex than a list of messages — agent state, scratch pads, branching, checkpointing.

  • Q: What is MessagesPlaceholder for? A: It's a slot in a ChatPromptTemplate where a list of prior messages is injected with their roles (System/Human/AI) preserved. A plain string {history} variable, by contrast, flattens the conversation into text and loses those roles.

  • Q: Why does session_id matter? A: It's the partition key. Without it, all users share one history blob. Always derive it from the authenticated user/session.

Practice

What does this print?

Expected: True

# A chat history is a list of messages alternating user/assistant
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(len(history) % 2 == 0)

Use the session_id to partition each user's history (single shared key is wrong)

Expected: True

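# Wrong: every user shares the same key
# chat_history_key = "history"   # bug: single shared key for ALL users

# Fix: derive the key from each user's session_id
def chat_history_key(session_id: str) -> str:
    return f"history:{session_id}"

correct_key_uses_session = chat_history_key("alice") != chat_history_key("bob")
print(correct_key_uses_session)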

Quiz — Quick check

What you remember

Q1. What does conversation memory enable?

  • The LLM "remembers" prior messages in the same chat session
  • Faster responses
  • Lower costs
  • Better embeddings

Why: Without memory, every message is independent. With memory, the LLM can refer back: "as you mentioned earlier...", "your name is Alice". Essential for multi-turn chat UX.

Q2. Why summarize old messages instead of keeping them all?

  • Context windows are limited; summarization keeps relevant info while shrinking tokens
  • Required by LangChain
  • Faster than passing all messages
  • More accurate

Why: A 100-turn conversation has thousands of tokens of history. Summarizing the older turns ("the user is asking about refunds, they're frustrated, they prefer email contact") preserves what matters without filling the prompt.

Q3. What's the session_id for in chat memory?

  • Partition key — separates different users' conversation histories
  • Authentication
  • Rate limiting
  • Required by OpenAI

Why: Without session_id, all users share one history. Always derive it from the authenticated user (or a cookie for anonymous sessions). Multi-tenant safety 101.

Common doubts

Where should I store chat history?

Development: an in-memory dict of InMemoryChatMessageHistory objects, one per session. Production: Redis (fast, ephemeral), Postgres (persistent, queryable), MongoDB (flexible schema). LangChain has wrappers for all common stores.

How much history is enough?

Depends on the use case. Chatbots: last 10-20 turns + a summary of older context. Customer support: the full session, but compressed. Coding assistants: the file + last few interactions. Trade-off between context (more is better) and cost/latency (less is faster/cheaper).

Should agents have memory across sessions?

Yes, for personalization — "Alice prefers detailed answers", "Bob uses PostgreSQL". Store as user-level metadata, retrieve at the start of each session. This is long-term memory vs session memory; both are useful.
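The split between long-term and session memory can be sketched as two separate stores. Everything here is hypothetical and library-free. `profiles` holds durable per-user facts. `start_session` copies those facts into the system prompt of a fresh session, whose history otherwise starts empty.

```python
from collections import defaultdict

# Long-term memory: durable facts keyed by user_id (survives across sessions).
profiles: defaultdict[str, list[str]] = defaultdict(list)

# Session memory: message lists keyed by session_id (short-lived).
sessions: dict[str, list[dict]] = {}


def remember(user_id: str, fact: str) -> None:
    # Store each fact once per user.
    if fact not in profiles[user_id]:
        profiles[user_id].append(fact)


def start_session(user_id: str, session_id: str) -> list[dict]:
    # Retrieve user-level metadata once, at the start of the session.
    facts = "; ".join(profiles[user_id]) or "none yet"
    sessions[session_id] = [
        {"role": "system", "content": f"You are a helpful assistant. Known about this user: {facts}"}
    ]
    return sessions[session_id]


remember("alice", "prefers detailed answers")
remember("bob", "uses PostgreSQL")
print(start_session("alice", "alice-s2")[0]["content"])
# You are a helpful assistant. Known about this user: prefers detailed answers
```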