Runnables & LCEL

1. Why this matters

LangChain has hundreds of components. Without a unified interface, you'd need to remember a different method name for each: model.complete(), parser.parse(), retriever.get_documents(), prompt.format()...

Runnable solves that — one interface, every component, every method (.invoke, .stream, .batch). That's why prompt | model | parser just works: all three implement the same protocol.

You'll touch Runnables directly when you need to wrap custom Python functions, run things in parallel, add retries/fallbacks, or pass data sideways through a chain.

2. Mental model

Think of a Runnable as a typed function with four superpowers:

| Method | What it does | When to use |
|---|---|---|
| .invoke(input) | Call once, return result | Default |
| .stream(input) | Yield chunks as they arrive | Chat UIs |
| .batch([in1, in2, ...]) | Run many inputs in parallel | Bulk processing |
| .ainvoke / .astream / .abatch | Async variants | Inside FastAPI / async apps |

LCEL is just the composition layer: how to wire many Runnables together.

flowchart LR
    subgraph SG1 [Every Runnable]
      I[Input] --> R[Runnable<br/>.invoke .stream .batch<br/>.ainvoke .astream .abatch]
      R --> O[Output]
    end

3. Architecture / Flow

LCEL composition primitives:

flowchart TB
    subgraph SG1 [RunnableSequence a / b / c]
      A1[a] --> B1[b] --> C1[c]
    end
    subgraph SG2 [RunnableParallel x: a, y: b]
      I[input] --> A2[a]
      I --> B2[b]
      A2 --> M[x, y dict]
      B2 --> M
    end
    subgraph SG3 [RunnableLambda fn]
      I3[input] --> F[fn] --> O3[fn output]
    end
    subgraph SG4 [RunnablePassthrough]
      I4[input] --> O4[input unchanged]
    end
    subgraph SG5 [RunnableBranch]
      I5[input] --> R{condition}
      R -->|true| Y[then]
      R -->|false| N[else]
    end

4. Core concepts

  • Runnable[Input, Output] — the base class. Every component subclasses this.
  • .invoke(input) — synchronous, single call. Returns the typed Output.
  • .stream(input) — yields partial chunks (AIMessageChunks for models, strings for parsers).
  • .batch(inputs) — runs many inputs concurrently with a thread/async pool. max_concurrency controls parallelism.
  • a | b — composition. Equivalent to RunnableSequence(a, b). Auto-extended for a | b | c.
  • RunnableLambda(fn) — wrap any Python callable. Use it to insert logic mid-chain.
  • RunnablePassthrough() — identity. Carries the input forward unchanged.
  • RunnablePassthrough.assign(k=fn) — passes input through AND adds k=fn(input) to it.
  • .with_retry(), .with_fallbacks(), .with_config(), .bind() — wrappers that return modified Runnables.
  • Type coercion — a plain dict literal in a chain auto-converts to RunnableParallel; a plain function auto-converts to RunnableLambda.
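
To make the `|` operator and type coercion less magical, here is a toy re-implementation in plain Python. `ToyRunnable` and `coerce` are hypothetical names and this is not LangChain's actual code; it only sketches the idea that `__or__` / `__ror__` build a sequence and that dicts and functions are wrapped on the fly.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRunnable:
    """Toy stand-in for the Runnable protocol; NOT LangChain's implementation."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b  ->  a new runnable that calls a, then b
        nxt = coerce(other)
        return ToyRunnable(lambda v: nxt.invoke(self.invoke(v)))

    def __ror__(self, other):
        # Lets a plain dict or function appear on the LEFT of `|`.
        return coerce(other) | self


def coerce(thing):
    """Wrap functions and dicts so they can take part in a chain."""
    if isinstance(thing, ToyRunnable):
        return thing
    if callable(thing):
        return ToyRunnable(thing)  # function -> lambda-style step
    if isinstance(thing, dict):
        steps = {key: coerce(value) for key, value in thing.items()}

        def run_all(v):
            # Each branch gets the same input and runs in its own thread.
            with ThreadPoolExecutor() as pool:
                futures = {k: pool.submit(s.invoke, v) for k, s in steps.items()}
                return {k: f.result() for k, f in futures.items()}

        return ToyRunnable(run_all)  # dict -> parallel-style step
    raise TypeError(f"cannot coerce {type(thing).__name__} to a runnable")


chain = (
    {"doubled": lambda x: x * 2, "original": lambda x: x}
    | ToyRunnable(lambda d: d["doubled"] + d["original"])
    | str
)
result = chain.invoke(5)
print(result)
```

The dict on the left has no `|` support for runnables, so Python falls back to the right operand's `__ror__`, which is why a bare dict can start a chain.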

5. Code — minimal working example

from langchain_core.runnables import RunnableLambda

# Any function becomes a Runnable
square = RunnableLambda(lambda x: x * x)
print(square.invoke(5))     # 25
print(square.batch([1, 2, 3]))  # [1, 4, 9]

# Compose with `|`
add_one = RunnableLambda(lambda x: x + 1)
chain = square | add_one
print(chain.invoke(3))      # 10  (3² + 1)

Stream a model:

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
for chunk in model.stream("Count to 5 slowly."):
    print(chunk.content, end="", flush=True)

6. Code — real-world pattern

A RAG-shaped chain using all the primitives — note how the dict literal becomes parallel branches automatically:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_openai import ChatOpenAI

retriever = ...  # any retriever (next chapters)
model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

prompt = ChatPromptTemplate.from_template(
    "Answer using context.\n\nContext: {context}\nQ: {question}"
)

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

# Dict literal auto-becomes RunnableParallel
# Plain function auto-becomes RunnableLambda
chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | model
    | parser
)

print(chain.invoke("What is our refund window?"))

Add resilience without changing the chain shape:

robust_chain = chain.with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
).with_fallbacks([
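    # If the main chain still fails after its retries, run this fallback chain.
    # (Here it is a pinned snapshot of the same model; use a different model
    # or provider if you need real outage tolerance.)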
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini-2024-07-18")
    | parser
])

Use RunnablePassthrough.assign to enrich the dict mid-stream:

chain = (
    RunnablePassthrough.assign(
        word_count=lambda x: len(x["text"].split()),
        upper=lambda x: x["text"].upper(),
    )
    | (lambda x: f"{x['upper']} ({x['word_count']} words)")
)
print(chain.invoke({"text": "hello world"}))  # "HELLO WORLD (2 words)"

7. Common pitfalls

  • Forgetting input-type compatibility. If chain step N expects a dict but step N-1 outputs a string, you must adapt with a RunnableLambda(lambda s: {"key": s}).
  • Using .invoke() inside another .invoke() instead of composing. Composing is the point — manual nesting kills streaming and tracing.
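  • Mixing sync and async carelessly. If your outer chain is .ainvoked, every step must support async. Most do. A custom RunnableLambda(sync_fn) still works, because by default the sync function is run in a worker thread, but that adds thread overhead and does not turn blocking I/O into cooperative async I/O. For I/O-bound steps, pass an async function instead.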
  • Branches sharing references. RunnableParallel runs branches concurrently — never have them mutate a shared list/dict.
  • Big anonymous lambdas in production chains. They're hard to trace and debug. Use named functions for anything > 1 line.

8. When to use vs not use

| Pattern | When |
|---|---|
| Pipe a \| b \| c | Default for any sequence |
| RunnableLambda | Need custom Python logic in the middle of a chain |
| RunnableParallel (or dict literal) | Independent computations on the same input |
| RunnablePassthrough.assign(...) | Adding computed fields to an input dict |
| .with_retry(...) | Flaky API or model |
| .with_fallbacks([...]) | Provider outage tolerance, model A/B |
| Raw Runnable subclass | Building reusable components for a library/SDK |

9. Cheatsheet

from langchain_core.runnables import (
    Runnable,
    RunnableSequence,
    RunnableParallel,
    RunnableLambda,
    RunnablePassthrough,
    RunnableBranch,
    RunnableConfig,
)

# Invoke styles
r.invoke(x)          # one input
r.batch([x1, x2])    # many inputs concurrently
r.batch(xs, config={"max_concurrency": 10})
list(r.stream(x))    # chunks
await r.ainvoke(x)   # async

# Wrap fn → Runnable
RunnableLambda(my_fn)
# Or just put `my_fn` directly in `|` — auto-coerced

# Parallel
RunnableParallel({"a": chainA, "b": chainB})
{"a": chainA, "b": chainB}  # auto-coerced in a pipe

# Branching
RunnableBranch(
    (lambda x: x["t"] == "a", chainA),
    (lambda x: x["t"] == "b", chainB),
    default_chain,  # last arg
)

# Pass-through with enrichment
RunnablePassthrough()                   # input → input
RunnablePassthrough.assign(k=fn)        # input → {**input, "k": fn(input)}
RunnablePassthrough.assign(k=chain)     # same, fn can be another Runnable

# Modifiers
r.with_retry(stop_after_attempt=3)
r.with_fallbacks([backup_chain])
r.with_config(tags=["prod"], metadata={"user": uid})
r.bind(stop=["\n\n"])                   # bind partial args

# Inspect
chain.get_graph().print_ascii()
chain.input_schema.schema()             # JSON schema of the input (on Pydantic v2, .model_json_schema() is the non-deprecated spelling)

10. Q&A — recall test

  • Q: What does it mean that "everything is a Runnable"? A: Every LangChain component (model, prompt, parser, retriever, tool) implements Runnable[Input, Output] — same .invoke / .stream / .batch / async interface. That's why a | b | c works regardless of the specific components.

  • Q: What does a | b compile to under the hood? A: RunnableSequence(a, b). The | is just Python's __or__ operator overloaded on Runnables.

  • Q: Difference between RunnablePassthrough() and RunnablePassthrough.assign(k=fn)? A: Plain RunnablePassthrough returns the input unchanged. .assign(k=fn) returns the input with an extra key k added.

  • Q: Why does {"a": chainA, "b": chainB} work in a chain? A: LCEL auto-coerces a plain dict to RunnableParallel({"a": chainA, "b": chainB}). Same for functions → RunnableLambda.

  • Q: How do you make a chain survive transient API failures? A: chain.with_retry(stop_after_attempt=3, wait_exponential_jitter=True). Combine with .with_fallbacks([backup]) for total provider outages.

  • Q: What's the difference between .stream() and .batch()? A: .stream(input) yields chunks of ONE response as it arrives (for chat UIs). .batch(inputs) runs MANY inputs concurrently and returns all final results.

Practice

What does this print?

Expected: True

# Runnables expose: invoke, batch, stream, ainvoke (async), abatch, astream
methods = ["invoke", "batch", "stream", "ainvoke", "abatch", "astream"]
print(len(methods) == 6)

Fix the bug: use batch (concurrent) instead of a Python loop for multiple invocations

Expected: True

# Simulating: should use chain.batch(inputs), not a serial loop
inputs = ["a", "b", "c"]
use_batch = False               # bug: should be True for concurrent execution
print(use_batch)

Quiz — Quick check

What you remember

Q1. What's the difference between invoke and batch?

  • invoke processes one input synchronously; batch processes multiple concurrently
  • No difference
  • batch is for async only
  • invoke is deprecated

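Why: LLM calls are I/O-bound. Running them concurrently with batch can be several times faster than calling invoke in a serial loop for the same total work; the actual speedup depends on max_concurrency and on provider rate limits.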
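
A rough standard-library sketch of why concurrency helps with I/O-bound calls. `fake_llm_call` and its 0.2 s sleep are hypothetical stand-ins for a network round trip, and the thread pool only approximates what .batch does for synchronous Runnables; real speedups vary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_llm_call(prompt: str) -> str:
    """Hypothetical model call: mostly waiting, very little CPU work."""
    time.sleep(0.2)
    return prompt.upper()


prompts = ["a", "b", "c", "d", "e"]

# Serial loop: total time is roughly the sum of all waits.
start = time.perf_counter()
serial_results = [fake_llm_call(p) for p in prompts]
serial_s = time.perf_counter() - start

# Thread pool: the waits overlap, so total time is close to the longest wait.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    parallel_results = list(pool.map(fake_llm_call, prompts))
parallel_s = time.perf_counter() - start

print(f"serial: {serial_s:.2f}s, concurrent: {parallel_s:.2f}s")
```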

Q2. When should you use stream vs invoke?

  • stream for UIs where you want tokens to appear progressively; invoke for backend/automation where only the final answer matters
  • stream is faster
  • invoke is for very short responses
  • They're identical

Why: Streaming improves perceived latency in chat UIs. For non-interactive backend code, just invoke and process the complete response.

Q3. What does RunnableLambda do?

  • Wraps any Python function so it can be piped into an LCEL chain
  • A lazy evaluation primitive
  • Required for streaming
  • Same as lambda in Python

Why: Lets you insert arbitrary logic into chains. Useful for preprocessing input, postprocessing output, or branching on conditions.

Common doubts

What's the difference between LCEL and LangGraph?

LCEL covers linear chains and branched DAGs. LangGraph is for stateful, cyclic workflows: loops, retries, conditional routing, human-in-the-loop. Use LCEL for simple Q&A and RAG; switch to LangGraph when the flow needs loops, persistent state, or routing more involved than a RunnableBranch.

Why do all my LCEL chains start with a prompt?

Most chains start by converting raw input into a prompt for the LLM. But they don't have to — you can start with a retriever (retriever | prompt | model) or any other Runnable. The pattern is just very common.

How do I add error handling to a chain?

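Use chain.with_fallbacks([backup_chain]), which runs the backup if the main chain fails. For transient errors, chain.with_retry(...) retries the same chain before giving up. You can also wrap a step in a RunnableLambda that catches the exception and returns a default, or implement structured retries with LangGraph.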