Introduction: The Depth of Agency
Series Overview: This is Part 9 of our 20-part AI Application Development Mastery series. We now explore the most advanced agent architectures — systems that plan multi-step strategies, reflect on their own performance, search through solution spaces, and operate with increasing autonomy while maintaining safety bounds.
1. Foundations & Evolution of AI Apps: Pre-LLM era, transformers, LLM revolution
2. LLM Fundamentals for Developers: Tokens, context windows, sampling, API patterns
3. Prompt Engineering Mastery: Zero/few-shot, CoT, ReAct, structured outputs
4. LangChain Core Concepts: Chains, prompts, LLMs, tools, LCEL
5. Retrieval-Augmented Generation (RAG): Embeddings, vector DBs, retrievers, RAG pipelines
6. Memory & Context Engineering: Buffer/summary/vector memory, chunking, re-ranking
7. Agents — Core of Modern AI Apps: ReAct, tool-calling, planner-executor agents
8. LangGraph — Stateful Agent Workflows: Nodes, edges, state, graph execution, cycles
9. Deep Agents & Autonomous Systems: Multi-step reasoning, self-reflection, planning (You Are Here)
10. Multi-Agent Systems: Supervisor, swarm, debate, role-based collaboration
11. AI Application Design Patterns: RAG, chat+memory, workflow automation, agent loops
12. Ecosystem & Frameworks: LlamaIndex, Haystack, HuggingFace, vLLM
13. MCP Foundations & Architecture: Protocol design, Host/Client/Server, primitives, security
14. MCP in Production: Building servers, integrations, scaling, agent systems
15. Evaluation & LLMOps: Prompt eval, tracing, LangSmith, experiment tracking
16. Production AI Systems: APIs, queues, caching, streaming, scaling
17. Safety, Guardrails & Reliability: Input filtering, hallucination mitigation, prompt injection
18. Advanced Topics: Fine-tuning, tool learning, hybrid LLM+symbolic
19. Building Real AI Applications: Chatbot, document QA, coding assistant, full-stack
20. Future of AI Applications: Autonomous agents, self-improving, multi-modal, AI OS
Parts 7 and 8 gave us the building blocks: tool-calling agents and LangGraph's stateful workflows. But the agents we built so far are reactive — they respond to user input, call tools, and return results. Deep agents go further. They plan before acting, reflect on their results, self-correct when things go wrong, and can operate with increasing levels of autonomy.
This part explores the architectures that power the most sophisticated AI systems being built today — the patterns behind Devin, Copilot Workspace, Claude Code, and research prototypes pushing the boundary of what autonomous AI can do.
Key Insight: The difference between a simple agent and a deep agent is the difference between a junior developer who writes code when told (reactive) and a senior architect who designs the solution, implements it, tests it, reviews their own work, and iterates until it meets quality standards (proactive, reflective, autonomous).
1. Planner-Executor-Critic
The Planner-Executor-Critic pattern separates the agent into three distinct roles, each handled by a different LLM call (or potentially a different model). This separation of concerns dramatically improves reliability and enables self-correction.
1.1 The Planning Phase
The planning phase is where a deep agent decomposes a complex, open-ended goal into a sequence of concrete steps. The planner receives the user’s objective and generates a structured plan — typically a numbered list of actions — that the executor can follow. This separation of planning from execution lets the agent reason about strategy before committing to actions, much like how a human outlines an approach before diving into implementation.
# pip install langchain-openai langgraph
import os
from langchain_openai import ChatOpenAI
from typing import TypedDict

# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# State schema for the Planner-Executor-Critic agent
class DeepAgentState(TypedDict):
    task: str
    plan: list[str]
    current_step: int
    step_results: list[str]  # plain list (no reducer) so the critic can reset it between revisions
    critique: str
    final_result: str
    revision_count: int
    max_revisions: int
    status: str

def planner(state: DeepAgentState) -> dict:
    """Generate a step-by-step plan for the task."""
    response = llm.invoke(
        f"You are a planning expert. Break this task into 3-7 concrete, "
        f"actionable steps. Each step should be independently executable.\n\n"
        f"Task: {state['task']}\n\n"
        f"Return a numbered list of steps, one per line."
    )
    steps = [
        line.strip().lstrip("0123456789.)- ")
        for line in response.content.strip().split("\n")
        if line.strip() and any(c.isalpha() for c in line)
    ]
    return {"plan": steps, "current_step": 0, "status": "executing"}
1.2 The Execution Phase
The executor takes the plan generated in the planning phase and works through it step by step, using tools and LLM reasoning to complete each action. After each step, a routing function checks whether all steps are complete or whether to continue execution. This incremental approach allows the agent to accumulate results progressively and adapt its behavior based on intermediate outcomes.
# Executor and routing logic (uses DeepAgentState and llm from above)
def executor(state: DeepAgentState) -> dict:
    """Execute the current step of the plan."""
    step_idx = state["current_step"]
    step = state["plan"][step_idx]
    # Include results from previous steps for context
    context = ""
    if state["step_results"]:
        context = "Previous results:\n" + "\n".join(
            f"Step {i+1}: {r}" for i, r in enumerate(state["step_results"])
        )
    response = llm.invoke(
        f"Execute this step of the plan. Be thorough and detailed.\n\n"
        f"Overall task: {state['task']}\n"
        f"Current step ({step_idx + 1}/{len(state['plan'])}): {step}\n"
        f"{context}\n\n"
        f"Provide the result of executing this step."
    )
    return {
        # Carry previous results forward and append this step's output
        "step_results": state["step_results"] + [f"[Step {step_idx + 1}] {response.content}"],
        "current_step": step_idx + 1
    }

def should_continue_executing(state: DeepAgentState) -> str:
    """Check if there are more steps to execute."""
    if state["current_step"] < len(state["plan"]):
        return "execute_next"
    return "critique"
1.3 The Critic Phase
The critic closes the loop in a deep agent architecture by evaluating the executor’s combined results against the original objective. If the output is satisfactory, the agent terminates. If not, the critic provides feedback that triggers re-planning — creating a self-improving cycle where each iteration refines the approach. This evaluate-and-revise pattern is what distinguishes deep agents from simple sequential pipelines: they can recover from partial failures and iteratively converge on better solutions.
# Critic, routing, and full graph assembly (uses DeepAgentState, llm, planner, executor from above)
def critic(state: DeepAgentState) -> dict:
    """Evaluate the quality of the execution and decide if revision is needed."""
    all_results = "\n\n".join(state["step_results"])
    response = llm.invoke(
        f"You are a quality critic. Evaluate if this task was completed well.\n\n"
        f"Original task: {state['task']}\n\n"
        f"Plan:\n" + "\n".join(f"  {i+1}. {s}" for i, s in enumerate(state["plan"])) +
        f"\n\nExecution results:\n{all_results}\n\n"
        f"Respond with:\n"
        f"VERDICT: PASS or FAIL\n"
        f"FEEDBACK: Specific issues or improvements needed (if FAIL)\n"
        f"SUMMARY: One-paragraph summary of the overall result"
    )
    content = response.content
    passed = "VERDICT: PASS" in content.upper() or "PASS" in content.split("\n")[0].upper()
    if passed or state["revision_count"] >= state["max_revisions"]:
        # Extract summary for final result
        summary_start = content.find("SUMMARY:")
        summary = content[summary_start + 8:].strip() if summary_start != -1 else content
        return {
            "critique": content,
            "final_result": summary,
            "status": "complete"
        }
    return {
        "critique": content,
        "revision_count": state["revision_count"] + 1,
        "status": "revising",
        "step_results": [],  # Clear for re-execution
        "current_step": 0    # Start plan from beginning
    }

def after_critic(state: DeepAgentState) -> str:
    """Route based on critic's verdict."""
    if state["status"] == "complete":
        return "done"
    return "re_plan"

# Build the full Planner-Executor-Critic graph
from langgraph.graph import StateGraph, START, END

graph = StateGraph(DeepAgentState)
graph.add_node("plan", planner)
graph.add_node("execute", executor)
graph.add_node("critique", critic)
graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges("execute", should_continue_executing, {
    "execute_next": "execute",
    "critique": "critique"
})
graph.add_conditional_edges("critique", after_critic, {
    "re_plan": "plan",
    "done": END
})
deep_agent = graph.compile()

# Run the deep agent
result = deep_agent.invoke({
    "task": "Write a comprehensive comparison of Python and Rust for web development",
    "plan": [],
    "current_step": 0,
    "step_results": [],
    "critique": "",
    "final_result": "",
    "revision_count": 0,
    "max_revisions": 2,
    "status": "planning"
})
print(f"Status: {result['status']}")
print(f"Result: {result['final_result'][:200]}...")
Key Insight: The Planner-Executor-Critic pattern mirrors how humans tackle complex tasks. A project manager creates a plan, team members execute it, and a reviewer evaluates the output. By separating these roles, each LLM call has a focused, well-defined job, which produces much better results than asking a single agent to do everything at once.
2. Plan-and-Execute Agents
Plan-and-Execute agents formalize the deep agent pattern into a two-stage pipeline: first generate a complete plan, then execute each step with a tool-equipped agent. This architecture is especially effective for multi-step tasks where the solution requires coordinating several tools in sequence. The key advantage over reactive agents is that the planner can reason about the entire task before any action is taken, leading to more coherent and efficient execution paths.
2.1 LangChain Implementation
LangChain provides a built-in Plan-and-Execute pipeline that separates a planner LLM (which generates the step-by-step plan) from an executor agent (which carries out each step using available tools). The planner typically uses a more capable model for strategic reasoning, while the executor can use a faster, cheaper model for tool invocation. This division of labor optimizes both quality and cost.
# pip install langchain-classic langchain-openai
# Plan-and-Execute: Plan first, then execute each step with tools
# This separates "what to do" from "how to do it"
import os
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def search(query: str) -> str:
    """Search the web for current information."""
    return f"Search results for: {query}"

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    import math
    safe = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
    return str(eval(expression, {"__builtins__": {}}, safe))

# The Planner: creates a high-level plan
def create_plan(task: str) -> list[str]:
    planner_llm = ChatOpenAI(model="gpt-4o", temperature=0)
    response = planner_llm.invoke(
        f"Break this task into 3-5 concrete steps. "
        f"Each step should be a clear action.\n\nTask: {task}\n\n"
        f"Return ONLY a numbered list."
    )
    return [
        line.strip().lstrip("0123456789.)- ")
        for line in response.content.strip().split("\n")
        if line.strip() and any(c.isalpha() for c in line)
    ]

# The Executor: has tools, executes one step at a time
executor_prompt = ChatPromptTemplate.from_messages([
    ("system", "Execute the given step using available tools. "
               "Be precise and return concrete results."),
    ("human", "Overall task: {task}\n\nCurrent step: {step}\n\n"
              "Previous results: {context}"),
    MessagesPlaceholder("agent_scratchpad")
])
executor_llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search, calculate]
executor_agent = create_tool_calling_agent(executor_llm, tools, executor_prompt)
executor = AgentExecutor(agent=executor_agent, tools=tools, max_iterations=5)

# Run the full Plan-and-Execute pipeline
def plan_and_execute(task: str) -> str:
    # Phase 1: Plan
    plan = create_plan(task)
    print(f"Plan ({len(plan)} steps):")
    for i, step in enumerate(plan, 1):
        print(f"  {i}. {step}")
    # Phase 2: Execute each step
    results = []
    for i, step in enumerate(plan):
        context = "\n".join(results) if results else "None yet"
        result = executor.invoke({
            "task": task,
            "step": step,
            "context": context
        })
        results.append(f"Step {i+1}: {result['output']}")
        print(f"\nCompleted step {i+1}: {result['output'][:100]}...")
    return "\n\n".join(results)

# Run the agent
output = plan_and_execute("What is the GDP of France and how does it compare to the UK?")
print(output)
2.2 Adaptive Re-Planning
Static plans break down when execution reveals unexpected information or when a step fails. Adaptive re-planning addresses this by checking after each execution step whether the remaining plan still makes sense given the results so far. If the plan needs updating — perhaps because a search returned unexpected data or a tool failed — the planner generates a revised plan that incorporates what was learned. This makes the agent resilient to the unpredictable nature of real-world tool interactions.
# Adaptive re-planning: update the plan based on execution results
# Uses create_plan and executor from the block above
def adaptive_plan_and_execute(task: str, max_replans: int = 3) -> str:
    plan = create_plan(task)
    results = []
    replan_count = 0
    # Use an explicit index instead of iterating the list directly:
    # the plan is rebound mid-loop, and a plain for-loop would keep
    # walking the original list, never reaching the revised steps
    i = 0
    while i < len(plan):
        step = plan[i]
        context = "\n".join(results) if results else "None"
        result = executor.invoke({
            "task": task, "step": step, "context": context
        })
        results.append(f"Step {i+1}: {result['output']}")
        # After each step, check if the plan needs updating
        if i < len(plan) - 1 and replan_count < max_replans:
            replanner_llm = ChatOpenAI(model="gpt-4o", temperature=0)
            replan_response = replanner_llm.invoke(
                f"Original task: {task}\n\n"
                f"Original plan:\n" + "\n".join(f"  {j+1}. {s}" for j, s in enumerate(plan)) +
                f"\n\nCompleted so far:\n" + "\n".join(results) +
                f"\n\nShould the remaining steps be modified? "
                f"If yes, return the updated remaining steps. "
                f"If no, return 'NO CHANGES NEEDED'."
            )
            if "NO CHANGES NEEDED" not in replan_response.content.upper():
                new_steps = [
                    line.strip().lstrip("0123456789.)- ")
                    for line in replan_response.content.strip().split("\n")
                    if line.strip() and any(c.isalpha() for c in line)
                ]
                if new_steps:
                    plan = plan[:i+1] + new_steps
                    replan_count += 1
                    print(f"  [Re-planned! {len(new_steps)} new remaining steps]")
        i += 1
    return "\n\n".join(results)
3. Reflexion
Reflexion (Shinn et al., 2023) introduces verbal self-reflection as a form of learning. Instead of updating model weights, the agent reflects on its failures and stores the reflections in memory, which improves future attempts.
3.1 Reflexion Architecture
Reflexion is a self-improvement architecture where an agent attempts a task, evaluates its own output, and — if unsuccessful — writes a natural-language reflection analyzing what went wrong. These reflections are accumulated and included in subsequent attempts, giving the agent an explicit memory of past mistakes. Research shows that Reflexion agents can match or exceed few-shot performance on coding and reasoning tasks by learning from their own trial history, without any weight updates to the underlying model.
# pip install langgraph langchain-openai
# Reflexion: Learn from failures through verbal self-reflection
#
# Trial 1: Attempt -> Fail -> Reflect ("I failed because I didn't check X")
# Trial 2: Attempt (with reflection memory) -> Fail -> Reflect ("Also need to handle Y")
# Trial 3: Attempt (with all reflections) -> Succeed!
#
# Key insight: The model doesn't retrain. It just REMEMBERS what went wrong.
from typing import TypedDict, Annotated
from operator import add
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI

# Requires OPENAI_API_KEY environment variable
llm = ChatOpenAI(model="gpt-4o", temperature=0)

class ReflexionState(TypedDict):
    task: str
    attempts: Annotated[list, add]
    reflections: Annotated[list, add]
    current_attempt: str
    evaluation: str
    success: bool
    trial_number: int
    max_trials: int

def attempt(state: ReflexionState) -> dict:
    """Make an attempt at the task, informed by past reflections."""
    reflections_context = ""
    if state["reflections"]:
        reflections_context = (
            "\n\nLessons from previous attempts:\n" +
            "\n".join(f"- {r}" for r in state["reflections"])
        )
    response = llm.invoke(
        f"Complete this task. Be thorough and precise.\n\n"
        f"Task: {state['task']}"
        f"{reflections_context}\n\n"
        f"This is attempt #{state['trial_number']}."
    )
    return {
        "current_attempt": response.content,
        "attempts": [response.content]
    }

def evaluate(state: ReflexionState) -> dict:
    """Evaluate whether the attempt succeeded."""
    response = llm.invoke(
        f"Evaluate if this attempt successfully completes the task.\n\n"
        f"Task: {state['task']}\n\n"
        f"Attempt:\n{state['current_attempt']}\n\n"
        f"Respond with SUCCESS or FAILURE, followed by a brief explanation."
    )
    success = "SUCCESS" in response.content.upper().split("\n")[0]
    return {"evaluation": response.content, "success": success}

def reflect(state: ReflexionState) -> dict:
    """Reflect on why the attempt failed and extract lessons."""
    response = llm.invoke(
        f"The following attempt FAILED. Analyze why and extract specific, "
        f"actionable lessons for the next attempt.\n\n"
        f"Task: {state['task']}\n"
        f"Attempt:\n{state['current_attempt']}\n"
        f"Evaluation:\n{state['evaluation']}\n"
        f"Previous reflections: {state['reflections']}\n\n"
        f"New reflection (be specific and actionable):"
    )
    return {
        "reflections": [response.content],
        "trial_number": state["trial_number"] + 1
    }

def after_evaluation(state: ReflexionState) -> str:
    if state["success"]:
        return "done"
    if state["trial_number"] >= state["max_trials"]:
        return "done"  # Give up after max trials
    return "reflect"

# Build Reflexion graph
graph = StateGraph(ReflexionState)
graph.add_node("attempt", attempt)
graph.add_node("evaluate", evaluate)
graph.add_node("reflect", reflect)
graph.add_edge(START, "attempt")
graph.add_edge("attempt", "evaluate")
graph.add_conditional_edges("evaluate", after_evaluation, {
    "reflect": "reflect",
    "done": END
})
graph.add_edge("reflect", "attempt")  # Try again with new reflection
reflexion_agent = graph.compile()
3.2 Self-Reflection Loop
The power of Reflexion becomes visible when you run the graph multiple times on the same problem. Each trial produces a reflection that feeds into the next attempt, creating a compounding learning effect. The agent doesn’t just retry — it reasons about why the previous attempt failed and adjusts its strategy accordingly. The following invocation demonstrates this iterative improvement, where accumulated reflections guide increasingly sophisticated solutions.
# The power of Reflexion: each trial gets BETTER because of accumulated reflections
# Uses reflexion_agent compiled in the block above
result = reflexion_agent.invoke({
    "task": "Write a Python function that correctly handles all edge cases for "
            "parsing dates in formats: YYYY-MM-DD, MM/DD/YYYY, DD-Mon-YYYY",
    "attempts": [],
    "reflections": [],
    "current_attempt": "",
    "evaluation": "",
    "success": False,
    "trial_number": 1,
    "max_trials": 4
})
# Trial 1: Might miss timezone handling
# Reflection: "I forgot to handle timezone-aware dates and invalid dates like Feb 30"
# Trial 2: Handles timezones but misses single-digit months
# Reflection: "Need to handle MM with or without leading zeros"
# Trial 3: Correct!
print(f"Success: {result['success']}")
print(f"Trials needed: {result['trial_number']}")
print(f"Reflections accumulated: {len(result['reflections'])}")
4. LATS — Language Agent Tree Search
LATS (Zhou et al., 2023) combines LLM reasoning with Monte Carlo Tree Search (MCTS). Instead of following a single path, the agent explores a tree of possible action sequences, evaluates promising branches, and backtracks from dead ends.
4.1 Tree Search for Agents
Language Agent Tree Search (LATS) applies Monte Carlo Tree Search — the same algorithm behind AlphaGo — to LLM-powered agents. Instead of committing to a single plan, LATS explores multiple solution paths simultaneously, evaluating and expanding the most promising branches. Each node in the search tree represents an agent state, and UCB1 (Upper Confidence Bound) scoring balances exploitation (pursuing high-scoring paths) with exploration (trying under-explored alternatives).
# LATS (Language Agent Tree Search): Explore multiple solution paths simultaneously
#
#                      [Start]
#                     /       \
#            [Search web]   [Query DB]
#             /        \          |
#      [Found it]  [Not found]  [Found partial]
#          |            |             |
#       [Done]   [Try different  [Combine with
#                    search]      web search]
#                                     |
#                                  [Done]
#
# Key operations:
#   1. SELECT    — Choose the most promising node to expand (UCB1)
#   2. EXPAND    — Generate possible next actions from that node
#   3. EVALUATE  — Score each new action (using LLM as evaluator)
#   4. BACKPROP  — Propagate scores back up the tree
#   5. REPEAT    — Until a satisfactory solution is found
from dataclasses import dataclass, field
from typing import Optional
import math

@dataclass
class TreeNode:
    """A node in the LATS search tree."""
    state: dict
    action: str = ""
    parent: Optional['TreeNode'] = None
    children: list['TreeNode'] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0
    depth: int = 0

    @property
    def average_score(self) -> float:
        return self.total_score / max(self.visits, 1)

    def ucb1(self, exploration_weight: float = 1.41) -> float:
        """Upper Confidence Bound for tree search."""
        if self.visits == 0:
            return float("inf")
        exploitation = self.average_score
        exploration = exploration_weight * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )
        return exploitation + exploration
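To make the UCB1 trade-off concrete, here is a standalone calculation, independent of the TreeNode class, comparing a well-explored high-scoring node against a barely-visited lower-scoring one. The specific numbers are illustrative only:

```python
import math

def ucb1(total_score: float, visits: int, parent_visits: int,
         exploration_weight: float = 1.41) -> float:
    """UCB1 score: average reward plus an exploration bonus
    that shrinks as a node accumulates visits."""
    if visits == 0:
        return float("inf")
    exploitation = total_score / visits
    exploration = exploration_weight * math.sqrt(
        math.log(parent_visits) / visits
    )
    return exploitation + exploration

# Node A: strong average (0.8) but already visited 10 times
score_a = ucb1(total_score=8.0, visits=10, parent_visits=12)
# Node B: weaker average (0.5) but only visited twice
score_b = ucb1(total_score=1.0, visits=2, parent_visits=12)

print(f"A: {score_a:.3f}, B: {score_b:.3f}")
# B scores higher despite the lower average: the exploration bonus
# rewards under-visited nodes, which is how MCTS avoids tunneling
# on early winners.
```

The exploration weight (1.41, roughly the square root of 2) controls how aggressively the search revisits under-explored branches; lower values make the search greedier.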
4.2 LATS Implementation
The full LATS agent wraps the tree search data structure with an LLM-powered expand-evaluate-backpropagate loop. At each iteration, the agent selects the most promising node (via UCB1), generates candidate next steps (expansion), scores them with the LLM (evaluation), and propagates scores back up the tree (backpropagation). After a configurable number of iterations, the best complete solution path is returned. This approach dramatically outperforms single-path agents on complex reasoning tasks.
# Uses TreeNode class from the block above
class LATSAgent:
    """Language Agent Tree Search — explores multiple solution paths."""
    def __init__(self, llm, tools, max_depth=5, num_candidates=3):
        self.llm = llm
        self.tools = tools
        self.max_depth = max_depth
        self.num_candidates = num_candidates

    def expand(self, node: TreeNode) -> list[TreeNode]:
        """Generate candidate next actions from current state."""
        response = self.llm.invoke(
            f"Given this state, suggest {self.num_candidates} different "
            f"next actions to take. Each should be a distinct approach.\n\n"
            f"Task: {node.state['task']}\n"
            f"Progress so far: {node.state.get('progress', 'None')}\n"
            f"Current depth: {node.depth}/{self.max_depth}\n\n"
            f"Return {self.num_candidates} actions, one per line."
        )
        actions = [
            line.strip().lstrip("0123456789.)- ")
            for line in response.content.strip().split("\n")
            if line.strip() and any(c.isalpha() for c in line)
        ][:self.num_candidates]
        children = []
        for action in actions:
            child_state = {**node.state, "progress": f"{node.state.get('progress', '')} → {action}"}
            child = TreeNode(
                state=child_state,
                action=action,
                parent=node,
                depth=node.depth + 1
            )
            children.append(child)
            node.children.append(child)
        return children

    def evaluate(self, node: TreeNode) -> float:
        """Score a node's state (0.0 to 1.0)."""
        response = self.llm.invoke(
            f"Rate how close this progress is to completing the task.\n\n"
            f"Task: {node.state['task']}\n"
            f"Progress: {node.state.get('progress', 'None')}\n\n"
            f"Return a score from 0.0 (no progress) to 1.0 (task complete)."
        )
        try:
            score = float(response.content.strip().split()[0])
            return max(0.0, min(1.0, score))
        except (ValueError, IndexError):
            return 0.5

    def search(self, task: str, num_iterations: int = 20) -> TreeNode:
        """Run LATS to find the best solution path."""
        root = TreeNode(state={"task": task, "progress": ""})
        for _ in range(num_iterations):
            # SELECT: Find best leaf to expand (UCB1)
            leaf = self._select(root)
            if leaf.depth >= self.max_depth:
                continue
            # EXPAND: Generate candidate actions
            children = self.expand(leaf)
            # EVALUATE + BACKPROPAGATE: score each candidate, then let
            # backpropagation record the child's own visit and update
            # ancestors (setting visits/total_score manually beforehand
            # would double-count the child's first visit)
            for child in children:
                score = self.evaluate(child)
                self._backpropagate(child, score)
        # Return the best path
        return self._best_path(root)

    def _select(self, node: TreeNode) -> TreeNode:
        while node.children:
            node = max(node.children, key=lambda c: c.ucb1())
        return node

    def _backpropagate(self, node: TreeNode, score: float):
        while node:
            node.visits += 1
            node.total_score += score
            node = node.parent

    def _best_path(self, root: TreeNode) -> TreeNode:
        node = root
        while node.children:
            node = max(node.children, key=lambda c: c.average_score)
        return node

# Usage example
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# agent = LATSAgent(llm=llm, tools=[], max_depth=4, num_candidates=3)
# best = agent.search("Design a REST API for a todo app", num_iterations=15)
# print(f"Best path score: {best.average_score:.2f}")
# print(f"Progress: {best.state['progress']}")
4.3 LATS vs ReAct vs Reflexion
When to Use Each Architecture
- ReAct: Simple tasks where one path through the solution space is likely sufficient. Fast, low cost.
- Reflexion: Tasks where the agent might fail on the first try but can learn from mistakes. Good for coding, writing, and tasks with clear pass/fail criteria.
- LATS: Complex tasks where multiple solution strategies exist and the best path is unclear. Higher cost but finds better solutions for hard problems.
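These rules of thumb can be encoded as a simple dispatcher. The function below is an illustrative sketch, not part of any framework; the input signals (strategy count, verifiability, budget sensitivity) and the routing thresholds are assumptions chosen to mirror the guidance above:

```python
def choose_architecture(num_strategies: int, verifiable: bool,
                        budget_sensitive: bool) -> str:
    """Heuristic router over the three architectures discussed above.
    The thresholds are illustrative assumptions, not fixed rules."""
    if num_strategies > 1 and not budget_sensitive:
        # Multiple plausible paths and budget to explore them: search the tree
        return "LATS"
    if verifiable:
        # A clear pass/fail signal lets verbal self-reflection compound
        return "Reflexion"
    # Default: single-path reasoning with tools, fast and cheap
    return "ReAct"

print(choose_architecture(num_strategies=1, verifiable=False, budget_sensitive=True))   # ReAct
print(choose_architecture(num_strategies=1, verifiable=True, budget_sensitive=True))    # Reflexion
print(choose_architecture(num_strategies=4, verifiable=True, budget_sensitive=False))   # LATS
```

In practice this decision is often made per task type at design time rather than at runtime, but making the criteria explicit keeps the trade-off visible in code review.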
5. Tool Orchestration & Memory
As agents grow more capable, they need sophisticated strategies for selecting the right tool from a large toolbox and maintaining memory across long-running tasks. Simple agents hard-code tool lists, but deep agents dynamically select tools based on the current task, route complex queries through tool chains, and remember what worked (and what didn’t) across sessions. This section covers the orchestration and memory patterns that enable truly autonomous agent behavior.
5.1 Dynamic Tool Selection
When an agent has access to dozens or hundreds of tools, sending all tool descriptions to the LLM wastes context and confuses tool selection. A ToolOrchestrator solves this by categorizing tools, selecting only the relevant subset for each query, and implementing fallback execution when the primary tool fails. This pattern also enables tool chaining — piping the output of one tool into another — and graceful degradation when tools are unavailable.
# Deep agents need sophisticated tool orchestration:
# selecting tools dynamically, chaining tool outputs, and handling failures
# Requires an LLM instance (e.g., ChatOpenAI) and a list of LangChain tools
class ToolOrchestrator:
    """Manage a large set of tools with dynamic selection."""
    def __init__(self, tools: list, llm):
        self.tools = {t.name: t for t in tools}
        self.llm = llm
        self.tool_descriptions = "\n".join(
            f"- {t.name}: {t.description}" for t in tools
        )

    def select_tools(self, task: str, max_tools: int = 5) -> list:
        """Dynamically select the most relevant tools for a task."""
        response = self.llm.invoke(
            f"Given this task, select the {max_tools} most relevant tools.\n\n"
            f"Task: {task}\n\n"
            f"Available tools:\n{self.tool_descriptions}\n\n"
            f"Return ONLY the tool names, one per line."
        )
        selected_names = [
            line.strip() for line in response.content.strip().split("\n")
            if line.strip() in self.tools
        ]
        return [self.tools[name] for name in selected_names[:max_tools]]

    def execute_with_fallback(self, tool_name: str, args: dict,
                              fallback_tools: list = None) -> str:
        """Execute a tool with automatic fallback on failure."""
        try:
            result = self.tools[tool_name].invoke(args)
            if result and not str(result).startswith("Error"):
                return result
        except Exception:
            pass
        # Try fallback tools
        for fallback_name in (fallback_tools or []):
            try:
                result = self.tools[fallback_name].invoke(args)
                if result and not str(result).startswith("Error"):
                    return f"[via {fallback_name}] {result}"
            except Exception:
                continue
        return f"All tools failed for: {args}"
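The fallback logic can be exercised in isolation with stub callables. The sketch below (the stub names flaky_search and cached_search are invented for illustration) shows the same try-primary-then-fallbacks shape without any LangChain dependency:

```python
def run_with_fallback(primary, fallbacks, args):
    """Try the primary callable, then each (name, callable) fallback
    in order. Returns the first successful result, tagging any result
    that came from a fallback."""
    try:
        return primary(args)
    except Exception:
        pass
    for name, tool in fallbacks:
        try:
            return f"[via {name}] {tool(args)}"
        except Exception:
            continue
    return f"All tools failed for: {args}"

# Stub tools for illustration: the primary always raises
def flaky_search(query):
    raise TimeoutError("primary search timed out")

def cached_search(query):
    return f"cached results for '{query}'"

result = run_with_fallback(flaky_search,
                           [("cached_search", cached_search)],
                           "rust web frameworks")
print(result)  # [via cached_search] cached results for 'rust web frameworks'
```

Tagging fallback results (the `[via ...]` prefix) matters downstream: the agent's critic or the human reviewer can see that the answer came from a degraded path.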
5.2 Memory Integration
Deep agents benefit from multi-layered memory that mirrors cognitive science: working memory (the current task context), episodic memory (past experiences stored as vector embeddings), semantic memory (structured facts and preferences), and procedural memory (learned strategies for recurring tasks). The implementation below integrates all four layers into a unified DeepAgentMemory class that agents can query and update throughout their execution.
# pip install langchain-community langchain-openai chromadb
# Deep agents need multiple memory systems working together
from langchain_community.vectorstores import Chroma

class DeepAgentMemory:
    """Multi-layered memory system for deep agents."""
    def __init__(self, embeddings):
        # Working memory: current task context (short-term)
        self.working_memory = []
        # Episodic memory: past task experiences (medium-term)
        self.episodic_store = Chroma(
            collection_name="episodic",
            embedding_function=embeddings
        )
        # Semantic memory: general knowledge/facts (long-term)
        self.semantic_store = Chroma(
            collection_name="semantic",
            embedding_function=embeddings
        )
        # Procedural memory: learned strategies (long-term)
        self.procedures = {}

    def store_episode(self, task: str, plan: list, result: str, success: bool):
        """Store a completed task episode for future reference."""
        episode = f"Task: {task}\nPlan: {plan}\nResult: {result}\nSuccess: {success}"
        self.episodic_store.add_texts(
            texts=[episode],
            metadatas=[{"task": task, "success": success}]
        )

    def recall_similar_episodes(self, task: str, k: int = 3) -> list:
        """Retrieve similar past experiences to inform planning."""
        docs = self.episodic_store.similarity_search(task, k=k)
        return [doc.page_content for doc in docs]

    def store_procedure(self, name: str, steps: list):
        """Store a learned procedure for reuse."""
        self.procedures[name] = steps

    def get_relevant_context(self, query: str) -> dict:
        """Get all relevant context for a given query."""
        return {
            "working": self.working_memory[-10:],
            "episodes": self.recall_similar_episodes(query),
            "procedures": [
                name for name in self.procedures
                if any(word in name.lower() for word in query.lower().split())
            ]
        }
6. Autonomy Levels
Not all agent tasks require the same level of independence. Autonomy levels (L1 through L4) provide a framework for calibrating how much freedom an agent has to act without human approval. Lower levels require confirmation for every action, while higher levels let the agent plan and execute entire workflows independently. Choosing the right autonomy level for each use case is critical for balancing efficiency with safety — especially when agents interact with production systems, external APIs, or sensitive data.
6.1 L1 through L4
| Level | Name | Description | Human Role | Example |
| --- | --- | --- | --- | --- |
| L1 | Assisted | Agent suggests actions; human approves every step | Approve each action | Copilot suggestions (accept/reject each line) |
| L2 | Semi-Autonomous | Agent executes routine actions; asks for approval on high-risk actions | Approve risky actions only | Email draft agent (auto-drafts, human approves send) |
| L3 | Supervised Autonomous | Agent executes full tasks independently; human reviews results | Review final output | Code review agent (writes PR, human reviews) |
| L4 | Fully Autonomous | Agent operates independently within defined bounds; human monitors | Monitor and intervene if needed | Devin-style coding agent (plan to deploy) |
# Implementing autonomy levels (the flags map onto LangGraph's interrupt mechanism)
# Configuration class for controlling how much human oversight an agent needs
class AutonomyConfig:
    """Configure autonomy level for a deep agent."""

    LEVELS = {
        "L1": {
            "description": "Assisted — approve every action",
            "interrupt_before_tools": True,
            "interrupt_before_execution": True,
            "auto_approve_read_only": False,
            "require_final_approval": True,
        },
        "L2": {
            "description": "Semi-Autonomous — approve risky actions",
            "interrupt_before_tools": False,
            "interrupt_before_execution": True,  # Only for destructive actions
            "auto_approve_read_only": True,
            "require_final_approval": True,
        },
        "L3": {
            "description": "Supervised — review final output",
            "interrupt_before_tools": False,
            "interrupt_before_execution": False,
            "auto_approve_read_only": True,
            "require_final_approval": True,
        },
        "L4": {
            "description": "Fully Autonomous — monitor only",
            "interrupt_before_tools": False,
            "interrupt_before_execution": False,
            "auto_approve_read_only": True,
            "require_final_approval": False,
        },
    }

    def __init__(self, level: str = "L2"):
        self.level = level
        self.config = self.LEVELS[level]

    def should_interrupt(self, action: str, is_destructive: bool) -> bool:
        """Determine if the agent should pause for human approval."""
        if self.config["interrupt_before_tools"]:
            return True
        if self.config["interrupt_before_execution"] and is_destructive:
            return True
        return False

# Usage example
config = AutonomyConfig(level="L2")
print(f"Level: {config.level} - {config.config['description']}")
print(f"Interrupt for read-only search? {config.should_interrupt('search', is_destructive=False)}")
# Output: False (L2 auto-approves non-destructive actions)
print(f"Interrupt for delete? {config.should_interrupt('delete_file', is_destructive=True)}")
# Output: True (L2 requires approval for destructive actions)
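To show how this configuration drives an execution loop, here is a hedged sketch of a gate that consults `should_interrupt` before each tool call. The tool names and the `DESTRUCTIVE` set are illustrative, and a compact copy of `AutonomyConfig` is included so the snippet runs standalone.

```python
# Sketch: gate each pending tool call through the autonomy config.
# Compact AutonomyConfig copy (L2/L4 only) so the example runs standalone.
class AutonomyConfig:
    LEVELS = {
        "L2": {"interrupt_before_tools": False, "interrupt_before_execution": True},
        "L4": {"interrupt_before_tools": False, "interrupt_before_execution": False},
    }

    def __init__(self, level="L2"):
        self.config = self.LEVELS[level]

    def should_interrupt(self, action, is_destructive):
        if self.config["interrupt_before_tools"]:
            return True
        return self.config["interrupt_before_execution"] and is_destructive

DESTRUCTIVE = {"delete_file", "deploy"}  # illustrative classification

def run_actions(actions, config, approve):
    """Execute each action, pausing for the `approve` callback when required."""
    executed = []
    for action in actions:
        if config.should_interrupt(action, action in DESTRUCTIVE):
            if not approve(action):
                continue  # human rejected; skip this action
        executed.append(action)
    return executed

plan = ["search", "delete_file", "summarize"]
done = run_actions(plan, AutonomyConfig("L2"), approve=lambda a: False)
print(done)  # ['search', 'summarize'] — the destructive step was rejected
```

In a real LangGraph deployment the same decision would be expressed via `interrupt_before` at graph compile time rather than an inline callback.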
6.2 Safety Bounds
As agents gain autonomy, they need explicit safety constraints that prevent dangerous or unauthorized actions. A SafetyBounds class defines what an agent can and cannot do: action allow/deny lists, budget limits (API costs, compute time), rate limiting, and forbidden operations. Every action the agent attempts passes through this safety layer before execution, providing a hard boundary that the LLM’s reasoning cannot override.
# Safety bounds constrain what an autonomous agent can do
# Use this class to enforce cost limits, action restrictions, and domain allow-lists
class SafetyBounds:
    """Define safety constraints for deep agents."""

    def __init__(self):
        self.max_cost_per_task = 5.00    # Max $ per task
        self.max_api_calls = 50          # Max LLM calls per task
        self.max_tool_calls = 20         # Max tool invocations
        self.max_execution_time = 300    # Max seconds
        self.forbidden_actions = [
            "delete_database",
            "send_email_to_all",
            "modify_production",
            "transfer_funds",
            "access_personal_data",
        ]
        self.allowed_domains = [
            "api.openai.com",
            "api.github.com",
            "www.google.com",
        ]
        self.require_confirmation_for = [
            "send_email",
            "create_pull_request",
            "deploy",
            "modify_config",
        ]

    def check_action(self, action: str, args: dict) -> tuple[bool, str]:
        """Check if an action is within safety bounds."""
        if action in self.forbidden_actions:
            return False, f"BLOCKED: '{action}' is a forbidden action"
        if action in self.require_confirmation_for:
            return False, f"REQUIRES_CONFIRMATION: '{action}' needs human approval"
        return True, "OK"

    def check_budget(self, current_cost: float, current_calls: int) -> tuple[bool, str]:
        """Check if the agent is within budget."""
        if current_cost > self.max_cost_per_task:
            return False, f"BUDGET_EXCEEDED: ${current_cost:.2f} > ${self.max_cost_per_task}"
        if current_calls > self.max_api_calls:
            return False, f"CALL_LIMIT: {current_calls} > {self.max_api_calls}"
        return True, "OK"

# Usage example
bounds = SafetyBounds()
allowed, msg = bounds.check_action("send_email", {"to": "user@example.com"})
print(f"Action allowed: {allowed}, Message: {msg}")
# Output: Action allowed: False, Message: REQUIRES_CONFIRMATION: 'send_email' needs human approval
allowed, msg = bounds.check_budget(current_cost=3.50, current_calls=25)
print(f"Budget OK: {allowed}, Message: {msg}")
# Output: Budget OK: True, Message: OK
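One way to wire these checks in front of execution, as the section above describes, is a small guarded executor that consults the bounds before every tool invocation. This is a sketch: the tool names are illustrative and a trimmed-down `SafetyBounds` is included so it runs standalone.

```python
# Sketch: a guarded executor that consults safety bounds before every tool call.
# Trimmed-down SafetyBounds so the example runs standalone.
class SafetyBounds:
    forbidden_actions = {"delete_database"}
    require_confirmation_for = {"send_email"}
    max_tool_calls = 20

    def check_action(self, action):
        if action in self.forbidden_actions:
            return False, f"BLOCKED: '{action}'"
        if action in self.require_confirmation_for:
            return False, f"REQUIRES_CONFIRMATION: '{action}'"
        return True, "OK"

class GuardedExecutor:
    def __init__(self, bounds, tools):
        self.bounds = bounds
        self.tools = tools   # name -> callable
        self.calls = 0

    def run(self, action, **kwargs):
        if self.calls >= self.bounds.max_tool_calls:
            return "HALTED: tool-call limit reached"
        ok, msg = self.bounds.check_action(action)
        if not ok:
            return msg       # blocked actions never reach the underlying tool
        self.calls += 1
        return self.tools[action](**kwargs)

executor = GuardedExecutor(SafetyBounds(), {"search": lambda q: f"results for {q}"})
print(executor.run("search", q="GDP of Japan"))   # results for GDP of Japan
print(executor.run("delete_database"))            # BLOCKED: 'delete_database'
```

The key design point is that the guard lives outside the LLM loop: the model can only request actions, and the wrapper, not the model, decides whether they execute.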
Safety Warning: Never deploy an L4 (fully autonomous) agent without comprehensive safety bounds, logging, and monitoring. An unconstrained agent with access to production databases, email systems, or financial APIs can cause irreversible damage. Start at L1 or L2 and gradually increase autonomy as you build confidence through testing and monitoring.
7. Research Frontier
The field of autonomous AI agents is evolving rapidly, with new architectures and frameworks emerging from both academia and industry. This section surveys the most promising research directions — from open-source AGI platforms to standardized benchmarks — that are shaping the next generation of deep agent capabilities. Understanding these frontiers helps developers anticipate which patterns will mature into production-ready tools.
7.1 OpenAGI
OpenAGI — Open-Source AGI Research Platform
OpenAGI (Ge et al., 2023) provides a benchmark for evaluating LLM-based agents on complex, multi-step tasks that require combining multiple domain-specific models. The agent must select and compose models (vision, NLP, audio) as tools to solve tasks like "describe the objects in this image and translate the description to French."
Key contribution: Demonstrates that LLMs can serve as a controller that orchestrates specialized models, functioning as a "brain" that coordinates "hands" (tools/models). This is the theoretical foundation for the tool-calling agents we built in Parts 7-8.
Tags: Model Composition · Task Planning · Multi-Modal
7.2 Voyager
Voyager — Lifelong Learning Agent in Minecraft
Voyager (Wang et al., 2023) is an LLM-powered agent that explores the Minecraft world, learns new skills, and builds a reusable skill library — all without any human intervention or gradient-based training.
Architecture:
- Automatic Curriculum: Agent proposes its own exploration goals based on current capabilities
- Skill Library: Stores verified code as reusable skills (procedural memory)
- Self-Verification: Tests each new skill and only stores it if it works (Reflexion-like)
- Iterative Prompting: Refines code based on execution errors (self-correction)
Key insight: Voyager demonstrates that agents can learn composable skills — simple skills become building blocks for complex behaviors, just like functions compose into programs.
Tags: Lifelong Learning · Skill Library · Self-Curriculum
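Voyager's verify-then-store loop can be sketched in a few lines. Names here are illustrative, and "verification" is reduced to running a candidate skill against test cases; in the real system, skills are generated code verified by execution in the Minecraft environment.

```python
# Sketch of Voyager-style skill acquisition: a candidate skill is stored
# in the library (procedural memory) only after it passes verification.
class SkillLibrary:
    def __init__(self):
        self.skills = {}   # name -> callable

    def try_add(self, name, fn, tests):
        """Verify the skill on (args, expected) pairs; store only if all pass."""
        for args, expected in tests:
            try:
                if fn(*args) != expected:
                    return False
            except Exception:
                return False
        self.skills[name] = fn
        return True

library = SkillLibrary()
ok = library.try_add("double", lambda x: x * 2, tests=[((2,), 4), ((5,), 10)])
bad = library.try_add("broken", lambda x: x / 0, tests=[((1,), 1)])
print(ok, bad, list(library.skills))  # True False ['double']
```

Because only verified skills enter the library, later plans can compose them without re-checking, which is what makes the skills usable as building blocks.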
7.3 WebArena
WebArena — Realistic Web Agent Benchmark
WebArena (Zhou et al., 2023) provides a realistic benchmark for web-browsing agents. It deploys full web applications (e-commerce, Reddit-like forum, GitLab, content management) and tests agents on 812 tasks like "Find the cheapest hotel in NYC for next weekend on the booking site."
Findings:
- GPT-4 agents achieve only ~14% task success rate on realistic web tasks (as of 2024)
- Humans achieve ~78% success rate on the same tasks
- Main failure modes: incorrect element identification, wrong action sequences, failure to recover from errors
Key insight: Real-world agent tasks are much harder than benchmarks suggest. The gap between demo-quality and production-quality agents is enormous. This is why safety bounds and human-in-the-loop (Part 8) are so critical.
Tags: Web Agents · Benchmarking · Reality Check
Exercises & Self-Assessment
Exercise 1
Build a Planner-Executor-Critic Agent
Implement the full Planner-Executor-Critic pattern in LangGraph with tools (search + calculator). Test with: "Research the GDP of the top 5 economies and calculate their combined share of world GDP." Allow up to 2 revision cycles.
Exercise 2
Reflexion for Code Generation
Build a Reflexion agent that generates Python functions. The evaluate step should actually run the code with test cases. If tests fail, the agent reflects on the error and tries again. Test with: "Write a function that validates email addresses using regex."
Exercise 3
Autonomy Level Comparison
Build the same task-execution agent at all 4 autonomy levels (L1-L4). Run the same 5 tasks through each level and measure: (a) time to completion, (b) number of human interventions, (c) quality of output, (d) safety incidents (if any). Document the trade-offs.
Exercise 4
Safety Bounds Stress Test
Create a deep agent with safety bounds and intentionally try to break them: (a) Send prompts that try to trick the agent into calling forbidden tools, (b) Create tasks that would exceed the budget, (c) Test whether the agent respects domain restrictions. Document all bypass attempts and fix any vulnerabilities.
Exercise 5
Reflective Questions
- Why does the Planner-Executor-Critic pattern produce better results than a single agent doing everything? What cognitive science principle does this mirror?
- Compare Reflexion to human learning. How is "verbal self-reflection" similar to and different from how humans learn from mistakes?
- At what point does increasing agent autonomy become net negative for productivity? Consider the WebArena results (14% success rate) in your answer.
- Design safety bounds for an agent that manages a company's social media accounts. What actions should require approval? What should be forbidden entirely?
- Voyager builds a skill library in Minecraft. How would you adapt this approach for a real-world coding agent that learns reusable patterns from past projects?
Conclusion & Next Steps
You now understand the most advanced agent architectures in use today and at the research frontier. Key takeaways:
- Planner-Executor-Critic — Separating planning, execution, and evaluation into distinct roles dramatically improves agent reliability and enables self-correction cycles
- Plan-and-Execute — Planning before acting reduces wasted tool calls and enables adaptive re-planning when circumstances change
- Reflexion — Verbal self-reflection stores lessons from failures in memory, improving performance on subsequent attempts without retraining
- LATS — Tree search explores multiple solution paths simultaneously, finding better solutions for complex, ambiguous tasks
- Tool Orchestration — Deep agents need dynamic tool selection, fallback chains, and multi-layered memory (working, episodic, semantic, procedural)
- Autonomy Levels (L1-L4) — From fully human-controlled to fully autonomous, with clear safety bounds at each level
- Research Frontier — OpenAGI, Voyager, and WebArena show both the potential and the current limitations of autonomous agents
Next in the Series
In Part 10: Multi-Agent Systems, we build on deep agent patterns to create systems where multiple specialized agents collaborate. Learn supervisor architectures, swarm intelligence, debate patterns, role-based teams, and how to orchestrate multi-agent workflows in LangGraph.
Continue the Series
Part 10: Multi-Agent Systems
Supervisor, swarm, debate, and role-based multi-agent collaboration architectures.
Read Article
Part 8: LangGraph — Stateful Agent Workflows
StateGraph fundamentals, nodes, edges, conditional routing, persistence, subgraphs, and human-in-the-loop.
Read Article
Part 7: Agents — Core of Modern AI Apps
Agent fundamentals — ReAct, tool-calling, AgentExecutor, memory, error handling, debugging.
Read Article