
LangChain SDK Track Part 8: Deep Agents & Harness Framework

May 22, 2026 Wasil Zafar 55 min read

The batteries-included agent harness built on LangGraph — create_deep_agent(), planning engine, virtual filesystem, context engineering (offloading & summarization), subagent delegation, skills, memory, middleware, permissions, and production deployment patterns.

Table of Contents

  1. The Agent Harness
  2. Context Engineering
  3. Backends & Filesystem
  4. Subagents & Delegation
  5. Skills & Memory
  6. Middleware & HITL
  7. Production Deployment
What You’ll Learn: How to build long-running agents with Deep Agents, the batteries-included harness built on LangGraph. The article covers the create_deep_agent() API, planning with to-do lists, context engineering (input context, runtime context, offloading, and summarization), virtual filesystem backends and sandboxes, subagent delegation, skills and memory files, middleware, human-in-the-loop approvals, filesystem permissions, and production deployment with Postgres-backed persistence.

1. The Agent Harness

Deep Agents is a standalone library built on top of LangChain’s core building blocks and the LangGraph runtime. It provides an opinionated but extensible agent harness — the same core tool-calling loop as other agent frameworks, but with built-in capabilities for planning, virtual filesystems, context management, subagent delegation, and code execution.

Rather than manually wiring LangGraph nodes, edges, and state schemas, Deep Agents gives you one function — create_deep_agent() — that assembles a production-ready agent graph with sensible defaults, fully customizable at every layer.

Key Insight: Deep Agents is not a replacement for LangGraph; it is built on LangGraph. Think of it as a high-level SDK that assembles the agent graph for you, much as a web framework bundles middleware, routing, and templating behind a single create_app() call.
Deep Agent Harness Architecture
flowchart TD
    A["create_deep_agent()"] --> B["Agent Harness"]
    B --> C["Planning Engine<br/>(write_todos)"]
    B --> D["Virtual Filesystem<br/>(ls, read, write, edit, glob, grep)"]
    B --> E["Context Manager<br/>(offload, summarize, compress)"]
    B --> F["Subagent Orchestrator<br/>(task delegation)"]
    B --> G["Sandbox Runtime<br/>(code execution)"]
    C --> H["To-Do List State"]
    D --> I["Backend Protocol<br/>(State / Store / Composite)"]
    F --> J["Custom Subagents"]
    F --> K["General-Purpose Subagent"]
    G --> L["Interpreters / Sandbox"]
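Underneath the extra capabilities, the harness runs the same tool-calling loop every agent framework shares. The framework-free sketch below shows that loop in isolation; it is hypothetical, not Deep Agents code: scripted_model fakes a chat model that first requests an add tool and then answers, and messages are plain dicts rather than LangChain message objects.

```python
from typing import Callable

# Hypothetical stand-in for a chat model: returns either a tool call or a final answer.
def scripted_model(messages: list[dict]) -> dict:
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        # No tool output yet, so ask for one
        return {"role": "assistant", "content": "", "tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    return {"role": "assistant", "content": f"The sum is {tool_results[-1]['content']}.", "tool_call": None}

TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}

def run_agent_loop(user_input: str, model=scripted_model, tools=TOOLS, max_steps: int = 10) -> list[dict]:
    """Call the model, execute any requested tool, feed the result back, repeat."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:  # No tool requested: the loop ends with a final answer
            return messages
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    raise RuntimeError("Agent did not finish within max_steps")

history = run_agent_loop("What is 2 + 3?")
print(history[-1]["content"])  # The sum is 5.
```

Planning, the virtual filesystem, context management, and subagents are all layered around this loop rather than replacing it.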

1.1 The create_deep_agent() API

# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent

# Minimal Deep Agent — just a model and a system prompt
agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    system_prompt="You are a research assistant. Be thorough and cite sources.",
)

# Invoke the agent
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Summarize the latest advances in protein folding."}]}
)
print(result["messages"][-1].content)

The full API signature with all configuration options:

  • model (str | BaseChatModel): Model in provider:model format, or a pre-configured instance
  • tools (list[BaseTool | Callable]): Custom tools added alongside the built-in filesystem tools
  • system_prompt (str): Prepended to the built-in harness prompt
  • middleware (list[AgentMiddleware]): Model-call middleware for runtime interception
  • subagents (list[SubAgent | CompiledSubAgent]): Subagent definitions for task delegation
  • backend (BackendProtocol): Storage backend for the virtual filesystem
  • interrupt_on (dict[str, bool]): Tools that require human approval before execution
  • permissions (list[FilesystemPermission]): Filesystem access-control rules
  • skills (list[str]): Paths to SKILL.md files, loaded with progressive disclosure
  • memory (list[str]): Paths to AGENTS.md memory files (always loaded)
  • context_schema (type[ContextT]): Typed shape of the runtime context
  • checkpointer (Checkpointer): State persistence for conversations
  • store (BaseStore): Cross-thread persistent storage

1.2 Planning with To-Do Lists

Every Deep Agent has a built-in write_todos tool that maintains a structured task list. For complex requests, the agent breaks work into discrete tasks and works through each item systematically:

# pip install deepagents langchain-openai
from deepagents import create_deep_agent

agent = create_deep_agent(
    model="openai:gpt-4.1-mini",
    system_prompt="You are a technical writer. Break complex requests into clear steps.",
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Write a comprehensive comparison of PostgreSQL vs MySQL for a new SaaS app."
    }]
})

# The agent typically starts by calling write_todos with a plan such as:
# [ ] Research PostgreSQL strengths for SaaS workloads
# [ ] Research MySQL strengths for SaaS workloads
# [ ] Compare performance benchmarks
# [ ] Write recommendation summary
# It then works through the items, marking each one complete as it goes.
# (The exact tasks depend on the model and the request.)

print(result["messages"][-1].content)
Key Insight: The to-do list is not just a prompt trick; it is persisted in the agent’s state and survives context compression. When the conversation history is summarized, the to-do list is preserved in full, so the agent does not lose track of its plan.
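The reason the plan survives is that it lives in its own state field, separate from the message history. The framework-free sketch below makes that concrete. The AgentState class, the write_todos and summarize_history functions, and the status names are hypothetical illustrations, not the Deep Agents internals.

```python
from dataclasses import dataclass, field

VALID_STATUSES = {"pending", "in_progress", "completed"}  # assumed status vocabulary

@dataclass
class AgentState:
    messages: list[str] = field(default_factory=list)
    todos: list[dict] = field(default_factory=list)

def write_todos(state: AgentState, todos: list[dict]) -> None:
    """Replace the whole plan, as a write_todos-style tool might (hypothetical)."""
    for item in todos:
        if item["status"] not in VALID_STATUSES:
            raise ValueError(f"unknown status: {item['status']}")
    state.todos = [dict(item) for item in todos]

def summarize_history(state: AgentState) -> None:
    """Collapse the message history into one summary line; the to-do list is untouched."""
    done = sum(t["status"] == "completed" for t in state.todos)
    state.messages = [f"[summary of {len(state.messages)} messages; {done}/{len(state.todos)} tasks done]"]

state = AgentState(messages=[f"msg {i}" for i in range(120)])
write_todos(state, [
    {"content": "Research PostgreSQL strengths", "status": "completed"},
    {"content": "Research MySQL strengths", "status": "in_progress"},
    {"content": "Write recommendation summary", "status": "pending"},
])
summarize_history(state)
print(state.messages[0])  # [summary of 120 messages; 1/3 tasks done]
print(len(state.todos))   # 3
```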

2. Context Engineering

The most critical challenge for long-running agents is context management. An agent executing a 50-step task may generate hundreds of tool calls and thousands of tokens. Without active management, the conversation eventually exceeds the context window and the agent fails. Deep Agents solves this with a layered context engineering system.

Context Type | When Loaded | Persistence | Example
Input Context | Every turn | Per-thread | System prompt, memory files (AGENTS.md), skill frontmatter
Runtime Context | Per invocation | Per-call | User identity, API keys, session metadata
Compression | At 85% capacity | Automatic | Offloading tool results, summarizing history
Isolation | Per subagent | Scoped | Subagent work stays in its own context
Long-term Memory | On demand | Cross-thread | Preferences in the /memories/ path

2.1 Input Context

Input context is the foundation — content loaded into every turn. The system prompt you provide is prepended to the harness’s built-in prompt:

# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    system_prompt="""You are a senior data engineer specializing in ETL pipelines.

Rules:
- Always validate data schemas before transformation
- Use incremental loading over full refreshes when possible
- Log all data quality issues to the /quality-reports/ directory""",
    memory=["AGENTS.md"],  # Always loaded into context alongside system prompt
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Design an ETL pipeline for daily sales data."}]
})
print(result["messages"][-1].content)

2.2 Runtime Context

Runtime context carries per-invocation data (user identity, API keys, roles) that tools can read but that is never sent to the LLM, creating a security boundary for sensitive values as long as no tool echoes them back in its output:

# pip install deepagents langchain-anthropic
from dataclasses import dataclass
from deepagents import create_deep_agent
from langchain.tools import tool, ToolRuntime

@dataclass
class UserContext:
    user_id: str
    user_role: str    # "admin" | "viewer"
    api_key: str      # Never visible to the LLM

@tool
def fetch_user_orders(query: str, runtime: ToolRuntime[UserContext]) -> str:
    """Fetch orders for the currently authenticated user.

    Args:
        query: Search filter for orders
    """
    user_id = runtime.context.user_id
    return f"Orders for {user_id} matching '{query}': [order-001, order-002]"

@tool
def update_order_status(order_id: str, status: str, runtime: ToolRuntime[UserContext]) -> str:
    """Update an order's status. Requires admin role.

    Args:
        order_id: The order to update
        status: New status value
    """
    if runtime.context.user_role != "admin":
        return f"Permission denied: admin role required (your role: {runtime.context.user_role})"
    return f"Order {order_id} updated to '{status}'"

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[fetch_user_orders, update_order_status],
    context_schema=UserContext,
    system_prompt="You are an order management assistant.",
)

# Context is passed per invocation — different per user/session
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Show my recent orders and ship order-001."}]},
    context=UserContext(user_id="user-789", user_role="admin", api_key="sk-secret"),
)
print(result["messages"][-1].content)
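The boundary holds because the harness, not the model, supplies the context object when it calls the tool. The framework-free sketch below shows that injection step. dispatch_tool and model_visible_params are hypothetical helpers, and the plain context parameter stands in for ToolRuntime; the model only ever chooses the tool's declared arguments.

```python
import inspect
from dataclasses import dataclass

@dataclass
class UserContext:
    user_id: str
    user_role: str
    api_key: str  # stays on the server side

def update_order_status(order_id: str, status: str, context: UserContext) -> str:
    if context.user_role != "admin":
        return f"Permission denied: admin role required (your role: {context.user_role})"
    return f"Order {order_id} updated to '{status}'"

def model_visible_params(fn) -> list[str]:
    """Parameters advertised to the model: everything except the injected context."""
    return [name for name in inspect.signature(fn).parameters if name != "context"]

def dispatch_tool(fn, model_args: dict, context: UserContext) -> str:
    """Hypothetical harness step: merge model-chosen args with server-side context."""
    unexpected = set(model_args) - set(model_visible_params(fn))
    if unexpected:
        raise ValueError(f"model supplied non-schema arguments: {unexpected}")
    return fn(**model_args, context=context)

viewer = UserContext(user_id="user-123", user_role="viewer", api_key="sk-secret")
print(model_visible_params(update_order_status))  # ['order_id', 'status']
print(dispatch_tool(update_order_status, {"order_id": "order-001", "status": "shipped"}, viewer))
# Permission denied: admin role required (your role: viewer)
```

Because the context parameter is stripped from the advertised schema and any attempt to pass it is rejected, the model can neither read the API key nor spoof another user's role.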

2.3 Compression & Offloading

The harness automatically manages context window usage with two mechanisms:

  • Offloading (at 20k tokens): Large tool results are saved to the virtual filesystem, replaced with a reference pointer
  • Summarization (at 85% capacity): Conversation history is compressed into a structured summary preserving the to-do list, key findings, and file paths
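Both triggers can be sketched as one pass over the message list. This is an illustration under stated assumptions, not the harness's code: tokens are approximated as characters divided by 4, the 20k and 85% thresholds come from the bullets above, the context window size is a parameter, and the /large_tool_results/ path scheme and the summary text are hypothetical.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return len(text) // 4

def manage_context(messages: list[dict], files: dict[str, str], window: int,
                   offload_at: int = 20_000, summarize_at: float = 0.85) -> list[dict]:
    """Offload oversized tool results, then summarize if the total is still too large."""
    managed = []
    for i, msg in enumerate(messages):
        if msg["role"] == "tool" and approx_tokens(msg["content"]) > offload_at:
            path = f"/large_tool_results/result_{i}.txt"  # hypothetical path scheme
            files[path] = msg["content"]                    # full text kept in the filesystem
            msg = {**msg, "content": f"Result saved to {path}; use read_file to inspect it."}
        managed.append(msg)

    total = sum(approx_tokens(m["content"]) for m in managed)
    if total >= summarize_at * window:
        # Keep the most recent message verbatim; replace the rest with a placeholder summary.
        summary = {"role": "system", "content": f"Summary of {len(managed) - 1} earlier messages."}
        managed = [summary, managed[-1]]
    return managed

files: dict[str, str] = {}
history = [
    {"role": "user", "content": "Audit the logs."},
    {"role": "tool", "content": "x" * 100_000},  # ~25k tokens: over the offload threshold
    {"role": "assistant", "content": "Reviewing the saved result."},
]
managed = manage_context(history, files, window=200_000)
print(managed[1]["content"])  # Result saved to /large_tool_results/result_1.txt; use read_file to inspect it.
print(len(files))             # 1
```

Offloading runs first because it is cheap and lossless (the full text is still one read_file away); summarization is the heavier, lossy fallback.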
Context Lifecycle in a Long-Running Session
stateDiagram-v2
    [*] --> Normal: Agent starts
    Normal --> Offloading: Tool result > 20k tokens
    Offloading --> Normal: Content saved to filesystem
    Normal --> Summarization: Context at 85% capacity
    Summarization --> Compressed: Summary replaces history
    Compressed --> Normal: Agent continues with summary
    Normal --> [*]: Task complete
                        
# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent

# Context compression is automatic — no configuration needed
# When a tool returns a large result:
#   1. Result saved to filesystem at a generated path
#   2. Message history replaces result with reference pointer
#   3. Agent can read_file to retrieve specific sections as needed

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    system_prompt="""You are a research agent. When summarizing progress, preserve:
1. Current to-do list with status
2. Key findings and decisions
3. All file paths created/modified
4. Pending tasks and dependencies""",
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Research all OWASP Top 10 vulnerabilities with code examples for each."
    }]
})
print(result["messages"][-1].content)

3. Backends & Virtual Filesystem

3.1 Virtual Filesystem Tools

Every Deep Agent gets the same filesystem tools no matter which backend stores the files; the one exception is execute, which is available only with a sandbox backend:

Tool | Description | Example
ls | List directory contents with metadata | ls(path="/")
read_file | Read file contents (text and multimodal) | read_file(file_path="/report.md")
write_file | Create new files | write_file(file_path="/output.csv", content="...")
edit_file | Perform exact string replacements in files | edit_file(file_path="/main.py", old_string="...", new_string="...")
glob | Find files matching a pattern | glob(pattern="**/*.py")
grep | Search file contents | grep(pattern="TODO", path="/")
execute | Run shell commands (sandbox backends only) | execute(command="python main.py")
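Conceptually, a state-backed filesystem can be pictured as a mapping from paths to file contents. The in-memory sketch below mirrors the semantics in the table. The VirtualFS class and its error strings are hypothetical, not the Deep Agents implementation; note how edit_file insists on exactly one match, which keeps model-issued edits unambiguous.

```python
import fnmatch
import re

class VirtualFS:
    """Hypothetical dict-backed filesystem mirroring the tool semantics above."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def write_file(self, file_path: str, content: str) -> str:
        if file_path in self.files:
            return f"Error: {file_path} already exists; use edit_file"
        self.files[file_path] = content
        return f"Wrote {file_path}"

    def read_file(self, file_path: str) -> str:
        return self.files.get(file_path, f"Error: {file_path} not found")

    def edit_file(self, file_path: str, old_string: str, new_string: str) -> str:
        text = self.files[file_path]
        count = text.count(old_string)
        if count != 1:  # refuse ambiguous or missing matches
            return f"Error: expected exactly one match, found {count}"
        self.files[file_path] = text.replace(old_string, new_string)
        return f"Edited {file_path}"

    def ls(self, path: str = "/") -> list[str]:
        """Direct children of a directory (subdirectories end with '/')."""
        prefix = path.rstrip("/") + "/"
        children = set()
        for p in self.files:
            if p.startswith(prefix):
                head, sep, _ = p[len(prefix):].partition("/")
                children.add(prefix + head + sep)
        return sorted(children)

    def glob(self, pattern: str) -> list[str]:
        return sorted(p for p in self.files if fnmatch.fnmatchcase(p, pattern))

    def grep(self, pattern: str) -> list[str]:
        regex = re.compile(pattern)
        return sorted(p for p, text in self.files.items() if regex.search(text))

fs = VirtualFS()
fs.write_file("/src/main.py", "x = 1  # TODO: rename\n")
fs.write_file("/report.md", "# Findings\n")
fs.edit_file("/src/main.py", "x = 1", "count = 1")
print(fs.ls("/"))            # ['/report.md', '/src/']
print(fs.glob("**/*.py"))    # ['/src/main.py']
print(fs.grep("TODO"))       # ['/src/main.py']
```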

3.2 Backend Types

# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent
from deepagents.backends import StateBackend, CompositeBackend, StoreBackend
from langgraph.store.memory import InMemoryStore

# 1. StateBackend (default) — in-memory, thread-scoped
#    Files exist only for the duration of the conversation
agent_memory = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    # backend=StateBackend()  # This is the default
)

# 2. StoreBackend — persists files across threads via LangGraph Store
store = InMemoryStore()  # Use PostgresStore for production
agent_persistent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    backend=StoreBackend(store=store, namespace=["project", "docs"]),
)

# 3. CompositeBackend — routes paths to different backends
agent_composite = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    backend=CompositeBackend(
        StateBackend(),  # Default for unmatched paths
        routes={
            "/memories/": StoreBackend(store=store, namespace=["user", "memory"]),
        },
    ),
)

result = agent_composite.invoke({
    "messages": [{"role": "user", "content": "Remember that I prefer Python over JavaScript."}]
})
print(result["messages"][-1].content)
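The routing behind a composite backend can be pictured as longest-prefix matching on the path. The sketch below is a hypothetical illustration (DictBackend and RoutingBackend are made-up names), not the CompositeBackend source. It passes the full path through unchanged; whether the real class rewrites paths before delegating is not assumed here.

```python
class DictBackend:
    """Hypothetical stand-in for a storage backend."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self.files[path] = content

    def read(self, path: str) -> str:
        return self.files[path]

class RoutingBackend:
    """Send each path to the backend with the longest matching prefix, else the default."""

    def __init__(self, default: DictBackend, routes: dict[str, DictBackend]) -> None:
        self.default = default
        # Longest prefixes first, so "/memories/team/" would win over "/memories/".
        self.routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def backend_for(self, path: str) -> DictBackend:
        for prefix, backend in self.routes:
            if path.startswith(prefix):
                return backend
        return self.default

    def write(self, path: str, content: str) -> None:
        self.backend_for(path).write(path, content)

    def read(self, path: str) -> str:
        return self.backend_for(path).read(path)

scratch = DictBackend("thread-state")
durable = DictBackend("store")
fs = RoutingBackend(scratch, routes={"/memories/": durable})
fs.write("/memories/preferences.md", "Prefers Python over JavaScript.")
fs.write("/notes/draft.md", "Scratch work.")
print(fs.backend_for("/memories/preferences.md").name)  # store
print(sorted(scratch.files))                            # ['/notes/draft.md']
```

In the example above, the durable side is the StoreBackend, so whatever the agent writes under /memories/ outlives the thread while scratch files disappear with it.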

3.3 Sandbox Execution

For agents that need to run generated code, sandbox backends provide isolated execution environments with an execute tool:

# pip install deepagents langchain-anthropic langsmith
from deepagents import create_deep_agent
from deepagents.backends import LangSmithSandbox
from langsmith.sandbox import Sandbox

# Create a sandboxed agent
sandbox = Sandbox()  # Requires LANGSMITH_API_KEY
backend = LangSmithSandbox(sandbox)

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    backend=backend,
    system_prompt="""You are a data scientist. When you need to run code:
1. Write the script to the filesystem
2. Use the execute tool to run it
3. Analyze the output""",
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Generate a Monte Carlo simulation for option pricing and run it."
    }]
})
print(result["messages"][-1].content)
Security: Never put API keys or credentials inside the sandbox. If the agent needs authenticated API access, create a dedicated tool that runs outside the sandbox and handles authentication on behalf of the agent.
Real-World Application

From Notebook to 10K Users

A startup launched their LangChain-powered product to 10,000 users. Key production decisions: LangServe for the API layer, Redis for response caching (saved 40% of API costs), a circuit breaker for graceful degradation during OpenAI outages, and LangSmith monitoring with alerts on latency spikes. They went from prototype to production in 3 weeks with zero downtime incidents in the first month.

Production Deployment, Caching, Circuit Breaker

4. Subagents & Task Delegation

Subagents solve the context bloat problem. When the main agent encounters a heavy task (web search, large file analysis), it spawns a subagent with a fresh context window. The subagent completes the task and returns only the final result — keeping the main agent’s context clean.

Subagent Context Isolation
flowchart LR
    A["Main Agent<br/>(clean context)"] -->|"task tool"| B["Subagent<br/>(fresh context)"]
    B --> C["Tool Call 1<br/>50k tokens"]
    B --> D["Tool Call 2<br/>30k tokens"]
    B --> E["Tool Call N<br/>25k tokens"]
    C & D & E --> F["Final Summary<br/>~300 words"]
    F -->|"single result"| A
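The isolation in the diagram comes down to a few rules: the subagent starts from a brand-new message list seeded only with the task description, may produce any amount of intermediate output, and hands back just its final message. The run_subagent and task functions below are hypothetical stand-ins for the harness's task tool. Passing the same context object through mirrors the runtime-context propagation described in section 4.3.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserContext:
    user_id: str
    user_role: str

def run_subagent(description: str, context: UserContext) -> list[dict]:
    """Hypothetical subagent: starts from a fresh history and does noisy intermediate work."""
    messages = [{"role": "user", "content": description}]  # no parent history is copied in
    for _ in range(3):
        messages.append({"role": "tool", "content": "raw search output " * 1_000})  # bulky results
    messages.append({"role": "assistant",
                     "content": f"Summary for {context.user_id}: 3 sources reviewed."})
    return messages

def task(parent_messages: list[dict], description: str, context: UserContext) -> None:
    """Delegate work and append only the subagent's final answer to the parent history."""
    sub_history = run_subagent(description, context)  # same context, fresh messages
    parent_messages.append({"role": "tool", "name": "task", "content": sub_history[-1]["content"]})

ctx = UserContext(user_id="user-789", user_role="admin")
parent = [{"role": "user", "content": "Research rate limiting, then implement it."}]
task(parent, "Research rate limiting best practices", ctx)
print(len(parent))            # 2: the bulky tool output never reached the parent
print(parent[-1]["content"])  # Summary for user-789: 3 sources reviewed.
```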

4.1 SubAgent Configuration

# pip install deepagents langchain-anthropic langchain-openai tavily-python
import os
from deepagents import create_deep_agent
from tavily import TavilyClient

tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def internet_search(query: str, max_results: int = 5) -> str:
    """Run a web search using Tavily."""
    results = tavily_client.search(query, max_results=max_results)
    return str(results)

# Define specialized subagents
research_subagent = {
    "name": "research-agent",
    "description": "Researches topics using web search. Delegate here for information gathering.",
    "system_prompt": "You are a research specialist. Search thoroughly and synthesize findings.",
    "tools": [internet_search],
    "model": "openai:gpt-4.1-mini",  # Cheap and fast for search tasks
}

code_subagent = {
    "name": "code-agent",
    "description": "Writes, tests, and debugs code. Delegate here for programming tasks.",
    "system_prompt": "You are a senior Python developer. Write clean, tested code.",
    "tools": [],  # Uses filesystem tools by default
    "model": "anthropic:claude-sonnet-4-6",  # Capable model for code
}

# Main agent delegates to specialized subagents
agent = create_deep_agent(
    model="anthropic:claude-haiku-4-5",  # Cheap model for orchestration
    subagents=[research_subagent, code_subagent],
    system_prompt="You are a project lead. Delegate research to research-agent and coding to code-agent.",
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Research rate limiting best practices, then write a Python implementation."
    }]
})
print(result["messages"][-1].content)

4.2 CompiledSubAgent (Custom LangGraph Graphs)

For complex workflows, provide a pre-compiled LangGraph graph as a subagent:

# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent, CompiledSubAgent
from langchain.agents import create_agent

# Create a custom agent graph using LangChain's create_agent.
# internet_search is the Tavily tool defined in the section 4.1 example.
custom_graph = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[internet_search],
    system_prompt="You are a specialized data analysis agent...",
)

# Wrap it as a CompiledSubAgent
data_analyzer = CompiledSubAgent(
    name="data-analyzer",
    description="Specialized agent for complex data analysis tasks",
    runnable=custom_graph,
)

agent = create_deep_agent(
    model="anthropic:claude-haiku-4-5",
    subagents=[data_analyzer],
    system_prompt="Delegate data analysis tasks to the data-analyzer subagent.",
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Analyze Q1 sales trends by region."}]
})
print(result["messages"][-1].content)

4.3 Context Isolation & Propagation

Key Insight: Runtime context propagates automatically from the main agent to all subagents. Each subagent receives the same UserContext (user_id, role, etc.), so multi-agent workflows stay consistently authorized without manual plumbing. Conversation history does not propagate — each subagent starts with a fresh context window.

5. Skills & Memory

5.1 Skills System

Skills provide specialized workflows and domain knowledge using progressive disclosure — the agent reads only the frontmatter at startup and loads full skill content on demand when relevant:

# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent

# Skills are SKILL.md files with YAML frontmatter
# The agent loads frontmatter at startup, reads full content when needed
agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    skills=[
        "/skills/database-optimization/",
        "/skills/api-design/",
        "/skills/testing-patterns/",
    ],
    system_prompt="You are a backend engineer. Use your skills when relevant.",
)

# The agent will only load the full "database-optimization" skill
# content when the task requires database work
result = agent.invoke({
    "messages": [{"role": "user", "content": "Optimize the slow query on the orders table."}]
})
print(result["messages"][-1].content)
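Progressive disclosure amounts to reading a short header from every skill up front and deferring the body until it is needed. The self-contained sketch below assumes a minimal ---delimited frontmatter with name and description keys, parsed by hand; a real implementation would use a YAML parser. The skill text and the SkillIndex class are hypothetical.

```python
SKILL_MD = """---
name: database-optimization
description: Diagnose slow SQL queries and propose indexes.
---
# Database Optimization

1. Run EXPLAIN ANALYZE on the slow query.
2. Look for sequential scans on large tables.
"""

def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' frontmatter (simple key: value lines only) from the body."""
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body

class SkillIndex:
    def __init__(self, sources: dict[str, str]) -> None:
        self._sources = sources
        # At startup only name + description are kept for the prompt.
        self.catalog: dict[str, str] = {}
        for text in sources.values():
            meta, _ = parse_frontmatter(text)
            self.catalog[meta["name"]] = meta["description"]

    def prompt_section(self) -> str:
        return "\n".join(f"- {name}: {desc}" for name, desc in self.catalog.items())

    def load(self, name: str) -> str:
        """Called only when a task needs the skill: returns the full body."""
        for text in self._sources.values():
            meta, body = parse_frontmatter(text)
            if meta["name"] == name:
                return body
        raise KeyError(name)

index = SkillIndex({"/skills/database-optimization/SKILL.md": SKILL_MD})
print(index.prompt_section())  # - database-optimization: Diagnose slow SQL queries and propose indexes.
print("EXPLAIN ANALYZE" in index.load("database-optimization"))  # True
```

The prompt cost per skill stays at one line no matter how long the SKILL.md body grows.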

5.2 Memory Files (AGENTS.md)

Memory files are always loaded into context (unlike skills, which use progressive disclosure). Use them for persistent instructions, coding standards, and domain knowledge:

# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent

# Memory files are always loaded — they survive context compression
agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    memory=["AGENTS.md"],  # Project-level conventions and preferences
    system_prompt="You are a code assistant for the acme-api project.",
)

# AGENTS.md might contain:
# - Code style preferences
# - Project architecture decisions
# - Naming conventions
# - Testing requirements
# All of this is available every turn, regardless of context compression.

result = agent.invoke({
    "messages": [{"role": "user", "content": "Add a new endpoint for user preferences."}]
})
print(result["messages"][-1].content)
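The difference between memory and skills shows up when the per-turn prompt is assembled. The sketch below is hypothetical: build_system_prompt is not a Deep Agents function, the base prompt text is made up, and the ordering (your prompt first, then the harness prompt, as in section 2.1, then memory and skills) is an assumption. Memory files are inlined in full every turn, while skills contribute only their one-line descriptions.

```python
def build_system_prompt(user_prompt: str, base_prompt: str,
                        memory_files: dict[str, str], skill_catalog: dict[str, str]) -> str:
    """Hypothetical per-turn prompt assembly: memory in full, skills as a catalog."""
    parts = [user_prompt, base_prompt]
    for path, content in memory_files.items():
        parts.append(f'<memory path="{path}">\n{content}\n</memory>')  # full text every turn
    if skill_catalog:
        lines = [f"- {name}: {desc}" for name, desc in skill_catalog.items()]
        parts.append("Available skills (read the SKILL.md for details):\n" + "\n".join(lines))
    return "\n\n".join(parts)

prompt = build_system_prompt(
    user_prompt="You are a code assistant for the acme-api project.",
    base_prompt="You have filesystem tools and a write_todos tool.",
    memory_files={"AGENTS.md": "Use snake_case for endpoint handlers.\nEvery endpoint needs a test."},
    skill_catalog={"api-design": "REST conventions for new endpoints."},
)
print(prompt.startswith("You are a code assistant"))  # True
print("Every endpoint needs a test." in prompt)        # True: memory is inlined in full
```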

6. Middleware, HITL & Permissions

6.1 Custom Middleware

Middleware intercepts every model call for routing, logging, or transformation:

# pip install deepagents langchain-anthropic langchain-openai
from langchain.agents.middleware.types import AgentMiddleware
from langchain.chat_models import init_chat_model
from deepagents import create_deep_agent

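# Build both models once at import time instead of on every model call
FAST_MODEL = init_chat_model("openai:gpt-4.1-mini")
STRONG_MODEL = init_chat_model("anthropic:claude-sonnet-4-6")

class RouteByComplexity(AgentMiddleware):
    """Route short messages to a cheaper model and longer ones to a more capable model."""

    def wrap_model_call(self, request, handler):
        # Content may be a string or a list of content blocks, so measure its text form.
        # Note: after a tool call, the last message is the tool result, not the user's request.
        last_msg = str(request.messages[-1].content) if request.messages else ""
        model = FAST_MODEL if len(last_msg) < 200 else STRONG_MODEL
        return handler(request.override(model=model))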

agent = create_deep_agent(
    model="anthropic:claude-haiku-4-5",  # Default (cheap)
    middleware=[RouteByComplexity()],
    system_prompt="You are a research assistant.",
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
})
print(result["messages"][-1].content)
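wrap_model_call follows the classic onion pattern: each middleware receives the request plus a handler for everything inside it, and may change the request, the response, or both. The framework-free sketch below composes a list of such wrappers. The compose helper, the dict-shaped request, and fake_model are hypothetical, and treating the first list entry as the outermost layer is an assumption, not documented Deep Agents behavior.

```python
from typing import Callable

Handler = Callable[[dict], str]
Middleware = Callable[[dict, Handler], str]

def fake_model(request: dict) -> str:
    return f"answered by {request['model']}"

def route_by_length(request: dict, handler: Handler) -> str:
    # Same idea as RouteByComplexity: short prompts go to the cheaper model.
    model = "gpt-4.1-mini" if len(request["prompt"]) < 200 else "claude-sonnet-4-6"
    return handler({**request, "model": model})

call_log: list[str] = []

def log_calls(request: dict, handler: Handler) -> str:
    call_log.append(f"-> {request['model']}")
    response = handler(request)
    call_log.append(f"<- {response}")
    return response

def compose(middleware: list[Middleware], model: Handler) -> Handler:
    """Wrap the model so that middleware[0] becomes the outermost layer."""
    handler = model
    for mw in reversed(middleware):
        # Default arguments freeze the current middleware and inner handler
        def wrapped(request: dict, mw: Middleware = mw, inner: Handler = handler) -> str:
            return mw(request, inner)
        handler = wrapped
    return handler

call = compose([route_by_length, log_calls], fake_model)
print(call({"model": "claude-haiku-4-5", "prompt": "What is the capital of France?"}))
# answered by gpt-4.1-mini
print(call_log)  # ['-> gpt-4.1-mini', '<- answered by gpt-4.1-mini']
```

Because the router sits outside the logger here, the log records the model actually used rather than the agent's default.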

6.2 Human-in-the-Loop

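The interrupt_on parameter pauses execution before the listed tool calls and returns control to a human for approval. Because the paused run has to be resumed later, interrupts require a checkpointer and a thread_id: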

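# pip install deepagents langchain-anthropic
import uuid

from deepagents import create_deep_agent
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    interrupt_on={
        "execute": True,     # Pause before running shell commands
        "write_file": True,  # Pause before creating files
        "edit_file": True,   # Pause before editing files
    },
    checkpointer=MemorySaver(),  # Interrupts need a checkpointer to pause and resume
    system_prompt="You are a code assistant. Explain what you plan to do before doing it.",
)

# The first call and the resume call must share the same thread_id
config = {"configurable": {"thread_id": str(uuid.uuid4())}}

# First invocation: the agent plans its work and stops at the first gated tool call
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Fix the bug in main.py"}]},
    config=config,
)

# A paused run reports its pending interrupts under "__interrupt__"
if result.get("__interrupt__"):
    request = result["__interrupt__"][0].value
    for action in request["action_requests"]:
        print(f"Agent wants to call: {action['name']}")
        print(f"Arguments: {action['args']}")

    # Approve every pending action and continue (one decision per action request):
    decisions = [{"type": "approve"} for _ in request["action_requests"]]
    # Or reject with feedback instead:
    # decisions = [{"type": "reject", "message": "Don't edit that."}]
    result = agent.invoke(Command(resume={"decisions": decisions}), config=config)
    # A resumed run can pause again at the next gated call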
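Stripped of the checkpointing machinery, the approval gate is a filter between the model's requested tool calls and their execution. The sketch below is hypothetical: gated_execute is not the middleware's implementation, and the approve / edit / reject decision shapes are assumptions chosen for illustration. A real agent pauses the run where decide is called here and resumes once a human answers.

```python
from typing import Callable

# Which tools need a human decision (same keys as the interrupt_on dict above)
INTERRUPT_ON = {"execute": True, "write_file": True, "edit_file": True}

def gated_execute(tool_calls: list[dict], tools: dict[str, Callable[..., str]],
                  decide: Callable[[dict], dict]) -> list[str]:
    """Run tool calls, asking for a human decision on any tool flagged in INTERRUPT_ON."""
    results = []
    for call in tool_calls:
        if INTERRUPT_ON.get(call["name"], False):
            decision = decide(call)  # a real agent would pause here until a human answers
            if decision["type"] == "reject":
                results.append(f"Rejected: {decision.get('message', '')}")
                continue
            if decision["type"] == "edit":
                call = {**call, "args": decision["args"]}  # run with human-corrected args
        results.append(tools[call["name"]](**call["args"]))
    return results

tools = {
    "read_file": lambda file_path: f"contents of {file_path}",
    "edit_file": lambda file_path, old_string, new_string: f"edited {file_path}",
}
calls = [
    {"name": "read_file", "args": {"file_path": "/main.py"}},
    {"name": "edit_file", "args": {"file_path": "/main.py", "old_string": "a", "new_string": "b"}},
]

def reject_edits(call: dict) -> dict:
    return {"type": "reject", "message": "Don't edit that."}

print(gated_execute(calls, tools, reject_edits))
# ['contents of /main.py', "Rejected: Don't edit that."]
```

Ungated tools such as read_file run straight through, so approvals only interrupt the calls that can change something.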

6.3 Filesystem Permissions

Declarative rules restrict filesystem access with first-match-wins semantics:

# pip install deepagents langchain-anthropic
from deepagents import create_deep_agent
from deepagents.middleware.permissions import FilesystemPermission

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    permissions=[
        # First match wins, so the deny rule for secrets must come before the broad allows
        FilesystemPermission(operations=["read", "write"], paths=["**/.env", "**/secrets/**"], mode="deny"),
        # Allow read/write to workspace
        FilesystemPermission(operations=["read", "write"], paths=["/workspace/**"], mode="allow"),
        # Allow read-only access to docs
        FilesystemPermission(operations=["read"], paths=["/docs/**"], mode="allow"),
    ],
    system_prompt="You are a code assistant with restricted file access.",
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Read the .env file and show me the API keys."}]
})
# Agent will be denied access to .env files
print(result["messages"][-1].content)
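First-match-wins means rule order matters: the first rule whose operation and path pattern both match decides the outcome, so deny rules for secrets belong ahead of broad allow rules. The self-contained sketch below evaluates rules that way using fnmatchcase. The Rule tuple, the is_allowed helper, and the deny-by-default fallback for unmatched paths are assumptions for illustration, not the FilesystemPermission implementation.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple

class Rule(NamedTuple):
    operations: tuple[str, ...]
    paths: tuple[str, ...]
    mode: str  # "allow" or "deny"

def is_allowed(rules: list[Rule], operation: str, path: str, default: bool = False) -> bool:
    """Return the verdict of the first matching rule; fall back to `default` if none match."""
    for rule in rules:
        if operation in rule.operations and any(fnmatchcase(path, p) for p in rule.paths):
            return rule.mode == "allow"
    return default

secrets_first = [
    Rule(("read", "write"), ("**/.env", "**/secrets/**"), "deny"),
    Rule(("read", "write"), ("/workspace/**",), "allow"),
    Rule(("read",), ("/docs/**",), "allow"),
]
allow_first = [secrets_first[1], secrets_first[0], secrets_first[2]]

print(is_allowed(secrets_first, "read", "/workspace/.env"))  # False
print(is_allowed(allow_first, "read", "/workspace/.env"))    # True: the broad allow matched first
print(is_allowed(secrets_first, "write", "/docs/guide.md"))  # False: docs are read-only
```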

7. Production Deployment

Capability | Development | Production
Backend | StateBackend (in-memory) | StoreBackend with PostgresStore
Checkpointer | MemorySaver | AsyncPostgresSaver
Sandbox | Local execution | LangSmith Sandbox (isolated)
Tracing | Optional | LangSmith with LANGCHAIN_TRACING_V2=true
Hosting | Local invocation | Managed Deep Agents or Agent Server

# pip install deepagents langchain-anthropic langgraph-checkpoint-postgres
import os
from contextlib import asynccontextmanager

from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from langgraph.store.postgres.aio import AsyncPostgresStore

# Production configuration. Both Postgres helpers open their connections through
# async context managers, so the agent is yielded while those connections are open.
@asynccontextmanager
async def production_agent():
    db_url = os.environ["DATABASE_URL"]
    async with (
        AsyncPostgresStore.from_conn_string(db_url) as store,         # cross-thread data
        AsyncPostgresSaver.from_conn_string(db_url) as checkpointer,  # durable conversation state
    ):
        # Create the required tables on first run
        await store.setup()
        await checkpointer.setup()

        yield create_deep_agent(
            model="anthropic:claude-sonnet-4-6",
            system_prompt="You are a production assistant.",
            backend=CompositeBackend(
                StateBackend(),
                routes={"/memories/": StoreBackend(store=store, namespace=["user", "prefs"])},
            ),
            checkpointer=checkpointer,
            store=store,
            interrupt_on={"execute": True},  # Safety gate for code execution
        )

# Usage:
# async with production_agent() as agent:
#     result = await agent.ainvoke(
#         {"messages": [{"role": "user", "content": "Hello"}]},
#         config={"configurable": {"thread_id": "user-42"}},
#     )
Framework Comparison

LangChain vs LangGraph vs Deep Agents

LangChain provides core building blocks (models, tools, prompts). LangGraph adds a stateful runtime (graphs, persistence, human-in-the-loop). Deep Agents adds the harness layer on top — opinionated built-in tools (planning, filesystem, subagents) with automatic context management. Choose the level of abstraction that matches your needs.

LangChain = Blocks | LangGraph = Runtime | Deep Agents = Harness
Try It Yourself: Deploy a LangChain application to production: (1) wrap your chain/agent in a LangServe API endpoint, (2) add Redis caching for repeated queries (TTL: 1 hour), (3) implement a circuit breaker that returns cached responses when the LLM API is down, (4) add structured logging with request IDs, (5) write a load test that simulates 50 concurrent users. Measure p50 and p99 latency.