1. Container Sandboxing
Agents that execute code (via the Bash tool) must be sandboxed. Containers provide process isolation, filesystem scoping, and network control:
# Dockerfile for agent sandbox environment
FROM python:3.12-slim
# Install only what the agent needs
RUN pip install anthropic mcp pydantic
# Create non-root user for agent execution
RUN useradd -m -s /bin/bash agent
USER agent
# Set workspace directory
WORKDIR /workspace
# Agent can only write to /workspace
# No access to host filesystem, no root privileges
# Network access controlled by Docker network settings
import os
import subprocess

def run_agent_in_sandbox(task: str, workspace_path: str) -> dict:
    """Execute an agent task inside a Docker container."""
    # Intended restrictions, written as docker-py style settings for reference.
    # The CLI call below applies the same limits; this dict is not passed to Docker.
    container_config = {
        "image": "agent-sandbox:latest",
        "volumes": {
            workspace_path: {"bind": "/workspace", "mode": "rw"}
        },
        "network_mode": "none",  # No network access
        "mem_limit": "2g",  # Memory cap
        "cpu_period": 100000,
        "cpu_quota": 50000,  # 50% CPU cap
        "read_only": False,  # /workspace is writable
        "security_opt": ["no-new-privileges"],  # Cannot escalate
        "environment": {
            # Read from the host environment; a "${...}" literal is not expanded by Python
            "ANTHROPIC_API_KEY": os.environ.get("ANTHROPIC_API_KEY", ""),
            "TASK": task
        }
    }
    # Run with timeout. Note that --network none also blocks the Claude API,
    # so an agent that calls the API from inside the container needs a
    # restricted network or an egress proxy instead.
    result = subprocess.run(
        ["docker", "run", "--rm",
         "--network", "none",
         "--memory", "2g",
         "--cpus", "0.5",
         "--security-opt", "no-new-privileges",
         "-v", f"{workspace_path}:/workspace",
         "-e", f"TASK={task}",
         "agent-sandbox:latest",
         "python", "run_agent.py"],
        capture_output=True, text=True, timeout=300
    )
    return {
        "exit_code": result.returncode,
        "output": result.stdout,
        "errors": result.stderr
    }
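The function above names its limits in two places: a settings dict and the CLI flags. One way to keep them from drifting apart is to derive the `docker run` arguments from a single dict. The sketch below is an illustration only; `build_docker_args` is a hypothetical helper, and it covers just the keys used in this section.

```python
from typing import Any


def build_docker_args(config: dict[str, Any], command: list[str]) -> list[str]:
    """Translate a small sandbox config dict into `docker run` arguments."""
    args = ["docker", "run", "--rm"]
    # Default to no network if the config does not say otherwise.
    args += ["--network", config.get("network_mode", "none")]
    if "mem_limit" in config:
        args += ["--memory", config["mem_limit"]]
    if "cpu_quota" in config and "cpu_period" in config:
        # --cpus is the quota/period ratio (50000 / 100000 -> 0.5).
        cpus = config["cpu_quota"] / config["cpu_period"]
        args += ["--cpus", f"{cpus:g}"]
    for opt in config.get("security_opt", []):
        args += ["--security-opt", opt]
    for host_path, mount in config.get("volumes", {}).items():
        args += ["-v", f"{host_path}:{mount['bind']}:{mount['mode']}"]
    for key, value in config.get("environment", {}).items():
        args += ["-e", f"{key}={value}"]
    args.append(config["image"])
    return args + command


config = {
    "image": "agent-sandbox:latest",
    "volumes": {"/tmp/job-1": {"bind": "/workspace", "mode": "rw"}},
    "network_mode": "none",
    "mem_limit": "2g",
    "cpu_period": 100000,
    "cpu_quota": 50000,
    "security_opt": ["no-new-privileges"],
    "environment": {"TASK": "summarize README"},
}
argv = build_docker_args(config, ["python", "run_agent.py"])
```

The resulting `argv` list can be passed straight to `subprocess.run`, so the limits are declared once and reviewed in one place.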
2. Cloud Deployment Patterns
flowchart TD
U["User Request"] --> LB["Load Balancer"]
LB --> API["API Gateway"]
API --> Q["Task Queue"]
Q --> W1["Agent Worker 1<br/>(Container)"]
Q --> W2["Agent Worker 2<br/>(Container)"]
Q --> W3["Agent Worker 3<br/>(Container)"]
W1 --> CL["Claude API"]
W2 --> CL
W3 --> CL
W1 --> MCP["MCP Servers"]
W2 --> MCP
W3 --> MCP
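The queue-and-workers stage of the diagram can be sketched in a single process with `asyncio`. This is only a stand-in for a real task queue and container workers: `handle_task` is a hypothetical placeholder for launching an agent, and the worker names are illustrative.

```python
import asyncio


async def handle_task(task: str) -> str:
    """Stand-in for running one agent task inside its own container."""
    await asyncio.sleep(0)  # placeholder for real work
    return f"done:{task}"


async def worker(name: str, queue: asyncio.Queue, results: list) -> None:
    """Pull tasks until cancelled; the worker keeps no state between tasks."""
    while True:
        task = await queue.get()
        try:
            results.append((name, await handle_task(task)))
        finally:
            queue.task_done()


async def run_pool(tasks: list[str], num_workers: int = 3) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    for task in tasks:
        queue.put_nowait(task)
    workers = [
        asyncio.create_task(worker(f"worker-{i + 1}", queue, results))
        for i in range(num_workers)
    ]
    await queue.join()  # wait until every queued task has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(run_pool(["t1", "t2", "t3", "t4", "t5"]))
```

Because workers share nothing except the queue, adding capacity is a matter of raising `num_workers`, which mirrors the horizontal scaling the diagram implies.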
# Production agent deployment considerations
# 1. Stateless workers (scale horizontally)
# - Each request gets a fresh container
# - No shared state between requests
# - Session state in external store (Redis, DB)
# 2. Rate limiting
# - Anthropic API has per-org rate limits
# - Implement client-side rate limiting
# - Queue requests during rate limit windows
# 3. Retry logic with exponential backoff
import time
import random

def api_call_with_retry(func, max_retries=3):
    """Retry API calls with exponential backoff + jitter."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            if "rate_limit" in str(e).lower() and attempt < max_retries - 1:
                wait = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait)
            else:
                raise
# 4. Observability
# - Log every tool call (name, input hash, duration, success/fail)
# - Track token usage per request
# - Alert on repeated failures (circuit breaker pattern)
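The list above mentions client-side rate limiting without showing it. A token bucket is one common way to implement it; the sketch below is a minimal single-threaded version, and the `TokenBucket` class, its parameters, and the injected clock are all assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `rate` calls per second with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; False means the caller should wait or queue."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demonstration with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2, clock=lambda: fake_now[0])
first_burst = [bucket.try_acquire() for _ in range(3)]
fake_now[0] += 0.5  # half a second refills one token at 2 tokens per second
after_refill = bucket.try_acquire()
```

A worker would call `try_acquire()` before each API request and push the task back onto the queue when it returns `False`, which keeps retries from piling onto an already throttled API.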
Zero-Downtime Agent Migration
A fintech company migrated their agent from Claude 3 to Claude 3.5 without downtime using environment-based canary deployment: 5% of traffic to the new model for 24 hours, automated quality checks comparing output distributions, then gradual rollout to 100%. One regression was caught at 10% and rolled back automatically.
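The percentage-based split in a canary rollout is often done by hashing a stable request or user ID into a bucket, so the same caller sees the same model while the percentage grows. The sketch below shows that idea only; the function names and the `model-stable` / `model-canary` labels are hypothetical and are not taken from the case study.

```python
import hashlib


def canary_bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_model(request_id: str, canary_percent: int,
                 stable: str = "model-stable",
                 canary: str = "model-canary") -> str:
    """Route to the canary model when the bucket falls below the rollout percentage."""
    return canary if canary_bucket(request_id) < canary_percent else stable


# Raising the percentage only adds callers to the canary group; nobody flips back.
ids = [f"user-{i}" for i in range(10_000)]
at_5 = {i for i in ids if choose_model(i, 5) == "model-canary"}
at_10 = {i for i in ids if choose_model(i, 10) == "model-canary"}
```

Rolling back is then a configuration change: setting `canary_percent` to 0 sends every request to the stable model again.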
3. Security Boundaries
flowchart TD
subgraph L1["Layer 1: Network Isolation"]
subgraph L2["Layer 2: Filesystem Scoping"]
subgraph L3["Layer 3: Tool Permissions"]
subgraph L4["Layer 4: Secret Management"]
subgraph L5["Layer 5: Audit Logging"]
AGENT["Agent<br/>Execution"]
end
end
end
end
end
N1["No internet access<br/>Allowed-list only"] -.-> L1
N2["Read/write within workspace only<br/>Ephemeral storage"] -.-> L2
N3["Explicit allowlist per role<br/>No wildcard access"] -.-> L3
N4["Vault-injected credentials<br/>Never in context window"] -.-> L4
N5["Every tool call logged<br/>Immutable audit trail"] -.-> L5
style L1 fill:#f8f9fa,stroke:#3B9797
style L2 fill:#f0f8f8,stroke:#16476A
style L3 fill:#e8f4f4,stroke:#132440
style L4 fill:#fff5f5,stroke:#BF092F
style L5 fill:#fafafa,stroke:#666
security_config = {
    "network": {
        "mode": "restricted",
        "allowed_hosts": ["api.anthropic.com"],
        "blocked_ports": [22, 3306, 5432]  # No SSH, no direct DB
    },
    "filesystem": {
        "writable_paths": ["/workspace"],
        "readable_paths": ["/workspace", "/shared/docs"],
        "blocked_paths": ["/etc", "/root", "/var/secrets"]
    },
    "execution": {
        "max_duration_seconds": 300,
        "max_memory_mb": 2048,
        "allow_network_tools": False,
        "allow_subprocess": True  # For Bash tool
    }
}
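A config like this only helps if something checks requests against it. The sketch below shows one possible allowlist check for paths and hosts; the helper names are hypothetical, the policy dict repeats only the keys it needs, and the path check is purely lexical, so it does not account for symlinks (the container mounts remain the real enforcement layer).

```python
import posixpath
from pathlib import PurePosixPath

policy = {
    "network": {"allowed_hosts": ["api.anthropic.com"]},
    "filesystem": {
        "writable_paths": ["/workspace"],
        "readable_paths": ["/workspace", "/shared/docs"],
    },
}


def _is_under(path: str, roots: list[str]) -> bool:
    """True if the normalized absolute path sits inside one of the roots."""
    if not path.startswith("/"):
        return False  # relative paths are rejected rather than guessed at
    # normpath collapses ".." so "/workspace/../etc" is judged as "/etc".
    normalized = PurePosixPath(posixpath.normpath(path))
    return any(normalized.is_relative_to(root) for root in roots)


def can_write(path: str) -> bool:
    return _is_under(path, policy["filesystem"]["writable_paths"])


def can_read(path: str) -> bool:
    return _is_under(path, policy["filesystem"]["readable_paths"])


def can_connect(host: str) -> bool:
    # Allowlist semantics: anything not listed is denied.
    return host.lower() in policy["network"]["allowed_hosts"]
```

Note that the checks never consult a blocklist: a path or host is denied simply because it is not on the allowlist, which is the stricter of the two approaches.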
4. Production Checklist
Production Agent Deployment Checklist
- ✅ Container isolation (no host filesystem access)
- ✅ Network restrictions (allowlist, not blocklist)
- ✅ Rate limiting (client-side + API-side)
- ✅ Timeout enforcement (per-tool and per-session)
- ✅ Secret management (never in code or env vars)
- ✅ Audit logging (immutable, every tool call)
- ✅ Error budget monitoring (alert on failure rate)
- ✅ Graceful degradation (fallback when API unavailable)
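For the audit logging item, one way to make a log tamper-evident is to chain entries by hash, so that editing an earlier entry invalidates everything after it. The sketch below is an in-memory illustration with hypothetical helper names; it makes tampering detectable but does not by itself make storage immutable, which still requires append-only or write-once storage.

```python
import hashlib
import json


def _digest(payload: dict) -> str:
    """Stable SHA-256 over a JSON-serializable dict."""
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()


def append_entry(log: list[dict], tool: str, tool_input: dict, ok: bool) -> dict:
    """Append a tool-call record that commits to the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {
        "tool": tool,
        "input_hash": _digest(tool_input),  # log a hash, not the raw input
        "ok": ok,
        "prev_hash": prev_hash,
    }
    entry = {**body, "entry_hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash and confirm each entry points at its predecessor."""
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != _digest(body):
            return False
        prev_hash = entry["entry_hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, "Read", {"path": "/workspace/app.py"}, ok=True)
append_entry(audit_log, "Bash", {"command": "pytest -q"}, ok=False)
chain_ok = verify_chain(audit_log)
```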
5. Hosting Patterns & Isolation (CCA 6.1)
The current Agent SDK does not expose an environments CRUD API. Deployment happens in your infrastructure: how you host the claude subprocess, what filesystem and network it can reach, and how you persist transcripts when containers move across hosts.
5.1 Hosting Patterns
# Hosting patterns for the Agent SDK
# Docs: https://code.claude.com/docs/en/agent-sdk/hosting
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage
# EPHEMERAL: one container / process per task
async def ephemeral_review():
    async for msg in query(
        prompt="Review this pull request for security issues.",
        options=ClaudeAgentOptions(
            cwd="/work/review-42",  # isolate filesystem state per session
            max_turns=20,
            allowed_tools=["Read", "Glob", "Grep"],
            permission_mode="plan",
        ),
    ):
        if isinstance(msg, ResultMessage) and msg.subtype == "success":
            print(msg.result)
# LONG-RUNNING: keep a session alive across multiple turns in one process
# In Python, use ClaudeSDKClient when the same process owns the conversation.
# HYBRID: resume a captured session ID in a fresh container and mirror
# transcripts via SessionStore when the session must survive host restarts.
print("Hosting patterns: ephemeral | long-running | hybrid | multi-agent container")
print("Each active session maps to its own claude subprocess and transcript file")
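For the ephemeral pattern, the working directory should live exactly as long as the task. The sketch below shows one way to manage that lifecycle with the standard library; `ephemeral_workdir` is a hypothetical helper, and the path it yields is what would be passed as `cwd` in the options shown above.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_workdir(prefix: str = "agent-task-") -> Iterator[Path]:
    """Create a private working directory and delete it when the task ends."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Runs on success and on failure, so no task leaves files behind.
        shutil.rmtree(path, ignore_errors=True)


with ephemeral_workdir() as workdir:
    (workdir / "notes.txt").write_text("scratch data")
    existed_during_task = workdir.exists()
exists_after_task = workdir.exists()
```

Anything the task must keep, such as a transcript or a result file, has to be copied out before the `with` block exits.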
5.2 Isolation Technology Comparison
| Technology | Security Strength | Operational Complexity | Best For |
|---|---|---|---|
| Sandbox runtime | Good secure defaults | Low | Single-developer and CI runs needing fast isolation |
| Containers | Setup-dependent | Medium | Most self-hosted agent deployments |
| gVisor | Excellent | Medium | Multi-tenant or semi-trusted code execution |
| VMs / microVMs | Excellent | High | Highest-isolation workloads and strong security boundaries |
SessionStore. (3) Choose isolation based on threat model: sandbox runtime, containers, gVisor, or VMs. (4) Per-session cwd is a core isolation boundary. (5) Network controls and credential injection live in your deployment architecture, not a Claude-managed environment object.
6. Self-Hosted Environments (CCA 6.2)
6.0 Permission Modes (Agent SDK)
The permission_mode option controls whether an agent asks for approval before using tools. This is critical for choosing the right autonomy level for each deployment environment:
# Permission Modes — Control Agent Autonomy
# Docs: https://code.claude.com/docs/en/agent-sdk/agent-loop#permission-mode
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage
# --- Mode 1: "default" (Interactive — prompts for unapproved tools) ---
# Tools in allowed_tools run freely; everything else triggers approval callback
async def interactive_agent():
    async for msg in query(
        prompt="Fix the failing tests",
        options=ClaudeAgentOptions(
            permission_mode="default",
            allowed_tools=["Read", "Grep", "Glob"],  # These run without asking
            # Edit, Write, Bash will trigger approval prompt
        ),
    ):
        pass
# --- Mode 2: "acceptEdits" (Developer mode — auto-approves file edits) ---
# Auto-approves file edits + common filesystem commands (mkdir, touch, mv, cp)
# Other Bash commands still follow allow rules
async def developer_agent():
    async for msg in query(
        prompt="Refactor auth module to use JWT",
        options=ClaudeAgentOptions(
            permission_mode="acceptEdits",
            allowed_tools=["Read", "Write", "Edit", "Glob", "Grep", "Bash"],
        ),
    ):
        pass
# --- Mode 3: "plan" (Read-only exploration — never edits) ---
# Only read-only tools run. Claude explores and produces a plan.
async def planning_agent():
    async for msg in query(
        prompt="Analyze the codebase and propose a microservices split",
        options=ClaudeAgentOptions(
            permission_mode="plan",
            allowed_tools=["Read", "Grep", "Glob"],
        ),
    ):
        pass
# --- Mode 4: "bypassPermissions" (CI/CD — no prompts, full autonomy) ---
# Auto-approves all tool uses that reach the mode check. Use ONLY in isolated environments.
# Cannot be used when running as root on Unix.
async def ci_agent():
    async for msg in query(
        prompt="Run full test suite and generate coverage report",
        options=ClaudeAgentOptions(
            permission_mode="bypassPermissions",
            allowed_tools=["Read", "Bash", "Grep", "Glob"],
            disallowed_tools=["Write", "Edit"],  # Still enforce deny rules
        ),
    ):
        pass
# --- Mode 5: "dontAsk" (Strict — never prompts, denies unknowns) ---
# Pre-approved tools run; everything else is silently denied (no prompt).
async def strict_agent():
    async for msg in query(
        prompt="Search for TODOs in the codebase",
        options=ClaudeAgentOptions(
            permission_mode="dontAsk",
            allowed_tools=["Read", "Grep", "Glob"],  # Only these can run
        ),
    ):
        pass
print("Permission modes: default | acceptEdits | plan | dontAsk | bypassPermissions")
print("Choose based on deployment: interactive → dev → CI → production")
| Mode | Behavior | Best For |
|---|---|---|
| default | Prompts for unapproved tools | Interactive apps, user-facing agents |
| acceptEdits | Auto-approves file edits + mkdir/touch/mv/cp | Developer tools, code agents on dev machines |
| plan | Read-only; produces plan without editing | Architecture analysis, pre-review exploration |
| dontAsk | Never prompts; denies unlisted tools | Strict production agents with known tool sets |
| bypassPermissions | Runs all tools without asking unless blocked earlier by deny rules or hooks | CI/CD, containers, isolated sandboxes |
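The table can be turned into a small guard that picks a mode from the deployment type and refuses the one combination this section rules out, `bypassPermissions` as root. The deployment labels and the `select_permission_mode` helper below are assumptions for illustration; only the mode names come from the table.

```python
# Deployment label -> permission mode, following the "Best For" column.
MODE_BY_DEPLOYMENT = {
    "interactive": "default",
    "dev": "acceptEdits",
    "planning": "plan",
    "production": "dontAsk",
    "ci": "bypassPermissions",
}


def select_permission_mode(deployment: str, running_as_root: bool = False) -> str:
    """Return the permission mode for a deployment type, failing closed on bad input."""
    try:
        mode = MODE_BY_DEPLOYMENT[deployment]
    except KeyError:
        raise ValueError(f"unknown deployment type: {deployment!r}") from None
    if mode == "bypassPermissions" and running_as_root:
        # Reject early instead of discovering the restriction at runtime.
        raise RuntimeError("bypassPermissions cannot be used when running as root")
    return mode
```

On Unix, a caller could pass `running_as_root=(os.geteuid() == 0)`, and the returned string would be supplied as `permission_mode` when building the options.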
6.1 Hardened Container Pattern
For self-hosted deployments, the docs recommend isolating the SDK subprocess inside a sandboxed container and routing outbound traffic through a proxy. The hardening controls live in your container runtime and network layer, not in a Claude-managed polling API.
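# Hardened self-hosted container: code is mounted read-only at /workspace,
# and the only writable locations are the tmpfs mounts (/tmp and /home/agent).
# A tmpfs and a bind mount cannot share the same target path.
docker run \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=100m \
  --tmpfs /home/agent:rw,noexec,nosuid,size=500m \
  --network none \
  --memory 2g \
  --cpus 2 \
  --user 1000:1000 \
  -v /path/to/code:/workspace:ro \
  -v /var/run/proxy.sock:/var/run/proxy.sock:ro \
  agent-image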
flowchart TD
Client[User or API caller] --> App[Your hosting app]
App --> SDK[Agent SDK query]
SDK --> Claude[claude subprocess]
Claude --> Disk[Local transcript + working directory]
Claude --> Proxy[Egress proxy]
Proxy --> External[Allowed external services]
# Give each hosted session its own working directory
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions
async def hosted_session(prompt: str, session_id: str):
    async for _ in query(
        prompt=prompt,
        options=ClaudeAgentOptions(
            cwd=f"/workspace/{session_id}",
            max_turns=20,
        ),
    ):
        pass
print("Each active session should get its own cwd and local transcript path")
print("Use SessionStore when transcripts must survive host restarts or re-scheduling")
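Because the session ID is interpolated into a path, an ID such as `../other-session` would escape the intended directory. A hedged sketch of validating the ID before building `cwd` follows; the `session_cwd` helper and the allowed character set are assumptions, not part of the SDK.

```python
import re
from pathlib import PurePosixPath

# Assumed ID format: letters, digits, underscore, and hyphen, at most 64 characters.
_SESSION_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def session_cwd(session_id: str, base: str = "/workspace") -> str:
    """Build a per-session working directory path, rejecting unsafe IDs."""
    if not _SESSION_ID.fullmatch(session_id):
        raise ValueError(f"invalid session id: {session_id!r}")
    return str(PurePosixPath(base) / session_id)
```

The returned string can be used wherever the example above builds `cwd` with an f-string, so a malformed ID fails before any agent process starts.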
6.2 Claude on Cloud Platforms (CCA 6.3)
# Claude on AWS (Bedrock): access via AWS IAM, no Anthropic API key needed
# Claude on GCP (Vertex AI): access via Google Cloud IAM
# Model IDs below are illustrative; confirm the exact ID in each provider's
# model catalog before use.
# AWS Bedrock Integration
aws_config = {
    "provider": "aws_bedrock",
    "region": "us-east-1",
    "model_id": "anthropic.claude-sonnet-4-6-20250514-v1:0",
    "auth": "IAM role (no API key needed)",
    "cross_account": "Use resource-based policies or assume-role",
    "pricing": "Same per-token, billed through AWS"
}
# GCP Vertex AI Integration
gcp_config = {
    "provider": "vertex_ai",
    "region": "us-central1",
    "model_id": "claude-sonnet-4-6@20250514",
    "auth": "Service account or Workload Identity",
    "pricing": "Same per-token, billed through GCP"
}
# When to use cloud provider vs direct API:
# Direct API: fastest new model access, all features immediately
# AWS Bedrock: existing AWS infra, IAM governance, data stays in AWS
# GCP Vertex: existing GCP infra, Google IAM, data stays in GCP
# Cross-account access (AWS):
# 1. Create IAM role in account with Bedrock access
# 2. Allow assume-role from agent's execution account
# 3. Agent assumes role to call Bedrock (no API keys stored)
# Key differences:
# - Feature availability: Direct API gets features first (2-4 weeks before cloud providers)
# - Data residency: Cloud providers keep data in your chosen region
# - Governance: Cloud IAM integrates with existing org policies
# - Billing: consolidated with other cloud services
print("Direct API: All features, fastest updates, Anthropic billing")
print("AWS Bedrock: IAM auth, AWS billing, regional data residency")
print("GCP Vertex AI: Google IAM, GCP billing, regional data residency")
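Switching the agent process between providers is usually a matter of environment configuration rather than code changes. The sketch below builds that environment for each option. The variable names (`CLAUDE_CODE_USE_BEDROCK`, `AWS_REGION`, `CLAUDE_CODE_USE_VERTEX`, `CLOUD_ML_REGION`, `ANTHROPIC_VERTEX_PROJECT_ID`) are my understanding of the Claude Code settings and should be treated as assumptions to verify against the current docs; `provider_env` itself is a hypothetical helper.

```python
from typing import Optional


def provider_env(provider: str, region: Optional[str] = None,
                 project_id: Optional[str] = None) -> dict[str, str]:
    """Return the extra environment variables for the chosen provider."""
    if provider == "direct":
        # Direct API: the key is injected separately by the secret manager.
        return {}
    if provider == "bedrock":
        if not region:
            raise ValueError("bedrock requires a region")
        # Credentials come from the IAM role, so no API key is set here.
        return {"CLAUDE_CODE_USE_BEDROCK": "1", "AWS_REGION": region}
    if provider == "vertex":
        if not region or not project_id:
            raise ValueError("vertex requires a region and a project_id")
        return {
            "CLAUDE_CODE_USE_VERTEX": "1",
            "CLOUD_ML_REGION": region,
            "ANTHROPIC_VERTEX_PROJECT_ID": project_id,
        }
    raise ValueError(f"unknown provider: {provider!r}")


bedrock_env = provider_env("bedrock", region="us-east-1")
vertex_env = provider_env("vertex", region="us-central1", project_id="my-project")
```

The returned dict would be merged into the environment of the container or subprocess that runs the agent, which keeps provider choice out of application code.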
cwd is important for isolation and correct session resume behavior. (3) Use SessionStore to mirror transcripts across hosts. (4) AWS Bedrock uses IAM roles instead of an Anthropic API key. (5) Harden self-hosted deployments with container isolation, network controls, and an egress proxy.