1. Agent Fundamentals
The OpenAI Agents SDK provides a lightweight, production-ready framework for building agentic applications. An Agent encapsulates instructions, a model, and tools; the Runner orchestrates execution loops; and handoffs enable multi-agent delegation.
flowchart LR
U[User Input] --> R[Runner]
R --> A[Agent]
A --> LLM[LLM Call]
LLM -->|tool_call| T[Tools]
T -->|result| LLM
LLM -->|handoff| A2[Agent 2]
LLM -->|final_output| O[Output]
A2 --> LLM2[LLM Call]
LLM2 --> O
from agents import Agent, Runner

agent = Agent(
    name="Research Assistant",
    instructions="You help users find and summarize information. Be concise and cite sources.",
    model="gpt-4.1",
)

result = Runner.run_sync(agent, "What are the key benefits of RAG systems?")
print(result.final_output)
2. Defining Agents
Agents become powerful when equipped with tools. Use the @function_tool decorator to expose Python functions as tools the agent can call. The SDK automatically generates the JSON schema from type hints and docstrings.
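To make the schema generation concrete, here is a rough standard-library sketch of the idea. The helper `sketch_tool_schema` and the extra `max_results` parameter are hypothetical and exist only for illustration; the SDK's own generator is more thorough, so treat this as a simplified picture rather than its actual implementation.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python annotations to JSON schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func):
    """Build a simplified JSON-schema-style description of a function."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def search_web(query: str, max_results: int = 5) -> str:
    """Search the web for current information."""
    return f"Results for: {query}"


schema = sketch_tool_schema(search_web)
print(schema["parameters"]["required"])
```

The takeaway is that accurate type hints and a clear docstring do real work here: they are what the model sees when it decides whether and how to call a tool.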
from agents import Agent, Runner, function_tool


@function_tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    return f"Results for: {query} - [simulated web results]"


@function_tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # WARNING: eval() runs arbitrary code. Acceptable for a demo only;
    # replace it with a restricted expression evaluator in production.
    return str(eval(expression))


agent = Agent(
    name="Smart Assistant",
    instructions="Help users with research and calculations. Use tools when needed.",
    model="gpt-4.1-mini",
    tools=[search_web, calculate],
)

result = Runner.run_sync(agent, "What is 15% of 847?")
print(result.final_output)
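The `calculate` tool above hands model-generated text to `eval`, which can execute arbitrary Python. One possible replacement, assuming the tool only needs basic arithmetic, is to parse the expression with the standard `ast` module and allow a small whitelist of operators. `safe_eval` is a hypothetical helper name, not part of the SDK.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate a basic arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain numbers only (bool is a subclass of int, so exclude it).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, and so on all end up here.
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("0.15 * 847"))
```

Inside the tool, `return str(safe_eval(expression))` would take the place of the `eval` call. Exponentiation is left out of the whitelist on purpose, since an input like `9**9**9` could stall the process.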
AI-Powered Legal Research
A law firm built a multi-agent system: a ‘research agent’ searches case databases, an ‘analysis agent’ identifies relevant precedents, and a ‘drafting agent’ writes legal briefs. Guardrails ensure no hallucinated case citations, and handoffs route complex questions to senior attorneys.
3. Runner Lifecycle
The Runner supports both synchronous and asynchronous execution. In async mode, you can stream partial results (via Runner.run_streamed), inspect intermediate steps, and handle long-running agent workflows without blocking.
import asyncio

from agents import Agent, Runner

agent = Agent(name="Writer", instructions="Write creative content.", model="gpt-4.1-mini")


async def main():
    result = await Runner.run(agent, "Write a haiku about coding.")
    print(f"Output: {result.final_output}")
    # raw_responses holds one entry per LLM call made during the run.
    print(f"LLM calls: {len(result.raw_responses)}")


asyncio.run(main())
Conceptually, a run moves through these states: pending → running → tool_calling → running → complete. Each LLM call may produce tool calls (looping back) or a final output (completing the run).
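The loop described above can be sketched in plain Python with a scripted stand-in for the model. Everything here (`ModelReply`, `run_loop`, `scripted_model`, the `lookup` tool) is hypothetical and only mirrors the shape of the cycle; the real Runner also handles handoffs, guardrails, and tracing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelReply:
    """One simulated LLM response: either a tool request or a final answer."""
    tool_name: Optional[str] = None
    tool_arg: str = ""
    final_output: Optional[str] = None


def run_loop(model, tools, user_input, max_turns=5):
    """Call the model until it returns a final output or max_turns is reached."""
    history = [("user", user_input)]
    states = ["pending"]
    for _ in range(max_turns):
        states.append("running")
        reply = model(history)
        if reply.final_output is not None:
            states.append("complete")
            return reply.final_output, states
        # The model asked for a tool: run it and feed the result back in.
        states.append("tool_calling")
        history.append(("tool", tools[reply.tool_name](reply.tool_arg)))
    raise RuntimeError(f"No final output after {max_turns} turns")


def scripted_model(history):
    """Stand-in for an LLM: request one tool call, then answer."""
    role, content = history[-1]
    if role == "user":
        return ModelReply(tool_name="lookup", tool_arg=content)
    return ModelReply(final_output=f"Answer based on: {content}")


output, states = run_loop(scripted_model, {"lookup": lambda q: f"notes on {q}"}, "RAG")
print(" -> ".join(states))  # pending -> running -> tool_calling -> running -> complete
```

The turn limit matters in practice: without one, a model that keeps requesting tools would never let the loop terminate.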
4. Multi-Agent Handoffs
Handoffs let one agent delegate to another based on the conversation context. A triage agent can route queries to specialist agents, each with their own instructions, tools, and model configuration.
from agents import Agent, Runner

# Define the specialists first so the triage agent can reference them directly.
technical_agent = Agent(
    name="Technical Support",
    instructions="Help users with technical issues, bugs, and API questions.",
    model="gpt-4.1",
)

billing_agent = Agent(
    name="Billing Support",
    instructions="Help users with billing, invoices, and subscription questions.",
    model="gpt-4.1-mini",
)

# Handoffs take Agent objects (or handoff() wrappers), not name strings.
triage_agent = Agent(
    name="Triage",
    instructions="Route user queries to the appropriate specialist agent.",
    model="gpt-4.1-mini",
    handoffs=[technical_agent, billing_agent],
)

result = Runner.run_sync(triage_agent, "My API calls are returning 429 errors")
print(f"Handled by: {result.last_agent.name}")
print(f"Response: {result.final_output}")
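To see why `result.last_agent` can differ from the agent you started with, here is a toy routing sketch. In the SDK the model decides when to hand off; the keyword matching below is only a deterministic stand-in, and `ToyAgent` and `route` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    name: str
    keywords: tuple = ()  # stand-in for the model's routing judgment
    handoffs: list = field(default_factory=list)


def route(start, message):
    """Follow handoffs from `start` until no further specialist matches."""
    current = start
    visited = {current.name}
    text = message.lower()
    while True:
        target = next(
            (
                agent
                for agent in current.handoffs
                if agent.name not in visited
                and any(keyword in text for keyword in agent.keywords)
            ),
            None,
        )
        if target is None:
            return current  # this agent produces the final answer
        visited.add(target.name)
        current = target


technical = ToyAgent("Technical Support", keywords=("api", "error", "bug"))
billing = ToyAgent("Billing Support", keywords=("invoice", "billing", "refund"))
triage = ToyAgent("Triage", handoffs=[technical, billing])

print(route(triage, "My API calls are returning 429 errors").name)
```

Tracking visited agents keeps this sketch from looping forever if two agents list each other as handoff targets.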
5. Guardrails
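Guardrails validate inputs and outputs before and after agent execution. Input guardrails can block harmful requests; output guardrails can filter sensitive data from responses. When a guardrail’s tripwire_triggered is True, the run halts immediately and the SDK raises an exception (InputGuardrailTripwireTriggered or OutputGuardrailTripwireTriggered) that your application can catch.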
from agents import Agent, Runner, InputGuardrail, GuardrailFunctionOutput
from pydantic import BaseModel


class SafetyCheck(BaseModel):
    is_safe: bool
    reason: str


async def check_input(ctx, agent, input_text):
    """Block harmful or off-topic inputs."""
    # Naive keyword screen for illustration; assumes the input is a plain string.
    if any(word in input_text.lower() for word in ["hack", "exploit", "bypass"]):
        return GuardrailFunctionOutput(
            output_info=SafetyCheck(is_safe=False, reason="Potentially harmful request"),
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(
        output_info=SafetyCheck(is_safe=True, reason="Input is safe"),
        tripwire_triggered=False,
    )


agent = Agent(
    name="Safe Assistant",
    instructions="Help users with legitimate questions only.",
    model="gpt-4.1-mini",
    input_guardrails=[InputGuardrail(guardrail_function=check_input)],
)

result = Runner.run_sync(agent, "How do I optimize my database queries?")
print(result.final_output)
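The example above covers the input side. For the output side, the check that would sit inside an output guardrail function might look like the sketch below, which flags responses containing an email address or a card-like number. It is plain Python with no SDK calls; `CheckResult`, `TripwireTriggered`, `check_output`, and `guarded` are hypothetical names, and the regular expressions are deliberately simple.

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


@dataclass
class CheckResult:
    tripwire_triggered: bool
    reason: str


class TripwireTriggered(Exception):
    """Raised when a check decides the response must not be returned."""


def check_output(text: str) -> CheckResult:
    """Flag responses that appear to contain sensitive data."""
    if EMAIL_RE.search(text):
        return CheckResult(True, "contains an email address")
    if CARD_RE.search(text):
        return CheckResult(True, "contains a card-like number")
    return CheckResult(False, "no sensitive data detected")


def guarded(text: str) -> str:
    """Return the text unchanged, or halt by raising if the tripwire fires."""
    result = check_output(text)
    if result.tripwire_triggered:
        raise TripwireTriggered(result.reason)
    return text


print(guarded("Add an index on the columns used in your WHERE clause."))
```

Raising rather than silently editing the response mirrors the tripwire behavior described above: the caller has to decide explicitly what the user sees instead.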
6. Tracing & Observability
The Agents SDK includes built-in tracing that records every LLM call, tool invocation, and handoff. Wrap workflows in a trace() context to group related operations and attach metadata for debugging in the OpenAI Dashboard.
from agents import Agent, Runner, trace

agent = Agent(name="Analyst", instructions="Analyze data.", model="gpt-4.1-mini")

# Group the run under one named trace and attach metadata for later filtering.
with trace(
    "analysis-workflow",
    metadata={"user_id": "user-123", "session_id": "sess-456"},
) as t:
    result = Runner.run_sync(agent, "Summarize Q4 revenue trends.")
    print(f"Trace ID: {t.trace_id}")
    print(f"Output: {result.final_output}")

# View traces in OpenAI Dashboard → Traces
Attach user_id and session_id to your traces. This enables filtering and debugging specific user sessions in the OpenAI Dashboard’s Traces view.
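As a mental model for what a trace holds, the toy recorder below groups timed spans under one trace ID together with user and session metadata. It is a standard-library sketch, not the SDK's tracing implementation; `ToyTrace` and its fields are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager


class ToyTrace:
    """Collects timed spans under a single trace ID plus metadata."""

    def __init__(self, name, metadata=None):
        self.trace_id = f"trace_{uuid.uuid4().hex}"
        self.name = name
        self.metadata = dict(metadata or {})
        self.spans = []

    @contextmanager
    def span(self, kind, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the span even if the wrapped operation raised.
            self.spans.append(
                {"kind": kind, "label": label, "seconds": time.perf_counter() - start}
            )


t = ToyTrace("analysis-workflow", {"user_id": "user-123", "session_id": "sess-456"})
with t.span("llm_call", "summarize"):
    pass  # a model call would run here
with t.span("tool", "fetch_revenue"):
    pass  # a tool invocation would run here

print(t.name, [s["kind"] for s in t.spans])
```

Because every span carries the same trace ID and metadata, a viewer can filter by user or session and still see each step of the run in order.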
7. How Agents SDK Relates to the Apps SDK
The Agents SDK and Apps SDK solve different but adjacent problems. The Agents SDK is code-first orchestration for agent workflows: handoffs, tools, tracing, approvals, and execution loops. The Apps SDK is the packaging and UX layer for shipping app experiences into ChatGPT using MCP servers, rich metadata, authentication, state management, and review-ready conversational interfaces.
That distinction matters when you choose where to invest effort. If you are building backend logic or internal copilots, start with the Agents SDK. If you want the same capabilities surfaced as a polished ChatGPT app, you will likely pair your agent logic with an Apps SDK-compatible MCP server and app metadata.
| Need | Agents SDK | Apps SDK |
|---|---|---|
| Backend orchestration | Runner loops, handoffs, guardrails, tracing | Usually not the primary layer |
| ChatGPT app distribution | Can power the backend behavior | Primary packaging and UX layer |
| MCP server integration | Consume MCP-backed tools inside agents | Expose MCP servers and UI to ChatGPT |
| App review readiness | Not the focus | Metadata, security, UX, submission flow |
8. Sandbox Agents
Sandbox agents run in container-based environments where the agent has access to files, shell commands, packages, and persistent state across tool calls. This is the right choice when your agent needs to install dependencies, execute code, manipulate files, or work with project structures.
from agents import Agent, Runner, function_tool
from agents.extensions.sandbox import SandboxEnvironment

# Create an agent with a sandboxed shell environment
agent = Agent(
    name="CodeReviewer",
    instructions="""You are a code review assistant. When given code, you can:
1. Save it to a file in the sandbox
2. Run linters and type checkers
3. Execute tests
4. Report findings""",
    model="gpt-4.1",
    sandbox=SandboxEnvironment(
        image="python:3.12-slim",  # Container image
        packages=["ruff", "mypy", "pytest"],  # Pre-installed packages
    ),
)

result = Runner.run_sync(
    agent,
    "Review this Python function for issues: def add(a, b): return a + b",
)
print(result.final_output)
Reach for a sandbox agent when a workflow needs pip install or npm install, long-running computations that benefit from persistent state between tool calls, or any workflow where you want deterministic, isolated execution.
9. Voice Agents
Voice agents combine the Agents SDK with the Realtime API for speech-to-speech workflows. The agent loop runs tool calls and handoffs while maintaining a live voice session with the user. Agent Builder does not currently support voice workflows, so voice agents use the SDK path exclusively.
from agents import Agent, Runner, function_tool
from agents.voice import VoiceWorkflow


@function_tool
def check_order_status(order_id: str) -> str:
    """Look up an order's current shipping status."""
    return f"Order {order_id} shipped on May 20, arriving May 27."


voice_agent = Agent(
    name="VoiceSupport",
    instructions="You are a friendly voice support agent. Keep responses brief and conversational.",
    model="gpt-realtime-2",
    tools=[check_order_status],
)

# VoiceWorkflow handles the realtime session, VAD, interruptions, and tool execution
workflow = VoiceWorkflow(agent=voice_agent)

# In production, connect this to WebRTC or WebSocket audio streams
print(f"Voice agent ready: {voice_agent.name}")
10. Agent Builder & ChatKit
Agent Builder is OpenAI’s hosted visual workflow editor for creating agents without writing orchestration code. You define nodes, connect tools, set safety constraints, and publish directly. ChatKit provides the embedded UI for deploying published workflows as chat experiences. Use Agent Builder when you want rapid prototyping with a visual interface, or when non-engineers need to create agent workflows.
| Feature | Agents SDK (Code) | Agent Builder (Visual) |
|---|---|---|
| Authoring | Python/TypeScript code | Visual node editor |
| Deployment | Your infrastructure | OpenAI-hosted |
| Voice support | Full (Realtime API) | Not yet supported |
| Custom tools | Full control, any runtime | MCP servers, built-in tools |
| State management | Application-controlled | Managed conversations |
| UI embedding | Build your own | ChatKit widgets |
| Best for | Production systems, complex logic | Prototyping, non-engineer workflows |
Next in the SDK Track
In OA Part 9: Realtime API, we’ll build voice-first agents with WebSocket streaming, real-time audio input/output, and event-driven interactions.