
OpenAI SDK Track Part 8: Agents SDK

May 22, 2026 · Wasil Zafar · 50 min read

Build intelligent agents with the OpenAI Agents SDK — define agents with instructions, tools, and handoffs. Master the Runner lifecycle and state transitions, multi-agent orchestration with delegation, input/output guardrails, tracing and observability, and voice agent integration.

Table of Contents

  1. Agent Fundamentals
  2. Defining Agents
  3. Runner Lifecycle
  4. Multi-Agent Handoffs
  5. Guardrails
  6. Tracing & Observability
  7. Apps SDK Relationship
  8. Sandbox Agents
  9. Voice Agents
  10. Agent Builder & ChatKit
What You’ll Learn: The OpenAI Agents SDK is a production framework for building multi-step, tool-using agents with guardrails, handoffs, and tracing built in. Instead of writing your own agentic loops, the SDK handles the orchestration — you define agents, tools, and guardrails, and the framework runs the execution loop. Think of it like the difference between writing a web server from scratch vs using Flask/FastAPI.

1. Agent Fundamentals

The OpenAI Agents SDK provides a lightweight, production-ready framework for building agentic applications. An Agent encapsulates instructions, a model, and tools; the Runner orchestrates execution loops; and handoffs enable multi-agent delegation.

Agent → Runner → Tools Flow
                flowchart LR
                    U[User Input] --> R[Runner]
                    R --> A[Agent]
                    A --> LLM[LLM Call]
                    LLM -->|tool_call| T[Tools]
                    T -->|result| LLM
                    LLM -->|handoff| A2[Agent 2]
                    LLM -->|final_output| O[Output]
                    A2 --> LLM2[LLM Call]
                    LLM2 --> O
            
from agents import Agent, Runner

agent = Agent(
    name="Research Assistant",
    instructions="You help users find and summarize information. Be concise and cite sources.",
    model="gpt-4.1",
)

result = Runner.run_sync(agent, "What are the key benefits of RAG systems?")
print(result.final_output)
Key Concept: The Agent is stateless — it defines what an agent can do. The Runner manages the execution loop, calling the LLM repeatedly until a final output is produced or a handoff occurs.
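To make the loop concrete, here is a minimal sketch in plain Python that does not use the SDK at all. `MiniAgent`, `fake_model`, and `run_loop` are hypothetical names invented for illustration, and the "model" is a stub that asks for one tool call and then answers. The real Runner is more involved, but the shape of the loop is similar: call the model, execute any requested tool, feed the result back, and stop on a final output.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class FinalOutput:
    text: str


@dataclass
class MiniAgent:
    """Stateless definition: a name plus the tools the agent may call."""
    name: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)


def fake_model(history: list[str]) -> Union[ToolCall, FinalOutput]:
    """Stand-in for an LLM call: request one tool call, then answer."""
    tool_results = [h for h in history if h.startswith("tool:")]
    if not tool_results:
        return ToolCall(name="lookup", argument=history[0])
    return FinalOutput(text=f"Answer based on {tool_results[-1]}")


def run_loop(agent: MiniAgent, user_input: str, max_turns: int = 5) -> str:
    """Call the model repeatedly until it produces a final output."""
    history = [user_input]  # per-run state lives here, not on the agent
    for _ in range(max_turns):
        step = fake_model(history)
        if isinstance(step, FinalOutput):
            return step.text
        # The model asked for a tool: run it and feed the result back.
        result = agent.tools[step.name](step.argument)
        history.append(f"tool:{result}")
    raise RuntimeError("max_turns exceeded without a final output")


agent = MiniAgent(name="Demo", tools={"lookup": lambda q: f"notes on {q!r}"})
output = run_loop(agent, "RAG benefits")
print(output)
```

Note that all per-run state (`history`) lives inside `run_loop`, which mirrors the split described above: the agent is a reusable definition, and the loop owns the conversation.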

2. Defining Agents

Agents become powerful when equipped with tools. Use the @function_tool decorator to expose Python functions as tools the agent can call. The SDK automatically generates the JSON schema from type hints and docstrings.

from agents import Agent, Runner, function_tool

@function_tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    return f"Results for: {query} - [simulated web results]"

@function_tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # Demo only: eval() executes arbitrary code, so never call it on
    # model-generated or user-supplied input in production.
    return str(eval(expression))

agent = Agent(
    name="Smart Assistant",
    instructions="Help users with research and calculations. Use tools when needed.",
    model="gpt-4.1-mini",
    tools=[search_web, calculate],
)

result = Runner.run_sync(agent, "What is 15% of 847?")
print(result.final_output)
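The `calculate` tool above relies on `eval()`, which will run whatever string the model passes in. One possible replacement, sketched here with only the standard library, walks the parsed expression tree and allows a small whitelist of arithmetic operators. `safe_eval` is a hypothetical helper name, and the operator set is an assumption you would adjust for your own use case (exponentiation is deliberately left out because it can produce very expensive computations).

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> float:
    """Evaluate +, -, *, /, % on numeric literals; reject everything else."""

    def _eval(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"Unsupported expression: {expression!r}")

    # mode="eval" parses a single expression; malformed input raises SyntaxError.
    return _eval(ast.parse(expression, mode="eval").body)


print(safe_eval("0.15 * 847"))
```

Inside the tool, `return str(safe_eval(expression))` could then stand in for the `eval()` call, with `ValueError` and `SyntaxError` caught and turned into an error message the model can read.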
Real-World Application

AI-Powered Legal Research

A law firm built a multi-agent system: a ‘research agent’ searches case databases, an ‘analysis agent’ identifies relevant precedents, and a ‘drafting agent’ writes legal briefs. Guardrails ensure no hallucinated case citations, and handoffs route complex questions to senior attorneys.

Legal Tech · Multi-Agent

3. Runner Lifecycle

The Runner supports both synchronous and asynchronous execution. In async mode, you can stream partial results, inspect intermediate steps, and handle long-running agent workflows without blocking.

import asyncio
from agents import Agent, Runner

agent = Agent(name="Writer", instructions="Write creative content.", model="gpt-4.1-mini")

async def main():
    result = await Runner.run(agent, "Write a haiku about coding.")
    print(f"Output: {result.final_output}")
    print(f"Steps: {len(result.raw_responses)}")

asyncio.run(main())
Runner States: The Runner transitions through: pending → running → tool_calling → running → complete. Each LLM call may produce tool calls (looping back) or a final output (completing the run).
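The state sequence described above can be written down as a small transition table. This is an illustrative model only: `RunState`, `ALLOWED`, and `is_valid_path` are hypothetical names, the state labels are taken from this article's description, and none of them are identifiers exposed by the SDK.

```python
from enum import Enum


class RunState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    TOOL_CALLING = "tool_calling"
    COMPLETE = "complete"


# Which states may follow each state. TOOL_CALLING loops back to RUNNING,
# so a run can make any number of tool calls before completing.
ALLOWED = {
    RunState.PENDING: {RunState.RUNNING},
    RunState.RUNNING: {RunState.TOOL_CALLING, RunState.COMPLETE},
    RunState.TOOL_CALLING: {RunState.RUNNING},
    RunState.COMPLETE: set(),
}


def is_valid_path(path: list[RunState]) -> bool:
    """Return True if the path starts at PENDING and every step is allowed."""
    if not path or path[0] is not RunState.PENDING:
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))


one_tool_call = [
    RunState.PENDING,
    RunState.RUNNING,
    RunState.TOOL_CALLING,
    RunState.RUNNING,
    RunState.COMPLETE,
]
print(is_valid_path(one_tool_call))
```

A table like this is handy in tests or logging code when you want to assert that a recorded run never skipped a step, for example jumping from a tool call straight to completion.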

4. Multi-Agent Handoffs

Handoffs let one agent delegate to another based on the conversation context. A triage agent can route queries to specialist agents, each with their own instructions, tools, and model configuration.

from agents import Agent, Runner

technical_agent = Agent(
    name="Technical Support",
    instructions="Help users with technical issues, bugs, and API questions.",
    model="gpt-4.1",
)

billing_agent = Agent(
    name="Billing Support",
    instructions="Help users with billing, invoices, and subscription questions.",
    model="gpt-4.1-mini",
)

# Specialists are defined first so the triage agent can reference the
# Agent objects directly; handoffs expects agents, not name strings.
triage_agent = Agent(
    name="Triage",
    instructions="Route user queries to the appropriate specialist agent.",
    model="gpt-4.1-mini",
    handoffs=[technical_agent, billing_agent],
)

result = Runner.run_sync(triage_agent, "My API calls are returning 429 errors")
print(f"Handled by: {result.last_agent.name}")
print(f"Response: {result.final_output}")
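In the SDK, the triage model decides which handoff to take. When testing a routing setup, it can help to have a deterministic baseline to compare against. The sketch below is one such baseline in plain Python: `Specialist`, `SPECIALISTS`, and `route` are hypothetical names, and the keyword lists are assumptions chosen for this example rather than anything the SDK provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specialist:
    name: str
    keywords: tuple[str, ...]


SPECIALISTS = [
    Specialist("Technical Support", ("api", "error", "bug", "429")),
    Specialist("Billing Support", ("invoice", "billing", "refund", "subscription")),
]


def route(query: str, default: str = "Triage") -> str:
    """Pick the specialist whose keywords appear most often in the query."""
    text = query.lower()
    # Count how many of each specialist's keywords occur as substrings.
    scores = {s.name: sum(k in text for k in s.keywords) for s in SPECIALISTS}
    best = max(scores, key=scores.get)
    # With no keyword hits, stay with the default (triage) agent.
    return best if scores[best] > 0 else default


print(route("My API calls are returning 429 errors"))
```

Comparing `result.last_agent.name` from a real run against `route(...)` on a set of sample queries is a cheap way to spot cases where the triage instructions need tightening. Substring matching is crude, so treat disagreements as prompts for review rather than as failures.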

5. Guardrails

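Guardrails validate inputs and outputs before and after agent execution. Input guardrails can block harmful requests; output guardrails can filter sensitive data from responses. When a guardrail’s tripwire_triggered is True, the run halts immediately and the SDK raises a tripwire exception (InputGuardrailTripwireTriggered for input guardrails) instead of returning a final output, so callers should be prepared to catch it.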

from agents import Agent, Runner, InputGuardrail, GuardrailFunctionOutput
from pydantic import BaseModel

class SafetyCheck(BaseModel):
    is_safe: bool
    reason: str

async def check_input(ctx, agent, input_text):
    """Block harmful or off-topic inputs."""
    if any(word in input_text.lower() for word in ["hack", "exploit", "bypass"]):
        return GuardrailFunctionOutput(
            output_info=SafetyCheck(is_safe=False, reason="Potentially harmful request"),
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(
        output_info=SafetyCheck(is_safe=True, reason="Input is safe"),
        tripwire_triggered=False,
    )

agent = Agent(
    name="Safe Assistant",
    instructions="Help users with legitimate questions only.",
    model="gpt-4.1-mini",
    input_guardrails=[InputGuardrail(guardrail_function=check_input)],
)

result = Runner.run_sync(agent, "How do I optimize my database queries?")
print(result.final_output)
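The example above covers the input side. For the output side, the core of a guardrail is usually a plain function that inspects the response text. The following standalone sketch shows one possible check that redacts email addresses and trips when a response contains too many of them. `OutputCheck`, `check_output`, and the `max_redactions` threshold are hypothetical and not part of the SDK; the regular expression is a simple approximation, not a full email validator.

```python
import re
from dataclasses import dataclass

# Rough email pattern: local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class OutputCheck:
    text: str                 # response with emails redacted
    redactions: int           # how many addresses were replaced
    tripwire_triggered: bool  # True when the response should be blocked


def check_output(text: str, max_redactions: int = 2) -> OutputCheck:
    """Redact email addresses; trip if more than max_redactions were found."""
    redacted, count = EMAIL_RE.subn("[redacted email]", text)
    return OutputCheck(redacted, count, count > max_redactions)


check = check_output("Contact jane@example.com for help.")
print(check.text)
```

A function like this could be wrapped in the SDK's output guardrail mechanism, mapping `tripwire_triggered` onto the guardrail result in the same way `check_input` does above.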

6. Tracing & Observability

The Agents SDK includes built-in tracing that records every LLM call, tool invocation, and handoff. Wrap workflows in a trace() context to group related operations and attach metadata for debugging in the OpenAI Dashboard.

from agents import Agent, Runner, trace

agent = Agent(name="Analyst", instructions="Analyze data.", model="gpt-4.1-mini")

# trace() takes a workflow name plus optional group_id (to link related
# traces, such as one session) and a metadata dict.
with trace(
    "analysis-workflow",
    group_id="sess-456",
    metadata={"user_id": "user-123"},
) as t:
    result = Runner.run_sync(agent, "Summarize Q4 revenue trends.")

print(f"Trace ID: {t.trace_id}")
print(f"Output: {result.final_output}")
# View traces in OpenAI Dashboard → Traces
Production Tip: Always attach user_id and session_id attributes to traces. This enables filtering and debugging specific user sessions in the OpenAI Dashboard’s Traces view.
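If it helps to see what a trace amounts to underneath, the sketch below builds a tiny stand-in using only the standard library. `MiniTrace` and `mini_trace` are hypothetical names for illustration; the SDK's own tracing records far more detail and exports it for you. The point is only to show the structure: one trace with an ID and metadata, containing an ordered list of timed spans.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class MiniTrace:
    name: str
    metadata: dict
    trace_id: str = field(default_factory=lambda: f"trace_{uuid.uuid4().hex}")
    spans: list = field(default_factory=list)

    @contextmanager
    def span(self, label: str):
        """Time one operation and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spans.append((label, time.perf_counter() - start))


@contextmanager
def mini_trace(name: str, **metadata):
    """Group related spans under one trace with attached metadata."""
    yield MiniTrace(name=name, metadata=metadata)


with mini_trace("analysis-workflow", user_id="user-123", session_id="sess-456") as t:
    with t.span("llm_call"):
        time.sleep(0.01)  # stand-in for a model call
    with t.span("tool:search"):
        pass  # stand-in for a tool invocation

print(t.trace_id, [label for label, _ in t.spans])
```

Attaching `user_id` and `session_id` at the trace level, as done here, is what makes it possible to filter all spans for one user session later.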

7. How Agents SDK Relates to the Apps SDK

The Agents SDK and Apps SDK solve different but adjacent problems. The Agents SDK is code-first orchestration for agent workflows: handoffs, tools, tracing, approvals, and execution loops. The Apps SDK is the packaging and UX layer for shipping app experiences into ChatGPT using MCP servers, rich metadata, authentication, state management, and review-ready conversational interfaces.

That distinction matters when you choose where to invest effort. If you are building backend logic or internal copilots, start with the Agents SDK. If you want the same capabilities surfaced as a polished ChatGPT app, you will likely pair your agent logic with an Apps SDK-compatible MCP server and app metadata.

| Need | Agents SDK | Apps SDK |
|---|---|---|
| Backend orchestration | Runner loops, handoffs, guardrails, tracing | Usually not the primary layer |
| ChatGPT app distribution | Can power the backend behavior | Primary packaging and UX layer |
| MCP server integration | Consume MCP-backed tools inside agents | Expose MCP servers and UI to ChatGPT |
| App review readiness | Not the focus | Metadata, security, UX, submission flow |
Practical roadmap: Build the agent behavior first, stabilize tools and traces, then wrap the workflow in an Apps SDK surface if you need a ChatGPT-native distribution channel, authenticated user sessions, or a reviewable app experience.

8. Sandbox Agents

Sandbox agents run in container-based environments where the agent has access to files, shell commands, packages, and persistent state across tool calls. This is the right choice when your agent needs to install dependencies, execute code, manipulate files, or work with project structures.

from agents import Agent, Runner, function_tool
from agents.extensions.sandbox import SandboxEnvironment

# Create an agent with a sandboxed shell environment
agent = Agent(
    name="CodeReviewer",
    instructions="""You are a code review assistant. When given code, you can:
    1. Save it to a file in the sandbox
    2. Run linters and type checkers
    3. Execute tests
    4. Report findings""",
    model="gpt-4.1",
    sandbox=SandboxEnvironment(
        image="python:3.12-slim",  # Container image
        packages=["ruff", "mypy", "pytest"],  # Pre-installed packages
    ),
)

result = Runner.run_sync(
    agent,
    "Review this Python function for issues: def add(a, b): return a + b",
)
print(result.final_output)
When to use sandboxes: File manipulation workflows (code generation, document editing), tool-heavy agents that need pip install or npm install, long-running computations that benefit from persistent state between tool calls, or any workflow where you want deterministic, isolated execution.
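For local experiments before reaching for a container, a throwaway working directory with a timeout gives a rough feel for the workflow of "save the code, run it, collect the output". The sketch below uses only the standard library, and `run_in_scratch_dir` is a hypothetical helper name. It is not a security boundary: the child process still has the permissions of the current user, which is exactly the gap that container-based sandboxes are meant to close.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_dir(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Write code to a temporary directory, run it there, and return the result."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        # -I runs Python in isolated mode (ignores PYTHON* env vars and user site).
        # The directory and everything in it is deleted when the block exits.
        return subprocess.run(
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )


result = run_in_scratch_dir("def add(a, b): return a + b\nprint(add(2, 3))")
print(result.returncode, result.stdout.strip())
```

Because the directory is removed after each call, nothing persists between runs here. Persistent state across tool calls, as described above, is one of the things a real sandbox environment adds.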

9. Voice Agents

Voice agents combine the Agents SDK with the Realtime API for speech-to-speech workflows. The agent loop runs tool calls and handoffs while maintaining a live voice session with the user. Agent Builder does not currently support voice workflows, so voice agents use the SDK path exclusively.

from agents import Agent, Runner, function_tool
from agents.voice import VoiceWorkflow

@function_tool
def check_order_status(order_id: str) -> str:
    """Look up an order's current shipping status."""
    return f"Order {order_id} shipped on May 20, arriving May 27."

voice_agent = Agent(
    name="VoiceSupport",
    instructions="You are a friendly voice support agent. Keep responses brief and conversational.",
    model="gpt-realtime-2",
    tools=[check_order_status],
)

# VoiceWorkflow handles the realtime session, VAD, interruptions, and tool execution
workflow = VoiceWorkflow(agent=voice_agent)
# In production, connect this to WebRTC or WebSocket audio streams
print(f"Voice agent ready: {voice_agent.name}")

10. Agent Builder & ChatKit

Agent Builder is OpenAI’s hosted visual workflow editor for creating agents without writing orchestration code. You define nodes, connect tools, set safety constraints, and publish directly. ChatKit provides the embedded UI for deploying published workflows as chat experiences. Use Agent Builder when you want rapid prototyping with a visual interface, or when non-engineers need to create agent workflows.

| Feature | Agents SDK (Code) | Agent Builder (Visual) |
|---|---|---|
| Authoring | Python/TypeScript code | Visual node editor |
| Deployment | Your infrastructure | OpenAI-hosted |
| Voice support | Full (Realtime API) | Not yet supported |
| Custom tools | Full control, any runtime | MCP servers, built-in tools |
| State management | Application-controlled | Managed conversations |
| UI embedding | Build your own | ChatKit widgets |
| Best for | Production systems, complex logic | Prototyping, non-engineer workflows |
Try It Yourself: Build a 3-agent customer support system using the Agents SDK: (1) a triage agent that classifies intent, (2) a billing agent that handles payment questions (with a guardrail blocking refunds > $100), (3) a technical agent that troubleshoots issues. Implement handoffs between agents and test with 5 different customer scenarios.
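As a starting point for the refund guardrail in this exercise, the check itself can be a small pure function that you later wrap in a guardrail. The sketch below is one possible approach, not a complete solution: `refund_exceeds_limit` is a hypothetical name, and it assumes amounts are written with a leading dollar sign in the same message that mentions a refund.

```python
import re

# Matches amounts such as $250, $49.99, or $1,200.50.
AMOUNT_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


def refund_exceeds_limit(message: str, limit: float = 100.0) -> bool:
    """Return True if the message asks for a refund above the limit."""
    if "refund" not in message.lower():
        return False
    # Strip thousands separators before converting to float.
    amounts = [float(m.replace(",", "")) for m in AMOUNT_RE.findall(message)]
    return any(amount > limit for amount in amounts)


print(refund_exceeds_limit("Please refund $250 for order 17"))
```

A keyword-and-regex check will miss phrasings like "two hundred dollars", so in a fuller solution you might have a small classifier agent extract the amount and use this function only as the final comparison.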

Next in the SDK Track

In OA Part 9: Realtime API, we’ll build voice-first agents with WebSocket streaming, real-time audio input/output, and event-driven interactions.