1. Image Input
PydanticAI supports multimodal inputs — you can pass images alongside text prompts for vision-capable models. Images can be provided via URL reference or inline as base64-encoded binary data.
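To make "inline as base64-encoded binary data" concrete, here is a minimal standard-library sketch of what an inline image looks like once encoded as a data URL. The helper name `to_data_url` is hypothetical and not part of PydanticAI; it is assumed here that the library performs an equivalent encoding step when you hand it raw bytes.

```python
import base64


def to_data_url(data: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper, for illustration)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# Tiny payload so the encoded form is easy to read
print(to_data_url(b"hello", "text/plain"))  # data:text/plain;base64,aGVsbG8=
```

Base64 output is roughly a third larger than the raw bytes, which is worth keeping in mind when a provider limits request size.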
1.1 ImageUrl & BinaryContent Types
Use ImageUrl for publicly accessible images and BinaryContent for local files or dynamically generated images:
from pydantic_ai import Agent, ImageUrl

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are an image analysis assistant. Describe what you see in detail.",
)

# Pass an image via URL
result = agent.run_sync([
    "What's in this image? Describe the main elements.",
    ImageUrl(url="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"),
])
print(result.output)
For local files, read them as binary and pass with the appropriate MIME type:
from pathlib import Path

from pydantic_ai import Agent, BinaryContent

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You analyze architectural diagrams and identify components.",
)

# Read a local image file
image_path = Path("./diagrams/system-architecture.png")
image_bytes = image_path.read_bytes()

result = agent.run_sync([
    "Analyze this system architecture diagram. List all components and their connections.",
    BinaryContent(data=image_bytes, media_type="image/png"),
])
print(result.output)
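Hard-coding the MIME type works for a single file, but when paths vary it can be derived from the file extension. The following is a small sketch using only the standard library; `guess_media_type` is a hypothetical helper name, and the fallback value is an assumption you may want to replace with an explicit error.

```python
import mimetypes
from pathlib import Path


def guess_media_type(path: Path, default: str = "application/octet-stream") -> str:
    """Return the MIME type implied by the file extension, or `default` if unknown."""
    media_type, _encoding = mimetypes.guess_type(path.name)
    return media_type or default


print(guess_media_type(Path("./diagrams/system-architecture.png")))  # image/png
```

The returned string can then be passed as the `media_type` argument of `BinaryContent`.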
1.2 Multi-Image Prompts
Pass multiple images in a single prompt for comparison or comprehensive analysis:
from pydantic_ai import Agent, ImageUrl

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You compare images and identify differences.",
)

# Compare two images
result = agent.run_sync([
    "Compare these two UI mockups. What changed between version 1 and version 2?",
    ImageUrl(url="https://example.com/mockup-v1.png"),
    ImageUrl(url="https://example.com/mockup-v2.png"),
])
print(result.output)
Use BinaryContent for private or local files, and check model-specific limits on image count and total size per request.
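Checking image count and total size before sending a request could be sketched as below. The limits shown are placeholders, not real provider values, and `check_image_limits` is a hypothetical helper; look up the actual numbers in your model provider's documentation.

```python
MAX_IMAGES = 10                      # placeholder limit, not a real provider value
MAX_TOTAL_BYTES = 20 * 1024 * 1024   # placeholder limit (20 MiB)


def check_image_limits(
    images: list[bytes],
    max_images: int = MAX_IMAGES,
    max_total_bytes: int = MAX_TOTAL_BYTES,
) -> None:
    """Raise ValueError if the batch exceeds the configured count or size limits."""
    if len(images) > max_images:
        raise ValueError(f"Too many images: {len(images)} > {max_images}")
    total = sum(len(image) for image in images)
    if total > max_total_bytes:
        raise ValueError(f"Images too large: {total} bytes > {max_total_bytes}")


# Passes silently when the batch is within limits
check_image_limits([b"\x89PNG...", b"\xff\xd8..."])
```

Failing fast locally avoids paying for an upload that the provider would reject anyway.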
2. Audio, Video & Document Input
Beyond images, PydanticAI supports audio, video, and document inputs for models with those capabilities (Gemini, GPT-4o with audio):
from pathlib import Path

from pydantic_ai import Agent, BinaryContent

agent = Agent(
    "google-gla:gemini-2.0-flash",  # provider-prefixed model name
    system_prompt="You transcribe and analyze audio content.",
)

# Process an audio file
audio_bytes = Path("./recordings/meeting.mp3").read_bytes()

result = agent.run_sync([
    "Transcribe this audio and provide a summary of key points discussed.",
    # "audio/mpeg" is the registered MIME type for MP3 files
    BinaryContent(data=audio_bytes, media_type="audio/mpeg"),
])
print(result.output)
2.1 Video Understanding
from pathlib import Path

from pydantic_ai import Agent, BinaryContent

agent = Agent(
    "google-gla:gemini-2.0-flash",  # provider-prefixed model name
    system_prompt="You analyze video content and describe key scenes.",
)

# Process a short video clip
video_bytes = Path("./clips/product-demo.mp4").read_bytes()

result = agent.run_sync([
    "Describe what happens in this product demo video. "
    "List the main features being demonstrated.",
    BinaryContent(data=video_bytes, media_type="video/mp4"),
])
print(result.output)
2.2 PDF & Document Processing
from pathlib import Path

from pydantic import BaseModel
from pydantic_ai import Agent, BinaryContent


class InvoiceData(BaseModel):
    """Structured data extracted from an invoice."""

    vendor_name: str
    invoice_number: str
    total_amount: float
    due_date: str
    line_items: list[str]


agent = Agent(
    "openai:gpt-4o",
    output_type=InvoiceData,
    system_prompt="Extract structured data from invoice documents.",
)

# Process a PDF invoice
pdf_bytes = Path("./documents/invoice-2026-001.pdf").read_bytes()

result = agent.run_sync([
    "Extract all key information from this invoice.",
    BinaryContent(data=pdf_bytes, media_type="application/pdf"),
])

print(f"Vendor: {result.output.vendor_name}")
print(f"Invoice #: {result.output.invoice_number}")
print(f"Total: ${result.output.total_amount:.2f}")
print(f"Due: {result.output.due_date}")
print(f"Items: {result.output.line_items}")
3. Thinking / Reasoning Mode
Thinking mode enables extended reasoning for models that support it (Claude with extended thinking, Gemini with a thinking budget). The model spends additional “thinking tokens” reasoning about the problem before producing its final answer, which can improve accuracy on complex, multi-step tasks at the cost of extra latency and tokens.
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Assumes a PydanticAI release whose ModelSettings supports the `thinking` key.
# On releases that do not, use the provider-specific settings instead
# (for Anthropic: AnthropicModelSettings with `anthropic_thinking`).
agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    system_prompt="You are a precise reasoning assistant.",
    model_settings=ModelSettings(thinking=True),
)

result = agent.run_sync(
    "A farmer has a fox, a chicken, and a bag of grain. "
    "He needs to cross a river in a boat that can only carry him and one item at a time. "
    "If left alone, the fox will eat the chicken, and the chicken will eat the grain. "
    "How does he get everything across safely? Show your reasoning step by step."
)
print(result.output)
3.1 Accessing Thinking Tokens
You can inspect the thinking content to understand the model’s reasoning process — useful for debugging and evaluation:
from pydantic_ai import Agent
from pydantic_ai.messages import ModelResponse, ThinkingPart
from pydantic_ai.settings import ModelSettings

agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    model_settings=ModelSettings(thinking=True),
)

result = agent.run_sync("What is 847 * 293? Show your work.")

# Access the final output
print(f"Answer: {result.output}")

# Thinking is returned as ThinkingPart items inside each ModelResponse
for message in result.all_messages():
    if not isinstance(message, ModelResponse):
        continue
    for part in message.parts:
        if isinstance(part, ThinkingPart):
            print("\n--- Model Thinking ---")
            print(part.content[:500])
            print("...")
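For evaluation, the final answer to an arithmetic prompt like this one can be checked against a value computed locally rather than read by eye. A minimal sketch, assuming the answer is free-form text; `answer_contains` is a hypothetical helper and not part of PydanticAI.

```python
import re


def answer_contains(expected: int, answer: str) -> bool:
    """Return True if `expected` appears among the integers found in `answer`."""
    # Match digit runs that may use commas as thousands separators, e.g. "248,171"
    tokens = re.findall(r"\d[\d,]*", answer)
    numbers = {int(token.replace(",", "")) for token in tokens}
    return expected in numbers


expected = 847 * 293  # computed locally instead of trusting the model
print(expected)  # 248171
print(answer_contains(expected, "847 * 293 = 248,171"))      # True
print(answer_contains(expected, "The product is 248,271."))  # False
```

In a real evaluation you would pass `result.output` as the `answer` argument.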
3.2 Thinking Budget Control
CI/CD for AI Agents
A startup runs 200 agent tests in its CI pipeline; the suite averages 3 seconds in total and incurs zero API costs. The team mocks models for unit tests, uses a cheap model (GPT-4o mini) for integration tests, and runs a nightly evaluation suite with the production model. Prompt regressions are caught before merge, and the test suite has prevented 12 production incidents in 6 months.
Control how much reasoning effort the model invests. Higher budgets improve accuracy but increase cost and latency:
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Assumes a PydanticAI release whose ModelSettings supports `thinking` and
# `thinking_budget`; otherwise set the budget through provider-specific settings.

# Minimal thinking: fast and cheap for simple tasks
quick_agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    model_settings=ModelSettings(thinking=True, thinking_budget=1024),
)

# Deep thinking: maximum reasoning for complex tasks
deep_agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    model_settings=ModelSettings(thinking=True, thinking_budget=16384),
)

# Simple question: minimal thinking is sufficient
result_quick = quick_agent.run_sync("What is the capital of Japan?")
print(f"Quick: {result_quick.output}")

# Complex question: deep thinking improves accuracy
result_deep = deep_agent.run_sync(
    "Design a distributed consensus algorithm that handles Byzantine faults "
    "with at most f faulty nodes in a network of 3f+1 total nodes. "
    "Explain the message complexity."
)
print(f"Deep: {result_deep.output[:200]}...")
4. HTTP Request Retries
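PydanticAI supports timeouts and retries for transient HTTP failures (rate limits, timeouts, server errors). Timeouts are set per request through ModelSettings; the retry count is configured on the provider's HTTP client. The example below uses the OpenAI SDK client, which retries transient errors itself:
from httpx import Timeout
from openai import AsyncOpenAI
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.settings import ModelSettings

# The OpenAI SDK client retries transient errors on its own
client = AsyncOpenAI(max_retries=3)  # Retry up to 3 times on transient errors
model = OpenAIChatModel("gpt-4o", provider=OpenAIProvider(openai_client=client))

agent = Agent(
    model,
    model_settings=ModelSettings(
        timeout=Timeout(
            connect=5.0,  # Connection timeout
            read=30.0,    # Read timeout
            write=10.0,   # Write timeout
            pool=5.0,     # Pool timeout
        ),
    ),
)

result = agent.run_sync("Generate a haiku about resilient systems.")
print(result.output)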
4.1 Custom Retry Configuration
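For fine-grained control over retry behavior, configure the backoff strategy and retryable status codes with the tenacity-based transport in pydantic_ai.retries (requires the tenacity package):
from httpx import AsyncClient, HTTPStatusError
from tenacity import retry_if_exception_type, stop_after_attempt, wait_exponential
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.retries import AsyncTenacityTransport, RetryConfig, wait_retry_after

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def raise_on_retryable_status(response):
    """Turn retryable HTTP statuses into exceptions so they are retried."""
    if response.status_code in RETRYABLE_STATUS_CODES:
        response.raise_for_status()

transport = AsyncTenacityTransport(
    config=RetryConfig(
        retry=retry_if_exception_type(HTTPStatusError),
        # Honor Retry-After headers, else back off exponentially: 1s, 2s, 4s, 8s, 16s
        wait=wait_retry_after(
            fallback_strategy=wait_exponential(multiplier=1, max=60),
            max_wait=60,  # Cap any single wait at 60 seconds
        ),
        stop=stop_after_attempt(6),  # 1 initial attempt + 5 retries
        reraise=True,
    ),
    validate_response=raise_on_retryable_status,
)
client = AsyncClient(transport=transport)
agent = Agent(OpenAIChatModel("gpt-4o", provider=OpenAIProvider(http_client=client)))

# Rate limits (429) are now retried automatically with exponential backoff
result = agent.run_sync("Summarize the key principles of distributed systems.")
print(result.output)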
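The exponential schedule described above (start at 1 second, double each time, cap at 60 seconds) can be worked out in plain Python. This is only an illustration of the arithmetic; `backoff_delays` and its parameter names are hypothetical and do not belong to any library.

```python
def backoff_delays(
    retries: int,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    max_delay: float = 60.0,
) -> list[float]:
    """Return the wait before each retry: initial_delay * factor**n, capped at max_delay."""
    return [
        min(initial_delay * backoff_factor**attempt, max_delay)
        for attempt in range(retries)
    ]


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_delays(8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With five retries the worst case adds 31 seconds of waiting on top of the request time, which is worth weighing against any overall deadline your caller has.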
5. Error Handling Patterns
Production agents must handle errors gracefully. PydanticAI raises specific exception types you can catch and respond to appropriately:
import asyncio
import logging

from pydantic_ai import Agent
from pydantic_ai.exceptions import (
    AgentRunError,
    ModelHTTPError,
    UnexpectedModelBehavior,
)

logger = logging.getLogger(__name__)

agent = Agent("openai:gpt-4o", system_prompt="You are a helpful assistant.")


async def safe_agent_call(prompt: str) -> str:
    """Execute an agent call with comprehensive error handling."""
    try:
        result = await agent.run(prompt)
        return result.output
    except ModelHTTPError as e:
        # API errors reported by the provider (rate limits, server errors, auth failures)
        logger.error("Model API error: %s - %s", e.status_code, e.message)
        if e.status_code == 429:
            return "I'm experiencing high demand. Please try again in a moment."
        elif e.status_code == 401:
            return "Authentication error. Please check API key configuration."
        else:
            return f"Service temporarily unavailable (HTTP {e.status_code})."
    except UnexpectedModelBehavior as e:
        # Model returned an unexpected format or invalid response
        logger.warning("Unexpected model behavior: %s", e)
        return "I encountered an unexpected response. Let me try a simpler approach."
    except AgentRunError as e:
        # Any other agent run error (for example, usage limits exceeded)
        logger.error("Agent run failed: %s", e)
        return "I wasn't able to complete that request. Please try rephrasing."
    except Exception:
        # Catch-all for unexpected errors; logger.exception records the traceback
        logger.exception("Unexpected error in agent call")
        return "An unexpected error occurred. Our team has been notified."


# Usage
response = asyncio.run(safe_agent_call("What's the weather today?"))
print(response)
5.1 Graceful Degradation with Fallback Models
For critical applications, implement fallback to a secondary model when the primary fails:
import asyncio
import logging

from pydantic_ai import Agent
from pydantic_ai.exceptions import ModelHTTPError

logger = logging.getLogger(__name__)

# Primary: high-quality model
primary_agent = Agent("openai:gpt-4o", system_prompt="You are a helpful assistant.")

# Fallback: faster, more available model
fallback_agent = Agent("openai:gpt-4o-mini", system_prompt="You are a helpful assistant.")


async def resilient_call(prompt: str) -> str:
    """Try primary model, fall back to secondary on failure."""
    try:
        result = await primary_agent.run(prompt)
        return result.output
    except ModelHTTPError as e:
        logger.warning("Primary model failed (%s), falling back...", e.status_code)
        try:
            result = await fallback_agent.run(prompt)
            return result.output
        except ModelHTTPError as e2:
            logger.error("Fallback also failed (%s)", e2.status_code)
            return "All models are currently unavailable. Please try again later."


# Usage
response = asyncio.run(resilient_call("Explain microservices architecture briefly."))
print(response)
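The nested try/except above works for two models but grows awkward with three or more. One possible generalization is a loop over an ordered list of handlers. The sketch below uses only the standard library and stand-in coroutines in place of real agent calls; `first_successful`, `flaky_primary`, and `stable_fallback` are hypothetical names.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence

Handler = Callable[[str], Awaitable[str]]


async def first_successful(
    prompt: str,
    handlers: Sequence[Handler],
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> str:
    """Call handlers in order and return the first result that does not raise."""
    last_error = None
    for handler in handlers:
        try:
            return await handler(prompt)
        except retry_on as exc:
            last_error = exc  # remember the failure and try the next handler
    raise RuntimeError("All handlers failed") from last_error


# Stand-ins for real agent calls (assumptions for the demo)
async def flaky_primary(prompt: str) -> str:
    raise ConnectionError("primary unavailable")


async def stable_fallback(prompt: str) -> str:
    return f"fallback answered: {prompt}"


print(asyncio.run(first_successful("ping", [flaky_primary, stable_fallback])))
# fallback answered: ping
```

With real agents, each handler would wrap one `agent.run(prompt)` call and return `result.output`, and `retry_on` would be narrowed to `(ModelHTTPError,)` so that programming errors are not silently swallowed.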
Next in the PydanticAI SDK Track
In Part 9: Multi-Agent Patterns & Testing, we’ll build multi-agent systems with delegation and handoff patterns, implement comprehensive testing strategies with TestModel, and deploy production-grade agent applications.