1. Message Types
1.1 ModelRequest & ModelResponse
PydanticAI uses a typed message system to represent all interactions. Every agent run produces a sequence of ModelRequest and ModelResponse objects that capture the full conversation:
from pydantic_ai import Agent
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    SystemPromptPart,
    UserPromptPart,
    TextPart,
)

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a helpful assistant."
)

result = agent.run_sync("What is Python?")

# Inspect the message history
for msg in result.all_messages():
    print(f"Kind: {msg.kind}")
    print(f" Parts: {msg.parts}")
    print()

# Messages alternate: ModelRequest → ModelResponse → ModelRequest → ...
# First message is always a ModelRequest containing system + user prompts
# Response is a ModelResponse containing the model's text output
1.2 Message Parts
Each message contains typed “parts” representing different content types within that message:
from pydantic_ai import Agent
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    SystemPromptPart,
    UserPromptPart,
    TextPart,
    ToolCallPart,
    ToolReturnPart,
)

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You help with math."
)

@agent.tool_plain
def multiply(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

result = agent.run_sync("What is 7 times 8?")

# Walk through all messages to see the full interaction
for i, msg in enumerate(result.all_messages()):
    print(f"\n--- Message {i} ({msg.kind}) ---")
    for part in msg.parts:
        if isinstance(part, SystemPromptPart):
            print(f" [System]: {part.content}")
        elif isinstance(part, UserPromptPart):
            print(f" [User]: {part.content}")
        elif isinstance(part, TextPart):
            print(f" [Text]: {part.content[:100]}...")
        elif isinstance(part, ToolCallPart):
            print(f" [ToolCall]: {part.tool_name}({part.args})")
        elif isinstance(part, ToolReturnPart):
            print(f" [ToolReturn]: {part.content}")

print(f"\nFinal answer: {result.data}")
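Serialize a message history with ModelMessagesTypeAdapter.dump_json(messages) (or result.all_messages_json()) for storage, and deserialize it with ModelMessagesTypeAdapter.validate_json(data). The message classes are dataclasses rather than Pydantic models, so they have no model_dump_json() method of their own; the type adapter is imported from pydantic_ai.messages. This enables conversation persistence in databases.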
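As a rough illustration of the persistence step, the sketch below stores one serialized history per session in SQLite. It assumes the history has already been serialized to a JSON string by whatever serializer you use; the table name, column names, and the helpers open_store, save_history, and load_history are hypothetical and not part of PydanticAI.

```python
from __future__ import annotations

import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical conversation store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversations ("
        "session_id TEXT PRIMARY KEY, messages_json TEXT NOT NULL)"
    )
    return conn


def save_history(conn: sqlite3.Connection, session_id: str, messages_json: str) -> None:
    """Insert or overwrite the serialized history for one session."""
    conn.execute(
        "INSERT OR REPLACE INTO conversations (session_id, messages_json) VALUES (?, ?)",
        (session_id, messages_json),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, session_id: str) -> str | None:
    """Return the stored JSON string, or None for an unknown session."""
    row = conn.execute(
        "SELECT messages_json FROM conversations WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return row[0] if row else None
```

The loaded string would then be handed back to the deserializer before being passed as message_history. Storing the whole history as one value keeps the sketch short; a real schema might store one row per message instead.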
2. Chat History Management
2.1 Passing Message History
To continue a conversation across multiple agent runs, pass the message history from a previous result:
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a helpful assistant. Remember context from earlier messages."
)

# First turn
result1 = agent.run_sync("My name is Alice and I work as a data scientist.")
print(f"Turn 1: {result1.data}")

# Second turn: pass history from first turn
result2 = agent.run_sync(
    "What tools do you recommend for my job?",
    message_history=result1.all_messages()
)
print(f"Turn 2: {result2.data}")

# Third turn: chain history forward
result3 = agent.run_sync(
    "What was my name again?",
    message_history=result2.all_messages()
)
print(f"Turn 3: {result3.data}")
# Output should reference "Alice" from the first message
2.2 Continuing Conversations Across Runs
For long-running chat applications, accumulate messages across multiple interactions:
from pydantic_ai import Agent
from pydantic_ai.messages import ModelRequest, ModelResponse

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a patient tutor. Build on previous explanations."
)

# Simulate a multi-turn conversation
conversation_history: list[ModelRequest | ModelResponse] = []
questions = [
    "What is a variable in programming?",
    "Can you give me a Python example?",
    "How is that different from a constant?",
]

for question in questions:
    result = agent.run_sync(question, message_history=conversation_history)
    # Update history with the full conversation so far
    conversation_history = result.all_messages()
    print(f"Q: {question}")
    print(f"A: {result.data[:150]}...")
    print()

print(f"Total messages in history: {len(conversation_history)}")
When you pass message_history, PydanticAI prepends that history before the new user message. The system prompt is included in the first request message, and it is not regenerated when a non-empty history is supplied. The model sees the full context, which lets it stay coherent across turns.
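Because the full history is sent on every run, long conversations grow without bound. One possible way to cap it, sketched here under assumptions, is to keep the first message (which carries the system prompt) plus the most recent few. The helper trim_history is hypothetical, works on any sequence, and does not check request/response alternation or keep tool calls paired with their returns, so treat it as a starting point only.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def trim_history(history: Sequence[T], keep_last: int) -> list[T]:
    """Keep the first message plus the most recent `keep_last` messages."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    # Nothing to drop if the history already fits.
    if len(history) <= keep_last + 1:
        return list(history)
    tail = list(history[len(history) - keep_last:]) if keep_last else []
    return [history[0], *tail]


# Plain strings stand in for message objects here.
history = ["req0 (system + user)", "resp0", "req1", "resp1", "req2", "resp2"]
print(trim_history(history, keep_last=2))
```

The trimmed list would be passed as message_history in place of the full one. Whether a given provider accepts a history with a gap in the middle is something to verify before relying on this.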
3. Direct Model Requests
3.1 Raw API Access
Sometimes you need low-level access to the model without agent orchestration. PydanticAI lets you use models directly:
import asyncio

from pydantic_ai.direct import model_request
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    SystemPromptPart,
    UserPromptPart,
)

# Create a model instance directly
model = OpenAIModel("gpt-4o")

# Build request messages manually
messages = [
    ModelRequest(parts=[
        SystemPromptPart(content="You are a translation assistant."),
        UserPromptPart(content="Translate 'Hello, how are you?' to French."),
    ])
]

# Make a direct request (async).
# Model.request() also expects model_settings and model_request_parameters,
# so this uses the model_request() helper, which supplies defaults for both.
# Assumes a release that ships the pydantic_ai.direct module.
async def direct_call() -> ModelResponse:
    response = await model_request(model, messages)
    # response is a ModelResponse with parts
    for part in response.parts:
        print(f"Part type: {type(part).__name__}")
        if hasattr(part, 'content'):
            print(f"Content: {part.content}")
    return response

result = asyncio.run(direct_call())
3.2 When to Use Direct vs Agent-Mediated Requests
Choose the right abstraction level based on your needs:
from pydantic_ai import Agent

# Use Agent when you need:
# - Structured output validation
# - Tool/function calling
# - Dependency injection
# - Automatic retries on validation failure
# - System prompt management
structured_agent = Agent(
    "openai:gpt-4o",
    result_type=dict,
    system_prompt="Extract key-value pairs from text."
)

# Use direct model requests when you need:
# - Raw text generation without validation
# - Custom message formatting
# - Bypassing agent orchestration overhead
# - Batch processing with minimal overhead
# - Implementing custom orchestration logic

# Middle ground: a bare Agent with no tools, result type, or system prompt
# still goes through the agent loop, but needs the least setup
simple_agent = Agent("openai:gpt-4o")
result = simple_agent.run_sync("Translate 'good morning' to Japanese.")
print(result.data)
4. Streaming & Message Events
Multi-Tenant SaaS Agent
A B2B platform deploys the same agent code for 100+ tenants. Dependencies inject the correct database, API keys, and configuration per tenant. Testing uses mock dependencies for fast unit tests. Result: same codebase serves all tenants with tenant-specific behavior through dependency injection alone.
4.1 Text Streaming
For real-time output display, use run_stream() to get tokens as they arrive:
import asyncio

from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a storyteller. Write engaging short stories."
)

async def stream_story():
    async with agent.run_stream("Write a 3-paragraph story about a robot learning to paint.") as response:
        # Stream text deltas as they arrive; without delta=True each item
        # would be the full text so far and the output would repeat itself
        async for chunk in response.stream_text(delta=True):
            print(chunk, end="", flush=True)
        print("\n\n--- Stream complete ---")
        # After streaming, you can still access the full result.
        # get_data() is a coroutine, so it must be awaited.
        final_result = await response.get_data()
        print(f"Total length: {len(final_result)} characters")

asyncio.run(stream_story())
4.2 Structured Streaming
Even with structured output types, you can stream partial results as the model generates them:
import asyncio

from pydantic_ai import Agent
from pydantic import BaseModel

class ArticleSummary(BaseModel):
    title: str
    key_points: list[str]
    word_count: int
    reading_time_minutes: int

agent = Agent(
    "openai:gpt-4o",
    result_type=ArticleSummary,
    system_prompt="Summarize articles into structured format."
)

async def stream_structured():
    article_text = """
    Artificial intelligence has transformed software development in recent years.
    From code completion to automated testing, AI tools now assist developers
    at every stage of the development lifecycle. Key advances include large
    language models for code generation, AI-powered code review systems, and
    intelligent debugging assistants that can identify root causes of complex bugs.
    """
    async with agent.run_stream(f"Summarize this article:\n{article_text}") as response:
        # stream_text() only works for plain-text results, so iterate over
        # partially validated objects instead. Early partials may be incomplete,
        # and some releases raise rather than skip a partial that fails validation;
        # a TypedDict with total=False or default field values avoids that.
        async for partial in response.stream():
            print(partial)
        print("\n--- Validated Result ---")
        # get_data() is a coroutine, so it must be awaited
        result = await response.get_data()
        print(f"Title: {result.title}")
        print(f"Key points: {result.key_points}")
        print(f"Word count: {result.word_count}")

asyncio.run(stream_structured())
The streamed output is validated against result_type once complete. If validation fails, the error is raised when you await get_data(). For plain-text results, use stream_text(delta=True) for incremental chunks or stream_text() for accumulated text.
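The difference between incremental chunks and accumulated text can be shown without calling a model. In this minimal sketch, fake_deltas is a hypothetical stand-in for a model stream, and accumulate turns its chunks into the running text that a non-delta stream would yield; neither function is part of PydanticAI.

```python
import asyncio
from typing import AsyncIterator


async def fake_deltas() -> AsyncIterator[str]:
    """Hypothetical stand-in for a model stream: yields incremental chunks."""
    for chunk in ["Once ", "upon ", "a time."]:
        await asyncio.sleep(0)  # give control back to the event loop
        yield chunk


async def accumulate(deltas: AsyncIterator[str]) -> AsyncIterator[str]:
    """Turn incremental chunks into the accumulated text seen so far."""
    text = ""
    async for chunk in deltas:
        text += chunk
        yield text


async def collect(stream: AsyncIterator[str]) -> list:
    """Drain an async stream into a list."""
    return [item async for item in stream]


deltas = asyncio.run(collect(fake_deltas()))
accumulated = asyncio.run(collect(accumulate(fake_deltas())))
print(deltas)       # ['Once ', 'upon ', 'a time.']
print(accumulated)  # ['Once ', 'Once upon ', 'Once upon a time.']
```

Deltas suit appending to a terminal or a chat bubble, while accumulated text suits a UI that re-renders the whole message on each update.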
5. Conversation Patterns
5.1 Stateful Chat Agents
Build a reusable chat class that maintains conversation state across interactions:
from pydantic_ai import Agent
from pydantic_ai.messages import ModelRequest, ModelResponse, UserPromptPart

class ChatSession:
    """Manages a multi-turn conversation with history."""

    def __init__(self, model: str = "openai:gpt-4o", system_prompt: str = "You are helpful."):
        self.agent = Agent(model, system_prompt=system_prompt)
        self.history: list[ModelRequest | ModelResponse] = []

    def send(self, message: str) -> str:
        """Send a message and get a response, maintaining history."""
        result = self.agent.run_sync(message, message_history=self.history)
        self.history = result.all_messages()
        return result.data

    def reset(self):
        """Clear conversation history."""
        self.history = []

    @property
    def turn_count(self) -> int:
        """Number of user turns in the conversation."""
        # The system prompt is a part of the first request, not a separate
        # message, and tool returns also arrive as requests, so count the
        # requests that actually contain a user prompt.
        return sum(
            1
            for m in self.history
            if isinstance(m, ModelRequest)
            and any(isinstance(p, UserPromptPart) for p in m.parts)
        )

# Usage
chat = ChatSession(
    system_prompt="You are a Python tutor. Explain concepts simply with examples."
)
print(chat.send("What is a list comprehension?"))
print(f"\n[Turns: {chat.turn_count}]\n")
print(chat.send("Show me a more complex example with filtering."))
print(f"\n[Turns: {chat.turn_count}]\n")
print(chat.send("How does that compare to a generator expression?"))
print(f"\n[Turns: {chat.turn_count}]")
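A turn count can be derived by counting user prompt parts, which stays correct when tool calls add extra request messages to the history. The sketch below uses hypothetical stand-in dataclasses (Part, Message) instead of the real message types so that it runs without a model; the part_kind strings are chosen to resemble the library's, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    part_kind: str  # e.g. "system-prompt", "user-prompt", "tool-return", "text"
    content: str


@dataclass
class Message:
    kind: str  # "request" or "response"
    parts: list = field(default_factory=list)


def count_user_turns(history: list) -> int:
    """Count request messages that contain at least one user prompt."""
    return sum(
        1
        for m in history
        if m.kind == "request" and any(p.part_kind == "user-prompt" for p in m.parts)
    )


# One user question that triggers a tool call: four messages, one user turn.
history = [
    Message("request", [Part("system-prompt", "You help with math."),
                        Part("user-prompt", "What is 7 times 8?")]),
    Message("response", [Part("tool-call", "multiply")]),
    Message("request", [Part("tool-return", "56")]),
    Message("response", [Part("text", "7 times 8 is 56.")]),
]
print(count_user_turns(history))  # 1
```

Counting every message whose kind is "request" would report two turns here, because the tool return is also delivered as a request.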
5.2 FastAPI Chat Endpoint
Integrate PydanticAI with FastAPI for a production chat API with session management:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.messages import ModelRequest, ModelResponse, UserPromptPart
from typing import Optional
import uuid

app = FastAPI()

# In-memory session store (use Redis/DB in production)
sessions: dict[str, list[ModelRequest | ModelResponse]] = {}

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a helpful customer support agent."
)

class ChatRequest(BaseModel):
    message: str
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    session_id: str
    turn_count: int

def count_user_turns(history: list[ModelRequest | ModelResponse]) -> int:
    """Count requests that contain a user prompt (tool returns are excluded)."""
    return sum(
        1
        for m in history
        if isinstance(m, ModelRequest)
        and any(isinstance(p, UserPromptPart) for p in m.parts)
    )

@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest):
    # Get or create session
    session_id = request.session_id or str(uuid.uuid4())
    history = sessions.get(session_id, [])
    # Run agent with history
    result = await agent.run(request.message, message_history=history)
    # Save updated history
    sessions[session_id] = result.all_messages()
    return ChatResponse(
        response=result.data,
        session_id=session_id,
        turn_count=count_user_turns(sessions[session_id]),
    )

@app.delete("/chat/{session_id}")
async def clear_session(session_id: str):
    if session_id in sessions:
        del sessions[session_id]
        return {"status": "cleared"}
    raise HTTPException(status_code=404, detail="Session not found")

# Run with: uvicorn main:app --reload
Next in the PydanticAI SDK Track
In Part 4: Models & Multi-Provider Support, we’ll configure all 14 supported model providers — OpenAI, Anthropic, Google, AWS Bedrock, Groq, Ollama, and more. We’ll explore provider-specific features, fallback chains, and dynamic model switching.