
PydanticAI SDK Track Part 3: Messages, Chat History & Model Requests

May 24, 2026 Wasil Zafar 35 min read

Manage conversation state with typed message objects, implement multi-turn chat with message history, make direct model requests for low-level control, and build stateful conversational agents.

Table of Contents

  1. Message Types
  2. Chat History Management
  3. Direct Model Requests
  4. Streaming & Message Events
  5. Conversation Patterns
What You’ll Learn: PydanticAI records every agent run as a sequence of typed message objects. In this part you’ll inspect those messages, pass them back as message_history to continue a conversation across runs, call a model directly when you need low-level control, stream responses as they are generated, and wrap it all in stateful chat sessions, including a FastAPI endpoint.

1. Message Types

1.1 ModelRequest & ModelResponse

PydanticAI uses a typed message system to represent all interactions. Every agent run produces a sequence of ModelRequest and ModelResponse objects that capture the full conversation:

from pydantic_ai import Agent
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    SystemPromptPart,
    UserPromptPart,
    TextPart,
)

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a helpful assistant."
)

result = agent.run_sync("What is Python?")

# Inspect the message history
for msg in result.all_messages():
    print(f"Kind: {msg.kind}")
    print(f"  Parts: {msg.parts}")
    print()

# Messages alternate: ModelRequest → ModelResponse → ModelRequest → ...
# First message is always a ModelRequest containing system + user prompts
# Response is a ModelResponse containing the model's text output

1.2 Message Parts

Each message contains typed “parts” representing different content types within that message:

from pydantic_ai import Agent
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    SystemPromptPart,
    UserPromptPart,
    TextPart,
    ToolCallPart,
    ToolReturnPart,
)

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You help with math."
)

@agent.tool_plain
def multiply(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

result = agent.run_sync("What is 7 times 8?")

# Walk through all messages to see the full interaction
for i, msg in enumerate(result.all_messages()):
    print(f"\n--- Message {i} ({msg.kind}) ---")
    for part in msg.parts:
        if isinstance(part, SystemPromptPart):
            print(f"  [System]: {part.content}")
        elif isinstance(part, UserPromptPart):
            print(f"  [User]: {part.content}")
        elif isinstance(part, TextPart):
            print(f"  [Text]: {part.content[:100]}...")
        elif isinstance(part, ToolCallPart):
            print(f"  [ToolCall]: {part.tool_name}({part.args})")
        elif isinstance(part, ToolReturnPart):
            print(f"  [ToolReturn]: {part.content}")

print(f"\nFinal answer: {result.data}")
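Serialization: The message types are dataclasses rather than Pydantic models, so they have no model_dump_json() method of their own. To store a conversation, serialize the whole list with ModelMessagesTypeAdapter from pydantic_ai.messages: ModelMessagesTypeAdapter.dump_json(messages) produces JSON bytes, and ModelMessagesTypeAdapter.validate_json(data) restores the list of messages. result.all_messages_json() is a shortcut for the dump step. This enables conversation persistence in databases.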

2. Chat History Management

2.1 Passing Message History

To continue a conversation across multiple agent runs, pass the message history from a previous result:

from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a helpful assistant. Remember context from earlier messages."
)

# First turn
result1 = agent.run_sync("My name is Alice and I work as a data scientist.")
print(f"Turn 1: {result1.data}")

# Second turn — pass history from first turn
result2 = agent.run_sync(
    "What tools do you recommend for my job?",
    message_history=result1.all_messages()
)
print(f"Turn 2: {result2.data}")

# Third turn — chain history forward
result3 = agent.run_sync(
    "What was my name again?",
    message_history=result2.all_messages()
)
print(f"Turn 3: {result3.data}")
# Output should reference "Alice" from the first message

2.2 Continuing Conversations Across Runs

For long-running chat applications, accumulate messages across multiple interactions:

from pydantic_ai import Agent
from pydantic_ai.messages import ModelRequest, ModelResponse

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a patient tutor. Build on previous explanations."
)

# Simulate a multi-turn conversation
conversation_history = []

questions = [
    "What is a variable in programming?",
    "Can you give me a Python example?",
    "How is that different from a constant?",
]

for question in questions:
    result = agent.run_sync(question, message_history=conversation_history)
    # Update history with the full conversation so far
    conversation_history = result.all_messages()
    print(f"Q: {question}")
    print(f"A: {result.data[:150]}...")
    print()

print(f"Total messages in history: {len(conversation_history)}")
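History vs New Messages: When you pass message_history, PydanticAI sends that history ahead of the new user message, so the model sees the full context. If the history is non-empty, the agent does not generate a new system prompt; it assumes the first request in the history already contains one. On the result, all_messages() returns the passed-in history plus the messages from the current run, while new_messages() returns only the messages from the current run.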

3. Direct Model Requests

3.1 Raw API Access

Sometimes you need low-level access to the model without agent orchestration. PydanticAI lets you use models directly:

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.messages import (
    ModelRequest,
    ModelResponse,
    SystemPromptPart,
    UserPromptPart,
)

# Create a model instance directly
model = OpenAIModel("gpt-4o")

# Build request messages manually
messages = [
    ModelRequest(parts=[
        SystemPromptPart(content="You are a translation assistant."),
        UserPromptPart(content="Translate 'Hello, how are you?' to French."),
    ])
]

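# Make a direct request (async)
import asyncio
from pydantic_ai.models import ModelRequestParameters

async def direct_call():
    # Model.request() takes model settings and request parameters in addition
    # to the messages. ModelRequestParameters() with no arguments assumes a
    # recent release where every field has a default; older releases require
    # explicit fields and return a (response, usage) tuple, so check your version.
    response = await model.request(messages, None, ModelRequestParameters())
    # response is a ModelResponse with parts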
    for part in response.parts:
        print(f"Part type: {type(part).__name__}")
        if hasattr(part, 'content'):
            print(f"Content: {part.content}")
    return response

result = asyncio.run(direct_call())

3.2 When to Use Direct vs Agent-Mediated Requests

Choose the right abstraction level based on your needs:

from pydantic_ai import Agent

# Use Agent when you need:
# - Structured output validation
# - Tool/function calling
# - Dependency injection
# - Automatic retries on validation failure
# - System prompt management

structured_agent = Agent(
    "openai:gpt-4o",
    result_type=dict,
    system_prompt="Extract key-value pairs from text."
)

# Use direct model requests when you need:
# - Raw text generation without validation
# - Custom message formatting
# - Bypassing agent orchestration overhead
# - Batch processing with minimal overhead
# - Implementing custom orchestration logic

# Example: for a simple completion, a bare agent with no tools or result type adds little overhead
simple_agent = Agent("openai:gpt-4o")
result = simple_agent.run_sync("Translate 'good morning' to Japanese.")
print(result.data)
Recommendation: Prefer agents over direct model requests in most cases. Agents provide retry logic, structured validation, and consistent error handling. Use direct requests only when you need fine-grained control over the request/response cycle or are building custom orchestration.

4. Streaming & Message Events

Real-World Application

Multi-Tenant SaaS Agent

A B2B platform deploys the same agent code for 100+ tenants. Dependencies inject the correct database, API keys, and configuration per tenant. Testing uses mock dependencies for fast unit tests. Result: same codebase serves all tenants with tenant-specific behavior through dependency injection alone.


4.1 Text Streaming

For real-time output display, use run_stream() to get tokens as they arrive:

import asyncio
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a storyteller. Write engaging short stories."
)

async def stream_story():
    chunks: list[str] = []
    async with agent.run_stream("Write a 3-paragraph story about a robot learning to paint.") as response:
        # delta=True yields only the newly generated text; without it,
        # stream_text() yields the full accumulated text on every iteration
        async for chunk in response.stream_text(delta=True):
            print(chunk, end="", flush=True)
            chunks.append(chunk)

    print("\n\n--- Stream complete ---")
    # Reassemble the full text from the streamed chunks
    final_result = "".join(chunks)
    print(f"Total length: {len(final_result)} characters")

asyncio.run(stream_story())

4.2 Structured Streaming

Even with structured output types, you can stream partial results as the model generates them:

import asyncio
from pydantic_ai import Agent
from typing_extensions import TypedDict

# total=False lets partially generated objects validate while fields are still missing
class ArticleSummary(TypedDict, total=False):
    title: str
    key_points: list[str]
    word_count: int
    reading_time_minutes: int

agent = Agent(
    "openai:gpt-4o",
    result_type=ArticleSummary,
    system_prompt="Summarize articles into structured format."
)

async def stream_structured():
    article_text = """
    Artificial intelligence has transformed software development in recent years.
    From code completion to automated testing, AI tools now assist developers
    at every stage of the development lifecycle. Key advances include large
    language models for code generation, AI-powered code review systems, and
    intelligent debugging assistants that can identify root causes of complex bugs.
    """

    result: ArticleSummary = {}
    async with agent.run_stream(f"Summarize this article:\n{article_text}") as response:
        # stream_text() only works for plain-text results; stream() yields the
        # partially validated object as more of the model output arrives
        async for partial in response.stream():
            print(partial)
            result = partial  # the last item is the fully validated result

    print("\n\n--- Validated Result ---")
    print(f"Title: {result.get('title')}")
    print(f"Key points: {result.get('key_points')}")
    print(f"Word count: {result.get('word_count')}")

asyncio.run(stream_structured())
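Stream + Validate: stream_text() is available only when the agent returns plain text; use stream_text(delta=True) for incremental chunks or stream_text() for the accumulated text so far. For a structured result_type, iterate response.stream() instead: each item is a partially validated object built from the output received so far, and the final item is validated in full against your result_type. If the complete response fails validation, an error is raised. Partial validation generally works best with TypedDict types, because a BaseModel with required fields can fail validation until every field has arrived.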

5. Conversation Patterns

5.1 Stateful Chat Agents

Build a reusable chat class that maintains conversation state across interactions:

from pydantic_ai import Agent
from pydantic_ai.messages import ModelRequest, ModelResponse

class ChatSession:
    """Manages a multi-turn conversation with history."""

    def __init__(self, model: str = "openai:gpt-4o", system_prompt: str = "You are helpful."):
        self.agent = Agent(model, system_prompt=system_prompt)
        self.history: list[ModelRequest | ModelResponse] = []

    def send(self, message: str) -> str:
        """Send a message and get a response, maintaining history."""
        result = self.agent.run_sync(message, message_history=self.history)
        self.history = result.all_messages()
        return result.data

    def reset(self):
        """Clear conversation history."""
        self.history = []

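    @property
    def turn_count(self) -> int:
        """Number of user turns in the conversation."""
        # Count only requests that carry a user prompt: the system prompt is a
        # part of the first request, and tool returns are also sent as requests
        return sum(
            1 for m in self.history
            if m.kind == "request" and any(p.part_kind == "user-prompt" for p in m.parts)
        )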

# Usage
chat = ChatSession(
    system_prompt="You are a Python tutor. Explain concepts simply with examples."
)

print(chat.send("What is a list comprehension?"))
print(f"\n[Turns: {chat.turn_count}]\n")

print(chat.send("Show me a more complex example with filtering."))
print(f"\n[Turns: {chat.turn_count}]\n")

print(chat.send("How does that compare to a generator expression?"))
print(f"\n[Turns: {chat.turn_count}]")

5.2 FastAPI Chat Endpoint

Integrate PydanticAI with FastAPI for a production chat API with session management:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.messages import ModelRequest, ModelResponse
from typing import Optional
import uuid

app = FastAPI()

# In-memory session store (use Redis/DB in production)
sessions: dict[str, list[ModelRequest | ModelResponse]] = {}

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a helpful customer support agent."
)

class ChatRequest(BaseModel):
    message: str
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    session_id: str
    turn_count: int

@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest):
    # Get or create session
    session_id = request.session_id or str(uuid.uuid4())
    history = sessions.get(session_id, [])

    # Run agent with history
    result = await agent.run(request.message, message_history=history)

    # Save updated history
    sessions[session_id] = result.all_messages()

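    # Count only requests that carry a user prompt (tool returns are requests too)
    turn_count = sum(
        1 for m in sessions[session_id]
        if m.kind == "request" and any(p.part_kind == "user-prompt" for p in m.parts)
    )

    return ChatResponse(
        response=result.data,
        session_id=session_id,
        turn_count=turn_count
    )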

@app.delete("/chat/{session_id}")
async def clear_session(session_id: str):
    if session_id in sessions:
        del sessions[session_id]
        return {"status": "cleared"}
    raise HTTPException(status_code=404, detail="Session not found")

# Run with: uvicorn main:app --reload
Production Considerations: The in-memory session store above is for demonstration only. In production: (1) Use Redis or a database for session persistence. (2) Add session expiration/TTL. (3) Implement message trimming when history exceeds the context window. (4) Add authentication and rate limiting per user.
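For point (3), one simple approach is to cut the history at a turn boundary so that a tool call is never separated from its tool return. The sketch below is an illustration under stated assumptions, not a PydanticAI feature: trim_history, is_user_turn, and fake are hypothetical helpers, and the messages are lightweight stand-ins that only mimic the kind and part_kind attributes of real message objects. It counts turns rather than tokens, so a real context-window limit would still need a token-based check.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def is_user_turn(message: Any) -> bool:
    """True if the message is a request that carries a user prompt."""
    return message.kind == "request" and any(
        part.part_kind == "user-prompt" for part in message.parts
    )


def trim_history(history: Sequence[Any], max_turns: int) -> list[Any]:
    """Keep only the messages belonging to the last max_turns user turns."""
    if max_turns <= 0:
        return []
    # Indices where a user turn starts; cutting here keeps tool calls paired
    turn_starts = [i for i, m in enumerate(history) if is_user_turn(m)]
    if len(turn_starts) <= max_turns:
        return list(history)
    return list(history[turn_starts[-max_turns]:])


def fake(kind: str, part_kind: str, content: str) -> SimpleNamespace:
    """Hypothetical stand-in for a message with a single part."""
    part = SimpleNamespace(part_kind=part_kind, content=content)
    return SimpleNamespace(kind=kind, parts=[part])


history = [
    fake("request", "user-prompt", "q1"),
    fake("response", "text", "a1"),
    fake("request", "user-prompt", "q2"),
    fake("response", "tool-call", "lookup"),
    fake("request", "tool-return", "42"),
    fake("response", "text", "a2"),
    fake("request", "user-prompt", "q3"),
    fake("response", "text", "a3"),
]

trimmed = trim_history(history, max_turns=2)
print([m.parts[0].content for m in trimmed])
```

One caveat: the system prompt lives in the first request, so trimming that message away also drops the system prompt. A production version would need to carry it over, for example by keeping the system prompt part and prepending it to the first retained request.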
Try It Yourself: Build an agent with 3 injected dependencies: (1) a database connection (use SQLite), (2) an HTTP client for external APIs, (3) a configuration object with feature flags. Write the agent, then write a test that mocks all 3 dependencies. Verify the agent works with both real and mocked dependencies.

Next in the PydanticAI SDK Track

In Part 4: Models & Multi-Provider Support, we’ll configure all 14 supported model providers — OpenAI, Anthropic, Google, AWS Bedrock, Groq, Ollama, and more. We’ll explore provider-specific features, fallback chains, and dynamic model switching.