PydanticAI SDK Track Part 4: Models & Multi-Provider Support

May 24, 2026 Wasil Zafar 40 min read

Configure PydanticAI's supported model providers: OpenAI, Anthropic, Google, xAI, AWS Bedrock, Cerebras, Cohere, Groq, Hugging Face, Mistral, Ollama, OpenRouter, and Outlines. Covers provider-specific setup, fallback chains, and model switching.

Table of Contents

  1. Provider Overview
  2. Major Providers
  3. Cloud & Specialized Providers
  4. Open-Source & Local Providers
  5. Model Switching & Fallbacks

What You’ll Learn: PydanticAI separates agent logic from the model that runs it, so one agent definition can target OpenAI, Anthropic, Google, Groq, a local Ollama server, or any OpenAI-compatible endpoint. This article covers the provider:model-name string format, per-provider setup, the OpenAI-compatible pattern for providers without a dedicated integration, runtime model overrides, and fallback chains.

1. Provider Overview

1.1 Model-Agnostic Design

PydanticAI’s architecture decouples agent logic from model providers. You define agents once with tools, prompts, and output types, then swap models freely without changing application code:

from pydantic_ai import Agent
from pydantic import BaseModel

class TaskResult(BaseModel):
    answer: str
    confidence: float

# Define agent logic ONCE
agent = Agent(
    "openai:gpt-4o",  # Default model
    output_type=TaskResult,  # named result_type in older PydanticAI releases
    system_prompt="Answer questions with confidence scores."
)

# Same agent, different providers — no code changes needed
result_openai = agent.run_sync("What is recursion?")
print(f"OpenAI: {result_openai.output.answer[:50]}... ({result_openai.output.confidence})")

result_anthropic = agent.run_sync("What is recursion?", model="anthropic:claude-sonnet-4-20250514")
print(f"Anthropic: {result_anthropic.output.answer[:50]}... ({result_anthropic.output.confidence})")

result_google = agent.run_sync("What is recursion?", model="google:gemini-2.0-flash")
print(f"Google: {result_google.output.answer[:50]}... ({result_google.output.confidence})")

1.2 Model String Format

PydanticAI uses a simple provider:model-name string format to identify models. Install the relevant provider package first:

# Install provider-specific dependencies (the optional extras live on the slim package)
pip install 'pydantic-ai-slim[openai]'       # OpenAI models
pip install 'pydantic-ai-slim[anthropic]'    # Anthropic models
pip install 'pydantic-ai-slim[google]'       # Google Gemini models
pip install 'pydantic-ai-slim[groq]'         # Groq inference
pip install 'pydantic-ai-slim[mistral]'      # Mistral models
pip install 'pydantic-ai-slim[cohere]'       # Cohere Command models
pip install 'pydantic-ai-slim[bedrock]'      # AWS Bedrock

# Install several providers in one command
pip install 'pydantic-ai-slim[openai,anthropic,google,groq]'

# Or install the full package, which bundles the provider integrations
pip install pydantic-ai

Model String Convention: The format is always "provider:model-id". Examples: "openai:gpt-4o", "anthropic:claude-sonnet-4-20250514", "google:gemini-2.0-flash", "groq:llama-3.3-70b-versatile". Some providers also accept model objects for advanced configuration.
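
To make the convention concrete, here is a minimal sketch of splitting a model string into its two parts. The names ModelRef and parse_model_string are hypothetical helpers written for this illustration and are not part of PydanticAI. The split happens on the first colon only, because some model IDs (the Bedrock IDs later in this article, for example) contain a colon themselves.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """Hypothetical container for the two halves of a model string."""
    provider: str
    model_id: str


def parse_model_string(value: str) -> ModelRef:
    """Split 'provider:model-id' on the FIRST colon only."""
    provider, sep, model_id = value.partition(":")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider:model-id', got {value!r}")
    return ModelRef(provider, model_id)


print(parse_model_string("openai:gpt-4o"))
print(parse_model_string("groq:llama-3.3-70b-versatile"))
```

A check like this is handy for validating model names that arrive from configuration files before they reach an agent.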

2. Major Providers (OpenAI, Anthropic, Google)

2.1 OpenAI

OpenAI is the most commonly used provider. Set the OPENAI_API_KEY environment variable and use model strings like openai:gpt-4o:

import os
from pydantic_ai import Agent

# Set API key (typically in .env or environment)
os.environ["OPENAI_API_KEY"] = "sk-..."

# Using model string shorthand
agent = Agent("openai:gpt-4o", system_prompt="You are helpful.")
result = agent.run_sync("What is the Pythagorean theorem?")
print(result.output)

# Available OpenAI models:
# "openai:gpt-4o"          — Flagship multimodal model
# "openai:gpt-4o-mini"     — Fast and affordable
# "openai:o1"              — Advanced reasoning
# "openai:o1-mini"         — Fast reasoning
# "openai:gpt-4-turbo"    — Previous generation

2.2 Anthropic

Anthropic’s Claude models excel at careful reasoning and long-context tasks. Set ANTHROPIC_API_KEY:

import os
from pydantic_ai import Agent

os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

# Anthropic model string format
agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    system_prompt="You are a careful, thoughtful assistant."
)
result = agent.run_sync("Explain the difference between concurrency and parallelism.")
print(result.output)

# Available Anthropic models:
# "anthropic:claude-sonnet-4-20250514"    — Best balance of intelligence and speed
# "anthropic:claude-opus-4-20250514"      — Most capable, complex reasoning
# "anthropic:claude-3-5-haiku-20241022"   — Fastest, most affordable

2.3 Google Gemini

Google’s Gemini models offer large context windows and multimodal capabilities. Set GOOGLE_API_KEY (or use Vertex AI):

import os
from pydantic_ai import Agent

os.environ["GOOGLE_API_KEY"] = "AI..."

# Google Gemini model string
agent = Agent(
    "google:gemini-2.0-flash",
    system_prompt="You are a knowledgeable assistant."
)
result = agent.run_sync("What are the key features of Gemini 2.0?")
print(result.output)

# Available Google models:
# "google:gemini-2.0-flash"         — Fast and capable
# "google:gemini-2.5-flash"         — Latest with thinking
# "google:gemini-2.5-pro"           — Most capable
# "google:gemini-1.5-flash"         — Previous generation, 1M context

2.4 xAI (Grok)

xAI’s Grok models are accessed through the OpenAI-compatible API. Set XAI_API_KEY:

import os
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

os.environ["XAI_API_KEY"] = "xai-..."

# xAI exposes an OpenAI-compatible API; connection details go on the provider
model = OpenAIModel(
    "grok-2",
    provider=OpenAIProvider(
        base_url="https://api.x.ai/v1",
        api_key=os.environ["XAI_API_KEY"],
    ),
)

agent = Agent(model, system_prompt="You are Grok, a helpful AI assistant.")
result = agent.run_sync("What makes you different from other AI models?")
print(result.output)

API Key Security: Never hardcode API keys in source code. Use environment variables, .env files (added to .gitignore), or secret management services. The examples above show os.environ assignments for clarity only.
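
One way to act on this advice is to check at startup that the expected variables are present, and fail early if they are not. The sketch below assumes the environment variable names used in the examples above. REQUIRED_KEYS and missing_keys are hypothetical names for this illustration, not PydanticAI APIs.

```python
import os

# Environment variable names as used in this article's examples.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "xai": "XAI_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the variable names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [REQUIRED_KEYS[p] for p in providers if not env.get(REQUIRED_KEYS[p])]


missing = missing_keys(["openai", "anthropic"])
if missing:
    print("Set these variables before creating agents:", ", ".join(missing))
else:
    print("All required API keys are present.")
```

Passing env explicitly keeps the function easy to test without touching the real environment.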

3. Cloud & Specialized Providers

3.1 AWS Bedrock

AWS Bedrock provides managed access to foundation models with IAM authentication. No API keys needed — uses your AWS credentials:

from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

# AWS Bedrock uses IAM credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
# Or configured via AWS CLI: aws configure
model = BedrockConverseModel(
    "anthropic.claude-3-5-sonnet-20241022-v2:0",
    provider=BedrockProvider(region_name="us-east-1"),
)

agent = Agent(model, system_prompt="You are a cloud architecture advisor.")
result = agent.run_sync("What's the best AWS service for real-time data streaming?")
print(result.output)

# Available Bedrock models:
# "anthropic.claude-3-5-sonnet-20241022-v2:0"
# "anthropic.claude-3-haiku-20240307-v1:0"
# "amazon.titan-text-express-v1"
# "meta.llama3-1-70b-instruct-v1:0"

3.2 Groq

Groq provides ultra-fast inference with custom LPU hardware. Set GROQ_API_KEY:

import os
from pydantic_ai import Agent

os.environ["GROQ_API_KEY"] = "gsk_..."

# Groq model string format
agent = Agent(
    "groq:llama-3.3-70b-versatile",
    system_prompt="You are a fast, helpful assistant."
)
result = agent.run_sync("What is the time complexity of merge sort?")
print(result.output)

# Available Groq models (ultra-fast inference):
# "groq:llama-3.3-70b-versatile"    — Best quality
# "groq:llama-3.1-8b-instant"       — Fastest
# "groq:mixtral-8x7b-32768"         — MoE architecture
# "groq:gemma2-9b-it"               — Google Gemma on Groq

3.3 Cohere & Cerebras

Additional specialized providers for enterprise and high-performance use cases:

import os
from pydantic_ai import Agent

# Cohere — enterprise-focused with RAG specialization
os.environ["COHERE_API_KEY"] = "..."
cohere_agent = Agent(
    "cohere:command-r-plus",
    system_prompt="You are an enterprise knowledge assistant."
)
result = cohere_agent.run_sync("What are best practices for RAG systems?")
print(f"Cohere: {result.output[:100]}...")

# Cerebras — wafer-scale AI hardware for fast inference
os.environ["CEREBRAS_API_KEY"] = "..."
cerebras_agent = Agent(
    "cerebras:llama3.1-70b",
    system_prompt="You are a helpful assistant."
)
result = cerebras_agent.run_sync("Explain neural network backpropagation.")
print(f"Cerebras: {result.output[:100]}...")

4. Open-Source & Local Providers

Real-World Application

Automated Inventory Management

A retail chain uses PydanticAI tools to let their AI assistant manage inventory: check_stock(product, location), reorder(product, quantity), transfer_stock(product, from_loc, to_loc). Type safety on the tool parameters prevents dangerous mistakes (negative quantities, invalid locations) that previously caused real inventory discrepancies.


4.1 Ollama (Local Models)

Ollama runs models locally on your machine. No API keys needed — just install Ollama and pull a model:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1
ollama pull codellama
ollama pull mistral

Then point PydanticAI at the local server:

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Ollama exposes an OpenAI-compatible API on localhost
model = OpenAIModel(
    "llama3.1",
    provider=OpenAIProvider(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # Ollama doesn't need a real key
    ),
)

agent = Agent(model, system_prompt="You are a local AI assistant running on Ollama.")
result = agent.run_sync("What are the advantages of running models locally?")
print(result.output)

# Benefits of local models:
# - No API costs
# - No data leaves your machine (privacy)
# - No rate limits
# - Works offline
# - Full control over model selection
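
Because a local server may simply not be running, it can help to probe it before sending a prompt. The following is a minimal sketch using only the standard library. ollama_available is a hypothetical helper, and it assumes Ollama's default port (11434) and its /api/tags endpoint, which lists locally pulled models.

```python
import urllib.request


def ollama_available(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        # /api/tags lists the models that have been pulled locally
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError
        return False


print("Ollama reachable:", ollama_available())
```

An application could use the result to decide between the local model and a hosted one.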

4.2 Mistral & Hugging Face

Mistral offers both API access and self-hosted options. Hugging Face provides the broadest model catalog:

import os
from pydantic_ai import Agent

# Mistral AI
os.environ["MISTRAL_API_KEY"] = "..."
mistral_agent = Agent(
    "mistral:mistral-large-latest",
    system_prompt="You are a multilingual assistant."
)
result = mistral_agent.run_sync("Explain transformers in machine learning.")
print(f"Mistral: {result.output[:100]}...")

# Available Mistral models:
# "mistral:mistral-large-latest"   — Most capable
# "mistral:mistral-medium-latest"  — Balanced
# "mistral:mistral-small-latest"   — Fast and efficient
# "mistral:codestral-latest"       — Code-specialized

Hugging Face is reached through its OpenAI-compatible endpoint:

import os
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Hugging Face Inference API (OpenAI-compatible)
os.environ["HF_TOKEN"] = "hf_..."

model = OpenAIModel(
    "meta-llama/Llama-3.1-70B-Instruct",
    provider=OpenAIProvider(
        base_url="https://api-inference.huggingface.co/v1",
        api_key=os.environ["HF_TOKEN"],
    ),
)

agent = Agent(model, system_prompt="You are a helpful assistant.")
result = agent.run_sync("What is transfer learning?")
print(result.output)

4.3 OpenRouter & Outlines

OpenRouter provides a unified API to access 200+ models from multiple providers through a single endpoint:

import os
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# OpenRouter: single API for 200+ models
os.environ["OPENROUTER_API_KEY"] = "sk-or-..."

model = OpenAIModel(
    "anthropic/claude-sonnet-4-20250514",  # Any model on OpenRouter
    provider=OpenAIProvider(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    ),
)

agent = Agent(model, system_prompt="You are a helpful assistant.")
result = agent.run_sync("Compare REST and GraphQL APIs.")
print(result.output)

# OpenRouter advantages:
# - Single API key for all providers
# - Automatic fallbacks between providers
# - Usage-based billing across all models
# - Access to models not available directly

OpenAI-Compatible Pattern: Many providers (Ollama, xAI, Hugging Face, OpenRouter, Together AI, Fireworks) expose OpenAI-compatible APIs. For any such provider, construct OpenAIModel(model_name, provider=OpenAIProvider(base_url="...", api_key="...")). This is PydanticAI’s escape hatch for providers without a dedicated integration.
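
Since these providers differ only in base URL and credential, the connection details can live in one table. The sketch below is illustrative: OPENAI_COMPATIBLE and connection_settings are hypothetical names, and the URLs are copied from the examples in this article, so verify them against each provider's documentation.

```python
import os

# provider name -> (base URL, environment variable holding the key, or None)
OPENAI_COMPATIBLE = {
    "xai": ("https://api.x.ai/v1", "XAI_API_KEY"),
    "ollama": ("http://localhost:11434/v1", None),
    "huggingface": ("https://api-inference.huggingface.co/v1", "HF_TOKEN"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
}


def connection_settings(name, env=None):
    """Return the base_url/api_key pair for an OpenAI-compatible provider."""
    env = os.environ if env is None else env
    base_url, key_var = OPENAI_COMPATIBLE[name]
    # Ollama ignores the key, so a placeholder is enough
    api_key = env.get(key_var, "") if key_var else "ollama"
    return {"base_url": base_url, "api_key": api_key}


print(connection_settings("ollama"))
```

The returned dictionary can be unpacked into whichever constructor your installed PydanticAI version expects for base URL and API key.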

5. Model Switching & Fallbacks

5.1 Runtime Model Override

Override the model at runtime without changing agent definition. This enables cost optimization and capability matching:

from pydantic_ai import Agent
from pydantic import BaseModel

class Answer(BaseModel):
    response: str
    model_used: str

agent = Agent(
    "openai:gpt-4o-mini",  # Default: cheap and fast
    output_type=Answer,
    system_prompt="Answer questions. Include which model you are in model_used."
)

# Simple question — use the cheap default model
simple_result = agent.run_sync("What is 2 + 2?")
print(f"Simple: {simple_result.output.response} (via {simple_result.output.model_used})")

# Complex question — override with a stronger model
complex_result = agent.run_sync(
    "Analyze the trade-offs between microservices and monolithic architectures "
    "for a startup with 5 engineers building a real-time trading platform.",
    model="anthropic:claude-sonnet-4-20250514"
)
print(f"Complex: {complex_result.output.response[:100]}... (via {complex_result.output.model_used})")

# Code generation: override with a stronger general-purpose model
code_result = agent.run_sync(
    "Write a Python function to find the longest common subsequence.",
    model="openai:gpt-4o"
)
print(f"Code: {code_result.output.response[:100]}... (via {code_result.output.model_used})")
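
The override does not have to be hardcoded at the call site. As a sketch, the model string could come from deployment configuration. The variable name AGENT_MODEL and the helper model_from_env are assumptions made for this example, not PydanticAI conventions.

```python
import os

DEFAULT_MODEL = "openai:gpt-4o-mini"


def model_from_env(env=None, var="AGENT_MODEL"):
    """Read a 'provider:model-id' override from the environment, else use the default."""
    env = os.environ if env is None else env
    value = env.get(var, "").strip()
    if not value:
        return DEFAULT_MODEL
    if ":" not in value:
        raise ValueError(f"{var} must look like 'provider:model-id', got {value!r}")
    return value


# Would be passed as: agent.run_sync(prompt, model=model_from_env())
print(model_from_env(env={"AGENT_MODEL": "anthropic:claude-sonnet-4-20250514"}))
```

This keeps the choice of model out of the code path, so staging and production can run different models from the same build.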

5.2 Fallback Chains for Resilience

Build robust applications with fallback logic when primary providers are unavailable:

from pydantic_ai import Agent
from pydantic_ai.exceptions import ModelHTTPError
from pydantic import BaseModel

class QueryResult(BaseModel):
    answer: str
    provider: str

agent = Agent(
    "openai:gpt-4o",
    output_type=QueryResult,
    system_prompt="Answer the question. Set provider to the model name you are."
)

# Fallback chain: try providers in order
FALLBACK_MODELS = [
    "openai:gpt-4o",
    "anthropic:claude-sonnet-4-20250514",
    "google:gemini-2.0-flash",
    "groq:llama-3.3-70b-versatile",
]

async def query_with_fallback(prompt: str) -> QueryResult:
    """Try each provider in order until one succeeds."""
    last_error = None

    for model_string in FALLBACK_MODELS:
        try:
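            result = await agent.run(prompt, model=model_string)
            output = result.output
            # Record the provider in code; a model's self-reported name is unreliable
            output.provider = model_string
            return output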
        except ModelHTTPError as e:
            last_error = e
            print(f"  Provider {model_string} failed: {e}")
            continue

    raise RuntimeError(f"All providers failed. Last error: {last_error}")

# Usage
import asyncio

async def main():
    result = await query_with_fallback("What is the speed of light?")
    print(f"Answer: {result.answer}")
    print(f"Served by: {result.provider}")

asyncio.run(main())
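
The same try-in-order loop can be factored into a small provider-agnostic helper, which also makes it testable without network access. This is a sketch under stated assumptions: first_success, AllProvidersFailed, and fake_call are hypothetical names, and fake_call stands in for a real agent call. PydanticAI's documentation also describes a FallbackModel wrapper; whether it is available depends on the installed version.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllProvidersFailed(RuntimeError):
    """Raised when every candidate in the chain has failed."""


def first_success(
    candidates: Sequence[str],
    call: Callable[[str], T],
    retry_on: Tuple[type, ...] = (Exception,),
) -> Tuple[str, T]:
    """Try each candidate in order; return (candidate, result) for the first success."""
    errors = []
    for name in candidates:
        try:
            return name, call(name)
        except retry_on as exc:
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def fake_call(model: str) -> str:
    """Stand-in for a real model call; simulates an outage at one provider."""
    if model.startswith("openai:"):
        raise ConnectionError("simulated outage")
    return f"answer from {model}"


served_by, answer = first_success(
    ["openai:gpt-4o", "anthropic:claude-sonnet-4-20250514"],
    fake_call,
    retry_on=(ConnectionError,),
)
print(served_by, "->", answer)
```

Returning the candidate name alongside the result records which provider actually served the request, without relying on the model to report it.
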
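
The same runtime override also supports routing, where a small function picks the model from characteristics of the prompt:

from pydantic_ai import Agent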

# Model selection based on task complexity
def select_model(prompt: str) -> str:
    """Route to appropriate model based on prompt characteristics."""
    word_count = len(prompt.split())

    if word_count < 20:
        return "openai:gpt-4o-mini"     # Simple queries: fast and cheap
    elif word_count < 100:
        return "openai:gpt-4o"          # Medium complexity
    else:
        return "anthropic:claude-sonnet-4-20250514"  # Complex analysis: best reasoning

agent = Agent("openai:gpt-4o-mini", system_prompt="Be helpful and concise.")

# Dynamic routing
prompts = [
    "What is DNA?",
    "Explain the differences between SQL and NoSQL databases with examples.",
    # Repeated 12 times (about 108 words) so it crosses the 100-word threshold
    "Design a complete microservices architecture for an e-commerce platform " * 12,
]

for prompt in prompts:
    model = select_model(prompt)
    # Send the same text that was used for the routing decision
    result = agent.run_sync(prompt, model=model)
    print(f"[{model}] {result.output[:80]}...")
    print()

Cost Optimization Strategy: Define your agent once with tools and output types, then use routing logic to send simple queries to cheap, fast models (GPT-4o-mini, Gemini Flash) and complex reasoning tasks to more capable models (Claude Opus, GPT-4o). Depending on how much traffic the cheaper tier can absorb, this can reduce costs by 60–80% without degrading quality on complex tasks.

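
The arithmetic behind a figure like this is simple to check for your own traffic. The sketch below uses hypothetical numbers: a strong model that costs 10 times as much per request as the cheap one, and 80% of requests simple enough for the cheap tier. Neither number comes from real pricing, so substitute your own.

```python
def blended_cost(simple_share: float, cheap_price: float, strong_price: float) -> float:
    """Average cost per request when simple requests go to the cheap model."""
    return simple_share * cheap_price + (1 - simple_share) * strong_price


def savings(simple_share: float, cheap_price: float, strong_price: float) -> float:
    """Fractional saving versus sending every request to the strong model."""
    return 1 - blended_cost(simple_share, cheap_price, strong_price) / strong_price


# Hypothetical inputs: 80% simple traffic, relative prices of 1 and 10
print(f"{savings(0.8, 1.0, 10.0):.0%}")  # prints 72%
```

With these inputs the blended cost is 0.8 × 1 + 0.2 × 10 = 2.8 against a baseline of 10, a saving of 72%. The saving shrinks quickly as the share of simple traffic falls, so it is worth measuring that share first.
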
Try It Yourself: Build a ‘budget tracker’ agent with 4 tools: add_expense(amount, category, description), get_total(category=None), get_budget_remaining(category), and generate_report(). Store data in a list (injected dependency). Test a full conversation: add 5 expenses, check totals, and generate a summary report.

Next in the PydanticAI SDK Track

In Part 5: Function Tools & Toolsets, we’ll build powerful tool-using agents with @agent.tool, plain tools, toolsets for composition, retrieval-augmented generation patterns, and tool retry strategies for robust agent behavior.