OpenAI SDK Track Part 3: Structured Outputs & Code Generation

                        
What You’ll Learn: Structured outputs guarantee that OpenAI’s response follows a specific JSON schema, so you no longer parse free-form text and hope it is valid JSON. This is essential for production systems where downstream code needs reliable data. Think of it like the difference between asking someone to ‘describe the weather’ (free text) and having them fill out a form with specific fields (temperature, humidity, conditions).

1. JSON Mode

JSON mode is the lightest-weight way to move from freeform text to machine-readable output. It is useful for prototypes and simple integrations, but you should think of it as valid JSON output rather than strict typed output.

import json

from openai import OpenAI

client = OpenAI()

# JSON mode: the model must output valid JSON, but no schema is enforced.
# The prompt has to mention "JSON", and the top-level value is a JSON object,
# so ask for an object that wraps the array rather than a bare array.
response = client.responses.create(
    model="gpt-4.1-mini",
    input=(
        "List 3 programming languages with their year of creation. "
        'Return a JSON object with a "languages" array.'
    ),
    text={"format": {"type": "json_object"}},
)

data = json.loads(response.output_text)
print(json.dumps(data, indent=2))

                        
JSON Mode vs Structured Outputs: JSON mode guarantees valid JSON but doesn’t enforce a specific schema. Structured Outputs (json_schema) guarantees both valid JSON and conformance to your exact schema. Use Structured Outputs for production applications.
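Because JSON mode stops at "valid JSON", any shape checking is left to your own code. The sketch below shows what that manual checking might look like. It runs on a hand-written sample string standing in for response.output_text, and the {"languages": [...]} layout and the helper name parse_language_list are illustrative assumptions, not something JSON mode enforces.

```python
import json


def parse_language_list(raw: str) -> list[dict]:
    """Parse JSON-mode output and check its shape by hand.

    Assumes the prompt asked for {"languages": [{"name": ..., "year": ...}]};
    that layout is an illustrative choice, not something JSON mode enforces.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    languages = data.get("languages")
    if not isinstance(languages, list):
        raise ValueError("expected a 'languages' array")
    for item in languages:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object, got {item!r}")
        if not isinstance(item.get("name"), str):
            raise ValueError(f"missing or non-string 'name' in {item!r}")
        year = item.get("year")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(year, int) or isinstance(year, bool):
            raise ValueError(f"missing or non-integer 'year' in {item!r}")
    return languages


# Hand-written sample standing in for response.output_text.
sample = '{"languages": [{"name": "Python", "year": 1991}]}'
print(parse_language_list(sample))
```

Every field needs its own check here, which is the bookkeeping that a schema-enforced response is meant to take off your hands.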
                    

2. Structured Outputs (json_schema)

Structured outputs are the production upgrade. By specifying an exact schema, you stop arguing with the model about output shape and turn the interaction into a contract your downstream systems can trust.

from openai import OpenAI

client = OpenAI()

# Define the exact output schema; with strict mode the output must conform.
# In the Responses API, name, strict, and schema sit directly inside the
# format object (the nested "json_schema" wrapper is the Chat Completions form).
schema = {
    "type": "json_schema",
    "name": "programming_languages",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "languages": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "year": {"type": "integer"},
                        "paradigm": {"type": "string"},
                        "typed": {"type": "boolean"},
                    },
                    "required": ["name", "year", "paradigm", "typed"],
                    "additionalProperties": False,
                },
            },
        },
        "required": ["languages"],
        "additionalProperties": False,
    },
}

response = client.responses.create(
    model="gpt-4.1-mini",
    input="List 5 popular programming languages with their creation year, primary paradigm, and whether they are statically typed.",
    text={"format": schema},
)

import json
result = json.loads(response.output_text)
for lang in result["languages"]:
    typed = "typed" if lang["typed"] else "dynamic"
    print(f"  {lang['name']} ({lang['year']}) - {lang['paradigm']}, {typed}")
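Strict mode puts two requirements on every object in a hand-written schema: each property must be listed in required, and additionalProperties must be False. A small pre-flight check can catch violations before the request is sent. The helper below is a rough sketch under that assumption. The name strict_violations is hypothetical, and it covers only plain object and array nodes, not anyOf, $defs, or other schema features.

```python
def strict_violations(schema: dict, path: str = "$") -> list[str]:
    """Return human-readable problems that strict mode would likely reject."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be False")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        # Recurse into each property so nested objects are checked too.
        for name, sub in props.items():
            problems.extend(strict_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_violations(schema["items"], f"{path}[]"))
    return problems


language_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name"],  # 'year' is deliberately left out
    "additionalProperties": False,
}
wrapper = {
    "type": "object",
    "properties": {"languages": {"type": "array", "items": language_schema}},
    "required": ["languages"],
    # additionalProperties is deliberately omitted
}

for problem in strict_violations(wrapper):
    print(problem)
```

Running a check like this in a unit test keeps schema mistakes out of production requests. The API's own validation remains the authority on what is accepted.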

Real-World Application

Automated Insurance Claims Processing

An insurance company uses structured outputs to extract claim data from customer descriptions: damage_type, estimated_cost, date_of_incident, witnesses, and policy_number. The schema guarantee means their claims pipeline does not have to defend against malformed or incomplete JSON. Result: 95% of claims are auto-routed without human intervention.

Insurance, Data Extraction

3. Pydantic Integration

Pydantic is often the most ergonomic way to express that contract in Python. It keeps the schema close to your application types and gives you validation, enums, optional fields, and post-processing hooks without hand-writing JSON Schema every time.

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

# Define your output type with Pydantic
class MovieReview(BaseModel):
    title: str
    year: int
    rating: float
    genre: str
    summary: str
    pros: list[str]
    cons: list[str]

# Use parse method for automatic schema generation and validation
response = client.responses.parse(
    model="gpt-4.1-mini",
    input="Review the movie 'Inception' (2010) by Christopher Nolan.",
    text_format=MovieReview,
)

# response.output_parsed is the validated MovieReview instance itself, not a list
review = response.output_parsed
print(f"{review.title} ({review.year}) - {review.rating}/10")
print(f"Genre: {review.genre}")
print(f"Summary: {review.summary}")
print(f"Pros: {', '.join(review.pros)}")
print(f"Cons: {', '.join(review.cons)}")

The second example shows why typed extraction is valuable in real workflows: you are not just validating syntax, you are transforming messy natural language into data structures your application can route, store, and audit.

from pydantic import BaseModel
from openai import OpenAI
from enum import Enum

client = OpenAI()

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class Task(BaseModel):
    title: str
    priority: Priority
    assignee: str | None
    deadline: str | None
    dependencies: list[str]

# Defined after Task so the annotation needs no forward reference
class TaskExtraction(BaseModel):
    tasks: list[Task]

# Extract structured tasks from unstructured text
email = """
Hey team, we need to ship the authentication module by Friday.
Sarah should handle the OAuth integration (critical priority) and
Mike needs to write the unit tests (medium) after Sarah's done.
Also, someone should update the docs (low priority, no deadline).
"""

response = client.responses.parse(
    model="gpt-4.1",
    input=f"Extract all tasks from this email:\n\n{email}",
    text_format=TaskExtraction,
)

extraction = response.output_parsed
for task in extraction.tasks:
    print(f"[{task.priority.value.upper()}] {task.title}")
    print(f"   Assignee: {task.assignee or 'Unassigned'}")
    print(f"   Deadline: {task.deadline or 'None'}")
    print(f"   Dependencies: {', '.join(task.dependencies) or 'None'}")
    print()
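Once tasks arrive as typed data, downstream routing becomes ordinary Python. As one illustration, the sketch below orders tasks so that each one comes after its dependencies, using the standard library's graphlib. The task dicts are hand-written stand-ins for extraction.tasks, and the sketch assumes that dependency strings match task titles exactly, which a real extraction may not guarantee.

```python
from graphlib import CycleError, TopologicalSorter

# Hand-written stand-in for extraction.tasks, reduced to plain dicts.
tasks = [
    {"title": "Write unit tests", "dependencies": ["OAuth integration"]},
    {"title": "OAuth integration", "dependencies": []},
    {"title": "Update docs", "dependencies": []},
]


def execution_order(tasks: list[dict]) -> list[str]:
    """Order task titles so each task comes after its dependencies."""
    known = {t["title"] for t in tasks}
    graph = {
        # Ignore dependencies that do not match any extracted title.
        t["title"]: {d for d in t["dependencies"] if d in known}
        for t in tasks
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"circular dependencies: {exc.args[1]}") from exc


order = execution_order(tasks)
print(order)
```

The relative order of independent tasks is not fixed. Only the guarantee that "OAuth integration" precedes "Write unit tests" should be relied on.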

4. Code Generation Patterns

Code generation becomes far more reliable when the model must return code, metadata, and explanation as separate typed fields. That forces your pipeline to distinguish executable output from commentary instead of scraping one big text blob.

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class GeneratedCode(BaseModel):
    language: str
    filename: str
    code: str
    explanation: str
    dependencies: list[str]

response = client.responses.parse(
    model="gpt-4.1",
    input="""Write a Python FastAPI endpoint that:
    1. Accepts a POST request with a JSON body containing 'text' and 'language'
    2. Translates the text using OpenAI
    3. Returns the translated text with metadata""",
    text_format=GeneratedCode,
)

result = response.output_parsed
print(f"# {result.filename} ({result.language})")
print(f"# Dependencies: {', '.join(result.dependencies)}")
print(result.code)
print(f"\n# Explanation: {result.explanation}")
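Keeping code in its own field also makes it easy to inspect before anything runs it. The sketch below parses the generated source with the standard library's ast module, which checks syntax without executing anything, and compares the modules it imports against the declared dependencies. The generated dict is a hand-written stand-in for the parsed result, and check_python_source is a hypothetical helper name.

```python
import ast


def check_python_source(code: str) -> list[str]:
    """Return module names imported anywhere in code, or raise SyntaxError."""
    tree = ast.parse(code)  # parses only; nothing is executed
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(modules)


# Hand-written stand-in for the fields of a GeneratedCode result.
generated = {
    "filename": "translate_api.py",
    "code": "from fastapi import FastAPI\nimport os\n\napp = FastAPI()\n",
    "dependencies": ["fastapi", "openai"],
}

imported = check_python_source(generated["code"])
unused = sorted(set(generated["dependencies"]) - set(imported))
print("imports found:", imported)
print("declared but not imported:", unused)
```

Package names and import names do not always match, so treat a mismatch as a prompt for review rather than proof of an error. A syntax check also says nothing about whether the code is safe or correct to run.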

5. Data Extraction Pipelines

Extraction is where structured outputs typically pay for themselves first. Emails, documents, tickets, and notes are messy for humans but ideal for schema-based parsing because the downstream system usually knows exactly what fields it wants.

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ContactInfo(BaseModel):
    name: str
    email: str | None
    phone: str | None
    company: str | None
    role: str | None

class ExtractedContacts(BaseModel):
    contacts: list[ContactInfo]

# Batch extraction from multiple documents
documents = [
    "Hi, I'm John Smith from Acme Corp. Reach me at john@acme.com or 555-0123.",
    "Dr. Sarah Chen, Lead Researcher at MIT AI Lab. Email: schen@mit.edu",
    "Meeting with Bob (CFO) scheduled. No contact info provided.",
]

for i, doc in enumerate(documents, start=1):
    response = client.responses.parse(
        model="gpt-4.1-mini",
        input=f"Extract contact information:\n\n{doc}",
        text_format=ExtractedContacts,
    )
    print(f"Document {i}:")
    # output_parsed is the ExtractedContacts instance itself, not a list
    for contact in response.output_parsed.contacts:
        print(f"  {contact.name} | {contact.company} | {contact.email} | {contact.phone}")
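In a real batch, one failing document should not abort the whole run. A common approach is to isolate each call and collect failures for later review. The sketch below shows that pattern with the extraction step injected as a function. fake_extract is a hypothetical stand-in for the API call so the example runs offline, and run_batch is an illustrative name.

```python
from typing import Callable


def run_batch(documents: list[str], extract: Callable[[str], list[dict]]):
    """Apply extract to each document, collecting results and failures."""
    results, failures = [], []
    for index, doc in enumerate(documents):
        try:
            results.append((index, extract(doc)))
        except Exception as exc:
            # Broad on purpose: one bad document should not stop the batch.
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return results, failures


def fake_extract(doc: str) -> list[dict]:
    """Hypothetical stand-in for the API call; fails on empty input."""
    if not doc.strip():
        raise ValueError("empty document")
    return [{"name": doc.split(",")[0]}]


results, failures = run_batch(["Dr. Sarah Chen, MIT AI Lab", "   "], fake_extract)
print(results)
print(failures)
```

Recording the document index with each failure makes it possible to re-run only the failed items.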

6. Validation & Error Handling

Validation is the final guardrail. Even if the model matches the schema, you still want business-level checks for allowed values, score ranges, and domain-specific rules so bad data is rejected before it contaminates the rest of the system.

from pydantic import BaseModel, field_validator
from openai import OpenAI

client = OpenAI()

class SentimentResult(BaseModel):
    text: str
    sentiment: str
    confidence: float
    keywords: list[str]

    @field_validator("sentiment")
    @classmethod
    def validate_sentiment(cls, v):
        allowed = {"positive", "negative", "neutral", "mixed"}
        if v.lower() not in allowed:
            raise ValueError(f"Sentiment must be one of {allowed}")
        return v.lower()

    @field_validator("confidence")
    @classmethod
    def validate_confidence(cls, v):
        if not 0.0 <= v <= 1.0:
            raise ValueError("Confidence must be between 0.0 and 1.0")
        return v

# Structured output with validation
response = client.responses.parse(
    model="gpt-4.1-mini",
    input='Analyze sentiment: "The product is great but shipping was terrible."',
    text_format=SentimentResult,
)

result = response.output_parsed
print(f"Sentiment: {result.sentiment} ({result.confidence:.0%} confident)")
print(f"Keywords: {', '.join(result.keywords)}")
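When a business-level check fails, one reasonable response is to retry a bounded number of times and then surface the error. The sketch below illustrates that loop. To stay runnable offline, it re-implements the two checks on a plain dict instead of a Pydantic model, and the model call is a hypothetical callable fed from canned responses.

```python
def validate_sentiment(payload: dict) -> dict:
    """Business checks mirroring the validators above, on a plain dict."""
    allowed = {"positive", "negative", "neutral", "mixed"}
    sentiment = str(payload.get("sentiment", "")).lower()
    if sentiment not in allowed:
        raise ValueError(f"sentiment must be one of {sorted(allowed)}")
    confidence = payload.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {**payload, "sentiment": sentiment}


def analyze_with_retry(call_model, max_attempts: int = 3) -> dict:
    """Call call_model() until its result passes validation or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return validate_sentiment(call_model())
        except ValueError as exc:
            last_error = exc
            print(f"attempt {attempt} rejected: {exc}")
    raise RuntimeError(f"no valid result after {max_attempts} attempts") from last_error


# Hypothetical stand-in for the API call: invalid first, valid second.
canned = iter([
    {"sentiment": "Mixed", "confidence": 1.4},
    {"sentiment": "Mixed", "confidence": 0.8},
])
result = analyze_with_retry(lambda: next(canned))
print(result)
```

Capping the attempts keeps cost and latency bounded. Whether to retry, fall back to a default, or send the item to human review depends on the application.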

                        
Try It Yourself: Create a ‘meeting notes processor’ that takes raw meeting transcript text and outputs structured JSON with: title, date, attendees (array), action_items (array with owner, description, due_date), decisions (array), and next_meeting_date. Define the schema using Pydantic, pass it as text_format to client.responses.parse to enforce it, and test with 3 sample transcripts.

Next in the SDK Track

In OA Part 4: Function Calling & Tools, we’ll define tool schemas, implement parallel function calling, and build production-grade tool orchestration patterns.
