Anthropic SDK Track Part 19: Content Moderation & Legal Summarization

                        
What You’ll Learn: This article covers CCA Phase 1.4 (Content Moderation) and Phase 1.5 (Legal Summarization). You’ll build a multi-label content classifier with severity tiers, a confidence-based human review router, a legal document chunker with citation preservation, and a hierarchical summarizer, all using structured tool_use output and model cascading for production scale.
                    

1. Content Moderation System (CCA 1.4)

Content moderation at scale requires a system that classifies content across multiple violation categories simultaneously, assigns severity levels, and routes decisions appropriately. Claude’s tool_use feature provides the structured output format needed for reliable automation.

1.1 Severity Tiers & Classification Labels

A production moderation system operates on two axes: severity (how bad is it?) and category (what type of violation?). The CCA exam expects you to implement multi-label classification where a single piece of content can trigger multiple categories.

Severity	Action	Response Time	Example
Safe	Pass through	N/A	Normal user content
Borderline	Human review queue	4 hours	Edgy humor, ambiguous context
Violation	Auto-remove + notify	Immediate	Clear hate speech, explicit content
Critical	Auto-remove + escalate + legal	Immediate	CSAM, terrorism, imminent threats
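
The severity table can be held as a small lookup structure so that downstream code never hard-codes actions. The sketch below is illustrative only: the `SeverityPolicy` class, the `policy_for` helper, and the action strings are hypothetical names chosen for this example, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeverityPolicy:
    action: str          # what the platform does with the content
    response_time: str   # human-readable target taken from the table


# Hypothetical lookup table mirroring the four severity tiers.
SEVERITY_POLICIES = {
    "safe": SeverityPolicy("pass_through", "n/a"),
    "borderline": SeverityPolicy("human_review_queue", "4 hours"),
    "violation": SeverityPolicy("auto_remove_and_notify", "immediate"),
    "critical": SeverityPolicy("auto_remove_escalate_legal", "immediate"),
}


def policy_for(severity: str) -> SeverityPolicy:
    """Return the policy for a tier, failing loudly on unknown tiers."""
    try:
        return SEVERITY_POLICIES[severity]
    except KeyError:
        raise ValueError(f"Unknown severity tier: {severity!r}") from None


print(policy_for("borderline"))
```

Raising on an unknown tier, rather than defaulting to "safe", keeps a malformed classifier result from silently passing content through.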

Classification labels operate independently — content can be both harassment and hate_speech, or spam and misinformation. Each label carries its own confidence score.
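
As a rough illustration of independent labels, the snippet below filters a multi-label result by per-label confidence. The `labels_at_or_above` helper and the sample scores are made up for this example; only the `label` and `confidence` field names come from the article.

```python
def labels_at_or_above(categories: list, threshold: float) -> list:
    """Return the labels whose own confidence meets the threshold, sorted by name."""
    return sorted(c["label"] for c in categories if c["confidence"] >= threshold)


# One piece of content can carry several labels, each scored separately.
example_categories = [
    {"label": "harassment", "confidence": 0.91},
    {"label": "hate_speech", "confidence": 0.78},
    {"label": "spam", "confidence": 0.05},
]

print(labels_at_or_above(example_categories, 0.7))  # ['harassment', 'hate_speech']
```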

1.2 Complete Moderation Classifier

This classifier uses tool_use with tool_choice: {"type": "tool", "name": "classify_content"} to force Claude to produce structured output for every input. The tool schema defines the exact fields needed for downstream routing.

import anthropic
import json
from dataclasses import dataclass, field
from typing import Optional

# Initialize client
client = anthropic.Anthropic()

# Define the moderation classification tool
MODERATION_TOOL = {
    "name": "classify_content",
    "description": "Classify content for moderation. Assign severity and all applicable violation categories with confidence scores.",
    "input_schema": {
        "type": "object",
        "properties": {
            "severity": {
                "type": "string",
                "enum": ["safe", "borderline", "violation", "critical"],
                "description": "Overall severity level determining the action taken"
            },
            "categories": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "label": {
                            "type": "string",
                            "enum": ["harassment", "hate_speech", "spam", "explicit", "misinformation", "self_harm", "violence", "illegal_activity"]
                        },
                        "confidence": {
                            "type": "number",
                            "minimum": 0.0,
                            "maximum": 1.0,
                            "description": "Confidence score for this category (0.0 to 1.0)"
                        }
                    },
                    "required": ["label", "confidence"]
                },
                "description": "All applicable violation categories with confidence scores"
            },
            "explanation": {
                "type": "string",
                "description": "Brief explanation of the classification decision for audit logs"
            },
            "overall_confidence": {
                "type": "number",
                "minimum": 0.0,
                "maximum": 1.0,
                "description": "Overall confidence in the severity assessment"
            }
        },
        "required": ["severity", "categories", "explanation", "overall_confidence"]
    }
}

MODERATION_SYSTEM_PROMPT = """You are a content moderation classifier. Analyze the provided content and classify it using the classify_content tool.

Classification guidelines:
- safe: Content that clearly does not violate any policy
- borderline: Content that is ambiguous, could be interpreted as violating policy depending on context
- violation: Content that clearly violates policy and should be removed
- critical: Content involving imminent physical danger, CSAM, or terrorism that requires immediate escalation

For categories, assign ALL applicable labels with honest confidence scores:
- harassment: Targeted attacks, bullying, or intimidation of individuals
- hate_speech: Content attacking protected groups based on identity characteristics
- spam: Unsolicited commercial content, scams, or coordinated inauthentic behavior
- explicit: Sexually explicit content or graphic violence
- misinformation: Demonstrably false claims about health, elections, or safety
- self_harm: Content promoting or instructing self-harm or suicide
- violence: Threats or glorification of violence against individuals/groups
- illegal_activity: Content promoting or facilitating clearly illegal actions

Be calibrated: a 0.9 confidence means you are correct 90% of the time at that threshold."""


def classify_content(text: str, context: Optional[str] = None) -> dict:
    """Classify content using Claude with forced structured output."""
    user_message = f"Classify the following content for moderation:\n\n---\n{text}\n---"
    if context:
        user_message += f"\n\nAdditional context: {context}"

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=MODERATION_SYSTEM_PROMPT,
        tools=[MODERATION_TOOL],
        tool_choice={"type": "tool", "name": "classify_content"},
        messages=[{"role": "user", "content": user_message}]
    )

    # Extract the tool use result
    for block in response.content:
        if block.type == "tool_use":
            return block.input

    raise ValueError("No classification returned from model")


# Test with example content
test_cases = [
    "Hey everyone! Check out my new blog post about gardening tips.",
    "You're such a worthless piece of garbage, nobody likes you.",
    "Studies show that vaccine X causes autism in 100% of cases (completely fabricated).",
    "I'm going to find where you live and make you pay for what you said.",
]

for content in test_cases:
    result = classify_content(content)
    print(f"\nContent: {content[:60]}...")
    print(f"  Severity: {result['severity']}")
    print(f"  Categories: {[(c['label'], c['confidence']) for c in result['categories']]}")
    print(f"  Confidence: {result['overall_confidence']}")
    print(f"  Explanation: {result['explanation']}")
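
Forcing a tool call shapes the output, but it seems prudent to treat the schema as guidance rather than a guarantee and to validate the payload before routing on it. The following is a minimal sketch under that assumption; `validate_classification` and `_is_probability` are hypothetical helpers, and the allowed values are copied from the tool schema above.

```python
ALLOWED_SEVERITIES = {"safe", "borderline", "violation", "critical"}
ALLOWED_LABELS = {
    "harassment", "hate_speech", "spam", "explicit",
    "misinformation", "self_harm", "violence", "illegal_activity",
}


def _is_probability(value) -> bool:
    """True for a real number in [0.0, 1.0]; bools are rejected explicitly."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0.0 <= value <= 1.0
    )


def validate_classification(result: dict) -> list:
    """Return a list of problems; an empty list means the payload looks usable."""
    problems = []
    if result.get("severity") not in ALLOWED_SEVERITIES:
        problems.append(f"unexpected severity: {result.get('severity')!r}")
    if not _is_probability(result.get("overall_confidence")):
        problems.append("overall_confidence missing or outside [0, 1]")
    categories = result.get("categories")
    if not isinstance(categories, list):
        problems.append("categories is not a list")
        categories = []
    for index, category in enumerate(categories):
        if not isinstance(category, dict):
            problems.append(f"categories[{index}] is not an object")
            continue
        if category.get("label") not in ALLOWED_LABELS:
            problems.append(f"categories[{index}] has unexpected label")
        if not _is_probability(category.get("confidence")):
            problems.append(f"categories[{index}] confidence missing or outside [0, 1]")
    return problems


sample = {"severity": "safe", "categories": [], "explanation": "ok", "overall_confidence": 0.99}
print(validate_classification(sample))  # []
```

A result that fails validation is probably best sent to human review rather than retried silently, since a malformed payload is itself a signal worth logging.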

2. Human Review Routing

Not every moderation decision should be automated. A confidence-based routing system sends clear cases to auto-action while routing ambiguous content to human reviewers. This reduces false positives (wrongly removed content) while maintaining safety.

2.1 Confidence-Based Routing Logic

Confidence Range	Action	Rationale
≥ 0.95	Auto-action (remove/pass)	High confidence — classifier is reliable at this threshold
0.70 – 0.94	Human review queue	Uncertain — needs human judgment for context
< 0.70	Pass through (monitor)	Too uncertain to act — log for pattern analysis

The key insight: false positives destroy user trust faster than false negatives destroy platform safety. A user whose legitimate post is wrongly removed will leave. A borderline post that stays up for 4 hours while in review causes less damage than removing a legitimate post.
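
The three confidence bands from the routing table can be expressed as a single pure function, which makes the thresholds easy to unit-test and tune. This is a sketch, not the router itself: `confidence_band` and its return strings are names invented here, and it treats the middle band as the half-open interval [0.70, 0.95) so that a value such as 0.945 lands in human review.

```python
AUTO_ACTION_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.70


def confidence_band(confidence: float) -> str:
    """Map a confidence score to one of the three routing bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= AUTO_ACTION_THRESHOLD:
        return "auto_action"
    if confidence >= REVIEW_THRESHOLD:
        return "human_review"
    return "monitor"


for score in (0.99, 0.945, 0.70, 0.42):
    print(score, confidence_band(score))
```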

2.2 Complete Review Router

This router takes the classifier output and routes to one of three paths: auto-action, human review, or monitoring. It also handles policy gap detection — content that doesn’t fit existing categories but may represent emerging violation patterns.

import anthropic
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

# Initialize client
client = anthropic.Anthropic()


class RouteAction(Enum):
    AUTO_REMOVE = "auto_remove"
    AUTO_PASS = "auto_pass"
    HUMAN_REVIEW = "human_review"
    ESCALATE_LEGAL = "escalate_legal"
    MONITOR = "monitor"


@dataclass
class ModerationDecision:
    action: RouteAction
    severity: str
    categories: list
    overall_confidence: float
    explanation: str
    review_priority: Optional[str] = None  # low, medium, high, urgent
    policy_gap: bool = False
    feedback_id: Optional[str] = None


def route_moderation_decision(classification: dict) -> ModerationDecision:
    """Route a classification result to the appropriate action."""
    severity = classification["severity"]
    confidence = classification["overall_confidence"]
    categories = classification["categories"]

    # Critical severity always escalates immediately
    if severity == "critical":
        return ModerationDecision(
            action=RouteAction.ESCALATE_LEGAL,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"],
            review_priority="urgent"
        )

    # Safe content with high confidence passes through
    if severity == "safe" and confidence >= 0.95:
        return ModerationDecision(
            action=RouteAction.AUTO_PASS,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"]
        )

    # Violation with high confidence auto-removes
    if severity == "violation" and confidence >= 0.95:
        return ModerationDecision(
            action=RouteAction.AUTO_REMOVE,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"]
        )

    # Borderline content, plus any violation below the auto-remove threshold
    # (including scores under 0.70), goes to human review rather than monitoring
    if severity == "borderline" or (severity == "violation" and confidence < 0.95):
        priority = "high" if severity == "violation" else "medium"
        return ModerationDecision(
            action=RouteAction.HUMAN_REVIEW,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"],
            review_priority=priority
        )

    # Only "safe" results below 0.95 reach this point; the least confident are monitored, not actioned
    if confidence < 0.70:
        # Check for policy gap: content triggered no strong category match
        has_strong_category = any(c["confidence"] > 0.5 for c in categories)
        return ModerationDecision(
            action=RouteAction.MONITOR,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"],
            policy_gap=not has_strong_category
        )

    # Default: human review for anything else
    return ModerationDecision(
        action=RouteAction.HUMAN_REVIEW,
        severity=severity,
        categories=categories,
        overall_confidence=confidence,
        explanation=classification["explanation"],
        review_priority="low"
    )


def process_content(text: str) -> ModerationDecision:
    """Full pipeline: classify content then route to action."""
    # Step 1: Classify (using the classifier from Section 1)
    classification = classify_content(text)

    # Step 2: Route
    decision = route_moderation_decision(classification)

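    # Step 3: Tag the decision with an ID so later reviewer feedback can be joined to it
    # (microseconds are included to reduce collisions between decisions made in the same second)
    decision.feedback_id = f"mod_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S_%f')}"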

    return decision


# Demonstrate routing with test scenarios
test_inputs = [
    {"text": "Great article! I learned so much about Python.", "expected": "AUTO_PASS"},
    {"text": "Kill all [ethnic group] they don't deserve to live", "expected": "AUTO_REMOVE"},
    {"text": "This joke might offend some people but here goes...", "expected": "HUMAN_REVIEW"},
    {"text": "Buy cheap watches at scamsite.com!!! 90% OFF!!!", "expected": "AUTO_REMOVE"},
]

for case in test_inputs:
    decision = process_content(case["text"])
    # Enum member names (e.g. AUTO_PASS) match the expected strings directly
    status = "✓" if decision.action.name == case["expected"] else "✗"
    print(f"{status} '{case['text'][:50]}...'")
    print(f"  Action: {decision.action.value} | Confidence: {decision.overall_confidence}")
    print(f"  Priority: {decision.review_priority} | Policy Gap: {decision.policy_gap}")
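
The router assigns a `review_priority` but does not show how a queue might honor it. One possible approach, sketched here with the standard-library `heapq` module, is a priority queue that serves urgent items first and keeps first-in-first-out order within a priority. The `ReviewQueue` class and `PRIORITY_RANK` mapping are hypothetical and not part of the article's pipeline.

```python
import heapq
import itertools

# Lower rank is served first; names match the review_priority values above.
PRIORITY_RANK = {"urgent": 0, "high": 1, "medium": 2, "low": 3}


class ReviewQueue:
    """In-memory human review queue ordered by priority, then arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves FIFO order

    def push(self, item_id: str, priority: str) -> None:
        rank = PRIORITY_RANK[priority]  # KeyError on an unknown priority
        heapq.heappush(self._heap, (rank, next(self._counter), item_id))

    def pop(self) -> str:
        _, _, item_id = heapq.heappop(self._heap)
        return item_id

    def __len__(self) -> int:
        return len(self._heap)


queue = ReviewQueue()
queue.push("post_1", "low")
queue.push("post_2", "urgent")
queue.push("post_3", "medium")
print(queue.pop())  # post_2
```

A production queue would live in a durable store; the in-memory version only illustrates the ordering rule.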

2.3 Moderation Flow

Content Moderation Routing Flow

flowchart TD
    A[User Content] --> B[Haiku First-Pass]
    B --> C{Severity?}
    C -->|Critical| D[Immediate Escalate + Legal Team]
    C -->|Violation| E{Confidence ≥ 0.95?}
    C -->|Borderline| F[Human Review Queue]
    C -->|Safe| G{Confidence ≥ 0.95?}
    E -->|Yes| H[Auto-Remove + Notify User]
    E -->|No| I[Sonnet Re-classify]
    I --> F
    G -->|Yes| J[Pass Through]
    G -->|No| K[Monitor + Log]
    F --> L{Human Decision}
    L -->|Remove| M[Remove + Update Feedback]
    L -->|Approve| N[Pass + Update Feedback]
    M --> O[Retrain Signal]
    N --> O
    K --> P{Policy Gap?}
    P -->|Yes| Q[Flag for Policy Team]
    P -->|No| R[Standard Monitoring]

Case Study 2026

Social Media Platform Moderation at Scale

Consider a social media platform that processes millions of posts per day and deploys this two-tier moderation system, with Haiku for first-pass screening and Sonnet for edge cases. In a strong rollout, teams usually want to see outcomes like:

Detection quality: High accuracy on clear policy-violation cases compared with a legacy moderation pipeline
False positives: Meaningfully lower over-removal rates on benign content
Automation: Most content auto-actioned without human review
Reviewer workload: Human review queue shrinking substantially so reviewers focus on ambiguous cases
Latency: Borderline-case review windows dropping from many hours to a much shorter operational target

The model-cascading approach (Haiku screens, Sonnet adjudicates) can reduce per-item cost substantially compared with running the higher-capability model on every item while keeping strong moderation quality on ambiguous cases.
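
The cost argument can be made concrete with a little arithmetic: every item pays for the cheap model, and only the escalated fraction also pays for the stronger one. The unit costs below are arbitrary placeholders, not real pricing, and both helper functions are hypothetical.

```python
def cascade_cost_per_item(cheap_cost: float, strong_cost: float, escalation_rate: float) -> float:
    """Expected per-item cost when all items hit the cheap model and a fraction is escalated."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return cheap_cost + escalation_rate * strong_cost


def savings_vs_strong_only(cheap_cost: float, strong_cost: float, escalation_rate: float) -> float:
    """Fraction of cost saved relative to running the strong model on every item."""
    cascade = cascade_cost_per_item(cheap_cost, strong_cost, escalation_rate)
    return 1.0 - cascade / strong_cost


# Placeholder units: the strong model costs 10x the cheap one, 10% of items escalate.
print(round(cascade_cost_per_item(1.0, 10.0, 0.10), 3))
print(round(savings_vs_strong_only(1.0, 10.0, 0.10), 3))
```

The same formula shows where cascading stops paying off: once the escalation rate approaches `1 - cheap_cost / strong_cost`, the cascade costs as much as running the strong model alone.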


3. Legal Document Summarization (CCA 1.5)

Legal documents present unique challenges: they are long (50–200+ pages), highly structured, and require citation preservation — every extracted fact must trace back to a specific paragraph or clause in the source. The CCA exam tests three strategies for handling long documents.

3.1 Long Document Handling Strategies

Strategy	Best For	Pros	Cons
Chunking	Documents >200K tokens	Works with any document size	Loses cross-chunk context
Sliding Window	Narrative documents	Preserves local context	Overlap increases cost
Hierarchical	Structured legal docs	Best accuracy + citations	Most complex to implement

For legal documents, the hierarchical approach works best: chunk by natural section boundaries (articles, clauses), summarize each section independently with citation tracking, then produce a final document-level summary from the section summaries.
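
For comparison with the hierarchical approach, the sliding-window strategy from the table can be sketched in a few lines. This version works on a list of paragraphs rather than tokens, which is a simplifying assumption; `sliding_windows` and its parameters are hypothetical names.

```python
def sliding_windows(paragraphs: list, window_size: int, overlap: int) -> list:
    """Group paragraphs into windows that share `overlap` paragraphs with their neighbor."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")

    step = window_size - overlap
    windows = []
    for start in range(0, len(paragraphs), step):
        windows.append(paragraphs[start:start + window_size])
        if start + window_size >= len(paragraphs):
            break  # the last window already reaches the end of the document
    return windows


paragraphs = ["p1", "p2", "p3", "p4", "p5"]
print(sliding_windows(paragraphs, window_size=3, overlap=1))
# [['p1', 'p2', 'p3'], ['p3', 'p4', 'p5']]
```

The repeated paragraph (`p3` above) is what preserves local context, and it is also the source of the extra cost noted in the table.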

3.2 Legal Document Chunker with Citation Tracking

import anthropic
import json
import re
from dataclasses import dataclass, field
from typing import Optional

# Initialize client
client = anthropic.Anthropic()


@dataclass
class DocumentChunk:
    chunk_id: str
    section_title: str
    text: str
    start_paragraph: int
    end_paragraph: int
    token_estimate: int


@dataclass
class CitedFact:
    fact: str
    source_chunk_id: str
    source_paragraph: int
    confidence: float


@dataclass
class SectionSummary:
    chunk_id: str
    section_title: str
    summary: str
    key_facts: list  # list of CitedFact
    parties_mentioned: list
    dates_mentioned: list
    obligations: list


def chunk_legal_document(document_text: str, max_tokens_per_chunk: int = 4000) -> list:
    """Split a legal document into chunks based on section boundaries."""
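    # Split at line starts that look like section headings, e.g. "ARTICLE 2" or "3. PAYMENT".
    # Anchoring with ^ (multiline mode) avoids splitting mid-sentence or inside multi-digit numbers.
    section_pattern = r'(?m)^(?=(?:ARTICLE|SECTION|CLAUSE|PART)\s+\d+|\d+\.\s+[A-Z])'
    raw_sections = re.split(section_pattern, document_text)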

    chunks = []
    current_paragraph = 1

    for i, section in enumerate(raw_sections):
        if not section.strip():
            continue

        # Extract section title from first line
        lines = section.strip().split('\n')
        title = lines[0].strip() if lines else f"Section {i + 1}"

        # Estimate tokens (rough: 1 token ≈ 4 chars)
        token_estimate = len(section) // 4

        # If section exceeds max tokens, split further by paragraphs
        if token_estimate > max_tokens_per_chunk:
            # Skip empty fragments so blank runs do not inflate paragraph numbers
            paragraphs = [p for p in section.split('\n\n') if p.strip()]
            sub_chunk_text = ""
            sub_chunk_start = current_paragraph
            is_first_piece = True  # only later pieces are labeled "(continued)"

            for para in paragraphs:
                if (len(sub_chunk_text) + len(para)) // 4 > max_tokens_per_chunk:
                    if sub_chunk_text:
                        chunks.append(DocumentChunk(
                            chunk_id=f"chunk_{len(chunks) + 1:03d}",
                            section_title=title if is_first_piece else f"{title} (continued)",
                            text=sub_chunk_text.strip(),
                            start_paragraph=sub_chunk_start,
                            end_paragraph=current_paragraph - 1,
                            token_estimate=len(sub_chunk_text) // 4
                        ))
                        is_first_piece = False
                    sub_chunk_text = para + "\n\n"
                    sub_chunk_start = current_paragraph
                else:
                    sub_chunk_text += para + "\n\n"
                current_paragraph += 1

            if sub_chunk_text.strip():
                chunks.append(DocumentChunk(
                    chunk_id=f"chunk_{len(chunks) + 1:03d}",
                    section_title=title if is_first_piece else f"{title} (continued)",
                    text=sub_chunk_text.strip(),
                    start_paragraph=sub_chunk_start,
                    end_paragraph=current_paragraph - 1,
                    token_estimate=len(sub_chunk_text) // 4
                ))
        else:
            # Count non-empty paragraphs only; the raw section usually ends with a blank line
            paragraph_count = sum(1 for p in section.split('\n\n') if p.strip())
            chunks.append(DocumentChunk(
                chunk_id=f"chunk_{len(chunks) + 1:03d}",
                section_title=title,
                text=section.strip(),
                start_paragraph=current_paragraph,
                end_paragraph=current_paragraph + paragraph_count - 1,
                token_estimate=token_estimate
            ))
            current_paragraph += paragraph_count

    return chunks


# Define extraction tool for structured output
LEGAL_EXTRACTION_TOOL = {
    "name": "extract_legal_summary",
    "description": "Extract structured information from a legal document section.",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string", "description": "Concise summary of this section (2-4 sentences)"},
            "key_facts": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "fact": {"type": "string"},
                        "source_paragraph": {"type": "integer", "description": "Paragraph number in the source chunk"},
                        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0}
                    },
                    "required": ["fact", "source_paragraph", "confidence"]
                }
            },
            "parties_mentioned": {"type": "array", "items": {"type": "string"}},
            "dates_mentioned": {"type": "array", "items": {"type": "string"}},
            "obligations": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "obligor": {"type": "string", "description": "Party with the obligation"},
                        "obligation": {"type": "string", "description": "What they must do"},
                        "deadline": {"type": "string", "description": "When it must be done (if specified)"}
                    },
                    "required": ["obligor", "obligation"]
                }
            }
        },
        "required": ["summary", "key_facts", "parties_mentioned", "dates_mentioned", "obligations"]
    }
}


def summarize_chunk(chunk: DocumentChunk) -> SectionSummary:
    """Summarize a single chunk with citation tracking."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
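        system="You are a legal document analyst. Extract key information from the provided section, preserving paragraph references for citation tracking. Report each source_paragraph as a 1-based index within the provided text, counting its first paragraph as 1.",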
        tools=[LEGAL_EXTRACTION_TOOL],
        tool_choice={"type": "tool", "name": "extract_legal_summary"},
        messages=[{
            "role": "user",
            "content": f"Section: {chunk.section_title}\nParagraphs {chunk.start_paragraph}-{chunk.end_paragraph}\n\n{chunk.text}"
        }]
    )

    for block in response.content:
        if block.type == "tool_use":
            data = block.input
            return SectionSummary(
                chunk_id=chunk.chunk_id,
                section_title=chunk.section_title,
                summary=data["summary"],
                key_facts=[
                    CitedFact(
                        fact=f["fact"],
                        source_chunk_id=chunk.chunk_id,
                        source_paragraph=f["source_paragraph"] + chunk.start_paragraph - 1,
                        confidence=f["confidence"]
                    ) for f in data["key_facts"]
                ],
                parties_mentioned=data["parties_mentioned"],
                dates_mentioned=data["dates_mentioned"],
                obligations=data["obligations"]
            )

    raise ValueError(f"No extraction returned for chunk {chunk.chunk_id}")


# Example: Process a sample contract
sample_contract = """
ARTICLE 1 - DEFINITIONS

1.1 "Agreement" means this Master Services Agreement between Acme Corp ("Provider")
and GlobalTech Inc ("Client"), effective as of January 15, 2026.

1.2 "Services" means the cloud computing infrastructure services described in
Exhibit A, including compute, storage, and networking resources.

ARTICLE 2 - TERM AND TERMINATION

2.1 Initial Term. This Agreement shall commence on the Effective Date and continue
for a period of thirty-six (36) months unless earlier terminated.

2.2 Termination for Convenience. Either party may terminate this Agreement upon
ninety (90) days prior written notice to the other party.

2.3 Termination for Cause. Either party may terminate immediately upon written
notice if the other party materially breaches and fails to cure within thirty (30) days.

ARTICLE 3 - PAYMENT OBLIGATIONS

3.1 Fees. Client shall pay Provider the fees set forth in Exhibit B within thirty (30)
days of receipt of each monthly invoice. Late payments accrue interest at 1.5% per month.

3.2 Annual Escalation. Fees shall increase by no more than 5% annually, with Provider
giving at least sixty (60) days notice of any increase.
"""

chunks = chunk_legal_document(sample_contract)
print(f"Document split into {len(chunks)} chunks:")
for chunk in chunks:
    print(f"  {chunk.chunk_id}: {chunk.section_title} ({chunk.token_estimate} tokens, ¶{chunk.start_paragraph}-{chunk.end_paragraph})")
    summary = summarize_chunk(chunk)
    print(f"    Summary: {summary.summary[:100]}...")
    print(f"    Obligations: {len(summary.obligations)} found")
    print(f"    Cited facts: {len(summary.key_facts)} extracted")
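
Because paragraph numbers come back from the model, a cheap sanity check is to confirm that each cited paragraph falls inside the range of the chunk it was extracted from. The sketch below uses a simplified stand-in for `CitedFact` (the `Citation` class is hypothetical) so that it runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    fact: str
    source_paragraph: int  # absolute paragraph number in the document


def out_of_range_citations(citations: list, start_paragraph: int, end_paragraph: int) -> list:
    """Return citations whose paragraph number lies outside the chunk's range."""
    return [
        c for c in citations
        if not start_paragraph <= c.source_paragraph <= end_paragraph
    ]


# A chunk covering paragraphs 4-7 should not produce a citation to paragraph 9.
citations = [
    Citation("Initial term is 36 months", 5),
    Citation("Either party may terminate on 90 days notice", 6),
    Citation("Fees increase at most 5% annually", 9),
]
suspect = out_of_range_citations(citations, start_paragraph=4, end_paragraph=7)
print([c.fact for c in suspect])  # ['Fees increase at most 5% annually']
```

An out-of-range citation does not prove the fact is wrong, but it is a reasonable trigger for re-running the chunk or flagging it for an analyst.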

4. Accuracy & Validation

Legal summarization demands higher accuracy than most AI tasks — an incorrect obligation or missed termination clause can have significant financial consequences. The hierarchical approach gives us natural validation checkpoints.

4.1 Hierarchical Summarizer with Accuracy Scoring

The hierarchical approach works in three phases: (1) chunk the document by sections, (2) summarize each section independently with citations, and (3) produce a final document-level summary from the section summaries. Once the final summary exists, we compute a heuristic accuracy score from citation coverage and cross-reference consistency between the section summaries and the final output.

import anthropic
import json
from dataclasses import dataclass, field
from typing import Optional

# Initialize client
client = anthropic.Anthropic()


@dataclass
class DocumentSummary:
    title: str
    parties: list
    effective_date: str
    term: str
    executive_summary: str
    section_summaries: list  # list of SectionSummary
    all_obligations: list
    key_dates: list
    termination_clauses: list
    accuracy_score: float  # 0.0 to 1.0


FINAL_SUMMARY_TOOL = {
    "name": "produce_document_summary",
    "description": "Produce a final document-level summary from section summaries.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "parties": {"type": "array", "items": {"type": "string"}},
            "effective_date": {"type": "string"},
            "term": {"type": "string", "description": "Duration/term of the agreement"},
            "executive_summary": {"type": "string", "description": "3-5 sentence executive summary"},
            "key_obligations": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "party": {"type": "string"},
                        "obligation": {"type": "string"},
                        "deadline": {"type": "string"},
                        "source_section": {"type": "string"}
                    },
                    "required": ["party", "obligation", "source_section"]
                }
            },
            "termination_clauses": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "type": {"type": "string", "enum": ["convenience", "cause", "expiration", "mutual"]},
                        "notice_period": {"type": "string"},
                        "conditions": {"type": "string"},
                        "source_section": {"type": "string"}
                    },
                    "required": ["type", "conditions", "source_section"]
                }
            },
            "key_dates": {"type": "array", "items": {"type": "string"}},
            "risks_and_notes": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["title", "parties", "effective_date", "term", "executive_summary", "key_obligations", "termination_clauses", "key_dates"]
    }
}


def compute_accuracy_score(section_summaries: list, final_summary: dict) -> float:
    """Compute accuracy score based on citation coverage and consistency."""
    score = 1.0
    penalties = []

    # Check 1: All parties from sections appear in final summary
    all_parties = set()
    for s in section_summaries:
        all_parties.update(s.parties_mentioned)
    final_parties = set(final_summary.get("parties", []))
    missing_parties = all_parties - final_parties
    if missing_parties:
        penalty = 0.1 * len(missing_parties)
        penalties.append(f"Missing parties: {missing_parties} (-{penalty:.2f})")
        score -= penalty

    # Check 2: All obligations have source_section references
    obligations = final_summary.get("key_obligations", [])
    unreferenced = sum(1 for o in obligations if not o.get("source_section"))
    if unreferenced:
        penalty = 0.05 * unreferenced
        penalties.append(f"Unreferenced obligations: {unreferenced} (-{penalty:.2f})")
        score -= penalty

    # Check 3: Dates mentioned in sections appear in final key_dates
    all_dates = set()
    for s in section_summaries:
        all_dates.update(s.dates_mentioned)
    final_dates = set(final_summary.get("key_dates", []))
    # Coarse check: dates may be reformatted, so only a final summary with no dates at all is penalized
    if all_dates and not final_dates:
        penalties.append("No dates in final summary (-0.15)")
        score -= 0.15

    # Check 4 (coarse proxy): high-confidence facts exist but the final summary lists no obligations
    high_confidence_facts = []
    for s in section_summaries:
        high_confidence_facts.extend([f for f in s.key_facts if f.confidence >= 0.9])
    if high_confidence_facts and not obligations:
        penalties.append("High-confidence facts not reflected (-0.2)")
        score -= 0.2

    score = max(0.0, min(1.0, score))

    if penalties:
        print(f"  Accuracy penalties: {'; '.join(penalties)}")

    return round(score, 3)


def hierarchical_summarize(document_text: str) -> DocumentSummary:
    """Full hierarchical summarization pipeline with accuracy scoring."""
    # Phase 1: Chunk
    chunks = chunk_legal_document(document_text)
    print(f"Phase 1: Split into {len(chunks)} chunks")

    # Phase 2: Summarize each section
    section_summaries = []
    for chunk in chunks:
        summary = summarize_chunk(chunk)
        section_summaries.append(summary)
        print(f"Phase 2: Summarized {chunk.section_title}")

    # Phase 3: Produce final summary from section summaries
    combined_context = "\n\n".join([
        f"## {s.section_title}\n{s.summary}\nObligations: {json.dumps(s.obligations)}\nDates: {s.dates_mentioned}\nParties: {s.parties_mentioned}"
        for s in section_summaries
    ])

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system="You are a legal analyst producing a final document summary from section-level summaries. Preserve all source_section citations.",
        tools=[FINAL_SUMMARY_TOOL],
        tool_choice={"type": "tool", "name": "produce_document_summary"},
        messages=[{
            "role": "user",
            "content": f"Produce a final document summary from these section summaries:\n\n{combined_context}"
        }]
    )

    final_data = None
    for block in response.content:
        if block.type == "tool_use":
            final_data = block.input
            break

    if not final_data:
        raise ValueError("No final summary produced")

    # Post-processing: score the final summary against the section summaries
    accuracy = compute_accuracy_score(section_summaries, final_data)
    print(f"Phase 3: Final summary produced (accuracy: {accuracy})")

    return DocumentSummary(
        title=final_data["title"],
        parties=final_data["parties"],
        effective_date=final_data["effective_date"],
        term=final_data["term"],
        executive_summary=final_data["executive_summary"],
        section_summaries=section_summaries,
        all_obligations=final_data["key_obligations"],
        key_dates=final_data["key_dates"],
        termination_clauses=final_data["termination_clauses"],
        accuracy_score=accuracy
    )


# Run the full pipeline
result = hierarchical_summarize(sample_contract)
print(f"\n{'='*60}")
print(f"Document: {result.title}")
print(f"Parties: {', '.join(result.parties)}")
print(f"Term: {result.term}")
print(f"Executive Summary: {result.executive_summary}")
print(f"Obligations: {len(result.all_obligations)}")
print(f"Termination Clauses: {len(result.termination_clauses)}")
print(f"Accuracy Score: {result.accuracy_score}")
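
The date check in `compute_accuracy_score` only penalizes a final summary with no dates at all. A somewhat stricter variant, sketched below, normalizes dates before comparing them so that "January 15, 2026" and "2026-01-15" count as the same date. The accepted formats are an assumption (including US-style month/day/year), and both helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Assumed input formats; extend this tuple for the documents you actually see.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%b %d, %Y", "%m/%d/%Y")


def normalize_date(text: str) -> Optional[date]:
    """Parse a date string in any known format, or return None if none match."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def missing_dates(section_dates: list, final_dates: list) -> list:
    """Return section-level dates that parse cleanly but are absent from the final summary."""
    final_set = {normalize_date(d) for d in final_dates} - {None}
    missing = []
    for raw in section_dates:
        parsed = normalize_date(raw)
        if parsed is not None and parsed not in final_set:
            missing.append(raw)
    return missing


print(missing_dates(["January 15, 2026", "March 1, 2026"], ["2026-01-15"]))
# ['March 1, 2026']
```

Strings that cannot be parsed, such as "ninety days after notice", are skipped rather than penalized, which keeps relative deadlines from producing false alarms.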

4.2 Summarization Pyramid

Hierarchical Summarization Pipeline

flowchart TD
    A[100+ Page Legal Document] --> B[Section Boundary Detection]
    B --> C1[Chunk 1: Definitions]
    B --> C2[Chunk 2: Term & Termination]
    B --> C3[Chunk 3: Payment]
    B --> C4[Chunk N: Miscellaneous]
    C1 --> D1[Section Summary + Citations]
    C2 --> D2[Section Summary + Citations]
    C3 --> D3[Section Summary + Citations]
    C4 --> D4[Section Summary + Citations]
    D1 --> E[Accuracy Validation]
    D2 --> E
    D3 --> E
    D4 --> E
    E --> F[Final Document Summary]
    F --> G{Accuracy ≥ 0.85?}
    G -->|Yes| H[Deliver to User]
    G -->|No| I[Flag for Human Review]
    I --> J[Analyst Reviews + Corrects]
    J --> K[Correction Feedback Loop]
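
The decision node at the bottom of the diagram can be sketched as a small routing function. This is a minimal illustration rather than the pipeline's actual implementation: `ReviewDecision`, `route_summary`, and the destination labels are hypothetical names, and the only value taken from the diagram is the 0.85 threshold.

```python
from dataclasses import dataclass

ACCURACY_THRESHOLD = 0.85  # taken from the diagram's decision node


@dataclass
class ReviewDecision:
    destination: str  # "deliver" or "human_review" (hypothetical labels)
    reason: str


def route_summary(accuracy_score: float, threshold: float = ACCURACY_THRESHOLD) -> ReviewDecision:
    """Deliver summaries at or above the threshold; flag the rest for an analyst."""
    if not 0.0 <= accuracy_score <= 1.0:
        raise ValueError(f"accuracy_score must be in [0, 1], got {accuracy_score}")
    if accuracy_score >= threshold:
        return ReviewDecision("deliver", f"accuracy {accuracy_score:.2f} >= {threshold:.2f}")
    return ReviewDecision("human_review", f"accuracy {accuracy_score:.2f} < {threshold:.2f}")


# Example: a summary scoring 0.80 falls below the gate and goes to an analyst
decision = route_summary(0.80)
print(decision.destination, "|", decision.reason)
```

Keeping the threshold as a parameter makes it easy to tighten the gate for high-stakes document types without touching the summarizer itself.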

5. Production Patterns

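Moving from prototype to production requires addressing three challenges: scale (millions of items per day), cost (per-token pricing adds up), and latency (moderation must not block the user experience). Model cascading and batch processing address them from two directions: cascading keeps real-time screening fast and cheap, while batch processing lowers the cost of work that can wait.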

5.1 Model Cascading for Moderation

Model cascading uses a fast, cheap model for initial screening and a slower, more capable model only for edge cases. For content moderation, this usually means Haiku screens the full stream and Sonnet re-classifies only the uncertain minority.

import anthropic
import json
from dataclasses import dataclass
from typing import Optional

# Initialize client
client = anthropic.Anthropic()

# Reuse the MODERATION_TOOL definition from Section 1
MODERATION_TOOL = {
    "name": "classify_content",
    "description": "Classify content for moderation with severity and categories.",
    "input_schema": {
        "type": "object",
        "properties": {
            "severity": {"type": "string", "enum": ["safe", "borderline", "violation", "critical"]},
            "categories": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "label": {"type": "string", "enum": ["harassment", "hate_speech", "spam", "explicit", "misinformation", "self_harm", "violence", "illegal_activity"]},
                        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0}
                    },
                    "required": ["label", "confidence"]
                }
            },
            "explanation": {"type": "string"},
            "overall_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0}
        },
        "required": ["severity", "categories", "explanation", "overall_confidence"]
    }
}

SYSTEM_PROMPT = """You are a content moderation classifier. Classify content using the classify_content tool.
Severity levels: safe, borderline, violation, critical.
Be calibrated: report your true confidence. If unsure, use a lower confidence score."""


@dataclass
class CascadeResult:
    final_classification: dict
    model_used: str
    was_escalated: bool
    total_input_tokens: int
    total_output_tokens: int


def classify_with_cascade(
    text: str,
    escalation_threshold: float = 0.90,
    fast_model: str = "claude-haiku-4-5-20251001",  # check the current model list before deploying
    strong_model: str = "claude-sonnet-4-20250514"
) -> CascadeResult:
    """Two-tier model cascade: fast model screens, strong model adjudicates edge cases."""

    # Tier 1: Fast model (Haiku) screens all content
    tier1_response = client.messages.create(
        model=fast_model,
        max_tokens=512,
        system=SYSTEM_PROMPT,
        tools=[MODERATION_TOOL],
        tool_choice={"type": "tool", "name": "classify_content"},
        messages=[{"role": "user", "content": f"Classify: {text}"}]
    )

    tier1_result = None
    for block in tier1_response.content:
        if block.type == "tool_use":
            tier1_result = block.input
            break

    # Decision: is Haiku confident enough to skip escalation?
    if tier1_result and tier1_result.get("overall_confidence", 0.0) >= escalation_threshold:
        # High confidence — accept Haiku's classification
        return CascadeResult(
            final_classification=tier1_result,
            model_used=fast_model,
            was_escalated=False,
            total_input_tokens=tier1_response.usage.input_tokens,
            total_output_tokens=tier1_response.usage.output_tokens
        )

    # Tier 2: Strong model (Sonnet) re-classifies uncertain content
    tier2_response = client.messages.create(
        model=strong_model,
        max_tokens=1024,
        system=SYSTEM_PROMPT + "\n\nA fast classifier was uncertain about this content. Provide your independent assessment.",
        tools=[MODERATION_TOOL],
        tool_choice={"type": "tool", "name": "classify_content"},
        messages=[{"role": "user", "content": f"Classify (needs careful review): {text}"}]
    )

    tier2_result = None
    for block in tier2_response.content:
        if block.type == "tool_use":
            tier2_result = block.input
            break

    # Fall back to the tier-1 result only if tier 2 returned no tool call
    final = tier2_result or tier1_result
    if final is None:
        raise ValueError("Neither model returned a classification")

    return CascadeResult(
        final_classification=final,
        model_used=strong_model if tier2_result else fast_model,
        was_escalated=True,
        total_input_tokens=tier1_response.usage.input_tokens + tier2_response.usage.input_tokens,
        total_output_tokens=tier1_response.usage.output_tokens + tier2_response.usage.output_tokens
    )


# Demonstrate cascade behavior
test_items = [
    "I love this product! Best purchase ever.",                          # Clear safe
    "You absolute moron, go die in a fire",                              # Clear violation
    "This politician's immigration policy is destroying our country",    # Borderline — needs Sonnet
    "BREAKING: Scientists confirm earth is flat (satire account)",       # Ambiguous — misinformation or humor?
]

total_escalated = 0
for text in test_items:
    result = classify_with_cascade(text)
    total_escalated += int(result.was_escalated)
    print(f"\n'{text[:50]}...'")
    print(f"  Model: {result.model_used.split('-')[1]} | Escalated: {result.was_escalated}")
    print(f"  Severity: {result.final_classification['severity']} ({result.final_classification['overall_confidence']:.0%})")
    print(f"  Tokens: {result.total_input_tokens + result.total_output_tokens}")

print(f"\nEscalation rate: {total_escalated}/{len(test_items)} ({total_escalated/len(test_items):.0%})")

                        
                        CCA Exam Tip — Model Cascading: The CCA exam often presents scenarios where you must choose between using a single powerful model vs. a cascade. The safer default answer is usually to cascade: use Haiku for high-volume screening and Sonnet/Opus only for the uncertain remainder. The reasoning to cite is lower cost, similar quality on clear cases, and stronger performance from Sonnet/Opus on genuinely ambiguous content. Remember that overall_confidence in the tool output is your escalation signal.
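
The cost argument for cascading comes down to simple arithmetic: every item pays for the fast model, and only escalated items also pay for the strong one. The sketch below works through that calculation. The function names and the per-item costs are assumptions chosen for illustration (arbitrary units, not real pricing), so substitute measured token counts and current prices before drawing conclusions.

```python
def cascade_cost_per_item(fast_cost: float, strong_cost: float, escalation_rate: float) -> float:
    """Every item pays for the fast model; escalated items also pay for the strong model."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return fast_cost + escalation_rate * strong_cost


def break_even_escalation_rate(fast_cost: float, strong_cost: float) -> float:
    """Escalation rate above which the cascade costs more than the strong model alone."""
    return 1.0 - fast_cost / strong_cost


# Hypothetical per-item costs in arbitrary units (NOT real pricing)
FAST, STRONG = 1.0, 4.0

cost = cascade_cost_per_item(FAST, STRONG, escalation_rate=0.25)
print(cost)                                      # 2.0, half the strong-only cost of 4.0
print(break_even_escalation_rate(FAST, STRONG))  # 0.75
```

The escalation rate is an empirical quantity that depends on the threshold and on your traffic, so measure it on a representative sample before committing to a cascade.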
                    

5.2 Batch Processing at Scale

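For non-real-time moderation (reported content, pre-publication review, bulk scanning), the Message Batches API accepts up to 100,000 requests per batch at a 50% discount. Processing is asynchronous: results become available once the whole batch has ended, and requests not processed within 24 hours expire, so treat 24 hours as a processing window rather than a guaranteed turnaround.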

import anthropic
import json
from datetime import datetime, timezone

# Initialize client
client = anthropic.Anthropic()

# Moderation tool definition (same as Section 1)
MODERATION_TOOL = {
    "name": "classify_content",
    "description": "Classify content for moderation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "severity": {"type": "string", "enum": ["safe", "borderline", "violation", "critical"]},
            "categories": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "label": {"type": "string"},
                        "confidence": {"type": "number"}
                    },
                    "required": ["label", "confidence"]
                }
            },
            "explanation": {"type": "string"},
            "overall_confidence": {"type": "number"}
        },
        "required": ["severity", "categories", "explanation", "overall_confidence"]
    }
}


def create_moderation_batch(content_items: list) -> str:
    """Submit a batch of content items for moderation classification."""
    requests = []
    for i, item in enumerate(content_items):
        requests.append({
            "custom_id": f"mod_{i:06d}",
            "params": {
                "model": "claude-haiku-4-5-20251001",  # check the current model list before deploying
                "max_tokens": 512,
                "system": "You are a content moderation classifier. Classify using the tool.",
                "tools": [MODERATION_TOOL],
                "tool_choice": {"type": "tool", "name": "classify_content"},
                "messages": [{"role": "user", "content": f"Classify: {item}"}]
            }
        })

    # Create the batch
    batch = client.messages.batches.create(requests=requests)
    print(f"Batch created: {batch.id}")
    print(f"  Items: {len(requests)}")
    print(f"  Status: {batch.processing_status}")
    print(f"  Created: {datetime.now(timezone.utc).isoformat()}")
    return batch.id


def check_batch_results(batch_id: str) -> list:
    """Check batch status and retrieve results when complete."""
    batch = client.messages.batches.retrieve(batch_id)
    print(f"Batch {batch_id}: {batch.processing_status}")
    print(f"  Succeeded: {batch.request_counts.succeeded}")
    print(f"  Failed: {batch.request_counts.errored}")
    print(f"  Processing: {batch.request_counts.processing}")

    if batch.processing_status != "ended":
        return []

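    # Stream results; order is not guaranteed, so match items by custom_id
    results = []
    for result in client.messages.batches.results(batch_id):
        if result.result.type != "succeeded":
            # errored, canceled, or expired: report and skip so it can be retried
            print(f"  {result.custom_id}: {result.result.type}")
            continue
        for block in result.result.message.content:
            if block.type == "tool_use":
                results.append({
                    "custom_id": result.custom_id,
                    "classification": block.input
                })
    return results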


# Example: Submit batch of reported content
reported_content = [
    "Check out my amazing weight loss supplement! Buy now at scamsite.com",
    "I think the new policy changes are concerning and here's why...",
    "You're all pathetic losers who deserve to suffer",
    "Here's my recipe for chocolate cake: mix flour, sugar, eggs...",
    "BREAKING NEWS: Celebrity secretly a lizard person (obvious satire)",
]

batch_id = create_moderation_batch(reported_content)
print(f"\nBatch submitted. Poll with: check_batch_results('{batch_id}')")
print("Batch pricing is 50% lower; requests not processed within 24 hours expire.")
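
Because batches finish asynchronously, something has to poll for completion. The helper below is one possible sketch of that loop with capped exponential backoff. `wait_for_batch` and its parameters are hypothetical names introduced here; the status lookup is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch_status: Callable[[], str],
    poll_interval: float = 60.0,
    max_interval: float = 900.0,
    timeout: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the batch reports "ended"; return False if the timeout elapses first."""
    waited = 0.0
    interval = poll_interval
    while True:
        if fetch_status() == "ended":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)  # exponential backoff, capped


# Usage with the client and batch_id from the example above (requires network access):
# done = wait_for_batch(lambda: client.messages.batches.retrieve(batch_id).processing_status)
# if done:
#     results = check_batch_results(batch_id)
```

In a real deployment this loop would more likely live in a scheduled job than in a blocking call, but the control flow is the same.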

                        
                        Production Deployment Tip: Never rely solely on automated moderation for critical safety decisions. Always keep a human in the loop for content in the borderline zone. Reviewer feedback (agree or disagree with the classifier) is your improvement signal: track the false positive rate weekly, compare it to your own policy threshold, and adjust the prompt or routing logic if it drifts upward. Store all human decisions as labeled examples for prompt refinement.
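
The weekly false positive check can be sketched in a few lines. This is an illustration under stated assumptions: `ReviewRecord`, `false_positive_rate`, and `fpr_has_drifted` are hypothetical names, the sample data is made up, and the calculation assumes reviewers also audit a sample of items the classifier passed, since a true false positive rate needs human-judged clean items in the denominator.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    classifier_flagged: bool  # classifier marked the item as a violation
    human_violation: bool     # reviewer's final judgment


def false_positive_rate(records: list[ReviewRecord]) -> float:
    """FP / (FP + TN): share of human-judged clean items that the classifier flagged."""
    clean = [r for r in records if not r.human_violation]
    if not clean:
        return 0.0
    false_positives = sum(r.classifier_flagged for r in clean)
    return false_positives / len(clean)


def fpr_has_drifted(records: list[ReviewRecord], policy_threshold: float) -> bool:
    """True when the observed rate exceeds the team's own policy threshold."""
    return false_positive_rate(records) > policy_threshold


# Made-up week of review decisions
week = [
    ReviewRecord(True, True),
    ReviewRecord(True, False),   # false positive
    ReviewRecord(False, False),
    ReviewRecord(False, False),
    ReviewRecord(False, False),
]
print(false_positive_rate(week))    # 0.25 (1 of 4 clean items was flagged)
print(fpr_has_drifted(week, 0.10))  # True
```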
                    

Next in the SDK Track

In Part 20: Guardrails — Hallucinations & Consistency, we’ll cover CCA Phase 4.1 and 4.2 — reducing hallucinations with source grounding, nullable schema fields, temperature control for determinism, and eval-driven consistency.