1. Content Moderation System (CCA 1.4)
Content moderation at scale requires a system that classifies content across multiple violation categories simultaneously, assigns severity levels, and routes decisions appropriately. Claude’s tool_use feature provides the structured output format needed for reliable automation.
1.1 Severity Tiers & Classification Labels
A production moderation system operates on two axes: severity (how bad is it?) and category (what type of violation?). The CCA exam expects you to implement multi-label classification where a single piece of content can trigger multiple categories.
| Severity | Action | Response Time | Example |
|---|---|---|---|
| Safe | Pass through | N/A | Normal user content |
| Borderline | Human review queue | 4 hours | Edgy humor, ambiguous context |
| Violation | Auto-remove + notify | Immediate | Clear hate speech, explicit content |
| Critical | Auto-remove + escalate + legal | Immediate | CSAM, terrorism, imminent threats |
Classification labels operate independently — content can be both harassment and hate_speech, or spam and misinformation. Each label carries its own confidence score.
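Because each label carries its own score, downstream code can filter labels independently of the overall severity. The following is a minimal sketch of that idea; `labels_above` and the sample scores are hypothetical and only illustrate the shape of the `categories` array.

```python
from typing import Dict, List


def labels_above(categories: List[Dict], threshold: float = 0.5) -> List[str]:
    """Return labels whose own confidence meets the threshold, strongest first."""
    kept = [c for c in categories if c["confidence"] >= threshold]
    kept.sort(key=lambda c: c["confidence"], reverse=True)
    return [c["label"] for c in kept]


# Hypothetical classifier output: one post, three independently scored labels
example_categories = [
    {"label": "hate_speech", "confidence": 0.62},
    {"label": "harassment", "confidence": 0.91},
    {"label": "spam", "confidence": 0.08},
]

print(labels_above(example_categories))
```

The 0.5 default is an arbitrary illustration value, not a recommended threshold.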
1.2 Complete Moderation Classifier
This classifier uses tool_use with tool_choice: {"type": "tool", "name": "classify_content"} to force Claude to produce structured output for every input. The tool schema defines the exact fields needed for downstream routing.
import anthropic
import json
from dataclasses import dataclass, field
from typing import Optional
# Initialize client
client = anthropic.Anthropic()
# Define the moderation classification tool
MODERATION_TOOL = {
"name": "classify_content",
"description": "Classify content for moderation. Assign severity and all applicable violation categories with confidence scores.",
"input_schema": {
"type": "object",
"properties": {
"severity": {
"type": "string",
"enum": ["safe", "borderline", "violation", "critical"],
"description": "Overall severity level determining the action taken"
},
"categories": {
"type": "array",
"items": {
"type": "object",
"properties": {
"label": {
"type": "string",
"enum": ["harassment", "hate_speech", "spam", "explicit", "misinformation", "self_harm", "violence", "illegal_activity"]
},
"confidence": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"description": "Confidence score for this category (0.0 to 1.0)"
}
},
"required": ["label", "confidence"]
},
"description": "All applicable violation categories with confidence scores"
},
"explanation": {
"type": "string",
"description": "Brief explanation of the classification decision for audit logs"
},
"overall_confidence": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"description": "Overall confidence in the severity assessment"
}
},
"required": ["severity", "categories", "explanation", "overall_confidence"]
}
}
MODERATION_SYSTEM_PROMPT = """You are a content moderation classifier. Analyze the provided content and classify it using the classify_content tool.
Classification guidelines:
- safe: Content that clearly does not violate any policy
- borderline: Content that is ambiguous, could be interpreted as violating policy depending on context
- violation: Content that clearly violates policy and should be removed
- critical: Content involving imminent physical danger, CSAM, or terrorism that requires immediate escalation
For categories, assign ALL applicable labels with honest confidence scores:
- harassment: Targeted attacks, bullying, or intimidation of individuals
- hate_speech: Content attacking protected groups based on identity characteristics
- spam: Unsolicited commercial content, scams, or coordinated inauthentic behavior
- explicit: Sexually explicit content or graphic violence
- misinformation: Demonstrably false claims about health, elections, or safety
- self_harm: Content promoting or instructing self-harm or suicide
- violence: Threats or glorification of violence against individuals/groups
- illegal_activity: Content promoting or facilitating clearly illegal actions
Be calibrated: a 0.9 confidence means you are correct 90% of the time at that threshold."""
def classify_content(text: str, context: Optional[str] = None) -> dict:
    """Classify content using Claude with forced structured output."""
    user_message = f"Classify the following content for moderation:\n\n---\n{text}\n---"
    if context:
        user_message += f"\n\nAdditional context: {context}"

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=MODERATION_SYSTEM_PROMPT,
        tools=[MODERATION_TOOL],
        tool_choice={"type": "tool", "name": "classify_content"},
        messages=[{"role": "user", "content": user_message}]
    )

    # Extract the tool use result
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    raise ValueError("No classification returned from model")
# Test with example content
test_cases = [
"Hey everyone! Check out my new blog post about gardening tips.",
"You're such a worthless piece of garbage, nobody likes you.",
"Studies show that vaccine X causes autism in 100% of cases (completely fabricated).",
"I'm going to find where you live and make you pay for what you said.",
]
for content in test_cases:
    result = classify_content(content)
    print(f"\nContent: {content[:60]}...")
    print(f" Severity: {result['severity']}")
    print(f" Categories: {[(c['label'], c['confidence']) for c in result['categories']]}")
    print(f" Confidence: {result['overall_confidence']}")
    print(f" Explanation: {result['explanation']}")
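Forcing the tool ensures a `tool_use` block comes back, but it is still worth confirming that the returned fields respect the schema's enums and ranges before routing on them. The sketch below is one way such a defensive check might look; `validate_classification` is a hypothetical helper, not part of the Anthropic SDK.

```python
ALLOWED_SEVERITIES = {"safe", "borderline", "violation", "critical"}
ALLOWED_LABELS = {
    "harassment", "hate_speech", "spam", "explicit",
    "misinformation", "self_harm", "violence", "illegal_activity",
}


def _is_probability(value) -> bool:
    """True for a real number in [0, 1]."""
    return isinstance(value, (int, float)) and 0.0 <= value <= 1.0


def validate_classification(result: dict) -> list:
    """Return a list of problems; an empty list means the result looks usable."""
    problems = []
    if result.get("severity") not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {result.get('severity')!r}")
    if not _is_probability(result.get("overall_confidence")):
        problems.append("overall_confidence missing or outside [0, 1]")

    categories = result.get("categories")
    if not isinstance(categories, list):
        problems.append("categories is not a list")
        categories = []
    for index, category in enumerate(categories):
        if not isinstance(category, dict):
            problems.append(f"categories[{index}] is not an object")
            continue
        if category.get("label") not in ALLOWED_LABELS:
            problems.append(f"categories[{index}] has unknown label")
        if not _is_probability(category.get("confidence")):
            problems.append(f"categories[{index}] confidence outside [0, 1]")
    return problems


# Hypothetical outputs: one well-formed, one malformed
good = {"severity": "safe", "categories": [], "explanation": "ok", "overall_confidence": 0.97}
bad = {"severity": "severe", "categories": [{"label": "spam", "confidence": 1.4}], "overall_confidence": 0.5}
print(validate_classification(good))
print(validate_classification(bad))
```

A result with any reported problem could be retried or sent to human review instead of being auto-actioned.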
2. Human Review Routing
Not every moderation decision should be automated. A confidence-based routing system sends clear cases to auto-action while routing ambiguous content to human reviewers. This reduces false positives (wrongly removed content) while maintaining safety.
2.1 Confidence-Based Routing Logic
| Confidence Range | Action | Rationale |
|---|---|---|
| ≥ 0.95 | Auto-action (remove/pass) | High confidence — classifier is reliable at this threshold |
| 0.70 – 0.94 | Human review queue | Uncertain — needs human judgment for context |
| < 0.70 | Pass through (monitor) | Too uncertain to act — log for pattern analysis |
The reasoning behind these thresholds: wrongly removing legitimate content erodes user trust quickly, and a user whose post is removed in error may leave the platform. A borderline post that stays up for up to 4 hours while awaiting review usually causes less damage than removing a legitimate one. Critical content is the exception: it is escalated immediately regardless of confidence.
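The threshold table can be expressed as a small function. This is a sketch of the table only, not of the full router; `route_by_confidence` and its return strings are hypothetical names chosen for illustration. Values between 0.94 and 0.95, which the table leaves unspecified, are treated here as human review.

```python
def route_by_confidence(confidence: float) -> str:
    """Map an overall confidence score to one of the three routing paths."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= 0.95:
        return "auto_action"   # remove or pass automatically
    if confidence >= 0.70:
        return "human_review"  # uncertain: needs human judgment
    return "monitor"           # too uncertain to act: log only


for score in (0.99, 0.80, 0.40):
    print(score, "->", route_by_confidence(score))
```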
2.2 Complete Review Router
This router takes the classifier output and routes to one of three paths: auto-action, human review, or monitoring. It also handles policy gap detection — content that doesn’t fit existing categories but may represent emerging violation patterns.
import anthropic
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
# Initialize client
client = anthropic.Anthropic()
class RouteAction(Enum):
    AUTO_REMOVE = "auto_remove"
    AUTO_PASS = "auto_pass"
    HUMAN_REVIEW = "human_review"
    ESCALATE_LEGAL = "escalate_legal"
    MONITOR = "monitor"


@dataclass
class ModerationDecision:
    action: RouteAction
    severity: str
    categories: list
    overall_confidence: float
    explanation: str
    review_priority: Optional[str] = None  # low, medium, high, urgent
    policy_gap: bool = False
    feedback_id: Optional[str] = None
def route_moderation_decision(classification: dict) -> ModerationDecision:
    """Route a classification result to the appropriate action."""
    severity = classification["severity"]
    confidence = classification["overall_confidence"]
    categories = classification["categories"]

    # Critical severity always escalates immediately
    if severity == "critical":
        return ModerationDecision(
            action=RouteAction.ESCALATE_LEGAL,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"],
            review_priority="urgent"
        )

    # Safe content with high confidence passes through
    if severity == "safe" and confidence >= 0.95:
        return ModerationDecision(
            action=RouteAction.AUTO_PASS,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"]
        )

    # Violation with high confidence auto-removes
    if severity == "violation" and confidence >= 0.95:
        return ModerationDecision(
            action=RouteAction.AUTO_REMOVE,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"]
        )

    # Borderline or medium-confidence cases go to human review
    if severity == "borderline" or (severity == "violation" and confidence < 0.95):
        priority = "high" if severity == "violation" else "medium"
        return ModerationDecision(
            action=RouteAction.HUMAN_REVIEW,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"],
            review_priority=priority
        )

    # Only "safe" results with confidence below 0.95 reach this point, because
    # borderline and violation content was routed above. Very low confidence:
    # monitor but don't act.
    if confidence < 0.70:
        # Check for policy gap: content triggered no strong category match
        has_strong_category = any(c["confidence"] > 0.5 for c in categories)
        return ModerationDecision(
            action=RouteAction.MONITOR,
            severity=severity,
            categories=categories,
            overall_confidence=confidence,
            explanation=classification["explanation"],
            policy_gap=not has_strong_category
        )

    # Default ("safe" with confidence between 0.70 and 0.95): low-priority human review
    return ModerationDecision(
        action=RouteAction.HUMAN_REVIEW,
        severity=severity,
        categories=categories,
        overall_confidence=confidence,
        explanation=classification["explanation"],
        review_priority="low"
    )
def process_content(text: str) -> ModerationDecision:
    """Full pipeline: classify content then route to action."""
    # Step 1: Classify (using the classifier from Section 1)
    classification = classify_content(text)
    # Step 2: Route
    decision = route_moderation_decision(classification)
    # Step 3: Log for feedback loop. Microseconds (%f) are included so that IDs
    # created within the same second are less likely to collide.
    decision.feedback_id = f"mod_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S_%f')}"
    return decision
# Demonstrate routing with test scenarios
test_inputs = [
{"text": "Great article! I learned so much about Python.", "expected": "AUTO_PASS"},
{"text": "Kill all [ethnic group] they don't deserve to live", "expected": "AUTO_REMOVE"},
{"text": "This joke might offend some people but here goes...", "expected": "HUMAN_REVIEW"},
{"text": "Buy cheap watches at scamsite.com!!! 90% OFF!!!", "expected": "AUTO_REMOVE"},
]
for case in test_inputs:
    decision = process_content(case["text"])
    # Enum member names (e.g. AUTO_PASS) match the "expected" strings directly
    status = "✓" if decision.action.name == case["expected"] else "✗"
    print(f"{status} '{case['text'][:50]}...'")
    print(f" Action: {decision.action.value} | Confidence: {decision.overall_confidence}")
    print(f" Priority: {decision.review_priority} | Policy Gap: {decision.policy_gap}")
2.3 Moderation Flow
flowchart TD
A[User Content] --> B[Haiku First-Pass]
B --> C{Severity?}
C -->|Critical| D[Immediate Escalate + Legal Team]
C -->|Violation| E{Confidence ≥ 0.95?}
C -->|Borderline| F[Human Review Queue]
C -->|Safe| G{Confidence ≥ 0.95?}
E -->|Yes| H[Auto-Remove + Notify User]
E -->|No| I[Sonnet Re-classify]
I --> F
G -->|Yes| J[Pass Through]
G -->|No| K[Monitor + Log]
F --> L{Human Decision}
L -->|Remove| M[Remove + Update Feedback]
L -->|Approve| N[Pass + Update Feedback]
M --> O[Retrain Signal]
N --> O
K --> P{Policy Gap?}
P -->|Yes| Q[Flag for Policy Team]
P -->|No| R[Standard Monitoring]
Social Media Platform Moderation at Scale
Consider a social media platform processing millions of posts per day that deploys this two-tier moderation system, with Haiku for first-pass screening and Sonnet for edge cases. In a successful rollout, teams typically look for outcomes such as:
- Detection quality: High accuracy on clear policy-violation cases compared with a legacy moderation pipeline
- False positives: Meaningfully lower over-removal rates on benign content
- Automation: Most content auto-actioned without human review
- Reviewer workload: Human review queue shrinking substantially so reviewers focus on ambiguous cases
- Latency: Borderline-case review windows dropping from many hours to a much shorter operational target
The model-cascading approach (Haiku screens, Sonnet adjudicates) can reduce per-item cost substantially compared with running the higher-capability model on every item while keeping strong moderation quality on ambiguous cases.
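The cost claim follows from simple arithmetic: every item pays for the fast model, and only the escalated fraction also pays for the strong model. The sketch below works through that formula; the per-item prices are made-up placeholders, not actual Haiku or Sonnet pricing.

```python
def blended_cost_per_item(fast_cost: float, strong_cost: float, escalation_rate: float) -> float:
    """Expected cost per item when every item hits the fast model and a
    fraction `escalation_rate` is additionally sent to the strong model."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return fast_cost + escalation_rate * strong_cost


# Placeholder per-item costs (hypothetical, for illustration only)
FAST_COST = 0.0004
STRONG_COST = 0.0030

for rate in (0.05, 0.15, 0.30):
    blended = blended_cost_per_item(FAST_COST, STRONG_COST, rate)
    savings = 1 - blended / STRONG_COST
    print(f"escalation {rate:.0%}: {blended:.5f} per item, {savings:.0%} cheaper than strong-only")
```

The cascade is cheaper than running the strong model on everything only while the escalation rate stays below `1 - fast_cost / strong_cost`.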
3. Legal Document Summarization (CCA 1.5)
Legal documents present unique challenges: they are long (50–200+ pages), highly structured, and require citation preservation — every extracted fact must trace back to a specific paragraph or clause in the source. The CCA exam tests three strategies for handling long documents.
3.1 Long Document Handling Strategies
| Strategy | Best For | Pros | Cons |
|---|---|---|---|
| Chunking | Documents >200K tokens | Works with any document size | Loses cross-chunk context |
| Sliding Window | Narrative documents | Preserves local context | Overlap increases cost |
| Hierarchical | Structured legal docs | Best accuracy + citations | Most complex to implement |
For legal documents, the hierarchical approach works best: chunk by natural section boundaries (articles, clauses), summarize each section independently with citation tracking, then produce a final document-level summary from the section summaries.
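For comparison with the hierarchical approach, the sliding-window row of the table can be sketched as follows. This is an illustrative assumption about how such a chunker might be written (it operates on paragraphs, not tokens); `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(paragraphs: List[str], window: int = 4, overlap: int = 1) -> List[List[str]]:
    """Group paragraphs into windows that share `overlap` paragraphs with the
    previous window, so local context carries across chunk boundaries."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(paragraphs), step):
        chunks.append(paragraphs[start:start + window])
        if start + window >= len(paragraphs):
            break  # the last window already reaches the end of the document
    return chunks


sample_paragraphs = [f"P{i}" for i in range(1, 7)]
for chunk in sliding_window_chunks(sample_paragraphs, window=3, overlap=1):
    print(chunk)
```

Every overlapping paragraph is sent to the model twice, which is the extra cost noted in the table.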
3.2 Legal Document Chunker with Citation Tracking
import anthropic
import json
import re
from dataclasses import dataclass, field
from typing import Optional
# Initialize client
client = anthropic.Anthropic()
@dataclass
class DocumentChunk:
    chunk_id: str
    section_title: str
    text: str
    start_paragraph: int
    end_paragraph: int
    token_estimate: int


@dataclass
class CitedFact:
    fact: str
    source_chunk_id: str
    source_paragraph: int
    confidence: float


@dataclass
class SectionSummary:
    chunk_id: str
    section_title: str
    summary: str
    key_facts: list  # list of CitedFact
    parties_mentioned: list
    dates_mentioned: list
    obligations: list
def chunk_legal_document(document_text: str, max_tokens_per_chunk: int = 4000) -> list:
    """Split a legal document into chunks based on section boundaries."""
    # Split at line starts that look like section headings. Anchoring with ^
    # (MULTILINE) avoids splitting mid-sentence on text such as "in 2026. The".
    section_pattern = r'(?m)^(?=(?:ARTICLE|SECTION|CLAUSE|PART)\s+\d+|\d+\.\s+[A-Z])'
    raw_sections = re.split(section_pattern, document_text)

    chunks = []
    current_paragraph = 1
    for i, section in enumerate(raw_sections):
        body = section.strip()
        if not body:
            continue
        # Extract section title from first line
        title = body.split('\n')[0].strip() or f"Section {i + 1}"
        # Estimate tokens (rough: 1 token ≈ 4 chars)
        token_estimate = len(body) // 4

        # If section exceeds max tokens, split further by paragraphs
        if token_estimate > max_tokens_per_chunk:
            sub_chunk_text = ""
            sub_chunk_start = current_paragraph
            is_first_part = True  # only later parts are labelled "(continued)"
            for para in body.split('\n\n'):
                if sub_chunk_text and (len(sub_chunk_text) + len(para)) // 4 > max_tokens_per_chunk:
                    chunks.append(DocumentChunk(
                        chunk_id=f"chunk_{len(chunks) + 1:03d}",
                        section_title=title if is_first_part else f"{title} (continued)",
                        text=sub_chunk_text.strip(),
                        start_paragraph=sub_chunk_start,
                        end_paragraph=current_paragraph - 1,
                        token_estimate=len(sub_chunk_text) // 4
                    ))
                    is_first_part = False
                    sub_chunk_text = ""
                    sub_chunk_start = current_paragraph
                sub_chunk_text += para + "\n\n"
                current_paragraph += 1
            if sub_chunk_text.strip():
                chunks.append(DocumentChunk(
                    chunk_id=f"chunk_{len(chunks) + 1:03d}",
                    section_title=title if is_first_part else f"{title} (continued)",
                    text=sub_chunk_text.strip(),
                    start_paragraph=sub_chunk_start,
                    end_paragraph=current_paragraph - 1,
                    token_estimate=len(sub_chunk_text) // 4
                ))
        else:
            # Count paragraphs on the stripped text so trailing blank lines
            # between sections are not counted as extra paragraphs
            paragraph_count = body.count('\n\n') + 1
            chunks.append(DocumentChunk(
                chunk_id=f"chunk_{len(chunks) + 1:03d}",
                section_title=title,
                text=body,
                start_paragraph=current_paragraph,
                end_paragraph=current_paragraph + paragraph_count - 1,
                token_estimate=token_estimate
            ))
            current_paragraph += paragraph_count
    return chunks
# Define extraction tool for structured output
LEGAL_EXTRACTION_TOOL = {
"name": "extract_legal_summary",
"description": "Extract structured information from a legal document section.",
"input_schema": {
"type": "object",
"properties": {
"summary": {"type": "string", "description": "Concise summary of this section (2-4 sentences)"},
"key_facts": {
"type": "array",
"items": {
"type": "object",
"properties": {
"fact": {"type": "string"},
"source_paragraph": {"type": "integer", "description": "Paragraph number in the source chunk"},
"confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0}
},
"required": ["fact", "source_paragraph", "confidence"]
}
},
"parties_mentioned": {"type": "array", "items": {"type": "string"}},
"dates_mentioned": {"type": "array", "items": {"type": "string"}},
"obligations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"obligor": {"type": "string", "description": "Party with the obligation"},
"obligation": {"type": "string", "description": "What they must do"},
"deadline": {"type": "string", "description": "When it must be done (if specified)"}
},
"required": ["obligor", "obligation"]
}
}
},
"required": ["summary", "key_facts", "parties_mentioned", "dates_mentioned", "obligations"]
}
}
def summarize_chunk(chunk: DocumentChunk) -> SectionSummary:
    """Summarize a single chunk with citation tracking."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a legal document analyst. Extract key information from the provided section, preserving paragraph references for citation tracking.",
        tools=[LEGAL_EXTRACTION_TOOL],
        tool_choice={"type": "tool", "name": "extract_legal_summary"},
        messages=[{
            "role": "user",
            # Ask for chunk-relative numbering explicitly: the offset applied
            # below assumes source_paragraph starts at 1 within this section.
            "content": (
                f"Section: {chunk.section_title}\n"
                f"Document paragraphs {chunk.start_paragraph}-{chunk.end_paragraph}. "
                f"For source_paragraph, number paragraphs within this section starting at 1.\n\n"
                f"{chunk.text}"
            )
        }]
    )
    for block in response.content:
        if block.type == "tool_use":
            data = block.input
            return SectionSummary(
                chunk_id=chunk.chunk_id,
                section_title=chunk.section_title,
                summary=data["summary"],
                key_facts=[
                    CitedFact(
                        fact=f["fact"],
                        source_chunk_id=chunk.chunk_id,
                        # Convert chunk-relative paragraph to document-level paragraph
                        source_paragraph=f["source_paragraph"] + chunk.start_paragraph - 1,
                        confidence=f["confidence"]
                    ) for f in data["key_facts"]
                ],
                parties_mentioned=data["parties_mentioned"],
                dates_mentioned=data["dates_mentioned"],
                obligations=data["obligations"]
            )
    raise ValueError(f"No extraction returned for chunk {chunk.chunk_id}")
# Example: Process a sample contract
sample_contract = """
ARTICLE 1 - DEFINITIONS
1.1 "Agreement" means this Master Services Agreement between Acme Corp ("Provider")
and GlobalTech Inc ("Client"), effective as of January 15, 2026.
1.2 "Services" means the cloud computing infrastructure services described in
Exhibit A, including compute, storage, and networking resources.
ARTICLE 2 - TERM AND TERMINATION
2.1 Initial Term. This Agreement shall commence on the Effective Date and continue
for a period of thirty-six (36) months unless earlier terminated.
2.2 Termination for Convenience. Either party may terminate this Agreement upon
ninety (90) days prior written notice to the other party.
2.3 Termination for Cause. Either party may terminate immediately upon written
notice if the other party materially breaches and fails to cure within thirty (30) days.
ARTICLE 3 - PAYMENT OBLIGATIONS
3.1 Fees. Client shall pay Provider the fees set forth in Exhibit B within thirty (30)
days of receipt of each monthly invoice. Late payments accrue interest at 1.5% per month.
3.2 Annual Escalation. Fees shall increase by no more than 5% annually, with Provider
giving at least sixty (60) days notice of any increase.
"""
chunks = chunk_legal_document(sample_contract)
print(f"Document split into {len(chunks)} chunks:")
for chunk in chunks:
    print(f" {chunk.chunk_id}: {chunk.section_title} ({chunk.token_estimate} tokens, ¶{chunk.start_paragraph}-{chunk.end_paragraph})")
    summary = summarize_chunk(chunk)
    print(f" Summary: {summary.summary[:100]}...")
    print(f" Obligations: {len(summary.obligations)} found")
    print(f" Cited facts: {len(summary.key_facts)} extracted")
4. Accuracy & Validation
Legal summarization demands higher accuracy than most AI tasks — an incorrect obligation or missed termination clause can have significant financial consequences. The hierarchical approach gives us natural validation checkpoints.
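One inexpensive checkpoint is to confirm that every cited paragraph actually falls inside the chunk it claims to come from. The sketch below assumes facts are plain dicts with `source_chunk_id` and `source_paragraph` keys and that chunk ranges are known; `find_bad_citations` is a hypothetical helper, not part of the pipeline above.

```python
from typing import Dict, List, Tuple


def find_bad_citations(facts: List[dict], chunk_ranges: Dict[str, Tuple[int, int]]) -> List[dict]:
    """Return facts whose cited paragraph lies outside their chunk's
    (start_paragraph, end_paragraph) range, or whose chunk is unknown."""
    bad = []
    for fact in facts:
        span = chunk_ranges.get(fact["source_chunk_id"])
        if span is None or not span[0] <= fact["source_paragraph"] <= span[1]:
            bad.append(fact)
    return bad


# Hypothetical data: chunk_002 covers document paragraphs 3 through 5
ranges = {"chunk_001": (1, 2), "chunk_002": (3, 5)}
facts = [
    {"fact": "36-month initial term", "source_chunk_id": "chunk_002", "source_paragraph": 3},
    {"fact": "90-day notice period", "source_chunk_id": "chunk_002", "source_paragraph": 9},
]
print(find_bad_citations(facts, ranges))
```

An out-of-range citation does not prove the fact is wrong, but it is a cheap signal that the section should be re-extracted or reviewed.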
4.1 Hierarchical Summarizer with Accuracy Scoring
The hierarchical approach works in three phases: (1) chunk the document by sections, (2) summarize each section independently with citations, and (3) produce a final document-level summary from the section summaries. At each phase, we compute an accuracy score based on citation coverage and cross-reference consistency.
import anthropic
import json
from dataclasses import dataclass, field
from typing import Optional
# Initialize client
client = anthropic.Anthropic()
@dataclass
class DocumentSummary:
    title: str
    parties: list
    effective_date: str
    term: str
    executive_summary: str
    section_summaries: list  # list of SectionSummary
    all_obligations: list
    key_dates: list
    termination_clauses: list
    accuracy_score: float  # 0.0 to 1.0
FINAL_SUMMARY_TOOL = {
"name": "produce_document_summary",
"description": "Produce a final document-level summary from section summaries.",
"input_schema": {
"type": "object",
"properties": {
"title": {"type": "string"},
"parties": {"type": "array", "items": {"type": "string"}},
"effective_date": {"type": "string"},
"term": {"type": "string", "description": "Duration/term of the agreement"},
"executive_summary": {"type": "string", "description": "3-5 sentence executive summary"},
"key_obligations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"party": {"type": "string"},
"obligation": {"type": "string"},
"deadline": {"type": "string"},
"source_section": {"type": "string"}
},
"required": ["party", "obligation", "source_section"]
}
},
"termination_clauses": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string", "enum": ["convenience", "cause", "expiration", "mutual"]},
"notice_period": {"type": "string"},
"conditions": {"type": "string"},
"source_section": {"type": "string"}
},
"required": ["type", "conditions", "source_section"]
}
},
"key_dates": {"type": "array", "items": {"type": "string"}},
"risks_and_notes": {"type": "array", "items": {"type": "string"}}
},
"required": ["title", "parties", "effective_date", "term", "executive_summary", "key_obligations", "termination_clauses", "key_dates"]
}
}
def compute_accuracy_score(section_summaries: list, final_summary: dict) -> float:
    """Compute accuracy score based on citation coverage and consistency."""
    score = 1.0
    penalties = []

    # Check 1: All parties from sections appear in final summary
    all_parties = set()
    for s in section_summaries:
        all_parties.update(s.parties_mentioned)
    final_parties = set(final_summary.get("parties", []))
    missing_parties = all_parties - final_parties
    if missing_parties:
        penalty = 0.1 * len(missing_parties)
        penalties.append(f"Missing parties: {missing_parties} (-{penalty:.2f})")
        score -= penalty

    # Check 2: All obligations have source_section references
    obligations = final_summary.get("key_obligations", [])
    unreferenced = sum(1 for o in obligations if not o.get("source_section"))
    if unreferenced:
        penalty = 0.05 * unreferenced
        penalties.append(f"Unreferenced obligations: {unreferenced} (-{penalty:.2f})")
        score -= penalty

    # Check 3: Dates mentioned in sections appear in final key_dates
    all_dates = set()
    for s in section_summaries:
        all_dates.update(s.dates_mentioned)
    final_dates = set(final_summary.get("key_dates", []))
    # Allow partial match (dates might be reformatted)
    if all_dates and not final_dates:
        penalties.append("No dates in final summary (-0.15)")
        score -= 0.15

    # Check 4: High-confidence facts are represented
    high_confidence_facts = []
    for s in section_summaries:
        high_confidence_facts.extend([f for f in s.key_facts if f.confidence >= 0.9])
    if high_confidence_facts and not obligations:
        penalties.append("High-confidence facts not reflected (-0.2)")
        score -= 0.2

    score = max(0.0, min(1.0, score))
    if penalties:
        print(f" Accuracy penalties: {'; '.join(penalties)}")
    return round(score, 3)
def hierarchical_summarize(document_text: str) -> DocumentSummary:
    """Full hierarchical summarization pipeline with accuracy scoring."""
    # Phase 1: Chunk
    chunks = chunk_legal_document(document_text)
    print(f"Phase 1: Split into {len(chunks)} chunks")

    # Phase 2: Summarize each section
    section_summaries = []
    for chunk in chunks:
        summary = summarize_chunk(chunk)
        section_summaries.append(summary)
        print(f"Phase 2: Summarized {chunk.section_title}")

    # Phase 3: Produce final summary from section summaries
    combined_context = "\n\n".join([
        f"## {s.section_title}\n{s.summary}\nObligations: {json.dumps(s.obligations)}\nDates: {s.dates_mentioned}\nParties: {s.parties_mentioned}"
        for s in section_summaries
    ])
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system="You are a legal analyst producing a final document summary from section-level summaries. Preserve all source_section citations.",
        tools=[FINAL_SUMMARY_TOOL],
        tool_choice={"type": "tool", "name": "produce_document_summary"},
        messages=[{
            "role": "user",
            "content": f"Produce a final document summary from these section summaries:\n\n{combined_context}"
        }]
    )
    final_data = None
    for block in response.content:
        if block.type == "tool_use":
            final_data = block.input
            break
    if not final_data:
        raise ValueError("No final summary produced")

    # Validation (still part of Phase 3): score the final summary against the section summaries
    accuracy = compute_accuracy_score(section_summaries, final_data)
    print(f"Phase 3: Final summary produced (accuracy: {accuracy})")

    return DocumentSummary(
        title=final_data["title"],
        parties=final_data["parties"],
        effective_date=final_data["effective_date"],
        term=final_data["term"],
        executive_summary=final_data["executive_summary"],
        section_summaries=section_summaries,
        all_obligations=final_data["key_obligations"],
        key_dates=final_data["key_dates"],
        termination_clauses=final_data["termination_clauses"],
        accuracy_score=accuracy
    )
# Run the full pipeline
result = hierarchical_summarize(sample_contract)
print(f"\n{'='*60}")
print(f"Document: {result.title}")
print(f"Parties: {', '.join(result.parties)}")
print(f"Term: {result.term}")
print(f"Executive Summary: {result.executive_summary}")
print(f"Obligations: {len(result.all_obligations)}")
print(f"Termination Clauses: {len(result.termination_clauses)}")
print(f"Accuracy Score: {result.accuracy_score}")
4.2 Summarization Pyramid
flowchart TD
A[100+ Page Legal Document] --> B[Section Boundary Detection]
B --> C1[Chunk 1: Definitions]
B --> C2[Chunk 2: Term & Termination]
B --> C3[Chunk 3: Payment]
B --> C4[Chunk N: Miscellaneous]
C1 --> D1[Section Summary + Citations]
C2 --> D2[Section Summary + Citations]
C3 --> D3[Section Summary + Citations]
C4 --> D4[Section Summary + Citations]
D1 --> E[Accuracy Validation]
D2 --> E
D3 --> E
D4 --> E
E --> F[Final Document Summary]
F --> G{Accuracy ≥ 0.85?}
G -->|Yes| H[Deliver to User]
G -->|No| I[Flag for Human Review]
I --> J[Analyst Reviews + Corrects]
J --> K[Correction Feedback Loop]
5. Production Patterns
Moving from prototype to production requires addressing three challenges: scale (millions of items per day), cost (per-token pricing adds up), and latency (moderation must not block the user experience). Model cascading keeps cost and latency down on the real-time path, while batch processing handles scale and cost for work that can wait.
5.1 Model Cascading for Moderation
Model cascading uses a fast, cheap model for initial screening and a slower, more capable model only for edge cases. For content moderation, this usually means Haiku screens the full stream and Sonnet re-classifies only the uncertain minority.
import anthropic
import json
from dataclasses import dataclass
from typing import Optional
# Initialize client
client = anthropic.Anthropic()
# Reuse the MODERATION_TOOL definition from Section 1
MODERATION_TOOL = {
"name": "classify_content",
"description": "Classify content for moderation with severity and categories.",
"input_schema": {
"type": "object",
"properties": {
"severity": {"type": "string", "enum": ["safe", "borderline", "violation", "critical"]},
"categories": {
"type": "array",
"items": {
"type": "object",
"properties": {
"label": {"type": "string", "enum": ["harassment", "hate_speech", "spam", "explicit", "misinformation", "self_harm", "violence", "illegal_activity"]},
"confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0}
},
"required": ["label", "confidence"]
}
},
"explanation": {"type": "string"},
"overall_confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0}
},
"required": ["severity", "categories", "explanation", "overall_confidence"]
}
}
SYSTEM_PROMPT = """You are a content moderation classifier. Classify content using the classify_content tool.
Severity levels: safe, borderline, violation, critical.
Be calibrated: report your true confidence. If unsure, use a lower confidence score."""
@dataclass
class CascadeResult:
    final_classification: dict
    model_used: str
    was_escalated: bool
    total_input_tokens: int
    total_output_tokens: int
def classify_with_cascade(
    text: str,
    escalation_threshold: float = 0.90,
    # "claude-haiku-4-20250514" is not a published model ID; Claude 3.5 Haiku is
    # used here. Substitute whichever Haiku model your account offers.
    fast_model: str = "claude-3-5-haiku-20241022",
    strong_model: str = "claude-sonnet-4-20250514"
) -> CascadeResult:
    """Two-tier model cascade: fast model screens, strong model adjudicates edge cases."""
    # Tier 1: Fast model (Haiku) screens all content
    tier1_response = client.messages.create(
        model=fast_model,
        max_tokens=512,
        system=SYSTEM_PROMPT,
        tools=[MODERATION_TOOL],
        tool_choice={"type": "tool", "name": "classify_content"},
        messages=[{"role": "user", "content": f"Classify: {text}"}]
    )
    tier1_result = None
    for block in tier1_response.content:
        if block.type == "tool_use":
            tier1_result = block.input
            break

    # Decision: Is Haiku confident enough?
    if tier1_result and tier1_result["overall_confidence"] >= escalation_threshold:
        # High confidence: accept Haiku's classification
        return CascadeResult(
            final_classification=tier1_result,
            model_used=fast_model,
            was_escalated=False,
            total_input_tokens=tier1_response.usage.input_tokens,
            total_output_tokens=tier1_response.usage.output_tokens
        )

    # Tier 2: Strong model (Sonnet) re-classifies uncertain content
    tier2_response = client.messages.create(
        model=strong_model,
        max_tokens=1024,
        system=SYSTEM_PROMPT + "\n\nA fast classifier was uncertain about this content. Provide your independent assessment.",
        tools=[MODERATION_TOOL],
        tool_choice={"type": "tool", "name": "classify_content"},
        messages=[{"role": "user", "content": f"Classify (needs careful review): {text}"}]
    )
    tier2_result = None
    for block in tier2_response.content:
        if block.type == "tool_use":
            tier2_result = block.input
            break

    # Token totals cover both tiers, since escalated items pay for both calls
    return CascadeResult(
        final_classification=tier2_result or tier1_result,
        model_used=strong_model,
        was_escalated=True,
        total_input_tokens=tier1_response.usage.input_tokens + tier2_response.usage.input_tokens,
        total_output_tokens=tier1_response.usage.output_tokens + tier2_response.usage.output_tokens
    )
# Demonstrate cascade behavior
test_items = [
"I love this product! Best purchase ever.", # Clear safe
"You absolute moron, go die in a fire", # Clear violation
"This politician's immigration policy is destroying our country", # Borderline — needs Sonnet
"BREAKING: Scientists confirm earth is flat (satire account)", # Ambiguous — misinformation or humor?
]
total_escalated = 0
for text in test_items:
    result = classify_with_cascade(text)
    total_escalated += int(result.was_escalated)
    print(f"\n'{text[:50]}...'")
    # Print the full model ID; splitting on "-" breaks for IDs such as "claude-3-5-haiku-..."
    print(f" Model: {result.model_used} | Escalated: {result.was_escalated}")
    print(f" Severity: {result.final_classification['severity']} ({result.final_classification['overall_confidence']:.0%})")
    print(f" Tokens: {result.total_input_tokens + result.total_output_tokens}")
print(f"\nEscalation rate: {total_escalated}/{len(test_items)} ({total_escalated/len(test_items):.0%})")
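The overall_confidence field in the tool output is the escalation signal: any first-pass result below escalation_threshold (0.90 by default) is re-classified by the stronger model.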
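Choosing the threshold is a trade-off between cost and caution, and one way to explore it is to replay first-pass confidence scores at several candidate thresholds. The sketch below assumes you have logged such scores; `escalation_rate` and the sample values are hypothetical.

```python
from typing import List


def escalation_rate(confidences: List[float], threshold: float) -> float:
    """Fraction of items the cascade would escalate, i.e. items whose
    first-pass confidence is strictly below the threshold."""
    if not confidences:
        return 0.0
    escalated = sum(1 for c in confidences if c < threshold)
    return escalated / len(confidences)


# Hypothetical first-pass confidence scores from a logged sample
logged_confidences = [0.99, 0.95, 0.85, 0.60]

for threshold in (0.80, 0.90, 0.99):
    rate = escalation_rate(logged_confidences, threshold)
    print(f"threshold {threshold:.2f}: {rate:.0%} escalated")
```

A higher threshold sends more traffic to the stronger model, so the escalation rate feeds directly into per-item cost.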
5.2 Batch Processing at Scale
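For non-real-time moderation (reported content, pre-publication review, bulk scanning), the Message Batches API accepts up to 100,000 requests (or 256 MB) per batch at a 50% discount. Processing is asynchronous: most batches finish within an hour, and requests still unprocessed after 24 hours expire, so treat 24 hours as an upper bound and not a guaranteed turnaround.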
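Workloads larger than the per-batch limit need to be split before submission. The following sketch shows one way to do that while keeping `custom_id` values unique across batches; `split_into_batches` and the ID format are hypothetical choices, not SDK features.

```python
from typing import Iterator, List, Tuple

MAX_REQUESTS_PER_BATCH = 100_000  # per-batch request limit quoted above


def split_into_batches(
    items: List[str], batch_size: int = MAX_REQUESTS_PER_BATCH
) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of (custom_id, item) pairs, each at most batch_size long.
    IDs are derived from the item's global index, so they stay unique
    across all batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        window = items[start:start + batch_size]
        yield [(f"mod_{start + offset:09d}", item) for offset, item in enumerate(window)]


# Tiny batch size so the splitting is visible
for batch in split_into_batches(["a", "b", "c", "d", "e"], batch_size=2):
    print(batch)
```

Each yielded list can then be turned into the `requests` payload for one batch submission.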
import anthropic
import json
from datetime import datetime, timezone
# Initialize client
client = anthropic.Anthropic()
# Moderation tool definition (same as Section 1)
MODERATION_TOOL = {
"name": "classify_content",
"description": "Classify content for moderation.",
"input_schema": {
"type": "object",
"properties": {
"severity": {"type": "string", "enum": ["safe", "borderline", "violation", "critical"]},
"categories": {
"type": "array",
"items": {
"type": "object",
"properties": {
"label": {"type": "string"},
"confidence": {"type": "number"}
},
"required": ["label", "confidence"]
}
},
"explanation": {"type": "string"},
"overall_confidence": {"type": "number"}
},
"required": ["severity", "categories", "explanation", "overall_confidence"]
}
}
def create_moderation_batch(content_items: list) -> str:
    """Submit a batch of content items for moderation classification."""
    requests = []
    for i, item in enumerate(content_items):
        requests.append({
            "custom_id": f"mod_{i:06d}",
            "params": {
                # "claude-haiku-4-20250514" is not a published model ID; Claude 3.5
                # Haiku is used here. Substitute the Haiku model your account offers.
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 512,
                "system": "You are a content moderation classifier. Classify using the tool.",
                "tools": [MODERATION_TOOL],
                "tool_choice": {"type": "tool", "name": "classify_content"},
                "messages": [{"role": "user", "content": f"Classify: {item}"}]
            }
        })

    # Create the batch
    batch = client.messages.batches.create(requests=requests)
    print(f"Batch created: {batch.id}")
    print(f" Items: {len(requests)}")
    print(f" Status: {batch.processing_status}")
    print(f" Created: {datetime.now(timezone.utc).isoformat()}")
    return batch.id


def check_batch_results(batch_id: str) -> list:
    """Check batch status and retrieve results when complete."""
    batch = client.messages.batches.retrieve(batch_id)
    print(f"Batch {batch_id}: {batch.processing_status}")
    print(f" Succeeded: {batch.request_counts.succeeded}")
    print(f" Failed: {batch.request_counts.errored}")
    print(f" Processing: {batch.request_counts.processing}")
    if batch.processing_status != "ended":
        return []

    # Stream results; only succeeded requests carry a message to parse
    results = []
    for result in client.messages.batches.results(batch_id):
        if result.result.type == "succeeded":
            message = result.result.message
            for block in message.content:
                if block.type == "tool_use":
                    results.append({
                        "custom_id": result.custom_id,
                        "classification": block.input
                    })
    return results
# Example: Submit batch of reported content
reported_content = [
"Check out my amazing weight loss supplement! Buy now at scamsite.com",
"I think the new policy changes are concerning and here's why...",
"You're all pathetic losers who deserve to suffer",
"Here's my recipe for chocolate cake: mix flour, sugar, eggs...",
"BREAKING NEWS: Celebrity secretly a lizard person (obvious satire)",
]
batch_id = create_moderation_batch(reported_content)
print(f"\nBatch submitted. Poll with: check_batch_results('{batch_id}')")
print("Results available within 24 hours at 50% cost reduction.")
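Since batches complete asynchronously, the caller needs some polling strategy. The sketch below shows a generic poll-with-backoff loop under the assumption that a status callable returns `"ended"` when the batch is finished; `wait_for_batch` and its parameters are hypothetical, and the sleep and clock functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def wait_for_batch(
    get_status: Callable[[], str],
    *,
    initial_delay: float = 5.0,
    max_delay: float = 300.0,
    timeout: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() with exponential backoff until it returns "ended".
    Returns True when the batch ended, False if the next wait would pass
    the timeout."""
    delay = initial_delay
    deadline = clock() + timeout
    while True:
        if get_status() == "ended":
            return True
        if clock() + delay > deadline:
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay


# Dry run with a fake status sequence and a no-op sleep
fake_statuses = iter(["in_progress", "in_progress", "ended"])
print(wait_for_batch(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

With the real client, the status callable could be `lambda: client.messages.batches.retrieve(batch_id).processing_status`, after which `check_batch_results(batch_id)` retrieves the classifications.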
Next in the SDK Track
In Part 20: Guardrails — Hallucinations & Consistency, we’ll cover CCA Phase 4.1 and 4.2 — reducing hallucinations with source grounding, nullable schema fields, temperature control for determinism, and eval-driven consistency.