Reducing Hallucinations
Hallucinations — confident outputs that aren’t grounded in provided context — are the primary reliability concern in agentic systems. Unlike creative writing where imagination is valued, production agents need factual accuracy and the ability to express uncertainty when information is unavailable.
What Causes Hallucinations
In agentic contexts, hallucinations typically emerge from three sources:
- Missing context: The model is asked about information not present in provided documents, so it fills gaps with plausible-sounding content
- Overly broad questions: Vague prompts give the model latitude to speculate rather than cite specific evidence
- Forced output structure: When all fields in a schema are required, the model invents values rather than leaving blanks
Grounding Techniques
The most effective anti-hallucination strategy is grounding — constraining the model to only use information from explicit sources. Key patterns include:
- Explicit source attribution: Require the model to cite which document/paragraph supports each claim
- “I don’t know” signals: Teach the model to express uncertainty with phrases like “Based on the provided documents, I cannot determine…”
- Retrieval-augmented patterns: Provide source documents alongside the query so the model has concrete material to reference
- Nullable fields: Allow null in your schema when information genuinely isn’t available
Nullable Schema Fields
One of the highest-impact techniques for reducing hallucinations is making schema fields nullable. When a tool requires all fields to be populated, the model will fabricate data rather than admit it doesn’t have the information. Use null for fields that may not be present in the source document, and add a source_citations field requiring the model to quote the relevant passage:
import anthropic
import json

client = anthropic.Anthropic()

# Tool with nullable fields + source citation requirement
extraction_tool = {
    "name": "extract_company_info",
    "description": (
        "Extract company information from the provided document. "
        "Use null for any field where the information is NOT explicitly "
        "stated in the document. Never guess or infer values that aren't "
        "directly supported by the text."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "company_name": {
                "type": "string",
                "description": "Company name as stated in the document"
            },
            "founding_year": {
                "type": ["integer", "null"],
                "description": "Year founded. null if not mentioned."
            },
            "revenue": {
                "type": ["string", "null"],
                "description": "Revenue figure. null if not mentioned."
            },
            "employee_count": {
                "type": ["integer", "null"],
                "description": "Number of employees. null if not mentioned."
            },
            "headquarters": {
                "type": ["string", "null"],
                "description": "HQ location. null if not mentioned."
            },
            "source_citations": {
                "type": "array",
                "items": {"type": "string"},
                "description": (
                    "Direct quotes from the document supporting each "
                    "non-null field. One citation per extracted value."
                )
            }
        },
        # Every field is required, but nullable fields may be set to null
        "required": [
            "company_name", "founding_year", "revenue",
            "employee_count", "headquarters", "source_citations"
        ]
    }
}

# Document with partial information
document = """
Acme Corp announced Q3 results yesterday. The San Francisco-based company
reported strong growth in its cloud division. CEO Jane Smith noted that
the team of 2,400 employees delivered exceptional results this quarter.
"""

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[extraction_tool],
    messages=[{
        "role": "user",
        "content": (
            "Extract company information from this document. "
            "Use null for anything not explicitly stated.\n\n"
            f"Document:\n{document}"
        )
    }]
)

# Parse the tool use result
for block in response.content:
    if block.type == "tool_use":
        result = block.input
        print(json.dumps(result, indent=2))

# Illustrative output (exact values may vary between runs):
# {
#   "company_name": "Acme Corp",
#   "founding_year": null,      <-- Not mentioned, correctly null
#   "revenue": null,            <-- Not mentioned, correctly null
#   "employee_count": 2400,
#   "headquarters": "San Francisco",
#   "source_citations": [
#     "Acme Corp announced Q3 results yesterday",
#     "The San Francisco-based company",
#     "the team of 2,400 employees"
#   ]
# }
Source Attribution & Verification
Retrieval-Augmented Pattern
The retrieval-augmented generation (RAG) pattern provides source documents alongside queries, requiring the model to ground every claim in specific passages. The key insight is that Claude tends to stay better grounded when told exactly which documents to reference:
- Provide source documents with clear identifiers (Doc 1, Doc 2, etc.)
- Require inline citations for every factual claim
- Add self-correction prompts: “If you’re unsure about any fact, say so explicitly rather than guessing”
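The three points above can be combined in a single prompt builder. The following is a minimal sketch rather than a prescribed format: the function name `build_grounded_prompt`, the `[Doc N]` labels, and the exact instruction wording are illustrative assumptions.

```python
def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt that labels each source and demands inline citations."""
    # Give every document a stable identifier the model can cite.
    labeled = [
        f"[Doc {i}] {doc.strip()}" for i, doc in enumerate(documents, start=1)
    ]
    instructions = (
        "Answer using ONLY the documents below. "
        "Cite the supporting document after every factual claim, e.g. [Doc 1]. "
        "If you're unsure about any fact, or the documents do not contain it, "
        "say so explicitly rather than guessing."
    )
    return (
        f"{instructions}\n\nDocuments:\n"
        + "\n".join(labeled)
        + f"\n\nQuestion: {question}"
    )


prompt = build_grounded_prompt(
    "When was TechStart founded?",
    [
        "TechStart Inc was founded in 2019 in Austin, Texas.",
        "TechStart reported revenue of $45M.",
    ],
)
print(prompt)
```

The resulting string can be passed as the user message; keeping the identifiers stable makes it possible to check citations programmatically afterwards.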
Verification Loop
For high-stakes applications, implement a verification loop that extracts individual claims from the model’s output and checks each against the source material:
flowchart TD
A[User Query] --> B[Retrieve Source Documents]
B --> C[Extract with Citations]
C --> D[Parse Individual Claims]
D --> E{Each Claim}
E --> F[Find Supporting Quote]
F --> G{Quote Found?}
G -->|Yes| H[Mark Verified ✓]
G -->|No| I[Mark Unverified ✗]
H --> J[Compile Verified Output]
I --> J
J --> K[Return with Confidence Scores]
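The loop in the flowchart can be sketched without any model calls by treating "Find Supporting Quote" as a whitespace-insensitive substring match. This is a simplified illustration under stated assumptions: the helper names, the claim dictionary shape, and the "verified fraction" score are hypothetical, and a verbatim match only shows that the quote exists in the source, not that it supports the claim.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't defeat matching."""
    return " ".join(text.lower().split())


def verify_claims(claims: list[dict], source_text: str) -> dict:
    """Mark each claim verified if its quoted citation appears in the source."""
    source = normalize(source_text)
    checked = []
    for item in claims:
        quote = item.get("citation")
        # A missing citation can never be verified.
        verified = bool(quote) and normalize(quote) in source
        checked.append({**item, "verified": verified})
    verified_count = sum(1 for c in checked if c["verified"])
    # Crude score: the fraction of claims backed by a verbatim quote.
    score = verified_count / len(checked) if checked else 0.0
    return {"claims": checked, "verified_fraction": score}


source = "TechStart Inc was founded in 2019 in Austin, Texas."
claims = [
    {"claim": "TechStart was founded in 2019", "citation": "founded in 2019"},
    {"claim": "TechStart has 350 employees", "citation": "employs 350 people"},
    {"claim": "TechStart is profitable", "citation": None},
]
report = verify_claims(claims, source)
for c in report["claims"]:
    print("verified" if c["verified"] else "unverified", "-", c["claim"])
print(f"verified fraction: {report['verified_fraction']:.2f}")
```

A string match is cheap enough to run on every output; whether a quote that exists actually supports the claim is a separate question that needs a second, model-based pass.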
Fact-Checking Agent
This pattern uses a two-pass approach: first extract claims with citations, then verify each citation actually supports the claim:
import anthropic

client = anthropic.Anthropic()


def extract_claims_with_citations(question: str, sources: str) -> dict:
    """First pass: extract factual claims with source citations."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system=(
            "You are a precise fact extractor. For each factual claim in your "
            "response, cite the exact source passage that supports it. "
            "Format: {'claims': [{'claim': '...', 'citation': '...', "
            "'source_id': '...'}]}. If you cannot find a supporting passage "
            "for a claim, set citation to null and source_id to null."
        ),
        tools=[{
            "name": "submit_claims",
            "description": "Submit extracted claims with their source citations",
            "input_schema": {
                "type": "object",
                "properties": {
                    "claims": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "claim": {"type": "string"},
                                "citation": {"type": ["string", "null"]},
                                "source_id": {"type": ["string", "null"]}
                            },
                            "required": ["claim", "citation", "source_id"]
                        }
                    }
                },
                "required": ["claims"]
            }
        }],
        # Force the tool call so the reply is structured rather than free text
        tool_choice={"type": "tool", "name": "submit_claims"},
        messages=[{
            "role": "user",
            "content": (
                "Answer the question using only the sources, then submit "
                "each factual claim in your answer with its citation.\n\n"
                f"Question: {question}\n\n"
                f"Sources:\n{sources}"
            )
        }]
    )
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    return {"claims": []}


def verify_claim(claim: str, citation: str, source_text: str) -> dict:
    """Second pass: verify each citation actually supports the claim."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": (
                "Does this citation support the claim?\n\n"
                f"Claim: {claim}\n"
                f"Citation: {citation}\n"
                f"Full source: {source_text}\n\n"
                "Respond with ONLY 'supported', 'partially_supported', "
                "or 'not_supported' followed by a brief explanation."
            )
        }]
    )
    text = response.content[0].text.strip()
    # Treat the first word as the verdict, tolerating stray punctuation
    first_word = text.split()[0].lower().strip(",.:'\"") if text else ""
    allowed = {"supported", "partially_supported", "not_supported"}
    verdict = first_word if first_word in allowed else "unparseable"
    return {"verdict": verdict, "explanation": text}


# Example usage
sources = """
[Doc A] TechStart Inc was founded in 2019 in Austin, Texas. The company
develops AI-powered analytics tools for enterprise customers.
[Doc B] In their 2025 annual report, TechStart reported revenue of $45M,
up 60% year-over-year. The company now employs 350 people.
"""
question = "Tell me about TechStart Inc"

# Step 1: Extract claims with citations
result = extract_claims_with_citations(question, sources)
print("Extracted claims:")
for c in result["claims"]:
    print(f"  - {c['claim']}")
    print(f"    Citation: {c['citation']}")
    print(f"    Source: {c['source_id']}")
    print()

# Step 2: Verify each cited claim
print("Verification results:")
for c in result["claims"]:
    if c["citation"]:
        verification = verify_claim(c["claim"], c["citation"], sources)
        print(f"  {c['claim']}")
        print(f"    Verdict: {verification['verdict']}")
    else:
        # Claims without a citation cannot be verified; surface them
        print(f"  {c['claim']}")
        print("    Verdict: no citation provided")
    print()
Increasing Output Consistency
Temperature & top_p Effects
Temperature and top_p directly control output variability. For production agents that need deterministic results, understanding these parameters is critical:
- temperature=0: Nearly deterministic (not perfectly so, because of floating-point and batching effects); best for factual extraction
- temperature=0.3–0.5: Low variability; outputs are usually, but not always, consistent for structured tasks
- temperature=1.0: The default; balances creativity and consistency
- top_p: Alternative to temperature that restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches top_p; adjust one or the other, not both
For factual extraction and classification, start with temperature=0. For tasks that use tool_use, temperature has less effect on the output format because the JSON schema constrains the structure. The schema acts as a consistency mechanism for format regardless of temperature, although the values inside the fields can still vary between runs.
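To make the effect of these two parameters concrete, here is a toy sketch of the textbook formulation: softmax with a temperature divisor, followed by nucleus (top_p) filtering. The logits are made up, and this is not a description of how any particular API implements sampling internally.

```python
import math


def apply_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = {tok: val / temperature for tok, val in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize


# Made-up scores for three candidate tokens.
logits = {"billing": 2.0, "technical": 1.0, "account": 0.0}
for t in (0.2, 1.0):
    probs = apply_temperature(logits, t)
    print(f"temperature={t}:", {tok: round(p, 3) for tok, p in probs.items()})
print("top_p=0.8:", top_p_filter(apply_temperature(logits, 1.0), 0.8))
```

Lowering the temperature concentrates probability on the top token, while top_p removes the low-probability tail entirely; both reduce variability, which is why adjusting only one of them is usually enough.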
tool_use for Structural Consistency
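Forcing output through a tool schema is one of the most reliable ways to achieve structural consistency. Unlike free-text responses, which can vary in format from run to run, a forced tool_use call returns a JSON object shaped by your schema. Conformance is very reliable in practice, but it is still worth validating the result in code before using it: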
import anthropic
import json

client = anthropic.Anthropic()

# Demonstrate temperature effects on free text vs tool_use

# Approach 1: Free text with temperature=1.0 (inconsistent format)
free_text_response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    temperature=1.0,
    messages=[{
        "role": "user",
        "content": "Classify this support ticket: 'My payment failed twice today'"
    }]
)
print("Free text (temp=1.0):", free_text_response.content[0].text[:100])
# Output varies: sometimes bullet points, sometimes prose, different field orders

# Approach 2: tool_use forces a fixed structure (consistent regardless of temp)
classification_tool = {
    "name": "classify_ticket",
    "description": "Classify a support ticket into category and priority",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {
                "type": "string",
                "enum": ["billing", "technical", "account", "feature_request"]
            },
            "priority": {
                "type": "string",
                "enum": ["low", "medium", "high", "critical"]
            },
            "confidence": {
                "type": "number",
                "description": "Confidence score between 0 and 1"
            },
            "reasoning": {
                "type": "string",
                "description": "Brief explanation for the classification"
            }
        },
        "required": ["category", "priority", "confidence", "reasoning"]
    }
}

tool_response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    temperature=1.0,  # Even with high temperature, the structure is constrained
    tool_choice={"type": "tool", "name": "classify_ticket"},
    tools=[classification_tool],
    messages=[{
        "role": "user",
        "content": "Classify this support ticket: 'My payment failed twice today'"
    }]
)

for block in tool_response.content:
    if block.type == "tool_use":
        print("Tool use (temp=1.0):", json.dumps(block.input, indent=2))

# The forced tool call returns the schema's fields (category, priority,
# confidence, reasoning) even at temperature=1.0. The field values can still
# vary between runs, so validate the result before acting on it.
Few-Shot Anchoring
Providing examples in the system prompt establishes format expectations that reduce output variance. The model mimics the structure, tone, and level of detail shown in examples:
- 2–3 examples are sufficient for most formatting patterns
- Include both typical cases and edge cases in examples
- Show the exact output format you expect (JSON, bullet points, etc.)
- Combine with tool_use for maximum consistency
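As a rough illustration of the points above, the sketch below assembles a system prompt from a handful of examples, including one edge case. The example tickets, labels, and the `build_few_shot_system_prompt` helper are hypothetical; real examples should come from your own labeled data.

```python
import json

# Hypothetical examples: two typical cases and one edge case.
FEW_SHOT_EXAMPLES = [
    {"ticket": "I was charged twice for my subscription",
     "output": {"category": "billing", "priority": "high"}},
    {"ticket": "How do I change my profile picture?",
     "output": {"category": "account", "priority": "low"}},
    # Edge case: unintelligible input should yield nulls, not a guess.
    {"ticket": "asdf",
     "output": {"category": None, "priority": None}},
]


def build_few_shot_system_prompt(examples: list[dict]) -> str:
    """Render examples in one fixed layout so the model can mirror it."""
    lines = [
        "Classify each support ticket. Reply with JSON exactly like the examples.",
        "",
    ]
    for ex in examples:
        lines.append(f"Ticket: {ex['ticket']}")
        # sort_keys keeps the field order identical across examples.
        lines.append(f"Output: {json.dumps(ex['output'], sort_keys=True)}")
        lines.append("")
    return "\n".join(lines).rstrip()


system_prompt = build_few_shot_system_prompt(FEW_SHOT_EXAMPLES)
print(system_prompt)
```

Rendering every example through the same function keeps the layout identical across examples, which is the property the model is expected to imitate.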
Eval-Driven Consistency
Measuring Consistency
The most rigorous approach to consistency is empirical: run the same prompt multiple times and measure variance in outputs. This technique reveals which aspects of your prompt or configuration produce unstable results:
- Run N=5–10 times with identical inputs
- Measure field-level agreement: Do specific fields always return the same value?
- Explicit criteria reduce variance more than vague instructions (“classify as high/medium/low based on financial impact > $1000” vs “classify by priority”)
- Build golden datasets — curated examples with known-correct outputs that anchor expectations
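Field-level agreement from the list above can be computed with a few lines of standard-library code. This is a minimal sketch: the `field_agreement` helper and the sample runs are hypothetical, and agreement is defined here as the share of runs that returned the most common value for a field.

```python
from collections import Counter


def field_agreement(runs: list[dict]) -> dict[str, float]:
    """For each field, the share of runs that returned the most common value."""
    if not runs:
        return {}
    fields = sorted({key for run in runs for key in run})
    rates = {}
    for field in fields:
        # repr() makes unhashable values such as lists countable.
        values = Counter(repr(run.get(field)) for run in runs)
        rates[field] = values.most_common(1)[0][1] / len(runs)
    return rates


# Hypothetical outputs from five runs of the same prompt.
runs = [
    {"category": "billing", "priority": "high"},
    {"category": "billing", "priority": "high"},
    {"category": "billing", "priority": "medium"},
    {"category": "billing", "priority": "high"},
    {"category": "billing", "priority": "critical"},
]
print(field_agreement(runs))  # {'category': 1.0, 'priority': 0.6}
```

Here the category is stable while priority is not, which points at the priority criteria as the part of the prompt that needs tightening.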
Consistency Evaluator
import anthropic
from collections import Counter

client = anthropic.Anthropic()

classification_tool = {
    "name": "classify_sentiment",
    "description": "Classify the sentiment of a product review",
    "input_schema": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["positive", "negative", "neutral", "mixed"]
            },
            "confidence": {
                "type": "number",
                "description": "Confidence between 0.0 and 1.0"
            },
            "key_phrases": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Top 3 phrases that determined sentiment"
            }
        },
        "required": ["sentiment", "confidence", "key_phrases"]
    }
}


def run_consistency_eval(prompt: str, n_runs: int = 5) -> dict:
    """Run the same classification N times and measure agreement."""
    results = []
    for _ in range(n_runs):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=256,
            temperature=0,
            tool_choice={"type": "tool", "name": "classify_sentiment"},
            tools=[classification_tool],
            messages=[{"role": "user", "content": prompt}]
        )
        for block in response.content:
            if block.type == "tool_use":
                results.append(block.input)
                break

    if not results:
        raise RuntimeError("No tool_use blocks were returned; nothing to evaluate")

    # Measure agreement over the runs that actually produced a result
    sentiments = [r["sentiment"] for r in results]
    confidences = [r["confidence"] for r in results]
    sentiment_counts = Counter(sentiments)
    majority_sentiment, majority_count = sentiment_counts.most_common(1)[0]
    agreement_rate = majority_count / len(results)

    return {
        "n_runs": n_runs,
        "n_results": len(results),
        "majority_sentiment": majority_sentiment,
        "agreement_rate": agreement_rate,
        "sentiment_distribution": dict(sentiment_counts),
        "confidence_range": {
            "min": min(confidences),
            "max": max(confidences),
            # Spread (max - min), not a statistical variance
            "spread": max(confidences) - min(confidences)
        },
        "all_results": results
    }


# Test with an ambiguous review (harder to classify consistently)
review = (
    "The product works as advertised but shipping took forever. "
    "Quality is decent for the price, though I expected better packaging."
)
eval_result = run_consistency_eval(
    f"Classify the sentiment of this review:\n\n{review}"
)
print(f"Agreement rate: {eval_result['agreement_rate']:.0%}")
print(f"Majority sentiment: {eval_result['majority_sentiment']}")
print(f"Distribution: {eval_result['sentiment_distribution']}")
print(f"Confidence range: {eval_result['confidence_range']}")

# If agreement_rate < 0.8, consider:
# 1. Adding explicit criteria ("positive = recommends product")
# 2. Confirming temperature is set to 0 (it cannot go lower)
# 3. Adding few-shot examples for edge cases
# 4. Using tool_use with stricter enum definitions
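Agreement measures stability, not correctness: a prompt can be consistently wrong. Scoring against a golden dataset covers that gap. The sketch below is illustrative only; the golden examples are invented, and `keyword_classifier` is a deliberately naive stand-in for a model call so the code runs offline.

```python
from typing import Callable

# Hypothetical golden set: inputs paired with known-correct labels.
GOLDEN_SET = [
    {"review": "Love it, works perfectly.", "expected": "positive"},
    {"review": "Broke after two days.", "expected": "negative"},
    {"review": "Great screen, terrible battery.", "expected": "mixed"},
]


def score_against_golden(classify: Callable[[str], str], golden: list[dict]) -> dict:
    """Compare classifier output with expected labels and collect mismatches."""
    failures = []
    for case in golden:
        actual = classify(case["review"])
        if actual != case["expected"]:
            failures.append({**case, "actual": actual})
    accuracy = 1 - len(failures) / len(golden) if golden else 0.0
    return {"accuracy": accuracy, "failures": failures}


def keyword_classifier(review: str) -> str:
    """Naive stand-in for a model call; swap in a real classifier here."""
    text = review.lower()
    if "love" in text or "great" in text:
        return "positive"
    return "negative"


report = score_against_golden(keyword_classifier, GOLDEN_SET)
print(f"accuracy: {report['accuracy']:.2f}")
for failure in report["failures"]:
    print("mismatch:", failure["review"], "->", failure["actual"])
```

Keeping the failing cases, not just the accuracy figure, shows which kinds of input (here, mixed sentiment) need clearer criteria or an additional few-shot example.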
Healthcare Company: Reducing Hallucination Risk with Nullable Fields and Citations
A healthcare analytics company was extracting patient data from clinical notes using Claude. Their initial implementation frequently hallucinated missing lab values, invented medication dosages, and fabricated dates when the chart was ambiguous or incomplete.
Changes implemented:
- Made all numeric fields nullable ("type": ["number", "null"])
- Added a source_citation field requiring exact quotes from clinical notes
- Included system prompt: “If a value is not explicitly stated in the clinical note, you MUST return null. Never infer or calculate values.”
- Added post-extraction verification checking citations against source text
Result: Internal evals showed a sharp drop in hallucinated extractions after these changes. The null rate increased materially, which was the desired behavior: the system started representing missing data honestly instead of silently inventing values.
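A null rate like the one described here can be tracked per field with a small helper. This is a sketch under assumptions: the field names and sample values are invented for illustration and are not taken from the company's system, and a missing key is counted as null.

```python
def null_rate_by_field(extractions: list[dict]) -> dict[str, float]:
    """Fraction of extractions in which each field is None (or absent)."""
    if not extractions:
        return {}
    fields = sorted({key for row in extractions for key in row})
    return {
        field: sum(1 for row in extractions if row.get(field) is None)
        / len(extractions)
        for field in fields
    }


# Hypothetical extraction results; field names are illustrative only.
extractions = [
    {"hemoglobin": 13.5, "dose_mg": None},
    {"hemoglobin": None, "dose_mg": None},
    {"hemoglobin": 12.1, "dose_mg": 50},
    {"hemoglobin": 14.0, "dose_mg": None},
]
print(null_rate_by_field(extractions))  # {'dose_mg': 0.75, 'hemoglobin': 0.25}
```

Watching this number over time helps distinguish honest nulls from a model that has become overly conservative: a sudden jump for a field that is usually present in the source is worth investigating.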
Production Guardrails Architecture
Layered Defense
Production systems use multiple overlapping guardrails rather than relying on any single technique. Each layer catches different failure modes:
- Prompt engineering: System prompts with explicit grounding instructions and “I don’t know” permissions
- Structured output: tool_use schemas with nullable fields and enums constraining valid values
- Post-processing validation: Programmatic checks on model output (required fields populated, values in valid ranges, citations present)
- Human review gate: Flag low-confidence or high-stakes outputs for human review before action
flowchart LR
A[User Input] --> B[Input Validation]
B --> C[Prompt Engineering Layer]
C --> D[Model Call with tool_use]
D --> E[Output Schema Validation]
E --> F{Confidence Check}
F -->|High Confidence| G[Post-Processing Rules]
F -->|Low Confidence| H[Human Review Queue]
G --> I{All Checks Pass?}
I -->|Yes| J[Return to User]
I -->|No| H
H --> K[Human Approves/Rejects]
K -->|Approved| J
K -->|Rejected| L[Fallback Response]
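The routing decisions in the flowchart reduce to a small function. The sketch below is one possible reading of the diagram: the `Checked` container, the route names, and the 0.8 confidence threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checked:
    """Hypothetical container for one model output and its check results."""
    payload: dict
    confidence: float
    issues: list[str] = field(default_factory=list)


def route(output: Checked, threshold: float = 0.8) -> str:
    """Return 'return_to_user' or 'human_review', mirroring the flowchart."""
    if output.confidence < threshold:
        return "human_review"  # low confidence goes straight to the queue
    if output.issues:
        return "human_review"  # a failed post-processing rule also escalates
    return "return_to_user"


def resolve_review(approved: bool) -> str:
    """Human decision: approved outputs are returned, rejected ones fall back."""
    return "return_to_user" if approved else "fallback_response"


print(route(Checked({"category": "billing"}, confidence=0.95)))
print(route(Checked({"category": "billing"}, confidence=0.55)))
print(route(Checked({"category": "billing"}, confidence=0.95, issues=["no citation"])))
print(resolve_review(approved=False))
```

Keeping the routing logic in one pure function makes the escalation policy easy to unit test and to tune, for example by raising the threshold for high-stakes fields.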
PostToolUse Validation Hook
The PostToolUse hook pattern intercepts model outputs after tool calls to validate them programmatically before returning results to the user. This catches hallucinations that slip through prompt-level guardrails:
from dataclasses import dataclass
from datetime import date


@dataclass
class ValidationResult:
    """Result of output validation."""
    is_valid: bool
    issues: list[str]
    confidence: float


def validate_extraction_output(
    tool_output: dict,
    source_text: str
) -> ValidationResult:
    """
    PostToolUse validation hook: checks extraction output for
    hallucination signals before returning to user.

    The confidence penalties below are heuristic weights, not calibrated
    probabilities; tune them for your own data.
    """
    issues = []
    confidence = 1.0

    # Check 1: Required citations present (one per non-null field)
    citations = tool_output.get("source_citations", [])
    non_null_fields = sum(
        1 for k, v in tool_output.items()
        if v is not None and k != "source_citations"
    )
    if len(citations) < non_null_fields:
        issues.append(
            f"Missing citations: {non_null_fields} non-null fields "
            f"but only {len(citations)} citations"
        )
        confidence -= 0.3

    # Check 2: Verify citations exist in source text
    # Normalize case and whitespace once so line breaks don't defeat matching
    normalized_source = " ".join(source_text.lower().split())
    for citation in citations:
        normalized_citation = " ".join(citation.lower().split())
        if normalized_citation not in normalized_source:
            issues.append(f"Citation not found in source: '{citation[:50]}...'")
            confidence -= 0.2

    # Check 3: Numeric values within reasonable ranges
    if tool_output.get("employee_count") is not None:
        count = tool_output["employee_count"]
        if count < 1 or count > 10_000_000:
            issues.append(f"Suspicious employee count: {count}")
            confidence -= 0.4
    if tool_output.get("founding_year") is not None:
        year = tool_output["founding_year"]
        # Compare against the current year instead of a hard-coded value
        if year < 1800 or year > date.today().year:
            issues.append(f"Suspicious founding year: {year}")
            confidence -= 0.4

    # Check 4: Flag all-null responses (model may be too conservative)
    null_count = sum(
        1 for k, v in tool_output.items()
        if v is None and k != "source_citations"
    )
    total_fields = sum(
        1 for k in tool_output if k != "source_citations"
    )
    if total_fields and null_count == total_fields:
        issues.append("All fields null - model may be overly conservative")
        confidence -= 0.1

    return ValidationResult(
        is_valid=len(issues) == 0,
        issues=issues,
        confidence=max(0.0, confidence)
    )


# Example: validate an extraction result
source = """
Pinnacle Systems was established in 2015 in Boston. The company
specializes in enterprise security solutions and employs 890 staff.
"""

# Simulated model output (imagine this came from a tool_use call).
# It carries one citation per non-null field, as the schema requires.
model_output = {
    "company_name": "Pinnacle Systems",
    "founding_year": 2015,
    "revenue": None,
    "employee_count": 890,
    "headquarters": "Boston",
    "source_citations": [
        "Pinnacle Systems was established",
        "established in 2015",
        "employs 890 staff",
        "established in 2015 in Boston"
    ]
}

result = validate_extraction_output(model_output, source)
print(f"Valid: {result.is_valid}")
print(f"Confidence: {result.confidence:.2f}")
if result.issues:
    print(f"Issues: {result.issues}")
else:
    print("All validation checks passed!")
When to Use Each Technique
Different guardrail techniques are most effective for different scenarios:
| Technique | Best For | Limitations |
|---|---|---|
| Nullable fields | Data extraction from documents with incomplete info | Model may become overly conservative |
| Source citations | RAG applications requiring traceability | Increases token usage; citations may be imprecise |
| temperature=0 | Classification, extraction, deterministic tasks | Not fully deterministic; reduces creative problem-solving |
| tool_use schemas | Any task requiring structured, consistent output | Adds latency; not ideal for open-ended generation |
| Post-processing validation | High-stakes outputs (financial, medical, legal) | Requires domain-specific rules; adds complexity |
| Human review gate | Critical decisions, novel scenarios, edge cases | Introduces latency; doesn’t scale for all outputs |
Next in the Series
In Part 21: Guardrails — Jailbreaks & Prompt Leak, we’ll cover CCA Phase 4.3 and 4.4 — defending against adversarial prompt injection attacks, preventing system prompt leakage, implementing input sanitization, and building multi-layer jailbreak defenses.