1. CrewAI Built-in Tracing
CrewAI includes native tracing that captures every agent decision, tool call, and LLM interaction. Enable it with a single CLI command to get production-grade visibility into your crew executions.
1.1 Enabling Tracing with CrewAI AMP
CrewAI AMP (Agent Management Platform) provides a hosted dashboard for viewing traces, analyzing performance, and debugging agent behavior:
# Authenticate with CrewAI platform
crewai login
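# Traces are sent after authentication; just run your crew normally
# If no traces appear, enable tracing explicitly (recent releases document
# tracing=True on the Crew or CREWAI_TRACING_ENABLED=true; check your version)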
crewai run
Once authenticated, all crew executions automatically send traces to the AMP dashboard. You can view:
- Agent decision trees and reasoning chains
- Tool invocation sequences with inputs/outputs
- LLM call details (prompts, responses, token usage)
- Task delegation patterns and handoffs
- Execution timelines with duration breakdowns
1.2 Viewing Traces Programmatically
from crewai import Agent, Task, Crew, Process

# Ensure you're logged in first (crewai login)
# Traces are sent automatically; no additional config needed
research_agent = Agent(
    role="Research Analyst",
    goal="Produce thorough research on any topic",
    backstory="Senior analyst with expertise in data gathering.",
    llm="gpt-4o",
    verbose=True,  # Verbose mode shows trace info in console too
)
research_task = Task(
    description="Research the current state of quantum computing in 2026.",
    expected_output="Detailed research report with key developments.",
    agent=research_agent,
)
crew = Crew(
    agents=[research_agent],
    tasks=[research_task],
    process=Process.sequential,
    verbose=True,
)
# Run the crew; traces are captured and sent to AMP
result = crew.kickoff()
print(f"Result: {result.raw[:200]}...")
print("\nView traces at: https://app.crewai.com/traces")
2. OpenTelemetry Integrations
CrewAI supports OpenTelemetry (OTel) for exporting traces to any OTel-compatible backend. This enables integration with enterprise observability stacks.
2.1 Langfuse Integration
Langfuse is an open-source LLM engineering platform for tracing, prompt management, and evaluation. It provides rich visualization of agent workflows:
import os
from crewai import Agent, Task, Crew, Process
from langfuse import Langfuse
from langfuse.openai import openai  # Patches the OpenAI client

# Configure Langfuse credentials
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

# Initialize Langfuse client
langfuse = Langfuse()
# Create a trace for the crew execution
# Note: langfuse.trace() is the Langfuse Python SDK v2 API; v3 replaced it
# with span-based calls, so pin the SDK version or adapt this call
trace = langfuse.trace(
    name="research-crew-execution",
    metadata={"crew_type": "research", "version": "1.0"},
)
analyst = Agent(
    role="Market Analyst",
    goal="Analyze market trends and provide insights",
    backstory="Expert financial analyst with focus on tech sector.",
    llm="gpt-4o",
    verbose=True,
)
analysis_task = Task(
    description="Analyze the AI infrastructure market for Q2 2026.",
    expected_output="Market analysis with growth projections.",
    agent=analyst,
)
crew = Crew(
    agents=[analyst],
    tasks=[analysis_task],
    process=Process.sequential,
)
result = crew.kickoff()
# Log the (truncated) result to Langfuse and send buffered events
trace.update(output=result.raw[:500])
langfuse.flush()
print(f"Result: {result.raw[:200]}...")
print("View in Langfuse: https://cloud.langfuse.com/traces")
2.2 Arize Phoenix & OpenLIT
Arize Phoenix provides AI observability with automatic instrumentation for CrewAI:
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.crewai import CrewAIInstrumentor

# Launch Phoenix (local or cloud)
px.launch_app()
# Register OpenTelemetry tracer
tracer_provider = register(project_name="crewai-research")
# Instrument CrewAI: auto-captures all agent activity
CrewAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Import crewai after instrumentation is in place
from crewai import Agent, Task, Crew, Process

# All crew executions are now automatically traced
agent = Agent(
    role="Data Scientist",
    goal="Build predictive models",
    backstory="ML expert with production experience.",
    llm="gpt-4o",
)
task = Task(
    description="Design a churn prediction model architecture.",
    expected_output="Model architecture document with feature engineering plan.",
    agent=agent,
)
crew = Crew(agents=[agent], tasks=[task], process=Process.sequential)
result = crew.kickoff()
print(f"Result: {result.raw[:200]}...")
print("View traces in Phoenix UI: http://localhost:6006")
OpenLIT provides OTel-native monitoring with a single-line setup:
import openlit
from crewai import Agent, Task, Crew, Process

# One-line initialization: auto-instruments CrewAI
openlit.init(otlp_endpoint="http://localhost:4318")

agent = Agent(
    role="Code Reviewer",
    goal="Review code for bugs and best practices",
    backstory="Senior engineer specializing in code quality.",
    llm="gpt-4o",
)
task = Task(
    description="Review this Python function for security issues: {code}",
    expected_output="Security review with findings and recommendations.",
    agent=agent,
)
crew = Crew(agents=[agent], tasks=[task], process=Process.sequential)
# The {code} placeholder in the task description is filled from inputs
result = crew.kickoff(inputs={"code": "def login(user, pwd): ..."})
print(result.raw)
3. Platform Integrations
3.1 Datadog Integration
Datadog provides enterprise-grade APM with AI-specific dashboards for monitoring agent performance in production:
import os
from crewai import Agent, Task, Crew, Process

# Datadog integration via environment variables
os.environ["DD_API_KEY"] = "your-datadog-api-key"
os.environ["DD_SITE"] = "datadoghq.com"
os.environ["DD_LLMOBS_ENABLED"] = "1"
os.environ["DD_LLMOBS_ML_APP"] = "crewai-production"
os.environ["DD_LLMOBS_AGENTLESS_ENABLED"] = "1"

# Enable Datadog LLM Observability
from ddtrace.llmobs import LLMObs

LLMObs.enable(
    ml_app="crewai-production",
    integrations_enabled=True,
    agentless_enabled=True,
)

# All CrewAI operations are now traced in Datadog
support_agent = Agent(
    role="Customer Support Agent",
    goal="Resolve customer issues efficiently",
    backstory="Experienced support specialist.",
    llm="gpt-4o",
)
support_task = Task(
    description="Handle customer complaint: {complaint}",
    expected_output="Resolution with follow-up actions.",
    agent=support_agent,
)
crew = Crew(
    agents=[support_agent],
    tasks=[support_task],
    process=Process.sequential,
)
result = crew.kickoff(inputs={"complaint": "Order delayed by 2 weeks"})
print(f"Resolution: {result.raw}")
print("View in Datadog: LLM Observability dashboard")
3.2 MLflow Integration
MLflow tracks ML experiments, model versions, and now LLM agent traces:
import mlflow
from crewai import Agent, Task, Crew, Process

# Enable MLflow CrewAI autologging
mlflow.crewai.autolog()
# Set experiment for organized tracking
mlflow.set_experiment("crewai-experiments")

with mlflow.start_run(run_name="research-crew-v2"):
    researcher = Agent(
        role="Research Scientist",
        goal="Conduct thorough literature reviews",
        backstory="PhD researcher with publication experience.",
        llm="gpt-4o",
    )
    review_task = Task(
        description="Review recent papers on transformer architectures.",
        expected_output="Literature review summary with key findings.",
        agent=researcher,
    )
    crew = Crew(
        agents=[researcher],
        tasks=[review_task],
        process=Process.sequential,
    )
    result = crew.kickoff()
    # MLflow auto-logs: traces, token usage, latencies, agent configs
    mlflow.log_param("model", "gpt-4o")
    mlflow.log_metric("output_length", len(result.raw))
    print(f"Result: {result.raw[:200]}...")
    # active_run() is only set inside the with block, so print the ID here
    print(f"MLflow run: {mlflow.active_run().info.run_id}")
Debugging a Silent Failure
A team’s research crew was producing incomplete reports. Tracing revealed the issue: the web search tool was rate-limited and returning empty results 30% of the time, but the agent was silently continuing without the data. Solution: added a tool hook that retries on empty results and alerts if retries exceed 3. Observability turned a mysterious quality issue into a 5-minute fix.
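The fix described above can be sketched in plain Python. This is a minimal illustration, not the team's actual hook: `with_empty_result_retry`, its parameters, and the `tool_retry` logger name are hypothetical, and the wrapped search function is assumed to return a list that is empty when the call was rate-limited.

```python
import logging
import time
from typing import Callable, List

logger = logging.getLogger("tool_retry")


def with_empty_result_retry(
    search: Callable[[str], List[str]],
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], List[str]]:
    """Wrap a search callable so empty results trigger retries, then an alert."""

    def wrapped(query: str) -> List[str]:
        # One initial attempt plus up to max_retries retries.
        for attempt in range(max_retries + 1):
            results = search(query)
            if results:
                return results
            if attempt < max_retries:
                logger.warning("Empty result for %r (attempt %d); retrying", query, attempt + 1)
                # Exponential backoff gives a rate limit time to reset.
                sleep(backoff_seconds * (2 ** attempt))
        # Retries exhausted: fail loudly instead of silently continuing.
        logger.error("Search still empty after %d retries: %r", max_retries, query)
        raise RuntimeError(f"Search returned no results after {max_retries} retries: {query!r}")

    return wrapped
```

Wrapping the search function before handing it to an agent means an empty result can no longer pass through unnoticed: the call either returns data or raises, and the failure shows up in the trace.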
4. Advanced Observability
4.1 Portkey & Weave
Portkey acts as an AI gateway with built-in observability, caching, and fallback routing:
import os
from crewai import Agent, Task, Crew, Process, LLM

# Portkey as AI gateway: provides observability + reliability
os.environ["PORTKEY_API_KEY"] = "your-portkey-key"

# Route LLM calls through Portkey by pointing an LLM object at the gateway.
# Agent has no llm_config parameter, so the gateway settings go on the LLM.
# Assumption: your CrewAI version forwards extra_headers to the underlying
# client, as in Portkey's documented CrewAI setup; verify before relying on it.
portkey_llm = LLM(
    model="gpt-4o",
    base_url="https://api.portkey.ai/v1",
    extra_headers={
        "x-portkey-api-key": os.environ["PORTKEY_API_KEY"],
        "x-portkey-provider": "openai",
        "x-portkey-trace-id": "crewai-content-strategy",
    },
)
agent = Agent(
    role="Content Strategist",
    goal="Develop content strategies",
    backstory="Marketing expert with data-driven approach.",
    llm=portkey_llm,
)
task = Task(
    description="Create a Q3 content calendar for {brand}.",
    expected_output="Monthly content calendar with themes and channels.",
    agent=agent,
)
crew = Crew(agents=[agent], tasks=[task], process=Process.sequential)
result = crew.kickoff(inputs={"brand": "TechCorp"})
print(result.raw)
Weave (Weights & Biases) provides experiment tracking tailored for AI agents:
import weave
from crewai import Agent, Task, Crew, Process

# Initialize Weave project
weave.init("crewai-monitoring")


@weave.op()
def run_research_crew(topic: str) -> str:
    """Tracked crew execution with Weave."""
    researcher = Agent(
        role="Senior Researcher",
        goal=f"Research {topic} comprehensively",
        backstory="Expert researcher with broad knowledge.",
        llm="gpt-4o",
    )
    task = Task(
        description=f"Produce a detailed research brief on: {topic}",
        expected_output="Research brief with key findings and sources.",
        agent=researcher,
    )
    crew = Crew(agents=[researcher], tasks=[task], process=Process.sequential)
    result = crew.kickoff()
    return result.raw


# Weave automatically tracks inputs, outputs, and execution metadata
output = run_research_crew("edge computing trends 2026")
print(f"Research output: {output[:200]}...")
print("View in W&B Weave dashboard")
5. Telemetry & Performance
5.1 CrewAI Telemetry System
CrewAI collects anonymous telemetry to improve the framework. You can opt in or out, and configure what data is shared:
import os

# Opt out of telemetry. Set this before importing crewai so it is read at startup.
# CREWAI_DISABLE_TELEMETRY is the variable described in CrewAI's telemetry docs;
# confirm the name against the version you run.
os.environ["CREWAI_DISABLE_TELEMETRY"] = "true"
# To keep telemetry on, leave the variable unset (or set it to "false"):
# os.environ["CREWAI_DISABLE_TELEMETRY"] = "false"

from crewai import Agent, Task, Crew, Process

# What telemetry collects (when enabled):
# - Framework version
# - Python version
# - Number of agents/tasks
# - Process type used
# - LLM provider (not keys or prompts)
# - Execution duration
# - Error types (not messages)
print(f"Telemetry disabled: {os.environ.get('CREWAI_DISABLE_TELEMETRY', 'false')}")
Build a custom cost tracker for production budget management:
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List


@dataclass
class CostMetrics:
    """Track execution costs and performance metrics."""
    total_tokens: int = 0
    total_cost: float = 0.0
    execution_time: float = 0.0
    task_metrics: List[Dict] = field(default_factory=list)
    # Pricing per 1M tokens (GPT-4o); ClassVar keeps these out of __init__
    INPUT_COST_PER_M: ClassVar[float] = 2.50
    OUTPUT_COST_PER_M: ClassVar[float] = 10.00

    def add_task_result(self, task_name: str, input_tokens: int, output_tokens: int, duration: float):
        """Record one task's token counts and duration, and update the totals."""
        input_cost = (input_tokens / 1_000_000) * self.INPUT_COST_PER_M
        output_cost = (output_tokens / 1_000_000) * self.OUTPUT_COST_PER_M
        task_cost = input_cost + output_cost
        self.total_tokens += input_tokens + output_tokens
        self.total_cost += task_cost
        self.execution_time += duration
        self.task_metrics.append({
            "task": task_name,
            "tokens": input_tokens + output_tokens,
            "cost": f"${task_cost:.4f}",
            "duration": f"{duration:.2f}s",
        })

    def summary(self) -> str:
        """Return a per-task breakdown followed by the totals."""
        lines = ["Cost Summary", "=" * 40]
        for m in self.task_metrics:
            lines.append(f" {m['task']}: {m['tokens']} tokens, {m['cost']}, {m['duration']}")
        lines.append(f"\n TOTAL: {self.total_tokens} tokens, ${self.total_cost:.4f}, {self.execution_time:.2f}s")
        return "\n".join(lines)


# Usage (token counts here are illustrative, entered by hand)
metrics = CostMetrics()
metrics.add_task_result("Research", input_tokens=2500, output_tokens=800, duration=3.2)
metrics.add_task_result("Writing", input_tokens=3200, output_tokens=1500, duration=5.1)
metrics.add_task_result("Review", input_tokens=1800, output_tokens=400, duration=2.0)
print(metrics.summary())
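In practice the token counts would come from the run itself, not from hand-entered numbers. The sketch below prices any object that exposes `prompt_tokens` and `completion_tokens` attributes. It assumes your CrewAI version reports usage in that shape on the kickoff result, which is worth checking before wiring it in. `cost_from_usage` and `sample_usage` are hypothetical names, and the prices are the same GPT-4o figures used in the tracker.

```python
from types import SimpleNamespace

# Assumed GPT-4o list prices in USD per 1M tokens (same figures as the tracker).
INPUT_COST_PER_M = 2.50
OUTPUT_COST_PER_M = 10.00


def cost_from_usage(usage, input_per_m=INPUT_COST_PER_M, output_per_m=OUTPUT_COST_PER_M) -> float:
    """Return the USD cost for an object exposing prompt_tokens and completion_tokens."""
    input_cost = usage.prompt_tokens / 1_000_000 * input_per_m
    output_cost = usage.completion_tokens / 1_000_000 * output_per_m
    return input_cost + output_cost


# Stand-in for a real usage object; replace with the usage your crew reports.
sample_usage = SimpleNamespace(prompt_tokens=2500, completion_tokens=800)
sample_cost = cost_from_usage(sample_usage)
print(f"Estimated cost: ${sample_cost:.5f}")
```

Passing the prices as parameters keeps the function usable for other models without editing the constants.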
(1) Set cache=True on crews so repeated tool calls reuse cached results instead of running again. (2) Set max_rpm to cap requests per minute and keep a looping agent from running up costs. (3) Use cheaper models (GPT-4o-mini) for simple tasks and reserve more expensive models (GPT-4o) for complex reasoning. (4) Monitor token usage per agent to identify inefficient prompts.
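Tip (4) comes down to a simple aggregation once per-call usage is available. A minimal sketch, assuming you can export (agent name, token count) pairs from your tracing backend; `tokens_by_agent` and the sample records are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tokens_by_agent(records: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum tokens per agent and return (agent, tokens) pairs, largest first."""
    totals: Dict[str, int] = defaultdict(int)
    for agent, tokens in records:
        totals[agent] += tokens
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-call records, e.g. exported from a tracing backend.
records = [("researcher", 3300), ("writer", 4700), ("researcher", 2200), ("reviewer", 2200)]
ranking = tokens_by_agent(records)
for agent, tokens in ranking:
    print(f"{agent}: {tokens} tokens")
```

The agent at the top of the ranking is the first place to look for an overlong prompt or backstory.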
Next in the CrewAI SDK Track
In Part 14: Migration, Evaluation & Production, we’ll migrate from LangGraph to CrewAI, evaluate use cases strategically, select optimal LLMs, handle version upgrades, and deploy production systems without LiteLLM dependency.