CrewAI SDK Track Part 13: Observability & Telemetry

                        
What You’ll Learn: When your crew makes an unexpected decision, how do you debug it? Observability answers that question: traces record each agent reasoning step, tool call, and delegation. CrewAI integrates with 15+ platforms (including Langfuse, Datadog, Arize, and MLflow), so you can keep the monitoring stack you already prefer. Think of it as the browser DevTools Network tab for agents: you see every request and response flowing through the system.
                    

1. CrewAI Built-in Tracing

CrewAI includes native tracing that captures every agent decision, tool call, and LLM interaction. Enable it with a single CLI command to get production-grade visibility into your crew executions.

1.1 Enabling Tracing with CrewAI AMP

The CrewAI AMP (Agent Management Platform) provides a hosted dashboard for viewing traces, analyzing performance, and debugging agent behavior:

# Authenticate with CrewAI platform
crewai login

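# After authentication, traces are sent to the dashboard when you run your crew.
# Depending on your CrewAI version, you may also need tracing=True on the Crew
# or CREWAI_TRACING_ENABLED=true in the environment; check the docs for your release.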
crewai run

Once authenticated, all crew executions automatically send traces to the AMP dashboard. You can view:

Agent decision trees and reasoning chains
Tool invocation sequences with inputs/outputs
LLM call details (prompts, responses, token usage)
Task delegation patterns and handoffs
Execution timelines with duration breakdowns

1.2 Viewing Traces Programmatically

from crewai import Agent, Task, Crew, Process
import os

# Ensure you're logged in (crewai login)
# Traces are sent automatically — no additional config needed

research_agent = Agent(
    role="Research Analyst",
    goal="Produce thorough research on any topic",
    backstory="Senior analyst with expertise in data gathering.",
    llm="gpt-4o",
    verbose=True  # Verbose mode shows trace info in console too
)

research_task = Task(
    description="Research the current state of quantum computing in 2026.",
    expected_output="Detailed research report with key developments.",
    agent=research_agent
)

crew = Crew(
    agents=[research_agent],
    tasks=[research_task],
    process=Process.sequential,
    verbose=True
)

# Run — traces automatically captured and sent to AMP
result = crew.kickoff()
print(f"Result: {result.raw[:200]}...")
print("\nView traces at: https://app.crewai.com/traces")

                        
Trace Contents: Each trace contains the full execution graph, including which agent handled which task, which tools were called (with arguments), LLM prompts and completions, token counts, latencies, and any errors. Use it to find slow steps, unnecessary tool calls, and inefficient prompts.
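To make the "find slow steps and unnecessary tool calls" idea concrete, here is a minimal sketch that scans exported span records. The record shape (`name`, `kind`, `duration_s`, `args`) is a hypothetical, simplified schema chosen for illustration; real trace exports differ by platform and will need mapping into something like it.

```python
from collections import Counter

# Hypothetical, simplified span records; not the export format of any platform.
spans = [
    {"name": "llm.call", "kind": "llm", "duration_s": 2.4, "args": None},
    {"name": "web_search", "kind": "tool", "duration_s": 6.1, "args": "quantum computing 2026"},
    {"name": "web_search", "kind": "tool", "duration_s": 5.8, "args": "quantum computing 2026"},
    {"name": "llm.call", "kind": "llm", "duration_s": 1.9, "args": None},
]


def slowest_spans(spans, top_n=3):
    """Return the top_n spans ordered by duration, longest first."""
    return sorted(spans, key=lambda s: s["duration_s"], reverse=True)[:top_n]


def repeated_tool_calls(spans):
    """Return {(tool_name, args): count} for tool calls made more than once."""
    counts = Counter((s["name"], s["args"]) for s in spans if s["kind"] == "tool")
    return {call: n for call, n in counts.items() if n > 1}


for span in slowest_spans(spans, top_n=2):
    print(f"slow: {span['name']} took {span['duration_s']}s")

for (tool, args), n in repeated_tool_calls(spans).items():
    print(f"repeated: {tool}({args!r}) called {n} times")
```

Identical tool calls with identical arguments are often a sign that the agent is not reusing earlier results, which makes them a reasonable first thing to look for.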
                    

2. OpenTelemetry Integrations

CrewAI supports OpenTelemetry (OTel) for exporting traces to any OTel-compatible backend. This enables integration with enterprise observability stacks.
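Most OTel-compatible backends are configured through the standard OpenTelemetry exporter environment variables. The sketch below sets three of them from Python. The variable names come from the OpenTelemetry specification, while the endpoint and service name are placeholder values assumed for illustration. It only prepares configuration; an instrumentation library still has to be installed and activated for spans to be emitted.

```python
import os

# Standard OpenTelemetry SDK environment variables.
# The values are placeholders: point the endpoint at your own collector.
otel_settings = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",  # 4318 is the default OTLP/HTTP port
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_SERVICE_NAME": "crewai-research",
}

# setdefault keeps any value already provided by the deployment environment.
for key, value in otel_settings.items():
    os.environ.setdefault(key, value)

for key in otel_settings:
    print(f"{key}={os.environ[key]}")
```

Using `setdefault` rather than direct assignment lets the same script run locally with these defaults and in production with values injected by the platform.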

2.1 Langfuse Integration

Langfuse is an open-source LLM engineering platform for tracing, prompt management, and evaluation. It provides rich visualization of agent workflows:

import os
from crewai import Agent, Task, Crew, Process
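from langfuse import Langfuse
# Langfuse's drop-in wrapper for the OpenAI SDK. It only captures calls made
# through this patched client, so LLM calls that CrewAI routes through another
# client may not appear; verify in the Langfuse UI after a test run.
from langfuse.openai import openai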

# Configure Langfuse credentials
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

# Initialize Langfuse client
langfuse = Langfuse()

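# Create a trace for the crew execution.
# Note: langfuse.trace() belongs to the Langfuse Python SDK v2. SDK v3 is
# OpenTelemetry-based and uses span helpers instead, so pin or adapt accordingly.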
trace = langfuse.trace(
    name="research-crew-execution",
    metadata={"crew_type": "research", "version": "1.0"}
)

analyst = Agent(
    role="Market Analyst",
    goal="Analyze market trends and provide insights",
    backstory="Expert financial analyst with focus on tech sector.",
    llm="gpt-4o",
    verbose=True
)

analysis_task = Task(
    description="Analyze the AI infrastructure market for Q2 2026.",
    expected_output="Market analysis with growth projections.",
    agent=analyst
)

crew = Crew(
    agents=[analyst],
    tasks=[analysis_task],
    process=Process.sequential
)

result = crew.kickoff()

# Log result to Langfuse
trace.update(output=result.raw[:500])
langfuse.flush()

print(f"Result: {result.raw[:200]}...")
print("View in Langfuse: https://cloud.langfuse.com/traces")

2.2 Arize Phoenix & OpenLIT

Arize Phoenix provides AI observability with automatic instrumentation for CrewAI:

import os
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.crewai import CrewAIInstrumentor

# Launch Phoenix (local or cloud)
px.launch_app()

# Register OpenTelemetry tracer
tracer_provider = register(project_name="crewai-research")

# Instrument CrewAI — auto-captures all agent activity
CrewAIInstrumentor().instrument(tracer_provider=tracer_provider)

from crewai import Agent, Task, Crew, Process

# All crew executions are now automatically traced
agent = Agent(
    role="Data Scientist",
    goal="Build predictive models",
    backstory="ML expert with production experience.",
    llm="gpt-4o"
)

task = Task(
    description="Design a churn prediction model architecture.",
    expected_output="Model architecture document with feature engineering plan.",
    agent=agent
)

crew = Crew(agents=[agent], tasks=[task], process=Process.sequential)
result = crew.kickoff()

print(f"Result: {result.raw[:200]}...")
print("View traces in Phoenix UI: http://localhost:6006")

OpenLIT provides OTel-native monitoring with a single-line setup:

import openlit
from crewai import Agent, Task, Crew, Process

# One-line initialization — auto-instruments CrewAI
openlit.init(otlp_endpoint="http://localhost:4318")

agent = Agent(
    role="Code Reviewer",
    goal="Review code for bugs and best practices",
    backstory="Senior engineer specializing in code quality.",
    llm="gpt-4o"
)

task = Task(
    description="Review this Python function for security issues: {code}",
    expected_output="Security review with findings and recommendations.",
    agent=agent
)

crew = Crew(agents=[agent], tasks=[task], process=Process.sequential)
result = crew.kickoff(inputs={"code": "def login(user, pwd): ..."})
print(result.raw)

3. Platform Integrations

3.1 Datadog Integration

Datadog provides enterprise-grade APM with AI-specific dashboards for monitoring agent performance in production:

import os
from crewai import Agent, Task, Crew, Process

# Datadog integration via environment variables
os.environ["DD_API_KEY"] = "your-datadog-api-key"
os.environ["DD_SITE"] = "datadoghq.com"
os.environ["DD_LLMOBS_ENABLED"] = "1"
os.environ["DD_LLMOBS_ML_APP"] = "crewai-production"
os.environ["DD_LLMOBS_AGENTLESS_ENABLED"] = "1"

# Enable Datadog LLM Observability
from ddtrace.llmobs import LLMObs
LLMObs.enable(
    ml_app="crewai-production",
    integrations_enabled=True,
    agentless_enabled=True
)

# All CrewAI operations are now traced in Datadog
support_agent = Agent(
    role="Customer Support Agent",
    goal="Resolve customer issues efficiently",
    backstory="Experienced support specialist.",
    llm="gpt-4o"
)

support_task = Task(
    description="Handle customer complaint: {complaint}",
    expected_output="Resolution with follow-up actions.",
    agent=support_agent
)

crew = Crew(
    agents=[support_agent],
    tasks=[support_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={"complaint": "Order delayed by 2 weeks"})
print(f"Resolution: {result.raw}")
print("View in Datadog: LLM Observability dashboard")

3.2 MLflow Integration

MLflow tracks ML experiments, model versions, and now LLM agent traces:

import mlflow
from crewai import Agent, Task, Crew, Process

# Enable MLflow CrewAI autologging
mlflow.crewai.autolog()

# Set experiment for organized tracking
mlflow.set_experiment("crewai-experiments")

with mlflow.start_run(run_name="research-crew-v2"):
    researcher = Agent(
        role="Research Scientist",
        goal="Conduct thorough literature reviews",
        backstory="PhD researcher with publication experience.",
        llm="gpt-4o"
    )

    review_task = Task(
        description="Review recent papers on transformer architectures.",
        expected_output="Literature review summary with key findings.",
        agent=researcher
    )

    crew = Crew(
        agents=[researcher],
        tasks=[review_task],
        process=Process.sequential
    )

    result = crew.kickoff()

    # MLflow auto-logs: traces, token usage, latencies, agent configs
    mlflow.log_param("model", "gpt-4o")
    mlflow.log_metric("output_length", len(result.raw))

    print(f"Result: {result.raw[:200]}...")
    print(f"MLflow run: {mlflow.active_run().info.run_id}")

Real-World Application

Debugging a Silent Failure

A team’s research crew was producing incomplete reports. Tracing revealed the cause: the web search tool was being rate-limited and returned empty results 30% of the time, and the agent silently continued without the data. The fix was a tool hook that retries on empty results and raises an alert when retries exceed 3. Observability turned a mysterious quality issue into a 5-minute fix.


4. Advanced Observability

4.1 Portkey & Weave

Portkey acts as an AI gateway with built-in observability, caching, and fallback routing:

import os
from crewai import Agent, Task, Crew, Process, LLM

# Portkey as AI gateway: provides observability + reliability
os.environ["PORTKEY_API_KEY"] = "your-portkey-key"

# Route LLM calls through Portkey. The gateway URL and headers are set on an
# LLM object, which is CrewAI's documented way to use a custom endpoint.
# Passing extra_headers through LLM follows Portkey's CrewAI guide; verify it
# against the CrewAI and Portkey versions you have installed.
portkey_llm = LLM(
    model="gpt-4o",
    base_url="https://api.portkey.ai/v1",
    extra_headers={
        "x-portkey-api-key": os.environ["PORTKEY_API_KEY"],
        "x-portkey-provider": "openai",
        "x-portkey-trace-id": "crewai-content-strategy",
    },
)

agent = Agent(
    role="Content Strategist",
    goal="Develop content strategies",
    backstory="Marketing expert with data-driven approach.",
    llm=portkey_llm,
)

task = Task(
    description="Create a Q3 content calendar for {brand}.",
    expected_output="Monthly content calendar with themes and channels.",
    agent=agent
)

crew = Crew(agents=[agent], tasks=[task], process=Process.sequential)
result = crew.kickoff(inputs={"brand": "TechCorp"})
print(result.raw)
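The example above uses one fixed trace ID, which groups every run under the same ID in the gateway. A per-run ID usually makes individual executions easier to find. The sketch below builds the same three headers with a unique suffix; `portkey_headers` and its `run_label` parameter are hypothetical helpers introduced for illustration, and the header names are the ones used in the example above.

```python
import uuid


def portkey_headers(api_key: str, provider: str = "openai", run_label: str = "crewai") -> dict:
    """Build gateway headers with a trace ID that is unique per call."""
    trace_id = f"{run_label}-{uuid.uuid4().hex[:12]}"
    return {
        "x-portkey-api-key": api_key,
        "x-portkey-provider": provider,
        "x-portkey-trace-id": trace_id,
    }


# Each call yields a fresh trace ID that shares a readable prefix.
first = portkey_headers("your-portkey-key", run_label="crewai-content-strategy")
second = portkey_headers("your-portkey-key", run_label="crewai-content-strategy")
print(first["x-portkey-trace-id"])
print(second["x-portkey-trace-id"])
```

Keeping a readable prefix means runs can still be filtered by crew, while the random suffix separates one execution from the next.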

Weave (Weights & Biases) provides experiment tracking tailored for AI agents:

import weave
from crewai import Agent, Task, Crew, Process

# Initialize Weave project
weave.init("crewai-monitoring")

@weave.op()
def run_research_crew(topic: str) -> str:
    """Tracked crew execution with Weave."""
    researcher = Agent(
        role="Senior Researcher",
        goal=f"Research {topic} comprehensively",
        backstory="Expert researcher with broad knowledge.",
        llm="gpt-4o"
    )

    task = Task(
        description=f"Produce a detailed research brief on: {topic}",
        expected_output="Research brief with key findings and sources.",
        agent=researcher
    )

    crew = Crew(agents=[researcher], tasks=[task], process=Process.sequential)
    result = crew.kickoff()
    return result.raw

# Weave automatically tracks inputs, outputs, and execution metadata
output = run_research_crew("edge computing trends 2026")
print(f"Research output: {output[:200]}...")
print("View in W&B Weave dashboard")

                        
Platform Comparison: Use Langfuse for open-source, self-hosted tracing; Datadog for enterprise APM integration; Phoenix for local development debugging; Portkey for an AI gateway with observability; MLflow for experiment tracking; and Weave for W&B ecosystem integration.
                    

5. Telemetry & Performance

5.1 CrewAI Telemetry System

CrewAI collects anonymous telemetry to improve the framework. You can opt in or out, and configure what data is shared:

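import os

# Opt out of telemetry. Set this before importing crewai so it takes effect.
# CrewAI's telemetry documentation names CREWAI_DISABLE_TELEMETRY for this;
# confirm the variable name against the docs for your installed version.
# (OTEL_SDK_DISABLED=true also works, but it turns off the OpenTelemetry SDK
# globally, which would silence OTel-based tracing integrations as well.)
os.environ["CREWAI_DISABLE_TELEMETRY"] = "true"

from crewai import Agent, Task, Crew, Process

# To stay opted in, leave the variable unset (or set it to "false").
# A crew can additionally opt in to sharing more detail with share_crew=True.

# What telemetry collects (when opted in):
# - Framework version
# - Python version
# - Number of agents/tasks
# - Process type used
# - LLM provider (not keys or prompts)
# - Execution duration
# - Error types (not messages)

print(f"Telemetry disabled: {os.environ.get('CREWAI_DISABLE_TELEMETRY', 'false')}")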

Build a custom cost tracker for production budget management:

from crewai import Agent, Task, Crew, Process
from dataclasses import dataclass, field
from typing import Dict, List
import time

@dataclass
class CostMetrics:
    """Track execution costs and performance metrics."""
    total_tokens: int = 0
    total_cost: float = 0.0
    execution_time: float = 0.0
    task_metrics: List[Dict] = field(default_factory=list)

    # Assumed pricing in USD per 1M tokens (GPT-4o); check current rates before relying on it
    INPUT_COST_PER_M: float = 2.50
    OUTPUT_COST_PER_M: float = 10.00

    def add_task_result(self, task_name: str, input_tokens: int, output_tokens: int, duration: float):
        input_cost = (input_tokens / 1_000_000) * self.INPUT_COST_PER_M
        output_cost = (output_tokens / 1_000_000) * self.OUTPUT_COST_PER_M
        task_cost = input_cost + output_cost

        self.total_tokens += input_tokens + output_tokens
        self.total_cost += task_cost
        self.execution_time += duration

        self.task_metrics.append({
            "task": task_name,
            "tokens": input_tokens + output_tokens,
            "cost": f"${task_cost:.4f}",
            "duration": f"{duration:.2f}s"
        })

    def summary(self) -> str:
        lines = ["Cost Summary", "=" * 40]
        for m in self.task_metrics:
            lines.append(f"  {m['task']}: {m['tokens']} tokens, {m['cost']}, {m['duration']}")
        lines.append(f"\n  TOTAL: {self.total_tokens} tokens, ${self.total_cost:.4f}, {self.execution_time:.2f}s")
        return "\n".join(lines)

# Usage
metrics = CostMetrics()
metrics.add_task_result("Research", input_tokens=2500, output_tokens=800, duration=3.2)
metrics.add_task_result("Writing", input_tokens=3200, output_tokens=1500, duration=5.1)
metrics.add_task_result("Review", input_tokens=1800, output_tokens=400, duration=2.0)
print(metrics.summary())
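The usage example above feeds in hard-coded token counts. In a real run the counts would come from the framework's usage report. The sketch below assumes a usage object that exposes `prompt_tokens` and `completion_tokens` attributes (CrewAI's usage metrics are understood to have this shape, but treat that as an assumption to verify) and uses a `SimpleNamespace` stand-in so it runs without an API key. The prices repeat the GPT-4o figures from the tracker above.

```python
from types import SimpleNamespace

INPUT_COST_PER_M = 2.50    # USD per 1M input tokens (same figure as the tracker above)
OUTPUT_COST_PER_M = 10.00  # USD per 1M output tokens


def cost_from_usage(usage) -> float:
    """Estimate USD cost from an object with prompt_tokens and completion_tokens."""
    input_cost = usage.prompt_tokens / 1_000_000 * INPUT_COST_PER_M
    output_cost = usage.completion_tokens / 1_000_000 * OUTPUT_COST_PER_M
    return input_cost + output_cost


# Stand-in for a real usage object; the attribute names are an assumption.
usage = SimpleNamespace(prompt_tokens=2500, completion_tokens=800)
print(f"Estimated cost: ${cost_from_usage(usage):.5f}")
```

For the sample values the arithmetic is 2500 / 1,000,000 × 2.50 = 0.00625 for input and 800 / 1,000,000 × 10.00 = 0.008 for output, so the estimate is about $0.01425.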

                        
Cost Optimization Tips: (1) Enable cache=True on crews so repeated tool calls reuse cached results instead of running again. (2) Set max_rpm to cap the request rate and limit the damage from runaway loops. (3) Use cheaper models (GPT-4o-mini) for simple tasks and reserve more expensive models (GPT-4o) for complex reasoning. (4) Monitor token usage per agent to identify inefficient prompts.
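Tip (4) comes down to grouping token counts by agent. A minimal sketch, assuming per-call records with hypothetical `agent`, `prompt_tokens`, and `completion_tokens` fields; in practice these would be mapped from whatever your tracing platform exports.

```python
from collections import defaultdict

# Hypothetical per-call usage records, one per LLM call.
calls = [
    {"agent": "Researcher", "prompt_tokens": 2500, "completion_tokens": 800},
    {"agent": "Writer", "prompt_tokens": 3200, "completion_tokens": 1500},
    {"agent": "Researcher", "prompt_tokens": 1800, "completion_tokens": 400},
]


def tokens_by_agent(calls):
    """Sum prompt and completion tokens for each agent."""
    totals = defaultdict(int)
    for call in calls:
        totals[call["agent"]] += call["prompt_tokens"] + call["completion_tokens"]
    return dict(totals)


totals = tokens_by_agent(calls)
heaviest = max(totals, key=totals.get)

for agent, tokens in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{agent}: {tokens} tokens")
print(f"Highest token usage: {heaviest}")
```

An agent that dominates this ranking is the first place to look for an overly long backstory, oversized context, or repeated tool output being pasted into prompts.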
                    

                        
Try It Yourself: Set up observability for a crew. (1) Enable CrewAI’s built-in tracing. (2) Integrate with Langfuse (free tier). (3) Run a complex 3-agent crew. (4) In the Langfuse dashboard, identify the most expensive agent, the slowest tool call, and any failed attempts. Then write a summary of optimization opportunities based on the traces.
                    

Next in the CrewAI SDK Track

In Part 14: Migration, Evaluation & Production, we’ll migrate from LangGraph to CrewAI, evaluate use cases strategically, select optimal LLMs, handle version upgrades, and deploy production systems without a LiteLLM dependency.