CrewAI SDK Track Part 14: Migration, Evaluation & Production

                        
                        What You’ll Learn: This final article covers the practical concerns of production deployment: migrating existing LangGraph workflows to CrewAI, choosing the right LLM for each agent, upgrading between CrewAI versions safely, and deploying without optional dependencies. It’s your checklist for going from ‘works in Jupyter’ to ‘runs reliably in production.’
                    

1. Migrating from LangGraph

LangGraph and CrewAI solve similar problems (multi-agent orchestration) but with fundamentally different paradigms. LangGraph uses explicit graph construction with nodes and edges; CrewAI uses declarative role-based agents with Flows for complex orchestration.

                        
                        Why Migrate: CrewAI offers cleaner syntax (fewer lines for equivalent logic), built-in persistence and memory, type-safe state management via Pydantic, 75+ pre-built tools, and a unified CLI for project scaffolding. LangGraph requires more boilerplate but offers finer-grained control over execution graphs.
                    

1.1 Pattern Mapping: LangGraph → CrewAI

LangGraph Concept	CrewAI Equivalent	Notes
`StateGraph`	`Flow[State]`	Type-safe Pydantic state
`add_node()`	`@start()` / `@listen()`	Decorator-based, implicit graph
`add_edge()`	`@listen(method)`	Edges are implicit via listeners
`add_conditional_edges()`	`@router(method)`	Returns route name string
`ToolNode`	Agent `tools=[...]`	Tools bound to agents directly
`checkpointer`	`@persist` decorator	Opt-in Flow state persistence (SQLite-backed by default)
`interrupt_before`	`human_input=True`	Task-level HITL
`compile()`	`flow.kickoff()`	No compile step needed
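To make the "implicit graph" idea in the table concrete, here is a toy, standard-library-only sketch of how decorators can record edges that a runner later discovers. It is not CrewAI's implementation or API: `toy_start`, `toy_listen`, and `ToyFlow` are hypothetical names invented for illustration, and the real framework also handles routing, state, and async execution.

```python
# Toy illustration only: NOT CrewAI's implementation or API.
# It shows how decorator metadata can stand in for add_node()/add_edge().

def toy_start(func):
    """Mark a method as the entry point of the flow."""
    func._is_start = True
    return func


def toy_listen(upstream):
    """Mark a method as running after `upstream` completes."""
    def decorator(func):
        func._listens_to = upstream.__name__
        return func
    return decorator


class ToyFlow:
    def kickoff(self):
        # Collect public callables and read the metadata the decorators attached.
        steps = [getattr(self, name) for name in dir(self) if not name.startswith("_")]
        steps = [step for step in steps if callable(step)]

        # Build the edge map: upstream method name -> list of listeners.
        listeners = {}
        for step in steps:
            upstream = getattr(step, "_listens_to", None)
            if upstream:
                listeners.setdefault(upstream, []).append(step)

        # Run start methods first, then pass each result to its listeners.
        queue = [(step, ()) for step in steps if getattr(step, "_is_start", False)]
        result = None
        while queue:
            step, args = queue.pop(0)
            result = step(*args)
            for listener in listeners.get(step.__name__, []):
                queue.append((listener, (result,)))
        return result


class Pipeline(ToyFlow):
    @toy_start
    def research(self):
        return "notes"

    @toy_listen(research)
    def write(self, notes):
        return f"draft from {notes}"

    @toy_listen(write)
    def review(self, draft):
        return f"final: {draft}"


print(Pipeline().kickoff())
```

No edge is declared anywhere except in the decorator arguments, which is the property the table describes for `@start()` and `@listen()`.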

1.2 Step-by-Step Migration

Here’s a practical example migrating a research workflow from LangGraph to CrewAI:

# ============================================================
# BEFORE: LangGraph implementation
# ============================================================
# from langgraph.graph import StateGraph, END
# from typing import TypedDict
#
# class ResearchState(TypedDict):
#     topic: str
#     research: str
#     draft: str
#     final: str
#
# def research_node(state):
#     # LLM call for research
#     return {"research": "...research results..."}
#
# def write_node(state):
#     # LLM call to write
#     return {"draft": "...draft content..."}
#
# def review_node(state):
#     # LLM call to review
#     return {"final": "...final content..."}
#
# graph = StateGraph(ResearchState)
# graph.add_node("research", research_node)
# graph.add_node("write", write_node)
# graph.add_node("review", review_node)
# graph.add_edge("research", "write")
# graph.add_edge("write", "review")
# graph.add_edge("review", END)
# graph.set_entry_point("research")
# app = graph.compile()
# result = app.invoke({"topic": "AI trends"})

# ============================================================
# AFTER: CrewAI Flows implementation (equivalent)
# ============================================================
from crewai.flow.flow import Flow, start, listen
from pydantic import BaseModel
from crewai import Agent, Task, Crew, Process

class ResearchState(BaseModel):
    topic: str = ""
    research: str = ""
    draft: str = ""
    final: str = ""

class ResearchFlow(Flow[ResearchState]):

    @start()
    def research_step(self):
        """Equivalent to LangGraph's research_node."""
        researcher = Agent(
            role="Researcher",
            goal=f"Research {self.state.topic} thoroughly",
            backstory="Expert researcher.",
            llm="gpt-4o"
        )
        task = Task(
            description=f"Research: {self.state.topic}",
            expected_output="Detailed research notes.",
            agent=researcher
        )
        crew = Crew(agents=[researcher], tasks=[task], process=Process.sequential)
        result = crew.kickoff()
        self.state.research = result.raw
        return self.state.research

    @listen(research_step)
    def write_step(self, research):
        """Equivalent to LangGraph's write_node."""
        writer = Agent(
            role="Writer",
            goal="Write compelling content from research",
            backstory="Published author.",
            llm="gpt-4o"
        )
        task = Task(
            description=f"Write article based on: {research[:500]}",
            expected_output="Polished article draft.",
            agent=writer
        )
        crew = Crew(agents=[writer], tasks=[task], process=Process.sequential)
        result = crew.kickoff()
        self.state.draft = result.raw
        return self.state.draft

    @listen(write_step)
    def review_step(self, draft):
        """Equivalent to LangGraph's review_node."""
        reviewer = Agent(
            role="Editor",
            goal="Polish and finalize content",
            backstory="Senior editor.",
            llm="gpt-4o"
        )
        task = Task(
            description=f"Review and finalize: {draft[:500]}",
            expected_output="Publication-ready article.",
            agent=reviewer
        )
        crew = Crew(agents=[reviewer], tasks=[task], process=Process.sequential)
        result = crew.kickoff()
        self.state.final = result.raw
        return self.state.final

# Run the migrated flow; kickoff(inputs=...) populates the matching state fields
flow = ResearchFlow()
result = flow.kickoff(inputs={"topic": "AI trends in 2026"})
print(f"Final output: {result[:200]}...")

                        
Migration Benefits: The CrewAI version needs no explicit compile() step, keeps type-safe state via Pydantic, uses plain Python decorators instead of a graph-building API, and gives each step a full agent (not just a function) with tool access, delegation, and memory.
                    

2. Evaluating Use Cases

2.1 When to Choose Crews vs Flows

CrewAI offers two orchestration patterns. Choosing correctly is critical for maintainability and performance:

Criteria	Use Crews	Use Flows
Task relationships	Simple sequential or hierarchical	Complex DAGs, conditional branches
State management	Task context passing (implicit)	Explicit Pydantic state (typed)
Human intervention	`human_input=True` on tasks	Dedicated HITL steps with routing
Error handling	Retry at task level	Checkpoint + resume from any step
Parallelism	Limited (process type)	Full: `and_`, `or_` combinators
External triggers	Not supported	Event-driven via `@listen`
Best for	Team collaboration tasks	Complex business workflows

from crewai import Agent, Task, Crew, Process

# USE CASE EVALUATION: Simple content pipeline → Use Crew
# When tasks are sequential with clear agent roles

writer = Agent(role="Writer", goal="Write articles", backstory="Expert writer.", llm="gpt-4o")
editor = Agent(role="Editor", goal="Edit articles", backstory="Senior editor.", llm="gpt-4o")

write_task = Task(description="Write about {topic}.", expected_output="Draft article.", agent=writer)
edit_task = Task(description="Edit the draft.", expected_output="Polished article.", agent=editor, context=[write_task])

# Simple crew — right choice for linear workflows
crew = Crew(
    agents=[writer, editor],
    tasks=[write_task, edit_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={"topic": "microservices patterns"})
print(f"Simple pipeline result: {result.raw[:100]}...")

from crewai.flow.flow import Flow, start, listen, router
from pydantic import BaseModel

# USE CASE EVALUATION: Complex approval workflow → Use Flow
# When you need conditional routing, state, and HITL

class ApprovalState(BaseModel):
    content: str = ""
    risk_score: float = 0.0
    approved: bool = False
    route_taken: str = ""

class ApprovalFlow(Flow[ApprovalState]):

    @start()
    def assess_risk(self):
        self.state.risk_score = 0.7  # Simulated AI assessment
        return self.state.risk_score

    @router(assess_risk)
    def route_by_risk(self, score):
        if score < 0.3:
            return "auto_approve"
        elif score < 0.7:
            return "single_review"
        else:
            return "committee_review"

    # Listener method names differ from the route labels so that no method
    # listens for a trigger that matches its own name.
    @listen("auto_approve")
    def handle_auto_approve(self):
        self.state.approved = True
        self.state.route_taken = "auto"
        return "Approved automatically"

    @listen("single_review")
    def handle_single_review(self):
        self.state.route_taken = "single"
        # Would involve HITL here
        return "Single reviewer approved"

    @listen("committee_review")
    def handle_committee_review(self):
        self.state.route_taken = "committee"
        # Would involve multiple HITL steps
        return "Committee reviewed"

flow = ApprovalFlow()
result = flow.kickoff()
print(f"Route taken: {flow.state.route_taken}")
print(f"Result: {result}")
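The thresholds in `route_by_risk` use strict less-than comparisons, so boundary scores fall into the higher-scrutiny route. The sketch below copies that logic into a standalone function (no CrewAI import needed) so the boundaries can be checked in isolation; the sample scores are arbitrary.

```python
def route_by_risk(score: float) -> str:
    """Same thresholds as ApprovalFlow.route_by_risk, extracted for testing."""
    if score < 0.3:
        return "auto_approve"
    elif score < 0.7:
        return "single_review"
    return "committee_review"


# Probe both sides of each boundary.
for score in (0.0, 0.29, 0.3, 0.69, 0.7, 1.0):
    print(f"{score:.2f} -> {route_by_risk(score)}")
```

With the simulated score of 0.7 used in the flow above, the route is `committee_review`, not `single_review`.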

Real-World Application

LangGraph to CrewAI Migration

A startup migrated their 500-line LangGraph agent to CrewAI in 3 days. The result: 60% less code, clearer separation of concerns (agents vs tasks vs process), built-in memory without custom implementation, and easier onboarding for new engineers. The main challenge: adapting graph-based conditional routing to CrewAI’s process model.


3. Strategic LLM Selection

3.1 Choosing the Right Model for Each Agent

Not every agent needs GPT-4o. Match model capability to task complexity to get a better cost-performance ratio:

from crewai import Agent, Task, Crew, Process

# STRATEGY: Use expensive models for complex reasoning,
# cheap models for simple tasks

# Complex reasoning → GPT-4o or Claude 3.5 Sonnet
senior_analyst = Agent(
    role="Senior Strategy Analyst",
    goal="Develop complex business strategies",
    backstory="20 years of consulting experience.",
    llm="gpt-4o",  # Needs strong reasoning
    verbose=True
)

# Moderate complexity → GPT-4o-mini
content_writer = Agent(
    role="Content Writer",
    goal="Write clear, engaging content",
    backstory="Experienced technical writer.",
    llm="gpt-4o-mini",  # Good enough for writing
    verbose=True
)

# Simple tasks → GPT-4o-mini or local models
data_formatter = Agent(
    role="Data Formatter",
    goal="Format and structure data outputs",
    backstory="Detail-oriented data specialist.",
    llm="gpt-4o-mini",  # Simple formatting tasks
    verbose=True
)

# Complex analysis task
strategy_task = Task(
    description="Analyze market entry strategy for {market}.",
    expected_output="Detailed strategy with risk assessment.",
    agent=senior_analyst
)

# Writing task
content_task = Task(
    description="Write executive summary from the strategy.",
    expected_output="1-page executive summary.",
    agent=content_writer,
    context=[strategy_task]
)

# Formatting task
format_task = Task(
    description="Format the summary as a structured report.",
    expected_output="Formatted report with sections and bullets.",
    agent=data_formatter,
    context=[content_task]
)

crew = Crew(
    agents=[senior_analyst, content_writer, data_formatter],
    tasks=[strategy_task, content_task, format_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={"market": "Southeast Asia fintech"})
print(result.raw[:300])

                        
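Cost Comparison (list prices per 1M tokens; confirm against current provider pricing): GPT-4o: $2.50 input / $10 output. GPT-4o-mini: $0.15 input / $0.60 output. Claude 3.5 Sonnet: $3 input / $15 output. At these prices GPT-4o-mini costs about 6% of GPT-4o per token, so a crew that runs 1 agent on GPT-4o and 3 agents on GPT-4o-mini costs roughly 70% less than running all four on GPT-4o when token usage is equal across agents, and somewhat less than that if the GPT-4o agent consumes more tokens. The effect on output quality depends on the tasks, so measure it on your own workload.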
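The arithmetic behind a mixed-model estimate can be sketched as follows. The prices are the ones quoted in this section; the per-agent token counts are a hypothetical workload chosen only to make the numbers easy to follow.

```python
# Prices in USD per 1M tokens, as quoted in the cost comparison.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def agent_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one agent's token usage on the given model."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def crew_cost(models: list, input_tokens: int, output_tokens: int) -> float:
    """Total cost when every agent consumes the same number of tokens."""
    return sum(agent_cost(model, input_tokens, output_tokens) for model in models)


# Hypothetical workload: each agent uses 200k input and 50k output tokens.
all_premium = crew_cost(["gpt-4o"] * 4, 200_000, 50_000)
mixed = crew_cost(["gpt-4o"] + ["gpt-4o-mini"] * 3, 200_000, 50_000)
savings = 1 - mixed / all_premium

print(f"All GPT-4o: ${all_premium:.2f}")
print(f"Mixed crew: ${mixed:.2f}")
print(f"Savings:    {savings:.1%}")
```

Real crews rarely split tokens evenly, so substituting measured per-agent usage gives a more realistic figure.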
                    

4. Upgrading CrewAI

4.1 Version Compatibility and Upgrade Workflow

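CrewAI follows semantic versioning. Major versions may include breaking changes; minor versions are intended to be backward-compatible, although under semantic versioning 0.x releases (such as the 0.100.0 pinned below) carry no such guarantee, so read the release notes before every upgrade: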

# Check current version
pip show crewai

# Upgrade to latest
pip install --upgrade crewai crewai-tools

# Upgrade to specific version
pip install crewai==0.100.0

# Check for breaking changes before upgrading
# Visit: https://github.com/crewAIInc/crewAI/releases
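A pre-upgrade check can classify how risky a version jump is before anything is installed. This is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` version strings; `parse_version`, `upgrade_risk`, and `installed_version` are hypothetical helper names, and the rule that 0.x minor bumps count as potentially breaking is a conservative assumption, not an official CrewAI policy.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into ints, ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def upgrade_risk(current: str, target: str) -> str:
    """Classify an upgrade as no-upgrade, patch, minor, or potentially-breaking."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        return "no-upgrade"
    if tgt[0] != cur[0]:
        return "potentially-breaking"
    if tgt[1] != cur[1]:
        # Conservative assumption: on 0.x, treat minor bumps as breaking.
        return "potentially-breaking" if cur[0] == 0 else "minor"
    return "patch"


def installed_version(package: str = "crewai"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("crewai is not installed in this environment")
else:
    print(f"Installed: {current}; risk of moving to 1.0.0: {upgrade_risk(current, '1.0.0')}")
```

A check like this fits in CI, where a "potentially-breaking" result could require a manual review of the release notes before the pin is changed.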

Test your existing crews after upgrading to catch any breaking changes:

import crewai
from crewai import Agent, Task, Crew, Process

# Verify version after upgrade
print(f"CrewAI version: {crewai.__version__}")

# Smoke test: ensure basic crew execution still works
test_agent = Agent(
    role="Test Agent",
    goal="Verify framework functionality",
    backstory="QA specialist.",
    llm="gpt-4o-mini"  # Use cheap model for testing
)

test_task = Task(
    description="Respond with 'OK' to confirm the framework is working.",
    expected_output="The word OK.",
    agent=test_agent
)

crew = Crew(
    agents=[test_agent],
    tasks=[test_task],
    process=Process.sequential
)

try:
    result = crew.kickoff()
    print(f"Upgrade verified! Response: {result.raw}")
except Exception as e:
    print(f"UPGRADE ISSUE: {e}")
    print("Check release notes for breaking changes.")

                        
                        Upgrade Checklist: (1) Read the changelog for your target version. (2) Test in a separate virtualenv first. (3) Run your test suite. (4) Check deprecated warnings in verbose output. (5) Update any agents.yaml / tasks.yaml if schema changed. (6) Verify all tools still import correctly.
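Step (6) of the checklist can be automated without calling any LLM. The sketch below tries to import each module and reports failures instead of stopping at the first one; `check_imports` is a hypothetical helper, and the module list is an assumption that should be replaced with the modules your project actually uses.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or errors raised during import
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Assumed module list: adjust to match your own project.
REQUIRED_MODULES = ["crewai", "crewai_tools"]

failures = check_imports(REQUIRED_MODULES)
if failures:
    for name, error in failures.items():
        print(f"FAILED {name}: {error}")
else:
    print("All required modules import cleanly.")
```

Running this in the fresh virtualenv from step (2) catches missing or renamed modules before the smoke test spends any tokens.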
                    

5. Production Without LiteLLM

5.1 Direct Provider Connections

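By default, CrewAI uses LiteLLM as a universal adapter layer between its model strings and the underlying providers. For production environments with strict dependency requirements, you can configure each provider explicitly through its own credentials and model string. Whether a given model string is served by a provider's own SDK or still passes through LiteLLM depends on the CrewAI version and extras you have installed, so confirm the behavior against the documentation for your pinned version: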

import os
from crewai import Agent, Task, Crew, Process

# Direct OpenAI connection (no LiteLLM middleware)
os.environ["OPENAI_API_KEY"] = "sk-..."

# Standard OpenAI model string works directly
agent_openai = Agent(
    role="Production Agent",
    goal="Handle production workloads reliably",
    backstory="Battle-tested production agent.",
    llm="gpt-4o",  # Direct OpenAI
    max_retry_limit=3,
    verbose=True
)

# For Azure OpenAI (direct connection)
os.environ["AZURE_API_KEY"] = "your-azure-key"
os.environ["AZURE_API_BASE"] = "https://your-resource.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2024-02-15-preview"

agent_azure = Agent(
    role="Azure Agent",
    goal="Process tasks via Azure OpenAI",
    backstory="Enterprise-grade agent.",
    llm="azure/gpt-4o",  # Azure deployment name
    verbose=True
)

# For Anthropic (direct)
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

agent_claude = Agent(
    role="Claude Agent",
    goal="Leverage Claude for analysis",
    backstory="Analytical agent.",
    llm="anthropic/claude-sonnet-4-20250514",
    verbose=True
)

task = Task(
    description="Analyze {topic} from a production reliability perspective.",
    expected_output="Production readiness assessment.",
    agent=agent_openai
)

crew = Crew(
    agents=[agent_openai],
    tasks=[task],
    process=Process.sequential,
    max_rpm=60  # Rate limiting for production
)

result = crew.kickoff(inputs={"topic": "microservices deployment"})
print(result.raw)
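`max_rpm` limits requests made inside a crew. To cap how often your application starts crew runs in the first place, a limiter can sit in front of `kickoff()`. The following is a minimal single-process sketch using a sliding window; `SlidingWindowLimiter` is a hypothetical class, not part of CrewAI, and a multi-worker deployment would need shared state (for example in a datastore) instead.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._stamps = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if over the limit."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) < self._max:
            self._stamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(max_requests=30, window_seconds=60.0)
if limiter.try_acquire():
    print("Request admitted; safe to call crew.kickoff() here.")
else:
    print("Rate limit reached; reject or queue the request.")
```

Rejecting early like this keeps bursts of incoming requests from turning into bursts of billable LLM calls.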

Production hardening checklist for deployed crews:

import os
import time
from crewai import Agent, Task, Crew, Process

# ============================================================
# PRODUCTION HARDENING CHECKLIST
# ============================================================

# 1. Rate limiting: prevent cost overruns
MAX_RPM = 30  # Requests per minute

# 2. Retry configuration
MAX_RETRIES = 3

# 3. Timeout configuration
REQUEST_TIMEOUT = 120  # seconds

# 4. Error handling wrapper
def run_crew_safely(crew, inputs, max_attempts=3, base_delay=2.0):
    """Production-safe crew execution with retries and exponential backoff."""
    last_error = "max_attempts must be at least 1"
    for attempt in range(1, max_attempts + 1):
        try:
            result = crew.kickoff(inputs=inputs)
            return {"success": True, "output": result.raw, "attempt": attempt}
        except Exception as e:  # Broad on purpose: any failure triggers a retry
            last_error = str(e)
            print(f"Attempt {attempt} failed: {e}")
            if attempt < max_attempts:
                # Wait 2s, 4s, 8s, ... before the next attempt
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Every return path includes "attempt", so callers can read it safely
    return {"success": False, "error": last_error, "attempt": max_attempts}

# 5. Production agent with all safeguards
production_agent = Agent(
    role="Production Processor",
    goal="Process requests reliably and efficiently",
    backstory="Optimized for production throughput and reliability.",
    llm="gpt-4o-mini",  # Cost-effective for production
    max_retry_limit=MAX_RETRIES,
    max_execution_time=REQUEST_TIMEOUT,  # Per-task time limit in seconds
    verbose=False  # Reduce noise in production
)

production_task = Task(
    description="Process incoming request: {request}",
    expected_output="Structured response to the request.",
    agent=production_agent
)

production_crew = Crew(
    agents=[production_agent],
    tasks=[production_task],
    process=Process.sequential,
    max_rpm=MAX_RPM,
    verbose=False
)

# 6. Execute with safety wrapper
result = run_crew_safely(
    production_crew,
    inputs={"request": "Summarize Q2 sales performance"}
)
print(f"Success: {result['success']}")
print(f"Attempts: {result['attempt']}")
if result['success']:
    print(f"Output: {result['output'][:200]}...")
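A wall-clock limit such as `REQUEST_TIMEOUT` can also be enforced around any blocking call from outside the framework. The sketch below uses only the standard library; `call_with_timeout` is a hypothetical helper. One important caveat: Python cannot forcibly stop a running thread, so a timed-out call keeps running in the background until it finishes on its own, and the caller merely stops waiting for it.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func in a worker thread and stop waiting after timeout_seconds.

    Exceptions raised by func propagate to the caller unchanged.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        output = future.result(timeout=timeout_seconds)
        return {"success": True, "output": output}
    except concurrent.futures.TimeoutError:
        return {"success": False, "error": f"Timed out after {timeout_seconds}s"}
    finally:
        # Do not block on the worker; it may still be running after a timeout.
        executor.shutdown(wait=False)


def slow_task(seconds):
    """Stand-in for a long-running call such as crew.kickoff()."""
    time.sleep(seconds)
    return "done"


print(call_with_timeout(slow_task, 1.0, 0.01))
print(call_with_timeout(slow_task, 0.05, 0.5))
```

For hard cancellation, running the work in a separate process that can be terminated is the more robust option, at the cost of extra serialization and startup overhead.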

# Production deployment configuration (docker-compose.yml)
version: '3.8'
services:
  crewai-worker:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OTEL_SDK_DISABLED=true  # Opts out of CrewAI telemetry; verify the variable name for your pinned version
      - LOG_LEVEL=WARNING
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 2G
          cpus: '1.0'
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import crewai; print('ok')"]
      interval: 30s
      timeout: 10s
      retries: 3

                        
                        Production Security: (1) Never log full LLM responses (PII risk). (2) Use environment variables or secrets managers for API keys. (3) Implement request rate limiting at the application level. (4) Set up alerting on error rates and cost spikes. (5) Use separate API keys for dev/staging/production with appropriate rate limits.
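Point (1) can be enforced with a small helper that redacts and truncates text before it reaches a logger. This is a minimal sketch, not a complete PII filter: `safe_log_excerpt` is a hypothetical function, and the two patterns (email addresses and `sk-` style keys) are illustrative assumptions that will miss other kinds of sensitive data.

```python
import re

# Illustrative patterns only; real PII detection needs a broader approach.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
KEY_RE = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def safe_log_excerpt(text: str, max_chars: int = 80) -> str:
    """Redact obvious secrets, then truncate so full responses are never logged."""
    redacted = EMAIL_RE.sub("[email]", text)
    redacted = KEY_RE.sub("[api-key]", redacted)
    if len(redacted) > max_chars:
        omitted = len(redacted) - max_chars
        return redacted[:max_chars] + f"... [{omitted} chars omitted]"
    return redacted


print(safe_log_excerpt("Contact jane@example.com with key sk-abcdefgh12345678"))
print(safe_log_excerpt("x" * 100))
```

Logging the excerpt together with a request ID usually gives enough context for debugging without storing the full model output.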
                    

                        
                        Try It Yourself: Perform a mini-migration: take a simple LangGraph workflow (3 nodes: research → analyze → write) and rewrite it as a CrewAI crew with equivalent functionality. Compare: lines of code, readability, execution time, and output quality. Document which patterns mapped cleanly and which required different approaches.
                    

CrewAI SDK Track Complete!

Congratulations on completing the 14-part CrewAI SDK track! You’ve mastered multi-agent crews, tasks and processes, Flows for stateful orchestration, 75+ tools, MCP integration, knowledge and memory, planning and reasoning, human-in-the-loop, observability, and production deployment. Return to the AI App Dev Series Hub to explore other SDK tracks.