AI Application Development Mastery Part 10: Multi-Agent Systems

Introduction: The Power of Agent Collaboration

                        
Series Overview: This is Part 10 of our 18-part AI Application Development Mastery series. We now move beyond single agents into the world of multi-agent systems, where specialized agents collaborate, debate, and coordinate to solve problems no single agent can tackle alone.
                    


A single AI agent can be remarkably powerful, but many real-world tasks require multiple perspectives, specialized skills, and collaborative problem-solving. Multi-agent systems bring together specialized agents that can divide labor, debate solutions, verify each other's work, and coordinate on complex tasks that exceed the capability of any single agent.

Consider a software development workflow: one agent writes code, another reviews it for bugs, a third writes tests, and a fourth handles documentation. Or consider research synthesis: one agent searches literature, another extracts key findings, a third identifies contradictions, and a fourth synthesizes everything into a coherent report. These are the kinds of problems multi-agent systems were designed to solve.

                        
                        Key Insight: Multi-agent systems are not about making more LLM calls — they are about creating structured collaboration patterns where specialized agents produce collectively better results than any single agent could alone. The orchestration pattern you choose (supervisor, swarm, debate, hierarchical) fundamentally shapes the system's behavior.
                    

1. Why Multi-Agent Systems

Single agents hit fundamental limitations when tasks require diverse expertise, quality verification, or parallel processing. Multi-agent systems address these limitations by distributing responsibility across specialized agents.

1.1 Single Agent vs Multi-Agent

Dimension	Single Agent	Multi-Agent System
Complexity handling	One prompt must cover everything	Each agent focuses on one sub-task
Quality assurance	Self-review only (limited)	Cross-agent verification and critique
Parallelism	Sequential execution	Concurrent agents on independent tasks
Specialization	Jack of all trades	Each agent is a domain expert
Context window	Single window, shared across all tasks	Each agent has its own context budget
Cost	Lower (one LLM call chain)	Higher (multiple agents, more tokens)
Debugging	Simpler trace	More complex inter-agent traces
Failure modes	Single point of failure	Can recover via redundancy

1.2 When to Use Multi-Agent

                        
Use multi-agent when:
The task requires diverse expertise (e.g., coding + testing + documentation)
Quality benefits from adversarial verification (e.g., writer + critic)
Sub-tasks can run in parallel for speed
The context window of a single agent is insufficient for the full task
You need role-based access control (different agents access different tools/data)

                    

                        
Avoid multi-agent when:
A single well-prompted agent can handle the task effectively
Cost constraints are tight (multi-agent multiplies token usage)
Latency is critical (coordination adds overhead)
The task is simple enough that adding agents adds complexity without benefit
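
The cost point above can be made concrete with some rough arithmetic. The sketch below assumes a sequential pipeline in which every agent re-reads the original context plus the output of each earlier agent; the function name and the token counts are hypothetical, chosen only to illustrate how usage grows.

```python
def estimate_tokens(context_tokens: int, output_tokens: int,
                    num_agents: int) -> int:
    """Rough total tokens for a sequential pipeline of agents.

    Assumption: agent i reads the shared context plus the outputs of
    the i agents that ran before it, then writes its own output.
    """
    total = 0
    for position in range(num_agents):
        prompt_tokens = context_tokens + position * output_tokens
        total += prompt_tokens + output_tokens
    return total


# Hypothetical sizes: 2,000-token task context, 500-token outputs
single_agent = estimate_tokens(2000, 500, num_agents=1)
four_agents = estimate_tokens(2000, 500, num_agents=4)
print(single_agent, four_agents, four_agents / single_agent)
```

Under these assumptions a four-agent pipeline uses roughly five times the tokens of a single call, because later agents pay again for everything produced before them. Real systems vary widely, so measure actual usage before deciding.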

                    

1.3 Core Concepts & Terminology

Concept	Definition
Agent	An autonomous unit with an LLM, system prompt, tools, and memory that can act independently
Role	The specialized identity and responsibilities assigned to an agent (e.g., "Researcher", "Coder", "Reviewer")
Orchestrator	The mechanism that coordinates agent interactions (supervisor agent, graph, or protocol)
Message passing	How agents communicate — shared state, direct messages, broadcast, or publish/subscribe
Task delegation	Assigning specific sub-tasks to specific agents based on their capabilities
Consensus	The mechanism by which agents agree on a final answer or output
Conflict resolution	Handling disagreements between agents (voting, supervisor override, debate rounds)
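
To make the last two rows concrete, here is a minimal sketch of consensus by majority vote with a supervisor override as the tie-breaker. The function name `resolve` and the agent names are illustrative assumptions, not part of any framework.

```python
from collections import Counter
from typing import Optional


def resolve(answers: dict[str, str],
            supervisor_choice: Optional[str] = None) -> str:
    """Pick a final answer from per-agent answers.

    Consensus: the answer with the most votes wins.
    Conflict resolution: on a tie, the supervisor's choice is used,
    provided it is one of the tied answers.
    """
    if not answers:
        raise ValueError("No answers to resolve")
    ranked = Counter(answers.values()).most_common()
    top_votes = ranked[0][1]
    tied = [answer for answer, votes in ranked if votes == top_votes]
    if len(tied) == 1:
        return tied[0]
    if supervisor_choice in tied:
        return supervisor_choice
    raise ValueError(f"Unresolved tie between {sorted(tied)}")


majority = resolve({"Researcher": "A", "Coder": "A", "Reviewer": "B"})
override = resolve({"Coder": "A", "Reviewer": "B"}, supervisor_choice="B")
print(majority, override)
```

Exact string matching only works when agents choose from a fixed set of options. For free-form answers you would need a judge agent or a similarity measure instead.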

2. Framework Comparison: AutoGen vs CrewAI vs LangGraph

Three frameworks dominate the multi-agent landscape, each with a fundamentally different philosophy. Understanding their architectural differences is critical to choosing the right one for your use case.

2.1 AutoGen Deep Dive

AutoGen (by Microsoft Research) pioneered the conversational multi-agent paradigm. Agents interact through natural conversation, taking turns sending messages to each other like participants in a group chat.

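# AutoGen multi-agent example: Code generation with review
# Uses the AutoGen 0.2-style API (AssistantAgent, GroupChat, GroupChatManager);
# later AutoGen releases restructured these classes.
# pip install pyautogen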

import os
import autogen

# Configuration for LLM — use environment variable for API key
# export OPENAI_API_KEY="sk-..."
config_list = [
    {
        "model": "gpt-4",
        "api_key": os.getenv("OPENAI_API_KEY")
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
    "seed": 42
}

# Define specialized agents
coder = autogen.AssistantAgent(
    name="Coder",
    system_message="""You are an expert Python developer.
    Write clean, well-documented, production-quality code.
    Always include type hints, docstrings, and error handling.
    When you receive feedback, revise your code accordingly.""",
    llm_config=llm_config
)

reviewer = autogen.AssistantAgent(
    name="Reviewer",
    system_message="""You are a senior code reviewer.
    Review code for: bugs, security issues, performance,
    readability, and adherence to best practices.
    Be specific and constructive in your feedback.
    If the code is production-ready, say 'APPROVED'.""",
    llm_config=llm_config
)

tester = autogen.AssistantAgent(
    name="Tester",
    system_message="""You are a QA engineer specializing in testing.
    Write comprehensive unit tests using pytest.
    Cover edge cases, error conditions, and happy paths.
    Include test fixtures and parameterized tests where appropriate.""",
    llm_config=llm_config
)

# User proxy triggers the conversation
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={
        "work_dir": "coding_output",
        "use_docker": False
    }
)

# Create a group chat with all agents
groupchat = autogen.GroupChat(
    agents=[user_proxy, coder, reviewer, tester],
    messages=[],
    max_round=12,
    speaker_selection_method="round_robin"
)

manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)

# Initiate the multi-agent workflow
user_proxy.initiate_chat(
    manager,
    message="""Build a Python rate limiter class that:
    1. Supports both fixed-window and sliding-window algorithms
    2. Is thread-safe
    3. Can be used as a decorator
    4. Supports Redis backend for distributed rate limiting"""
)

                        
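AutoGen Architecture: Agents are conversational participants. The GroupChatManager selects which agent speaks next using round-robin, random, manual, or LLM-based ("auto") selection. Agents can execute generated code locally or inside Docker containers, which makes AutoGen particularly strong for coding tasks. Note that the example above sets use_docker to False, so generated code runs directly on the host; prefer Docker when executing untrusted code.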
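
The two simplest selection strategies can be sketched without any framework. This illustrates the idea only and is not AutoGen's internal implementation; the helper names are hypothetical, and excluding the previous speaker from random selection is a design choice made for this sketch.

```python
import random
from itertools import cycle


def round_robin(agents: list[str]):
    """Return an endless iterator over agents in a fixed order."""
    return cycle(agents)


def pick_random(agents: list[str], last_speaker: str,
                rng: random.Random) -> str:
    """Pick any agent except the one that just spoke (assumed policy)."""
    candidates = [name for name in agents if name != last_speaker]
    return rng.choice(candidates)


agents = ["UserProxy", "Coder", "Reviewer", "Tester"]

order = round_robin(agents)
first_six = [next(order) for _ in range(6)]
print(first_six)

rng = random.Random(0)  # Seeded so runs are repeatable
print(pick_random(agents, last_speaker="Coder", rng=rng))
```

Round-robin is predictable and cheap but gives every agent a turn even when it has nothing to add. LLM-based selection costs an extra model call per turn in exchange for more relevant turn-taking.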
                    

2.2 CrewAI Deep Dive

CrewAI takes a role-playing, task-oriented approach inspired by real-world team structures. You define agents with roles, goals, and backstories, then assign them to specific tasks with dependencies.

# CrewAI multi-agent example: Research and writing team
# pip install crewai crewai-tools

import os
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool

# Set API keys via environment variables
# export OPENAI_API_KEY="sk-..."
# export SERPER_API_KEY="your-serper-key"  (required by SerperDevTool)

# Tools available to agents
search_tool = SerperDevTool()
web_tool = WebsiteSearchTool()

# Define agents with roles, goals, and backstories
researcher = Agent(
    role="Senior Research Analyst",
    goal="Conduct thorough research on the given topic, "
         "finding the most relevant and up-to-date information",
    backstory="""You are a senior research analyst at a top
    technology consultancy. You have 15 years of experience
    analyzing emerging technology trends. You are methodical,
    thorough, and always cite your sources.""",
    tools=[search_tool, web_tool],
    verbose=True,
    allow_delegation=True,
    max_iter=5,
    llm="gpt-4"
)

writer = Agent(
    role="Technical Content Writer",
    goal="Transform research findings into engaging, accurate, "
         "and well-structured technical articles",
    backstory="""You are an award-winning technical writer who
    has published extensively in IEEE, ACM, and major tech blogs.
    You excel at making complex topics accessible without
    sacrificing accuracy.""",
    verbose=True,
    allow_delegation=False,
    llm="gpt-4"
)

editor = Agent(
    role="Senior Editor",
    goal="Ensure the final article is publication-ready with "
         "perfect grammar, logical flow, and factual accuracy",
    backstory="""You are a senior editor at a prestigious
    technology publication. You have an eye for detail and
    ensure every piece meets the highest editorial standards.
    You check facts, improve clarity, and polish prose.""",
    verbose=True,
    allow_delegation=False,
    llm="gpt-4"
)

# Define tasks with dependencies
research_task = Task(
    description="""Research the current state of multi-agent AI systems.
    Cover: key frameworks (AutoGen, CrewAI, LangGraph),
    real-world deployments, limitations, and future directions.
    Provide at least 10 key findings with sources.""",
    expected_output="A detailed research report with 10+ findings, "
                    "each with source citations and relevance scores.",
    agent=researcher
)

writing_task = Task(
    description="""Using the research report, write a 2000-word
    technical article on multi-agent AI systems.
    Include: introduction, 3-4 main sections, code examples,
    comparison tables, and a conclusion with predictions.""",
    expected_output="A 2000-word article in markdown format with "
                    "code examples and comparison tables.",
    agent=writer,
    context=[research_task]  # Depends on research
)

editing_task = Task(
    description="""Review and polish the article for publication.
    Check: factual accuracy, grammar, flow, technical correctness.
    Provide the final version ready for publication.""",
    expected_output="The final polished article ready for publication.",
    agent=editor,
    context=[writing_task]  # Depends on writing
)

# Assemble the crew
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential,  # or Process.hierarchical
    verbose=True,
    memory=True,
    planning=True
)

# Execute the crew
result = crew.kickoff()
print(result)

                        
                        CrewAI Architecture: Built around the metaphor of a real-world team. Each agent has a role, goal, and backstory that shape its behavior. Tasks have explicit dependencies, creating a DAG (directed acyclic graph) of work. CrewAI supports both sequential and hierarchical process modes.
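
The dependency idea can be illustrated with the standard library alone. The sketch below is not CrewAI code: it models each task's context as a set of prerequisite task names (the names mirror the example above) and uses `graphlib` to derive a valid execution order.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return task names ordered so every task follows its prerequisites.

    graphlib raises CycleError (a ValueError subclass) if the
    dependencies contain a cycle, i.e. they do not form a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Each task maps to the tasks whose output it needs as context
dependencies = {
    "research_task": set(),
    "writing_task": {"research_task"},
    "editing_task": {"writing_task"},
}

order = execution_order(dependencies)
print(order)
```

For a linear chain like this one the order is fully determined. With branching dependencies, several valid orders exist and independent tasks could run concurrently.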
                    

2.3 LangGraph Multi-Agent

LangGraph provides the most flexible and customizable multi-agent framework through its graph-based architecture. You define agents as nodes, communication as edges, and orchestration logic as conditional routing.

# LangGraph multi-agent example: Supervisor pattern
# pip install langgraph langchain-openai

import os
from typing import TypedDict, Annotated, Literal
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
import operator

# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."

# Define shared state
class MultiAgentState(TypedDict):
    messages: Annotated[list, operator.add]
    next_agent: str
    research_output: str
    code_output: str
    review_output: str
    final_output: str

# Initialize LLM
llm = ChatOpenAI(model="gpt-4", temperature=0.7)

# Define agent nodes
def researcher_node(state: MultiAgentState) -> dict:
    """Research agent that gathers information."""
    messages = [
        SystemMessage(content="""You are a research specialist.
        Gather comprehensive information on the requested topic.
        Provide structured findings with key insights."""),
        HumanMessage(content=state["messages"][-1].content)
    ]
    response = llm.invoke(messages)
    return {
        "messages": [response],
        "research_output": response.content
    }

def coder_node(state: MultiAgentState) -> dict:
    """Coding agent that writes implementation."""
    context = state.get("research_output", "")
    messages = [
        SystemMessage(content="""You are an expert Python developer.
        Based on the research provided, write production-quality
        implementation code with type hints and docstrings."""),
        HumanMessage(content=f"Research context:\n{context}\n\n"
                     f"Task: {state['messages'][0].content}")
    ]
    response = llm.invoke(messages)
    return {
        "messages": [response],
        "code_output": response.content
    }

def reviewer_node(state: MultiAgentState) -> dict:
    """Review agent that evaluates code quality."""
    code = state.get("code_output", "")
    messages = [
        SystemMessage(content="""You are a senior code reviewer.
        Evaluate the code for correctness, performance, security,
        and best practices. Provide specific, actionable feedback.
        End with APPROVED or NEEDS_REVISION."""),
        HumanMessage(content=f"Review this code:\n{code}")
    ]
    response = llm.invoke(messages)
    return {
        "messages": [response],
        "review_output": response.content
    }

def supervisor_node(state: MultiAgentState) -> dict:
    """Supervisor that routes to the next agent."""
    messages = [
        SystemMessage(content="""You are a project supervisor.
        Based on the current state, decide which agent should
        act next. Respond with exactly one of:
        'researcher', 'coder', 'reviewer', or 'FINISH'.

        Route to 'researcher' if we need more information.
        Route to 'coder' if we have research and need code.
        Route to 'reviewer' if we have code to review.
        Route to 'FINISH' if the review is APPROVED."""),
        # The reviewer ends with its verdict, so pass the tail of the review
        HumanMessage(content=f"Messages so far: "
                     f"{len(state['messages'])}\n"
                     f"Has research: "
                     f"{bool(state.get('research_output'))}\n"
                     f"Has code: "
                     f"{bool(state.get('code_output'))}\n"
                     f"Review status: "
                     f"{(state.get('review_output') or 'none')[-200:]}")
    ]
    response = llm.invoke(messages)
    next_agent = response.content.strip().lower()

    if "finish" in next_agent:
        next_agent = "FINISH"
    elif "researcher" in next_agent:
        next_agent = "researcher"
    elif "coder" in next_agent:
        next_agent = "coder"
    elif "reviewer" in next_agent:
        next_agent = "reviewer"
    else:
        next_agent = "FINISH"

    return {"next_agent": next_agent}

# Build the graph
def route_supervisor(state: MultiAgentState) -> str:
    return state.get("next_agent", "FINISH")

workflow = StateGraph(MultiAgentState)

# Add nodes
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)
workflow.add_node("reviewer", reviewer_node)

# Add edges
workflow.set_entry_point("supervisor")
workflow.add_conditional_edges(
    "supervisor",
    route_supervisor,
    {
        "researcher": "researcher",
        "coder": "coder",
        "reviewer": "reviewer",
        "FINISH": END
    }
)
# All agents route back to supervisor
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("coder", "supervisor")
workflow.add_edge("reviewer", "supervisor")

# Compile
graph = workflow.compile()

# Execute
result = graph.invoke({
    "messages": [HumanMessage(
        content="Build a thread-safe connection pool in Python"
    )],
    "next_agent": "",
    "research_output": "",
    "code_output": "",
    "review_output": "",
    "final_output": ""
})
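
The substring checks in supervisor_node depend on the order of the branches: a reply such as "the coder, not the researcher" would be routed to the researcher. One possible alternative, sketched here as a standalone helper with the hypothetical name `parse_route`, is to take the first route name that appears as a whole word and fall back to a default when none is found.

```python
import re

VALID_ROUTES = ("researcher", "coder", "reviewer", "finish")
ROUTE_PATTERN = re.compile(r"\b(" + "|".join(VALID_ROUTES) + r")\b")


def parse_route(reply: str, default: str = "FINISH") -> str:
    """Return the first valid route mentioned in the supervisor's reply.

    Matching is case-insensitive and on whole words. If no route name
    is found, the default is returned so the graph can terminate.
    """
    match = ROUTE_PATTERN.search(reply.lower())
    if match is None:
        return default
    route = match.group(1)
    return "FINISH" if route == "finish" else route


print(parse_route("'coder'"))
print(parse_route("Route to Reviewer."))
print(parse_route("I am not sure"))
```

Falling back to FINISH keeps a confused supervisor from looping until the graph's recursion limit is hit, at the price of ending early. Whether that trade-off is right depends on the application.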

2.4 Head-to-Head Comparison Table

Dimension	AutoGen	CrewAI	LangGraph
Architecture	Conversation-based group chat	Role-playing task pipeline	Graph-based state machine
Agent definition	Name + system message	Role + goal + backstory	Function nodes + state
Communication	Natural language messages in group chat	Implicit via task context and delegation	Shared state dictionary + message passing
Orchestration	GroupChatManager with speaker selection	Sequential or hierarchical process	Conditional edges with custom routing
Code execution	Built-in (Docker/local sandbox)	Via tools	Via custom tool nodes
Flexibility	Medium — good defaults, limited customization	Low-Medium — opinionated, role-focused	High — fully customizable graph topology
Learning curve	Low — intuitive chat metaphor	Low — team metaphor is natural	Medium-High — graph concepts required
State management	Conversation history	Task outputs as context	Typed state with checkpointing
Human-in-the-loop	Built-in UserProxyAgent	Via human_input=True on tasks	interrupt_before / interrupt_after nodes
Best for	Coding tasks, conversational collaboration, research	Content creation, business workflows, team simulation	Complex orchestration, production systems, custom patterns
Limitations	Limited control over conversation flow, can loop	Less flexible for non-linear workflows	More boilerplate, steeper learning curve
Production readiness	Good — Microsoft-backed	Growing — active development	Excellent — LangChain ecosystem, LangSmith integration

3. Orchestration Patterns

The orchestration pattern determines how agents interact, who controls the flow, and how decisions are made. Choosing the right pattern is the most important architectural decision in multi-agent design.

3.1 Supervisor Pattern

A central supervisor agent coordinates all other agents, deciding who acts next, what they work on, and when the task is complete. The supervisor sees all outputs and maintains the global view.

# Supervisor pattern: Central coordinator
# The supervisor decides which specialist to invoke
# NOTE: This is a design pattern illustration — replace _call_llm
# with your actual LLM client (e.g., OpenAI, LangChain).

class SupervisorPattern:
    """
    Central supervisor agent that routes tasks to specialists.

    Flow: User -> Supervisor -> [Specialist A | B | C] -> Supervisor -> ...
    The supervisor maintains global awareness of all agent outputs.
    """

    def __init__(self, specialists: dict):
        self.specialists = specialists
        self.conversation_history = []

    def supervisor_decide(self, task: str, history: list) -> str:
        """Supervisor LLM decides which specialist to invoke."""
        specialist_names = list(self.specialists.keys())
        prompt = f"""You are a project supervisor coordinating
        these specialists: {specialist_names}.

        Task: {task}
        History: {history[-3:] if history else 'None'}

        Which specialist should handle this next?
        Respond with just the specialist name, or 'DONE'."""
        # LLM call returns specialist name
        return self._call_llm(prompt)

    def run(self, task: str, max_rounds: int = 10) -> str:
        for round_num in range(max_rounds):
            # Normalize the reply: LLMs often add whitespace or quotes
            decision = self.supervisor_decide(
                task, self.conversation_history
            ).strip().strip("'\"")
            if decision.upper() == "DONE":
                return self._synthesize_final_output()

            specialist = self.specialists.get(decision)
            if specialist:
                result = specialist.execute(
                    task, self.conversation_history
                )
                self.conversation_history.append({
                    "agent": decision,
                    "round": round_num,
                    "output": result
                })
        return self._synthesize_final_output()

    def _call_llm(self, prompt: str) -> str:
        """Call your LLM provider. Replace with actual implementation."""
        # Example: return openai.chat.completions.create(...).choices[0].message.content
        raise NotImplementedError("Replace with your LLM client")

    def _synthesize_final_output(self) -> str:
        """Synthesize all agent outputs into a final response."""
        all_outputs = "\n".join(
            f"{h['agent']}: {h['output']}" for h in self.conversation_history
        )
        return self._call_llm(f"Synthesize these results:\n{all_outputs}")

                        
                        When to use Supervisor: Best for tasks with clear decomposition where a central authority should maintain control — such as software development pipelines, customer service escalation, or any workflow where quality gating is important.
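
To see the control flow without an LLM, the sketch below runs the same loop with a scripted decision function standing in for the supervisor model. Everything here (`run_supervisor`, `scripted_decide`, the lambda specialists) is a hypothetical stand-in intended for testing the flow, not a production implementation.

```python
from typing import Callable

History = list[dict]


def run_supervisor(task: str,
                   specialists: dict[str, Callable[[str, History], str]],
                   decide: Callable[[str, History], str],
                   max_rounds: int = 10) -> History:
    """Ask `decide` who acts next until it answers DONE."""
    history: History = []
    for round_num in range(max_rounds):
        choice = decide(task, history).strip()
        if choice == "DONE":
            break
        worker = specialists.get(choice)
        if worker is None:
            break  # Unknown specialist: stop instead of looping silently
        history.append({
            "agent": choice,
            "round": round_num,
            "output": worker(task, history),
        })
    return history


def scripted_decide(task: str, history: History) -> str:
    """Stand-in for the supervisor LLM: code, then review, then stop."""
    done = [entry["agent"] for entry in history]
    if "coder" not in done:
        return "coder"
    if "reviewer" not in done:
        return "reviewer"
    return "DONE"


specialists = {
    "coder": lambda task, history: f"code for: {task}",
    "reviewer": lambda task, history: "APPROVED",
}

history = run_supervisor("rate limiter", specialists, scripted_decide)
print([entry["agent"] for entry in history])
```

Swapping `scripted_decide` for a function that calls a model gives the LLM-driven version while keeping the loop itself unit-testable.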
                    

Multi-Agent Orchestration Patterns

graph TD
    subgraph SEQ ["Sequential"]
        S1["Agent A"] --> S2["Agent B"]
        S2 --> S3["Agent C"]
    end

    subgraph PAR ["Parallel / Fan-Out"]
        P0["Router"] --> P1["Agent A"]
        P0 --> P2["Agent B"]
        P0 --> P3["Agent C"]
    end

    subgraph HIER ["Hierarchical"]
        H0["Supervisor"]
        H0 --> H1["Worker A"]
        H0 --> H2["Worker B"]
        H1 -.-> H0
        H2 -.-> H0
    end

    style SEQ fill:#e8f4f4,stroke:#3B9797
    style PAR fill:#f0f4f8,stroke:#16476A
    style HIER fill:#f8f9fa,stroke:#132440
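
The parallel fan-out shape in the diagram can be sketched with `asyncio`. The `run_agent` coroutine below is a hypothetical stand-in for an async LLM call; the sleep only simulates latency.

```python
import asyncio


async def run_agent(name: str, task: str) -> str:
    """Stand-in for an async LLM call made by one agent."""
    await asyncio.sleep(0.01)  # Simulated network latency
    return f"{name} finished: {task}"


async def fan_out(task: str, agent_names: list[str]) -> dict[str, str]:
    """Run all agents concurrently and collect results by agent name."""
    outputs = await asyncio.gather(
        *(run_agent(name, task) for name in agent_names)
    )
    # gather returns results in the order the coroutines were passed
    return dict(zip(agent_names, outputs))


results = asyncio.run(
    fan_out("summarize report", ["Agent A", "Agent B", "Agent C"])
)
print(results)
```

Total latency is roughly that of the slowest agent rather than the sum of all agents, which is the main reason to choose fan-out when sub-tasks are independent.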

3.2 Swarm Pattern

In the swarm pattern, agents operate autonomously without a central coordinator. Each agent decides independently whether to act, and agents can hand off tasks to other agents based on local conditions.

# Swarm pattern: Decentralized autonomous agents
# Each agent decides independently whether to act
# NOTE: This is a design pattern illustration — replace _execute
# and _calculate_relevance with your actual LLM/matching logic.

class SwarmAgent:
    """An autonomous agent in a swarm system."""

    def __init__(self, name: str, expertise: list[str],
                 handoff_rules: dict):
        self.name = name
        self.expertise = expertise
        self.handoff_rules = handoff_rules
        # Rules map conditions to target agent names

    def should_act(self, task: str) -> bool:
        """Determine if this agent should handle the task."""
        relevance = self._calculate_relevance(task)
        return relevance > 0.7  # Threshold

    def act(self, task: str, shared_state: dict) -> dict:
        """Process the task and return results."""
        result = self._execute(task, shared_state)

        # Determine if handoff is needed
        handoff_target = self._check_handoff(result)
        return {
            "output": result,
            "handoff_to": handoff_target,
            "confidence": self._assess_confidence(result)
        }

    def _calculate_relevance(self, task: str) -> float:
        """Score how relevant this agent is to the task (0-1)."""
        # Simple keyword overlap; replace with embedding similarity
        task_words = set(task.lower().split())
        expertise_words = set(
            w.lower() for e in self.expertise for w in e.split()
        )
        if not task_words:
            return 0.0
        return len(task_words & expertise_words) / len(task_words)

    def _execute(self, task: str, shared_state: dict) -> str:
        """Execute the task. Replace with your LLM call."""
        raise NotImplementedError("Replace with your LLM client")

    def _assess_confidence(self, result: str) -> float:
        """Assess confidence in the result (0-1)."""
        return 0.8  # Placeholder — use LLM self-eval in production

    def _check_handoff(self, result: str) -> str | None:
        """Check if result should be handed off to another agent."""
        for condition, target in self.handoff_rules.items():
            if condition in result.lower():
                return target
        return None


class SwarmOrchestrator:
    """Manages a swarm of autonomous agents."""

    def __init__(self, agents: list[SwarmAgent]):
        self.agents = {a.name: a for a in agents}
        self.shared_state = {}

    def _select_initial_agent(self, task: str) -> str:
        """Select the best starting agent based on relevance."""
        scores = {
            name: agent._calculate_relevance(task)
            for name, agent in self.agents.items()
        }
        return max(scores, key=scores.get)

    def run(self, task: str, max_handoffs: int = 8) -> dict:
        current_agent = self._select_initial_agent(task)
        results = []

        for _ in range(max_handoffs):
            agent = self.agents[current_agent]
            result = agent.act(task, self.shared_state)
            results.append({
                "agent": current_agent,
                "output": result["output"]
            })
            self.shared_state[current_agent] = result["output"]

            next_agent = result["handoff_to"]
            if next_agent is None or next_agent not in self.agents:
                break  # No handoff requested, or target is unknown
            current_agent = next_agent

        return {"results": results, "state": self.shared_state}

                        
                        When to use Swarm: Best for customer service routing, triage systems, and workflows where the next step depends on the output of the current step. OpenAI's Swarm framework popularized this pattern for lightweight, handoff-based multi-agent systems.
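
One risk with handoff-based routing is that two agents keep passing a task back and forth until the handoff budget runs out. A minimal guard, sketched below with a hypothetical `follow_handoffs` helper and a static handoff table, stops as soon as a handoff would revisit an agent.

```python
from typing import Optional


def follow_handoffs(start: str,
                    handoffs: dict[str, Optional[str]],
                    max_handoffs: int = 8) -> list[str]:
    """Follow handoffs from `start`, stopping on a repeat visit."""
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_handoffs):
        target = handoffs.get(current)
        if target is None or target in seen:
            break  # Nothing to hand off to, or a cycle was detected
        path.append(target)
        seen.add(target)
        current = target
    return path


# Hypothetical routing table: refunds would hand back to billing
handoffs = {"triage": "billing", "billing": "refunds", "refunds": "billing"}
path = follow_handoffs("triage", handoffs)
print(path)
```

Some workflows legitimately revisit an agent, for example a reviewer sending work back to a coder. In that case a per-agent visit limit is a better fit than a strict visited set.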
                    

3.3 Debate Pattern

In the debate pattern, multiple agents argue for different positions, challenge each other's reasoning, and converge on a higher-quality answer through structured argumentation.

# Debate pattern: Agents argue and converge on best answer
# Particularly effective for analysis, evaluation, and decisions
# NOTE: proposer, critic, and judge are agent objects with
# generate_proposal(), critique(), revise(), and synthesize() methods.

class DebateSystem:
    """
    Multiple agents debate a topic to reach a better conclusion.

    Pattern: Proposer -> Critic -> Proposer revision ->
             Critic re-evaluation -> Judge synthesizes
    """

    def __init__(self, proposer, critic, judge,
                 max_rounds: int = 3):
        self.proposer = proposer
        self.critic = critic
        self.judge = judge
        self.max_rounds = max_rounds

    def run_debate(self, topic: str) -> dict:
        debate_log = []

        # Initial proposal
        proposal = self.proposer.generate_proposal(topic)
        debate_log.append({
            "round": 0,
            "agent": "proposer",
            "content": proposal
        })

        round_num = 0  # Stays 0 if max_rounds is 0
        for round_num in range(1, self.max_rounds + 1):
            # Critic challenges the proposal
            critique = self.critic.critique(
                topic, proposal, debate_log
            )
            debate_log.append({
                "round": round_num,
                "agent": "critic",
                "content": critique
            })

            # Stop before revising if the critic is satisfied
            if "I CONCUR" in critique.upper():
                break

            # Proposer revises based on critique
            proposal = self.proposer.revise(
                topic, proposal, critique, debate_log
            )
            debate_log.append({
                "round": round_num,
                "agent": "proposer",
                "content": proposal
            })

        # Judge synthesizes the final answer
        final_answer = self.judge.synthesize(
            topic, debate_log
        )

        return {
            "final_answer": final_answer,
            "debate_log": debate_log,
            "rounds": round_num
        }

                        
                        When to use Debate: Best for tasks where quality matters more than speed — such as legal analysis, medical diagnosis support, strategic planning, and any domain where adversarial verification catches errors that self-review misses.
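
A debate loop is easier to test with scripted agents than with live models. The sketch below uses hypothetical stub classes that follow the method names mentioned earlier (generate_proposal, critique, revise) and a compact driver, so the early-exit behavior can be checked deterministically.

```python
class ScriptedProposer:
    """Stub proposer that bumps a version number on each revision."""

    def generate_proposal(self, topic: str) -> str:
        return f"v1: {topic}"

    def revise(self, topic: str, proposal: str,
               critique: str, log: list) -> str:
        version = int(proposal.split(":")[0][1:]) + 1
        return f"v{version}: {topic}"


class ScriptedCritic:
    """Stub critic that concurs on its Nth critique."""

    def __init__(self, concur_after: int):
        self.concur_after = concur_after
        self.calls = 0

    def critique(self, topic: str, proposal: str, log: list) -> str:
        self.calls += 1
        if self.calls >= self.concur_after:
            return "I concur"
        return "Needs more evidence"


def debate(topic, proposer, critic, max_rounds: int = 3):
    """Return the final proposal and the number of rounds used."""
    proposal = proposer.generate_proposal(topic)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        critique = critic.critique(topic, proposal, [])
        if "I CONCUR" in critique.upper():
            break  # Critic is satisfied: no further revision
        proposal = proposer.revise(topic, proposal, critique, [])
    return proposal, rounds


final, rounds = debate("caching strategy", ScriptedProposer(),
                       ScriptedCritic(concur_after=2))
print(final, rounds)
```

With these stubs the critic objects once, the proposer revises once, and the debate ends in the second round.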
                    

3.4 Hierarchical Pattern

The hierarchical pattern creates a tree of managers and workers. A top-level manager decomposes the task, mid-level managers coordinate sub-teams, and leaf agents execute specific sub-tasks.

# Hierarchical pattern: Tree of managers and workers
# Best for large, complex projects with natural decomposition
# NOTE: This is a design pattern illustration — replace _call_llm
# with your actual LLM client.

import json as _json

class WorkerAgent:
    """A leaf-level agent that executes tasks directly."""
    def __init__(self, name: str, expertise: str):
        self.name = name
        self.expertise = expertise

    def execute(self, task: str) -> str:
        """Execute the task. Replace with your LLM call."""
        raise NotImplementedError("Replace with your LLM client")

class HierarchicalManager:
    """A manager agent that coordinates a team of sub-agents."""

    def __init__(self, name: str, sub_agents: list,
                 decomposition_strategy: str = "functional"):
        self.name = name
        self.sub_agents = {a.name: a for a in sub_agents}
        self.strategy = decomposition_strategy

    def decompose_task(self, task: str) -> list[dict]:
        """Break a task into sub-tasks for each sub-agent."""
        prompt = f"""Decompose this task into sub-tasks,
        one for each team member: {list(self.sub_agents.keys())}

        Task: {task}

        Return a JSON list of sub-tasks with 'agent' and 'task' keys."""
        return self._call_llm_json(prompt)

    def coordinate(self, task: str) -> dict:
        """Decompose, delegate, collect, and synthesize."""
        # Step 1: Decompose
        sub_tasks = self.decompose_task(task)

        # Step 2: Delegate and collect results
        results = {}
        for sub_task in sub_tasks:
            agent_name = sub_task["agent"]
            agent = self.sub_agents.get(agent_name)
            if agent:
                if hasattr(agent, "coordinate"):
                    # Sub-agent is also a manager (recursion)
                    results[agent_name] = agent.coordinate(
                        sub_task["task"]
                    )
                else:
                    # Leaf agent executes directly
                    results[agent_name] = agent.execute(
                        sub_task["task"]
                    )

        # Step 3: Synthesize results
        return self.synthesize(task, results)

    def synthesize(self, original_task: str,
                   results: dict) -> dict:
        """Combine sub-agent results into a coherent output."""
        prompt = f"""Synthesize these team results into a
        coherent final output for: {original_task}

        Results: {results}"""
        return {
            "manager": self.name,
            "synthesis": self._call_llm(prompt),
            "sub_results": results
        }

    def _call_llm(self, prompt: str) -> str:
        """Call your LLM provider. Replace with actual implementation."""
        raise NotImplementedError("Replace with your LLM client")

    def _call_llm_json(self, prompt: str) -> list[dict]:
        """Call LLM and parse JSON response. Replace with actual implementation."""
        response = self._call_llm(prompt)
        return _json.loads(response)

# Example: Three-level hierarchy
# CEO -> [CTO, CPO] -> [Backend Dev, Frontend Dev, QA, Designer]
backend_dev = WorkerAgent("BackendDev", "Python/APIs")
frontend_dev = WorkerAgent("FrontendDev", "React/TypeScript")
qa_engineer = WorkerAgent("QA", "Testing/Automation")
designer = WorkerAgent("Designer", "UI/UX/Figma")

cto = HierarchicalManager(
    "CTO", [backend_dev, frontend_dev, qa_engineer]
)
cpo = HierarchicalManager("CPO", [designer])
ceo = HierarchicalManager("CEO", [cto, cpo])

# CEO decomposes -> CTO/CPO decompose -> workers execute
result = ceo.coordinate(
    "Build a user dashboard with analytics and notifications"
)

                        
                        When to use Hierarchical: Best for large, complex projects that naturally decompose into departments or teams — such as full application development, comprehensive research projects, or enterprise workflow automation with multiple domains.
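To make the cost of a hierarchy concrete, the sketch below counts LLM calls for a tree shaped like the CEO example above. It assumes each manager makes two calls (one to decompose, one to synthesize) and each worker makes one, with exactly one sub-task per team member. The tuple representation and the `count_llm_calls` helper are illustrative and not part of the classes above.

```python
# Sketch: estimate LLM calls for a manager/worker tree.
# Assumption: each manager makes 2 calls (decompose + synthesize) and
# each worker makes 1 call (execute), with one sub-task per team member.

def count_llm_calls(node) -> int:
    """node is a worker name (str) or a (manager_name, children) tuple."""
    if isinstance(node, str):
        return 1  # leaf worker: one execute() call
    _manager_name, children = node
    return 2 + sum(count_llm_calls(child) for child in children)

# Same shape as the CEO -> [CTO, CPO] -> workers example
hierarchy = (
    "CEO",
    [
        ("CTO", ["BackendDev", "FrontendDev", "QA"]),
        ("CPO", ["Designer"]),
    ],
)

total_calls = count_llm_calls(hierarchy)
print(total_calls)  # 10
```

Under these assumptions, four workers cost ten calls, because every management layer adds two calls of its own. Real runs may differ if a manager emits more or fewer sub-tasks than it has team members.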
                    

Pattern Selection Guide

Pattern	Agents	Control	Best For	Cost
Supervisor	3-8	Centralized	Dev pipelines, quality-gated workflows	Medium
Swarm	2-6	Decentralized	Routing, triage, customer service	Low-Medium
Debate	2-4	Adversarial	Analysis, evaluation, high-stakes decisions	Medium-High
Hierarchical	5-20+	Tree/delegated	Large projects, enterprise workflows	High
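The table can also be read as a simple lookup. The following sketch encodes the agent-count ranges from the table and returns the patterns that fit a given team size. The `PATTERNS` list and `candidate_patterns` function are hypothetical helpers for illustration, and agent count is only one of several selection criteria.

```python
# Sketch: encode the selection table's agent-count ranges as data.
# (pattern, min_agents, max_agents or None for open-ended, control style)
PATTERNS = [
    ("Supervisor", 3, 8, "Centralized"),
    ("Swarm", 2, 6, "Decentralized"),
    ("Debate", 2, 4, "Adversarial"),
    ("Hierarchical", 5, None, "Tree/delegated"),
]

def candidate_patterns(n_agents: int) -> list[str]:
    """Return the patterns whose typical agent range includes n_agents."""
    matches = []
    for name, low, high, _control in PATTERNS:
        if n_agents >= low and (high is None or n_agents <= high):
            matches.append(name)
    return matches

print(candidate_patterns(4))   # ['Supervisor', 'Swarm', 'Debate']
print(candidate_patterns(12))  # ['Hierarchical']
```

When several patterns match, the control style and cost columns are the tie-breakers.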

4. LangGraph Multi-Agent Implementation

LangGraph models agents as nodes in a stateful graph, which gives fine-grained control over routing, shared state, and termination. The following subsections implement the supervisor, swarm, and debate patterns as LangGraph graphs.

4.1 Supervisor Graph

The supervisor pattern implements a central coordinator agent that receives user requests, delegates subtasks to specialized worker agents (research, coding, review), and synthesizes their outputs into a final response. In LangGraph, this is modeled as a graph where the supervisor node routes to worker nodes via conditional edges, with each worker returning results that flow back to the supervisor for the next routing decision. This is the most common multi-agent architecture in production.

# Complete LangGraph supervisor multi-agent system
# pip install langgraph langchain-openai

import os
from typing import TypedDict, Annotated, Literal
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    HumanMessage, SystemMessage, AIMessage
)
from langchain_core.tools import tool
import operator

# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."

# Shared state for the multi-agent system
class SupervisorState(TypedDict):
    messages: Annotated[list, operator.add]
    next: str
    iteration: int

llm = ChatOpenAI(model="gpt-4", temperature=0)

# Define tools for specialized agents
@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return f"Search results for: {query}"

@tool
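def execute_python(code: str) -> str:
    """Execute Python code and return the `result` variable, if set."""
    # WARNING: exec() runs arbitrary code with no sandboxing. This is for
    # demonstration only; run model-generated code in an isolated sandbox.
    try:
        exec_globals = {}
        exec(code, exec_globals)
        return str(exec_globals.get("result", "Code executed."))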
    except Exception as e:
        return f"Error: {e}"

@tool
def write_file(filename: str, content: str) -> str:
    """Write content to a file."""
    return f"Written {len(content)} chars to {filename}"

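# Create specialized agents using LangGraph prebuilt
# NOTE: depending on your LangGraph version, the `state_modifier`
# argument below may instead be named `prompt`.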
research_agent = create_react_agent(
    llm,
    tools=[search_web],
    state_modifier="""You are a research specialist.
    Gather comprehensive information and present structured findings.
    Always cite sources and rate confidence levels."""
)

coding_agent = create_react_agent(
    llm,
    tools=[execute_python, write_file],
    state_modifier="""You are an expert Python developer.
    Write clean, production-quality code with type hints,
    docstrings, and comprehensive error handling."""
)

review_agent = create_react_agent(
    llm,
    tools=[],
    state_modifier="""You are a senior code reviewer.
    Check for bugs, security issues, performance problems,
    and adherence to best practices.
    End your review with APPROVED or NEEDS_REVISION."""
)

# Supervisor routing logic
AGENTS = ["researcher", "coder", "reviewer"]

def supervisor_node(state: SupervisorState) -> dict:
    """Supervisor decides which agent to invoke next."""
    system_prompt = f"""You are a team supervisor managing:
    {AGENTS}. Given the conversation state, decide which
    agent should act next, or if we should FINISH.

    Rules:
    - Start with 'researcher' if no research exists
    - Route to 'coder' after research is complete
    - Route to 'reviewer' after code is written
    - FINISH if reviewer says APPROVED
    - Route back to 'coder' if reviewer says NEEDS_REVISION
    - Current iteration: {state.get('iteration', 0)} of at most 5

    Respond with ONLY the agent name or 'FINISH'."""

    messages = [SystemMessage(content=system_prompt)]
    messages.extend(state.get("messages", [])[-6:])

    response = llm.invoke(messages)
    next_agent = response.content.strip().upper()

    if "FINISH" in next_agent or state.get("iteration", 0) >= 5:
        return {"next": "FINISH", "iteration": state.get("iteration", 0) + 1}

    for agent in AGENTS:
        if agent.upper() in next_agent:
            return {"next": agent, "iteration": state.get("iteration", 0) + 1}

    return {"next": "FINISH", "iteration": state.get("iteration", 0) + 1}

# Agent wrapper nodes
def researcher_node(state: SupervisorState) -> dict:
    result = research_agent.invoke(state)
    return {"messages": [AIMessage(
        content=f"[Researcher]: {result['messages'][-1].content}",
        name="researcher"
    )]}

def coder_node(state: SupervisorState) -> dict:
    result = coding_agent.invoke(state)
    return {"messages": [AIMessage(
        content=f"[Coder]: {result['messages'][-1].content}",
        name="coder"
    )]}

def reviewer_node(state: SupervisorState) -> dict:
    result = review_agent.invoke(state)
    return {"messages": [AIMessage(
        content=f"[Reviewer]: {result['messages'][-1].content}",
        name="reviewer"
    )]}

# Build the supervisor graph
def route_after_supervisor(state: SupervisorState) -> str:
    return state.get("next", "FINISH")

builder = StateGraph(SupervisorState)
builder.add_node("supervisor", supervisor_node)
builder.add_node("researcher", researcher_node)
builder.add_node("coder", coder_node)
builder.add_node("reviewer", reviewer_node)

builder.set_entry_point("supervisor")
builder.add_conditional_edges(
    "supervisor",
    route_after_supervisor,
    {
        "researcher": "researcher",
        "coder": "coder",
        "reviewer": "reviewer",
        "FINISH": END
    }
)

for agent in AGENTS:
    builder.add_edge(agent, "supervisor")

supervisor_graph = builder.compile()

# Run the supervisor system
result = supervisor_graph.invoke({
    "messages": [HumanMessage(
        content="Build a rate-limited API client for GitHub"
    )],
    "next": "",
    "iteration": 0
})
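The substring matching in `supervisor_node` accepts any reply that mentions an agent name, so a reply such as "not the coder, the reviewer" routes to whichever name appears first in `AGENTS`. One possible tightening, sketched below, is to require that the reply is exactly one agent name once punctuation is stripped. The `parse_route_strict` function is a hypothetical helper, not part of LangGraph, and it treats anything ambiguous as `FINISH`.

```python
import re

AGENTS = ["researcher", "coder", "reviewer"]

def parse_route_strict(reply: str, agents: list[str],
                       iteration: int, max_iterations: int = 5) -> str:
    """Map a supervisor reply to an agent name, or 'FINISH'."""
    # Enforce the iteration cap before trusting the model's reply
    if iteration >= max_iterations:
        return "FINISH"
    # Keep only lowercase letters and underscores: "'Coder.'" -> "coder"
    token = re.sub(r"[^a-z_]", "", reply.strip().lower())
    if token in agents:
        return token
    # Anything else (including an explicit FINISH) ends the run
    return "FINISH"

print(parse_route_strict("Coder.", AGENTS, iteration=1))                   # coder
print(parse_route_strict("maybe coder or reviewer", AGENTS, iteration=1))  # FINISH
```

Failing closed on ambiguous replies trades some robustness for predictability; whether that is the right default depends on how costly a wrong route is in your workflow.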

4.2 Swarm Graph

The swarm pattern replaces the central supervisor with peer-to-peer handoffs — each agent decides which agent should handle the conversation next, based on the user’s needs. This decentralized approach scales better than supervisor architectures because adding a new agent doesn’t require updating a central routing function. It’s particularly effective for customer service scenarios where requests naturally flow between triage, billing, technical support, and sales.

# LangGraph swarm pattern: Decentralized handoffs
# pip install langgraph langchain-openai

import os
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
import operator

# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."

class SwarmState(TypedDict):
    messages: Annotated[list, operator.add]
    current_agent: str
    handoff_count: int

llm = ChatOpenAI(model="gpt-4", temperature=0)

def triage_agent(state: SwarmState) -> dict:
    """Initial triage agent that routes to specialists."""
    messages = [
        SystemMessage(content="""You are a triage agent.
        Classify the user request and hand off to the
        appropriate specialist:
        - 'billing' for payment/subscription issues
        - 'technical' for bugs/errors/how-to questions
        - 'sales' for pricing/plans/upgrades
        - 'resolved' if you can answer directly

        Start your response with [HANDOFF:target] or [RESOLVED].
        Then provide your response."""),
        *state["messages"]
    ]
    response = llm.invoke(messages)
    content = response.content

    if "[HANDOFF:billing]" in content:
        next_agent = "billing"
    elif "[HANDOFF:technical]" in content:
        next_agent = "technical"
    elif "[HANDOFF:sales]" in content:
        next_agent = "sales"
    else:
        next_agent = "resolved"

    return {
        "messages": [response],
        "current_agent": next_agent,
        "handoff_count": state.get("handoff_count", 0) + 1
    }

def billing_agent(state: SwarmState) -> dict:
    """Billing specialist agent."""
    messages = [
        SystemMessage(content="""You are a billing specialist.
        Handle payment, subscription, and refund issues.
        If the issue requires technical investigation,
        respond with [HANDOFF:technical].
        If resolved, respond with [RESOLVED]."""),
        *state["messages"]
    ]
    response = llm.invoke(messages)
    content = response.content

    if "[HANDOFF:technical]" in content:
        next_agent = "technical"
    elif "[HANDOFF:sales]" in content:
        next_agent = "sales"
    else:
        next_agent = "resolved"

    return {
        "messages": [response],
        "current_agent": next_agent,
        "handoff_count": state.get("handoff_count", 0) + 1
    }

def technical_agent(state: SwarmState) -> dict:
    """Technical support specialist agent."""
    messages = [
        SystemMessage(content="""You are a technical support
        specialist. Debug issues, provide solutions, and
        guide users through technical problems.
        If it is a billing issue, respond with [HANDOFF:billing].
        If resolved, respond with [RESOLVED]."""),
        *state["messages"]
    ]
    response = llm.invoke(messages)
    content = response.content

    if "[HANDOFF:billing]" in content:
        next_agent = "billing"
    else:
        next_agent = "resolved"

    return {
        "messages": [response],
        "current_agent": next_agent,
        "handoff_count": state.get("handoff_count", 0) + 1
    }

def sales_agent(state: SwarmState) -> dict:
    """Sales specialist agent."""
    messages = [
        SystemMessage(content="""You are a sales specialist.
        Help with pricing, plans, upgrades, and demos.
        Respond with [RESOLVED] when done."""),
        *state["messages"]
    ]
    response = llm.invoke(messages)
    return {
        "messages": [response],
        "current_agent": "resolved",
        "handoff_count": state.get("handoff_count", 0) + 1
    }

# Routing function
def route_swarm(state: SwarmState) -> str:
    if state.get("handoff_count", 0) >= 5:
        return "resolved"
    return state.get("current_agent", "resolved")

# Build swarm graph
swarm_builder = StateGraph(SwarmState)
swarm_builder.add_node("triage", triage_agent)
swarm_builder.add_node("billing", billing_agent)
swarm_builder.add_node("technical", technical_agent)
swarm_builder.add_node("sales", sales_agent)

swarm_builder.set_entry_point("triage")

# Each agent routes based on handoff
for node in ["triage", "billing", "technical", "sales"]:
    swarm_builder.add_conditional_edges(
        node,
        route_swarm,
        {
            "billing": "billing",
            "technical": "technical",
            "sales": "sales",
            "resolved": END
        }
    )

swarm_graph = swarm_builder.compile()
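Each agent above repeats its own chain of `if "[HANDOFF:...]" in content` checks. As a sketch of one way to centralize that, the snippet below parses the handoff tag with a regular expression and validates it against a per-agent allow-list. The `ALLOWED_HANDOFFS` table mirrors the branches in the four agent functions; the `next_agent` helper is a hypothetical name for illustration.

```python
import re

# Matches tags such as [HANDOFF:billing]
HANDOFF_RE = re.compile(r"\[HANDOFF:(\w+)\]")

# Which targets each agent may hand off to (mirrors the if/elif chains)
ALLOWED_HANDOFFS = {
    "triage": {"billing", "technical", "sales"},
    "billing": {"technical", "sales"},
    "technical": {"billing"},
    "sales": set(),
}

def next_agent(current: str, content: str) -> str:
    """Return the handoff target, or 'resolved' if none is valid."""
    match = HANDOFF_RE.search(content)
    if match and match.group(1) in ALLOWED_HANDOFFS.get(current, set()):
        return match.group(1)
    return "resolved"

print(next_agent("triage", "[HANDOFF:billing] Refund request."))  # billing
print(next_agent("sales", "[HANDOFF:billing] Not my area."))      # resolved
```

With this shape, adding a new specialist means adding one entry to the table rather than editing every agent's branching logic.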

4.3 Debate Graph

The debate pattern uses adversarial collaboration to improve output quality: a proposer generates an initial response, a critic identifies weaknesses and suggests improvements, and the proposer revises. The loop continues until the critic signals consensus or a maximum number of rounds is reached, after which a judge synthesizes the debate into a final answer. Debate architectures excel at tasks where first-draft quality is unreliable, such as complex analysis, legal reasoning, or creative writing.

# LangGraph debate pattern: Adversarial improvement
# pip install langgraph langchain-openai

import os
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
import operator

# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."

class DebateState(TypedDict):
    messages: Annotated[list, operator.add]
    proposal: str
    critique: str
    round: int
    consensus_reached: bool

llm = ChatOpenAI(model="gpt-4", temperature=0.7)

def proposer_node(state: DebateState) -> dict:
    """Generate or revise a proposal."""
    critique = state.get("critique", "")
    round_num = state.get("round", 0)

    if round_num == 0:
        # Initial proposal
        system = """You are a solution architect.
        Generate a thorough, well-reasoned proposal
        for the given problem. Be specific and detailed."""
    else:
        # Revision based on critique
        system = f"""You are a solution architect.
        Your previous proposal was critiqued:

        {critique}

        Revise your proposal to address the concerns.
        Improve weak points while keeping strong elements.
        Be specific about what you changed and why."""

    messages = [
        SystemMessage(content=system),
        state["messages"][0]  # Original question
    ]
    response = llm.invoke(messages)
    return {
        "messages": [response],
        "proposal": response.content,
        "round": round_num + 1
    }

def critic_node(state: DebateState) -> dict:
    """Critically evaluate the proposal."""
    proposal = state.get("proposal", "")
    round_num = state.get("round", 0)

    system = f"""You are a critical evaluator (Round {round_num}).
    Analyze this proposal for weaknesses, gaps, logical flaws,
    missing considerations, and potential failure modes.

    Proposal:
    {proposal}

    If the proposal is excellent with no significant issues,
    say 'CONSENSUS: The proposal is sound.'
    Otherwise, provide specific, constructive criticism."""

    messages = [
        SystemMessage(content=system),
        state["messages"][0]
    ]
    response = llm.invoke(messages)
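    # Match the exact marker the prompt asks for, so that a phrase such
    # as "no consensus yet" is less likely to end the debate early
    consensus = "CONSENSUS:" in response.content.upper()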

    return {
        "messages": [response],
        "critique": response.content,
        "consensus_reached": consensus
    }

def judge_node(state: DebateState) -> dict:
    """Synthesize the debate into a final answer."""
    system = """You are an impartial judge.
    Review the entire debate and produce the best possible
    final answer, incorporating the strongest arguments from
    both the proposer and critic. Be comprehensive."""

    messages = [SystemMessage(content=system)]
    messages.extend(state["messages"])
    response = llm.invoke(messages)
    return {"messages": [response]}

def should_continue_debate(state: DebateState) -> str:
    if state.get("consensus_reached", False):
        return "judge"
    if state.get("round", 0) >= 3:
        return "judge"
    return "proposer"

# Build debate graph
debate_builder = StateGraph(DebateState)
debate_builder.add_node("proposer", proposer_node)
debate_builder.add_node("critic", critic_node)
debate_builder.add_node("judge", judge_node)

debate_builder.set_entry_point("proposer")
debate_builder.add_edge("proposer", "critic")
debate_builder.add_conditional_edges(
    "critic",
    should_continue_debate,
    {
        "proposer": "proposer",
        "judge": "judge"
    }
)
debate_builder.add_edge("judge", END)

debate_graph = debate_builder.compile()

# Run a debate
result = debate_graph.invoke({
    "messages": [HumanMessage(
        content="What is the best architecture for a real-time "
                "collaborative document editor at scale?"
    )],
    "proposal": "",
    "critique": "",
    "round": 0,
    "consensus_reached": False
})
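The control flow of the debate graph can be checked without any LLM calls. The sketch below reproduces the propose, critique, and stop logic as a plain loop driven by stub functions. `run_debate`, `stub_propose`, and `stub_critique` are hypothetical stand-ins, and the consensus check uses `startswith`, which is a stricter assumption than the substring test in `critic_node`.

```python
from typing import Callable

def run_debate(propose: Callable[[str, str], str],
               critique: Callable[[str], str],
               question: str, max_rounds: int = 3) -> dict:
    """Alternate proposer and critic until consensus or max_rounds."""
    proposal, feedback = "", ""
    rounds, consensus = 0, False
    while rounds < max_rounds and not consensus:
        proposal = propose(question, feedback)  # revise using last critique
        rounds += 1
        feedback = critique(proposal)
        consensus = feedback.strip().upper().startswith("CONSENSUS")
    return {"proposal": proposal, "rounds": rounds, "consensus": consensus}

# Stubs standing in for LLM-backed agents
def stub_propose(question: str, feedback: str) -> str:
    return "v2 with retries" if feedback else "v1"

def stub_critique(proposal: str) -> str:
    if "retries" in proposal:
        return "CONSENSUS: The proposal is sound."
    return "Missing retry handling."

outcome = run_debate(stub_propose, stub_critique, "Design an API client")
print(outcome)
# {'proposal': 'v2 with retries', 'rounds': 2, 'consensus': True}
```

A critic that never agrees still terminates after `max_rounds`, which is the behavior the `round >= 3` branch in `should_continue_debate` provides in the graph.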

5. Agent Role Design

Well-designed agent roles are the foundation of effective multi-agent systems. Poor role design leads to overlapping responsibilities, communication breakdowns, and suboptimal outputs.

5.1 Role Design Principles

Principle	Description	Example
Single Responsibility	Each agent should have one clear, focused purpose	"Code writer" not "code writer and tester and reviewer"
Clear Boundaries	Define what each agent does and does NOT do	"Researcher gathers info, does NOT make recommendations"
Complementary Skills	Agents should have non-overlapping expertise	Backend dev + Frontend dev + QA (not three "full-stack" agents)
Appropriate Granularity	Not too broad (ineffective) or too narrow (overhead)	3-5 agents is typical; 20+ agents adds more cost than value
Explicit Communication Protocol	Define how agents share information and signal completion	"End review with APPROVED or NEEDS_REVISION"

# Role design template for multi-agent systems

class AgentRoleTemplate:
    """Template for designing well-structured agent roles."""

    def __init__(self):
        self.role_template = """
        # Agent Role Card

        ## Identity
        - Name: {name}
        - Role: {role}
        - Expertise: {expertise}

        ## Responsibilities
        - Primary: {primary_responsibility}
        - Secondary: {secondary_responsibilities}
        - Does NOT do: {exclusions}

        ## Communication Protocol
        - Accepts input from: {input_sources}
        - Sends output to: {output_targets}
        - Completion signal: {completion_signal}
        - Escalation trigger: {escalation_trigger}

        ## Tools Available
        {tools_list}

        ## Quality Criteria
        {quality_criteria}
        """

    def create_role(self, **kwargs) -> str:
        return self.role_template.format(**kwargs)


# Example: Software development team roles
dev_team_roles = {
    "architect": {
        "name": "SystemArchitect",
        "role": "System Architect",
        "expertise": "System design, patterns, scalability",
        "primary_responsibility": "Design system architecture "
            "and define component interfaces",
        "secondary_responsibilities": "Review technical "
            "decisions, define coding standards",
        "exclusions": "Writing implementation code, testing",
        "input_sources": "User requirements, project manager",
        "output_targets": "Developers, reviewer",
        "completion_signal": "ARCHITECTURE_COMPLETE",
        "escalation_trigger": "Conflicting requirements, "
            "scalability concerns",
        "tools_list": "- draw_diagram\n- search_patterns",
        "quality_criteria": "- Clear component boundaries\n"
            "- Scalability considerations\n"
            "- Technology justification"
    },
    "developer": {
        "name": "BackendDeveloper",
        "role": "Backend Developer",
        "expertise": "Python, APIs, databases, async programming",
        "primary_responsibility": "Implement backend services "
            "according to architecture specs",
        "secondary_responsibilities": "Write unit tests, "
            "document API endpoints",
        "exclusions": "Architecture decisions, frontend code, "
            "deployment configuration",
        "input_sources": "Architect specs, reviewer feedback",
        "output_targets": "Reviewer, tester",
        "completion_signal": "CODE_COMPLETE",
        "escalation_trigger": "Architecture ambiguity, "
            "blocked by dependencies",
        "tools_list": "- execute_python\n- write_file\n"
            "- query_database",
        "quality_criteria": "- Type hints on all functions\n"
            "- Docstrings\n- Error handling\n"
            "- 80%+ test coverage"
    }
}
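Because `create_role` relies on `str.format`, a role dictionary that omits any placeholder raises a `KeyError` at render time. A small pre-check can list the missing fields first. The sketch below uses the standard library's `string.Formatter` to extract placeholder names; the shortened `ROLE_CARD` template and the `missing_fields` helper are illustrative assumptions, not part of the template class above.

```python
from string import Formatter

# Shortened stand-in for the full role card template
ROLE_CARD = (
    "Name: {name}\n"
    "Role: {role}\n"
    "Does NOT do: {exclusions}\n"
    "Completion signal: {completion_signal}\n"
)

def missing_fields(template: str, values: dict) -> list[str]:
    """Return placeholder names in template that values does not supply."""
    required = [
        field for _text, field, _spec, _conv in Formatter().parse(template)
        if field  # skip literal-only segments, where field is None
    ]
    return [name for name in required if name not in values]

draft_role = {"name": "SecurityReviewer", "role": "Security Reviewer"}
print(missing_fields(ROLE_CARD, draft_role))
# ['exclusions', 'completion_signal']
```

Running this check over every entry in a roles dictionary before rendering catches incomplete role cards early, particularly the "Does NOT do" and completion-signal fields that the design principles depend on.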

5.2 Task Delegation Strategies

Effective task delegation determines system performance. There are several strategies for deciding which agent handles which task:

# Task delegation strategies for multi-agent systems
# NOTE: Agents are expected to have .name, .expertise_keywords,
# .active_task_count, .tier, and .description attributes.

def classify_priority(task: str) -> str:
    """Classify task priority based on keywords. Replace with LLM call."""
    critical_keywords = ["outage", "down", "critical", "urgent", "security"]
    if any(kw in task.lower() for kw in critical_keywords):
        return "critical"
    return "normal"

class TaskDelegation:
    """Different strategies for delegating tasks to agents."""

    @staticmethod
    def capability_based(task: str, agents: list) -> str:
        """Delegate to the agent whose capabilities best match."""
        scores = {}
        for agent in agents:
            overlap = len(
                set(task.lower().split()) &
                set(agent.expertise_keywords)
            )
            scores[agent.name] = overlap
        return max(scores, key=scores.get)

    @staticmethod
    def load_balanced(agents: list) -> str:
        """Delegate to the agent with the fewest active tasks."""
        return min(agents, key=lambda a: a.active_task_count).name

    @staticmethod
    def priority_based(task: str, agents: list) -> str:
        """Delegate based on task priority and agent tier."""
        task_priority = classify_priority(task)
        tier = "senior" if task_priority == "critical" else "junior"
        # Fall back to the first agent if none has the wanted tier,
        # instead of letting next() raise StopIteration
        return next(
            (a.name for a in agents if a.tier == tier),
            agents[0].name
        )

    @staticmethod
    def llm_routed(task: str, agents: list, llm) -> str:
        """Use an LLM to decide the best agent for the task."""
        agent_descriptions = "\n".join(
            f"- {a.name}: {a.description}" for a in agents
        )
        prompt = f"""Given this task: {task}
        And these available agents:
        {agent_descriptions}
        Which agent should handle this? Respond with just the name."""
        response = llm.invoke(prompt)
        return response.content.strip()
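The strategies above can be combined. In `capability_based`, `max` returns the first agent on a tie, including the case where no agent matches any keyword. A possible refinement, sketched here, ranks by keyword overlap first and breaks ties by current load. `DemoAgent` and `pick_agent` are hypothetical names; the attributes follow the ones listed in the note at the top of the listing.

```python
from dataclasses import dataclass

@dataclass
class DemoAgent:
    name: str
    expertise_keywords: set
    active_task_count: int = 0

def pick_agent(task: str, agents: list) -> str:
    """Prefer the best keyword match; break ties by lowest load."""
    words = set(task.lower().split())

    def rank(agent: DemoAgent) -> tuple:
        overlap = len(words & agent.expertise_keywords)
        # Negative overlap so that min() prefers the larger overlap
        return (-overlap, agent.active_task_count)

    return min(agents, key=rank).name

team = [
    DemoAgent("BackendA", {"api", "python"}, active_task_count=3),
    DemoAgent("BackendB", {"api", "database"}, active_task_count=1),
    DemoAgent("Frontend", {"react"}, active_task_count=0),
]

print(pick_agent("Fix the api timeout", team))  # BackendB
print(pick_agent("Write release notes", team))  # Frontend
```

Whitespace tokenization is a crude matcher, so in practice the overlap score would likely come from embeddings or an LLM call, with the load-based tie-break kept as is.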

5.3 Conflict Resolution

When agents disagree, you need a structured mechanism to resolve conflicts and converge on a decision.

# Conflict resolution mechanisms for multi-agent systems
# NOTE: This is a design pattern illustration — replace LLM calls
# with your actual LLM client.

class ConflictResolver:
    """Resolve disagreements between agents."""

    def __init__(self, strategy: str = "supervisor_override"):
        self.strategy = strategy

    def resolve(self, agent_outputs: dict,
                context: str) -> dict:
        """Resolve conflicting outputs from multiple agents."""
        if self.strategy == "supervisor_override":
            return self._supervisor_override(
                agent_outputs, context
            )
        elif self.strategy == "majority_vote":
            return self._majority_vote(agent_outputs)
        elif self.strategy == "confidence_weighted":
            return self._confidence_weighted(agent_outputs)
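        elif self.strategy == "debate_resolution":
            return self._debate_resolution(
                agent_outputs, context
            )
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown strategy: {self.strategy!r}")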

    def _supervisor_override(self, outputs: dict,
                             context: str) -> dict:
        """Supervisor agent makes the final call."""
        prompt = f"""Multiple agents disagree on this task.
        Context: {context}

        Agent outputs:
        {self._format_outputs(outputs)}

        As supervisor, synthesize the best answer,
        noting which agents were correct and why."""
        decision = self._call_supervisor_llm(prompt)
        return {"decision": decision, "method": "supervisor"}

    def _majority_vote(self, outputs: dict) -> dict:
        """Simple majority vote on categorical decisions."""
        from collections import Counter
        votes = [
            o.get("recommendation") for o in outputs.values()
        ]
        winner = Counter(votes).most_common(1)[0]
        return {
            "decision": winner[0],
            "votes": dict(Counter(votes)),
            "method": "majority_vote"
        }

    def _confidence_weighted(self, outputs: dict) -> dict:
        """Weight agent outputs by their confidence scores."""
        weighted_outputs = []
        for agent, output in outputs.items():
            confidence = output.get("confidence", 0.5)
            weighted_outputs.append({
                "agent": agent,
                "output": output["content"],
                "weight": confidence
            })
        # Select highest-confidence output
        best = max(weighted_outputs, key=lambda x: x["weight"])
        return {
            "decision": best["output"],
            "confidence": best["weight"],
            "method": "confidence_weighted"
        }

    def _debate_resolution(self, outputs: dict,
                           context: str) -> dict:
        """Agents debate their positions for N rounds."""
        rounds = []
        for round_num in range(3):
            for agent, output in outputs.items():
                rebuttal = self._generate_rebuttal(
                    agent, output, rounds
                )
                rounds.append({
                    "round": round_num,
                    "agent": agent,
                    "argument": rebuttal
                })

        # Judge synthesizes after debate
        final = self._judge_debate(context, rounds)
        return {
            "decision": final,
            "debate_log": rounds,
            "method": "debate_resolution"
        }

    def _format_outputs(self, outputs: dict) -> str:
        """Format agent outputs for display."""
        return "\n".join(f"- {agent}: {out}" for agent, out in outputs.items())

    def _call_supervisor_llm(self, prompt: str) -> str:
        """Call LLM for supervisor decisions. Replace with actual client."""
        raise NotImplementedError("Replace with your LLM client")

    def _generate_rebuttal(self, agent: str, output: dict,
                           rounds: list) -> str:
        """Generate a rebuttal from an agent given the debate history."""
        raise NotImplementedError("Replace with your LLM client")

    def _judge_debate(self, context: str, rounds: list) -> str:
        """Synthesize the debate rounds into a final decision."""
        raise NotImplementedError("Replace with your LLM client")
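One gap in `_majority_vote` is ties: `Counter.most_common(1)` returns a single winner even when two recommendations have equal votes. The sketch below detects that case and reports it so the caller can escalate, for example to the supervisor strategy. The function name and the `"escalate"` method label are illustrative assumptions.

```python
from collections import Counter

def majority_vote_with_tie_check(outputs: dict) -> dict:
    """Majority vote that reports ties instead of picking arbitrarily."""
    votes = Counter(o.get("recommendation") for o in outputs.values())
    if not votes:
        raise ValueError("No agent outputs to vote on")
    ranked = votes.most_common()
    top_count = ranked[0][1]
    leaders = [choice for choice, count in ranked if count == top_count]
    if len(leaders) > 1:
        # No clear winner: let the caller pick another strategy
        return {"decision": None, "tied": leaders,
                "votes": dict(votes), "method": "escalate"}
    return {"decision": leaders[0], "tied": [],
            "votes": dict(votes), "method": "majority_vote"}

clear = majority_vote_with_tie_check({
    "security": {"recommendation": "hold"},
    "performance": {"recommendation": "ship"},
    "qa": {"recommendation": "ship"},
})
print(clear["decision"])  # ship

split = majority_vote_with_tie_check({
    "security": {"recommendation": "hold"},
    "performance": {"recommendation": "ship"},
})
print(split["method"])  # escalate
```

Using an odd number of voting agents reduces ties but does not eliminate them once there are more than two possible recommendations.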

                        
Common Anti-Patterns in Multi-Agent Design:
Too many agents: More agents mean more cost, latency, and coordination overhead. Start with 2-3 and add only when needed.
Overlapping roles: If two agents can do the same thing, they will produce conflicting outputs. Ensure clear boundaries.
Missing termination condition: Without explicit stopping criteria, agents can loop indefinitely. Always set a maximum number of rounds or iterations.
Shared context pollution: When all agents dump into a single context, noise overwhelms signal. Use scoped state.
No human escape hatch: Always include a mechanism for human intervention when agents get stuck.
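Two of these anti-patterns, the missing termination condition and the missing human escape hatch, can be addressed with one small guard. The sketch below counts rounds and raises a dedicated exception once the budget is exceeded, which the calling code can catch to pause and hand off to a person. `RoundGuard` and `EscalateToHuman` are hypothetical names, and the round budget is an arbitrary example value.

```python
class EscalateToHuman(Exception):
    """Raised when agents exceed their round budget."""

class RoundGuard:
    """Counts agent turns and escalates once the budget is exceeded."""

    def __init__(self, max_rounds: int = 5):
        self.max_rounds = max_rounds
        self.rounds = 0

    def tick(self, agent_name: str) -> None:
        """Record one agent turn; raise if the budget is exceeded."""
        self.rounds += 1
        if self.rounds > self.max_rounds:
            raise EscalateToHuman(
                f"{agent_name} exceeded {self.max_rounds} rounds; "
                "handing off to a human operator"
            )

guard = RoundGuard(max_rounds=2)
escalated = False
try:
    for _ in range(10):      # stand-in for an agent loop that never converges
        guard.tick("coder")
except EscalateToHuman as exc:
    escalated = True
    print(exc)
```

Raising an exception rather than returning a flag makes the escalation hard to ignore, at the cost of requiring every agent loop to be wrapped in a handler.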

                    

6. Exercises & Self-Assessment

Exercise 1

Framework Selection Challenge

For each scenario, choose the best framework (AutoGen, CrewAI, or LangGraph) and orchestration pattern (supervisor, swarm, debate, hierarchical). Justify your choices:

A law firm needs AI to research case law, draft arguments, and have a partner review them
A customer service system that routes tickets to billing, technical, or sales specialists
A content creation pipeline: research, write, edit, SEO-optimize, and schedule social media posts
A medical diagnosis support system where multiple specialists must agree on a treatment plan
A software development team: architect designs, developers implement, QA tests, DevOps deploys

Exercise 2

Build a Debate System

Implement a complete debate system using LangGraph with:

An optimist agent that argues for a given technology investment
A pessimist agent that argues against it, highlighting risks
A judge that synthesizes after 3 rounds of debate
Typed state tracking all arguments and counter-arguments
A convergence check that stops early if both sides agree

Exercise 3

AutoGen vs CrewAI Comparison

Implement the same task using both AutoGen and CrewAI:

Choose a task: "Research and write a comparison of React vs Vue.js for a new startup project"
Implement with AutoGen using GroupChat with 3 agents (researcher, writer, reviewer)
Implement with CrewAI using the same 3 roles
Compare: output quality, token usage, execution time, and code complexity
Document which framework felt more natural for this use case and why

Exercise 4

Role Design Workshop

Design agent roles for a complete AI-powered code review pipeline:

Define 4-5 agents with distinct, non-overlapping roles using the role card template from Section 5.1
Specify the communication protocol between each pair of agents
Define conflict resolution strategy when the security reviewer and performance reviewer disagree
Add a human-in-the-loop checkpoint before any code is approved for merge
Estimate the token cost per code review assuming GPT-4 pricing

Exercise 5

Reflective Questions

Why does the supervisor pattern tend to produce more consistent outputs than the swarm pattern? What are the trade-offs?
How would you handle a situation where agents in a debate system start agreeing too quickly without genuine critical analysis?
What are the cost implications of a 5-agent hierarchical system vs a 3-agent supervisor system for the same task?
How do you decide the optimal number of agents for a given task? What is the minimum and maximum you would recommend?
Compare the concept of "agent roles" in AI systems to roles in a human team. What translates well and what does not?

Conclusion & Next Steps

You have now seen multi-agent AI systems from fundamental concepts through to LangGraph implementations of the main orchestration patterns. Here are the key takeaways from Part 10:

Multi-agent systems solve problems that single agents cannot — they enable specialization, quality verification through adversarial review, parallelism, and distributed context management
Three frameworks dominate — AutoGen excels at conversational coding tasks, CrewAI makes role-based team simulation intuitive, and LangGraph provides maximum flexibility for custom orchestration
Four core patterns — Supervisor (centralized control), Swarm (decentralized handoffs), Debate (adversarial improvement), and Hierarchical (tree-structured delegation) each serve different use cases
Role design principles — Single responsibility, clear boundaries, complementary skills, appropriate granularity, and explicit communication protocols are essential
Conflict resolution — Supervisor override, majority voting, confidence weighting, and structured debate provide mechanisms for handling agent disagreements
Watch for anti-patterns — Too many agents, overlapping roles, missing termination conditions, shared context pollution, and no human escape hatch are common pitfalls

Next in the Series

In Part 11: AI Application Design Patterns, we catalog the complete set of proven design patterns for AI applications — from simple prompt-response to complex multi-agent orchestration. You will learn which pattern to apply for each use case and which framework implements each pattern best.
