
AI Agents & Agentic Workflows

March 30, 2026 · Wasil Zafar · 33 min read

From simple tool-calling to fully autonomous multi-agent systems — learn how modern AI agents reason, plan, remember, and act. Master LangChain, AutoGen, and production agentic patterns that power the next generation of AI applications.

Table of Contents

  1. Introduction to AI Agents
  2. Tool Use & Function Calling
  3. Planning & ReAct
  4. Agent Memory Systems
  5. Multi-Agent Systems
  6. Agent Frameworks Compared
  7. Production Considerations
  8. Exercises

Introduction: The Age of AI Agents

Series Context: This is Part 13 of 24 in the AI in the Wild series. Parts 1–12 covered foundations, training, RAG, fine-tuning, safety, and MLOps. Now we turn to agents — AI systems that don't just respond but actively plan and act.

AI in the Wild

Your 24-step learning path • Currently on Step 13

  1. Series Introduction: Why AI in the wild matters
  2. LLM Foundations: Transformers, tokenization, prompting
  3. Prompt Engineering: Few-shot, chain-of-thought, templates
  4. RAG Systems: Retrieval-augmented generation
  5. Fine-Tuning LLMs: LoRA, QLoRA, PEFT
  6. Embeddings & Vector DBs: Semantic search, FAISS, Pinecone
  7. Evaluation & Testing: RAGAS, benchmarks, red-teaming
  8. AI Safety & Alignment: RLHF, Constitutional AI, guardrails
  9. MLOps for LLMs: CI/CD, monitoring, drift detection
  10. Multimodal AI: Vision-language, audio, video
  11. AI Infrastructure: GPU clusters, serving, quantization
  12. Production LLM APIs: OpenAI, Anthropic, Gemini at scale
  13. AI Agents & Agentic Workflows: Tool use, planning, multi-agent systems (You Are Here)
  14. AI in Healthcare: Medical imaging, clinical NLP, drug discovery
  15. AI in Finance: Fraud detection, credit scoring, trading
  16. AI in Legal & Compliance: Contract analysis, regulatory AI
  17. AI in Education: Personalized learning, tutors
  18. AI in Manufacturing: Predictive maintenance, quality control
  19. AI Ethics & Fairness: Bias, explainability, governance
  20. Generative AI & Creativity: DALL-E, Sora, creative workflows
  21. AI & Edge Computing: On-device inference, TinyML
  22. Future of AI: AGI timelines, frontier models
  23. Building AI Products: PM for AI, user research, iteration
  24. AI Career Paths: Roles, skills, interview prep

An AI agent is an AI system that perceives its environment, reasons about what to do, takes actions using tools, and iterates toward a goal — without requiring a human in the loop for every step. The shift from LLMs-as-chatbots to LLMs-as-agents represents one of the most consequential developments in applied AI.

The Key Difference: A chatbot answers a question in one shot. An agent observes, plans, acts, observes again, and continues until the task is done. Agents are loops; chatbots are single turns.

What Makes Something an Agent?

Agents have four capabilities that distinguish them from simple LLM applications:

  • Tool Use: The ability to call external functions — search engines, calculators, APIs, code interpreters, databases.
  • Planning: Breaking complex goals into subtasks and sequencing them logically.
  • Memory: Retaining information across steps — either in context (short-term) or in a vector store (long-term).
  • Autonomy: Deciding which actions to take based on observations, without explicit per-step human instruction.
Real-World Example

Devin: The AI Software Engineer

Cognition's Devin agent can read a GitHub issue, plan an implementation, write code across multiple files, run tests, fix failures, and open a pull request — all without human intervention. It uses a code editor, terminal, browser, and memory as tools, and loops until the CI passes.

This is qualitatively different from GitHub Copilot completing a single line. Devin is an autonomous agent; Copilot is an AI-assisted autocomplete. Both are valuable, but they operate at different levels of the stack.

The Agent Loop

Every agent — from the simplest tool-caller to the most sophisticated multi-agent system — runs a variation of the same loop:

OBSERVE → THINK → ACT → OBSERVE → THINK → ACT → ... → DONE

In practice:

  1. Observe: Receive the task + any new information from the environment (tool outputs, user messages).
  2. Think: The LLM reasons about what to do next. This may include explicit chain-of-thought, planning, or self-critique.
  3. Act: Call a tool, generate a response, update memory, or hand off to another agent.
  4. Repeat: Feed the tool output back to the LLM and continue until the stopping condition is met.
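The loop above can be sketched in a few lines of Python. The `llm_think` and `run_tool` functions below are stubs standing in for a real LLM call and tool executor; the loop structure is what matters:

```python
def llm_think(observations: list[str]) -> dict:
    """Stub for a real LLM call: returns either a tool call or a final answer."""
    if any("Observation:" in o for o in observations):
        return {"type": "final_answer", "content": observations[-1]}
    return {"type": "tool_call", "tool": "echo", "args": {"text": "hello"}}

def run_tool(name: str, args: dict) -> str:
    """Stub tool executor."""
    return f"{name} returned {args}"

def run_agent(task: str, max_steps: int = 10) -> str:
    """Minimal agent loop: OBSERVE -> THINK -> ACT until done or budget exhausted."""
    observations = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = llm_think(observations)                     # THINK
        if decision["type"] == "final_answer":
            return decision["content"]                         # DONE
        result = run_tool(decision["tool"], decision["args"])  # ACT
        observations.append(f"Observation: {result}")          # OBSERVE
    return "Stopped: max_steps reached."

print(run_agent("demo task"))
```

Note the `max_steps` guard: every production agent loop needs a hard stopping condition, a point revisited in the Production Considerations section.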

Tool Use & Function Calling

Tool use is the foundation of agentic behavior. Without tools, an LLM can only reason over information in its context window — it cannot browse the web, run code, query a database, or call an API. Tools are what connect the LLM's reasoning to the real world.

How Tool Calling Works

Modern LLMs (GPT-4o, Claude 3.5, Gemini 1.5) natively support structured tool calling:

  1. You describe available tools as JSON schemas (name, description, parameters).
  2. The LLM decides whether to call a tool, and if so, which one and with what arguments.
  3. Your code executes the tool and returns results.
  4. The LLM reads the result and decides whether to call another tool or produce a final answer.
Critical Insight — Tool Descriptions Matter: The LLM chooses which tool to call based entirely on your descriptions. A poorly described tool will be misused or ignored. Write tool descriptions as if you're explaining the function to a smart but literal engineer who has never seen your codebase.
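Concretely, step 1's tool description is just a JSON schema. In the OpenAI-style format (field names vary slightly across providers), a stock-price tool might be declared as:

```python
get_stock_price_schema = {
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": (
            "Get the current stock price for a ticker symbol. "
            "Use when the user asks about a specific stock's price."
        ),
        "parameters": {
            # Parameters are described with standard JSON Schema
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, uppercase, e.g. 'NVDA'.",
                }
            },
            "required": ["ticker"],
        },
    },
}
```

Frameworks like LangChain generate this schema for you from a decorated function's signature and docstring, which is why docstring quality directly affects tool selection.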

LangChain Tool-Using Agent (Code Example 1)

LangChain's create_tool_calling_agent makes it straightforward to build agents that pick from a toolkit based on the task at hand. The example below builds a financial research agent that can search the web, do arithmetic, and look up stock prices.

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate

# Define tools the agent can use
@tool
def search_web(query: str) -> str:
    """Search the web for current information about a topic."""
    # In production: integrate Tavily, Serper, or Brave Search API
    return f"Web search results for '{query}': [simulated results]"

@tool
def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression, e.g. '189.50 / 6.25'."""
    try:
        # Restricting globals blocks most builtins, but eval is never fully safe
        # on untrusted input -- prefer a dedicated math parser in production
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a given ticker symbol."""
    # In production: integrate Alpha Vantage, Yahoo Finance, etc.
    prices = {"AAPL": 189.50, "MSFT": 415.20, "GOOGL": 175.30, "NVDA": 875.00}
    price = prices.get(ticker.upper())
    if price is None:
        return f"Error: ticker '{ticker}' not found"
    return f"{ticker.upper()}: ${price}"

# Create the agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search_web, calculate, get_stock_price]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful financial research assistant. Use tools to gather data before answering."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}")  # tool call/result history
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=5)

result = executor.invoke({"input": "Compare the stock prices of Apple and NVIDIA. Which has a higher P/E ratio if Apple's EPS is $6.25 and NVIDIA's is $1.95?"})
print(result["output"])
# Agent plan: get_stock_price(AAPL) -> get_stock_price(NVDA) -> calculate P/E ratios -> compare

Anatomy of a Tool Call Trace

With verbose=True, you can see the agent's reasoning. A typical trace looks like:

> Entering new AgentExecutor chain...
Thought: I need to get the stock prices first.
Action: get_stock_price
Action Input: {"ticker": "AAPL"}
Observation: AAPL: $189.50

Action: get_stock_price
Action Input: {"ticker": "NVDA"}
Observation: NVDA: $875.00

Thought: Now I can calculate P/E ratios.
Action: calculate
Action Input: {"expression": "189.50 / 6.25"}
Observation: 30.32

Action: calculate
Action Input: {"expression": "875.00 / 1.95"}
Observation: 448.72

Final Answer: Apple's P/E is 30.32x vs NVIDIA's 448.72x. NVIDIA trades at a dramatically higher multiple, reflecting expectations of explosive AI-driven earnings growth.
Best Practice

Tool Design Principles

  • One responsibility: Each tool does exactly one thing. Don't combine search + summarize into one tool.
  • Rich descriptions: Include what the tool does, when to use it, and what it returns.
  • Typed parameters: Use Pydantic models for complex inputs — the LLM follows schemas more reliably.
  • Idempotent where possible: Avoid tools with side effects unless necessary (e.g., write-to-database).
  • Error messages matter: Return descriptive errors — the agent will try to recover based on what you return, so "Error: ticker 'APPL' not found, did you mean 'AAPL'?" is far more recoverable than a bare stack trace.
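To illustrate the "typed parameters" principle, a Pydantic input schema might look like the sketch below. The `args_schema` usage in the comment follows LangChain's `@tool` decorator; adapt it to whatever framework you use:

```python
from pydantic import BaseModel, Field

class StockQuery(BaseModel):
    """Typed input schema for a stock-lookup tool."""
    ticker: str = Field(description="Stock ticker symbol, e.g. 'AAPL'")
    currency: str = Field(default="USD", description="ISO currency code for the price")

# LangChain-style usage -- @tool accepts a Pydantic class via args_schema:
# @tool(args_schema=StockQuery)
# def get_stock_price(ticker: str, currency: str = "USD") -> str: ...

q = StockQuery(ticker="NVDA")   # defaults fill in; missing fields raise a ValidationError
print(q.ticker, q.currency)
```

The framework converts the model into the JSON schema the LLM sees, so field descriptions double as instructions to the model.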

Planning & Reasoning Patterns

Raw tool calling is reactive. For complex tasks, agents need to plan — to decompose goals into steps before executing them. Several reasoning patterns have emerged as reliable approaches.

ReAct: Reason + Act

The ReAct pattern (Yao et al., 2022) interleaves reasoning traces with action calls. Before each tool call, the agent explicitly writes its reasoning in natural language. This improves performance on complex tasks and makes agent behavior auditable.

Pattern: ReAct

ReAct vs. Direct Action

Direct Action (brittle): Task → Tool Call → Answer. No explicit reasoning, hard to debug, misses multi-step dependencies.

ReAct (robust): Task → Thought → Tool Call → Observation → Thought → Tool Call → ... → Final Answer. Each step is justified and the agent can course-correct based on observations.

Models with native tool calling, such as GPT-4o and Claude 3.5 Sonnet, fit the ReAct pattern naturally: prompt the model to reason before acting, and the "thought" text appears in the response ahead of each tool call.
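When a model lacks native tool calling, ReAct can still be driven with a plain-text prompt plus a parser. A minimal sketch, reusing the tool names from the earlier example (the regex parser is a deliberate simplification):

```python
import re

REACT_PROMPT = """Answer the question using the available tools. Use this format exactly:

Thought: reason about what to do next
Action: one of [search_web, calculate, get_stock_price]
Action Input: JSON arguments for the action
Observation: (the tool result is inserted here by your code)
... (Thought/Action/Observation repeat as needed)
Thought: I now know the final answer
Final Answer: the answer to the question

Question: {question}
{scratchpad}"""

def parse_action(llm_output: str):
    """Extract (action, action_input) from a ReAct-formatted completion, or None."""
    m = re.search(r"Action:\s*([^\n]+)\n\s*Action Input:\s*([^\n]+)", llm_output)
    return (m.group(1).strip(), m.group(2).strip()) if m else None
```

Your loop calls the model with the prompt, parses out the action, runs the tool, appends `Observation: ...` to the scratchpad, and repeats until `Final Answer:` appears.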

Plan-and-Execute

For tasks requiring many steps, Plan-and-Execute separates the planning phase from execution. A "planner" LLM creates a task list upfront; an "executor" works through each step, potentially replanning if a step fails.

from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner

planner = load_chat_planner(llm)
executor = load_agent_executor(llm, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)

# The agent will:
# 1. Create a numbered plan: ["Step 1: Search for...", "Step 2: Calculate...", ...]
# 2. Execute each step, passing outputs to subsequent steps
# 3. Synthesize a final answer from all step outputs
agent.run("Research the top 3 AI chip manufacturers, compare their 2025 revenue, and predict market share in 2027")

Reflection & Self-Critique

Reflection agents evaluate their own outputs and iteratively improve them. This pattern is especially powerful for creative tasks, code generation, and research summaries.

REFLECTION_PROMPT = """
You wrote the following response:
<response>{response}</response>

Critique this response. What is missing? What could be more accurate or clearer?
Then rewrite an improved version addressing your critique.
"""
# Typical improvement: 2-3 reflection cycles yield significantly better outputs
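A reflection cycle is just a loop around a critique prompt. The sketch below uses a shortened stand-in prompt and a fake LLM callable so it runs standalone; swap in a real model client in practice:

```python
REFLECTION_PROMPT = ("You wrote: <response>{response}</response>. "
                     "Critique it, then rewrite an improved version.")

def reflect(llm, draft: str, cycles: int = 2) -> str:
    """Run critique-and-rewrite cycles. `llm` is any prompt -> text callable."""
    response = draft
    for _ in range(cycles):
        response = llm(REFLECTION_PROMPT.format(response=response))
    return response

# Demo with a fake LLM that appends a marker on each rewrite:
fake_llm = lambda p: p.split("<response>")[1].split("</response>")[0] + " [revised]"
print(reflect(fake_llm, "First draft", cycles=2))  # First draft [revised] [revised]
```

Cap the cycle count: beyond two or three iterations, quality gains typically flatten while cost keeps climbing.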

Agentic Design Patterns Comparison

Different patterns suit different task types. The table below maps patterns to their optimal use cases and associated risks:

| Pattern | Description | Use Case | Risk | Example System |
|---|---|---|---|---|
| ReAct | Interleave reasoning traces with tool calls | Multi-hop QA, research tasks | Verbose prompts, higher latency | Perplexity AI, Bing Copilot |
| Plan-and-Execute | Create full plan upfront, execute step-by-step | Complex workflows, project automation | Brittle if plan is wrong early | Devin, OpenDevin |
| Reflection | Evaluate and refine own outputs iteratively | Code generation, writing, analysis | Higher cost, may over-iterate | GPT-4 with self-critique |
| Multi-Agent Debate | Multiple agents argue different positions, synthesize | Fact-checking, complex decisions | Expensive, may amplify hallucinations | MetaGPT, Society of Mind |
| Supervisor-Worker | Orchestrator delegates to specialist sub-agents | Large-scale task decomposition | Supervisor bottleneck, coordination overhead | CrewAI, LangGraph |
| Tool-Augmented RAG | Agent decides whether to retrieve or call tools | Customer support, knowledge Q&A | Routing errors between retrieval and action | Salesforce Einstein, ServiceNow |

Agent Memory Systems

Memory is what transforms a stateless LLM into an agent that learns and adapts. Without memory, every conversation starts from zero — the agent cannot recall user preferences, past decisions, or previously gathered facts.

Types of Agent Memory

Memory Taxonomy

Four Memory Types

  • In-Context Memory (Working): The current conversation history within the context window. Fast but bounded — typically 8K–200K tokens.
  • External Memory (Episodic): Vector store of past conversations and experiences, retrieved via semantic search. Effectively unlimited.
  • Semantic Memory (Knowledge): A structured knowledge base or RAG corpus of facts about the world. Retrieved when needed.
  • Procedural Memory (Skills): Encoded in the model's weights via fine-tuning or in reusable tool/prompt templates. Always available but not updatable at runtime.

Vector-Based Episodic Memory (Code Example 2)

The following class implements a FAISS-backed episodic memory store that lets agents remember past conversations and retrieve relevant context using semantic similarity.

from sentence_transformers import SentenceTransformer
import faiss
from datetime import datetime

class AgentMemory:
    """Vector-based episodic memory for AI agents."""
    def __init__(self, dim: int = 384):
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        # Inner product on normalized embeddings == cosine similarity
        self.index = faiss.IndexFlatIP(dim)
        self.memories = []

    def store(self, content: str, metadata: dict | None = None):
        embedding = self.embedder.encode([content], normalize_embeddings=True)
        self.index.add(embedding.astype('float32'))
        self.memories.append({"content": content,
                              "timestamp": datetime.now().isoformat(),
                              **(metadata or {})})

    def retrieve(self, query: str, k: int = 3) -> list[dict]:
        if self.index.ntotal == 0:
            return []
        q_emb = self.embedder.encode([query], normalize_embeddings=True).astype('float32')
        scores, indices = self.index.search(q_emb, min(k, self.index.ntotal))
        return [{"memory": self.memories[i], "relevance": float(scores[0][j])}
                for j, i in enumerate(indices[0]) if i != -1]

# Usage: agent remembers past conversations
memory = AgentMemory()
memory.store("User prefers Python over JavaScript for data science tasks", {"type": "preference"})
memory.store("Previous analysis showed NVIDIA stock outperforming AMD by 40% in 2024", {"type": "fact"})

relevant = memory.retrieve("What programming language does the user prefer?")
print(relevant[0]["memory"]["content"])  # -> "User prefers Python..."

Memory Management Strategies

As agents accumulate memory, several challenges emerge:

  • Recency Bias: Always injecting the most recent context may miss important older facts. Use hybrid scoring: relevance + recency.
  • Memory Summarization: Periodically compress older memories. GPT-4 can summarize 20 past conversations into a 500-word profile.
  • Memory Isolation: In multi-user systems, each user's memories must be namespaced — use metadata filtering on the vector index.
  • Forgetting: Not all memories are worth keeping. Implement TTL-based expiry for time-sensitive facts.
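The hybrid relevance-plus-recency scoring mentioned above can be an exponential time decay blended with the similarity score. A sketch, assuming the naive ISO timestamps stored by the `AgentMemory` class above (the decay rate and 0.7 weighting are illustrative defaults to tune):

```python
from datetime import datetime

def hybrid_score(relevance: float, timestamp: str,
                 decay_per_hour: float = 0.995, weight: float = 0.7) -> float:
    """Blend semantic relevance with an exponential recency decay."""
    age_hours = (datetime.now() - datetime.fromisoformat(timestamp)).total_seconds() / 3600
    recency = decay_per_hour ** age_hours  # 1.0 for a fresh memory, ~0.89 after 24h
    return weight * relevance + (1 - weight) * recency

# Re-rank retrieved memories by the blended score instead of raw relevance:
hits = [{"relevance": 0.9, "timestamp": datetime.now().isoformat()}]
ranked = sorted(hits, key=lambda h: hybrid_score(h["relevance"], h["timestamp"]),
                reverse=True)
```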
Production Memory Stacks: Mem0, Zep, and Letta are purpose-built memory layers for AI agents in production. They handle persistence, search, summarization, and user-level isolation so you don't have to build it yourself.

Multi-Agent Systems

Single agents have limits. They can lose context on very long tasks, lack specialization, and may hallucinate without a second opinion. Multi-agent systems address these limitations by having several specialized agents collaborate — each doing what it does best.

Why Multi-Agent?

Design Motivation

When Single Agents Break Down

  • Context limits: A task requiring 200+ pages of analysis exceeds any single context window. Split across agents.
  • Specialization: A "researcher" agent trained on browsing is better at search than a "coder" agent. Specialize roles.
  • Verification: A single agent checking its own work catches ~60% of errors. A separate critic agent catches ~85% (empirical estimate from AutoGen paper).
  • Parallelism: Tasks with independent subtasks run faster in parallel agents than sequentially in one agent.

AutoGen Multi-Agent System (Code Example 3)

Microsoft's AutoGen framework makes it straightforward to build group chats where agents with distinct personas collaborate on complex tasks.

import autogen

# Multi-agent research team: planner + researcher + critic + executor
config_list = [{"model": "gpt-4o", "api_key": "..."}]
llm_config = {"config_list": config_list, "temperature": 0.1}

planner = autogen.AssistantAgent(
    name="Planner",
    system_message="""Break complex tasks into steps. Assign each step to the right specialist.
    Always create a plan before work begins. Format: PLAN: step1, step2, step3""",
    llm_config=llm_config
)

researcher = autogen.AssistantAgent(
    name="Researcher",
    system_message="Gather information and data. Cite sources. Focus on facts, not opinions.",
    llm_config=llm_config
)

critic = autogen.AssistantAgent(
    name="Critic",
    system_message="Review outputs for accuracy, completeness, and logical consistency. Point out flaws.",
    llm_config=llm_config
)

executor = autogen.UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",  # fully automated
    code_execution_config={"work_dir": "workspace", "use_docker": False},
    max_consecutive_auto_reply=5
)

# Group chat: agents collaborate to solve the task
groupchat = autogen.GroupChat(
    agents=[planner, researcher, critic, executor],
    messages=[], max_round=12
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
executor.initiate_chat(manager, message="Analyze the competitive landscape of LLM providers in 2025. Include market share estimates and key differentiators.")

Multi-Agent Orchestration Patterns

Architecture Pattern

Common Topologies

  • Pipeline: Agent A → Agent B → Agent C. Each agent processes the output of the previous. Simple but no parallelism.
  • Supervisor-Worker: A central orchestrator delegates subtasks to worker agents and aggregates results. Most common in production.
  • Peer-to-Peer: Agents communicate directly. Flexible but harder to reason about and debug.
  • Debate: Multiple agents argue for different conclusions; a judge synthesizes. High quality, high cost.
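Stripped of framework machinery, the supervisor-worker topology is a routing step plus aggregation. A minimal sketch, where the keyword-based `route` function stands in for what would be an LLM routing call in production:

```python
def research_worker(subtask: str) -> str:
    return f"[research] findings for: {subtask}"

def coding_worker(subtask: str) -> str:
    return f"[code] implementation for: {subtask}"

WORKERS = {"research": research_worker, "code": coding_worker}

def route(subtask: str) -> str:
    """Stub router; in production this is an LLM call choosing a specialist."""
    verbs = ("implement", "write", "fix")
    return "code" if any(v in subtask.lower() for v in verbs) else "research"

def supervisor(task: str, subtasks: list[str]) -> str:
    """Delegate each subtask to a specialist worker and aggregate the results."""
    results = [WORKERS[route(s)](s) for s in subtasks]
    return f"Task: {task}\n" + "\n".join(results)

print(supervisor("Ship a market report tool",
                 ["Gather 2025 LLM market data", "Implement the report generator"]))
```

Because each subtask call is independent, the list comprehension can be swapped for a thread pool or async gather to get the parallelism benefit described above.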
Prompt Injection in Multi-Agent Systems: When agents read content from the web or user-uploaded files, malicious content can hijack the agent's behavior by embedding instructions. Always sanitize inputs and use bounded tool permissions. This is one of the most serious security challenges in production agentic systems.

Agent Frameworks Compared

The agent framework ecosystem has grown rapidly since 2023. Choosing the right framework depends on your use case, team's Python expertise, and whether you need multi-agent support out of the box.

| Framework | Creator | Agent Type | Tool Integration | Multi-Agent | Python/JS | Best For |
|---|---|---|---|---|---|---|
| LangChain | LangChain Inc. | ReAct, tool-calling | Excellent (200+ integrations) | LangGraph | Both | General-purpose RAG + agents |
| AutoGen | Microsoft Research | Conversational, group chat | Good (custom tools) | Native (group chat) | Python | Research, code generation teams |
| CrewAI | CrewAI Inc. | Role-based crews | Good (LangChain tools) | Native (crews) | Python | Business process automation |
| Semantic Kernel | Microsoft | Planner, function calling | Excellent (plugins) | Agent framework | Both + C# | Enterprise .NET + Python apps |
| LlamaIndex | LlamaIndex Inc. | ReAct, structured | Excellent (data-focused) | Multi-agent beta | Both | Document Q&A, data analysis |
Decision Guide

Framework Selection Heuristics

  • Building a customer-facing product with RAG + agents? → LangChain + LangGraph
  • Academic/research multi-agent experiments? → AutoGen
  • Business workflow automation with clear roles? → CrewAI
  • Enterprise .NET shop extending existing apps? → Semantic Kernel
  • Heavy document analysis, financial data? → LlamaIndex
  • Full control, production hardening, no abstraction overhead? → Build on raw API + your own orchestration

Production Considerations

Deploying agents in production is fundamentally different from running demos. Agents that work perfectly in testing can fail in unpredictable ways in production — looping indefinitely, calling expensive tools unnecessarily, or executing harmful actions when given malicious inputs.

Safety & Guardrails

Production Safety

Essential Guardrails

  • Max iterations: Always set max_iterations. A buggy tool can cause infinite loops that burn through your API budget in minutes.
  • Confirm before irreversible actions: Email sending, database writes, API calls with side effects should require explicit confirmation or a human-in-the-loop checkpoint.
  • Tool permissions: Apply principle of least privilege. A customer support agent should never have access to billing or admin tools.
  • Input sanitization: Treat all externally sourced content (web pages, user files) as potentially adversarial. Strip HTML, limit length, validate formats.
  • Output validation: Validate agent outputs before acting on them — especially for structured outputs like JSON or SQL.
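For the output-validation guardrail, even a small checker that refuses to act on malformed structured output goes a long way. A sketch:

```python
import json

def validate_agent_json(raw: str, required_keys: set):
    """Return the parsed dict only if it is valid JSON containing every required key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the agent wrapped its answer in prose, or emitted garbage
    if not isinstance(data, dict) or not required_keys <= data.keys():
        return None  # wrong shape or missing fields
    return data

ok = validate_agent_json('{"action": "refund", "order_id": "A123"}',
                         {"action", "order_id"})
chatty = validate_agent_json('Sure! {"action": "refund"}', {"action", "order_id"})
print(ok is not None, chatty is None)  # True True
```

On a `None` result, re-prompt the agent with the validation error rather than proceeding; for richer schemas, swap the manual checks for a Pydantic model or JSON Schema validator.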

Observability

Agents are harder to debug than standard software because their behavior is non-deterministic. Investing in observability infrastructure is essential:

  • Trace every run: Log the full reasoning trace — every thought, tool call, and observation. LangSmith, Weights & Biases Weave, and Langfuse are purpose-built for this.
  • Token usage: Track token consumption per run and per tool to identify cost hotspots.
  • Latency breakdown: Separate LLM inference time from tool execution time. Tool latency is usually the bigger issue.
  • Success/failure rates: Define task success criteria and track them. A 90% task completion rate might be acceptable; 60% is not.

Cost Management

Agents can be expensive. A single complex research task might make 10–20 LLM calls and dozens of tool calls. Strategies to control costs:

  • Use smaller models for simpler steps: Route tool selection and formatting steps to GPT-4o Mini; reserve GPT-4o for complex reasoning.
  • Cache tool results: Web search results, API responses, and database queries can be cached. Many tasks re-use the same data.
  • Prompt compression: Use LLMLingua or similar tools to compress conversation history before injecting into context.
  • Budget limits: Implement per-user and per-task token budgets with hard stops.
Real Cost Benchmark: A well-optimized GPT-4o agent handling a moderately complex research task (5–8 LLM calls) costs approximately $0.05–$0.20 per task. Without optimization, the same task can cost $1–$3. At scale, this difference is enormous.

Exercises & Practice

Building agents is a hands-on skill. Work through these exercises in order — each one reinforces a different layer of the agentic stack.

Beginner

Exercise 1: Two-Tool Agent

Create a simple tool-using agent with exactly 2 tools: a calculator and a dictionary lookup (use the Free Dictionary API or a hardcoded dict). Write 10 test questions that require using one or both tools to answer. For each question, log which tools were called and whether the agent arrived at the correct answer.

Goals: Understand tool description quality, observe the tool selection process, practice prompt engineering for tool-using agents.

Stretch: Add a third tool (currency converter) and test questions that require all three tools in sequence.

Intermediate

Exercise 2: Research Agent with Memory

Build a research agent that can: (1) search the web using the Tavily API, (2) read the full content of URLs using a browser tool, and (3) summarize findings into a structured report. Add vector-based episodic memory so the agent remembers past research sessions. Test by researching a topic in multiple sessions — the agent should reference previous findings.

Goals: Implement multi-step tool chaining, build and use vector memory, produce structured output from unstructured tool results.

Evaluation: Compare report quality (factual accuracy, completeness, citation quality) between session 1 (no prior memory) and session 3 (2 prior sessions in memory).

Advanced

Exercise 3: Multi-Agent Code Review Pipeline

Design and implement a 4-agent code review workflow:

  • Agent A (Generator): Given a spec, writes Python code to solve a programming problem.
  • Agent B (Reviewer): Reviews Agent A's code for bugs, style issues, and edge cases.
  • Agent C (Tester): Generates and runs unit tests against Agent A's code, reports failures.
  • Agent D (Improver): Reads B's review and C's test failures, produces an improved version of the code.

Run the pipeline on 10 programming challenges (LeetCode easy/medium). Track: number of iterations per problem, test pass rate after Agent D's revision vs Agent A's original, and qualitative review quality from Agent B.

Key Question: Does the multi-agent pipeline produce meaningfully better code than a single agent with self-reflection? What is the cost difference?
