
AI Application Development Mastery Part 7: Agents — Core of Modern AI Apps

April 1, 2026 • Wasil Zafar • 43 min read

Move beyond static chains to intelligent agents that reason, select tools, and take autonomous action. Master the agent decision loop, ReAct reasoning, tool-calling patterns, LangChain's AgentExecutor, agent memory strategies, error handling, and debugging techniques that separate toy demos from production-grade AI agents.

Table of Contents

  1. Agents vs Chains
  2. Tool Usage
  3. Agent Types
  4. LangChain Implementation
  5. Agent Memory
  6. Error Handling & Debugging
  7. Exercises & Self-Assessment
  8. Agent Spec Generator
  9. Conclusion & Next Steps

Introduction: From Chains to Agents

Series Overview: This is Part 7 of our 20-part AI Application Development Mastery series. We now enter the agent domain — where AI applications move from executing predefined chains to autonomously deciding what actions to take, which tools to use, and how to handle unexpected situations.

AI Application Development Mastery

Your 20-step learning path • Currently on Step 7
1
Foundations & Evolution of AI Apps
Pre-LLM era, transformers, LLM revolution
2
LLM Fundamentals for Developers
Tokens, context windows, sampling, API patterns
3
Prompt Engineering Mastery
Zero/few-shot, CoT, ReAct, structured outputs
4
LangChain Core Concepts
Chains, prompts, LLMs, tools, LCEL
5
Retrieval-Augmented Generation (RAG)
Embeddings, vector DBs, retrievers, RAG pipelines
6
Memory & Context Engineering
Buffer/summary/vector memory, chunking, re-ranking
7
Agents — Core of Modern AI Apps
ReAct, tool-calling, planner-executor agents
You Are Here
8
LangGraph — Stateful Agent Workflows
Nodes, edges, state, graph execution, cycles
9
Deep Agents & Autonomous Systems
Multi-step reasoning, self-reflection, planning
10
Multi-Agent Systems
Supervisor, swarm, debate, role-based collaboration
11
AI Application Design Patterns
RAG, chat+memory, workflow automation, agent loops
12
Ecosystem & Frameworks
LlamaIndex, Haystack, HuggingFace, vLLM
13
MCP Foundations & Architecture
Protocol design, Host/Client/Server, primitives, security
14
MCP in Production
Building servers, integrations, scaling, agent systems
15
Evaluation & LLMOps
Prompt eval, tracing, LangSmith, experiment tracking
16
Production AI Systems
APIs, queues, caching, streaming, scaling
17
Safety, Guardrails & Reliability
Input filtering, hallucination mitigation, prompt injection
18
Advanced Topics
Fine-tuning, tool learning, hybrid LLM+symbolic
19
Building Real AI Applications
Chatbot, document QA, coding assistant, full-stack
20
Future of AI Applications
Autonomous agents, self-improving, multi-modal, AI OS

In Parts 1 through 6, we built the foundations: LLM fundamentals, prompt engineering, LangChain chains, RAG pipelines, and memory systems. All of those are powerful, but they share a fundamental limitation — the developer decides the control flow at design time. A chain always executes the same sequence of steps, regardless of the input.

Agents fundamentally change this equation. An agent is a system where the LLM decides at runtime which actions to take, which tools to invoke, and in what order. The developer provides the tools and the reasoning framework; the agent figures out the rest. This is the bridge between static AI pipelines and truly intelligent, autonomous applications.

Key Insight: The difference between a chain and an agent is the difference between a GPS giving turn-by-turn directions (chain) and a human driver who can reroute around accidents, stop for gas, and make judgment calls (agent). Both get you to the destination, but only one can handle the unexpected.

1. Agents vs Chains

Understanding the precise distinction between agents and chains is critical. It is not a matter of complexity — it is a matter of who controls the execution flow.

1.1 Chain Limitations

Chains (covered in Part 4) follow a predetermined sequence. Consider a customer support chain:

# pip install langchain-openai langchain-core
# A chain: fixed sequence, developer-controlled flow

import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# Set your API key via environment variable before running:
# export OPENAI_API_KEY="sk-..."

# This chain ALWAYS: classify → retrieve → respond
# It cannot decide to skip retrieval or call an external API

classify_prompt = ChatPromptTemplate.from_template(
    "Classify this customer query into one of: billing, technical, general.\n"
    "Query: {query}\nCategory:"
)

response_prompt = ChatPromptTemplate.from_template(
    "You are a support agent. The query category is {category}.\n"
    "Relevant docs: {docs}\n"
    "Customer query: {query}\n"
    "Provide a helpful response."
)

llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Fixed pipeline — classify, then respond; no runtime decision-making.
# The dict below coerces to a RunnableParallel that feeds response_prompt.
classify_chain = classify_prompt | llm | StrOutputParser()

chain = (
    {
        "category": classify_chain,
        "query": lambda x: x["query"],
        "docs": lambda x: "(static FAQ snippet would be retrieved here)",
    }
    | response_prompt
    | llm
    | StrOutputParser()
)

# Run the chain
result = chain.invoke({"query": "Cancel my subscription"})
print(result)

# Problem: What if the customer says "Cancel my subscription"?
# The chain cannot look up their account, check billing status,
# or actually perform the cancellation — it can only generate text.

The chain above works for simple question-answering, but it cannot take action. When a customer says "Cancel my subscription," the chain can only generate text about cancellation — it cannot actually check their account, verify their identity, or process the cancellation.

1.2 What Makes an Agent

An agent has three defining characteristics that set it apart from a chain:

Characteristic | Chain | Agent
Control Flow | Developer-defined at design time | LLM decides at runtime
Tool Selection | Fixed tools in fixed order | Chooses which tools to use (or none)
Iteration | Single pass (or fixed number of passes) | Loops until the task is complete
Error Recovery | Fails or follows predefined fallback | Can reason about errors and retry differently
Observation | Does not observe tool outputs to decide next step | Observes each tool output and reasons about what to do next

1.3 The Agent Decision Loop

Every agent, regardless of implementation, follows this fundamental loop:

# The universal agent decision loop (pseudocode)
# This illustrates the concept — not meant to be run directly.
# See Section 4 for runnable LangChain implementations.

def execute_tool(tool_name, tool_args, tools):
    """Look up and invoke the named tool (placeholder)."""
    for t in tools:
        if t.name == tool_name:
            return t.invoke(tool_args)
    return f"Tool '{tool_name}' not found."

def agent_loop(llm, user_input, tools, max_iterations=10):
    """
    The fundamental loop that powers ALL agent architectures.

    1. THINK   — LLM reasons about the current state
    2. ACT     — LLM selects and invokes a tool (or returns final answer)
    3. OBSERVE — Agent captures tool output
    4. REPEAT  — Back to THINK with new observation
    """

    messages = [{"role": "user", "content": user_input}]

    for i in range(max_iterations):
        # THINK: LLM decides what to do
        response = llm.invoke(messages, tools=tools)

        # Check if the agent wants to use a tool
        if response.tool_calls:
            # The assistant's tool-call message must precede the tool results
            messages.append(response)
            for tool_call in response.tool_calls:
                # ACT: Execute the selected tool
                tool_name = tool_call["name"]
                tool_args = tool_call["args"]
                tool_result = execute_tool(tool_name, tool_args, tools)

                # OBSERVE: Add the result to context
                messages.append({
                    "role": "tool",
                    "content": str(tool_result),
                    "tool_call_id": tool_call["id"]
                })
        else:
            # No tool call — the agent has its final answer
            return response.content

    return "Agent reached maximum iterations without a final answer."

# Key insight: The LLM is the "brain" that decides the control flow.
# The tools are the "hands" that interact with the world.
# The loop is the "nervous system" that connects them.

Key Insight: The agent loop is deceptively simple: Think, Act, Observe, Repeat. But the power comes from the LLM's ability to reason about observations. When a database query returns no results, the agent can try a different query. When an API returns an error, the agent can adjust its approach. This adaptive behavior is what makes agents fundamentally more capable than chains.
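To see the loop mechanics without any API calls, here is a toy, runnable version of the same loop with a scripted stand-in for the LLM. `ScriptedLLM`, `FakeResponse`, and `AddTool` are illustrative names for this sketch, not LangChain classes:

```python
class FakeResponse:
    """Mimics the shape of an LLM response: text content plus tool calls."""
    def __init__(self, content="", tool_calls=None):
        self.content = content
        self.tool_calls = tool_calls or []

class ScriptedLLM:
    """Plays back a fixed script: first a tool call, then a final answer."""
    def __init__(self):
        self.turn = 0
    def invoke(self, messages, tools=None):
        self.turn += 1
        if self.turn == 1:  # THINK: decide to use the 'add' tool
            return FakeResponse(tool_calls=[
                {"name": "add", "args": {"a": 2, "b": 3}, "id": "call_1"}])
        return FakeResponse(content="2 + 3 = 5")  # THINK: done, answer

class AddTool:
    name = "add"
    def invoke(self, args):
        return args["a"] + args["b"]

def agent_loop(llm, user_input, tools, max_iterations=10):
    messages = [{"role": "user", "content": user_input}]
    tool_map = {t.name: t for t in tools}
    for _ in range(max_iterations):
        response = llm.invoke(messages, tools=tools)
        if not response.tool_calls:
            return response.content                            # final answer
        for tc in response.tool_calls:
            result = tool_map[tc["name"]].invoke(tc["args"])   # ACT
            messages.append({"role": "tool", "content": str(result),
                             "tool_call_id": tc["id"]})        # OBSERVE
    return "Agent reached maximum iterations without a final answer."

print(agent_loop(ScriptedLLM(), "What is 2 + 3?", [AddTool()]))  # 2 + 3 = 5
```

Swap `ScriptedLLM` for a real chat model with `bind_tools` and the structure is unchanged — only the "brain" gets smarter.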

2. Tool Usage

Tools are the bridge between an LLM's reasoning ability and the real world. Without tools, an LLM can only generate text. With tools, it can search the web, query databases, execute code, send emails, and interact with any API.

2.1 API Tools

API tools connect agents to external services — weather data, search engines, financial data, and any REST or GraphQL endpoint.

# pip install langchain-core langchain-tavily requests
import os
import requests
from langchain_core.tools import tool
from langchain_tavily import TavilySearch

# API keys via environment variables
# export TAVILY_API_KEY="tvly-..."
# export WEATHER_API_KEY="your-openweathermap-key"

# Built-in search tool (requires TAVILY_API_KEY env var)
search_tool = TavilySearch(
    max_results=3,
    search_depth="advanced",
    include_answer=True
)

# Custom API tool with the @tool decorator
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city. Use this when the user asks about weather conditions."""
    api_key = os.getenv("WEATHER_API_KEY")  # Set via environment variable
    url = "https://api.openweathermap.org/data/2.5/weather"
    params = {"q": city, "appid": api_key, "units": "metric"}  # requests URL-encodes these

    response = requests.get(url, params=params, timeout=10)
    if response.status_code != 200:
        return f"Error: Could not fetch weather for {city}. Status: {response.status_code}"

    data = response.json()
    return (
        f"Weather in {city}: {data['weather'][0]['description']}, "
        f"Temperature: {data['main']['temp']}C, "
        f"Humidity: {data['main']['humidity']}%, "
        f"Wind: {data['wind']['speed']} m/s"
    )

@tool
def get_stock_price(symbol: str) -> str:
    """Get the current stock price for a ticker symbol. Use for financial queries."""
    # Placeholder endpoint — substitute a real market-data API
    url = f"https://api.example.com/stock/{symbol}"
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        data = response.json()
        return f"{symbol}: ${data['price']:.2f} ({data['change']:+.2f}%)"
    return f"Error: Could not fetch stock price for {symbol}"

# Inspect tool metadata (LangChain uses this to tell the LLM about tools)
print(get_weather.name)         # "get_weather"
print(get_weather.description)  # "Get the current weather..."
print(get_weather.args_schema.model_json_schema())  # JSON schema for arguments

2.2 Database Tools

Database tools allow agents to query structured data directly, turning natural language questions into SQL queries.

# pip install langchain-community
from langchain_core.tools import tool
from langchain_community.utilities import SQLDatabase
from langchain_community.tools import (
    QuerySQLDatabaseTool,
    InfoSQLDatabaseTool,
    ListSQLDatabaseTool
)

# Connect to a database (replace with your database URI)
db = SQLDatabase.from_uri("sqlite:///company.db")

# Tool to list tables
list_tables = ListSQLDatabaseTool(db=db)

# Tool to get table schema/info
table_info = InfoSQLDatabaseTool(db=db)

# Tool to execute SQL queries
query_tool = QuerySQLDatabaseTool(db=db)

# Custom safe query tool with guardrails
@tool
def safe_sql_query(query: str) -> str:
    """Execute a READ-ONLY SQL query against the company database.
    Only SELECT statements are allowed. Never use DELETE, UPDATE, INSERT, or DROP.
    Always limit results to 50 rows maximum."""

    import re

    # Safety check — block destructive statements (word-boundary match, so
    # column names like "last_updated" are not falsely flagged)
    forbidden = ["DELETE", "UPDATE", "INSERT", "DROP", "ALTER", "TRUNCATE"]
    query_upper = query.upper().strip()

    for keyword in forbidden:
        if re.search(rf"\b{keyword}\b", query_upper):
            return f"Error: {keyword} statements are not allowed. Only SELECT queries are permitted."

    # Enforce LIMIT to prevent huge result sets
    if "LIMIT" not in query_upper:
        query = query.rstrip(";") + " LIMIT 50;"

    try:
        result = db.run(query)
        return result if result else "Query returned no results."
    except Exception as e:
        return f"SQL Error: {str(e)}. Please check your query syntax."
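The guard logic is worth testing in isolation. Here it is extracted as a plain function (no database needed), using a word-boundary regex so legitimate column names such as `last_updated` are not falsely flagged — a refinement over plain substring matching:

```python
import re

FORBIDDEN = ["DELETE", "UPDATE", "INSERT", "DROP", "ALTER", "TRUNCATE"]

def guard(query: str) -> str:
    """Return the (possibly LIMIT-augmented) query, or a 'blocked' marker."""
    q = query.upper().strip()
    for kw in FORBIDDEN:
        # \b = word boundary, so UPDATE matches but LAST_UPDATED does not
        if re.search(rf"\b{kw}\b", q):
            return f"blocked: {kw}"
    if "LIMIT" not in q:
        query = query.rstrip(";") + " LIMIT 50;"
    return query

print(guard("SELECT * FROM users"))         # SELECT * FROM users LIMIT 50;
print(guard("DROP TABLE users"))            # blocked: DROP
print(guard("SELECT last_updated FROM t"))  # passes — no false positive
```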

2.3 Code Execution Tools

Code execution tools allow agents to write and run code, enabling mathematical computations, data analysis, and complex transformations.

# pip install langchain-experimental langchain-core
from langchain_experimental.tools import PythonREPLTool
from langchain_core.tools import tool
import subprocess
import tempfile

# LangChain's built-in Python REPL tool (use with caution in production)
python_repl = PythonREPLTool()

# Custom sandboxed code execution tool
@tool
def execute_python(code: str) -> str:
    """Execute Python code and return the output. Use for calculations,
    data analysis, or any task that requires computation.
    The code runs in a sandboxed environment with numpy and pandas available."""

    import os

    # Write code to a temp file (closed before running, deleted afterwards)
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        # Prepend common imports
        full_code = "import numpy as np\nimport pandas as pd\nimport json\n\n" + code
        f.write(full_code)
        path = f.name

    try:
        result = subprocess.run(
            ['python', path],
            capture_output=True,
            text=True,
            timeout=30  # 30-second timeout
        )

        output = result.stdout
        if result.stderr:
            output += f"\nStderr: {result.stderr}"

        return output if output.strip() else "Code executed successfully (no output)."
    except subprocess.TimeoutExpired:
        return "Error: Code execution timed out (30 second limit)."
    except Exception as e:
        return f"Execution error: {str(e)}"
    finally:
        os.unlink(path)  # Clean up the temp file

# Calculator tool (simpler alternative)
@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression. Examples: '2 + 2', 'sqrt(144)', '3.14 * 5**2'.
    Supports basic arithmetic, exponents, and common math functions."""
    import math

    # Safe evaluation with limited scope
    allowed_names = {
        k: v for k, v in math.__dict__.items() if not k.startswith("__")
    }
    allowed_names.update({"abs": abs, "round": round, "min": min, "max": max})

    try:
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return f"{expression} = {result}"
    except Exception as e:
        return f"Error evaluating '{expression}': {str(e)}"
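The same restricted-eval pattern, extracted here as a plain function to show what the empty `__builtins__` mapping buys — dangerous names like `__import__` simply do not resolve:

```python
import math

# Whitelist: math functions plus a few safe builtins, nothing else
allowed = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
allowed.update({"abs": abs, "round": round, "min": min, "max": max})

def safe_eval(expression: str) -> str:
    """eval() with empty __builtins__ and a math-only namespace."""
    try:
        return str(eval(expression, {"__builtins__": {}}, allowed))
    except Exception as e:
        return f"Error: {e}"

print(safe_eval("sqrt(144)"))         # 12.0 — math functions resolve
print(safe_eval("__import__('os')"))  # Error — builtins are stripped out
```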

2.4 Building Custom Tools

The @tool decorator is the fastest path, but for complex tools, you should use the StructuredTool class or the BaseTool base class.

# pip install langchain-core pydantic
from langchain_core.tools import StructuredTool, BaseTool
from pydantic import BaseModel, Field
from typing import Optional, Type

# Method 1: StructuredTool with Pydantic schema
class SendEmailInput(BaseModel):
    to: str = Field(description="Recipient email address")
    subject: str = Field(description="Email subject line")
    body: str = Field(description="Email body content")
    cc: Optional[str] = Field(default=None, description="CC email address")

def send_email_func(to: str, subject: str, body: str, cc: Optional[str] = None) -> str:
    """Send an email to the specified recipient."""
    # In production, this would use SMTP or an email API
    recipients = f"To: {to}" + (f", CC: {cc}" if cc else "")
    return f"Email sent successfully. {recipients}, Subject: '{subject}'"

send_email_tool = StructuredTool.from_function(
    func=send_email_func,
    name="send_email",
    description="Send an email to a recipient. Use when the user asks to send or compose an email.",
    args_schema=SendEmailInput,
    return_direct=False  # True = return tool output directly without LLM processing
)

# Method 2: BaseTool subclass (full control)
class JiraTicketTool(BaseTool):
    name: str = "create_jira_ticket"
    description: str = (
        "Create a Jira ticket. Use when the user wants to create a bug report, "
        "feature request, or task in the project management system."
    )

    class JiraInput(BaseModel):
        title: str = Field(description="Ticket title/summary")
        description: str = Field(description="Detailed description")
        priority: str = Field(
            default="Medium",
            description="Priority: Low, Medium, High, Critical"
        )
        ticket_type: str = Field(
            default="Task",
            description="Type: Bug, Task, Story, Epic"
        )

    args_schema: Type[BaseModel] = JiraInput

    def _run(self, title: str, description: str,
             priority: str = "Medium", ticket_type: str = "Task") -> str:
        # In production: call Jira REST API
        ticket_id = f"PROJ-{hash(title) % 10000}"
        return (
            f"Jira ticket created: {ticket_id}\n"
            f"Type: {ticket_type} | Priority: {priority}\n"
            f"Title: {title}"
        )

    async def _arun(self, **kwargs) -> str:
        """Async version for non-blocking execution."""
        return self._run(**kwargs)
Tool Design Principles

Writing Effective Tool Descriptions

The tool description is the single most important factor in whether an agent uses a tool correctly. Follow these principles:

  1. State WHEN to use the tool — "Use when the user asks about weather conditions"
  2. State what it RETURNS — "Returns current temperature, humidity, and conditions"
  3. State LIMITATIONS — "Only works for cities, not coordinates. Max 5 requests per minute"
  4. Include EXAMPLES — "Examples: 'New York', 'London', 'Tokyo'"
  5. Describe PARAMETER FORMAT — "City name as a string, e.g., 'San Francisco' (not 'SF')"
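Putting the five principles together — a hypothetical `convert_currency` tool whose docstring states when to use it, what it returns, its limitations, the parameter format, and examples (with `@tool`, LangChain surfaces this docstring to the LLM as the description):

```python
def convert_currency(amount: float, from_code: str, to_code: str) -> str:
    """Convert an amount of money between two currencies.
    Use when the user asks to convert a price or amount into another currency.
    Returns the converted amount as a formatted string, e.g. '92.00 EUR'.
    Limitations: demo rate table only; supports USD, EUR, and GBP pairs.
    Parameter format: ISO 4217 codes as strings, e.g. 'USD' (not 'dollars').
    Examples: convert_currency(100, 'USD', 'EUR')"""
    # Demo rates — a real tool would call a rates API
    rates = {("USD", "EUR"): 0.92, ("EUR", "USD"): 1.09, ("USD", "GBP"): 0.79}
    rate = rates.get((from_code, to_code))
    if rate is None:
        return f"Error: unsupported pair {from_code}->{to_code}"
    return f"{amount * rate:.2f} {to_code}"

print(convert_currency(100, "USD", "EUR"))  # 92.00 EUR
```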

3. Agent Types

Different agent architectures use different strategies for deciding which tools to call and how to reason about their outputs. Understanding these types is essential for choosing the right approach for your application.

3.1 ReAct Agents

ReAct (Reasoning + Acting) is the foundational agent paradigm. The agent explicitly alternates between reasoning (thinking about what to do) and acting (using a tool), producing a visible chain of thought.

# ReAct agent trace — what happens internally
# User: "What is the population of France and is it larger than Germany?"

"""
Thought: I need to find the population of France and Germany, then compare them.
         Let me start with France.
Action: search
Action Input: "population of France 2024"
Observation: France has a population of approximately 68.17 million people (2024).

Thought: Good, I have France's population. Now I need Germany's population.
Action: search
Action Input: "population of Germany 2024"
Observation: Germany has a population of approximately 84.48 million people (2024).

Thought: I now have both populations.
         France: ~68.17 million
         Germany: ~84.48 million
         Germany's population is larger than France's by about 16.3 million.
         I can now give the final answer.
Final Answer: France has a population of approximately 68.17 million,
             while Germany has approximately 84.48 million.
             Germany's population is larger by about 16.3 million people.
"""

# The ReAct prompt template enforces this Thought/Action/Observation structure
REACT_PROMPT = """Answer the following questions as best you can.
You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}"""

3.2 OpenAI Function-Calling Agents

OpenAI function-calling agents use the native function-calling API, where the model outputs structured JSON specifying which function to call and with what arguments. This is more reliable than text-based ReAct parsing.

# pip install langchain-openai
# Assumes get_weather, calculator, search_tool are defined above (Section 2)
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage

# OpenAI models natively support function/tool calling
# Requires OPENAI_API_KEY environment variable
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Bind tools to the model — tells the LLM what tools are available
tools = [get_weather, calculator, search_tool]
llm_with_tools = llm.bind_tools(tools)

# The model returns structured tool calls (not text-based)
response = llm_with_tools.invoke([
    HumanMessage(content="What's the weather in Paris and what is 32C in Fahrenheit?")
])

# response.tool_calls is a structured list:
# [
#   {"name": "get_weather", "args": {"city": "Paris"}, "id": "call_abc123"},
#   {"name": "calculator", "args": {"expression": "32 * 9/5 + 32"}, "id": "call_def456"}
# ]
# Note: The model can call MULTIPLE tools in parallel!

for tc in response.tool_calls:
    print(f"Tool: {tc['name']}, Args: {tc['args']}")

# Important: When the model decides to call tools, response.content is EMPTY ('')
# and finish_reason is 'tool_calls' — the model produces structured calls, not text.
# To get a final text answer, you must:
#   1. Execute each tool call and collect results
#   2. Send results back as ToolMessage objects
#   3. Invoke the model again — it then synthesizes a text response
# The AgentExecutor (Section 4) automates this entire loop for you.

3.3 Tool-Calling Agents (Universal)

Tool-calling is the modern, provider-agnostic approach that works with OpenAI, Anthropic, Google, and other LLM providers. It supersedes the older "function-calling" terminology.

# pip install langchain-anthropic langchain-google-genai langchain-core
# Requires environment variables:
#   export ANTHROPIC_API_KEY="sk-ant-..."
#   export GOOGLE_API_KEY="AIza..."
import math
from langchain_core.tools import tool
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

# Define tools inline so this block is self-contained
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 22°C, partly cloudy"

@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    allowed = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
    allowed.update({"abs": abs, "round": round, "min": min, "max": max})
    result = eval(expression, {"__builtins__": {}}, allowed)
    return f"{expression} = {result}"

# Tool-calling works across providers — same interface, different models
# Anthropic Claude
claude = ChatAnthropic(model="claude-sonnet-4-20250514")
claude_with_tools = claude.bind_tools([get_weather, calculator])

# Google Gemini
gemini = ChatGoogleGenerativeAI(model="gemini-1.5-pro")
gemini_with_tools = gemini.bind_tools([get_weather, calculator])

# The interface is identical regardless of provider
response = claude_with_tools.invoke("What's 2^10?")
for tc in response.tool_calls:
    print(f"Tool: {tc['name']}, Args: {tc['args']}")

# Comparison: ReAct vs Function-Calling vs Tool-Calling
comparison = {
    "ReAct (text-based)": {
        "pros": "Visible reasoning, works with any LLM, interpretable",
        "cons": "Brittle text parsing, slower (verbose output), error-prone",
        "best_for": "Debugging, open-source models, educational purposes"
    },
    "Function-Calling (OpenAI)": {
        "pros": "Structured JSON output, reliable, parallel calls",
        "cons": "OpenAI-specific, less visible reasoning",
        "best_for": "OpenAI-based production systems"
    },
    "Tool-Calling (Universal)": {
        "pros": "Provider-agnostic, structured, modern standard",
        "cons": "Requires provider support (most major providers now support it)",
        "best_for": "New projects, multi-provider setups, production systems"
    }
}

for name, details in comparison.items():
    print(f"\n{name}: {details['best_for']}")

Common Mistake: Using ReAct text-based agents in production when tool-calling is available. Text-based ReAct requires fragile regex parsing of "Action:" and "Action Input:" from the LLM output. Tool-calling returns structured JSON, which is far more reliable. Always prefer tool-calling agents for production applications.
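A toy illustration of that fragility — `parse_react` below is a deliberately simplified stand-in for a text-based ReAct output parser:

```python
import re

# Simplified version of the regex a text-based ReAct parser relies on
ACTION_RE = re.compile(r"Action:\s*(.+?)\nAction Input:\s*(.+)")

def parse_react(text: str):
    """Extract (tool, input) from ReAct-formatted text, or None on failure."""
    m = ACTION_RE.search(text)
    return (m.group(1), m.group(2)) if m else None

good = "Thought: I need data.\nAction: search\nAction Input: population of France"
print(parse_react(good))    # ('search', 'population of France')

# One formatting slip from the model and parsing silently fails:
sloppy = "Thought: I need data.\nAction: search(population of France)"
print(parse_react(sloppy))  # None — the agent loop must now error-recover
```

Structured tool calls sidestep this entire failure class: the provider guarantees the tool name and arguments arrive as parsed JSON.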

4. LangChain Implementation

LangChain provides several high-level APIs for building agents. We will cover the most important ones: create_react_agent, AgentExecutor, and the modern create_tool_calling_agent.

4.1 create_react_agent

Why langchain-classic? The AgentExecutor, create_react_agent, create_tool_calling_agent, and hub.pull() were moved from the main langchain package into langchain-classic in LangChain v1.0. This separate package contains legacy agent runtimes, chains, memory, and hub integrations. Install it with pip install langchain-classic. For new agent development, LangChain recommends LangGraph (covered in Part 8).

# pip install langchain-classic langchain-openai
# Assumes search_tool, calculator, get_weather are defined above (Section 2)
from langchain_classic.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_classic import hub

# Pull the standard ReAct prompt from LangChain Hub
prompt = hub.pull("hwchase17/react")

# Create the LLM (requires OPENAI_API_KEY env var)
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Define tools (defined in Section 2 above)
tools = [search_tool, calculator, get_weather]

# Create the ReAct agent
agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=prompt
)

# Wrap in AgentExecutor (the runtime that manages the loop)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,           # Print reasoning steps
    max_iterations=10,      # Safety limit on loops
    max_execution_time=60,  # 60-second timeout
    handle_parsing_errors=True,  # Graceful error recovery
    return_intermediate_steps=True  # Include reasoning in output
)

# Run the agent
result = agent_executor.invoke({
    "input": "What is the weather in Tokyo and how does the temperature "
             "convert to Fahrenheit?"
})

print(result["output"])
# Also available: result["intermediate_steps"] — full reasoning trace

4.2 AgentExecutor Deep Dive

The AgentExecutor is the runtime that manages the agent loop. Understanding its configuration options is crucial for building reliable agents.

# pip install langchain-classic langchain-openai
# Assumes search_tool, calculator, safe_sql_query, get_weather from Sections 2.1-2.2
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Modern approach: create_tool_calling_agent (recommended for production)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant with access to tools. "
               "Always explain your reasoning before and after using tools. "
               "If a tool returns an error, explain the error and try an alternative approach."),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search_tool, calculator, safe_sql_query, get_weather]

# Create tool-calling agent (structured, reliable)
agent = create_tool_calling_agent(llm, tools, prompt)

# Configure AgentExecutor with production settings
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,

    # --- Iteration Control ---
    max_iterations=15,        # Max reasoning loops (prevents infinite loops)
    max_execution_time=120,   # Total timeout in seconds
    early_stopping_method="generate",  # "force" or "generate"
    # "force" = hard stop, "generate" = ask LLM to summarize what it has so far

    # --- Error Handling ---
    handle_parsing_errors=True,  # Catch and retry on parse errors
    # Can also pass a function: handle_parsing_errors=my_error_handler

    # --- Output Control ---
    return_intermediate_steps=True,   # Include full reasoning trace
    # trim_intermediate_steps=10,     # Keep only last N steps (memory management)
)

# Invoke with chat history for conversational agents
result = executor.invoke({
    "input": "Find the top 5 customers by revenue from last quarter",
    "chat_history": []  # Previous messages for context
})

# Access results
print("Answer:", result["output"])
print("Steps:", len(result["intermediate_steps"]))
for step in result["intermediate_steps"]:
    action, observation = step
    print(f"  Tool: {action.tool}, Input: {action.tool_input}")
    print(f"  Result: {observation[:100]}...")

4.3 Structured Tool Agent — Full Example

Let us build a complete research assistant agent that combines search, calculation, and code execution.

# pip install langchain-classic langchain-openai langchain-tavily
# Full standalone example — copy-paste and run
import os
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_tavily import TavilySearch

# API keys via environment variables
# export OPENAI_API_KEY="sk-..."
# export TAVILY_API_KEY="tvly-..."

# --- Define Tools ---
search = TavilySearch(max_results=5, search_depth="advanced")

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Supports arithmetic, exponents,
    and common math functions (sqrt, log, sin, cos, pi, e).
    Examples: '2**10', 'sqrt(144)', 'log(1000, 10)', 'pi * 5**2'"""
    import math
    safe_dict = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
    safe_dict.update({"abs": abs, "round": round})
    try:
        return str(eval(expression, {"__builtins__": {}}, safe_dict))
    except Exception as e:
        return f"Error: {e}. Please check your expression."

@tool
def run_python(code: str) -> str:
    """Execute Python code for data analysis or complex computation.
    Has access to: math, statistics, json, datetime, collections.
    Print results to see them."""
    import io, contextlib, math, statistics, json, datetime, collections
    output = io.StringIO()
    # Pre-import the promised modules. Note: exposing __builtins__ means this
    # is NOT a true sandbox — run untrusted code in a subprocess or container.
    exec_globals = {
        "math": math, "statistics": statistics, "json": json,
        "datetime": datetime, "collections": collections,
        "__builtins__": __builtins__,
    }
    try:
        with contextlib.redirect_stdout(output):
            exec(code, exec_globals)
        result = output.getvalue()
        return result if result.strip() else "Code executed (no printed output)."
    except Exception as e:
        return f"Error: {type(e).__name__}: {e}"

# --- Build Agent ---
system_prompt = """You are a research assistant with access to web search,
a calculator, and Python code execution.

Guidelines:
1. Always search for current information rather than relying on training data
2. Use the calculator for simple math, Python for complex analysis
3. Cite your sources when presenting search results
4. If a tool returns an error, explain what went wrong and try again
5. Break complex questions into smaller sub-tasks"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder("chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad")
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search, calculate, run_python]

agent = create_tool_calling_agent(llm, tools, prompt)
research_agent = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=12,
    max_execution_time=90,
    handle_parsing_errors=True,
    return_intermediate_steps=True
)

# --- Use the Agent ---
result = research_agent.invoke({
    "input": "What is the current GDP of Japan in USD? "
             "Calculate what percentage of US GDP that represents.",
    "chat_history": []
})

print(result["output"])
Architecture Pattern

Agent vs Chain Decision Framework

Use this decision framework when building new features:

  • Use a Chain when: The steps are known in advance, the workflow is linear, and you need predictable latency and cost
  • Use an Agent when: The steps depend on intermediate results, the user query could require different tools, or you need error recovery and adaptive behavior
  • Use Both: Many production systems use agents for the outer loop and chains for individual steps — an agent decides what to do, then invokes a chain for each task

5. Agent Memory

Agents need memory for two reasons: conversational continuity (remembering what the user said earlier) and task continuity (remembering what tools were called and what results were returned within a single complex task).

5.1 Conversational Agents

Conversational agents extend basic tool-using agents with multi-turn memory — they maintain a chat_history buffer that lets them understand follow-up questions, resolve pronouns ("what about its neighbor?"), and build on previous answers. Without conversation history, every user message would be interpreted in isolation, making natural dialogue impossible. The agent loop remains the same (observe → think → act), but the prompt now includes the full conversation, giving the LLM context to be a coherent conversational partner.

# pip install langchain-classic langchain-openai
# Assumes search_tool, calculator are defined above (Sections 2.1, 2.3)
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

# Prompt with chat_history placeholder for multi-turn conversations
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with tools. "
               "Use chat history to maintain context across turns."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad")
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search_tool, calculator]

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Maintain conversation history
chat_history = []

def chat(user_input: str) -> str:
    """Send a message and maintain conversation history."""
    result = executor.invoke({
        "input": user_input,
        "chat_history": chat_history
    })

    # Update history
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=result["output"]))

    return result["output"]

# Multi-turn conversation
print(chat("What's the population of Brazil?"))
# Agent searches → "Brazil has ~216 million people"

print(chat("How does that compare to its neighbor Argentina?"))
# Agent uses chat_history to understand "its neighbor" refers to Brazil
# Searches → "Argentina has ~46 million, Brazil is ~4.7x larger"

print(chat("Calculate the population density of both if Brazil is 8.5M km² and Argentina is 2.8M km²"))
# Agent uses calculator with context from previous turns

5.2 Memory Strategies for Agents

Not all conversations fit in a single context window, so agents need strategies for managing memory over long interactions. Three common approaches trade off between fidelity and token efficiency: sliding window (keep the last N messages, simple but loses early context), summary memory (use an LLM to compress older messages into a running summary), and vector memory (embed all messages and retrieve the most semantically relevant ones for each new query). Production agents often combine multiple strategies.

# pip install langchain-openai langchain-community chromadb
# Three memory strategies for managing agent conversation history

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import AIMessage
from langchain_community.vectorstores import Chroma

# Strategy 1: Sliding Window (keep last N messages)
def sliding_window_memory(chat_history, max_messages=20):
    """Keep only the most recent messages to fit context window."""
    if len(chat_history) > max_messages:
        return chat_history[-max_messages:]
    return chat_history

# Strategy 2: Summary Memory (compress old messages)

summarizer = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def summary_memory(chat_history, summary_threshold=20):
    """Summarize old messages when history gets too long."""
    if len(chat_history) <= summary_threshold:
        return chat_history

    # Summarize the older half
    old_messages = chat_history[:summary_threshold // 2]
    recent_messages = chat_history[summary_threshold // 2:]

    old_text = "\n".join([f"{m.type}: {m.content}" for m in old_messages])
    summary = summarizer.invoke(
        f"Summarize this conversation concisely:\n{old_text}"
    )

    return [AIMessage(content=f"[Summary of earlier conversation: {summary.content}]")] + recent_messages

# Strategy 3: Relevant Memory (retrieve relevant past context)
# Uses Chroma and OpenAIEmbeddings imported above

class VectorMemory:
    """Store all messages in a vector store, retrieve relevant ones."""

    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        self.store = Chroma(embedding_function=self.embeddings)
        self.all_messages = []

    def add_message(self, role: str, content: str):
        self.all_messages.append({"role": role, "content": content})
        self.store.add_texts(
            texts=[content],
            metadatas=[{"role": role, "index": len(self.all_messages) - 1}]
        )

    def get_relevant_history(self, query: str, k: int = 5):
        """Retrieve the most relevant past messages for the current query."""
        docs = self.store.similarity_search(query, k=k)
        return [d.page_content for d in docs]
Key Insight: Agent memory is more complex than chain memory because agents produce intermediate reasoning steps (tool calls, observations) in addition to user/assistant messages. You need to decide whether to include these intermediate steps in the memory or only the final answers. Including intermediate steps gives the agent more context for future turns but consumes more tokens. A common production pattern is to summarize intermediate steps into a concise result before adding to memory.
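A sketch of that summarization pattern: condense the tool calls and observations from one agent run into a single note before appending it to memory. Here `summarize_fn` stands in for an LLM call (e.g. `lambda t: llm.invoke(t).content`), and the message format is an assumption, not a LangChain convention:

```python
def compress_steps(intermediate_steps, summarize_fn, max_obs_chars=300):
    """Condense an agent run's (action, observation) pairs into one short note.

    intermediate_steps: as returned by an AgentExecutor created with
        return_intermediate_steps=True — a list of (AgentAction, observation).
    summarize_fn: any str -> str callable; in production, an LLM call.
    """
    lines = []
    for action, observation in intermediate_steps:
        obs = str(observation)[:max_obs_chars]  # truncate noisy tool output
        lines.append(f"- {action.tool}({action.tool_input!r}) -> {obs}")
    transcript = "Tool activity this turn:\n" + "\n".join(lines)
    return summarize_fn(transcript)

# Usage (hypothetical): store the compact note instead of the raw scratchpad
# note = compress_steps(result["intermediate_steps"],
#                       lambda t: llm.invoke(f"Summarize in one sentence:\n{t}").content)
# chat_history.append(AIMessage(content=f"[Tool summary: {note}]"))
```

The agent keeps enough context to answer follow-ups about what it did, at a fraction of the token cost of replaying every observation.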

6. Error Handling & Debugging

Agents are inherently less predictable than chains. They can enter infinite loops, call wrong tools, misinterpret observations, or generate invalid tool inputs. Robust error handling and debugging are essential.

6.1 Common Agent Failures

Agents fail in predictable ways, and understanding these failure modes is the first step to building robust systems. The five most common failures are: infinite loops (the agent keeps calling the same tool), wrong tool selection (choosing a tool that can’t answer the query), invalid arguments (passing malformed inputs to tools), hallucinated tool names (inventing tools that don’t exist), and context overflow (accumulating so much tool output that the context window overflows). The implementation below demonstrates a create_safe_tool wrapper that adds retry logic and error boundaries around any tool.

# Common agent failure modes and their solutions

# Failure 1: Infinite Loop — Agent keeps calling the same tool
# Symptom: Agent calls search -> gets result -> searches again -> same result -> ...
# Solution: max_iterations + clear stopping criteria in the prompt

# Failure 2: Wrong Tool Selection
# Symptom: Agent uses calculator when it should use search
# Solution: Better tool descriptions, few-shot examples in prompt

# Failure 3: Invalid Tool Arguments
# Symptom: Agent passes "What is Python?" to calculator
# Solution: Pydantic schemas for tool args, handle_parsing_errors=True

# Failure 4: Hallucinated Tool Names
# Symptom: Agent tries to call "web_browser" which doesn't exist
# Solution: List available tools explicitly in the system prompt

# Failure 5: Context Window Overflow
# Symptom: Agent accumulates too many observations and hits token limit
# Solution: trim_intermediate_steps, summary memory, observation truncation

# Production error handling wrapper
import functools
from langchain_core.tools import tool

def create_safe_tool(func, max_retries=2):
    """Wrap a tool function with retry logic, then register it as a tool.

    Wrap the plain function rather than monkey-patching Tool._run: BaseTool
    is a Pydantic model, so reassigning its methods is fragile. functools.wraps
    preserves the name, docstring, and signature that @tool uses to build the
    tool's schema and description.
    """
    @functools.wraps(func)
    def safe_func(*args, **kwargs):
        last_result = "Tool failed: unknown error"
        for attempt in range(max_retries + 1):
            try:
                last_result = func(*args, **kwargs)
                # Treat "Error..." strings as soft failures worth retrying
                if last_result and not str(last_result).startswith("Error"):
                    return last_result
            except Exception as e:
                last_result = f"Tool failed after {attempt + 1} attempt(s): {e}"
        return last_result

    return tool(safe_func)
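Failure 5 above names observation truncation as a fix, but a retry wrapper does not address it. A minimal truncation helper (the marker format is an arbitrary choice) keeps the head and tail of a long tool output, so the agent still sees both the start of a document and any trailing totals:

```python
def truncate_observation(text, max_chars: int = 1000) -> str:
    """Trim a long tool observation, keeping its head and tail."""
    text = str(text)
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    dropped = len(text) - max_chars
    return f"{text[:half]}\n...[{dropped} chars truncated]...\n{text[-half:]}"

# Apply inside any tool before returning, e.g.:
# return truncate_observation(raw_api_response, max_chars=2000)
```

Truncating at the tool boundary is cheaper than trimming the scratchpad later, because oversized observations never enter the prompt in the first place.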

6.2 Debugging Techniques

When an agent produces unexpected results, you need visibility into its reasoning chain. LangChain provides three levels of debugging: verbose mode (prints the prompt and parsed output at each step), debug mode (logs every LLM call, tool invocation, and chain event), and custom callback handlers (programmatic access to all events for custom logging, metrics, or alerting). The AgentDebugHandler below tracks step counts, tool usage, errors, and timing — essential telemetry for production agent monitoring.

# pip install langchain langchain-core
# Assumes 'executor' is an AgentExecutor from Section 4.2 above
import logging
from langchain.globals import set_verbose, set_debug

# Level 1: Verbose mode — see agent reasoning
set_verbose(True)

# Level 2: Debug mode — see every LLM call, prompt, and response
set_debug(True)

# Level 3: Custom callback handler for fine-grained logging
from langchain_core.callbacks import BaseCallbackHandler

class AgentDebugHandler(BaseCallbackHandler):
    """Custom callback handler for agent debugging."""

    def __init__(self):
        self.step_count = 0
        self.tool_calls = []
        self.errors = []

    def on_agent_action(self, action, **kwargs):
        self.step_count += 1
        self.tool_calls.append({
            "step": self.step_count,
            "tool": action.tool,
            "input": action.tool_input
        })
        print(f"\n--- Step {self.step_count} ---")
        print(f"Tool: {action.tool}")
        print(f"Input: {action.tool_input}")

    def on_tool_end(self, output, **kwargs):
        print(f"Output: {str(output)[:200]}...")

    def on_agent_finish(self, finish, **kwargs):
        print(f"\n--- Agent Finished in {self.step_count} steps ---")
        print(f"Tools used: {[tc['tool'] for tc in self.tool_calls]}")

    def on_tool_error(self, error, **kwargs):
        self.errors.append(str(error))
        print(f"TOOL ERROR: {error}")

# Use the debug handler
debug_handler = AgentDebugHandler()
result = executor.invoke(
    {"input": "What's the GDP growth rate of India?"},
    config={"callbacks": [debug_handler]}
)

# After execution, inspect the debug data
print(f"Total steps: {debug_handler.step_count}")
print(f"Tools called: {debug_handler.tool_calls}")
print(f"Errors: {debug_handler.errors}")

6.3 LangSmith Tracing

LangSmith is LangChain’s hosted observability platform that provides production-grade tracing for agent applications. Once configured via environment variables, it automatically captures the full execution tree — every LLM call (with prompts and completions), tool inputs and outputs, latency per step, token usage, and error traces. This is invaluable for debugging why an agent chose the wrong tool, identifying slow steps, and monitoring costs across thousands of production requests.

# pip install langsmith
# LangSmith provides production-grade tracing for agents
# Set up environment variables
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY", "")  # Your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "agent-debugging"

# Every agent invocation is now automatically traced!
# You can view:
# - Full reasoning chain with Thought/Action/Observation
# - Each LLM call with prompt and response
# - Tool inputs and outputs
# - Latency for each step
# - Token usage and cost

# Programmatic access to traces
from langsmith import Client

client = Client()

# List recent runs for your project
runs = client.list_runs(
    project_name="agent-debugging",
    is_root=True,  # top-level runs only
    limit=10
)

for run in runs:
    print(f"Run: {run.name}")
    print(f"  Status: {run.status}")
    if run.start_time and run.end_time:  # end_time is None for running runs
        print(f"  Latency: {run.end_time - run.start_time}")
    print(f"  Tokens: {run.total_tokens}")
    if run.total_cost is not None:  # cost may be unavailable for some models
        print(f"  Cost: ${run.total_cost:.4f}")

    # Get child runs (individual LLM calls, tool calls)
    children = client.list_runs(parent_run_id=run.id)
    for child in children:
        print(f"    - {child.run_type}: {child.name} ({child.total_tokens} tokens)")
Production Checklist

Agent Reliability Checklist

Before deploying any agent to production, verify:

  1. Max iterations set — Prevent infinite loops (typically 10-15)
  2. Execution timeout set — Prevent runaway costs (30-120 seconds)
  3. Error handling enabled — handle_parsing_errors=True
  4. Tool descriptions are unambiguous — Test with edge cases
  5. Destructive tools have confirmation — Never auto-delete or auto-send without guardrails
  6. Tracing enabled — LangSmith or equivalent for observability
  7. Cost monitoring — Agents can be expensive; set per-request budgets
  8. Fallback behavior defined — What happens when the agent gives up?
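Item 8 can be sketched as a thin wrapper around the executor. The stop-marker check assumes AgentExecutor's early-stopping output begins with "Agent stopped due to" — verify that string against your LangChain version:

```python
def invoke_with_fallback(run_agent, query: str,
                         fallback: str = "Sorry, I couldn't complete that request.") -> str:
    """Return a graceful fallback when the agent raises or gives up.

    run_agent: callable str -> str, e.g. (assumption)
        lambda q: executor.invoke({"input": q})["output"]
    """
    try:
        output = run_agent(query)
    except Exception:
        # Log the error in production; never surface a stack trace to users
        return fallback
    if output.startswith("Agent stopped due to"):  # hit max_iterations/timeout
        return fallback
    return output
```

The same seam is a natural place to attach per-request budget checks or to route the query to a cheaper, chain-based fallback path.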

Exercises & Self-Assessment

Exercise 1

Build a Multi-Tool Agent

Create an agent with at least 4 custom tools:

  1. A web search tool (use Tavily or SerpAPI)
  2. A calculator tool for mathematical expressions
  3. A date/time tool that returns the current date, time, or calculates date differences
  4. A unit converter tool (temperature, distance, weight)

Test the agent with queries that require combining multiple tools: "What's the weather in London in Fahrenheit, and how many days until Christmas?"
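As a starting point for tool 3, plain functions for date arithmetic might look like this (decorate with @tool before registering them; the single comma-separated string argument is a choice that keeps the tool compatible with simple ReAct-style agents):

```python
from datetime import date, datetime

def current_datetime(_: str = "") -> str:
    """Return the current date and time in ISO format."""
    return datetime.now().isoformat(timespec="seconds")

def days_between(dates: str) -> str:
    """Days between two ISO dates passed as 'YYYY-MM-DD,YYYY-MM-DD'."""
    try:
        start_s, end_s = (d.strip() for d in dates.split(","))
        start, end = date.fromisoformat(start_s), date.fromisoformat(end_s)
    except ValueError as e:
        return f"Error: expected 'YYYY-MM-DD,YYYY-MM-DD' ({e})"
    return f"{abs((end - start).days)} days"
```

Returning an "Error: ..." string rather than raising lets the agent read the problem and reformat its input on the next step.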

Exercise 2

ReAct vs Tool-Calling Comparison

Build the same agent using both approaches:

  1. Build a ReAct agent with create_react_agent
  2. Build a tool-calling agent with create_tool_calling_agent
  3. Run the same 10 queries through both and compare: accuracy, latency, token usage, and failure modes
  4. Document which approach is more reliable and why
Exercise 3

Agent Error Recovery

Create tools that intentionally fail some percentage of the time, then build an agent that handles these failures gracefully:

  1. Create a tool that raises an exception 30% of the time
  2. Create a tool that returns "API rate limited" 20% of the time
  3. Observe how the agent responds to these failures
  4. Improve the system prompt to teach the agent better error recovery strategies
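Starter functions for steps 1 and 2 (the failure rates are module constants you can tune; decorate with @tool before handing them to an agent, and seed `random` if you want reproducible runs):

```python
import random

def flaky_search(query: str) -> str:
    """Search tool that raises an exception ~30% of the time (step 1)."""
    if random.random() < 0.3:
        raise ConnectionError("Simulated network failure")
    return f"Results for: {query}"

def rate_limited_api(query: str) -> str:
    """API tool that reports a rate limit ~20% of the time (step 2)."""
    if random.random() < 0.2:
        return "Error: API rate limited. Try again later."
    return f"Data for: {query}"
```

Note the two distinct failure styles: an exception (which the framework must catch) versus an error string (which the agent itself must notice and react to).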
Exercise 4

Conversational Agent with Memory

Build a conversational research agent that maintains context across multiple turns:

  1. Implement sliding window memory with a configurable window size
  2. Test with a 10-turn conversation where later questions reference earlier answers
  3. Experiment with different window sizes (5, 10, 20) and observe how context loss affects quality
  4. Implement summary memory and compare against sliding window
Exercise 5

Reflective Questions

  1. Why is tool description quality more important than tool implementation quality for agent reliability?
  2. Explain the trade-off in setting max_iterations (more iterations = more capable but more expensive). How would you determine the right value for a production agent?
  3. When would you choose a chain over an agent, even if the agent could technically handle the task?
  4. How does the agent decision loop relate to the OODA loop (Observe, Orient, Decide, Act) from military strategy?
  5. What are the security implications of giving an agent access to a database tool or code execution tool?

Agent Specification Document Generator

Design and document an AI agent's architecture, tools, and behavior. Download as Word, Excel, PDF, or PowerPoint.


Conclusion & Next Steps

You now understand the core concepts that power AI agents — the most important building block in modern AI applications. Here are the key takeaways from Part 7:

  • Agents vs Chains — Chains have developer-defined control flow; agents let the LLM decide what to do at runtime based on observations
  • The Agent Decision Loop — Think, Act, Observe, Repeat — is the universal pattern underlying all agent architectures
  • Tool Usage — APIs, databases, and code execution tools give agents the ability to interact with the real world beyond text generation
  • Agent Types — ReAct (text-based reasoning), function-calling (OpenAI-specific), and tool-calling (universal) each have distinct trade-offs
  • LangChain Implementation — create_tool_calling_agent + AgentExecutor is the recommended production approach
  • Agent Memory — Sliding window, summary, and vector-based strategies each serve different use cases
  • Error Handling & Debugging — Max iterations, timeouts, parsing error handling, and LangSmith tracing are essential for reliability

Next in the Series

In Part 8: LangGraph — Stateful Agent Workflows, we move beyond simple agent loops to graph-based architectures. Learn how LangGraph's StateGraph enables complex workflows with conditional branching, cycles, persistence, subgraphs, and human-in-the-loop patterns that are impossible with AgentExecutor alone.
