Introduction: From Chains to Agents
Series Overview: This is Part 7 of our 20-part AI Application Development Mastery series. We now enter the agent domain — where AI applications move from executing predefined chains to autonomously deciding what actions to take, which tools to use, and how to handle unexpected situations.
1. Foundations & Evolution of AI Apps: Pre-LLM era, transformers, LLM revolution
2. LLM Fundamentals for Developers: Tokens, context windows, sampling, API patterns
3. Prompt Engineering Mastery: Zero/few-shot, CoT, ReAct, structured outputs
4. LangChain Core Concepts: Chains, prompts, LLMs, tools, LCEL
5. Retrieval-Augmented Generation (RAG): Embeddings, vector DBs, retrievers, RAG pipelines
6. Memory & Context Engineering: Buffer/summary/vector memory, chunking, re-ranking
7. Agents — Core of Modern AI Apps: ReAct, tool-calling, planner-executor agents (You Are Here)
8. LangGraph — Stateful Agent Workflows: Nodes, edges, state, graph execution, cycles
9. Deep Agents & Autonomous Systems: Multi-step reasoning, self-reflection, planning
10. Multi-Agent Systems: Supervisor, swarm, debate, role-based collaboration
11. AI Application Design Patterns: RAG, chat+memory, workflow automation, agent loops
12. Ecosystem & Frameworks: LlamaIndex, Haystack, HuggingFace, vLLM
13. MCP Foundations & Architecture: Protocol design, Host/Client/Server, primitives, security
14. MCP in Production: Building servers, integrations, scaling, agent systems
15. Evaluation & LLMOps: Prompt eval, tracing, LangSmith, experiment tracking
16. Production AI Systems: APIs, queues, caching, streaming, scaling
17. Safety, Guardrails & Reliability: Input filtering, hallucination mitigation, prompt injection
18. Advanced Topics: Fine-tuning, tool learning, hybrid LLM+symbolic
19. Building Real AI Applications: Chatbot, document QA, coding assistant, full-stack
20. Future of AI Applications: Autonomous agents, self-improving, multi-modal, AI OS
In Parts 1 through 6, we built the foundations: LLM fundamentals, prompt engineering, LangChain chains, RAG pipelines, and memory systems. All of those are powerful, but they share a fundamental limitation — the developer decides the control flow at design time. A chain always executes the same sequence of steps, regardless of the input.
Agents fundamentally change this equation. An agent is a system where the LLM decides at runtime which actions to take, which tools to invoke, and in what order. The developer provides the tools and the reasoning framework; the agent figures out the rest. This is the bridge between static AI pipelines and truly intelligent, autonomous applications.
Key Insight: The difference between a chain and an agent is the difference between a GPS giving turn-by-turn directions (chain) and a human driver who can reroute around accidents, stop for gas, and make judgment calls (agent). Both get you to the destination, but only one can handle the unexpected.
1. Agents vs Chains
Understanding the precise distinction between agents and chains is critical. It is not a matter of complexity — it is a matter of who controls the execution flow.
1.1 Chain Limitations
Chains (covered in Part 4) follow a predetermined sequence. Consider a customer support chain:
# pip install langchain-openai langchain-core
# A chain: fixed sequence, developer-controlled flow
# Requires OPENAI_API_KEY environment variable:
# export OPENAI_API_KEY="sk-..."
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# This chain ALWAYS runs: classify → respond, in that order.
# It cannot decide to skip a step or call an external API.
classify_prompt = ChatPromptTemplate.from_template(
    "Classify this customer query into one of: billing, technical, general.\n"
    "Query: {query}\nCategory:"
)
response_prompt = ChatPromptTemplate.from_template(
    "You are a support agent. The query category is {category}.\n"
    "Relevant docs: {docs}\n"
    "Customer query: {query}\n"
    "Provide a helpful response."
)

llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Fixed pipeline — no runtime decision-making
classify_chain = classify_prompt | llm | StrOutputParser()
respond_chain = response_prompt | llm | StrOutputParser()

# Run the fixed sequence: classify, then respond
query = "Cancel my subscription"
category = classify_chain.invoke({"query": query})
result = respond_chain.invoke({
    "query": query,
    "category": category,
    "docs": "(docs from a fixed retrieval step would go here)"
})
print(result)

# Problem: the chain cannot look up the account, check billing status,
# or actually perform the cancellation — it can only generate text.
The chain above works for simple question-answering, but it cannot take action. When a customer says "Cancel my subscription," the chain can only generate text about cancellation — it cannot actually check their account, verify their identity, or process the cancellation.
1.2 What Makes an Agent
An agent has three defining characteristics that set it apart from a chain:
| Characteristic | Chain | Agent |
| --- | --- | --- |
| Control Flow | Developer-defined at design time | LLM decides at runtime |
| Tool Selection | Fixed tools in fixed order | Chooses which tools to use (or none) |
| Iteration | Single pass (or fixed number of passes) | Loops until the task is complete |
| Error Recovery | Fails or follows predefined fallback | Can reason about errors and retry differently |
| Observation | Does not observe tool outputs to decide next step | Observes each tool output and reasons about what to do next |
1.3 The Agent Decision Loop
Every agent, regardless of implementation, follows this fundamental loop:
# The universal agent decision loop (pseudocode)
# This illustrates the concept — not meant to be run directly.
# See Section 4 for runnable LangChain implementations.

def execute_tool(tool_name, tool_args, tools):
    """Look up and invoke the named tool (placeholder)."""
    for t in tools:
        if t.name == tool_name:
            return t.invoke(tool_args)
    return f"Tool '{tool_name}' not found."

def agent_loop(llm, user_input, tools, max_iterations=10):
    """
    The fundamental loop that powers ALL agent architectures.
    1. THINK — LLM reasons about the current state
    2. ACT — LLM selects and invokes a tool (or returns final answer)
    3. OBSERVE — Agent captures tool output
    4. REPEAT — Back to THINK with new observation
    """
    messages = [{"role": "user", "content": user_input}]
    for i in range(max_iterations):
        # THINK: LLM decides what to do
        response = llm.invoke(messages, tools=tools)
        # Record the assistant turn (with its tool-call requests) in the transcript
        messages.append(response)
        # Check if the agent wants to use a tool
        if response.tool_calls:
            for tool_call in response.tool_calls:
                # ACT: Execute the selected tool
                tool_name = tool_call["name"]
                tool_args = tool_call["args"]
                tool_result = execute_tool(tool_name, tool_args, tools)
                # OBSERVE: Add the result to context
                messages.append({
                    "role": "tool",
                    "content": str(tool_result),
                    "tool_call_id": tool_call["id"]
                })
        else:
            # No tool call — the agent has its final answer
            return response.content
    return "Agent reached maximum iterations without a final answer."

# Key insight: The LLM is the "brain" that decides the control flow.
# The tools are the "hands" that interact with the world.
# The loop is the "nervous system" that connects them.
Key Insight: The agent loop is deceptively simple: Think, Act, Observe, Repeat. But the power comes from the LLM's ability to reason about observations. When a database query returns no results, the agent can try a different query. When an API returns an error, the agent can adjust its approach. This adaptive behavior is what makes agents fundamentally more capable than chains.
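This adaptive behavior can be seen in miniature. The toy sketch below uses no real LLM — a scripted stand-in plays the "brain" so the loop's recovery from a failed query is visible. `fake_db_query` and `scripted_llm` are hypothetical stand-ins, not LangChain APIs:

```python
# Toy illustration of adaptive error recovery in the Think/Act/Observe loop.
# A scripted function stands in for the LLM; no API calls are made.

def fake_db_query(sql: str) -> str:
    """Hypothetical tool: only knows a table named 'customers'."""
    if "customers" in sql:
        return "3 rows returned"
    return "Error: table not found"

def scripted_llm(observations: list) -> dict:
    """Stand-in for the LLM 'brain': reacts to the latest observation."""
    if not observations:
        # First attempt — uses the wrong table name
        return {"action": "query", "input": "SELECT * FROM customer"}
    if "Error" in observations[-1]:
        # Reason about the error and retry differently
        return {"action": "query", "input": "SELECT * FROM customers"}
    return {"action": "finish", "input": observations[-1]}

observations = []
for _ in range(5):  # THINK → ACT → OBSERVE → REPEAT
    decision = scripted_llm(observations)
    if decision["action"] == "finish":
        print("Final answer:", decision["input"])  # → Final answer: 3 rows returned
        break
    observations.append(fake_db_query(decision["input"]))
```

The loop recovers from the failed first query without any developer-coded fallback branch — the "reasoning" (here scripted, in a real agent performed by the LLM) decides the retry.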
2. Tool Usage
Tools are the bridge between an LLM's reasoning ability and the real world. Without tools, an LLM can only generate text. With tools, it can search the web, query databases, execute code, send emails, and interact with any API.
2.1 API Tools
API tools connect agents to external services — weather data, search engines, financial data, and any REST or GraphQL endpoint.
# pip install langchain-core langchain-tavily requests
import os
import requests
from langchain_core.tools import tool
from langchain_tavily import TavilySearch

# API keys via environment variables
# export TAVILY_API_KEY="tvly-..."
# export WEATHER_API_KEY="your-openweathermap-key"

# Built-in search tool (requires TAVILY_API_KEY env var)
search_tool = TavilySearch(
    max_results=3,
    search_depth="advanced",
    include_answer=True
)

# Custom API tool with the @tool decorator
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city. Use this when the user asks about weather conditions."""
    api_key = os.getenv("WEATHER_API_KEY")  # Set via environment variable
    url = f"https://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return f"Error: Could not fetch weather for {city}. Status: {response.status_code}"
    data = response.json()
    return (
        f"Weather in {city}: {data['weather'][0]['description']}, "
        f"Temperature: {data['main']['temp']}C, "
        f"Humidity: {data['main']['humidity']}%, "
        f"Wind: {data['wind']['speed']} m/s"
    )

@tool
def get_stock_price(symbol: str) -> str:
    """Get the current stock price for a ticker symbol. Use for financial queries."""
    url = f"https://api.example.com/stock/{symbol}"
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        data = response.json()
        return f"{symbol}: ${data['price']:.2f} ({data['change']:+.2f}%)"
    return f"Error: Could not fetch stock price for {symbol}"

# Inspect tool metadata (LangChain uses this to tell the LLM about tools)
print(get_weather.name)         # "get_weather"
print(get_weather.description)  # "Get the current weather..."
print(get_weather.args_schema.model_json_schema())  # JSON schema for arguments
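The name, description, and argument schema matter because they are serialized into a JSON payload the LLM sees on every call. Below is a hand-written sketch of what an OpenAI-style tool payload for the weather tool looks like (the exact wrapper varies by provider; field values here are illustrative):

```python
# A hand-written sketch of the OpenAI-style "tools" payload an LLM receives.
# LangChain builds this automatically from the decorated function's name,
# docstring, and type hints.
import json

tool_payload = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city. "
                       "Use this when the user asks about weather conditions.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
print(json.dumps(tool_payload, indent=2))
```

The model never sees your Python source — only this payload — which is why a precise description and schema are the whole interface.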
2.2 Database Tools
Database tools allow agents to query structured data directly, turning natural language questions into SQL queries.
# pip install langchain-community
from langchain_core.tools import tool
from langchain_community.utilities import SQLDatabase
from langchain_community.tools import (
    QuerySQLDatabaseTool,
    InfoSQLDatabaseTool,
    ListSQLDatabaseTool
)

# Connect to a database (replace with your database URI)
db = SQLDatabase.from_uri("sqlite:///company.db")

# Tool to list tables
list_tables = ListSQLDatabaseTool(db=db)
# Tool to get table schema/info
table_info = InfoSQLDatabaseTool(db=db)
# Tool to execute SQL queries
query_tool = QuerySQLDatabaseTool(db=db)

# Custom safe query tool with guardrails
@tool
def safe_sql_query(query: str) -> str:
    """Execute a READ-ONLY SQL query against the company database.
    Only SELECT statements are allowed. Never use DELETE, UPDATE, INSERT, or DROP.
    Always limit results to 50 rows maximum."""
    import re
    # Safety check — block destructive statements. Match on word boundaries
    # so legitimate identifiers like "updated_at" are not falsely flagged.
    forbidden = ["DELETE", "UPDATE", "INSERT", "DROP", "ALTER", "TRUNCATE"]
    query_upper = query.upper().strip()
    for keyword in forbidden:
        if re.search(rf"\b{keyword}\b", query_upper):
            return f"Error: {keyword} statements are not allowed. Only SELECT queries are permitted."
    # Enforce LIMIT to prevent huge result sets
    if "LIMIT" not in query_upper:
        query = query.rstrip(";") + " LIMIT 50;"
    try:
        result = db.run(query)
        return result if result else "Query returned no results."
    except Exception as e:
        return f"SQL Error: {str(e)}. Please check your query syntax."
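One subtlety with keyword blocklists: a naive substring check flags legitimate identifiers, because "UPDATED_AT" contains "UPDATE". A word-boundary regex avoids the false positive. A minimal illustration, pure Python with no database required:

```python
# Substring blocklists vs word-boundary blocklists for SQL guardrails.
import re

forbidden = ["DELETE", "UPDATE", "INSERT", "DROP", "ALTER", "TRUNCATE"]

def blocked_substring(query: str) -> bool:
    """Naive check: flags any query merely containing a forbidden word."""
    return any(kw in query.upper() for kw in forbidden)

def blocked_word(query: str) -> bool:
    """Word-boundary check: flags only whole SQL keywords."""
    return any(re.search(rf"\b{kw}\b", query.upper()) for kw in forbidden)

legit = "SELECT updated_at FROM users"
print(blocked_substring(legit))          # True  — false positive ("UPDATE" in "UPDATED_AT")
print(blocked_word(legit))               # False — correctly allowed
print(blocked_word("DROP TABLE users"))  # True  — correctly blocked
```

For production, an allowlist (accept only statements that parse as a single SELECT) is stricter than any blocklist; the blocklist here is a lightweight first line of defense.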
2.3 Code Execution Tools
Code execution tools allow agents to write and run code, enabling mathematical computations, data analysis, and complex transformations.
# pip install langchain-experimental langchain-core
from langchain_experimental.tools import PythonREPLTool
from langchain_core.tools import tool
import subprocess
import tempfile

# LangChain's built-in Python REPL tool (use with caution in production)
python_repl = PythonREPLTool()

# Custom sandboxed code execution tool
@tool
def execute_python(code: str) -> str:
    """Execute Python code and return the output. Use for calculations,
    data analysis, or any task that requires computation.
    The code runs in a sandboxed environment with numpy and pandas available."""
    # Write code to a temp file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        # Prepend common imports
        full_code = "import numpy as np\nimport pandas as pd\nimport json\n\n" + code
        f.write(full_code)
        f.flush()
    try:
        result = subprocess.run(
            ['python', f.name],
            capture_output=True,
            text=True,
            timeout=30  # 30-second timeout
        )
        output = result.stdout
        if result.stderr:
            output += f"\nStderr: {result.stderr}"
        return output if output.strip() else "Code executed successfully (no output)."
    except subprocess.TimeoutExpired:
        return "Error: Code execution timed out (30 second limit)."
    except Exception as e:
        return f"Execution error: {str(e)}"

# Calculator tool (simpler alternative)
@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression. Examples: '2 + 2', 'sqrt(144)', '3.14 * 5**2'.
    Supports basic arithmetic, exponents, and common math functions."""
    import math
    # Safe evaluation with limited scope
    allowed_names = {
        k: v for k, v in math.__dict__.items() if not k.startswith("__")
    }
    allowed_names.update({"abs": abs, "round": round, "min": min, "max": max})
    try:
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return f"{expression} = {result}"
    except Exception as e:
        return f"Error evaluating '{expression}': {str(e)}"
2.4 Custom Tools: StructuredTool and BaseTool
The @tool decorator is the fastest path, but for complex tools you should use the StructuredTool class or subclass BaseTool.
# pip install langchain-core pydantic
from langchain_core.tools import StructuredTool, BaseTool
from pydantic import BaseModel, Field
from typing import Optional, Type

# Method 1: StructuredTool with Pydantic schema
class SendEmailInput(BaseModel):
    to: str = Field(description="Recipient email address")
    subject: str = Field(description="Email subject line")
    body: str = Field(description="Email body content")
    cc: Optional[str] = Field(default=None, description="CC email address")

def send_email_func(to: str, subject: str, body: str, cc: Optional[str] = None) -> str:
    """Send an email to the specified recipient."""
    # In production, this would use SMTP or an email API
    recipients = f"To: {to}" + (f", CC: {cc}" if cc else "")
    return f"Email sent successfully. {recipients}, Subject: '{subject}'"

send_email_tool = StructuredTool.from_function(
    func=send_email_func,
    name="send_email",
    description="Send an email to a recipient. Use when the user asks to send or compose an email.",
    args_schema=SendEmailInput,
    return_direct=False  # True = return tool output directly without LLM processing
)

# Method 2: BaseTool subclass (full control)
class JiraInput(BaseModel):
    title: str = Field(description="Ticket title/summary")
    description: str = Field(description="Detailed description")
    priority: str = Field(
        default="Medium",
        description="Priority: Low, Medium, High, Critical"
    )
    ticket_type: str = Field(
        default="Task",
        description="Type: Bug, Task, Story, Epic"
    )

class JiraTicketTool(BaseTool):
    name: str = "create_jira_ticket"
    description: str = (
        "Create a Jira ticket. Use when the user wants to create a bug report, "
        "feature request, or task in the project management system."
    )
    args_schema: Type[BaseModel] = JiraInput

    def _run(self, title: str, description: str,
             priority: str = "Medium", ticket_type: str = "Task") -> str:
        # In production: call Jira REST API
        ticket_id = f"PROJ-{hash(title) % 10000}"
        return (
            f"Jira ticket created: {ticket_id}\n"
            f"Type: {ticket_type} | Priority: {priority}\n"
            f"Title: {title}"
        )

    async def _arun(self, **kwargs) -> str:
        """Async version for non-blocking execution."""
        return self._run(**kwargs)
Tool Design Principles
Writing Effective Tool Descriptions
The tool description is the single most important factor in whether an agent uses a tool correctly. Follow these principles:
- State WHEN to use the tool — "Use when the user asks about weather conditions"
- State what it RETURNS — "Returns current temperature, humidity, and conditions"
- State LIMITATIONS — "Only works for cities, not coordinates. Max 5 requests per minute"
- Include EXAMPLES — "Examples: 'New York', 'London', 'Tokyo'"
- Describe PARAMETER FORMAT — "City name as a string, e.g., 'San Francisco' (not 'SF')"
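As a quick illustration of these principles, here is a weak description next to a strong one for a hypothetical weather tool (the strings are illustrative, not from a real API):

```python
# Weak vs strong tool descriptions for a hypothetical weather tool.
BAD_DESCRIPTION = "Gets weather."  # The LLM cannot tell when or how to call this

GOOD_DESCRIPTION = (
    "Get current weather conditions for a city. "                        # WHEN
    "Returns temperature (Celsius), humidity, and conditions. "          # RETURNS
    "Only works for city names, not coordinates; max 5 requests/min. "   # LIMITATIONS
    "Examples: 'New York', 'London', 'Tokyo'. "                          # EXAMPLES
    "Pass the full city name, e.g. 'San Francisco' (not 'SF')."          # FORMAT
)

print(GOOD_DESCRIPTION)
```

When the model is choosing between a dozen tools, the strong version is often the difference between a correct call and a wrong-tool or malformed-argument failure.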
3. Agent Types
Different agent architectures use different strategies for deciding which tools to call and how to reason about their outputs. Understanding these types is essential for choosing the right approach for your application.
3.1 ReAct Agents
ReAct (Reasoning + Acting) is the foundational agent paradigm. The agent explicitly alternates between reasoning (thinking about what to do) and acting (using a tool), producing a visible chain of thought.
# ReAct agent trace — what happens internally
# User: "What is the population of France and is it larger than Germany?"
"""
Thought: I need to find the population of France and Germany, then compare them.
Let me start with France.
Action: search
Action Input: "population of France 2024"
Observation: France has a population of approximately 68.17 million people (2024).
Thought: Good, I have France's population. Now I need Germany's population.
Action: search
Action Input: "population of Germany 2024"
Observation: Germany has a population of approximately 84.48 million people (2024).
Thought: I now have both populations.
France: ~68.17 million
Germany: ~84.48 million
Germany's population is larger than France's by about 16.3 million.
I can now give the final answer.
Final Answer: France has a population of approximately 68.17 million,
while Germany has approximately 84.48 million.
Germany's population is larger by about 16.3 million people.
"""
# The ReAct prompt template enforces this Thought/Action/Observation structure
REACT_PROMPT = """Answer the following questions as best you can.
You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought: {agent_scratchpad}"""
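To see why this format can be fragile, consider the parsing an agent runtime must do on every step. The sketch below is a minimal, illustrative parser (real implementations are more defensive); any deviation in the model's output — a missing newline, a reworded label — breaks it:

```python
# Minimal sketch of ReAct output parsing — the fragile step in text-based agents.
import re

def parse_react_step(text: str):
    """Extract (action, action_input) from a ReAct-formatted LLM output.
    Returns None if the model produced a Final Answer instead."""
    if "Final Answer:" in text:
        return None
    m = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", text)
    if not m:
        # In practice this raises whenever the model drifts from the format
        raise ValueError("Could not parse agent output")
    return m.group(1).strip(), m.group(2).strip()

step = parse_react_step(
    "Thought: I need France's population.\n"
    'Action: search\nAction Input: "population of France 2024"'
)
print(step)  # → ('search', '"population of France 2024"')
```

Structured tool-calling (Section 3.2) eliminates this parsing step entirely, which is why it is preferred in production.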
3.2 OpenAI Function-Calling Agents
OpenAI function-calling agents use the native function-calling API, where the model outputs structured JSON specifying which function to call and with what arguments. This is more reliable than text-based ReAct parsing.
# pip install langchain-openai
# Assumes get_weather, calculator, search_tool are defined above (Section 2)
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
# OpenAI models natively support function/tool calling
# Requires OPENAI_API_KEY environment variable
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Bind tools to the model — tells the LLM what tools are available
tools = [get_weather, calculator, search_tool]
llm_with_tools = llm.bind_tools(tools)
# The model returns structured tool calls (not text-based)
response = llm_with_tools.invoke([
HumanMessage(content="What's the weather in Paris and what is 32C in Fahrenheit?")
])
# response.tool_calls is a structured list:
# [
# {"name": "get_weather", "args": {"city": "Paris"}, "id": "call_abc123"},
# {"name": "calculator", "args": {"expression": "32 * 9/5 + 32"}, "id": "call_def456"}
# ]
# Note: The model can call MULTIPLE tools in parallel!
for tc in response.tool_calls:
    print(f"Tool: {tc['name']}, Args: {tc['args']}")
# Important: When the model decides to call tools, response.content is EMPTY ('')
# and finish_reason is 'tool_calls' — the model produces structured calls, not text.
# To get a final text answer, you must:
# 1. Execute each tool call and collect results
# 2. Send results back as ToolMessage objects
# 3. Invoke the model again — it then synthesizes a text response
# The AgentExecutor (Section 4) automates this entire loop for you.
3.3 Tool-Calling Agents (Provider-Agnostic)
Tool-calling is the modern, provider-agnostic approach that works with OpenAI, Anthropic, Google, and other LLM providers. It supersedes the older "function-calling" terminology.
# pip install langchain-anthropic langchain-google-genai langchain-core
# Requires environment variables:
# export ANTHROPIC_API_KEY="sk-ant-..."
# export GOOGLE_API_KEY="AIza..."
import math
from langchain_core.tools import tool
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

# Define tools inline so this block is self-contained
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 22°C, partly cloudy"

@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    allowed = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
    allowed.update({"abs": abs, "round": round, "min": min, "max": max})
    result = eval(expression, {"__builtins__": {}}, allowed)
    return f"{expression} = {result}"

# Tool-calling works across providers — same interface, different models
# Anthropic Claude
claude = ChatAnthropic(model="claude-sonnet-4-20250514")
claude_with_tools = claude.bind_tools([get_weather, calculator])

# Google Gemini
gemini = ChatGoogleGenerativeAI(model="gemini-1.5-pro")
gemini_with_tools = gemini.bind_tools([get_weather, calculator])

# The interface is identical regardless of provider
response = claude_with_tools.invoke("What's 2^10?")
for tc in response.tool_calls:
    print(f"Tool: {tc['name']}, Args: {tc['args']}")

# Comparison: ReAct vs Function-Calling vs Tool-Calling
comparison = {
    "ReAct (text-based)": {
        "pros": "Visible reasoning, works with any LLM, interpretable",
        "cons": "Brittle text parsing, slower (verbose output), error-prone",
        "best_for": "Debugging, open-source models, educational purposes"
    },
    "Function-Calling (OpenAI)": {
        "pros": "Structured JSON output, reliable, parallel calls",
        "cons": "OpenAI-specific, less visible reasoning",
        "best_for": "OpenAI-based production systems"
    },
    "Tool-Calling (Universal)": {
        "pros": "Provider-agnostic, structured, modern standard",
        "cons": "Requires provider support (most major providers now support it)",
        "best_for": "New projects, multi-provider setups, production systems"
    }
}

for name, details in comparison.items():
    print(f"\n{name}: {details['best_for']}")
Common Mistake: Using ReAct text-based agents in production when tool-calling is available. Text-based ReAct requires fragile regex parsing of "Action:" and "Action Input:" from the LLM output. Tool-calling returns structured JSON, which is far more reliable. Always prefer tool-calling agents for production applications.
4. LangChain Implementation
LangChain provides several high-level APIs for building agents. We will cover the most important ones: create_react_agent, AgentExecutor, and the modern create_tool_calling_agent.
4.1 create_react_agent
Why langchain-classic? The AgentExecutor, create_react_agent, create_tool_calling_agent, and hub.pull() were moved from the main langchain package into langchain-classic starting with LangChain v1.0. This separate package contains the legacy agent runtimes, chains, memory, and hub integrations. Install it with pip install langchain-classic. For new agent development, LangChain recommends LangGraph (covered in Part 8).
# pip install langchain-classic langchain-openai
# Assumes search_tool, calculator, get_weather are defined above (Section 2)
from langchain_classic.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_classic import hub
# Pull the standard ReAct prompt from LangChain Hub
prompt = hub.pull("hwchase17/react")
# Create the LLM (requires OPENAI_API_KEY env var)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Define tools (defined in Section 2 above)
tools = [search_tool, calculator, get_weather]
# Create the ReAct agent
agent = create_react_agent(
llm=llm,
tools=tools,
prompt=prompt
)
# Wrap in AgentExecutor (the runtime that manages the loop)
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True, # Print reasoning steps
max_iterations=10, # Safety limit on loops
max_execution_time=60, # 60-second timeout
handle_parsing_errors=True, # Graceful error recovery
return_intermediate_steps=True # Include reasoning in output
)
# Run the agent
result = agent_executor.invoke({
"input": "What is the weather in Tokyo and how does the temperature "
"convert to Fahrenheit?"
})
print(result["output"])
# Also available: result["intermediate_steps"] — full reasoning trace
4.2 AgentExecutor Deep Dive
The AgentExecutor is the runtime that manages the agent loop. Understanding its configuration options is crucial for building reliable agents.
# pip install langchain-classic langchain-openai
# Assumes search_tool, calculator, safe_sql_query, get_weather from Sections 2.1-2.2
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# Modern approach: create_tool_calling_agent (recommended for production)
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful AI assistant with access to tools. "
"Always explain your reasoning before and after using tools. "
"If a tool returns an error, explain the error and try an alternative approach."),
MessagesPlaceholder(variable_name="chat_history", optional=True),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad")
])
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search_tool, calculator, safe_sql_query, get_weather]
# Create tool-calling agent (structured, reliable)
agent = create_tool_calling_agent(llm, tools, prompt)
# Configure AgentExecutor with production settings
executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
# --- Iteration Control ---
max_iterations=15, # Max reasoning loops (prevents infinite loops)
max_execution_time=120, # Total timeout in seconds
early_stopping_method="generate", # "force" or "generate"
# "force" = hard stop, "generate" = ask LLM to summarize what it has so far
# --- Error Handling ---
handle_parsing_errors=True, # Catch and retry on parse errors
# Can also pass a function: handle_parsing_errors=my_error_handler
# --- Output Control ---
return_intermediate_steps=True, # Include full reasoning trace
# trim_intermediate_steps=10, # Keep only last N steps (memory management)
)
# Invoke with chat history for conversational agents
result = executor.invoke({
"input": "Find the top 5 customers by revenue from last quarter",
"chat_history": [] # Previous messages for context
})
# Access results
print("Answer:", result["output"])
print("Steps:", len(result["intermediate_steps"]))
for step in result["intermediate_steps"]:
    action, observation = step
    print(f"  Tool: {action.tool}, Input: {action.tool_input}")
    print(f"  Result: {observation[:100]}...")
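Beyond True/False, handle_parsing_errors can also accept a callable that turns a parse failure into a corrective observation for the LLM. A sketch, assuming the handler receives the exception and returns the string fed back to the model (check the langchain-classic API reference for the exact signature):

```python
# Sketch of a custom parsing-error handler (assumed signature: receives the
# parsing exception, returns the observation string sent back to the LLM).
def my_error_handler(error) -> str:
    """Convert a parse failure into a corrective instruction for the model."""
    return (
        "Your last response could not be parsed. "
        f"Parser said: {str(error)[:200]}. "
        "Respond again, following the required output format exactly."
    )

# Usage: AgentExecutor(..., handle_parsing_errors=my_error_handler)
print(my_error_handler(ValueError("unexpected token near 'Action'")))
```

A custom handler like this lets you log failures and tailor the retry instruction, instead of relying on the generic message the default True produces.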
4.3 Building a Research Assistant Agent
Let us build a complete research assistant agent that combines search, calculation, and code execution.
# pip install langchain-classic langchain-openai langchain-tavily
# Full standalone example — copy-paste and run
import os
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_tavily import TavilySearch

# API keys via environment variables
# export OPENAI_API_KEY="sk-..."
# export TAVILY_API_KEY="tvly-..."

# --- Define Tools ---
search = TavilySearch(max_results=5, search_depth="advanced")

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Supports arithmetic, exponents,
    and common math functions (sqrt, log, sin, cos, pi, e).
    Examples: '2**10', 'sqrt(144)', 'log(1000, 10)', 'pi * 5**2'"""
    import math
    safe_dict = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
    safe_dict.update({"abs": abs, "round": round})
    try:
        return str(eval(expression, {"__builtins__": {}}, safe_dict))
    except Exception as e:
        return f"Error: {e}. Please check your expression."

@tool
def run_python(code: str) -> str:
    """Execute Python code for data analysis or complex computation.
    Has access to: math, statistics, json, datetime, collections.
    Print results to see them."""
    import io, contextlib, math, statistics, json, datetime, collections
    output = io.StringIO()
    # Pre-import the modules promised in the docstring.
    # Note: exec() is NOT a true sandbox — use a subprocess or container
    # for untrusted code in production.
    exec_globals = {
        "math": math, "statistics": statistics, "json": json,
        "datetime": datetime, "collections": collections,
    }
    try:
        with contextlib.redirect_stdout(output):
            exec(code, exec_globals)
        result = output.getvalue()
        return result if result.strip() else "Code executed (no printed output)."
    except Exception as e:
        return f"Error: {type(e).__name__}: {e}"

# --- Build Agent ---
system_prompt = """You are a research assistant with access to web search,
a calculator, and Python code execution.
Guidelines:
1. Always search for current information rather than relying on training data
2. Use the calculator for simple math, Python for complex analysis
3. Cite your sources when presenting search results
4. If a tool returns an error, explain what went wrong and try again
5. Break complex questions into smaller sub-tasks"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder("chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad")
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search, calculate, run_python]
agent = create_tool_calling_agent(llm, tools, prompt)

research_agent = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=12,
    max_execution_time=90,
    handle_parsing_errors=True,
    return_intermediate_steps=True
)

# --- Use the Agent ---
result = research_agent.invoke({
    "input": "What is the current GDP of Japan in USD? "
             "Calculate what percentage of US GDP that represents.",
    "chat_history": []
})
print(result["output"])
Architecture Pattern
Agent vs Chain Decision Framework
Use this decision framework when building new features:
- Use a Chain when: The steps are known in advance, the workflow is linear, and you need predictable latency and cost
- Use an Agent when: The steps depend on intermediate results, the user query could require different tools, or you need error recovery and adaptive behavior
- Use Both: Many production systems use agents for the outer loop and chains for individual steps — an agent decides what to do, then invokes a chain for each task
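The "use both" pattern can be sketched without any LLM at all: the pure-Python stand-ins below play the roles of chains (fixed steps) and the agent (runtime routing). All names are illustrative toys, not LangChain APIs:

```python
# The agent-outside/chains-inside pattern in miniature.
# Each *_chain runs the same fixed steps every time; the agent function
# decides at runtime which chain to invoke based on intermediate results.

def triage_chain(query: str) -> str:
    """Fixed inner step: classify the query (always runs the same way)."""
    return "billing" if "invoice" in query else "general"

def billing_chain(query: str) -> str:
    """Fixed inner workflow for billing issues."""
    return f"[billing workflow ran for: {query}]"

def general_chain(query: str) -> str:
    """Fixed inner workflow for everything else."""
    return f"[general answer generated for: {query}]"

def agent(query: str) -> str:
    """Outer loop: routes between chains based on what triage observed."""
    category = triage_chain(query)       # step 1: a fixed chain
    if category == "billing":            # runtime decision — the "agent" part
        return billing_chain(query)
    return general_chain(query)

print(agent("Why is my invoice higher this month?"))
# → [billing workflow ran for: Why is my invoice higher this month?]
```

In a real system the routing decision is made by an LLM and the inner chains are LCEL pipelines, but the division of labor is exactly this: predictable chains for the steps, adaptive control for the routing.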
5. Agent Memory
Agents need memory for two reasons: conversational continuity (remembering what the user said earlier) and task continuity (remembering what tools were called and what results were returned within a single complex task).
5.1 Conversational Agents
Conversational agents extend basic tool-using agents with multi-turn memory — they maintain a chat_history buffer that lets them understand follow-up questions, resolve pronouns ("what about its neighbor?"), and build on previous answers. Without conversation history, every user message would be interpreted in isolation, making natural dialogue impossible. The agent loop remains the same (observe → think → act), but the prompt now includes the full conversation, giving the LLM context to be a coherent conversational partner.
# pip install langchain-classic langchain-openai
# Assumes search_tool, calculator are defined above (Sections 2.1, 2.3)
from langchain_classic.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

# Prompt with chat_history placeholder for multi-turn conversations
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with tools. "
               "Use chat history to maintain context across turns."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad")
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search_tool, calculator]
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Maintain conversation history
chat_history = []

def chat(user_input: str) -> str:
    """Send a message and maintain conversation history."""
    result = executor.invoke({
        "input": user_input,
        "chat_history": chat_history
    })
    # Update history
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=result["output"]))
    return result["output"]

# Multi-turn conversation
print(chat("What's the population of Brazil?"))
# Agent searches → "Brazil has ~216 million people"

print(chat("How does that compare to its neighbor Argentina?"))
# Agent uses chat_history to understand "its neighbor" refers to Brazil
# Searches → "Argentina has ~46 million, Brazil is ~4.7x larger"

print(chat("Calculate the population density of both if Brazil is 8.5M km² and Argentina is 2.8M km²"))
# Agent uses calculator with context from previous turns
5.2 Memory Strategies for Agents
Not all conversations fit in a single context window, so agents need strategies for managing memory over long interactions. Three common approaches trade off between fidelity and token efficiency: sliding window (keep the last N messages, simple but loses early context), summary memory (use an LLM to compress older messages into a running summary), and vector memory (embed all messages and retrieve the most semantically relevant ones for each new query). Production agents often combine multiple strategies.
# pip install langchain-openai langchain-community chromadb
# Three memory strategies for managing agent conversation history
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import AIMessage
from langchain_community.vectorstores import Chroma

# Strategy 1: Sliding Window (keep last N messages)
def sliding_window_memory(chat_history, max_messages=20):
    """Keep only the most recent messages to fit context window."""
    if len(chat_history) > max_messages:
        return chat_history[-max_messages:]
    return chat_history

# Strategy 2: Summary Memory (compress old messages)
summarizer = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def summary_memory(chat_history, summary_threshold=20):
    """Summarize old messages when history gets too long."""
    if len(chat_history) <= summary_threshold:
        return chat_history
    # Summarize the older half
    old_messages = chat_history[:summary_threshold // 2]
    recent_messages = chat_history[summary_threshold // 2:]
    old_text = "\n".join(f"{m.type}: {m.content}" for m in old_messages)
    summary = summarizer.invoke(
        f"Summarize this conversation concisely:\n{old_text}"
    )
    return [AIMessage(content=f"[Summary of earlier conversation: {summary.content}]")] + recent_messages

# Strategy 3: Relevant Memory (retrieve relevant past context)
# Uses Chroma and OpenAIEmbeddings imported above
class VectorMemory:
    """Store all messages in a vector store, retrieve relevant ones."""

    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        self.store = Chroma(embedding_function=self.embeddings)
        self.all_messages = []

    def add_message(self, role: str, content: str):
        self.all_messages.append({"role": role, "content": content})
        self.store.add_texts(
            texts=[content],
            metadatas=[{"role": role, "index": len(self.all_messages) - 1}]
        )

    def get_relevant_history(self, query: str, k: int = 5):
        """Retrieve the most relevant past messages for the current query."""
        docs = self.store.similarity_search(query, k=k)
        return [d.page_content for d in docs]
Key Insight: Agent memory is more complex than chain memory because agents produce intermediate reasoning steps (tool calls, observations) in addition to user/assistant messages. You need to decide whether to include these intermediate steps in the memory or only the final answers. Including intermediate steps gives the agent more context for future turns but consumes more tokens. A common production pattern is to summarize intermediate steps into a concise result before adding to memory.
6. Error Handling & Debugging
Agents are inherently less predictable than chains. They can enter infinite loops, call wrong tools, misinterpret observations, or generate invalid tool inputs. Robust error handling and debugging are essential.
6.1 Common Agent Failures
Agents fail in predictable ways, and understanding these failure modes is the first step to building robust systems. The five most common failures are: infinite loops (the agent keeps calling the same tool), wrong tool selection (choosing a tool that can’t answer the query), invalid arguments (passing malformed inputs to tools), hallucinated tool names (inventing tools that don’t exist), and context overflow (accumulating so much tool output that the context window overflows). The implementation below demonstrates a create_safe_tool wrapper that adds retry logic and error boundaries around any tool.
# Common agent failure modes and their solutions
# Failure 1: Infinite Loop — Agent keeps calling the same tool
# Symptom: Agent calls search -> gets result -> searches again -> same result -> ...
# Solution: max_iterations + clear stopping criteria in the prompt
# Failure 2: Wrong Tool Selection
# Symptom: Agent uses calculator when it should use search
# Solution: Better tool descriptions, few-shot examples in prompt
# Failure 3: Invalid Tool Arguments
# Symptom: Agent passes "What is Python?" to calculator
# Solution: Pydantic schemas for tool args, handle_parsing_errors=True
# Failure 4: Hallucinated Tool Names
# Symptom: Agent tries to call "web_browser" which doesn't exist
# Solution: List available tools explicitly in the system prompt
# Failure 5: Context Window Overflow
# Symptom: Agent accumulates too many observations and hits token limit
# Solution: trim_intermediate_steps, summary memory, observation truncation
# Production error handling wrapper
from langchain_core.tools import tool

def create_safe_tool(func, max_retries=2):
    """Wrap a plain function with retry logic, then expose it as a tool."""
    def safe_run(*args, **kwargs):
        last_error = None
        for attempt in range(max_retries + 1):
            try:
                result = func(*args, **kwargs)
                # Retry when the tool signals failure in its own output
                if isinstance(result, str) and result.startswith("Error") and attempt < max_retries:
                    continue
                return result
            except Exception as e:
                last_error = e
        return f"Tool failed after {max_retries + 1} attempts: {last_error}"

    # Preserve name and docstring so the tool decorator generates an
    # accurate name and description for the LLM
    safe_run.__name__ = func.__name__
    safe_run.__doc__ = func.__doc__
    # Wrap the raw function *before* decorating — patching the tool's
    # private _run afterwards is fragile on Pydantic-based tool classes
    return tool(safe_run)
6.2 Debugging Techniques
When an agent produces unexpected results, you need visibility into its reasoning chain. LangChain provides three levels of debugging: verbose mode (prints the prompt and parsed output at each step), debug mode (logs every LLM call, tool invocation, and chain event), and custom callback handlers (programmatic access to all events for custom logging, metrics, or alerting). The AgentDebugHandler below tracks step counts, tool usage, errors, and timing — essential telemetry for production agent monitoring.
# pip install langchain langchain-core
# Assumes 'executor' is an AgentExecutor from Section 4.2 above
from langchain.globals import set_verbose, set_debug

# Level 1: Verbose mode — see agent reasoning
set_verbose(True)

# Level 2: Debug mode — see every LLM call, prompt, and response
set_debug(True)

# Level 3: Custom callback handler for fine-grained logging
from langchain_core.callbacks import BaseCallbackHandler

class AgentDebugHandler(BaseCallbackHandler):
    """Custom callback handler for agent debugging."""

    def __init__(self):
        self.step_count = 0
        self.tool_calls = []
        self.errors = []

    def on_agent_action(self, action, **kwargs):
        self.step_count += 1
        self.tool_calls.append({
            "step": self.step_count,
            "tool": action.tool,
            "input": action.tool_input
        })
        print(f"\n--- Step {self.step_count} ---")
        print(f"Tool: {action.tool}")
        print(f"Input: {action.tool_input}")

    def on_tool_end(self, output, **kwargs):
        print(f"Output: {str(output)[:200]}...")

    def on_agent_finish(self, finish, **kwargs):
        print(f"\n--- Agent Finished in {self.step_count} steps ---")
        print(f"Tools used: {[tc['tool'] for tc in self.tool_calls]}")

    def on_tool_error(self, error, **kwargs):
        self.errors.append(str(error))
        print(f"TOOL ERROR: {error}")

# Use the debug handler
debug_handler = AgentDebugHandler()
result = executor.invoke(
    {"input": "What's the GDP growth rate of India?"},
    config={"callbacks": [debug_handler]}
)

# After execution, inspect the debug data
print(f"Total steps: {debug_handler.step_count}")
print(f"Tools called: {debug_handler.tool_calls}")
print(f"Errors: {debug_handler.errors}")
6.3 LangSmith Tracing
LangSmith is LangChain’s hosted observability platform that provides production-grade tracing for agent applications. Once configured via environment variables, it automatically captures the full execution tree — every LLM call (with prompts and completions), tool inputs and outputs, latency per step, token usage, and error traces. This is invaluable for debugging why an agent chose the wrong tool, identifying slow steps, and monitoring costs across thousands of production requests.
# pip install langsmith
# LangSmith provides production-grade tracing for agents
# Set up environment variables
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY", "")  # Your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "agent-debugging"

# Every agent invocation is now automatically traced!
# You can view:
# - Full reasoning chain with Thought/Action/Observation
# - Each LLM call with prompt and response
# - Tool inputs and outputs
# - Latency for each step
# - Token usage and cost

# Programmatic access to traces
from langsmith import Client

client = Client()

# List recent runs for your project
runs = client.list_runs(
    project_name="agent-debugging",
    is_root=True,  # Top-level runs only
    limit=10
)
for run in runs:
    print(f"Run: {run.name}")
    print(f"  Status: {run.status}")
    print(f"  Latency: {run.end_time - run.start_time}")
    print(f"  Tokens: {run.total_tokens}")
    print(f"  Cost: ${run.total_cost or 0:.4f}")
    # Get child runs (individual LLM calls, tool calls)
    children = client.list_runs(parent_run_id=run.id)
    for child in children:
        print(f"  - {child.run_type}: {child.name} ({child.total_tokens} tokens)")
Production Checklist
Agent Reliability Checklist
Before deploying any agent to production, verify:
- Max iterations set — Prevent infinite loops (typically 10-15)
- Execution timeout set — Prevent runaway costs (30-120 seconds)
- Error handling enabled — handle_parsing_errors=True
- Tool descriptions are unambiguous — Test with edge cases
- Destructive tools have confirmation — Never auto-delete or auto-send without guardrails
- Tracing enabled — LangSmith or equivalent for observability
- Cost monitoring — Agents can be expensive; set per-request budgets
- Fallback behavior defined — What happens when the agent gives up?
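Several of these checks map directly onto AgentExecutor parameters. A sketch of a production-leaning configuration (assumes agent and tools from the earlier sections; the specific limits are illustrative, not recommendations for every workload):

```python
from langchain_classic.agents import AgentExecutor

executor = AgentExecutor(
    agent=agent,                    # from create_tool_calling_agent(...)
    tools=tools,
    max_iterations=12,              # hard cap on think/act cycles
    max_execution_time=60,          # wall-clock budget in seconds
    handle_parsing_errors=True,     # feed malformed outputs back to the LLM
    early_stopping_method="force",  # return a stopped-early answer, don't raise
    verbose=False,                  # in production, prefer tracing over stdout
)
```

Tool confirmation, cost budgets, and fallback behavior still have to be built at the application layer — the executor only enforces the loop-level limits.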
Exercises & Self-Assessment
Exercise 1
Build a Multi-Tool Agent
Create an agent with at least 4 custom tools:
- A web search tool (use Tavily or SerpAPI)
- A calculator tool for mathematical expressions
- A date/time tool that returns the current date, time, or calculates date differences
- A unit converter tool (temperature, distance, weight)
Test the agent with queries that require combining multiple tools: "What's the weather in London in Fahrenheit, and how many days until Christmas?"
Exercise 2
ReAct vs Tool-Calling Comparison
Build the same agent using both approaches:
- Build a ReAct agent with create_react_agent
- Build a tool-calling agent with create_tool_calling_agent
- Run the same 10 queries through both and compare: accuracy, latency, token usage, and failure modes
- Document which approach is more reliable and why
Exercise 3
Agent Error Recovery
Create tools that intentionally fail some percentage of the time, then build an agent that handles these failures gracefully:
- Create a tool that raises an exception 30% of the time
- Create a tool that returns "API rate limited" 20% of the time
- Observe how the agent responds to these failures
- Improve the system prompt to teach the agent better error recovery strategies
Exercise 4
Conversational Agent with Memory
Build a conversational research agent that maintains context across multiple turns:
- Implement sliding window memory with a configurable window size
- Test with a 10-turn conversation where later questions reference earlier answers
- Experiment with different window sizes (5, 10, 20) and observe how context loss affects quality
- Implement summary memory and compare against sliding window
Exercise 5
Reflective Questions
- Why is tool description quality more important than tool implementation quality for agent reliability?
- Explain the trade-off in setting max_iterations: more iterations make the agent more capable but slower and more expensive. How would you determine the right value for a production agent?
- When would you choose a chain over an agent, even if the agent could technically handle the task?
- How does the agent decision loop relate to the OODA loop (Observe, Orient, Decide, Act) from military strategy?
- What are the security implications of giving an agent access to a database tool or code execution tool?
Conclusion & Next Steps
You now understand the core concepts that power AI agents — the most important building block in modern AI applications. Here are the key takeaways from Part 7:
- Agents vs Chains — Chains have developer-defined control flow; agents let the LLM decide what to do at runtime based on observations
- The Agent Decision Loop — Think, Act, Observe, Repeat — is the universal pattern underlying all agent architectures
- Tool Usage — APIs, databases, and code execution tools give agents the ability to interact with the real world beyond text generation
- Agent Types — ReAct (text-based reasoning), function-calling (OpenAI-specific), and tool-calling (universal) each have distinct trade-offs
- LangChain Implementation — create_tool_calling_agent + AgentExecutor is the recommended production approach
- Agent Memory — Sliding window, summary, and vector-based strategies each serve different use cases
- Error Handling & Debugging — Max iterations, timeouts, parsing error handling, and LangSmith tracing are essential for reliability
Next in the Series
In Part 8: LangGraph — Stateful Agent Workflows, we move beyond simple agent loops to graph-based architectures. Learn how LangGraph's StateGraph enables complex workflows with conditional branching, cycles, persistence, subgraphs, and human-in-the-loop patterns that are impossible with AgentExecutor alone.
Continue the Series
Part 8: LangGraph — Stateful Agent Workflows
Build stateful, cyclic agent workflows with LangGraph's StateGraph, nodes, edges, conditional routing, persistence, and human-in-the-loop.
Read Article
Part 9: Deep Agents & Autonomous Systems
Explore advanced agent architectures — Plan-and-Execute, Reflexion, LATS, self-reflection, and autonomy levels from L1 to L4.
Read Article
Part 10: Multi-Agent Systems
Build systems where multiple specialized agents collaborate — supervisor, swarm, debate, and role-based architectures.
Read Article