Introduction: Why Design Patterns Matter
Series Overview: This is Part 11 of our 20-part AI Application Development Mastery series. Having mastered agents and multi-agent systems, we now catalog the complete set of design patterns that underpin every AI application — giving you a systematic framework for architectural decisions.
- Part 1: Foundations & Evolution of AI Apps (Pre-LLM era, transformers, LLM revolution)
- Part 2: LLM Fundamentals for Developers (Tokens, context windows, sampling, API patterns)
- Part 3: Prompt Engineering Mastery (Zero/few-shot, CoT, ReAct, structured outputs)
- Part 4: LangChain Core Concepts (Chains, prompts, LLMs, tools, LCEL)
- Part 5: Retrieval-Augmented Generation (RAG) (Embeddings, vector DBs, retrievers, RAG pipelines)
- Part 6: Memory & Context Engineering (Buffer/summary/vector memory, chunking, re-ranking)
- Part 7: Agents — Core of Modern AI Apps (ReAct, tool-calling, planner-executor agents)
- Part 8: LangGraph — Stateful Agent Workflows (Nodes, edges, state, graph execution, cycles)
- Part 9: Deep Agents & Autonomous Systems (Multi-step reasoning, self-reflection, planning)
- Part 10: Multi-Agent Systems (Supervisor, swarm, debate, role-based collaboration)
- Part 11 (You Are Here): AI Application Design Patterns (RAG, chat+memory, workflow automation, agent loops)
- Part 12: Ecosystem & Frameworks (LlamaIndex, Haystack, HuggingFace, vLLM)
- Part 13: MCP Foundations & Architecture (Protocol design, Host/Client/Server, primitives, security)
- Part 14: MCP in Production (Building servers, integrations, scaling, agent systems)
- Part 15: Evaluation & LLMOps (Prompt eval, tracing, LangSmith, experiment tracking)
- Part 16: Production AI Systems (APIs, queues, caching, streaming, scaling)
- Part 17: Safety, Guardrails & Reliability (Input filtering, hallucination mitigation, prompt injection)
- Part 18: Advanced Topics (Fine-tuning, tool learning, hybrid LLM+symbolic)
- Part 19: Building Real AI Applications (Chatbot, document QA, coding assistant, full-stack)
- Part 20: Future of AI Applications (Autonomous agents, self-improving, multi-modal, AI OS)
Just as the Gang of Four cataloged design patterns for object-oriented programming, the AI application ecosystem has developed a set of recurring architectural patterns that solve common problems. Understanding these patterns lets you make faster, better architectural decisions — instead of reinventing solutions, you can apply proven patterns and focus on what makes your application unique.
In this installment, we catalog every major AI application design pattern, organized by complexity tier. For each pattern, we explain what it is, when to use it, how to implement it, and which framework suits it best.
Key Insight: Most AI applications are compositions of 2-3 patterns. A customer support bot combines the Chat + Memory pattern with the RAG pattern and possibly the Tool Use pattern. Understanding the building blocks lets you compose them effectively.
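To make the composition idea concrete, here is a framework-free sketch with stub functions standing in for each pattern. All names and return values are illustrative; a real application would call an LLM inside each stub:

```python
# Each stub stands in for one pattern; a real app would call an LLM.
def retrieve(query: str) -> list[str]:
    """RAG pattern: fetch relevant documents for the query."""
    return [f"doc about {query}"]

def with_memory(history: list[str], message: str) -> list[str]:
    """Chat + Memory pattern: append the new turn to the history."""
    return history + [message]

def generate(context: list[str], history: list[str]) -> str:
    """Prompt-Response pattern: produce an answer from the inputs."""
    return f"answer grounded in {len(context)} docs, {len(history)} turns"

def support_bot(history: list[str], message: str) -> tuple[str, list[str]]:
    """Customer support bot = Chat + Memory + RAG + Prompt-Response."""
    turns = with_memory(history, message)
    docs = retrieve(message)
    reply = generate(docs, turns)
    return reply, with_memory(turns, reply)

reply, history = support_bot([], "How do I reset my password?")
```

The point is not the stubs themselves but the shape: each pattern is a function with a clear input/output contract, and the application is their composition.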
1. Core Patterns
Core patterns are the foundational building blocks. Every AI application uses at least one of these. They are simple to implement, well-understood, and form the basis for more complex patterns.
1.1 Prompt-Response Pattern
The simplest AI pattern: send a prompt to an LLM, receive a response. Despite its simplicity, this pattern powers a surprising number of production applications when combined with well-engineered prompts.
# Pattern: Prompt-Response
# Complexity: Low | Latency: Low | Cost: Low
# pip install langchain-openai
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
# Simple prompt-response with structured output
llm = ChatOpenAI(model="gpt-4", temperature=0)
# Email classifier — a pure prompt-response application
email_classifier = ChatPromptTemplate.from_messages([
("system", """You are an email classifier. Classify the
incoming email into exactly one category:
- URGENT: Requires immediate action
- SUPPORT: Customer support request
- SALES: Sales inquiry or lead
- INTERNAL: Internal team communication
- SPAM: Unwanted or irrelevant
Respond with JSON: {{"category": "...", "confidence": 0.0-1.0,
"reasoning": "..."}}"""),
("human", "{email_text}")
])
chain = email_classifier | llm
# Usage
result = chain.invoke({
"email_text": "Our production database is down and customers "
"cannot access their accounts. Please fix ASAP!"
})
# result.content contains JSON like:
# {"category": "URGENT", "confidence": 0.95,
#  "reasoning": "Production outage affecting customers"}
Use Prompt-Response when: Classification, summarization, translation, formatting, extraction from small inputs, and any task where the LLM's training data is sufficient and no external data is needed.
1.2 RAG (Retrieval-Augmented Generation) Pattern
The most important pattern in AI application development. RAG augments the LLM's knowledge with external data by retrieving relevant documents before generating a response.
# Pattern: RAG (Retrieval-Augmented Generation)
# Complexity: Medium | Latency: Medium | Cost: Medium
# pip install langchain-openai langchain-chroma
import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
# Components
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(
collection_name="company_docs",
embedding_function=embeddings,
persist_directory="./chroma_db"
)
retriever = vectorstore.as_retriever(
search_type="mmr", # Maximum Marginal Relevance
search_kwargs={"k": 5, "fetch_k": 20}
)
llm = ChatOpenAI(model="gpt-4", temperature=0)
# RAG prompt template
rag_prompt = ChatPromptTemplate.from_messages([
("system", """Answer the user's question using ONLY the
provided context. If the answer is not in the context,
say "I don't have information about that in our docs."
Context:
{context}"""),
("human", "{question}")
])
# RAG chain
def format_docs(docs):
return "\n\n---\n\n".join(
f"Source: {d.metadata.get('source', 'Unknown')}\n"
f"{d.page_content}" for d in docs
)
rag_chain = (
{"context": retriever | format_docs,
"question": RunnablePassthrough()}
| rag_prompt
| llm
)
# Usage
answer = rag_chain.invoke("What is our refund policy?")
1.3 Tool Use Pattern
The LLM decides which external tools to call, constructs the arguments, interprets the results, and formulates a response. This pattern gives LLMs the ability to take actions in the real world.
# Pattern: Tool Use
# Complexity: Medium | Latency: Medium-High | Cost: Medium
# pip install langchain langchain-openai
import os
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
@tool
def get_weather(city: str) -> str:
"""Get the current weather for a city."""
# In production: call a real weather API
return f"Weather in {city}: 72F, sunny, humidity 45%"
@tool
def calculate(expression: str) -> str:
    """Evaluate a simple arithmetic expression."""
    # Character whitelist reduces risk, but eval() is still
    # not fully safe for production use
    allowed = set("0123456789+-*/.() ")
    if all(c in allowed for c in expression):
        return str(eval(expression))
    return "Invalid expression"
@tool
def search_database(query: str) -> str:
"""Search the company database for information."""
# In production: query your actual database
return f"Database results for '{query}': 42 matching records"
# Create tool-calling agent
llm = ChatOpenAI(model="gpt-4", temperature=0)
tools = [get_weather, calculate, search_database]
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant with access to tools. "
"Use them when needed to answer questions accurately."),
("human", "{input}"),
("placeholder", "{agent_scratchpad}")
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# The LLM decides which tools to call
result = executor.invoke({
"input": "What is the weather in Tokyo and what is 15% of 2500?"
})
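The `calculate` tool above guards `eval` with a character whitelist. A stricter alternative parses the expression into an AST and permits only arithmetic nodes, rejecting everything else. This is a framework-independent sketch, not part of the LangChain API:

```python
import ast
import operator

# Map AST operator node types to functions; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_eval(expression: str) -> float:
    """Evaluate arithmetic without eval(); raises ValueError otherwise."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("Disallowed expression element")
    return walk(ast.parse(expression, mode="eval"))
```

Dropping this into the `calculate` tool body removes the `eval` call entirely while keeping the same string-in, string-out contract.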
2. Intermediate Patterns
Intermediate patterns combine core patterns with state management, persistence, and more complex data flows. These patterns power most production AI applications.
2.1 Chat + Memory Pattern
Maintains conversation history across turns, enabling contextual multi-turn conversations. The LLM references previous messages to provide coherent, context-aware responses.
# Pattern: Chat + Memory
# Complexity: Medium | Latency: Low-Medium | Cost: Medium
# pip install langchain-openai
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
llm = ChatOpenAI(model="gpt-4", temperature=0.7)
# Prompt with memory placeholder
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful AI assistant. Use the "
"conversation history to provide contextual answers."),
MessagesPlaceholder(variable_name="history"),
("human", "{input}")
])
chain = prompt | llm
# Session-based memory store
store = {}
def get_session_history(session_id: str):
if session_id not in store:
store[session_id] = InMemoryChatMessageHistory()
return store[session_id]
# Wrap chain with message history
chat_with_memory = RunnableWithMessageHistory(
chain,
get_session_history,
input_messages_key="input",
history_messages_key="history"
)
# Multi-turn conversation
config = {"configurable": {"session_id": "user-123"}}
r1 = chat_with_memory.invoke(
{"input": "My name is Alice and I work at Acme Corp."},
config=config
)
r2 = chat_with_memory.invoke(
{"input": "What company do I work at?"}, # Uses memory
config=config
)
# r2 correctly answers "Acme Corp" from memory
2.2 Document QA Pattern
A specialized combination of RAG and Chat + Memory optimized for question-answering over a specific document corpus. Includes source citation, confidence scoring, and follow-up question handling.
# Pattern: Document QA
# Complexity: Medium | Latency: Medium | Cost: Medium
# Combines: RAG + Chat + Memory + Source Citation
# pip install langchain-openai langchain-chroma pydantic
import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
class QAResponse(BaseModel):
answer: str = Field(description="The answer to the question")
sources: list[str] = Field(description="Source documents used")
confidence: float = Field(description="Confidence score 0-1")
follow_up: list[str] = Field(
description="Suggested follow-up questions"
)
llm = ChatOpenAI(model="gpt-4", temperature=0)
parser = JsonOutputParser(pydantic_object=QAResponse)
doc_qa_prompt = ChatPromptTemplate.from_messages([
("system", """You are a document QA assistant. Answer questions
using ONLY the provided context documents. Always cite which
source document(s) you used. Rate your confidence from 0 to 1.
Suggest 2-3 follow-up questions the user might ask.
{format_instructions}
Context Documents:
{context}"""),
("human", "{question}")
])
doc_qa_chain = (
doc_qa_prompt.partial(
format_instructions=parser.get_format_instructions()
)
| llm
| parser
)
# Usage (context must be supplied, e.g. from a retriever as in
# the RAG pattern; docs_text is a placeholder name):
# doc_qa_chain.invoke({"context": docs_text, "question": "..."})
# Returns a structured response with sources, confidence, follow-ups
2.3 Workflow Automation Pattern
Chains multiple AI steps together with conditional logic, data transformations, and external system integrations. This pattern is the backbone of AI-powered business process automation.
# Pattern: Workflow Automation
# Complexity: Medium-High | Latency: High | Cost: Medium-High
# Best implemented with: n8n, Zapier, LangGraph
# pip install langgraph langchain-openai
import os
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
class WorkflowState(TypedDict):
email_text: str
classification: str
sentiment: str
response_draft: str
approved: bool
llm = ChatOpenAI(model="gpt-4", temperature=0)
def classify_email(state: WorkflowState) -> dict:
"""Step 1: Classify the incoming email."""
response = llm.invoke(
f"Classify this email as SUPPORT, SALES, or BILLING: "
f"{state['email_text']}"
)
return {"classification": response.content.strip()}
def analyze_sentiment(state: WorkflowState) -> dict:
"""Step 2: Analyze customer sentiment."""
response = llm.invoke(
f"Rate sentiment as POSITIVE, NEUTRAL, or NEGATIVE: "
f"{state['email_text']}"
)
return {"sentiment": response.content.strip()}
def draft_response(state: WorkflowState) -> dict:
"""Step 3: Draft an appropriate response."""
response = llm.invoke(
f"Draft a {state['sentiment'].lower()} tone response "
f"for this {state['classification']} email: "
f"{state['email_text']}"
)
return {"response_draft": response.content}
def route_by_sentiment(state: WorkflowState) -> str:
"""Route negative sentiment to human review."""
if state["sentiment"] == "NEGATIVE":
return "human_review"
return "auto_send"
# Build workflow graph
wf = StateGraph(WorkflowState)
wf.add_node("classify", classify_email)
wf.add_node("sentiment", analyze_sentiment)
wf.add_node("draft", draft_response)
wf.set_entry_point("classify")
wf.add_edge("classify", "sentiment")
wf.add_edge("sentiment", "draft")
wf.add_conditional_edges("draft", route_by_sentiment, {
"human_review": END, # Human reviews negative responses
"auto_send": END # Auto-send positive/neutral
})
workflow = wf.compile()
3. Advanced Patterns
Advanced patterns involve autonomous decision-making, iterative self-improvement, and multi-agent coordination. They are the most powerful but also the most complex and expensive to run.
3.1 Agent Loop Pattern
The agent receives a goal, then enters a think-act-observe loop, iteratively taking actions and refining its approach until the goal is achieved or a termination condition is met.
# Pattern: Agent Loop (ReAct / Think-Act-Observe)
# Complexity: High | Latency: High | Cost: High
# The agent iterates until the goal is achieved
# pip install langgraph langchain-openai
import os
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
import operator
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
class AgentLoopState(TypedDict):
messages: Annotated[list, operator.add]
goal: str
plan: str
iteration: int
completed: bool
llm = ChatOpenAI(model="gpt-4", temperature=0)
def think_node(state: AgentLoopState) -> dict:
"""Agent thinks about what to do next."""
messages = [
SystemMessage(content=f"""You are working on: {state['goal']}
Current plan: {state.get('plan', 'None')}
Iteration: {state['iteration']}
Analyze the current situation and decide:
1. What has been accomplished so far?
2. What remains to be done?
3. What is the next specific action?
4. Should we FINISH? (say GOAL_COMPLETE if yes)"""),
*state["messages"][-5:]
]
response = llm.invoke(messages)
completed = "GOAL_COMPLETE" in response.content
return {
"messages": [response],
"completed": completed,
"iteration": state["iteration"] + 1
}
def act_node(state: AgentLoopState) -> dict:
"""Agent takes an action based on its thinking."""
last_thought = state["messages"][-1].content
messages = [
SystemMessage(content="Execute the next action from "
"the plan. Provide concrete output."),
HumanMessage(content=f"Action to take: {last_thought}")
]
response = llm.invoke(messages)
return {"messages": [response]}
def should_continue(state: AgentLoopState) -> str:
if state.get("completed", False):
return "end"
if state.get("iteration", 0) >= 10:
return "end"
return "act"
# Build agent loop graph
loop = StateGraph(AgentLoopState)
loop.add_node("think", think_node)
loop.add_node("act", act_node)
loop.set_entry_point("think")
loop.add_conditional_edges("think", should_continue, {
"act": "act",
"end": END
})
loop.add_edge("act", "think") # Loop back
agent_loop = loop.compile()
3.2 Planning Pattern
The agent first creates an explicit plan (a sequence of steps), then executes the plan step-by-step, re-planning when necessary. This separates strategic thinking from tactical execution.
# Pattern: Planning (Plan-and-Execute)
# Complexity: High | Latency: High | Cost: High
# Separates planning from execution for better results
# pip install langgraph langchain-openai
import os
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser
# Requires OPENAI_API_KEY environment variable
# export OPENAI_API_KEY="sk-..."
class PlanState(TypedDict):
goal: str
plan: list # List of step descriptions
current_step: int
step_results: dict
needs_replan: bool
llm = ChatOpenAI(model="gpt-4", temperature=0)
def planner_node(state: PlanState) -> dict:
"""Create or revise the plan."""
if state.get("needs_replan", False):
prompt = f"""The original plan failed at step
{state['current_step']}. Results so far:
{state['step_results']}
Revise the plan for goal: {state['goal']}
Return a JSON list of remaining steps."""
else:
prompt = f"""Create a step-by-step plan to achieve:
{state['goal']}
Return a JSON list of 3-7 specific, actionable steps.
Example: ["Research topic X", "Write outline", ...]"""
response = llm.invoke(prompt)
# Parse JSON list from response
import json
try:
plan = json.loads(response.content)
except json.JSONDecodeError:
plan = [response.content]
return {
"plan": plan,
"current_step": 0,
"needs_replan": False
}
def executor_node(state: PlanState) -> dict:
"""Execute the current step of the plan."""
step_idx = state["current_step"]
step = state["plan"][step_idx]
response = llm.invoke(
f"Execute this step thoroughly: {step}\n"
f"Previous results: {state.get('step_results', {})}"
)
results = state.get("step_results", {})
results[f"step_{step_idx}"] = response.content
return {
"step_results": results,
"current_step": step_idx + 1
}
def check_progress(state: PlanState) -> str:
if state["current_step"] >= len(state["plan"]):
return "done"
return "execute"
# Build plan-and-execute graph
plan_graph = StateGraph(PlanState)
plan_graph.add_node("planner", planner_node)
plan_graph.add_node("executor", executor_node)
plan_graph.set_entry_point("planner")
plan_graph.add_edge("planner", "executor")
plan_graph.add_conditional_edges("executor", check_progress, {
    "execute": "executor",
    "done": END
})
# Note: to make the needs_replan branch reachable, detect step
# failures in executor_node and route back to "planner" via a
# conditional edge; omitted here for brevity
plan_execute = plan_graph.compile()
3.3 Multi-Agent Orchestration Pattern
Multiple specialized agents collaborate on a task, coordinated by a supervisor or through peer-to-peer communication. This pattern was covered extensively in Part 10, but here we place it in the context of the full pattern catalog.
# Pattern: Multi-Agent Orchestration
# Complexity: Very High | Latency: Very High | Cost: High
# See Part 10 for detailed implementations
# Summary of multi-agent sub-patterns:
MULTI_AGENT_PATTERNS = {
"supervisor": {
"description": "Central coordinator routes to specialists",
"agents": "3-8",
"best_for": "Quality-gated pipelines, dev workflows",
"framework": "LangGraph (best), CrewAI, AutoGen"
},
"swarm": {
"description": "Autonomous agents with handoff protocols",
"agents": "2-6",
"best_for": "Customer service routing, triage",
"framework": "LangGraph, OpenAI Swarm"
},
"debate": {
"description": "Adversarial agents improve via argumentation",
"agents": "2-4",
"best_for": "Analysis, evaluation, high-stakes decisions",
"framework": "LangGraph, AutoGen"
},
"hierarchical": {
"description": "Tree of managers delegating to workers",
"agents": "5-20+",
"best_for": "Large projects, enterprise automation",
"framework": "CrewAI (hierarchical process), LangGraph"
}
}
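A catalog like this can double as a simple selection aid. The helper below re-declares a trimmed copy of the dictionary so the snippet stands alone; the keyword matching is purely illustrative:

```python
# Trimmed copy of the catalog so this snippet is self-contained
CATALOG = {
    "supervisor": {"best_for": "Quality-gated pipelines, dev workflows"},
    "swarm": {"best_for": "Customer service routing, triage"},
    "debate": {"best_for": "Analysis, evaluation, high-stakes decisions"},
}

def patterns_for(keyword: str) -> list[str]:
    """Return sub-pattern names whose best_for field mentions keyword."""
    kw = keyword.lower()
    return [name for name, meta in CATALOG.items()
            if kw in meta["best_for"].lower()]
```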
4. Pattern Selection Guide
Choosing the right pattern and framework is the most impactful architectural decision you will make. This section provides a systematic approach to pattern selection.
4.1 Framework-Pattern Matrix
This matrix shows which framework is best suited for each design pattern:
| Pattern | LangChain | LangGraph | CrewAI | n8n | Zapier |
|---|---|---|---|---|---|
| Prompt-Response | Excellent | Overkill | Overkill | Good (AI node) | Good (AI action) |
| RAG | Excellent | Good | Limited | Good (with plugins) | Limited |
| Tool Use | Excellent | Excellent | Good | Excellent (native) | Excellent (native) |
| Chat + Memory | Excellent | Excellent | Good | Good | Limited |
| Document QA | Excellent | Good | Limited | Good | Limited |
| Workflow Automation | Good | Excellent | Good | Excellent | Excellent |
| Agent Loop | Good | Excellent | Limited | Limited | Not supported |
| Planning | Limited | Excellent | Good (planning=True) | Not supported | Not supported |
| Multi-Agent | Limited | Excellent | Excellent | Limited | Not supported |
4.2 Decision Tree
Follow this decision tree to select the right pattern for your use case:
# AI Pattern Selection Decision Tree
START: What does your application need to do?
|
|-- Single LLM call sufficient?
| |-- YES: Do you need external data?
| | |-- NO --> Prompt-Response (LangChain/direct API)
| | |-- YES --> RAG Pattern (LangChain + vector DB)
| |-- NO: Continue...
|
|-- Does it need conversation history?
| |-- YES: Is it over a document corpus?
| | |-- YES --> Document QA (LangChain)
| | |-- NO --> Chat + Memory (LangChain)
| |-- NO: Continue...
|
|-- Does it need to take actions (APIs, DB, files)?
| |-- YES: Is it a single action or a chain?
| | |-- Single --> Tool Use (LangChain)
| | |-- Chain --> Workflow Automation (LangGraph/n8n)
| |-- NO: Continue...
|
|-- Does it need autonomous multi-step reasoning?
| |-- YES: Does it need explicit planning?
| | |-- YES --> Planning Pattern (LangGraph)
| | |-- NO --> Agent Loop (LangGraph)
| |-- NO: Continue...
|
|-- Does it need multiple specialized agents?
| |-- YES --> Multi-Agent Orchestration (LangGraph/CrewAI)
|
|-- Is the user non-technical?
| |-- YES --> n8n (self-hosted) or Zapier (cloud)
|
|-- None of the above?
--> Start with Prompt-Response and iterate
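The tree's outcomes can be encoded as a small helper function. The flag names below are illustrative, and the branches are checked from most to least complex, which yields the same results as walking the tree top-down:

```python
from dataclasses import dataclass

@dataclass
class Needs:
    """Capability flags for the application (illustrative names)."""
    external_data: bool = False
    conversation_history: bool = False
    document_corpus: bool = False
    takes_actions: bool = False
    action_chain: bool = False
    autonomous_reasoning: bool = False
    explicit_planning: bool = False
    multiple_agents: bool = False

def select_pattern(n: Needs) -> str:
    # Mirror the tree's branches, most complex needs first
    if n.multiple_agents:
        return "Multi-Agent Orchestration"
    if n.autonomous_reasoning:
        return "Planning" if n.explicit_planning else "Agent Loop"
    if n.takes_actions:
        return "Workflow Automation" if n.action_chain else "Tool Use"
    if n.conversation_history:
        return "Document QA" if n.document_corpus else "Chat + Memory"
    if n.external_data:
        return "RAG"
    return "Prompt-Response"
```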
The Golden Rule of Pattern Selection: Always start with the simplest pattern that could possibly work. Upgrade to a more complex pattern only when you hit a specific limitation. A well-crafted prompt-response can often replace a complex agent loop at 1/10th the cost and latency.
Framework Selection Summary
| If you need... | Use this framework | Why |
|---|---|---|
| RAG, chains, basic agents | LangChain | Best ecosystem for retrieval and chain composition |
| Complex agents, stateful workflows, cycles | LangGraph | Graph-based architecture handles any topology |
| Role-based multi-agent teams | CrewAI | Intuitive team metaphor, task dependencies |
| No-code visual workflow (self-hosted) | n8n | Visual builder, 400+ integrations, AI nodes |
| No-code workflow (cloud, simplest) | Zapier | Easiest setup, 6000+ app integrations |
5. Anti-Patterns
Anti-patterns are common mistakes that seem reasonable but lead to poor outcomes. Recognizing them saves time, money, and frustration.
5.1 Common Mistakes
| Anti-Pattern | What It Looks Like | Why It Fails | Better Approach |
|---|---|---|---|
| The God Prompt | One massive prompt that handles all logic, edge cases, and formatting | Exceeds context limits, becomes fragile, impossible to debug | Break into chains: classify -> route -> handle -> format |
| Premature Agent-ification | Using a ReAct agent loop for what a simple chain could do | 10x more tokens, unpredictable behavior, much higher latency | Start with prompt-response, add agent only when needed |
| RAG Everything | Putting all data into a vector DB even when not needed | Retrieval noise degrades answers, embedding costs accumulate | Use RAG only for large, dynamic knowledge bases. Small static data goes in the prompt. |
| Infinite Agent Loop | No max iterations or termination condition on agent loops | Runaway costs, hangs, and no useful output | Always set max_iterations. Add explicit DONE/FINISH signals. |
| Memory Bloat | Storing entire conversation history in context forever | Context window overflow, irrelevant old context pollutes responses | Use summary memory, sliding window, or vector memory for long conversations |
| Framework Overload | Using LangChain + LangGraph + CrewAI + LlamaIndex in one project | Dependency conflicts, debugging nightmare, team confusion | Pick one primary framework. Add others only for specific capabilities. |
| Tool Explosion | Giving an agent access to 50+ tools | LLM cannot reliably choose from too many options, tool descriptions bloat context | Limit to 5-10 tools per agent. Use a tool router if you need more. |
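As a concrete mitigation for Memory Bloat, a sliding-window trim takes only a few lines of plain Python. The message format and budget numbers here are illustrative; frameworks provide equivalents such as summary memory:

```python
def trim_history(messages: list[dict], max_messages: int = 20,
                 keep_system: bool = True) -> list[dict]:
    """Keep the system prompt plus only the most recent turns."""
    if keep_system and messages and messages[0].get("role") == "system":
        system, rest = messages[:1], messages[1:]
    else:
        system, rest = [], messages
    return system + rest[-max_messages:]

# Build a long conversation, then trim it before each LLM call
history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(50)]
trimmed = trim_history(history, max_messages=10)
```

A token-budget variant (summing per-message token counts instead of counting messages) follows the same shape.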
5.2 Pattern Smells
These signs indicate you may be using the wrong pattern:
# Pattern smell detection checklist
PATTERN_SMELLS = {
"High latency for simple tasks": {
"likely_cause": "Using Agent Loop for Prompt-Response tasks",
"fix": "Downgrade to a simpler pattern",
"metric": "If avg response > 10s for classification tasks"
},
"Inconsistent outputs": {
"likely_cause": "Missing structured output parsing",
"fix": "Add Pydantic output parser or function calling",
"metric": "If JSON parse failures > 5% of requests"
},
"Context window errors": {
"likely_cause": "Memory Bloat or God Prompt anti-pattern",
"fix": "Implement summary memory or break prompt into chains",
"metric": "If token count regularly exceeds 80% of limit"
},
"Agent loops without progress": {
"likely_cause": "Infinite Agent Loop anti-pattern",
"fix": "Add progress tracking and termination conditions",
"metric": "If agent takes > 5 iterations for simple tasks"
},
"High cost per query": {
"likely_cause": "Premature Agent-ification",
"fix": "Audit each agent step - can it be a simple chain?",
"metric": "If cost/query > $0.10 for routine operations"
},
"Retrieval returns irrelevant docs": {
"likely_cause": "Poor chunking, wrong embedding model, or "
"RAG Everything anti-pattern",
"fix": "Tune chunk size, use re-ranking, audit what needs RAG",
"metric": "If retrieval relevance score < 0.7 average"
}
}
The Most Expensive Anti-Pattern: Building a multi-agent system when a well-crafted prompt would suffice. A 5-agent system costs 5-20x more per query than a single prompt-response. Always justify the added complexity with measurable quality improvements.
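To see why, a back-of-the-envelope cost model helps. All token counts and prices below are made-up placeholders, not real pricing; the point is that per-query cost scales with the number of LLM calls times the context each call re-reads:

```python
def cost_per_query(llm_calls: int, avg_input_tokens: int,
                   avg_output_tokens: int,
                   input_price_per_1k: float,
                   output_price_per_1k: float) -> float:
    """Rough cost: calls x (token volume x price per 1K tokens)."""
    per_call = (avg_input_tokens / 1000 * input_price_per_1k
                + avg_output_tokens / 1000 * output_price_per_1k)
    return llm_calls * per_call

# Single prompt-response vs. a 5-agent pipeline that makes ~10 LLM
# calls and re-reads shared context on every call (made-up numbers)
single = cost_per_query(1, 800, 300, 0.01, 0.03)
multi = cost_per_query(10, 1500, 500, 0.01, 0.03)
ratio = multi / single
```

With these placeholder numbers the multi-agent pipeline lands near the top of the 5-20x range, before accounting for retries or reflection steps.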
6. Exercises & Self-Assessment
Exercise 1: Pattern Identification
Identify which design pattern(s) each of these real-world applications uses:
- ChatGPT — What patterns does the free version use? What about ChatGPT Plus with plugins?
- GitHub Copilot — Is it prompt-response, RAG, or something more complex?
- Notion AI — What patterns power its "Ask AI" feature vs its "Write with AI" feature?
- Perplexity AI — How does it combine search with generation?
- Cursor IDE — What pattern enables its multi-file code editing capability?
Exercise 2: Implement Three Patterns
Build these three applications, each using a different pattern:
- Prompt-Response: An email subject line generator that produces 5 variants from an email body
- RAG: A FAQ bot that answers questions from a set of at least 20 FAQ documents
- Agent Loop: A research agent that searches the web, evaluates sources, and writes a summary
Compare: lines of code, tokens per query, latency, and output quality.
Exercise 3: Anti-Pattern Audit
Review this hypothetical AI application and identify all anti-patterns:
- A customer support bot that uses a ReAct agent loop for every query (including "What are your hours?")
- It stores the entire conversation history (no limit) in the prompt
- It has 35 tools available including a calculator, web search, database query, email sender, and 31 others
- All company data (50,000 docs) is in one vector store with 2048-token chunks
- It uses LangChain + LlamaIndex + CrewAI + a custom framework simultaneously
For each anti-pattern, explain the problem and recommend a fix.
Exercise 4: Framework Selection Challenge
For each scenario, recommend a framework AND a design pattern. Justify both choices:
- A legal firm wants to search across 100,000 contracts and answer natural language questions with source citations
- A marketing team (non-technical) wants AI to auto-generate social posts from blog articles
- A DevOps team wants an AI that autonomously investigates production alerts, checks logs, and suggests fixes
- A startup wants to add an AI chatbot to their SaaS product that remembers previous conversations
- A research institute wants multiple AI perspectives to debate policy recommendations
Exercise 5: Reflective Questions
- Why is the "start simple, upgrade when needed" principle so important for AI applications specifically? How does it differ from traditional software development?
- How do you quantify when a prompt-response pattern is "not good enough" and you need to upgrade to RAG or an agent?
- What would a "design patterns" library for AI applications look like? How is it different from traditional GoF patterns?
- Can you combine anti-patterns to create something worse than each individually? Give an example.
- If you were building an AI application framework from scratch, which pattern would you make the easiest to implement and why?
Conclusion & Next Steps
You now have a complete catalog of AI application design patterns and a systematic framework for choosing the right one. Here are the key takeaways from Part 11:
- Core patterns — Prompt-Response, RAG, and Tool Use are the fundamental building blocks that every AI developer must master
- Intermediate patterns — Chat + Memory, Document QA, and Workflow Automation combine core patterns with state management and persistence for production-grade applications
- Advanced patterns — Agent Loops, Planning, and Multi-Agent Orchestration enable autonomous, multi-step reasoning but come with significantly higher cost and complexity
- Framework-pattern matrix — LangChain excels at RAG and chains, LangGraph at complex agents, CrewAI at multi-agent teams, n8n/Zapier at no-code workflow automation
- Start simple — The golden rule is to always start with the simplest pattern that could work, then upgrade only when you hit a specific limitation
- Anti-patterns — The God Prompt, Premature Agent-ification, RAG Everything, Infinite Agent Loop, Memory Bloat, Framework Overload, and Tool Explosion are the most common and costly mistakes
Next in the Series
In Part 12: Ecosystem & Frameworks, we provide a comprehensive deep-dive comparison of every major framework and tool in the AI application ecosystem — LangChain, LangGraph, AutoGen, CrewAI, LlamaIndex, n8n, Zapier, HuggingFace, MCP, vLLM, vector databases, and more.
Continue the Series
Part 12: Ecosystem & Frameworks
Comprehensive comparison of LangChain, LangGraph, AutoGen, CrewAI, LlamaIndex, n8n, Zapier, and the entire AI ecosystem.
Part 15: Evaluation & LLMOps
Master prompt evaluation, tracing, LangSmith, experiment tracking, and the operational side of AI systems.
Part 10: Multi-Agent Systems
Compare AutoGen vs CrewAI vs LangGraph multi-agent architectures with supervisor, swarm, debate, and hierarchical patterns.