LangChain SDK Track Part 3: Memory & State Management

May 22, 2026 Wasil Zafar 35 min read

Master LangChain's conversation memory classes: ConversationBufferMemory, ConversationBufferWindowMemory, ConversationSummaryMemory, VectorStoreRetrieverMemory, and ConversationEntityMemory. Build stateful multi-turn conversation applications with persistent context, and learn when to use each memory type.

Table of Contents

  1. Buffer Memory
  2. Summary Memory
  3. Vector Memory
  4. Entity Memory
  5. Persistence Backends
  6. Production Patterns
  7. Summary & Next Steps
What You’ll Learn: Memory gives your LangChain applications the ability to remember past interactions — previous messages in a conversation, learned user preferences, or accumulated context across sessions. Without memory, every request starts from zero. This article covers the memory spectrum: from simple buffer memory (store everything) to summary memory (compress old context) to persistent storage (survive restarts). Think of it like the difference between a goldfish (no memory) and a personal assistant who knows your history.

1. Buffer Memory

SDK Track Note: This is the LangChain SDK Track — a hands-on companion to Foundation Track Part 6 (Memory & Context Engineering). Read that article first for the underlying concepts.

1.1 ConversationBufferMemory

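# Note: ConversationChain and the langchain.memory classes are legacy APIs,
# deprecated in the LangChain 0.2/0.3 releases. If these imports fail on
# LangChain 1.x, the same classes should be importable from langchain_classic.
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain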

model = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# Stores the full conversation history
memory = ConversationBufferMemory(return_messages=True)

chain = ConversationChain(llm=model, memory=memory, verbose=True)

# Multi-turn conversation
response1 = chain.invoke({"input": "Hi! I'm working on a RAG system."})
print(response1["response"])

response2 = chain.invoke({"input": "What embedding model would you recommend?"})
print(response2["response"])

# Memory contains full history
print(memory.load_memory_variables({}))
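
To make "store everything" concrete, here is a minimal plain-Python sketch of the idea behind buffer memory. It is not LangChain's implementation: the class name `SimpleBufferMemory`, the `rough_size` helper, and the word-count proxy for tokens are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleBufferMemory:
    """Hypothetical stand-in for a full-history buffer (not a LangChain class)."""

    messages: list = field(default_factory=list)

    def save_exchange(self, user_text: str, ai_text: str) -> None:
        # Every exchange is appended verbatim; nothing is ever dropped.
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))

    def render(self) -> str:
        # The whole history is replayed into the prompt on every turn.
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


def rough_size(text: str) -> int:
    # Crude proxy for prompt size: whitespace-separated words, not real tokens.
    return len(text.split())


memory = SimpleBufferMemory()
sizes = []
for turn in range(1, 6):
    memory.save_exchange(f"question {turn}", f"answer {turn}")
    sizes.append(rough_size(memory.render()))

# The replayed history grows by the same amount on every turn.
print(sizes)
```

Because the rendered history grows with every exchange, a long conversation eventually hits the model's context limit or becomes expensive to send, which is what the window and summary variants address.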

1.2 ConversationBufferWindowMemory

from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

model = ChatOpenAI(model="gpt-4o-mini")

# Keep only the last k exchanges
memory = ConversationBufferWindowMemory(k=5, return_messages=True)

chain = ConversationChain(llm=model, memory=memory)

# After 6+ exchanges, oldest messages are dropped
for i in range(8):
    chain.invoke({"input": f"Message number {i+1}"})

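# Only the last 5 exchanges remain in memory. With return_messages=True that is
# 10 message objects (one human message plus one AI message per exchange).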
history = memory.load_memory_variables({})
print(f"Messages in memory: {len(history['history'])}")
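
The sliding-window behaviour can be sketched in plain Python with a bounded deque. This is an illustration under stated assumptions, not LangChain's code: `SimpleWindowMemory` and the canned "Reply number N" responses are hypothetical, and one exchange is taken to mean one human message plus one AI message.

```python
from collections import deque


class SimpleWindowMemory:
    """Hypothetical sliding-window buffer (not a LangChain class)."""

    def __init__(self, k: int) -> None:
        # One exchange = one human message + one AI message, so keep 2 * k messages.
        self.k = k
        self.messages = deque(maxlen=2 * k)

    def save_exchange(self, user_text: str, ai_text: str) -> None:
        # A deque with maxlen silently evicts the oldest items once it is full.
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))

    def load(self) -> list:
        return list(self.messages)


window = SimpleWindowMemory(k=5)
for i in range(8):
    window.save_exchange(f"Message number {i + 1}", f"Reply number {i + 1}")

kept = window.load()
print(len(kept))   # number of message objects, i.e. 2 * k
print(kept[0])     # oldest message that survived eviction
```

After eight exchanges with k=5, exchanges 1 to 3 are gone and the window starts at "Message number 4". The trade-off is that anything said before the window is forgotten completely, however important it was.
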
Real-World Application

Personalized Shopping Assistant

An e-commerce platform uses LangChain memory to create personalized shopping experiences: the assistant remembers past purchases, size preferences, style tastes, and budget constraints across sessions. After 3 interactions, recommendations become significantly more relevant. Result: 35% increase in conversion rate for returning users.

Persistent Memory, Personalization, E-Commerce

2. Summary Memory

2.1 ConversationSummaryMemory

from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationChain

model = ChatOpenAI(model="gpt-4o-mini")

# Maintains a running summary instead of raw messages
memory = ConversationSummaryMemory(llm=model, return_messages=True)

chain = ConversationChain(llm=model, memory=memory)

chain.invoke({"input": "I'm building a customer support chatbot for an e-commerce platform."})
chain.invoke({"input": "We handle about 10,000 queries per day across 3 languages."})
chain.invoke({"input": "Main issues are order tracking, returns, and product questions."})

# Memory contains a summary, not raw messages
summary = memory.load_memory_variables({})
print("Summary:", summary)
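
The running-summary idea can be sketched without an LLM by passing the summarizer in as a function. Everything here is an assumption made for illustration: `SimpleSummaryMemory` is a hypothetical class, and `toy_summarizer` is a stand-in that keeps the first four words of each human message where a real setup would call a model.

```python
from typing import Callable

# (previous_summary, new_lines) -> updated_summary
Summarizer = Callable[[str, str], str]


class SimpleSummaryMemory:
    """Hypothetical running-summary memory (not a LangChain class)."""

    def __init__(self, summarize: Summarizer) -> None:
        self.summarize = summarize
        self.summary = ""
        self.summarizer_calls = 0

    def save_exchange(self, user_text: str, ai_text: str) -> None:
        # Fold only the newest exchange into the existing summary;
        # the raw messages themselves are not kept.
        new_lines = f"Human: {user_text}\nAI: {ai_text}"
        self.summary = self.summarize(self.summary, new_lines)
        self.summarizer_calls += 1

    def load(self) -> dict:
        return {"history": self.summary}


def toy_summarizer(previous: str, new_lines: str) -> str:
    # Stand-in for an LLM call: keep the first four words of the human line.
    human_line = new_lines.splitlines()[0].split(": ", 1)[1]
    gist = " ".join(human_line.split()[:4])
    return f"{previous}; {gist}" if previous else gist


memory = SimpleSummaryMemory(toy_summarizer)
memory.save_exchange(
    "I'm building a customer support chatbot for an e-commerce platform.",
    "Sounds good.",
)
memory.save_exchange(
    "We handle about 10,000 queries per day across 3 languages.",
    "Noted.",
)

print(memory.load())
print(memory.summarizer_calls)
```

The sketch makes the cost model visible: the summarizer runs once per saved exchange, so with a real model each turn pays for an extra call in return for a prompt that stays short.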

2.2 ConversationSummaryBufferMemory

from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains import ConversationChain

model = ChatOpenAI(model="gpt-4o-mini")

# Hybrid: keeps recent messages raw, summarizes older ones
memory = ConversationSummaryBufferMemory(
    llm=model,
    max_token_limit=650,  # Summarize when buffer exceeds this
    return_messages=True
)

chain = ConversationChain(llm=model, memory=memory)

# The most recent messages stay verbatim; once the buffer exceeds
# max_token_limit, the oldest ones are folded into the running summary
for msg in ["Hi, I need help with deployment", "We use AWS ECS", "Running 3 services", "Need auto-scaling", "Budget is $500/month"]:
    chain.invoke({"input": msg})

print(memory.load_memory_variables({}))
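
The hybrid strategy (recent messages raw, older ones summarized) can be sketched as a prune loop that runs after every save. This is a simplified illustration, not LangChain's implementation: `SimpleSummaryBufferMemory`, `word_count`, and `toy_summarize` are hypothetical, words stand in for tokens, and the AI replies are a fixed "ok".

```python
from typing import Callable


def word_count(messages: list) -> int:
    # Assumption: whitespace-separated words as a cheap proxy for tokens.
    return sum(len(m.split()) for m in messages)


def toy_summarize(previous: str, pruned: list) -> str:
    # Stand-in for an LLM call: record the text of the pruned human turns.
    topics = [m.split(": ", 1)[1] for m in pruned if m.startswith("Human: ")]
    parts = ([previous] if previous else []) + topics
    return " | ".join(parts)


class SimpleSummaryBufferMemory:
    """Hypothetical summary + buffer hybrid (not a LangChain class)."""

    def __init__(self, max_tokens: int, summarize: Callable[[str, list], str]) -> None:
        self.max_tokens = max_tokens
        self.summarize = summarize
        self.summary = ""
        self.buffer = []

    def save_exchange(self, user_text: str, ai_text: str) -> None:
        self.buffer.append(f"Human: {user_text}")
        self.buffer.append(f"AI: {ai_text}")
        self._prune()

    def _prune(self) -> None:
        # Pop the oldest messages until the raw buffer fits the budget,
        # then fold everything that was popped into the running summary.
        pruned = []
        while self.buffer and word_count(self.buffer) > self.max_tokens:
            pruned.append(self.buffer.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)


memory = SimpleSummaryBufferMemory(max_tokens=12, summarize=toy_summarize)
for msg in ["Hi, I need help with deployment", "We use AWS ECS",
            "Running 3 services", "Need auto-scaling", "Budget is $500/month"]:
    memory.save_exchange(msg, "ok")

print("Summary:", memory.summary)
print("Buffer:", memory.buffer)
```

With this small budget, the first three user messages end up in the summary while the last two exchanges remain verbatim in the buffer. Raising `max_tokens` keeps more raw context at the cost of a larger prompt.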

Summary & Next Steps

This completes the LangChain SDK implementation for the concepts covered in Part 6: Memory & Context Engineering.

Try It Yourself: Build a ‘personal journaling assistant’ with 3 memory types: (1) ConversationBufferMemory for the current session, (2) ConversationSummaryMemory that compresses after 10 messages, (3) persistent memory using Redis that survives restarts. Have a 15-message conversation, then restart the app and verify it remembers key facts from the previous session.
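
As a starting point for step (3), the "survives restarts" requirement can be sketched with any durable store keyed by a session id. The example below deliberately swaps Redis for the standard-library `sqlite3` module so it runs without a server; `SQLiteMessageStore`, the table layout, and the session ids are hypothetical, and a Redis-backed chat history class would take its place in the actual exercise.

```python
import os
import sqlite3
import tempfile


class SQLiteMessageStore:
    """Hypothetical persistent message store keyed by session id."""

    def __init__(self, path: str, session_id: str) -> None:
        self.path = path
        self.session_id = session_id
        self._run(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def _run(self, sql: str, params: tuple = ()) -> list:
        conn = sqlite3.connect(self.path)
        try:
            with conn:  # commit on success, roll back on error
                return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    def add_message(self, role: str, content: str) -> None:
        self._run(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (self.session_id, role, content),
        )

    def messages(self) -> list:
        # Return this session's messages in insertion order.
        return self._run(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        )


db_path = os.path.join(tempfile.mkdtemp(), "journal.db")

store = SQLiteMessageStore(db_path, session_id="journal-1")
store.add_message("human", "My dog is called Biscuit.")
store.add_message("ai", "Nice to meet Biscuit!")
del store  # simulate the app shutting down

# "Restart": a fresh object pointed at the same file sees the old messages.
restored = SQLiteMessageStore(db_path, session_id="journal-1")
other = SQLiteMessageStore(db_path, session_id="journal-2")
print(restored.messages())
print(other.messages())
```

Keying every row by session id is what lets one store serve many users; on startup the app reloads the rows for the current session and feeds them back into whichever memory class it uses.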

Next in the LangChain SDK Track

Continue with LC Part 4: Agents & Tools.