Introduction
In the late 1940s, Toyota was a struggling car manufacturer with limited capital and a devastated post-war economy. Out of necessity, Taiichi Ohno and Shigeo Shingo developed the Toyota Production System (TPS) — a radically different approach to manufacturing that eliminated waste, respected workers, and achieved extraordinary quality at scale.
More than fifty years later, Mary and Tom Poppendieck published Lean Software Development: An Agile Toolkit (2003), translating Toyota's principles into the software domain. Their insight was profound: software development shares the same fundamental challenges as manufacturing, namely variability, queues, feedback delays, and the temptation to overproduce.
Why LEAN Thinking Transforms Delivery
Most process frameworks (Scrum, SAFe, XP) prescribe practices. LEAN is different — it provides thinking tools. Rather than saying "have a daily standup," LEAN asks: "Where is the waste in your system? What is preventing flow?" This makes LEAN universally applicable, whether you run Scrum sprints, Kanban boards, or something entirely custom.
The core LEAN question is deceptively simple: What activities add value from the customer's perspective, and what activities do not? Everything that does not directly contribute to delivering customer value is waste — and waste should be eliminated or minimised.
The Seven Principles of Lean Software Development
The Poppendiecks distilled Toyota's philosophy into seven principles specifically adapted for software teams. These principles form a coherent system — they reinforce each other and create compounding benefits when applied together.
1. Eliminate Waste
The foundational principle. Waste (Japanese: muda) is anything that does not add value from the customer's perspective. In manufacturing, waste is visible — scrap metal on the floor, parts waiting in inventory. In software, waste is invisible — half-finished features in a branch, meetings that produce no decisions, handoff documents nobody reads.
The first step is learning to see waste. Most teams are so accustomed to their waste that it becomes invisible. Value stream mapping (covered in Section 4) makes waste visible and quantifiable.
2. Amplify Learning
Software development is fundamentally a learning process, not a production process. You are not assembling known components — you are discovering what the right solution looks like. This means:
- Short feedback cycles so you learn quickly whether you are on the right track
- Iterative development to refine understanding through building
- Pair programming and code reviews to spread knowledge
- Retrospectives to learn from experience
- Spikes and prototypes to reduce uncertainty before commitment
3. Decide as Late as Possible
Irreversible decisions made with incomplete information are expensive. LEAN advocates deferring commitment until the "last responsible moment" — the point at which not deciding becomes more costly than deciding with imperfect information.
This is not procrastination. It is strategic delay — keeping options open while gathering information. Examples:
- Choose your database after understanding access patterns (not before writing the first line of code)
- Decide on microservices vs monolith after understanding team boundaries
- Defer UI framework choice until user research clarifies interaction patterns
4. Deliver as Fast as Possible
Speed is not about working harder — it is about reducing cycle time. The faster you deliver, the sooner you get feedback, the less inventory accumulates, and the more responsive you are to change. Speed comes from:
- Eliminating queues and wait times
- Reducing batch sizes
- Automating repetitive work
- Removing handoffs between teams
5. Empower the Team
Toyota learned that the people closest to the work are the best positioned to improve it. In software, this means:
- Developers choose their tools and approaches
- Teams own their delivery pipeline end-to-end
- Decisions are pushed down to the lowest level with sufficient context
- Managers create conditions for success rather than directing work
6. Build Integrity In
Quality is not inspected into a product — it is built in from the start. In manufacturing, this means designing the process so defects cannot occur (poka-yoke). In software, this means:
- Test-driven development (TDD) — tests before code
- Continuous integration — catch problems immediately
- Refactoring — maintain conceptual integrity as the system grows
- Automated quality gates — prevent bad code from progressing
7. Optimize the Whole
Local optimisation often causes global degradation. A team that optimises its own throughput by batching large PRs may slow down every other team that depends on their changes. LEAN thinking requires systems thinking — optimising the entire value stream, not individual stations.
The Sub-Optimisation Trap
A platform team measured their success by "number of features shipped." They shipped 40 features in a quarter — a record. But downstream teams could only consume 12 of those features because the documentation was incomplete and APIs were inconsistent. The platform team optimised their throughput at the expense of system throughput. LEAN thinking would measure the platform team by features successfully adopted by consumers, not features shipped.
The Seven Wastes of Software
The Poppendiecks mapped Toyota's seven wastes of manufacturing to their software equivalents. Learning to recognise these wastes is the first step toward eliminating them.
| Manufacturing Waste | Software Equivalent | Examples | Impact |
|---|---|---|---|
| Inventory | Partially Done Work | Unmerged branches, undeployed features, half-written specs | Becomes stale, merge conflicts, delayed feedback |
| Over-production | Extra Features | Gold-plating, "just in case" features, unused configuration options | Maintenance burden, complexity, wasted effort |
| Extra Processing | Relearning | Lost knowledge, poor documentation, team member departure without handover | Repeated mistakes, slow onboarding, duplicated effort |
| Transportation | Handoffs | Dev → QA → Ops transitions, requirements thrown over walls, approval chains | Information loss, delays, context switching |
| Motion | Task Switching | Context switching between projects, interrupt-driven work, multi-tasking | Cognitive load, reduced focus, lower quality |
| Waiting | Delays | Waiting for code review, waiting for approvals, waiting for environments | Blocked developers, slow lead time, frustration |
| Defects | Defects | Bugs found in production, rework, misunderstood requirements | Rework, customer impact, firefighting |
```mermaid
mindmap
  root((Seven Wastes))
    Partially Done Work
      Unmerged branches
      Feature flags never removed
      Specs without implementation
    Extra Features
      Gold-plating
      Unused config options
      Premature abstraction
    Relearning
      Lost tribal knowledge
      No documentation
      Repeated mistakes
    Handoffs
      Dev to QA walls
      Approval chains
      Ticket ping-pong
    Task Switching
      Multiple projects
      Interrupt-driven culture
      Slack/email overload
    Delays
      Waiting for review
      Environment provisioning
      Dependency on other teams
    Defects
      Production bugs
      Misunderstood requirements
      Integration failures
```
How to Identify Waste in Your Team
Waste identification requires observation and measurement. Here are practical techniques:
- Walk the board: For every item on your Kanban board, ask "Is someone actively working on this right now?" Items that are not being worked on are partially done work waste.
- Measure wait time: Track how long work items spend in each column. If items spend 3 days "In Review" but only 30 minutes of actual review time, you have 2.98 days of delay waste.
- Count handoffs: How many people must touch a feature between "idea" and "production"? Each handoff loses information and adds delay.
- Ask "Who uses this?": For every feature, report, or meeting — who actually uses the output? Features nobody uses are extra features waste.
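The "measure wait time" check is easy to automate if your board can export when an item entered each column. The sketch below assumes a hypothetical history format of `(column, entered_at)` pairs. Boards rarely record active effort, so the 30 minutes of real review work is supplied by hand. It reproduces the 3-days-in-review example using calendar days.

```python
from datetime import datetime, timedelta


def time_in_columns(transitions):
    """transitions: (column_name, entered_at) pairs in chronological order.

    Returns {column: timedelta} for every column the item has already left.
    """
    durations = {}
    for (column, entered), (_, left) in zip(transitions, transitions[1:]):
        durations[column] = durations.get(column, timedelta()) + (left - entered)
    return durations


# Hypothetical ticket history, e.g. reconstructed from a board's audit log
history = [
    ("In Dev", datetime(2026, 3, 2, 9, 0)),
    ("In Review", datetime(2026, 3, 4, 9, 0)),
    ("Testing", datetime(2026, 3, 7, 9, 0)),
    ("Done", datetime(2026, 3, 8, 9, 0)),
]

durations = time_in_columns(history)
review_elapsed = durations["In Review"]
review_active = timedelta(minutes=30)  # assumption: actual review effort, reported by the reviewer
delay = review_elapsed - review_active
print(f"In Review for {review_elapsed}; delay waste {delay / timedelta(days=1):.2f} days")
```

Run across every ticket and column, this shows which column holds the most idle time.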
Value Stream Mapping
A Value Stream Map (VSM) is a visual representation of the entire flow from customer request to delivered value. It shows every step, the time spent actively working, the time spent waiting, and the handoffs between people or teams.
Value Stream Mapping was originally developed by Mike Rother and John Shook in Learning to See (1999) for manufacturing. Karen Martin and Mike Osterling adapted it for knowledge work in Value Stream Mapping (2014).
Step-by-Step: Creating a Value Stream Map
- Define the scope: What is the start event (e.g., "feature request created") and end event (e.g., "feature live in production")?
- Walk the process: Identify every step the work item passes through. Include waiting states.
- Measure times: For each step, record process time (active work) and wait time (sitting idle).
- Identify handoffs: Mark every point where work transfers between people or teams.
- Calculate flow efficiency: Total process time ÷ total elapsed time × 100%.
- Identify bottlenecks: Where are the longest wait times? Where does WIP pile up?
```mermaid
flowchart LR
    A["Feature Request<br/>(Wait: 5d)"] --> B["Prioritisation<br/>(Work: 1h | Wait: 3d)"]
    B --> C["Design<br/>(Work: 4h | Wait: 2d)"]
    C --> D["Development<br/>(Work: 16h | Wait: 1d)"]
    D --> E["Code Review<br/>(Work: 1h | Wait: 2d)"]
    E --> F["QA Testing<br/>(Work: 3h | Wait: 3d)"]
    F --> G["Deployment<br/>(Work: 0.5h | Wait: 1d)"]
    G --> H["Live in Production"]
```
Flow Efficiency
Flow efficiency is the ratio of value-adding time to total elapsed time:
Flow Efficiency = Process Time ÷ Total Lead Time × 100%
In the example above (assuming 8-hour working days):
- Total process time: 1h + 4h + 16h + 1h + 3h + 0.5h = 25.5 hours
- Total wait time: 5d + 3d + 2d + 1d + 2d + 3d + 1d = 17 days = 136 hours
- Total elapsed time: 136h of waiting + 25.5h of work = 161.5 hours
- Flow efficiency: 25.5 ÷ 161.5 ≈ 15.8%
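The same arithmetic can be scripted so the map is easy to recalculate as the process changes. This is a minimal sketch. The `Step` record and the function names are illustrative. Days are converted at 8 working hours, matching the 17 days = 136 hours conversion. Elapsed time counts working time as well as waiting time, which gives roughly 15.8% for the example stream. The sketch also reports the step with the longest wait, a first candidate for bottleneck analysis.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 8  # assumption: 8-hour working days


@dataclass
class Step:
    name: str
    work_hours: float  # active, value-adding time
    wait_days: float   # time the item sits idle at this step


def flow_efficiency(steps: list[Step]) -> float:
    """Process time as a percentage of total elapsed time (work + wait)."""
    process = sum(s.work_hours for s in steps)
    waiting = sum(s.wait_days for s in steps) * HOURS_PER_DAY
    return process / (process + waiting) * 100


def longest_wait(steps: list[Step]) -> Step:
    """The step where work waits longest: where to look first for a bottleneck."""
    return max(steps, key=lambda s: s.wait_days)


value_stream = [
    Step("Feature Request", 0, 5),
    Step("Prioritisation", 1, 3),
    Step("Design", 4, 2),
    Step("Development", 16, 1),
    Step("Code Review", 1, 2),
    Step("QA Testing", 3, 3),
    Step("Deployment", 0.5, 1),
]

print(f"Flow efficiency: {flow_efficiency(value_stream):.1f}%")  # Flow efficiency: 15.8%
print(f"Longest wait: {longest_wait(value_stream).name}")        # Longest wait: Feature Request
```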
Flow & WIP Limits
Little's Law
John D.C. Little proved in 1961 that for any stable system, the long-run averages satisfy:
Average Lead Time = Average WIP ÷ Average Throughput
This has profound implications for software delivery:
- If your throughput is 5 items/week and you have 20 items in progress, your lead time is 4 weeks.
- To halve your lead time without changing throughput, halve your WIP.
- The fastest way to improve lead time is to reduce WIP.
```python
# Little's Law Calculator
def calculate_lead_time(wip: int, throughput: float) -> float:
    """
    Little's Law: W = L / λ (Lead Time = WIP / Throughput)

    Args:
        wip: Number of items currently in progress
        throughput: Items completed per time unit (e.g., per week)

    Returns:
        Average lead time in the same time unit as throughput
    """
    if throughput <= 0:
        raise ValueError("Throughput must be positive")
    return wip / throughput


# Example: Team with 15 items in progress, completing 5 per week
current_wip = 15
weekly_throughput = 5.0
lead_time = calculate_lead_time(current_wip, weekly_throughput)
print(f"Current lead time: {lead_time} weeks")  # 3.0 weeks

# If we reduce WIP to 8:
reduced_wip = 8
new_lead_time = calculate_lead_time(reduced_wip, weekly_throughput)
print(f"New lead time: {new_lead_time} weeks")  # 1.6 weeks
print(f"Improvement: {((lead_time - new_lead_time) / lead_time) * 100:.0f}%")  # 47%
```
Kanban WIP Limits
WIP limits are constraints placed on each stage of your workflow. They are the mechanism that turns a push system into a pull system. When a stage reaches its WIP limit, no new work can enter until existing work exits.
Setting WIP limits forces teams to:
- Stop starting, start finishing — complete existing work before taking on new work
- Expose bottlenecks — when a stage is full, upstream stages are blocked, making the constraint visible
- Collaborate — team members swarm on blocked items rather than starting new items
- Reduce multitasking — fewer items in progress means more focus per item
```yaml
# Example Kanban Board with WIP Limits
board:
  columns:
    - name: "Backlog"
      wip_limit: null  # No limit on ideas
    - name: "Ready"
      wip_limit: 5     # Only 5 items refined and ready
    - name: "In Dev"
      wip_limit: 3     # Max 3 items being coded
    - name: "In Review"
      wip_limit: 3     # Max 3 items awaiting/in review
    - name: "Testing"
      wip_limit: 2     # Max 2 items being tested
    - name: "Done"
      wip_limit: null  # No limit on completed items

# Rule: If "In Dev" is at WIP limit (3), no one pulls
# from "Ready" until a dev item moves to "In Review"
```
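The following minimal in-memory board shows how a pull rule behaves in code, using the limits from the configuration above. `KanbanBoard`, `pull` and `WipLimitExceeded` are hypothetical names for illustration. Real board tools expose WIP limits through their own settings.

```python
from typing import Optional


class WipLimitExceeded(Exception):
    """Raised when work is added to a column that is already at its WIP limit."""


class KanbanBoard:
    def __init__(self, wip_limits: dict[str, Optional[int]]) -> None:
        # Column order follows the dict order; None means "no limit" (like the YAML's null)
        self.order = list(wip_limits)
        self.limits = dict(wip_limits)
        self.columns = {name: [] for name in self.order}

    def can_pull_into(self, column: str) -> bool:
        limit = self.limits[column]
        return limit is None or len(self.columns[column]) < limit

    def add(self, column: str, item: str) -> None:
        if not self.can_pull_into(column):
            raise WipLimitExceeded(f"{column} is at its WIP limit")
        self.columns[column].append(item)

    def pull(self, item: str) -> bool:
        """Move item one column to the right if the next column has capacity.

        Returns False when the move is blocked: the signal to stop starting
        and help finish work downstream instead.
        """
        current = next((c for c in self.order if item in self.columns[c]), None)
        if current is None:
            raise KeyError(f"{item} is not on the board")
        position = self.order.index(current)
        if position == len(self.order) - 1:
            return False  # already in the last column
        target = self.order[position + 1]
        if not self.can_pull_into(target):
            return False
        self.columns[current].remove(item)
        self.columns[target].append(item)
        return True


board = KanbanBoard({"Backlog": None, "Ready": 5, "In Dev": 3,
                     "In Review": 3, "Testing": 2, "Done": None})
for ticket in ["A", "B", "C", "D"]:
    board.add("Ready", ticket)
for ticket in ["A", "B", "C"]:
    board.pull(ticket)  # Ready -> In Dev; "In Dev" is now at its limit of 3
print(board.pull("D"))  # False: nobody can start D yet
board.pull("A")         # finishing A (In Dev -> In Review) frees a slot
print(board.pull("D"))  # True
```

The blocked `pull` returns a value instead of silently queuing the item. This keeps the constraint visible to whoever is trying to start new work.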
Pull Systems
Traditional software development uses a push system: managers assign work to developers based on priority lists, capacity planning, and sprint commitments. Work is pushed into the system regardless of whether the system can handle it.
LEAN advocates pull systems: work is only started when there is capacity to process it. Workers pull the next item when they finish their current item. This is the fundamental mechanism behind Kanban.
| Aspect | Push System | Pull System |
|---|---|---|
| Work assignment | Manager assigns work to people | People pull work when ready |
| WIP control | Grows unbounded | Constrained by limits |
| Overload signal | Team burnout (late signal) | WIP limit hit (early signal) |
| Bottleneck visibility | Hidden in queues | Exposed immediately |
| Lead time | Unpredictable | Stabilises over time |
Why does pull reduce overproduction? Because the system only produces what is needed, when it is needed. No more "building features in advance" that may never be used. No more stacking up a backlog of code reviews that creates merge conflicts.
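The "WIP control" row of the table can be illustrated with a deliberately simple, deterministic simulation. It is a sketch and does not model any real team. Requests arrive at 6 per week and the team can finish 5 per week. The only difference between the runs is whether work is pushed in as it arrives or pulled in under a WIP limit of 5. `simulate_wip` is a hypothetical helper.

```python
from typing import Optional


def simulate_wip(weeks: int, arrivals_per_week: int, capacity_per_week: int,
                 wip_limit: Optional[int] = None) -> list[int]:
    """Return the in-progress count each week, measured after new work is started.

    wip_limit=None models a push system: every new request is started at once.
    A numeric wip_limit models a pull system: work starts only when there is room,
    and everything else waits, unstarted, in the backlog.
    """
    backlog = wip = 0
    history = []
    for _ in range(weeks):
        backlog += arrivals_per_week
        room = backlog if wip_limit is None else max(wip_limit - wip, 0)
        started = min(backlog, room)
        backlog -= started
        wip += started
        history.append(wip)
        wip -= min(wip, capacity_per_week)  # the team finishes what capacity allows
    return history


push = simulate_wip(8, arrivals_per_week=6, capacity_per_week=5)
pull = simulate_wip(8, arrivals_per_week=6, capacity_per_week=5, wip_limit=5)
print("Push WIP per week:", push)  # [6, 7, 8, 9, 10, 11, 12, 13]
print("Pull WIP per week:", pull)  # [5, 5, 5, 5, 5, 5, 5, 5]
```

Under push, WIP climbs every week, and by Little's Law so does lead time. Under pull, WIP holds at the limit. The one surplus request per week waits in the backlog, where the overload is easy to see, instead of hiding as half-finished work.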
Continuous Improvement (Kaizen)
Kaizen (改善) means "change for better" in Japanese. It is the philosophy of small, incremental, continuous improvements rather than large, disruptive transformations. In software teams, Kaizen manifests as:
- Retrospectives — regular team reflection on what to improve (Scrum's Sprint Retrospective is a Kaizen event)
- Process experiments — try a small change for one sprint, measure the impact, decide whether to keep it
- Improvement backlogs — treat process improvements as work items alongside feature work
- Gemba walks — managers observe actual work processes (in software: sit with developers, watch the deployment process)
The PDCA Cycle
PDCA (Plan-Do-Check-Act) is the scientific method applied to process improvement:
```mermaid
flowchart TD
    P["PLAN<br/>Identify problem<br/>Analyse root cause<br/>Hypothesise solution"] --> D["DO<br/>Implement change<br/>Small scale first<br/>Collect data"]
    D --> C["CHECK<br/>Measure results<br/>Compare to baseline<br/>Did it work?"]
    C --> A["ACT<br/>Standardise if successful<br/>Adjust if partial<br/>Abandon if failed"]
    A --> P
```
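The CHECK and ACT steps become less subjective when the team agrees up front how a result will be judged. The sketch below compares median cycle time before and during an experiment. The 10% threshold, the sample data and `check_experiment` are all hypothetical. With samples this small, a real team would also look at the spread of the data before standardising anything.

```python
from statistics import median


def check_experiment(baseline: list[float], experiment: list[float],
                     min_improvement: float = 0.10) -> tuple[str, float]:
    """CHECK: compare median cycle time before and during a process experiment.

    Returns the ACT decision and the relative improvement. min_improvement is the
    (hypothetical) bar the team agreed on during PLAN.
    """
    before, after = median(baseline), median(experiment)
    improvement = (before - after) / before  # positive means cycle time went down
    if improvement >= min_improvement:
        return "standardise", improvement
    if improvement > 0:
        return "adjust", improvement
    return "abandon", improvement


# Hypothetical cycle times in days: one sprint before the change, one sprint with it
baseline_cycle_times = [4.0, 5.5, 3.0, 6.0, 4.5, 5.0]
experiment_cycle_times = [3.0, 3.5, 4.0, 2.5, 3.0, 4.5]
decision, change = check_experiment(baseline_cycle_times, experiment_cycle_times)
print(f"{decision}: median cycle time improved by {change:.0%}")  # standardise: ... by 32%
```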
A3 Problem Solving
The A3 report is a structured problem-solving format that fits on a single A3-sized sheet of paper. It forces clarity and brevity. The format:
- Background: Why is this problem worth solving?
- Current condition: What is happening now? (Data, not opinions)
- Goal: What should be happening? (Measurable target)
- Root cause analysis: Why is there a gap? (5 Whys, fishbone diagrams)
- Countermeasures: What changes will address root causes?
- Implementation plan: Who, what, when?
- Follow-up: How will we know it worked?
```yaml
# A3 Problem Solving Template (YAML format)
title: "Code Review Bottleneck"
owner: "Platform Team"
date: "2026-05-14"
background: |
  Code reviews are our largest source of delay.
  Average wait time: 2.3 days per PR.
  Team frustration score: 7/10.
current_condition:
  metric: "Time from PR opened to first review"
  baseline: "2.3 days average (measured over 4 weeks)"
  data_source: "GitHub PR analytics"
goal:
  target: "First review within 4 hours"
  timeline: "Achieve within 6 weeks"
root_cause_analysis:
  - why: "Reviews wait 2+ days"
    because: "Reviewers batch reviews to end of day"
  - why: "Reviewers batch reviews"
    because: "Deep work is interrupted by review requests"
  - why: "Reviews interrupt deep work"
    because: "No dedicated review time in schedule"
countermeasures:
  - action: "Establish 'review hour': 10-11am daily"
    owner: "All developers"
    expected_impact: "Reviews started within 4h"
  - action: "Reduce PR size to <200 lines"
    owner: "All developers"
    expected_impact: "Reviews take 15min not 45min"
follow_up:
  check_date: "2026-06-25"
  success_metric: "95th percentile first review < 4h"
```
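The follow-up metric in this template can be checked directly from PR data. This sketch assumes you have already exported hours-to-first-review per PR; the sample values are made up. It uses the nearest-rank definition of a percentile. Other definitions interpolate between values and can give slightly different answers.

```python
import math


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical hours from "PR opened" to "first review" during the follow-up period
first_review_hours = [0.5, 1.0, 1.2, 2.0, 2.5, 3.0, 3.1, 3.5, 3.8, 4.5,
                      0.8, 1.5, 2.2, 2.8, 3.3, 1.1, 0.9, 2.6, 3.9, 1.7]
p95 = percentile_nearest_rank(first_review_hours, 95)
target_met = p95 < 4  # success_metric: "95th percentile first review < 4h"
print(f"p95 first review: {p95}h, target met: {target_met}")  # p95 first review: 3.9h, target met: True
```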
Batch Size Reduction
Batch size is one of the most powerful levers in software delivery. Smaller batches:
- Flow through the system faster (Little's Law)
- Get feedback sooner
- Have lower risk (less change = less that can go wrong)
- Are easier to review (200-line PRs are reviewed in minutes, 2000-line PRs take days)
- Have fewer merge conflicts
- Are easier to roll back
The U-Curve of Batch Size
There is an economic tradeoff in batch size:
- Transaction cost: The overhead of processing a batch (creating a PR, running CI, deploying). This cost is fixed per batch, so smaller batches increase total transaction cost.
- Holding cost: The cost of carrying inventory (merge conflicts, delayed feedback, increased risk). This cost increases with batch size.
The optimal batch size minimises total cost (transaction + holding). The LEAN approach is to reduce transaction costs (faster CI, cheaper deployments, automated testing) so that the optimal batch size shrinks.
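The U-curve can be made concrete with a simplified cost model. This is an assumption for illustration, similar in shape to the classic economic order quantity model. The transaction cost is paid once per batch and spread across its items, while the holding cost per item grows linearly with batch size. The function names and numbers are hypothetical.

```python
def cost_per_item(batch_size: int, transaction_cost: float, holding_rate: float) -> float:
    """Total cost per item for one batch size, under a simplified U-curve model."""
    # Transaction cost (PR, CI run, deployment) is paid once per batch, spread over its items
    transaction = transaction_cost / batch_size
    # Holding cost per item is assumed to grow linearly with batch size
    holding = holding_rate * batch_size
    return transaction + holding


def optimal_batch_size(transaction_cost: float, holding_rate: float, max_batch: int = 50) -> int:
    """Brute-force search for the batch size with the lowest total cost per item."""
    return min(range(1, max_batch + 1),
               key=lambda b: cost_per_item(b, transaction_cost, holding_rate))


print(optimal_batch_size(transaction_cost=100, holding_rate=1))  # 10: expensive, manual pipeline
print(optimal_batch_size(transaction_cost=25, holding_rate=1))   # 5: after automating CI/CD
```

In this model the optimum sits at the square root of `transaction_cost / holding_rate`. Cutting the transaction cost to a quarter, for example by automating the pipeline, therefore halves the optimal batch size.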
The ideal — single-piece flow — means each change flows through the entire system independently. In software, this is trunk-based development with feature flags: every commit is a potential release. Transaction costs must be near zero (fully automated CI/CD) for this to work.
Lean Metrics
LEAN teams measure flow, not busyness. The key metrics:
| Metric | Definition | Target Direction | How to Measure |
|---|---|---|---|
| Lead Time | Time from request to delivery | ↓ Lower is better | Ticket created → deployed to production |
| Cycle Time | Time from work started to work completed | ↓ Lower is better | "In Progress" → "Done" |
| Throughput | Items completed per time period | ↑ Higher is better | Count items entering "Done" per week |
| WIP | Items currently in progress | ↓ Lower is better | Count items between "In Progress" and "Done" |
| Flow Efficiency | Active time ÷ total elapsed time | ↑ Higher is better | Value stream mapping |
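Most of these metrics come from three timestamps per ticket. The records below are hypothetical, and your tracker's export will have its own field names. Dates are compared in whole calendar days, and throughput assumes a two-week reporting window.

```python
from datetime import date
from statistics import mean

# Hypothetical ticket export: created, work started, deployed (None = not reached yet)
tickets = [
    {"id": "T-1", "created": date(2026, 3, 2), "started": date(2026, 3, 5), "done": date(2026, 3, 9)},
    {"id": "T-2", "created": date(2026, 3, 3), "started": date(2026, 3, 4), "done": date(2026, 3, 6)},
    {"id": "T-3", "created": date(2026, 3, 4), "started": date(2026, 3, 9), "done": date(2026, 3, 12)},
    {"id": "T-4", "created": date(2026, 3, 6), "started": date(2026, 3, 10), "done": None},
    {"id": "T-5", "created": date(2026, 3, 9), "started": None, "done": None},
]

finished = [t for t in tickets if t["done"] is not None]
lead_times = [(t["done"] - t["created"]).days for t in finished]   # request -> delivery
cycle_times = [(t["done"] - t["started"]).days for t in finished]  # work started -> done
wip = sum(1 for t in tickets if t["started"] is not None and t["done"] is None)
reporting_weeks = 2  # hypothetical window covering all the dates above
throughput = len(finished) / reporting_weeks

print(f"Average lead time: {mean(lead_times):.1f} days")    # 6.0 days
print(f"Average cycle time: {mean(cycle_times):.1f} days")  # 3.0 days
print(f"WIP: {wip}, throughput: {throughput} items/week")   # WIP: 1, throughput: 1.5 items/week
```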
Cumulative Flow Diagrams (CFDs)
A CFD is the most powerful visualisation for LEAN teams. It shows the cumulative count of items in each workflow state over time. The vertical distance between bands shows WIP. The horizontal distance between bands approximates the average time items spend between those states; measured from the top band down to "Done", it approximates lead time. The slope of the "Done" band shows throughput.
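```python
# Generating a Cumulative Flow Diagram
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the chart is reproducible
days = np.arange(1, 31)

# Simulated cumulative counts. They never decrease, and each stage can only
# contain items that passed through the stage before it: done <= started <= arrived.
arrived = 20 + np.cumsum(rng.poisson(2.5, size=days.size))  # 20 items already queued on day 1
started = np.minimum(np.cumsum(rng.poisson(2.2, size=days.size)), arrived)
done = np.minimum(np.cumsum(rng.poisson(2.0, size=days.size)), started)

# Band heights: "Done" is cumulative and the upper bands are current counts per state,
# so the top edge of the stack traces cumulative arrivals.
in_progress = started - done
backlog = arrived - started

# Stack the areas, with "Done" at the bottom
fig, ax = plt.subplots(figsize=(12, 6))
ax.stackplot(days, done, in_progress, backlog,
             labels=['Done', 'In Progress', 'Backlog'],
             colors=['#3B9797', '#16476A', '#BF092F'],
             alpha=0.8)
ax.set_xlabel('Day')
ax.set_ylabel('Cumulative Items')
ax.set_title('Cumulative Flow Diagram')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```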
Case Studies
Toyota's Influence on Tech
Toyota's production system directly influenced: Kanban (David Anderson, 2010), Lean Startup (Eric Ries, 2011), Continuous Delivery (Jez Humble & Dave Farley, 2010), and DevOps (The Phoenix Project, Gene Kim, 2013). All trace their intellectual heritage to Taiichi Ohno's shop floor innovations. The core insight that crossed from manufacturing to software: flow trumps utilisation — it is better to have developers waiting for work than work waiting for developers.
Amazon's Two-Pizza Teams as Lean Units
Jeff Bezos's "two-pizza teams" (teams small enough to feed with two pizzas, ~6-8 people) are LEAN units in disguise. Each team owns a service end-to-end — they build it, deploy it, run it, and respond to its incidents. This eliminates handoffs (waste #4), enables fast decision-making (principle #3), and empowers the team (principle #5). The result: Amazon deploys to production every 11.7 seconds on average, with thousands of independent teams operating as autonomous LEAN value streams.
Spotify's Squad Model Through a Lean Lens
Spotify organised into "squads" (stream-aligned teams), "tribes" (collections of related squads), "chapters" (discipline communities), and "guilds" (cross-cutting interest groups). Through the LEAN lens: squads minimise handoffs by owning features end-to-end; tribes optimise the whole (principle #7) by aligning related work; chapters amplify learning (principle #2) by sharing expertise across squads. The model is not perfect — Spotify themselves have acknowledged it evolved significantly — but it demonstrates LEAN thinking at scale.
Conclusion & Next Steps
LEAN thinking is so valuable to software teams because it is not prescriptive: it gives you thinking tools to diagnose and improve any delivery system. The seven principles provide the philosophy; the seven wastes give you a taxonomy of problems; value stream mapping makes those problems visible; WIP limits and pull systems provide the mechanism for improvement; and Kaizen ensures you never stop getting better.
The most important shift is this: stop optimising for resource utilisation and start optimising for flow. A developer who is 100% utilised has no slack, so WIP piles up, wait times balloon, and throughput suffers. A developer who is 80% utilised has slack to respond to pull signals, help teammates, and maintain flow.
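The utilisation argument echoes a standard queueing result. Treat one developer as a single-server M/M/1 queue. This is a strong simplification, since real work does not arrive or complete that randomly. Under that model, the average time an item waits before work starts is ρ / (1 − ρ) times the average time the work itself takes, where ρ is utilisation. At 80% that is four times the work time, and as ρ approaches 100% it grows without bound. `queue_wait_multiple` is a hypothetical helper.

```python
def queue_wait_multiple(utilisation: float) -> float:
    """Average time waiting in the queue, as a multiple of the average service time,
    for an M/M/1 queue: random arrivals, random service times, one server."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1); at 100% the queue grows without bound")
    return utilisation / (1 - utilisation)


for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.0%} utilised: work waits {queue_wait_multiple(rho):.1f}x as long as it takes to do")
```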
Next in the Series
In Part 41: Teams, Speed vs Quality & High-Performance Delivery, we explore the human side of delivery — team topologies, Conway's Law, the speed-vs-quality myth, psychological safety, and engineering culture patterns that scale.