
Part 25: DORA Metrics & Delivery Performance

May 13, 2026 · Wasil Zafar · 44 min read

"What gets measured gets managed." The DORA research program has identified exactly four metrics that predict software delivery performance and organisational outcomes. This article teaches you to measure, interpret, and improve each one — backed by a decade of industry research across 36,000+ professionals.

Table of Contents

  1. Introduction
  2. The Four Key Metrics
  3. Performance Tiers
  4. Measuring DORA Metrics
  5. Beyond DORA
  6. Improving Each Metric
  7. Anti-Patterns in Metrics
  8. Case Studies
  9. Building a Metrics Program
  10. Exercises
  11. Conclusion & Next Steps

Introduction

For decades, software teams struggled with a fundamental question: How do we know if we're doing well? Lines of code? Story points completed? Number of deploys? These metrics either incentivised wrong behaviour or measured activity rather than outcomes.

In 2014, Dr. Nicole Forsgren, Jez Humble, and Gene Kim began the DORA (DevOps Research and Assessment) research program. Over seven years and 36,000+ survey respondents across every industry, they identified exactly four metrics that reliably predict both software delivery performance and organisational performance (profitability, market share, customer satisfaction).

These aren't arbitrary KPIs. They're statistically validated predictors of performance. Teams that excel at these four metrics deliver software faster, more reliably, and with fewer defects, and their organisations are more likely to outperform competitors commercially.

Key Insight: The DORA research destroyed the myth that speed and stability are tradeoffs. Elite performers are simultaneously faster (multiple deploys per day) and more stable (lower failure rates, faster recovery). Speed and stability reinforce each other when the right practices are in place.

The Accelerate Research

The findings were published in Accelerate: The Science of Lean Software and DevOps (2018) and in the annual State of DevOps Reports (2014–present, now published by the DORA team at Google Cloud). Key findings include:

  • Elite performers deploy 973x more frequently than low performers
  • Lead time for elite teams is 6,570x shorter than low performers
  • Change failure rate is 3x lower for elite performers
  • Recovery time is 6,570x faster for elite performers
  • High-performing teams are 2x more likely to exceed organisational goals
  • These capabilities predict commercial outcomes, not just technical ones

The research methodology uses cluster analysis to identify natural performance groups, structural equation modelling to test predictive relationships between practices and outcomes, and survey-based measurement validated against objective data. It's among the most rigorous empirical research programmes in software engineering.

The Four Key Metrics

DORA measures delivery performance along two axes: throughput (how fast you deliver value) and stability (how reliably you deliver value). Two metrics measure each axis:

DORA Metrics: Throughput vs Stability
quadrantChart
    title DORA Metrics Framework
    x-axis "Low Throughput" --> "High Throughput"
    y-axis "Low Stability" --> "High Stability"
    quadrant-1 "Elite Performers"
    quadrant-2 "Reliable but Slow"
    quadrant-3 "Low Performers"
    quadrant-4 "Fast but Fragile"
    "Deployment Frequency": [0.85, 0.8]
    "Lead Time": [0.75, 0.7]
    "Change Failure Rate": [0.3, 0.85]
    "MTTR": [0.25, 0.75]
                            

Deployment Frequency (DF)

Definition: How often your organisation deploys code to production (or releases it to end users).

Deployment frequency is the clearest indicator of batch size. High deployment frequency means small batches, which means lower risk per change, faster feedback, and easier debugging when something goes wrong. If you deploy 100 lines of code and something breaks, you know exactly where to look. If you deploy 10,000 lines, good luck.

| Performance Tier | Deployment Frequency | Batch Size Implication |
|---|---|---|
| Elite | Multiple times per day (on demand) | Individual commits, feature flags |
| High | Once per day to once per week | Small feature branches, a few days' work |
| Medium | Once per week to once per month | Sprint-sized releases |
| Low | Once per month to once per six months | Large batches, release trains |

The Batch Size Paradox: Intuition says "deploy less often to reduce risk." The data shows the opposite. Deploying more frequently with smaller changes reduces risk because: (1) each change is simple enough to understand, (2) failures are isolated to a known change, (3) rollbacks are trivial, and (4) the feedback loop is tight enough to catch problems early.
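A toy probability model makes the batch-size argument concrete. The sketch below assumes every change fails independently with the same probability. Both that assumption and the 1% figure are illustrative and do not come from the DORA data.

```python
def batch_failure_probability(p_change: float, changes_per_deploy: int) -> float:
    """Chance that a deploy fails, assuming each change in it fails
    independently with probability p_change (a simplifying assumption)."""
    return 1 - (1 - p_change) ** changes_per_deploy


# Hypothetical 1% failure chance per change
for n in (1, 10, 100):
    p = batch_failure_probability(0.01, n)
    print(f"{n:>3} changes per deploy -> {p:.1%} chance the deploy fails, "
          f"{n} candidate change(s) to inspect")
```

Under these assumptions a single-change deploy rarely fails and points at exactly one suspect, while a 100-change deploy fails more often than not and leaves 100 changes to search through.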

Lead Time for Changes (LT)

Definition: The time from when code is committed to when it is running in production and available to users.

Lead time measures the efficiency of your entire delivery pipeline — from the moment a developer finishes writing code to the moment it creates value for users. It encompasses code review, CI pipeline execution, approval gates, staging environments, and deployment processes.

| Performance Tier | Lead Time | Where Time is Spent |
|---|---|---|
| Elite | Less than 1 hour | Automated pipeline, trunk-based dev |
| High | 1 day to 1 week | Code review + automated deploy |
| Medium | 1 week to 1 month | Manual testing, approval committees |
| Low | 1 month to 6 months | Change advisory boards, manual deployments |

Change Failure Rate (CFR)

Definition: The percentage of deployments that cause a failure in production — requiring a rollback, hotfix, patch, or incident response.

Change failure rate measures the quality of your delivery process. A low CFR means your testing, review, and deployment practices catch problems before they reach users. It doesn't mean you never fail — it means you fail infrequently and predictably.

| Performance Tier | Change Failure Rate | What This Means |
|---|---|---|
| Elite | 0–15% | Extensive automated testing, canary deployments |
| High | 16–30% | Good test coverage, some gaps in integration |
| Medium | 16–30% | Variable testing, reactive quality practices |
| Low | 46–60% | Insufficient testing, large batches mask root causes |

Mean Time to Recovery (MTTR)

Definition: How quickly a service is restored after an incident or degradation — from detection to resolution.

MTTR measures your resilience. Failures are inevitable; what matters is how quickly you detect, diagnose, and recover. Elite teams prioritise MTTR over MTBF (Mean Time Between Failures) because in complex systems, preventing all failures is impossible — but fast recovery is achievable.

| Performance Tier | MTTR | Key Enablers |
|---|---|---|
| Elite | Less than 1 hour | Automated rollback, observability, runbooks |
| High | Less than 1 day | On-call rotation, basic monitoring, manual rollback |
| Medium | 1 day to 1 week | Escalation-heavy, limited observability |
| Low | 1 week to 6 months | No rollback capability, manual processes |

MTTR vs MTBF: Traditional enterprise thinking optimises for MTBF: "prevent failures at all costs." This leads to heavy change approval processes that slow delivery and paradoxically increase failure rates (because large batches are harder to test). Modern thinking optimises for MTTR: "failures will happen; recover fast." This enables high deployment frequency while maintaining stability.
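The standard steady-state availability formula, MTBF / (MTBF + MTTR), shows why recovery speed can matter more than failure frequency. The two services below are hypothetical, chosen only to illustrate the trade-off.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical services (illustrative numbers, not DORA data)
cautious = availability(mtbf_hours=720, mttr_hours=8)      # fails about monthly, slow recovery
resilient = availability(mtbf_hours=168, mttr_hours=0.25)  # fails about weekly, 15-minute recovery

print(f"cautious:  {cautious:.4%}")
print(f"resilient: {resilient:.4%}")
```

In this example the service that fails four times as often is still the more available one, because each failure costs minutes rather than hours.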

Performance Tiers

DORA identifies four performance clusters using statistical analysis. These aren't arbitrary divisions — they represent natural groupings found in the data:

| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple/day | Daily–Weekly | Weekly–Monthly | Monthly–Biannually |
| Lead Time | < 1 hour | 1 day – 1 week | 1 week – 1 month | 1 month – 6 months |
| Change Failure Rate | 0–15% | 16–30% | 16–30% | 46–60% |
| MTTR | < 1 hour | < 1 day | 1 day – 1 week | 1 week – 6 months |

The Widening Gap

Year over year, the State of DevOps Reports show that the gap between elite and low performers is growing. Elite teams continue accelerating while low performers stagnate. This creates a compounding effect — organisations that invest in delivery capabilities gain exponential advantages over those that don't.

The 2023 report found that elite performers now represent approximately 18% of respondents (up from 7% in 2018), suggesting that more teams are achieving elite status — but the bar keeps rising as elite teams continue improving.

Measuring DORA Metrics

Measurement must be automated and objective. Self-reported metrics are unreliable due to cognitive biases. Pull data from your systems:

DORA Metrics Measurement Pipeline
flowchart LR
    A[Git Repository] -->|Commit timestamps| E[Metrics Engine]
    B[CI/CD Platform] -->|Build/Deploy events| E
    C[Incident Manager] -->|Incident open/close| E
    D[Monitoring/Alerting] -->|Detection time| E
    E --> F[Dashboard]
    E --> G[Trend Analysis]
    E --> H[Team Comparisons]

    style E fill:#3B9797,color:#fff
    style F fill:#f8f9fa,stroke:#333
                            

Data Sources by Metric

| Metric | Data Source | Calculation |
|---|---|---|
| DF | CI/CD deployment events | Count of successful production deploys / time period |
| LT | Git + CI/CD timestamps | Median(deploy_time - first_commit_time) per change |
| CFR | Incident tracker + deploy logs | Failed deploys / total deploys × 100% |
| MTTR | Incident tracker | Median(resolved_time - detected_time) across incidents |

Tools & Dashboards

import datetime
import statistics

def calculate_dora_metrics(deploys, incidents, period_days=30):
    """
    Calculate DORA metrics from deployment and incident data.

    Args:
        deploys: List of dicts with 'timestamp', 'commit_time', 'caused_incident'
        incidents: List of dicts with 'detected_at', 'resolved_at'
        period_days: Measurement period in days
    """
    # 1. Deployment Frequency
    deployment_frequency = len(deploys) / period_days
    df_label = classify_df(deployment_frequency)

    # 2. Lead Time for Changes (median)
    lead_times = []
    for deploy in deploys:
        lt = (deploy['timestamp'] - deploy['commit_time']).total_seconds() / 3600
        lead_times.append(lt)  # in hours
    median_lead_time = statistics.median(lead_times) if lead_times else 0

    # 3. Change Failure Rate
    failed_deploys = sum(1 for d in deploys if d['caused_incident'])
    change_failure_rate = (failed_deploys / len(deploys) * 100) if deploys else 0

    # 4. Mean Time to Recovery (median)
    recovery_times = []
    for incident in incidents:
        rt = (incident['resolved_at'] - incident['detected_at']).total_seconds() / 3600
        recovery_times.append(rt)  # in hours
    median_mttr = statistics.median(recovery_times) if recovery_times else 0

    return {
        'deployment_frequency': {
            'value': round(deployment_frequency, 2),
            'unit': 'deploys/day',
            'tier': df_label
        },
        'lead_time': {
            'value': round(median_lead_time, 1),
            'unit': 'hours',
            'tier': classify_lt(median_lead_time)
        },
        'change_failure_rate': {
            'value': round(change_failure_rate, 1),
            'unit': '%',
            'tier': classify_cfr(change_failure_rate)
        },
        'mttr': {
            'value': round(median_mttr, 1),
            'unit': 'hours',
            'tier': classify_mttr(median_mttr)
        }
    }

def classify_df(deploys_per_day):
    if deploys_per_day >= 1:
        return 'Elite'
    elif deploys_per_day >= 1/7:
        return 'High'
    elif deploys_per_day >= 1/30:
        return 'Medium'
    return 'Low'

def classify_lt(hours):
    if hours < 1:
        return 'Elite'
    elif hours < 168:  # 1 week
        return 'High'
    elif hours < 720:  # 1 month
        return 'Medium'
    return 'Low'

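def classify_cfr(percentage):
    # The tier table lists the same 16-30% band for High and Medium, so this
    # function cannot tell them apart and reports that band as 'High'.
    # Anything above 30% is treated as 'Low'.
    if percentage <= 15:
        return 'Elite'
    elif percentage <= 30:
        return 'High'
    return 'Low'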

def classify_mttr(hours):
    if hours < 1:
        return 'Elite'
    elif hours < 24:
        return 'High'
    elif hours < 168:
        return 'Medium'
    return 'Low'

# Example usage with sample data
deploys = [
    {'timestamp': datetime.datetime(2026, 5, 13, 14, 0),
     'commit_time': datetime.datetime(2026, 5, 13, 12, 30),
     'caused_incident': False},
    {'timestamp': datetime.datetime(2026, 5, 13, 16, 0),
     'commit_time': datetime.datetime(2026, 5, 13, 14, 45),
     'caused_incident': False},
    {'timestamp': datetime.datetime(2026, 5, 12, 10, 0),
     'commit_time': datetime.datetime(2026, 5, 12, 8, 0),
     'caused_incident': True},
]

incidents = [
    {'detected_at': datetime.datetime(2026, 5, 12, 10, 5),
     'resolved_at': datetime.datetime(2026, 5, 12, 10, 35)},
]

metrics = calculate_dora_metrics(deploys, incidents, period_days=7)
for name, data in metrics.items():
    print(f"{name}: {data['value']} {data['unit']} ({data['tier']})")
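A single snapshot says little about direction, so the trend-analysis step usually means bucketing events over time. As a minimal sketch, assuming deployment timestamps are already available as `datetime` objects, the hypothetical helper `weekly_deploy_counts` groups them by ISO week:

```python
import datetime
from collections import Counter


def weekly_deploy_counts(deploy_times):
    """Count deployments per ISO (year, week) so a trend is visible over time."""
    counts = Counter()
    for ts in deploy_times:
        iso = ts.isocalendar()          # (ISO year, ISO week, weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


# Hypothetical deployment timestamps
sample = [
    datetime.datetime(2026, 5, 12, 10, 0),
    datetime.datetime(2026, 5, 13, 14, 0),
    datetime.datetime(2026, 5, 20, 9, 0),
]

for (year, week), n in weekly_deploy_counts(sample).items():
    print(f"{year}-W{week:02d}: {n} deploys")
```

The same bucketing can be applied to lead times or recovery times by collecting the values per week and taking the median of each bucket.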

Purpose-built DORA measurement tools include:

  • Four Keys (Google) — Open-source DORA dashboard using BigQuery and Cloud Build events
  • LinearB — Git analytics platform with built-in DORA tracking
  • Sleuth — Deployment tracking focused specifically on DORA metrics
  • Jellyfish — Engineering management platform with DORA and business alignment
  • Faros AI — Open-source connector that aggregates data from 50+ tools
  • Backstage + plugins — Internal developer portal with DORA metric widgets

Beyond DORA

While the four key metrics remain the foundation, the research community has expanded measurement to address additional dimensions of software delivery and developer experience.

The Reliability Metric (5th DORA Metric)

In 2021, the DORA team added a fifth metric, reliability: whether a team meets or exceeds its reliability targets (SLOs). This acknowledges that operational performance matters alongside delivery speed. A team deploying 100 times per day but consistently missing its SLOs isn't performing well.
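Checking whether a team meets an SLO reduces to comparing a measured success ratio with a target. The sketch below assumes a request-based SLO. The function name `slo_report`, its fields, and the sample numbers are all hypothetical.

```python
def slo_report(good_events: int, total_events: int, slo_target: float) -> dict:
    """Compare a measured success ratio with an SLO target and report
    how much of the error budget has been consumed."""
    attained = good_events / total_events
    allowed_bad = total_events * (1 - slo_target)   # error budget, in events
    actual_bad = total_events - good_events
    return {
        'attained': attained,
        'meets_slo': attained >= slo_target,
        'budget_used_pct': (actual_bad / allowed_bad * 100) if allowed_bad else float('inf'),
    }


# Hypothetical month: 1,000,000 requests, 600 failures, 99.9% target
report = slo_report(good_events=999_400, total_events=1_000_000, slo_target=0.999)
print(report)
```

Here the team meets its target while having used roughly 60% of the month's error budget, which is a more useful signal than a bare pass or fail.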

The SPACE Framework

Published by Forsgren, Storey, Maddila, Zimmermann, and others at Microsoft Research (2021), SPACE measures developer productivity across five dimensions:

| Dimension | What It Measures | Example Metrics |
|---|---|---|
| Satisfaction and well-being | Developer happiness and fulfilment | Survey scores, eNPS, retention |
| Performance | Outcomes and quality of work | Customer impact, reliability, code quality |
| Activity | Volume of actions (use carefully) | PRs merged, deployments, code reviews completed |
| Communication and collaboration | Collaboration and knowledge flow | PR review time, documentation, onboarding speed |
| Efficiency and flow | Minimal friction and interruptions | Flow state time, context switches, wait time |

Flow Metrics

Flow metrics (from Mik Kersten's Project to Product) measure value stream efficiency:

  • Flow Time — Total time from work item creation to delivery (includes wait time)
  • Flow Efficiency — Active work time / total flow time × 100% (typically 15–40%)
  • Flow Load — Work items in progress (correlates with lead time via Little's Law)
  • Flow Velocity — Number of items completed per time period
  • Flow Distribution — Ratio of features vs defects vs debt vs risk work
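Two of the flow metrics above are simple ratios. The sketch below computes flow efficiency and uses Little's Law (average time in system = average work in progress / average throughput) to relate flow load to flow time. The function names and numbers are illustrative assumptions.

```python
def flow_efficiency(active_hours: float, flow_time_hours: float) -> float:
    """Share of total flow time spent actively working, as a percentage."""
    return active_hours / flow_time_hours * 100


def average_flow_time(flow_load: float, flow_velocity: float) -> float:
    """Little's Law: average flow time = items in progress / items completed
    per unit of time. The result is in the time unit of flow_velocity."""
    return flow_load / flow_velocity


# Hypothetical work item: 12 hours of active work inside a 60-hour flow time
print(f"Flow efficiency: {flow_efficiency(12, 60):.0f}%")

# Hypothetical team completing 5 items per day
print(f"Flow time with 30 items in progress: {average_flow_time(30, 5):.1f} days")
print(f"Flow time with 15 items in progress: {average_flow_time(15, 5):.1f} days")
```

With throughput held constant, halving the work in progress halves the average flow time, which is why limiting flow load is a common first lever.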

Improving Each Metric

Improving Deployment Frequency

The path from monthly to daily (or multiple daily) deployments requires changes across code, process, and culture:

  • Trunk-based development — Short-lived branches (hours, not weeks). Merge to main daily
  • Feature flags — Decouple deployment from release. Deploy dark features that can be enabled independently
  • Automated deployments — Every merge to main triggers production deployment (with gates)
  • Smaller batches — Break large features into independently deployable increments
  • Decouple services — Independent deployability means one team's changes don't block another's

# GitOps: Automated deployment on merge to main
# ArgoCD Application watching main branch
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/my-org/payment-service
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
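The feature-flag item in the list above is what makes frequent deployment safe: code ships switched off and is released later by changing data, not by redeploying. The sketch below is a minimal in-memory illustration. The `FeatureFlags` class, the flag name, and the `checkout` function are hypothetical, and a real system would read flags from a flag service or config store.

```python
import hashlib


class FeatureFlags:
    """In-memory flag store (hypothetical; for illustration only)."""

    def __init__(self):
        self._flags = {}

    def set_flag(self, name: str, enabled: bool, rollout_percent: int = 100) -> None:
        self._flags[name] = {'enabled': enabled, 'rollout_percent': rollout_percent}

    def is_enabled(self, name: str, user_id: str) -> bool:
        flag = self._flags.get(name)
        if flag is None or not flag['enabled']:
            return False  # unknown or switched-off flags keep the old code path
        # Stable bucket in [0, 100) so a given user always gets the same answer
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < flag['rollout_percent']


flags = FeatureFlags()


def checkout(user_id: str) -> str:
    if flags.is_enabled('new-checkout', user_id):
        return 'new checkout flow'
    return 'existing checkout flow'


flags.set_flag('new-checkout', enabled=True, rollout_percent=0)    # deployed, released to nobody
print(checkout('user-42'))
flags.set_flag('new-checkout', enabled=True, rollout_percent=100)  # released without a redeploy
print(checkout('user-42'))
flags.set_flag('new-checkout', enabled=False)                      # kill switch
print(checkout('user-42'))
```

Hashing the flag name together with the user ID keeps partial rollouts stable per user, so someone in the first 5% stays in the rollout as the percentage grows.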

Improving Lead Time

Lead time is the sum of all wait times and processing times in your pipeline. To reduce it, identify and eliminate bottlenecks:

import datetime

# Lead time breakdown analysis
def analyse_lead_time(change):
    """Break down where time is spent in the delivery pipeline."""
    stages = {
        'coding': change['pr_opened'] - change['first_commit'],
        'review_wait': change['first_review'] - change['pr_opened'],
        'review_cycles': change['approved'] - change['first_review'],
        'ci_pipeline': change['ci_complete'] - change['approved'],
        'deploy_wait': change['deploy_start'] - change['ci_complete'],
        'deployment': change['deploy_complete'] - change['deploy_start'],
    }

    total = sum(stages.values(), datetime.timedelta())
    print(f"Total Lead Time: {total}")
    print(f"\nBreakdown:")
    for stage, duration in stages.items():
        pct = duration / total * 100
        bar = '█' * int(pct / 2)
        print(f"  {stage:20s}: {str(duration):>15s} ({pct:.0f}%) {bar}")

    # Identify biggest bottleneck
    bottleneck = max(stages, key=stages.get)
    print(f"\nBottleneck: {bottleneck} ({stages[bottleneck]})")
    return stages

# Example: typical team before optimisation
sample_change = {
    'first_commit': datetime.datetime(2026, 5, 10, 9, 0),
    'pr_opened': datetime.datetime(2026, 5, 10, 11, 0),
    'first_review': datetime.datetime(2026, 5, 11, 14, 0),   # Next day!
    'approved': datetime.datetime(2026, 5, 12, 10, 0),
    'ci_complete': datetime.datetime(2026, 5, 12, 10, 45),
    'deploy_start': datetime.datetime(2026, 5, 13, 9, 0),    # Next deploy window
    'deploy_complete': datetime.datetime(2026, 5, 13, 9, 15),
}

analyse_lead_time(sample_change)

Common lead time improvements:

  • Reduce review wait time — Set team SLAs for PR review (e.g., first review within 4 hours)
  • Parallelise CI — Run tests concurrently; aim for sub-10-minute pipelines
  • Eliminate manual gates — Replace change advisory boards with automated policy checks
  • Deploy continuously — Remove "deploy windows"; deploy any time CI passes
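A review SLA only helps if breaches are visible. As a rough sketch, assuming pull request data is available as dicts with `pr_opened` and `first_review` timestamps (the field names, IDs, and the helper `review_sla_breaches` are hypothetical), breaches of a four-hour SLA can be listed like this:

```python
import datetime


def review_sla_breaches(pull_requests, sla=datetime.timedelta(hours=4), now=None):
    """Return IDs of PRs whose first review took, or is still taking, longer than the SLA."""
    now = now or datetime.datetime.now()
    breaches = []
    for pr in pull_requests:
        reviewed_at = pr.get('first_review') or now   # unreviewed PRs are still waiting
        if reviewed_at - pr['pr_opened'] > sla:
            breaches.append(pr['id'])
    return breaches


# Hypothetical pull requests
prs = [
    {'id': 101, 'pr_opened': datetime.datetime(2026, 5, 10, 11, 0),
     'first_review': datetime.datetime(2026, 5, 11, 14, 0)},   # waited 27 hours
    {'id': 102, 'pr_opened': datetime.datetime(2026, 5, 12, 9, 0),
     'first_review': datetime.datetime(2026, 5, 12, 10, 30)},  # waited 1.5 hours
    {'id': 103, 'pr_opened': datetime.datetime(2026, 5, 13, 8, 0),
     'first_review': None},                                    # still waiting
]

print(review_sla_breaches(prs, now=datetime.datetime(2026, 5, 13, 15, 0)))
```

Counting PRs that are still waiting, not only those already reviewed late, keeps the signal actionable while something can still be done about it.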

Improving Change Failure Rate

  • Comprehensive automated testing — Unit, integration, contract, and end-to-end tests in CI
  • Canary deployments — Route 1–5% of traffic to new version; monitor before full rollout
  • Progressive delivery — Ring-based deployments, starting with internal users
  • Feature flags with kill switches — Instant disable without redeployment
  • Smaller batch sizes — Fewer changes per deployment = fewer failure modes
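The canary item above comes down to a decision rule applied to the small slice of traffic on the new version. The sketch below is one possible rule. The 99% threshold, the minimum sample size, and the function name `canary_decision` are assumptions to tune per service.

```python
def canary_decision(canary_ok: int, canary_total: int,
                    min_success_rate: float = 0.99, min_requests: int = 100) -> str:
    """Decide whether to promote a canary based on its observed success rate."""
    if canary_total < min_requests:
        return 'wait'      # not enough traffic yet to judge
    success_rate = canary_ok / canary_total
    return 'promote' if success_rate >= min_success_rate else 'rollback'


print(canary_decision(40, 40))      # too few requests to decide
print(canary_decision(995, 1000))   # 99.5% success
print(canary_decision(950, 1000))   # 95.0% success
```

Requiring a minimum number of requests avoids promoting or rolling back on a handful of data points, which matters when the canary receives only a few percent of traffic.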

Improving MTTR

  • Observability — Distributed tracing, structured logging, real-time dashboards
  • Automated rollback — One-click (or zero-click) revert to last known good state
  • Runbooks — Pre-written incident response procedures for known failure modes
  • Incident response automation — PagerDuty/Opsgenie escalation, auto-creation of war rooms
  • Chaos engineering — Practice recovery regularly so it's muscle memory during real incidents

# Argo Rollouts: Automated canary with auto-rollback
# The pod selector/template and the latency-check AnalysisTemplate are omitted for brevity.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-api
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 5       # 5% traffic to canary
        - pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: payment-api
        - setWeight: 25      # 25% if analysis passes
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: latency-check
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100     # Full rollout
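  # A failed analysis step aborts the rollout and returns traffic to the stable version.
  # rollbackWindow is a spec-level field: rolling back to one of the last 2 revisions skips the canary steps.
  rollbackWindow:
    revisions: 2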
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name   # declared so {{args.service-name}} resolves in the query
  metrics:
    - name: success-rate
      interval: 60s
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))

Anti-Patterns in Metrics

Metrics are powerful — but they can be weaponised. Goodhart's Law states: "When a measure becomes a target, it ceases to be a good measure." Here's how teams game DORA metrics and why it's destructive:

| Anti-Pattern | What It Looks Like | Why It's Harmful |
|---|---|---|
| Empty Deploys | Deploying trivial changes to inflate DF | Measures activity, not value delivery |
| Skipping Reviews | Auto-merging to reduce lead time | Trades quality for speed numbers |
| Hiding Incidents | Not reporting failures to keep CFR low | Destroys learning and trust |
| Premature Closure | Closing incidents before full resolution | Artificially lowers MTTR, problems recur |
| Individual Rankings | Using metrics to compare/rank developers | Kills collaboration, incentivises gaming |
| Target Fixation | Setting rigid targets ("deploy 10x/day or else") | Context-free targets drive wrong behaviour |

Critical Warning: DORA metrics are team-level and system-level metrics. They should never be used to evaluate individual developers. The moment you tie metrics to performance reviews, you incentivise gaming. Use metrics for learning and improvement, not judgement and punishment.

Case Studies

Case Study

Etsy: From Fortnightly Releases to 50+ Deploys Per Day

In 2009, Etsy deployed once every two weeks with a dedicated "deploy day" that took 4+ hours and frequently caused outages. By 2014, they averaged 50+ deploys per day with a change failure rate below 5%. How?

Key changes: (1) Eliminated the release branch — all developers committed to trunk. (2) Built "Deployinator" — a one-click deploy tool any engineer could use. (3) Adopted feature flags for everything — deploy code dark, enable separately. (4) Invested heavily in monitoring — every deploy correlated with real-time metrics dashboards. (5) Made deploy responsibility distributed — the person who wrote the code deploys it.

Result: Deployment time dropped from 4 hours to 15 minutes. Incident rate dropped 80%. Engineer satisfaction improved dramatically because "deploy fear" disappeared. Revenue grew 40% YoY during the transformation period.

Tags: Trunk-Based Dev · Feature Flags · Continuous Deployment

Case Study

Capital One: DORA-Driven Transformation in Financial Services

Capital One faced a common financial services challenge: heavy regulation demanding stability, but business requirements demanding speed. They adopted DORA metrics as their transformation compass.

Approach: (1) Baseline measurement across 300+ engineering teams. (2) Identified that 80% of lead time was wait time (approvals, manual testing, deploy windows). (3) Replaced manual change advisory boards with automated policy-as-code. (4) Invested in test automation to replace manual QA gates. (5) Moved to immutable infrastructure and blue-green deployments for safe rollbacks.

Result over 3 years: Deployment frequency improved from monthly to daily for most teams. Lead time dropped from 2 weeks to under 1 day. MTTR improved from days to under 2 hours. All while maintaining regulatory compliance and passing SOC2 audits with automated evidence collection.

Tags: Financial Services · Policy as Code · Automated Compliance

Building a Metrics Program

Implementing DORA metrics isn't a one-day project. It's a cultural shift that requires careful introduction:

Phase 1: Baseline (Weeks 1–4)

  • Identify data sources (CI/CD platform, incident tracker, source control)
  • Instrument measurement collection (automated, not manual)
  • Calculate current state for all four metrics
  • Share results transparently with the team (no blame, just facts)

Phase 2: Understand (Weeks 5–8)

  • Discuss what the numbers mean with the team
  • Identify the biggest constraint (usually lead time or MTTR)
  • Map the value stream to find where time is lost
  • Generate improvement hypotheses collaboratively

Phase 3: Improve (Ongoing)

  • Pick one metric to improve first (the biggest constraint)
  • Run time-boxed experiments (2–4 weeks)
  • Measure the impact of each change
  • Celebrate improvements publicly
  • Iterate — once one metric improves, address the next constraint

{
  "metrics_program": {
    "team": "payments-squad",
    "baseline_date": "2026-04-01",
    "baseline": {
      "deployment_frequency": "2 per week",
      "lead_time_hours": 72,
      "change_failure_rate_pct": 22,
      "mttr_hours": 4.5
    },
    "current_tier": "High",
    "target_tier": "Elite",
    "focus_metric": "lead_time",
    "improvement_hypothesis": "Reducing PR review wait time from 18h to 4h will cut lead time by 40%",
    "experiment": {
      "intervention": "Implement async code review SLA: first review within 4 hours",
      "start_date": "2026-05-13",
      "duration_weeks": 4,
      "success_criteria": "Median lead time drops below 24 hours"
    },
    "review_cadence": "Bi-weekly metrics retro"
  }
}

Best Practice: Start with team-owned dashboards, not management reports. When teams own their metrics and use them for self-improvement, adoption is organic. When metrics are imposed top-down for reporting, they become a compliance burden that teams resent and game.
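An experiment record like the one above is only useful if its success criterion is actually checked at the end of the time box. As a minimal sketch, assuming per-change lead times in hours are available for the baseline and experiment periods (the helper `evaluate_experiment` and the sample values are hypothetical), the check could look like this:

```python
import statistics


def evaluate_experiment(baseline_hours, experiment_hours, target_median_hours):
    """Compare median lead time before and during an experiment against a target."""
    before = statistics.median(baseline_hours)
    after = statistics.median(experiment_hours)
    return {
        'median_before': before,
        'median_after': after,
        'change_pct': round((after - before) / before * 100, 1),
        'success': after < target_median_hours,
    }


# Hypothetical per-change lead times, in hours
baseline = [60, 72, 72, 80, 96]
experiment = [18, 20, 22, 30, 41]

print(evaluate_experiment(baseline, experiment, target_median_hours=24))
```

Using the median keeps one unusually slow change from deciding the outcome, in line with how the metrics are calculated elsewhere in this article.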

Exercises

Exercise 1: Calculate DORA Metrics

Given the following data for a team over the past 30 days: 45 production deployments, median commit-to-deploy time of 6.5 hours, 4 deployments caused incidents, incidents resolved in 25min, 40min, 2hr, and 45min respectively. Calculate all four DORA metrics, classify the team's tier for each, and determine their overall performance category. What single improvement would have the biggest impact?

Exercise 2: Identify Bottlenecks

A team has the following lead time breakdown: Coding (2h), PR wait (22h), Review (3h), CI pipeline (45min), Deploy approval (8h), Deployment (10min). Total: ~36 hours. (1) Identify the two biggest bottlenecks. (2) Propose specific interventions for each. (3) Estimate the new lead time if your interventions succeed. (4) What tier would the team move to?

Exercise 3: Design an Improvement Plan

Your team is currently at "Medium" tier (weekly deploys, 2-week lead time, 25% CFR, 18-hour MTTR). Design a 12-week improvement plan targeting "High" tier. For each metric: (1) Define the target value, (2) List 2-3 specific technical or process changes, (3) Identify dependencies and risks, (4) Define how you'll measure progress weekly.

Exercise 4: Set Up Measurement

Choose your team's actual CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.) and design a measurement pipeline that automatically calculates DORA metrics. Document: (1) Which events/webhooks provide the data, (2) How you'll store historical data, (3) How you'll visualise trends (tool choice + dashboard mockup), (4) How you'll handle edge cases (reverts, hotfixes, maintenance deploys).

Conclusion & Next Steps

DORA metrics give you an evidence-based compass for software delivery improvement. They tell you where you are, where you can go, and — when measured over time — whether your investments in tooling, process, and culture are actually working.

The key lessons: speed and stability are not tradeoffs — they reinforce each other. Smaller batches reduce risk rather than increasing it. Metrics are for learning, not punishment. And the gap between elite and low performers is growing — the time to invest in delivery performance is now.

Remember: you can't improve what you don't measure, but you can destroy what you measure badly. Use DORA metrics with wisdom — as a mirror for self-reflection, not a weapon for judgement.

Next in the Series

In Part 26: Reliability & Observability, we'll build on MTTR by learning how to instrument systems for deep observability — distributed tracing, SLOs, error budgets, and the practices that make sub-hour recovery possible.