Part 25: DORA Metrics & Delivery Performance

Introduction

For decades, software teams struggled with a fundamental question: how do we know if we're doing well? Lines of code? Story points completed? Number of deploys? These metrics either incentivised the wrong behaviour or measured activity rather than outcomes.

In 2014, Dr. Nicole Forsgren, Jez Humble, and Gene Kim began the DORA (DevOps Research and Assessment) research program. Over seven years and 36,000+ survey respondents across every industry, they identified exactly four metrics that reliably predict both software delivery performance and organisational performance (profitability, market share, customer satisfaction).

These aren't arbitrary KPIs. They are statistically validated measures that the research found to predict both delivery and organisational outcomes. Teams that excel at these four metrics deliver software faster, more reliably, and with fewer defects, and their organisations tend to outperform competitors commercially.

                            
Key Insight: The DORA research destroyed the myth that speed and stability are tradeoffs. Elite performers are simultaneously faster (multiple deploys per day) and more stable (lower failure rates, faster recovery). Speed and stability reinforce each other when the right practices are in place.
                        

The Accelerate Research

The findings were published in Accelerate: The Science of Lean Software and DevOps (2018) and the annual State of DevOps Reports (2014–present, now published by the DORA team at Google Cloud). Key findings include:

Elite performers deploy 973x more frequently than low performers
Lead time for elite teams is 6,570x shorter than for low performers
Change failure rate is 3x lower for elite performers
Recovery time is 6,570x faster for elite performers
High-performing teams are 2x more likely to exceed organisational goals
These capabilities predict commercial outcomes, not just technical ones

The research methodology uses cluster analysis to identify natural performance groups, structural equation modelling to test predictive relationships between practices and outcomes, and survey-based measurement checked for validity and reliability. It is among the most rigorous large-scale empirical research programmes in software engineering.

The Four Key Metrics

DORA measures delivery performance along two axes: throughput (how fast you deliver value) and stability (how reliably you deliver value). Two metrics measure each axis:

DORA Metrics: Throughput vs Stability

quadrantChart
    title DORA Metrics Framework
    x-axis "Low Throughput" --> "High Throughput"
    y-axis "Low Stability" --> "High Stability"
    quadrant-1 "Elite Performers"
    quadrant-2 "Reliable but Slow"
    quadrant-3 "Low Performers"
    quadrant-4 "Fast but Fragile"
    "Deployment Frequency": [0.85, 0.8]
    "Lead Time": [0.75, 0.7]
    "Change Failure Rate": [0.3, 0.85]
    "MTTR": [0.25, 0.75]

Deployment Frequency (DF)

Definition: How often your organisation deploys code to production (or releases it to end users).

Deployment frequency is the clearest indicator of batch size. High deployment frequency means small batches, which means lower risk per change, faster feedback, and easier debugging when something goes wrong. If you deploy 100 lines of code and something breaks, you know exactly where to look. If you deploy 10,000 lines, good luck.

Performance Tier	Deployment Frequency	Batch Size Implication
Elite	Multiple times per day (on demand)	Individual commits, feature flags
High	Once per day to once per week	Small feature branches, a few days' work
Medium	Once per week to once per month	Sprint-sized releases
Low	Once per month to once per six months	Large batches, release trains

                            
The Batch Size Paradox: Intuition says "deploy less often to reduce risk." The data proves the opposite. Deploying more frequently with smaller changes reduces risk because: (1) each change is simple enough to understand, (2) failures are isolated to a known change, (3) rollbacks are trivial, and (4) the feedback loop is tight enough to catch problems early.
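The batch-size argument can be made concrete with a rough sketch. It assumes, purely for illustration, that every change independently carries the same 1% chance of introducing a failure; real changes are not independent and the 1% figure is invented. The function names are hypothetical.

```python
import math

def batch_failure_probability(changes_per_deploy: int, p_change_fails: float) -> float:
    """Chance that at least one change in a deploy causes a failure,
    assuming independent changes with equal failure probability."""
    return 1 - (1 - p_change_fails) ** changes_per_deploy

def bisect_steps(changes_per_deploy: int) -> int:
    """Worst-case bisection steps needed to isolate a single faulty change."""
    if changes_per_deploy <= 1:
        return 0
    return math.ceil(math.log2(changes_per_deploy))

for batch in (1, 10, 100):
    p = batch_failure_probability(batch, p_change_fails=0.01)  # 1% is an assumed figure
    print(f"{batch:>3} changes/deploy: P(deploy fails) = {p:.1%}, "
          f"bisect steps = {bisect_steps(batch)}")
```

Under this toy model the total number of faulty changes is the same either way. What shrinks with small batches is the chance that any single deploy fails and the search space when one does.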
                        

Lead Time for Changes (LT)

Definition: The time from when code is committed to when it is running in production and available to users.

Lead time measures the efficiency of your entire delivery pipeline — from the moment a developer finishes writing code to the moment it creates value for users. It encompasses code review, CI pipeline execution, approval gates, staging environments, and deployment processes.

Performance Tier	Lead Time	Where Time is Spent
Elite	Less than 1 hour	Automated pipeline, trunk-based dev
High	1 day to 1 week	Code review + automated deploy
Medium	1 week to 1 month	Manual testing, approval committees
Low	1 month to 6 months	Change advisory boards, manual deployments

Change Failure Rate (CFR)

Definition: The percentage of deployments that cause a failure in production — requiring a rollback, hotfix, patch, or incident response.

Change failure rate measures the quality of your delivery process. A low CFR means your testing, review, and deployment practices catch problems before they reach users. It doesn't mean you never fail — it means you fail infrequently and predictably.

Performance Tier	Change Failure Rate	What This Means
Elite	0–15%	Extensive automated testing, canary deployments
High	16–30%	Good test coverage, some gaps in integration
Medium	16–30%	Variable testing, reactive quality practices
Low	46–60%	Insufficient testing, large batches mask root causes

Mean Time to Recovery (MTTR)

Definition: How quickly a service is restored after an incident or degradation — from detection to resolution.

MTTR measures your resilience. Failures are inevitable; what matters is how quickly you detect, diagnose, and recover. Elite teams prioritise MTTR over MTBF (Mean Time Between Failures) because in complex systems, preventing all failures is impossible — but fast recovery is achievable.

Performance Tier	MTTR	Key Enablers
Elite	Less than 1 hour	Automated rollback, observability, runbooks
High	Less than 1 day	On-call rotation, basic monitoring, manual rollback
Medium	1 day to 1 week	Escalation-heavy, limited observability
Low	1 week to 6 months	No rollback capability, manual processes

                            
MTTR vs MTBF: Traditional enterprise thinking optimises for MTBF ("prevent failures at all costs"). This leads to heavy change approval processes that slow delivery and paradoxically increase failure rates (because large batches are harder to test). Modern thinking optimises for MTTR ("failures will happen; recover fast"). This enables high deployment frequency while maintaining stability.
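One way to see why recovery time deserves equal attention is the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below uses made-up numbers (500 hours between failures, 5 hours to recover) purely to compare the two levers.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Illustrative figures only, not taken from the DORA reports.
baseline = availability(mtbf_hours=500, mttr_hours=5)
doubled_mtbf = availability(mtbf_hours=1000, mttr_hours=5)
halved_mttr = availability(mtbf_hours=500, mttr_hours=2.5)

print(f"baseline:     {baseline:.3%}")
print(f"double MTBF:  {doubled_mtbf:.3%}")
print(f"halve MTTR:   {halved_mttr:.3%}")
```

Halving MTTR yields the same availability as doubling MTBF, and in complex systems the former is usually the more tractable engineering goal.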
                        

Performance Tiers

DORA identifies four performance clusters using statistical analysis. These aren't arbitrary divisions — they represent natural groupings found in the data:

Metric	Elite	High	Medium	Low
Deployment Frequency	Multiple/day	Daily–Weekly	Weekly–Monthly	Monthly–Biannually
Lead Time	< 1 hour	1 day – 1 week	1 week – 1 month	1 month – 6 months
Change Failure Rate	0–15%	16–30%	16–30%	46–60%
MTTR	< 1 hour	< 1 day	1 day – 1 week	1 week – 6 months

The Widening Gap

Year over year, the State of DevOps Reports show that the gap between elite and low performers is growing. Elite teams continue accelerating while low performers stagnate. This creates a compounding effect — organisations that invest in delivery capabilities gain exponential advantages over those that don't.

The 2023 report found that elite performers now represent approximately 18% of respondents (up from 7% in 2018), suggesting that more teams are achieving elite status — but the bar keeps rising as elite teams continue improving.

Measuring DORA Metrics

Measurement should be automated and objective. Self-reported numbers for a single team are unreliable because of recall and optimism biases, so pull the data from your systems:

DORA Metrics Measurement Pipeline

flowchart LR
    A[Git Repository] -->|Commit timestamps| E[Metrics Engine]
    B[CI/CD Platform] -->|Build/Deploy events| E
    C[Incident Manager] -->|Incident open/close| E
    D[Monitoring/Alerting] -->|Detection time| E
    E --> F[Dashboard]
    E --> G[Trend Analysis]
    E --> H[Team Comparisons]

    style E fill:#3B9797,color:#fff
    style F fill:#f8f9fa,stroke:#333

Data Sources by Metric

Metric	Data Source	Calculation
DF	CI/CD deployment events	Count of successful production deploys / time period
LT	Git + CI/CD timestamps	Median(deploy_time - first_commit_time) per change
CFR	Incident tracker + deploy logs	Failed deploys / total deploys × 100%
MTTR	Incident tracker	Median(resolved_time - detected_time) across incidents

Tools & Dashboards

import datetime
import statistics

def calculate_dora_metrics(deploys, incidents, period_days=30):
    """
    Calculate DORA metrics from deployment and incident data.

    Args:
        deploys: List of dicts with 'timestamp', 'commit_time', 'caused_incident'
        incidents: List of dicts with 'detected_at', 'resolved_at'
        period_days: Measurement period in days
    """
    # 1. Deployment Frequency
    deployment_frequency = len(deploys) / period_days
    df_label = classify_df(deployment_frequency)

    # 2. Lead Time for Changes (median)
    lead_times = []
    for deploy in deploys:
        lt = (deploy['timestamp'] - deploy['commit_time']).total_seconds() / 3600
        lead_times.append(lt)  # in hours
    median_lead_time = statistics.median(lead_times) if lead_times else 0

    # 3. Change Failure Rate
    failed_deploys = sum(1 for d in deploys if d['caused_incident'])
    change_failure_rate = (failed_deploys / len(deploys) * 100) if deploys else 0

    # 4. Mean Time to Recovery (median)
    recovery_times = []
    for incident in incidents:
        rt = (incident['resolved_at'] - incident['detected_at']).total_seconds() / 3600
        recovery_times.append(rt)  # in hours
    median_mttr = statistics.median(recovery_times) if recovery_times else 0

    return {
        'deployment_frequency': {
            'value': round(deployment_frequency, 2),
            'unit': 'deploys/day',
            'tier': df_label
        },
        'lead_time': {
            'value': round(median_lead_time, 1),
            'unit': 'hours',
            'tier': classify_lt(median_lead_time)
        },
        'change_failure_rate': {
            'value': round(change_failure_rate, 1),
            'unit': '%',
            'tier': classify_cfr(change_failure_rate)
        },
        'mttr': {
            'value': round(median_mttr, 1),
            'unit': 'hours',
            'tier': classify_mttr(median_mttr)
        }
    }

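def classify_df(deploys_per_day):
    # Thresholds follow the tier table: Elite means more than one deploy per day.
    if deploys_per_day > 1:
        return 'Elite'
    elif deploys_per_day >= 1/7:   # at least weekly
        return 'High'
    elif deploys_per_day >= 1/30:  # at least monthly
        return 'Medium'
    return 'Low'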

def classify_lt(hours):
    if hours < 1:
        return 'Elite'
    elif hours < 168:  # 1 week
        return 'High'
    elif hours < 720:  # 1 month
        return 'Medium'
    return 'Low'

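def classify_cfr(percentage):
    if percentage <= 15:
        return 'Elite'
    elif percentage <= 30:
        # The tier table gives High and Medium the same 16-30% band,
        # so CFR alone cannot separate them; report the better tier.
        return 'High'
    return 'Low'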

def classify_mttr(hours):
    if hours < 1:
        return 'Elite'
    elif hours < 24:
        return 'High'
    elif hours < 168:
        return 'Medium'
    return 'Low'

# Example usage with sample data
deploys = [
    {'timestamp': datetime.datetime(2026, 5, 13, 14, 0),
     'commit_time': datetime.datetime(2026, 5, 13, 12, 30),
     'caused_incident': False},
    {'timestamp': datetime.datetime(2026, 5, 13, 16, 0),
     'commit_time': datetime.datetime(2026, 5, 13, 14, 45),
     'caused_incident': False},
    {'timestamp': datetime.datetime(2026, 5, 12, 10, 0),
     'commit_time': datetime.datetime(2026, 5, 12, 8, 0),
     'caused_incident': True},
]

incidents = [
    {'detected_at': datetime.datetime(2026, 5, 12, 10, 5),
     'resolved_at': datetime.datetime(2026, 5, 12, 10, 35)},
]

metrics = calculate_dora_metrics(deploys, incidents, period_days=7)
for name, data in metrics.items():
    print(f"{name}: {data['value']} {data['unit']} ({data['tier']})")

Purpose-built DORA measurement tools include:

Four Keys (Google) — Open-source DORA dashboard using BigQuery and Cloud Build events
LinearB — Git analytics platform with built-in DORA tracking
Sleuth — Deployment tracking focused specifically on DORA metrics
Jellyfish — Engineering management platform with DORA and business alignment
Faros AI — Open-source connector that aggregates data from 50+ tools
Backstage + plugins — Internal developer portal with DORA metric widgets

Beyond DORA

While the four key metrics remain the foundation, the research community has expanded measurement to address additional dimensions of software delivery and developer experience.

The Reliability Metric (5th DORA Metric)

In 2021, the DORA team added a fifth metric, reliability: whether a team meets or exceeds its reliability targets (SLOs). It broadens the availability measure introduced in the 2018 report and acknowledges that operational performance matters alongside delivery speed. A team deploying 100 times per day but consistently missing SLOs isn't performing well.
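As a minimal sketch of what "meeting an SLO" can mean in practice, the snippet below checks a request-based availability target and reports how much of the error budget is left. The function names, the 99.9% target and the request counts are all assumptions for illustration.

```python
def slo_attainment(good_events: int, total_events: int) -> float:
    """Fraction of events that met the service level indicator."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events

def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent; negative means the SLO was missed."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    allowed_failures = (1 - slo_target) * total_events
    actual_failures = total_events - good_events
    return 1 - actual_failures / allowed_failures

# Hypothetical month: 1,000,000 requests, 600 of them failed, target 99.9%.
attained = slo_attainment(999_400, 1_000_000)
budget_left = error_budget_remaining(999_400, 1_000_000, slo_target=0.999)
print(f"attainment: {attained:.4%}, meets SLO: {attained >= 0.999}")
print(f"error budget remaining: {budget_left:.0%}")
```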

The SPACE Framework

Published in 2021 by Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, and colleagues (GitHub, the University of Victoria, and Microsoft Research), SPACE measures developer productivity across five dimensions:

Dimension	What It Measures	Example Metrics
Satisfaction	Developer happiness and fulfilment	Survey scores, eNPS, retention
Performance	Outcomes and quality of work	Customer impact, reliability, code quality
Activity	Volume of actions (use carefully)	PRs merged, deployments, code reviews completed
Communication	Collaboration and knowledge flow	PR review time, documentation, onboarding speed
Efficiency	Minimal friction and interruptions	Flow state time, context switches, wait time

Flow Metrics

Flow metrics (from Mik Kersten's Project to Product) measure value stream efficiency:

Flow Time — Total time from work item creation to delivery (includes wait time)
Flow Efficiency — Active work time / total flow time × 100% (typically 15–40%)
Flow Load — Work items in progress (correlates with lead time via Little's Law)
Flow Velocity — Number of items completed per time period
Flow Distribution — Ratio of features vs defects vs debt vs risk work
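The Flow Efficiency formula and the Little's Law relationship in the list above can be sketched as follows. The helper names and the sample numbers are illustrative assumptions. Little's Law holds for long-run averages in a stable system, so treat the result as an estimate.

```python
def flow_efficiency(active_hours: float, total_flow_hours: float) -> float:
    """Active work time as a percentage of total flow time."""
    return active_hours / total_flow_hours * 100

def expected_flow_time(flow_load: float, flow_velocity: float) -> float:
    """Little's Law: average time in the system = average WIP / average throughput."""
    return flow_load / flow_velocity

# Hypothetical team: 9 hours of hands-on work inside a 36-hour flow time.
print(f"flow efficiency: {flow_efficiency(9, 36):.0f}%")

# 12 items in progress at 4 items completed per week, then with WIP halved.
print(f"flow time at load 12: {expected_flow_time(12, 4)} weeks")
print(f"flow time at load 6:  {expected_flow_time(6, 4)} weeks")
```

With throughput held constant, halving the work in progress halves the expected flow time, which is why limiting WIP tends to shorten lead time.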

Improving Each Metric

Improving Deployment Frequency

The path from monthly to daily (or multiple daily) deployments requires changes across code, process, and culture:

Trunk-based development — Short-lived branches (hours, not weeks). Merge to main daily
Feature flags — Decouple deployment from release. Deploy dark features that can be enabled independently
Automated deployments — Every merge to main triggers production deployment (with gates)
Smaller batches — Break large features into independently deployable increments
Decouple services — Independent deployability means one team's changes don't block another's

# GitOps: Automated deployment on merge to main
# ArgoCD Application watching main branch
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/my-org/payment-service
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
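To illustrate how feature flags decouple deployment from release, here is a minimal in-memory sketch. The `FeatureFlags` class, the `new-checkout` flag and the hashing scheme are hypothetical. A production system would normally load flags from a configuration service or a dedicated flag platform.

```python
import hashlib

class FeatureFlags:
    """Toy flag store keyed by flag name; values are rollout percentages."""

    def __init__(self):
        self._rollout_pct = {}  # flag name -> percentage of users (0-100)

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout_pct[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout_pct.get(flag, 0)  # unknown flags default to off
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100  # stable bucket in [0, 100) per user
        return bucket < percent

flags = FeatureFlags()

def checkout(user_id: str) -> str:
    # The new code path is deployed but only runs when the flag is on.
    if flags.is_enabled("new-checkout", user_id):
        return "new checkout flow"
    return "legacy checkout flow"

print(checkout("user-42"))               # flag not configured: legacy path
flags.set_rollout("new-checkout", 100)   # release to everyone, no redeploy
print(checkout("user-42"))
flags.set_rollout("new-checkout", 0)     # kill switch, no redeploy
```

Hashing the flag name together with the user ID keeps each user in the same bucket across requests, so a partial rollout does not flicker between code paths.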

Improving Lead Time

Lead time is the sum of all wait times and processing times in your pipeline. To reduce it, identify and eliminate bottlenecks:

import datetime

# Lead time breakdown analysis
def analyse_lead_time(change):
    """Break down where time is spent in the delivery pipeline."""
    stages = {
        'coding': change['pr_opened'] - change['first_commit'],
        'review_wait': change['first_review'] - change['pr_opened'],
        'review_cycles': change['approved'] - change['first_review'],
        'ci_pipeline': change['ci_complete'] - change['approved'],
        'deploy_wait': change['deploy_start'] - change['ci_complete'],
        'deployment': change['deploy_complete'] - change['deploy_start'],
    }

    total = sum(stages.values(), datetime.timedelta())
    print(f"Total Lead Time: {total}")
    print(f"\nBreakdown:")
    for stage, duration in stages.items():
        pct = duration / total * 100
        bar = '█' * int(pct / 2)
        print(f"  {stage:20s}: {str(duration):>15s} ({pct:.0f}%) {bar}")

    # Identify biggest bottleneck
    bottleneck = max(stages, key=stages.get)
    print(f"\nBottleneck: {bottleneck} ({stages[bottleneck]})")
    return stages

# Example: typical team before optimisation
sample_change = {
    'first_commit': datetime.datetime(2026, 5, 10, 9, 0),
    'pr_opened': datetime.datetime(2026, 5, 10, 11, 0),
    'first_review': datetime.datetime(2026, 5, 11, 14, 0),   # Next day!
    'approved': datetime.datetime(2026, 5, 12, 10, 0),
    'ci_complete': datetime.datetime(2026, 5, 12, 10, 45),
    'deploy_start': datetime.datetime(2026, 5, 13, 9, 0),    # Next deploy window
    'deploy_complete': datetime.datetime(2026, 5, 13, 9, 15),
}

analyse_lead_time(sample_change)

Common lead time improvements:

Reduce review wait time — Set team SLAs for PR review (e.g., first review within 4 hours)
Parallelise CI — Run tests concurrently; aim for sub-10-minute pipelines
Eliminate manual gates — Replace change advisory boards with automated policy checks
Deploy continuously — Remove "deploy windows"; deploy any time CI passes

Improving Change Failure Rate

Comprehensive automated testing — Unit, integration, contract, and end-to-end tests in CI
Canary deployments — Route 1–5% of traffic to new version; monitor before full rollout
Progressive delivery — Ring-based deployments, starting with internal users
Feature flags with kill switches — Instant disable without redeployment
Smaller batch sizes — Fewer changes per deployment = fewer failure modes
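The decision behind a canary deployment can be sketched as a simple comparison of error rates between the stable version and the canary. The thresholds (`min_requests`, `max_rate_increase`) and the function name are assumptions. Real canary analysis usually looks at several signals, such as latency and saturation, and may apply statistical tests.

```python
def canary_verdict(baseline_errors: int, baseline_requests: int,
                   canary_errors: int, canary_requests: int,
                   *, min_requests: int = 500,
                   max_rate_increase: float = 0.005) -> str:
    """Return 'wait', 'rollback' or 'promote' for a canary release."""
    if canary_requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    if canary_rate - baseline_rate > max_rate_increase:
        return "rollback"  # canary is clearly worse than the stable version
    return "promote"

# Hypothetical traffic split: 95% stable, 5% canary.
print(canary_verdict(50, 95_000, 3, 5_000))    # error rates are comparable
print(canary_verdict(50, 95_000, 60, 5_000))   # canary error rate jumped
print(canary_verdict(50, 95_000, 0, 100))      # too little canary traffic so far
```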

Improving MTTR

Observability — Distributed tracing, structured logging, real-time dashboards
Automated rollback — One-click (or zero-click) revert to last known good state
Runbooks — Pre-written incident response procedures for known failure modes
Incident response automation — PagerDuty/Opsgenie escalation, auto-creation of war rooms
Chaos engineering — Practice recovery regularly so it's muscle memory during real incidents

# Argo Rollouts: Automated canary with auto-rollback
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-api
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 5       # 5% traffic to canary
        - pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: payment-api
        - setWeight: 25      # 25% if analysis passes
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: latency-check
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100     # Full rollout
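  # A failed analysis run aborts the rollout and shifts traffic back to the stable version.
  # rollbackWindow is a spec-level field: rollbacks to one of the last two revisions skip the canary steps.
  rollbackWindow:
    revisions: 2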
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  # Arguments passed in by the Rollout must be declared here
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))

Anti-Patterns in Metrics

Metrics are powerful — but they can be weaponised. Goodhart's Law states: "When a measure becomes a target, it ceases to be a good measure." Here's how teams game DORA metrics and why it's destructive:

Anti-Pattern	What It Looks Like	Why It's Harmful
Empty Deploys	Deploying trivial changes to inflate DF	Measures activity, not value delivery
Skipping Reviews	Auto-merging to reduce lead time	Trades quality for speed numbers
Hiding Incidents	Not reporting failures to keep CFR low	Destroys learning and trust
Premature Closure	Closing incidents before full resolution	Artificially lowers MTTR, problems recur
Individual Rankings	Using metrics to compare/rank developers	Kills collaboration, incentivises gaming
Target Fixation	Setting rigid targets ("deploy 10x/day or else")	Context-free targets drive wrong behaviour

                            
Critical Warning: DORA metrics are team-level and system-level metrics. They should never be used to evaluate individual developers. The moment you tie metrics to performance reviews, you incentivise gaming. Use metrics for learning and improvement, not judgement and punishment.
                        

Case Studies

Case Study

Etsy: From Monthly Releases to 50+ Deploys Per Day

In 2009, Etsy deployed once every two weeks with a dedicated "deploy day" that took 4+ hours and frequently caused outages. By 2014, they averaged 50+ deploys per day with a change failure rate below 5%. How?

Key changes: (1) Eliminated the release branch — all developers committed to trunk. (2) Built "Deployinator" — a one-click deploy tool any engineer could use. (3) Adopted feature flags for everything — deploy code dark, enable separately. (4) Invested heavily in monitoring — every deploy correlated with real-time metrics dashboards. (5) Made deploy responsibility distributed — the person who wrote the code deploys it.

Result: Deployment time dropped from 4 hours to 15 minutes. Incident rate dropped 80%. Engineer satisfaction improved dramatically because "deploy fear" disappeared. Revenue grew 40% YoY during the transformation period.

Tags: Trunk-Based Dev, Feature Flags, Continuous Deployment

Case Study

Capital One: DORA-Driven Transformation in Financial Services

Capital One faced a common financial services challenge: heavy regulation demanding stability, but business requirements demanding speed. They adopted DORA metrics as their transformation compass.

Approach: (1) Baseline measurement across 300+ engineering teams. (2) Identified that 80% of lead time was wait time (approvals, manual testing, deploy windows). (3) Replaced manual change advisory boards with automated policy-as-code. (4) Invested in test automation to replace manual QA gates. (5) Moved to immutable infrastructure and blue-green deployments for safe rollbacks.

Result over 3 years: Deployment frequency improved from monthly to daily for most teams. Lead time dropped from 2 weeks to under 1 day. MTTR improved from days to under 2 hours. All of this was achieved while maintaining regulatory compliance and passing SOC 2 audits with automated evidence collection.

Tags: Financial Services, Policy as Code, Automated Compliance

Building a Metrics Program

Implementing DORA metrics isn't a one-day project. It's a cultural shift that requires careful introduction:

Phase 1: Baseline (Weeks 1–4)

Identify data sources (CI/CD platform, incident tracker, source control)
Instrument measurement collection (automated, not manual)
Calculate current state for all four metrics
Share results transparently with the team (no blame, just facts)

Phase 2: Understand (Weeks 5–8)

Discuss what the numbers mean with the team
Identify the biggest constraint (usually lead time or MTTR)
Map the value stream to find where time is lost
Generate improvement hypotheses collaboratively

Phase 3: Improve (Ongoing)

Pick one metric to improve first (the biggest constraint)
Run time-boxed experiments (2–4 weeks)
Measure the impact of each change
Celebrate improvements publicly
Iterate — once one metric improves, address the next constraint

{
  "metrics_program": {
    "team": "payments-squad",
    "baseline_date": "2026-04-01",
    "baseline": {
      "deployment_frequency": "2 per week",
      "lead_time_hours": 72,
      "change_failure_rate_pct": 22,
      "mttr_hours": 4.5
    },
    "current_tier": "High",
    "target_tier": "Elite",
    "focus_metric": "lead_time",
    "improvement_hypothesis": "Reducing PR review wait time from 18h to 4h will cut lead time by 40%",
    "experiment": {
      "intervention": "Implement async code review SLA: first review within 4 hours",
      "start_date": "2026-05-13",
      "duration_weeks": 4,
      "success_criteria": "Median lead time drops below 24 hours"
    },
    "review_cadence": "Bi-weekly metrics retro"
  }
}
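Once an experiment like the one above has run, its success criterion can be checked mechanically. The sketch below compares median lead time before and during the experiment. The sample lead times are invented and `evaluate_experiment` is a hypothetical helper.

```python
import statistics

def evaluate_experiment(before_hours, during_hours, target_hours):
    """Compare median lead time before and during an experiment."""
    before = statistics.median(before_hours)
    after = statistics.median(during_hours)
    return {
        "median_before": before,
        "median_after": after,
        "change_pct": round((after - before) / before * 100, 1),
        "success": after < target_hours,
    }

# Invented per-change lead times in hours.
baseline_lead_times = [70, 75, 68, 80, 72]
experiment_lead_times = [30, 22, 18, 26, 20]

result = evaluate_experiment(baseline_lead_times, experiment_lead_times, target_hours=24)
print(result)
```

Using the median rather than the mean keeps one unusually slow change from deciding the outcome of a four-week experiment.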

                            
Best Practice: Start with team-owned dashboards, not management reports. When teams own their metrics and use them for self-improvement, adoption is organic. When metrics are imposed top-down for reporting, they become a compliance burden that teams resent and game.
                        

Exercises

                            
Exercise 1: Calculate DORA Metrics

Given the following data for a team over the past 30 days: 45 production deployments, median commit-to-deploy time of 6.5 hours, 4 deployments caused incidents, incidents resolved in 25min, 40min, 2hr, and 45min respectively. Calculate all four DORA metrics, classify the team's tier for each, and determine their overall performance category. What single improvement would have the biggest impact?

Exercise 2: Identify Bottlenecks

A team has the following lead time breakdown: Coding (2h), PR wait (22h), Review (3h), CI pipeline (45min), Deploy approval (8h), Deployment (10min). Total: ~36 hours. (1) Identify the two biggest bottlenecks. (2) Propose specific interventions for each. (3) Estimate the new lead time if your interventions succeed. (4) What tier would the team move to?

Exercise 3: Design an Improvement Plan

Your team is currently at "Medium" tier (weekly deploys, 2-week lead time, 25% CFR, 18-hour MTTR). Design a 12-week improvement plan targeting "High" tier. For each metric: (1) Define the target value, (2) List 2-3 specific technical or process changes, (3) Identify dependencies and risks, (4) Define how you'll measure progress weekly.

Exercise 4: Set Up Measurement

Choose your team's actual CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.) and design a measurement pipeline that automatically calculates DORA metrics. Document: (1) Which events/webhooks provide the data, (2) How you'll store historical data, (3) How you'll visualise trends (tool choice + dashboard mockup), (4) How you'll handle edge cases (reverts, hotfixes, maintenance deploys).

Conclusion & Next Steps

DORA metrics give you an evidence-based compass for software delivery improvement. They tell you where you are, where you can go, and — when measured over time — whether your investments in tooling, process, and culture are actually working.

The key lessons: speed and stability are not tradeoffs — they reinforce each other. Smaller batches reduce risk rather than increasing it. Metrics are for learning, not punishment. And the gap between elite and low performers is growing — the time to invest in delivery performance is now.

Remember: you can't improve what you don't measure, but you can destroy what you measure badly. Use DORA metrics with wisdom — as a mirror for self-reflection, not a weapon for judgement.

Next in the Series

In Part 26: Reliability & Observability, we'll build on MTTR by learning how to instrument systems for deep observability — distributed tracing, SLOs, error budgets, and the practices that make sub-hour recovery possible.
