
Part 15: CI/CD Pipeline Architecture & Optimization

May 13, 2026 Wasil Zafar 44 min read

Pipelines are the backbone of delivery. This article teaches you to design DAG-based architectures, parallelise ruthlessly, build reusable templates, and optimise until your feedback loop runs in under 10 minutes — at any scale.

Table of Contents

  1. Introduction
  2. Pipeline-as-Code
  3. Pipeline Anatomy
  4. DAG-Based Pipelines
  5. Parallelization Strategies
  6. Caching & Artifact Sharing
  7. Pipeline Templates & Reusability
  8. Environment Promotion
  9. Monorepo CI
  10. Pipeline Optimization
  11. Pipeline Security
  12. Pipeline Observability
  13. Exercises
  14. Conclusion & Next Steps

Introduction — Why Pipeline Architecture Matters

A CI/CD pipeline is more than a script that runs tests. It is the assembly line of modern software delivery — a carefully designed system that transforms source code into production-ready artifacts while providing fast, reliable feedback to developers.

The difference between a well-architected pipeline and a naive one is dramatic:

| Characteristic | Naive Pipeline | Well-Architected Pipeline |
|---|---|---|
| Duration | 25–45 minutes | 5–10 minutes |
| Execution | Sequential (one step after another) | Parallel (independent jobs run simultaneously) |
| Caching | None (downloads everything every run) | Aggressive (dependencies, layers, artifacts) |
| Reusability | Copy-paste between repos | Shared templates and components |
| Cost | $200+/dev/month | $30–60/dev/month |
| Feedback quality | All-or-nothing (pass/fail after 30 min) | Progressive (lint errors in 30s, test failures in 3 min) |

Impact on Developer Productivity

Research from the DORA team (Google) and studies by Abi Noda (DX) consistently show that CI pipeline speed is one of the strongest predictors of developer satisfaction and team performance. Every minute saved in the pipeline compounds across multiple builds per developer per day, across every developer on the team, and across hundreds of working days per year.

The Compound Effect: Consider a team of 30 developers, each triggering 4 builds per day. Reducing pipeline time from 20 minutes to 8 minutes saves 12 minutes × 4 builds × 30 devs = 1,440 minutes, or 24 hours of wait time per day. Over roughly 250 working days, that is about 6,000 engineering hours reclaimed per year, equivalent to 3 full-time engineers at around 2,000 hours each.
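
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch; the 250 working days and 2,000 hours per engineer-year are assumptions made here for illustration, not figures from a study.

```python
def reclaimed_hours(devs, builds_per_day, minutes_saved, working_days=250):
    """Return (hours saved per day, hours saved per year) across the team."""
    minutes_per_day = devs * builds_per_day * minutes_saved
    hours_per_day = minutes_per_day / 60
    return hours_per_day, hours_per_day * working_days


# 20-minute pipeline reduced to 8 minutes: 12 minutes saved per build
per_day, per_year = reclaimed_hours(devs=30, builds_per_day=4, minutes_saved=20 - 8)
print(per_day, per_year)   # 24.0 6000.0
print(per_year / 2000)     # 3.0 engineer-years, assuming ~2,000 hours each
```

Plugging in your own team size and build frequency gives a rough budget for how much optimisation effort is worth spending.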

Pipeline-as-Code

The first architectural decision is where the pipeline definition lives. The modern answer is unambiguous: the pipeline configuration lives in the repository, alongside the application code it builds.

Why Pipelines Live in the Repo

  • Version control — pipeline changes are tracked in Git history, reviewed in PRs, and can be rolled back
  • Branch-specific — different branches can have different pipelines (a release branch may include signing steps)
  • Self-documenting — new developers read the pipeline to understand how the project is built and deployed
  • Portable — the repo carries its own build instructions; clone and it works
  • Testable — pipeline changes go through the same CI process as application code

Configuration Formats

| Platform | File | Format | Style |
|---|---|---|---|
| GitHub Actions | .github/workflows/*.yml | YAML | Declarative + expressions |
| GitLab CI | .gitlab-ci.yml | YAML | Declarative + rules |
| Azure Pipelines | azure-pipelines.yml | YAML | Declarative + templates |
| Jenkins | Jenkinsfile | Groovy | Imperative (scripted) or declarative |
| CircleCI | .circleci/config.yml | YAML | Declarative + orbs |
| Dagger | dagger.json + code | Go/Python/TS | Imperative (real programming language) |

The industry has largely converged on YAML for pipeline configuration. While YAML has limitations (no IDE autocomplete without schemas, whitespace sensitivity, limited expressiveness), its widespread adoption means extensive documentation and community examples.

Pipeline Anatomy — Stages, Jobs, Steps

Every CI/CD pipeline is a hierarchy of three structural levels:

Pipeline Structural Hierarchy
flowchart TB
    subgraph Pipeline["Pipeline / Workflow"]
        subgraph Stage1["Stage: Build & Test"]
            subgraph Job1["Job: Lint"]
                S1[Step: Checkout]
                S2[Step: Install deps]
                S3[Step: Run ESLint]
            end
            subgraph Job2["Job: Unit Tests"]
                S4[Step: Checkout]
                S5[Step: Install deps]
                S6[Step: Run Jest]
                S7[Step: Upload coverage]
            end
        end
        subgraph Stage2["Stage: Package"]
            subgraph Job3["Job: Docker Build"]
                S8[Step: Checkout]
                S9[Step: Build image]
                S10[Step: Push to registry]
            end
        end
    end
                            
  • Pipeline/Workflow — the top-level container, triggered by an event (push, PR, schedule). Contains one or more stages.
  • Stage — a logical grouping representing a phase (build, test, deploy). Stages run sequentially by default.
  • Job — a unit of work that runs on a single agent. Jobs within a stage can run in parallel.
  • Step — a single command or action within a job. Steps always run sequentially within their job.

Sequential vs Parallel Execution

The critical insight: jobs are the unit of parallelism. Each job runs on its own agent (machine/container), so multiple jobs execute simultaneously. Steps within a job always run sequentially on that one agent, sharing its workspace filesystem and environment.

# Two jobs running in parallel (no dependency between them)
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run lint

  type-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx tsc --noEmit

  # This job waits for both lint and type-check to pass
  test:
    runs-on: ubuntu-latest
    needs: [lint, type-check]   # DAG dependency
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test

DAG-Based Pipelines

A Directed Acyclic Graph (DAG) is a graph where edges have direction and no cycles exist — you cannot follow edges and return to where you started. This is the ideal model for pipeline execution because:

  • Each node (job) has explicit dependencies on other nodes
  • Jobs with no dependencies between them execute in parallel
  • The scheduler computes the critical path — the longest chain of sequential dependencies
  • Total pipeline time equals the critical path length, not the sum of all jobs

DAG Pipeline: Maximum Parallelism
flowchart LR
    A[Checkout] --> B[Lint]
    A --> C[Type Check]
    A --> D[Security Scan]
    B --> E[Unit Tests]
    C --> E
    D --> F[Integration Tests]
    E --> G[Build Docker Image]
    F --> G
    G --> H[Push to Registry]
    H --> I[Deploy to Staging]
    I --> J[Smoke Tests]
    J --> K[Deploy to Production]
                            

In this DAG, Lint, Type Check, and Security Scan all run in parallel (they depend only on Checkout). Unit Tests waits for both Lint and Type Check. Integration Tests waits only for Security Scan. The Build step waits for both test types to complete.

DAG Syntax Across Platforms

# GitHub Actions: "needs" keyword defines DAG edges
jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
  
  type-check:
    runs-on: ubuntu-latest
    steps: [...]
  
  unit-tests:
    needs: [lint, type-check]    # Waits for both
    runs-on: ubuntu-latest
    steps: [...]
  
  integration-tests:
    needs: [lint]                 # Only waits for lint
    runs-on: ubuntu-latest
    steps: [...]
  
  build:
    needs: [unit-tests, integration-tests]  # Waits for both test jobs
    runs-on: ubuntu-latest
    steps: [...]

# GitLab CI: "needs" keyword (similar to GitHub)
stages:
  - validate
  - test
  - build

lint:
  stage: validate
  script: npm run lint

type-check:
  stage: validate
  script: npx tsc --noEmit

unit-tests:
  stage: test
  needs: [lint, type-check]
  script: npm test

integration-tests:
  stage: test
  needs: [lint]
  script: npm run test:integration

build:
  stage: build
  needs: [unit-tests, integration-tests]
  script: docker build -t app .

Critical Path Analysis: The total pipeline time is determined by the longest path through the DAG. If lint takes 1 min, type-check takes 2 min, unit tests take 4 min, and build takes 3 min, the critical path is Checkout → Type Check → Unit Tests → Build = 2 + 4 + 3 = 9 min (treating checkout as negligible and assuming the Security Scan → Integration Tests branch finishes sooner). Lint completes in the background without adding to total time. Optimise the critical path first; everything else is free parallelism.
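
The critical path can be computed mechanically as the longest path through the DAG. Below is a minimal sketch. The durations for lint, type-check, unit tests, and build come from the example above; the values for checkout, security scan, and integration tests are assumptions added only so the graph is complete.

```python
from functools import lru_cache

# Minutes per job. security_scan and integration_tests are assumed values.
DURATION = {"checkout": 0, "lint": 1, "type_check": 2, "security_scan": 1,
            "unit_tests": 4, "integration_tests": 3, "build": 3}

# job -> jobs it waits for (the DAG edges)
NEEDS = {"checkout": [], "lint": ["checkout"], "type_check": ["checkout"],
         "security_scan": ["checkout"],
         "unit_tests": ["lint", "type_check"],
         "integration_tests": ["security_scan"],
         "build": ["unit_tests", "integration_tests"]}


def critical_path(target, duration, needs):
    """Return (total minutes, jobs) for the longest dependency chain ending at target."""
    @lru_cache(maxsize=None)
    def longest(job):
        # Slowest prerequisite chain; root jobs have none.
        best = max((longest(dep) for dep in needs[job]), default=(0, ()))
        return best[0] + duration[job], best[1] + (job,)

    total, path = longest(target)
    return total, list(path)


print(critical_path("build", DURATION, NEEDS))
# (9, ['checkout', 'type_check', 'unit_tests', 'build'])
```

Changing a duration and re-running shows immediately whether a job sits on the critical path or not.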

Parallelization Strategies

Beyond the DAG structure (which parallelises different jobs), you can also parallelise a single job definition by running several instances of it at once. The two primary techniques are test splitting and matrix builds.

Test Splitting

When a test suite takes 15 minutes on a single agent, split it across multiple agents to achieve near-linear speedup:

# CircleCI: Time-based test splitting across 4 containers
# (timing data comes from test results that earlier runs uploaded via store_test_results)
test:
  parallelism: 4
  steps:
    - checkout
    - run:
        name: Run tests (split by timing data)
        command: |
          TESTS=$(circleci tests glob "src/**/*.test.ts" | \
                  circleci tests split --split-by=timings)
          npx jest $TESTS

Time-based splitting uses historical execution data to distribute tests evenly. Instead of giving every container the same number of files, the splitter balances total expected runtime: a 60s test file is grouped with fewer companions than a 10s one, so each container finishes at roughly the same time.
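
To make the balancing idea concrete, here is a sketch of a greedy "longest test first" assignment. It illustrates the general technique only and is not CircleCI's actual algorithm; the file names and timings are hypothetical.

```python
import heapq


def split_by_timings(timings, containers):
    """Assign each test file to the currently least-loaded container, longest first.

    timings: {file name: seconds from previous runs}
    Returns (buckets, loads), where loads[i] is the total seconds in buckets[i].
    """
    heap = [(0.0, i) for i in range(containers)]   # (load in seconds, container index)
    buckets = [[] for _ in range(containers)]
    for name, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)            # least-loaded container so far
        buckets[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    loads = [sum(timings[n] for n in bucket) for bucket in buckets]
    return buckets, loads


timings = {"a.test.ts": 60, "b.test.ts": 10, "c.test.ts": 30,
           "d.test.ts": 20, "e.test.ts": 25, "f.test.ts": 15}
buckets, loads = split_by_timings(timings, containers=2)
print(buckets)  # [['a.test.ts', 'f.test.ts', 'b.test.ts'], ['c.test.ts', 'e.test.ts', 'd.test.ts']]
print(loads)    # [85, 75]
```

Splitting the same six files by count alone (three per container in name order) would give loads of 100s and 60s, so the slower container would hold the job open for longer.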

Matrix Builds

Matrix builds test your code across multiple combinations of parameters — OS versions, language versions, dependency versions — in parallel:

# GitHub Actions: Matrix strategy
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node-version: [18, 20, 22]
      fail-fast: false    # Don't cancel other jobs if one fails
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test

This configuration spawns 9 parallel jobs (3 OS × 3 Node versions). Each combination runs independently, so wall-clock time is set by the slowest combination rather than the sum of all nine, provided enough runners are available to start them together.
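
Conceptually, a matrix is expanded as a Cartesian product of its axes. The sketch below shows that expansion for the configuration above; it is an illustration of the idea and does not model platform features such as include/exclude rules.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per job: every combination of the matrix axes."""
    axes = list(matrix)
    return [dict(zip(axes, combo)) for combo in product(*(matrix[a] for a in axes))]


jobs = expand_matrix({
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
    "node-version": [18, 20, 22],
})
print(len(jobs))   # 9
print(jobs[0])     # {'os': 'ubuntu-latest', 'node-version': 18}
```

Because the job count is the product of the axis lengths, adding a third axis with two values doubles it to 18, which is worth checking against runner concurrency limits and cost.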

Fan-Out/Fan-In Pattern

The fan-out/fan-in pattern splits work across parallel jobs (fan-out), then aggregates results in a single downstream job (fan-in):

Fan-Out/Fan-In: Parallel Test Execution
flowchart LR
    A[Build] --> B[Test Shard 1]
    A --> C[Test Shard 2]
    A --> D[Test Shard 3]
    A --> E[Test Shard 4]
    B --> F[Aggregate Results]
    C --> F
    D --> F
    E --> F
    F --> G[Report Coverage]
    G --> H[Deploy]
                            
# Fan-out: 4 parallel test shards
jobs:
  test-shard:
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: |
          npx jest --shard=${{ matrix.shard }}/4 --coverage \
            --coverageDirectory=coverage-${{ matrix.shard }}
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.shard }}
          path: coverage-${{ matrix.shard }}/

  # Fan-in: merge coverage from all shards
  merge-coverage:
    needs: [test-shard]
    runs-on: ubuntu-latest
    steps:
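      - uses: actions/download-artifact@v4
        with:
          pattern: coverage-*
          # Leave merge-multiple at its default (false): each shard is then extracted
          # into its own coverage-<n>/ directory instead of overwriting the others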
      - run: npx istanbul-merge --out combined.json coverage-*/coverage-final.json
      - run: npx istanbul report --include combined.json text-summary

Caching & Artifact Sharing

Caching and artifacts serve different purposes but both eliminate redundant work:

| Mechanism | Purpose | Lifetime | Example |
|---|---|---|---|
| Cache | Reuse data between pipeline runs | Days to weeks (invalidated by key change) | node_modules/, Docker layers |
| Artifact | Pass data between jobs in one run | Duration of the pipeline run | Build output, test reports, coverage |

Advanced Caching Patterns

# Multi-layer cache strategy
- name: Cache node_modules
  uses: actions/cache@v4
  with:
    path: node_modules
    key: deps-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      deps-${{ runner.os }}-

- name: Cache Next.js build
  uses: actions/cache@v4
  with:
    path: .next/cache
    key: next-${{ runner.os }}-${{ hashFiles('**/*.ts', '**/*.tsx') }}
    restore-keys: |
      next-${{ runner.os }}-

- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
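
The interaction between key and restore-keys is easier to follow as code. The function below is a simplified model written for this article, not the cache action's implementation: it tries an exact key match first, then falls back to the most recently created entry that matches each restore-key prefix in order. Branch scoping and other platform rules are deliberately left out, and the key values are hypothetical.

```python
def resolve_cache(key, restore_keys, stored):
    """Pick a cache entry to restore.

    stored: list of (cache_key, created_at) pairs
    Returns (matched_key, exact_hit), or (None, False) on a complete miss.
    """
    if any(k == key for k, _ in stored):
        return key, True                      # exact hit: dependencies unchanged
    for prefix in restore_keys:               # fallbacks are tried in listed order
        candidates = [(created, k) for k, created in stored if k.startswith(prefix)]
        if candidates:
            return max(candidates)[1], False  # newest partial match
    return None, False


stored = [("deps-Linux-aaa", 1), ("deps-Linux-bbb", 2)]
print(resolve_cache("deps-Linux-ccc", ["deps-Linux-"], stored))  # ('deps-Linux-bbb', False)
print(resolve_cache("deps-Linux-aaa", ["deps-Linux-"], stored))  # ('deps-Linux-aaa', True)
```

A partial match restores a slightly stale cache, so the install step still runs but only has to fetch what changed.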

Docker Layer Caching in CI

# BuildKit cache with GitHub Actions cache backend
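- name: Set up Buildx   # the gha cache backend generally needs a Buildx builder
  uses: docker/setup-buildx-action@v3

- name: Build and push Docker image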
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/org/app:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

The type=gha cache backend stores Docker build layers in GitHub's cache infrastructure. On subsequent builds, only layers whose inputs changed are rebuilt — often reducing a 5-minute Docker build to 30 seconds.

Pipeline Templates & Reusability

When an organisation has dozens or hundreds of repositories, copy-pasting pipeline configuration creates maintenance nightmares. Pipeline templates provide shared, versioned, reusable pipeline logic.

GitHub Actions: Reusable Workflows

# .github/workflows/reusable-node-ci.yml (in a shared repo)
name: Reusable Node.js CI

on:
  workflow_call:
    inputs:
      node-version:
        description: 'Node.js version'
        required: false
        default: '20'
        type: string
      run-e2e:
        description: 'Run E2E tests'
        required: false
        default: false
        type: boolean

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - if: inputs.run-e2e
        run: npm run test:e2e

# Consuming the reusable workflow (in any repo)
name: CI
on: [push, pull_request]

jobs:
  ci:
    uses: org/shared-workflows/.github/workflows/reusable-node-ci.yml@v2
    with:
      node-version: '22'
      run-e2e: true

GitLab CI: Components (CI/CD Catalog)

# Using a shared component from the CI/CD Catalog
include:
  - component: gitlab.com/org/ci-components/node-test@1.2.0
    inputs:
      node_version: "20"
      coverage_threshold: 80

Jenkins: Shared Libraries

// vars/standardPipeline.groovy (in shared library repo)
def call(Map config = [:]) {
    pipeline {
        agent { docker { image "node:${config.nodeVersion ?: '20'}-alpine" } }
        stages {
            stage('Install') { steps { sh 'npm ci' } }
            stage('Lint')    { steps { sh 'npm run lint' } }
            stage('Test')    { steps { sh 'npm test' } }
            stage('Build')   { steps { sh 'npm run build' } }
        }
    }
}

// Jenkinsfile (in application repo)
@Library('shared-pipelines') _
standardPipeline(nodeVersion: '22')

Case Study

Spotify's Golden Paths

Spotify's platform engineering team created "Golden Paths" — opinionated, pre-configured pipeline templates for common service types (Node.js microservice, Python ML model, Java backend). Teams that adopt a Golden Path get a fully configured CI/CD pipeline, deployment to Kubernetes, observability, and security scanning out of the box. Compliance with organisational standards is automatic because the template encodes those standards. Teams can "eject" and customise if needed, but 85% never do. This reduced new service onboarding from 2 weeks to 15 minutes.

Tags: Platform Engineering, Golden Path, Templates

Environment Promotion

In a mature delivery pipeline, artifacts are promoted through a series of environments, gaining confidence at each stage. The same artifact (same Docker image, same binary) moves from dev → staging → production — never rebuilt.

Environment Promotion Pipeline
flowchart LR
    A[Build & Test] --> B[Dev Environment]
    B --> C{Automated\nSmoke Tests}
    C -->|Pass| D[Staging Environment]
    D --> E{Integration\nTests + QA}
    E -->|Pass| F[Approval Gate]
    F -->|Approved| G[Production\nCanary 5%]
    G --> H{Metrics\nHealthy?}
    H -->|Yes| I[Production 100%]
    H -->|No| J[Rollback]
                            

Key Principles

  • Build once, deploy many — the artifact is immutable; only configuration changes per environment
  • Promotion, not rebuilding — staging uses the exact image that will run in production
  • Approval gates — human or automated checks between critical environments
  • Environment parity — staging mirrors production as closely as possible

# GitHub Actions: Environment with approval gate
# (required reviewers are configured on the environment in repository settings, not in YAML)
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - run: kubectl set image deployment/app app=ghcr.io/org/app:${{ github.sha }}

  deploy-production:
    needs: [deploy-staging]
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://app.example.com
    steps:
      - run: kubectl set image deployment/app app=ghcr.io/org/app:${{ github.sha }}
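
The "build once, deploy many" principle can be sketched as a loop that carries one immutable artifact identifier through every environment. This is an illustrative model only: the gate functions and the digest value are hypothetical stand-ins for smoke tests, QA, and approvals.

```python
def promote(digest, gates, environments=("dev", "staging", "production")):
    """Deploy one immutable artifact digest to each environment in order.

    gates: {environment: function(digest) -> bool}, checked after each deployment.
    Stops after the first environment whose gate fails.
    Returns the list of (environment, digest) deployments that happened.
    """
    deployments = []
    for env in environments:
        deployments.append((env, digest))   # same artifact everywhere, never rebuilt
        if not gates[env](digest):
            break                           # do not promote past a failed gate
    return deployments


passing = lambda digest: True
failing = lambda digest: False

print(promote("sha256:abc123", {"dev": passing, "staging": failing, "production": passing}))
# [('dev', 'sha256:abc123'), ('staging', 'sha256:abc123')]
```

The digest is the only thing that moves between environments; anything environment-specific belongs in configuration supplied at deploy time.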

Monorepo CI

Monorepos — single repositories containing multiple services or packages — present unique CI challenges. The naive approach (run all tests for every change) does not scale. Smart monorepo CI uses change detection to build only what is affected.

Path-Based Filtering

# GitHub Actions: only run backend CI when backend files change
name: Backend CI
on:
  push:
    paths:
      - 'services/backend/**'
      - 'packages/shared-utils/**'    # shared dependency
      - 'package-lock.json'           # dependency changes
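      # paths and paths-ignore cannot be combined for the same event;
      # exclude files with a negated pattern placed after the positive ones
      - '!services/backend/README.md'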

Affected Package Detection

# Using Nx to detect affected projects
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      affected: ${{ steps.affected.outputs.projects }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0    # Full history for diff
      - run: npm ci
      - id: affected
        run: |
          AFFECTED=$(npx nx show projects --affected --base=origin/main --json)
          echo "projects=$AFFECTED" >> $GITHUB_OUTPUT

  build:
    needs: [detect-changes]
    if: needs.detect-changes.outputs.affected != '[]'
    strategy:
      matrix:
        project: ${{ fromJson(needs.detect-changes.outputs.affected) }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx nx build ${{ matrix.project }}
      - run: npx nx test ${{ matrix.project }}
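
Under the hood, affected detection combines a file-to-project mapping with a walk up the dependency graph. The sketch below shows that idea in isolation; it is not how Nx is implemented, and the project names, paths, and graph are hypothetical.

```python
from collections import deque

# Hypothetical project graph: project -> projects it depends on
DEPENDS_ON = {
    "backend": ["shared-utils"],
    "frontend": ["shared-utils", "ui-kit"],
    "ui-kit": [],
    "shared-utils": [],
}

# Hypothetical mapping from path prefix to owning project
ROOTS = {
    "services/backend/": "backend",
    "services/frontend/": "frontend",
    "packages/ui-kit/": "ui-kit",
    "packages/shared-utils/": "shared-utils",
}


def affected_projects(changed_files):
    """Return projects that changed directly plus everything that depends on them."""
    changed = {proj for path in changed_files
               for root, proj in ROOTS.items() if path.startswith(root)}

    # Invert the graph: project -> projects that depend on it
    dependents = {proj: [] for proj in DEPENDS_ON}
    for proj, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].append(proj)

    seen, queue = set(changed), deque(changed)
    while queue:                               # breadth-first walk over dependents
        for parent in dependents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(affected_projects(["packages/ui-kit/src/button.tsx"]))  # ['frontend', 'ui-kit']
```

A change to a leaf library fans out to every consumer, while a change confined to one service rebuilds only that service.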

Monorepo Build Tools Comparison

| Tool | Language | Affected Detection | Remote Cache | Best For |
|---|---|---|---|---|
| Nx | TypeScript/JS | Dependency graph + Git diff | Nx Cloud (free tier) | JS/TS monorepos |
| Turborepo | TypeScript/JS | Package dependency + file hashing | Vercel Remote Cache | Simpler JS monorepos |
| Bazel | Any (polyglot) | Fine-grained target graph | Remote execution + cache | Large polyglot monorepos |
| Pants | Python/Go/Java | Dependency inference | Remote cache | Python-heavy monorepos |

Remote Cache: Monorepo tools like Nx and Bazel support remote caching — when Developer A builds a package, the result is cached remotely. When Developer B (or CI) builds the same package with the same inputs, the cached result is downloaded instead of rebuilt. This can reduce CI times by 80%+ in large monorepos.

Pipeline Optimization — Sub-10-Minute Target

The goal: every developer gets feedback within 10 minutes of pushing. Here is a systematic approach to achieving this, applied to a real pipeline that started at 28 minutes.

Before: Sequential Pipeline (28 minutes)

# BEFORE: Everything sequential, no caching
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4           # 15s
      - run: npm install                     # 4 min (no lock file, no cache)
      - run: npm run lint                    # 2 min
      - run: npm run type-check             # 1.5 min
      - run: npm test                        # 8 min
      - run: npm run test:integration       # 6 min
      - run: npm run test:e2e              # 4 min
      - run: docker build -t app .          # 3 min
      # Total: ~28 minutes

After: Optimised DAG Pipeline (7 minutes)

# AFTER: Parallel DAG, aggressive caching, test splitting
jobs:
  lint:                                      # ~1.5 min (deps restored from cache)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run lint && npm run type-check   # Combined: 1.5 min

  unit-tests:                                # ~2.5 min wall-clock (4 parallel shards, ~2 min of tests each plus setup)
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npx jest --shard=${{ matrix.shard }}/4

  integration-tests:                         # 3 min (cached, parallel)
    runs-on: ubuntu-latest
    services:
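      postgres:
        image: postgres:16
        env: { POSTGRES_PASSWORD: postgres }   # the image will not start without a password
        ports: ['5432:5432']                   # make the service reachable from the job
      redis: { image: 'redis:7', ports: ['6379:6379'] }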
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run test:integration

  build:                                     # 45s (Docker layer cache)
    needs: [lint, unit-tests, integration-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3   # builder that supports the gha cache backend
      - uses: docker/build-push-action@v5
        with:
          cache-from: type=gha
          cache-to: type=gha,mode=max
          tags: app:${{ github.sha }}
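  # Critical path: max(lint, unit-tests, integration-tests) + build
  # = max(1.5min, 2.5min, 3min) + 0.75min = ~4 min
  # With runner start-up and queueing overhead: ~7 min total
  # Note: the E2E suite from the sequential version is not shown in this example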

Optimization Techniques Summary

| Technique | Typical Saving | Effort | Applies To |
|---|---|---|---|
| Dependency caching | 2–5 min | Low (5 lines of YAML) | Every pipeline |
| DAG parallelisation | 30–60% of total time | Medium (restructure jobs) | Pipelines with independent stages |
| Test splitting/sharding | Near-linear speedup | Medium (test runner support needed) | Test suites >5 min |
| Docker layer caching | 2–4 min per build | Low (BuildKit + cache flags) | Container builds |
| Incremental builds | 50–90% of build time | High (build tool support needed) | Compiled languages, monorepos |
| Test selection | 60–80% of test time | High (coverage mapping needed) | Large test suites with good coverage data |
| Larger runners | 20–40% (more CPU/RAM) | Low (change runner label) | CPU-bound builds, parallel tests |

Case Study

Stripe's Pipeline Optimization Journey

Stripe's Ruby monorepo contained more than a decade of code and a test suite that took over 2 hours to run sequentially. Their CI team achieved a 12-minute P50 feedback time through aggressive parallelisation (200+ test shards), intelligent test selection (only running tests affected by changed files using coverage data), and remote build caching. The key insight: they measured developer wait time (how long a developer is blocked on CI), not just total CI compute time. A 2-hour test suite split across 200 runners completes in under 10 minutes; the compute cost is high, but developer time is more valuable.

Tags: Test Splitting, Monorepo, At Scale
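
The trade-off between wait time and compute time can be estimated with an idealised model in which tests divide evenly and every shard pays the same fixed setup cost. The 5-minute setup figure below is an assumption chosen for illustration, not a number from the case study.

```python
def shard_estimate(suite_minutes, shards, setup_minutes):
    """Return (wall-clock minutes, total compute minutes) for an evenly split suite."""
    wall_clock = setup_minutes + suite_minutes / shards   # what the developer waits for
    compute = shards * wall_clock                         # what the CI bill is based on
    return wall_clock, compute


for shards in (1, 20, 200):
    wall, compute = shard_estimate(suite_minutes=120, shards=shards, setup_minutes=5)
    print(shards, round(wall, 1), round(compute))
```

In this model wall-clock time falls steeply at first and then flattens toward the setup cost, while compute keeps growing with the shard count, which is why per-shard setup time becomes the next thing to optimise at high shard counts.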

Pipeline Security

Pipelines are high-value targets — they have access to production credentials, signing keys, and deployment permissions. A compromised pipeline can deploy malicious code or exfiltrate secrets.

Key Security Principles

  • OIDC tokens over static secrets — use short-lived, automatically-rotated tokens for cloud provider authentication
  • Least-privilege runners — CI agents should only have permissions they need (no admin access)
  • Pinned actions/images — reference actions by SHA, not mutable tags, to prevent supply chain attacks
  • Secret scanning — detect accidentally committed credentials before they reach production
  • Signed pipelines — verify pipeline configuration has not been tampered with

# OIDC authentication to AWS (no static credentials)
jobs:
  deploy:
    permissions:
      id-token: write       # Required for OIDC
      contents: read
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy   # placeholder 12-digit account ID
          aws-region: us-east-1
          # No access key or secret key — uses GitHub OIDC token
      - run: aws ecs update-service --cluster prod --service app --force-new-deployment

# Pin actions to commit SHA (not mutable tag)
steps:
  # BAD: tag can be moved to a compromised version
  - uses: actions/checkout@v4

  # GOOD: pinned to exact commit (immutable)
  - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1

Supply Chain Risk: In 2021, Codecov's Bash Uploader script was compromised: attackers modified it to exfiltrate environment variables (including CI secrets) from pipelines that ran it. In March 2025, the tj-actions/changed-files GitHub Action was compromised in a similar way, with its version tags repointed to a malicious commit. Pinning actions to commit SHAs prevents the tag-repointing class of attack because a SHA identifies one exact, immutable commit, whereas a tag can be moved. This is covered in depth in Part 24 (Security).
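
Pinning is easy to check automatically. The following is a minimal sketch of such a check, assuming workflow files are scanned as plain text; the regular expressions are a simplification written for this article and do not parse YAML.

```python
import re

# Capture the reference that follows "uses:" on a line (optionally after "- ")
USES = re.compile(r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*([^\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return `uses:` references that are not pinned to a full 40-character commit SHA."""
    refs = USES.findall(workflow_text)
    # Local actions (./path) live in the same repository and have no version to pin
    return [ref for ref in refs if not ref.startswith("./") and not FULL_SHA.search(ref)]


workflow = """
steps:
  - uses: actions/checkout@v4
  - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
  - uses: ./.github/actions/local-step
"""
print(unpinned_actions(workflow))  # ['actions/checkout@v4']
```

Run over every file in .github/workflows, a check like this can fail the build when a mutable tag slips in.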

Pipeline Observability

Just as production systems need observability, CI/CD pipelines need monitoring, metrics, and alerting. Without observability, pipeline degradation goes unnoticed until developers complain.

Key Pipeline Metrics

  • P50/P95 pipeline duration — median and tail latency of the feedback loop
  • Queue depth — number of jobs waiting for a runner
  • Failure rate by job — which jobs fail most often (flakiness indicators)
  • Re-run rate — how often developers retry failed pipelines (high = flakiness)
  • Cost per pipeline run — cloud CI billing broken down by workflow
  • Cache hit rate — percentage of cache lookups that find usable cached data
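
As a rough illustration of the first metric, P50 and P95 can be computed from raw run durations with a nearest-rank percentile. The durations below are hypothetical, and monitoring tools may use a different interpolation method.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Pipeline durations in seconds for ten recent runs (hypothetical data)
durations = [312, 298, 305, 330, 900, 315, 301, 322, 310, 640]

print(percentile(durations, 50))  # 312
print(percentile(durations, 95))  # 900
```

The median here looks healthy at about five minutes, while the P95 exposes the occasional 15-minute run, which is why tracking the tail matters as much as tracking the typical case.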

Observability Tools

| Tool | Integration | Key Features |
|---|---|---|
| Datadog CI Visibility | GitHub Actions, GitLab, Jenkins | Pipeline traces, test performance, flaky test detection |
| Grafana + Prometheus | Any (via exporters) | Custom dashboards, alerting rules, long-term trends |
| BuildPulse | GitHub Actions, CircleCI | Flaky test detection and quarantine |
| Swarmia / LinearB | GitHub, GitLab | DORA metrics, cycle time, CI wait time |

# Alert when P95 pipeline duration exceeds threshold
# (Prometheus alerting rule)
groups:
  - name: ci-alerts
    rules:
      - alert: CIPipelineSlow
        expr: |
          histogram_quantile(0.95,
            rate(ci_pipeline_duration_seconds_bucket{workflow="ci"}[1h])
          ) > 900
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "CI pipeline P95 exceeds 15 minutes"
          description: "The CI pipeline is degrading. Current P95: {{ $value }}s"

Exercises

Exercise 1

Design a Pipeline DAG

You are building CI for a full-stack application with: a React frontend, a Node.js API, a PostgreSQL database, and end-to-end tests that need all three running. Design a DAG that maximises parallelism. Draw the graph (Mermaid or whiteboard), label each node with estimated duration, and calculate the critical path. How much faster is your DAG compared to a sequential pipeline?

Tags: Architecture, DAG Design

Exercise 2

Optimize a Slow Pipeline

Given this sequential pipeline: Install (3 min) → Lint (2 min) → Unit Tests (10 min) → Integration Tests (8 min) → E2E Tests (5 min) → Build (2 min) → Push (1 min). Total: 31 minutes. Apply the optimization techniques from this article (caching, parallelisation, test splitting, DAG structure) to bring it under 10 minutes. Show your work: which techniques apply, estimated time savings, and the resulting DAG.

Tags: Optimization, Practical

Exercise 3

Write a Matrix Build

Create a GitHub Actions workflow that tests a Python library across: Python 3.10, 3.11, and 3.12; Ubuntu and Windows; with and without optional dependencies (numpy). This produces 12 combinations (3 × 2 × 2). Include: (a) the matrix configuration, (b) conditional installation of optional deps, (c) a fail-fast: false strategy, and (d) a summary job that runs after all matrix jobs complete.

Tags: Matrix, GitHub Actions

Exercise 4

Create a Reusable Pipeline Template

Design a reusable GitHub Actions workflow for your organisation's standard Node.js microservice. It should accept inputs for: Node version, whether to run E2E tests, whether to build a Docker image, and the target environment. The workflow should implement: caching, linting, testing, optional Docker build, and artifact upload. Write both the reusable workflow file and an example caller workflow.

Tags: Templates, Reusability

Conclusion & Next Steps

Pipeline architecture is where CI/CD theory meets engineering practice. The concepts in this article — DAG execution, parallelisation, caching, templates, monorepo detection — are what separate a 30-minute sequential pipeline from a 7-minute optimised one.

Key takeaways:

  • Pipeline-as-code means your CI configuration is versioned, reviewed, and portable — just like application code
  • DAG execution is the foundation of parallelism — independent jobs should always run simultaneously
  • The critical path determines wall-clock time — optimise the longest chain first, everything else is free
  • Matrix builds provide comprehensive coverage without proportional time increase
  • Caching is mandatory — dependencies, build artifacts, and Docker layers should all be cached
  • Templates prevent drift — shared workflows ensure consistency across dozens of repositories
  • Environment promotion ensures the same artifact that passed tests reaches production
  • Monorepo CI requires intelligent change detection — never build everything on every change
  • Security is non-negotiable — OIDC, pinned actions, and least-privilege runners protect the pipeline
  • Observe your pipelines — pipeline metrics are as important as production metrics

Next in the Series

In Part 16: Deployment Strategies, we move from building artifacts to putting them into production safely — blue-green deployments, canary releases, rolling updates, feature flags, and the progressive delivery patterns that make zero-downtime deployments routine.