
Part 8: Implementation, Buy vs Build & Deployment Planning

May 13, 2026 Wasil Zafar 34 min read

The moment architecture diagrams become production code. This article covers the critical decisions that determine whether your implementation will be a clean execution or a years-long technical debt trap — build vs buy, coding standards, technical debt management, and deployment planning.

Table of Contents

  1. Introduction
  2. Build vs Buy vs Open Source
  3. Implementation Strategies
  4. Coding Standards & Conventions
  5. Technical Debt
  6. Deployment Planning
  7. Migration & Cutover Strategies
  8. Exercises
  9. Conclusion & Next Steps

Introduction — From Design to Code

Architecture diagrams are hypotheses. Implementation is where those hypotheses meet reality. The transition from design to code is one of the most critical phases in software delivery: decisions made here compound over months and years, producing either a clean, maintainable codebase or an immovable "big ball of mud".

This article addresses the strategic decisions you face before and during implementation: Should you build this component yourself? Should you buy a commercial off-the-shelf (COTS) product? Should you adopt an open-source library? How will you deploy what you build? And how will you undo mistakes?

Why Implementation Decisions Persist

Architecture decisions can be revisited. Requirements can change. But implementation decisions have a unique property: they create inertia. Once you've built on a framework, integrated a vendor product, or established a deployment pattern, changing course gets more expensive with every component, integration, and team habit that depends on the original choice.

Key Insight: Every implementation decision is a bet on the future. Build vs buy is not a one-time choice — it's a spectrum of trade-offs that must be revisited as your team, scale, and market change. The best engineering teams make these decisions explicitly, documenting the reasoning so future teams can evaluate whether the original assumptions still hold.

Consider this: a team that chooses to use Auth0 for authentication in 2024 might find the decision perfectly sound. But if their user base grows from 10,000 to 10 million, the cost calculus shifts dramatically. The implementation decision that saved 3 months of development time might now cost $500,000/year in licensing fees. This is why we think about these decisions structurally, not just in the moment.
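The arithmetic behind that shift is easy to sketch. The snippet below compares a flat per-user vendor price with an in-house build amortised over three years. Every figure is a hypothetical assumption chosen for illustration; it is not Auth0's or any other vendor's actual pricing, and real price lists are usually tiered.

```javascript
// Rough vendor-vs-build comparison. All numbers are hypothetical
// assumptions for illustration, not real vendor pricing.
const VENDOR_PRICE_PER_USER_MONTH = 0.004; // USD per monthly active user (assumed flat rate)
const BUILD_UPFRONT = 150_000;             // USD, one-off build cost (assumed)
const BUILD_ANNUAL_RUN = 100_000;          // USD/year maintenance and hosting (assumed)

// Under a flat per-user price, the vendor bill grows linearly with users.
function annualVendorCost(users) {
  return users * VENDOR_PRICE_PER_USER_MONTH * 12;
}

// Annual cost of building, spreading the upfront cost over `years`.
function annualBuildCost(years) {
  return BUILD_UPFRONT / years + BUILD_ANNUAL_RUN;
}

// User count above which building is cheaper per year.
function breakEvenUsers(years) {
  return annualBuildCost(years) / (VENDOR_PRICE_PER_USER_MONTH * 12);
}

for (const users of [10_000, 1_000_000, 10_000_000]) {
  console.log(`${users} users: vendor ~$${Math.round(annualVendorCost(users))}/year`);
}
console.log(`Break-even: ~${Math.round(breakEvenUsers(3))} users (3-year amortisation)`);
```

Under these made-up numbers the vendor costs about $480/year at 10,000 users and about $480,000/year at 10 million, while building only becomes cheaper above roughly 3.1 million users. Tiered pricing, negotiated discounts, and the opportunity cost of the engineers would all move that line.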

Build vs Buy vs Open Source

The build-vs-buy decision is one of the oldest and most consequential in software engineering. Get it right, and you focus your team's effort on your core differentiator. Get it wrong, and you either waste years rebuilding commodity infrastructure or lock yourself into a vendor that controls your roadmap.

The Three Options

| Option | Description | Best When | Risk |
|---|---|---|---|
| Build Custom | Your team writes and maintains the component | Core differentiator, unique requirements, competitive advantage | High cost, long timeline, ongoing maintenance burden |
| Buy (COTS) | Purchase commercial off-the-shelf software | Commodity capability, need for support/SLA, compliance requirements | Vendor lock-in, licensing costs, limited customisation |
| Open Source | Adopt community-maintained software | Standard problem, active community, need for flexibility | No SLA, maintenance responsibility, security patching on you |

Total Cost of Ownership (TCO)

The most common mistake in build-vs-buy is comparing build cost against license cost. This ignores the majority of the expense. True TCO includes:

  • Initial development/purchase cost — The obvious one
  • Integration cost — Connecting the solution to your existing systems
  • Training cost — Getting your team productive with the tool
  • Ongoing maintenance — Bug fixes, security patches, version upgrades
  • Opportunity cost — What your team can't build while maintaining this
  • Migration/exit cost — What it costs to switch away in 3-5 years
  • Scaling cost — How costs change at 10x or 100x volume

Warning: Teams systematically underestimate the maintenance cost of custom-built software. One commonly cited rule of thumb (estimates vary widely): for every unit of effort spent building, plan 4-10 units of effort to maintain it over the software's lifetime. If you build it, you own it, forever.
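As a rough illustration, the cost categories above can be summed per option over a fixed horizon. The sketch below uses made-up figures for a three-year comparison and leaves out opportunity cost and scaling cost, which are hard to reduce to a single number and are better discussed alongside the total.

```javascript
// Minimal multi-year TCO comparison. Categories follow the list above;
// every figure is a hypothetical placeholder in USD.
const options = {
  build: { upfront: 120_000, integration: 20_000, training: 5_000,
           annualMaintenance: 60_000, annualLicence: 0, exit: 10_000 },
  buy:   { upfront: 0, integration: 30_000, training: 10_000,
           annualMaintenance: 10_000, annualLicence: 48_000, exit: 40_000 },
  oss:   { upfront: 0, integration: 40_000, training: 15_000,
           annualMaintenance: 35_000, annualLicence: 0, exit: 20_000 },
};

// One-off costs are paid once; recurring costs are paid every year.
// Exit cost is included so the total reflects switching away later.
function totalCostOfOwnership(o, years) {
  const oneOff = o.upfront + o.integration + o.training + o.exit;
  const recurring = (o.annualMaintenance + o.annualLicence) * years;
  return oneOff + recurring;
}

for (const [name, o] of Object.entries(options)) {
  console.log(`${name}: $${totalCostOfOwnership(o, 3).toLocaleString('en-US')} over 3 years`);
}
```

Keeping the categories as explicit fields makes the assumptions reviewable: when someone disputes the outcome, the argument moves to a specific number rather than to the conclusion.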

Vendor Lock-in

Vendor lock-in occurs when switching away from a product becomes prohibitively expensive. It manifests in several forms:

  • Data lock-in — Your data is stored in proprietary formats
  • API lock-in — Your code calls vendor-specific APIs with no standard equivalent
  • Workflow lock-in — Your processes are built around vendor-specific features
  • Contract lock-in — Multi-year agreements with penalty clauses
  • Knowledge lock-in — Your team only knows the vendor's way of doing things

Mitigation strategies include abstraction layers (wrapping vendor APIs behind your own interfaces), data export provisions in contracts, and multi-vendor architectures where feasible.
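Here is a minimal sketch of the first mitigation, an abstraction layer. It assumes two imaginary email vendors with incompatible client APIs (`deliver` and `post` are hypothetical methods, not any real SDK) and hides both behind one `EmailSender` interface owned by your team. Calls are synchronous for brevity; real SDK calls would normally return promises.

```javascript
// Application code depends on this narrow interface, not on a vendor SDK.
class EmailSender {
  send(_message) {
    throw new Error('EmailSender.send must be implemented by an adapter');
  }
}

// Adapter for an imaginary "Vendor A" whose client expects {to, subject, html}.
class VendorAEmailSender extends EmailSender {
  constructor(client) {
    super();
    this.client = client;
  }
  send({ to, subject, body }) {
    const res = this.client.deliver({ to, subject, html: body });
    return { id: res.messageId }; // normalise the vendor's response shape
  }
}

// Adapter for an imaginary "Vendor B" with a different request shape.
class VendorBEmailSender extends EmailSender {
  constructor(client) {
    super();
    this.client = client;
  }
  send({ to, subject, body }) {
    const res = this.client.post('/messages', { recipients: [to], title: subject, content: body });
    return { id: res.id };
  }
}

// Business logic only sees EmailSender, so switching vendors means changing
// which adapter is constructed, not rewriting every call site.
function sendWelcomeEmail(sender, user) {
  return sender.send({ to: user.email, subject: 'Welcome!', body: `Hi ${user.name}` });
}

// Fake clients standing in for real vendor SDKs.
const fakeVendorA = { deliver: (req) => ({ messageId: `a-${req.to}` }) };
const fakeVendorB = { post: (_path, req) => ({ id: `b-${req.recipients[0]}` }) };
const user = { email: 'ada@example.com', name: 'Ada' };
console.log(sendWelcomeEmail(new VendorAEmailSender(fakeVendorA), user));
console.log(sendWelcomeEmail(new VendorBEmailSender(fakeVendorB), user));
```

The adapters also give you a natural place to keep a data-export path for each vendor, which supports the contract-level mitigation as well.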

Decision Flowchart

Build vs Buy Decision Framework

```mermaid
flowchart TD
    A[New Capability Needed] --> B{Is this a core differentiator?}
    B -->|Yes| C{Do you have the team expertise?}
    B -->|No| D{Does a mature COTS product exist?}
    C -->|Yes| E[BUILD CUSTOM]
    C -->|No| F{Can you hire/train in time?}
    F -->|Yes| E
    F -->|No| G[Consider OSS + Custom Extensions]
    D -->|Yes| H{Is TCO acceptable at your scale?}
    D -->|No| I{Does a mature OSS project exist?}
    H -->|Yes| J[BUY COTS]
    H -->|No| I
    I -->|Yes| K{Is the community active and healthy?}
    I -->|No| E
    K -->|Yes| L[ADOPT OPEN SOURCE]
    K -->|No| E
```
Case Study

Spotify's Build-vs-Buy Journey

Spotify famously built its own internal developer portal ("Backstage") because no COTS product met its needs for developer experience at scale. They initially tried off-the-shelf wikis and portals but found them insufficient for 2,000+ microservices. After building Backstage internally, they open-sourced it in 2020. Today it's a CNCF project used by hundreds of companies. The lesson: sometimes "build" evolves into "open source": you build it, others maintain it with you.


Implementation Strategies

Once the build-vs-buy decision is made, the next question is how to implement. The two fundamental approaches — big bang and incremental — have radically different risk profiles.

Big Bang vs Incremental

| Approach | Description | Risk Profile | Best For |
|---|---|---|---|
| Big Bang | Build everything, deploy all at once | High risk: all-or-nothing. Hard to debug failures | Small systems, regulatory constraints requiring atomic cutover |
| Incremental | Build and deploy in slices, each delivering value | Low risk per slice. Easier rollback. Early feedback | Most projects, especially large systems and team handoffs |

The Strangler Fig Pattern

Named after the strangler fig, which grows around a host tree until the host dies, this pattern is among the most widely recommended approaches to replacing a legacy system. Instead of rewriting the entire system at once (a notoriously risky approach), you:

  1. Intercept — Place a facade/proxy in front of the legacy system
  2. Implement — Build new functionality in the new system
  3. Redirect — Route traffic for completed features to the new system
  4. Retire — Once all traffic is redirected, decommission the legacy system

Strangler Fig Pattern — Progressive Migration

```mermaid
flowchart LR
    subgraph Phase1[Phase 1: Intercept]
        U1[Users] --> P1[Proxy/Facade]
        P1 --> L1[Legacy System]
    end
    subgraph Phase2[Phase 2: Partial Migration]
        U2[Users] --> P2[Proxy/Facade]
        P2 -->|Feature A| N2[New System]
        P2 -->|Features B,C| L2[Legacy System]
    end
    subgraph Phase3[Phase 3: Complete]
        U3[Users] --> P3[Proxy/Facade]
        P3 --> N3[New System]
    end
```
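The heart of the facade is a routing decision. The sketch below shows only that decision, assuming migrated features can be identified by URL path prefix; the hostnames and prefixes are hypothetical, and a production facade would normally be a reverse proxy or API gateway configured with equivalent rules.

```javascript
// Strangler-fig routing decision only. Backend URLs and prefixes are hypothetical.
const LEGACY_URL = 'http://legacy.internal';
const NEW_URL = 'http://new-system.internal';

// Grows one entry at a time as features move to the new system (Phase 2).
// Once it covers every route, the legacy system can be retired (Phase 3).
const migratedPrefixes = ['/catalogue'];

// Match whole path segments so '/catalogue-old' is not treated as '/catalogue'.
function resolveBackend(path) {
  const migrated = migratedPrefixes.some(
    (prefix) => path === prefix || path.startsWith(`${prefix}/`)
  );
  return migrated ? NEW_URL : LEGACY_URL;
}

console.log(resolveBackend('/catalogue/42')); // http://new-system.internal
console.log(resolveBackend('/checkout'));     // http://legacy.internal
```

Because migration progress is just data, redirecting a feature (or rolling it back to the legacy system) is a one-line configuration change rather than a redeploy of either system.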

Feature Toggling During Implementation

Feature toggles (also called feature flags) allow you to deploy code to production without activating it. This decouples deployment from release and enables several powerful patterns:

  • Release toggles — Hide incomplete features behind a flag until ready
  • Experiment toggles — A/B test new features with a subset of users
  • Ops toggles — Circuit breakers to disable problematic features without redeploy
  • Permission toggles — Enable features for specific user segments (beta testers, enterprise tier)
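
```javascript
// Feature toggle example: simple boolean flags read from environment variables
const FEATURE_FLAGS = {
    NEW_CHECKOUT_FLOW: process.env.FF_NEW_CHECKOUT === 'true',
    DARK_MODE: process.env.FF_DARK_MODE === 'true',
    AI_RECOMMENDATIONS: process.env.FF_AI_RECS === 'true'
};

// Placeholder renderers so the example runs on its own
function renderNewCheckout(cart) { return `new checkout (${cart.items.length} items)`; }
function renderLegacyCheckout(cart) { return `legacy checkout (${cart.items.length} items)`; }

// Usage in application code: both paths are deployed; the flag picks one at runtime
function renderCheckout(cart) {
    if (FEATURE_FLAGS.NEW_CHECKOUT_FLOW) {
        return renderNewCheckout(cart);
    }
    return renderLegacyCheckout(cart);
}

console.log('Feature flags loaded:', FEATURE_FLAGS);
console.log(renderCheckout({ items: ['book', 'pen'] }));
```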
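Experiment and permission toggles need a per-user decision rather than a single global boolean. One common technique is deterministic bucketing: hash the flag name together with the user ID so each user lands in a stable bucket from 0 to 99, then enable the flag for buckets below the rollout percentage. The sketch uses Node's built-in `crypto` module; the flag object shape, segment names, and user IDs are assumptions for illustration.

```javascript
const { createHash } = require('node:crypto');

// Map (flag, user) to a stable bucket in [0, 100). Including the flag name
// means the users in one rollout's first 5% differ from another rollout's.
function bucketFor(flagName, userId) {
  const digest = createHash('sha256').update(`${flagName}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

// Permission-style allow-list first, then percentage rollout.
function isEnabled(flag, user) {
  if (flag.allowedSegments.includes(user.segment)) return true;
  return bucketFor(flag.name, user.id) < flag.rolloutPercent;
}

const newCheckout = { name: 'NEW_CHECKOUT_FLOW', rolloutPercent: 10, allowedSegments: ['beta'] };
console.log(isEnabled(newCheckout, { id: 'user-42', segment: 'standard' }));
console.log(isEnabled(newCheckout, { id: 'user-7', segment: 'beta' })); // true: beta segment
```

Because the bucket is derived from the IDs rather than stored, a user sees the same variant on every request and every server, and raising `rolloutPercent` only ever adds users.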
Key Insight: Feature toggles are temporary by design. Every toggle should have an expiration date and an owner. Toggles that linger become "toggle debt": invisible branches in your code. Because each independent boolean toggle doubles the number of possible flag combinations, the code paths you would need to test grow exponentially. Set a policy: remove toggles within 2 sprints of full rollout.
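One way to make expiry dates and owners enforceable rather than aspirational is a small registry that a CI job checks on every build. The registry format and the dates below are hypothetical.

```javascript
// Hypothetical toggle registry: every toggle records an owner and an expiry date.
const TOGGLES = [
  { name: 'NEW_CHECKOUT_FLOW', owner: 'payments-team', expires: '2026-06-30' },
  { name: 'DARK_MODE', owner: 'web-team', expires: '2026-04-01' },
];

// Return toggles whose expiry is before `today`. Zero-padded ISO dates
// (yyyy-mm-dd) compare correctly as plain strings.
function findExpiredToggles(toggles, today) {
  return toggles.filter((t) => t.expires < today);
}

for (const t of findExpiredToggles(TOGGLES, '2026-05-13')) {
  console.warn(`Toggle ${t.name} expired on ${t.expires}; owner: ${t.owner}`);
}
```

A CI step could fail the build whenever `findExpiredToggles` returns anything, forcing the owner to remove the toggle or consciously renew it.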

Coding Standards & Conventions

Coding standards are the grammar rules of a codebase. They ensure that code written by 10 different developers looks like it was written by one person. This isn't about aesthetics — it's about cognitive load. Consistent code is faster to read, review, and debug.

Why They Matter for Teams

  • Reduced cognitive load — Patterns become predictable; less mental energy spent parsing style
  • Faster code reviews — Reviewers focus on logic, not formatting debates
  • Easier onboarding — New team members learn one style, not five
  • Better tooling — Automated formatters and linters enforce consistency without human effort
  • Git history clarity — No noise commits that just reformat code

Linters, Formatters & Pre-commit Hooks

Modern teams enforce coding standards through automated tooling, not code review comments. The standard stack:

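```bash
# JavaScript/TypeScript: ESLint + Prettier
# Install tools (ESLint v9+ reads a flat config file, eslint.config.*, by default;
# TypeScript projects additionally need typescript-eslint)
npm install --save-dev eslint @eslint/js globals prettier eslint-config-prettier

# Create ESLint config (flat config format)
cat > eslint.config.mjs <<'EOF'
import js from "@eslint/js";
import globals from "globals";
import eslintConfigPrettier from "eslint-config-prettier";

export default [
  js.configs.recommended,
  {
    // Declare Node.js globals (use globals.browser for front-end code)
    languageOptions: { globals: globals.node },
    rules: {
      "no-unused-vars": "error",
      "no-console": "warn",
    },
  },
  // Last, so it switches off any rules that conflict with Prettier
  eslintConfigPrettier,
];
EOF

# Create Prettier config
echo '{
  "semi": true,
  "singleQuote": true,
  "tabWidth": 2,
  "trailingComma": "es5"
}' > .prettierrc

# Run linter (auto-fixing where possible), then the formatter
npx eslint src/ --fix
npx prettier --write src/
echo "Linting complete"
```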
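
```bash
# Python: Black + Ruff (Ruff can replace flake8, and isort once its "I" rules are enabled)
# Install tools
pip install black ruff

# Format code with Black (opinionated, almost no configuration)
black src/

# Lint with Ruff (Rust-based, very fast). The default rule set covers Pyflakes
# and part of pycodestyle; --extend-select I adds isort-style import sorting
ruff check src/ --extend-select I --fix

echo "Python formatting and linting complete"
```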

```bash
# Go: gofmt ships with the Go toolchain
# The toolchain provides one canonical formatter, so there is
# nothing to configure: there's only one style

gofmt -w .
echo "Go formatting complete (one true style)"
```

```bash
# Pre-commit hook setup using Husky (Node.js)
# Install Husky and lint-staged
npm install --save-dev husky lint-staged

# Initialize Husky (creates .husky/ and adds a "prepare" script to package.json)
npx husky init

# Replace the default pre-commit hook so it runs lint-staged
echo 'npx lint-staged' > .husky/pre-commit

# Configure lint-staged in package.json
# "lint-staged": {
#   "*.{js,ts}": ["eslint --fix", "prettier --write"],
#   "*.{json,md}": ["prettier --write"]
# }

echo "Pre-commit hooks configured"
```

Technical Debt

Ward Cunningham introduced the "technical debt" metaphor in 1992: just as financial debt lets you buy something today and pay later (with interest), technical debt lets you ship faster today at the cost of slower development later. The metaphor is powerful because it reframes shortcuts as financial decisions — sometimes taking on debt is the right business choice.

Martin Fowler's Technical Debt Quadrant

Not all technical debt is equal. Martin Fowler's quadrant classifies debt along two axes: deliberate vs inadvertent and reckless vs prudent.

| | Reckless | Prudent |
|---|---|---|
| Deliberate | "We don't have time for design" — Worst kind. Knowingly cutting corners without a plan to repay | "We must ship now and deal with consequences" — Strategic choice with awareness of cost |
| Inadvertent | "What's layering?" — Debt from ignorance. Team doesn't know better practices exist | "Now we know how we should have done it" — Learning debt. Gained knowledge reveals better approaches |

Managing Technical Debt

Like financial debt, technical debt must be tracked, prioritised, and paid down systematically:

  1. Make it visible — Track debt items in the backlog alongside features. Use labels/tags to categorise
  2. Quantify the interest — How much does this debt slow us down? "This shortcut costs us 2 hours per sprint in workarounds"
  3. Allocate capacity — Reserve 15-20% of sprint capacity for debt reduction. Some teams use "tech debt sprints" every 4th sprint
  4. Prevent accumulation — Code review gates, automated quality checks, definition-of-done that includes "no new debt without a repayment ticket"
  5. Know when to declare bankruptcy — Sometimes debt is so deep that incremental paydown is hopeless. A full rewrite (strangler fig style) may be cheaper
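Steps 2 and 3 can be combined into a simple prioritisation rule: divide each item's repair cost (the principal) by the hours it costs per sprint (the interest) to get a payback period, then repay the shortest paybacks first. The register and hour estimates below are invented for illustration.

```javascript
// Hypothetical debt register: interestHours = time lost per sprint to
// workarounds; principalHours = estimated time to fix properly.
const debtItems = [
  { id: 'DEBT-1', title: 'Hand-rolled date parsing', interestHours: 2, principalHours: 16 },
  { id: 'DEBT-2', title: 'No integration tests for billing', interestHours: 6, principalHours: 40 },
  { id: 'DEBT-3', title: 'Duplicated config loader', interestHours: 0.5, principalHours: 12 },
];

// Sprints until the fix pays for itself. Zero interest yields Infinity,
// which sorts such items last.
function paybackSprints(item) {
  return item.principalHours / item.interestHours;
}

// Repay the items with the shortest payback period first.
const prioritised = [...debtItems].sort((a, b) => paybackSprints(a) - paybackSprints(b));
for (const item of prioritised) {
  console.log(`${item.id}: pays back in ${paybackSprints(item).toFixed(1)} sprints`);
}
```

The ranking is only as good as the interest estimates, so it works best as a conversation starter during backlog refinement rather than as an automatic ordering.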
Case Study

Twitter's Fail Whale — Technical Debt at Scale

Twitter's early architecture (a monolithic Ruby on Rails application) accumulated massive technical debt as the platform grew. The "Fail Whale" error page became iconic because the system repeatedly failed under load. Rather than continuing to patch the monolith, Twitter undertook a multi-year migration to a distributed, JVM-based service architecture. The lesson: inadvertent, prudent debt (the monolith was a reasonable choice for a young product, and only growth revealed its limits) compounded until it threatened the platform's reliability. By the time it was addressed, repayment took years of work across many engineering teams.


Deployment Planning

Deployment planning should begin before the first line of code is written. How you deploy determines your architecture constraints, your testing strategy, and your ability to recover from failures. Teams that "figure out deployment later" almost always regret it.

Environment Strategy

A well-designed environment strategy provides confidence that what works in lower environments will work in production:

| Environment | Purpose | Data | Who Uses It |
|---|---|---|---|
| Local/Dev | Individual developer testing | Mock/seed data | Individual developer |
| Integration | Verify components work together | Synthetic test data | Development team |
| Staging/Pre-prod | Production mirror for final validation | Anonymised production copy | QA, product, stakeholders |
| Production | Live system serving real users | Real user data | End users |

Database Migration Strategy

Database changes are the hardest part of deployment because they're often irreversible. Key principles:

  • Forward-only migrations — Never edit a migration that's been applied. Create a new migration to fix issues
  • Backward-compatible changes — Add new columns as nullable (or with a default); never rename or drop a column in the same deploy that changes the code using it
  • Expand-contract pattern — Deploy 1: add new column. Deploy 2: migrate data and update code. Deploy 3: remove old column
  • Migration testing — Run migrations against a copy of production data before deploying
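Forward-only migrations are usually enforced by the migration tool itself. The sketch below shows the idea under simplifying assumptions: migrations are an ordered list of SQL strings, a `Map` stands in for the tool's bookkeeping table, and `execute` stands in for the database driver. Storing a checksum of each applied migration lets the runner refuse to continue if someone edits one after the fact. A real runner would also wrap each migration in a transaction where the database allows it.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical migrations: an ordered, append-only list.
const migrations = [
  { id: '001_create_users', sql: 'CREATE TABLE users (id BIGINT PRIMARY KEY, email TEXT NOT NULL)' },
  { id: '002_add_users_display_name', sql: 'ALTER TABLE users ADD COLUMN display_name TEXT NULL' },
];

function checksum(sql) {
  return createHash('sha256').update(sql).digest('hex');
}

// `applied` stands in for the bookkeeping table (id -> checksum);
// `execute` stands in for the database driver call.
function migrate(pending, applied, execute) {
  const ran = [];
  for (const m of pending) {
    const recorded = applied.get(m.id);
    if (recorded !== undefined) {
      // Forward-only: an applied migration must never change.
      // Fix mistakes with a new migration instead of editing this one.
      if (recorded !== checksum(m.sql)) {
        throw new Error(`Migration ${m.id} was edited after being applied`);
      }
      continue; // already applied
    }
    execute(m.sql);
    applied.set(m.id, checksum(m.sql));
    ran.push(m.id);
  }
  return ran;
}

const appliedLog = new Map();
migrate(migrations, appliedLog, (sql) => console.log('EXEC', sql)); // runs both
migrate(migrations, appliedLog, (sql) => console.log('EXEC', sql)); // runs nothing
```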

Rollback Mechanisms

Every deployment plan must answer: "If this goes wrong, how do we undo it in under 5 minutes?" Common rollback strategies:

  • Blue-green deployment — Keep the previous version running; switch a load balancer back to "blue"
  • Canary rollback — If the canary (small traffic slice) shows errors, stop the rollout and revert
  • Database rollback — For schema changes, have a reverse migration ready (tested!)
  • Feature flag kill switch — Disable the new feature without redeploying code
  • Immutable deployments — Don't update servers in place; deploy new servers and destroy old ones if needed

Deployment Flow with Rollback Decision Points

```mermaid
flowchart TD
    A[Deploy to Staging] --> B{Staging Tests Pass?}
    B -->|No| C[Fix & Redeploy to Staging]
    B -->|Yes| D[Deploy Canary - 5% Traffic]
    D --> E{Error Rate Normal?}
    E -->|No| F[ROLLBACK: Remove Canary]
    E -->|Yes| G[Expand to 25% Traffic]
    G --> H{Latency & Errors OK?}
    H -->|No| F
    H -->|Yes| I[Expand to 100% Traffic]
    I --> J{30-min Bake Period OK?}
    J -->|No| K[ROLLBACK: Blue-Green Switch]
    J -->|Yes| L[Deployment Complete ✓]
    F --> M[Investigate & Fix]
    K --> M
```
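The canary checks in the deployment flow can be expressed as a small gate function. This sketch assumes error counts for the canary and the stable baseline are already collected per observation window; the thresholds (1% absolute error rate, 1.5× the baseline rate, at least 500 requests) and the `observe` hook are hypothetical choices rather than recommendations, and the 30-minute bake period is left out.

```javascript
// Canary gate sketch: thresholds and the observe() hook are hypothetical.
const TRAFFIC_STEPS = [5, 25, 100]; // percent of traffic sent to the new version

// Decide 'promote', 'rollback', or 'wait' for one observation window.
function evaluateCanary(
  { canaryErrors, canaryRequests, baselineErrors, baselineRequests },
  { maxErrorRate = 0.01, maxRatioToBaseline = 1.5, minRequests = 500 } = {}
) {
  if (canaryRequests < minRequests) return 'wait'; // too little traffic to judge
  const canaryRate = canaryErrors / canaryRequests;
  const baselineRate = baselineRequests > 0 ? baselineErrors / baselineRequests : 0;
  if (canaryRate > maxErrorRate) return 'rollback';
  if (baselineRate > 0 && canaryRate > baselineRate * maxRatioToBaseline) return 'rollback';
  return 'promote';
}

// Walk through the traffic steps, stopping at the first step that fails.
// observe(percent) returns the error counts gathered at that traffic share.
function runRollout(observe) {
  for (const percent of TRAFFIC_STEPS) {
    const verdict = evaluateCanary(observe(percent));
    if (verdict === 'rollback') return { status: 'rolled_back', at: percent };
    if (verdict === 'wait') return { status: 'paused', at: percent };
  }
  return { status: 'complete' };
}

console.log(runRollout(() => ({ canaryErrors: 2, canaryRequests: 1000, baselineErrors: 20, baselineRequests: 10000 })));
```

The relative check catches regressions that stay under the absolute threshold, while the `minRequests` guard avoids judging a canary on too little traffic.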

Migration & Cutover Strategies

When replacing an existing system (or upgrading a major component), the cutover strategy determines how users transition from old to new. Each approach has different risk, cost, and complexity profiles.

Approaches Compared

| Strategy | Description | Risk | Cost |
|---|---|---|---|
| Parallel Running | Run old and new simultaneously, compare outputs | Lowest — fallback is always available | Highest — double infrastructure cost |
| Phased Rollout | Migrate users/features in batches | Medium — blast radius is controlled | Medium — both systems maintained during transition |
| Big Bang Cutover | Switch everyone at once (usually during maintenance window) | Highest — no partial rollback possible | Lowest — clean break, no dual maintenance |
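Parallel running depends on comparing outputs without letting the new system affect users. A minimal sketch, assuming both implementations can be called with the same input: users always receive the legacy result, and any difference or exception from the new implementation is recorded. The order-total functions are invented examples, and comparing via `JSON.stringify` is a simplification that is sensitive to key order.

```javascript
// Parallel-running wrapper: legacyImpl and newImpl are hypothetical functions.
function makeParallelRunner(legacyImpl, newImpl, onMismatch) {
  return function handle(input) {
    const legacyResult = legacyImpl(input);
    let newResult;
    try {
      newResult = newImpl(input);
    } catch (err) {
      onMismatch({ input, legacyResult, error: err.message });
      return legacyResult; // the new system must not affect users yet
    }
    if (JSON.stringify(newResult) !== JSON.stringify(legacyResult)) {
      onMismatch({ input, legacyResult, newResult });
    }
    return legacyResult;
  };
}

// Example: an order-total calculation being reimplemented with a coupon bug.
const mismatches = [];
const legacyTotal = (order) => order.items.reduce((sum, i) => sum + i.price * i.qty, 0);
const newTotal = (order) => legacyTotal(order) - (order.coupon ? 5 : 0);
const totalOrder = makeParallelRunner(legacyTotal, newTotal, (m) => mismatches.push(m));

console.log(totalOrder({ items: [{ price: 10, qty: 2 }] }));               // 20
console.log(totalOrder({ items: [{ price: 10, qty: 2 }], coupon: true })); // 20 (mismatch recorded)
console.log(`${mismatches.length} mismatch(es) recorded`);
```

Cutover becomes a data-driven decision: once the mismatch log stays empty (or contains only explained differences) for an agreed period, traffic can move to the new system.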

Data Migration Challenges

Data migration is where most system replacements fail. Common challenges:

  • Schema mismatch — Old system's data model doesn't map cleanly to the new one
  • Data quality issues — Legacy data contains nulls, duplicates, and inconsistencies that the new system rejects
  • Volume — Moving terabytes of data takes time; the system can't be offline for hours
  • Consistency — Users continue writing to the old system while migration runs
  • Rollback complexity — If migration fails halfway, both systems have partial data

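A robust approach pairs a bulk backfill with CDC (Change Data Capture): migrate historical data in bulk, then stream live changes from the old system to the new one until you are ready to cut over. Application-level dual writes can serve the same purpose, but keeping two independent writes consistent is considerably harder.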
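The backfill-then-stream approach can be sketched with an in-memory change log. The assumptions: every change in the old system has a monotonically increasing sequence number (as a database's replication log would provide), the log is ordered by that number, and each change is an idempotent upsert or delete. In a real migration the snapshot would be read from the source database at a recorded log position rather than rebuilt from the log.

```javascript
// Stand-in for the old system's change log, ordered by sequence number.
const changeLog = [
  { seq: 1, op: 'upsert', key: 'c1', value: { name: 'Ada' } },
  { seq: 2, op: 'upsert', key: 'c2', value: { name: 'Grace' } },
  { seq: 3, op: 'upsert', key: 'c1', value: { name: 'Ada L.' } },
  { seq: 4, op: 'delete', key: 'c2' },
];

// Upserts and deletes are idempotent, so replaying a change twice is harmless.
function applyChange(target, change) {
  if (change.op === 'upsert') target.set(change.key, change.value);
  else if (change.op === 'delete') target.delete(change.key);
}

// The source's state as of `seq` (the bulk snapshot). Relies on log ordering.
function snapshotAt(log, seq) {
  const state = new Map();
  for (const c of log) {
    if (c.seq > seq) break;
    applyChange(state, c);
  }
  return state;
}

// 1) Bulk-copy a snapshot taken at snapshotSeq.
// 2) Stream every later change until cutover, tracking the position reached.
function backfillAndStream(log, snapshotSeq) {
  const target = snapshotAt(log, snapshotSeq);
  let position = snapshotSeq;
  for (const c of log) {
    if (c.seq <= position) continue;
    applyChange(target, c);
    position = c.seq;
  }
  return { target, position };
}

const { target, position } = backfillAndStream(changeLog, 2);
console.log([...target.entries()], 'caught up to seq', position);
```

Recording the stream position is what makes cutover safe: once the new system has caught up to the old system's latest sequence number and writes are paused briefly, the two are consistent.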

Backward Compatibility

During any migration period, you must maintain backward compatibility for:

  • APIs — Old clients must still work. Version your APIs and maintain old versions during transition
  • Data formats — Messages in queues, files on disk, cached data must be readable by both old and new code
  • Contracts — Downstream systems that depend on your output must not break

Rule of Thumb: Never ship a breaking change in a single deployment. First deploy code that handles both old and new formats. Then migrate data. Then, in a later deploy, remove support for the old format. This "expand-contract" approach avoids migration downtime.
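For message formats, the expand step usually means a tolerant reader plus a producer that writes both shapes. In this sketch, which uses a hypothetical order event whose `customerName` field is being split into `firstName` and `lastName`, new producers emit version 2 but keep the old field so consumers that have not been upgraded still work; the contract step would later drop `customerName`. The name-splitting logic is deliberately naive.

```javascript
// Tolerant reader: accepts both the old (v1) and new (v2) event shapes.
function decodeOrderEvent(raw) {
  const msg = JSON.parse(raw);
  const version = msg.version ?? 1; // old producers never set a version
  switch (version) {
    case 1: {
      // Naive split for illustration only; real names need better handling.
      const [firstName, ...rest] = msg.customerName.split(' ');
      return { orderId: msg.orderId, firstName, lastName: rest.join(' ') };
    }
    case 2:
      return { orderId: msg.orderId, firstName: msg.firstName, lastName: msg.lastName };
    default:
      throw new Error(`Unsupported order event version: ${version}`);
  }
}

// Expand step: write v2 fields and keep the v1 field for old consumers.
function encodeOrderEvent({ orderId, firstName, lastName }) {
  return JSON.stringify({
    version: 2, orderId, firstName, lastName,
    customerName: `${firstName} ${lastName}`, // remove in the contract step
  });
}

console.log(decodeOrderEvent('{"orderId":1,"customerName":"Ada Lovelace"}'));
console.log(decodeOrderEvent(encodeOrderEvent({ orderId: 2, firstName: 'Grace', lastName: 'Hopper' })));
```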

Exercises

Exercise 1

Build-vs-Buy Analysis

Your team needs an email notification system. You've identified three options: (a) Build a custom email service with templates, scheduling, and tracking, (b) Buy SendGrid/Mailgun at $0.001/email, or (c) Use an open-source solution like Postal. You send ~500,000 emails/month. Perform a 3-year TCO analysis for all three options, considering your team has 2 backend engineers with no email delivery expertise.

Exercise 2

Technical Debt Inventory

Take a codebase you work with (or an open-source project). Identify 5 examples of technical debt. For each, classify it in Fowler's quadrant (deliberate/inadvertent × reckless/prudent), estimate the "interest" it charges per sprint, and propose a repayment plan with effort estimate.

Exercise 3

Deployment Runbook

Write a deployment runbook for a web application with a PostgreSQL database. The deployment includes a schema migration (adding a new table and a foreign key to an existing table). Include: pre-deployment checks, deployment steps, verification steps, rollback procedure, and communication plan. Assume 15 minutes of acceptable downtime.

Exercise 4

Strangler Fig Migration Plan

You have a legacy monolithic e-commerce system handling: product catalogue, shopping cart, checkout, payments, and order history. Design a strangler fig migration plan. Which module do you migrate first and why? What does the proxy layer look like? How do you handle shared database tables? Create a 6-month timeline with phases.


Conclusion & Next Steps

Implementation is where the rubber meets the road. The decisions covered in this article — build vs buy, incremental vs big bang, coding standards, technical debt management, and deployment planning — form the bridge between architecture (what we want to build) and operations (what is actually running in production).

Key takeaways:

  • Build vs buy is not binary — It's a spectrum from fully custom to fully vendor-managed, with open source in between
  • TCO matters more than initial cost — Include maintenance, opportunity cost, and exit cost in every analysis
  • Incremental beats big bang — Almost always. The strangler fig pattern is your friend
  • Technical debt is a tool — Used wisely (prudent, deliberate) it accelerates delivery. Left unmanaged, it kills velocity
  • Deployment planning starts at design time — Not after code is written
  • Rollback is non-negotiable — Every deployment must have a tested rollback path

Next in the Series

In Part 9: Git & Version Control Foundations, we'll master the tool that makes all of this possible — Git. From first principles through branching, merging, rebasing, and conflict resolution, you'll build a deep understanding of the distributed version control system that underpins modern software delivery.