Part 8: Implementation, Buy vs Build & Deployment Planning

Introduction — From Design to Code

Architecture diagrams are hypotheses. Implementation is where those hypotheses meet reality. The transition from design to code is one of the most critical phases in software delivery — decisions made here compound over months and years, creating either a clean, maintainable codebase or an unmovable ball of mud.

This article addresses the strategic decisions you face before and during implementation: Should you build this component yourself? Should you buy a commercial off-the-shelf (COTS) product? Should you adopt an open-source library? How will you deploy what you build? And how will you undo mistakes?

Why Implementation Decisions Persist

Architecture decisions can be revisited. Requirements can change. But implementation decisions have a unique property: they create inertia. Once you've built on a framework, integrated a vendor product, or established a deployment pattern, the cost of changing course grows with every feature, integration, and dataset that depends on it.

                            
Key Insight: Every implementation decision is a bet on the future. Build vs buy is not a one-time choice; it's a spectrum of trade-offs that must be revisited as your team, scale, and market change. The best engineering teams make these decisions explicitly, documenting the reasoning so future teams can evaluate whether the original assumptions still hold.
                        

Consider this: a team that chooses to use Auth0 for authentication in 2024 might find the decision perfectly sound. But if their user base grows from 10,000 to 10 million, the cost calculus shifts dramatically. The implementation decision that saved 3 months of development time might now cost $500,000/year in licensing fees. This is why we think about these decisions structurally, not just in the moment.

Build vs Buy vs Open Source

The build-vs-buy decision is one of the oldest and most consequential in software engineering. Get it right, and you focus your team's effort on your core differentiator. Get it wrong, and you either waste years rebuilding commodity infrastructure or lock yourself into a vendor that controls your roadmap.

The Three Options

Option	Description	Best When	Risk
Build Custom	Your team writes and maintains the component	Core differentiator, unique requirements, competitive advantage	High cost, long timeline, ongoing maintenance burden
Buy (COTS)	Purchase commercial off-the-shelf software	Commodity capability, need for support/SLA, compliance requirements	Vendor lock-in, licensing costs, limited customisation
Open Source	Adopt community-maintained software	Standard problem, active community, need for flexibility	No SLA, maintenance responsibility, security patching on you

Total Cost of Ownership (TCO)

The most common mistake in build-vs-buy is comparing build cost against license cost. This ignores the majority of the expense. True TCO includes:

Initial development/purchase cost — The obvious one
Integration cost — Connecting the solution to your existing systems
Training cost — Getting your team productive with the tool
Ongoing maintenance — Bug fixes, security patches, version upgrades
Opportunity cost — What your team can't build while maintaining this
Migration/exit cost — What it costs to switch away in 3-5 years
Scaling cost — How costs change at 10x or 100x volume
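The cost categories above can be turned into a rough comparison. The following is a minimal sketch, not a costing model: every figure is a made-up placeholder, the `Option` fields and the `tco` function are hypothetical names, and opportunity cost is left out because it rarely reduces to a single number.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """Cost inputs for one option. Every figure used below is a placeholder."""
    name: str
    initial: float             # development or purchase cost
    integration: float         # wiring into existing systems
    training: float            # getting the team productive
    yearly_maintenance: float  # fixes, patches, upgrades
    yearly_usage: float        # licence or infrastructure cost at current volume
    exit_cost: float           # estimated cost to migrate away later


def tco(option: Option, years: int, volume_growth: float = 1.0) -> float:
    """Sum one-off costs plus recurring costs over `years`.

    Usage cost is multiplied by `volume_growth` after each year, a crude
    stand-in for the scaling-cost item in the list above.
    """
    total = option.initial + option.integration + option.training + option.exit_cost
    usage = option.yearly_usage
    for _ in range(years):
        total += option.yearly_maintenance + usage
        usage *= volume_growth
    return total


build = Option("build", 300_000, 20_000, 0, 120_000, 10_000, 0)
buy = Option("buy", 0, 40_000, 10_000, 5_000, 60_000, 80_000)

for opt in (build, buy):
    print(f"{opt.name}: 3-year TCO = {tco(opt, 3):,.0f}; "
          f"with usage doubling yearly = {tco(opt, 3, 2.0):,.0f}")
```

With these placeholder inputs, "buy" is cheaper over three years, but the gap narrows sharply once usage-based pricing meets growing volume. That sensitivity is usually the part worth modelling.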

                            
Warning: Teams systematically underestimate the maintenance cost of custom-built software. Maintenance typically accounts for the majority of a system's lifetime cost, so plan for several units of maintenance effort for every unit of build effort (ratios of 4-10x are sometimes quoted for long-lived systems). If you build it, you own it for as long as it runs.
                        

Vendor Lock-in

Vendor lock-in occurs when switching away from a product becomes prohibitively expensive. It manifests in several forms:

Data lock-in — Your data is stored in proprietary formats
API lock-in — Your code calls vendor-specific APIs with no standard equivalent
Workflow lock-in — Your processes are built around vendor-specific features
Contract lock-in — Multi-year agreements with penalty clauses
Knowledge lock-in — Your team only knows the vendor's way of doing things

Mitigation strategies include abstraction layers (wrapping vendor APIs behind your own interfaces), data export provisions in contracts, and multi-vendor architectures where feasible.
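An abstraction layer of the kind mentioned above might look like the following sketch. `VendorAClient` and its `deliver_message` method are invented stand-ins for a vendor SDK, not a real library; the point is only that application code depends on an interface the team owns.

```python
from typing import Protocol


class EmailSender(Protocol):
    """The interface the application owns; vendors are adapted to it."""
    def send(self, to: str, subject: str, body: str) -> str: ...


class VendorAClient:
    """Stand-in for a vendor SDK (hypothetical, not a real library)."""
    def deliver_message(self, payload: dict) -> dict:
        return {"message_id": f"a-{len(payload['text'])}"}


class VendorAAdapter:
    """Translates the application's interface into vendor A's call shape."""
    def __init__(self, client: VendorAClient) -> None:
        self._client = client

    def send(self, to: str, subject: str, body: str) -> str:
        response = self._client.deliver_message(
            {"recipient": to, "title": subject, "text": body}
        )
        return response["message_id"]


class InMemorySender:
    """Second implementation, useful in tests and as proof the seam works."""
    def __init__(self) -> None:
        self.outbox: list[tuple[str, str, str]] = []

    def send(self, to: str, subject: str, body: str) -> str:
        self.outbox.append((to, subject, body))
        return f"mem-{len(self.outbox)}"


def send_welcome(sender: EmailSender, address: str) -> str:
    # Application code depends only on EmailSender, never on a vendor type.
    return sender.send(address, "Welcome", "Thanks for signing up.")


print(send_welcome(VendorAAdapter(VendorAClient()), "ada@example.com"))
print(send_welcome(InMemorySender(), "ada@example.com"))
```

Switching vendors then means writing one new adapter rather than touching every call site. The wrapper does not remove data or contract lock-in, so it complements the other mitigations rather than replacing them.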

Decision Flowchart

Build vs Buy Decision Framework

flowchart TD
    A[New Capability Needed] --> B{Is this a core differentiator?}
    B -->|Yes| C{Do you have the team expertise?}
    B -->|No| D{Does a mature COTS product exist?}
    C -->|Yes| E[BUILD CUSTOM]
    C -->|No| F{Can you hire/train in time?}
    F -->|Yes| E
    F -->|No| G[Consider OSS + Custom Extensions]
    D -->|Yes| H{Is TCO acceptable at your scale?}
    D -->|No| I{Does a mature OSS project exist?}
    H -->|Yes| J[BUY COTS]
    H -->|No| I
    I -->|Yes| K{Is the community active and healthy?}
    I -->|No| E
    K -->|Yes| L[ADOPT OPEN SOURCE]
    K -->|No| E
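The flowchart can also be read as a small decision function. This is a direct transcription under the assumption that each question has a yes/no answer; the `Capability` field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """Answers to the flowchart's questions for one capability."""
    core_differentiator: bool
    team_has_expertise: bool = False
    can_hire_or_train_in_time: bool = False
    mature_cots_exists: bool = False
    tco_acceptable_at_scale: bool = False
    mature_oss_exists: bool = False
    oss_community_healthy: bool = False


def recommend(c: Capability) -> str:
    """Follow the flowchart's branches from top to bottom."""
    if c.core_differentiator:
        if c.team_has_expertise or c.can_hire_or_train_in_time:
            return "BUILD CUSTOM"
        return "CONSIDER OSS + CUSTOM EXTENSIONS"
    # Not a differentiator: prefer buying, then open source, then building.
    if c.mature_cots_exists and c.tco_acceptable_at_scale:
        return "BUY COTS"
    if c.mature_oss_exists and c.oss_community_healthy:
        return "ADOPT OPEN SOURCE"
    return "BUILD CUSTOM"


print(recommend(Capability(core_differentiator=False,
                           mature_cots_exists=True,
                           tco_acceptable_at_scale=False,
                           mature_oss_exists=True,
                           oss_community_healthy=True)))
```

Writing the decision down this way makes the inputs explicit, which helps when the assumptions are revisited later.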

Case Study

Spotify's Build-vs-Buy Journey

Spotify built its own internal developer platform ("Backstage") because no COTS product met its needs for developer experience at scale. They initially tried off-the-shelf wikis and portals but found them insufficient for 2,000+ microservices. After building Backstage internally, they open-sourced it in 2020 and later donated it to the CNCF; it is now used by hundreds of companies. The lesson: sometimes "build" evolves into "open source", where you build it and others maintain it with you.


Implementation Strategies

Once the build-vs-buy decision is made, the next question is how to implement. The two fundamental approaches — big bang and incremental — have radically different risk profiles.

Big Bang vs Incremental

Approach	Description	Risk Profile	Best For
Big Bang	Build everything, deploy all at once	High risk: all-or-nothing. Hard to debug failures	Small systems, regulatory constraints requiring atomic cutover
Incremental	Build and deploy in slices, each delivering value	Low risk per slice. Easier rollback. Early feedback	Most projects. Especially large systems and team handoffs

The Strangler Fig Pattern

Named after the strangler fig tree that grows around a host tree until the host dies, this pattern is the gold standard for legacy system replacement. Instead of rewriting the entire system (a notoriously risky approach), you:

Intercept — Place a facade/proxy in front of the legacy system
Implement — Build new functionality in the new system
Redirect — Route traffic for completed features to the new system
Retire — Once all traffic is redirected, decommission the legacy system

Strangler Fig Pattern — Progressive Migration

flowchart LR
    subgraph Phase1[Phase 1: Intercept]
        U1[Users] --> P1[Proxy/Facade]
        P1 --> L1[Legacy System]
    end
    subgraph Phase2[Phase 2: Partial Migration]
        U2[Users] --> P2[Proxy/Facade]
        P2 -->|Feature A| N2[New System]
        P2 -->|Features B,C| L2[Legacy System]
    end
    subgraph Phase3[Phase 3: Complete]
        U3[Users] --> P3[Proxy/Facade]
        P3 --> N3[New System]
    end
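The three phases in the diagram can be sketched as a facade that routes by path prefix. This is a toy in-process version with hypothetical handler functions; in practice the facade is usually a reverse proxy or API gateway.

```python
from typing import Callable

Handler = Callable[[str], str]


def legacy_system(path: str) -> str:
    return f"legacy handled {path}"


def new_system(path: str) -> str:
    return f"new handled {path}"


class StranglerFacade:
    """Routes each request by path prefix; unmigrated paths fall through to legacy."""

    def __init__(self, legacy: Handler, new: Handler) -> None:
        self._legacy = legacy
        self._new = new
        self._migrated: set[str] = set()

    def migrate(self, prefix: str) -> None:
        """Redirect: start sending this feature's traffic to the new system."""
        self._migrated.add(prefix)

    def rollback(self, prefix: str) -> None:
        """Send the feature back to legacy if the new implementation misbehaves."""
        self._migrated.discard(prefix)

    def handle(self, path: str) -> str:
        for prefix in self._migrated:
            # Match whole path segments so "/cart" does not capture "/cartoon".
            if path == prefix or path.startswith(prefix + "/"):
                return self._new(path)
        return self._legacy(path)


facade = StranglerFacade(legacy_system, new_system)
print(facade.handle("/cart/42"))   # Phase 1: everything goes to legacy
facade.migrate("/cart")
print(facade.handle("/cart/42"))   # Phase 2: the cart is served by the new system
print(facade.handle("/orders/7"))  # still legacy
```

Because routing is a data change rather than a code change, each feature can be moved, observed, and moved back independently.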

Feature Toggling During Implementation

Feature toggles (also called feature flags) allow you to deploy code to production without activating it. This decouples deployment from release and enables several powerful patterns:

Release toggles — Hide incomplete features behind a flag until ready
Experiment toggles — A/B test new features with a subset of users
Ops toggles — Circuit breakers to disable problematic features without redeploy
Permission toggles — Enable features for specific user segments (beta testers, enterprise tier)

// Feature toggle example — simple boolean flag
const FEATURE_FLAGS = {
    NEW_CHECKOUT_FLOW: process.env.FF_NEW_CHECKOUT === 'true',
    DARK_MODE: process.env.FF_DARK_MODE === 'true',
    AI_RECOMMENDATIONS: process.env.FF_AI_RECS === 'true'
};

// Usage in application code
function renderCheckout(cart) {
    if (FEATURE_FLAGS.NEW_CHECKOUT_FLOW) {
        return renderNewCheckout(cart);
    }
    return renderLegacyCheckout(cart);
}

console.log('Feature flags loaded:', FEATURE_FLAGS);

                            
Key Insight: Feature toggles are temporary by design. Every toggle should have an expiration date and an owner. Toggles that linger become "toggle debt": invisible branches in your code, where each independent toggle doubles the number of code-path combinations you would need to test. Set a policy: remove toggles within 2 sprints of full rollout.
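One way to make ownership and expiry enforceable is to record them next to the flag itself. The sketch below is an assumption about how such a registry could look, not a reference to any particular feature-flag product; it also shows a deterministic percentage rollout of the kind an experiment toggle needs.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Toggle:
    name: str
    owner: str
    expires: date
    rollout_percent: int = 0   # 0 = off for everyone, 100 = on for everyone


def is_enabled(toggle: Toggle, user_id: str) -> bool:
    """Deterministic percentage rollout: the same user always gets the same answer."""
    digest = hashlib.sha256(f"{toggle.name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # value in 0..99
    return bucket < toggle.rollout_percent


def expired(toggles: list[Toggle], today: date) -> list[str]:
    """Names of toggles past their removal date, suitable for a CI check."""
    return [t.name for t in toggles if t.expires < today]


checkout = Toggle("new_checkout", owner="payments-team",
                  expires=date(2025, 1, 31), rollout_percent=25)
print(is_enabled(checkout, "user-123"))
print(expired([checkout], today=date(2025, 3, 1)))
```

Hashing the toggle name together with the user ID keeps each user's experience stable across requests while giving different toggles independent user samples. Failing the build when `expired` returns anything is one way to stop toggles from lingering.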
                        

Coding Standards & Conventions

Coding standards are the grammar rules of a codebase. They ensure that code written by 10 different developers looks like it was written by one person. This isn't about aesthetics — it's about cognitive load. Consistent code is faster to read, review, and debug.

Why They Matter for Teams

Reduced cognitive load — Patterns become predictable; less mental energy spent parsing style
Faster code reviews — Reviewers focus on logic, not formatting debates
Easier onboarding — New team members learn one style, not five
Better tooling — Automated formatters and linters enforce consistency without human effort
Git history clarity — No noise commits that just reformat code

Linters, Formatters & Pre-commit Hooks

Modern teams enforce coding standards through automated tooling, not code review comments. The standard stack:

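# JavaScript/TypeScript — ESLint + Prettier
# Install tools (@eslint/js provides the "recommended" rule set for flat config)
npm install --save-dev eslint @eslint/js prettier eslint-config-prettier

# Create ESLint config
# ESLint 9+ reads eslint.config.* ("flat config") by default and ignores .eslintrc.json
cat > eslint.config.mjs <<'EOF'
import js from "@eslint/js";
import prettier from "eslint-config-prettier";

export default [
  js.configs.recommended,
  prettier, // turns off rules that conflict with Prettier
  {
    rules: {
      "no-unused-vars": "error",
      "no-console": "warn",
    },
  },
];
EOF

# Create Prettier config
cat > .prettierrc <<'EOF'
{
  "semi": true,
  "singleQuote": true,
  "tabWidth": 2,
  "trailingComma": "es5"
}
EOF

# Run linter, then formatter
npx eslint src/ --fix
npx prettier --write src/
echo "Linting and formatting complete"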

# Python — Black + Ruff (replaces flake8 + isort)
# Install tools
pip install black ruff

# Format code with Black (opinionated, zero config)
black src/

# Lint with Ruff (extremely fast, Rust-based)
ruff check src/ --fix

echo "Python formatting and linting complete"

# Go — gofmt ships with the Go toolchain
# The compiler does not enforce formatting, but gofmt output is the accepted standard
# No configuration needed: there's only one style

gofmt -w .
echo "Go formatting complete (one true style)"

# Pre-commit hook setup using Husky (Node.js)
# Install Husky
npm install --save-dev husky lint-staged

# Initialize Husky
npx husky init

# Add pre-commit hook
echo 'npx lint-staged' > .husky/pre-commit

# Configure lint-staged in package.json
# "lint-staged": {
#   "*.{js,ts}": ["eslint --fix", "prettier --write"],
#   "*.{json,md}": ["prettier --write"]
# }

echo "Pre-commit hooks configured"

Technical Debt

Ward Cunningham introduced the "technical debt" metaphor in 1992: just as financial debt lets you buy something today and pay later (with interest), technical debt lets you ship faster today at the cost of slower development later. The metaphor is powerful because it reframes shortcuts as financial decisions — sometimes taking on debt is the right business choice.

Martin Fowler's Technical Debt Quadrant

Not all technical debt is equal. Martin Fowler's quadrant classifies debt along two axes: deliberate vs inadvertent and reckless vs prudent.

	Reckless	Prudent
Deliberate	"We don't have time for design" — Worst kind. Knowingly cutting corners without a plan to repay	"We must ship now and deal with consequences" — Strategic choice with awareness of cost
Inadvertent	"What's layering?" — Debt from ignorance. Team doesn't know better practices exist	"Now we know how we should have done it" — Learning debt. Gained knowledge reveals better approaches

Managing Technical Debt

Like financial debt, technical debt must be tracked, prioritised, and paid down systematically:

Make it visible — Track debt items in the backlog alongside features. Use labels/tags to categorise
Quantify the interest — How much does this debt slow us down? "This shortcut costs us 2 hours per sprint in workarounds"
Allocate capacity — Reserve 15-20% of sprint capacity for debt reduction. Some teams use "tech debt sprints" every 4th sprint
Prevent accumulation — Code review gates, automated quality checks, definition-of-done that includes "no new debt without a repayment ticket"
Know when to declare bankruptcy — Sometimes debt is so deep that incremental paydown is hopeless. A full rewrite (strangler fig style) may be cheaper
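Quantifying interest makes prioritisation mechanical. The sketch below ranks debt items by how quickly repayment pays for itself and fills a fixed share of sprint capacity. The backlog entries, the 200-hour sprint, and the greedy selection are illustrative assumptions, and each item is assumed to have a non-zero interest figure.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    title: str
    interest_hours_per_sprint: float   # time lost to workarounds each sprint
    payoff_hours: float                # one-off effort to remove the debt

    def break_even_sprints(self) -> float:
        """Sprints until the time saved equals the time spent fixing."""
        return self.payoff_hours / self.interest_hours_per_sprint


def plan(items: list[DebtItem], sprint_hours: float, debt_share: float = 0.2) -> list[str]:
    """Greedily pick the fastest-payback items that fit the sprint's debt budget."""
    budget = sprint_hours * debt_share
    chosen = []
    for item in sorted(items, key=DebtItem.break_even_sprints):
        if item.payoff_hours <= budget:
            chosen.append(item.title)
            budget -= item.payoff_hours
    return chosen


backlog = [
    DebtItem("flaky integration tests", interest_hours_per_sprint=6, payoff_hours=24),
    DebtItem("hand-rolled date parsing", interest_hours_per_sprint=2, payoff_hours=12),
    DebtItem("duplicated billing module", interest_hours_per_sprint=1, payoff_hours=40),
]
print(plan(backlog, sprint_hours=200))
```

Items that never fit the budget, like the billing module here, are the ones that need a dedicated debt sprint or a deliberate decision to live with them.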

Case Study

Twitter's Fail Whale — Technical Debt at Scale

Twitter's early architecture (a monolithic Ruby on Rails application) accumulated massive technical debt as the platform grew. The "Fail Whale" error page became iconic because the system so often couldn't handle load. Rather than only patching the monolith, Twitter undertook a multi-year migration to a distributed, JVM-based architecture. The lesson: prudent, inadvertent debt (design choices that were reasonable for a small product and whose limits only became clear with growth) compounded until it threatened the platform's reliability. By the time it was addressed, repayment had become a multi-year engineering programme rather than a routine refactor.


Deployment Planning

Deployment planning should begin before the first line of code is written. How you deploy determines your architecture constraints, your testing strategy, and your ability to recover from failures. Teams that "figure out deployment later" almost always regret it.

Environment Strategy

A well-designed environment strategy provides confidence that what works in lower environments will work in production:

Environment	Purpose	Data	Who Uses It
Local/Dev	Individual developer testing	Mock/seed data	Individual developer
Integration	Verify components work together	Synthetic test data	Development team
Staging/Pre-prod	Production mirror for final validation	Anonymised production copy	QA, product, stakeholders
Production	Live system serving real users	Real user data	End users

Database Migration Strategy

Database changes are the hardest part of deployment because they're often irreversible. Key principles:

Forward-only migrations — Never edit a migration that's been applied. Create a new migration to fix issues
Backward-compatible changes — Add columns as nullable (or with a default), and never rename or drop a column in the same deploy as the code change that stops using it
Expand-contract pattern — Deploy 1: add new column. Deploy 2: migrate data and update code. Deploy 3: remove old column
Migration testing — Run migrations against a copy of production data before deploying
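The principles above can be illustrated with a tiny forward-only migration runner. This sketch uses Python's built-in `sqlite3` module purely for convenience; the table names, migration names, and the `migrate` function are hypothetical, and a real project would normally rely on an established migration tool.

```python
import sqlite3

# Ordered, append-only list: never edit an entry once it has shipped.
MIGRATIONS = [
    ("001_create_users",
     "CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)"),
    # Expand: a new nullable column; old code that ignores it keeps working.
    ("002_add_display_name",
     "ALTER TABLE users ADD COLUMN display_name TEXT"),
    # Backfill: copy existing data into the new column.
    ("003_backfill_display_name",
     "UPDATE users SET display_name = fullname WHERE display_name IS NULL"),
    # Contract (dropping fullname) would be a later entry, added only after
    # no deployed code reads the old column.
]


def migrate(conn: sqlite3.Connection) -> list[str]:
    """Apply every migration not yet recorded; return the names that ran."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    ran = []
    for name, sql in MIGRATIONS:
        if name in applied:
            continue
        with conn:
            # The migration is recorded as applied only if its SQL succeeded.
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        ran.append(name)
    return ran


conn = sqlite3.connect(":memory:")
print(migrate(conn))   # first run applies all three migrations
print(migrate(conn))   # second run finds nothing left to apply
```

Running the same function against a copy of production data before release is a cheap way to carry out the migration testing step.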

Rollback Mechanisms

Every deployment plan must answer: "If this goes wrong, how do we undo it in under 5 minutes?" Common rollback strategies:

Blue-green deployment — Keep the previous version running; switch a load balancer back to "blue"
Canary rollback — If the canary (small traffic slice) shows errors, stop the rollout and revert
Database rollback — For schema changes, have a reverse migration ready (tested!)
Feature flag kill switch — Disable the new feature without redeploying code
Immutable deployments — Don't update servers in place; deploy new servers from a versioned image and roll back by redeploying the previous image

Deployment Flow with Rollback Decision Points

flowchart TD
    A[Deploy to Staging] --> B{Staging Tests Pass?}
    B -->|No| C[Fix & Redeploy to Staging]
    B -->|Yes| D[Deploy Canary - 5% Traffic]
    D --> E{Error Rate Normal?}
    E -->|No| F[ROLLBACK: Remove Canary]
    E -->|Yes| G[Expand to 25% Traffic]
    G --> H{Latency & Errors OK?}
    H -->|No| F
    H -->|Yes| I[Expand to 100% Traffic]
    I --> J{30-min Bake Period OK?}
    J -->|No| K[ROLLBACK: Blue-Green Switch]
    J -->|Yes| L[Deployment Complete ✓]
    F --> M[Investigate & Fix]
    K --> M
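The decision points in the flowchart reduce to a loop over traffic stages with a health check at each one. In this sketch the 1% error threshold and the `error_rate_at` callback are assumptions; a real pipeline would query its monitoring system and would also check latency.

```python
from typing import Callable

STAGES = [5, 25, 100]    # traffic percentages taken from the flowchart
MAX_ERROR_RATE = 0.01    # assumed threshold; tune to your own baseline


def rollout(error_rate_at: Callable[[int], float]) -> tuple[str, int]:
    """Walk the stages; return (outcome, last traffic percentage reached)."""
    for percent in STAGES:
        if error_rate_at(percent) > MAX_ERROR_RATE:
            # Before full traffic the canary is simply removed; after it,
            # the whole fleet is switched back to the previous version.
            if percent == 100:
                return "rollback: blue-green switch", percent
            return "rollback: remove canary", percent
    return "complete", 100


print(rollout(lambda percent: 0.002))                            # healthy at every stage
print(rollout(lambda percent: 0.05 if percent >= 25 else 0.002))  # degrades at 25%
```

Encoding the gates in the pipeline means the rollback decision does not depend on someone watching a dashboard at the right moment.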

Migration & Cutover Strategies

When replacing an existing system (or upgrading a major component), the cutover strategy determines how users transition from old to new. Each approach has different risk, cost, and complexity profiles.

Approaches Compared

Strategy	Description	Risk	Cost
Parallel Running	Run old and new simultaneously, compare outputs	Lowest — fallback is always available	Highest — double infrastructure cost
Phased Rollout	Migrate users/features in batches	Medium — blast radius is controlled	Medium — both systems maintained during transition
Big Bang Cutover	Switch everyone at once (usually during maintenance window)	Highest — no partial rollback possible	Lowest — clean break, no dual maintenance
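Parallel running is the easiest of the three to picture in code: the old system keeps answering users while the new one is exercised on the same inputs and any disagreement is recorded. The two pricing functions below are invented for illustration.

```python
def shadow_compare(requests, old_impl, new_impl):
    """Serve from old_impl, run new_impl in the shadow, and collect mismatches."""
    responses, mismatches = [], []
    for req in requests:
        expected = old_impl(req)
        responses.append(expected)          # users only ever see the old result
        try:
            actual = new_impl(req)
        except Exception as exc:            # a crash in the new system must not reach users
            actual = f"error: {exc!r}"
        if actual != expected:
            mismatches.append((req, expected, actual))
    return responses, mismatches


def old_price_with_tax(cents: int) -> int:
    return cents * 120 // 100               # legacy behaviour: truncates


def new_price_with_tax(cents: int) -> int:
    return round(cents * 1.2)                # new behaviour: rounds


responses, mismatches = shadow_compare([10, 5, 3], old_price_with_tax, new_price_with_tax)
print(responses)
print(mismatches)
```

Each recorded mismatch is either a bug in the new system or an undocumented behaviour of the old one. Both are worth knowing before cutover.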

Data Migration Challenges

Data migration is where most system replacements fail. Common challenges:

Schema mismatch — Old system's data model doesn't map cleanly to the new one
Data quality issues — Legacy data contains nulls, duplicates, and inconsistencies that the new system rejects
Volume — Moving terabytes of data takes time; the system can't be offline for hours
Consistency — Users continue writing to the old system while migration runs
Rollback complexity — If migration fails halfway, both systems have partial data

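A robust approach combines a bulk backfill with Change Data Capture (CDC): migrate historical data in bulk, then stream live changes from the old system to the new one until you are ready to cut over. Dual writes, where the application itself writes to both systems, are an alternative, but the two stores can drift apart if one of the writes fails.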
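The bulk-copy-then-stream sequence can be sketched with an in-memory change log. This is a toy model: `OldSystem` and its log are hypothetical stand-ins for a database and its replication stream, and real CDC tooling also has to deal with ordering, retries, and schema changes.

```python
from typing import Optional

Change = tuple[str, int, Optional[str]]   # (operation, key, value)


class OldSystem:
    """Toy source of truth that appends every write to a change log."""

    def __init__(self) -> None:
        self.rows: dict[int, str] = {}
        self.log: list[Change] = []

    def upsert(self, key: int, value: str) -> None:
        self.rows[key] = value
        self.log.append(("upsert", key, value))

    def delete(self, key: int) -> None:
        self.rows.pop(key, None)
        self.log.append(("delete", key, None))


def apply_changes(target: dict, log: list[Change], start: int) -> int:
    """Replay log entries from `start` onwards; return the new log position."""
    for op, key, value in log[start:]:
        if op == "upsert":
            target[key] = value
        else:
            target.pop(key, None)
    return len(log)


old = OldSystem()
old.upsert(1, "alice")
old.upsert(2, "bob")

position = len(old.log)       # 1. remember the log position
new_rows = dict(old.rows)     # 2. bulk-copy a snapshot
old.upsert(3, "carol")        #    writes keep arriving during the copy
old.delete(1)
position = apply_changes(new_rows, old.log, position)   # 3. catch up from the log

print(new_rows == old.rows)
```

Recording the log position before taking the snapshot is what keeps writes made during the copy from being lost. Replaying an upsert or delete twice is harmless, so overlapping slightly is safe.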

Backward Compatibility

During any migration period, you must maintain backward compatibility for:

APIs — Old clients must still work. Version your APIs and maintain old versions during transition
Data formats — Messages in queues, files on disk, cached data must be readable by both old and new code
Contracts — Downstream systems that depend on your output must not break

                            
Rule of Thumb: Never ship a breaking change in a single step. First deploy code that handles both old and new formats. Then migrate the data. Then remove support for the old format in a later deploy. This "expand-contract" approach lets each step be deployed and rolled back independently, so the migration does not need a downtime window.
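The first step, code that handles both formats, is often just a tolerant reader. The two message shapes below are assumptions made up for this sketch.

```python
import json


def parse_order(raw: str) -> dict:
    """Accept both the old and the new message shape during the transition.

    Old (assumed): {"customer": "Ada Lovelace", "total": 12.5}
    New (assumed): {"version": 2, "customer": {"name": "Ada Lovelace"}, "total_cents": 1250}
    """
    msg = json.loads(raw)
    if msg.get("version", 1) >= 2:
        return {
            "customer_name": msg["customer"]["name"],
            "total_cents": msg["total_cents"],
        }
    # Messages without a version field are treated as the old format.
    return {
        "customer_name": msg["customer"],
        "total_cents": round(msg["total"] * 100),
    }


old_message = '{"customer": "Ada Lovelace", "total": 12.5}'
new_message = '{"version": 2, "customer": {"name": "Ada Lovelace"}, "total_cents": 1250}'
print(parse_order(old_message) == parse_order(new_message))
```

Once every producer emits the new shape and no old messages remain in queues or caches, the fallback branch can be deleted in its own deploy.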
                        

Exercises

Exercise 1

Build-vs-Buy Analysis

Your team needs an email notification system. You've identified three options: (a) Build a custom email service with templates, scheduling, and tracking, (b) Buy SendGrid/Mailgun at $0.001/email, or (c) Use an open-source solution like Postal. You send ~500,000 emails/month. Perform a 3-year TCO analysis for all three options, considering your team has 2 backend engineers with no email delivery expertise.


Exercise 2

Technical Debt Inventory

Take a codebase you work with (or an open-source project). Identify 5 examples of technical debt. For each, classify it in Fowler's quadrant (deliberate/inadvertent × reckless/prudent), estimate the "interest" it charges per sprint, and propose a repayment plan with effort estimate.


Exercise 3

Deployment Runbook

Write a deployment runbook for a web application with a PostgreSQL database. The deployment includes a schema migration (adding a new table and a foreign key to an existing table). Include: pre-deployment checks, deployment steps, verification steps, rollback procedure, and communication plan. Assume 15 minutes of acceptable downtime.


Exercise 4

Strangler Fig Migration Plan

You have a legacy monolithic e-commerce system handling: product catalogue, shopping cart, checkout, payments, and order history. Design a strangler fig migration plan. Which module do you migrate first and why? What does the proxy layer look like? How do you handle shared database tables? Create a 6-month timeline with phases.


Conclusion & Next Steps

Implementation is where the rubber meets the road. The decisions covered in this article — build vs buy, incremental vs big bang, coding standards, technical debt management, and deployment planning — form the bridge between architecture (what we want to build) and operations (what is actually running in production).

Key takeaways:

Build vs buy is not binary — It's a spectrum from fully custom to fully vendor-managed, with open source in between
TCO matters more than initial cost — Include maintenance, opportunity cost, and exit cost in every analysis
Incremental beats big bang — Almost always. The strangler fig pattern is your friend
Technical debt is a tool — Used wisely (prudent, deliberate) it accelerates delivery. Left unmanaged, it kills velocity
Deployment planning starts at design time — Not after code is written
Rollback is non-negotiable — Every deployment must have a tested rollback path

Next in the Series

In Part 9: Git & Version Control Foundations, we'll master the tool that makes all of this possible — Git. From first principles through branching, merging, rebasing, and conflict resolution, you'll build a deep understanding of the distributed version control system that underpins modern software delivery.
