Part 8: Infrastructure as Code

The IaC Revolution

Imagine you need to build an identical copy of your production environment for a new development team. With traditional infrastructure management, this might take weeks of manual work — logging into consoles, clicking through wizards, running commands from memory, and praying you didn't miss a security group rule. With Infrastructure as Code, you run a single command and have an identical environment in minutes.

                            
Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes or interactive tools. It applies software engineering practices (version control, code review, testing, and CI/CD) to infrastructure management.
                        

Why IaC Exists: The Problems It Solves

Before IaC, infrastructure was managed through a combination of manual processes that introduced serious problems:

Problems with Manual Infrastructure Management

flowchart TD
    A[Manual Infrastructure] --> B[Snowflake Servers]
    A --> C[Configuration Drift]
    A --> D[Tribal Knowledge]
    A --> E[Slow Provisioning]
    A --> F[No Audit Trail]
    B --> G[Every server is unique]
    C --> H[Environments diverge over time]
    D --> I[Only one person knows how it works]
    E --> J[Days/weeks to provision]
    F --> K[Who changed what and when?]
    style A fill:#BF092F,color:#fff
    style B fill:#132440,color:#fff
    style C fill:#132440,color:#fff
    style D fill:#132440,color:#fff
    style E fill:#132440,color:#fff
    style F fill:#132440,color:#fff

                            
Snowflake Servers: When infrastructure is configured manually, each server becomes unique, like a snowflake. Reproducing one is nearly impossible because nobody documented every patch, tweak, and hotfix applied over months or years.
                        

Problem	Manual Approach	IaC Solution
Reproducibility	Follow a wiki/runbook (often outdated)	Run code — same result every time
Consistency	Humans make mistakes	Code is deterministic
Speed	Hours/days per environment	Minutes per environment
Documentation	Separate docs (drift from reality)	Code IS the documentation
Collaboration	Screen sharing, tribal knowledge	Code reviews, pull requests
Disaster Recovery	Rebuild from memory/backups	Re-apply code to new region
Audit Trail	Manual change logs (if any)	Git history shows every change

Key Benefits of IaC

IaC Benefits Ecosystem

mindmap
  root((IaC Benefits))
    Version Control
      Git history
      Branching
      Code review
      Rollback
    Reproducibility
      Identical environments
      Disaster recovery
      Multi-region
      Dev/Staging/Prod parity
    Automation
      CI/CD pipelines
      Self-service
      Scheduled deployments
      Auto-scaling rules
    Collaboration
      Pull requests
      Team ownership
      Knowledge sharing
      Onboarding
    Testing
      Validation
      Linting
      Security scanning
      Cost estimation

                            
The Source Code Analogy: Just as application source code defines how your software behaves, IaC defines how your infrastructure behaves. You write it, review it, test it, version it, and deploy it through automated pipelines, exactly like application code.
                        

Declarative vs Imperative

The most fundamental decision in IaC is choosing between two paradigms: declarative (describe the desired end state) and imperative (describe the steps to reach that state). This distinction shapes everything from how you think about infrastructure to which tools you choose.

Declarative: Describe WHAT You Want

In the declarative approach, you define the desired state of your infrastructure. The tool figures out how to make reality match that state. You say "I want 3 web servers behind a load balancer" without specifying the exact sequence of API calls.

# Declarative (Terraform HCL) — WHAT you want
# The tool determines HOW to create/modify/delete resources

resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server-${count.index + 1}"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_lb" "web" {
  name               = "web-load-balancer"
  internal           = false
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
}

Imperative: Describe HOW to Get There

In the imperative approach, you write the exact sequence of steps (commands, API calls) that should execute. You have full control over ordering and logic but must handle edge cases yourself.

#!/bin/bash
# Imperative (Bash script): HOW to do it
# You specify every step and handle edge cases
set -euo pipefail   # stop on the first failed command or unset variable

# Step 1: Create instances
INSTANCE_IDS=()
for i in 1 2 3; do
  INSTANCE_ID=$(aws ec2 run-instances \
    --image-id ami-0c55b159cbfafe1f0 \
    --instance-type t3.micro \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=web-server-$i}]" \
    --query 'Instances[0].InstanceId' \
    --output text)
  echo "Created instance: $INSTANCE_ID"
  INSTANCE_IDS+=("$INSTANCE_ID")
done

# Step 2: Wait for instances to be running
aws ec2 wait instance-running --instance-ids "${INSTANCE_IDS[@]}"

# Step 3: Create load balancer
LB_ARN=$(aws elbv2 create-load-balancer \
  --name web-load-balancer \
  --subnets subnet-abc123 subnet-def456 \
  --type application \
  --query 'LoadBalancers[0].LoadBalancerArn' \
  --output text)
echo "Created load balancer: $LB_ARN"

Declarative vs Imperative Workflow

flowchart LR
    subgraph Declarative
        D1[Define Desired State] --> D2[Tool Reads Current State]
        D2 --> D3[Tool Calculates Diff]
        D3 --> D4[Tool Applies Changes]
        D4 --> D5[Desired = Current]
    end
    subgraph Imperative
        I1[Write Step 1] --> I2[Write Step 2]
        I2 --> I3[Write Step 3]
        I3 --> I4[Handle Errors]
        I4 --> I5[Hope State is Correct]
    end
    style D5 fill:#3B9797,color:#fff
    style I5 fill:#BF092F,color:#fff

When to Use Each Approach

Aspect	Declarative	Imperative
You specify	Desired end state	Step-by-step instructions
Idempotent?	Yes (by design)	Must be manually ensured
Ordering	Tool determines order	You control order
Drift handling	Detected and corrected on the next plan/apply (re-converge)	Must detect and fix yourself
Learning curve	Domain-specific language	Familiar scripting languages
Flexibility	Limited to tool’s capabilities	Unlimited (any logic)
Best for	Cloud resources, repeatable infra	Complex orchestration, migrations
Examples	Terraform, CloudFormation, Bicep, K8s YAML	Bash, Python scripts, Ansible playbooks; Pulumi code (imperative syntax, declarative engine)
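The "Tool determines order" row can be made concrete with a small sketch. Declarative tools typically build a dependency graph from references between resources and walk it in topological order. The graph below is a hand-written assumption for illustration, not output from any real tool, and `creation_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each resource maps to the set of
# resources it depends on (names chosen for illustration only).
dependencies = {
    "aws_vpc.main": set(),
    "aws_subnet.public": {"aws_vpc.main"},
    "aws_lb.web": {"aws_subnet.public"},
    "aws_instance.web": {"aws_subnet.public"},
}


def creation_order(graph):
    """Return resource names so every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = creation_order(dependencies)
print(order)
```

The VPC always comes first and the subnet before anything placed in it. The relative order of the load balancer and the instances is not fixed, since neither depends on the other, which is also why a tool may create them in parallel.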

                            
Real-world insight: Most mature organizations use both paradigms: declarative for provisioning cloud resources (VPCs, databases, load balancers) and imperative for complex orchestration tasks (database migrations, rolling deployments, one-time scripts).
                        

The IaC Landscape

The IaC ecosystem has matured significantly, with specialized tools for different use cases. Understanding the landscape helps you choose the right tool for your situation.

Terraform (HashiCorp)

Terraform is the most widely adopted multi-cloud IaC tool. It uses HashiCorp Configuration Language (HCL) — a declarative language designed specifically for infrastructure definition.

# Example: Terraform configuration for AWS VPC
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "main-vpc"
    Environment = "production"
  }
}

AWS CloudFormation

CloudFormation is AWS's native IaC service. It uses JSON or YAML templates with deep integration into AWS services, offering features like change sets, drift detection, and StackSets for multi-account deployment.

# Example: CloudFormation YAML template
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple VPC with public subnet

Resources:
  MainVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: main-vpc
        - Key: Environment
          Value: production

  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MainVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: public-subnet-1

Outputs:
  VpcId:
    Description: VPC ID
    Value: !Ref MainVPC
    Export:
      Name: MainVpcId

Azure Bicep

Bicep is Azure's domain-specific language for deploying Azure resources. It compiles down to ARM (Azure Resource Manager) templates but with a much cleaner syntax.

// Example: Azure Bicep for a Storage Account
@description('Location for all resources')
param location string = resourceGroup().location

@description('Storage account name')
param storageAccountName string = 'st${uniqueString(resourceGroup().id)}'

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    minimumTlsVersion: 'TLS1_2'
    supportsHttpsTrafficOnly: true
    accessTier: 'Hot'
  }
}

output storageAccountId string = storageAccount.id
output primaryEndpoint string = storageAccount.properties.primaryEndpoints.blob

Pulumi

Pulumi takes a different approach — you write IaC using general-purpose programming languages (Python, TypeScript, Go, C#). This gives you full access to loops, conditionals, type systems, and testing frameworks.

# Example: Pulumi with Python (conceptual)
# Install: pip install pulumi pulumi-aws
# Initialize: pulumi new aws-python

# __main__.py
import pulumi
import pulumi_aws as aws

# Create a VPC using familiar Python
vpc = aws.ec2.Vpc("main-vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    tags={"Name": "main-vpc", "Environment": "production"}
)

# Use Python loops for multiple subnets
subnets = []
for i, az in enumerate(["us-east-1a", "us-east-1b", "us-east-1c"]):
    subnet = aws.ec2.Subnet(f"subnet-{i}",
        vpc_id=vpc.id,
        cidr_block=f"10.0.{i+1}.0/24",
        availability_zone=az,
        tags={"Name": f"subnet-{az}"}
    )
    subnets.append(subnet)

# Export outputs
pulumi.export("vpc_id", vpc.id)
pulumi.export("subnet_ids", [s.id for s in subnets])

Ansible

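Ansible is primarily a configuration management and orchestration tool. Its YAML playbooks run tasks in the order written, which makes the overall flow imperative, although most individual modules are idempotent (for example, state: present). It's agentless (connecting over SSH by default) and excels at configuring servers after they're provisioned.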

# Example: Ansible playbook for server configuration
---
- name: Configure web servers
  hosts: webservers
  become: yes
  vars:
    app_port: 8080
    nginx_version: "1.24"

  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install nginx
      apt:
        name: "nginx={{ nginx_version }}*"
        state: present

    - name: Configure nginx virtual host
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/app
        mode: '0644'
      notify: Restart nginx

    - name: Enable site
      file:
        src: /etc/nginx/sites-available/app
        dest: /etc/nginx/sites-enabled/app
        state: link

  handlers:
    - name: Restart nginx
      systemd:
        name: nginx
        state: restarted
        enabled: yes

Tool Comparison

Tool	Language	Cloud Support	State	Paradigm	Best For
Terraform	HCL	Multi-cloud (1000+ providers)	State file	Declarative	Multi-cloud infrastructure
CloudFormation	YAML/JSON	AWS only	Managed by AWS	Declarative	AWS-native shops
Bicep	Bicep DSL	Azure only	Managed by Azure	Declarative	Azure-native shops
Pulumi	Python/TS/Go/C#	Multi-cloud	Pulumi Cloud or self-managed	Imperative*	Devs preferring real languages
Ansible	YAML	Multi-cloud + on-prem	Stateless	Imperative	Configuration management
CDK (AWS)	Python/TS/Java	AWS only	Managed by AWS	Imperative*	Complex AWS patterns

* Pulumi and CDK define infrastructure imperatively in code but generate declarative state that converges on desired outcomes.

Core IaC Concepts

Regardless of which tool you choose, these foundational concepts underpin all IaC systems.

Desired State vs Current State

Declarative IaC tools work by comparing two things: what you want (desired state, defined in code) and what exists (current state, read from the provider's APIs and, for some tools, a recorded state file). The tool then computes the set of changes needed to make current match desired.

Desired State Reconciliation

flowchart TD
    A[Your Code<br/>Desired State] --> C{IaC Engine}
    B[Cloud APIs<br/>Current State] --> C
    C --> D{Differences?}
    D -->|Yes| E[Generate Plan]
    D -->|No| F[No Changes Needed]
    E --> G[Create Resources]
    E --> H[Update Resources]
    E --> I[Delete Resources]
    G --> J[State Converged ✓]
    H --> J
    I --> J
    F --> J
    style A fill:#3B9797,color:#fff
    style B fill:#16476A,color:#fff
    style J fill:#3B9797,color:#fff

Idempotency

Idempotency means running the same operation multiple times produces the same result. This is perhaps the most important property of declarative IaC — you can safely re-run your code without fear of creating duplicate resources or breaking things.

# Idempotent behavior example:
# Running terraform apply multiple times

$ terraform apply    # First run: Creates 3 servers, 1 VPC, 1 LB
# Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

$ terraform apply    # Second run: Nothing to do!
# Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

$ terraform apply    # Third run: Still nothing!
# Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

# Non-idempotent (imperative) behavior:
$ bash create-servers.sh    # Creates 3 servers
$ bash create-servers.sh    # Creates 3 MORE servers (6 total!)
$ bash create-servers.sh    # Creates 3 MORE servers (9 total!)

                            
Why idempotency matters: In production, things fail: network timeouts, partial deployments, interrupted applies. With idempotent tools, you re-run and the tool converges on the desired state from wherever the last run stopped. With imperative scripts, you need complex error handling and rollback logic.
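A minimal sketch of the difference, using an in-memory stand-in for a cloud API. `FakeCloud` and its methods are hypothetical names for illustration only.

```python
class FakeCloud:
    """In-memory stand-in for a cloud API (hypothetical, for illustration)."""

    def __init__(self):
        self.servers = []

    def create_server(self, name):
        # Imperative "create": always adds a server, even if the name exists.
        self.servers.append(name)

    def ensure_server(self, name):
        # Idempotent "ensure": acts only when the desired server is missing.
        if name not in self.servers:
            self.servers.append(name)


imperative, declarative = FakeCloud(), FakeCloud()
for _ in range(3):
    imperative.create_server("web-1")
    declarative.ensure_server("web-1")

print(len(imperative.servers), len(declarative.servers))  # 3 1
```

Running the "ensure" version once or three times leaves the same result, which is the property that makes re-running after a failure safe.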
                        

Plan/Apply Workflow

Most IaC tools separate planning from execution. The plan phase shows you what changes will occur, and the apply phase executes them. This checkpoint lets you catch unintended modifications before they touch real infrastructure.

# The Plan/Apply workflow in Terraform:

# Step 1: PLAN — Preview changes (safe, read-only)
$ terraform plan
# Terraform will perform the following actions:
#
#   # aws_instance.web[0] will be created
#   + resource "aws_instance" "web" {
#       + ami           = "ami-0c55b159cbfafe1f0"
#       + instance_type = "t3.micro"
#       + tags          = {
#           + "Name" = "web-server-1"
#         }
#     }
#
# Plan: 3 to add, 0 to change, 0 to destroy.

# Step 2: REVIEW — Team examines the plan (human checkpoint)

# Step 3: APPLY — Execute the plan (makes real changes)
$ terraform apply
# Do you want to perform these actions?
#   Enter a value: yes
# aws_instance.web[0]: Creating...
# aws_instance.web[0]: Creation complete after 32s
# Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
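The separation of a read-only plan from a mutating apply can be modelled as two functions. The sketch below is a toy model, not how Terraform is implemented. The `State` and `Plan` classes and the serial-number check are assumptions, chosen to show why a saved plan should be rejected if the state changed after it was produced.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    serial: int = 0                      # bumped on every successful apply
    resources: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Plan:
    based_on_serial: int                 # state version the plan was computed from
    to_create: tuple


def make_plan(state, desired):
    """Read-only step: list what would be created without changing anything."""
    missing = tuple(name for name in desired if name not in state.resources)
    return Plan(based_on_serial=state.serial, to_create=missing)


def apply_plan(state, plan, desired):
    """Mutating step: refuse to run if the state changed after planning."""
    if plan.based_on_serial != state.serial:
        raise RuntimeError("saved plan is stale; create a new plan")
    for name in plan.to_create:
        state.resources[name] = desired[name]
    state.serial += 1


desired = {"web-1": "t3.micro", "web-2": "t3.micro"}
state = State()
plan = make_plan(state, desired)         # safe to share for review
apply_plan(state, plan, desired)         # executes exactly what was reviewed
print(state.resources, state.serial)
```

Applying the same `plan` object a second time raises `RuntimeError`, because the state serial has moved on. Reviewers therefore approve exactly the change set that gets executed.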

Drift Detection

Configuration drift occurs when the actual state of infrastructure diverges from the defined state — typically caused by manual changes (ClickOps), emergency fixes, or external automation that bypasses IaC.

Configuration Drift Lifecycle

sequenceDiagram
    participant Dev as Developer
    participant Code as IaC Code
    participant Cloud as Cloud Provider
    participant Ops as Ops Engineer

    Dev->>Code: Define desired state
    Code->>Cloud: terraform apply
    Note over Cloud: State matches code ✓

    Ops->>Cloud: Manual change via console!
    Note over Cloud: State DRIFTED from code ✗

    Dev->>Code: terraform plan
    Code-->>Dev: Drift detected!<br/>Instance type changed manually

    Dev->>Code: terraform apply
    Code->>Cloud: Revert to desired state
    Note over Cloud: State matches code ✓

                            
Drift is the enemy of IaC. Once manual changes accumulate, your code no longer represents reality. Enforce policies: no manual changes in production. Use service control policies, RBAC, and monitoring to detect and prevent drift. Some teams run terraform plan on a schedule and alert when drift is detected.
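A scheduled drift check along these lines might look like the sketch below. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is no diff, 1 on error, and 2 when changes are pending. The function names, the status strings, and the injectable `runner` parameter are assumptions made so the logic can be exercised without a Terraform binary.

```python
import subprocess


def run_terraform_plan(workdir):
    """Run `terraform plan -detailed-exitcode` in workdir and return the exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.returncode


def classify(exit_code):
    """Map -detailed-exitcode values to a status: 0 = no diff, 2 = diff, else error."""
    return {0: "in-sync", 2: "drift-or-pending-changes"}.get(exit_code, "error")


def check_drift(workdir, runner=run_terraform_plan):
    """Return a status string; `runner` is injectable so this is testable offline."""
    return classify(runner(workdir))


# Demonstration with a fake runner, so no terraform binary is required here.
status = check_drift("infrastructure/environments/production", runner=lambda _: 2)
print(status)  # drift-or-pending-changes
```

An exit code of 2 only says that code and reality differ. On a branch with unapplied code changes that is expected, so a check like this is most meaningful when run against the already-applied main branch.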
                        

Immutable vs Mutable Infrastructure

Aspect	Mutable (Pets)	Immutable (Cattle)
Update strategy	Modify in-place (patch, upgrade)	Replace entirely (new image/container)
Server identity	Named, long-lived ("db-prod-01")	Disposable, auto-scaled ("instance-abc123")
Configuration drift	Accumulates over time	Minimal (instances are replaced, not modified)
Rollback	Complex (undo changes)	Simple (deploy previous version)
Tools	Ansible, Chef, Puppet	Terraform + Packer, Kubernetes, Docker
Example	SSH in, run apt upgrade	Build new AMI, replace instances

                            
Pets vs Cattle: In the "pets" model, servers are unique and cherished; you nurse them back to health when sick. In the "cattle" model, servers are interchangeable; if one is unhealthy, you terminate it and spin up a replacement. Modern cloud architecture strongly favors the cattle model.
                        

State Management

State is the mechanism by which IaC tools track the relationship between your code and real-world resources. Understanding state is critical for working effectively with tools like Terraform.

What State Is and Why It's Needed

When you write resource "aws_instance" "web" {...} in Terraform, the tool needs to know: does this resource already exist? What's its current ID? What properties does it have? This mapping between code and real resources is stored in the state file.

{
  "version": 4,
  "terraform_version": "1.7.0",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456789",
            "ami": "ami-0c55b159cbfafe1f0",
            "instance_type": "t3.micro",
            "private_ip": "10.0.1.42",
            "public_ip": "54.23.167.89",
            "tags": {
              "Name": "web-server-1"
            }
          }
        }
      ]
    }
  ]
}

                            
State answers critical questions: What resources exist? What are their IDs and attributes? What depends on what? Without state, Terraform would try to create everything from scratch every time, or worse, have no way to update or delete existing resources.
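To make the code-to-resource mapping tangible, the sketch below reads a trimmed state document shaped like the example above and builds a lookup from resource address to real-world ID. It is illustrative only: the `resource_ids` helper is hypothetical, the address index is derived from list position as a simplification, and the state file layout is an internal format that can change between versions.

```python
import json

# A trimmed state document with the same shape as the example above.
STATE_JSON = """
{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {"attributes": {"id": "i-0abc123def456789", "instance_type": "t3.micro"}}
      ]
    }
  ]
}
"""


def resource_ids(state):
    """Map each resource address (type.name[index]) to its recorded ID."""
    mapping = {}
    for res in state.get("resources", []):
        for index, inst in enumerate(res.get("instances", [])):
            address = f'{res["type"]}.{res["name"]}[{index}]'
            mapping[address] = inst["attributes"].get("id")
    return mapping


ids = resource_ids(json.loads(STATE_JSON))
print(ids)  # {'aws_instance.web[0]': 'i-0abc123def456789'}
```

For real tooling, `terraform show -json` is generally a safer interface than parsing the state file directly.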
                        

Remote State Backends

For team environments, state must be stored remotely where everyone can access it. Local state files on a developer's laptop won't work for collaboration.

# Choose ONE backend per configuration; the three blocks below are alternatives.

# AWS S3 Backend with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

# Azure Blob Storage Backend
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "production/network.tfstate"
  }
}

# Google Cloud Storage Backend
terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "production/network"
  }
}

State Locking

When multiple team members might run Terraform simultaneously, state locking prevents concurrent modifications that could corrupt state or create conflicting resources.

State Locking Mechanism

sequenceDiagram
    participant A as Developer A
    participant Lock as Lock Table<br/>(DynamoDB)
    participant State as State File<br/>(S3)
    participant B as Developer B

    A->>Lock: Acquire lock
    Lock-->>A: Lock granted ✓
    A->>State: Read state
    A->>State: Write updated state
    B->>Lock: Acquire lock
    Lock-->>B: Lock DENIED ✗<br/>(already held)
    Note over B: Waits or errors out
    A->>Lock: Release lock
    Lock-->>A: Lock released
    B->>Lock: Acquire lock
    Lock-->>B: Lock granted ✓

Backend	Locking Mechanism	Configuration
AWS S3	DynamoDB table (newer Terraform releases can also use an S3-native lock file)	`dynamodb_table = "terraform-state-locks"`
Azure Blob	Blob lease	Automatic (built-in)
GCS	Lock file object in the bucket	Automatic (built-in)
Terraform Cloud	Managed locking	Automatic (SaaS)
Consul	Session-based locks	`lock = true`
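Whatever the backend, locking comes down to an atomic "write this record only if nobody else has" operation. The sketch below imitates that with an in-memory table. It is not the DynamoDB API or any backend's real implementation, and the `LockTable` class and owner names are hypothetical.

```python
import threading


class LockTable:
    """In-memory stand-in for a lock table with atomic put-if-absent."""

    def __init__(self):
        self._locks = {}
        self._mutex = threading.Lock()   # makes check-and-set atomic

    def acquire(self, state_path, owner):
        with self._mutex:
            if state_path in self._locks:
                return False             # another owner already holds the lock
            self._locks[state_path] = owner
            return True

    def release(self, state_path, owner):
        with self._mutex:
            if self._locks.get(state_path) == owner:
                del self._locks[state_path]
                return True
            return False                 # only the holder may release


table = LockTable()
path = "production/network/terraform.tfstate"
print(table.acquire(path, "developer-a"))  # True
print(table.acquire(path, "developer-b"))  # False
print(table.release(path, "developer-a"))  # True
print(table.acquire(path, "developer-b"))  # True
```

The sequence of calls mirrors the diagram: Developer B is refused while Developer A holds the lock and succeeds once it is released.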

State Security

                            
State files contain secrets! Database passwords, API keys, private IPs, and other sensitive data can appear in plaintext in state files. Always: (1) encrypt state at rest, (2) restrict access with IAM policies, (3) never commit state to Git, (4) enable versioning for recovery.
                        

# Security checklist for remote state:

# 1. Enable encryption at rest
# S3: Server-side encryption with KMS
aws s3api put-bucket-encryption \
  --bucket mycompany-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}]
  }'

# 2. Block public access
aws s3api put-public-access-block \
  --bucket mycompany-terraform-state \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# 3. Enable versioning (for recovery)
aws s3api put-bucket-versioning \
  --bucket mycompany-terraform-state \
  --versioning-configuration Status=Enabled

# 4. Restrict access with bucket policy
# Only allow specific IAM roles to read/write state
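To see why the checklist matters, the following sketch walks a state-shaped document and reports attribute paths whose names suggest secret material. The sample data, the hint list, and the `find_sensitive_paths` helper are all made up for illustration. A real audit would need a much more careful approach.

```python
SENSITIVE_HINTS = ("password", "secret", "token", "private_key")


def find_sensitive_paths(node, path=""):
    """Recursively collect JSON paths whose key names hint at secrets."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            looks_secret = any(hint in key.lower() for hint in SENSITIVE_HINTS)
            if looks_secret and isinstance(value, str):
                hits.append(child)
            hits.extend(find_sensitive_paths(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_sensitive_paths(item, f"{path}[{index}]"))
    return hits


# Hypothetical state fragment with a plaintext database password.
sample_state = {
    "resources": [
        {
            "type": "aws_db_instance",
            "name": "main",
            "instances": [
                {"attributes": {"username": "admin", "password": "hunter2"}}
            ],
        }
    ]
}

found = find_sensitive_paths(sample_state)
print(found)  # ['resources[0].instances[0].attributes.password']
```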

Terraform Introduction

Terraform is widely treated as a de facto standard for multi-cloud IaC. Let's explore its core concepts with a practical example that provisions a small network stack with web servers.

HCL Syntax Basics

HashiCorp Configuration Language (HCL) is designed to be human-readable while being machine-parseable. Its three main constructs are resources, variables, and outputs.

# variables.tf — Input variables (parameterize your config)
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "development"

  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "Environment must be development, staging, or production."
  }
}

variable "instance_count" {
  description = "Number of web server instances"
  type        = number
  default     = 2
}

variable "allowed_cidr_blocks" {
  description = "CIDR blocks allowed to access the application"
  type        = list(string)
  default     = ["10.0.0.0/8"]
}

variable "tags" {
  description = "Common tags for all resources"
  type        = map(string)
  default = {
    ManagedBy = "terraform"
    Team      = "platform"
  }
}

# main.tf — Resource definitions
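# Data sources referenced by the resources below
data "aws_availability_zones" "available" {
  state = "available"
}

# Assumption: latest Amazon Linux 2023 x86_64 AMI; adjust the name filter as needed
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023*-x86_64"]
  }
}

resource "aws_vpc" "main" {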
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
  })
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.environment}-public-${count.index + 1}"
    Tier = "public"
  })
}

resource "aws_instance" "web" {
  count         = var.instance_count
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public[count.index % length(aws_subnet.public)].id

  tags = merge(var.tags, {
    Name = "${var.environment}-web-${count.index + 1}"
    Role = "webserver"
  })
}

# outputs.tf — Output values (expose information)
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of public subnets"
  value       = aws_subnet.public[*].id
}

output "instance_public_ips" {
  description = "Public IPs of web server instances"
  value       = aws_instance.web[*].public_ip
}
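The `cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)` call in main.tf does the subnet arithmetic for you. The Python function below is an approximation of that behaviour written for illustration (it is not Terraform's implementation). It extends the prefix by `newbits` and selects subnet number `netnum`.

```python
import ipaddress


def cidrsubnet(prefix, newbits, netnum):
    """Approximate Terraform's cidrsubnet(): lengthen the prefix, pick subnet netnum."""
    network = ipaddress.ip_network(prefix)
    new_prefix = network.prefixlen + newbits
    if new_prefix > network.max_prefixlen:
        raise ValueError("not enough address bits left for newbits")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    host_bits = network.max_prefixlen - new_prefix
    base = int(network.network_address) + (netnum << host_bits)
    return str(ipaddress.ip_network((base, new_prefix)))


# With a /16 VPC and newbits = 8, each count.index yields one /24.
for index in range(2):
    print(cidrsubnet("10.0.0.0/16", 8, index))
# 10.0.0.0/24
# 10.0.1.0/24
```

With `count = 2`, the two public subnets therefore receive 10.0.0.0/24 and 10.0.1.0/24, carved from the VPC's 10.0.0.0/16 range.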

Providers

Providers are plugins that Terraform uses to interact with cloud platforms, SaaS services, and other APIs. Each provider offers a set of resources and data sources.

Provider	Purpose	Resources	Registry
`hashicorp/aws`	Amazon Web Services	1,300+	registry.terraform.io/providers/hashicorp/aws
`hashicorp/azurerm`	Microsoft Azure	900+	registry.terraform.io/providers/hashicorp/azurerm
`hashicorp/google`	Google Cloud Platform	800+	registry.terraform.io/providers/hashicorp/google
`hashicorp/kubernetes`	Kubernetes clusters	50+	registry.terraform.io/providers/hashicorp/kubernetes
`integrations/github`	GitHub repos/teams	40+	registry.terraform.io/providers/integrations/github
`cloudflare/cloudflare`	Cloudflare DNS/CDN	60+	registry.terraform.io/providers/cloudflare/cloudflare

Resource Lifecycle

Terraform Resource Lifecycle

stateDiagram-v2
    [*] --> Planned: terraform plan
    Planned --> Creating: terraform apply
    Creating --> Created: API success
    Created --> Updating: Code changed + apply
    Updating --> Created: Update complete
    Created --> Destroying: Resource removed from code
    Destroying --> [*]: Destroyed

    Created --> Tainted: Manual taint
    Tainted --> Creating: terraform apply (recreate)

    note right of Created: Normal steady state
    note right of Tainted: Marked for recreation

Workflow Commands

# Complete Terraform workflow from scratch:

# 1. Initialize — download providers and modules
terraform init
# Initializing the backend...
# Initializing provider plugins...
# - Finding hashicorp/aws versions matching "~> 5.0"...
# - Installing hashicorp/aws v5.31.0...
# Terraform has been successfully initialized!

# 2. Format — consistent code style
terraform fmt -recursive
# main.tf
# variables.tf

# 3. Validate — check syntax and internal consistency
terraform validate
# Success! The configuration is valid.

# 4. Plan — preview what will change
terraform plan -out=tfplan
# Plan: 5 to add, 0 to change, 0 to destroy.
# Saved the plan to: tfplan

# 5. Apply — execute the plan
terraform apply tfplan
# Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

# 6. Show — inspect current state
terraform show

# 7. Destroy — tear everything down (careful!)
terraform destroy
# Plan: 0 to add, 0 to change, 5 to destroy.
# Do you really want to destroy all resources?

Modules & Reusability

As your infrastructure grows, you'll find yourself repeating patterns — every application needs a VPC, subnets, security groups, and a load balancer. Modules let you package these patterns into reusable components.

Module Structure

# Standard module directory structure:
modules/
└── vpc/
    ├── main.tf          # Resource definitions
    ├── variables.tf     # Input variables (module parameters)
    ├── outputs.tf       # Output values (what the module exposes)
    ├── versions.tf      # Provider version constraints
    └── README.md        # Usage documentation

# modules/vpc/variables.tf
variable "name" {
  description = "Name prefix for VPC resources"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "public_subnet_count" {
  description = "Number of public subnets"
  type        = number
  default     = 2
}

variable "environment" {
  description = "Environment tag"
  type        = string
}

# modules/vpc/main.tf
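# Availability zones used to spread the public subnets
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "this" {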
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.name}-vpc"
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  count             = var.public_subnet_count
  vpc_id            = aws_vpc.this.id
  cidr_block        = cidrsubnet(var.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "${var.name}-public-${count.index + 1}"
    Tier = "public"
  }
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id

  tags = {
    Name = "${var.name}-igw"
  }
}

# modules/vpc/outputs.tf
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.this.id
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "internet_gateway_id" {
  description = "The ID of the Internet Gateway"
  value       = aws_internet_gateway.this.id
}

Using Modules

# Using your custom module:
module "production_vpc" {
  source = "./modules/vpc"

  name                = "prod"
  cidr_block          = "10.0.0.0/16"
  public_subnet_count = 3
  environment         = "production"
}

module "staging_vpc" {
  source = "./modules/vpc"

  name                = "staging"
  cidr_block          = "10.1.0.0/16"
  public_subnet_count = 2
  environment         = "staging"
}

# Using a public registry module:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.4.0"

  name = "my-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}

# Reference module outputs:
resource "aws_instance" "web" {
  subnet_id = module.production_vpc.public_subnet_ids[0]
  # ...
}
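One way to think about a module is as a function: variables are its parameters, outputs are its return value, and each `module` block is a call with different arguments. The Python analogy below is only a mental model under that assumption. `vpc_module` is a hypothetical function that computes names and CIDR ranges and creates nothing.

```python
import ipaddress
from itertools import islice


def vpc_module(name, environment, cidr_block="10.0.0.0/16", public_subnet_count=2):
    """Analogy for modules/vpc: inputs in, computed outputs back."""
    network = ipaddress.ip_network(cidr_block)
    # Mirror cidrsubnet(var.cidr_block, 8, count.index) for each subnet index.
    subnets = islice(network.subnets(prefixlen_diff=8), public_subnet_count)
    return {
        "vpc_name": f"{name}-vpc",
        "environment": environment,
        "public_subnet_cidrs": [str(subnet) for subnet in subnets],
    }


# Two "calls" with different arguments, like the two module blocks above.
production_vpc = vpc_module("prod", "production", public_subnet_count=3)
staging_vpc = vpc_module("staging", "staging", cidr_block="10.1.0.0/16")

print(production_vpc["public_subnet_cidrs"])
# ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24']
print(staging_vpc["public_subnet_cidrs"])
# ['10.1.0.0/24', '10.1.1.0/24']
```

As with functions, defaults keep call sites short, and callers depend only on the declared outputs rather than on the module's internals.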

When to Create Modules

                            
Module creation guidelines:
Create a module when you repeat the same pattern 3+ times across projects
Don't create a module for a single resource; that adds complexity without benefit
Keep modules focused: a "vpc" module, not an "everything-for-my-app" module
Version your modules using Git tags so consumers can pin versions
Document inputs/outputs; a module without docs is a liability

                        

IaC Best Practices

Version Control Everything

# .gitignore for Terraform projects
# Never commit these files:

# Local state files (use remote backends!)
*.tfstate
*.tfstate.*

# Note: DO commit .terraform.lock.hcl. It is the dependency lock file that
# pins provider versions and checksums, not state lock data.

# Terraform working directories
.terraform/

# Variable files that may contain secrets
# (add a "!" exception for any .tfvars file that holds only non-secret values)
*.tfvars
!example.tfvars

# Plan files (may contain sensitive data)
*.tfplan

# Crash log files
crash.log
crash.*.log

# Override files (local dev only)
override.tf
override.tf.json
*_override.tf
*_override.tf.json

Environment Separation

# Recommended directory structure for multi-environment:
infrastructure/
├── modules/                    # Reusable modules
│   ├── vpc/
│   ├── ecs-cluster/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf            # Uses modules with dev params
│   │   ├── variables.tf
│   │   ├── terraform.tfvars   # Dev-specific values
│   │   └── backend.tf         # Points to dev state bucket
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── backend.tf
└── global/                     # Shared resources (IAM, DNS)
    ├── iam/
    └── route53/

CI/CD for Infrastructure

IaC CI/CD Pipeline

flowchart LR
    A[Developer<br/>Push Code] --> B[PR Created]
    B --> C[CI Pipeline]
    C --> D[terraform fmt<br/>check]
    D --> E[terraform validate]
    E --> F[terraform plan]
    F --> G[Security Scan<br/>tfsec/checkov]
    G --> H[Cost Estimate<br/>infracost]
    H --> I[Plan Comment<br/>on PR]
    I --> J{Approval?}
    J -->|Yes| K[Merge to main]
    K --> L[CD Pipeline]
    L --> M[terraform apply<br/>-auto-approve]
    M --> N[Post-deploy<br/>verification]
    J -->|No| O[Request Changes]
    style A fill:#3B9797,color:#fff
    style M fill:#132440,color:#fff
    style N fill:#3B9797,color:#fff

# Example: GitHub Actions CI/CD for Terraform
name: Terraform CI/CD

on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/environments/production

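      - name: Terraform Format Check
        run: terraform fmt -check -recursive
        working-directory: infrastructure

      # Validate must run in the directory that was initialized above
      - name: Terraform Validate
        run: terraform validate
        working-directory: infrastructure/environments/production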

      - name: Terraform Plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/environments/production

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            // Post plan output as PR comment

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0  # match the version used in the plan job

      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/environments/production

      - name: Terraform Apply
        run: terraform apply -auto-approve
        working-directory: infrastructure/environments/production

Testing Infrastructure Code

Testing Level	Tool	What It Checks	Speed
Syntax	`terraform validate`	HCL syntax, type errors	Instant
Formatting	`terraform fmt -check`	Consistent code style	Instant
Linting	tflint	Best practices, deprecated features	Seconds
Security	tfsec, Checkov, Trivy	Security misconfigurations	Seconds
Cost	Infracost	Cost impact of changes	Seconds
Policy	OPA/Sentinel	Organizational policies	Seconds
Integration	Terratest (Go)	Actually deploys & verifies	Minutes

# Running IaC tests in CI:

# 1. Format check (fails if code isn't formatted)
terraform fmt -check -recursive -diff

# 2. Validate syntax
terraform init -backend=false
terraform validate

# 3. Lint with tflint
tflint --init
tflint --recursive

# 4. Security scan with tfsec
tfsec . --format json --out results.json

# 5. Security scan with Checkov
checkov -d . --framework terraform

# 6. Cost estimation with Infracost
infracost breakdown --path . --format table
# NAME                     MONTHLY QTY  UNIT       MONTHLY COST
# aws_instance.web[0]
# ├─ Instance usage           730  hours         $7.59
# └─ root_block_device
#    └─ Storage (gp3)          20  GB            $1.60
# OVERALL TOTAL                                  $9.19
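
The `tflint --init` step above installs whatever plugins the TFLint configuration file declares. A minimal `.tflint.hcl` might look like the sketch below. The AWS ruleset version shown is an assumption; pin whichever release you have vetted.

```hcl
# .tflint.hcl, placed in the directory where tflint is run

# Rules about Terraform language usage; this plugin is bundled with TFLint.
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# AWS-specific rules (for example invalid instance types).
# `tflint --init` downloads this plugin from the source below.
plugin "aws" {
  enabled = true
  version = "0.30.0" # assumed version, replace with a current release
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}
```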

Hands-On Exercises

Exercise 1 Difficulty: Beginner

Write Your First Terraform Configuration

Create a Terraform config that manages a local file (no cloud account needed). This uses the local provider to demonstrate the full IaC lifecycle without any cloud costs.

# exercise1/main.tf
# Install Terraform, then run: terraform init && terraform apply

terraform {
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "~> 2.0"
    }
  }
}

resource "local_file" "hello" {
  content  = "Hello, Infrastructure as Code!\nManaged by Terraform."
  filename = "${path.module}/output/hello.txt"
}

resource "local_file" "config" {
  content = jsonencode({
    app_name    = "my-app"
    environment = "development"
    version     = "1.0.0"
    features    = ["logging", "metrics"]
  })
  filename = "${path.module}/output/config.json"
}

output "hello_file_path" {
  value = local_file.hello.filename
}

output "config_file_path" {
  value = local_file.config.filename
}

Tasks: (1) Run terraform init, (2) Run terraform plan and examine the output, (3) Run terraform apply, (4) Verify files were created, (5) Modify content and re-apply, (6) Run terraform destroy.

Tags: terraform, beginner, local provider

Exercise 2 Difficulty: Intermediate

Declarative vs Imperative Comparison

Implement the same task using both paradigms and observe the differences in behavior, especially around idempotency and error handling.

Task: Create a directory structure with 3 config files.

#!/bin/bash
# imperative-approach.sh
# Run: chmod +x imperative-approach.sh && ./imperative-approach.sh

# Imperative: explicit steps. Re-runs are safe here only because of
# `mkdir -p` and `>`; the script keeps no state and tracks no drift.
mkdir -p /tmp/iac-exercise/configs

echo '{"service": "api", "port": 8080}' > /tmp/iac-exercise/configs/api.json
echo '{"service": "web", "port": 3000}' > /tmp/iac-exercise/configs/web.json
echo '{"service": "worker", "port": 9090}' > /tmp/iac-exercise/configs/worker.json

echo "Created 3 config files"
ls -la /tmp/iac-exercise/configs/

# Problem: Run this twice and it silently rewrites every file, with no report of what changed.
# Problem: Delete one file manually and the script cannot tell you; it just recreates it on the next run.

# declarative-approach/main.tf
# Declarative: define desired state, Terraform handles the rest

locals {
  services = {
    api    = { port = 8080 }
    web    = { port = 3000 }
    worker = { port = 9090 }
  }
}

resource "local_file" "config" {
  for_each = local.services

  content = jsonencode({
    service = each.key
    port    = each.value.port
  })
  filename = "${path.module}/configs/${each.key}.json"
}

# Benefits: Idempotent, detects drift, manages lifecycle

Observe: (1) Run the bash script twice — what happens? (2) Delete a file and re-run each approach. (3) Which approach detects and corrects drift?

Tags: declarative, imperative, comparison, idempotency

Exercise 3 Difficulty: Advanced

Design a Remote State Architecture

Design (on paper or in HCL) a complete remote state architecture for a team of 5 engineers working across 3 environments (dev, staging, prod).

Requirements to address:

Where will state be stored? (S3, Azure Blob, GCS)
How will you prevent concurrent modifications? (Locking)
How will you separate environments? (Separate state files per env)
How will you encrypt sensitive data in state?
Who has access to production state vs dev state?
How will you recover from state corruption?

# Starter template for your design:
# bootstrap/main.tf — Creates the state infrastructure itself

resource "aws_s3_bucket" "terraform_state" {
  bucket = "myteam-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# Your task: Add encryption, access policies, and per-env backends
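
One possible direction for the encryption part of the task is sketched below. It extends the starter template and reuses its `aws_s3_bucket.terraform_state` resource. Treat it as a starting point under those assumptions rather than a complete answer: access policies and per-environment backends are still left to design.

```hcl
# Encrypt every object in the state bucket at rest.
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      # Uses the AWS-managed KMS key unless kms_master_key_id is also set.
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Together with the versioning resource in the starter, this covers encryption at rest and gives a recovery path (restoring a previous object version) if a state file is corrupted.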

Tags: state management, architecture, team workflows, security

Exercise 4 Difficulty: Intermediate

Identify and Prevent Drift Scenarios

For each scenario below, explain: (1) How drift occurs, (2) How IaC detects it, (3) How to prevent it.

Scenario	Your Analysis
An engineer opens port 22 via the AWS console during debugging and forgets to close it	How does `terraform plan` detect this? What policy prevents it?
Auto-scaling adds 3 new instances that aren't in Terraform state	Is this drift? How should IaC handle auto-scaled resources?
An RDS database instance is manually resized from db.t3.medium to db.t3.large	What happens on next `terraform apply`? Is that safe?
Someone deletes a resource that Terraform manages	What does Terraform do? Recreate or error?

Bonus: Write a cron job or CI schedule that runs terraform plan daily and alerts on drift.
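
As a hint for the auto-scaling scenario: one common pattern is to let Terraform own the scaling group and its limits while ignoring the instance count that the autoscaler changes at runtime. The sketch below assumes the AWS provider; the resource names, sizes, and the two input variables are hypothetical.

```hcl
variable "ami_id" {
  description = "AMI for the web instances (hypothetical input)."
  type        = string
}

variable "subnet_ids" {
  description = "Subnets the group may launch into (hypothetical input)."
  type        = list(string)
}

resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = var.ami_id
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "web" {
  name_prefix         = "web-"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  lifecycle {
    # The autoscaler adjusts desired_capacity at runtime; ignoring it keeps
    # `terraform plan` from reporting those adjustments as drift.
    ignore_changes = [desired_capacity]
  }
}
```

The individual instances never appear in state at all. Terraform manages the group and its template, and the group manages the instances.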

Tags: drift detection, governance, prevention

Conclusion & Next Steps

Infrastructure as Code fundamentally transforms how we manage infrastructure — from manual, error-prone processes to automated, version-controlled, reviewable workflows. The key takeaways from this article:

                            
Key Takeaways:
IaC eliminates snowflakes: infrastructure is reproducible, consistent, and documented as code
Prefer declarative over imperative for most infrastructure provisioning (idempotent, drift-resistant)
Terraform is widely regarded as the de facto standard for multi-cloud declarative IaC and has one of the largest ecosystems
State is sacred: store it remotely, lock it, encrypt it, version it
Modules enable reuse: package patterns once, deploy everywhere
CI/CD for infrastructure: plan on PR, apply on merge, test continuously
Prevent drift: enforce no-manual-change policies and monitor for deviations

                        

Next in the Series

In Part 9: Terraform Fundamentals, we take a deep dive into Terraform's HCL language — providers, resources, data sources, locals, expressions, functions, and real-world deployment patterns. You'll build complete infrastructure from scratch across AWS and Azure.

