
Part 8: Infrastructure as Code

May 14, 2026 Wasil Zafar 50 min read

Master Infrastructure as Code (IaC) — from declarative vs imperative paradigms and Terraform fundamentals to state management, modular design, drift detection, and CI/CD pipelines for infrastructure. Turn your cloud resources into version-controlled, reviewable, testable code.

Table of Contents

  1. The IaC Revolution
  2. Declarative vs Imperative
  3. The IaC Landscape
  4. Core IaC Concepts
  5. State Management
  6. Terraform Introduction
  7. Modules & Reusability
  8. IaC Best Practices
  9. Hands-On Exercises
  10. Conclusion & Next Steps

The IaC Revolution

Imagine you need to build an identical copy of your production environment for a new development team. With traditional infrastructure management, this might take weeks of manual work — logging into consoles, clicking through wizards, running commands from memory, and praying you didn't miss a security group rule. With Infrastructure as Code, you run a single command and have an identical environment in minutes.

Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes or interactive tools. It applies software engineering practices — version control, code review, testing, and CI/CD — to infrastructure management.

Why IaC Exists: The Problems It Solves

Before IaC, infrastructure was managed through a combination of manual processes that introduced serious problems:

Problems with Manual Infrastructure Management
flowchart TD
    A[Manual Infrastructure] --> B[Snowflake Servers]
    A --> C[Configuration Drift]
    A --> D[Tribal Knowledge]
    A --> E[Slow Provisioning]
    A --> F[No Audit Trail]
    B --> G[Every server is unique]
    C --> H[Environments diverge over time]
    D --> I[Only one person knows how it works]
    E --> J[Days/weeks to provision]
    F --> K[Who changed what and when?]
    style A fill:#BF092F,color:#fff
    style B fill:#132440,color:#fff
    style C fill:#132440,color:#fff
    style D fill:#132440,color:#fff
    style E fill:#132440,color:#fff
    style F fill:#132440,color:#fff
                            
Snowflake Servers: When infrastructure is configured manually, each server becomes unique — like a snowflake. Reproducing one is nearly impossible because nobody documented every patch, tweak, and hotfix applied over months or years.
| Problem | Manual Approach | IaC Solution |
| --- | --- | --- |
| Reproducibility | Follow a wiki/runbook (often outdated) | Run the code and get the same result every time |
| Consistency | Humans make mistakes | Code is deterministic |
| Speed | Hours/days per environment | Minutes per environment |
| Documentation | Separate docs (drift from reality) | Code IS the documentation |
| Collaboration | Screen sharing, tribal knowledge | Code reviews, pull requests |
| Disaster Recovery | Rebuild from memory/backups | Re-apply code to new region |
| Audit Trail | Manual change logs (if any) | Git history shows every change |

Key Benefits of IaC

IaC Benefits Ecosystem
mindmap
  root((IaC Benefits))
    Version Control
      Git history
      Branching
      Code review
      Rollback
    Reproducibility
      Identical environments
      Disaster recovery
      Multi-region
      Dev/Staging/Prod parity
    Automation
      CI/CD pipelines
      Self-service
      Scheduled deployments
      Auto-scaling rules
    Collaboration
      Pull requests
      Team ownership
      Knowledge sharing
      Onboarding
    Testing
      Validation
      Linting
      Security scanning
      Cost estimation
                            
The Source Code Analogy: Just as application source code defines how your software behaves, IaC defines how your infrastructure behaves. You write it, review it, test it, version it, and deploy it through automated pipelines — exactly like application code.

Declarative vs Imperative

The most fundamental decision in IaC is choosing between two paradigms: declarative (describe the desired end state) and imperative (describe the steps to reach that state). This distinction shapes everything from how you think about infrastructure to which tools you choose.

Declarative: Describe WHAT You Want

In the declarative approach, you define the desired state of your infrastructure. The tool figures out how to make reality match that state. You say "I want 3 web servers behind a load balancer" without specifying the exact sequence of API calls.

# Declarative (Terraform HCL) — WHAT you want
# The tool determines HOW to create/modify/delete resources

resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server-${count.index + 1}"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_lb" "web" {
  name               = "web-load-balancer"
  internal           = false
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
}

Imperative: Describe HOW to Get There

In the imperative approach, you write the exact sequence of steps (commands, API calls) that should execute. You have full control over ordering and logic but must handle edge cases yourself.

#!/bin/bash
# Imperative (Bash script) — HOW to do it
# You specify every step and handle edge cases

# Step 1: Create instances
INSTANCE_IDS=()   # Declare the array before appending to it
for i in 1 2 3; do
  INSTANCE_ID=$(aws ec2 run-instances \
    --image-id ami-0c55b159cbfafe1f0 \
    --instance-type t3.micro \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=web-server-$i}]" \
    --query 'Instances[0].InstanceId' \
    --output text)
  echo "Created instance: $INSTANCE_ID"
  INSTANCE_IDS+=("$INSTANCE_ID")
done

# Step 2: Wait for instances to be running
aws ec2 wait instance-running --instance-ids "${INSTANCE_IDS[@]}"

# Step 3: Create load balancer
LB_ARN=$(aws elbv2 create-load-balancer \
  --name web-load-balancer \
  --subnets subnet-abc123 subnet-def456 \
  --type application \
  --query 'LoadBalancers[0].LoadBalancerArn' \
  --output text)
echo "Created load balancer: $LB_ARN"
Declarative vs Imperative Workflow
flowchart LR
    subgraph Declarative
        D1[Define Desired State] --> D2[Tool Reads Current State]
        D2 --> D3[Tool Calculates Diff]
        D3 --> D4[Tool Applies Changes]
        D4 --> D5[Desired = Current]
    end
    subgraph Imperative
        I1[Write Step 1] --> I2[Write Step 2]
        I2 --> I3[Write Step 3]
        I3 --> I4[Handle Errors]
        I4 --> I5[Hope State is Correct]
    end
    style D5 fill:#3B9797,color:#fff
    style I5 fill:#BF092F,color:#fff
                            

When to Use Each Approach

| Aspect | Declarative | Imperative |
| --- | --- | --- |
| You specify | Desired end state | Step-by-step instructions |
| Idempotent? | Yes (by design) | Must be manually ensured |
| Ordering | Tool determines order | You control order |
| Drift handling | Detected on the next plan, corrected on apply | Must detect and fix yourself |
| Learning curve | Domain-specific language | Familiar scripting languages |
| Flexibility | Limited to tool’s capabilities | Unlimited (any logic) |
| Best for | Cloud resources, repeatable infra | Complex orchestration, migrations |
| Examples | Terraform, CloudFormation, Bicep, K8s YAML | Ansible, Bash, Python scripts; Pulumi (imperative authoring on top of a declarative engine) |
Real-world insight: Most mature organizations use both paradigms. Declarative for provisioning cloud resources (VPCs, databases, load balancers) and imperative for complex orchestration tasks (database migrations, rolling deployments, one-time scripts).

The IaC Landscape

The IaC ecosystem has matured significantly, with specialized tools for different use cases. Understanding the landscape helps you choose the right tool for your situation.

Terraform (HashiCorp)

Terraform is the most widely adopted multi-cloud IaC tool. It uses HashiCorp Configuration Language (HCL) — a declarative language designed specifically for infrastructure definition.

# Example: Terraform configuration for AWS VPC
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "main-vpc"
    Environment = "production"
  }
}

AWS CloudFormation

CloudFormation is AWS's native IaC service. It uses JSON or YAML templates with deep integration into AWS services, offering features like change sets, drift detection, and StackSets for multi-account deployment.

# Example: CloudFormation YAML template
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple VPC with public subnet

Resources:
  MainVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: main-vpc
        - Key: Environment
          Value: production

  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MainVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: public-subnet-1

Outputs:
  VpcId:
    Description: VPC ID
    Value: !Ref MainVPC
    Export:
      Name: MainVpcId

Azure Bicep

Bicep is Azure's domain-specific language for deploying Azure resources. It compiles down to ARM (Azure Resource Manager) templates but with a much cleaner syntax.

// Example: Azure Bicep for a Storage Account
@description('Location for all resources')
param location string = resourceGroup().location

@description('Storage account name')
param storageAccountName string = 'st${uniqueString(resourceGroup().id)}'

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    minimumTlsVersion: 'TLS1_2'
    supportsHttpsTrafficOnly: true
    accessTier: 'Hot'
  }
}

output storageAccountId string = storageAccount.id
output primaryEndpoint string = storageAccount.properties.primaryEndpoints.blob

Pulumi

Pulumi takes a different approach — you write IaC using general-purpose programming languages (Python, TypeScript, Go, C#). This gives you full access to loops, conditionals, type systems, and testing frameworks.

# Example: Pulumi with Python (conceptual)
# Install: pip install pulumi pulumi-aws
# Initialize: pulumi new aws-python

# __main__.py
import pulumi
import pulumi_aws as aws

# Create a VPC using familiar Python
vpc = aws.ec2.Vpc("main-vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    tags={"Name": "main-vpc", "Environment": "production"}
)

# Use Python loops for multiple subnets
subnets = []
for i, az in enumerate(["us-east-1a", "us-east-1b", "us-east-1c"]):
    subnet = aws.ec2.Subnet(f"subnet-{i}",
        vpc_id=vpc.id,
        cidr_block=f"10.0.{i+1}.0/24",
        availability_zone=az,
        tags={"Name": f"subnet-{az}"}
    )
    subnets.append(subnet)

# Export outputs
pulumi.export("vpc_id", vpc.id)
pulumi.export("subnet_ids", [s.id for s in subnets])

Ansible

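Ansible is primarily a configuration management and orchestration tool. Its YAML playbooks run tasks in the order you write them, which is why it is usually grouped with imperative tools, although most individual modules describe a target state and are idempotent. It's agentless (connects via SSH) and excels at configuring servers after they're provisioned.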

# Example: Ansible playbook for server configuration
---
- name: Configure web servers
  hosts: webservers
  become: yes
  vars:
    app_port: 8080
    nginx_version: "1.24"

  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install nginx
      apt:
        name: "nginx={{ nginx_version }}*"
        state: present

    - name: Configure nginx virtual host
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/app
        mode: '0644'
      notify: Restart nginx

    - name: Enable site
      file:
        src: /etc/nginx/sites-available/app
        dest: /etc/nginx/sites-enabled/app
        state: link

  handlers:
    - name: Restart nginx
      systemd:
        name: nginx
        state: restarted
        enabled: yes

Tool Comparison

| Tool | Language | Cloud Support | State | Paradigm | Best For |
| --- | --- | --- | --- | --- | --- |
| Terraform | HCL | Multi-cloud (1000+ providers) | State file | Declarative | Multi-cloud infrastructure |
| CloudFormation | YAML/JSON | AWS only | Managed by AWS | Declarative | AWS-native shops |
| Bicep | Bicep DSL | Azure only | Managed by Azure | Declarative | Azure-native shops |
| Pulumi | Python/TS/Go/C# | Multi-cloud | Pulumi Cloud or self-managed | Imperative* | Devs preferring real languages |
| Ansible | YAML | Multi-cloud + on-prem | Stateless | Imperative | Configuration management |
| CDK (AWS) | Python/TS/Java | AWS only | Managed by AWS | Imperative* | Complex AWS patterns |

* Pulumi and CDK define infrastructure imperatively in code but generate declarative state that converges on desired outcomes.
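
The footnote's distinction can be made concrete with a small sketch. It does not use the Pulumi or CDK APIs; `build_desired_state` and the resource dictionaries are hypothetical names chosen for illustration. The point is only that the loop is imperative, while its output is a plain description of the desired end state that an engine could then reconcile.

```python
def build_desired_state(azs: list) -> dict:
    """Imperative code (a loop) whose only output is a declarative spec."""
    resources = {
        "main-vpc": {"type": "vpc", "cidr_block": "10.0.0.0/16"},
    }
    for index, az in enumerate(azs):
        resources[f"subnet-{index}"] = {
            "type": "subnet",
            "vpc": "main-vpc",
            "cidr_block": f"10.0.{index + 1}.0/24",
            "availability_zone": az,
        }
    return resources


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
spec_a = build_desired_state(zones)
spec_b = build_desired_state(zones)

print(len(spec_a))       # one VPC plus three subnets
print(spec_a == spec_b)  # same inputs always declare the same state
```

No API call happens inside the loop, so running the program twice declares the same four resources rather than creating eight.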

Core IaC Concepts

Regardless of which tool you choose, these foundational concepts underpin all IaC systems.

Desired State vs Current State

Declarative IaC tools work by comparing two things: what you want (desired state, defined in code) and what exists (current state, tracked by the tool and refreshed from the cloud provider's APIs). The tool then computes the set of changes needed to make current match desired.

Desired State Reconciliation
flowchart TD
    A[Your Code<br/>Desired State] --> C{IaC Engine}
    B[Cloud APIs<br/>Current State] --> C
    C --> D{Differences?}
    D -->|Yes| E[Generate Plan]
    D -->|No| F[No Changes Needed]
    E --> G[Create Resources]
    E --> H[Update Resources]
    E --> I[Delete Resources]
    G --> J[State Converged ✓]
    H --> J
    I --> J
    F --> J
    style A fill:#3B9797,color:#fff
    style B fill:#16476A,color:#fff
    style J fill:#3B9797,color:#fff
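
The reconciliation step can be sketched in a few lines of Python. This is a simplified model, not how any particular tool is implemented: resources are plain dictionaries keyed by a made-up name, and `compute_plan` is a hypothetical helper that sorts them into create, update, and delete buckets.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Changes needed to make current state match desired state."""
    create: list = field(default_factory=list)
    update: list = field(default_factory=list)
    delete: list = field(default_factory=list)

    def summary(self) -> str:
        return (f"Plan: {len(self.create)} to add, "
                f"{len(self.update)} to change, "
                f"{len(self.delete)} to destroy.")


def compute_plan(desired: dict, current: dict) -> Plan:
    """Diff two {resource_name: attributes} mappings."""
    plan = Plan()
    for name, attrs in desired.items():
        if name not in current:
            plan.create.append(name)      # in code, not in the cloud
        elif current[name] != attrs:
            plan.update.append(name)      # exists, but attributes differ
    for name in current:
        if name not in desired:
            plan.delete.append(name)      # in the cloud, removed from code
    return plan


desired = {
    "web-1": {"instance_type": "t3.micro"},
    "web-2": {"instance_type": "t3.micro"},
}
current = {
    "web-1": {"instance_type": "t3.small"},   # changed out of band
    "legacy": {"instance_type": "t2.micro"},  # no longer in code
}

plan = compute_plan(desired, current)
print(plan.summary())  # Plan: 1 to add, 1 to change, 1 to destroy.
```

Real engines also order the changes by dependency and decide whether an attribute change can be made in place or forces a replacement; the sketch leaves both out.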

Idempotency

Idempotency means running the same operation multiple times produces the same result. This is perhaps the most important property of declarative IaC — you can safely re-run your code without fear of creating duplicate resources or breaking things.

# Idempotent behavior example:
# Running terraform apply multiple times

$ terraform apply    # First run: Creates 3 servers, 1 VPC, 1 LB
# Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

$ terraform apply    # Second run: Nothing to do!
# Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

$ terraform apply    # Third run: Still nothing!
# Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

# Non-idempotent (imperative) behavior:
$ bash create-servers.sh    # Creates 3 servers
$ bash create-servers.sh    # Creates 3 MORE servers (6 total!)
$ bash create-servers.sh    # Creates 3 MORE servers (9 total!)
Why idempotency matters: In production, things fail. Network timeouts, partial deployments, interrupted applies. With idempotent tools, you just re-run and the tool picks up where it left off. With imperative scripts, you need complex error handling and rollback logic.
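
The same contrast can be shown as a runnable sketch. `FakeCloud` is an in-memory stand-in invented for this example, not a real SDK; `create_servers` mimics the script above and `ensure_servers` mimics a converge-to-desired-state tool.

```python
class FakeCloud:
    """In-memory stand-in for a cloud API (hypothetical, for illustration)."""

    def __init__(self):
        self.servers = []

    def create_server(self, name: str) -> None:
        self.servers.append(name)


def create_servers(cloud: FakeCloud, count: int) -> None:
    """Imperative and NOT idempotent: always creates `count` new servers."""
    for i in range(count):
        cloud.create_server(f"web-server-{i + 1}")


def ensure_servers(cloud: FakeCloud, count: int) -> None:
    """Idempotent: creates only the servers that are missing."""
    for i in range(count):
        name = f"web-server-{i + 1}"
        if name not in cloud.servers:
            cloud.create_server(name)


scripted = FakeCloud()
converged = FakeCloud()
for _ in range(3):                 # run each approach three times
    create_servers(scripted, 3)
    ensure_servers(converged, 3)

print(len(scripted.servers))   # 9
print(len(converged.servers))  # 3
```

The idempotent version needs a way to recognise what already exists, which is the job state plays in Terraform.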

Plan/Apply Workflow

Most IaC tools separate planning from execution. The plan phase shows you what changes will occur, and the apply phase executes them. This safety mechanism prevents unintended modifications.

# The Plan/Apply workflow in Terraform:

# Step 1: PLAN — Preview changes (safe, read-only)
$ terraform plan
# Terraform will perform the following actions:
#
#   # aws_instance.web[0] will be created
#   + resource "aws_instance" "web" {
#       + ami           = "ami-0c55b159cbfafe1f0"
#       + instance_type = "t3.micro"
#       + tags          = {
#           + "Name" = "web-server-1"
#         }
#     }
#
# Plan: 3 to add, 0 to change, 0 to destroy.

# Step 2: REVIEW — Team examines the plan (human checkpoint)

# Step 3: APPLY — Execute the plan (makes real changes)
$ terraform apply
# Do you want to perform these actions?
#   Enter a value: yes
# aws_instance.web[0]: Creating...
# aws_instance.web[0]: Creation complete after 32s
# Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
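
For the review step, a saved plan can also be inspected by a program. The sketch below assumes the plan was exported with `terraform show -json tfplan` and reads only the `resource_changes[].change.actions` field of that JSON. The sample document is hand-written and trimmed to those fields, and `needs_extra_review` encodes a hypothetical team rule rather than anything Terraform enforces.

```python
import json
from collections import Counter


def summarize_plan(plan_json: str) -> Counter:
    """Count planned actions from `terraform show -json <planfile>` output."""
    plan = json.loads(plan_json)
    counts = Counter()
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if "create" in actions and "delete" in actions:
            counts["replace"] += 1        # destroy-and-recreate
        else:
            for action in actions:
                counts[action] += 1
    return counts


def needs_extra_review(counts: Counter) -> bool:
    """Hypothetical team rule: destructive plans get a second reviewer."""
    return counts["delete"] > 0 or counts["replace"] > 0


# Hand-written sample trimmed to the fields used above (not real output).
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_instance.web[0]", "change": {"actions": ["create"]}},
        {"address": "aws_instance.web[1]", "change": {"actions": ["create"]}},
        {"address": "aws_lb.web", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
})

counts = summarize_plan(sample)
print(dict(counts))
print(needs_extra_review(counts))
```

A check like this could run in CI and block the merge until a second reviewer signs off on any plan that deletes or replaces resources.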

Drift Detection

Configuration drift occurs when the actual state of infrastructure diverges from the defined state — typically caused by manual changes (ClickOps), emergency fixes, or external automation that bypasses IaC.

Configuration Drift Lifecycle
sequenceDiagram
    participant Dev as Developer
    participant Code as IaC Code
    participant Cloud as Cloud Provider
    participant Ops as Ops Engineer

    Dev->>Code: Define desired state
    Code->>Cloud: terraform apply
    Note over Cloud: State matches code ✓

    Ops->>Cloud: Manual change via console!
    Note over Cloud: State DRIFTED from code ✗

    Dev->>Code: terraform plan
    Code-->>Dev: Drift detected!<br/>Instance type changed manually

    Dev->>Code: terraform apply
    Code->>Cloud: Revert to desired state
    Note over Cloud: State matches code ✓
Drift is the enemy of IaC. Once manual changes accumulate, your code no longer represents reality. Enforce policies: no manual changes in production. Use service control policies, RBAC, and monitoring to detect and prevent drift. Some teams run terraform plan on a schedule and alert when drift is detected.
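
A scheduled drift check could look roughly like the sketch below. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 for an empty diff, 1 for an error, and 2 for a non-empty diff. The function names and status strings are made up for this example, and the sketch assumes `terraform init` has already been run in the working directory with credentials available in the environment.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
EXIT_MEANINGS = {
    0: "in_sync",   # plan succeeded, empty diff
    1: "error",     # plan failed
    2: "drift",     # plan succeeded, non-empty diff
}


def classify_exit_code(code: int) -> str:
    """Map a plan exit code to a status; unknown codes count as errors."""
    return EXIT_MEANINGS.get(code, "error")


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` and report the drift status.

    Assumes the directory is already initialised and that the checked-out
    code is the version that was last applied.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_exit_code(result.returncode)
```

A cron job or scheduled pipeline might call `check_drift("infrastructure/environments/production")` and send an alert on anything other than `"in_sync"`. A non-empty diff only indicates drift if the code being planned has already been applied; otherwise it simply reflects pending changes.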

Immutable vs Mutable Infrastructure

| Aspect | Mutable (Pets) | Immutable (Cattle) |
| --- | --- | --- |
| Update strategy | Modify in-place (patch, upgrade) | Replace entirely (new image/container) |
| Server identity | Named, long-lived ("db-prod-01") | Disposable, auto-scaled ("instance-abc123") |
| Configuration drift | Accumulates over time | Largely avoided (instances are replaced, not modified) |
| Rollback | Complex (undo changes) | Simple (deploy previous version) |
| Tools | Ansible, Chef, Puppet | Terraform + Packer, Kubernetes, Docker |
| Example | SSH in, run apt upgrade | Build new AMI, replace instances |
Pets vs Cattle: In the "pets" model, servers are unique and cherished — you nurse them back to health when sick. In the "cattle" model, servers are interchangeable — if one is unhealthy, you terminate it and spin up a replacement. Modern cloud architecture strongly favors the cattle model.

State Management

State is the mechanism by which IaC tools track the relationship between your code and real-world resources. Understanding state is critical for working effectively with tools like Terraform.

What State Is and Why It's Needed

When you write resource "aws_instance" "web" {...} in Terraform, the tool needs to know: does this resource already exist? What's its current ID? What properties does it have? This mapping between code and real resources is stored in the state file.

{
  "version": 4,
  "terraform_version": "1.7.0",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456789",
            "ami": "ami-0c55b159cbfafe1f0",
            "instance_type": "t3.micro",
            "private_ip": "10.0.1.42",
            "public_ip": "54.23.167.89",
            "tags": {
              "Name": "web-server-1"
            }
          }
        }
      ]
    }
  ]
}
State answers critical questions: What resources exist? What are their IDs and attributes? What depends on what? Without state, Terraform would try to create everything from scratch every time, or worse — have no way to update or delete existing resources.
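
To make the mapping tangible, here is a minimal sketch that reads a state document shaped like the excerpt above and lists each managed resource with its cloud ID. `list_managed_resources` is a hypothetical helper; in day-to-day work `terraform state list` and `terraform show -json` are the safer ways to inspect state, because the raw file format is an internal detail that can change between versions.

```python
import json


def list_managed_resources(state_json: str) -> list:
    """Return (address, id) pairs for managed resources in a state document."""
    state = json.loads(state_json)
    found = []
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # skip data sources
        base = f'{resource["type"]}.{resource["name"]}'
        for instance in resource.get("instances", []):
            address = base
            # count/for_each instances carry an index_key in real state files.
            if "index_key" in instance:
                address += f"[{json.dumps(instance['index_key'])}]"
            found.append((address, instance["attributes"].get("id")))
    return found


# Trimmed sample modelled on the excerpt above (not a complete state file).
sample_state = json.dumps({
    "version": 4,
    "resources": [
        {"mode": "managed", "type": "aws_instance", "name": "web",
         "instances": [{"attributes": {"id": "i-0abc123def456789"}}]},
        {"mode": "data", "type": "aws_ami", "name": "amazon_linux",
         "instances": [{"attributes": {"id": "ami-0c55b159cbfafe1f0"}}]},
    ],
})

print(list_managed_resources(sample_state))
# [('aws_instance.web', 'i-0abc123def456789')]
```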

Remote State Backends

For team environments, state must be stored remotely where everyone can access it. Local state files on a developer's laptop won't work for collaboration.

# AWS S3 Backend with DynamoDB locking
# Note: Terraform 1.10+ can also lock natively in S3 with use_lockfile = true,
# and newer releases deprecate dynamodb_table. Check the docs for your version.
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}
# Azure Blob Storage Backend
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "production/network.tfstate"
  }
}
# Google Cloud Storage Backend
terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "production/network"
  }
}

State Locking

When multiple team members might run Terraform simultaneously, state locking prevents concurrent modifications that could corrupt state or create conflicting resources.

State Locking Mechanism
sequenceDiagram
    participant A as Developer A
    participant Lock as Lock Table<br/>(DynamoDB)
    participant State as State File<br/>(S3)
    participant B as Developer B

    A->>Lock: Acquire lock
    Lock-->>A: Lock granted ✓
    A->>State: Read state
    A->>State: Write updated state
    B->>Lock: Acquire lock
    Lock-->>B: Lock DENIED ✗<br/>(already held)
    Note over B: Waits or errors out
    A->>Lock: Release lock
    Lock-->>A: Lock released
    B->>Lock: Acquire lock
    Lock-->>B: Lock granted ✓
| Backend | Locking Mechanism | Configuration |
| --- | --- | --- |
| AWS S3 | DynamoDB table | dynamodb_table = "terraform-state-locks" |
| Azure Blob | Blob lease | Automatic (built-in) |
| GCS | Lock object (.tflock) written next to the state | Automatic (built-in) |
| Terraform Cloud | Managed locking | Automatic (SaaS) |
| Consul | Session-based locks | lock = true |
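
All of these mechanisms rest on the same idea: an atomic "create only if absent" operation that exactly one caller can win. The sketch below imitates that with an exclusive file create on the local filesystem. `FileLock` and `StateLockedError` are names invented for this example, and a real backend adds details such as lock IDs and forced unlocks that are left out here.

```python
import os
import tempfile


class StateLockedError(RuntimeError):
    """Raised when another process already holds the lock."""


class FileLock:
    """Tiny lock built on atomic exclusive file creation (illustrative)."""

    def __init__(self, path: str, owner: str):
        self.path = path
        self.owner = owner

    def acquire(self) -> None:
        try:
            # O_CREAT | O_EXCL fails if the file exists, so only one
            # caller can ever create the lock file.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            with open(self.path) as f:
                holder = f.read()
            raise StateLockedError(f"state is locked by {holder}") from None
        with os.fdopen(fd, "w") as f:
            f.write(self.owner)

    def release(self) -> None:
        os.remove(self.path)


lock_path = os.path.join(tempfile.mkdtemp(), "terraform.tfstate.lock")
dev_a = FileLock(lock_path, "developer-a")
dev_b = FileLock(lock_path, "developer-b")

dev_a.acquire()
try:
    dev_b.acquire()
except StateLockedError as err:
    print(err)          # state is locked by developer-a
dev_a.release()
dev_b.acquire()         # succeeds now that the lock is free
dev_b.release()
```

Recording the owner in the lock is what lets the second caller see who holds it, which is the information you need before deciding whether a stale lock is safe to remove.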

State Security

State files contain secrets! Database passwords, API keys, private IPs, and other sensitive data appear in plaintext in state files. Always: (1) encrypt state at rest, (2) restrict access with IAM policies, (3) never commit state to Git, (4) enable versioning for recovery.
# Security checklist for remote state:

# 1. Enable encryption at rest
# S3: Server-side encryption with KMS
aws s3api put-bucket-encryption \
  --bucket mycompany-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}]
  }'

# 2. Block public access
aws s3api put-public-access-block \
  --bucket mycompany-terraform-state \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# 3. Enable versioning (for recovery)
aws s3api put-bucket-versioning \
  --bucket mycompany-terraform-state \
  --versioning-configuration Status=Enabled

# 4. Restrict access with bucket policy
# Only allow specific IAM roles to read/write state
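
To see why state deserves this care, the following sketch walks a state-like document and reports attribute paths whose key names look like secrets. The keyword list and the `find_sensitive_paths` helper are assumptions made for illustration; it is a rough heuristic, not a substitute for a proper secret scanner, and the sample value is deliberately fake.

```python
SENSITIVE_HINTS = ("password", "secret", "token", "private_key")


def find_sensitive_paths(node, path=""):
    """Yield dotted paths whose key name suggests a secret string value."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            looks_secret = any(hint in key.lower() for hint in SENSITIVE_HINTS)
            if isinstance(value, str) and looks_secret:
                yield child
            else:
                yield from find_sensitive_paths(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_sensitive_paths(item, f"{path}[{index}]")


# Hand-written sample shaped like a state file; the value is a placeholder.
sample_state = {
    "resources": [{
        "type": "aws_db_instance",
        "name": "main",
        "instances": [{"attributes": {
            "username": "app",
            "password": "example-only-not-a-real-secret",
        }}],
    }]
}

hits = list(find_sensitive_paths(sample_state))
print(hits)
# ['resources[0].instances[0].attributes.password']
```

The password sits in the document as an ordinary string, which is why anyone who can read the state bucket can read the secret.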

Terraform Introduction

Terraform is the industry standard for multi-cloud IaC. Let's explore its core concepts with a practical example that provisions a complete network stack.

HCL Syntax Basics

HashiCorp Configuration Language (HCL) is designed to be human-readable while being machine-parseable. Its three main constructs are resources, variables, and outputs.

# variables.tf — Input variables (parameterize your config)
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "development"

  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "Environment must be development, staging, or production."
  }
}

variable "instance_count" {
  description = "Number of web server instances"
  type        = number
  default     = 2
}

variable "allowed_cidr_blocks" {
  description = "CIDR blocks allowed to access the application"
  type        = list(string)
  default     = ["10.0.0.0/8"]
}

variable "tags" {
  description = "Common tags for all resources"
  type        = map(string)
  default = {
    ManagedBy = "terraform"
    Team      = "platform"
  }
}
# main.tf — Resource definitions
# Data sources referenced by the resources below.
# The AMI name filter is an assumption; adjust it to the image you use.
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
  })
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.environment}-public-${count.index + 1}"
    Tier = "public"
  })
}

resource "aws_instance" "web" {
  count         = var.instance_count
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public[count.index % 2].id

  tags = merge(var.tags, {
    Name = "${var.environment}-web-${count.index + 1}"
    Role = "webserver"
  })
}
# outputs.tf — Output values (expose information)
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of public subnets"
  value       = aws_subnet.public[*].id
}

output "instance_public_ips" {
  description = "Public IPs of web server instances"
  value       = aws_instance.web[*].public_ip
}
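
The `cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)` call in main.tf is easier to reason about with the arithmetic spelled out. The sketch below is a Python approximation written for this article, not Terraform's implementation: it extends the prefix by `newbits` bits and selects subnet number `netnum`.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Python approximation of Terraform's cidrsubnet()."""
    network = ipaddress.ip_network(prefix)
    new_prefix = network.prefixlen + newbits
    if new_prefix > network.max_prefixlen:
        raise ValueError("not enough address bits for newbits")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    # Shift the subnet number past the remaining host bits and add it
    # to the base address of the parent network.
    host_bits = network.max_prefixlen - new_prefix
    base = int(network.network_address) + (netnum << host_bits)
    return str(type(network)((base, new_prefix)))


for index in range(2):
    print(cidrsubnet("10.0.0.0/16", 8, index))
# 10.0.0.0/24
# 10.0.1.0/24
```

With a /16 VPC and 8 new bits, each value of `count.index` yields its own /24, so the two public subnets never overlap.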

Providers

Providers are plugins that Terraform uses to interact with cloud platforms, SaaS services, and other APIs. Each provider offers a set of resources and data sources.

| Provider | Purpose | Resources | Registry |
| --- | --- | --- | --- |
| hashicorp/aws | Amazon Web Services | 1,300+ | registry.terraform.io/providers/hashicorp/aws |
| hashicorp/azurerm | Microsoft Azure | 900+ | registry.terraform.io/providers/hashicorp/azurerm |
| hashicorp/google | Google Cloud Platform | 800+ | registry.terraform.io/providers/hashicorp/google |
| hashicorp/kubernetes | Kubernetes clusters | 50+ | registry.terraform.io/providers/hashicorp/kubernetes |
| integrations/github | GitHub repos/teams | 40+ | registry.terraform.io/providers/integrations/github |
| cloudflare/cloudflare | Cloudflare DNS/CDN | 60+ | registry.terraform.io/providers/cloudflare/cloudflare |

Resource Lifecycle

Terraform Resource Lifecycle
stateDiagram-v2
    [*] --> Planned: terraform plan
    Planned --> Creating: terraform apply
    Creating --> Created: API success
    Created --> Updating: Code changed + apply
    Updating --> Created: Update complete
    Created --> Destroying: Resource removed from code
    Destroying --> [*]: Destroyed

    Created --> Tainted: Manual taint
    Tainted --> Creating: terraform apply (recreate)

    note right of Created: Normal steady state
    note right of Tainted: Marked for recreation
                            

Workflow Commands

# Complete Terraform workflow from scratch:

# 1. Initialize — download providers and modules
terraform init
# Initializing the backend...
# Initializing provider plugins...
# - Finding hashicorp/aws versions matching "~> 5.0"...
# - Installing hashicorp/aws v5.31.0...
# Terraform has been successfully initialized!

# 2. Format — consistent code style
terraform fmt -recursive
# main.tf
# variables.tf

# 3. Validate — check syntax and internal consistency
terraform validate
# Success! The configuration is valid.

# 4. Plan — preview what will change
terraform plan -out=tfplan
# Plan: 5 to add, 0 to change, 0 to destroy.
# Saved the plan to: tfplan

# 5. Apply — execute the plan
terraform apply tfplan
# Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

# 6. Show — inspect current state
terraform show

# 7. Destroy — tear everything down (careful!)
terraform destroy
# Plan: 0 to add, 0 to change, 5 to destroy.
# Do you really want to destroy all resources?

Modules & Reusability

As your infrastructure grows, you'll find yourself repeating patterns — every application needs a VPC, subnets, security groups, and a load balancer. Modules let you package these patterns into reusable components.

Module Structure

# Standard module directory structure:
modules/
└── vpc/
    ├── main.tf          # Resource definitions
    ├── variables.tf     # Input variables (module parameters)
    ├── outputs.tf       # Output values (what the module exposes)
    ├── versions.tf      # Provider version constraints
    └── README.md        # Usage documentation
# modules/vpc/variables.tf
variable "name" {
  description = "Name prefix for VPC resources"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "public_subnet_count" {
  description = "Number of public subnets"
  type        = number
  default     = 2
}

variable "environment" {
  description = "Environment tag"
  type        = string
}
# modules/vpc/main.tf
# Availability zones used to spread the subnets (referenced below).
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.name}-vpc"
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  count             = var.public_subnet_count
  vpc_id            = aws_vpc.this.id
  cidr_block        = cidrsubnet(var.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "${var.name}-public-${count.index + 1}"
    Tier = "public"
  }
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id

  tags = {
    Name = "${var.name}-igw"
  }
}
# modules/vpc/outputs.tf
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.this.id
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "internet_gateway_id" {
  description = "The ID of the Internet Gateway"
  value       = aws_internet_gateway.this.id
}

Using Modules

# Using your custom module:
module "production_vpc" {
  source = "./modules/vpc"

  name                = "prod"
  cidr_block          = "10.0.0.0/16"
  public_subnet_count = 3
  environment         = "production"
}

module "staging_vpc" {
  source = "./modules/vpc"

  name                = "staging"
  cidr_block          = "10.1.0.0/16"
  public_subnet_count = 2
  environment         = "staging"
}

# Using a public registry module:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.4.0"

  name = "my-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}

# Reference module outputs:
resource "aws_instance" "web" {
  subnet_id = module.production_vpc.public_subnet_ids[0]
  # ...
}

When to Create Modules

Module creation guidelines:
  • Create a module when you repeat the same pattern 3+ times across projects
  • Don't create a module for a single resource — that adds complexity without benefit
  • Keep modules focused — a "vpc" module, not a "everything-for-my-app" module
  • Version your modules using Git tags so consumers can pin versions
  • Document inputs/outputs — a module without docs is a liability
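
Pinning is usually expressed with a version constraint such as the `~> 5.0` used for the AWS provider earlier. The sketch below models that pessimistic operator as I understand it: only the rightmost component written in the constraint may increase. `satisfies_pessimistic` is a hypothetical helper that handles plain numeric versions only, with no pre-release tags.

```python
def parse_version(text: str) -> tuple:
    """Turn '5.31.0' into (5, 31, 0), padding to three components."""
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check `version` against a '~> X.Y' or '~> X.Y.Z' constraint."""
    spec = constraint.replace("~>", "", 1).strip()
    given = [int(part) for part in spec.split(".")]
    if len(given) < 2:
        raise ValueError("expected at least MAJOR.MINOR after '~>'")
    lower = parse_version(spec)
    # The upper bound bumps the component just before the rightmost one:
    # '~> 5.0' stays below 6, '~> 5.4.0' stays below 5.5.
    upper = tuple(given[:-2]) + (given[-2] + 1,)
    candidate = parse_version(version)
    return lower <= candidate and candidate[:len(upper)] < upper


print(satisfies_pessimistic("5.31.0", "~> 5.0"))    # True
print(satisfies_pessimistic("6.0.0", "~> 5.0"))     # False
print(satisfies_pessimistic("5.4.9", "~> 5.4.0"))   # True
print(satisfies_pessimistic("5.5.0", "~> 5.4.0"))   # False
```

An exact pin such as `version = "5.4.0"` gives the most repeatable builds, while `~>` trades a little of that for automatic patch or minor updates.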

IaC Best Practices

Version Control Everything

# .gitignore for Terraform projects
# Never commit these files:

# Local state files (use remote backends!)
*.tfstate
*.tfstate.*

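# Dependency lock file: DO commit .terraform.lock.hcl.
# It records provider versions and checksums (it is not state-lock data),
# so it is intentionally not listed here.

# Terraform working directories
.terraform/

# Variable files with secrets
# (this pattern also ignores terraform.tfvars; commit non-secret variable
# files under another name or add explicit exceptions)
*.tfvars
!example.tfvars

# Plan files (may contain sensitive data)
*.tfplan
tfplan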

# Crash log files
crash.log
crash.*.log

# Override files (local dev only)
override.tf
override.tf.json
*_override.tf
*_override.tf.json

Environment Separation

# Recommended directory structure for multi-environment:
infrastructure/
├── modules/                    # Reusable modules
│   ├── vpc/
│   ├── ecs-cluster/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf            # Uses modules with dev params
│   │   ├── variables.tf
│   │   ├── terraform.tfvars   # Dev-specific values
│   │   └── backend.tf         # Points to dev state bucket
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── backend.tf
└── global/                     # Shared resources (IAM, DNS)
    ├── iam/
    └── route53/

CI/CD for Infrastructure

IaC CI/CD Pipeline
flowchart LR
    A[Developer<br/>Push Code] --> B[PR Created]
    B --> C[CI Pipeline]
    C --> D[terraform fmt<br/>check]
    D --> E[terraform validate]
    E --> F[terraform plan]
    F --> G[Security Scan<br/>tfsec/checkov]
    G --> H[Cost Estimate<br/>infracost]
    H --> I[Plan Comment<br/>on PR]
    I --> J{Approval?}
    J -->|Yes| K[Merge to main]
    K --> L[CD Pipeline]
    L --> M[terraform apply<br/>-auto-approve]
    M --> N[Post-deploy<br/>verification]
    J -->|No| O[Request Changes]
    style A fill:#3B9797,color:#fff
    style M fill:#132440,color:#fff
    style N fill:#3B9797,color:#fff
# Example: GitHub Actions CI/CD for Terraform
name: Terraform CI/CD

on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/environments/production

      - name: Terraform Format Check
        run: terraform fmt -check -recursive
        working-directory: infrastructure

      - name: Terraform Validate
        run: terraform validate
        working-directory: infrastructure/environments/production

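      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/environments/production

      # Needs "pull-requests: write" permission if the default token is read-only
      - name: Comment Plan on PR
        uses: actions/github-script@v7
        env:
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            // Post the plan output as a PR comment
            await github.rest.issues.createComment({
              ...context.repo,
              issue_number: context.issue.number,
              body: "Terraform plan:\n\n" + process.env.PLAN,
            });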

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/environments/production

      - name: Terraform Apply
        run: terraform apply -auto-approve
        working-directory: infrastructure/environments/production

Testing Infrastructure Code

| Testing Level | Tool | What It Checks | Speed |
| --- | --- | --- | --- |
| Syntax | terraform validate | HCL syntax, type errors | Instant |
| Formatting | terraform fmt -check | Consistent code style | Instant |
| Linting | tflint | Best practices, deprecated features | Seconds |
| Security | tfsec, Checkov, Trivy | Security misconfigurations | Seconds |
| Cost | Infracost | Cost impact of changes | Seconds |
| Policy | OPA/Sentinel | Organizational policies | Seconds |
| Integration | Terratest (Go) | Actually deploys & verifies | Minutes |

# Running IaC tests in CI:

# 1. Format check (fails if code isn't formatted)
terraform fmt -check -recursive -diff

# 2. Validate syntax
terraform init -backend=false
terraform validate

# 3. Lint with tflint
tflint --init
tflint --recursive

# 4. Security scan with tfsec
tfsec . --format json --out results.json

# 5. Security scan with Checkov
checkov -d . --framework terraform

# 6. Cost estimation with Infracost
infracost breakdown --path . --format table
# NAME                     MONTHLY QTY  UNIT       MONTHLY COST
# aws_instance.web[0]
# ├─ Instance usage           730  hours         $7.59
# └─ root_block_device
#    └─ Storage (gp3)          20  GB            $1.60
# OVERALL TOTAL                                  $9.19

Hands-On Exercises

Exercise 1 Difficulty: Beginner

Write Your First Terraform Configuration

Create a Terraform config that manages a local file (no cloud account needed). This uses the local provider to demonstrate the full IaC lifecycle without any cloud costs.

# exercise1/main.tf
# Install Terraform, then run: terraform init && terraform apply

terraform {
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "~> 2.0"
    }
  }
}

resource "local_file" "hello" {
  content  = "Hello, Infrastructure as Code!\nManaged by Terraform."
  filename = "${path.module}/output/hello.txt"
}

resource "local_file" "config" {
  content = jsonencode({
    app_name    = "my-app"
    environment = "development"
    version     = "1.0.0"
    features    = ["logging", "metrics"]
  })
  filename = "${path.module}/output/config.json"
}

output "hello_file_path" {
  value = local_file.hello.filename
}

output "config_file_path" {
  value = local_file.config.filename
}

Tasks: (1) Run terraform init, (2) Run terraform plan and examine the output, (3) Run terraform apply, (4) Verify files were created, (5) Modify content and re-apply, (6) Run terraform destroy.
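
As an optional extension, the configuration above can also be checked with Terraform's built-in `terraform test` command (available in Terraform 1.6 and later). The sketch below is one way such a test might look; the file name `tests/main.tftest.hcl` and the particular assertions are choices made for illustration.

```hcl
# exercise1/tests/main.tftest.hcl
# Run from the exercise1/ directory with: terraform init && terraform test

run "plans_expected_files" {
  # Plan only: nothing is written to disk during this run
  command = plan

  assert {
    condition     = endswith(local_file.hello.filename, "output/hello.txt")
    error_message = "hello.txt should be written under the output/ directory"
  }

  assert {
    condition     = jsondecode(local_file.config.content).environment == "development"
    error_message = "config.json should declare the development environment"
  }
}
```

Using `command = plan` keeps the test fast and free of side effects; switching it to `apply` would create the files and clean them up when the test finishes.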

Exercise 2 Difficulty: Intermediate

Declarative vs Imperative Comparison

Implement the same task using both paradigms and observe the differences in behavior, especially around idempotency and error handling.

Task: Create a directory structure with 3 config files.

#!/bin/bash
# imperative-approach.sh
# Run: chmod +x imperative-approach.sh && ./imperative-approach.sh

# Imperative: explicit steps, with no record of what was created
set -euo pipefail

mkdir -p /tmp/iac-exercise/configs

echo '{"service": "api", "port": 8080}' > /tmp/iac-exercise/configs/api.json
echo '{"service": "web", "port": 3000}' > /tmp/iac-exercise/configs/web.json
echo '{"service": "worker", "port": 9090}' > /tmp/iac-exercise/configs/worker.json

echo "Created 3 config files"
ls -la /tmp/iac-exercise/configs/

# Problem: run this twice and it silently overwrites any manual edits, with no diff or warning.
# Problem: delete one file manually and nothing notices until someone re-runs the script.
# Problem: remove a service from the script and its old file stays behind.

# declarative-approach/main.tf
# Declarative: define desired state, Terraform handles the rest

locals {
  services = {
    api    = { port = 8080 }
    web    = { port = 3000 }
    worker = { port = 9090 }
  }
}

resource "local_file" "config" {
  for_each = local.services

  content = jsonencode({
    service = each.key
    port    = each.value.port
  })
  filename = "${path.module}/configs/${each.key}.json"
}

# Benefits: Idempotent, detects drift, manages lifecycle

Observe: (1) Run the bash script twice — what happens? (2) Delete a file and re-run each approach. (3) Which approach detects and corrects drift?

Exercise 3 Difficulty: Advanced

Design a Remote State Architecture

Design (on paper or in HCL) a complete remote state architecture for a team of 5 engineers working across 3 environments (dev, staging, prod).

Requirements to address:

  • Where will state be stored? (S3, Azure Blob, GCS)
  • How will you prevent concurrent modifications? (Locking)
  • How will you separate environments? (Separate state files per env)
  • How will you encrypt sensitive data in state?
  • Who has access to production state vs dev state?
  • How will you recover from state corruption?
# Starter template for your design:
# bootstrap/main.tf — Creates the state infrastructure itself

resource "aws_s3_bucket" "terraform_state" {
  bucket = "myteam-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# Your task: Add encryption, access policies, and per-env backends
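
One possible starting point for the encryption part of the task is sketched below. It is not a reference solution: it assumes the `aws_s3_bucket.terraform_state` resource from the starter template, and the choice of KMS over AES256 is an assumption you may want to revisit. Access policies and per-environment backends are left for you to design.

```hcl
# Encrypt every state object at rest by default.
# With no KMS key ID set, S3 uses the AWS-managed key for the account.
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State files often contain secrets, so block every form of public access.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```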
Exercise 4 Difficulty: Intermediate

Identify and Prevent Drift Scenarios

For each scenario below, explain: (1) How drift occurs, (2) How IaC detects it, (3) How to prevent it.

| Scenario | Your Analysis |
| --- | --- |
| An engineer opens port 22 via the AWS console during debugging and forgets to close it | How does terraform plan detect this? What policy prevents it? |
| Auto-scaling adds 3 new instances that aren't in Terraform state | Is this drift? How should IaC handle auto-scaled resources? |
| An RDS database instance is upgraded manually from db.t3.medium to db.t3.large | What happens on next terraform apply? Is that safe? |
| Someone deletes a resource that Terraform manages | What does Terraform do? Recreate or error? |
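
As a hint for the auto-scaling scenario, one common pattern is to tell Terraform which attributes are expected to change outside of code. The sketch below is only an illustration: the resource name, sizes, and the two input variables are hypothetical.

```hcl
# Hypothetical inputs; the names are illustrative assumptions
variable "subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Scaling policies change desired_capacity at runtime; ignoring it
    # stops terraform plan from treating those changes as drift
    ignore_changes = [desired_capacity]
  }
}
```

Here Terraform manages the group and its limits, while the individual instances and the current capacity are left to the auto-scaler. Consider whether that split counts as drift in your own analysis.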

Bonus: Write a cron job or CI schedule that runs terraform plan daily and alerts on drift.


Conclusion & Next Steps

Infrastructure as Code fundamentally transforms how we manage infrastructure — from manual, error-prone processes to automated, version-controlled, reviewable workflows. The key takeaways from this article:

Key Takeaways:
  • IaC eliminates snowflakes — infrastructure is reproducible, consistent, and documented as code
  • Declarative > Imperative for most infrastructure provisioning (idempotent, drift-resistant)
  • Terraform is the most widely adopted tool for multi-cloud declarative IaC, with a large provider and module ecosystem
  • State is sacred — store remotely, lock it, encrypt it, version it
  • Modules enable reuse — package patterns once, deploy everywhere
  • CI/CD for infrastructure — plan on PR, apply on merge, test continuously
  • Prevent drift — enforce no-manual-change policies and monitor for deviations

Next in the Series

In Part 9: Terraform Fundamentals, we take a deep dive into Terraform's HCL language — providers, resources, data sources, locals, expressions, functions, and real-world deployment patterns. You'll build complete infrastructure from scratch across AWS and Azure.