Part 15: Advanced Terraform Patterns

Beyond the Basics

You have written your first Terraform configurations, provisioned a VPC, launched an EC2 instance, and set up an S3 bucket. Your terraform apply works flawlessly for a single environment with 20 resources. Then reality hits: your organization needs the same infrastructure across development, staging, and production, in multiple regions, managed by multiple teams, with policy enforcement and cost controls.

This is the complexity cliff — the point where basic Terraform patterns break down under the weight of enterprise requirements. Managing 50+ resources across 3+ environments in multiple regions with team-level access controls demands a fundamentally different approach to Infrastructure as Code.

                            
Key Insight: Advanced Terraform is not about writing more complex HCL. It is about organizing configurations for maintainability, composing modules for reusability, splitting state for safety, and automating workflows for consistency. The goal is less code, fewer risks, and faster deployments.
                        

When Basic Terraform Isn't Enough

Signs you have outgrown basic Terraform patterns:

Copy-paste between environments — Duplicating entire directory trees for dev/staging/prod
Monolithic state files — A single state file with 500+ resources and 10-minute plan times
Module spaghetti — Deeply nested modules with unclear ownership and undocumented interfaces
Manual state surgery — Regularly running terraform state mv because refactoring breaks addresses
Blast radius anxiety — Every terraform apply touches resources owned by other teams
Drift everywhere — No mechanism to detect or prevent manual changes in the console

Terraform Maturity Journey

flowchart LR
    A[Single File<br/>1-20 Resources] --> B[Modules<br/>20-100 Resources]
    B --> C[Workspaces<br/>Multi-Environment]
    C --> D[State Splitting<br/>Team Boundaries]
    D --> E[Terragrunt<br/>DRY at Scale]
    E --> F[Enterprise<br/>Policy + Governance]

    style A fill:#f8f9fa,stroke:#3B9797
    style B fill:#f8f9fa,stroke:#3B9797
    style C fill:#f8f9fa,stroke:#16476A
    style D fill:#f8f9fa,stroke:#16476A
    style E fill:#f8f9fa,stroke:#132440
    style F fill:#f8f9fa,stroke:#BF092F

This article equips you with the patterns, tools, and techniques to navigate each stage of this journey confidently.

Workspaces

Terraform workspaces provide isolated state within a single configuration directory and backend. Each workspace keeps its own state file (with the local backend, non-default workspaces are stored under terraform.tfstate.d/<workspace>/), so you can deploy the same configuration to multiple environments without duplicating code.

# Create and manage workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

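# List all workspaces (the current one is marked with *)
terraform workspace list
# Output (workspace new also switches to the new workspace, so prod is current):
#   default
#   dev
# * prod
#   staging

# Switch workspace
terraform workspace select prod

# Delete a workspace (it must not be the current one, and its state must be
# empty unless you pass -force)
terraform workspace delete dev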
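Workspaces do not change which backend you use. They only change where each state lands inside that backend. The following is a minimal sketch that assumes an S3 backend and a placeholder bucket name. The `workspace_key_prefix` setting controls that location:

```hcl
terraform {
  backend "s3" {
    bucket = "myorg-terraform-state" # placeholder bucket name
    key    = "app/terraform.tfstate"
    region = "us-east-1"

    # The default workspace uses the key above as-is. Every other workspace is
    # stored at <workspace_key_prefix>/<workspace>/<key>, for example
    # workspaces/prod/app/terraform.tfstate. The prefix defaults to "env:".
    workspace_key_prefix = "workspaces"
  }
}
```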

Using terraform.workspace for Environment Logic

The terraform.workspace expression returns the name of the currently selected workspace, so you can branch configuration logic per environment:

# locals.tf - Environment-specific sizing
locals {
  environment_config = {
    dev = {
      instance_type  = "t3.small"
      instance_count = 1
      db_instance    = "db.t3.micro"
      multi_az       = false
    }
    staging = {
      instance_type  = "t3.medium"
      instance_count = 2
      db_instance    = "db.t3.small"
      multi_az       = false
    }
    prod = {
      instance_type  = "t3.large"
      instance_count = 3
      db_instance    = "db.r5.large"
      multi_az       = true
    }
  }

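  # This lookup fails with an "Invalid index" error in any workspace that has
  # no entry above, including "default"
  config = local.environment_config[terraform.workspace]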
}

# main.tf - Using workspace-driven configuration
resource "aws_instance" "app" {
  count         = local.config.instance_count
  instance_type = local.config.instance_type
  ami           = data.aws_ami.ubuntu.id

  tags = {
    Name        = "app-${terraform.workspace}-${count.index}"
    Environment = terraform.workspace
  }
}

resource "aws_db_instance" "main" {
  instance_class    = local.config.db_instance
  multi_az          = local.config.multi_az
  identifier        = "db-${terraform.workspace}"
  allocated_storage = terraform.workspace == "prod" ? 100 : 20
}

When Workspaces Work (and When They Don't)

Aspect	Workspaces	Directory-Based Environments
State isolation	Separate state files, same backend	Completely separate backends possible
Configuration drift	Environments always share same code	Environments can diverge (risky)
Access control	Difficult to restrict per-workspace	Separate repos/dirs = separate permissions
Best for	Identical infra, only sizing differs	Environments with structural differences
CI/CD complexity	Single pipeline with workspace variable	Separate pipelines per directory
Visibility	Easy to forget which workspace is active	File path makes environment obvious

                            
Warning: The most dangerous workspace anti-pattern is forgetting which workspace you are in. Running terraform destroy in production because you thought you were in dev is a career-defining moment. Always verify with terraform workspace show before destructive operations, and implement CI/CD that selects workspaces automatically.
                        

Advanced Module Patterns

Modules are Terraform's unit of reuse, but poorly designed modules create more problems than they solve. Advanced module patterns focus on composition over inheritance, clear interfaces, and testability.

Module Composition Architecture

flowchart TD
    Root[Root Module<br/>environments/prod] --> Net[Network Module<br/>v2.1.0]
    Root --> Compute[Compute Module<br/>v1.4.0]
    Root --> Data[Database Module<br/>v3.0.0]
    Root --> Monitor[Monitoring Module<br/>v1.2.0]

    Net --> VPC[VPC Submodule]
    Net --> SG[Security Groups]
    Net --> LB[Load Balancer]

    Compute --> ASG[Auto Scaling Group]
    Compute --> Launch[Launch Template]

    Data --> RDS[RDS Instance]
    Data --> Redis[ElastiCache]

    Monitor --> CW[CloudWatch]
    Monitor --> Alert[Alerting Rules]

    style Root fill:#132440,color:#fff
    style Net fill:#3B9797,color:#fff
    style Compute fill:#3B9797,color:#fff
    style Data fill:#3B9797,color:#fff
    style Monitor fill:#3B9797,color:#fff

Module Versioning and Version Constraints

# Using versioned modules from a private registry
module "vpc" {
  source  = "app.terraform.io/myorg/vpc/aws"
  version = "~> 2.1"  # >= 2.1.0, < 3.0.0

  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  enable_nat_gateway = true
}

# Git source with tag constraint
module "monitoring" {
  source = "git::https://github.com/myorg/terraform-aws-monitoring.git?ref=v1.2.0"

  alarm_sns_topic = aws_sns_topic.alerts.arn
  environment     = var.environment
}

# Local module for project-specific logic
module "app_config" {
  source = "../../modules/app-config"

  app_name    = var.app_name
  environment = var.environment
  secrets     = var.app_secrets
}
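The constraints above are set by the caller. A reusable module can also declare what it needs. A common convention is for shared modules to set only minimum versions and to leave tighter pinning to root modules. Here is a sketch using a hypothetical module path:

```hcl
# modules/app-config/versions.tf (hypothetical layout)
terraform {
  # Oldest Terraform release the module is tested against
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      # Lower bound only, so each root module chooses its own upper pin
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}
```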

Module Testing with Terraform Test Framework

Terraform 1.6+ introduced a native test framework using .tftest.hcl files:

# tests/vpc.tftest.hcl - Module integration test
variables {
  cidr_block         = "10.99.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  environment        = "test"
}

run "creates_vpc_with_correct_cidr" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == "10.99.0.0/16"
    error_message = "VPC CIDR block does not match expected value"
  }
}

run "creates_expected_subnet_count" {
  command = plan

  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Expected 2 private subnets, got ${length(aws_subnet.private)}"
  }

  assert {
    condition     = length(aws_subnet.public) == 2
    error_message = "Expected 2 public subnets"
  }
}

run "full_apply_and_verify" {
  command = apply

  assert {
    condition     = aws_vpc.main.enable_dns_hostnames == true
    error_message = "DNS hostnames should be enabled"
  }
}

# Run Terraform tests
terraform test

# Run with verbose output
terraform test -verbose

# Run specific test file
terraform test -filter=tests/vpc.tftest.hcl
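The run blocks above talk to the real AWS provider, even for command = plan. Terraform 1.7 added mock providers for test files. The following is a minimal sketch that assumes the same VPC module and variables as the earlier test file:

```hcl
# tests/vpc_mocked.tftest.hcl - plan-only checks without cloud credentials
# mock_provider stands in for the real aws provider for every run in this
# file, so no AWS API calls are made
mock_provider "aws" {}

variables {
  cidr_block         = "10.99.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  environment        = "test"
}

run "plans_without_credentials" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == "10.99.0.0/16"
    error_message = "VPC CIDR block does not match expected value"
  }
}
```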

Module Design Patterns

Pattern	Purpose	Example	When to Use
Wrapper Module	Opinionated defaults over generic module	Company VPC module wrapping community VPC	Enforce org standards while using community modules
Composition Module	Orchestrates multiple smaller modules	"Web App" combining VPC + ALB + ECS + RDS	Common deployment patterns used by many teams
Utility Module	Computes values without creating resources	CIDR calculator, naming convention generator	Reusable logic needed by multiple modules
Service Module	Encapsulates one application's infrastructure	All resources for "payment-service"	Team-owned service with clear boundaries
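To make the Utility Module row concrete, here is a minimal sketch of a hypothetical naming module. It creates no resources and only computes values:

```hcl
# modules/naming/main.tf - hypothetical utility module (no resources)
variable "project" {
  type = string
}

variable "environment" {
  type = string
}

variable "component" {
  type = string
}

locals {
  # Single place that defines the org-wide naming convention
  prefix = lower("${var.project}-${var.environment}-${var.component}")
}

output "prefix" {
  value = local.prefix
}

# Caller side (e.g. in environments/prod), shown as comments:
# module "naming" {
#   source      = "../../modules/naming"
#   project     = "shop"
#   environment = "prod"
#   component   = "api"
# }
# bucket = "${module.naming.prefix}-logs"   # → "shop-prod-api-logs"
```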

                            
The Diamond Problem: When Module A and Module B both depend on Module C (e.g., both create security groups in the same VPC), you get conflicting resource creation. Solve this by extracting shared resources to a higher level and passing references down as input variables rather than letting each module create its own.
                        
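Here is a minimal sketch of that fix. It assumes a hypothetical ./modules/service module that accepts a security_group_id input instead of creating its own group:

```hcl
# Root module owns the shared security group exactly once
resource "aws_security_group" "shared_app" {
  name   = "shared-app-sg"
  vpc_id = module.vpc.vpc_id # assumes the vpc module exports vpc_id
}

# Both consumers receive a reference rather than creating a duplicate group
module "service_a" {
  source            = "./modules/service" # hypothetical module
  name              = "service-a"
  security_group_id = aws_security_group.shared_app.id
}

module "service_b" {
  source            = "./modules/service"
  name              = "service-b"
  security_group_id = aws_security_group.shared_app.id
}
```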

Terragrunt

Terragrunt is a thin wrapper around Terraform that provides extra tools for keeping configurations DRY (Don't Repeat Yourself), managing remote state, and orchestrating multi-module deployments. It shines when you have the same Terraform module deployed across many environments and regions.

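# terragrunt.hcl (root) - Shared configuration for all environments
# This file lives at the repo root and is inherited by all child configs

# The provider template below needs these locals. Included configs are
# evaluated from the child's directory, so these lookups find the env.hcl
# and region.hcl above each module.
locals {
  env_vars    = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  environment = local.env_vars.locals.environment
  region      = local.region_vars.locals.region
  project     = "myproject" # placeholder project name
}
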
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "myorg-terraform-state-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.region}"
  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = "${local.environment}"
      Project     = "${local.project}"
    }
  }
}
EOF
}

Multi-Environment Directory Structure

# Recommended Terragrunt directory structure
infrastructure/
├── terragrunt.hcl              # Root config (remote state, provider generation)
├── _envcommon/                  # Shared module configurations
│   ├── vpc.hcl
│   ├── eks.hcl
│   └── rds.hcl
├── dev/
│   ├── env.hcl                 # Environment-specific variables
│   ├── us-east-1/
│   │   ├── region.hcl          # Region-specific variables
│   │   ├── vpc/
│   │   │   └── terragrunt.hcl  # Includes _envcommon/vpc.hcl
│   │   ├── eks/
│   │   │   └── terragrunt.hcl
│   │   └── rds/
│   │       └── terragrunt.hcl
│   └── eu-west-1/
│       ├── region.hcl
│       └── vpc/
│           └── terragrunt.hcl
├── staging/
│   ├── env.hcl
│   └── us-east-1/
│       └── ...
└── prod/
    ├── env.hcl
    ├── us-east-1/
    │   └── ...
    └── eu-west-1/
        └── ...
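The env.hcl and region.hcl files in that tree only need to define locals, which configs read with read_terragrunt_config(). Here is a minimal sketch with illustrative values:

```hcl
# dev/env.hcl
locals {
  environment = "dev"
}

# dev/us-east-1/region.hcl (a separate file; shown together for brevity)
locals {
  region = "us-east-1"
}
```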

Dependencies and run-all

# dev/us-east-1/eks/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

include "envcommon" {
  path   = "${dirname(find_in_parent_folders())}/_envcommon/eks.hcl"
  expose = true
}

locals {
  env_vars    = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  environment = local.env_vars.locals.environment
  region      = local.region_vars.locals.region
}

# Declare dependencies - EKS needs VPC to exist first
dependency "vpc" {
  config_path = "../vpc"

  mock_outputs = {
    vpc_id          = "vpc-mock-12345"
    private_subnets = ["subnet-mock-1", "subnet-mock-2"]
  }
}

inputs = {
  cluster_name    = "eks-${local.environment}-${local.region}"
  vpc_id          = dependency.vpc.outputs.vpc_id
  subnet_ids      = dependency.vpc.outputs.private_subnets
  cluster_version = "1.29"
  node_groups = {
    default = {
      instance_types = local.environment == "prod" ? ["m5.xlarge"] : ["t3.medium"]
      min_size       = local.environment == "prod" ? 3 : 1
      max_size       = local.environment == "prod" ? 10 : 3
      desired_size   = local.environment == "prod" ? 3 : 1
    }
  }
}
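The envcommon include above points at a shared file that typically holds the module source and the inputs common to every environment. The following sketch assumes a hypothetical modules repository and input name:

```hcl
# _envcommon/eks.hcl - settings shared by every environment's EKS deployment
terraform {
  # Hypothetical repository; "//eks" selects a subdirectory and ?ref pins a tag
  source = "git::https://github.com/myorg/infrastructure-modules.git//eks?ref=v1.0.0"
}

# Inputs common to all environments. Under Terragrunt's default include
# merge, keys in the child's own inputs block take precedence over these.
inputs = {
  enable_irsa = true # hypothetical module input
}
```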

# Apply all modules in correct dependency order
cd infrastructure/dev/us-east-1
terragrunt run-all apply

# Plan all modules with dependency graph
terragrunt run-all plan

# Destroy in reverse dependency order
terragrunt run-all destroy

# Apply a single module: terragrunt apply reads the vpc outputs from state
# but does not apply vpc itself, so apply vpc first (or use run-all above)
cd infrastructure/dev/us-east-1/eks
terragrunt apply

Feature	Raw Terraform	Terragrunt
Backend configuration	Copy-paste backend blocks in every module	Generated automatically from root config
Provider configuration	Repeated in every environment directory	Generated from templates with variables
Cross-module dependencies	Manual terraform_remote_state data sources	Declarative dependency blocks with mock outputs
Multi-module operations	Manual scripts or Makefiles	Built-in run-all with parallelism
Environment differences	tfvars files or workspaces	Hierarchical variable inheritance
Learning curve	Just HCL	HCL + Terragrunt-specific concepts

Advanced State Management

State is Terraform's most critical and fragile component. Advanced state management focuses on reducing blast radius, enabling team autonomy, and facilitating safe refactoring.

State Splitting Architecture

flowchart TD
    subgraph "Monolithic State (Before)"
        Mono[Single State File<br/>500+ Resources<br/>All Teams]
    end

    subgraph "Split State (After)"
        Net[Network State<br/>VPC, Subnets, NAT<br/>Platform Team]
        Comp[Compute State<br/>EKS, ASG, ALB<br/>Platform Team]
        App1[App A State<br/>Services, DBs<br/>Team Alpha]
        App2[App B State<br/>Services, Queues<br/>Team Beta]
        Shared[Shared State<br/>IAM, DNS, KMS<br/>Security Team]
    end

    Mono --> Net
    Mono --> Comp
    Mono --> App1
    Mono --> App2
    Mono --> Shared

    Net -.->|remote_state| Comp
    Net -.->|remote_state| App1
    Shared -.->|remote_state| App1
    Shared -.->|remote_state| App2

    style Mono fill:#BF092F,color:#fff
    style Net fill:#3B9797,color:#fff
    style Comp fill:#3B9797,color:#fff
    style App1 fill:#16476A,color:#fff
    style App2 fill:#16476A,color:#fff
    style Shared fill:#132440,color:#fff

State Operations

# Move a resource to a different address (refactoring)
terraform state mv aws_instance.app aws_instance.web_server

# Move a resource into a module
terraform state mv aws_s3_bucket.logs module.logging.aws_s3_bucket.logs

# Remove a resource from state (without destroying it)
terraform state rm aws_iam_role.legacy_role

# Import existing infrastructure into state
terraform import aws_instance.imported i-1234567890abcdef0

# List all resources in current state
terraform state list

# Show details of a specific resource
terraform state show aws_instance.web_server

Moved Blocks (Terraform 1.1+)

Moved blocks let you refactor resource addresses without manual state surgery:

# Rename a resource - Terraform handles the state move automatically
moved {
  from = aws_instance.app
  to   = aws_instance.web_server
}

# Move a resource into a module
moved {
  from = aws_security_group.app_sg
  to   = module.networking.aws_security_group.app
}

# Rename a module
moved {
  from = module.old_name
  to   = module.new_name
}

# Move from count to for_each
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["us-east-1a"]
}

moved {
  from = aws_subnet.private[1]
  to   = aws_subnet.private["us-east-1b"]
}

Import Blocks (Terraform 1.5+)

Import blocks declare imports in configuration. The import then goes through the normal plan and apply review instead of a one-off terraform import command:

# imports.tf - Declare resources to import
import {
  to = aws_instance.legacy_server
  id = "i-0abc123def456789"
}

import {
  to = aws_vpc.existing
  id = "vpc-0123456789abcdef0"
}

import {
  to = aws_s3_bucket.logs
  id = "my-company-logs-bucket"
}

# Generate HCL for import targets that have no resource block yet
# (Terraform will not overwrite an existing file at this path)
terraform plan -generate-config-out=generated_imports.tf

# Review generated config, refine it, then apply
terraform apply
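Moved blocks only work within one state. Splitting a monolithic state (as in the diagram earlier) is a different task. On Terraform 1.7+, one approach that avoids terraform state mv surgery is to pair a removed block in the old configuration with an import block in the new one. The following is a minimal sketch that assumes a VPC is moving into a separate network configuration and uses a placeholder ID:

```hcl
# In the OLD (monolithic) configuration, after deleting the resource block:
# drop the VPC from this state without destroying the real object
removed {
  from = aws_vpc.main

  lifecycle {
    destroy = false
  }
}

# In the NEW network configuration, next to a matching
# resource "aws_vpc" "main" block: adopt the same VPC by its ID
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0" # placeholder ID
}
```

Review both plans before applying. The new configuration should show only an import, and the old one should show only a removal from state.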

Cross-State References

# In the consuming module - reference network state outputs
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "myorg-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  subnet_id         = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  vpc_security_group_ids = [
    data.terraform_remote_state.network.outputs.app_security_group_id
  ]
  instance_type = "t3.medium"
  ami           = data.aws_ami.ubuntu.id
}

# Alternative: Use data sources instead of remote_state for looser coupling
data "aws_vpc" "main" {
  tags = {
    Name        = "main-vpc"
    Environment = var.environment
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
  tags = {
    Tier = "private"
  }
}
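terraform_remote_state only exposes the root-module outputs of the producing configuration. The following sketch shows the outputs the network configuration would need for the consumer above. It assumes that configuration defines aws_subnet.private and aws_security_group.app:

```hcl
# network/outputs.tf - exported for terraform_remote_state consumers
output "private_subnet_ids" {
  # A for expression works whether the subnets use count (list) or for_each (map)
  value = [for s in aws_subnet.private : s.id]
}

output "app_security_group_id" {
  value = aws_security_group.app.id
}
```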

                            
Best Practice: Prefer data source lookups over terraform_remote_state when possible. Data sources create looser coupling between state files: the consuming module does not need to know the exact backend configuration of the producing module. Use tags or naming conventions as the coupling mechanism.
                        

Dynamic Blocks & Meta-Arguments

Dynamic blocks eliminate repetitive nested block definitions, while meta-arguments (for_each, count, depends_on, lifecycle) give you fine-grained control over resource creation and behavior.

Dynamic Blocks

# With dynamic blocks - one ingress/egress block is generated per rule in the variables
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id

  # Dynamic block for ingress rules
  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.from_port
      to_port     = ingress.value.to_port
      protocol    = ingress.value.protocol
      cidr_blocks = ingress.value.cidr_blocks
      description = ingress.value.description
    }
  }

  # Dynamic block for egress rules
  dynamic "egress" {
    for_each = var.egress_rules
    content {
      from_port   = egress.value.from_port
      to_port     = egress.value.to_port
      protocol    = egress.value.protocol
      cidr_blocks = egress.value.cidr_blocks
    }
  }
}

# Variable definition for the rules
variable "ingress_rules" {
  type = list(object({
    from_port   = number
    to_port     = number
    protocol    = string
    cidr_blocks = list(string)
    description = string
  }))
  default = [
    {
      from_port   = 443
      to_port     = 443
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTPS from anywhere"
    },
    {
      from_port   = 80
      to_port     = 80
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTP from anywhere"
    }
  ]
}
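The security group also iterates over var.egress_rules, which the snippet does not declare. Here is a matching declaration, with an allow-all default as an illustrative assumption:

```hcl
variable "egress_rules" {
  type = list(object({
    from_port   = number
    to_port     = number
    protocol    = string
    cidr_blocks = list(string)
  }))
  default = [
    {
      # Protocol "-1" means all protocols; from_port and to_port must then be 0
      from_port   = 0
      to_port     = 0
      protocol    = "-1"
      cidr_blocks = ["0.0.0.0/0"]
    }
  ]
}
```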

for_each with Maps and Sets

# for_each with a map - each instance gets meaningful key
variable "services" {
  type = map(object({
    container_port = number
    cpu            = number
    memory         = number
    desired_count  = number
  }))
  default = {
    api = {
      container_port = 8080
      cpu            = 512
      memory         = 1024
      desired_count  = 3
    }
    worker = {
      container_port = 9090
      cpu            = 1024
      memory         = 2048
      desired_count  = 2
    }
    scheduler = {
      container_port = 8081
      cpu            = 256
      memory         = 512
      desired_count  = 1
    }
  }
}

resource "aws_ecs_service" "services" {
  for_each = var.services

  name            = "${var.project}-${each.key}"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.services[each.key].arn
  desired_count   = each.value.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.services[each.key].id]
  }
}
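The same heading also covers sets. for_each accepts a map or a set of strings, so a plain list must first be converted with toset(). Here is a sketch with a hypothetical variable:

```hcl
variable "log_bucket_suffixes" {
  type    = list(string)
  default = ["access-logs", "audit-logs"]
}

resource "aws_s3_bucket" "service_logs" {
  # for_each rejects lists, so convert to a set of strings
  for_each = toset(var.log_bucket_suffixes)

  # With a set, each.key and each.value are the same string
  bucket = "${var.project}-${each.value}"
}
```

Instances are addressed as aws_s3_bucket.service_logs["access-logs"], so removing one suffix leaves the other bucket untouched.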

Count vs for_each Decision Guide

Criteria	count	for_each
Resource addressing	`resource[0]`, `resource[1]`	`resource["name"]`
Removing middle item	Shifts all indexes (destroys/recreates)	Only removes that key (safe)
Conditional creation	`count = var.enabled ? 1 : 0`	Possible but verbose
Readability	Good for identical copies	Good for distinct instances
Use when	Toggle on/off, N identical copies	Collection of distinct items
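For the Conditional creation row, the for_each form is a conditional that chooses between a one-element set and an empty set. Here is a sketch with a hypothetical enable_debug_logs variable:

```hcl
variable "enable_debug_logs" {
  type    = bool
  default = false
}

resource "aws_cloudwatch_log_group" "debug" {
  # One instance keyed "enabled" when true, no instances when false
  for_each = var.enable_debug_logs ? toset(["enabled"]) : toset([])

  name              = "/app/debug"
  retention_in_days = 7
}
```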

Lifecycle Rules

# lifecycle meta-argument examples
resource "aws_instance" "critical" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.large"

  lifecycle {
    # Reject any plan that would destroy this instance, including a replacement.
    # With the replacement settings below, a replacement fails until
    # prevent_destroy is temporarily set to false.
    prevent_destroy = true

    # Ignore changes made outside Terraform so apply does not revert them
    ignore_changes = [
      tags["LastModified"],
      instance_type, # Manual resizing persists; Terraform will not revert it
    ]

    # Create the replacement before destroying the old instance (reduces downtime)
    create_before_destroy = true

    # Replace this instance whenever the launch template's latest version changes
    replace_triggered_by = [
      aws_launch_template.app.latest_version
    ]
  }
}

# Conditional resource creation
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  count = var.enable_monitoring ? 1 : 0

  alarm_name  = "${var.project}-high-cpu"
  namespace   = "AWS/EC2"
  metric_name = "CPUUtilization"
  threshold   = 80
  period      = 300

  alarm_actions = [var.sns_topic_arn]
}
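Terraform 1.2+ also accepts precondition and postcondition blocks inside lifecycle, which check assumptions during plan and apply. The following is a minimal sketch that assumes the data.aws_ami.ubuntu lookup used throughout this article:

```hcl
resource "aws_instance" "validated" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.large"

  lifecycle {
    # Fail the plan early if the AMI lookup returned an image t3 cannot run
    precondition {
      condition     = data.aws_ami.ubuntu.architecture == "x86_64"
      error_message = "The selected AMI must be x86_64 to run on t3 instance types."
    }
  }
}
```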

Multi-Region Deployments

Multi-region infrastructure provides disaster recovery, reduced latency for global users, and compliance with data residency requirements. Terraform handles multi-region through provider aliases — multiple instances of the same provider targeting different regions.

Multi-Region Architecture

flowchart TD
    subgraph "Global Resources"
        R53[Route 53<br/>DNS Failover]
        CF[CloudFront<br/>CDN]
        IAM[IAM Roles<br/>Global]
    end

    subgraph "US-East-1 (Primary)"
        VPC1[VPC]
        EKS1[EKS Cluster]
        RDS1[(RDS Primary)]
        S3_1[S3 Bucket]
    end

    subgraph "EU-West-1 (Secondary)"
        VPC2[VPC]
        EKS2[EKS Cluster]
        RDS2[(RDS Read Replica)]
        S3_2[S3 Bucket]
    end

    R53 --> VPC1
    R53 --> VPC2
    CF --> VPC1
    CF --> VPC2
    RDS1 -.->|Replication| RDS2
    S3_1 -.->|Cross-Region Replication| S3_2

    style R53 fill:#132440,color:#fff
    style CF fill:#132440,color:#fff
    style IAM fill:#132440,color:#fff
    style VPC1 fill:#3B9797,color:#fff
    style VPC2 fill:#16476A,color:#fff

Provider Aliases

# providers.tf - Multi-region provider configuration
provider "aws" {
  region = "us-east-1"
  alias  = "primary"

  default_tags {
    tags = {
      Region      = "us-east-1"
      Environment = var.environment
    }
  }
}

provider "aws" {
  region = "eu-west-1"
  alias  = "secondary"

  default_tags {
    tags = {
      Region      = "eu-west-1"
      Environment = var.environment
    }
  }
}

# Global provider (us-east-1 for global services like CloudFront, Route53)
provider "aws" {
  region = "us-east-1"
  alias  = "global"
}

# main.tf - Multi-region module instances
module "primary_region" {
  source = "./modules/regional-stack"
  providers = {
    aws = aws.primary
  }

  region            = "us-east-1"
  vpc_cidr          = "10.0.0.0/16"
  cluster_name      = "eks-primary"
  is_primary        = true
  db_instance_class = "db.r5.xlarge"
}

module "secondary_region" {
  source = "./modules/regional-stack"
  providers = {
    aws = aws.secondary
  }

  region                = "eu-west-1"
  vpc_cidr              = "10.1.0.0/16"
  cluster_name          = "eks-secondary"
  is_primary            = false
  db_instance_class     = "db.r5.large"
  primary_db_arn        = module.primary_region.db_arn
  enable_read_replica   = true
}

# Global DNS failover
resource "aws_route53_health_check" "primary" {
  provider = aws.global

  fqdn              = module.primary_region.alb_dns_name
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "app" {
  provider = aws.global

  zone_id = var.hosted_zone_id
  name    = "app.example.com"
  type    = "A"

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  alias {
    name                   = module.primary_region.alb_dns_name
    zone_id                = module.primary_region.alb_zone_id
    evaluate_target_health = true
  }
}
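A failover pair also needs a SECONDARY record for the same name and type. The following sketch assumes the regional-stack module also exposes alb_dns_name and alb_zone_id for the secondary region:

```hcl
resource "aws_route53_record" "app_secondary" {
  provider = aws.global

  zone_id = var.hosted_zone_id
  name    = "app.example.com"
  type    = "A"

  failover_routing_policy {
    type = "SECONDARY"
  }

  set_identifier = "secondary"

  # Answers queries when the primary record is considered unhealthy
  alias {
    name                   = module.secondary_region.alb_dns_name
    zone_id                = module.secondary_region.alb_zone_id
    evaluate_target_health = true
  }
}
```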

Global vs Regional Resources

Global (Deploy Once)	Regional (Deploy Per-Region)	Replicated (Sync Across Regions)
Route 53 Hosted Zones	VPCs & Subnets	S3 Buckets (CRR)
CloudFront Distributions	EKS/ECS Clusters	DynamoDB Global Tables
IAM Roles & Policies	RDS Instances	ECR (Cross-Region Replication)
AWS Organizations	Load Balancers	Secrets Manager (Replica)
WAF (Global scope)	Security Groups	KMS Multi-Region Keys
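Replicated resources often need two providers inside one module, one for the source region and one for the replica region. A module declares such provider slots with configuration_aliases. The following sketch shows a hypothetical replicated-bucket module and its caller, reusing the primary and secondary aliases defined earlier:

```hcl
# modules/replicated-bucket/versions.tf (hypothetical module)
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      configuration_aliases = [aws.source, aws.replica]
    }
  }
}

# modules/replicated-bucket/main.tf
variable "name" {
  type = string
}

resource "aws_s3_bucket" "source" {
  provider = aws.source
  bucket   = "${var.name}-source"
}

resource "aws_s3_bucket" "replica" {
  provider = aws.replica
  bucket   = "${var.name}-replica"
}

# Root module: map the module's provider slots to the aliased providers
module "replicated_logs" {
  source = "./modules/replicated-bucket"
  name   = "myorg-logs" # hypothetical bucket name prefix

  providers = {
    aws.source  = aws.primary
    aws.replica = aws.secondary
  }
}
```

The replication rule itself (versioning, aws_s3_bucket_replication_configuration, and its IAM role) is omitted to keep the focus on provider wiring.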

Custom & Community Providers

While HashiCorp and major cloud providers maintain official providers, you may need custom providers for internal APIs, legacy systems, or niche services. The Terraform Plugin Framework, the recommended successor to the older SDKv2, makes building custom providers more accessible.

# required_providers - Mixing official and community providers
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.25"
    }
    datadog = {
      source  = "DataDog/datadog"
      version = "~> 3.30"
    }
    grafana = {
      source  = "grafana/grafana"
      version = "~> 2.5"
    }
    # Partner provider for PagerDuty
    pagerduty = {
      source  = "PagerDuty/pagerduty"
      version = "~> 3.0"
    }
  }
}

Provider Tier	Maintained By	Examples	Trust Level
Official	HashiCorp	aws, azurerm, google, kubernetes	Highest — rigorous testing, SLA-backed
Partner	Technology partners	datadog, pagerduty, cloudflare, grafana	High — vendor-maintained, reviewed by HashiCorp
Community	Individual maintainers	Various niche tools and services	Variable — review code, check maintenance
Custom (Internal)	Your organization	Internal APIs, legacy systems	You own it — full control and responsibility

                            
When to Build a Custom Provider: Consider a custom provider when you have an internal service with a REST API that multiple teams need to configure declaratively, when manual console clicks are causing drift, or when you need Terraform's plan/apply lifecycle for a system that has no existing provider. For one-off integrations, a terraform_data resource (Terraform 1.4+) or a null_resource with a local-exec provisioner may suffice.
                        
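For the one-off case, here is a sketch that uses terraform_data (Terraform 1.4+) with a local-exec provisioner. The registration URL and var.service_version are hypothetical:

```hcl
variable "service_version" {
  type = string
}

resource "terraform_data" "register_service" {
  # Changing the version replaces this resource, which re-runs the provisioner
  triggers_replace = [var.service_version]

  provisioner "local-exec" {
    # Hypothetical internal endpoint; curl -f turns HTTP errors into a
    # non-zero exit code, which fails the apply
    command = "curl -fsS -X POST https://internal.example.com/api/register -d version=${var.service_version}"
  }
}
```

Unlike a provider, this offers no read step or drift detection. Terraform only records that the command ran, which is why it suits one-off calls rather than ongoing configuration management.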

Enterprise Patterns

Enterprise-scale Terraform requires governance, policy enforcement, cost control, and team collaboration tooling that goes beyond what open-source Terraform provides alone. Terraform Cloud and Enterprise add these capabilities as a managed platform.

Enterprise Terraform Workflow

flowchart TD
    Dev[Developer<br/>Writes HCL] --> PR[Pull Request<br/>Code Review]
    PR --> Speculative[Speculative Plan<br/>PR Comment]
    PR --> Sentinel[Sentinel Policy Check<br/>Compliance Gate]
    PR --> Cost[Cost Estimation<br/>Budget Check]

    Sentinel -->|Pass| Approve[Manual Approval<br/>Required for Prod]
    Cost -->|Under Budget| Approve
    Sentinel -->|Fail| Block[PR Blocked<br/>Policy Violation]

    Approve --> Apply[Terraform Apply<br/>State Lock + Audit]
    Apply --> Notify[Notifications<br/>Slack + Email]
    Apply --> Drift[Drift Detection<br/>Scheduled Scans]

    style Dev fill:#f8f9fa,stroke:#3B9797
    style Sentinel fill:#132440,color:#fff
    style Cost fill:#16476A,color:#fff
    style Block fill:#BF092F,color:#fff
    style Apply fill:#3B9797,color:#fff

Policy as Code with Sentinel

# sentinel/restrict-instance-types.sentinel
# Policy: Only allow approved EC2 instance types
import "tfplan/v2" as tfplan

allowed_instance_types = [
  "t3.micro", "t3.small", "t3.medium", "t3.large",
  "m5.large", "m5.xlarge", "m5.2xlarge",
  "r5.large", "r5.xlarge",
]

ec2_instances = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_instance" and
  rc.mode is "managed" and
  (rc.change.actions contains "create" or rc.change.actions contains "update")
}

instance_type_allowed = rule {
  all ec2_instances as _, instance {
    instance.change.after.instance_type in allowed_instance_types
  }
}

main = rule {
  instance_type_allowed
}

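# sentinel/enforce-tags.sentinel
# Policy: Created or updated resources that set a tags map must carry the required tags
# (tags inherited from the AWS provider's default_tags appear in tags_all, not tags)
import "tfplan/v2" as tfplan

required_tags = ["Environment", "Team", "CostCenter", "ManagedBy"]

# Check the action first: destroyed resources have no "after" values to inspect
taggable_resources = filter tfplan.resource_changes as _, rc {
  rc.mode is "managed" and
  (rc.change.actions contains "create" or rc.change.actions contains "update") and
  rc.change.after contains "tags" and
  rc.change.after.tags is not null
}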

all_tags_present = rule {
  all taggable_resources as _, resource {
    all required_tags as tag {
      resource.change.after.tags contains tag
    }
  }
}

main = rule {
  all_tags_present
}

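# sentinel/restrict-regions.sentinel
# Policy: Only allow resources in approved regions
# Note: only resources whose planned values include a "region" attribute are
# checked; most AWS resources inherit their region from the provider block,
# which this filter does not inspect.
import "tfplan/v2" as tfplan

approved_regions = ["us-east-1", "us-west-2", "eu-west-1"]

regional_resources = filter tfplan.resource_changes as _, rc {
  (rc.change.actions contains "create" or rc.change.actions contains "update") and
  rc.change.after contains "region" and
  rc.change.after.region is not null
}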

region_allowed = rule {
  all regional_resources as _, resource {
    resource.change.after.region in approved_regions
  }
}

main = rule {
  region_allowed
}
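Terraform Cloud reads a policy set's configuration from a sentinel.hcl file. That file is also where each policy's enforcement level is set. The following sketch assumes the three policies above sit next to this file in the policy repository. The chosen levels are illustrative:

```hcl
# sentinel.hcl - policy set configuration
policy "restrict-instance-types" {
  source            = "./restrict-instance-types.sentinel"
  enforcement_level = "hard-mandatory" # failures cannot be overridden
}

policy "enforce-tags" {
  source            = "./enforce-tags.sentinel"
  enforcement_level = "soft-mandatory" # authorized users can override a failure
}

policy "restrict-regions" {
  source            = "./restrict-regions.sentinel"
  enforcement_level = "advisory" # failures are reported but do not block
}
```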

Terraform Cloud Workspace Configuration

# Using tfe provider to manage Terraform Cloud itself as code
provider "tfe" {
  organization = "myorg"
}

resource "tfe_workspace" "production" {
  name              = "infrastructure-prod"
  organization      = "myorg"
  terraform_version = "1.7.0"
  working_directory = "environments/prod"

  vcs_repo {
    identifier     = "myorg/infrastructure"
    branch         = "main"
    oauth_token_id = var.oauth_token_id
  }

  # Require manual approval for applies
  auto_apply = false

  # Enable drift detection
  assessments_enabled = true

  # Set execution mode
  execution_mode = "remote"

  # Tag for organization
  tag_names = ["production", "infrastructure", "us-east-1"]
}

resource "tfe_workspace" "staging" {
  name              = "infrastructure-staging"
  organization      = "myorg"
  terraform_version = "1.7.0"
  working_directory = "environments/staging"

  vcs_repo {
    identifier     = "myorg/infrastructure"
    branch         = "main"
    oauth_token_id = var.oauth_token_id
  }

  auto_apply = true  # Auto-apply for non-production
  tag_names  = ["staging", "infrastructure", "us-east-1"]
}

# Apply Sentinel policy set to production workspaces
resource "tfe_policy_set" "production_policies" {
  name         = "production-guardrails"
  organization = "myorg"

  vcs_repo {
    identifier     = "myorg/sentinel-policies"
    branch         = "main"
    oauth_token_id = var.oauth_token_id
  }

  workspace_ids = [
    tfe_workspace.production.id,
  ]
}
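Workspace variables can be managed the same way, using the tfe provider's tfe_variable resource. In the following sketch, the variable names and values are illustrative:

```hcl
# Terraform input variable, passed to var.environment in environments/prod
resource "tfe_variable" "prod_environment" {
  workspace_id = tfe_workspace.production.id
  category     = "terraform"
  key          = "environment"
  value        = "prod"
  description  = "Deployment environment name"
}

# Shell environment variable, visible to providers during remote runs
resource "tfe_variable" "prod_aws_region" {
  workspace_id = tfe_workspace.production.id
  category     = "env"
  key          = "AWS_REGION"
  value        = "us-east-1"
}
```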

                            
Enterprise Anti-Pattern: Do not give every team their own copy of infrastructure code. Instead, create shared modules with clear interfaces that teams consume via the private registry. The platform team owns the modules; application teams own the composition: which modules to use and what values to pass.
                        

Performance & Troubleshooting

As Terraform configurations grow, plan and apply times can stretch from seconds to minutes. Understanding performance tuning and common failure modes is essential for productive workflows.

Parallelism Tuning

# Default parallelism is 10 concurrent operations
terraform apply -parallelism=20

# Reduce parallelism for API rate-limited providers
terraform apply -parallelism=5

# Targeted applies for faster iteration during development
# (Terraform warns that -target is meant for exceptional cases, not routine use)
terraform apply -target=module.networking
terraform apply -target=aws_instance.web_server

# Refresh-only to detect drift without making changes
terraform apply -refresh-only
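Lowering `-parallelism` reduces how many requests are in flight; the provider's retry budget controls how persistently each throttled request is retried. A minimal sketch, assuming the AWS provider, whose `max_retries` argument sets how many times a throttled or transiently failing API call is retried before Terraform reports an error; the value shown is illustrative:

```hcl
# Complement to a lower -parallelism: give throttled AWS API calls more retries.
provider "aws" {
  region      = "us-east-1"
  max_retries = 50 # illustrative; tune to your account's API limits
}
```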

Debug Logging

# Enable verbose logging
export TF_LOG=DEBUG
terraform plan 2> debug.log

# Log levels: TRACE, DEBUG, INFO, WARN, ERROR
export TF_LOG=TRACE

# Log only provider communication
export TF_LOG_PROVIDER=DEBUG

# Log only core Terraform operations
export TF_LOG_CORE=DEBUG

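# Write logs to a file instead of stderr (TF_LOG must also be set)
export TF_LOG_PATH="./terraform.log"

# Disable logging (clear every logging variable set above)
unset TF_LOG TF_LOG_CORE TF_LOG_PROVIDER TF_LOG_PATH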

Common Errors and Solutions

Error	Cause	Solution
`Error acquiring the state lock`	A previous run crashed or was interrupted before releasing the lock (for the S3 backend, a DynamoDB item)	Confirm no other run is active, then `terraform force-unlock LOCK_ID`
`Provider produced inconsistent result`	Provider bug or eventual consistency in the cloud API	Re-run the apply; if state is stale, run `terraform apply -refresh-only` (the replacement for the deprecated `terraform refresh`), and report persistent cases to the provider maintainers
`Error: Cycle: ...`	Circular reference between resources	Remove one direction of the reference (e.g. move inline rules into standalone resources); `depends_on` only adds edges and cannot break a cycle
`Error: Reference to undeclared resource`	Resource was removed but is still referenced	Remove all references or re-add the resource
`Too many requests (429)`	API rate limiting by the cloud provider	Lower `-parallelism` or raise the provider's retry budget (e.g. `max_retries` in the AWS provider)
`state snapshot was created by a newer version`	State was last written by a newer Terraform version	Upgrade Terraform to that version or newer
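Cycles are fixed by restructuring, not by adding `depends_on`. A typical case: two security groups that reference each other through inline `ingress` blocks, so each group depends on the other. A sketch, assuming AWS provider v5 (which provides `aws_vpc_security_group_ingress_rule`) and hypothetical `app` and `db` groups; moving the rules into standalone resources lets Terraform create both groups first and the rules afterwards:

```hcl
# Hypothetical input for the sketch.
variable "vpc_id" {
  type = string
}

# No inline ingress blocks, so neither group depends on the other.
resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = var.vpc_id
}

resource "aws_security_group" "db" {
  name   = "db"
  vpc_id = var.vpc_id
}

# db accepts Postgres traffic from app.
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}

# app accepts traffic from db on 8080: the reverse reference that,
# written inline, would have closed the cycle.
resource "aws_vpc_security_group_ingress_rule" "app_from_db" {
  security_group_id            = aws_security_group.app.id
  referenced_security_group_id = aws_security_group.db.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```

Avoid mixing inline `ingress` blocks and standalone rule resources on the same group; the two approaches overwrite each other's rules.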

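# Force-unlock a stuck state lock, using the lock ID printed in the error message
terraform force-unlock 12345678-abcd-efgh-ijkl-123456789012

# Manually repair remote state: pull a copy and keep an untouched backup
terraform state pull > state.tfstate
cp state.tfstate state.tfstate.orig
# Edit state.tfstate if needed and increment its "serial" field
# (a changed state with the same serial is rejected), then push it back:
terraform state push state.tfstate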

# Check syntax and internal consistency without accessing remote state
# (providers must be installed; in CI, run terraform init -backend=false first)
terraform validate

# Format all .tf files consistently
terraform fmt -recursive

Hands-On Exercises

Exercise 1: Workspaces

Convert a Single-Environment Setup to Workspaces

Take an existing Terraform configuration that deploys to a single environment and convert it to use workspaces for dev, staging, and production:

Create a locals block with an environment_config map keyed by workspace name
Replace all hardcoded values (instance types, counts, CIDR blocks) with workspace-driven lookups
Add workspace-based naming to all resource tags and Name attributes
Configure an S3 backend with workspace-prefixed state keys
Create all three workspaces and verify terraform plan produces correct output for each

# Exercise: Complete this workspace-based configuration
locals {
  env_config = {
    dev = {
      instance_type = "t3.micro"
      min_size      = 1
      max_size      = 2
      cidr          = "10.0.0.0/16"
    }
    staging = {
      # YOUR CONFIG HERE
    }
    prod = {
      # YOUR CONFIG HERE
    }
  }
  config = local.env_config[terraform.workspace]
}

# Add S3 backend with workspace key prefix
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "app/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    # Non-default workspaces are stored at env:/<workspace>/app/terraform.tfstate
    # (the env: prefix is configurable via workspace_key_prefix)
  }
}
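Once `local.config` is defined, resources read every environment-specific value from it and derive names from `terraform.workspace`. A minimal sketch, assuming a hypothetical `aws_vpc.main` resource and the `cidr` key from the `dev` entry above:

```hcl
# Hypothetical resource consuming the workspace-selected settings.
resource "aws_vpc" "main" {
  cidr_block = local.config.cidr

  tags = {
    Name        = "app-${terraform.workspace}-vpc"
    Environment = terraform.workspace
  }
}
```

`terraform.workspace` is `default` until you create and select another workspace (for example with `terraform workspace new dev`). Because `env_config` has no `default` key, a plan in that workspace fails with an invalid-index error, which conveniently prevents deploying an unnamed environment.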


Exercise 2: Module Development

Build a Reusable Module with Tests

Create a reusable VPC module with proper interfaces, documentation, and Terraform test files:

Create module structure: main.tf, variables.tf, outputs.tf, versions.tf, README.md
Implement a VPC with configurable CIDR, public/private subnets across N availability zones
Add input validation using validation blocks on variables
Write 3+ test cases in .tftest.hcl files covering normal, edge, and error cases
Add a moved block to handle a refactoring scenario
Run terraform test and verify all tests pass
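One realistic refactoring scenario for the moved-block step: private subnets first created with `count` and later switched to `for_each` keyed by availability zone. A sketch, assuming a hypothetical `aws_subnet.private` resource inside the module and the two zones used in the test variables below; without these blocks, Terraform would plan to destroy and recreate every subnet:

```hcl
# Old address (count index) -> new address (for_each key).
# Terraform cannot infer which index maps to which zone, so each move is explicit.
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["us-east-1a"]
}

moved {
  from = aws_subnet.private[1]
  to   = aws_subnet.private["us-east-1b"]
}
```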

# Exercise: Complete the module test file
# tests/vpc_module.tftest.hcl

variables {
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  enable_nat         = true
}

run "validates_cidr_format" {
  command = plan

  variables {
    vpc_cidr = "invalid-cidr"
  }

  # This should fail validation
  expect_failures = [var.vpc_cidr]
}

run "creates_correct_subnet_count" {
  command = plan

  assert {
    condition     = # YOUR ASSERTION HERE
    error_message = "Should create 2 public and 2 private subnets"
  }
}

run "nat_gateway_conditional" {
  command = plan

  variables {
    enable_nat = false
  }

  assert {
    condition     = # YOUR ASSERTION HERE
    error_message = "NAT gateway should not be created when disabled"
  }
}
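The `validates_cidr_format` run passes only if the module itself rejects bad input on `var.vpc_cidr`. A minimal sketch of the module's `variables.tf`, assuming these three variables make up its interface; `can()` converts the error that `cidrhost()` raises for a malformed CIDR into `false`:

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the VPC, e.g. 10.0.0.0/16."
  type        = string

  validation {
    # cidrhost() errors on strings like "invalid-cidr"; can() turns that into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, such as 10.0.0.0/16."
  }
}

variable "availability_zones" {
  description = "Availability zones to spread public and private subnets across."
  type        = list(string)

  validation {
    condition     = length(var.availability_zones) > 0
    error_message = "At least one availability zone is required."
  }
}

variable "enable_nat" {
  description = "Whether to create NAT gateway(s) for the private subnets."
  type        = bool
  default     = true
}
```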


Exercise 3: Terragrunt

Set Up Terragrunt for Multi-Environment

Create a complete Terragrunt directory structure that manages VPC and EKS across dev, staging, and production:

Create the root terragrunt.hcl with remote state generation (S3 + DynamoDB)
Create _envcommon/vpc.hcl and _envcommon/eks.hcl with shared module configs
Create env.hcl files for dev, staging, and prod with environment-specific values
Wire up EKS's dependency block to reference VPC outputs
Run terragrunt run-all plan from the dev directory and verify dependency ordering
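For the root configuration, a `remote_state` block can generate the S3 backend for every child module and derive each state key from the module's path. A sketch, assuming the bucket, region, and lock table names from the workspace exercise:

```hcl
# infrastructure/terragrunt.hcl (root)
remote_state {
  backend = "s3"

  # Write a backend.tf into each child module's working directory.
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "my-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```

With this layout, a child at `dev/vpc` stores its state under `dev/vpc/terraform.tfstate`, and no child module repeats backend settings.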

# Exercise: Complete the Terragrunt configuration
# infrastructure/_envcommon/eks.hcl

terraform {
  source = "git::https://github.com/myorg/terraform-modules.git//eks?ref=v2.0.0"
}

locals {
  env_vars = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  env      = local.env_vars.locals.environment
}

inputs = {
  cluster_name    = "eks-${local.env}"
  cluster_version = "1.29"
  # Add environment-specific node group configuration
  node_groups = {
    default = {
      instance_types = # YOUR CONFIG based on environment
      min_size       = # YOUR CONFIG
      max_size       = # YOUR CONFIG
    }
  }
}
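Each environment then needs a thin per-module file that pulls in the root and shared configs and declares the VPC dependency. A sketch, assuming a hypothetical `dev/eks` and `dev/vpc` layout, a VPC module exposing `vpc_id` and `private_subnet_ids` outputs, and an EKS module accepting `vpc_id` and `subnet_ids` inputs (all names illustrative):

```hcl
# infrastructure/dev/eks/terragrunt.hcl (hypothetical layout)

# Inherit the generated remote state from the root terragrunt.hcl.
include "root" {
  path = find_in_parent_folders()
}

# Inherit the shared EKS module source and inputs.
include "envcommon" {
  path = "${dirname(find_in_parent_folders())}/_envcommon/eks.hcl"
}

# run-all orders this module after ../vpc because of this block.
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder outputs so validate/plan work before the VPC exists.
  mock_outputs = {
    vpc_id             = "vpc-00000000"
    private_subnet_ids = ["subnet-00000000", "subnet-11111111"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnet_ids
}
```

Under the default shallow merge, these `inputs` keys are combined with the ones from `_envcommon/eks.hcl`, with the child's values winning on conflicts.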


Exercise 4: Multi-Region

Implement Multi-Region Deployment

Design and implement a multi-region deployment with DNS failover between US and EU regions:

Define provider aliases for us-east-1 (primary) and eu-west-1 (secondary)
Create a regional module that deploys VPC + ALB + ECS service
Instantiate the module twice with different providers
Configure Route 53 health checks monitoring the primary ALB
Set up failover routing so traffic shifts to EU if US is unhealthy
Configure S3 cross-region replication for static assets
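The provider aliases and per-region module instances can be wired up as follows. A sketch, assuming a hypothetical regional module at `./modules/regional-app` (its inputs omitted) and a third `global` alias for Route 53, which the failover records below expect:

```hcl
# Primary region.
provider "aws" {
  alias  = "primary"
  region = "us-east-1"
}

# Secondary (failover) region.
provider "aws" {
  alias  = "secondary"
  region = "eu-west-1"
}

# Route 53 is a global service; the DNS records use this alias.
provider "aws" {
  alias  = "global"
  region = "us-east-1"
}

# Same hypothetical module, instantiated once per region.
module "app_us" {
  source = "./modules/regional-app"
  providers = {
    aws = aws.primary
  }
}

module "app_eu" {
  source = "./modules/regional-app"
  providers = {
    aws = aws.secondary
  }
}
```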

# Exercise: Complete the multi-region failover
resource "aws_route53_record" "app_primary" {
  provider = aws.global
  zone_id  = var.hosted_zone_id
  name     = "app.example.com"
  type     = "A"

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier  = "primary"
  health_check_id = # YOUR HEALTH CHECK REFERENCE

  alias {
    name                   = # PRIMARY ALB DNS
    zone_id                = # PRIMARY ALB ZONE ID
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "app_secondary" {
  provider = aws.global
  zone_id  = var.hosted_zone_id
  name     = "app.example.com"
  type     = "A"

  failover_routing_policy {
    type = # YOUR FAILOVER TYPE
  }

  set_identifier = "secondary"

  alias {
    name                   = # SECONDARY ALB DNS
    zone_id                = # SECONDARY ALB ZONE ID
    evaluate_target_health = true
  }
}
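The primary record needs a health check to reference. A sketch, assuming the primary ALB serves HTTPS on port 443 with a hypothetical `/health` endpoint, and that its DNS name arrives through a hypothetical `primary_alb_dns_name` variable:

```hcl
# Hypothetical input: DNS name of the primary region's ALB.
variable "primary_alb_dns_name" {
  type = string
}

resource "aws_route53_health_check" "primary" {
  provider          = aws.global
  fqdn              = var.primary_alb_dns_name
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health" # hypothetical health endpoint
  failure_threshold = 3         # consecutive failures before "unhealthy"
  request_interval  = 30        # seconds between checks (10 or 30)

  tags = {
    Name = "app-primary-health"
  }
}
```

When this check reports unhealthy, Route 53 stops answering with the `PRIMARY` record and serves the secondary one.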


Conclusion & Next Steps

Advanced Terraform patterns transform Infrastructure as Code from a simple provisioning tool into a scalable, governed, and collaborative engineering practice. The patterns covered in this article address the problems that surface once configurations span many resources, teams, environments, and regions: duplicated code, oversized state, risky refactors, and unenforced policy.

Key takeaways:

Workspaces for simple multi-env — Same code, different state files; best when environments are structurally identical
Module composition over monoliths — Small, tested, versioned modules that compose into larger architectures
Terragrunt for DRY at scale — Eliminate backend copy-paste, manage cross-module dependencies declaratively
State splitting reduces blast radius — Separate by team/service/environment for safer applies and team autonomy
Moved and import blocks for safe refactoring — Rename and adopt resources without manual state surgery
Dynamic blocks and for_each eliminate repetition — Dynamic blocks generate repeated nested blocks; prefer for_each over count for stable resource addressing
Provider aliases enable multi-region — Same module instantiated per region with proper DNS failover
Sentinel policies enforce governance — Policy as Code in HCP Terraform and Terraform Enterprise that blocks non-compliant changes after plan and before apply

Next in the Series

In Part 16: Multi-Cloud Architecture, we explore designing portable, resilient infrastructure across AWS, Azure, and GCP — abstraction layers, cloud-agnostic patterns, multi-cloud networking, and when multi-cloud is genuinely beneficial versus unnecessary complexity.

Previous: Part 14: Platform Engineering | Next: Part 16: Multi-Cloud Architecture
