Beyond the Basics
You have written your first Terraform configurations, provisioned a VPC, launched an EC2 instance, and set up an S3 bucket. Your terraform apply works flawlessly for a single environment with 20 resources. Then reality hits: your organization needs the same infrastructure across development, staging, and production, in multiple regions, managed by multiple teams, with policy enforcement and cost controls.
This is the complexity cliff — the point where basic Terraform patterns break down under the weight of enterprise requirements. Managing 50+ resources across 3+ environments in multiple regions with team-level access controls demands a fundamentally different approach to Infrastructure as Code.
When Basic Terraform Isn't Enough
Signs you have outgrown basic Terraform patterns:
- Copy-paste between environments: duplicating entire directory trees for dev/staging/prod
- Monolithic state files: a single state file with 500+ resources and 10-minute plan times
- Module spaghetti: deeply nested modules with unclear ownership and undocumented interfaces
- Manual state surgery: regularly running terraform state mv because refactoring breaks addresses
- Blast radius anxiety: every terraform apply touches resources owned by other teams
- Drift everywhere: no mechanism to detect or prevent manual changes in the console
flowchart LR
A[Single File<br/>1-20 Resources] --> B[Modules<br/>20-100 Resources]
B --> C[Workspaces<br/>Multi-Environment]
C --> D[State Splitting<br/>Team Boundaries]
D --> E[Terragrunt<br/>DRY at Scale]
E --> F[Enterprise<br/>Policy + Governance]
style A fill:#f8f9fa,stroke:#3B9797
style B fill:#f8f9fa,stroke:#3B9797
style C fill:#f8f9fa,stroke:#16476A
style D fill:#f8f9fa,stroke:#16476A
style E fill:#f8f9fa,stroke:#132440
style F fill:#f8f9fa,stroke:#BF092F
This article equips you with the patterns, tools, and techniques to navigate each stage of this journey confidently.
Workspaces
Terraform workspaces provide isolated state files within the same configuration directory. Each workspace maintains its own terraform.tfstate file, enabling you to deploy the same infrastructure configuration to multiple environments without duplicating code.
# Create and manage workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
# List all workspaces
terraform workspace list
# Output (workspace new switches to the workspace it creates, so prod is current):
#   default
#   dev
#   staging
# * prod
# Switch workspace
terraform workspace select prod
# Delete a workspace (must not be current)
terraform workspace delete dev
Using terraform.workspace for Environment Logic
The terraform.workspace named value holds the name of the currently selected workspace, which lets you branch configuration logic per environment:
# variables.tf - Environment-specific sizing
locals {
environment_config = {
dev = {
instance_type = "t3.small"
instance_count = 1
db_instance = "db.t3.micro"
multi_az = false
}
staging = {
instance_type = "t3.medium"
instance_count = 2
db_instance = "db.t3.small"
multi_az = false
}
prod = {
instance_type = "t3.large"
instance_count = 3
db_instance = "db.r5.large"
multi_az = true
}
}
config = local.environment_config[terraform.workspace]
}
# main.tf - Using workspace-driven configuration
resource "aws_instance" "app" {
count = local.config.instance_count
instance_type = local.config.instance_type
ami = data.aws_ami.ubuntu.id
tags = {
Name = "app-${terraform.workspace}-${count.index}"
Environment = terraform.workspace
}
}
resource "aws_db_instance" "main" {
instance_class = local.config.db_instance
multi_az = local.config.multi_az
identifier = "db-${terraform.workspace}"
allocated_storage = terraform.workspace == "prod" ? 100 : 20
}
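One gap in the lookup above: the default workspace, or any workspace without a map entry, has no key in environment_config, so the index expression fails with a fairly terse error. A minimal sketch of an explicit guard follows, assuming Terraform 1.4+ for terraform_data. The names workspace_guard and allowed_workspaces are illustrative and not part of the original configuration.

```hcl
locals {
  # Hypothetical list; keep it in sync with the keys of environment_config
  allowed_workspaces = ["dev", "staging", "prod"]
}

# terraform_data manages no real infrastructure; it only carries the check
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = contains(local.allowed_workspaces, terraform.workspace)
      error_message = "Workspace \"${terraform.workspace}\" is not one of dev, staging, prod. Run terraform workspace select first."
    }
  }
}
```

Terraform does not guarantee which of the two errors is reported first, so treat the guard as a readability aid rather than a hard ordering guarantee.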
When Workspaces Work (and When They Don't)
| Aspect | Workspaces | Directory-Based Environments |
|---|---|---|
| State isolation | Separate state files, same backend | Completely separate backends possible |
| Configuration drift | Environments always share same code | Environments can diverge (risky) |
| Access control | Difficult to restrict per-workspace | Separate repos/dirs = separate permissions |
| Best for | Identical infra, only sizing differs | Environments with structural differences |
| CI/CD complexity | Single pipeline with workspace variable | Separate pipelines per directory |
| Visibility | Easy to forget which workspace is active | File path makes environment obvious |
Running terraform destroy in production because you thought you were in dev is a career-defining moment. Always verify the active workspace with terraform workspace show before destructive operations, and let CI/CD select the workspace automatically.
Advanced Module Patterns
Modules are Terraform's unit of reuse, but poorly designed modules create more problems than they solve. Advanced module patterns focus on composition over inheritance, clear interfaces, and testability.
flowchart TD
Root[Root Module<br/>environments/prod] --> Net[Network Module<br/>v2.1.0]
Root --> Compute[Compute Module<br/>v1.4.0]
Root --> Data[Database Module<br/>v3.0.0]
Root --> Monitor[Monitoring Module<br/>v1.2.0]
Net --> VPC[VPC Submodule]
Net --> SG[Security Groups]
Net --> LB[Load Balancer]
Compute --> ASG[Auto Scaling Group]
Compute --> Launch[Launch Template]
Data --> RDS[RDS Instance]
Data --> Redis[ElastiCache]
Monitor --> CW[CloudWatch]
Monitor --> Alert[Alerting Rules]
style Root fill:#132440,color:#fff
style Net fill:#3B9797,color:#fff
style Compute fill:#3B9797,color:#fff
style Data fill:#3B9797,color:#fff
style Monitor fill:#3B9797,color:#fff
Module Versioning and Version Constraints
# Using versioned modules from a private registry
module "vpc" {
source = "app.terraform.io/myorg/vpc/aws"
version = "~> 2.1" # >= 2.1.0, < 3.0.0
cidr_block = "10.0.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
enable_nat_gateway = true
}
# Git source with tag constraint
module "monitoring" {
source = "git::https://github.com/myorg/terraform-aws-monitoring.git?ref=v1.2.0"
alarm_sns_topic = aws_sns_topic.alerts.arn
environment = var.environment
}
# Local module for project-specific logic
module "app_config" {
source = "../../modules/app-config"
app_name = var.app_name
environment = var.environment
secrets = var.app_secrets
}
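The ~> operator used for the registry module above is the pessimistic constraint: only the rightmost version component you write is allowed to increase. A short sketch of how the number of components changes the accepted range, reusing the registry address from the example above. The module labels and CIDR values are illustrative.

```hcl
# Two components: accepts 2.1.0 up to, but not including, 3.0.0
module "vpc_minor_range" {
  source  = "app.terraform.io/myorg/vpc/aws"
  version = "~> 2.1"

  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  enable_nat_gateway = true
}

# Three components: accepts 2.1.0 up to, but not including, 2.2.0 (patch releases only)
module "vpc_patch_range" {
  source  = "app.terraform.io/myorg/vpc/aws"
  version = "~> 2.1.0"

  cidr_block         = "10.1.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  enable_nat_gateway = true
}

# Explicit range: several constraints separated by commas must all hold
module "vpc_explicit_range" {
  source  = "app.terraform.io/myorg/vpc/aws"
  version = ">= 2.1.0, < 2.4.0"

  cidr_block         = "10.2.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  enable_nat_gateway = true
}
```

The version argument applies only to registry sources. Git sources, like the monitoring module above, are pinned through the ref query parameter instead.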
Module Testing with Terraform Test Framework
Terraform 1.6+ introduced a native test framework using .tftest.hcl files:
# tests/vpc.tftest.hcl - Module integration test
variables {
cidr_block = "10.99.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b"]
environment = "test"
}
run "creates_vpc_with_correct_cidr" {
command = plan
assert {
condition = aws_vpc.main.cidr_block == "10.99.0.0/16"
error_message = "VPC CIDR block does not match expected value"
}
}
run "creates_expected_subnet_count" {
command = plan
assert {
condition = length(aws_subnet.private) == 2
error_message = "Expected 2 private subnets, got ${length(aws_subnet.private)}"
}
assert {
condition = length(aws_subnet.public) == 2
error_message = "Expected 2 public subnets"
}
}
run "full_apply_and_verify" {
command = apply
assert {
condition = aws_vpc.main.enable_dns_hostnames == true
error_message = "DNS hostnames should be enabled"
}
}
# Run Terraform tests
terraform test
# Run with verbose output
terraform test -verbose
# Run specific test file
terraform test -filter=tests/vpc.tftest.hcl
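A run block with command = apply creates real resources in whichever account the credentials point at. Terraform 1.7 added provider mocking to the test framework, which lets the same assertions run without credentials. A minimal sketch, assuming Terraform 1.7+ and the same module variables as the test file above; the file name and run labels are illustrative.

```hcl
# tests/vpc_mocked.tftest.hcl (hypothetical file name)

# Replace the real AWS provider with a mock for every run block in this file.
# No API calls are made; computed attributes receive generated placeholder values.
mock_provider "aws" {}

variables {
  cidr_block         = "10.99.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  environment        = "test"
}

run "plan_with_mocked_provider" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == "10.99.0.0/16"
    error_message = "VPC CIDR block does not match expected value"
  }
}

run "apply_with_mocked_provider" {
  # apply is safe here because the mocked provider never creates anything
  command = apply

  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Expected 2 private subnets"
  }
}
```

Mocked runs check module logic only. They cannot catch errors that surface only when the real API rejects a request, so an occasional real apply test is still worth keeping.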
Module Design Patterns
| Pattern | Purpose | Example | When to Use |
|---|---|---|---|
| Wrapper Module | Opinionated defaults over generic module | Company VPC module wrapping community VPC | Enforce org standards while using community modules |
| Composition Module | Orchestrates multiple smaller modules | "Web App" combining VPC + ALB + ECS + RDS | Common deployment patterns used by many teams |
| Utility Module | Computes values without creating resources | CIDR calculator, naming convention generator | Reusable logic needed by multiple modules |
| Service Module | Encapsulates one application's infrastructure | All resources for "payment-service" | Team-owned service with clear boundaries |
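The wrapper pattern from the table can be sketched as follows, assuming the public terraform-aws-modules/vpc/aws module as the wrapped community module. The file path, variable names, and tag values are illustrative choices, not an established company standard.

```hcl
# modules/company-vpc/main.tf (hypothetical wrapper module)

variable "name" {
  type        = string
  description = "Name prefix for the VPC and its resources"
}

variable "cidr" {
  type        = string
  description = "IPv4 CIDR block for the VPC"
}

variable "azs" {
  type        = list(string)
  description = "Availability zones to spread subnets across"
}

variable "team" {
  type        = string
  description = "Owning team, applied as a tag"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  # Pass-through inputs that callers are allowed to choose
  name = var.name
  cidr = var.cidr
  azs  = var.azs

  # Opinionated organization defaults, deliberately not exposed as variables
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    ManagedBy = "Terraform"
    Team      = var.team
  }
}

# Re-export only the outputs consumers are expected to depend on
output "vpc_id" {
  description = "ID of the VPC created by the wrapped module"
  value       = module.vpc.vpc_id
}
```

Keeping the wrapper's interface smaller than the wrapped module's is the point of the pattern: every input you do not expose is a decision teams cannot get wrong.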
Terragrunt
Terragrunt is a thin wrapper around Terraform that provides extra tools for keeping configurations DRY (Don't Repeat Yourself), managing remote state, and orchestrating multi-module deployments. It shines when you have the same Terraform module deployed across many environments and regions.
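# terragrunt.hcl (root) - Shared configuration for all environments
# This file lives at the repo root and is inherited by all child configs
# Locals used by the generated provider block below. find_in_parent_folders
# is resolved relative to the child config that includes this file.
locals {
env_vars = read_terragrunt_config(find_in_parent_folders("env.hcl"))
region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
environment = local.env_vars.locals.environment
region = local.region_vars.locals.region
project = "myproject" # placeholder value (assumption); set to your project name
}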
remote_state {
backend = "s3"
generate = {
path = "backend.tf"
if_exists = "overwrite_terragrunt"
}
config = {
bucket = "myorg-terraform-state-${get_aws_account_id()}"
key = "${path_relative_to_include()}/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
generate "provider" {
path = "provider.tf"
if_exists = "overwrite_terragrunt"
contents = <<EOF
provider "aws" {
region = "${local.region}"
default_tags {
tags = {
ManagedBy = "Terraform"
Environment = "${local.environment}"
Project = "${local.project}"
}
}
}
EOF
}
Multi-Environment Directory Structure
# Recommended Terragrunt directory structure
infrastructure/
├── terragrunt.hcl # Root config (remote state, provider generation)
├── _envcommon/ # Shared module configurations
│ ├── vpc.hcl
│ ├── eks.hcl
│ └── rds.hcl
├── dev/
│ ├── env.hcl # Environment-specific variables
│ ├── us-east-1/
│ │ ├── region.hcl # Region-specific variables
│ │ ├── vpc/
│ │ │ └── terragrunt.hcl # Includes _envcommon/vpc.hcl
│ │ ├── eks/
│ │ │ └── terragrunt.hcl
│ │ └── rds/
│ │ └── terragrunt.hcl
│ └── eu-west-1/
│ ├── region.hcl
│ └── vpc/
│ └── terragrunt.hcl
├── staging/
│ ├── env.hcl
│ └── us-east-1/
│ └── ...
└── prod/
├── env.hcl
├── us-east-1/
│ └── ...
└── eu-west-1/
└── ...
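The env.hcl and region.hcl files in this tree hold nothing but locals that child configurations read with read_terragrunt_config. A minimal sketch of both files, shown together for brevity; the attribute names environment and region match what the EKS configuration in the next section reads, and the values are examples.

```hcl
# infrastructure/dev/env.hcl
locals {
  environment = "dev"
}

# infrastructure/dev/us-east-1/region.hcl (a separate file)
locals {
  region = "us-east-1"
}
```

Because find_in_parent_folders walks upward from each child directory, dev/us-east-1/eks picks up dev/env.hcl and dev/us-east-1/region.hcl without any explicit paths.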
Dependencies and run-all
# dev/us-east-1/eks/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
include "envcommon" {
path = "${dirname(find_in_parent_folders())}/_envcommon/eks.hcl"
expose = true
}
locals {
env_vars = read_terragrunt_config(find_in_parent_folders("env.hcl"))
region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
environment = local.env_vars.locals.environment
region = local.region_vars.locals.region
}
# Declare dependencies - EKS needs VPC to exist first
dependency "vpc" {
config_path = "../vpc"
mock_outputs = {
vpc_id = "vpc-mock-12345"
private_subnets = ["subnet-mock-1", "subnet-mock-2"]
}
}
inputs = {
cluster_name = "eks-${local.environment}-${local.region}"
vpc_id = dependency.vpc.outputs.vpc_id
subnet_ids = dependency.vpc.outputs.private_subnets
cluster_version = "1.29"
node_groups = {
default = {
instance_types = local.environment == "prod" ? ["m5.xlarge"] : ["t3.medium"]
min_size = local.environment == "prod" ? 3 : 1
max_size = local.environment == "prod" ? 10 : 3
desired_size = local.environment == "prod" ? 3 : 1
}
}
}
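Mock outputs exist so that plan and validate can run before the VPC has been applied, but left unrestricted they can also leak into a real apply. A sketch of the same dependency block with mocks limited to read-only commands, using Terragrunt's mock_outputs_allowed_terraform_commands attribute; the mock values are the ones from the configuration above.

```hcl
dependency "vpc" {
  config_path = "../vpc"

  mock_outputs = {
    vpc_id          = "vpc-mock-12345"
    private_subnets = ["subnet-mock-1", "subnet-mock-2"]
  }

  # Only substitute mocks for commands that do not change infrastructure.
  # An apply against an unapplied VPC then fails instead of using fake IDs.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
```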
# Apply all modules in correct dependency order
cd infrastructure/dev/us-east-1
terragrunt run-all apply
# Plan all modules with dependency graph
terragrunt run-all plan
# Destroy in reverse dependency order
terragrunt run-all destroy
# Apply a single module (its dependencies must already be applied)
cd infrastructure/dev/us-east-1/eks
terragrunt apply # Reads vpc outputs through the dependency block; does not apply vpc itself
| Feature | Raw Terraform | Terragrunt |
|---|---|---|
| Backend configuration | Copy-paste backend blocks in every module | Generated automatically from root config |
| Provider configuration | Repeated in every environment directory | Generated from templates with variables |
| Cross-module dependencies | Manual terraform_remote_state data sources | Declarative dependency blocks with mock outputs |
| Multi-module operations | Manual scripts or Makefiles | Built-in run-all with parallelism |
| Environment differences | tfvars files or workspaces | Hierarchical variable inheritance |
| Learning curve | Just HCL | HCL + Terragrunt-specific concepts |
State Management Advanced
State is Terraform's most critical and fragile component. Advanced state management focuses on reducing blast radius, enabling team autonomy, and facilitating safe refactoring.
flowchart TD
subgraph "Monolithic State (Before)"
Mono[Single State File<br/>500+ Resources<br/>All Teams]
end
subgraph "Split State (After)"
Net[Network State<br/>VPC, Subnets, NAT<br/>Platform Team]
Comp[Compute State<br/>EKS, ASG, ALB<br/>Platform Team]
App1[App A State<br/>Services, DBs<br/>Team Alpha]
App2[App B State<br/>Services, Queues<br/>Team Beta]
Shared[Shared State<br/>IAM, DNS, KMS<br/>Security Team]
end
Mono --> Net
Mono --> Comp
Mono --> App1
Mono --> App2
Mono --> Shared
Net -.->|remote_state| Comp
Net -.->|remote_state| App1
Shared -.->|remote_state| App1
Shared -.->|remote_state| App2
style Mono fill:#BF092F,color:#fff
style Net fill:#3B9797,color:#fff
style Comp fill:#3B9797,color:#fff
style App1 fill:#16476A,color:#fff
style App2 fill:#16476A,color:#fff
style Shared fill:#132440,color:#fff
State Operations
# Move a resource to a different address (refactoring)
terraform state mv aws_instance.app aws_instance.web_server
# Move a resource into a module
terraform state mv aws_s3_bucket.logs module.logging.aws_s3_bucket.logs
# Remove a resource from state (without destroying it)
terraform state rm aws_iam_role.legacy_role
# Import existing infrastructure into state
terraform import aws_instance.imported i-1234567890abcdef0
# List all resources in current state
terraform state list
# Show details of a specific resource
terraform state show aws_instance.web_server
Moved Blocks (Terraform 1.1+)
Moved blocks let you refactor resource addresses without manual state surgery:
# Rename a resource - Terraform handles the state move automatically
moved {
from = aws_instance.app
to = aws_instance.web_server
}
# Move a resource into a module
moved {
from = aws_security_group.app_sg
to = module.networking.aws_security_group.app
}
# Rename a module
moved {
from = module.old_name
to = module.new_name
}
# Move from count to for_each
moved {
from = aws_subnet.private[0]
to = aws_subnet.private["us-east-1a"]
}
moved {
from = aws_subnet.private[1]
to = aws_subnet.private["us-east-1b"]
}
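Moved blocks have a counterpart for the opposite case, where a resource should leave the state without being destroyed. A minimal sketch using the removed block, assuming Terraform 1.7+; the address is the same legacy role used in the terraform state rm example earlier.

```hcl
# Stop managing the role but leave the real IAM role in place.
# The matching resource "aws_iam_role" "legacy_role" block must be
# deleted from the configuration in the same change.
removed {
  from = aws_iam_role.legacy_role

  lifecycle {
    destroy = false
  }
}
```

Unlike terraform state rm, the change appears in terraform plan and goes through code review like any other edit.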
Import Blocks (Terraform 1.5+)
Declarative import without running terraform import commands:
# imports.tf - Declare resources to import
import {
to = aws_instance.legacy_server
id = "i-0abc123def456789"
}
import {
to = aws_vpc.existing
id = "vpc-0123456789abcdef0"
}
import {
to = aws_s3_bucket.logs
id = "my-company-logs-bucket"
}
# Generate configuration from imports
# terraform plan -generate-config-out=generated.tf
# Generate HCL configuration for imported resources
terraform plan -generate-config-out=generated_imports.tf
# Review generated config, refine it, then apply
terraform apply
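When many resources of the same type need adopting, writing one import block per resource gets repetitive. A sketch of a looped import, assuming Terraform 1.7+, which added for_each to import blocks. The existing_buckets local and the second bucket name are hypothetical.

```hcl
locals {
  # Hypothetical map of resource key => name of an already existing bucket
  existing_buckets = {
    logs   = "my-company-logs-bucket"
    assets = "my-company-assets-bucket"
  }
}

# One import per map entry, targeting the matching for_each instance
import {
  for_each = local.existing_buckets
  to       = aws_s3_bucket.existing[each.key]
  id       = each.value
}

resource "aws_s3_bucket" "existing" {
  for_each = local.existing_buckets
  bucket   = each.value
}
```

After the first successful apply the import block has done its job and can be deleted, or kept as a record of where the resources came from.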
Cross-State References
# In the consuming module - reference network state outputs
data "terraform_remote_state" "network" {
backend = "s3"
config = {
bucket = "myorg-terraform-state"
key = "network/terraform.tfstate"
region = "us-east-1"
}
}
resource "aws_instance" "app" {
subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
vpc_security_group_ids = [
data.terraform_remote_state.network.outputs.app_security_group_id
]
instance_type = "t3.medium"
ami = data.aws_ami.ubuntu.id
}
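terraform_remote_state can only read root-level outputs of the producing configuration, so the network configuration has to publish every value its consumers reference. A minimal sketch of the producing side, assuming count-based subnets. The resource names and CIDR layout are illustrative; only the two output names come from the consumer above.

```hcl
# network/main.tf (producing configuration, illustrative resources)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  count      = 2
  vpc_id     = aws_vpc.main.id
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}

resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = aws_vpc.main.id
}

# network/outputs.tf - the only values other state files can see
output "private_subnet_ids" {
  description = "IDs of the private subnets, in creation order"
  value       = aws_subnet.private[*].id
}

output "app_security_group_id" {
  description = "Security group to attach to application instances"
  value       = aws_security_group.app.id
}
```

Treat these outputs as a public interface: renaming or removing one breaks every configuration that reads it.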
# Alternative: Use data sources instead of remote_state for looser coupling
data "aws_vpc" "main" {
tags = {
Name = "main-vpc"
Environment = var.environment
}
}
data "aws_subnets" "private" {
filter {
name = "vpc-id"
values = [data.aws_vpc.main.id]
}
tags = {
Tier = "private"
}
}
Prefer provider data sources over terraform_remote_state when possible. Data sources create looser coupling between state files: the consuming module does not need to know the exact backend configuration of the producing module. Use tags or naming conventions as the coupling mechanism.
Dynamic Blocks & Meta-Arguments
Dynamic blocks eliminate repetitive nested block definitions, while meta-arguments (for_each, count, depends_on, lifecycle) give you fine-grained control over resource creation and behavior.
Dynamic Blocks
# With dynamic blocks - security group rules are generated from variables
resource "aws_security_group" "app" {
name = "app-sg"
vpc_id = var.vpc_id
# Dynamic block for ingress rules
dynamic "ingress" {
for_each = var.ingress_rules
content {
from_port = ingress.value.from_port
to_port = ingress.value.to_port
protocol = ingress.value.protocol
cidr_blocks = ingress.value.cidr_blocks
description = ingress.value.description
}
}
# Dynamic block for egress rules
dynamic "egress" {
for_each = var.egress_rules
content {
from_port = egress.value.from_port
to_port = egress.value.to_port
protocol = egress.value.protocol
cidr_blocks = egress.value.cidr_blocks
}
}
}
# Variable definition for the rules
variable "ingress_rules" {
type = list(object({
from_port = number
to_port = number
protocol = string
cidr_blocks = list(string)
description = string
}))
default = [
{
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "HTTPS from anywhere"
},
{
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "HTTP from anywhere"
}
]
}
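The security group above also iterates over var.egress_rules, which the example never declares. A sketch of a matching declaration, assuming the same object shape as ingress_rules minus the description field (the egress block does not read it). The allow-all default is an illustrative choice.

```hcl
variable "egress_rules" {
  type = list(object({
    from_port   = number
    to_port     = number
    protocol    = string
    cidr_blocks = list(string)
  }))

  # Illustrative default: allow all outbound traffic
  default = [
    {
      from_port   = 0
      to_port     = 0
      protocol    = "-1" # -1 means all protocols
      cidr_blocks = ["0.0.0.0/0"]
    }
  ]
}
```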
for_each with Maps and Sets
# for_each with a map - each instance gets meaningful key
variable "services" {
type = map(object({
container_port = number
cpu = number
memory = number
desired_count = number
}))
default = {
api = {
container_port = 8080
cpu = 512
memory = 1024
desired_count = 3
}
worker = {
container_port = 9090
cpu = 1024
memory = 2048
desired_count = 2
}
scheduler = {
container_port = 8081
cpu = 256
memory = 512
desired_count = 1
}
}
}
resource "aws_ecs_service" "services" {
for_each = var.services
name = "${var.project}-${each.key}"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.services[each.key].arn
desired_count = each.value.desired_count
launch_type = "FARGATE"
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.services[each.key].id]
}
}
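for_each also accepts a set of strings, which suits cases where each instance needs only a name. A minimal sketch, assuming a list variable converted with toset; the variable name and bucket names are hypothetical.

```hcl
variable "log_bucket_suffixes" {
  type    = list(string)
  default = ["app-logs", "audit-logs"]
}

resource "aws_s3_bucket" "logs" {
  # for_each does not accept a list directly; toset() also removes duplicates
  for_each = toset(var.log_bucket_suffixes)

  # For a set, each.key and each.value are the same string
  bucket = "myorg-${each.key}"

  tags = {
    Purpose = each.value
  }
}
```

The instances are addressed by value, for example aws_s3_bucket.logs["audit-logs"], so removing one suffix from the list leaves the other bucket untouched.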
Count vs for_each Decision Guide
| Criteria | count | for_each |
|---|---|---|
| Resource addressing | resource[0], resource[1] | resource["name"] |
| Removing middle item | Shifts all indexes (destroys/recreates) | Only removes that key (safe) |
| Conditional creation | count = var.enabled ? 1 : 0 | Possible but verbose |
| Readability | Good for identical copies | Good for distinct instances |
| Use when | Toggle on/off, N identical copies | Collection of distinct items |
Lifecycle Rules
# lifecycle meta-argument examples
resource "aws_instance" "critical" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.large"
lifecycle {
# Prevent accidental destruction (this also rejects any plan that would replace the resource)
prevent_destroy = true
# Ignore changes made outside Terraform (e.g., auto-scaling)
ignore_changes = [
tags["LastModified"],
instance_type, # Allow manual resizing without drift
]
# Create replacement before destroying old (zero-downtime)
create_before_destroy = true
# Trigger replacement when launch template changes
replace_triggered_by = [
aws_launch_template.app.latest_version
]
}
}
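# Conditional resource creation
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
count = var.enable_monitoring ? 1 : 0
alarm_name = "${var.project}-high-cpu"
namespace = "AWS/EC2"
metric_name = "CPUUtilization"
statistic = "Average"
comparison_operator = "GreaterThanThreshold" # required argument
evaluation_periods = 2 # required argument; two periods is an illustrative choice
threshold = 80
period = 300
alarm_actions = [var.sns_topic_arn]
}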
Multi-Region Deployments
Multi-region infrastructure provides disaster recovery, reduced latency for global users, and compliance with data residency requirements. Terraform handles multi-region through provider aliases — multiple instances of the same provider targeting different regions.
flowchart TD
subgraph "Global Resources"
R53[Route 53<br/>DNS Failover]
CF[CloudFront<br/>CDN]
IAM[IAM Roles<br/>Global]
end
subgraph "US-East-1 (Primary)"
VPC1[VPC]
EKS1[EKS Cluster]
RDS1[(RDS Primary)]
S3_1[S3 Bucket]
end
subgraph "EU-West-1 (Secondary)"
VPC2[VPC]
EKS2[EKS Cluster]
RDS2[(RDS Read Replica)]
S3_2[S3 Bucket]
end
R53 --> VPC1
R53 --> VPC2
CF --> VPC1
CF --> VPC2
RDS1 -.->|Replication| RDS2
S3_1 -.->|Cross-Region Replication| S3_2
style R53 fill:#132440,color:#fff
style CF fill:#132440,color:#fff
style IAM fill:#132440,color:#fff
style VPC1 fill:#3B9797,color:#fff
style VPC2 fill:#16476A,color:#fff
Provider Aliases
# providers.tf - Multi-region provider configuration
provider "aws" {
region = "us-east-1"
alias = "primary"
default_tags {
tags = {
Region = "us-east-1"
Environment = var.environment
}
}
}
provider "aws" {
region = "eu-west-1"
alias = "secondary"
default_tags {
tags = {
Region = "eu-west-1"
Environment = var.environment
}
}
}
# Global provider (us-east-1 for global services like CloudFront, Route53)
provider "aws" {
region = "us-east-1"
alias = "global"
}
# main.tf - Multi-region module instances
module "primary_region" {
source = "./modules/regional-stack"
providers = {
aws = aws.primary
}
region = "us-east-1"
vpc_cidr = "10.0.0.0/16"
cluster_name = "eks-primary"
is_primary = true
db_instance_class = "db.r5.xlarge"
}
module "secondary_region" {
source = "./modules/regional-stack"
providers = {
aws = aws.secondary
}
region = "eu-west-1"
vpc_cidr = "10.1.0.0/16"
cluster_name = "eks-secondary"
is_primary = false
db_instance_class = "db.r5.large"
primary_db_arn = module.primary_region.db_arn
enable_read_replica = true
}
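The regional stack above needs one provider per instance, so the providers map simply replaces the module's default aws provider. A module that itself spans two regions has to declare the aliases it expects. A sketch under that assumption, using configuration_aliases; the module path, variable, and bucket names are hypothetical, and replication settings are left out to keep the provider wiring visible.

```hcl
# modules/two-region-buckets/versions.tf (hypothetical module)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
      # Aliases the caller must supply; the module configures neither itself
      configuration_aliases = [aws.source, aws.replica]
    }
  }
}

# modules/two-region-buckets/main.tf
variable "bucket_prefix" {
  type = string
}

resource "aws_s3_bucket" "source" {
  provider = aws.source
  bucket   = "${var.bucket_prefix}-source"
}

resource "aws_s3_bucket" "replica" {
  provider = aws.replica
  bucket   = "${var.bucket_prefix}-replica"
}

# Root module: map the root aliases onto the names the module expects
module "assets" {
  source = "./modules/two-region-buckets"

  providers = {
    aws.source  = aws.primary
    aws.replica = aws.secondary
  }

  bucket_prefix = "myorg-assets"
}
```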
# Global DNS failover
resource "aws_route53_health_check" "primary" {
provider = aws.global
fqdn = module.primary_region.alb_dns_name
port = 443
type = "HTTPS"
resource_path = "/health"
failure_threshold = 3
request_interval = 30
}
resource "aws_route53_record" "app" {
provider = aws.global
zone_id = var.hosted_zone_id
name = "app.example.com"
type = "A"
failover_routing_policy {
type = "PRIMARY"
}
set_identifier = "primary"
health_check_id = aws_route53_health_check.primary.id
alias {
name = module.primary_region.alb_dns_name
zone_id = module.primary_region.alb_zone_id
evaluate_target_health = true
}
}
Global vs Regional Resources
| Global (Deploy Once) | Regional (Deploy Per-Region) | Replicated (Sync Across Regions) |
|---|---|---|
| Route 53 Hosted Zones | VPCs & Subnets | S3 Buckets (CRR) |
| CloudFront Distributions | EKS/ECS Clusters | DynamoDB Global Tables |
| IAM Roles & Policies | RDS Instances | ECR (Cross-Region Replication) |
| AWS Organizations | Load Balancers | Secrets Manager (Replica) |
| WAF (Global scope) | Security Groups | KMS Multi-Region Keys |
Custom & Community Providers
While HashiCorp and major cloud providers maintain official providers, you may need custom providers for internal APIs, legacy systems, or niche services. The Terraform Plugin Framework (replacing the older SDK) makes building custom providers more accessible.
# required_providers - Mixing official and partner providers
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.25"
}
datadog = {
source = "DataDog/datadog"
version = "~> 3.30"
}
grafana = {
source = "grafana/grafana"
version = "~> 2.5"
}
# Partner provider for PagerDuty
pagerduty = {
source = "PagerDuty/pagerduty"
version = "~> 3.0"
}
}
}
| Provider Tier | Maintained By | Examples | Trust Level |
|---|---|---|---|
| Official | HashiCorp | aws, azurerm, google, kubernetes | Highest — rigorous testing, SLA-backed |
| Partner | Technology partners | datadog, pagerduty, cloudflare, grafana | High — vendor-maintained, reviewed by HashiCorp |
| Community | Individual maintainers | Various niche tools and services | Variable — review code, check maintenance |
| Custom (Internal) | Your organization | Internal APIs, legacy systems | You own it — full control and responsibility |
Before building a custom provider, check whether a null_resource with a local-exec provisioner may suffice.
Enterprise Patterns
Enterprise-scale Terraform requires governance, policy enforcement, cost control, and team collaboration tooling that goes beyond what open-source Terraform provides alone. Terraform Cloud and Enterprise add these capabilities as a managed platform.
flowchart TD
Dev[Developer<br/>Writes HCL] --> PR[Pull Request<br/>Code Review]
PR --> Speculative[Speculative Plan<br/>PR Comment]
PR --> Sentinel[Sentinel Policy Check<br/>Compliance Gate]
PR --> Cost[Cost Estimation<br/>Budget Check]
Sentinel -->|Pass| Approve[Manual Approval<br/>Required for Prod]
Cost -->|Under Budget| Approve
Sentinel -->|Fail| Block[PR Blocked<br/>Policy Violation]
Approve --> Apply[Terraform Apply<br/>State Lock + Audit]
Apply --> Notify[Notifications<br/>Slack + Email]
Apply --> Drift[Drift Detection<br/>Scheduled Scans]
style Dev fill:#f8f9fa,stroke:#3B9797
style Sentinel fill:#132440,color:#fff
style Cost fill:#16476A,color:#fff
style Block fill:#BF092F,color:#fff
style Apply fill:#3B9797,color:#fff
Policy as Code with Sentinel
# sentinel/restrict-instance-types.sentinel
# Policy: Only allow approved EC2 instance types
import "tfplan/v2" as tfplan
allowed_instance_types = [
"t3.micro", "t3.small", "t3.medium", "t3.large",
"m5.large", "m5.xlarge", "m5.2xlarge",
"r5.large", "r5.xlarge",
]
ec2_instances = filter tfplan.resource_changes as _, rc {
rc.type is "aws_instance" and
rc.mode is "managed" and
(rc.change.actions contains "create" or rc.change.actions contains "update")
}
instance_type_allowed = rule {
all ec2_instances as _, instance {
instance.change.after.instance_type in allowed_instance_types
}
}
main = rule {
instance_type_allowed
}
# sentinel/enforce-tags.sentinel
# Policy: All resources must have required tags
import "tfplan/v2" as tfplan
required_tags = ["Environment", "Team", "CostCenter", "ManagedBy"]
taggable_resources = filter tfplan.resource_changes as _, rc {
rc.change.after.tags is not null and
(rc.change.actions contains "create" or rc.change.actions contains "update")
}
all_tags_present = rule {
all taggable_resources as _, resource {
all required_tags as tag {
resource.change.after.tags contains tag
}
}
}
main = rule {
all_tags_present
}
# sentinel/restrict-regions.sentinel
# Policy: Only allow resources in approved regions
import "tfplan/v2" as tfplan
approved_regions = ["us-east-1", "us-west-2", "eu-west-1"]
regional_resources = filter tfplan.resource_changes as _, rc {
rc.change.after contains "region" and
rc.change.after.region is not null
}
region_allowed = rule {
all regional_resources as _, resource {
resource.change.after.region in approved_regions
}
}
main = rule {
region_allowed
}
Terraform Cloud Workspace Configuration
# Using tfe provider to manage Terraform Cloud itself as code
provider "tfe" {
organization = "myorg"
}
resource "tfe_workspace" "production" {
name = "infrastructure-prod"
organization = "myorg"
terraform_version = "1.7.0"
working_directory = "environments/prod"
vcs_repo {
identifier = "myorg/infrastructure"
branch = "main"
oauth_token_id = var.oauth_token_id
}
# Require manual approval for applies
auto_apply = false
# Enable drift detection
assessments_enabled = true
# Set execution mode
execution_mode = "remote"
# Tag for organization
tag_names = ["production", "infrastructure", "us-east-1"]
}
resource "tfe_workspace" "staging" {
name = "infrastructure-staging"
organization = "myorg"
terraform_version = "1.7.0"
working_directory = "environments/staging"
vcs_repo {
identifier = "myorg/infrastructure"
branch = "main"
oauth_token_id = var.oauth_token_id
}
auto_apply = true # Auto-apply for non-production
tag_names = ["staging", "infrastructure", "us-east-1"]
}
# Apply Sentinel policy set to production workspaces
resource "tfe_policy_set" "production_policies" {
name = "production-guardrails"
organization = "myorg"
vcs_repo {
identifier = "myorg/sentinel-policies"
branch = "main"
oauth_token_id = var.oauth_token_id
}
workspace_ids = [
tfe_workspace.production.id,
]
}
Performance & Troubleshooting
As Terraform configurations grow, plan and apply times can stretch from seconds to minutes. Understanding performance tuning and common failure modes is essential for productive workflows.
Parallelism Tuning
# Default parallelism is 10 concurrent operations
terraform apply -parallelism=20
# Reduce parallelism for API rate-limited providers
terraform apply -parallelism=5
# Targeted applies for faster iteration during development
terraform apply -target=module.networking
terraform apply -target=aws_instance.web_server
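# Refresh-only plan to detect drift without changing state or infrastructure
terraform plan -refresh-only
# Refresh-only apply to record the detected drift in state (infrastructure is untouched)
terraform apply -refresh-only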
Debug Logging
# Enable verbose logging
export TF_LOG=DEBUG
terraform plan 2> debug.log
# Log levels: TRACE, DEBUG, INFO, WARN, ERROR
export TF_LOG=TRACE
# Log only provider communication
export TF_LOG_PROVIDER=DEBUG
# Log only core Terraform operations
export TF_LOG_CORE=DEBUG
# Write logs to a specific file
export TF_LOG_PATH="./terraform.log"
# Disable logging
unset TF_LOG
Common Errors and Solutions
| Error | Cause | Solution |
|---|---|---|
| Error acquiring the state lock | Previous run crashed without releasing DynamoDB lock | Confirm no other run is active, then terraform force-unlock LOCK_ID |
| Provider produced inconsistent result | Provider bug or API eventual consistency | Run terraform apply -refresh-only (the replacement for the deprecated terraform refresh), then terraform plan again |
| Cycle detected in resource dependencies | Circular reference between resources | Remove one of the circular references or restructure the resources; terraform graph helps locate the cycle |
| Error: Reference to undeclared resource | Resource was removed but still referenced | Remove all references or re-add the resource |
| Too many requests (429) | API rate limiting from cloud provider | Reduce -parallelism or add retry logic |
| state snapshot was created by a newer version | State was last modified by newer Terraform | Upgrade Terraform to the same or a newer version |
# Force-unlock a stuck state lock
terraform force-unlock 12345678-abcd-efgh-ijkl-123456789012
# Recover from corrupted state by pulling fresh copy
terraform state pull > backup.tfstate
# Edit backup.tfstate if needed, then push back:
terraform state push backup.tfstate
# Validate configuration syntax without accessing remote state
terraform validate
# Format all .tf files consistently
terraform fmt -recursive
Hands-On Exercises
Convert a Single-Environment Setup to Workspaces
Take an existing Terraform configuration that deploys to a single environment and convert it to use workspaces for dev, staging, and production:
- Create a locals block with an environment_config map keyed by workspace name
- Replace all hardcoded values (instance types, counts, CIDR blocks) with workspace-driven lookups
- Add workspace-based naming to all resource tags and Name attributes
- Configure an S3 backend with workspace-prefixed state keys
- Create all three workspaces and verify terraform plan produces correct output for each
# Exercise: Complete this workspace-based configuration
locals {
env_config = {
dev = {
instance_type = "t3.micro"
min_size = 1
max_size = 2
cidr = "10.0.0.0/16"
}
staging = {
# YOUR CONFIG HERE
}
prod = {
# YOUR CONFIG HERE
}
}
config = local.env_config[terraform.workspace]
}
# Add S3 backend with workspace key prefix
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "app/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
# Non-default workspaces are stored under env:/<workspace>/app/terraform.tfstate
}
}
Build a Reusable Module with Tests
Create a reusable VPC module with proper interfaces, documentation, and Terraform test files:
- Create module structure: main.tf, variables.tf, outputs.tf, versions.tf, README.md
- Implement a VPC with configurable CIDR, public/private subnets across N availability zones
- Add input validation using validation blocks on variables
- Write 3+ test cases in .tftest.hcl files covering normal, edge, and error cases
- Add a moved block to handle a refactoring scenario
- Run terraform test and verify all tests pass
# Exercise: Complete the module test file
# tests/vpc_module.tftest.hcl
variables {
vpc_cidr = "10.0.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b"]
enable_nat = true
}
run "validates_cidr_format" {
command = plan
variables {
vpc_cidr = "invalid-cidr"
}
# This should fail validation
expect_failures = [var.vpc_cidr]
}
run "creates_correct_subnet_count" {
command = plan
assert {
condition = # YOUR ASSERTION HERE
error_message = "Should create 2 public and 2 private subnets"
}
}
run "nat_gateway_conditional" {
command = plan
variables {
enable_nat = false
}
assert {
condition = # YOUR ASSERTION HERE
error_message = "NAT gateway should not be created when disabled"
}
}
Set Up Terragrunt for Multi-Environment
Create a complete Terragrunt directory structure that manages VPC and EKS across dev, staging, and production:
- Create the root terragrunt.hcl with remote state generation (S3 + DynamoDB)
- Create _envcommon/vpc.hcl and _envcommon/eks.hcl with shared module configs
- Create env.hcl files for dev, staging, and prod with environment-specific values
- Wire up EKS's dependency block to reference VPC outputs
- Run terragrunt run-all plan from the dev directory and verify dependency ordering
# Exercise: Complete the Terragrunt configuration
# infrastructure/_envcommon/eks.hcl
terraform {
source = "git::https://github.com/myorg/terraform-modules.git//eks?ref=v2.0.0"
}
locals {
env_vars = read_terragrunt_config(find_in_parent_folders("env.hcl"))
env = local.env_vars.locals.environment
}
inputs = {
cluster_name = "eks-${local.env}"
cluster_version = "1.29"
# Add environment-specific node group configuration
node_groups = {
default = {
instance_types = # YOUR CONFIG based on environment
min_size = # YOUR CONFIG
max_size = # YOUR CONFIG
}
}
}
Implement Multi-Region Deployment
Design and implement a multi-region deployment with DNS failover between US and EU regions:
- Define provider aliases for us-east-1 (primary) and eu-west-1 (secondary)
- Create a regional module that deploys VPC + ALB + ECS service
- Instantiate the module twice with different providers
- Configure Route 53 health checks monitoring the primary ALB
- Set up failover routing so traffic shifts to EU if US is unhealthy
- Configure S3 cross-region replication for static assets
# Exercise: Complete the multi-region failover
resource "aws_route53_record" "app_primary" {
provider = aws.global
zone_id = var.hosted_zone_id
name = "app.example.com"
type = "A"
failover_routing_policy {
type = "PRIMARY"
}
set_identifier = "primary"
health_check_id = # YOUR HEALTH CHECK REFERENCE
alias {
name = # PRIMARY ALB DNS
zone_id = # PRIMARY ALB ZONE ID
evaluate_target_health = true
}
}
resource "aws_route53_record" "app_secondary" {
provider = aws.global
zone_id = var.hosted_zone_id
name = "app.example.com"
type = "A"
failover_routing_policy {
type = # YOUR FAILOVER TYPE
}
set_identifier = "secondary"
alias {
name = # SECONDARY ALB DNS
zone_id = # SECONDARY ALB ZONE ID
evaluate_target_health = true
}
}
Conclusion & Next Steps
Advanced Terraform patterns transform Infrastructure as Code from a simple provisioning tool into a scalable, governed, and collaborative engineering practice. The patterns covered in this article represent the collective wisdom of organizations managing thousands of resources across multiple environments and regions.
Key takeaways:
- Workspaces for simple multi-env: same code, different state files; best when environments are structurally identical
- Module composition over monoliths: small, tested, versioned modules that compose into larger architectures
- Terragrunt for DRY at scale: eliminate backend copy-paste, manage cross-module dependencies declaratively
- State splitting reduces blast radius: separate by team/service/environment for safer applies and team autonomy
- Moved and import blocks for safe refactoring: rename and adopt resources without manual state surgery
- Dynamic blocks eliminate repetition: use for_each over count for stable resource addressing
- Provider aliases enable multi-region: same module instantiated per region with proper DNS failover
- Sentinel policies enforce governance: Policy as Code that blocks non-compliant changes at plan time
Next in the Series
In Part 16: Multi-Cloud Architecture, we explore designing portable, resilient infrastructure across AWS, Azure, and GCP — abstraction layers, cloud-agnostic patterns, multi-cloud networking, and when multi-cloud is genuinely beneficial versus unnecessary complexity.