
Part 20: Career & Capstone Project

May 14, 2026 Wasil Zafar 50 min read

Complete your infrastructure journey with career guidance, certification roadmaps, portfolio building, interview preparation, and a comprehensive capstone project deploying production-grade multi-cloud infrastructure.

Table of Contents

  1. Your Infrastructure Journey
  2. Career Paths
  3. Certification Roadmap
  4. Building Your Portfolio
  5. Interview Preparation
  6. Capstone Project Overview
  7. Phase 1: Networking
  8. Phase 2: Compute & Storage
  9. Phase 3: Application & CI/CD
  10. Phase 4: Observability & Security
  11. Cost & Cleanup
  12. Series Complete

Your Infrastructure Journey

Congratulations — you've reached the final chapter of a 20-part journey through infrastructure and cloud automation. From bare-metal hardware fundamentals to platform engineering at scale, you've built a comprehensive understanding of how modern infrastructure is designed, deployed, secured, and optimized. This final installment brings everything together: career guidance to translate your knowledge into professional success, and a capstone project that demonstrates mastery across the entire stack.

Over the course of this series, you've progressed from understanding physical servers, networking, and operating systems through configuration management, containerization, orchestration, infrastructure as code, CI/CD, security, and financial optimization. Each part built upon the last, creating a foundation that mirrors the real-world evolution from traditional IT operations to modern cloud-native platform engineering.

The Journey: This series covered 20 interconnected topics spanning hardware, networking, Linux, configuration management, containers, Kubernetes, cloud platforms, Terraform, CI/CD, security, monitoring, GitOps, service mesh, platform engineering, multi-cloud, serverless, disaster recovery, FinOps, and now — your career and a comprehensive capstone project.

Skills You've Built

Let's acknowledge the breadth and depth of knowledge you've accumulated:

| Phase | Parts | Skills Acquired |
|-------|-------|-----------------|
| Foundations | 1–4 | Hardware, networking, Linux administration, shell scripting |
| Automation | 5–6 | Configuration management (Ansible), containers (Docker) |
| Cloud & IaC | 7–9 | Cloud fundamentals, Terraform, CI/CD pipelines |
| Orchestration | 10–11 | Kubernetes, security hardening |
| Operations | 12–14 | Monitoring, GitOps, platform engineering |
| Advanced | 15–20 | Service mesh, multi-cloud, serverless, DR, FinOps, career |
The 20-Part Infrastructure Journey
flowchart LR
    subgraph Foundations
        P1[1. Hardware]
        P2[2. Networking]
        P3[3. Linux]
        P4[4. Scripting]
    end
    subgraph Automation
        P5[5. Ansible]
        P6[6. Containers]
    end
    subgraph Cloud
        P7[7. Cloud Fundamentals]
        P8[8. Terraform]
        P9[9. CI/CD]
    end
    subgraph Orchestration
        P10[10. Kubernetes]
        P11[11. Security]
    end
    subgraph Operations
        P12[12. Monitoring]
        P13[13. GitOps]
        P14[14. Platform Eng]
    end
    subgraph Advanced
        P15[15. Service Mesh]
        P16[16. Multi-Cloud]
        P17[17. Serverless]
        P18[18. DR]
        P19[19. FinOps]
        P20[20. Career]
    end
    Foundations --> Automation --> Cloud --> Orchestration --> Operations --> Advanced
                            

Career Paths in Infrastructure

The infrastructure domain offers multiple career trajectories, each with distinct responsibilities, required skills, and growth potential. Understanding these paths helps you target your job search and professional development effectively.

Infrastructure / Cloud Engineer

Cloud Engineers design, implement, and maintain cloud infrastructure. They focus on resource provisioning, networking, storage, and compute services. This role emphasizes breadth across cloud services and strong IaC skills. Day-to-day work includes writing Terraform modules, configuring VPCs, managing IAM policies, and troubleshooting infrastructure issues.

DevOps Engineer

DevOps Engineers bridge development and operations, focusing on CI/CD pipelines, automation, and developer productivity. They build and maintain deployment pipelines, manage artifact repositories, implement testing automation, and ensure smooth software delivery from code commit to production. The role demands strong scripting skills and deep knowledge of CI/CD tooling.

Site Reliability Engineer (SRE)

SREs apply software engineering principles to infrastructure and operations problems. Originating at Google, SRE focuses on reliability through SLOs, error budgets, incident management, and automation that eliminates toil. SREs write code to solve operational problems and are expected to spend at least 50% of their time on engineering rather than operations.
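SLOs and error budgets are easiest to grasp as arithmetic. A 99.9% availability SLO over a 30-day window leaves 0.1% of that window as the budget for failures. The sketch below uses hypothetical helpers (`error_budget_minutes`, `budget_remaining`) and assumes a time-based SLO ("up for X% of minutes") and a 30-day window. It shows how an SRE might check how much budget is left before approving another risky release.

```python
# Error-budget arithmetic for a time-based availability SLO (illustrative sketch).
# Assumption: the SLO is "the service is up for X% of minutes in the window".

MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability the SLO allows within the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * MINUTES_PER_DAY


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once it is exhausted)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    for slo in (0.99, 0.999, 0.9999):
        print(f"SLO {slo:.2%}: {error_budget_minutes(slo):.1f} min per 30 days")
    # 30 minutes of downtime against a 99.9% SLO spends roughly 69% of the budget
    print(f"remaining: {budget_remaining(0.999, bad_minutes=30):.0%}")
```

A 99.9% target allows about 43 minutes of downtime a month. Each extra "nine" cuts that by a factor of ten, which is why SLO targets are chosen deliberately rather than maximized.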

Platform Engineer

Platform Engineers build internal developer platforms (IDPs) that abstract infrastructure complexity. They create golden paths, self-service portals, and standardized templates that enable development teams to deploy independently. This role combines infrastructure expertise with product thinking — your users are internal developers.

Cloud Architect / Solutions Architect

Cloud Architects design large-scale distributed systems, make technology selection decisions, define standards, and provide technical leadership. They bridge business requirements with technical implementation, often working with enterprise customers or leading architecture decisions across multiple teams.

Security Engineer / DevSecOps

Security Engineers focused on infrastructure protect cloud environments through policy enforcement, vulnerability management, compliance automation, and incident response. DevSecOps practitioners embed security into CI/CD pipelines and shift security left into the development lifecycle.

| Role | Primary Focus | Key Skills | Salary Range (US) |
|------|---------------|------------|-------------------|
| Cloud Engineer | Infrastructure provisioning & maintenance | Terraform, AWS/Azure/GCP, networking | $110k–$170k |
| DevOps Engineer | CI/CD & developer productivity | Jenkins/GitHub Actions, Docker, scripting | $120k–$180k |
| SRE | Reliability & incident response | SLOs, observability, Go/Python, Kubernetes | $140k–$220k |
| Platform Engineer | Internal developer platforms | Backstage, Kubernetes, API design, UX | $140k–$210k |
| Cloud Architect | System design & technical strategy | Multi-cloud, distributed systems, leadership | $160k–$250k |
| DevSecOps | Security automation & compliance | SAST/DAST, OPA, network security, compliance | $130k–$200k |
Career Progression Tree
flowchart TD
    A[Junior Sysadmin / IT Support] --> B[Cloud Engineer]
    A --> C[DevOps Engineer]
    B --> D[Senior Cloud Engineer]
    C --> E[Senior DevOps Engineer]
    D --> F[Cloud Architect]
    D --> G[SRE]
    E --> G
    E --> H[Platform Engineer]
    G --> I[Staff SRE / Principal]
    H --> J[Staff Platform Engineer]
    F --> K[Principal Architect / VP Engineering]
    I --> K
    J --> K
    C --> L[DevSecOps Engineer]
    L --> M[Security Architect]
    M --> K
                            

Certification Roadmap

Certifications validate your knowledge and signal competence to employers. While they're not a substitute for hands-on experience, they open doors — particularly for career changers and early-career professionals. Here's a strategic roadmap organized by vendor and difficulty level.

AWS Certification Path

| Level | Certification | Focus | Prep Time |
|-------|---------------|-------|-----------|
| Foundational | Cloud Practitioner (CLF-C02) | Cloud concepts, billing, security basics | 2–4 weeks |
| Associate | Solutions Architect Associate (SAA-C03) | Architecture design, service selection | 4–8 weeks |
| Associate | SysOps Administrator (SOA-C02) | Operations, monitoring, troubleshooting | 4–6 weeks |
| Professional | DevOps Engineer Professional (DOP-C02) | CI/CD, automation, SDLC | 8–12 weeks |
| Specialty | Advanced Networking / Security | Deep-dive domains | 6–10 weeks |

Azure Certification Path

| Level | Certification | Focus | Prep Time |
|-------|---------------|-------|-----------|
| Foundational | AZ-900: Azure Fundamentals | Cloud concepts, Azure services overview | 1–3 weeks |
| Associate | AZ-104: Azure Administrator | Resource management, networking, identity | 4–8 weeks |
| Expert | AZ-400: DevOps Engineer Expert (requires AZ-104 or AZ-204) | CI/CD, IaC, security, compliance | 6–10 weeks |
| Expert | AZ-305: Azure Solutions Architect Expert (requires AZ-104) | Architecture design, governance, identity | 8–12 weeks |

Vendor-Neutral & Kubernetes Certifications

| Certification | Vendor | Focus | Difficulty | Cost |
|---------------|--------|-------|------------|------|
| Terraform Associate (003) | HashiCorp | IaC fundamentals, HCL, state management | Moderate | $70 |
| CKA (Certified Kubernetes Administrator) | CNCF | Cluster admin, networking, troubleshooting | Hard | $395 |
| CKAD (Certified Kubernetes Application Developer) | CNCF | Application deployment, configuration | Moderate | $395 |
| CKS (Certified Kubernetes Security Specialist) | CNCF | Cluster security, supply chain, runtime (requires an active CKA) | Very Hard | $395 |
| LFCS (Linux Foundation Certified System Administrator) | Linux Foundation | Linux administration, networking, security | Moderate | $395 |
Recommended Certification Order by Career Stage
flowchart TD
    subgraph "Year 1: Foundations"
        A1[AWS Cloud Practitioner or AZ-900] --> A2[Terraform Associate]
        A2 --> A3[AWS SAA or AZ-104]
    end
    subgraph "Year 2: Specialization"
        B1[CKA] --> B2[AWS DevOps Professional or AZ-400]
        B2 --> B3[CKAD or CKS]
    end
    subgraph "Year 3+: Expert"
        C1[AWS SA Professional or AZ-305] --> C2[Specialty Certs]
    end
    A3 --> B1
    B3 --> C1
Certification Strategy: Don't collect certifications without hands-on projects to back them up. A CKA with no Kubernetes project on your GitHub is a red flag to experienced interviewers. Pair every certification with a portfolio project that demonstrates the skills in practice.

Building Your Portfolio

Your GitHub profile is your infrastructure resume. Employers reviewing candidates for cloud/DevOps roles often look at your repositories before your LinkedIn profile. A well-structured portfolio demonstrates that you can not only build infrastructure but also document, organize, and communicate your work professionally.

GitHub Profile Best Practices

  • Profile README: Create a personal README.md with a brief intro, tech stack badges, and links to key projects
  • Pinned repositories: Pin your 4–6 best infrastructure projects
  • Consistent activity: Regular commits show ongoing learning
  • Clean commit history: Meaningful commit messages, not "fix" or "update"

Key Projects to Showcase

# Ideal GitHub repository structure for an infrastructure project
my-infra-project/
├── README.md                    # Architecture diagram, setup instructions, decisions
├── LICENSE
├── .github/
│   └── workflows/
│       ├── ci.yml               # Terraform validate + plan on PR
│       └── cd.yml               # Terraform apply on merge to main
├── docs/
│   ├── architecture.md          # Detailed architecture decisions
│   ├── runbook.md               # Operational procedures
│   └── cost-analysis.md         # Monthly cost breakdown
├── terraform/
│   ├── environments/
│   │   ├── dev/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   └── terraform.tfvars
│   │   ├── staging/
│   │   └── prod/
│   ├── modules/
│   │   ├── networking/
│   │   ├── compute/
│   │   ├── database/
│   │   └── monitoring/
│   └── global/
│       └── backend.tf
├── kubernetes/
│   ├── base/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── ingress.yaml
│   └── overlays/
│       ├── dev/
│       └── prod/
├── monitoring/
│   ├── prometheus/
│   ├── grafana/
│   └── alertmanager/
└── Makefile                      # Common commands documented

Writing About Your Projects

Document not just what you built but why you made specific decisions. A README that explains "I chose EKS over ECS because..." demonstrates architectural thinking. Include:

  • Architecture diagrams (Mermaid or draw.io)
  • Cost estimates (Infracost output)
  • Trade-off decisions and alternatives considered
  • What you would do differently in a production environment
  • Lessons learned during implementation
# Example README.md header for a portfolio project
cat <<'EOF' > README.md
# Production-Grade EKS Infrastructure

[![Terraform](https://img.shields.io/badge/Terraform-1.7+-purple)](https://terraform.io)
[![AWS](https://img.shields.io/badge/AWS-EKS%201.29-orange)](https://aws.amazon.com/eks/)
[![CI/CD](https://img.shields.io/badge/CI%2FCD-GitHub%20Actions-blue)](https://github.com/features/actions)

## Architecture

Multi-environment EKS cluster with:
- VPC with public/private subnets across 3 AZs
- Managed node groups with spot instances (up to 70% cheaper than on-demand)
- RDS PostgreSQL with Multi-AZ failover
- Prometheus + Grafana observability stack
- GitHub Actions CI/CD with Terraform plan/apply

## Quick Start

```bash
# Prerequisites: AWS CLI, Terraform, kubectl
make init ENV=dev
make plan ENV=dev
make apply ENV=dev
```

## Cost Estimate

| Environment | Monthly Cost | Spot Savings |
|-------------|-------------|--------------|
| Dev         | $145/month  | $89 saved    |
| Staging     | $312/month  | $198 saved   |
| Prod        | $1,247/month| $834 saved   |

## Architecture Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Orchestration | EKS | Team Kubernetes expertise, portability |
| Node type | Spot + On-demand mix | Cost optimization with reliability |
| Database | RDS PostgreSQL | Managed service, Multi-AZ, automated backups |
| IaC | Terraform | Multi-cloud support, mature ecosystem |
EOF
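Here is a quick sanity check on the sample numbers. The README quotes a 70% spot saving, yet the sample cost table shows a smaller saving on each environment's total bill. That gap is expected: only the spot share of compute is discounted, while the control plane, NAT, RDS and on-demand nodes are billed in full. The sketch below recomputes the effective saving from the table's figures. Those figures are illustrative examples, not real AWS pricing, and the helper name is hypothetical.

```python
# Effective saving on the whole monthly bill, using the sample README's cost table.
# The figures are the illustrative numbers from that table, in USD per month.
SAMPLE_COSTS = {
    "Dev": (145, 89),        # (amount paid, amount saved by using spot)
    "Staging": (312, 198),
    "Prod": (1247, 834),
}


def effective_saving(paid: float, saved: float) -> float:
    """Share of the equivalent all-on-demand bill that spot usage saved."""
    return saved / (paid + saved)


for env, (paid, saved) in SAMPLE_COSTS.items():
    print(f"{env:8} on-demand equivalent ${paid + saved:,}  saving {effective_saving(paid, saved):.0%}")
```

Quoting the blended figure alongside the per-node discount makes a portfolio README more credible to reviewers who know how spot pricing works.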

Interview Preparation

Infrastructure interviews test both technical depth and operational judgment. You'll face a mix of system design questions, hands-on coding challenges, troubleshooting scenarios, and behavioral questions that assess how you handle incidents and collaborate with teams.

Common Interview Topics by Role

Topic Cloud Engineer DevOps SRE Platform Eng
Networking (VPC, DNS, LB) ★★★ ★★ ★★ ★★
Terraform / IaC ★★★ ★★★ ★★ ★★★
CI/CD Pipelines ★★ ★★★ ★★ ★★★
Kubernetes ★★ ★★★ ★★★ ★★★
Monitoring & SLOs ★★ ★★ ★★★ ★★
System Design ★★ ★★ ★★★ ★★★
Incident Management ★★ ★★★ ★★
Security / IAM ★★★ ★★ ★★ ★★

Sample Questions with Answer Frameworks

System Design Question: "Design a deployment pipeline for a microservices application that deploys to multiple environments (dev, staging, prod) with automated testing and rollback capability."

Answer Framework (STAR + Architecture):

# Framework: describe the pipeline stages and technology choices

# 1. Source Stage
# - GitHub repository with branch protection
# - PR triggers CI, merge to main triggers CD

# 2. Build Stage  
# - Docker multi-stage builds
# - Unit tests + code coverage
# - SAST and vulnerability scanning (e.g. Snyk, Trivy)
# - Container image push to ECR/ACR

# 3. Deploy to Dev (automatic)
# - Terraform plan + apply for infrastructure
# - Kubernetes rolling deployment
# - Smoke tests after deploy

# 4. Deploy to Staging (automatic after dev passes)
# - Integration tests
# - Performance tests (k6/Locust)
# - Security scanning

# 5. Deploy to Production (manual approval gate)
# - Canary deployment (10% → 50% → 100%)
# - Automated rollback if error rate > 1%
# - Post-deploy validation

# 6. Observability
# - Deployment markers in Grafana
# - SLO monitoring during rollout
# - Automated incident creation on failure
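The canary step in stage 5 boils down to a small control loop: shift traffic in increments, watch the error rate, and roll back as soon as it crosses the threshold. The sketch below models that loop in Python. `observe_error_rate` is a hypothetical callback standing in for a real metrics query (for example, against Prometheus). The 10% → 50% → 100% steps and the 1% threshold are taken from the framework above.

```python
from typing import Callable, Sequence, Tuple

# Traffic weights and rollback threshold from the pipeline framework above.
CANARY_STEPS: Sequence[int] = (10, 50, 100)
MAX_ERROR_RATE = 0.01  # roll back if more than 1% of requests fail


def run_canary(
    observe_error_rate: Callable[[int], float],
    steps: Sequence[int] = CANARY_STEPS,
    threshold: float = MAX_ERROR_RATE,
) -> Tuple[str, int]:
    """Walk through the canary steps.

    Returns ("promoted", 100) if every step stays healthy, otherwise
    ("rolled_back", weight) with the traffic weight at which it failed.
    """
    for weight in steps:
        # A real pipeline would wait for a bake period here, then query metrics.
        error_rate = observe_error_rate(weight)
        if error_rate > threshold:
            return ("rolled_back", weight)
    return ("promoted", steps[-1])


if __name__ == "__main__":
    def healthy(weight: int) -> float:
        return 0.002

    def degrades_under_load(weight: int) -> float:
        return 0.004 if weight < 50 else 0.03

    print(run_canary(healthy))              # ('promoted', 100)
    print(run_canary(degrades_under_load))  # ('rolled_back', 50)
```

In an interview, the point worth making is that the rollback decision is automated and driven by a metric. A human only needs to approve the start of the production stage.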
Terraform Coding Challenge: "Write a Terraform module that creates a VPC with public and private subnets across 3 availability zones, with a NAT gateway for private subnet internet access."
# Answer: Modular VPC with 3 AZs
# This is a common live-coding challenge in infrastructure interviews

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "availability_zones" {
  description = "List of AZs"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${var.availability_zones[count.index]}"
    Type = "public"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-private-${var.availability_zones[count.index]}"
    Type = "private"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.environment}-igw" }
}

resource "aws_eip" "nat" {
  domain = "vpc"
  tags   = { Name = "${var.environment}-nat-eip" }
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
  tags          = { Name = "${var.environment}-nat" }
  depends_on    = [aws_internet_gateway.main]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = { Name = "${var.environment}-public-rt" }
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }

  tags = { Name = "${var.environment}-private-rt" }
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}

output "vpc_id" { value = aws_vpc.main.id }
output "public_subnet_ids" { value = aws_subnet.public[*].id }
output "private_subnet_ids" { value = aws_subnet.private[*].id }
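When walking an interviewer through this module, it helps to show what `cidrsubnet(var.vpc_cidr, 8, count.index)` actually produces. The Python sketch below re-implements the IPv4 case of Terraform's `cidrsubnet(prefix, newbits, netnum)` with the standard `ipaddress` module. It is our own illustration of the function's behaviour, not a Terraform API. It prints the public and private subnets the module creates for the default `10.0.0.0/16`.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """IPv4-only re-implementation of Terraform's cidrsubnet(), for illustration."""
    network = ipaddress.IPv4Network(prefix)
    new_prefixlen = network.prefixlen + newbits
    if new_prefixlen > 32:
        raise ValueError("newbits makes the prefix longer than /32")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError(f"netnum must be in [0, {2 ** newbits - 1}]")
    # Subnet number N starts N blocks of 2^(32 - new_prefixlen) addresses into the parent.
    base = int(network.network_address) + (netnum << (32 - new_prefixlen))
    return str(ipaddress.IPv4Network((base, new_prefixlen)))


if __name__ == "__main__":
    vpc_cidr = "10.0.0.0/16"
    for i, az in enumerate(["us-east-1a", "us-east-1b", "us-east-1c"]):
        public = cidrsubnet(vpc_cidr, 8, i)        # 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24
        private = cidrsubnet(vpc_cidr, 8, i + 10)  # 10.0.10.0/24, 10.0.11.0/24, 10.0.12.0/24
        print(f"{az}: public {public}  private {private}")
```

The `+ 10` offset on the private subnets leaves `10.0.3.0/24` through `10.0.9.0/24` free. The public tier can then grow into that range without renumbering.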

Behavioral Questions for DevOps/SRE

  • "Tell me about an incident you managed." — Use the timeline format: detection → triage → mitigation → resolution → post-mortem
  • "How do you handle pushback from developers on security/process changes?" — Emphasize empathy, data-driven arguments, and incremental adoption
  • "Describe a time you automated a manual process." — Quantify: hours saved, error reduction, team impact
  • "What's your approach to on-call?" — Runbooks, escalation policies, blameless post-mortems, reducing alert fatigue

The Capstone Project — Overview

This capstone project ties together concepts from all 20 parts into a single, production-grade infrastructure deployment. You'll build a multi-environment web application platform using Terraform for provisioning, Kubernetes for orchestration, GitHub Actions for CI/CD, and Prometheus/Grafana for observability.

What You're Building: A production-grade, multi-environment (dev/staging/prod) web application infrastructure on AWS with managed Kubernetes (EKS), PostgreSQL database (RDS), object storage (S3), full observability stack, and automated CI/CD pipelines. This single project demonstrates mastery of infrastructure as code, container orchestration, deployment automation, monitoring, and security.

Technology Stack

| Layer | Technology | Series Part |
|-------|------------|-------------|
| Infrastructure as Code | Terraform 1.7+ | Part 8 |
| Cloud Provider | AWS (VPC, EKS, RDS, S3) | Part 7 |
| Container Orchestration | Kubernetes (EKS 1.29) | Part 10 |
| CI/CD | GitHub Actions | Part 9 |
| Monitoring | Prometheus + Grafana (Helm) | Part 12 |
| Security | RBAC, Network Policies, OPA | Part 11 |
| Cost Management | Infracost, spot instances | Part 19 |
Capstone Architecture Overview
flowchart TB
    subgraph Internet
        U[Users]
        GH[GitHub]
    end
    subgraph AWS["AWS Cloud"]
        subgraph VPC["VPC (10.0.0.0/16)"]
            subgraph Public["Public Subnets"]
                ALB[Application Load Balancer]
                NAT[NAT Gateway]
            end
            subgraph Private["Private Subnets"]
                subgraph EKS["EKS Cluster"]
                    APP[App Pods]
                    MON[Monitoring Pods]
                end
                RDS[(RDS PostgreSQL)]
            end
        end
        S3[S3 Bucket]
        ECR[ECR Registry]
    end
    U --> ALB --> APP
    APP --> RDS
    APP --> S3
    GH --> ECR --> EKS
    MON --> APP
                            

Capstone Phase 1: Foundation & Networking

Phase 1 Terraform • VPC • Subnets • Security Groups

Foundation & Networking

Create the network foundation: VPC with public and private subnets across 3 availability zones, internet gateway, NAT gateway, security groups, and network ACLs. This is the base layer everything else builds upon.

# terraform/environments/dev/main.tf
# Capstone Phase 1: Foundation configuration

terraform {
  required_version = ">= 1.7.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }

  backend "s3" {
    bucket         = "capstone-terraform-state"
    key            = "dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = "capstone"
      Environment = var.environment
      ManagedBy   = "terraform"
      Owner       = "platform-team"
    }
  }
}

module "networking" {
  source = "../../modules/networking"

  environment        = var.environment
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  
  public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  private_subnet_cidrs = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
  
  enable_nat_gateway = true
  single_nat_gateway = true  # Cost saving for dev; use false for prod
}
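Hard-coded CIDR lists like the ones passed to the networking module are easy to get wrong when a new environment is copied from dev. A small pre-flight check catches subnets that fall outside the VPC, overlap each other, or don't match the number of AZs. The sketch below implements one with Python's standard `ipaddress` module. The `validate_subnet_plan` helper is hypothetical, not part of Terraform.

```python
import ipaddress
from itertools import combinations
from typing import List


def validate_subnet_plan(vpc_cidr: str, azs: List[str],
                         public: List[str], private: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the plan looks consistent."""
    problems = []
    vpc = ipaddress.ip_network(vpc_cidr)
    for name, cidrs in (("public", public), ("private", private)):
        if len(cidrs) != len(azs):
            problems.append(f"{name}: {len(cidrs)} subnets for {len(azs)} AZs")
    subnets = [ipaddress.ip_network(c) for c in public + private]
    for net in subnets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside {vpc}")
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")
    return problems


if __name__ == "__main__":
    # Values from terraform/environments/dev/main.tf
    print(validate_subnet_plan(
        "10.0.0.0/16",
        ["us-east-1a", "us-east-1b", "us-east-1c"],
        ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"],
        ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"],
    ))  # []
```

A check like this could run in CI before `terraform plan`. Terraform's own `variable` validation blocks are an alternative if you prefer to keep the rules in HCL.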
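# terraform/modules/networking/main.tf
# Reusable networking module

locals {
  # One shared NAT gateway when single_nat_gateway = true (cheaper, fine for dev);
  # otherwise one per AZ so losing an AZ does not cut off egress for the others.
  nat_gateway_count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = { Name = "${var.environment}-vpc" }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name                     = "${var.environment}-public-${count.index + 1}"
    "kubernetes.io/role/elb" = "1"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name                              = "${var.environment}-private-${count.index + 1}"
    "kubernetes.io/role/internal-elb" = "1"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.environment}-igw" }
}

resource "aws_eip" "nat" {
  count  = local.nat_gateway_count
  domain = "vpc"
  tags   = { Name = "${var.environment}-nat-eip-${count.index + 1}" }
}

resource "aws_nat_gateway" "main" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  tags          = { Name = "${var.environment}-nat-${count.index + 1}" }
  depends_on    = [aws_internet_gateway.main]
}

# Public subnets send 0.0.0.0/0 to the internet gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = { Name = "${var.environment}-public-rt" }
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# One private route table per AZ, pointing at that AZ's NAT gateway
# (or at the shared one when single_nat_gateway = true)
resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.environment}-private-rt-${count.index + 1}" }
}

resource "aws_route" "private_nat" {
  count                  = local.nat_gateway_count > 0 ? length(var.availability_zones) : 0
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[var.single_nat_gateway ? 0 : count.index].id
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}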
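The `single_nat_gateway = true` setting in the dev configuration trades resilience for cost. With one NAT gateway (placed in the first public subnet), every private subnet sends internet-bound traffic through a single AZ. With one NAT gateway per AZ, an AZ outage only affects that AZ. The toy model below shows which private subnets are affected when an AZ fails under each setting. It is plain Python, the function names are hypothetical, and it ignores costs and VPC endpoints.

```python
from typing import List


def nat_az_for_subnet(subnet_az: int, single_nat_gateway: bool) -> int:
    """Index of the AZ whose NAT gateway a private subnet routes through."""
    # Single NAT: everything goes through the gateway in AZ 0 (the first public subnet).
    # Per-AZ NAT: each private subnet uses the gateway in its own AZ.
    return 0 if single_nat_gateway else subnet_az


def affected_private_subnets(num_azs: int, failed_az: int, single_nat_gateway: bool) -> List[int]:
    """Private subnets (by AZ index) that are down or lose outbound access if failed_az fails."""
    return [
        az for az in range(num_azs)
        if az == failed_az or nat_az_for_subnet(az, single_nat_gateway) == failed_az
    ]


if __name__ == "__main__":
    for single in (True, False):
        for failed in range(3):
            hit = affected_private_subnets(3, failed, single)
            print(f"single_nat_gateway={single!s:5}  AZ {failed} down -> subnets {hit} affected")
```

With a single gateway, losing AZ 0 affects all three private subnets. That is usually an acceptable risk in dev, but rarely in prod.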

Security Groups

# terraform/modules/networking/security_groups.tf
# Security groups for the capstone project

resource "aws_security_group" "alb" {
  name_prefix = "${var.environment}-alb-"
  vpc_id      = aws_vpc.main.id
  description = "Security group for Application Load Balancer"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTP from internet"
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = "${var.environment}-alb-sg" }
}

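# Note: EKS managed node groups without a launch template use the EKS-created
# cluster security group, so attach this group through a launch template
# (vpc_security_group_ids) for the rules that reference it to cover the nodes.
resource "aws_security_group" "eks_nodes" {
  name_prefix = "${var.environment}-eks-nodes-"
  vpc_id      = aws_vpc.main.id
  description = "Security group for EKS worker nodes"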

  ingress {
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
    self            = true
    description     = "Node-to-node communication"
  }

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "App port from ALB (target-type ip sends traffic to pods on 8080)"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = "${var.environment}-eks-nodes-sg" }
}

resource "aws_security_group" "database" {
  name_prefix = "${var.environment}-db-"
  vpc_id      = aws_vpc.main.id
  description = "Security group for RDS database"

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
    description     = "PostgreSQL from EKS nodes only"
  }

  tags = { Name = "${var.environment}-db-sg" }
}
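These security groups form a chain. The ALB accepts traffic from anywhere, and the database accepts PostgreSQL only from members of the node security group, which it references by ID rather than by CIDR. The sketch below models just those ingress rules as Python data to show why nothing on the internet can reach port 5432 directly. It is a deliberate simplification that ignores egress rules, NACLs and the ALB-to-node rule, and the names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Simplified ingress rules: destination group -> list of (source, port), where a
# port of None means "all ports". "internet" stands for 0.0.0.0/0.
INGRESS: Dict[str, List[Tuple[str, Optional[int]]]] = {
    "alb": [("internet", 80), ("internet", 443)],
    "eks_nodes": [("eks_nodes", None)],  # node-to-node, all protocols
    "database": [("eks_nodes", 5432)],
}


def allowed(source: str, dest: str, port: int) -> bool:
    """True if some ingress rule on `dest` admits `source` on `port`."""
    return any(
        src == source and (rule_port is None or rule_port == port)
        for src, rule_port in INGRESS.get(dest, [])
    )


if __name__ == "__main__":
    print(allowed("internet", "alb", 443))         # True
    print(allowed("internet", "database", 5432))   # False: no CIDR rule on the DB group
    print(allowed("eks_nodes", "database", 5432))  # True
    print(allowed("alb", "database", 5432))        # False: the ALB is not in the node group
```

Referencing groups instead of CIDR ranges means the database rule keeps working as nodes scale in and out and change IP addresses.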

Capstone Phase 2: Compute & Storage

Phase 2 EKS • Node Groups • RDS • S3

Compute & Storage

Deploy a managed Kubernetes cluster with mixed node groups (on-demand + spot), a managed PostgreSQL database with automated backups, and S3 storage with lifecycle policies for cost optimization.

# terraform/modules/compute/eks.tf
# EKS cluster with managed node groups

resource "aws_eks_cluster" "main" {
  name     = "${var.environment}-cluster"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = "1.29"

  vpc_config {
    subnet_ids              = var.private_subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = var.environment == "dev" ? true : false
    security_group_ids      = [var.eks_security_group_id]
  }

  encryption_config {
    provider { key_arn = aws_kms_key.eks.arn }
    resources = ["secrets"]
  }

  enabled_cluster_log_types = [
    "api", "audit", "authenticator",
    "controllerManager", "scheduler"
  ]

  tags = { Name = "${var.environment}-eks" }
}

resource "aws_eks_node_group" "ondemand" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.environment}-ondemand"
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = var.private_subnet_ids
  capacity_type   = "ON_DEMAND"
  instance_types  = ["t3.medium"]

  scaling_config {
    desired_size = var.environment == "prod" ? 3 : 2
    min_size     = var.environment == "prod" ? 3 : 1
    max_size     = var.environment == "prod" ? 10 : 4
  }

  labels = {
    workload = "critical"
    type     = "ondemand"
  }

  tags = { Name = "${var.environment}-ondemand-nodes" }
}

resource "aws_eks_node_group" "spot" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.environment}-spot"
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = var.private_subnet_ids
  capacity_type   = "SPOT"
  instance_types  = ["t3.medium", "t3.large", "t3a.medium", "t3a.large"]

  scaling_config {
    desired_size = var.environment == "prod" ? 3 : 1
    min_size     = 0
    max_size     = var.environment == "prod" ? 8 : 3
  }

  labels = {
    workload = "flexible"
    type     = "spot"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "PREFER_NO_SCHEDULE"
  }

  tags = { Name = "${var.environment}-spot-nodes" }
}
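The spot node group carries a `PREFER_NO_SCHEDULE` taint (`spot=true:PreferNoSchedule` in Kubernetes terms). Unlike `NoSchedule`, this is a soft preference. The scheduler tries to keep pods without a matching toleration off those nodes, but it will still place them there when on-demand capacity runs out. The toy model below illustrates that behaviour in Python. It is greatly simplified: it is not how kube-scheduler is implemented, it treats capacity as abstract "slots", and it matches tolerations by taint key only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_slots: int  # simplified stand-in for allocatable CPU/memory
    prefer_no_schedule: List[str] = field(default_factory=list)  # soft taint keys


def pick_node(nodes: List[Node], tolerations: List[str]) -> Optional[Node]:
    """Choose a node for one pod, preferring nodes with no untolerated soft taints."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None  # the pod would stay Pending

    def untolerated(n: Node) -> int:
        return sum(1 for key in n.prefer_no_schedule if key not in tolerations)

    # Fewer untolerated PreferNoSchedule taints wins; ties go to the node with more room.
    best = min(candidates, key=lambda n: (untolerated(n), -n.free_slots))
    best.free_slots -= 1
    return best


if __name__ == "__main__":
    nodes = [Node("ondemand-1", free_slots=2), Node("spot-1", free_slots=5, prefer_no_schedule=["spot"])]
    placements = [pick_node(nodes, tolerations=[]).name for _ in range(4)]
    print(placements)  # ['ondemand-1', 'ondemand-1', 'spot-1', 'spot-1']
```

Workloads that should run on spot (the `workload = flexible` label) would add a toleration for `spot` and a node selector or affinity for that label. Critical workloads stay on the on-demand group whenever it has room.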

Database & Object Storage

# terraform/modules/database/rds.tf
# Managed PostgreSQL with security best practices

resource "aws_db_subnet_group" "main" {
  name       = "${var.environment}-db-subnet-group"
  subnet_ids = var.private_subnet_ids
  tags       = { Name = "${var.environment}-db-subnet-group" }
}

resource "aws_db_instance" "postgres" {
  identifier     = "${var.environment}-postgres"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = var.environment == "prod" ? "db.r6g.large" : "db.t3.medium"
  
  allocated_storage     = 20
  max_allocated_storage = var.environment == "prod" ? 100 : 50
  storage_encrypted     = true
  kms_key_id            = aws_kms_key.rds.arn

  db_name  = "capstone"
  username = "capstone_admin"
  password = var.db_password  # Supplied from Secrets Manager; declare the variable sensitive (the value is still stored in Terraform state)

  multi_az               = var.environment == "prod" ? true : false
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [var.database_security_group_id]
  
  backup_retention_period = var.environment == "prod" ? 30 : 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "Mon:04:00-Mon:05:00"
  
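  deletion_protection       = var.environment == "prod" ? true : false
  skip_final_snapshot       = var.environment != "prod"
  final_snapshot_identifier = "${var.environment}-postgres-final"  # Required when skip_final_snapshot is false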

  performance_insights_enabled = true

  tags = { Name = "${var.environment}-postgres" }
}
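RDS requires the daily backup window and the weekly maintenance window not to overlap. Because the backup window repeats every day, it has to be checked against the clock times of the maintenance window on its weekday. The Python sketch below performs that check for the values used above. The parsing helpers are hypothetical and assume neither window wraps past midnight. The sketch treats windows that merely touch (one ends at 04:00, the other starts at 04:00) as non-overlapping.

```python
from datetime import time
from typing import Tuple

Window = Tuple[time, time]


def parse_backup_window(spec: str) -> Window:
    """Parse a daily window such as "03:00-04:00" (UTC, as RDS expects)."""
    start, end = spec.split("-")
    return time.fromisoformat(start), time.fromisoformat(end)


def parse_maintenance_window(spec: str) -> Tuple[str, Window]:
    """Parse a weekly window such as "Mon:04:00-Mon:05:00" (same-day windows only)."""
    start, end = spec.split("-")
    day, start_hm = start.split(":", 1)
    end_day, end_hm = end.split(":", 1)
    if day != end_day:
        raise ValueError("this sketch only handles windows that start and end on the same day")
    return day, (time.fromisoformat(start_hm), time.fromisoformat(end_hm))


def windows_overlap(a: Window, b: Window) -> bool:
    """Half-open interval test: [a0, a1) and [b0, b1) share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]


if __name__ == "__main__":
    backup = parse_backup_window("03:00-04:00")
    _, maintenance = parse_maintenance_window("Mon:04:00-Mon:05:00")
    # The backup window runs every day, so only the clock times matter here.
    print(windows_overlap(backup, maintenance))  # False
```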
# terraform/modules/storage/s3.tf
# Object storage with lifecycle policies

resource "aws_s3_bucket" "app" {
  bucket = "${var.environment}-capstone-app-assets"  # Bucket names are global: add a unique suffix (e.g. the account ID) if taken
  tags   = { Name = "${var.environment}-app-assets" }
}

resource "aws_s3_bucket_versioning" "app" {
  bucket = aws_s3_bucket.app.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "app" {
  bucket = aws_s3_bucket.app.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "app" {
  bucket = aws_s3_bucket.app.id

  rule {
    id     = "transition-to-ia"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    noncurrent_version_expiration {
      noncurrent_days = 60
    }
  }
}

resource "aws_s3_bucket_public_access_block" "app" {
  bucket                  = aws_s3_bucket.app.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
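The lifecycle rule above moves current object versions through three storage classes by age. It also deletes noncurrent versions 60 days after they are superseded. The Python sketch below makes those thresholds explicit. It is a simplified model with hypothetical helper names. Real S3 applies lifecycle actions asynchronously, so transitions can lag the exact day count.

```python
from typing import Optional

# Thresholds copied from the aws_s3_bucket_lifecycle_configuration rule above.
TRANSITIONS = [(90, "GLACIER"), (30, "STANDARD_IA")]  # checked from the largest threshold down
NONCURRENT_EXPIRATION_DAYS = 60


def storage_class_for(age_days: int) -> str:
    """Storage class a current object version should be in after age_days."""
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            return storage_class
    return "STANDARD"


def noncurrent_expires_in(days_since_superseded: int) -> Optional[int]:
    """Days until a noncurrent version is deleted, or None if it is already eligible."""
    remaining = NONCURRENT_EXPIRATION_DAYS - days_since_superseded
    return remaining if remaining > 0 else None


if __name__ == "__main__":
    for age in (0, 29, 30, 89, 90, 365):
        print(age, storage_class_for(age))
```

Because versioning is enabled, overwriting an object creates a noncurrent version. Without the 60-day expiration, those old versions would accumulate storage charges indefinitely.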

Capstone Phase 3: Application & CI/CD

Phase 3 Kubernetes • GitHub Actions • Container Registry

Application Deployment & CI/CD

Deploy the application with Kubernetes manifests (Deployment, Service, Ingress, ConfigMap) and automate the entire workflow with GitHub Actions: lint, plan, apply, build, push, and deploy across environments.

# kubernetes/base/deployment.yaml
# Application deployment with best practices

apiVersion: apps/v1
kind: Deployment
metadata:
  name: capstone-app
  labels:
    app: capstone
    component: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: capstone
      component: api
  template:
    metadata:
      labels:
        app: capstone
        component: api
    spec:
      serviceAccountName: capstone-app
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: app
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/capstone:latest
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: capstone-db-credentials
                  key: url
            - name: S3_BUCKET
              valueFrom:
                configMapKeyRef:
                  name: capstone-config
                  key: s3_bucket
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: capstone
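The strategy values limit how far a rollout may deviate from the desired replica count: `maxSurge` bounds the extra pods created above `replicas`, and `maxUnavailable` bounds how many may be unavailable at once. This minimal bash sketch covers absolute values only. Kubernetes also accepts percentages, rounding `maxSurge` up and `maxUnavailable` down, which the sketch does not model. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch (illustrative helper): pod-count bounds during a RollingUpdate,
# given absolute (non-percentage) maxSurge and maxUnavailable values.
rollout_bounds() {
  local replicas=$1 max_surge=$2 max_unavailable=$3
  # Kubernetes rejects a strategy where both are 0: the rollout could never progress
  if (( max_surge == 0 && max_unavailable == 0 )); then
    echo "invalid: maxSurge and maxUnavailable cannot both be 0" >&2
    return 1
  fi
  # Upper bound: old + new pods allowed at once; lower bound: pods that must stay available
  echo "max_pods=$(( replicas + max_surge )) min_available=$(( replicas - max_unavailable ))"
}

rollout_bounds 3 1 0
```

For the manifest's 3 replicas, the rollout runs at most 4 pods and never drops below 3 available. The cost is that the cluster needs room for one extra pod during each rollout.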
# kubernetes/base/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: capstone-app
  labels:
    app: capstone
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app: capstone
    component: api

---
# kubernetes/base/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: capstone-app
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /healthz
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789:certificate/xxx
    # ssl-redirect needs an HTTP listener to redirect from, so listen on both ports
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
spec:
  ingressClassName: alb  # replaces the deprecated kubernetes.io/ingress.class annotation
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: capstone-app
                port:
                  number: 80
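The Ingress uses `pathType: Prefix` with path `/`, which sends every request to the Service. This minimal bash sketch follows the Kubernetes Ingress documentation: Prefix matching compares paths element by element, split on `/`, and ignores a trailing slash on the rule's path. The `prefix_match` helper is hypothetical. It does not model the rule that the longest matching path wins when several rules match.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper) of Ingress pathType: Prefix matching.
prefix_match() {
  local prefix=${1%/} path=$2   # a trailing "/" on the rule path is ignored
  # "/" strips to the empty string and matches every request path
  if [[ -z $prefix ]]; then
    return 0
  fi
  # Match the exact path, or the path followed by further "/"-separated elements,
  # so /aaa/bbb matches /aaa/bbb/ccc but not /aaa/bbbxyz
  [[ $path == "$prefix" || $path == "$prefix"/* ]]
}

for p in /healthz /api/v1/users; do
  prefix_match / "$p" && echo "/ matches $p"
done
```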

GitHub Actions CI/CD Pipeline

# .github/workflows/deploy.yml
# Complete CI/CD pipeline for the capstone project

name: Deploy Infrastructure & Application

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  AWS_REGION: us-east-1
  EKS_CLUSTER: dev-cluster
  ECR_REGISTRY: 123456789.dkr.ecr.us-east-1.amazonaws.com
  ECR_REPOSITORY: capstone

jobs:
  # Job 1: Terraform Lint & Validate
  terraform-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Format Check
        run: terraform fmt -check -recursive terraform/

      - name: Terraform Init & Validate
        working-directory: terraform/environments/dev
        run: |
          terraform init -backend=false
          terraform validate

  # Job 2: Terraform Plan (on PR)
  terraform-plan:
    runs-on: ubuntu-latest
    needs: terraform-validate
    if: github.event_name == 'pull_request'
    permissions:
      id-token: write
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

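      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Plan
        working-directory: terraform/environments/dev
        run: |
          # Without pipefail, tee's exit code would hide a failed plan
          set -o pipefail
          terraform init
          terraform plan -out=tfplan -no-color | tee plan-output.txt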

      - name: Comment PR with Plan
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('terraform/environments/dev/plan-output.txt', 'utf8');
            const truncated = plan.length > 60000 ? plan.substring(0, 60000) + '\n...(truncated)' : plan;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

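  # Job 3: Terraform Apply (on push to main)
  terraform-apply:
    runs-on: ubuntu-latest
    needs: terraform-validate
    if: github.event_name == 'push'
    environment: dev
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Apply
        working-directory: terraform/environments/dev
        run: |
          terraform init -input=false
          terraform apply -auto-approve -input=false

  # Job 4: Build & Push Container
  build:
    runs-on: ubuntu-latest
    needs: terraform-validate
    if: github.event_name == 'push'
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read
    outputs:
      image_tag: ${{ steps.meta.outputs.tags }}
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - uses: aws-actions/amazon-ecr-login@v2

      - name: Build and Push
        id: meta
        run: |
          # Tag with the commit SHA so each deployment maps to an exact commit
          IMAGE_TAG="${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ github.sha }}"
          docker build -t "$IMAGE_TAG" .
          docker push "$IMAGE_TAG"
          echo "tags=$IMAGE_TAG" >> "$GITHUB_OUTPUT"

  # Job 5: Deploy to Dev (after both infrastructure and image are ready)
  deploy-dev:
    runs-on: ubuntu-latest
    needs: [build, terraform-apply]
    if: github.event_name == 'push'
    environment: dev
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read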
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name ${{ env.EKS_CLUSTER }}

      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/capstone-app \
            app=${{ needs.build.outputs.image_tag }} \
            -n capstone
          kubectl rollout status deployment/capstone-app \
            -n capstone --timeout=300s

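      - name: Smoke Test
        run: |
          ALB_HOST=$(kubectl get ingress capstone-app -n capstone -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
          # The ACM certificate covers app.example.com, not the ALB's own DNS name,
          # so connect to the ALB while sending the real host name (SNI and Host header)
          for i in {1..10}; do
            # "|| true" stops bash -e from aborting the loop on a connection error;
            # curl still prints 000 as the status code in that case
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
              --connect-to "app.example.com:443:${ALB_HOST}:443" \
              "https://app.example.com/healthz" || true)
            if [ "$STATUS" = "200" ]; then echo "Health check passed"; exit 0; fi
            sleep 5
          done
          echo "Health check failed"; exit 1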
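The plan step pipes `terraform plan` into `tee`. When a `run:` step does not set `shell:`, GitHub Actions on Linux runs it as `bash -e {0}`, which does not enable `pipefail`. The pipeline's exit status then comes from `tee`, so a failed plan can pass silently. This bash sketch shows the difference. It uses `false` as a stand-in for a failing plan and `cat > /dev/null` for `tee`. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: a pipeline's exit status is that of its LAST command unless
# pipefail is enabled. Runs in a subshell so the option does not leak.
pipeline_status() {
  # $1 = "on" or "off" (whether pipefail is enabled)
  (
    if [[ $1 == on ]]; then set -o pipefail; else set +o pipefail; fi
    rc=0
    false | cat > /dev/null || rc=$?   # "||" keeps errexit from aborting here
    echo "$rc"
  )
}

echo "pipefail off: $(pipeline_status off)"
echo "pipefail on:  $(pipeline_status on)"
```

Setting `shell: bash` on a step has a similar effect, because Actions then runs it with `-eo pipefail`.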

Capstone Phase 4: Observability & Security

Phase 4 Prometheus • Grafana • RBAC • Network Policies

Observability & Security

Deploy a full monitoring stack with Prometheus, Grafana, and Alertmanager via Helm. Implement Kubernetes RBAC, network policies, and secrets management to secure the entire platform.

Prometheus Grafana RBAC Network Policy
# monitoring/prometheus/values.yaml
# Helm values for kube-prometheus-stack

prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 2Gi

grafana:
  enabled: true
  adminPassword: ""  # Set via secret
  persistence:
    enabled: true
    size: 10Gi
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'default'
          orgId: 1
          folder: 'Capstone'
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
  config:
    global:
      resolve_timeout: 5m
    route:
      receiver: 'slack-notifications'
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
        - receiver: 'pagerduty-critical'
          matchers:  # "match" is deprecated in favor of matchers
            - severity="critical"
          repeat_interval: 1h
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - api_url: ''  # Set via secret
            channel: '#alerts-capstone'
            title: '{{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
      - name: 'pagerduty-critical'
        pagerduty_configs:
          - service_key: ''  # Set via secret
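You can sanity-check the 50Gi Prometheus volume and 15-day retention with the rough sizing rule from the Prometheus storage documentation. Disk is approximately retention seconds × ingested samples per second × bytes per sample, and Prometheus averages about 1 to 2 bytes per sample. The 20,000 samples/s rate below is a hypothetical figure for a small cluster. You can measure your own rate with `rate(prometheus_tsdb_head_samples_appended_total[5m])`. The helper name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch (illustrative helper): rough Prometheus TSDB size in GiB
#   disk ~= retention_seconds * samples_per_second * bytes_per_sample
# Excludes WAL and compaction headroom.
prom_disk_gib() {
  local retention_days=$1 samples_per_sec=$2 bytes_per_sample=$3
  awk -v d="$retention_days" -v s="$samples_per_sec" -v b="$bytes_per_sample" \
    'BEGIN { printf "%.1f\n", d * 86400 * s * b / (1024 ^ 3) }'
}

# 15 days retention, hypothetical 20,000 samples/s, 2 bytes/sample (upper end)
prom_disk_gib 15 20000 2
```

With these assumptions the result is about 48 GiB. That leaves little headroom in a 50Gi volume, so consider a larger volume or size-based retention (`retentionSize`) if ingestion grows.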
# monitoring/alerts/app-alerts.yaml
# Custom PrometheusRule for application alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: capstone-app-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: capstone.app
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{job="capstone-app",status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{job="capstone-app"}[5m])) > 0.01
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on capstone-app"
            description: "Error rate is {{ $value | humanizePercentage }} (threshold: 1%)"

        - alert: HighLatency
          expr: |
            histogram_quantile(0.99, 
              sum(rate(http_request_duration_seconds_bucket{job="capstone-app"}[5m])) by (le)
            ) > 2
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High p99 latency on capstone-app"
            description: "p99 latency is {{ $value }}s (threshold: 2s)"

        - alert: PodCrashLooping
          expr: increase(kube_pod_container_status_restarts_total{namespace="capstone"}[15m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is crash-looping"
            description: "Pod has restarted {{ $value | humanize }} times in the last 15 minutes"
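The `HighErrorRate` expression divides the 5xx request rate by the total request rate and compares the ratio with 0.01. This sketch reproduces the same arithmetic in bash/awk, using made-up per-second rates as inputs. The helper name is illustrative. In PromQL, 0/0 yields NaN, and `NaN > 0.01` is false, so the alert stays silent when there is no traffic. The sketch reports that case as `no-traffic`. The `for: 5m` clause is not modeled; it additionally requires the condition to hold for five minutes before the alert fires.

```shell
#!/usr/bin/env bash
# Sketch (illustrative helper): error ratio vs the 1% threshold
# Usage: high_error_rate ERRORS_PER_SEC TOTAL_PER_SEC
high_error_rate() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "no-traffic"; exit }
    r = e / t
    status = (r > 0.01) ? "firing" : "ok"
    printf "%.4f %s\n", r, status
  }'
}

high_error_rate 0.5 40   # 0.5 errors/s out of 40 req/s
high_error_rate 0.2 40
```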

RBAC & Network Policies

# kubernetes/security/rbac.yaml
# Namespace-scoped RBAC for the capstone application

apiVersion: v1
kind: Namespace
metadata:
  name: capstone
  labels:
    name: capstone
    pod-security.kubernetes.io/enforce: restricted

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: capstone-app
  namespace: capstone
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/capstone-app-role

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: capstone-app-role
  namespace: capstone
rules:
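  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
  # list/watch on secrets would expose every secret in the namespace;
  # limit the app to reading its own credentials by name
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["capstone-db-credentials"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]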

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: capstone-app-binding
  namespace: capstone
subjects:
  - kind: ServiceAccount
    name: capstone-app
    namespace: capstone
roleRef:
  kind: Role
  name: capstone-app-role
  apiGroup: rbac.authorization.k8s.io
# kubernetes/security/network-policy.yaml
# Restrict network traffic to least-privilege

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: capstone-app-policy
  namespace: capstone
spec:
  podSelector:
    matchLabels:
      app: capstone
      component: api
  policyTypes:
    - Ingress
    - Egress
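  ingress:
    # With target-type: ip the ALB connects straight to pod IPs from its subnets,
    # so allow the VPC range on 8080 (assumes VPC CIDR 10.0.0.0/16; adjust to yours)
    - from:
        - ipBlock:
            cidr: 10.0.0.0/16
      ports:
        - protocol: TCP
          port: 8080
    # Let Prometheus in the monitoring namespace scrape the application
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Allow DNS (UDP, plus TCP for large responses)
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow database access
    - to:
        - ipBlock:
            cidr: 10.0.11.0/24  # Private subnet CIDR
      ports:
        - protocol: TCP
          port: 5432
    # Allow outbound HTTPS (S3 via the VPC endpoint, STS for IRSA).
    # 0.0.0.0/0 permits any HTTPS destination, not only S3
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443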
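The `ipBlock` rules select peers by CIDR. This small bash sketch implements the containment test, which is handy for checking whether an address falls inside the database rule's 10.0.11.0/24. Here the example address is a hypothetical RDS private IP, 10.0.11.37. The helper names are illustrative, and the code assumes dotted-quad IPv4 input without leading zeros, because bash reads `010` as octal.

```shell
#!/usr/bin/env bash
# Sketch (illustrative helpers): IPv4 CIDR containment check.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  # Network mask with the top $bits bits set (a /0 mask is 0 and matches everything)
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

ip_in_cidr 10.0.11.37 10.0.11.0/24 && echo "10.0.11.37 is inside the database rule"
ip_in_cidr 10.0.12.5 10.0.11.0/24 || echo "10.0.12.5 is outside the database rule"
```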
# Deploy the monitoring stack and security policies
# Run these commands after the EKS cluster is ready

# Add Helm repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values monitoring/prometheus/values.yaml \
  --set grafana.adminPassword="$(aws secretsmanager get-secret-value \
    --secret-id capstone/grafana-admin --query SecretString --output text)"

# Apply RBAC and Network Policies
kubectl apply -f kubernetes/security/rbac.yaml
kubectl apply -f kubernetes/security/network-policy.yaml

# Apply custom alert rules
kubectl apply -f monitoring/alerts/app-alerts.yaml

# Verify deployment
kubectl get pods -n monitoring
kubectl get prometheusrules -n monitoring

Capstone: Cost & Cleanup

Before deploying your capstone to a cloud provider, estimate costs with Infracost. After completing the project and documenting it for your portfolio, tear down expensive resources to avoid ongoing charges while keeping documentation and code intact.

# Estimate monthly costs before deploying
infracost breakdown --path terraform/environments/dev

# Illustrative estimate for the dev environment (actual Infracost output is
# more detailed, and prices vary by region and over time):
# ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
# ┃ Resource                       ┃ Monthly Cost    ┃
# ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━┫
# ┃ aws_eks_cluster.main           ┃ $73.00          ┃
# ┃ aws_eks_node_group.ondemand    ┃ $61.32          ┃
# ┃ aws_eks_node_group.spot        ┃ ~$18.40         ┃
# ┃ aws_db_instance.postgres       ┃ $49.28          ┃
# ┃ aws_nat_gateway.main           ┃ $32.40          ┃
# ┃ aws_s3_bucket.app              ┃ ~$2.30          ┃
# ┃ Other (ALB, EBS, etc.)         ┃ ~$35.00         ┃
# ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━┫
# ┃ TOTAL                          ┃ ~$272/month     ┃
# ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━┛
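Summing the line items is a quick consistency check on an estimate. This bash/awk sketch adds up the illustrative figures from the table above; they are not live prices. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): sum monthly line items passed as arguments
monthly_total() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# EKS control plane, on-demand nodes, spot nodes, RDS, NAT, S3, other
monthly_total 73.00 61.32 18.40 49.28 32.40 2.30 35.00
```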
# Cleanup: Destroy all resources when done
# IMPORTANT: Save screenshots and documentation first!

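# Step 1: Delete Kubernetes resources first, so the load balancer controller
# removes the ALB it created (otherwise it can block VPC deletion)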
kubectl delete namespace capstone
kubectl delete namespace monitoring

# Step 2: Destroy Terraform infrastructure
cd terraform/environments/dev
terraform destroy -auto-approve

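# Step 3: Clean up the S3 state bucket (manual). Versioning is enabled, so
# "aws s3 rm --recursive" would only add delete markers and delete-bucket
# would fail; remove every object version and delete marker instead
# (each delete-objects call handles up to 1,000 entries, plenty for a state bucket)
BUCKET=capstone-terraform-state
aws s3api delete-objects --bucket "$BUCKET" \
  --delete "$(aws s3api list-object-versions --bucket "$BUCKET" \
    --query '{Objects: Versions[].{Key: Key, VersionId: VersionId}}' --output json)"
# Only needed if delete markers exist (errors harmlessly when there are none)
aws s3api delete-objects --bucket "$BUCKET" \
  --delete "$(aws s3api list-object-versions --bucket "$BUCKET" \
    --query '{Objects: DeleteMarkers[].{Key: Key, VersionId: VersionId}}' --output json)"
aws s3api delete-bucket --bucket "$BUCKET"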

# Step 4: Delete ECR images
aws ecr batch-delete-image \
  --repository-name capstone \
  --image-ids "$(aws ecr list-images --repository-name capstone \
    --query 'imageIds[*]' --output json)"

What to Keep for Your Portfolio

| Keep | Destroy |
|------|---------|
| All Terraform code on GitHub | Running EKS cluster ($73+/month) |
| Kubernetes manifests | RDS database ($49+/month) |
| GitHub Actions workflows | NAT Gateway ($32+/month) |
| Architecture diagrams | Load Balancer |
| Screenshots of Grafana dashboards | EC2 instances / node groups |
| Cost analysis documentation | EBS volumes |
| README with architecture decisions | S3 buckets with data |
Cost Warning: The dev environment costs approximately $270/month. If you're building this for learning purposes, deploy it for 1–2 days to validate everything works, capture screenshots and documentation, then terraform destroy immediately. Set a billing alarm at $50 as a safety net. Never leave infrastructure running unattended.
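Prorating shows why a short deployment is affordable. AWS pricing estimates commonly assume 730 hours per month, so this bash sketch converts the ~$270/month figure into an approximate cost for a given number of hours. It ignores usage-based charges such as NAT data processing and data transfer. The helper name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch (illustrative helper): prorate a monthly estimate to N hours,
# using 730 hours per month. Usage-based charges are not included.
prorate() {
  local monthly_usd=$1 hours=$2
  awk -v m="$monthly_usd" -v h="$hours" 'BEGIN { printf "%.2f\n", m / 730 * h }'
}

prorate 270 24   # one day
prorate 270 48   # two days
```

For 48 hours this comes to about $17.75, comfortably under a $50 billing alarm.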

Documenting Your Project for Employers

# Create a comprehensive project summary
mkdir -p docs
cat <<'EOF' > docs/project-summary.md
# Capstone Project Summary

## What I Built
Production-grade multi-environment infrastructure on AWS demonstrating:
- Infrastructure as Code (Terraform modules, remote state, workspaces)
- Container orchestration (EKS with spot/on-demand mixed nodes)
- CI/CD automation (GitHub Actions with plan-on-PR, apply-on-merge)
- Full observability (Prometheus, Grafana, custom alerts, SLOs)
- Security hardening (RBAC, network policies, encryption, least privilege)
- Cost optimization (spot instances, lifecycle policies, Infracost)

## Key Decisions & Trade-offs
| Decision | Choice | Why |
|----------|--------|-----|
| Single vs Multi NAT | Single (dev) | $64/mo savings, acceptable for non-prod |
| Spot instances | Mixed fleet | 60% savings with graceful interruption handling |
| Database | RDS (vs self-managed) | Operational simplicity, automated backups/patches |
| Monitoring | Prometheus (vs CloudWatch) | Portability, community dashboards, cost |

## What I Would Change in Production
- Multi-NAT gateway for AZ resilience
- Dedicated VPN/Direct Connect for hybrid connectivity
- WAF in front of ALB
- Multi-region DR with Route 53 failover
- Dedicated monitoring account (cross-account metrics)

## Skills Demonstrated
Terraform, AWS (VPC, EKS, RDS, S3, IAM, KMS), Kubernetes,
GitHub Actions, Helm, Prometheus, Grafana, Network Security,
Cost Optimization, Documentation
EOF
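The summary uses a quoted heredoc delimiter (`<<'EOF'`), and that choice matters for this content. With an unquoted delimiter, bash expands `$` sequences in the body, so the table's `$64/mo` would become `4/mo`, because `$6` is read as the sixth positional parameter. This small sketch compares the two forms. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: quoting the heredoc delimiter disables parameter expansion.
# Unquoted, "$64" is parsed as "$6" (empty here) followed by "4".
compare_heredocs() {
  local quoted unquoted
  quoted=$(cat <<'EOF'
Single NAT saves $64/mo
EOF
)
  unquoted=$(cat <<EOF
Single NAT saves $64/mo
EOF
)
  printf 'quoted:   %s\n' "$quoted"
  printf 'unquoted: %s\n' "$unquoted"
}

compare_heredocs
```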

Series Complete — Congratulations!

Series Complete! You have completed all 20 parts of the Infrastructure & Cloud Automation series. From bare-metal hardware to production-grade cloud platforms, you've built a comprehensive skill set that spans the full infrastructure lifecycle. You now have the knowledge to design, deploy, secure, monitor, and optimize modern cloud infrastructure at scale. This achievement is worth celebrating: the breadth and depth you've covered place you among the well-prepared infrastructure professionals entering the market today.

Let's reflect on what you've accomplished across this entire series:

  • Parts 1–4 (Foundations): You understand how physical servers work, how networks route traffic, how Linux operates, and how to automate with scripts
  • Parts 5–6 (Automation): You can configure fleets of servers with Ansible and containerize applications with Docker
  • Parts 7–9 (Cloud & IaC): You provision cloud infrastructure declaratively with Terraform and automate delivery with CI/CD
  • Parts 10–11 (Orchestration): You deploy and secure applications on Kubernetes with production-grade practices
  • Parts 12–14 (Operations): You build observability stacks, implement GitOps, and design developer platforms
  • Parts 15–20 (Advanced): You understand service meshes, multi-cloud, serverless, disaster recovery, cost optimization, and career development
Your Complete Infrastructure Skill Set
mindmap
  root((Infrastructure Engineer))
    Foundations
      Hardware
      Networking
      Linux
      Scripting
    Automation
      Ansible
      Docker
      CI/CD
    Cloud
      AWS / Azure / GCP
      Terraform
      Serverless
    Orchestration
      Kubernetes
      Service Mesh
      GitOps
    Operations
      Monitoring
      Incident Response
      FinOps
    Security
      IAM / RBAC
      Network Policies
      Compliance

The Landscape is Always Evolving

Infrastructure engineering is a field that never stands still. New tools emerge, paradigms shift, and best practices evolve. Stay current by:

  • Communities: CNCF Slack, DevOps subreddit, HashiCorp Discuss, Kubernetes Slack
  • Conferences: KubeCon, HashiConf, re:Invent, DevOpsDays, SREcon
  • Newsletters: TLDR DevOps, DevOps Weekly, KubeWeekly, Last Week in AWS
  • Podcasts: Ship It!, Kubernetes Podcast, Software Engineering Daily
  • Blogs: Kelsey Hightower, Charity Majors, Julia Evans, Corey Quinn

Final Thoughts

The infrastructure domain rewards curiosity, persistence, and a willingness to break things in order to understand them. Every outage you troubleshoot, every module you write, every pipeline you build adds to your expertise. The capstone project in this article is your launchpad — customize it, extend it, make it your own, and let it tell your story to future employers.

Thank you for joining this 20-part journey. Now go build something remarkable.