
Part 7: Cloud Computing Fundamentals

May 14, 2026 Wasil Zafar 50 min read

A comprehensive guide to cloud computing — from service models and deployment strategies to the shared responsibility model, cloud economics, and multi-cloud architectures. Understand how AWS, Azure, and GCP organize their services and when to use each.

Table of Contents

  1. Introduction
  2. Service Models
  3. Deployment Models
  4. Shared Responsibility Model
  5. Cloud Economics
  6. The Big Three Providers
  7. Cloud Architecture Patterns
  8. Getting Started with Cloud
  9. Hands-On Exercises
  10. Conclusion & Next Steps

Introduction

Cloud computing has fundamentally transformed how organizations build, deploy, and operate technology. Yet despite being the backbone of modern infrastructure, it remains widely misunderstood. "The cloud is just someone else's computer" is a popular quip — but it's dangerously oversimplified. Cloud computing isn't just about where your servers live; it's about an entirely different operational model for consuming technology.

In this part, we'll build a complete understanding of cloud computing from the ground up: what it truly means, how service models differ, how the major providers organize their offerings, and how to make sound architectural and economic decisions in the cloud era.

Key Insight: Cloud computing is not a location — it's a delivery model. It represents a fundamental shift from owning and operating infrastructure (CapEx) to consuming it as a metered service (OpEx). The real value isn't "someone else's data center" — it's on-demand elasticity, global reach, and the ability to experiment at near-zero marginal cost.

The 5 NIST Essential Characteristics

The National Institute of Standards and Technology (NIST) defines cloud computing through five essential characteristics that any true cloud service must exhibit. If a service lacks any of these, it's not really cloud — it's just hosted infrastructure:

The 5 NIST Essential Characteristics of Cloud Computing
mindmap
  root((Cloud Computing))
    On-Demand Self-Service
      Provision resources without human interaction
      API-driven automation
      Instant availability
    Broad Network Access
      Available over standard networks
      Accessible from any device
      Platform-independent
    Resource Pooling
      Multi-tenant model
      Location independence
      Dynamic resource assignment
    Rapid Elasticity
      Scale up and down automatically
      Appear unlimited to consumer
      Pay only for what you use
    Measured Service
      Usage is metered
      Pay-per-use billing
      Transparent monitoring
                            
| Characteristic | What It Means | Real-World Example |
| --- | --- | --- |
| On-Demand Self-Service | Provision resources (servers, storage, networks) without needing to contact a human | Spin up 100 VMs via API at 2 AM on a Saturday |
| Broad Network Access | Services available over standard networks, accessible from any device or platform | Access your cloud console from a phone, laptop, or tablet |
| Resource Pooling | Provider's resources are pooled across multiple tenants with dynamic assignment | Your VM shares physical hardware with other customers (isolated) |
| Rapid Elasticity | Resources can be scaled up or down automatically, appearing unlimited | Auto-scale from 2 to 200 instances during Black Friday traffic |
| Measured Service | Usage is metered, reported, and billed transparently (pay-per-use) | Billed $0.023 per GB-month of storage actually consumed |

Definition: Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. (NIST SP 800-145)
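The rule that a service must show all five characteristics can be expressed as a simple checklist. The sketch below is illustrative only: the identifier names and the example service are assumptions made for this example, not terms defined by NIST.

```python
# The five essential characteristics from NIST SP 800-145.
NIST_CHARACTERISTICS = (
    "on_demand_self_service",
    "broad_network_access",
    "resource_pooling",
    "rapid_elasticity",
    "measured_service",
)


def missing_characteristics(traits: set[str]) -> list[str]:
    """Return the NIST characteristics a service does not exhibit."""
    return [c for c in NIST_CHARACTERISTICS if c not in traits]


# Hypothetical example: a rented rack that needs a support ticket to resize
# and is billed at a flat monthly rate.
colo_rack = {"broad_network_access"}
print(missing_characteristics(colo_rack))
```

The hypothetical rack fails four of the five checks, which is why it counts as hosted infrastructure rather than cloud.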

Service Models

Cloud service models define the boundary of responsibility between you (the customer) and the cloud provider. Think of it as a spectrum: at one end you manage everything; at the other, the provider manages everything. NIST defines three service models (IaaS, PaaS, and SaaS); this guide also covers FaaS, which is now commonly treated as a fourth.

Cloud Service Model Stack — What You Manage vs Provider Manages
graph TB
    subgraph "On-Premises (You Manage Everything)"
        A1[Application]
        A2[Data]
        A3[Runtime]
        A4[Middleware]
        A5[Operating System]
        A6[Virtualization]
        A7[Servers]
        A8[Storage]
        A9[Networking]
    end

    subgraph "IaaS (You Manage OS and Up)"
        B1[Application]
        B2[Data]
        B3[Runtime]
        B4[Middleware]
        B5[Operating System]
        B6[Virtualization — Provider]
        B7[Servers — Provider]
        B8[Storage — Provider]
        B9[Networking — Provider]
    end

    subgraph "PaaS (You Manage Code and Data)"
        C1[Application]
        C2[Data]
        C3[Runtime — Provider]
        C4[Middleware — Provider]
        C5[OS — Provider]
        C6[Virtualization — Provider]
        C7[Servers — Provider]
        C8[Storage — Provider]
        C9[Networking — Provider]
    end

    subgraph "SaaS (Provider Manages Everything)"
        D1[Application — Provider]
        D2[Data — Provider manages infra]
        D3[Runtime — Provider]
        D4[Middleware — Provider]
        D5[OS — Provider]
        D6[Virtualization — Provider]
        D7[Servers — Provider]
        D8[Storage — Provider]
        D9[Networking — Provider]
    end
                            

IaaS — Infrastructure as a Service

IaaS provides the fundamental building blocks of cloud IT: virtual machines, storage volumes, and networks. You rent raw infrastructure and manage everything from the operating system up. It's the most flexible model but also the most operationally demanding.

You manage: OS, middleware, runtime, application, data, patching, security configuration

Provider manages: Physical hardware, hypervisor, networking fabric, physical security, power/cooling

Examples: AWS EC2, Azure Virtual Machines, GCP Compute Engine, DigitalOcean Droplets

Best for: Lift-and-shift migrations, custom OS requirements, full control over the stack, legacy applications

# Launch an IaaS VM on AWS
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.medium \
  --key-name my-key \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server-01}]'

# Launch an IaaS VM on Azure
az vm create \
  --resource-group my-rg \
  --name web-server-01 \
  --image Ubuntu2204 \
  --size Standard_B2s \
  --admin-username azureuser \
  --generate-ssh-keys

# Launch an IaaS VM on GCP
gcloud compute instances create web-server-01 \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud

PaaS — Platform as a Service

PaaS abstracts away the operating system and runtime environment. You deploy your code and data; the platform handles everything else — OS patching, load balancing, scaling, and runtime updates. This dramatically reduces operational overhead but limits customization.

You manage: Application code, data, application-level configuration

Provider manages: Runtime, middleware, OS, hardware, networking, scaling, patching

Examples: AWS Elastic Beanstalk, Azure App Service, Google App Engine, Heroku, Railway

Best for: Web applications, APIs, microservices, developer productivity, rapid prototyping

# Deploy to Azure App Service (PaaS)
az webapp up \
  --resource-group my-rg \
  --name my-web-app \
  --runtime "PYTHON:3.11" \
  --sku B1

# Deploy to Google App Engine (PaaS)
# First create app.yaml in your project root:
# runtime: python311
# instance_class: F2
# automatic_scaling:
#   min_instances: 1
#   max_instances: 10
gcloud app deploy app.yaml --project=my-project

# Deploy to AWS Elastic Beanstalk (PaaS)
eb init my-app --platform python-3.11 --region us-east-1
eb create production --instance_type t3.small

SaaS — Software as a Service

SaaS is fully managed software delivered over the internet. You don't manage any part of the technology stack — you simply use the application. The provider handles everything: the application code, data infrastructure, scaling, updates, and security.

You manage: User configuration, access control (who can use it), your data within the application

Provider manages: Literally everything else — code, infrastructure, updates, availability

Examples: Microsoft 365, Salesforce, Slack, Zoom, GitHub, Datadog, Snowflake

Best for: End-user productivity, business applications, collaboration, when you want to consume not build

FaaS — Function as a Service

FaaS (often called "serverless compute") takes abstraction further than PaaS. You write individual functions that execute in response to events. There are no servers to provision, no runtime to configure — you pay only for the milliseconds your code actually runs.

You manage: Function code, event triggers, function-level configuration

Provider manages: Execution environment, scaling (including to zero), infrastructure, container lifecycle

Examples: AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers

Best for: Event-driven architectures, webhooks, data processing pipelines, scheduled tasks, APIs with variable traffic

# Deploy an AWS Lambda function
aws lambda create-function \
  --function-name process-order \
  --runtime python3.11 \
  --handler lambda_function.lambda_handler \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --zip-file fileb://function.zip \
  --timeout 30 \
  --memory-size 256

# Deploy an Azure Function
func azure functionapp publish my-function-app

# Deploy a Google Cloud Function
gcloud functions deploy process-order \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --region us-central1 \
  --memory 256MB

Service Model Comparison

| Layer | On-Premises | IaaS | PaaS | FaaS | SaaS |
| --- | --- | --- | --- | --- | --- |
| Application | You | You | You | You | Provider |
| Data | You | You | You | You | Shared |
| Runtime | You | You | Provider | Provider | Provider |
| Middleware | You | You | Provider | Provider | Provider |
| Operating System | You | You | Provider | Provider | Provider |
| Virtualization | You | Provider | Provider | Provider | Provider |
| Servers | You | Provider | Provider | Provider | Provider |
| Storage | You | Provider | Provider | Provider | Provider |
| Networking | You | Provider | Provider | Provider | Provider |

Common Mistake: Many teams default to IaaS because it's "most familiar" (it looks like traditional VMs). But IaaS carries the highest operational burden. If you're deploying a web application and don't need custom kernel modules or OS-level tweaks, PaaS or FaaS will likely be more cost-effective and secure. Always start at the highest abstraction level that meets your requirements, and drop down only when needed.
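The advice to start at the highest abstraction level can be sketched as a lookup that walks from most to least abstract and stops at the first model that covers every requirement. The requirement names and capability sets below are hypothetical simplifications for illustration, not a provider feature matrix.

```python
# Ordered from highest abstraction (least to operate) to lowest.
MODELS = ["SaaS", "FaaS", "PaaS", "IaaS"]

# Hypothetical capability flags: which requirements each model can satisfy.
CAPABILITIES = {
    "SaaS": set(),
    "FaaS": {"custom_code"},
    "PaaS": {"custom_code", "long_running_process"},
    "IaaS": {"custom_code", "long_running_process", "custom_os", "kernel_modules"},
}


def highest_abstraction(requirements: set[str]) -> str:
    """Return the first (most abstract) model that covers every requirement."""
    for model in MODELS:
        if requirements <= CAPABILITIES[model]:
            return model
    raise ValueError(f"no model satisfies {sorted(requirements)}")


print(highest_abstraction({"custom_code"}))
print(highest_abstraction({"custom_code", "kernel_modules"}))
```

A real decision would weigh cost, latency, and team skills as well; the point is only that the search starts at the top and drops down when a requirement forces it.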

Deployment Models

Deployment models describe where and how cloud infrastructure is hosted, owned, and shared. The choice of deployment model is driven by requirements around data sovereignty, compliance, latency, and cost.

Public Cloud

Infrastructure owned and operated by a third-party provider (AWS, Azure, GCP), delivered over the public internet. Resources are shared across multiple organizations (multi-tenant), though isolated at the hypervisor level. This is the most common deployment model.

Advantages: No upfront CapEx, global scale, broad service catalog, instant provisioning, elasticity

Disadvantages: Data leaves your premises, shared infrastructure (perception issue), vendor lock-in risk, egress costs

Private Cloud

Infrastructure dedicated to a single organization, either hosted on-premises or by a third party. Provides cloud-like self-service and elasticity but within a controlled environment. Often required for strict regulatory compliance (healthcare, government, finance).

Advantages: Full control, data sovereignty, compliance, customizable security, predictable performance

Disadvantages: High CapEx, limited scale, slow provisioning vs public cloud, requires specialized staff

Examples: VMware vSphere + vRealize, OpenStack, Azure Stack HCI, AWS Outposts

Hybrid Cloud

A combination of public and private cloud environments connected by networking, allowing data and applications to move between them. The key requirement is orchestration — the two environments must work together as a single architecture, not just coexist.

Advantages: Flexibility, keep sensitive workloads private while bursting to public, gradual migration path

Disadvantages: Complexity (networking, identity, security), skills gap, potential latency between environments

Use Cases: Cloud bursting (handle peaks in public cloud), data residency (keep EU data on-premises), gradual migration

Multi-Cloud

Using services from multiple public cloud providers simultaneously. This is distinct from hybrid (which is public + private). Multi-cloud might mean running compute on AWS, databases on GCP, and AI/ML on Azure — or distributing the same workload across providers for resilience.

Advantages: Avoid vendor lock-in, best-of-breed services, geographic reach, negotiating leverage, redundancy

Disadvantages: Massive operational complexity, skill dilution, inconsistent APIs, networking challenges, cost visibility

| Factor | Public Cloud | Private Cloud | Hybrid Cloud | Multi-Cloud |
| --- | --- | --- | --- | --- |
| CapEx Required | None | Very High | High | None |
| Scalability | Near-infinite | Limited | High (burst to public) | Near-infinite |
| Data Control | Limited (provider region) | Full | Split | Distributed |
| Complexity | Low-Medium | High | Very High | Extreme |
| Vendor Lock-in Risk | High | Low (if open-source) | Medium | Low |
| Best For | Startups, SaaS, variable workloads | Regulated industries, sensitive data | Enterprises migrating, compliance | Large enterprises, best-of-breed |

Pragmatic Advice: Multi-cloud is frequently adopted for the wrong reasons. "Avoiding vendor lock-in" sounds compelling, but the operational cost of maintaining expertise across 2-3 clouds often exceeds the switching cost you're trying to avoid. Adopt multi-cloud when you have a genuine technical reason (best-of-breed services, geographic requirements), not as insurance.

The Shared Responsibility Model

The shared responsibility model is one of the most important concepts in cloud security. It defines a clear boundary: the cloud provider secures the infrastructure of the cloud, while you secure everything in the cloud. Misunderstanding this boundary is among the most common root causes of cloud security breaches.

Shared Responsibility — Provider vs Customer
graph TB
    subgraph "Customer Responsibility (Security IN the Cloud)"
        C1[Customer Data]
        C2[Platform & Application Management]
        C3[Identity & Access Management]
        C4[Operating System & Network Configuration]
        C5[Client-Side Encryption]
        C6[Network Traffic Protection]
    end

    subgraph "Provider Responsibility (Security OF the Cloud)"
        P1[Physical Security — Data Centers]
        P2[Hardware — Servers, Storage, Networking]
        P3[Hypervisor & Host OS]
        P4[Global Network Infrastructure]
        P5[Managed Service Infrastructure]
        P6[Compliance Certifications]
    end

    C6 --> P1
                            

What the Cloud Provider Is Responsible For

  • Physical security: Guards, biometrics, surveillance, locked cages
  • Hardware: Server procurement, maintenance, disposal, firmware updates
  • Hypervisor/host OS: Patching and securing the virtualization layer
  • Network infrastructure: Backbone connectivity, DDoS protection at the network edge
  • Compliance: Achieving and maintaining SOC 2, ISO 27001, PCI DSS certifications for their infrastructure

What the Customer Is Responsible For

  • Data classification and encryption: Encrypting sensitive data at rest and in transit
  • Identity and access management: MFA, least privilege, role-based access control
  • Network security: Security groups, NACLs, WAF rules, VPN configuration
  • Application security: Code vulnerabilities, patching your dependencies
  • OS patching (IaaS): Keeping guest OS updated and hardened
  • Compliance: Ensuring your usage of the cloud meets YOUR regulatory requirements

How Responsibility Shifts by Service Model

| Responsibility | IaaS | PaaS | SaaS |
| --- | --- | --- | --- |
| Data Classification & Encryption | Customer | Customer | Customer |
| Identity & Access Management | Customer | Customer | Customer |
| Application Security | Customer | Customer | Provider |
| Network Controls | Customer | Shared | Provider |
| OS Patching | Customer | Provider | Provider |
| Runtime & Middleware | Customer | Provider | Provider |
| Physical Infrastructure | Provider | Provider | Provider |

Security Warning: Data encryption, identity management, and access control are ALWAYS the customer's responsibility, regardless of service model. An exposed S3 bucket isn't AWS's fault; it's yours. A database without MFA isn't Azure's problem; it's yours. The shared responsibility model means the provider cannot secure what they cannot see (your data, your users, your configurations).
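The responsibility table can be encoded as data so a team can list its own duties for a given service model. This is a minimal sketch: the dictionary simply transcribes the table, the function name is an assumption, and "Shared" items are counted as customer duties because the customer still has work to do.

```python
MODEL_INDEX = {"IaaS": 0, "PaaS": 1, "SaaS": 2}

# Transcribed from the table: owner per (IaaS, PaaS, SaaS).
RESPONSIBILITY = {
    "Data Classification & Encryption": ("Customer", "Customer", "Customer"),
    "Identity & Access Management": ("Customer", "Customer", "Customer"),
    "Application Security": ("Customer", "Customer", "Provider"),
    "Network Controls": ("Customer", "Shared", "Provider"),
    "OS Patching": ("Customer", "Provider", "Provider"),
    "Runtime & Middleware": ("Customer", "Provider", "Provider"),
    "Physical Infrastructure": ("Provider", "Provider", "Provider"),
}


def customer_duties(model: str) -> list[str]:
    """Areas where the customer owns or shares responsibility for a model."""
    idx = MODEL_INDEX[model]
    return [
        area
        for area, owners in RESPONSIBILITY.items()
        if owners[idx] in ("Customer", "Shared")
    ]


for model in MODEL_INDEX:
    print(model, customer_duties(model))
```

Whatever the model, data protection and identity stay on the customer's list.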

Cloud Economics

Understanding cloud economics is critical for making sound infrastructure decisions. The shift from on-premises to cloud isn't simply "servers become subscription fees" — it fundamentally changes how organizations think about technology investment.

CapEx vs OpEx

| Aspect | CapEx (On-Premises) | OpEx (Cloud) |
| --- | --- | --- |
| Cost Type | Large upfront investment | Pay-as-you-go, monthly billing |
| Accounting | Depreciated over 3-5 years | Expensed in current period |
| Capacity Planning | Must predict 3-5 years ahead | Adjust monthly or hourly |
| Risk | Over-provision or under-provision | Right-size continuously |
| Time to Deploy | Weeks to months (procurement) | Minutes (API call) |
| Staffing | Need hardware engineers, facility staff | Cloud architects, DevOps engineers |
| Hidden Costs | Power, cooling, floor space, insurance | Egress, cross-region transfer, API calls |
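To make the accounting row concrete, the sketch below contrasts how a hardware purchase and a cloud bill show up year by year. The dollar figures are made up for illustration, and straight-line depreciation with no salvage value is assumed.

```python
def capex_schedule(purchase_price: float, years: int) -> dict[str, list[float]]:
    """Cash leaves in year 1; the expense is spread evenly (straight-line)."""
    cash = [purchase_price] + [0.0] * (years - 1)
    expense = [purchase_price / years] * years
    return {"cash": cash, "expense": expense}


def opex_schedule(monthly_bill: float, years: int) -> dict[str, list[float]]:
    """Cash and expense coincide: each year's bill is expensed that year."""
    yearly = [monthly_bill * 12.0] * years
    return {"cash": yearly, "expense": list(yearly)}


# Hypothetical numbers: $300,000 of servers depreciated over 5 years,
# versus a $4,000/month cloud bill over the same period.
on_prem = capex_schedule(300_000, 5)
cloud = opex_schedule(4_000, 5)
print("on-prem year 1:", on_prem["cash"][0], on_prem["expense"][0])
print("cloud year 1:  ", cloud["cash"][0], cloud["expense"][0])
```

The on-premises purchase ties up the full amount on day one even though the books recognize it gradually; the cloud bill never runs ahead of usage.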

Cloud Pricing Models

Cloud providers offer multiple pricing tiers designed to reward commitment with discounts:

| Pricing Model | Discount | Commitment | Best For | Risk |
| --- | --- | --- | --- | --- |
| On-Demand | 0% (baseline) | None | Variable workloads, testing, short-term | None; pay only for what you use |
| Reserved (1yr) | ~30-40% | 1-year term | Steady-state production workloads | Committed even if unused |
| Reserved (3yr) | ~50-72% | 3-year term | Long-running databases, core services | Significant lock-in |
| Spot/Preemptible | ~60-90% | None (can be reclaimed) | Batch processing, CI/CD, fault-tolerant | Instances reclaimed on short notice (2 minutes on AWS, about 30 seconds on Azure and GCP) |
| Savings Plans | ~30-60% | $/hr spend commitment | Flexible workloads across instance types | Must spend minimum per hour |
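Whether a commitment pays off depends on utilization: a reservation is billed for every hour, used or not. The sketch below works through that trade-off. The $0.10 hourly rate and the 40% discount are hypothetical inputs, and spot pricing is left out because its interruption risk does not reduce to a single number.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def monthly_cost(hourly_rate: float, discount: float = 0.0,
                 utilization: float = 1.0, committed: bool = False) -> float:
    """Committed capacity is billed for every hour; otherwise only for hours used."""
    billed_hours = HOURS_PER_MONTH if committed else HOURS_PER_MONTH * utilization
    return hourly_rate * (1 - discount) * billed_hours


def break_even_utilization(discount: float) -> float:
    """Utilization above which a commitment is cheaper than on-demand."""
    return 1 - discount


# Hypothetical $0.10/hour instance that is only needed half the time.
on_demand = monthly_cost(0.10, utilization=0.5)
reserved = monthly_cost(0.10, discount=0.40, committed=True)
print(f"on-demand ${on_demand:.2f}, reserved ${reserved:.2f}")
print(f"break-even utilization: {break_even_utilization(0.40):.0%}")
```

With a 40% discount the reservation only wins once the instance runs more than 60% of the time, so at 50% utilization on-demand is still the cheaper choice.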

Total Cost of Ownership (TCO)

A fair cloud vs on-premises comparison must account for all costs, not just the sticker price of a server. TCO includes:

  • Hardware costs: Servers, storage, networking equipment, spare parts
  • Facility costs: Data center space, power, cooling, physical security, fire suppression
  • Personnel costs: Hardware engineers, network engineers, facility managers, 24/7 NOC
  • Software licenses: Hypervisor licenses, OS licenses, management tools
  • Lifecycle costs: Hardware refresh every 3-5 years, decommissioning, e-waste
  • Opportunity cost: Money tied up in depreciating assets vs invested elsewhere
When Cloud is MORE Expensive: Cloud isn't always cheaper. Steady-state workloads running 24/7 at consistent utilization are often cheaper on-premises or with reserved instances. The "cloud is always cheaper" myth fails for: large-scale databases with predictable load, GPU clusters running continuously, high-egress workloads (streaming video), and workloads that don't benefit from elasticity. Always run the TCO numbers.
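Running the TCO numbers can start as a few lines of arithmetic. The sketch below is a deliberately simplified model with made-up inputs; it ignores hardware refresh, opportunity cost, and discounts, all of which a real comparison should include.

```python
from dataclasses import dataclass


@dataclass
class OnPremCosts:
    hardware: float            # upfront purchase (servers, storage, network)
    facility_per_year: float   # space, power, cooling, physical security
    personnel_per_year: float  # engineers, facility staff, NOC
    licenses_per_year: float   # hypervisor, OS, management tools


def on_prem_tco(costs: OnPremCosts, years: int) -> float:
    recurring = (costs.facility_per_year + costs.personnel_per_year
                 + costs.licenses_per_year)
    return costs.hardware + recurring * years


def cloud_tco(monthly_compute: float, monthly_egress: float, years: int) -> float:
    return (monthly_compute + monthly_egress) * 12 * years


# Hypothetical steady-state workload, compared over 3 and 5 years.
site = OnPremCosts(200_000, 20_000, 60_000, 10_000)
for years in (3, 5):
    on_prem = on_prem_tco(site, years)
    cloud = cloud_tco(monthly_compute=10_000, monthly_egress=2_000, years=years)
    print(f"{years} years: on-prem ${on_prem:,.0f} vs cloud ${cloud:,.0f}")
```

With these invented figures the cloud is cheaper over three years and more expensive over five, which is the crossover pattern steady 24/7 workloads tend to show.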

Cost Optimization Strategies

| Strategy | Typical Savings | Implementation Effort | Description |
| --- | --- | --- | --- |
| Right-Sizing | 20-40% | Low | Match instance size to actual usage (most VMs are over-provisioned) |
| Reserved Capacity | 30-72% | Low | Commit to 1-3 year terms for steady workloads |
| Spot Instances | 60-90% | Medium | Use interruptible capacity for fault-tolerant workloads |
| Auto-Scaling | 20-50% | Medium | Scale down during off-peak, scale up during peak |
| Scheduled Shutdowns | 40-70% | Low | Turn off dev/test environments nights and weekends |
| Storage Tiering | 50-80% | Low | Move cold data to cheaper tiers (Glacier, Cool, Archive) |
| Architecture Optimization | 30-60% | High | Move from IaaS to PaaS/serverless where appropriate |
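The scheduled-shutdown range is easy to sanity-check, since the saving is just the share of the week an environment is switched off. The schedule below (12 hours a day, weekdays only) is an assumed example, and the figure covers compute hours only; attached storage is normally still billed while instances are stopped.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def shutdown_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of compute hours saved compared with running 24/7."""
    running = hours_per_day * days_per_week
    return 1 - running / HOURS_PER_WEEK


# Hypothetical dev environment: on 12 hours a day, Monday to Friday.
print(f"{shutdown_savings(12, 5):.0%} of compute hours saved")
```

A 12x5 schedule runs 60 of 168 hours, which lands inside the 40-70% range quoted in the table.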

The Big Three: AWS, Azure, GCP

The public cloud market is dominated by three hyperscale providers that together control approximately 67% of global cloud spending. Each has distinct strengths, histories, and philosophical approaches to cloud services.

Amazon Web Services (AWS)

Founded: 2006 (first mover) | Market Share: ~31% | Regions: 34+ | Services: 200+

AWS was first to market and has the broadest and deepest service catalog. Its philosophy is "build primitives and let customers compose them." This gives maximum flexibility but can feel overwhelming — AWS often has 3-5 ways to accomplish the same task.

Strengths: Broadest service catalog, largest ecosystem, most mature managed services, strongest serverless platform (Lambda), deepest marketplace

Considerations: Complex pricing, naming conventions can be confusing (SQS, SNS, SES, etc.), console UX is functional but dense

Microsoft Azure

Founded: 2010 | Market Share: ~25% | Regions: 60+ | Services: 200+

Azure's strength is enterprise integration. If your organization runs Microsoft 365, Active Directory, SQL Server, or .NET, Azure provides the tightest integration. Its hybrid story (Azure Arc, Azure Stack) is the strongest in the industry.

Strengths: Enterprise/Microsoft ecosystem integration, strongest hybrid cloud (Arc, Stack HCI), Azure AD/Entra ID for identity, compliance certifications for government, excellent developer experience with VS Code + GitHub

Considerations: Service naming changes frequently, documentation quality varies, some services less mature than AWS equivalents

Google Cloud Platform (GCP)

Founded: 2008 (public 2011) | Market Share: ~11% | Regions: 40+ | Services: 150+

GCP is built on Google's internal infrastructure (Borg → Kubernetes, Spanner, BigQuery). Its strengths are data analytics, machine learning, global networking, and developer experience. GCP's philosophy favors opinionated, well-designed services over breadth.

Strengths: Superior data/analytics (BigQuery), best Kubernetes experience (GKE), global network (private backbone), strong AI/ML (Vertex AI, TPUs), clean API design

Considerations: Smaller service catalog, enterprise features maturing, perception of product deprecation risk

Service Mapping Across Providers

| Category | AWS | Azure | GCP |
| --- | --- | --- | --- |
| Virtual Machines | EC2 | Virtual Machines | Compute Engine |
| Serverless Compute | Lambda | Functions | Cloud Functions |
| Containers (Managed K8s) | EKS | AKS | GKE |
| Container Service | ECS / Fargate | Container Apps | Cloud Run |
| Object Storage | S3 | Blob Storage | Cloud Storage |
| Block Storage | EBS | Managed Disks | Persistent Disk |
| File Storage | EFS | Azure Files | Filestore |
| Relational Database | RDS / Aurora | SQL Database / Azure Database for PostgreSQL | Cloud SQL / AlloyDB |
| NoSQL Database | DynamoDB | Cosmos DB | Firestore / Bigtable |
| Data Warehouse | Redshift | Synapse Analytics | BigQuery |
| VPC / Networking | VPC | Virtual Network (VNet) | VPC |
| Load Balancer | ALB / NLB / ELB | Load Balancer / App Gateway | Cloud Load Balancing |
| DNS | Route 53 | Azure DNS | Cloud DNS |
| CDN | CloudFront | Azure CDN / Front Door | Cloud CDN |
| IAM | IAM | Entra ID (Azure AD) + RBAC | Cloud IAM |
| Monitoring | CloudWatch | Monitor / App Insights | Cloud Monitoring |
| IaC Service | CloudFormation | ARM / Bicep | Deployment Manager / Config Connector |
| Message Queue | SQS | Service Bus / Queue Storage | Pub/Sub |
| AI/ML Platform | SageMaker | Azure AI / ML Studio | Vertex AI |

How to Choose: Don't choose a cloud provider based on feature comparison tables alone. Consider: (1) What your team already knows, (2) Your existing ecosystem (Microsoft shop → Azure, startup → AWS, data-heavy → GCP), (3) Compliance and region requirements, (4) Specific services you need (BigQuery has no true equivalent), (5) Support and partnership tier.

Cloud Architecture Patterns

Regions and Availability Zones

Cloud providers organize their infrastructure into Regions (geographic areas like us-east-1, westeurope, asia-east1) and Availability Zones (AZs — isolated data centers within a region connected by low-latency links). Understanding this hierarchy is fundamental to designing resilient architectures.

Cloud Infrastructure Hierarchy — Region → AZ → Data Center
graph TB
    subgraph "AWS Region: us-east-1 (N. Virginia)"
        subgraph "AZ: us-east-1a"
            DC1[Data Center 1]
            DC2[Data Center 2]
        end
        subgraph "AZ: us-east-1b"
            DC3[Data Center 3]
            DC4[Data Center 4]
        end
        subgraph "AZ: us-east-1c"
            DC5[Data Center 5]
            DC6[Data Center 6]
        end
    end

    DC1 ---|"< 2ms latency"| DC3
    DC3 ---|"< 2ms latency"| DC5
    DC1 ---|"< 2ms latency"| DC5
                            
| Concept | Description | Failure Domain | Example |
| --- | --- | --- | --- |
| Region | Geographic area containing multiple AZs (typically three or more) | Natural disaster, country-level outage | us-east-1, eu-west-1, asia-southeast1 |
| Availability Zone | 1+ data centers with independent power/cooling/networking | Single facility failure (fire, flood, power) | us-east-1a, us-east-1b |
| Edge Location | CDN point of presence for content caching | Local connectivity | CloudFront PoP in Chicago |
| Local Zone | Extension of a region closer to users | Local infrastructure | us-east-1-chi-1 (Chicago) |
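Because an AZ is the unit that fails together, the simplest resilient layout spreads instances evenly across zones. The following sketch shows round-robin placement and what survives the loss of one zone; the instance names are placeholders, and real schedulers (auto-scaling groups, Kubernetes topology spread) add capacity and affinity rules on top of this idea.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instances: list[str], zones: list[str]) -> dict[str, str]:
    """Assign instances to zones round-robin so zone counts differ by at most one."""
    return dict(zip(instances, cycle(zones)))


def survivors(placement: dict[str, str], failed_zone: str) -> list[str]:
    """Instances still running after a single zone is lost."""
    return [name for name, zone in placement.items() if zone != failed_zone]


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
placement = spread_across_zones([f"web-{i}" for i in range(1, 7)], zones)
print(Counter(placement.values()))
print(survivors(placement, "us-east-1a"))
```

With six instances over three zones, losing any one zone leaves four running, so the surviving capacity needs to be sized to carry the full load.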

High Availability Patterns

High Availability (HA) means designing systems that continue operating even when individual components fail. In cloud, this is primarily achieved by distributing resources across multiple AZs or regions.

Multi-AZ High Availability Architecture
graph TB
    Users[Users / Internet] --> LB[Load Balancer — Multi-AZ]

    subgraph "Availability Zone A"
        LB --> WebA[Web Server A]
        WebA --> AppA[App Server A]
        AppA --> DB_Primary[Database Primary]
    end

    subgraph "Availability Zone B"
        LB --> WebB[Web Server B]
        WebB --> AppB[App Server B]
        AppB --> DB_Standby[Database Standby — Sync Replication]
    end

    DB_Primary ---|"Synchronous Replication"| DB_Standby
                            

Key HA principles:

  • Eliminate single points of failure: Every component should have a redundant pair
  • Use managed services: Managed databases (RDS Multi-AZ) handle failover automatically
  • Design for failure: Assume any component can fail at any time
  • Test failover regularly: Chaos engineering (Netflix Chaos Monkey approach)
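The payoff from redundancy can be estimated with two standard formulas: redundant copies fail only if all of them fail, while chained tiers are up only if every tier is up. The sketch below assumes failures are independent, which is optimistic in practice, and the per-component availabilities are illustrative inputs rather than provider SLAs.

```python
def parallel(availability: float, copies: int) -> float:
    """Redundant copies: the system is down only if every copy is down."""
    return 1 - (1 - availability) ** copies


def series(*availabilities: float) -> float:
    """Chained dependencies: the system is up only if every tier is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Hypothetical: each web server is 99% available; the database tier is 99.95%.
web_tier = parallel(0.99, copies=2)
whole_stack = series(web_tier, 0.9995)
print(f"web tier across two AZs: {web_tier:.4%}")
print(f"web tier + database:     {whole_stack:.4%}")
```

Doubling up a 99% component lifts that tier to roughly 99.99%, but the stack as a whole is still capped by its weakest tier.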

Disaster Recovery Strategies

Disaster Recovery (DR) protects against region-level failures. The four standard DR strategies trade cost against recovery speed:

| Strategy | RTO | RPO | Cost | Description |
| --- | --- | --- | --- | --- |
| Backup & Restore | Hours | Hours | $ | Regular backups to another region; restore from backup on failure |
| Pilot Light | 10-30 min | Minutes | $$ | Core services running (DB replication); scale up on failure |
| Warm Standby | Minutes | Seconds | $$$ | Scaled-down copy of production running; scale up on failure |
| Active-Active (Multi-Region) | ~0 (automatic) | ~0 | $$$$ | Full production in multiple regions; traffic routes around failures |

RTO = Recovery Time Objective (how long until you're back online)
RPO = Recovery Point Objective (how much data you can afford to lose)
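Choosing among the four strategies comes down to picking the cheapest one whose RTO and RPO both meet the business targets. The sketch below shows that selection; the numeric bounds are illustrative assumptions read off the ranges in the table, not guarantees from any provider.

```python
# (strategy, worst-case RTO in minutes, worst-case RPO in minutes), cheapest first.
# The numbers are assumed upper bounds chosen for illustration.
STRATEGIES = [
    ("Backup & Restore", 24 * 60, 24 * 60),
    ("Pilot Light", 30, 15),
    ("Warm Standby", 10, 1),
    ("Active-Active (Multi-Region)", 0, 0),
]


def cheapest_strategy(rto_target_min: float, rpo_target_min: float) -> str:
    """Return the least expensive strategy whose RTO and RPO meet both targets."""
    for name, rto, rpo in STRATEGIES:
        if rto <= rto_target_min and rpo <= rpo_target_min:
            return name
    raise ValueError("targets must be non-negative")


print(cheapest_strategy(rto_target_min=60, rpo_target_min=30))
print(cheapest_strategy(rto_target_min=5, rpo_target_min=1))
```

Tightening either target by a small amount can push the answer up a whole cost tier, which is why the targets should come from business impact rather than preference.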

The Well-Architected Framework

All three major providers publish Well-Architected Frameworks. AWS organizes its guidance into the six pillars below; Azure and Google Cloud group theirs slightly differently (Azure, for example, uses five pillars), but the themes are largely consistent:

| Pillar | Focus | Key Questions |
| --- | --- | --- |
| Operational Excellence | Run and monitor systems, improve processes | How do you respond to unplanned events? How do you evolve? |
| Security | Protect information, systems, and assets | How do you manage identities? How do you detect threats? |
| Reliability | Recover from failures, meet demand | How do you handle component failures? How do you test recovery? |
| Performance Efficiency | Use resources efficiently as demand changes | How do you select the right instance type? How do you monitor? |
| Cost Optimization | Avoid unnecessary costs | How do you govern usage? How do you decommission unused resources? |
| Sustainability | Minimize environmental impact | How do you select efficient regions? How do you right-size? |

Getting Started with Cloud

Account Setup and Security

Before deploying your first resource, secure your cloud account. Many cloud security breaches trace back to misconfigured accounts rather than sophisticated attacks.

Day-1 Security Checklist: (1) Enable MFA on root/owner account immediately, (2) Create a separate admin user — never use root for daily work, (3) Set up billing alerts at $10, $50, $100 thresholds, (4) Enable CloudTrail/Activity Log/Audit Logs from day one, (5) Configure a budget and spend cap if available.
# AWS — Initial account security setup
# 1. Create an IAM admin user (don't use root)
aws iam create-user --user-name admin-user
aws iam attach-user-policy \
  --user-name admin-user \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

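# 2. Set up billing alarm (alerts at $50)
# Billing metrics are published only in us-east-1, and "Receive Billing Alerts"
# must first be enabled in the account's billing preferences.
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "billing-alarm-50" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts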

# 3. Enable CloudTrail (audit logging)
aws cloudtrail create-trail \
  --name management-trail \
  --s3-bucket-name my-cloudtrail-bucket \
  --is-multi-region-trail
aws cloudtrail start-logging --name management-trail
# Azure — Initial account security setup
# 1. Create a resource group for organization
az group create --name core-infrastructure --location eastus

# 2. Set up budget alert
az consumption budget create \
  --budget-name monthly-budget \
  --amount 100 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2026-05-01 \
  --end-date 2027-05-01

# 3. Enable diagnostic logging
az monitor diagnostic-settings create \
  --name audit-logs \
  --resource "/subscriptions/{sub-id}" \
  --logs '[{"category":"Administrative","enabled":true}]' \
  --storage-account "/subscriptions/{sub-id}/resourceGroups/core-infrastructure/providers/Microsoft.Storage/storageAccounts/auditlogs"
# GCP — Initial account security setup
# 1. Create a project
gcloud projects create my-first-project --name="My First Project"
gcloud config set project my-first-project

# 2. Enable billing budget alerts
# percent is a fraction of the budget (0.5 = 50%), not a whole number
gcloud billing budgets create \
  --billing-account=BILLING_ACCOUNT_ID \
  --display-name="Monthly Budget" \
  --budget-amount=100 \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9 \
  --threshold-rule=percent=1.0

# 3. Enable audit logging
gcloud projects get-iam-policy my-first-project --format=json > policy.json
# Edit policy.json to add audit logging configuration
gcloud projects set-iam-policy my-first-project policy.json

CLI Tools Overview

Every cloud provider offers a command-line interface that enables infrastructure automation. These are essential tools for any cloud engineer:

| Provider | CLI Tool | Install Command | Auth Command |
| --- | --- | --- | --- |
| AWS | aws | Official installer (v2), or pip install awscli (v1 only) | aws configure |
| Azure | az | curl -sL https://aka.ms/InstallAzureCLIDeb \| sudo bash (Debian/Ubuntu) | az login |
| GCP | gcloud | curl https://sdk.cloud.google.com \| bash | gcloud auth login |

# Quick verification commands after installation
# AWS: check identity
aws sts get-caller-identity
# Output: UserId, Account, Arn

# Azure: check account
az account show --output table
# Output: Name, SubscriptionId, TenantId, State

# GCP: check project
gcloud config list
# Output: account, project, region, zone

Free Tier Overview

All three providers offer free tiers for learning and experimentation. These are invaluable for getting hands-on experience without financial risk:

| Provider | Free Credits | Free Tier Highlights (some limited to 12 months) | Gotchas |
| --- | --- | --- | --- |
| AWS | 12 months of free tier | 750 hrs/month t2.micro, 5GB S3, 25GB DynamoDB, 1M Lambda requests/month | Some services auto-scale beyond free tier limits |
| Azure | $200 credit (30 days) + 12 months | 750 hrs B1s VM, 5GB Blob Storage, 250GB SQL Database, 1M Functions requests | $200 credit expires in 30 days regardless of usage |
| GCP | $300 credit (90 days) + always-free | 1 e2-micro VM, 5GB Cloud Storage, 1TB BigQuery queries/month, 2M Cloud Functions invocations/month | Most generous always-free tier for compute |

Pro Tip: Set up billing alerts BEFORE deploying anything, even on free tier accounts. It's easy to accidentally leave a large VM running or create a resource that doesn't qualify for free tier. A $10 alert threshold gives you early warning before any surprise bills arrive.

Hands-On Exercises

Exercise 1 (20 minutes)

Service Model Classification Challenge

Classify each of the following services into the correct service model (IaaS, PaaS, SaaS, or FaaS). For each, explain why it belongs to that category by identifying what the customer manages vs what the provider manages:

  1. AWS EC2 with a custom AMI
  2. Google Sheets
  3. Azure Functions triggered by a queue
  4. Heroku with a Git-push deploy
  5. DigitalOcean Droplet running Ubuntu
  6. Salesforce CRM
  7. AWS Lambda processing S3 events
  8. Google App Engine (standard environment)
  9. Microsoft 365 Exchange Online
  10. Azure Virtual Machines running Windows Server
  11. Cloudflare Workers
  12. AWS RDS (managed PostgreSQL)
  13. Snowflake Data Warehouse
  14. GitHub Codespaces
  15. GCP Compute Engine with custom image

Bonus: For services that blur the line (like managed databases), argue which model they most closely fit and why.

Tags: Cloud Models, Classification, Critical Thinking

Exercise 2 (30 minutes)

Design a High-Availability Architecture

You're designing a web application for an e-commerce company that requires 99.99% availability (less than 53 minutes of downtime per year). Design the architecture on paper (or on a whiteboard), addressing:

  1. Compute layer: How many AZs? What happens when one AZ fails?
  2. Database layer: Primary/standby? Read replicas? Multi-region?
  3. Load balancing: Where? What type? Health checks?
  4. Static assets: CDN? Which regions?
  5. DNS: Failover routing? Latency-based?
  6. Disaster recovery: Which strategy? What's the RTO/RPO?

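Draw a diagram showing the complete architecture. Label each component with the AWS/Azure/GCP service you'd use. Calculate the theoretical availability of the redundant compute layer using the formula: Availability = 1 - (1 - AZ_availability)^num_AZs. The formula assumes AZ failures are independent; for components that sit in series (load balancer, database, DNS), multiply their individual availabilities together.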
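
To sanity-check your numbers, the arithmetic can be sketched in a few lines of shell. The function names and the 0.999 per-AZ figure are assumptions chosen for illustration, not published SLA values, and the calculation treats AZ failures as independent.

```shell
# availability A N: combined availability of N redundant AZs,
# each with availability A (assumes independent failures).
availability() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.6f\n", 1 - (1 - a) ^ n }'
}

# downtime_minutes A: allowed downtime per 365-day year at availability A.
downtime_minutes() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", (1 - a) * 365 * 24 * 60 }'
}

availability 0.999 2      # prints 0.999999 (two AZs at a hypothetical 99.9% each)
downtime_minutes 0.9999   # prints 52.56 (the yearly budget at 99.99%)
```

A real design will land below the redundant-layer figure, because every dependency in the request path reduces the end-to-end result.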

Tags: Architecture, High Availability, Design

Exercise 3 (45 minutes)

Deploy Your First Cloud Resource

Sign up for a free tier account on any cloud provider and deploy a basic resource using the CLI. Follow these steps:

  1. Create account: Sign up at aws.amazon.com/free, azure.microsoft.com/free, or cloud.google.com/free
  2. Secure the account: Enable MFA, set up billing alerts at $5 and $10
  3. Install CLI: Install the provider's CLI tool and authenticate
  4. Deploy a resource: Create a small VM or storage bucket using the CLI
  5. Verify: Confirm the resource exists via both CLI and web console
  6. Clean up: Delete the resource to avoid charges
  7. Review billing: Check the billing dashboard to confirm $0 charges

Document: Take screenshots of each step. Note what surprised you about the process — what was easier or harder than expected?
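
As one possible shape for steps 4 to 6, here is a sketch using the AWS CLI and an S3 bucket. It runs in dry-run mode by default, so it only prints the commands. The run helper, the DRY_RUN and BUCKET variables, and the bucket name are all hypothetical names introduced for this example.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to run them for real (requires an authenticated AWS CLI).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical bucket name; S3 bucket names must be globally unique.
BUCKET="${BUCKET:-my-first-bucket-example-12345}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run aws s3 mb "s3://$BUCKET"   # step 4: create a storage bucket
run aws s3 ls                  # step 5: list buckets to verify it exists
run aws s3 rb "s3://$BUCKET"   # step 6: remove the (empty) bucket
```

Printing the commands first makes it easy to review exactly what will be created and deleted before anything touches your account.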

Tags: Hands-On, Free Tier, CLI

Conclusion & Next Steps

Cloud computing is not merely a technology shift — it's an operational paradigm change. In this article, we've covered the essential foundations:

  • Service models (IaaS, PaaS, SaaS, FaaS) and their responsibility boundaries
  • Deployment models (public, private, hybrid, multi-cloud) and when to use each
  • The shared responsibility model — the most critical concept in cloud security
  • Cloud economics — CapEx vs OpEx, pricing models, and cost optimization
  • The Big Three providers and how their services map to each other
  • Architecture patterns — regions, AZs, HA, and DR strategies

With these fundamentals in place, you now have the vocabulary and mental models needed to understand how infrastructure is provisioned, managed, and automated in the cloud era.

Next in the Series

In Part 8: Infrastructure as Code, we'll learn how to define cloud infrastructure declaratively using tools like Terraform, Pulumi, and CloudFormation. You'll go from clicking buttons in a console to expressing your entire infrastructure as version-controlled code that can be reviewed, tested, and deployed automatically.