Part 7: Cloud Computing Fundamentals

Introduction

Cloud computing has fundamentally transformed how organizations build, deploy, and operate technology. Yet despite being the backbone of modern infrastructure, it remains widely misunderstood. "The cloud is just someone else's computer" is a popular quip — but it's dangerously oversimplified. Cloud computing isn't just about where your servers live; it's about an entirely different operational model for consuming technology.

In this part, we'll build a complete understanding of cloud computing from the ground up: what it truly means, how service models differ, how the major providers organize their offerings, and how to make sound architectural and economic decisions in the cloud era.

                            
                            Key Insight: Cloud computing is not a location — it's a delivery model. It represents a fundamental shift from owning and operating infrastructure (CapEx) to consuming it as a metered service (OpEx). The real value isn't "someone else's data center" — it's on-demand elasticity, global reach, and the ability to experiment at near-zero marginal cost.
                        

The 5 NIST Essential Characteristics

The National Institute of Standards and Technology (NIST) defines cloud computing through five essential characteristics that any true cloud service must exhibit. If a service lacks any of these, it's not really cloud — it's just hosted infrastructure:

The 5 NIST Essential Characteristics of Cloud Computing

mindmap
  root((Cloud Computing))
    On-Demand Self-Service
      Provision resources without human interaction
      API-driven automation
      Instant availability
    Broad Network Access
      Available over standard networks
      Accessible from any device
      Platform-independent
    Resource Pooling
      Multi-tenant model
      Location independence
      Dynamic resource assignment
    Rapid Elasticity
      Scale up and down automatically
      Appear unlimited to consumer
      Pay only for what you use
    Measured Service
      Usage is metered
      Pay-per-use billing
      Transparent monitoring

Characteristic	What It Means	Real-World Example
On-Demand Self-Service	Provision resources (servers, storage, networks) without needing to contact a human	Spin up 100 VMs via API at 2 AM on a Saturday
Broad Network Access	Services available over standard networks, accessible from any device or platform	Access your cloud console from a phone, laptop, or tablet
Resource Pooling	Provider's resources are pooled across multiple tenants with dynamic assignment	Your VM shares physical hardware with other customers (isolated)
Rapid Elasticity	Resources can be scaled up or down automatically, appearing unlimited	Auto-scale from 2 to 200 instances during Black Friday traffic
Measured Service	Usage is metered, reported, and billed transparently (pay-per-use)	Billed $0.023 per GB-month of storage actually consumed
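The "GB-month" unit in the Measured Service row is time-weighted: you are billed on the average amount stored over the month, not the peak. A minimal sketch of that arithmetic, assuming one usage sample per day and the $0.023 rate from the table (the function name `storage_cost` is hypothetical, and real providers meter at finer granularity):

```python
def storage_cost(daily_gb, rate_per_gb_month=0.023):
    """Estimate one month's storage bill from one usage sample per day."""
    if not daily_gb:
        return 0.0
    # GB-months = average GB held across the billing period
    average_gb = sum(daily_gb) / len(daily_gb)
    return round(average_gb * rate_per_gb_month, 2)


# 500 GB stored for the first 10 days, then 100 GB for the remaining 20 days
usage = [500] * 10 + [100] * 20
print(storage_cost(usage))
```

Deleting data mid-month lowers the bill because the average drops, which is the practical meaning of "billed for storage actually consumed."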

                            
                            Definition: Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. — NIST SP 800-145
                        

Service Models

Cloud service models define the boundary of responsibility between you (the customer) and the cloud provider. Think of it as a spectrum: at one end you manage everything; at the other, the provider manages nearly everything. NIST SP 800-145 defines three models (IaaS, PaaS, and SaaS); this part also covers FaaS, a newer model that is now commonly listed alongside them.

Cloud Service Model Stack — What You Manage vs Provider Manages

graph TB
    subgraph "On-Premises (You Manage Everything)"
        A1[Application]
        A2[Data]
        A3[Runtime]
        A4[Middleware]
        A5[Operating System]
        A6[Virtualization]
        A7[Servers]
        A8[Storage]
        A9[Networking]
    end

    subgraph "IaaS (You Manage OS and Up)"
        B1[Application]
        B2[Data]
        B3[Runtime]
        B4[Middleware]
        B5[Operating System]
        B6[Virtualization — Provider]
        B7[Servers — Provider]
        B8[Storage — Provider]
        B9[Networking — Provider]
    end

    subgraph "PaaS (You Manage Code and Data)"
        C1[Application]
        C2[Data]
        C3[Runtime — Provider]
        C4[Middleware — Provider]
        C5[OS — Provider]
        C6[Virtualization — Provider]
        C7[Servers — Provider]
        C8[Storage — Provider]
        C9[Networking — Provider]
    end

    subgraph "SaaS (Provider Manages Everything)"
        D1[Application — Provider]
        D2[Data — Provider manages infra]
        D3[Runtime — Provider]
        D4[Middleware — Provider]
        D5[OS — Provider]
        D6[Virtualization — Provider]
        D7[Servers — Provider]
        D8[Storage — Provider]
        D9[Networking — Provider]
    end

IaaS — Infrastructure as a Service

IaaS provides the fundamental building blocks of cloud IT: virtual machines, storage volumes, and networks. You rent raw infrastructure and manage everything from the operating system up. It's the most flexible model but also the most operationally demanding.

You manage: OS, middleware, runtime, application, data, patching, security configuration

Provider manages: Physical hardware, hypervisor, networking fabric, physical security, power/cooling

Examples: AWS EC2, Azure Virtual Machines, GCP Compute Engine, DigitalOcean Droplets

Best for: Lift-and-shift migrations, custom OS requirements, full control over the stack, legacy applications

# Launch an IaaS VM on AWS
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.medium \
  --key-name my-key \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server-01}]'

# Launch an IaaS VM on Azure
az vm create \
  --resource-group my-rg \
  --name web-server-01 \
  --image Ubuntu2204 \
  --size Standard_B2s \
  --admin-username azureuser \
  --generate-ssh-keys

# Launch an IaaS VM on GCP
gcloud compute instances create web-server-01 \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud

PaaS — Platform as a Service

PaaS abstracts away the operating system and runtime environment. You deploy your code and data; the platform handles everything else — OS patching, load balancing, scaling, and runtime updates. This dramatically reduces operational overhead but limits customization.

You manage: Application code, data, application-level configuration

Provider manages: Runtime, middleware, OS, hardware, networking, scaling, patching

Examples: AWS Elastic Beanstalk, Azure App Service, Google App Engine, Heroku, Railway

Best for: Web applications, APIs, microservices, developer productivity, rapid prototyping

# Deploy to Azure App Service (PaaS)
az webapp up \
  --resource-group my-rg \
  --name my-web-app \
  --runtime "PYTHON:3.11" \
  --sku B1

# Deploy to Google App Engine (PaaS)
# First create app.yaml in your project root:
# runtime: python311
# instance_class: F2
# automatic_scaling:
#   min_instances: 1
#   max_instances: 10
gcloud app deploy app.yaml --project=my-project

# Deploy to AWS Elastic Beanstalk (PaaS)
eb init my-app --platform python-3.11 --region us-east-1
eb create production --instance_type t3.small

SaaS — Software as a Service

SaaS is fully managed software delivered over the internet. You don't manage any part of the technology stack — you simply use the application. The provider handles everything: the application code, data infrastructure, scaling, updates, and security.

You manage: User configuration, access control (who can use it), your data within the application

Provider manages: Literally everything else — code, infrastructure, updates, availability

Examples: Microsoft 365, Salesforce, Slack, Zoom, GitHub, Datadog, Snowflake

Best for: End-user productivity, business applications, collaboration, when you want to consume not build

FaaS — Function as a Service

FaaS (often called "serverless compute") takes abstraction further than PaaS. You write individual functions that execute in response to events. There are no servers to provision and no runtime to install or patch; you choose a supported runtime and are billed per invocation and for execution time, typically metered in milliseconds.

You manage: Function code, event triggers, function-level configuration

Provider manages: Execution environment, scaling (including to zero), infrastructure, container lifecycle

Examples: AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers

Best for: Event-driven architectures, webhooks, data processing pipelines, scheduled tasks, APIs with variable traffic
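To make "function code" concrete, here is a minimal sketch of a Python handler matching the `lambda_function.lambda_handler` name used in this section's AWS deployment command. The event shape (an `order_id` key) and the response body are assumptions for illustration; real events depend on the trigger that invokes the function.

```python
import json


def lambda_handler(event, context):
    """Entry point: the file is lambda_function.py, the function is lambda_handler."""
    # Hypothetical event shape: {"order_id": "..."}
    order_id = event.get("order_id")
    if order_id is None:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "order_id is required"}),
        }

    # Real order processing would go here
    return {
        "statusCode": 200,
        "body": json.dumps({"order_id": order_id, "status": "processed"}),
    }
```

The function holds no state between calls and receives everything it needs in `event`, which is what lets the platform run zero or many copies at once.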

# Deploy an AWS Lambda function
aws lambda create-function \
  --function-name process-order \
  --runtime python3.11 \
  --handler lambda_function.lambda_handler \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --zip-file fileb://function.zip \
  --timeout 30 \
  --memory-size 256

# Deploy an Azure Function
func azure functionapp publish my-function-app

# Deploy a Google Cloud Function
gcloud functions deploy process-order \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --region us-central1 \
  --memory 256MB

Service Model Comparison

Layer	On-Premises	IaaS	PaaS	FaaS	SaaS
Application	You	You	You	You	Provider
Data	You	You	You	You	Shared
Runtime	You	You	Provider	Provider	Provider
Middleware	You	You	Provider	Provider	Provider
Operating System	You	You	Provider	Provider	Provider
Virtualization	You	Provider	Provider	Provider	Provider
Servers	You	Provider	Provider	Provider	Provider
Storage	You	Provider	Provider	Provider	Provider
Networking	You	Provider	Provider	Provider	Provider
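The comparison table can be read as "how far down the stack does the customer's responsibility reach?" A small sketch that encodes the table this way (the names `LAYERS`, `CUSTOMER_DEPTH`, and `customer_managed` are hypothetical; SaaS data is marked "Shared" in the table and is treated here as not customer-managed infrastructure):

```python
# Stack layers from top to bottom, as in the comparison table
LAYERS = [
    "Application", "Data", "Runtime", "Middleware", "Operating System",
    "Virtualization", "Servers", "Storage", "Networking",
]

# Number of layers, counted from the top, that the customer manages
CUSTOMER_DEPTH = {"On-Premises": 9, "IaaS": 5, "PaaS": 2, "FaaS": 2, "SaaS": 0}


def customer_managed(model):
    """Return the layers the customer is responsible for under a service model."""
    return LAYERS[:CUSTOMER_DEPTH[model]]


for model in CUSTOMER_DEPTH:
    print(f"{model}: {customer_managed(model)}")
```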

                            
                            Common Mistake: Many teams default to IaaS because it's "most familiar" (it looks like traditional VMs). But IaaS carries the highest operational burden. If you're deploying a web application and don't need custom kernel modules or OS-level tweaks, PaaS or FaaS will likely be more cost-effective and secure. Always start at the highest abstraction level that meets your requirements, and drop down only when needed.
                        

Deployment Models

Deployment models describe where and how cloud infrastructure is hosted, owned, and shared. The choice of deployment model is driven by requirements around data sovereignty, compliance, latency, and cost.

Public Cloud

Infrastructure owned and operated by a third-party provider (AWS, Azure, GCP), delivered over the public internet. Resources are shared across multiple organizations (multi-tenant), though isolated at the hypervisor level. This is the most common deployment model.

Advantages: No upfront CapEx, global scale, broad service catalog, instant provisioning, elasticity

Disadvantages: Data leaves your premises, shared infrastructure (perception issue), vendor lock-in risk, egress costs

Private Cloud

Infrastructure dedicated to a single organization, either hosted on-premises or by a third party. Provides cloud-like self-service and elasticity but within a controlled environment. Often required for strict regulatory compliance (healthcare, government, finance).

Advantages: Full control, data sovereignty, compliance, customizable security, predictable performance

Disadvantages: High CapEx, limited scale, slow provisioning vs public cloud, requires specialized staff

Examples: VMware vSphere + vRealize, OpenStack, Azure Stack HCI, AWS Outposts

Hybrid Cloud

A combination of public and private cloud environments connected by networking, allowing data and applications to move between them. The key requirement is orchestration — the two environments must work together as a single architecture, not just coexist.

Advantages: Flexibility, keep sensitive workloads private while bursting to public, gradual migration path

Disadvantages: Complexity (networking, identity, security), skills gap, potential latency between environments

Use Cases: Cloud bursting (handle peaks in public cloud), data residency (keep EU data on-premises), gradual migration

Multi-Cloud

Using services from multiple public cloud providers simultaneously. This is distinct from hybrid (which is public + private). Multi-cloud might mean running compute on AWS, databases on GCP, and AI/ML on Azure — or distributing the same workload across providers for resilience.

Advantages: Avoid vendor lock-in, best-of-breed services, geographic reach, negotiating leverage, redundancy

Disadvantages: Massive operational complexity, skill dilution, inconsistent APIs, networking challenges, cost visibility

Factor	Public Cloud	Private Cloud	Hybrid Cloud	Multi-Cloud
CapEx Required	None	Very High	High	None
Scalability	Near-infinite	Limited	High (burst to public)	Near-infinite
Data Control	Limited (provider region)	Full	Split	Distributed
Complexity	Low-Medium	High	Very High	Extreme
Vendor Lock-in Risk	High	Low (if open-source)	Medium	Low
Best For	Startups, SaaS, variable workloads	Regulated industries, sensitive data	Enterprises migrating, compliance	Large enterprises, best-of-breed

                            
                            Pragmatic Advice: Multi-cloud is frequently adopted for the wrong reasons. "Avoiding vendor lock-in" sounds compelling but the operational cost of maintaining expertise across 2-3 clouds often exceeds the switching cost you're trying to avoid. Adopt multi-cloud when you have a genuine technical reason (best-of-breed services, geographic requirements) — not as insurance.
                        

The Shared Responsibility Model

The shared responsibility model is arguably the most important concept in cloud security. It defines a boundary: the cloud provider secures the infrastructure of the cloud, while you secure everything you put in the cloud. Misunderstanding this boundary, typically in the form of customer-side misconfiguration, is one of the most common causes of cloud security incidents.

Shared Responsibility — Provider vs Customer

graph TB
    subgraph "Customer Responsibility (Security IN the Cloud)"
        C1[Customer Data]
        C2[Platform & Application Management]
        C3[Identity & Access Management]
        C4[Operating System & Network Configuration]
        C5[Client-Side Encryption]
        C6[Network Traffic Protection]
    end

    subgraph "Provider Responsibility (Security OF the Cloud)"
        P1[Physical Security — Data Centers]
        P2[Hardware — Servers, Storage, Networking]
        P3[Hypervisor & Host OS]
        P4[Global Network Infrastructure]
        P5[Managed Service Infrastructure]
        P6[Compliance Certifications]
    end

    C6 --> P1

What the Cloud Provider Is Responsible For

Physical security: Guards, biometrics, surveillance, locked cages
Hardware: Server procurement, maintenance, disposal, firmware updates
Hypervisor/host OS: Patching and securing the virtualization layer
Network infrastructure: Backbone connectivity, DDoS protection at the network edge
Compliance: Achieving and maintaining SOC 2, ISO 27001, PCI DSS certifications for their infrastructure

What the Customer Is Responsible For

Data classification and encryption: Encrypting sensitive data at rest and in transit
Identity and access management: MFA, least privilege, role-based access control
Network security: Security groups, NACLs, WAF rules, VPN configuration
Application security: Code vulnerabilities, patching your dependencies
OS patching (IaaS): Keeping guest OS updated and hardened
Compliance: Ensuring your usage of the cloud meets YOUR regulatory requirements

How Responsibility Shifts by Service Model

Responsibility	IaaS	PaaS	SaaS
Data Classification & Encryption	Customer	Customer	Customer
Identity & Access Management	Customer	Customer	Customer
Application Security	Customer	Customer	Provider
Network Controls	Customer	Shared	Provider
OS Patching	Customer	Provider	Provider
Runtime & Middleware	Customer	Provider	Provider
Physical Infrastructure	Provider	Provider	Provider

                            
                            Security Warning: Data encryption, identity management, and access control are ALWAYS the customer's responsibility, regardless of service model. An exposed S3 bucket isn't AWS's fault — it's yours. A database without MFA isn't Azure's problem — it's yours. The shared responsibility model means the provider cannot secure what they cannot see (your data, your users, your configurations).
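Because configuration is the customer's side of the boundary, it is worth checking mechanically. The sketch below scans an S3-style bucket policy document for `Allow` statements open to any principal. It is a simplified illustration, not a full policy evaluator: the helper names are hypothetical, statements with a `Condition` are assumed to be restricted, and ACLs and account-level public access settings are ignored.

```python
def _as_list(value):
    """Policy fields may hold a single item or a list of items."""
    return value if isinstance(value, list) else [value]


def public_statements(policy):
    """Return Allow statements that apply to any principal ("*")."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        principal = stmt.get("Principal")
        wildcard = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS", []))
        )
        # Simplification: a Condition block is assumed to narrow access
        if stmt.get("Effect") == "Allow" and wildcard and "Condition" not in stmt:
            findings.append(stmt)
    return findings


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TeamAccess", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:role/app-role"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}
print([s["Sid"] for s in public_statements(policy)])  # ['PublicRead']
```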
                        

Cloud Economics

Understanding cloud economics is critical for making sound infrastructure decisions. The shift from on-premises to cloud isn't simply "servers become subscription fees" — it fundamentally changes how organizations think about technology investment.

CapEx vs OpEx

Aspect	CapEx (On-Premises)	OpEx (Cloud)
Cost Type	Large upfront investment	Pay-as-you-go, monthly billing
Accounting	Depreciated over 3-5 years	Expensed in current period
Capacity Planning	Must predict 3-5 years ahead	Adjust monthly or hourly
Risk	Over-provision or under-provision	Right-size continuously
Time to Deploy	Weeks to months (procurement)	Minutes (API call)
Staffing	Need hardware engineers, facility staff	Cloud architects, DevOps engineers
Hidden Costs	Power, cooling, floor space, insurance	Egress, cross-region transfer, API calls

Cloud Pricing Models

Cloud providers offer multiple pricing tiers designed to reward commitment with discounts:

Pricing Model	Discount	Commitment	Best For	Risk
On-Demand	0% (baseline)	None	Variable workloads, testing, short-term	None — pay only for what you use
Reserved (1yr)	~30-40%	1-year term	Steady-state production workloads	Committed even if unused
Reserved (3yr)	~50-72%	3-year term	Long-running databases, core services	Significant lock-in
Spot/Preemptible	~60-90%	None (can be reclaimed)	Batch processing, CI/CD, fault-tolerant	Instances reclaimed on short notice (about 2 minutes on AWS, about 30 seconds on Azure and GCP)
Savings Plans	~30-60%	$/hr spend commitment	Flexible workloads across instance types	Must spend minimum per hour
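The "committed even if unused" risk in the table has a simple break-even: a reservation with discount d only pays off if the instance is needed for more than (1 - d) of the time. A sketch of that comparison, assuming a hypothetical $0.10/hour on-demand rate and a 730-hour month (function names are illustrative):

```python
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate, utilization, discount=0.0, committed=False):
    """Monthly cost for one instance.

    utilization: fraction of the month the instance is actually needed (0..1)
    committed:   reserved capacity is billed for every hour, used or not
    """
    billed_hours = HOURS_PER_MONTH if committed else HOURS_PER_MONTH * utilization
    return hourly_rate * (1 - discount) * billed_hours


def break_even_utilization(discount):
    """Utilization above which a commitment beats on-demand."""
    return 1 - discount


rate = 0.10  # hypothetical on-demand price per hour
for utilization in (0.3, 0.6, 0.9):
    on_demand = monthly_cost(rate, utilization)
    reserved = monthly_cost(rate, utilization, discount=0.4, committed=True)
    print(f"{utilization:.0%} used: on-demand ${on_demand:.2f}, reserved ${reserved:.2f}")
```

With a 40% discount the reservation costs the same every month, so it only wins once the workload runs more than about 60% of the time.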

Total Cost of Ownership (TCO)

A fair cloud vs on-premises comparison must account for all costs, not just the sticker price of a server. TCO includes:

Hardware costs: Servers, storage, networking equipment, spare parts
Facility costs: Data center space, power, cooling, physical security, fire suppression
Personnel costs: Hardware engineers, network engineers, facility managers, 24/7 NOC
Software licenses: Hypervisor licenses, OS licenses, management tools
Lifecycle costs: Hardware refresh every 3-5 years, decommissioning, e-waste
Opportunity cost: Money tied up in depreciating assets vs invested elsewhere
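As a rough illustration of how these categories combine, the sketch below spreads hardware cost over its refresh cycle and adds the recurring items to get a monthly on-premises figure. All numbers are made up for the example, the function name is hypothetical, and opportunity cost is deliberately left out because it depends on the organization's cost of capital.

```python
def on_prem_monthly_tco(hardware, refresh_years, facility, personnel, licenses):
    """Monthly on-premises cost: amortized hardware plus recurring monthly costs."""
    hardware_per_month = hardware / (refresh_years * 12)
    return hardware_per_month + facility + personnel + licenses


# Illustrative figures only
on_prem = on_prem_monthly_tco(
    hardware=120_000,   # servers, storage, networking, spares
    refresh_years=4,    # hardware refresh cycle
    facility=800,       # space, power, cooling per month
    personnel=3_000,    # share of staff time per month
    licenses=700,       # hypervisor, OS, tooling per month
)
cloud = 6_200           # hypothetical monthly cloud bill for the same workload
print(f"on-prem ${on_prem:,.0f}/month vs cloud ${cloud:,.0f}/month")
```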

                            
                            When Cloud is MORE Expensive: Cloud isn't always cheaper. Steady-state workloads running 24/7 at consistent utilization are often cheaper on-premises; if they stay in the cloud, they should at least run on reserved or committed capacity rather than on-demand. The "cloud is always cheaper" myth fails for: large-scale databases with predictable load, GPU clusters running continuously, high-egress workloads (streaming video), and workloads that don't benefit from elasticity. Always run the TCO numbers.
                        

Cost Optimization Strategies

Strategy	Typical Savings	Implementation Effort	Description
Right-Sizing	20-40%	Low	Match instance size to actual usage (most VMs are over-provisioned)
Reserved Capacity	30-72%	Low	Commit to 1-3 year terms for steady workloads
Spot Instances	60-90%	Medium	Use interruptible capacity for fault-tolerant workloads
Auto-Scaling	20-50%	Medium	Scale down during off-peak, scale up during peak
Scheduled Shutdowns	40-70%	Low	Turn off dev/test environments nights and weekends
Storage Tiering	50-80%	Low	Move cold data to cheaper tiers (Glacier, Cool, Archive)
Architecture Optimization	30-60%	High	Move from IaaS to PaaS/serverless where appropriate
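The Scheduled Shutdowns range is easy to sanity-check: savings are simply the share of the week an environment is switched off. A minimal sketch, assuming compute is billed only while running (attached storage usually keeps accruing charges, so real savings are somewhat lower); the function name is hypothetical:

```python
HOURS_PER_WEEK = 24 * 7


def shutdown_savings(hours_per_day, days_per_week):
    """Fraction of compute cost saved by running only on a fixed schedule."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# Dev environment up 12 hours a day, Monday to Friday
print(f"{shutdown_savings(12, 5):.0%}")  # 64%
```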

The Big Three: AWS, Azure, GCP

The public cloud market is dominated by three hyperscale providers that together control approximately 67% of global cloud spending. Each has distinct strengths, histories, and philosophical approaches to cloud services.

Amazon Web Services (AWS)

Founded: 2006 (first mover) | Market Share: ~31% | Regions: 34+ | Services: 200+

AWS was first to market and has the broadest and deepest service catalog. Its philosophy is "build primitives and let customers compose them." This gives maximum flexibility but can feel overwhelming — AWS often has 3-5 ways to accomplish the same task.

Strengths: Broadest service catalog, largest ecosystem, most mature managed services, strongest serverless platform (Lambda), deepest marketplace

Considerations: Complex pricing, naming conventions can be confusing (SQS, SNS, SES, etc.), console UX is functional but dense

Microsoft Azure

Founded: 2010 | Market Share: ~25% | Regions: 60+ | Services: 200+

Azure's strength is enterprise integration. If your organization runs Microsoft 365, Active Directory, SQL Server, or .NET, Azure provides the tightest integration. Its hybrid story (Azure Arc, Azure Stack) is the strongest in the industry.

Strengths: Enterprise/Microsoft ecosystem integration, strongest hybrid cloud (Arc, Stack HCI), Azure AD/Entra ID for identity, compliance certifications for government, excellent developer experience with VS Code + GitHub

Considerations: Service naming changes frequently, documentation quality varies, some services less mature than AWS equivalents

Google Cloud Platform (GCP)

Founded: 2008 (public 2011) | Market Share: ~11% | Regions: 40+ | Services: 150+

GCP is built on Google's internal infrastructure (Borg → Kubernetes, Spanner, BigQuery). Its strengths are data analytics, machine learning, global networking, and developer experience. GCP's philosophy favors opinionated, well-designed services over breadth.

Strengths: Superior data/analytics (BigQuery), best Kubernetes experience (GKE), global network (private backbone), strong AI/ML (Vertex AI, TPUs), clean API design

Considerations: Smaller service catalog, enterprise features maturing, perception of product deprecation risk

Service Mapping Across Providers

Category	AWS	Azure	GCP
Virtual Machines	EC2	Virtual Machines	Compute Engine
Serverless Compute	Lambda	Functions	Cloud Functions
Containers (Managed K8s)	EKS	AKS	GKE
Container Service	ECS / Fargate	Container Apps	Cloud Run
Object Storage	S3	Blob Storage	Cloud Storage
Block Storage	EBS	Managed Disks	Persistent Disk
File Storage	EFS	Azure Files	Filestore
Relational Database	RDS / Aurora	SQL Database / Azure Database for PostgreSQL and MySQL	Cloud SQL / AlloyDB
NoSQL Database	DynamoDB	Cosmos DB	Firestore / Bigtable
Data Warehouse	Redshift	Synapse Analytics	BigQuery
VPC / Networking	VPC	Virtual Network (VNet)	VPC
Load Balancer	ALB / NLB / ELB	Load Balancer / App Gateway	Cloud Load Balancing
DNS	Route 53	Azure DNS	Cloud DNS
CDN	CloudFront	Azure CDN / Front Door	Cloud CDN
IAM	IAM	Entra ID (Azure AD) + RBAC	Cloud IAM
Monitoring	CloudWatch	Monitor / App Insights	Cloud Monitoring
IaC Service	CloudFormation	ARM / Bicep	Deployment Manager / Config Connector
Message Queue	SQS	Service Bus / Queue Storage	Pub/Sub
AI/ML Platform	SageMaker	Azure AI / ML Studio	Vertex AI

                            
                            How to Choose: Don't choose a cloud provider based on feature comparison tables alone. Consider: (1) What your team already knows, (2) Your existing ecosystem (Microsoft shop → Azure, startup → AWS, data-heavy → GCP), (3) Compliance and region requirements, (4) Specific services you need (BigQuery has no true equivalent), (5) Support and partnership tier.
                        

Cloud Architecture Patterns

Regions and Availability Zones

Cloud providers organize their infrastructure into Regions (geographic areas like us-east-1, westeurope, asia-east1) and Availability Zones (AZs — isolated data centers within a region connected by low-latency links). Understanding this hierarchy is fundamental to designing resilient architectures.

Cloud Infrastructure Hierarchy — Region → AZ → Data Center

graph TB
    subgraph "AWS Region: us-east-1 (N. Virginia)"
        subgraph "AZ: us-east-1a"
            DC1[Data Center 1]
            DC2[Data Center 2]
        end
        subgraph "AZ: us-east-1b"
            DC3[Data Center 3]
            DC4[Data Center 4]
        end
        subgraph "AZ: us-east-1c"
            DC5[Data Center 5]
            DC6[Data Center 6]
        end
    end

    DC1 ---|"< 2ms latency"| DC3
    DC3 ---|"< 2ms latency"| DC5
    DC1 ---|"< 2ms latency"| DC5

Concept	Description	Failure Domain	Example
Region	Geographic area with 2-6 AZs	Natural disaster, country-level outage	us-east-1, eu-west-1, asia-southeast1
Availability Zone	1+ data centers with independent power/cooling/networking	Single facility failure (fire, flood, power)	us-east-1a, us-east-1b
Edge Location	CDN point of presence for content caching	Local connectivity	CloudFront PoP in Chicago
Local Zone	Extension of a region closer to users	Local infrastructure	us-east-1-chi-1 (Chicago)

High Availability Patterns

High Availability (HA) means designing systems that continue operating even when individual components fail. In cloud, this is primarily achieved by distributing resources across multiple AZs or regions.

Multi-AZ High Availability Architecture

graph TB
    Users[Users / Internet] --> LB[Load Balancer — Multi-AZ]

    subgraph "Availability Zone A"
        LB --> WebA[Web Server A]
        WebA --> AppA[App Server A]
        AppA --> DB_Primary[Database Primary]
    end

    subgraph "Availability Zone B"
        LB --> WebB[Web Server B]
        WebB --> AppB[App Server B]
        AppB --> DB_Standby[Database Standby — Sync Replication]
    end

    DB_Primary ---|"Synchronous Replication"| DB_Standby

Key HA principles:

Eliminate single points of failure: Every component should have at least one redundant counterpart, ideally in a different AZ
Use managed services: Managed databases (RDS Multi-AZ) handle failover automatically
Design for failure: Assume any component can fail at any time
Test failover regularly: Chaos engineering (Netflix Chaos Monkey approach)

Disaster Recovery Strategies

Disaster Recovery (DR) protects against region-level failures. The four standard DR strategies trade cost against recovery speed:

Strategy	RTO	RPO	Cost	Description
Backup & Restore	Hours	Hours	$	Regular backups to another region; restore from backup on failure
Pilot Light	10-30 min	Minutes	$$	Core services running (DB replication); scale up on failure
Warm Standby	Minutes	Seconds	$$$	Scaled-down copy of production running; scale up on failure
Active-Active (Multi-Region)	~0 (automatic)	~0	$$$$	Full production in multiple regions; traffic routes around failures

RTO = Recovery Time Objective (how long until you're back online)
RPO = Recovery Point Objective (how much data you can afford to lose)
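Choosing a DR strategy amounts to picking the cheapest option whose RTO and RPO fit the business targets. The sketch below encodes the table with illustrative minute values; these numbers are stand-ins for the table's rough ranges (assumptions, not provider guarantees), and the names are hypothetical.

```python
# (name, worst-case RTO in minutes, worst-case RPO in minutes), cheapest first.
# Minute values are illustrative stand-ins for "Hours", "Minutes", "~0", etc.
STRATEGIES = [
    ("Backup & Restore", 480, 480),
    ("Pilot Light", 30, 10),
    ("Warm Standby", 10, 1),
    ("Active-Active", 0, 0),
]


def cheapest_strategy(rto_target_min, rpo_target_min):
    """Return the lowest-cost strategy that meets both recovery targets."""
    for name, rto, rpo in STRATEGIES:
        if rto <= rto_target_min and rpo <= rpo_target_min:
            return name
    return None


print(cheapest_strategy(rto_target_min=60, rpo_target_min=15))  # Pilot Light
```

Tightening either target pushes the answer down the list, which is the cost-versus-recovery-speed trade-off the table describes.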

The Well-Architected Framework

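All three major providers publish well-architected guidance organized into pillars. The table below follows the six pillars of the AWS Well-Architected Framework; Azure's framework has five pillars, with no separate sustainability pillar, and Google Cloud's covers largely the same ground: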

Pillar	Focus	Key Questions
Operational Excellence	Run and monitor systems, improve processes	How do you respond to unplanned events? How do you evolve?
Security	Protect information, systems, and assets	How do you manage identities? How do you detect threats?
Reliability	Recover from failures, meet demand	How do you handle component failures? How do you test recovery?
Performance Efficiency	Use resources efficiently as demand changes	How do you select the right instance type? How do you monitor?
Cost Optimization	Avoid unnecessary costs	How do you govern usage? How do you decommission unused resources?
Sustainability	Minimize environmental impact	How do you select efficient regions? How do you right-size?

Getting Started with Cloud

Account Setup and Security

Before deploying your first resource, secure your cloud account. Many cloud security breaches trace back to misconfigured accounts rather than sophisticated attacks.

                            
                            Day-1 Security Checklist: (1) Enable MFA on root/owner account immediately, (2) Create a separate admin user — never use root for daily work, (3) Set up billing alerts at $10, $50, $100 thresholds, (4) Enable CloudTrail/Activity Log/Audit Logs from day one, (5) Configure a budget and spend cap if available.
                        

# AWS — Initial account security setup
# 1. Create an IAM admin user (don't use root)
aws iam create-user --user-name admin-user
aws iam attach-user-policy \
  --user-name admin-user \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

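# 2. Set up billing alarm (alerts at $50)
# Billing metrics are published only in us-east-1 and carry a Currency dimension;
# "Receive Billing Alerts" must also be enabled in the account's billing preferences.
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "billing-alarm-50" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --threshold 50 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts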

# 3. Enable CloudTrail (audit logging)
aws cloudtrail create-trail \
  --name management-trail \
  --s3-bucket-name my-cloudtrail-bucket \
  --is-multi-region-trail
aws cloudtrail start-logging --name management-trail

# Azure — Initial account security setup
# 1. Create a resource group for organization
az group create --name core-infrastructure --location eastus

# 2. Set up budget alert
az consumption budget create \
  --budget-name monthly-budget \
  --amount 100 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2026-05-01 \
  --end-date 2027-05-01

# 3. Enable diagnostic logging for the subscription's activity log
# (subscription-level settings use the "subscription" subcommand and need a location)
az monitor diagnostic-settings subscription create \
  --name audit-logs \
  --location eastus \
  --logs '[{"category":"Administrative","enabled":true}]' \
  --storage-account "/subscriptions/{sub-id}/resourceGroups/core-infrastructure/providers/Microsoft.Storage/storageAccounts/auditlogs"

# GCP — Initial account security setup
# 1. Create a project
gcloud projects create my-first-project --name="My First Project"
gcloud config set project my-first-project

# 2. Enable billing budget alerts
# percent is a fraction of the budget: 0.5 = 50%, 1.0 = 100%
gcloud billing budgets create \
  --billing-account=BILLING_ACCOUNT_ID \
  --display-name="Monthly Budget" \
  --budget-amount=100USD \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9 \
  --threshold-rule=percent=1.0

# 3. Enable audit logging
gcloud projects get-iam-policy my-first-project --format=json > policy.json
# Edit policy.json to add audit logging configuration
gcloud projects set-iam-policy my-first-project policy.json

CLI Tools Overview

Every cloud provider offers a command-line interface that enables infrastructure automation. These are essential tools for any cloud engineer:

Provider	CLI Tool	Install Command	Auth Command
AWS	`aws`	Official v2 installer (`pip install awscli` installs the older v1)	`aws configure`
Azure	`az`	`curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash` (Debian/Ubuntu)	`az login`
GCP	`gcloud`	`curl https://sdk.cloud.google.com | bash`	`gcloud auth login`

# Quick verification commands after installation
# AWS — Check identity
aws sts get-caller-identity
# Output: UserId, Account, Arn

# Azure — Check account
az account show --output table
# Output: Name, SubscriptionId, TenantId, State

# GCP — Check project
gcloud config list
# Output: account, project, region, zone

Free Tier Overview

All three providers offer free tiers for learning and experimentation. These are invaluable for getting hands-on experience without financial risk:

Provider	Free Credits	Free Tier Highlights (some limited to 12 months)	Gotchas
AWS	12 months of free tier	750 hrs/month t2.micro, 5GB S3, 25GB DynamoDB, 1M Lambda requests/month	Some services auto-scale beyond free tier limits
Azure	$200 credit (30 days) + 12 months	750 hrs B1s VM, 5GB Blob Storage, 250GB SQL Database, 1M Functions requests	$200 credit expires in 30 days regardless of usage
GCP	$300 credit (90 days) + always-free	1 e2-micro VM, 5GB Cloud Storage, 1TB BigQuery queries/month, 2M Cloud Functions	Most generous always-free tier for compute

                            
                            Pro Tip: Set up billing alerts BEFORE deploying anything — even on free tier accounts. It's easy to accidentally leave a large VM running or create a resource that doesn't qualify for free tier. A $10 alert threshold gives you early warning before any surprise bills arrive.
                        

Hands-On Exercises

Exercise 1 (20 minutes)

Service Model Classification Challenge

Classify each of the following services into the correct service model (IaaS, PaaS, SaaS, or FaaS). For each, explain why it belongs to that category by identifying what the customer manages vs what the provider manages:

AWS EC2 with a custom AMI
Google Sheets
Azure Functions triggered by a queue
Heroku with a Git-push deploy
DigitalOcean Droplet running Ubuntu
Salesforce CRM
AWS Lambda processing S3 events
Google App Engine (standard environment)
Microsoft 365 Exchange Online
Azure Virtual Machines running Windows Server
Cloudflare Workers
AWS RDS (managed PostgreSQL)
Snowflake Data Warehouse
GitHub Codespaces
GCP Compute Engine with custom image

Bonus: For services that blur the line (like managed databases), argue which model they most closely fit and why.

Tags: Cloud Models, Classification, Critical Thinking

Exercise 2 (30 minutes)

Design a High-Availability Architecture

You're designing a web application for an e-commerce company that requires 99.99% availability (at most about 52.6 minutes of downtime per year). Design the architecture on paper (or a whiteboard), addressing:

Compute layer: How many AZs? What happens when one AZ fails?
Database layer: Primary/standby? Read replicas? Multi-region?
Load balancing: Where? What type? Health checks?
Static assets: CDN? Which regions?
DNS: Failover routing? Latency-based?
Disaster recovery: Which strategy? What's the RTO/RPO?

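Draw a diagram showing the complete architecture. Label each component with the AWS/Azure/GCP service you'd use. Calculate the theoretical availability using the formula: Availability = 1 - (1 - AZ_availability)^num_AZs. The formula assumes that AZ failures are independent and that any single surviving AZ can serve the full load, so treat the result as an upper bound.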
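To make the formula concrete, here is a minimal shell sketch that evaluates it with awk. The function names `availability` and `downtime_minutes` are illustrative and not part of any provider's tooling, and the per-AZ figures of 99% and 99.9% are assumed inputs rather than published SLAs. The calculation also assumes that AZ failures are independent and that one healthy AZ can carry the full load.

```shell
#!/usr/bin/env bash

# availability A N
# Combined availability of N independent zones, each available a fraction A
# of the time (hypothetical helper; A is an assumed input, not a published SLA).
availability() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.6f\n", 1 - (1 - a) ^ n }'
}

# downtime_minutes A
# Downtime budget in minutes per 365-day year for availability A.
downtime_minutes() {
  awk -v a="$1" 'BEGIN { printf "%.1f\n", (1 - a) * 365 * 24 * 60 }'
}

availability 0.99 2      # prints 0.999900
availability 0.999 2     # prints 0.999999
downtime_minutes 0.9999  # prints 52.6
```

Under these assumptions, two AZs at 99.9% each already clear the 99.99% target on paper. Real systems tend to fall short of the computed figure because failures are often correlated, for example through a shared control plane or a bad deployment that reaches every zone.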

Tags: Architecture, High Availability, Design

Exercise 3 (45 minutes)

Deploy Your First Cloud Resource

Create account: Sign up at aws.amazon.com/free, azure.microsoft.com/free, or cloud.google.com/free
Secure the account: Enable MFA, set up billing alerts at $5 and $10
Install CLI: Install the provider's CLI tool and authenticate
Deploy a resource: Create a small VM or storage bucket using the CLI
Verify: Confirm the resource exists via both CLI and web console
Clean up: Delete the resource to avoid charges
Review billing: Check the billing dashboard to confirm $0 charges

Document: Take screenshots of each step. Note what surprised you about the process — what was easier or harder than expected?
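One possible shape for the deploy, verify, and clean-up steps, sketched here for AWS with an S3 bucket. The bucket name is a hypothetical placeholder (S3 bucket names are globally unique, so pick your own), and the `run` helper and `DRY_RUN` switch are conveniences invented for this sketch. By default the script only prints the commands so you can review them before anything is executed.

```shell
#!/usr/bin/env bash

# Hypothetical bucket name: replace it with your own globally unique name.
BUCKET="${BUCKET:-my-first-bucket-change-me}"

# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to run them against your account.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...
# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

first_resource_walkthrough() {
  run aws s3 mb "s3://$BUCKET"   # deploy: create an empty bucket
  run aws s3 ls                  # verify: the new bucket should be listed
  run aws s3 rb "s3://$BUCKET"   # clean up: remove the empty bucket
}

first_resource_walkthrough
```

Run it once as-is to see the three commands, then rerun with `DRY_RUN=0` after `aws configure` to execute them. The Azure and GCP CLIs support the same create, list, and delete flow with their own storage commands.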

Tags: Hands-On, Free Tier, CLI

Conclusion & Next Steps

Cloud computing is not merely a technology shift; it's an operational paradigm change. In this part, we've covered the essential foundations:

Service models (IaaS, PaaS, SaaS, FaaS) and their responsibility boundaries
Deployment models (public, private, hybrid, multi-cloud) and when to use each
The shared responsibility model — the most critical concept in cloud security
Cloud economics — CapEx vs OpEx, pricing models, and cost optimization
The Big Three providers and how their services map to each other
Architecture patterns — regions, AZs, HA, and DR strategies

With these fundamentals in place, you now have the vocabulary and mental models needed to understand how infrastructure is provisioned, managed, and automated in the cloud era.

Next in the Series

In Part 8: Infrastructure as Code, we'll learn how to define cloud infrastructure declaratively using tools like Terraform, Pulumi, and CloudFormation. You'll go from clicking buttons in a console to expressing your entire infrastructure as version-controlled code that can be reviewed, tested, and deployed automatically.

