Introduction
Compute is the most fundamental infrastructure resource. Without it, nothing executes. Every application, database, queue, and orchestrator ultimately depends on a CPU executing instructions.
This part takes you from the physical transistors inside a server all the way to serverless functions. By the end, you will understand exactly what happens when you type aws ec2 run-instances or docker run.
Physical Servers
A modern server is a carefully engineered machine optimised for reliability, density, and performance. Understanding its components helps you make better decisions about cloud instance types.
CPU Architecture
The CPU is the brain of any server. Modern server CPUs (Intel Xeon, AMD EPYC, AWS Graviton) differ from desktop processors mainly in scale: more cores, more memory channels, more PCIe lanes, and support for multiple sockets. Typical figures:
| Feature | Desktop CPU | Server CPU |
|---|---|---|
| Core count | 8–16 cores | 32–128 cores |
| Memory channels | 2 channels | 8–12 channels |
| Max RAM | 128 GB | 2–6 TB |
| PCIe lanes | 24–28 | 128+ |
| ECC memory | Optional | Required |
| Socket support | Single | Dual/Quad |
# Inspect CPU architecture on a Linux server
lscpu
# Example output (AWS c5.xlarge):
# Architecture: x86_64
# CPU(s): 4
# Thread(s) per core: 2
# Core(s) per socket: 2
# Socket(s): 1
# NUMA node(s): 1
# Model name: Intel(R) Xeon(R) Platinum 8275CL
# CPU MHz: 3000.000
# L1d cache: 32K
# L1i cache: 32K
# L2 cache: 1024K
# L3 cache: 36608K
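The CPU(s) figure in this output is the product of the three topology fields: 1 socket × 2 cores per socket × 2 threads per core = 4 logical CPUs, which on most x86 instance types is what the provider sells as vCPUs. A minimal sketch of that arithmetic, assuming lscpu's usual "Key: value" layout; the function name logical_cpus is made up for illustration:

```bash
#!/usr/bin/env bash
# logical_cpus (hypothetical helper): read lscpu-style "Key: value" lines on
# stdin and print sockets x cores-per-socket x threads-per-core.
logical_cpus() {
  awk -F':[ \t]*' '
    { key = $1; sub(/^[ \t]+/, "", key) }      # newer lscpu versions indent some keys
    key == "Thread(s) per core" { threads = $2 }
    key == "Core(s) per socket" { cores   = $2 }
    key == "Socket(s)"          { sockets = $2 }
    END { print threads * cores * sockets }
  '
}

# Fields copied from the c5.xlarge example output above
printf '%s\n' \
  'Thread(s) per core: 2' \
  'Core(s) per socket: 2' \
  'Socket(s): 1' | logical_cpus   # prints 4
```

On a live system the same helper can be fed directly with `lscpu | logical_cpus`; machines with offline CPUs or asymmetric core layouts may report a different CPU(s) count than this product.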
NUMA Topology
Non-Uniform Memory Access (NUMA) is a memory architecture where CPUs have faster access to “local” memory and slower access to “remote” memory on another socket.
flowchart LR
subgraph Socket0[Socket 0 — NUMA Node 0]
CPU0[CPU Cores 0-31]
MEM0[Local RAM<br/>256 GB]
end
subgraph Socket1[Socket 1 — NUMA Node 1]
CPU1[CPU Cores 32-63]
MEM1[Local RAM<br/>256 GB]
end
CPU0 --- MEM0
CPU1 --- MEM1
Socket0 <-->|"Interconnect<br/>(slower)"| Socket1
# Check NUMA topology
numactl --hardware
# Example output:
# available: 2 nodes (0-1)
# node 0 cpus: 0 1 2 3 4 5 6 7
# node 0 size: 262144 MB
# node 1 cpus: 8 9 10 11 12 13 14 15
# node 1 size: 262144 MB
# node distances:
# node 0 1
# 0: 10 21
# 1: 21 10
# Pin a process to NUMA node 0
numactl --cpunodebind=0 --membind=0 ./my-database-server
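The node distances matrix is a relative cost table: 10 means local access, and 21 means a remote access is weighted at roughly 2.1 times the local cost. These values come from firmware tables and are not measured latencies. A small sketch of the ratio, with numa_penalty as a made-up helper name:

```bash
#!/usr/bin/env bash
# numa_penalty LOCAL REMOTE (hypothetical helper): relative cost of a remote
# access compared with a local one, taken from the numactl distance matrix.
numa_penalty() {
  awk -v l="$1" -v r="$2" 'BEGIN { printf "%.1f\n", r / l }'
}

numa_penalty 10 21   # prints 2.1
```

This is the reason the numactl command above binds both the CPUs and the memory to node 0: the process then never takes the slower cross-socket path.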
Memory Architecture
Server memory is organised in a hierarchy, from fastest (CPU registers) to slowest (disk):
| Level | Size | Latency | Purpose |
|---|---|---|---|
| L1 Cache | 32–64 KB per core | ~1 ns | Hot data for current instruction |
| L2 Cache | 256 KB–1 MB per core | ~4 ns | Recent working set |
| L3 Cache | 16–64 MB shared | ~10 ns | Shared across cores |
| RAM (DDR5) | 32 GB–6 TB | ~80 ns | Active application data |
| NVMe SSD | 1–30 TB | ~100 μs | Persistent storage |
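The gaps in the table are easier to appreciate as ratios. Using the approximate latencies above, this sketch prints how much slower each level is than an L1 hit; latency_ratio is an illustrative helper name, and the inputs are the table's rough figures rather than measurements:

```bash
#!/usr/bin/env bash
# latency_ratio SLOW_NS FAST_NS (hypothetical helper): how many accesses at the
# faster level fit into one access at the slower level (integer division).
latency_ratio() {
  echo $(( $1 / $2 ))
}

# Approximate latencies in nanoseconds from the table (100 us = 100000 ns)
for level in "L2:4" "L3:10" "RAM:80" "NVMe:100000"; do
  printf '%-5s %7sx slower than L1\n' "${level%%:*}" "$(latency_ratio "${level##*:}" 1)"
done

latency_ratio 100000 80   # NVMe vs RAM: prints 1250
```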
Disk Systems
Storage performance is measured in three dimensions:
- IOPS — input/output operations per second (random read/write speed)
- Throughput — MB/s (sequential read/write speed)
- Latency — time for a single I/O operation to complete
# Benchmark disk performance with fio
fio --name=randread --ioengine=libaio --iodepth=32 \
--rw=randread --bs=4k --direct=1 --size=1G \
--numjobs=4 --runtime=60 --group_reporting
# Check current disk I/O stats
iostat -x 1 5
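IOPS and throughput are linked by block size: throughput = IOPS × block size. That is why a 4k random-read test such as the fio run above can report high IOPS but modest MB/s. A sketch of the conversion; the helper name and the 10,000 IOPS figure are illustrative, not measurements:

```bash
#!/usr/bin/env bash
# throughput_mib IOPS BLOCK_KIB (hypothetical helper): MiB/s moved by a
# workload that sustains IOPS operations of BLOCK_KIB KiB each.
throughput_mib() {
  awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1024 }'
}

throughput_mib 10000 4      # 4 KiB random I/O -> prints 39.1
throughput_mib 10000 1024   # 1 MiB sequential -> prints 10000.0
```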
Linux Server Administration
Linux runs the large majority of cloud servers. Understanding Linux process management and resource controls is essential for infrastructure engineering.
Process Isolation
Every application running on a Linux server is a process — an isolated instance of a running program with its own virtual memory space.
# View running processes with resource usage
ps aux --sort=-%mem | head -20
# Real-time process monitoring
top -bn1 | head -30
# Process tree (parent-child relationships)
pstree -p | head -30
# Check open file descriptors for a process
ls -la /proc/$(pgrep nginx | head -1)/fd/ 2>/dev/null | head -20
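The tools above read their data from plain files under /proc/<pid>. A sketch that summarises one process straight from those files; proc_summary is a hypothetical helper, and it assumes a Linux /proc filesystem:

```bash
#!/usr/bin/env bash
# proc_summary PID (hypothetical helper): print name, parent PID, resident
# memory, and open file descriptor count for one process.
proc_summary() {
  local pid="$1"
  grep -E '^(Name|PPid|VmRSS):' "/proc/$pid/status"
  # Listing another user's fd directory needs root, so tolerate failure
  printf 'OpenFDs:\t%s\n' "$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
}

proc_summary $$   # $$ is the PID of the current shell
```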
Resource Management
Linux provides several mechanisms to control how resources are allocated to processes:
# Set CPU affinity (pin process to specific cores)
taskset -c 0,1 ./my-application
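# Set process priority (nice value: -20 is highest priority, 19 is lowest; negative values need root)
sudo nice -n -10 ./high-priority-process
# Limit memory usage with ulimit (applies to the current shell and its children)
ulimit -v 4194304 # Limit virtual memory to 4 GiB (the value is in KiB)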
# View system resource limits
ulimit -a
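ulimit -v takes its value in KiB, so the 4194304 above is 4 × 1024 × 1024. Because ulimit changes the shell it runs in, a subshell keeps the limit scoped to a single command. A sketch under those assumptions; both helper names are made up:

```bash
#!/usr/bin/env bash
# gib_to_kib GIB (hypothetical helper): convert GiB to the KiB unit ulimit -v expects.
gib_to_kib() {
  echo $(( $1 * 1024 * 1024 ))
}

# run_with_vmem_limit GIB CMD [ARGS...] (hypothetical helper): run CMD with a
# virtual memory cap, inside a subshell so the calling shell is unaffected.
run_with_vmem_limit() {
  local gib="$1"; shift
  ( ulimit -v "$(gib_to_kib "$gib")" && exec "$@" )
}

gib_to_kib 4   # prints 4194304
```

For example, `run_with_vmem_limit 4 ./my-application` would start the application with a 4 GiB address-space cap while leaving the interactive shell unlimited.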
cgroups & Namespaces: The Foundation of Containers
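Two Linux kernel features make containers possible. cgroups (control groups) limit how much CPU, memory, and I/O a group of processes can consume. Namespaces limit what a process can see: its own PID tree, network stack, mount table, and so on.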
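# The paths below use the cgroup v1 layout. On cgroup v2 systems (the default on
# most current distributions) the equivalent files are cpu.max and memory.max
# under /sys/fs/cgroup/my-app.
# Create a cgroup that limits CPU to 50% of one core (50 ms of CPU time per 100 ms period)
sudo mkdir -p /sys/fs/cgroup/cpu/my-app
echo 100000 | sudo tee /sys/fs/cgroup/cpu/my-app/cpu.cfs_period_us
echo 50000 | sudo tee /sys/fs/cgroup/cpu/my-app/cpu.cfs_quota_us
# Create a cgroup that limits memory to 512 MiB
sudo mkdir -p /sys/fs/cgroup/memory/my-app
echo 536870912 | sudo tee /sys/fs/cgroup/memory/my-app/memory.limit_in_bytes
# Move the current shell into both cgroups so the limits apply to it and its children
echo $$ | sudo tee /sys/fs/cgroup/cpu/my-app/cgroup.procs
echo $$ | sudo tee /sys/fs/cgroup/memory/my-app/cgroup.procs
# Run a shell in new PID, network, and mount namespaces
# (--mount-proc remounts /proc so ps shows only processes in the new PID namespace)
sudo unshare --pid --net --mount --fork --mount-proc /bin/bash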
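The numbers written to the cgroup files follow simple arithmetic: CPU share = quota / period (50000 / 100000 = 0.5 of one core), and the memory limit is given in bytes (512 × 1024 × 1024 = 536870912). A sketch of both conversions, with made-up helper names:

```bash
#!/usr/bin/env bash
# cfs_quota_us CPUS [PERIOD_US] (hypothetical helper): CFS quota in microseconds
# that grants CPUS cores' worth of time per period (default period 100000 us).
cfs_quota_us() {
  awk -v c="$1" -v p="${2:-100000}" 'BEGIN { printf "%.0f\n", c * p }'
}

# mib_to_bytes MIB (hypothetical helper): MiB to bytes for memory.limit_in_bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

cfs_quota_us 0.5    # prints 50000
cfs_quota_us 2      # prints 200000 (a quota above the period spans several cores)
mib_to_bytes 512    # prints 536870912
```

The docker run --cpus=1 --memory=512m flags used later in this part ask Docker to apply the same kind of limits to the container's cgroup.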
The Four Compute Models
Modern infrastructure offers four distinct ways to run code, each with different trade-offs:
flowchart LR
BM[Bare Metal<br/>Full Control] --> VM[Virtual Machine<br/>OS-Level Isolation]
VM --> CT[Container<br/>Process Isolation]
CT --> SL[Serverless<br/>Function-Level]
Model 1: Bare Metal
Direct access to physical hardware. No hypervisor overhead. Maximum performance but maximum operational burden.
When to use: High-frequency trading, GPU clusters for ML training, databases requiring predictable latency, workloads needing hardware-specific features (SR-IOV, DPDK).
Model 2: Virtual Machines
Isolated OS environments on shared hardware. Full operating system per VM. The workhorse of cloud computing.
# Provision a VM on AWS
aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type c5.2xlarge \
--key-name my-key \
--subnet-id subnet-abc123 \
--security-group-ids sg-abc123
# Provision a VM on Azure
az vm create \
--resource-group my-rg \
--name my-vm \
--image Ubuntu2204 \
--size Standard_D4s_v3 \
--admin-username azureuser \
--generate-ssh-keys
Model 3: Containers
Application packages with dependencies, sharing the host kernel. Fast startup, high density, portable across environments.
# Build a container image
cat <<'EOF' > Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
USER node
CMD ["node", "server.js"]
EOF
docker build -t my-api:1.0 .
docker run -d -p 3000:3000 --memory=512m --cpus=1 my-api:1.0
Model 4: Serverless
Upload code, let the cloud handle everything else. Scales to zero when idle, scales to thousands when busy.
# Deploy a Lambda function
zip function.zip lambda_function.py
aws lambda create-function \
--function-name process-order \
--runtime python3.11 \
--handler lambda_function.handler \
--role arn:aws:iam::123456789012:role/lambda-exec \
--zip-file fileb://function.zip \
--memory-size 256 \
--timeout 30
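# Invoke it (AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload)
aws lambda invoke \
--function-name process-order \
--cli-binary-format raw-in-base64-out \
--payload '{"order_id": "12345"}' \
response.json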
cat response.json
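The create-function call above expects function.zip to contain lambda_function.py with a function named handler, which the walkthrough does not show. A minimal sketch of such a file, written with a heredoc like the Dockerfile earlier; the handler body and its response fields are assumptions for illustration, not the original function:

```bash
#!/usr/bin/env bash
# write_handler [DIR] (hypothetical helper): create a minimal lambda_function.py
# whose handler() matches --handler lambda_function.handler.
write_handler() {
  local dir="${1:-.}"
  cat <<'EOF' > "$dir/lambda_function.py"
import json


def handler(event, context):
    # event is the decoded JSON payload, e.g. {"order_id": "12345"}
    order_id = event.get("order_id")
    return {
        "statusCode": 200,
        "body": json.dumps({"order_id": order_id, "status": "processed"}),
    }
EOF
}

# Write into a scratch directory so nothing in the current directory is overwritten
workdir="$(mktemp -d)"
write_handler "$workdir"
echo "wrote $workdir/lambda_function.py"
```

Running `write_handler .` before the zip step would place the file where the zip command above looks for it.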
Exercises
Explore Your System’s Compute Resources
Run the following commands on a Linux system (or WSL on Windows) and document what you find about your CPU, memory, and processes:
# CPU info
lscpu | grep -E "^(Architecture|CPU|Thread|Core|Socket|Model name|NUMA)"
# Memory info
free -h
head -10 /proc/meminfo
# Disk read speed (replace /dev/sda with your device, e.g. /dev/nvme0n1; list devices with lsblk)
sudo hdparm -tT /dev/sda 2>/dev/null || echo "Try a write test instead: dd if=/dev/zero of=/tmp/test bs=1M count=1024 conv=fdatasync"
# Top 10 processes by memory
ps aux --sort=-%mem | head -10
Choose the Right Instance Type
For each workload, recommend the best AWS EC2 instance family and explain your reasoning:
- A web application serving 1000 requests/second with moderate CPU needs
- A PostgreSQL database with 500 GB of data and heavy random I/O
- A machine learning training job using 8 GPUs
- A batch processing job analysing 10 TB of log files
- An in-memory Redis cache storing 256 GB of session data
Hint: Instance families — t3 (burstable), m5 (general), c5 (compute), r5 (memory), i3 (storage), p4 (GPU)
Conclusion & Next Steps
You now understand the compute domain from silicon to serverless:
- Physical servers — CPUs, NUMA, memory hierarchy, disk performance
- Linux fundamentals — processes, resource management, cgroups, namespaces
- Four compute models — bare metal, VMs, containers, serverless
- Instance type selection — matching workload requirements to hardware capabilities
Next in the Series
In Part 4: Virtualization Deep Dive, we explore hypervisors in detail — how they work, how vCPUs and virtual memory are implemented, and how cloud providers build their compute platforms on virtualization.