Part 3: Servers & Compute

Introduction

Compute is the most fundamental infrastructure resource. Without it, nothing executes. Every application, database, queue, and orchestrator ultimately depends on a CPU executing instructions.

This part takes you from the physical hardware inside a server all the way to serverless functions. By the end, you will understand what happens underneath commands such as aws ec2 run-instances and docker run.

                            
                            Why This Matters: Even if you never touch physical hardware, understanding how CPUs, memory, and disk work helps you choose the right cloud instance types, debug performance issues, and write better infrastructure code. A t3.micro behaves very differently from a c5.4xlarge — and this part explains why.
                        

Physical Servers

A modern server is a carefully engineered machine optimised for reliability, density, and performance. Understanding its components helps you make better decisions about cloud instance types.

CPU Architecture

The CPU is the brain of any server. Modern server CPUs (Intel Xeon, AMD EPYC, AWS Graviton) differ from desktop processors mainly in scale: more cores, more memory channels, more PCIe lanes, and support for multiple sockets. Typical figures:

Feature	Desktop CPU	Server CPU
Core count	8–16 cores	32–128 cores
Memory channels	2 channels	8–12 channels
Max RAM	128 GB	2–6 TB
PCIe lanes	24–28	128+
ECC memory	Optional	Required
Socket support	Single	Dual/Quad

# Inspect CPU architecture on a Linux server
lscpu

# Example output (AWS c5.xlarge):
# Architecture:        x86_64
# CPU(s):              4
# Thread(s) per core:  2
# Core(s) per socket:  2
# Socket(s):           1
# NUMA node(s):        1
# Model name:          Intel(R) Xeon(R) Platinum 8275CL
# CPU MHz:             3000.000
# L1d cache:           32K
# L1i cache:           32K
# L2 cache:            1024K
# L3 cache:            36608K
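The fields in that output are related: logical CPUs = threads per core × cores per socket × sockets. The sketch below parses lscpu-style "Key: value" text from stdin and derives both counts. The function names are hypothetical helpers, and the field labels are assumed to match the x86 output shown above (other architectures may label them differently).

```shell
# physical_cores: read lscpu-style text on stdin, print cores x sockets.
# Hypothetical helper, not part of lscpu itself.
physical_cores() {
    awk -F: '
        /^Core\(s\) per socket/ { cores = $2 + 0 }
        /^Socket\(s\)/          { sockets = $2 + 0 }
        END { print cores * sockets }
    '
}

# logical_cpus: same input, print threads x cores x sockets.
logical_cpus() {
    awk -F: '
        /^Thread\(s\) per core/ { threads = $2 + 0 }
        /^Core\(s\) per socket/ { cores = $2 + 0 }
        /^Socket\(s\)/          { sockets = $2 + 0 }
        END { print threads * cores * sockets }
    '
}

# Demo with the c5.xlarge values from the example output: prints 2
physical_cores <<'EOF'
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
EOF

# On a live system:
#   lscpu | physical_cores
#   lscpu | logical_cpus
```

With the example values, the 4 CPUs reported by lscpu are 2 physical cores with 2 hardware threads each, which matters when sizing CPU-bound workloads.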

NUMA Topology

Non-Uniform Memory Access (NUMA) is a memory architecture where CPUs have faster access to “local” memory and slower access to “remote” memory on another socket.

NUMA Architecture (Dual-Socket Server)

flowchart LR
    subgraph Socket0["Socket 0 (NUMA Node 0)"]
        CPU0[CPU Cores 0-31]
        MEM0[Local RAM<br/>256 GB]
    end
    subgraph Socket1["Socket 1 (NUMA Node 1)"]
        CPU1[CPU Cores 32-63]
        MEM1[Local RAM<br/>256 GB]
    end
    CPU0 --- MEM0
    CPU1 --- MEM1
    Socket0 <-->|"Interconnect<br/>(slower)"| Socket1

                            
Performance Impact: Accessing remote NUMA memory can be 40–100% slower than local memory. Large cloud instances and bare-metal hosts can expose two or more NUMA nodes to the operating system, so check the topology with lscpu or numactl before tuning. Workloads sensitive to memory latency (databases, caches) benefit from being pinned to a single NUMA node, provided they fit within that node's cores and memory.
                        

# Check NUMA topology
numactl --hardware

# Example output:
# available: 2 nodes (0-1)
# node 0 cpus: 0 1 2 3 4 5 6 7
# node 0 size: 262144 MB
# node 1 cpus: 8 9 10 11 12 13 14 15
# node 1 size: 262144 MB
# node distances:
# node   0   1
#   0:  10  21
#   1:  21  10

# Pin a process to NUMA node 0
numactl --cpunodebind=0 --membind=0 ./my-database-server
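The "node distances" matrix can be read as a relative cost: each row lists the distance from one node to every node, with 10 meaning local. The sketch below turns one row into ratios. It assumes the row format Linux exposes in /sys/devices/system/node/nodeN/distance (space-separated integers); the helper name is hypothetical. These distances are firmware-reported hints, not measured latencies.

```shell
# numa_ratios NODE: read that node's distance row on stdin (e.g. "10 21")
# and print each remote distance relative to the local one.
numa_ratios() {
    awk -v n="$1" '{
        local_d = $(n + 1)                 # distance from the node to itself
        for (i = 1; i <= NF; i++)
            if (i != n + 1)
                printf "node%d->node%d: %.1fx\n", n, i - 1, $i / local_d
    }'
}

# Demo with the row for node 0 from the example output
echo "10 21" | numa_ratios 0    # prints node0->node1: 2.1x

# On a live system:
#   numa_ratios 0 < /sys/devices/system/node/node0/distance
```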

Memory Architecture

Server memory is organised in a hierarchy, from fastest (CPU registers) to slowest (disk):

Level	Size	Latency	Purpose
L1 Cache	32–64 KB per core	~1 ns	Hot data for current instruction
L2 Cache	256 KB–1 MB per core	~4 ns	Recent working set
L3 Cache	16–64 MB shared	~10 ns	Shared across cores
RAM (DDR5)	32 GB–6 TB	~80 ns	Active application data
NVMe SSD	1–30 TB	~100 μs	Persistent storage
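To get a feel for the gaps between levels, it helps to express each latency as a multiple of an L1 hit. A small sketch using the approximate figures from the table (the helper name is hypothetical, and real latencies vary by CPU and device):

```shell
# scale_to_first: read "name latency_ns" lines on stdin and print each
# latency as a multiple of the first line's latency.
scale_to_first() {
    awk 'NR == 1 { base = $2 } { printf "%s %dx\n", $1, $2 / base }'
}

printf '%s\n' "L1 1" "L2 4" "L3 10" "RAM 80" "NVMe 100000" | scale_to_first
# prints:
#   L1 1x
#   L2 4x
#   L3 10x
#   RAM 80x
#   NVMe 100000x
```

On these numbers, one trip to RAM costs as much as roughly 80 L1 hits, and one NVMe read as much as about 100,000, which is why cache-friendly access patterns and avoiding disk on the hot path matter so much.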

Disk Systems

Storage performance is measured in three dimensions:

IOPS — input/output operations per second (random read/write speed)
Throughput — MB/s (sequential read/write speed)
Latency — time for a single I/O operation to complete

# Benchmark disk performance with fio
fio --name=randread --ioengine=libaio --iodepth=32 \
    --rw=randread --bs=4k --direct=1 --size=1G \
    --numjobs=4 --runtime=60 --group_reporting

# Check current disk I/O stats
iostat -x 1 5
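IOPS and throughput are linked through the block size: throughput = IOPS × block size. The sketch below converts in both directions, which is useful when comparing a 4 KiB random-read result from fio against a volume's advertised MB/s limit. The helper names are hypothetical.

```shell
# iops_to_mibps IOPS BLOCK_KIB: throughput (MiB/s) implied by an IOPS figure
iops_to_mibps() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1024 }'
}

# mibps_to_iops MIBPS BLOCK_KIB: IOPS needed to sustain a given throughput
mibps_to_iops() {
    awk -v mibps="$1" -v bs="$2" 'BEGIN { printf "%d\n", mibps * 1024 / bs }'
}

iops_to_mibps 50000 4     # 4 KiB random reads: prints 195.3
mibps_to_iops 1000 128    # 128 KiB sequential reads: prints 8000
```

The same device can therefore look "fast" or "slow" depending on the block size of the workload, so always note the block size next to any IOPS figure.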

Linux Server Administration

Linux runs the large majority of cloud servers. Understanding Linux process management and resource controls is essential for infrastructure engineering.

Process Isolation

Every application running on a Linux server is a process — an isolated instance of a running program with its own virtual memory space.

# View running processes with resource usage
ps aux --sort=-%mem | head -20

# Real-time process monitoring
top -bn1 | head -30

# Process tree (parent-child relationships)
pstree -p | head -30

# Check open file descriptors for a process
ls -la /proc/$(pgrep nginx | head -1)/fd/ 2>/dev/null | head -20
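Much of what ps and top display comes from the /proc filesystem, where each process has a directory of plain-text files. As a rough sketch, the helper below (a hypothetical name) extracts a single field from /proc/<pid>/status-style text, assuming the usual "Key:<tab>value" layout.

```shell
# status_field FIELD: read /proc/<pid>/status-style text on stdin and
# print the value of FIELD (for example Name, State, VmRSS, Threads).
status_field() {
    awk -F':[ \t]*' -v f="$1" '$1 == f { print $2 }'
}

# Usage on a live Linux system ($$ is the current shell's PID):
#   status_field VmRSS   < /proc/$$/status    # resident memory
#   status_field Threads < /proc/$$/status    # thread count
```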

Resource Management

Linux provides several mechanisms to control how resources are allocated to processes:

# Set CPU affinity (pin process to specific cores)
taskset -c 0,1 ./my-application

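# Set process priority (nice value: -20 is highest priority, 19 is lowest)
# Negative nice values require root
sudo nice -n -10 ./high-priority-process

# Limit virtual memory for this shell and its children (ulimit -v takes KiB)
ulimit -v 4194304  # 4194304 KiB = 4 GiB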

# View system resource limits
ulimit -a
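Because ulimit changes the limits of the current shell, a common pattern is to apply it inside a subshell so that only one command is affected. A minimal sketch, assuming a shell whose ulimit supports -v (bash and dash do); both helper names are hypothetical.

```shell
# gib_to_kib GIB: ulimit -v expects its value in KiB
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# run_limited GIB CMD [ARGS...]: run CMD with a virtual-memory cap.
# The parentheses start a subshell, so the calling shell keeps its limits.
run_limited() {
    limit_kib=$(gib_to_kib "$1")
    shift
    ( ulimit -v "$limit_kib" && exec "$@" )
}

gib_to_kib 4    # prints 4194304

# Usage:
#   run_limited 4 ./my-application
```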

cgroups & Namespaces: The Foundation of Containers

Two Linux kernel features make containers possible:

                            
                            cgroups (Control Groups) limit and account for resource usage (CPU, memory, disk I/O, network) for groups of processes. Namespaces isolate what processes can see (PIDs, network interfaces, mount points, hostnames). Together, they create lightweight isolation without a full VM.
                        

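# These paths use the cgroup v1 layout (one hierarchy per controller)

# Create a cgroup that limits CPU to 50% of one core
sudo mkdir -p /sys/fs/cgroup/cpu/my-app
echo 50000 | sudo tee /sys/fs/cgroup/cpu/my-app/cpu.cfs_quota_us
echo 100000 | sudo tee /sys/fs/cgroup/cpu/my-app/cpu.cfs_period_us

# Create a cgroup that limits memory to 512 MiB
sudo mkdir -p /sys/fs/cgroup/memory/my-app
echo 536870912 | sudo tee /sys/fs/cgroup/memory/my-app/memory.limit_in_bytes

# Limits only apply to member processes; this adds the current shell
echo $$ | sudo tee /sys/fs/cgroup/cpu/my-app/cgroup.procs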

# Run a shell in new PID, network, and mount namespaces
# --mount-proc remounts /proc so ps shows only processes in the new PID namespace
sudo unshare --pid --net --mount --fork --mount-proc /bin/bash
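The cgroup paths above follow the cgroup v1 layout. Many current distributions mount the unified cgroup v2 hierarchy instead, where the same limits are written to cpu.max (as "quota period") and memory.max (in bytes). The sketch below builds those values; the helper names and the "my-app" group are placeholders, and the commented commands assume the cpu and memory controllers are enabled in the parent's cgroup.subtree_control.

```shell
# cpu_max PERCENT [PERIOD_US]: build the "<quota> <period>" line for cpu.max.
# PERCENT is a percentage of one CPU; the period defaults to 100000 us.
cpu_max() {
    period="${2:-100000}"
    echo "$(( $1 * period / 100 )) $period"
}

# mib_to_bytes MIB: memory.max takes a byte count
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

cpu_max 50          # prints 50000 100000
mib_to_bytes 512    # prints 536870912

# Applying the values on a cgroup v2 host (requires root):
#   stat -fc %T /sys/fs/cgroup     # reports cgroup2fs when v2 is mounted
#   sudo mkdir -p /sys/fs/cgroup/my-app
#   cpu_max 50       | sudo tee /sys/fs/cgroup/my-app/cpu.max
#   mib_to_bytes 512 | sudo tee /sys/fs/cgroup/my-app/memory.max
#   echo $$          | sudo tee /sys/fs/cgroup/my-app/cgroup.procs
```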

The Four Compute Models

Modern infrastructure offers four distinct ways to run code, each with different trade-offs:

Compute Models — Abstraction vs Control

flowchart LR
    BM[Bare Metal<br/>Full Control] --> VM[Virtual Machine<br/>OS-Level Isolation]
    VM --> CT[Container<br/>Process Isolation]
    CT --> SL[Serverless<br/>Function-Level]

Model 1: Bare Metal

Direct access to physical hardware. No hypervisor overhead. Maximum performance but maximum operational burden.

When to use: High-frequency trading, GPU clusters for ML training, databases requiring predictable latency, workloads needing hardware-specific features (SR-IOV, DPDK).

Model 2: Virtual Machines

Isolated OS environments on shared hardware. Full operating system per VM. The workhorse of cloud computing.

# Provision a VM on AWS
# (the AMI, key, subnet, and security group values are placeholders; AMI IDs are region-specific)
aws ec2 run-instances \
    --image-id ami-0c55b159cbfafe1f0 \
    --instance-type c5.2xlarge \
    --key-name my-key \
    --subnet-id subnet-abc123 \
    --security-group-ids sg-abc123

# Provision a VM on Azure
az vm create \
    --resource-group my-rg \
    --name my-vm \
    --image Ubuntu2204 \
    --size Standard_D4s_v3 \
    --admin-username azureuser \
    --generate-ssh-keys

Model 3: Containers

Application packages with dependencies, sharing the host kernel. Fast startup, high density, portable across environments.

# Build a container image
cat <<'EOF' > Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
USER node
CMD ["node", "server.js"]
EOF

docker build -t my-api:1.0 .
docker run -d -p 3000:3000 --memory=512m --cpus=1 my-api:1.0
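The --memory and --cpus flags on docker run are enforced through the cgroup mechanisms covered earlier. Docker's documentation describes --cpus=N as equivalent to a CFS quota of N × 100000 µs per 100000 µs period. The sketch below does that arithmetic; the helper name is hypothetical, and the docker inspect line is shown only as a way to check what a running container was given.

```shell
# cpus_to_quota CPUS: CFS quota in microseconds (per 100000 us period)
# corresponding to docker run --cpus=CPUS
cpus_to_quota() {
    awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 100000 }'
}

cpus_to_quota 1      # prints 100000
cpus_to_quota 1.5    # prints 150000

# Inspect the limits recorded for a running container
# (NanoCpus is in billionths of a CPU, Memory is in bytes):
#   docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' <container>
```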

Model 4: Serverless

Upload code, let the cloud handle everything else. Scales to zero when idle, scales to thousands when busy.

# Deploy a Lambda function
zip function.zip lambda_function.py

aws lambda create-function \
    --function-name process-order \
    --runtime python3.11 \
    --handler lambda_function.handler \
    --role arn:aws:iam::123456789012:role/lambda-exec \
    --zip-file fileb://function.zip \
    --memory-size 256 \
    --timeout 30

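# Invoke it (AWS CLI v2 treats --payload as base64 unless told otherwise)
aws lambda invoke \
    --function-name process-order \
    --cli-binary-format raw-in-base64-out \
    --payload '{"order_id": "12345"}' \
    response.json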

cat response.json
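The zip step above assumes a file named lambda_function.py that defines a function called handler, matching --handler lambda_function.handler. The original does not show that file, so the body below is purely illustrative: a minimal handler that echoes the order ID back. It is written with a heredoc, in the same way as the Dockerfile earlier, and must exist before the zip command is run.

```shell
# Write a minimal, illustrative handler (not the author's implementation)
cat <<'EOF' > lambda_function.py
import json


def handler(event, context):
    # event is the decoded JSON payload, e.g. {"order_id": "12345"}
    order_id = event.get("order_id")
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": order_id}),
    }
EOF
```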

Exercises

Exercise 1 Server Exploration

Explore Your System’s Compute Resources

Run the following commands on a Linux system (or WSL on Windows) and document what you find about your CPU, memory, and processes:

# CPU info
lscpu | grep -E "^(Architecture|CPU|Thread|Core|Socket|Model name|NUMA)"

# Memory info
free -h
head -10 /proc/meminfo

# Disk read speed (replace /dev/sda with your device, e.g. /dev/nvme0n1; list devices with lsblk)
sudo hdparm -tT /dev/sda 2>/dev/null || echo "Try a write test instead: dd if=/dev/zero of=/tmp/test bs=1M count=1024"

# Top 10 processes by memory
ps aux --sort=-%mem | head -10

Exercise 2 Instance Type Selection

Choose the Right Instance Type

For each workload, recommend the best AWS EC2 instance family and explain your reasoning:

A web application serving 1000 requests/second with moderate CPU needs
A PostgreSQL database with 500 GB of data and heavy random I/O
A machine learning training job using 8 GPUs
A batch processing job analysing 10 TB of log files
An in-memory Redis cache storing 256 GB of session data

Hint: Instance families — t3 (burstable), m5 (general), c5 (compute), r5 (memory), i3 (storage), p4 (GPU)

Conclusion & Next Steps

You now understand the compute domain from silicon to serverless:

Physical servers — CPUs, NUMA, memory hierarchy, disk performance
Linux fundamentals — processes, resource management, cgroups, namespaces
Four compute models — bare metal, VMs, containers, serverless
Instance type selection — matching workload requirements to hardware capabilities

Next in the Series

In Part 4: Virtualization Deep Dive, we explore hypervisors in detail — how they work, how vCPUs and virtual memory are implemented, and how cloud providers build their compute platforms on virtualization.
