Infrastructure & Cloud Automation Series

Part 3: Servers & Compute

May 14, 2026 Wasil Zafar 45 min read

Deep dive into the compute domain — CPU architectures, NUMA topology, memory systems, Linux server administration, and the four compute models powering modern infrastructure.

Table of Contents

  1. Introduction
  2. Physical Servers
  3. Linux Server Administration
  4. The Four Compute Models
  5. Exercises
  6. Conclusion & Next Steps

Introduction

Compute is the most fundamental infrastructure resource. Without it, nothing executes. Every application, database, queue, and orchestrator ultimately depends on a CPU executing instructions.

This part takes you from the hardware inside a physical server all the way to serverless functions. By the end, you will understand what the CPU, memory, and Linux kernel are doing underneath commands like aws ec2 run-instances or docker run.

Why This Matters: Even if you never touch physical hardware, understanding how CPUs, memory, and disk work helps you choose the right cloud instance types, debug performance issues, and write better infrastructure code. A t3.micro behaves very differently from a c5.4xlarge — and this part explains why.

Physical Servers

A modern server is a carefully engineered machine optimised for reliability, density, and performance. Understanding its components helps you make better decisions about cloud instance types.

CPU Architecture

The CPU is the brain of any server. Modern server CPUs (Intel Xeon, AMD EPYC, AWS Graviton) are built for throughput and reliability rather than desktop workloads. The difference shows up in core counts, memory bandwidth, I/O capacity, and error correction:

| Feature | Desktop CPU | Server CPU |
|---|---|---|
| Core count | 8–16 cores | 32–128 cores |
| Memory channels | 2 channels | 8–12 channels |
| Max RAM | 128 GB | 2–6 TB |
| PCIe lanes | 24–28 | 128+ |
| ECC memory | Optional | Required |
| Socket support | Single | Dual/Quad |

# Inspect CPU architecture on a Linux server
lscpu

# Example output (AWS c5.xlarge):
# Architecture:        x86_64
# CPU(s):              4
# Thread(s) per core:  2
# Core(s) per socket:  2
# Socket(s):           1
# NUMA node(s):        1
# Model name:          Intel(R) Xeon(R) Platinum 8275CL
# CPU MHz:             3000.000
# L1d cache:           32K
# L1i cache:           32K
# L2 cache:            1024K
# L3 cache:            36608K
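The topology fields in this output multiply out: CPU(s) = Socket(s) × Core(s) per socket × Thread(s) per core. On x86 instances like this one each vCPU is a hardware thread, so the 4 vCPUs are 2 physical cores with two threads each (Graviton CPUs have no SMT, so there a vCPU is a full core). A small sketch, assuming the `Field: value` layout shown above (field names and indentation vary slightly between lscpu versions), that derives both counts; the function name `cpu_topology` is illustrative:

```shell
#!/usr/bin/env bash
# Derive physical cores and logical CPUs from lscpu-style "Field: value" output.
# Assumes the field names shown in the example above; leading indentation is tolerated.

cpu_topology() {
  awk -F: '
    /^[ \t]*Thread\(s\) per core/ { threads = $2 + 0 }
    /^[ \t]*Core\(s\) per socket/ { cores   = $2 + 0 }
    /^[ \t]*Socket\(s\)/          { sockets = $2 + 0 }
    END {
      printf "physical_cores=%d logical_cpus=%d\n", cores * sockets, cores * sockets * threads
    }'
}

# On a live system:  lscpu | cpu_topology
# On the AWS c5.xlarge example above:
printf 'Thread(s) per core:  2\nCore(s) per socket:  2\nSocket(s):           1\n' | cpu_topology
# -> physical_cores=2 logical_cpus=4
```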

NUMA Topology

Non-Uniform Memory Access (NUMA) is a memory architecture where CPUs have faster access to “local” memory and slower access to “remote” memory on another socket.

NUMA Architecture (Dual-Socket Server)

flowchart LR
    subgraph Socket0["Socket 0: NUMA Node 0"]
        CPU0["CPU Cores 0-31"]
        MEM0["Local RAM<br/>256 GB"]
    end
    subgraph Socket1["Socket 1: NUMA Node 1"]
        CPU1["CPU Cores 32-63"]
        MEM1["Local RAM<br/>256 GB"]
    end
    CPU0 --- MEM0
    CPU1 --- MEM1
    Socket0 <-->|"Interconnect<br/>(slower)"| Socket1

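Performance Impact: Accessing remote NUMA memory can be 40–100% slower than accessing local memory. In the cloud, large instance sizes that span both sockets of a host typically expose two NUMA nodes to the guest, while smaller sizes fit within one (the c5.xlarge above reports NUMA node(s): 1). Workloads sensitive to memory latency, such as databases and caches, usually benefit from keeping both their threads and their memory on a single NUMA node.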
# Check NUMA topology
numactl --hardware

# Example output:
# available: 2 nodes (0-1)
# node 0 cpus: 0 1 2 3 4 5 6 7
# node 0 size: 262144 MB
# node 1 cpus: 8 9 10 11 12 13 14 15
# node 1 size: 262144 MB
# node distances:
# node   0   1
#   0:  10  21
#   1:  21  10

# Pin a process to NUMA node 0
numactl --cpunodebind=0 --membind=0 ./my-database-server
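The node distances matrix is reported by the firmware (the ACPI SLIT table on most x86 servers) and normalised so that a node's distance to itself is 10. A remote distance of 21 therefore means roughly 2.1 times the local access cost; these are relative estimates from firmware, not measured latencies. A sketch that reads the same matrix from sysfs (`/sys/devices/system/node/nodeN/distance`, present on most Linux systems with NUMA support) and prints each row as ratios; the function names are illustrative:

```shell
#!/usr/bin/env bash
# Show NUMA distances as multiples of each node's local distance.
# Source: /sys/devices/system/node/nodeN/distance (one matrix row per node).

distance_ratios() {
  # stdin lines: "<node> <distance to node 0> <distance to node 1> ..."
  awk '{
    self = $($1 + 2)                      # distance from the node to itself
    line = "node " $1 ":"
    for (i = 2; i <= NF; i++) line = line sprintf(" %.2f", $i / self)
    print line
  }'
}

numa_rows() {
  local f n
  for f in /sys/devices/system/node/node*/distance; do
    [ -r "$f" ] || return 1               # glob did not match: no NUMA info
    n=${f%/distance}; n=${n##*node}       # ".../node1/distance" -> "1"
    echo "$n $(cat "$f")"
  done
}

if rows=$(numa_rows); then
  printf '%s\n' "$rows" | distance_ratios
else
  echo "no NUMA distance information in sysfs"
fi
```

On the dual-node machine from the numactl output above, this would print `node 0: 1.00 2.10` and `node 1: 2.10 1.00`.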

Memory Architecture

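Server memory and storage form a hierarchy, from the fastest and smallest level (L1 cache, next to each core) to the slowest and largest (persistent storage). Each level trades speed for capacity: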

| Level | Size | Latency | Purpose |
|---|---|---|---|
| L1 Cache | 32–64 KB per core | ~1 ns | Hot data for current instruction |
| L2 Cache | 256 KB–1 MB per core | ~4 ns | Recent working set |
| L3 Cache | 16–64 MB shared | ~10 ns | Shared across cores |
| RAM (DDR5) | 32 GB–6 TB | ~80 ns | Active application data |
| NVMe SSD | 1–30 TB | ~100 μs | Persistent storage |
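The latencies in the table combine through cache hit rates. A common back-of-the-envelope model is the average memory access time, AMAT = hit time + miss rate × miss penalty, applied level by level from L1 down to RAM. The sketch below uses the table's latencies with hypothetical local miss rates (5% at L1, 20% at L2, 30% at L3) chosen purely for illustration; `amat` is an illustrative function name:

```shell
#!/usr/bin/env bash
# Average memory access time (AMAT) across three cache levels, in nanoseconds.
# Latencies come from the table above; the miss rates are hypothetical inputs.

amat() {
  # args: l1_miss l2_miss l3_miss  (local miss rates as fractions, e.g. 0.05)
  awk -v m1="$1" -v m2="$2" -v m3="$3" 'BEGIN {
    l1 = 1; l2 = 4; l3 = 10; ram = 80     # ns, from the table
    l3_eff = l3 + m3 * ram                # average cost of an access that reaches L3
    l2_eff = l2 + m2 * l3_eff             # average cost of an access that reaches L2
    printf "%.2f\n", l1 + m1 * l2_eff     # every access starts at L1
  }'
}

amat 0.05 0.20 0.30   # -> 1.54
amat 0.05 0.20 1.00   # every L3 lookup misses and goes to RAM -> 2.10
```

With a 95% L1 hit rate the average stays close to L1 speed; the same model shows how quickly it degrades once a working set outgrows the caches.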

Disk Systems

Storage performance is measured in three dimensions:

  • IOPS — input/output operations per second (random read/write speed)
  • Throughput — MB/s (sequential read/write speed)
  • Latency — time for a single I/O operation to complete
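# Benchmark 4 KiB random-read IOPS with fio (each of the 4 jobs creates a 1 GiB
# test file in the current directory; --direct=1 bypasses the page cache)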
fio --name=randread --ioengine=libaio --iodepth=32 \
    --rw=randread --bs=4k --direct=1 --size=1G \
    --numjobs=4 --runtime=60 --group_reporting

# Extended per-device I/O stats every second, 5 times (the first report covers the time since boot)
iostat -x 1 5
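The three metrics are linked. By Little's law, sustained IOPS ≈ queue depth ÷ per-I/O latency, and throughput = IOPS × block size. That is why fio's --iodepth and --bs settings change the results so much on the same device. A sketch of the arithmetic; the inputs are illustrative, not measurements, and `disk_math` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Relate queue depth, latency, IOPS and throughput (Little's law: L = lambda * W).
# Inputs are illustrative; plug in the iodepth, bs and latency fio reports for your device.

disk_math() {
  # args: queue_depth latency_us block_size_kib
  awk -v qd="$1" -v lat_us="$2" -v bs_kib="$3" 'BEGIN {
    iops = qd / (lat_us / 1000000)      # operations completed per second
    mibps = iops * bs_kib / 1024        # MiB/s moved at that rate
    printf "IOPS=%.0f throughput=%.1f MiB/s\n", iops, mibps
  }'
}

disk_math 32 100 4     # 32 x 4 KiB reads in flight, 100 us each -> IOPS=320000 throughput=1250.0 MiB/s
disk_math 1 100 4      # same latency, one I/O in flight          -> IOPS=10000 throughput=39.1 MiB/s
disk_math 32 100 128   # 128 KiB blocks, if latency held at 100 us (real devices slow down)
```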

Linux Server Administration

Linux runs the large majority of cloud servers, so understanding Linux process management and resource controls is essential for infrastructure engineering.

Process Isolation

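Every application on a Linux server runs as one or more processes. A process is a running instance of a program with its own virtual address space, so it cannot read or overwrite another process's memory unless that memory is explicitly shared.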

# View running processes with resource usage
ps aux --sort=-%mem | head -20

# Real-time process monitoring
top -bn1 | head -30

# Process tree (parent-child relationships)
pstree -p | head -30

# Check open file descriptors for a process (the oldest nginx process, usually the master;
# reading another user's /proc/<pid>/fd requires root)
sudo ls -l "/proc/$(pgrep -o nginx)/fd/" | head -20
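Most of what ps and top display is read from /proc. A small sketch that prints a few fields from /proc/<pid>/status for a process you are allowed to inspect, defaulting to the current shell. VmSize is the size of the process's virtual address space and VmRSS the part currently resident in RAM; the field names are the kernel's, while the `proc_summary` function is illustrative:

```shell
#!/usr/bin/env bash
# Summarise a process from /proc/<pid>/status (defaults to the current shell).

proc_summary() {
  local pid=${1:-$$}
  local status="/proc/$pid/status"
  [ -r "$status" ] || { echo "cannot read $status" >&2; return 1; }
  # Name/Pid/PPid identify the process; VmSize is virtual size, VmRSS is resident memory
  grep -E '^(Name|Pid|PPid|Threads|VmSize|VmRSS):' "$status"
}

proc_summary                 # the shell running this script
# e.g. proc_summary "$(pgrep -o nginx)" for the oldest nginx process
```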

Resource Management

Linux provides several mechanisms to control how resources are allocated to processes:

# Set CPU affinity (pin process to specific cores)
taskset -c 0,1 ./my-application

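# Set process priority (nice value: -20 is highest priority, 19 lowest; going below 0 needs root)
sudo nice -n -10 ./high-priority-process

# Limit virtual memory to 4 GiB (ulimit -v counts KiB and affects the current shell and
# its children, so set it in a subshell to scope it to one program)
(ulimit -v 4194304 && ./my-application)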

# View system resource limits
ulimit -a
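Both nice values and ulimit settings are inherited by child processes, which is what makes them useful as launch wrappers. A sketch that checks this from inside a child, using only unprivileged changes (raising niceness and lowering a limit); `show_child_limits` is an illustrative name:

```shell
#!/usr/bin/env bash
# Check that nice and ulimit settings reach a child process.
# Only unprivileged changes are used: raising niceness and lowering a limit.

show_child_limits() {
  # The child runs at +5 niceness (the kernel caps it at 19) with a 4 GiB
  # virtual-memory limit (ulimit -v is in KiB), then reports what it sees
  nice -n 5 bash -c 'ulimit -v 4194304 && echo "niceness=$(nice) vmem_kib=$(ulimit -v)"'
}

echo "parent: niceness=$(nice) vmem_kib=$(ulimit -v)"
echo "child:  $(show_child_limits)"
```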

cgroups & Namespaces: The Foundation of Containers

Two Linux kernel features make containers possible:

cgroups (control groups) limit and account for the resources a group of processes can use: CPU time, memory, block I/O, and the number of processes. Namespaces isolate what processes can see: PIDs, network interfaces, mount points, hostnames. Together they create lightweight isolation without a full VM; at its core, a container is an ordinary process running inside its own cgroup and namespaces.
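# cgroup v1 layout (older distributions). On cgroup v2 hosts the equivalents are
# cpu.max ("50000 100000") and memory.max in a single /sys/fs/cgroup/my-app/ directory.

# Create a cgroup that limits CPU to 50% of one core (50 ms of every 100 ms period)
sudo mkdir -p /sys/fs/cgroup/cpu/my-app
echo 50000 | sudo tee /sys/fs/cgroup/cpu/my-app/cpu.cfs_quota_us
echo 100000 | sudo tee /sys/fs/cgroup/cpu/my-app/cpu.cfs_period_us

# Create a cgroup that limits memory to 512 MiB
sudo mkdir -p /sys/fs/cgroup/memory/my-app
echo 536870912 | sudo tee /sys/fs/cgroup/memory/my-app/memory.limit_in_bytes

# Move the current shell into both groups; the limits then apply to it and its children
echo $$ | sudo tee /sys/fs/cgroup/cpu/my-app/cgroup.procs /sys/fs/cgroup/memory/my-app/cgroup.procs

# Run a shell in new PID, network, and mount namespaces;
# --mount-proc remounts /proc so ps only sees processes in the new PID namespace
sudo unshare --pid --net --mount --fork --mount-proc /bin/bash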
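The cgroup commands above use the v1 layout, while many current distributions mount the unified cgroup v2 hierarchy instead. A read-only sketch (no root needed) that reports which layout a host uses, which cgroup the current shell belongs to, and the namespace IDs that a command like unshare would replace; `cgroup_version` is an illustrative helper:

```shell
#!/usr/bin/env bash
# Read-only inspection, no root needed: cgroup layout, current cgroup, and namespace IDs.

cgroup_version() {
  # GNU stat reports the filesystem type mounted at /sys/fs/cgroup
  case "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" in
    cgroup2fs) echo v2 ;;        # unified hierarchy: cpu.max, memory.max, ...
    tmpfs)     echo v1 ;;        # per-controller hierarchies (or systemd hybrid mode)
    *)         echo unknown ;;
  esac
}

echo "cgroup layout: $(cgroup_version)"

# Which cgroup this shell is in (cat inherits it); v2 prints a single "0::/path" line
cat /proc/self/cgroup

# Namespace IDs: two processes share a namespace exactly when these inode numbers match
for ns in pid net mnt uts; do
  echo "$ns -> $(readlink "/proc/$$/ns/$ns")"
done
```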

The Four Compute Models

Modern infrastructure offers four distinct ways to run code, each with different trade-offs:

Compute Models: Abstraction vs Control

flowchart LR
    BM["Bare Metal<br/>Full Control"] --> VM["Virtual Machine<br/>OS-Level Isolation"]
    VM --> CT["Container<br/>Process Isolation"]
    CT --> SL["Serverless<br/>Function-Level"]

Model 1: Bare Metal

Direct access to physical hardware. No hypervisor overhead. Maximum performance but maximum operational burden.

When to use: High-frequency trading, GPU clusters for ML training, databases requiring predictable latency, workloads needing hardware-specific features (SR-IOV, DPDK).
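To check whether a Linux host is bare metal or a guest, systemd-detect-virt (shipped with systemd) prints none on bare metal and the detected technology otherwise, for example kvm or docker. A sketch with a fallback for hosts without systemd that looks for the x86 hypervisor CPU flag; that flag does not exist on ARM, so the fallback is x86-only, and `detect_platform` is an illustrative name:

```shell
#!/usr/bin/env bash
# Rough check: is this Linux host bare metal, a VM, or a container?

detect_platform() {
  local result=""
  if command -v systemd-detect-virt >/dev/null 2>&1; then
    # Prints "none" on bare metal (and exits non-zero), otherwise e.g. "kvm" or "docker"
    result=$(systemd-detect-virt 2>/dev/null) || true
  fi
  if [ -n "$result" ]; then
    echo "$result"
  elif grep -qw hypervisor /proc/cpuinfo 2>/dev/null; then
    # x86 only: the kernel exposes a "hypervisor" CPU flag inside a VM
    echo "virtualised (hypervisor CPU flag present)"
  else
    echo "no hypervisor flag (bare metal, non-x86, or undetectable)"
  fi
}

echo "platform: $(detect_platform)"
```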

Model 2: Virtual Machines

Isolated OS environments on shared hardware. Full operating system per VM. The workhorse of cloud computing.

# Provision a VM on AWS (AMI, key, subnet, and security-group IDs are placeholders; AMI IDs are region-specific)
aws ec2 run-instances \
    --image-id ami-0c55b159cbfafe1f0 \
    --instance-type c5.2xlarge \
    --key-name my-key \
    --subnet-id subnet-abc123 \
    --security-group-ids sg-abc123

# Provision a VM on Azure
az vm create \
    --resource-group my-rg \
    --name my-vm \
    --image Ubuntu2204 \
    --size Standard_D4s_v3 \
    --admin-username azureuser \
    --generate-ssh-keys

Model 3: Containers

Application packages with dependencies, sharing the host kernel. Fast startup, high density, portable across environments.

# Build a container image
cat <<'EOF' > Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
USER node
CMD ["node", "server.js"]
EOF

docker build -t my-api:1.0 .
docker run -d -p 3000:3000 --memory=512m --cpus=1 my-api:1.0
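docker run's --cpus and --memory flags are front ends to the cgroup controls described earlier: Docker documents --cpus=1.5 as equivalent to a 150000 µs quota per 100000 µs period. A sketch of that translation into cgroup v2 terms (cpu.max and memory.max), assuming Docker's default 100 ms period and 1024-based k/m/g suffixes; the helper names are illustrative:

```shell
#!/usr/bin/env bash
# Translate docker run resource flags into the cgroup v2 values they correspond to.
# Assumes Docker's default 100000 us CFS period and 1024-based k/m/g suffixes.

cpus_to_cpu_max() {
  # --cpus=N -> cpu.max "<quota_us> <period_us>", quota = N * period
  awk -v cpus="$1" -v period=100000 'BEGIN { printf "%d %d\n", cpus * period, period }'
}

mem_to_bytes() {
  # --memory=512m -> memory.max in bytes; a bare number is taken as bytes
  local value=${1%[kKmMgG]} unit=${1: -1}
  case "$unit" in
    k|K) echo $(( value * 1024 )) ;;
    m|M) echo $(( value * 1024 * 1024 )) ;;
    g|G) echo $(( value * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

printf '%s\n' "--cpus=1      -> cpu.max    $(cpus_to_cpu_max 1)"
printf '%s\n' "--cpus=0.5    -> cpu.max    $(cpus_to_cpu_max 0.5)"
printf '%s\n' "--memory=512m -> memory.max $(mem_to_bytes 512m)"

# To see what Docker recorded for a running container (needs Docker):
#   docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' <container-id>
```

--cpus=0.5 produces the same 50000/100000 split as the hand-built 50% cgroup, and 512m matches its 536870912-byte memory limit.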

Model 4: Serverless

Upload code, let the cloud handle everything else. Scales to zero when idle, scales to thousands when busy.
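The deployment steps that follow zip a lambda_function.py whose contents this part does not show, and --handler lambda_function.handler tells Lambda to call a function named handler in that file. A minimal hypothetical version, written with the same heredoc pattern as the Dockerfile and smoke-tested locally before zipping; the echo-back logic and the response shape are illustrative assumptions, not part of the original example:

```shell
#!/usr/bin/env bash
# Hypothetical handler for the process-order function: validates and echoes the order ID.
cat <<'EOF' > lambda_function.py
import json


def handler(event, context):
    """Entry point matching --handler lambda_function.handler."""
    order_id = event.get("order_id")
    if order_id is None:
        return {"statusCode": 400, "body": json.dumps({"error": "order_id is required"})}
    return {"statusCode": 200, "body": json.dumps({"order_id": order_id, "status": "processed"})}
EOF

# Local smoke test with the same payload used by the invoke step (context is unused, so None)
python3 -c 'import json, lambda_function; print(json.dumps(lambda_function.handler({"order_id": "12345"}, None)))'
```

Lambda passes the invoke payload to the handler as the event dict, so the local call mirrors what the invoke step sends.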

# Deploy a Lambda function
zip function.zip lambda_function.py

aws lambda create-function \
    --function-name process-order \
    --runtime python3.11 \
    --handler lambda_function.handler \
    --role arn:aws:iam::123456789012:role/lambda-exec \
    --zip-file fileb://function.zip \
    --memory-size 256 \
    --timeout 30

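# Invoke it (AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload)
aws lambda invoke \
    --function-name process-order \
    --cli-binary-format raw-in-base64-out \
    --payload '{"order_id": "12345"}' \
    response.json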

cat response.json

Exercises

Exercise 1 Server Exploration
Explore Your System’s Compute Resources

Run the following commands on a Linux system (or WSL on Windows) and document what you find about your CPU, memory, and processes:

# CPU info
lscpu | grep -E "^(Architecture|CPU|Thread|Core|Socket|Model name|NUMA)"

# Memory info
free -h
head -n 10 /proc/meminfo

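# Disk read speed (cloud VMs often name the disk /dev/nvme0n1 or /dev/xvda; check with lsblk)
lsblk -d -o NAME,SIZE,ROTA,MODEL
sudo hdparm -tT /dev/sda 2>/dev/null || echo "hdparm failed; for a rough write test try: dd if=/dev/zero of=/tmp/test bs=1M count=1024 conv=fdatasync && rm /tmp/test"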

# Top 10 processes by memory
ps aux --sort=-%mem | head -10
Exercise 2 Instance Type Selection
Choose the Right Instance Type

For each workload, recommend the best AWS EC2 instance family and explain your reasoning:

  1. A web application serving 1000 requests/second with moderate CPU needs
  2. A PostgreSQL database with 500 GB of data and heavy random I/O
  3. A machine learning training job using 8 GPUs
  4. A batch processing job analysing 10 TB of log files
  5. An in-memory Redis cache storing 256 GB of session data

Hint: Instance families — t3 (burstable), m5 (general), c5 (compute), r5 (memory), i3 (storage), p4 (GPU)
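The family names in the hint are the first part of a pattern most EC2 instance type names follow: family letters, a generation number, optional attribute letters (for example g for Graviton, d for local NVMe storage, n for enhanced networking), then a size after the dot. A rough sketch that splits a name into those parts; it is a regex heuristic that will not handle every special case (such as hyphenated high-memory types), and `parse_instance_type` is an illustrative name:

```shell
#!/usr/bin/env bash
# Split an EC2 instance type like "c5.2xlarge" into family, generation, attributes, size.
# Heuristic: assumes the common <letters><digits><letters>.<size> shape.

parse_instance_type() {
  local re='^([a-z]+)([0-9]+)([a-z]*)\.([a-z0-9]+)$'
  if [[ $1 =~ $re ]]; then
    printf 'family=%s generation=%s attributes=%s size=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]:--}" "${BASH_REMATCH[4]}"
  else
    echo "unrecognised: $1" >&2
    return 1
  fi
}

for t in t3.micro c5.4xlarge r5.large m6gd.2xlarge p4d.24xlarge; do
  parse_instance_type "$t"
done
```

For the exercise, the family letters carry most of the signal: t, m, c, r, i, and p correspond to the burstable, general, compute, memory, storage, and GPU families in the hint.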


Conclusion & Next Steps

You now understand the compute domain from silicon to serverless:

  • Physical servers — CPUs, NUMA, memory hierarchy, disk performance
  • Linux fundamentals — processes, resource management, cgroups, namespaces
  • Four compute models — bare metal, VMs, containers, serverless
  • Instance type selection — matching workload requirements to hardware capabilities

Next in the Series

In Part 4: Virtualization Deep Dive, we explore hypervisors in detail — how they work, how vCPUs and virtual memory are implemented, and how cloud providers build their compute platforms on virtualization.