Introduction
The operating system kernel is the core of any OS—it manages hardware resources, provides services to applications, and enforces security boundaries. Understanding kernel architecture is essential for systems programming and performance optimization.
Series Context: This is Part 8 of 24 in the Computer Architecture & Operating Systems Mastery series. Having covered CPU execution and pipelining, we now transition to operating system fundamentals.
What is the Kernel?
The kernel is the core component of an operating system—the software that runs at the highest privilege level and has direct access to hardware. It's the bridge between applications and hardware.
OS Layer Model
Operating System Architecture:
══════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────┐
│ User Applications │
│ (Browser, Editor, Games, etc.) │
├─────────────────────────────────────────────────────────────┤
│ System Libraries │
│ (libc, libm, libpthread) │
├─────────────────────────────────────────────────────────────┤
│ System Call Interface │ ← User/Kernel boundary
╠═════════════════════════════════════════════════════════════╣
│ KERNEL │
│ ┌─────────────┬─────────────┬─────────────┬──────────────┐ │
│ │ Process │ Memory │ File │ Device │ │
│ │ Management │ Management │ Systems │ Drivers │ │
│ └─────────────┴─────────────┴─────────────┴──────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Hardware Abstraction Layer (HAL) │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ HARDWARE │
│ (CPU, Memory, Disks, Network) │
└─────────────────────────────────────────────────────────────┘
The kernel's responsibilities:
1. Process management: Create, schedule, terminate processes
2. Memory management: Virtual memory, allocation, protection
3. File systems: Organize data on storage devices
4. Device drivers: Communicate with hardware
5. Security: Enforce access controls and isolation
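User space can reach each of these responsibilities only through system calls. The sketch below assumes a Unix-like system (Linux or macOS) and uses Python's `os` and `mmap` modules, which wrap the underlying calls almost one-to-one. The hypothetical helper `touch_each_subsystem()` makes one request to each subsystem:

```python
import mmap
import os
import stat

def touch_each_subsystem(path="."):
    """Send one request to each kernel subsystem via Python's thin wrappers."""
    results = {}
    # Process management: getpid(2) asks the kernel for our process ID
    results["process"] = os.getpid()
    # Memory management: an anonymous mmap(2) asks for one fresh page of memory
    with mmap.mmap(-1, mmap.PAGESIZE) as page:
        page[:5] = b"hello"
        results["memory"] = page[:5]
    # File systems: stat(2) returns metadata the file system keeps for a path
    results["filesystem_is_dir"] = stat.S_ISDIR(os.stat(path).st_mode)
    # Device drivers: write(2) to /dev/null is handled by a character-device driver
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        results["device_bytes"] = os.write(fd, b"discard me")
    finally:
        os.close(fd)
    # Security: the user/group IDs the kernel checks on every access (Unix only)
    results["credentials"] = (os.getuid(), os.getgid())
    return results

print(touch_each_subsystem())
```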
Kernel vs OS: The kernel is just one part of an OS. A complete OS also includes system utilities (ls, ps), libraries (libc), shells (bash), and services (systemd). Strictly speaking, Linux is only the kernel; GNU/Linux (or a Linux distribution) is the complete OS.
Kernel Types
Operating systems use different kernel architectures, each with distinct trade-offs between performance, security, and complexity.
Monolithic Kernels
In a monolithic kernel, all OS services run in a single address space in kernel mode. The entire kernel is one large program.
Monolithic Kernel Architecture
Monolithic Kernel (Linux, FreeBSD, early Unix):
══════════════════════════════════════════════════════════════
User Space (Ring 3)
┌─────────────────────────────────────────────────────────────┐
│ Application A Application B Application C │
└─────────────────────────────────────────────────────────────┘
│ System Call │
════════════════════╪═════════════╪════════════════════════════
Kernel Space (Ring 0) ↓
┌─────────────────────────────────────────────────────────────┐
│ ┌───────────┬───────────┬───────────┬───────────┬────────┐ │
│ │ Scheduler │ Memory │ VFS │ Network │ IPC │ │
│ │ │ Manager │ │ Stack │ │ │
│ └───────────┴───────────┴───────────┴───────────┴────────┘ │
│ ┌───────────┬───────────┬───────────┬───────────┐ │
│ │ ext4 │ btrfs │ NFS │ procfs │ File │
│ │ driver │ driver │ driver │ driver │ Systems │
│ └───────────┴───────────┴───────────┴───────────┘ │
│ ┌───────────┬───────────┬───────────┬───────────┐ │
│ │ Disk │ Network │ USB │ GPU │ Device │
│ │ Driver │ Driver │ Driver │ Driver │ Drivers │
│ └───────────┴───────────┴───────────┴───────────┘ │
│ All run in Ring 0! │
└─────────────────────────────────────────────────────────────┘
Advantages:
✓ Fast - no context switches between kernel components
✓ Efficient - direct function calls, shared memory
✓ Proven - Linux, Unix have decades of refinement
Disadvantages:
✗ Large attack surface - bug anywhere can crash/compromise system
✗ No isolation - driver bug can corrupt entire kernel
✗ Hard to extend - core changes require rebuilding the kernel; loadable modules help, but they still run in Ring 0
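A toy Python model can illustrate the shared-address-space trade-off. This is not real kernel code, and `fresh_kernel_memory()`, `schedule_next()`, `buggy_usb_irq()` and `kernel_tick()` are hypothetical names. A dict stands in for kernel memory. Because nothing separates the driver from the scheduler, a faulty driver corrupts data the scheduler owns:

```python
# Toy model with hypothetical names: every subsystem is an ordinary function,
# and one dict stands in for the single kernel address space they all share.
def fresh_kernel_memory():
    return {
        "run_queue": ["init", "sshd", "bash"],   # owned by the scheduler
        "usb_buffers": [0, 0, 0, 0],             # owned by the USB driver
    }

def schedule_next(mem):
    """Scheduler: reached by a plain function call, no message passing."""
    return mem["run_queue"][0]

def buggy_usb_irq(mem):
    """Driver bug: nothing stops a write outside the driver's own data."""
    mem["run_queue"] = None                      # stray write into scheduler state

def kernel_tick(mem):
    buggy_usb_irq(mem)                           # driver and scheduler share memory
    try:
        return schedule_next(mem)
    except TypeError:                            # None is not subscriptable
        return "KERNEL PANIC"

mem = fresh_kernel_memory()
print(schedule_next(mem))   # init
print(kernel_tick(mem))     # KERNEL PANIC: a driver bug took down the scheduler
```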
Microkernels
A microkernel provides only minimal services (IPC, scheduling, memory primitives). Everything else runs in user space as servers.
Microkernel Architecture
Microkernel (Minix, QNX, seL4, L4):
══════════════════════════════════════════════════════════════
User Space (Ring 3)
┌─────────────────────────────────────────────────────────────┐
│ Application A Application B Application C │
│ │ │ │ │
│ └────────────────┼──────────────────┘ │
│ ↓ │
│ ┌────────────┬────────────┬────────────┬────────────┐ │
│ │ File System│ Network │ Device │ Memory │ │
│ │ Server │ Server │ Server │ Server │ │
│ │ (user) │ (user) │ (user) │ (user) │ │
│ └────────────┴────────────┴────────────┴────────────┘ │
│ │ │ │ │ │
│ └────────────┴────────────┴────────────┘ │
│ ↓ IPC │
└──────────────────────────┼──────────────────────────────────┘
═══════════════════════════╪══════════════════════════════════
Kernel Space (Ring 0) ↓
┌─────────────────────────────────────────────────────────────┐
│ ┌───────────────────────────────────────────────────────┐ │
│ │ MICROKERNEL (~10K lines of code) │ │
│ │ • Basic scheduling • IPC (message passing) │ │
│ │ • Address space mgmt • Basic memory primitives │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Advantages:
✓ Small trusted computing base - fewer bugs in Ring 0
✓ Isolation - driver crash doesn't crash kernel
✓ Security - minimal attack surface
✓ Flexibility - easy to swap/upgrade servers
Disadvantages:
✗ IPC overhead - user↔kernel↔user for every service
✗ Performance - historically 2-10x slower than monolithic (Mach era); L4-family kernels narrowed the gap with fast IPC
✗ Complexity - distributed system debugging is hard
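A toy model shows both the IPC cost and the fault isolation. This is a minimal sketch with several assumptions:

- `ToyMicrokernel`, `fs_server` and `usb_server` are hypothetical.
- Servers are plain Python functions rather than separate processes.
- Each user/kernel transition is simply counted.

```python
class ToyMicrokernel:
    """Hypothetical toy: the 'kernel' only routes messages and counts every
    user<->kernel crossing; services live in separate handlers ('servers')."""

    def __init__(self):
        self.servers = {}
        self.crossings = 0

    def register(self, name, handler):
        self.servers[name] = handler

    def ipc_call(self, server, request):
        self.crossings += 2            # app -> kernel, kernel -> server
        try:
            reply = self.servers[server](request)
        except Exception as exc:       # a crashed server does not take the kernel down
            reply = f"error: {server} failed ({type(exc).__name__})"
        self.crossings += 2            # server -> kernel, kernel -> app
        return reply

def fs_server(path):
    return f"contents of {path}"

def usb_server(request):
    raise RuntimeError("driver bug")

mk = ToyMicrokernel()
mk.register("fs", fs_server)
mk.register("usb", usb_server)
print(mk.ipc_call("fs", "/etc/hostname"))   # contents of /etc/hostname
print(mk.ipc_call("usb", "poll"))           # error: usb failed (RuntimeError)
print(mk.ipc_call("fs", "/etc/hosts"))      # the file server keeps working
print(mk.crossings)                         # 12: four crossings per request
```

A monolithic syscall costs two crossings (enter and return). Here each request costs four, and a real microkernel must also switch address spaces to reach the server. In exchange, the crashing USB server only produces an error reply while the file server keeps running.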
Hybrid Kernels
Hybrid kernels combine aspects of both: a monolithic core with some services in user space. This is a practical compromise.
Hybrid Kernel (Windows NT, macOS XNU):
══════════════════════════════════════════════════════════════
Windows NT Architecture:
User Mode
┌─────────────────────────────────────────────────────────────┐
│ Win32 Apps .NET Apps WSL1 Linux Apps │
│ │ │ │ │
│ ↓ ↓ ↓ │
│ ┌───────────┬───────────┬───────────────────────┐ │
│ │ Win32 │ .NET CLR │ Pico processes │ User-mode │
│ │ Subsystem │ (runs on │ (Pico provider runs │ Environments │
│ │ (csrss) │ Win32) │ in kernel mode) │ │
│ └───────────┴───────────┴───────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════
Kernel Mode
┌─────────────────────────────────────────────────────────────┐
│ Executive Layer (Object Manager, Security, I/O Manager) │
│ ┌───────────┬───────────┬───────────┬───────────────────┐ │
│ │ Process │ Memory │ Cache │ Plug & Play │ │
│ │ Manager │ Manager │ Manager │ Manager │ │
│ └───────────┴───────────┴───────────┴───────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Kernel (thread scheduling, interrupt dispatch, sync) │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Hardware Abstraction Layer (HAL) │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Hybrid = Microkernel ideas with monolithic performance
Kernel Architecture Comparison
| Aspect | Monolithic | Microkernel | Hybrid |
|---|---|---|---|
| Performance | ✅ Excellent | ⚠️ IPC overhead | ✅ Good |
| Security | ⚠️ Large TCB | ✅ Small TCB | ⚠️ Moderate |
| Reliability | ⚠️ One bug = crash | ✅ Isolated failures | ⚠️ Moderate |
| Code Size | Large (Linux: tens of millions of LOC) | Tiny (~10K LOC) | Medium |
| Examples | Linux, FreeBSD | QNX, seL4, MINIX | Windows, macOS |
System Calls
Applications can't directly access hardware—they must request services from the kernel through system calls (syscalls). This is the controlled gateway between user space and kernel space.
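Library functions such as `getpid()` are thin wrappers that load a syscall number and trap into the kernel. The sketch below assumes Linux on x86-64 or arm64 and uses Python's `ctypes`. It calls getpid twice: once through `os.getpid()` and once through libc's generic `syscall()` function. The `SYS_GETPID` table is hard-coded from the kernel's per-architecture syscall tables, which is exactly why portable code should not take this route:

```python
import ctypes
import os
import platform

# getpid's number differs per architecture (Linux syscall tables):
# x86-64 uses 39, arm64 uses 172 (the asm-generic numbering).
SYS_GETPID = {"x86_64": 39, "aarch64": 172}

def raw_getpid():
    """Call getpid via libc's generic syscall() entry point; None if unsupported."""
    nr = SYS_GETPID.get(platform.machine())
    if platform.system() != "Linux" or nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)   # symbols already loaded, incl. libc
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(nr))

print(os.getpid(), raw_getpid())   # both ask the kernel the same question
```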
System Call Mechanism
System Call Flow
System Call Execution (x86-64 Linux):
══════════════════════════════════════════════════════════════
Application calls write():
┌─────────────────────────────────────────────────────────────┐
│ User Space │
│ │
│ printf("Hello") │
│ │ │
│ ↓ │
│ libc: write(1, "Hello", 5) ← Library wrapper │
│ │ │
│ ↓ Prepare syscall │
│ mov rax, 1 # syscall number (1 = write) │
│ mov rdi, 1 # fd = stdout │
│ mov rsi, buf # buffer address │
│ mov rdx, 5 # count │
│ syscall # Trap to kernel! │
│ │ │
└───────┼──────────────────────────────────────────────────────┘
│ SYSCALL instruction (fast entry, not an interrupt)
│ • CPU: RIP → RCX, RFLAGS → R11, Ring 3 → Ring 0
│ • CPU: jump to kernel entry point (address in LSTAR MSR)
│ • Kernel entry code: switch to kernel stack, save user registers
↓
┌───────┴──────────────────────────────────────────────────────┐
│ Kernel Space │
│ │
│ syscall_entry: │
│ │ │
│ ↓ │
│ syscall_table[rax]() ← Look up handler by syscall # │
│ │ │
│ ↓ │
│ sys_write(fd=1, buf="Hello", count=5) │
│ │ │
│ ↓ Validate parameters, perform operation │
│ • Check fd is valid file descriptor │
│ • Check buffer is in user's address space │
│ • Write data to stdout │
│ │ │
│ ↓ │
│ Return value → rax (bytes written, or -errno) │
│ │ │
│ sysret / iret ← Return to user mode │
│ │ │
└───────┼──────────────────────────────────────────────────────┘
↓
┌───────┴──────────────────────────────────────────────────────┐
│ User Space │
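│ • Registers already restored by the kernel before sysret │
│ (syscall/sysret clobber rcx and r11) │
│ • Resume at instruction after syscall │
│ • libc wrapper checks rax: -4095..-1 → set errno, return -1 │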
└─────────────────────────────────────────────────────────────┘
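The kernel-side dispatch and the libc-side error convention from the diagram can be modeled in a few lines. This is a toy simulation, not Linux source. `sys_write()`, `syscall_table`, `syscall_entry()` and `ToyLibc` are hypothetical stand-ins. The `count > len(buf)` check merely stands in for the real check that the buffer lies in user memory:

```python
import errno

# Toy kernel side (hypothetical names). Numbers follow x86-64 Linux: 1 = write.
open_files = {1: bytearray()}                # fd 1 ("stdout") collects output

def sys_write(fd, buf, count):
    if fd not in open_files:
        return -errno.EBADF                  # not a valid file descriptor
    if count > len(buf):
        return -errno.EFAULT                 # stands in for "buffer not in user memory"
    open_files[fd] += buf[:count]
    return count                             # bytes written

syscall_table = {1: sys_write}

def syscall_entry(rax, *args):
    handler = syscall_table.get(rax)         # syscall_table[rax] in the diagram
    if handler is None:
        return -errno.ENOSYS                 # no such system call
    return handler(*args)                    # result travels back in rax

class ToyLibc:
    """User-space wrapper: turns -errno results into -1 plus errno."""

    def __init__(self):
        self.errno = 0

    def write(self, fd, buf, count):
        ret = syscall_entry(1, fd, buf, count)
        if -4095 <= ret < 0:                 # the kernel's error range
            self.errno = -ret
            return -1
        return ret

libc = ToyLibc()
print(libc.write(1, b"Hello", 5))                                # 5
print(libc.write(7, b"Hello", 5), errno.errorcode[libc.errno])   # -1 EBADF
```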
Categories of System Calls
System Call Categories:
══════════════════════════════════════════════════════════════
1. PROCESS CONTROL
├── fork() Create new process (copy parent)
├── exec() Replace process image with new program
├── wait() Wait for child process to terminate
├── exit() Terminate current process
└── kill() Send signal to process
2. FILE OPERATIONS
├── open() Open file, return file descriptor
├── close() Close file descriptor
├── read() Read bytes from file descriptor
├── write() Write bytes to file descriptor
├── lseek() Move read/write position
└── stat() Get file metadata
3. DEVICE MANAGEMENT
├── ioctl() Device-specific control operations
├── mmap() Map file/device to memory
└── poll() Wait for events on file descriptors
4. INFORMATION MAINTENANCE
├── getpid() Get process ID
├── getuid() Get user ID
├── time() Get current time
└── uname() Get system information
5. COMMUNICATION
├── pipe() Create pipe for IPC
├── socket() Create network socket
├── connect() Connect to remote socket
└── sendmsg() Send message on socket
6. MEMORY MANAGEMENT
├── brk() Change data segment size
├── mmap() Map memory (anonymous or file-backed)
└── mprotect() Set memory protection
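Calls from several of these categories usually work together. The sketch below assumes a Unix-like system and Python's `os` module; `child_says_hello()` is a hypothetical name. It works in four steps:

1. Create a pipe.
2. Fork a child that writes its PID into the pipe.
3. Have the parent read until end-of-file.
4. Reap the child.

```python
import os

def child_says_hello():
    """Combine several categories (Unix only): pipe() for communication,
    fork()/_exit()/waitpid() for process control, write()/read()/close()
    for file operations, and getpid() for information maintenance."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                              # child process
        code = 1
        try:
            os.close(r)
            os.write(w, b"hello from %d" % os.getpid())
            code = 0
        finally:
            os._exit(code)                    # exit immediately, skipping parent cleanup
    os.close(w)                               # parent keeps only the read end
    data = b""
    while chunk := os.read(r, 1024):          # empty read = EOF (child closed its end)
        data += chunk
    os.close(r)
    _, status = os.waitpid(pid, 0)            # reap the child, as wait() does
    return pid, data, os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(child_says_hello())
```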
Implementation Details
Tracing System Calls
# Linux: strace shows all system calls made by a program
$ strace -c ls /tmp
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
29.41 0.000050 50 1 execve
17.65 0.000030 3 10 mmap
11.76 0.000020 3 7 close
11.76 0.000020 2 8 fstat
5.88 0.000010 2 5 openat
5.88 0.000010 2 5 read
...
------ ----------- ----------- --------- --------- ----------------
100.00 0.000170 2 72 4 total
# Show individual syscalls with arguments
$ strace ls /tmp 2>&1 | head -20
execve("/bin/ls", ["ls", "/tmp"], 0x7ffd... /* 50 vars */) = 0
brk(NULL) = 0x55a8c8b52000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8
openat(AT_FDCWD, "/tmp", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents64(3, /* 15 entries */, 32768) = 456
write(1, "file1.txt  file2.txt\n", 21) = 21
close(3) = 0
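Syscall Cost: Each system call costs roughly 200-1000 CPU cycles because of the mode switch, register saving, and cache/TLB effects, and more when mitigations such as KPTI are enabled. That's why buffered I/O is faster than raw system calls: stdio's fwrite() collects data in a user-space buffer and calls write() once per full buffer, batching many small writes into a few syscalls.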
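The batching effect can be counted directly. The sketch below assumes a Unix-like system with /dev/null. `CountingBufferedWriter` is a hypothetical class: a simplified stand-in for stdio's buffer, not its actual implementation. It compares one write() per byte against a 4096-byte buffer:

```python
import os

class CountingBufferedWriter:
    """Sketch of user-space buffering, the idea behind stdio's fwrite():
    bytes collect in a buffer and write(2) runs only when it fills."""

    def __init__(self, fd, size=4096):
        self.fd, self.size = fd, size
        self.buf = bytearray()
        self.syscalls = 0

    def write(self, data):
        self.buf += data
        while len(self.buf) >= self.size:     # flush only full buffers
            self._write_out(self.size)

    def flush(self):
        while self.buf:                       # push out the partial tail
            self._write_out(len(self.buf))

    def _write_out(self, n):
        written = os.write(self.fd, self.buf[:n])   # one write(2) system call
        self.syscalls += 1
        del self.buf[:written]                # keep anything not yet written

def unbuffered_calls(fd, data):
    """One write(2) per byte, as a loop of raw write() calls would do."""
    for i in range(len(data)):
        os.write(fd, data[i:i + 1])
    return len(data)

fd = os.open(os.devnull, os.O_WRONLY)
payload = b"x" * 10_000
print(unbuffered_calls(fd, payload))          # 10000 system calls
writer = CountingBufferedWriter(fd)
for i in range(len(payload)):
    writer.write(payload[i:i + 1])
writer.flush()
print(writer.syscalls)                        # 3 system calls: 4096 + 4096 + 1808 bytes
os.close(fd)
```

Both versions deliver the same 10,000 bytes; only the number of user/kernel crossings changes. This is why stdio, Python's io module and similar libraries buffer by default.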
Interrupt Handling
Interrupts are signals that demand the CPU's immediate attention. They're how hardware devices communicate with the processor and how the OS implements multitasking.
Interrupt Types
Types of Interrupts
Interrupt Classification:
══════════════════════════════════════════════════════════════
1. HARDWARE INTERRUPTS (External, Asynchronous)
┌─────────────────────────────────────────────────────────┐
│ Device signals CPU via interrupt request (IRQ) line │
│ │
│ Examples (legacy PC IRQ numbers): │
│ • Timer (IRQ 0) - Periodic tick for scheduling │
│ • Keyboard (IRQ 1) - Key press/release │
│ • Disk (IRQ 14/15) - I/O operation complete │
│ • Network (varies, e.g., IRQ 11) - Packet arrived │
│ • USB (varies) - Device connected/data ready │
└─────────────────────────────────────────────────────────┘
2. SOFTWARE INTERRUPTS / TRAPS (Internal, Synchronous)
┌─────────────────────────────────────────────────────────┐
│ Triggered by executing instruction (INT, SYSCALL) │
│ │
│ Examples: │
│ • System calls (int 0x80 or syscall instruction) │
│ • Breakpoints (int 3) for debuggers │
│ • Other INT n instructions (e.g., BIOS int 0x10) │
└─────────────────────────────────────────────────────────┘
3. EXCEPTIONS (CPU-generated, Synchronous)
┌─────────────────────────────────────────────────────────┐
│ CPU detects error condition during instruction │
│ │
│ Types: │
│ • Faults (potentially recoverable): Page fault, GPF │
│ → If the handler fixes the cause, the instruction │
│ is re-executed │
│ • Traps (reported after the instruction): Breakpoint │
│ (int 3), overflow (INTO) │
│ → Resume at next instruction │
│ • Aborts (fatal): Machine check, double fault │
│ → Process/system terminated │
│ │
│ Common exceptions (x86): │
│ • #DE (0) - Divide by zero │
│ • #PF (14) - Page fault │
│ • #GP (13) - General protection fault │
│ • #UD (6) - Invalid opcode │
└─────────────────────────────────────────────────────────┘
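The practical difference between the three exception classes is where execution resumes. The small sketch below encodes that rule. It uses a handful of vector numbers from the x86 exception table and a hypothetical `resume_address()` helper:

```python
# A few x86 exception vectors and their classes, per the Intel SDM exception table.
EXCEPTIONS = {
    0: ("#DE", "divide error", "fault"),
    3: ("#BP", "breakpoint", "trap"),
    4: ("#OF", "overflow", "trap"),
    6: ("#UD", "invalid opcode", "fault"),
    8: ("#DF", "double fault", "abort"),
    13: ("#GP", "general protection", "fault"),
    14: ("#PF", "page fault", "fault"),
    18: ("#MC", "machine check", "abort"),
}

def resume_address(vector, rip, insn_len):
    """Where execution continues if the handler returns normally.
    rip is the address of the instruction that caused the event."""
    kind = EXCEPTIONS[vector][2]
    if kind == "fault":
        return rip                 # re-execute the faulting instruction
    if kind == "trap":
        return rip + insn_len      # continue with the next instruction
    return None                    # abort: no reliable place to resume

print(hex(resume_address(14, 0x401000, 3)))   # 0x401000: page fault retries
print(hex(resume_address(3, 0x401000, 1)))    # 0x401001: int3 resumes after itself
```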
Interrupt Processing
Interrupt Handling Sequence:
══════════════════════════════════════════════════════════════
1. Device raises interrupt (IRQ line goes high)
↓
2. CPU finishes current instruction
↓
3. CPU checks if interrupts are enabled (IF flag)
↓
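4. CPU pushes state to the kernel stack (switching stacks first
if the interrupt arrived in Ring 3):
• Stack segment and pointer (SS, RSP): always in 64-bit mode,
only on a privilege change in 32-bit mode
• Flags register (EFLAGS/RFLAGS)
• Code segment (CS)
• Instruction pointer (EIP/RIP)
• Error code (some exceptions only, e.g., #PF, #GP)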
↓
5. CPU looks up handler in Interrupt Descriptor Table (IDT)
Handler address = IDT[interrupt_number]
↓
6. CPU jumps to interrupt handler (ISR)
• IF cleared automatically for interrupt gates (trap gates leave IF unchanged)
• Running in kernel mode
↓
7. ISR executes:
• Save additional registers (if needed)
• Identify interrupt source
• Handle the interrupt
• Acknowledge interrupt controller
• Restore registers
↓
8. ISR executes IRET instruction:
• Pop RIP, CS, RFLAGS (and RSP, SS) from stack
• Resume interrupted code
↓
9. Original program continues (unaware of interruption)
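A toy CPU model can mimic this sequence. The model is hypothetical and heavily simplified. `ToyCPU` saves only RFLAGS, CS and RIP, while real x86-64 hardware also pushes SS, RSP and sometimes an error code. The selector values 0x33 (user code) and 0x10 (kernel code) are the ones Linux uses on x86-64:

```python
IF_FLAG = 1 << 9                                    # interrupt-enable bit in RFLAGS

class ToyCPU:
    """Hypothetical, heavily simplified model of steps 3-8 above. For brevity it
    saves only RFLAGS, CS and RIP (real x86-64 hardware also pushes SS and RSP)."""

    def __init__(self, idt):
        self.idt = idt                               # vector -> handler function
        self.regs = {"rflags": 0x202, "cs": 0x33, "rip": 0x401000}  # user mode, IF=1
        self.stack = []

    def raise_interrupt(self, vector):
        if not self.regs["rflags"] & IF_FLAG:        # step 3: masked, stays pending
            return False
        for reg in ("rflags", "cs", "rip"):          # step 4: save interrupted state
            self.stack.append(self.regs[reg])
        self.regs["rflags"] &= ~IF_FLAG              # interrupt gate clears IF
        self.regs["cs"] = 0x10                       # kernel code segment (Linux value)
        self.idt[vector](self)                       # steps 5-7: IDT lookup, run ISR
        self.iret()                                  # step 8
        return True

    def iret(self):
        for reg in ("rip", "cs", "rflags"):          # pop in reverse order
            self.regs[reg] = self.stack.pop()

if_seen_in_isr = []

def timer_isr(cpu):
    if_seen_in_isr.append(bool(cpu.regs["rflags"] & IF_FLAG))

cpu = ToyCPU({32: timer_isr})        # vector 32: first vector after the CPU exceptions
saved = dict(cpu.regs)
cpu.raise_interrupt(32)
print(cpu.regs == saved, if_seen_in_isr)   # True [False]
```

The ISR observes IF cleared. After iret the registers match their pre-interrupt values, so the interrupted program does not notice the interruption (step 9).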
Interrupt Controllers
APIC Architecture
Modern Interrupt Architecture (x86 APIC):
══════════════════════════════════════════════════════════════
┌──────────────────────────────────────────────┐
│ Devices │
│ (Keyboard, Disk, Network, USB, etc.) │
└──────────────┬───────────────────────────────┘
│ Interrupt signals
▼
┌──────────────────────────────────────────────┐
│ I/O APIC │
│ (Routes device interrupts to CPUs) │
│ • Interrupt redirection table │
│ • Priority-based routing │
│ • Multi-CPU distribution │
└──────────────┬───────────────────────────────┘
│ Messages over system bus
┌──────────────┴───────────────┬───────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Local APIC │ │ Local APIC │ │ Local APIC │
│ (CPU 0) │ │ (CPU 1) │ │ (CPU N) │
│ │ │ │ │ │
│ • Timer │ │ • Timer │ │ • Timer │
│ • IPI │ │ • IPI │ │ • IPI │
│ • Priority │ │ • Priority │ │ • Priority │
└──────────────┘ └──────────────┘ └──────────────┘
Each CPU has a Local APIC for:
• Local timer interrupts (scheduling tick)
• Inter-Processor Interrupts (IPI) for CPU-to-CPU signaling
• Interrupt prioritization
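Routing through the I/O APIC amounts to a table lookup per IRQ line. The sketch below is a simplified model, with these assumptions:

- `RedirectionEntry` and `route()` are hypothetical, and the table contents are made up.
- "lowest" imitates the APIC's lowest-priority delivery mode by picking the CPU with the smallest current priority.

```python
from dataclasses import dataclass

@dataclass
class RedirectionEntry:
    """Simplified I/O APIC redirection-table entry (real entries are 64-bit
    registers with more fields, such as trigger mode and mask bit)."""
    vector: int          # IDT vector the chosen CPU will see
    mode: str            # "fixed" or "lowest" (lowest-priority delivery)
    dest: tuple          # candidate CPU (Local APIC) IDs

# Hypothetical routing table: IRQ 1 pinned to CPU 0, IRQ 11 spread over 4 CPUs.
redirection_table = {
    1: RedirectionEntry(vector=0x21, mode="fixed", dest=(0,)),
    11: RedirectionEntry(vector=0x2B, mode="lowest", dest=(0, 1, 2, 3)),
}

def route(irq, task_priority):
    """Pick (cpu, vector) for an IRQ; task_priority maps CPU ID -> current priority."""
    entry = redirection_table[irq]
    if entry.mode == "fixed":
        return entry.dest[0], entry.vector     # always the configured CPU
    # Lowest-priority mode: the CPU currently doing the least important work wins
    cpu = min(entry.dest, key=lambda c: task_priority[c])
    return cpu, entry.vector

load = {0: 5, 1: 2, 2: 9, 3: 2}
print(route(1, load))    # (0, 33)
print(route(11, load))   # (1, 43): CPUs 1 and 3 tie, the first one wins here
```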
Privilege Levels
CPUs implement hardware-enforced privilege levels to isolate the kernel from user programs. This is fundamental to OS security.
Protection Rings
x86 Protection Rings
x86 Protection Rings:
══════════════════════════════════════════════════════════════
┌──────────────────────────────────────────────────────┐
│ Ring 3 │
│ User Applications │
│ (Least privileged, restricted access) │
│ ┌──────────────────────────────────────────────┐ │
│ │ Ring 2 │ │
│ │ Device Drivers (rarely used) │ │
│ │ ┌──────────────────────────────────────┐ │ │
│ │ │ Ring 1 │ │ │
│ │ │ Device Drivers (rarely used) │ │ │
│ │ │ ┌──────────────────────────────┐ │ │ │
│ │ │ │ Ring 0 │ │ │ │
│ │ │ │ Operating System Kernel │ │ │ │
│ │ │ │ (Most privileged, full │ │ │ │
│ │ │ │ hardware access) │ │ │ │
│ │ │ └──────────────────────────────┘ │ │ │
│ │ └──────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
In practice, most OSes use only Ring 0 and Ring 3:
• Ring 0: Kernel (full access to CPU, memory, I/O)
• Ring 3: User applications (restricted)
Ring 0 can: Ring 3 cannot:
✓ Execute privileged instr ✗ CLI/STI (interrupt control)
✓ Access I/O ports ✗ IN/OUT (direct I/O)
✓ Access all memory ✗ Access kernel memory
✓ Modify page tables ✗ MOV to CR3 (page table base)
✓ Change privilege level ✗ Direct hardware access
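The table above can be expressed as a simplified decision rule. The sketch below uses a hypothetical `check()` helper and covers only a few instructions. It ignores the TSS I/O permission bitmap and virtualization extensions, both of which can relax the IN/OUT and CLI/STI rules:

```python
# Simplified x86 rules (hypothetical helper): which instructions may run at the
# current privilege level (CPL)? Violations raise a general protection fault (#GP).
PRIVILEGED = {"hlt", "lgdt", "lidt", "mov_to_cr3", "wrmsr"}   # CPL 0 only
IOPL_SENSITIVE = {"cli", "sti", "in", "out"}                  # need CPL <= IOPL

def check(insn, cpl, iopl=0):
    """Return "ok" or "#GP". IOPL lives in RFLAGS; Linux normally keeps it at 0.
    Ignores the TSS I/O permission bitmap and virtualization extensions."""
    if insn in PRIVILEGED:
        return "ok" if cpl == 0 else "#GP"
    if insn in IOPL_SENSITIVE:
        return "ok" if cpl <= iopl else "#GP"
    return "ok"                                   # ordinary instructions

for insn in ("add", "out", "cli", "mov_to_cr3"):
    print(insn, "ring 0:", check(insn, 0), "ring 3:", check(insn, 3))
```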
Mode Switching
Mode Switching (Ring 3 ↔ Ring 0):
══════════════════════════════════════════════════════════════
User Mode (Ring 3) Kernel Mode (Ring 0)
┌───────────────────┐ ┌───────────────────┐
│ Application │ │ Kernel │
│ │ syscall │ │
│ printf("Hi") ────┼───────────→│ sys_write() │
│ │ │ │
│ │ sysret │ │
│ (continues) ←───┼────────────│ return │
│ │ │ │
└───────────────────┘ └───────────────────┘
Transitions User → Kernel:
1. System call (syscall/int 0x80) - Intentional request
2. Exception (page fault, div-by-0) - Error condition
3. Hardware interrupt (timer, I/O) - External event
Transitions Kernel → User:
1. Return from syscall (sysret/iret)
2. Return from exception handler
3. Return from interrupt handler
4. Starting new user process
Mode Switch Overhead:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Operation Cycles (approximate)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Save/restore registers ~20-50 cycles
Change privilege level ~50-100 cycles
TLB flush (KPTI without PCID only) ~100-1000 cycles
Cache effects Variable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total syscall cost ~200-1000 cycles
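To convert these cycle counts into time, assume a 3 GHz clock (an illustrative figure, not one from the text):

$$
t = \frac{N_{\text{cycles}}}{f_{\text{clk}}}, \qquad
\frac{200}{3\times10^{9}\ \text{Hz}} \approx 67\ \text{ns}, \qquad
\frac{1000}{3\times10^{9}\ \text{Hz}} \approx 333\ \text{ns}
$$

Under that assumption, a program making one million system calls per second would spend roughly 0.067 to 0.33 s of every second crossing the user/kernel boundary. That is about 7% to 33% of one core.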
Security Foundation: Privilege separation is the foundation of OS security. Without it, any program could read your passwords, install rootkits, or crash the system. Hardware enforcement means even a buggy/malicious program can't bypass these restrictions.
Conclusion & Next Steps
We've explored the fundamental architecture of operating system kernels—the software that bridges applications and hardware. Key takeaways:
- Kernel Types: Monolithic (Linux) for performance, microkernels (seL4) for security, hybrids (Windows) for balance
- System Calls: The controlled gateway between user space and kernel space (~200-1000 cycles each)
- Interrupts: Hardware signals, software traps, and exceptions that demand CPU attention
- Privilege Levels: Hardware-enforced isolation (Ring 0/Ring 3) that protects the system
Key Insight: The kernel is trusted code that runs with full hardware access. This design (user/kernel separation) is why you can run untrusted programs without them crashing your system or stealing your data.
Next in the Series
In Part 9: Processes & Program Execution, we'll explore how the kernel creates and manages processes—the running instances of programs—including process states, context switching, and the fork/exec model.