Part 8: OS Architecture & Kernel Design

January 31, 2026 Wasil Zafar 30 min read

Explore operating system architectures—from monolithic kernels to microkernels—and understand system calls, interrupt handling, and privilege levels.

Table of Contents

  1. Introduction
  2. Kernel Types
  3. System Calls
  4. Interrupt Handling
  5. Privilege Levels
  6. Conclusion & Next Steps

Introduction

The operating system kernel is the core of any OS—it manages hardware resources, provides services to applications, and enforces security boundaries. Understanding kernel architecture is essential for systems programming and performance optimization.

Series Context: This is Part 8 of 24 in the Computer Architecture & Operating Systems Mastery series. Having covered CPU execution and pipelining, we now transition to operating system fundamentals.

Computer Architecture & OS Mastery

Your 24-step learning path • Currently on Step 8

  1. Foundations of Computer Systems: System overview, architectures, OS role
  2. Digital Logic & CPU Building Blocks: Gates, registers, datapath, microarchitecture
  3. Instruction Set Architecture (ISA): RISC vs CISC, instruction formats, addressing
  4. Assembly Language & Machine Code: Registers, stack, calling conventions
  5. Assemblers, Linkers & Loaders: Object files, ELF, dynamic linking
  6. Compilers & Program Translation: Lexing, parsing, code generation
  7. CPU Execution & Pipelining: Fetch-decode-execute, hazards, prediction
  8. OS Architecture & Kernel Design: Monolithic, microkernel, system calls (You Are Here)
  9. Processes & Program Execution: Process lifecycle, PCB, fork/exec
  10. Threads & Concurrency: Threading models, pthreads, race conditions
  11. CPU Scheduling Algorithms: FCFS, RR, CFS, real-time scheduling
  12. Synchronization & Coordination: Locks, semaphores, classic problems
  13. Deadlocks & Prevention: Coffman conditions, Banker's algorithm
  14. Memory Hierarchy & Cache: L1/L2/L3, cache coherence, NUMA
  15. Memory Management Fundamentals: Address spaces, fragmentation, allocation
  16. Virtual Memory & Paging: Page tables, TLB, demand paging
  17. File Systems & Storage: Inodes, journaling, ext4, NTFS
  18. I/O Systems & Device Drivers: Interrupts, DMA, disk scheduling
  19. Multiprocessor Systems: SMP, NUMA, cache coherence
  20. OS Security & Protection: Privilege levels, ASLR, sandboxing
  21. Virtualization & Containers: Hypervisors, namespaces, cgroups
  22. Advanced Kernel Internals: Linux subsystems, kernel debugging
  23. Case Studies: Linux vs Windows vs macOS
  24. Capstone Projects: Shell, thread pool, paging simulator

What is the Kernel?

The kernel is the core component of an operating system—the software that runs at the highest privilege level and has direct access to hardware. It's the bridge between applications and hardware.

OS Layer Model

Operating System Architecture:
══════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────┐
│                      User Applications                       │
│              (Browser, Editor, Games, etc.)                  │
├─────────────────────────────────────────────────────────────┤
│                    System Libraries                          │
│               (libc, libm, libpthread)                       │
├─────────────────────────────────────────────────────────────┤
│                    System Call Interface                     │  ← User/Kernel boundary
╠═════════════════════════════════════════════════════════════╣
│                         KERNEL                               │
│  ┌─────────────┬─────────────┬─────────────┬──────────────┐ │
│  │  Process    │   Memory    │    File     │    Device    │ │
│  │  Management │  Management │   Systems   │   Drivers    │ │
│  └─────────────┴─────────────┴─────────────┴──────────────┘ │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │         Hardware Abstraction Layer (HAL)                │ │
│  └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│                        HARDWARE                              │
│              (CPU, Memory, Disks, Network)                   │
└─────────────────────────────────────────────────────────────┘

The kernel's responsibilities:
1. Process management: Create, schedule, terminate processes
2. Memory management: Virtual memory, allocation, protection
3. File systems: Organize data on storage devices
4. Device drivers: Communicate with hardware
5. Security: Enforce access controls and isolation

Kernel vs OS: The kernel is only one part of an operating system. A complete OS also includes system utilities (ls, ps), libraries (libc), shells (bash), and services (systemd). Strictly speaking, Linux is just the kernel; GNU/Linux refers to the complete OS built around it.

Kernel Types

Operating systems use different kernel architectures, each with distinct trade-offs between performance, security, and complexity.

Monolithic Kernels

In a monolithic kernel, all OS services run in a single address space in kernel mode. The entire kernel is one large program.

Monolithic Kernel Architecture

Monolithic Kernel (Linux, FreeBSD, early Unix):
══════════════════════════════════════════════════════════════

User Space (Ring 3)
┌─────────────────────────────────────────────────────────────┐
│  Application A      Application B      Application C        │
└─────────────────────────────────────────────────────────────┘
                    │ System Call │
════════════════════╪═════════════╪════════════════════════════
Kernel Space (Ring 0)        ↓
┌─────────────────────────────────────────────────────────────┐
│ ┌───────────┬───────────┬───────────┬───────────┬────────┐ │
│ │ Scheduler │  Memory   │   VFS     │  Network  │  IPC   │ │
│ │           │  Manager  │           │  Stack    │        │ │
│ └───────────┴───────────┴───────────┴───────────┴────────┘ │
│ ┌───────────┬───────────┬───────────┬───────────┐          │
│ │   ext4    │   btrfs   │    NFS    │   procfs  │ File    │
│ │  driver   │  driver   │  driver   │  driver   │ Systems │
│ └───────────┴───────────┴───────────┴───────────┘          │
│ ┌───────────┬───────────┬───────────┬───────────┐          │
│ │   Disk    │  Network  │   USB     │   GPU     │ Device  │
│ │  Driver   │  Driver   │  Driver   │  Driver   │ Drivers │
│ └───────────┴───────────┴───────────┴───────────┘          │
│                    All run in Ring 0!                       │
└─────────────────────────────────────────────────────────────┘

Advantages:
✓ Fast - no context switches between kernel components
✓ Efficient - direct function calls, shared memory
✓ Proven - Linux, Unix have decades of refinement

Disadvantages:
✗ Large attack surface - bug anywhere can crash/compromise system
✗ No isolation - driver bug can corrupt entire kernel
✗ Harder to extend safely - new code runs in Ring 0 (Linux loadable modules avoid a full rebuild, but not the risk)

Microkernels

A microkernel provides only minimal services (IPC, scheduling, memory primitives). Everything else runs in user space as servers.

Microkernel Architecture

Microkernel (Minix, QNX, seL4, L4):
══════════════════════════════════════════════════════════════

User Space (Ring 3)
┌─────────────────────────────────────────────────────────────┐
│  Application A      Application B      Application C        │
│         │                 │                  │               │
│         └────────────────┼──────────────────┘               │
│                          ↓                                   │
│  ┌────────────┬────────────┬────────────┬────────────┐      │
│  │ File System│   Network  │   Device   │   Memory   │      │
│  │   Server   │   Server   │   Server   │   Server   │      │
│  │  (user)    │   (user)   │   (user)   │   (user)   │      │
│  └────────────┴────────────┴────────────┴────────────┘      │
│         │            │            │            │             │
│         └────────────┴────────────┴────────────┘             │
│                          ↓ IPC                               │
└──────────────────────────┼──────────────────────────────────┘
═══════════════════════════╪══════════════════════════════════
Kernel Space (Ring 0)      ↓
┌─────────────────────────────────────────────────────────────┐
│  ┌───────────────────────────────────────────────────────┐  │
│  │           MICROKERNEL (~10K lines of code)            │  │
│  │  • Basic scheduling    • IPC (message passing)       │  │
│  │  • Address space mgmt  • Basic memory primitives     │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Advantages:
✓ Small trusted computing base - fewer bugs in Ring 0
✓ Isolation - driver crash doesn't crash kernel
✓ Security - minimal attack surface
✓ Flexibility - easy to swap/upgrade servers

Disadvantages:
✗ IPC overhead - user↔kernel↔user for every service
✗ Performance - early designs (e.g. Mach) were markedly slower than monolithic kernels; L4-family kernels narrowed the gap
✗ Complexity - distributed system debugging is hard
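
To make the IPC trade-off concrete, here is a minimal sketch of the message-passing pattern. It is a toy model, not how any real microkernel is implemented: a thread stands in for a user-space file server, a socket pair stands in for the kernel's IPC channel, and the names file_server, read_via_ipc and the "READ <path>" message format are all hypothetical.

```python
import socket
import threading


def file_server(conn):
    """Hypothetical user-space 'file server': answers one request per message."""
    files = {"/etc/motd": b"hello from the file server\n"}  # made-up contents
    while True:
        request = conn.recv(4096)
        if not request:  # the client closed its end of the channel
            break
        op, _, path = request.decode().partition(" ")
        if op == "READ" and path in files:
            conn.sendall(b"OK " + files[path])
        else:
            conn.sendall(b"ERR no such file")
    conn.close()


def read_via_ipc(conn, path):
    """Client side: every request is a send plus a receive across the channel."""
    conn.sendall(f"READ {path}".encode())
    return conn.recv(4096)


# The socket pair plays the role of the kernel-provided IPC channel.
client, server = socket.socketpair()
worker = threading.Thread(target=file_server, args=(server,))
worker.start()

reply_ok = read_via_ipc(client, "/etc/motd")
reply_err = read_via_ipc(client, "/missing")

client.close()
worker.join()
print(reply_ok, reply_err)
```

In a monolithic kernel the same request would be an ordinary function call inside one address space. Here each request crosses the channel twice, which is where the IPC overhead comes from; on the other hand, a crash inside file_server would not take the client down with it.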

Hybrid Kernels

Hybrid kernels combine aspects of both: a monolithic core with some services in user space. This is a practical compromise.

Hybrid Kernel (Windows NT, macOS XNU):
══════════════════════════════════════════════════════════════
Windows NT Architecture:

User Mode
┌─────────────────────────────────────────────────────────────┐
│  Win32 Apps      .NET Apps      Windows Subsystem for Linux │
│         │            │                     │                 │
│         ↓            ↓                     ↓                 │
│  ┌───────────┬───────────┬───────────────────────┐          │
│  │  Win32    │   .NET    │    LXSS (Pico        │ Subsystem │
│  │ Subsystem │  Runtime  │    Provider)         │ Processes │
│  └───────────┴───────────┴───────────────────────┘          │
└─────────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════
Kernel Mode
┌─────────────────────────────────────────────────────────────┐
│  Executive Layer (Object Manager, Security, I/O Manager)    │
│  ┌───────────┬───────────┬───────────┬───────────────────┐  │
│  │ Process   │  Memory   │   Cache   │   Plug & Play     │  │
│  │ Manager   │  Manager  │  Manager  │    Manager        │  │
│  └───────────┴───────────┴───────────┴───────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │            Microkernel (scheduling, sync, IPC)        │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │        Hardware Abstraction Layer (HAL)               │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Hybrid = Microkernel ideas with monolithic performance

Kernel Architecture Comparison

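━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Aspect        Monolithic            Microkernel            Hybrid
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Performance   ✅ Excellent          ⚠️ IPC overhead        ✅ Good
Security      ⚠️ Large TCB          ✅ Small TCB           ✅ Moderate
Reliability   ⚠️ One bug = crash    ✅ Isolated failures   ✅ Moderate
Code Size     Large (~25M LOC)      Tiny (~10K LOC)        Medium
Examples      Linux, FreeBSD        QNX, seL4, MINIX       Windows, macOS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━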

System Calls

Applications can't directly access hardware—they must request services from the kernel through system calls (syscalls). This is the controlled gateway between user space and kernel space.

System Call Mechanism

System Call Flow

System Call Execution (x86-64 Linux):
══════════════════════════════════════════════════════════════

Application calls write():
┌─────────────────────────────────────────────────────────────┐
│  User Space                                                  │
│                                                              │
│  printf("Hello")                                             │
│       │                                                      │
│       ↓                                                      │
│  libc: write(1, "Hello", 5)    ← Library wrapper            │
│       │                                                      │
│       ↓ Prepare syscall                                      │
│  mov rax, 1        # syscall number (1 = write)             │
│  mov rdi, 1        # fd = stdout                            │
│  mov rsi, buf      # buffer address                         │
│  mov rdx, 5        # count                                  │
│  syscall           # Trap to kernel!                        │
│       │                                                      │
└───────┼──────────────────────────────────────────────────────┘
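        │ syscall instruction (fast trap into the kernel)
        │ • CPU saves RIP → RCX and RFLAGS → R11, jumps to the kernel entry point
        │ • Privilege level changes (Ring 3 → Ring 0)
        │ • Kernel entry code switches to the kernel stack, then saves user registers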
        ↓
┌───────┴──────────────────────────────────────────────────────┐
│  Kernel Space                                                │
│                                                              │
│  syscall_entry:                                              │
│       │                                                      │
│       ↓                                                      │
│  syscall_table[rax]()    ← Look up handler by syscall #     │
│       │                                                      │
│       ↓                                                      │
│  sys_write(fd=1, buf="Hello", count=5)                      │
│       │                                                      │
│       ↓ Validate parameters, perform operation              │
│  • Check fd is valid file descriptor                        │
│  • Check buffer is in user's address space                  │
│  • Write data to stdout                                     │
│       │                                                      │
│       ↓                                                      │
│  Return value → rax    (bytes written, or -errno)           │
│       │                                                      │
│  sysret / iret         ← Return to user mode                │
│       │                                                      │
└───────┼──────────────────────────────────────────────────────┘
        ↓
┌───────┴──────────────────────────────────────────────────────┐
│  User Space                                                  │
│  • Restore user registers                                   │
│  • Resume at instruction after syscall                      │
│  • Check return value (rax)                                 │
└─────────────────────────────────────────────────────────────┘
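
The flow above can be tried from user space. The sketch below issues the same write request twice: once through the normal wrapper (os.write) and once through libc's generic syscall() function with the syscall number passed explicitly. The number 1 for write is an assumption that holds only on x86-64 Linux, so the raw path is skipped elsewhere. A pipe is used instead of stdout so that the result can be read back.

```python
import ctypes
import os
import platform

message = b"Hello"

# Write into a pipe rather than stdout so the bytes can be read back and checked.
read_end, write_end = os.pipe()

# Path 1: the library wrapper. os.write() goes through libc's write(),
# which issues the write system call and returns the byte count.
wrapper_result = os.write(write_end, message)

# Path 2: the same request through libc's generic syscall() function.
# ASSUMPTION: syscall number 1 means write only on x86-64 Linux.
SYS_WRITE_X86_64 = 1
raw_result = None
if platform.system() == "Linux" and platform.machine() == "x86_64":
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    raw_result = libc.syscall(
        ctypes.c_long(SYS_WRITE_X86_64),  # ends up in rax: syscall number
        ctypes.c_long(write_end),         # rdi: file descriptor
        ctypes.c_char_p(message),         # rsi: buffer address
        ctypes.c_size_t(len(message)),    # rdx: byte count
    )

os.close(write_end)
received = os.read(read_end, 64)
os.close(read_end)
print(wrapper_result, raw_result, received)
```

Both paths return the number of bytes written, matching the "Return value → rax" step in the diagram. In practice the wrapper is the right choice; the raw form is shown only to expose the syscall number and argument order.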

Categories of System Calls

System Call Categories:
══════════════════════════════════════════════════════════════

1. PROCESS CONTROL
   ├── fork()      Create new process (copy parent)
   ├── exec()      Replace process image with new program
   ├── wait()      Wait for child process to terminate
   ├── exit()      Terminate current process
   └── kill()      Send signal to process

2. FILE OPERATIONS
   ├── open()      Open file, return file descriptor
   ├── close()     Close file descriptor
   ├── read()      Read bytes from file descriptor
   ├── write()     Write bytes to file descriptor
   ├── lseek()     Move read/write position
   └── stat()      Get file metadata

3. DEVICE MANAGEMENT
   ├── ioctl()     Device-specific control operations
   ├── mmap()      Map file/device to memory
   └── poll()      Wait for events on file descriptors

4. INFORMATION MAINTENANCE
   ├── getpid()    Get process ID
   ├── getuid()    Get user ID
   ├── time()      Get current time
   └── uname()     Get system information

5. COMMUNICATION
   ├── pipe()      Create pipe for IPC
   ├── socket()    Create network socket
   ├── connect()   Connect to remote socket
   └── sendmsg()   Send message on socket

6. MEMORY MANAGEMENT
   ├── brk()       Change data segment size
   ├── mmap()      Map memory (anonymous or file-backed)
   └── mprotect()  Set memory protection
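
As a rough illustration of how these categories combine in one small program, the sketch below uses Python's os module, whose functions are thin wrappers over the corresponding syscalls on Unix-like systems. It assumes a platform with fork(); the exit code 7 and the message text are arbitrary choices for the example.

```python
import os

read_end, write_end = os.pipe()              # communication: pipe()

pid = os.fork()                              # process control: fork()
if pid == 0:
    # Child process: report who our parent is, then terminate.
    os.close(read_end)                       # file operations: close()
    text = f"child of {os.getppid()}"        # information: getppid()
    os.write(write_end, text.encode())       # file operations: write()
    os.close(write_end)
    os._exit(7)                              # process control: exit()

# Parent process: read the child's message and collect its exit status.
os.close(write_end)
message = os.read(read_end, 64).decode()     # file operations: read()
os.close(read_end)
_, status = os.waitpid(pid, 0)               # process control: wait()
exit_code = os.WEXITSTATUS(status)
print(message, exit_code)
```

Running this under strace should show the corresponding kernel entry points, although the exact names can differ from the wrapper names (for example, fork is commonly implemented with clone on Linux).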

Implementation Details

Tracing System Calls

# Linux: strace -c prints a per-syscall summary (time, call count, errors) for a program
$ strace -c ls /tmp
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 29.41    0.000050          50         1           execve
 17.65    0.000030           3        10           mmap
 11.76    0.000020           3         7           close
 11.76    0.000020           2         8           fstat
  5.88    0.000010           2         5           openat
  5.88    0.000010           2         5           read
...
------ ----------- ----------- --------- --------- ----------------
100.00    0.000170           2        72         4 total

# Show individual syscalls with arguments
$ strace ls /tmp 2>&1 | head -20
execve("/bin/ls", ["ls", "/tmp"], 0x7ffd... /* 50 vars */) = 0
brk(NULL)                               = 0x55a8c8b52000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8
openat(AT_FDCWD, "/tmp", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents64(3, /* 15 entries */, 32768)  = 456
write(1, "file1.txt  file2.txt\n", 21) = 21
close(3)                                = 0

Syscall Cost: Each system call costs roughly 200-1000 CPU cycles, depending on the CPU and kernel configuration, because of the mode switch, register saving, and cache effects. That is why buffered I/O (fwrite rather than write) is faster for many small writes: it batches them into fewer syscalls.
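
The batching effect of buffered I/O can be observed without a tracer. The sketch below is only a model: CountingRaw is a hypothetical stand-in for a file descriptor that counts how many times a write reaches it, which corresponds to how many write syscalls a real file would see. The 4096-byte buffer size is an arbitrary choice.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a file descriptor: counts writes that reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        self.bytes_written += len(data)
        return len(data)


# Unbuffered: every small write goes straight to the "descriptor".
raw_direct = CountingRaw()
for _ in range(1000):
    raw_direct.write(b"x" * 10)

# Buffered: small writes accumulate in a 4096-byte buffer first.
raw_buffered = CountingRaw()
buffered = io.BufferedWriter(raw_buffered, buffer_size=4096)
for _ in range(1000):
    buffered.write(b"x" * 10)
buffered.flush()

print("direct calls:  ", raw_direct.calls)
print("buffered calls:", raw_buffered.calls)
```

Both variants deliver the same 10,000 bytes, but the buffered one reaches the underlying object only a handful of times. With a real file, each of those calls would be one trip across the user/kernel boundary.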

Interrupt Handling

Interrupts are signals that demand the CPU's immediate attention. They're how hardware devices communicate with the processor and how the OS implements multitasking.

Interrupt Types

Types of Interrupts

Interrupt Classification:
══════════════════════════════════════════════════════════════

1. HARDWARE INTERRUPTS (External, Asynchronous)
   ┌─────────────────────────────────────────────────────────┐
   │ Device signals CPU via interrupt request (IRQ) line    │
   │                                                         │
   │ Examples:                                               │
   │ • Timer (IRQ 0)  - Periodic tick for scheduling        │
   │ • Keyboard (IRQ 1) - Key press/release                 │
   │ • Disk (IRQ 14/15) - I/O operation complete            │
   │ • Network (varies) - Packet arrived                    │
   │ • USB (varies) - Device connected/data ready           │
   └─────────────────────────────────────────────────────────┘

2. SOFTWARE INTERRUPTS / TRAPS (Internal, Synchronous)
   ┌─────────────────────────────────────────────────────────┐
   │ Triggered by executing instruction (INT, SYSCALL)      │
   │                                                         │
   │ Examples:                                               │
   │ • System calls (int 0x80 or syscall instruction)       │
   │ • Breakpoints (int 3) for debuggers                    │
   │ • Software-initiated timer                             │
   └─────────────────────────────────────────────────────────┘

3. EXCEPTIONS (CPU-generated, Synchronous)
   ┌─────────────────────────────────────────────────────────┐
   │ CPU detects error condition during instruction         │
   │                                                         │
   │ Types:                                                  │
   │ • Faults (recoverable): Page fault, GPF               │
   │   → Handler fixes problem, instruction re-executed    │
   │ • Traps (intentional): Breakpoint, syscall            │
   │   → Resume at next instruction                        │
   │ • Aborts (fatal): Hardware error, double fault        │
   │   → Process/system terminated                         │
   │                                                         │
   │ Common exceptions (x86):                               │
   │ • #DE (0) - Divide by zero                            │
   │ • #PF (14) - Page fault                               │
   │ • #GP (13) - General protection fault                 │
   │ • #UD (6) - Invalid opcode                            │
   └─────────────────────────────────────────────────────────┘

Interrupt Processing

Interrupt Handling Sequence:
══════════════════════════════════════════════════════════════

1. Device raises interrupt (IRQ line goes high)
         ↓
2. CPU finishes current instruction
         ↓
3. CPU checks if interrupts are enabled (IF flag)
         ↓
4. CPU pushes state to stack:
   • Flags register (EFLAGS/RFLAGS)
   • Code segment (CS)
   • Instruction pointer (EIP/RIP)
   • SS and ESP/RSP (on a privilege change; always in 64-bit mode)
         ↓
5. CPU looks up handler in Interrupt Descriptor Table (IDT)
   Handler address = IDT[interrupt_number]
         ↓
6. CPU jumps to interrupt handler (ISR)
   • Interrupts are masked on entry if the IDT entry is an interrupt gate (IF cleared)
   • Running in kernel mode
         ↓
7. ISR executes:
   • Save additional registers (if needed)
   • Identify interrupt source
   • Handle the interrupt
   • Acknowledge interrupt controller
   • Restore registers
         ↓
8. ISR executes IRET instruction:
   • Pop RIP, CS, RFLAGS from stack
   • Resume interrupted code
         ↓
9. Original program continues (unaware of interruption)
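
The sequence above can be modelled in a few lines. This is a simulation only: the Cpu class, the selector values, the vector number 32 and the handler are illustrative assumptions, and a Python list stands in for the kernel stack. It is meant to show the save, dispatch, and restore pattern, not real hardware behavior.

```python
from dataclasses import dataclass, field

IF_FLAG = 1 << 9   # interrupt-enable bit in RFLAGS
KERNEL_CS = 0x10   # illustrative kernel code segment selector


@dataclass
class Cpu:
    rip: int = 0x401000   # illustrative user-mode instruction pointer
    cs: int = 0x33        # illustrative user code segment selector
    rflags: int = 0x202   # IF set: interrupts enabled
    stack: list = field(default_factory=list)
    log: list = field(default_factory=list)


def timer_isr(cpu):
    """Step 7: the handler body, running with kernel CS and IF cleared."""
    if_state = bool(cpu.rflags & IF_FLAG)
    cpu.log.append(f"timer ISR ran in cs={cpu.cs:#x} with IF={if_state}")


def deliver_interrupt(cpu, idt, vector):
    if not cpu.rflags & IF_FLAG:      # step 3: interrupts disabled, stay pending
        return False
    cpu.stack.extend([cpu.rflags, cpu.cs, cpu.rip])   # step 4: save state
    handler = idt[vector]             # step 5: look up the handler in the IDT
    cpu.cs = KERNEL_CS                # step 6: enter kernel mode...
    cpu.rflags &= ~IF_FLAG            # ...with interrupts masked (interrupt gate)
    handler(cpu)                      # step 7: run the ISR
    cpu.rip = cpu.stack.pop()         # step 8: IRET pops RIP, CS, RFLAGS
    cpu.cs = cpu.stack.pop()
    cpu.rflags = cpu.stack.pop()
    return True


idt = {32: timer_isr}                 # vector 32 chosen for illustration
cpu = Cpu()
before = (cpu.rip, cpu.cs, cpu.rflags)
delivered = deliver_interrupt(cpu, idt, 32)
after = (cpu.rip, cpu.cs, cpu.rflags)
print(delivered, before == after, cpu.log)
```

The interrupted state is identical before and after, which is why the original program continues unaware of the interruption.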

Interrupt Controllers

APIC Architecture

Modern Interrupt Architecture (x86 APIC):
══════════════════════════════════════════════════════════════

            ┌──────────────────────────────────────────────┐
            │              Devices                          │
            │  (Keyboard, Disk, Network, USB, etc.)        │
            └──────────────┬───────────────────────────────┘
                           │ Interrupt signals
                           ▼
            ┌──────────────────────────────────────────────┐
            │            I/O APIC                          │
            │  (Routes device interrupts to CPUs)          │
            │  • Interrupt redirection table               │
            │  • Priority-based routing                    │
            │  • Multi-CPU distribution                    │
            └──────────────┬───────────────────────────────┘
                           │ Messages over system bus
            ┌──────────────┴───────────────┬───────────────┐
            ▼                              ▼               ▼
    ┌──────────────┐              ┌──────────────┐  ┌──────────────┐
    │  Local APIC  │              │  Local APIC  │  │  Local APIC  │
    │   (CPU 0)    │              │   (CPU 1)    │  │   (CPU N)    │
    │              │              │              │  │              │
    │ • Timer      │              │ • Timer      │  │ • Timer      │
    │ • IPI        │              │ • IPI        │  │ • IPI        │
    │ • Priority   │              │ • Priority   │  │ • Priority   │
    └──────────────┘              └──────────────┘  └──────────────┘

Each CPU has a Local APIC for:
• Local timer interrupts (scheduling tick)
• Inter-Processor Interrupts (IPI) for CPU-to-CPU signaling
• Interrupt prioritization

Privilege Levels

CPUs implement hardware-enforced privilege levels to isolate the kernel from user programs. This is fundamental to OS security.

Protection Rings

x86 Protection Rings

x86 Protection Rings:
══════════════════════════════════════════════════════════════

     ┌──────────────────────────────────────────────────────┐
     │                    Ring 3                            │
     │              User Applications                       │
     │      (Least privileged, restricted access)           │
     │   ┌──────────────────────────────────────────────┐   │
     │   │              Ring 2                          │   │
     │   │         Device Drivers (rarely used)        │   │
     │   │   ┌──────────────────────────────────────┐   │   │
     │   │   │           Ring 1                     │   │   │
     │   │   │    Device Drivers (rarely used)     │   │   │
     │   │   │   ┌──────────────────────────────┐   │   │   │
     │   │   │   │        Ring 0                │   │   │   │
     │   │   │   │   Operating System Kernel    │   │   │   │
     │   │   │   │  (Most privileged, full      │   │   │   │
     │   │   │   │   hardware access)           │   │   │   │
     │   │   │   └──────────────────────────────┘   │   │   │
     │   │   └──────────────────────────────────────┘   │   │
     │   └──────────────────────────────────────────────┘   │
     └──────────────────────────────────────────────────────┘

In practice, most OSes use only Ring 0 and Ring 3:
• Ring 0: Kernel (full access to CPU, memory, I/O)
• Ring 3: User applications (restricted)

Ring 0 can:                      Ring 3 cannot:
✓ Execute privileged instr       ✗ CLI/STI (interrupt control)
✓ Access I/O ports              ✗ IN/OUT (direct I/O)
✓ Access all memory             ✗ Access kernel memory
✓ Modify page tables            ✗ MOV to CR3 (page table base)
✓ Change privilege level        ✗ Direct hardware access
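
The check the CPU performs can be sketched as a small simulation. The instruction sets and the execute function below are illustrative, not an emulator. One refinement over the table above: on x86, CLI/STI and IN/OUT are gated by the I/O privilege level (IOPL) rather than strictly by Ring 0, while instructions such as MOV to CR3 require Ring 0 outright. The sketch ignores the TSS I/O permission bitmap, which can grant individual ports to user code.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the #GP exception the CPU would raise."""


# Illustrative subsets, not the full x86 lists.
CPL0_ONLY = {"mov cr3", "hlt", "lidt"}
IOPL_SENSITIVE = {"cli", "sti", "in", "out"}


def execute(instruction, cpl, iopl=0):
    """Allow or reject an instruction for the current privilege level (CPL)."""
    if instruction in CPL0_ONLY and cpl != 0:
        raise GeneralProtectionFault(f"{instruction} requires Ring 0")
    if instruction in IOPL_SENSITIVE and cpl > iopl:
        raise GeneralProtectionFault(f"{instruction} requires CPL <= IOPL")
    return f"{instruction} executed at CPL {cpl}"


print(execute("mov cr3", cpl=0))      # kernel: allowed
print(execute("add", cpl=3))          # ordinary instruction: allowed anywhere

for instruction in ("mov cr3", "cli", "out"):
    try:
        execute(instruction, cpl=3)
    except GeneralProtectionFault as fault:
        print("fault:", fault)
```

With the usual setting of IOPL 0, all three privileged attempts from Ring 3 fault, and the kernel's #GP handler decides what happens to the offending process.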

Mode Switching

Mode Switching (Ring 3 ↔ Ring 0):
══════════════════════════════════════════════════════════════

User Mode (Ring 3)               Kernel Mode (Ring 0)
┌───────────────────┐            ┌───────────────────┐
│ Application       │            │ Kernel            │
│                   │   syscall  │                   │
│ printf("Hi")  ────┼───────────→│ sys_write()       │
│                   │            │                   │
│                   │   sysret   │                   │
│ (continues)   ←───┼────────────│ return            │
│                   │            │                   │
└───────────────────┘            └───────────────────┘

Transitions User → Kernel:
1. System call (syscall/int 0x80) - Intentional request
2. Exception (page fault, div-by-0) - Error condition
3. Hardware interrupt (timer, I/O) - External event

Transitions Kernel → User:
1. Return from syscall (sysret/iret)
2. Return from exception handler
3. Return from interrupt handler
4. Starting new user process

Mode Switch Overhead:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Operation                          Cycles (approximate)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Save/restore registers             ~20-50 cycles
Change privilege level             ~50-100 cycles  
TLB flush (if needed)              ~100-1000 cycles
Cache effects                      Variable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total syscall cost                 ~200-1000 cycles
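
To put the cycle counts in perspective, the short calculation below converts them to wall-clock time. The 3 GHz clock is an assumption picked for round numbers, and the cycle range is the approximate total from the table, so the results are order-of-magnitude estimates only.

```python
CLOCK_HZ = 3_000_000_000  # ASSUMPTION: a 3 GHz core


def cycles_to_ns(cycles):
    """Convert a cycle count to nanoseconds at the assumed clock rate."""
    return cycles * 1e9 / CLOCK_HZ


low_cycles, high_cycles = 200, 1000   # approximate total from the table
low_ns = cycles_to_ns(low_cycles)
high_ns = cycles_to_ns(high_cycles)

# Cost of one million syscalls, in milliseconds.
calls = 1_000_000
low_ms = calls * low_ns / 1e6
high_ms = calls * high_ns / 1e6

print(f"one syscall: {low_ns:.1f} to {high_ns:.1f} ns")
print(f"{calls:,} syscalls: {low_ms:.1f} to {high_ms:.1f} ms")
```

A single call is cheap in absolute terms, but a loop that makes a million of them spends a noticeable fraction of a second just crossing the user/kernel boundary.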

Security Foundation: Privilege separation is the foundation of OS security. Without it, any program could read your passwords, install rootkits, or crash the system. Because the CPU enforces the boundary in hardware, a buggy or malicious program cannot cross it on its own; it would have to exploit a flaw in the kernel or the hardware.

Conclusion & Next Steps

We've explored the fundamental architecture of operating system kernels—the software that bridges applications and hardware. Key takeaways:

  • Kernel Types: Monolithic (Linux) for performance, microkernels (seL4) for security, hybrids (Windows) for balance
  • System Calls: The controlled gateway between user space and kernel space (~200-1000 cycles each)
  • Interrupts: Hardware signals, software traps, and exceptions that demand CPU attention
  • Privilege Levels: Hardware-enforced isolation (Ring 0/Ring 3) that protects the system

Key Insight: The kernel is trusted code that runs with full hardware access, while everything else runs restricted. This user/kernel separation is why an untrusted program normally cannot crash the whole system or read another process's memory.

Next in the Series

In Part 9: Processes & Program Execution, we'll explore how the kernel creates and manages processes—the running instances of programs—including process states, context switching, and the fork/exec model.