Introduction: Hardware Discovery
Phase 15 Goals: By the end of this phase, your OS will discover hardware through PCI enumeration, have a driver framework, and support modern storage via AHCI (SATA) and NVMe.
Phase 0: Orientation & Big Picture
OS fundamentals, kernel architectures, learning path
Phase 1: How a Computer Starts
BIOS/UEFI, boot sequence, dev environment
Phase 2: Real Mode - First Steps
Real mode, bootloader, BIOS interrupts
Phase 3: Entering Protected Mode
GDT, 32-bit mode, C code execution
Phase 4: Display, Input & Output
VGA text mode, keyboard handling
Phase 5: Interrupts & CPU Control
IDT, ISRs, PIC programming
Phase 6: Memory Management
Paging, virtual memory, heap allocator
Phase 7: Disk Access & Filesystems
Block devices, FAT, VFS layer
Phase 8: Processes & User Mode
Task switching, system calls, user space
Phase 9: ELF Loading & Executables
ELF format, program loading
Phase 10: Standard Library & Shell
C library, command-line shell
Phase 11: 64-Bit Long Mode
x86-64, 64-bit paging, modern architecture
Phase 12: Modern Booting with UEFI
UEFI boot services, memory maps
Phase 13: Graphics & GUI Systems
Framebuffer, windowing, drawing
Phase 14: Advanced Input & Timing
Mouse, high-precision timers
Phase 15: Hardware Discovery & Drivers
PCI, device drivers, NVMe
You Are Here
Phase 16: Performance & Optimization
Caching, scheduler tuning
Phase 17: Stability, Security & Finishing
Debugging, hardening, completion
Your OS is now graphical and interactive. But all that software runs on hardware—and modern PCs have a lot of it! How does your OS know what devices are connected? How does it talk to an SSD, a network card, or a GPU?
The answer is a two-part system: hardware discovery (finding what's connected) and device drivers (knowing how to talk to each device). This phase covers both.
╔═════════════════════════════════════════════════════════════════════════════╗
║ HARDWARE DISCOVERY & DRIVER ARCHITECTURE ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ Application Layer ║
║ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ║
║ │ File System │ │ Network │ │ Graphics │ ║
║ │ (VFS) │ │ Stack │ │ Subsystem │ ║
║ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ ║
║ │ │ │ ║
║ ───────┴───────────────┴───────────────┴─────── ║
║ ║
║ Driver Layer ║
║ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ║
║ │ NVMe Driver │ │ AHCI Driver │ │ NIC Driver │ │ GPU Driver │ ║
║ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ ║
║ │ │ │ │ ║
║ ───────┴───────────────┴───────────────┴───────────────┴───── ║
║ ║
║ PCI/PCIe Bus Layer ║
║ ┌─────────────────────────────────────────────────────────────────────┐ ║
║ │ PCI Enumeration + Configuration │ ║
║ │ • Scan Bus:Slot:Function │ ║
║ │ • Read Vendor/Device IDs │ ║
║ │ • Map BARs (Memory/IO) │ ║
║ │ • Match to Drivers │ ║
║ └─────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ Physical Hardware ║
║ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ║
║ │ NVMe │ │ SATA │ │ Intel │ │ RTX │ │ USB │ ║
║ │ SSD │ │ SSD/HD │ │ NIC │ │ GPU │ │ Ctrl │ ║
║ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
Key Insight: Hardware discovery is the foundation of extensibility. PCI enumeration lets your OS find and configure devices automatically, while a proper driver framework allows modular hardware support.
Hardware Landscape
Modern PCs communicate with hardware through several interfaces:
| Bus | Typical Devices | Speed | Discovery |
|-----|-----------------|-------|-----------|
| PCI/PCIe | GPUs, NICs, NVMe SSDs, SATA controllers | Up to 64 GB/s (PCIe 5.0 x16) | Configuration Space scan |
| USB | Keyboards, mice, storage, cameras | Up to 20 Gbps (USB 3.2) | Hub enumeration |
| SATA | HDDs, older SSDs, optical drives | 6 Gbps (SATA III) | Port detection via AHCI |
| LPC | Legacy devices (PS/2, TPM) | ~33 MHz | ACPI/Hardcoded |
For this phase, we focus on PCI/PCIe enumeration—the primary discovery mechanism for high-performance devices.
Driver Model
A driver is software that knows how to "speak" a specific device's protocol. Good OS design separates the generic interface the kernel sees from the device-specific code behind it:
╔═════════════════════════════════════════════════════════════════════════════╗
║ OS DRIVER MODEL ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────────────────────────────────────────────────────────────┐ ║
║ │ Generic Block Device API │ ║
║ │ read_blocks(), write_blocks(), get_info() │ ║
║ └─────────────────────────────┬───────────────────────────────────────┘ ║
║ │ ║
║ ┌───────────────────┼───────────────────┐ ║
║ │ │ │ ║
║ ┌────────┴────────┐ ┌────────┴────────┐ ┌────────┴────────┐ ║
║ │ NVMe Driver │ │ AHCI Driver │ │ IDE Driver │ ║
║ │ Implements API │ │ Implements API │ │ Implements API │ ║
║ │ for NVMe SSDs │ │ for SATA │ │ for legacy IDE │ ║
║ └─────────────────┘ └─────────────────┘ └─────────────────┘ ║
║ ║
║ Benefits: ║
║ • File system doesn't care which storage technology is used ║
║ • New drivers can be added without changing upper layers ║
║ • Same code works on any hardware that has a driver ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
This abstraction is critical. Your file system calls read_blocks(), and the driver translates that into the specific commands the hardware understands.
Driver Lifecycle:
- Registration - Driver tells the kernel what devices it supports
- Probe - Kernel asks "can you handle this device?"
- Attach - Driver initializes the hardware
- Operation - Driver handles requests
- Detach - Cleanup when device removed (hot-unplug)
PCI Enumeration
PCI (Peripheral Component Interconnect) is the standard bus for high-speed devices. Every PCI device exposes a Configuration Space that describes what it is and how to talk to it.
╔═════════════════════════════════════════════════════════════════════════════╗
║ PCI BUS TOPOLOGY ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ CPU ║
║ │ ║
║ ┌──────┴──────┐ ║
║ │ Host Bridge │ (Root Complex) ║
║ └──────┬──────┘ ║
║ │ ║
║ ═══════════════════════════════════════ Bus 0 ║
║ │ │ │ ║
║ ┌─────┴────┐ ┌────┴────┐ ┌────┴────┐ ║
║ │ GPU │ │PCI-PCI │ │ AHCI │ ║
║ │00:02.0 │ │ Bridge │ │00:1f.2 │ ║
║ └──────────┘ └────┬────┘ └─────────┘ ║
║ │ ║
║ ═════════════════════════ Bus 1 ║
║ │ │ ║
║ ┌─────┴────┐ ┌────┴────┐ ║
║ │ NVMe │ │ NIC │ ║
║ │01:00.0 │ │01:00.1 │ ║
║ └──────────┘ └─────────┘ ║
║ ║
║ Address Format: Bus:Slot.Function (e.g., 00:1f.2 = Bus 0, Slot 31, Func 2)║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
Configuration Space
Every PCI device has a 256-byte Configuration Space (4KB for PCIe). The first 64 bytes are standardized:
╔═════════════════════════════════════════════════════════════════════════════╗
║ PCI CONFIGURATION SPACE HEADER (Type 0) ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ Offset 31 16 15 0 ║
║ ┌────────┬───────────────┬───────────────┐ ║
║ │ 0x00 │ Device ID │ Vendor ID │ ← Identity ║
║ ├────────┼───────────────┼───────────────┤ ║
║ │ 0x04 │ Status │ Command │ ← Control ║
║ ├────────┼───────────────┴───────────────┤ ║
║ │ 0x08 │Class│SubCls│ProgIF│ Rev ID │ ← Classification ║
║ ├────────┼───────────────────────────────┤ ║
║ │ 0x0C │BIST│HdrTyp│Lat│CacheLine │ ← Built-in test ║
║ ├────────┼───────────────────────────────┤ ║
║ │ 0x10 │ BAR0 │ ← Base Address Registers ║
║ ├────────┼───────────────────────────────┤ ║
║ │ 0x14 │ BAR1 │ ║
║ ├────────┼───────────────────────────────┤ ║
║ │ 0x18 │ BAR2 │ ║
║ ├────────┼───────────────────────────────┤ ║
║ │ 0x1C │ BAR3 │ ║
║ ├────────┼───────────────────────────────┤ ║
║ │ 0x20 │ BAR4 │ ║
║ ├────────┼───────────────────────────────┤ ║
║ │ 0x24 │ BAR5 │ ← Total 6 BARs ║
║ ├────────┼───────────────────────────────┤ ║
║ │ ... │ ... │ ║
║ ├────────┼───────────────┬───────────────┤ ║
║ │ 0x3C │ Max_Lat │ IRQ Line │ ← Interrupt info ║
║ └────────┴───────────────┴───────────────┘ ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
Access Configuration Space via I/O ports on legacy PCI:
/* PCI Configuration Space Access */
#define PCI_CONFIG_ADDR 0xCF8
#define PCI_CONFIG_DATA 0xCFC
/* Read 32-bit value from PCI config space */
uint32_t pci_read(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset) {
uint32_t address = (1U << 31) // Enable bit
| ((uint32_t)bus << 16)
| ((uint32_t)slot << 11)
| ((uint32_t)func << 8)
| (offset & 0xFC);
outl(PCI_CONFIG_ADDR, address);
return inl(PCI_CONFIG_DATA);
}
/* Write 32-bit value to PCI config space */
void pci_write(uint8_t bus, uint8_t slot, uint8_t func,
uint8_t offset, uint32_t value) {
uint32_t address = (1U << 31)
| ((uint32_t)bus << 16)
| ((uint32_t)slot << 11)
| ((uint32_t)func << 8)
| (offset & 0xFC);
outl(PCI_CONFIG_ADDR, address);
outl(PCI_CONFIG_DATA, value);
}
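The two accessors above always move aligned 32-bit dwords; narrower fields such as the Vendor ID or header type come out by shifting on the low offset bits. A sketch of the extraction helpers (the names `pci_extract16`/`pci_extract8` are illustrative, not from the text):

```c
#include <stdint.h>

/* A 32-bit config read returns the aligned dword containing the field;
   narrower fields are extracted by shifting on the low offset bits. */
static inline uint16_t pci_extract16(uint32_t dword, uint8_t offset) {
    return (uint16_t)(dword >> ((offset & 2) * 8));
}

static inline uint8_t pci_extract8(uint32_t dword, uint8_t offset) {
    return (uint8_t)(dword >> ((offset & 3) * 8));
}

/* Usage sketch, with pci_read as defined above:
   uint16_t vendor = pci_extract16(pci_read(bus, slot, func, 0x00), 0x00);
   uint16_t device = pci_extract16(pci_read(bus, slot, func, 0x00), 0x02); */
```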
Common Class Codes help identify device types:
| Class | Subclass | Description |
|-------|----------|-------------|
| 0x01 | 0x06 | SATA Controller (AHCI) |
| 0x01 | 0x08 | Non-Volatile Memory (NVMe) |
| 0x02 | 0x00 | Ethernet Controller |
| 0x03 | 0x00 | VGA Compatible Controller |
| 0x0C | 0x03 | USB Controller |
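As a sketch, this table can back a small name lookup for boot logs and an lspci-style tool (the `pci_class_name` helper and its entries are illustrative, not part of the kernel above):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative: map (class, subclass) to a human-readable name.
   Unlisted codes fall through to "Unknown". */
static const struct {
    uint8_t class_code, subclass;
    const char* name;
} pci_class_names[] = {
    { 0x01, 0x06, "SATA Controller (AHCI)" },
    { 0x01, 0x08, "Non-Volatile Memory (NVMe)" },
    { 0x02, 0x00, "Ethernet Controller" },
    { 0x03, 0x00, "VGA Compatible Controller" },
    { 0x0C, 0x03, "USB Controller" },
};

const char* pci_class_name(uint8_t class_code, uint8_t subclass) {
    size_t n = sizeof(pci_class_names) / sizeof(pci_class_names[0]);
    for (size_t i = 0; i < n; i++) {
        if (pci_class_names[i].class_code == class_code &&
            pci_class_names[i].subclass == subclass)
            return pci_class_names[i].name;
    }
    return "Unknown";
}
```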
Bus Scanning
To discover all devices, scan every Bus:Slot:Function combination. If Vendor ID is 0xFFFF, no device is present:
/* PCI Device Structure */
typedef struct pci_device {
uint8_t bus, slot, func;
uint16_t vendor_id;
uint16_t device_id;
uint8_t class_code;
uint8_t subclass;
uint8_t prog_if;
uint32_t bar[6];
uint8_t interrupt_line;
struct pci_device* next;
} pci_device_t;
pci_device_t* pci_devices = NULL;
/* Scan all PCI buses */
void pci_scan(void) {
for (uint16_t bus = 0; bus < 256; bus++) {
for (uint8_t slot = 0; slot < 32; slot++) {
for (uint8_t func = 0; func < 8; func++) {
uint32_t id = pci_read(bus, slot, func, 0);
uint16_t vendor = id & 0xFFFF;
if (vendor == 0xFFFF) continue; // No device
// Found device - add to list
pci_device_t* dev = kmalloc(sizeof(pci_device_t));
dev->bus = bus;
dev->slot = slot;
dev->func = func;
dev->vendor_id = vendor;
dev->device_id = id >> 16;
// Read class info
uint32_t class_reg = pci_read(bus, slot, func, 0x08);
dev->class_code = (class_reg >> 24) & 0xFF;
dev->subclass = (class_reg >> 16) & 0xFF;
dev->prog_if = (class_reg >> 8) & 0xFF;
// Read BARs
for (int i = 0; i < 6; i++) {
dev->bar[i] = pci_read(bus, slot, func, 0x10 + i * 4);
}
// Read interrupt line
dev->interrupt_line = pci_read(bus, slot, func, 0x3C) & 0xFF;
// Add to list
dev->next = pci_devices;
pci_devices = dev;
kprintf("PCI: %02x:%02x.%x - %04x:%04x Class %02x:%02x\n",
bus, slot, func, vendor, dev->device_id,
dev->class_code, dev->subclass);
// Check for multi-function device
if (func == 0) {
uint32_t header = pci_read(bus, slot, 0, 0x0C);
if (!((header >> 16) & 0x80)) break;
}
}
}
}
}
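Once the list exists, drivers need to find their hardware in it. A standalone sketch of a class-based lookup (it redeclares only the fields it touches; in the kernel you would walk the full pci_device_t list built by pci_scan()):

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal node: only the fields this lookup needs are shown. */
typedef struct pci_dev_node {
    uint8_t class_code;
    uint8_t subclass;
    struct pci_dev_node* next;
} pci_dev_node_t;

/* Return the first device matching class/subclass, or NULL. */
pci_dev_node_t* pci_find_class(pci_dev_node_t* head,
                               uint8_t class_code, uint8_t subclass) {
    for (pci_dev_node_t* d = head; d; d = d->next) {
        if (d->class_code == class_code && d->subclass == subclass)
            return d;
    }
    return NULL;  // Not found
}

/* e.g. class 0x01 / subclass 0x08 locates the first NVMe controller,
   ready to hand to the NVMe driver. */
```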
BAR Decoding
BARs (Base Address Registers) tell us where the device's memory or I/O registers are mapped. But they also encode the region size—with a clever trick:
╔═════════════════════════════════════════════════════════════════════════════╗
║ BAR FORMAT AND SIZE DETECTION ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ Memory BAR (bit 0 = 0): ║
║ ┌───────────────────────────────────────┬───┬───┬───┐ ║
║ │ Base Address (bits 4-31) │Prf│Typ│ 0 │ ║
║ └───────────────────────────────────────┴───┴───┴───┘ ║
║ │ │ └─ Memory space ║
║ │ └──── Type: 00=32-bit ║
║ │ 10=64-bit ║
║ └───── Prefetchable ║
║ ║
║ I/O BAR (bit 0 = 1): ║
║ ┌───────────────────────────────────────────────┬───┐ ║
║ │ I/O Port (bits 2-31) │ 1 │ ║
║ └───────────────────────────────────────────────┴───┘ ║
║ ║
║ Size Detection Algorithm: ║
║ 1. Save original BAR value ║
║ 2. Write all 1s (0xFFFFFFFF) to BAR ║
║ 3. Read back - hardware sets writable bits ║
║ 4. Invert, add 1 = region size ║
║ 5. Restore original BAR value ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
/* Decode a BAR and return its base address and size */
void pci_decode_bar(uint8_t bus, uint8_t slot, uint8_t func,
int bar_num, uint64_t* base, uint64_t* size, bool* is_io) {
uint8_t offset = 0x10 + bar_num * 4;
uint32_t bar = pci_read(bus, slot, func, offset);
*is_io = bar & 1;
if (*is_io) {
// I/O BAR
*base = bar & ~0x3;
// Get size
uint32_t orig = bar;
pci_write(bus, slot, func, offset, 0xFFFFFFFF);
uint32_t mask = pci_read(bus, slot, func, offset);
pci_write(bus, slot, func, offset, orig);
*size = ~(mask & ~0x3) + 1;
} else {
// Memory BAR
int type = (bar >> 1) & 0x3;
if (type == 0x2) {
// 64-bit BAR - uses two consecutive BARs
uint32_t bar_hi = pci_read(bus, slot, func, offset + 4);
*base = ((uint64_t)bar_hi << 32) | (bar & ~0xF);
// Get size (must probe both BARs)
uint32_t orig_lo = bar, orig_hi = bar_hi;
pci_write(bus, slot, func, offset, 0xFFFFFFFF);
pci_write(bus, slot, func, offset + 4, 0xFFFFFFFF);
uint32_t mask_lo = pci_read(bus, slot, func, offset);
uint32_t mask_hi = pci_read(bus, slot, func, offset + 4);
pci_write(bus, slot, func, offset, orig_lo);
pci_write(bus, slot, func, offset + 4, orig_hi);
uint64_t mask = ((uint64_t)mask_hi << 32) | (mask_lo & ~0xF);
*size = ~mask + 1;
} else {
// 32-bit BAR
*base = bar & ~0xF;
uint32_t orig = bar;
pci_write(bus, slot, func, offset, 0xFFFFFFFF);
uint32_t mask = pci_read(bus, slot, func, offset);
pci_write(bus, slot, func, offset, orig);
*size = ~(mask & ~0xF) + 1;
}
}
}
/* Enable device memory/IO and bus mastering */
void pci_enable_device(uint8_t bus, uint8_t slot, uint8_t func) {
uint32_t cmd = pci_read(bus, slot, func, 0x04);
cmd |= (1 << 0); // I/O Space
cmd |= (1 << 1); // Memory Space
cmd |= (1 << 2); // Bus Master (DMA)
pci_write(bus, slot, func, 0x04, cmd);
}
Important: Modern systems use PCIe ECAM (Enhanced Configuration Access Mechanism) instead of I/O ports. ECAM memory-maps the entire configuration space, found via ACPI's MCFG table.
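Under ECAM, each function gets its own 4KB window inside one flat MMIO region, so a config read becomes a plain memory load. A sketch of the offset math, assuming the region's base address has already been read from the MCFG table and mapped (helper names are illustrative):

```c
#include <stdint.h>

/* ECAM window layout: bus << 20 | device << 15 | function << 12,
   plus the 12-bit register offset within the 4KB function window. */
static inline uintptr_t ecam_offset(uint8_t bus, uint8_t dev,
                                    uint8_t fn, uint16_t offset) {
    return ((uintptr_t)bus << 20) | ((uintptr_t)dev << 15)
         | ((uintptr_t)fn << 12) | (offset & 0xFFF);
}

/* With the MCFG region mapped at ecam_base, a config read is a memory
   load: no I/O port dance, and the full 4KB space is reachable. */
static inline uint32_t ecam_read32(volatile uint8_t* ecam_base,
                                   uint8_t bus, uint8_t dev,
                                   uint8_t fn, uint16_t offset) {
    return *(volatile uint32_t*)
        (ecam_base + ecam_offset(bus, dev, fn, offset & ~3));
}
```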
Driver Framework
A driver framework is the glue between discovered hardware and the rest of the OS. It defines how drivers register themselves, how they're matched to devices, and how they export functionality.
╔═════════════════════════════════════════════════════════════════════════════╗
║ DRIVER MATCHING FLOW ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ║
║ │ PCI Scan │ │ Driver Reg │ │ Matching │ ║
║ │ Finds │ │ Drivers │ │ For each │ ║
║ │ devices ├──────────►│ declare ├──────────►│ device: │ ║
║ │ │ │ supported │ │ call probe │ ║
║ │ ┌─────────┐ │ │ devices │ │ │ ║
║ │ │NVMe SSD│ │ │ ┌─────────┐ │ │ │ ║
║ │ │01:08:02│─┼───────────┼─│nvme_drv│─┼───────────┤ Returns 1 │ ║
║ │ └─────────┘ │ │ └─────────┘ │ │ │ │ ║
║ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ▼ │ ║
║ │ │AHCI Ctl│ │ │ │ahci_drv│ │ │ attach() │ ║
║ │ │00:1f:02│ │ │ └─────────┘ │ │ │ ║
║ │ └─────────┘ │ │ │ │ │ ║
║ └─────────────┘ └─────────────┘ └─────────────┘ ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
Driver Interface
Define a common interface that all drivers implement:
/* PCI Device ID for matching */
typedef struct pci_device_id {
uint16_t vendor;
uint16_t device;
uint8_t class_code;
uint8_t subclass;
uint8_t prog_if;
uint32_t driver_data; // Private data for driver
} pci_device_id_t;
/* Match any value */
#define PCI_ANY 0xFFFF
/* Driver Interface */
typedef struct driver {
const char* name;
const pci_device_id_t* id_table; // Supported devices
int (*probe)(pci_device_t* dev); // Check if driver supports device
int (*attach)(pci_device_t* dev); // Initialize device
int (*detach)(pci_device_t* dev); // Cleanup
struct driver* next;
} driver_t;
driver_t* drivers = NULL;
/* Register a driver */
void driver_register(driver_t* drv) {
drv->next = drivers;
drivers = drv;
kprintf("Driver registered: %s\n", drv->name);
}
/* Check if driver matches device using ID table */
static bool driver_match_id(driver_t* drv, pci_device_t* dev) {
for (const pci_device_id_t* id = drv->id_table; id->vendor; id++) {
if ((id->vendor == PCI_ANY || id->vendor == dev->vendor_id) &&
(id->device == PCI_ANY || id->device == dev->device_id) &&
(id->class_code == 0xFF || id->class_code == dev->class_code) &&
(id->subclass == 0xFF || id->subclass == dev->subclass)) {
return true;
}
}
return false;
}
/* Match drivers to devices */
void driver_match_all(void) {
for (pci_device_t* dev = pci_devices; dev; dev = dev->next) {
for (driver_t* drv = drivers; drv; drv = drv->next) {
// First check ID table, then call probe
if (driver_match_id(drv, dev) && drv->probe(dev)) {
if (drv->attach(dev) == 0) {
kprintf("Driver '%s' attached to %02x:%02x.%x\n",
drv->name, dev->bus, dev->slot, dev->func);
break; // Device claimed
}
}
}
}
}
Here's how a driver declares its supported devices:
/* NVMe Driver Example */
/* Devices this driver supports */
static const pci_device_id_t nvme_ids[] = {
    { .vendor = PCI_ANY, .device = PCI_ANY,
      .class_code = 0x01, .subclass = 0x08 },   // Any NVMe controller
    { .vendor = 0x8086, .device = 0xF1A5,
      .class_code = 0xFF, .subclass = 0xFF },   // Intel Optane
    { .vendor = 0x144D, .device = 0xA808,
      .class_code = 0xFF, .subclass = 0xFF },   // Samsung PM981
    {} // Terminator (vendor = 0 ends the table)
};
int nvme_probe(pci_device_t* dev) {
// Additional checks if needed
return dev->class_code == 0x01 && dev->subclass == 0x08;
}
int nvme_attach(pci_device_t* dev) {
// Initialize NVMe controller
kprintf("NVMe: Initializing %04x:%04x\n", dev->vendor_id, dev->device_id);
// Enable bus mastering
pci_enable_device(dev->bus, dev->slot, dev->func);
// Map BAR0 (controller registers)
uint64_t bar_base, bar_size;
bool is_io;
pci_decode_bar(dev->bus, dev->slot, dev->func, 0,
&bar_base, &bar_size, &is_io);
// Initialize controller...
return 0;
}
int nvme_detach(pci_device_t* dev) {
// Cleanup
return 0;
}
static driver_t nvme_driver = {
.name = "nvme",
.id_table = nvme_ids,
.probe = nvme_probe,
.attach = nvme_attach,
.detach = nvme_detach
};
/* Called during OS init */
void nvme_driver_init(void) {
driver_register(&nvme_driver);
}
Device Tree
A device tree tracks the hierarchy of all discovered devices. This is useful for power management, lspci-style device listings, and hot-plug handling:
╔═════════════════════════════════════════════════════════════════════════════╗
║ DEVICE TREE STRUCTURE ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ root ║
║ ├── pci0000:00 (Root Complex) ║
║ │ ├── 00:00.0 Host Bridge [8086:9A14] ║
║ │ ├── 00:02.0 VGA Controller [8086:9A49] ← Intel UHD ║
║ │ ├── 00:14.0 USB Controller [8086:A0ED] ← xHCI ║
║ │ ├── 00:17.0 SATA Controller [8086:A0D3] ← AHCI ║
║ │ ├── 00:1c.0 PCI Bridge ║
║ │ │ └── 01:00.0 NVMe SSD [144D:A808] ← Samsung ║
║ │ └── 00:1f.0 ISA Bridge ║
║ └── acpi ║
║ ├── HPET ║
║ └── APIC ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
/* Generic device node */
typedef struct device {
char name[32];
struct device* parent;
struct device* children;
struct device* sibling;
driver_t* driver;
void* driver_data; // Driver-specific state
enum { DEV_PCI, DEV_USB, DEV_ACPI } bus_type;
union {
pci_device_t* pci;
// usb_device_t* usb;
} bus_data;
} device_t;
device_t* device_root = NULL;
/* Add device to tree */
void device_add(device_t* dev, device_t* parent) {
dev->parent = parent;
dev->sibling = parent->children;
parent->children = dev;
}
/* Print device tree (recursive) */
void device_tree_print(device_t* dev, int depth) {
for (int i = 0; i < depth; i++) kprintf(" ");
kprintf("├── %s", dev->name);
if (dev->driver) kprintf(" [%s]", dev->driver->name);
kprintf("\n");
for (device_t* child = dev->children; child; child = child->sibling) {
device_tree_print(child, depth + 1);
}
}
/* Find device by path (e.g., "pci0000:00/00:1c.0/01:00.0") */
device_t* device_find(const char* path) {
// Parse path and walk tree
device_t* current = device_root;
// ... implementation
return current;
}
Real-World Pattern: Linux's /sys/devices/ exposes the device tree to userspace. Your shell can implement lspci and lsusb by walking this tree!
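A minimal lspci can be as simple as walking the PCI device list and printing one line per device. A standalone sketch (minimal struct fields, stdio printf standing in for kprintf; the real tool would walk the device tree above):

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Minimal node: only the fields printed here are declared. */
typedef struct pdev {
    uint8_t bus, slot, func;
    uint16_t vendor_id, device_id;
    struct pdev* next;
} pdev_t;

/* Print one line per device; return the count so the shell can
   report "N devices found". */
int lspci(const pdev_t* head) {
    int n = 0;
    for (const pdev_t* d = head; d; d = d->next, n++)
        printf("%02x:%02x.%x  %04x:%04x\n",
               d->bus, d->slot, d->func, d->vendor_id, d->device_id);
    return n;
}
```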
AHCI Storage
AHCI (Advanced Host Controller Interface) is the standard interface for SATA devices—hard drives and older SSDs. It's more complex than legacy IDE but supports NCQ (Native Command Queuing) for better performance.
╔═════════════════════════════════════════════════════════════════════════════╗
║ AHCI ARCHITECTURE ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────────────────────────────────────────────────────────────┐ ║
║ │ AHCI Controller (HBA) │ ║
║ │ ┌─────────────────────────────────────────────────────────────┐ │ ║
║ │ │ Generic Host Control │ │ ║
║ │ │ CAP | GHC | PI | IS | Version │ │ ║
║ │ └─────────────────────────────────────────────────────────────┘ │ ║
║ │ │ ║
║ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ ║
║ │ │ Port 0 │ │ Port 1 │ │ Port 2 │ │ Port 3 │ ... up to 32 │ ║
║ │ │ PxCLB │ │ PxCLB │ │ PxCLB │ │ PxCLB │ │ ║
║ │ │ PxFB │ │ PxFB │ │ PxFB │ │ PxFB │ │ ║
║ │ │ PxIS │ │ PxIS │ │ PxIS │ │ PxIS │ │ ║
║ │ │ PxCMD │ │ PxCMD │ │ PxCMD │ │ PxCMD │ │ ║
║ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ ║
║ │ │ │ │ │ │ ║
║ └────────┼───────────┼───────────┼───────────┼──────────────────────┘ ║
║ │ │ │ │ ║
║ ┌──────┴──────┐ ┌───────┴─────┐ │ (no device) ║
║ │ SATA HD │ │ SATA SSD │ │ ║
║ │ 1TB HDD │ │ 256GB SSD │ │ ║
║ └─────────────┘ └─────────────┘ │ ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
Initialization
AHCI initialization follows a specific sequence:
/* AHCI Memory-Mapped Registers */
typedef volatile struct {
// Generic Host Control (offset 0x00)
uint32_t cap; // Host capabilities
uint32_t ghc; // Global host control
uint32_t is; // Interrupt status
uint32_t pi; // Ports implemented
uint32_t vs; // Version
uint32_t ccc_ctl; // Command completion coalescing control
uint32_t ccc_ports; // CCC ports
uint32_t em_loc; // Enclosure management location
uint32_t em_ctl; // Enclosure management control
uint32_t cap2; // Extended capabilities
uint32_t bohc; // BIOS/OS handoff control
uint8_t rsv[0xA0 - 0x2C];
uint8_t vendor[0x100 - 0xA0];
// Port registers at offset 0x100+
} ahci_hba_t;
/* Per-Port Registers (one set per port) */
typedef volatile struct {
uint32_t clb; // Command list base (low)
uint32_t clbu; // Command list base (high)
uint32_t fb; // FIS base (low)
uint32_t fbu; // FIS base (high)
uint32_t is; // Interrupt status
uint32_t ie; // Interrupt enable
uint32_t cmd; // Command and status
uint32_t rsv0;
uint32_t tfd; // Task file data
uint32_t sig; // Signature
uint32_t ssts; // SATA status
uint32_t sctl; // SATA control
uint32_t serr; // SATA error
uint32_t sact; // SATA active
uint32_t ci; // Command issue
uint32_t sntf; // SATA notification
uint32_t fbs; // FIS-based switching
uint32_t rsv1[11];
uint32_t vendor[4];
} ahci_port_t;
/* Get port registers */
#define AHCI_PORT(hba, n) ((ahci_port_t*)((uint8_t*)(hba) + 0x100 + (n) * 0x80))
/* Initialize AHCI controller */
int ahci_init(pci_device_t* pci_dev) {
// Map BAR5 (ABAR - AHCI BAR)
uint64_t abar_base, abar_size;
bool is_io;
pci_decode_bar(pci_dev->bus, pci_dev->slot, pci_dev->func,
5, &abar_base, &abar_size, &is_io);
ahci_hba_t* hba = (ahci_hba_t*)vmap(abar_base, abar_size);
// Enable AHCI mode and interrupts
hba->ghc |= (1 << 31); // AE (AHCI Enable)
hba->ghc |= (1 << 1); // IE (Interrupt Enable)
// Scan ports
uint32_t pi = hba->pi;
for (int i = 0; i < 32; i++) {
if (!(pi & (1 << i))) continue; // Port not implemented
ahci_port_t* port = AHCI_PORT(hba, i);
// Check if device connected
uint32_t ssts = port->ssts;
uint8_t det = ssts & 0x0F; // Device detection
uint8_t ipm = (ssts >> 8) & 0x0F; // Interface power management
if (det != 3 || ipm != 1) continue; // No device or not active
// Check signature
uint32_t sig = port->sig;
if (sig == 0x00000101) {
kprintf("AHCI port %d: SATA HDD/SSD\n", i);
ahci_port_init(hba, i);
} else if (sig == 0xEB140101) {
kprintf("AHCI port %d: ATAPI device\n", i);
}
}
return 0;
}
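The ahci_port_init() called above typically "rebases" the port: stop the command engine, install the command list and received-FIS buffers, then restart. A sketch of that sequence, assuming clb_phys and fb_phys point at pre-allocated, properly aligned physical buffers (only the registers actually touched are declared here; the full layout is the ahci_port_t above):

```c
#include <stdint.h>

/* Minimal register view for this sketch. */
typedef volatile struct {
    uint32_t clb, clbu;   // Command list base
    uint32_t fb, fbu;     // Received FIS base
    uint32_t cmd;         // Command and status
} ahci_port_min_t;

#define PORT_CMD_ST  (1u << 0)   // Start command engine
#define PORT_CMD_FRE (1u << 4)   // FIS receive enable
#define PORT_CMD_FR  (1u << 14)  // FIS receive running
#define PORT_CMD_CR  (1u << 15)  // Command list running

void ahci_port_rebase(ahci_port_min_t* p,
                      uint32_t clb_phys, uint32_t fb_phys) {
    p->cmd &= ~PORT_CMD_ST;                       // Stop command engine
    p->cmd &= ~PORT_CMD_FRE;                      // Stop FIS receive
    while (p->cmd & (PORT_CMD_CR | PORT_CMD_FR))  // Wait until idle
        ;
    p->clb = clb_phys; p->clbu = 0;               // Install command list
    p->fb  = fb_phys;  p->fbu = 0;                // Install received FIS
    p->cmd |= PORT_CMD_FRE;                       // Re-enable FIS receive
    p->cmd |= PORT_CMD_ST;                        // Restart command engine
}
```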
Command Structure
AHCI uses a Command List and Received FIS structure per port. Each command slot points to a Command Table with the actual command and data:
╔═════════════════════════════════════════════════════════════════════════════╗
║ AHCI COMMAND STRUCTURES ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ Port Registers Command List (1KB) Command Table ║
║ ┌────────────┐ ┌─────────────────┐ ┌──────────────┐║
║ │ PxCLB ├───────────────►│ Slot 0 (32B) ├───────►│ CFIS (64B) │║
║ │ Cmd List │ ├─────────────────┤ ├──────────────┤║
║ │ Base Addr │ │ Slot 1 (32B) │ │ ACMD (16B) │║
║ ├────────────┤ ├─────────────────┤ ├──────────────┤║
║ │ PxFB ├───────┐ │ ... │ │ Reserved │║
║ │ FIS Base │ │ ├─────────────────┤ ├──────────────┤║
║ │ Address │ │ │ Slot 31(32B) │ │ PRDT Entry 0 │║
║ ├────────────┤ │ └─────────────────┘ ├──────────────┤║
║ │ PxCI │ │ │ PRDT Entry 1 │║
║ │ Cmd Issue │ │ Received FIS (256B) ├──────────────┤║
║ └────────────┘ │ ┌─────────────────┐ │ ... │║
║ │ │ DMA Setup FIS │ ├──────────────┤║
║ └──────►│ PIO Setup FIS │ │ PRDT Entry N │║
║ │ D2H Reg FIS │ └──────────────┘║
║ └─────────────────┘ ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
/* Command Header (32 bytes per slot, 32 slots) */
typedef struct {
uint16_t flags; // CFL (command FIS length), ATAPI, Write, Prefetch
uint16_t prdtl; // PRDT length (entries)
uint32_t prdbc; // PRD byte count (transferred)
uint32_t ctba; // Command table base (low)
uint32_t ctbau; // Command table base (high)
uint32_t rsv[4];
} __attribute__((packed)) ahci_cmd_header_t;
/* Physical Region Descriptor Table Entry (16 bytes) */
typedef struct {
uint32_t dba; // Data base address (low)
uint32_t dbau; // Data base address (high)
uint32_t rsv;
uint32_t dbc; // Byte count (bit 31 = interrupt on completion)
} __attribute__((packed)) ahci_prdt_entry_t;
/* Command Table (variable size: 128 bytes + N * 16 bytes for PRDTs) */
typedef struct {
uint8_t cfis[64]; // Command FIS (Host to Device Register FIS)
uint8_t acmd[16]; // ATAPI command (if ATAPI bit set)
uint8_t rsv[48];
ahci_prdt_entry_t prdt[1]; // Flexible array
} __attribute__((packed)) ahci_cmd_table_t;
/* Read sectors using AHCI */
int ahci_read(ahci_hba_t* hba, int port, uint64_t lba,
uint16_t count, void* buffer) {
ahci_port_t* p = AHCI_PORT(hba, port);
// Wait for port ready
while (p->tfd & 0x88); // BSY or DRQ set
// Find free command slot
int slot = ahci_find_slot(hba, port);
if (slot == -1) return -1;
// Get command header
ahci_cmd_header_t* hdr = &cmd_list[port][slot];
hdr->flags = 5; // FIS length = 5 DWORDs (20 bytes)
hdr->flags &= ~0x40; // Clear Write bit (this is a read)
hdr->prdtl = 1; // 1 PRDT entry
// Set up command table
ahci_cmd_table_t* tbl = cmd_tables[port][slot];
memset(tbl, 0, sizeof(ahci_cmd_table_t));
// Build H2D Register FIS
tbl->cfis[0] = 0x27; // FIS type: H2D
tbl->cfis[1] = 0x80; // Command (not control)
tbl->cfis[2] = 0x25; // ATA_CMD_READ_DMA_EXT
// LBA (48-bit)
tbl->cfis[4] = (lba >> 0) & 0xFF; // LBA low
tbl->cfis[5] = (lba >> 8) & 0xFF; // LBA mid
tbl->cfis[6] = (lba >> 16) & 0xFF; // LBA high
tbl->cfis[7] = 0x40; // Device: LBA mode
tbl->cfis[8] = (lba >> 24) & 0xFF; // LBA low exp
tbl->cfis[9] = (lba >> 32) & 0xFF; // LBA mid exp
tbl->cfis[10] = (lba >> 40) & 0xFF; // LBA high exp
tbl->cfis[12] = count & 0xFF; // Sector count low
tbl->cfis[13] = (count >> 8) & 0xFF; // Sector count high
// Set up PRDT entry
// NOTE: assumes buffer is physically contiguous and identity-mapped;
// with paging enabled, translate to a physical address first
tbl->prdt[0].dba = (uint32_t)(uintptr_t)buffer;
tbl->prdt[0].dbau = (uint32_t)((uintptr_t)buffer >> 32);
tbl->prdt[0].dbc = (count * 512) - 1; // 0-based
tbl->prdt[0].dbc |= (1 << 31); // Interrupt on completion
// Issue command
p->ci = (1 << slot);
// Wait for completion
while (p->ci & (1 << slot)) {
if (p->is & (1 << 30)) { // Task file error
return -1;
}
}
return 0;
}
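The ahci_find_slot() used above reduces to a bit scan over PxSACT and PxCI: a slot is free only when neither register has its bit set. Factored here as a pure function over the two register values (the name is illustrative):

```c
#include <stdint.h>

/* Return the lowest free command slot, or -1 if all are busy.
   A slot is busy if it appears in either SACT (NCQ outstanding)
   or CI (command issued). */
int ahci_free_slot(uint32_t sact, uint32_t ci, int nslots) {
    uint32_t busy = sact | ci;
    for (int i = 0; i < nslots; i++)
        if (!(busy & (1u << i)))
            return i;
    return -1;  // All slots busy
}
```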
NCQ (Native Command Queuing): AHCI supports up to 32 outstanding commands per port. This enables the drive to reorder operations for optimal head movement (HDDs) or parallelism (SSDs).
NVMe Storage
NVMe (Non-Volatile Memory Express) is the modern interface for SSDs, designed from scratch for flash storage over PCIe. Unlike AHCI (which was designed for spinning disks), NVMe uses multiple parallel queues to achieve incredible performance—millions of IOPS!
╔═════════════════════════════════════════════════════════════════════════════╗
║ NVMe ARCHITECTURE ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌──────────────────────────────────────────────────────────────────────┐ ║
║ │ CPU / Driver │ ║
║ │ │ ║
║ │ Admin Queue I/O Queues (up to 65535) │ ║
║ │ ┌────┬────┐ ┌────┬────┐ ┌────┬────┐ ┌────┬────┐ │ ║
║ │ │ASQ │ACQ │ │SQ1 │CQ1 │ │SQ2 │CQ2 │ │SQ3 │CQ3 │ │ ║
║ │ └──┬─┴──┬─┘ └──┬─┴──┬─┘ └──┬─┴──┬─┘ └──┬─┴──┬─┘ │ ║
║ └─────┼────┼──────────────────┼────┼──────┼────┼──────┼────┼───────┘ ║
║ │ ▲ │ ▲ │ ▲ │ ▲ ║
║ ▼ │ ▼ │ ▼ │ ▼ │ ║
║ ┌──────────────────────────────────────────────────────────────────────┐ ║
║ │ NVMe Controller (PCIe) │ ║
║ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ ║
║ │ │ Doorbell 0 │ │ Doorbell 1 │ │ Doorbell N │ (ring to submit) │ ║
║ │ └─────────────┘ └─────────────┘ └─────────────┘ │ ║
║ │ │ │ ║
║ │ ┌─────┴─────┐ │ ║
║ │ │ Flash Memory │ │ ║
║ │ │ (NAND) │ │ ║
║ │ └───────────┘ │ ║
║ └──────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ SQ = Submission Queue (driver writes commands) ║
║ CQ = Completion Queue (controller writes results) ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
Submission/Completion Queues
NVMe's key innovation is the queue-based command interface. Instead of one command at a time (like IDE), or 32 commands (like AHCI), NVMe supports up to 65,535 I/O queue pairs, each with up to 65,536 entries!
Why So Many Queues? Modern SSDs have massive internal parallelism (hundreds of flash chips). Multiple queues let the OS feed commands from different CPU cores simultaneously, fully saturating the SSD's bandwidth.
Each queue pair consists of:
- Submission Queue (SQ) - Driver writes 64-byte command entries
- Completion Queue (CQ) - Controller writes 16-byte completion entries
- Doorbell Register - Driver rings to notify controller of new commands
/* NVMe Submission Queue Entry (64 bytes) */
typedef struct {
uint8_t opcode;
uint8_t flags;
uint16_t cid; // Command ID
uint32_t nsid; // Namespace ID
uint64_t reserved;
uint64_t mptr; // Metadata pointer
uint64_t prp1; // Physical Region Page 1
uint64_t prp2; // Physical Region Page 2
uint32_t cdw10; // Command-specific
uint32_t cdw11;
uint32_t cdw12;
uint32_t cdw13;
uint32_t cdw14;
uint32_t cdw15;
} __attribute__((packed)) nvme_sqe_t;
/* NVMe Completion Queue Entry */
typedef struct {
uint32_t result; // Command-specific result
uint32_t reserved;
uint16_t sq_head; // Submission queue head pointer
uint16_t sq_id; // Submission queue ID
uint16_t cid; // Command ID
uint16_t status; // Status field
} __attribute__((packed)) nvme_cqe_t;
The phase bit (bit 0 of the status field) in the completion entry is clever: it flips between 0 and 1 each time the queue wraps. This lets the driver detect new completions without needing explicit notifications for each one.
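A sketch of that polling loop, using the completion entry layout above (the cq_state_t bookkeeping struct and helper name are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Completion entry, matching nvme_cqe_t above. */
typedef struct {
    uint32_t result, reserved;
    uint16_t sq_head, sq_id, cid, status;  // status bit 0 = phase
} cqe_t;

/* Driver-side queue state: head index and the phase value we
   expect the controller to write next (starts at 1). */
typedef struct {
    cqe_t*   entries;
    uint16_t head;
    uint16_t size;
    uint8_t  phase;
} cq_state_t;

/* Return the next completed entry, or NULL if nothing new. */
cqe_t* cq_poll(cq_state_t* cq) {
    cqe_t* e = &cq->entries[cq->head];
    if ((e->status & 1) != cq->phase)
        return NULL;                  // Controller hasn't written it yet
    if (++cq->head == cq->size) {     // Advance head; on wrap...
        cq->head = 0;
        cq->phase ^= 1;               // ...flip the expected phase
    }
    return e;
}
```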
NVMe Commands
NVMe commands are classified into two types:
| Type | Queue | Examples |
|------|-------|----------|
| Admin | Admin SQ/CQ | Create I/O Queue, Identify, Get Log Page, Set Features |
| I/O | I/O SQ/CQ pairs | Read, Write, Flush, Dataset Management (TRIM) |
Before issuing I/O commands, you must use Admin commands to identify the controller and create I/O queues:
/* NVMe Controller Initialization */
int nvme_init(pci_device_t* pci_dev) {
// Get BAR0 (NVMe registers)
uint64_t bar0, bar0_size;
bool is_io;
pci_decode_bar(pci_dev->bus, pci_dev->slot, pci_dev->func,
0, &bar0, &bar0_size, &is_io);
nvme_regs_t* regs = (nvme_regs_t*)vmap(bar0, bar0_size);
// Disable controller
regs->cc &= ~1; // Clear EN bit
while (regs->csts & 1); // Wait for RDY=0
// Configure Admin Queue (AQ)
regs->aqa = (ADMIN_QUEUE_SIZE - 1) | ((ADMIN_QUEUE_SIZE - 1) << 16);
regs->asq = virt_to_phys(admin_sq); // Submission queue
regs->acq = virt_to_phys(admin_cq); // Completion queue
// Enable controller
regs->cc = (0 << 20) | // I/O CQ entry size = 16 bytes (2^4)
(0 << 16) | // I/O SQ entry size = 64 bytes (2^6)
(0 << 14) | // Shutdown notification = none
(0 << 11) | // Arbitration = round robin
(6 << 7) | // Memory page size = 4KB (2^12)
(0 << 4) | // NVM command set
1; // Enable
while (!(regs->csts & 1)); // Wait for RDY=1
// Issue Identify Controller command
nvme_identify(ctrl, 0, 1, &ctrl_info);
// Create I/O Completion Queue (Admin cmd 0x05)
nvme_create_cq(ctrl, 1, IO_QUEUE_SIZE, io_cq);
// Create I/O Submission Queue (Admin cmd 0x01)
nvme_create_sq(ctrl, 1, IO_QUEUE_SIZE, io_sq, 1);
return 0;
}
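The CC write above packs several fields into one register. A tiny builder (my naming; field positions per the NVMe specification) makes the layout explicit:

```c
#include <stdint.h>

/* CC fields: IOCQES bits 23:20, IOSQES bits 19:16, SHN bits 15:14,
 * AMS bits 13:11, MPS bits 10:7, CSS bits 6:4, EN bit 0.
 * Entry sizes are log2: iocqes=4 -> 16-byte CQEs, iosqes=6 -> 64-byte
 * SQEs; mps=0 -> 2^(12+0) = 4KB pages. SHN/AMS/CSS stay zero here. */
static inline uint32_t nvme_cc_value(uint8_t iocqes, uint8_t iosqes,
                                     uint8_t mps) {
    return ((uint32_t)iocqes << 20) |
           ((uint32_t)iosqes << 16) |
           ((uint32_t)mps    << 7)  |
           1u;                       // EN
}
```

`nvme_cc_value(4, 6, 0)` reproduces the value written in `nvme_init` above.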
Now we can issue read/write commands:
/* NVMe Read Command */
int nvme_read(nvme_ctrl_t* ctrl, uint32_t nsid,
              uint64_t lba, uint32_t blocks, void* buffer) {
    nvme_sqe_t cmd = {0};
    cmd.opcode = 0x02;                  // Read
    cmd.cid    = alloc_command_id();
    cmd.nsid   = nsid;
    cmd.prp1   = virt_to_phys(buffer);  // assumes the buffer fits in one
                                        // page; larger transfers need prp2
    cmd.cdw10  = lba & 0xFFFFFFFF;      // Starting LBA, low 32 bits
    cmd.cdw11  = lba >> 32;             // Starting LBA, high 32 bits
    cmd.cdw12  = blocks - 1;            // Zero-based block count

    // Submit to I/O queue
    submit_command(ctrl, &cmd);

    // Wait for completion
    return wait_completion(ctrl, cmd.cid);
}
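prp1 alone only covers a transfer that stays within one memory page. A transfer touching a second page puts that page's address in prp2; anything larger needs a PRP list. A sketch of that decision (helper name is mine):

```c
#include <stdint.h>

#define NVME_PAGE 4096u

/* Fill prp1/prp2 for a transfer of `len` bytes starting at physical
 * address `phys`. Returns 0 if one or two PRPs suffice, -1 if the
 * transfer spans more than two pages and needs a PRP list. */
int nvme_fill_prps(uint64_t phys, uint32_t len,
                   uint64_t* prp1, uint64_t* prp2) {
    uint64_t room = NVME_PAGE - (phys % NVME_PAGE);  // bytes left in 1st page
    *prp1 = phys;
    *prp2 = 0;
    if (len <= room)
        return 0;                                     // fits in one page
    if (len <= room + NVME_PAGE) {
        *prp2 = phys - (phys % NVME_PAGE) + NVME_PAGE;  // start of next page
        return 0;
    }
    return -1;                                        // PRP list required
}
```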
NVMe vs AHCI Performance: While AHCI offers a single queue of 32 commands, NVMe supports up to 65,535 I/O queues with up to 65,536 commands each. On the same SSD this can translate to 500K+ IOPS where AHCI tops out around 100K!
What You Can Build
Phase 15 Project: A hardware-aware OS! Your system now discovers PCI devices automatically, has a pluggable driver framework, and can access modern NVMe SSDs at full speed.
Let's combine everything into a complete storage subsystem demo:
/* storage_demo.c - Complete Storage Subsystem */
#include "pci.h"
#include "ahci.h"
#include "nvme.h"
#include "block.h"

/* Generic Block Device Interface */
typedef struct block_device {
    char name[32];
    uint64_t total_sectors;
    uint32_t sector_size;
    int (*read)(struct block_device* dev, uint64_t lba,
                uint32_t count, void* buffer);
    int (*write)(struct block_device* dev, uint64_t lba,
                 uint32_t count, const void* buffer);
    void* private;   // Driver-specific data
} block_device_t;

static block_device_t devices[MAX_BLOCK_DEVICES];
static int num_devices = 0;

/* Register a block device */
void block_register(block_device_t* dev) {
    if (num_devices < MAX_BLOCK_DEVICES) {
        devices[num_devices++] = *dev;
        kprintf("Block: Registered %s (%llu MB)\n",
                dev->name,
                (dev->total_sectors * dev->sector_size) / (1024*1024));
    }
}
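To exercise this interface without real hardware, a tiny RAM-backed device (purely illustrative; the interface struct is repeated so the sketch stands alone) can implement the same callbacks that real AHCI or NVMe drivers plug in:

```c
#include <stdint.h>
#include <string.h>

typedef struct block_device {        // interface repeated from above
    char name[32];
    uint64_t total_sectors;
    uint32_t sector_size;
    int (*read)(struct block_device*, uint64_t, uint32_t, void*);
    int (*write)(struct block_device*, uint64_t, uint32_t, const void*);
    void* private;
} block_device_t;

#define RD_SECTORS 8
static uint8_t rd_storage[RD_SECTORS * 512];

static int rd_read(block_device_t* dev, uint64_t lba, uint32_t n, void* buf) {
    if (lba + n > dev->total_sectors) return -1;   // out of range
    memcpy(buf, (uint8_t*)dev->private + lba * dev->sector_size,
           (size_t)n * dev->sector_size);
    return 0;
}

static int rd_write(block_device_t* dev, uint64_t lba, uint32_t n,
                    const void* buf) {
    if (lba + n > dev->total_sectors) return -1;
    memcpy((uint8_t*)dev->private + lba * dev->sector_size, buf,
           (size_t)n * dev->sector_size);
    return 0;
}

block_device_t ramdisk = {
    .name = "ram0", .total_sectors = RD_SECTORS, .sector_size = 512,
    .read = rd_read, .write = rd_write, .private = rd_storage,
};
```

After `block_register(&ramdisk)` it is reachable through `block_read` exactly like a SATA or NVMe disk.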
/* Storage subsystem initialization */
void storage_init(void) {
    kprintf("Storage: Scanning for devices...\n");

    // Scan PCI bus for storage controllers
    pci_device_t* dev;
    while ((dev = pci_next_device()) != NULL) {
        // Check class code
        if (dev->class_code != 0x01) continue;  // Mass Storage

        switch (dev->subclass) {
        case 0x06:  // AHCI (SATA)
            if (dev->prog_if == 0x01) {
                kprintf("Storage: Found AHCI controller at %02x:%02x.%x\n",
                        dev->bus, dev->slot, dev->func);
                ahci_init(dev);
            }
            break;
        case 0x08:  // NVMe
            if (dev->prog_if == 0x02) {
                kprintf("Storage: Found NVMe controller at %02x:%02x.%x\n",
                        dev->bus, dev->slot, dev->func);
                nvme_init(dev);
            }
            break;
        case 0x01:  // IDE (legacy)
            kprintf("Storage: Found IDE controller (legacy)\n");
            // ide_init(dev);
            break;
        }
    }
    kprintf("Storage: Found %d block device(s)\n", num_devices);
}
/* Read from any block device */
int block_read(int device_id, uint64_t lba, uint32_t count, void* buffer) {
    if (device_id < 0 || device_id >= num_devices) return -1;
    return devices[device_id].read(&devices[device_id], lba, count, buffer);
}
/* Demo: Read first sector of each device */
void storage_demo(void) {
    uint8_t sector[512];
    for (int i = 0; i < num_devices; i++) {
        kprintf("\nReading sector 0 from %s:\n", devices[i].name);
        if (block_read(i, 0, 1, sector) != 0) {
            kprintf("  Read failed!\n");
            continue;
        }
        // Check for MBR signature
        if (sector[510] == 0x55 && sector[511] == 0xAA) {
            kprintf("  Found MBR partition table\n");
            // Parse partition entries
            for (int p = 0; p < 4; p++) {
                uint8_t* entry = &sector[446 + p * 16];
                uint8_t type = entry[4];
                if (type != 0) {
                    uint32_t start = *(uint32_t*)&entry[8];
                    uint32_t size  = *(uint32_t*)&entry[12];
                    kprintf("  Partition %d: type=0x%02x, start=%u, size=%u\n",
                            p, type, start, size);
                }
            }
            // A GPT disk has a protective MBR here; the GPT header
            // itself lives in LBA 1, starting with "EFI PART"
            if (block_read(i, 1, 1, sector) == 0 &&
                memcmp(sector, "EFI PART", 8) == 0) {
                kprintf("  Found GPT partition table\n");
            }
        } else {
            kprintf("  Unknown partition format\n");
        }
    }
}
Exercises
Exercise 1: Implement lspci
Create a command that lists all PCI devices with details:
void cmd_lspci(void) {
    // TODO: Enumerate all PCI devices
    // Print: Bus:Slot.Func VendorID:DeviceID Class Description
    // Example: 00:1f.2 8086:a102 0106 Intel AHCI Controller
    // Hint: Create a table of known vendor/device IDs
}
Exercise 2: Block Device Cache
Add a simple block cache to reduce disk reads:
typedef struct cache_entry {
    uint64_t lba;
    uint8_t  data[512];
    bool     dirty;
    uint32_t access_count;
} cache_entry_t;

// TODO: Implement LRU cache with:
// - cache_read(device, lba)        - check cache first
// - cache_write(device, lba, data) - write-back caching
// - cache_flush()                  - write all dirty entries
Exercise 3: Hot-Plug Detection
Handle AHCI hot-plug events:
void ahci_interrupt_handler(int irq) {
    uint32_t is = hba->is;  // Global interrupt status
    for (int port = 0; port < 32; port++) {
        if (!(is & (1 << port))) continue;
        ahci_port_t* p = get_port(port);
        uint32_t pis = p->is;  // Port interrupt status
        // TODO: Handle these events:
        // - Device connected (PRCS bit)
        // - Device disconnected
        // - Command completion
        // - Error conditions
        p->is = pis;  // Clear handled interrupts
    }
    hba->is = is;  // Clear global status
}
Exercise 4: NVMe Multiple Queues
Create per-CPU I/O queues for maximum parallelism:
typedef struct nvme_queue_pair {
    nvme_sqe_t* sq;     // Submission queue
    nvme_cqe_t* cq;     // Completion queue
    uint16_t sq_tail;   // Next slot to write
    uint16_t cq_head;   // Next slot to read
    uint8_t  cq_phase;  // Expected phase bit
    spinlock_t lock;
} nvme_queue_pair_t;

// TODO: Create one queue pair per CPU
// - nvme_init_percpu_queues()
// - Use current CPU's queue for submissions
// - No locking needed if each CPU uses its own queue!
╔═════════════════════════════════════════════════════════════════════════════╗
║ PHASE 15 → PHASE 16 TRANSITION ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ✓ Phase 15 Complete: ║
║ • PCI bus enumeration and device discovery ║
║ • Pluggable driver framework with ID matching ║
║ • AHCI driver for SATA devices ║
║ • NVMe driver for modern SSDs ║
║ • Generic block device abstraction ║
║ ║
║ → Phase 16 Preview: Performance & Optimization ║
║ • Scheduler tuning (time slice, priority algorithms) ║
║ • Block cache and buffer management ║
║ • Memory allocator optimization ║
║ • Profiling and bottleneck identification ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
Concepts Covered
| Concept | Description |
|---------|-------------|
| PCI Configuration Space | 256/4096-byte device descriptor with IDs, BARs, capabilities |
| BAR Decoding | Determining memory/IO base addresses and sizes |
| Driver Matching | Vendor/Device ID tables for automatic driver selection |
| Device Tree | Hierarchical representation of hardware topology |
| AHCI | SATA interface with command lists and FIS structures |
| NVMe Queues | Submission/Completion queue pairs for parallel I/O |
| Block Device | Generic interface abstracting storage hardware |
Next Steps
With all major subsystems in place, it's time to optimize. In Phase 16, we'll tune the scheduler, implement caching strategies, and profile performance to make the OS fast and responsive.
Continue the Series
Phase 14: Advanced Input & Timing
Review mouse drivers and timer implementation.
Phase 16: Performance & Optimization
Tune the scheduler and implement caching.
Phase 17: Stability, Security & Finishing
Debug, harden, and complete the OS.