Embedded Systems Series Part 1: Fundamentals & Architecture

January 25, 2026 Wasil Zafar 35 min read

Master the building blocks of embedded systems—microcontroller vs microprocessor architectures, memory types, Harvard vs Von Neumann, interrupts, and real-time constraints.

Introduction: What Are Embedded Systems?
Microcontroller vs Microprocessor
Processor Architectures
Memory Types in Embedded Systems
Interrupts & Exception Handling
Real-Time Constraints
Development Tools & Workflow
Bare-Metal Programming Basics
Conclusion & Next Steps

Introduction: What Are Embedded Systems?

                        
                        Series Navigation: This is Part 1 of the 13-part Embedded Systems Series. Start here to build a solid foundation in embedded development.
                    

An embedded system is a computer system designed to perform dedicated functions within a larger mechanical or electronic system. Unlike general-purpose computers, embedded systems are optimized for specific tasks—from controlling your car's engine to managing a smart thermostat.

Embedded systems are everywhere: microwaves, washing machines, smartphones, cars (each with 50-100+ embedded controllers), medical devices, industrial robots, and aircraft autopilots. The global embedded systems market exceeds $100 billion annually, making this one of the most important domains in computing.

Figure: Overview of a typical embedded system showing the core components: processor, memory, peripherals, and I/O interfaces.

                        
                        Key Characteristics of Embedded Systems:
                        Task-specific: Designed for one or a few dedicated functions
Resource-constrained: Limited memory, processing power, and energy
Real-time requirements: Must respond within strict timing deadlines
Reliability: Must operate continuously without crashes or reboots
Cost-sensitive: Often produced in high volumes with tight margins

                    

Microcontroller vs Microprocessor

Understanding the difference between microcontrollers (MCUs) and microprocessors (MPUs) is fundamental to embedded systems design. Both are integrated circuits that execute instructions, but their architecture and use cases differ significantly.

Figure: Microcontroller (self-contained SoC with integrated memory and peripherals) vs microprocessor (requires external RAM, storage, and I/O chips).

Microcontroller Architecture

A microcontroller is a self-contained "system on a chip" (SoC) that integrates:

CPU core: The processing unit (e.g., ARM Cortex-M, AVR, PIC)
Flash memory: Non-volatile storage for program code (typically 16KB to 2MB)
SRAM: Volatile memory for runtime data (typically 4KB to 512KB)
Peripherals: GPIO, timers, UART, SPI, I2C, ADC, DAC
Clock system: Internal oscillators and PLLs

Popular Microcontroller Families

ARM Cortex-M AVR PIC

STM32 (ARM Cortex-M): Industry standard, extensive ecosystem, 32-bit
ESP32: WiFi/Bluetooth built-in, great for IoT projects
ATmega328 (AVR): Powers Arduino Uno, beginner-friendly
PIC: Microchip's family, popular in industrial applications
Nordic nRF52: Low-power Bluetooth, wearables and sensors

Microprocessor Architecture

A microprocessor is primarily a CPU that requires external components:

External RAM: DDR3/DDR4 memory modules (GBs of capacity)
External storage: eMMC, SD card, NVMe for OS and data
Support chips: Power management, memory controllers
Higher performance: Complex instruction sets, caches, MMU

Popular Microprocessor Families

ARM Cortex-A RISC-V

ARM Cortex-A (A53, A72, A78): Smartphones, tablets, Raspberry Pi
Qualcomm Snapdragon: Mobile SoCs with integrated GPU, DSP
Intel Atom: Low-power x86 for embedded PCs
RISC-V: Open-source ISA gaining traction

When to Use Which

                        
                        Choose a Microcontroller When:
                        Real-time response is critical (motor control, safety systems)
Power consumption must be minimal (battery-powered, always-on)
Cost per unit is a major concern (high-volume production)
Simple, dedicated functionality (sensors, actuators, basic UI)
Instant boot time required (no OS loading)

                    

                        
                        Choose a Microprocessor When:
                        Running a full operating system (Linux, Android)
Complex UI with graphics and touchscreen
Network connectivity with full TCP/IP stack
Multitasking with many simultaneous processes
Large data processing or storage requirements

                    

Processor Architectures

Harvard vs Von Neumann Architecture

These two fundamental architectures define how processors access instructions and data.

Von Neumann Architecture

Shared Memory Single Bus

Key characteristic: Single memory space for both instructions and data.

Single bus: Instructions and data share the same memory bus
Bottleneck: CPU can't fetch instruction while accessing data (Von Neumann bottleneck)
Flexibility: Self-modifying code possible, simpler design
Examples: x86 processors, ARM Cortex-A (at memory level)

Harvard Architecture

Separate Memory Dual Bus

Key characteristic: Separate memory and buses for instructions and data.

Parallel access: Can fetch instruction while reading/writing data
Higher throughput: No bus contention, better performance
Common in MCUs: Flash for code, SRAM for data
Examples: AVR, PIC, and ARM Cortex-M3/M4/M7 (separate instruction and data buses over one address space; Cortex-M0/M0+ use a single bus)

Modified Harvard: Many modern processors use a hybrid approach—separate L1 caches for instructions and data (Harvard-style), but unified main memory (Von Neumann-style).

RISC vs CISC

Instruction set architecture (ISA) philosophies differ in how they approach CPU design:

RISC (Reduced Instruction Set Computer)

ARM RISC-V MIPS

Simple instructions: Most execute in a single clock cycle
Load-store architecture: Only load/store instructions access memory
Many registers: Reduce memory access
Fixed instruction length: Easier pipelining
Compiler does more work: Programs need more instructions, but the hardware stays simpler

CISC (Complex Instruction Set Computer)

x86 x86-64

Complex instructions: Single instruction can do multiple operations
Memory operands: Instructions can operate directly on memory
Variable instruction length: More compact code
Fewer instructions per program: Extra hardware complexity is traded for code density
Modern x86: CISC externally, RISC-like execution internally

ARM Architecture Overview

ARM (Advanced RISC Machines) dominates embedded systems. Understanding ARM's processor families is essential:

                        
                        ARM Cortex Families:
                        Cortex-A (Application): High-performance, runs Linux/Android, MMU included. Examples: A53, A72, A78
Cortex-R (Real-time): Deterministic real-time, safety-critical systems. Examples: R4, R5, R52
Cortex-M (Microcontroller): Low-power, cost-effective, no MMU. Examples: M0, M3, M4, M7, M33

Memory Types in Embedded Systems

Flash Memory (Program Storage)

Flash memory stores your program code. It's non-volatile (retains data without power) and can be electrically erased and reprogrammed.

Figure: Embedded memory types and hierarchy, with registers at the top for speed and Flash and EEPROM at the bottom for persistent storage.

NOR Flash: Execute-in-place (XIP), random access, used for code storage
NAND Flash: Higher density, sequential access, used for mass storage
Endurance: Limited erase/program cycles (typically 10,000-100,000 per sector)
Typical sizes: 16KB to 2MB in MCUs

// Application code normally treats Flash as read-only at runtime
// Code executes directly from Flash (XIP)
#include <stdint.h>

const uint32_t lookup_table[] = {0, 1, 4, 9, 16, 25}; // const data is stored in Flash

SRAM (Data Memory)

Static RAM is fast, volatile memory used for runtime data—variables, stack, and heap.

Speed: Single clock cycle access
Volatility: Loses data when power is removed
No refresh: Unlike DRAM, doesn't need periodic refresh
Typical sizes: 4KB to 512KB in MCUs

// SRAM stores runtime variables
#include <stdint.h>

uint32_t counter = 0;        // Global variable in SRAM
uint8_t buffer[256];         // Array in SRAM

void function(void) {
    uint32_t local_var = 10; // Stack (also SRAM)
    (void)local_var;         // Silence the unused-variable warning
}

DRAM & DDR Memory in SoC Systems

While microcontrollers use on-chip SRAM (kilobytes), System-on-Chip (SoC) platforms like the TI AM335x, i.MX6, or Snapdragon processors use external DRAM (megabytes to gigabytes) as their main system memory. Understanding the DRAM family and DDR generations is essential for anyone working with embedded Linux, Android, or application-class embedded systems.

SRAM vs DRAM—The Fundamental Difference

SRAM (Static RAM): Uses 6 transistors per bit. No refresh needed. Fast (single-cycle access), but large and expensive per bit. Used for caches, on-chip MCU memory, and register files. Typical embedded sizes: 4KB–512KB.
DRAM (Dynamic RAM): Uses 1 transistor + 1 capacitor per bit. Needs periodic refresh (each row is typically refreshed at least once every 64 ms) because the capacitor charge leaks. Much higher density and lower cost per bit, but requires a dedicated DRAM controller to manage refresh cycles, timing, and command sequences. Typical embedded sizes: 128MB–8GB.

                        
                        Why DRAM Needs a Controller: Unlike SRAM, which the CPU can read and write directly, DRAM requires a complex initialization sequence: setting CAS latency, RAS-to-CAS delay, refresh intervals, and dozens of other timing parameters. The DRAM controller is a dedicated hardware block on the SoC that translates simple CPU read/write requests into the precise command sequences (ACTIVATE → READ/WRITE → PRECHARGE) that DRAM chips require. This is why DDR must be initialized by software (the SPL) before it can be used: the ROM bootloader doesn’t know which DDR chip is connected.
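The ACTIVATE → READ → PRECHARGE sequence can be illustrated with a toy model of a single DRAM bank. This is a minimal sketch under stated assumptions, not a real controller: the names `dram_bank_t` and `dram_access` are hypothetical, the model keeps a row open after each access (an "open-page" policy, so PRECHARGE is deferred until a different row is needed), and all timing constraints between commands are ignored.

```c
#include <stdint.h>

/* Toy model of one DRAM bank: at most one row can be open at a time. */
typedef struct {
    int      row_open;   /* 1 if a row is currently latched in the row buffer */
    uint32_t open_row;   /* which row is open (valid only if row_open is set) */
} dram_bank_t;

/* Commands a controller would issue, recorded here as bit flags. */
enum {
    CMD_PRECHARGE = 1 << 0,  /* close the currently open row */
    CMD_ACTIVATE  = 1 << 1,  /* open the requested row */
    CMD_READ      = 1 << 2   /* column read from the open row */
};

/* Returns the set of commands needed to read from `row` and updates the bank. */
unsigned dram_access(dram_bank_t *bank, uint32_t row) {
    unsigned cmds = 0;
    if (bank->row_open && bank->open_row != row) {
        cmds |= CMD_PRECHARGE;      /* row miss: close the wrong row first */
        bank->row_open = 0;
    }
    if (!bank->row_open) {
        cmds |= CMD_ACTIVATE;       /* bank idle: open the target row */
        bank->row_open = 1;
        bank->open_row = row;
    }
    cmds |= CMD_READ;               /* the row is open now: issue the column read */
    return cmds;
}
```

Starting from an idle bank, the first read of a row needs ACTIVATE + READ, a second read of the same row needs only READ, and a read of a different row needs all three commands. That difference is why access patterns affect DRAM latency.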
                    

DRAM Classification: Synchronous vs Asynchronous

The entire DRAM family splits into two historical eras based on how memory communicates with the processor:

Asynchronous DRAM (1970s–1990s): Operated independently of the system clock. The CPU would issue a memory request and simply wait for the DRAM to signal completion. Timing was controlled by analog delays, making it slower and harder to optimize at higher frequencies.
Synchronous DRAM (SDRAM, 1996–present): Coordinates all operations with the system clock. The memory controller and DRAM chip share the same clock signal, enabling pipelined burst reads and predictable latencies. All modern computers, phones, and embedded SoCs use some form of SDRAM.

                        
                        Why Synchronous Won: Asynchronous DRAM required the CPU to wait for analog timing margins on every access. As memory bus clocks climbed past 66 MHz in the late 1990s, these margins became impractical. Synchronous DRAM eliminated the problem by locking memory operations to clock edges, enabling burst transfers, pipelining, and predictable latency that scales with clock speed. The transition from EDO DRAM to SDRAM in ~1996 was the watershed moment in memory architecture.
                    

Modern Synchronous DRAM (SDRAM Families)

Virtually all memory used today is a form of SDRAM, categorized by specific use cases and performance targets:

Standard DDR (Double Data Rate) SDRAM

DDR SDRAM is the main system memory for desktops, laptops, servers, and embedded SoCs. It transfers data on both the rising and falling edges of the clock signal, doubling effective bandwidth compared to single data rate (SDR) SDRAM. Each generation improves bandwidth, reduces voltage, and increases density:

DDR Generation Comparison

Standard	Voltage	Clock (MHz)	Transfer Rate	Prefetch	Typical Use
DDR1	2.5V	100–200	200–400 MT/s	2n	Legacy (obsolete since ~2007)
DDR2	1.8V	200–533	400–1066 MT/s	4n	Legacy industrial SoCs, AM1808
DDR3	1.5V	400–1066	800–2133 MT/s	8n	AM335x (BeagleBone), i.MX6
DDR3L	1.35V	400–1066	800–2133 MT/s	8n	Low-power variants of DDR3 boards
DDR4	1.2V	800–1600	1600–3200 MT/s	8n	i.MX8 family (DDR4 option), embedded x86 boards
DDR5	1.1V	2400–4000	4800–8000+ MT/s	16n	Current mainstream standard for high-performance computing

Low-Power DDR (LPDDR)

LPDDR is a highly energy-efficient memory variant optimized for battery-powered devices—smartphones, tablets, ultra-thin laptops, and portable consoles (like the Nintendo Switch). LPDDR uses lower operating voltages, on-die termination, and deep power-down modes to minimize current draw. It is not pin-compatible with standard DDR.

LPDDR Generations

Standard	Voltage	Transfer Rate	Typical Use
LPDDR4/4X	1.1V / 0.6V	3200–4266 MT/s	Smartphones, Snapdragon, i.MX8M
LPDDR5	1.05V / 0.5V	6400 MT/s	Flagship mobile SoCs, automotive
LPDDR5X	1.05V / 0.5V	8533 MT/s	AI accelerators, Dimensity 9300, Snapdragon 8 Gen 3

Graphics DDR (GDDR)

GDDR is high-bandwidth memory built specifically for Graphics Processing Units (GPUs) and video cards. GPUs demand massive parallel bandwidth (hundreds of GB/s) rather than low latency, so GDDR optimizes for wide buses and high clock speeds at the cost of higher latency and power consumption compared to standard DDR.

GDDR6: Current mainstream for gaming GPUs, commonly 14–20 Gbps per pin with bus widths up to 256 or 384 bits. Used in NVIDIA RTX 40-series and AMD RX 7000-series cards.
GDDR6X: NVIDIA-exclusive variant using PAM4 signaling (4 voltage levels per symbol) to double effective bandwidth—up to 21 Gbps per pin.
GDDR7: Next-generation standard (2024+) with PAM3 signaling, targeting 36+ Gbps per pin for future GPUs and AI inference cards.

High Bandwidth Memory (HBM)

HBM is a premium, 3D-stacked memory architecture designed for extreme data bandwidth in enterprise data centers, AI hardware, and high-performance computing clusters. Instead of placing memory chips next to the processor on a PCB (like DDR), HBM stacks multiple DRAM dies vertically, links the dies to one another with thousands of Through-Silicon Vias (TSVs), and connects the stack to the processor over a very wide bus routed through a silicon interposer.

HBM2e: Up to 460 GB/s per stack. Used in NVIDIA A100, AMD MI250.
HBM3: Up to 819 GB/s per stack. Used in NVIDIA H100, AMD MI300X.
HBM3e: Up to 1.2 TB/s per stack. Used in NVIDIA H200, B200 AI accelerators.

                        
                        Embedded Relevance: While GDDR and HBM are not directly used in typical embedded MCU systems, understanding the DRAM family tree matters because:
                        LPDDR4/5 is increasingly common in embedded Linux SoCs (automotive, robotics, edge AI)
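GDDR6 appears in embedded designs that pair a processor with a discrete GPU or accelerator, including some automotive ADAS platforms (NVIDIA Jetson AGX modules, by contrast, use LPDDR)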
HBM is found in high-end embedded AI accelerators deployed at the edge
Certification exams (CompTIA A+, embedded systems certifications) test knowledge of the full DRAM family

                    

DDR Terminology Reference

                        
                        Key DDR Terminology:
                        MT/s (Megatransfers/second): The effective data rate. DDR3-1600 means 1600 million transfers per second, with an 800 MHz clock (double data rate).
Prefetch (2n, 4n, 8n, 16n): Number of bits fetched per access from the internal DRAM array. Higher prefetch = wider internal bus = higher bandwidth without faster cells.
CAS Latency (CL): Clock cycles between a read command and the first data output. Lower = faster, but tied to clock frequency.
tRCD, tRP, tRAS: Row-to-column delay, row precharge time, row active time—the timing parameters that the DDR controller must be configured with. These vary by manufacturer and part number.
ECC (Error Correcting Code): Optional wider data bus (72 bits vs 64) that detects and corrects single-bit errors. Common in industrial/automotive but rare in consumer embedded.
TSV (Through-Silicon Via): Vertical electrical connection through a silicon die, enabling 3D stacking in HBM.
PAM4 (Pulse Amplitude Modulation 4-level): Signaling technique used in GDDR6X that encodes 2 bits per symbol using 4 voltage levels instead of 2.
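The MT/s and clock relationship above reduces to simple arithmetic. The helpers below are a small sketch (the function names are illustrative, not from any library) that derive the I/O clock from a transfer rate and compute theoretical peak bandwidth as transfers per second times bytes per transfer.

```c
#include <stdint.h>

/* I/O clock in MHz for a DDR transfer rate: two transfers per clock cycle. */
uint32_t ddr_clock_mhz(uint32_t mt_per_s) {
    return mt_per_s / 2;
}

/* Theoretical peak bandwidth in MB/s: transfers per second x bytes per transfer.
   bus_width_bits is the data bus width (for example 16, 32, or 64). */
uint32_t ddr_peak_mb_per_s(uint32_t mt_per_s, uint32_t bus_width_bits) {
    return mt_per_s * (bus_width_bits / 8);
}
```

For example, `ddr_clock_mhz(1600)` gives 800, and `ddr_peak_mb_per_s(1600, 16)` gives 3200 MB/s for DDR3-1600 on an assumed 16-bit bus. Sustained throughput is lower in practice because of refresh, row misses, and bus turnarounds.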

                    

DDR Configuration in Embedded Systems

// DDR3 timing parameters example (AM335x BeagleBone Black)
// These values are specific to the Micron MT41K256M16HA-125E DDR3L chip named in the macros below
// From: board/ti/am335x/board.c in U-Boot source

#include <asm/arch/ddr_defs.h>

const struct ddr_data ddr3_data = {
    .datardsratio0    = MT41K256M16HA125E_RD_DQS,
    .datawdsratio0    = MT41K256M16HA125E_WR_DQS,
    .datafwsratio0    = MT41K256M16HA125E_PHY_FIFO_WE,
    .datawrsratio0    = MT41K256M16HA125E_PHY_WR_DATA,
};

const struct cmd_control ddr3_cmd_ctrl_data = {
    .cmd0csratio  = MT41K256M16HA125E_RATIO,
    .cmd0iclkout  = MT41K256M16HA125E_INVERT_CLKOUT,
    .cmd1csratio  = MT41K256M16HA125E_RATIO,
    .cmd1iclkout  = MT41K256M16HA125E_INVERT_CLKOUT,
    .cmd2csratio  = MT41K256M16HA125E_RATIO,
    .cmd2iclkout  = MT41K256M16HA125E_INVERT_CLKOUT,
};

const struct emif_regs ddr3_emif_reg_data = {
    .sdram_config         = MT41K256M16HA125E_EMIF_SDCFG,
    .ref_ctrl             = MT41K256M16HA125E_EMIF_SDREF,
    .sdram_tim1           = MT41K256M16HA125E_EMIF_TIM1,
    .sdram_tim2           = MT41K256M16HA125E_EMIF_TIM2,
    .sdram_tim3           = MT41K256M16HA125E_EMIF_TIM3,
    .zq_config            = MT41K256M16HA125E_ZQ_CFG,
    .emif_ddr_phy_ctlr_1  = MT41K256M16HA125E_EMIF_READ_LATENCY,
};

Why DDR Tuning Matters: Every DDR chip has its own timing parameters, specified in its datasheet. Using incorrect values causes silent data corruption, random crashes, or total boot failure. When designing a custom board with a different DDR chip than the reference design (e.g., fitting another vendor's part on an AM335x board), you must update the timing macros in the SPL source code, rebuild, and re-test. There is no “universal” DDR configuration. See the U-Boot article for a detailed explanation of why the ROM bootloader cannot handle DDR initialization.
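One step in deriving such timing values is converting a datasheet figure in nanoseconds into whole clock cycles at the chosen DDR clock, always rounding up. The helper below is a sketch of that arithmetic only; the function name is hypothetical, the 13.75 ns tRCD figure in the usage note is an illustrative value rather than one taken from a specific datasheet, and how a cycle count is then encoded into a controller register is controller-specific and not shown.

```c
#include <stdint.h>

/* Convert a timing requirement in picoseconds to clock cycles, rounding up.
   Rounding down would violate the minimum time the DRAM needs. */
uint32_t timing_to_cycles(uint32_t t_ps, uint32_t clk_period_ps) {
    return (t_ps + clk_period_ps - 1) / clk_period_ps;
}
```

With a 400 MHz clock (2500 ps period), `timing_to_cycles(13750, 2500)` returns 6; at 800 MHz (1250 ps period) the same requirement becomes 11 cycles. The cycle count changes with the clock, which is why timing values cannot simply be copied between boards running at different DDR frequencies.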

Legacy Asynchronous DRAM (Historical)

Before SDRAM (pre-1996), all DRAM operated asynchronously—independently of the processor’s clock cycle. These types are no longer found in modern systems but are important for understanding memory evolution and appear in certification exams:

Asynchronous DRAM Types

Type	Era	Key Innovation	Limitation
Conventional DRAM	1970s–1980s	First consumer DRAM form factor	CPU must wait for each access to fully complete before issuing the next command
FPM DRAM (Fast Page Mode)	1987–1995	Faster sequential reads within the same “page” (row) of memory—eliminates redundant row address strobes	Still asynchronous; latency improvements only within a single row
EDO DRAM (Extended Data Out)	1994–1998	Allows the next access cycle to begin before the previous cycle finishes (pipelined read)—~15% latency reduction over FPM	Maximum practical bus speed ~66 MHz; replaced by SDRAM at higher clocks

                        
                        The Transition Timeline: Conventional DRAM → FPM DRAM (~1987) → EDO DRAM (~1994) → SDRAM (~1996) → DDR (~2000) → DDR2 (~2003) → DDR3 (~2007) → DDR4 (~2014) → DDR5 (~2020). The jump from EDO to SDRAM was the most fundamental architectural change—every subsequent generation (DDR1–DDR5) builds on the synchronous foundation.
                    

MMC & eMMC Flash Storage

MMC (MultiMediaCard) is an early standard for solid-state digital storage introduced in 1997. It packages NAND flash memory and a simple storage controller into a single unit. While the original removable MMC cards are obsolete, the technology directly evolved into the eMMC (embedded MMC) chips used inside billions of modern electronics.

How MMC Works: “Managed NAND”

Historically, raw flash memory required the device’s main processor to manually manage complex storage tasks—bad block detection, wear leveling, error correction, and garbage collection. This consumed significant CPU cycles and required specialized driver code.

MMC changed this by combining NAND flash cells and a dedicated microcontroller into one package. The onboard controller handles all data-management tasks internally, presenting a simple block-device interface to the host processor. The CPU can issue read/write commands without worrying about flash management complexity.

                        
                        Raw NAND vs Managed NAND: In embedded systems, you’ll encounter both approaches. Raw NAND (accessed via a NAND Flash controller) gives you full control but requires a Flash Translation Layer (FTL) in software. Managed NAND (eMMC, SD cards, UFS) handles this internally. Most application-class embedded systems (running Linux/Android) prefer managed NAND for simplicity, while some cost-sensitive or performance-critical designs still use raw NAND with a software FTL.
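To make the Flash Translation Layer idea concrete, here is a deliberately tiny page-mapped FTL simulated in RAM. It is a sketch under stated assumptions: all names (`ftl_t`, `ftl_write`, `ftl_read`) are hypothetical, one byte stands in for a whole NAND page, and garbage collection, bad-block handling, and ECC are omitted. It shows only the core trick: every write goes to a fresh physical page and a mapping table is updated, because NAND pages cannot be overwritten in place.

```c
#include <stdint.h>

#define NUM_LOGICAL   4      /* logical pages exposed to the host */
#define NUM_PHYSICAL  8      /* physical pages in the simulated NAND */
#define UNMAPPED      0xFF

typedef struct {
    uint8_t map[NUM_LOGICAL];     /* logical page -> physical page */
    uint8_t data[NUM_PHYSICAL];   /* one byte stands in for a whole page */
    uint8_t valid[NUM_PHYSICAL];  /* 1 if the physical page holds live data */
    uint8_t next_free;            /* next never-written physical page */
} ftl_t;

void ftl_init(ftl_t *f) {
    for (int i = 0; i < NUM_LOGICAL; i++)  f->map[i] = UNMAPPED;
    for (int i = 0; i < NUM_PHYSICAL; i++) { f->data[i] = 0xFF; f->valid[i] = 0; }
    f->next_free = 0;
}

/* Out-of-place write: program a fresh page, then retire the old copy.
   Returns 0 on success, -1 if no free page is left (a real FTL would
   run garbage collection here; that is omitted from this sketch). */
int ftl_write(ftl_t *f, uint8_t lpage, uint8_t value) {
    if (lpage >= NUM_LOGICAL || f->next_free >= NUM_PHYSICAL) return -1;
    uint8_t ppage = f->next_free++;
    f->data[ppage]  = value;
    f->valid[ppage] = 1;
    if (f->map[lpage] != UNMAPPED) f->valid[f->map[lpage]] = 0;  /* stale copy */
    f->map[lpage] = ppage;
    return 0;
}

/* Read through the mapping table; unmapped pages read as erased (0xFF). */
uint8_t ftl_read(const ftl_t *f, uint8_t lpage) {
    if (lpage >= NUM_LOGICAL || f->map[lpage] == UNMAPPED) return 0xFF;
    return f->data[f->map[lpage]];
}
```

Writing logical page 2 twice leaves the second value readable while the first physical page is marked stale. Stale pages are what garbage collection later reclaims, and that reclamation is the work a managed-NAND controller performs on the host's behalf.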
                    

The Evolution: MMC → SD → eMMC

MMC Family Tree

Standard	Form Factor	Status	Key Difference from MMC
Original MMC	Removable card (24×32×1.4 mm)	Obsolete	— (baseline standard)
SD Card	Removable card (24×32×2.1 mm)	Active	Added write-protection switch, encryption (CPRM), faster speeds, wider bus
eMMC	BGA chip (soldered to PCB)	Active	Permanently soldered onto motherboard; 8-bit parallel interface; built-in wear leveling, ECC, and bad block management
UFS	BGA chip (soldered to PCB)	Active (successor)	Serial interface (like SATA/NVMe), full-duplex, command queueing—much faster than eMMC

Where eMMC is Used in Embedded Systems

eMMC is the dominant internal storage for cost-sensitive embedded devices that need reliable block storage without the complexity or cost of an NVMe SSD:

Single-board computers: BeagleBone Black (4GB eMMC), many Raspberry Pi alternatives
Budget Chromebooks/laptops: Often labeled “64GB eMMC” or “128GB eMMC”
Smartphones and tablets: Entry-level devices (flagship phones use UFS)
IoT appliances: Smart TVs, streaming sticks, routers, industrial HMIs
Automotive infotainment: Head units, instrument clusters, telematics modules

eMMC vs NVMe SSD: Performance Comparison

eMMC vs SSD Storage Comparison

Feature	eMMC 5.1	NVMe SSD (PCIe Gen3)
Interface	8-bit parallel (single lane)	PCIe x4 (4 lanes, serial)
Sequential read	~250–400 MB/s	~2,000–3,500 MB/s
Sequential write	~80–200 MB/s	~1,500–3,000 MB/s
Random 4K IOPS	~10K–15K	~200K–500K
Capacity range	4GB–256GB	128GB–8TB
Upgradeable	No (soldered BGA)	Yes (M.2 slot, removable)
Power draw	~0.2–0.5W	~2–8W (active)
Cost per GB	Lower	Higher
Best for	IoT, budget devices, boot storage	Laptops, desktops, servers, high-throughput workloads

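# Device Tree snippet for AM335x eMMC (MMC1 controller, labeled &mmc2 in the device tree)
# File: arch/arm/boot/dts/am335x-boneblack.dts
# The eMMC may appear as /dev/mmcblk0 or /dev/mmcblk1 depending on kernel version
# and boot device; confirm with lsblk before using the device names below.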

# &mmc2 {
#     vmmc-supply = <&vmmcsd_fixed>;
#     pinctrl-names = "default";
#     pinctrl-0 = <&emmc_pins>;
#     bus-width = <8>;            /* 8-bit parallel bus */
#     non-removable;               /* soldered, not a card slot */
#     status = "okay";
# };

# Verify eMMC is detected after boot
cat /sys/class/mmc_host/mmc1/mmc1:0001/name
cat /sys/class/mmc_host/mmc1/mmc1:0001/csd
lsblk | grep mmcblk

# Common eMMC operations on embedded Linux
# NOTE: these examples assume the eMMC is mmcblk0 when booted from eMMC and
# mmcblk1 when booted from SD card. Numbering varies; confirm with lsblk first.

# Check eMMC device and partitions
lsblk /dev/mmcblk0

# Read eMMC device info from sysfs (product name, manufacturing date, firmware revision)
cat /sys/class/mmc_host/mmc1/mmc1:0001/name
cat /sys/class/mmc_host/mmc1/mmc1:0001/date
cat /sys/class/mmc_host/mmc1/mmc1:0001/fwrev

# Benchmark sequential read speed
dd if=/dev/mmcblk0 of=/dev/null bs=1M count=100 iflag=direct

# Benchmark sequential write speed (CAUTION: destructive to partition!)
# dd if=/dev/zero of=/dev/mmcblk0p2 bs=1M count=100 oflag=direct

# Flash an image to eMMC from SD card boot (CAUTION: overwrites the whole target device)
dd if=/path/to/image.img of=/dev/mmcblk1 bs=4M status=progress
sync

                        
                        eMMC Design Considerations:
                        Write amplification: eMMC internally writes more data than requested due to garbage collection and wear leveling—avoid excessive small random writes in firmware design
No upgrade path: eMMC is permanently soldered; capacity cannot be increased post-manufacture. Plan storage requirements carefully during hardware design.
Boot partitions: eMMC has dedicated hardware boot partitions (boot0, boot1) separate from the user data area—commonly used for bootloaders (SPL, U-Boot)
Power-loss protection: Not all eMMC chips handle sudden power loss gracefully. For industrial/automotive, select parts with “power-off notification” (PON) support and use robust filesystems (ext4 with journaling, F2FS)
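The write-amplification point can be put into rough numbers. The sketch below models only one contributor: the flash can program whole pages only, so a small host write still costs a full page. The function names and the 16 KB page size used in the usage note are assumptions for illustration; real eMMC page sizes vary by part, and garbage collection adds further amplification that this model ignores.

```c
#include <stdint.h>

/* Bytes the flash must program for a host write when it can only
   program whole pages of page_bytes. */
uint32_t flash_bytes_programmed(uint32_t host_bytes, uint32_t page_bytes) {
    uint32_t pages = (host_bytes + page_bytes - 1) / page_bytes;  /* round up */
    return pages * page_bytes;
}

/* Write amplification factor multiplied by 100 (fixed-point, no floats).
   host_bytes must be non-zero. */
uint32_t write_amp_x100(uint32_t host_bytes, uint32_t page_bytes) {
    return flash_bytes_programmed(host_bytes, page_bytes) * 100u / host_bytes;
}
```

Under these assumptions a 512-byte write to a device with 16 KB pages gives `write_amp_x100(512, 16384) == 3200`, an amplification of 32x, while a page-sized write gives 100 (1x). Batching small log records into larger writes is therefore a cheap way to reduce wear.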

                    

EEPROM (Non-Volatile Data)

Electrically Erasable Programmable ROM stores configuration data that must survive power cycles. Unlike Flash memory which must be erased in large sectors (typically 4KB–128KB), EEPROM allows byte-level erase and write operations, making it ideal for storing small configuration values that change frequently during the device’s lifetime.

Byte-erasable: Can erase and rewrite individual bytes without affecting neighboring data
Slower writes: Typically 3–10 ms per byte (vs nanosecond reads)
Higher endurance: Often 1 million write cycles (vs 10K–100K for Flash)
Small capacity: Typically 256 bytes to 16KB when built into an MCU
Asymmetric speed: Reads are fast (nanoseconds), but writes require an internal charge pump and are orders of magnitude slower
Use cases: Calibration data, user settings, device IDs, boot configuration, error logs, counters

                        
                        How EEPROM Works Internally: EEPROM cells use floating-gate MOSFETs—the same transistor technology as Flash. Each cell stores a bit by trapping electrons on an insulated floating gate. To program a cell, a high voltage (~12–20V generated by an on-chip charge pump) forces electrons through a thin oxide layer via Fowler–Nordheim tunneling. To erase, the voltage is reversed to remove electrons. The key difference from Flash: EEPROM adds a select transistor per cell, enabling byte-level operations at the cost of larger cell size and lower density.
                    

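// Writing to internal EEPROM (STM32L0/L1 series)
#include "stm32l0xx_hal.h"

#define EEPROM_BASE  0x08080000
#define EEPROM_END   0x080807FF  // 2KB EEPROM on STM32L053

// address is a byte offset from EEPROM_BASE and must be halfword-aligned
HAL_StatusTypeDef eeprom_write_halfword(uint32_t address, uint16_t value) {
    if ((address & 1u) || (EEPROM_BASE + address + 1u > EEPROM_END))
        return HAL_ERROR;  // Reject unaligned or out-of-range offsets

    HAL_FLASHEx_DATAEEPROM_Unlock();
    HAL_StatusTypeDef status = HAL_FLASHEx_DATAEEPROM_Program(
        FLASH_TYPEPROGRAMDATA_HALFWORD,
        EEPROM_BASE + address,
        value
    );
    HAL_FLASHEx_DATAEEPROM_Lock();
    return status;  // Let the caller detect a failed write
}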

// Reading from internal EEPROM (direct memory-mapped access)
#include <stdint.h>

#define EEPROM_BASE  0x08080000

uint16_t eeprom_read_halfword(uint32_t address) {
    // EEPROM is memory-mapped: read it like any memory location.
    // volatile forces a real bus read on every call.
    return *(volatile uint16_t *)(EEPROM_BASE + address);
}

uint8_t eeprom_read_byte(uint32_t address) {
    return *(volatile uint8_t *)(EEPROM_BASE + address);
}

// Usage: restore calibration after power cycle (call from init code)
void restore_settings(uint16_t *saved_cal, uint8_t *device_id) {
    *saved_cal = eeprom_read_halfword(0x0000);
    *device_id = eeprom_read_byte(0x0010);
}

Internal vs External EEPROM

EEPROM comes in two forms: internal EEPROM built into the MCU die, and external EEPROM chips connected via I2C or SPI. The choice depends on capacity requirements, MCU availability, and system constraints.

Internal vs External EEPROM Comparison

Feature	Internal EEPROM	External EEPROM (I2C/SPI)
Capacity	256B – 16KB (MCU-dependent)	1KB – 2MB (e.g., AT24C256 = 32KB)
Access	Memory-mapped (direct read)	Bus protocol (I2C/SPI commands)
Read speed	Single clock cycle	Limited by bus speed (100–400 kHz I2C, MHz SPI)
Write speed	~3–5 ms per byte	~5–10 ms per page write
Endurance	100K – 1M cycles	1M – 4M cycles
Extra pins	None	SDA + SCL (I2C) or MOSI/MISO/SCK/CS (SPI)
Example parts	STM32L0 (2KB), AVR ATmega328 (1KB)	AT24C256 (I2C), 25LC256 (SPI), CAT24C512
Best for	Small configs, device IDs, flags	Large data logs, firmware backup, calibration tables

// Reading from external I2C EEPROM (AT24C256, 32KB)
// Using STM32 HAL — each snippet is self-contained
#include "stm32f4xx_hal.h"

#define EEPROM_I2C_ADDR  0xA0  // 7-bit: 0x50 left-shifted
#define EEPROM_PAGE_SIZE 64    // AT24C256 page size

extern I2C_HandleTypeDef hi2c1;

// Read N bytes from external EEPROM
HAL_StatusTypeDef eeprom_i2c_read(uint16_t mem_addr, uint8_t *data, uint16_t len) {
    return HAL_I2C_Mem_Read(
        &hi2c1,
        EEPROM_I2C_ADDR,
        mem_addr,
        I2C_MEMADD_SIZE_16BIT,  // AT24C256 uses 16-bit addresses
        data,
        len,
        HAL_MAX_DELAY
    );
}

// Writing to external I2C EEPROM (AT24C256)
// Must respect page boundaries and write cycle time
#include "stm32f4xx_hal.h"

#define EEPROM_I2C_ADDR  0xA0
#define EEPROM_PAGE_SIZE 64
#define EEPROM_WRITE_TIME 5  // 5ms max write cycle time

extern I2C_HandleTypeDef hi2c1;

// Write a single page (up to 64 bytes) — must not cross page boundary
HAL_StatusTypeDef eeprom_i2c_write_page(uint16_t mem_addr, uint8_t *data, uint16_t len) {
    HAL_StatusTypeDef status;

    status = HAL_I2C_Mem_Write(
        &hi2c1,
        EEPROM_I2C_ADDR,
        mem_addr,
        I2C_MEMADD_SIZE_16BIT,
        data,
        len,
        HAL_MAX_DELAY
    );

    // Wait for write cycle to complete
    HAL_Delay(EEPROM_WRITE_TIME);
    return status;
}

// Write arbitrary length — handles page boundary crossing
HAL_StatusTypeDef eeprom_i2c_write(uint16_t addr, uint8_t *data, uint16_t len) {
    while (len > 0) {
        uint16_t page_offset = addr % EEPROM_PAGE_SIZE;
        uint16_t bytes_in_page = EEPROM_PAGE_SIZE - page_offset;
        uint16_t chunk = (len < bytes_in_page) ? len : bytes_in_page;

        if (eeprom_i2c_write_page(addr, data, chunk) != HAL_OK)
            return HAL_ERROR;

        addr += chunk;
        data += chunk;
        len  -= chunk;
    }
    return HAL_OK;
}

                        
                        EEPROM Write Pitfalls:
                        Page boundary crossing: Writing across a page boundary wraps around and overwrites the beginning of the current page—always check alignment
Write cycle delay: The EEPROM is unresponsive for 5–10 ms after each write—poll the ACK bit or use a fixed delay
Power loss during write: Can corrupt the byte being written—consider CRC checksums for critical data
Endurance limit: 1M cycles sounds high, but a sensor logging to the same address every second exhausts it in ~11.5 days
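The endurance figure in the last pitfall is simple arithmetic, sketched below so it can be rerun for other write rates. The function names are illustrative, and the model assumes writes rotate evenly over `slots` locations (`slots = 1` means every write hits the same cell).

```c
#include <stdint.h>

/* Seconds until the rated endurance is used up when writes rotate
   evenly over `slots` locations. */
uint64_t eeprom_lifetime_s(uint32_t endurance_cycles, uint32_t slots,
                           uint32_t write_interval_s) {
    return (uint64_t)endurance_cycles * slots * write_interval_s;
}

/* The same figure in whole days (fractions are truncated). */
uint32_t eeprom_lifetime_days(uint32_t endurance_cycles, uint32_t slots,
                              uint32_t write_interval_s) {
    return (uint32_t)(eeprom_lifetime_s(endurance_cycles, slots,
                                        write_interval_s) / 86400u);
}
```

With 1,000,000 cycles and one write per second to a single location, `eeprom_lifetime_days(1000000, 1, 1)` returns 11 (11.57 days before truncation). Logging once a minute instead stretches that to 694 days, and rotating over 64 locations at one write per second gives 740 days.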

                    

EEPROM Wear-Leveling & Best Practices

EEPROM cells degrade with each write cycle. Wear leveling distributes writes across multiple addresses to extend the effective lifetime of the memory. This is especially critical for data that updates frequently—counters, timestamps, and sensor logs.

Round-robin writing: Rotate writes across N slots—multiplies effective endurance by N
Write-back caching: Accumulate changes in SRAM and flush to EEPROM periodically (e.g., on shutdown or every N minutes)
Dirty flag pattern: Use a single status byte to track which EEPROM block holds the latest data
CRC/checksum validation: Store a CRC alongside data to detect corruption from incomplete writes or cell degradation
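The CRC validation practice from the list above can be sketched as follows. The record layout (`cal_record_t`) and helper names are hypothetical; the checksum is CRC-16/CCITT-FALSE, chosen here only because it is small and widely used, not because the source prescribes it. The record would be sealed in SRAM before being written to EEPROM and validated after being read back.

```c
#include <stdint.h>
#include <stddef.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical record layout: payload followed by its CRC. */
typedef struct {
    uint8_t  payload[6];
    uint16_t crc;
} cal_record_t;

/* Compute and store the CRC before the record is written to EEPROM. */
void record_seal(cal_record_t *r) {
    r->crc = crc16_ccitt(r->payload, sizeof r->payload);
}

/* After reading the record back, returns 1 if the stored CRC matches. */
int record_is_valid(const cal_record_t *r) {
    return r->crc == crc16_ccitt(r->payload, sizeof r->payload);
}
```

If a power loss interrupts a write and leaves the payload partly updated, `record_is_valid` returns 0 and the firmware can fall back to defaults or to an older copy.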

// Simple round-robin wear-leveling for a 16-bit counter
// Spreads writes across 64 EEPROM slots to extend lifetime 64x
#include <stdint.h>
#include <string.h>

#define EEPROM_BASE       0x08080000
#define SLOT_COUNT        64
#define SLOT_SIZE         4   // 2 bytes data + 2 bytes sequence number
#define WEAR_LEVEL_START  0x0100  // Offset in EEPROM for this data

typedef struct {
    uint16_t value;
    uint16_t seq_num;  // Incrementing sequence to find latest
} eeprom_slot_t;

// Find the slot with the highest sequence number (latest write)
uint8_t find_latest_slot(void) {
    uint16_t max_seq = 0;
    uint8_t  latest  = 0;

    for (uint8_t i = 0; i < SLOT_COUNT; i++) {
        uint32_t addr = EEPROM_BASE + WEAR_LEVEL_START + (i * SLOT_SIZE);
        eeprom_slot_t slot;
        memcpy(&slot, (void *)addr, sizeof(slot));

        if (slot.seq_num >= max_seq && slot.seq_num != 0xFFFF) {
            max_seq = slot.seq_num;
            latest  = i;
        }
    }
    return latest;
}

// Write new value to the next slot in rotation
void wear_level_write(uint16_t new_value) {
    uint8_t latest = find_latest_slot();
    uint32_t addr  = EEPROM_BASE + WEAR_LEVEL_START + (latest * SLOT_SIZE);
    eeprom_slot_t old_slot;
    memcpy(&old_slot, (void *)addr, sizeof(old_slot));

    uint8_t next = (latest + 1) % SLOT_COUNT;
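    eeprom_slot_t new_slot = {
        .value   = new_value,
        // The 16-bit sequence wraps after 65,535 writes, and
        // find_latest_slot() above does not handle that wrap
        .seq_num = (uint16_t)(old_slot.seq_num + 1)
    };

    uint32_t new_addr = EEPROM_BASE + WEAR_LEVEL_START + (next * SLOT_SIZE);

    // Program the slot as one 32-bit word. This assumes an STM32L0/L1 target
    // and its HAL data-EEPROM API (the device HAL header must be included).
    uint32_t word;
    memcpy(&word, &new_slot, sizeof(word));
    HAL_FLASHEx_DATAEEPROM_Unlock();
    HAL_FLASHEx_DATAEEPROM_Program(FLASH_TYPEPROGRAMDATA_WORD, new_addr, word);
    HAL_FLASHEx_DATAEEPROM_Lock();
}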
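A 16-bit sequence number eventually wraps, at which point a plain "highest number wins" search picks the wrong slot. One common remedy is serial-number comparison, sketched below against a RAM array so it can run on a host PC. All names here (`wl_slot_t`, `wl_write`, and so on) are illustrative, and the approach assumes the live sequence numbers never span more than half the 16-bit range, which holds for any realistic slot count.

```c
#include <stdint.h>
#include <string.h>

#define SEQ_ERASED 0xFFFFu  // Sequence value of an erased (never written) slot

typedef struct {
    uint16_t value;
    uint16_t seq_num;
} wl_slot_t;

// a is newer than b if it is ahead by less than half the 16-bit range
int seq_is_newer(uint16_t a, uint16_t b) {
    uint16_t diff = (uint16_t)(a - b);
    return diff != 0 && diff < 0x8000u;
}

// Next sequence number, skipping the erased marker
uint16_t seq_next(uint16_t seq) {
    uint16_t next = (uint16_t)(seq + 1u);
    return (next == SEQ_ERASED) ? 0u : next;
}

// Put every slot in the erased state (all bits set)
void wl_format(wl_slot_t *slots, int count) {
    memset(slots, 0xFF, (size_t)count * sizeof(slots[0]));
}

// Index of the most recently written slot, or -1 if all slots are erased
int wl_find_latest(const wl_slot_t *slots, int count) {
    int latest = -1;
    for (int i = 0; i < count; i++) {
        if (slots[i].seq_num == SEQ_ERASED) continue;
        if (latest < 0 || seq_is_newer(slots[i].seq_num, slots[latest].seq_num)) {
            latest = i;
        }
    }
    return latest;
}

// Write the value to the slot after the latest one
void wl_write(wl_slot_t *slots, int count, uint16_t value) {
    int latest = wl_find_latest(slots, count);
    int next = (latest < 0) ? 0 : (latest + 1) % count;
    slots[next].seq_num = (latest < 0) ? 0u : seq_next(slots[latest].seq_num);
    slots[next].value = value;
}

// Read the latest value; returns 0 if nothing has been written yet
int wl_read(const wl_slot_t *slots, int count, uint16_t *out) {
    int latest = wl_find_latest(slots, count);
    if (latest < 0) return 0;
    *out = slots[latest].value;
    return 1;
}
```

On real hardware the array accesses would be replaced by EEPROM reads and writes; the comparison logic stays the same.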

Flash-Emulated EEPROM

Many modern MCUs (e.g., STM32F4, STM32G4, ESP32) don’t include dedicated EEPROM. Instead, they use a software technique called Flash-emulated EEPROM (or EEPROM emulation) that reserves one or two Flash sectors to simulate byte-level EEPROM operations on top of sector-erasable Flash.

                        
                        How Flash EEPROM Emulation Works:
                        Two-page scheme: Two Flash sectors alternate roles—one is “active” (current data), the other is “receiving” (for compaction)
Virtual addresses: Each stored variable is tagged with a 16-bit virtual address + 16-bit data pair
Append-only writes: New values are appended to the active page (Flash programming can only change bits from 1 to 0; returning them to 1 requires erasing the whole sector)
Page transfer: When the active page fills up, only the latest value for each virtual address is copied to the receiving page, then the old page is erased
Wear distribution: The two-page swap naturally distributes erases across both sectors
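The scheme above can be modeled on a host PC with two small RAM arrays standing in for Flash sectors. This is a simplified sketch with made-up names (`ee_write`, `ee_read`, `page_transfer`); it leaves out the page-status headers and power-loss recovery that a production library needs.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_ENTRIES 8            // Tiny page so a transfer is easy to trigger
#define ERASED       0xFFFFFFFFu  // Erased Flash reads as all ones

static uint32_t pages[2][PAGE_ENTRIES];  // RAM stand-in for two Flash sectors
static int      active;                  // Index of the active page
static unsigned erase_count[2];          // Erases per page, for inspection

static void page_erase(int p) {
    memset(pages[p], 0xFF, sizeof(pages[p]));
    erase_count[p]++;
}

void ee_format(void) {
    page_erase(0);
    page_erase(1);
    active = 0;
    erase_count[0] = erase_count[1] = 0;
}

// Append (virtual address, data) to the first erased entry; 0 if the page is full.
// Virtual address 0xFFFF is reserved because it would look like an erased entry.
static int page_append(int p, uint16_t virt, uint16_t data) {
    for (int i = 0; i < PAGE_ENTRIES; i++) {
        if (pages[p][i] == ERASED) {
            pages[p][i] = ((uint32_t)virt << 16) | data;
            return 1;
        }
    }
    return 0;
}

static int page_contains(int p, uint16_t virt) {
    for (int i = 0; i < PAGE_ENTRIES; i++) {
        if (pages[p][i] != ERASED && (uint16_t)(pages[p][i] >> 16) == virt) return 1;
    }
    return 0;
}

// Copy only the newest value of each variable, then erase the old page
static void page_transfer(void) {
    int recv = 1 - active;
    for (int i = PAGE_ENTRIES - 1; i >= 0; i--) {   // Newest entries first
        uint32_t e = pages[active][i];
        if (e == ERASED) continue;
        uint16_t virt = (uint16_t)(e >> 16);
        if (!page_contains(recv, virt)) page_append(recv, virt, (uint16_t)e);
    }
    page_erase(active);
    active = recv;
}

// Returns 0 only if the page is full of distinct variables
int ee_write(uint16_t virt, uint16_t data) {
    if (page_append(active, virt, data)) return 1;
    page_transfer();
    return page_append(active, virt, data);
}

// The newest entry wins, so scan the active page backward
int ee_read(uint16_t virt, uint16_t *out) {
    for (int i = PAGE_ENTRIES - 1; i >= 0; i--) {
        uint32_t e = pages[active][i];
        if (e != ERASED && (uint16_t)(e >> 16) == virt) {
            *out = (uint16_t)e;
            return 1;
        }
    }
    return 0;
}
```

Writing one variable repeatedly fills the active page with stale copies; each transfer keeps a single entry per variable, so the erases alternate between the two pages.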

                    

EEPROM vs Flash-Emulated EEPROM

Comparison

Feature	True EEPROM	Flash-Emulated EEPROM
Erase granularity	Single byte	Full sector (4KB–128KB)
Write endurance	1M cycles per byte	10K–100K cycles per sector
Implementation	Hardware (dedicated silicon)	Software (driver + Flash sectors)
Flash overhead	None	Reserves 2 Flash sectors (e.g., 32KB)
Complexity	Simple read/write API	Requires emulation library / driver
Available on	STM32L0/L1, AVR, PIC	STM32F4, STM32G4, ESP32, nRF52

// Flash-emulated EEPROM using STM32 EEPROM Emulation library (AN4894)
// Requires linking ST's eeprom emulation middleware
#include "eeprom_emul.h"

#define VIRT_ADDR_CALIBRATION  0x0001
#define VIRT_ADDR_DEVICE_ID    0x0002
#define VIRT_ADDR_BOOT_COUNT   0x0003

void eeprom_emul_example(void) {
    // Initialize emulation (formats Flash pages on first run)
    EE_Status status = EE_Init(EE_FORCED_ERASE);
    if (status != EE_OK) {
        // Handle initialization error
        return;
    }

    // Write a virtual variable
    uint16_t cal_value = 0x1234;
    EE_WriteVariable16bits(VIRT_ADDR_CALIBRATION, cal_value);

    // Read it back
    uint16_t read_value = 0;
    EE_ReadVariable16bits(VIRT_ADDR_CALIBRATION, &read_value);
    // read_value == 0x1234

    // Increment boot counter
    uint16_t boot_count = 0;
    EE_ReadVariable16bits(VIRT_ADDR_BOOT_COUNT, &boot_count);
    EE_WriteVariable16bits(VIRT_ADDR_BOOT_COUNT, boot_count + 1);
}

Memory Map & Addressing

A memory map defines how the processor sees all addressable resources—memory, peripherals, and system registers.

Typical ARM Cortex-M Memory Map

32-bit Address Space 4GB Total

Address Range        | Region
---------------------|---------------------------
0x00000000-0x1FFFFFFF | Code (Flash)
0x20000000-0x3FFFFFFF | SRAM
0x40000000-0x5FFFFFFF | Peripherals
0x60000000-0x9FFFFFFF | External RAM
0xA0000000-0xDFFFFFFF | External Device
0xE0000000-0xE00FFFFF | Private Peripheral Bus (NVIC, SysTick)
0xE0100000-0xFFFFFFFF | Vendor-specific
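To make the map concrete, the sketch below classifies an address into one of the regions listed above. The enum and function names are illustrative; the boundaries are taken directly from the table.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_EXTERNAL_RAM,
    REGION_EXTERNAL_DEVICE,
    REGION_PPB,
    REGION_VENDOR
} mem_region_t;

// Map a 32-bit address to its Cortex-M memory region
mem_region_t classify_address(uint32_t addr) {
    if (addr <= 0x1FFFFFFFu) return REGION_CODE;
    if (addr <= 0x3FFFFFFFu) return REGION_SRAM;
    if (addr <= 0x5FFFFFFFu) return REGION_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return REGION_EXTERNAL_RAM;
    if (addr <= 0xDFFFFFFFu) return REGION_EXTERNAL_DEVICE;
    if (addr <= 0xE00FFFFFu) return REGION_PPB;
    return REGION_VENDOR;
}

// Human-readable name for a region, e.g. for debug logging
const char *region_name(mem_region_t region) {
    switch (region) {
        case REGION_CODE:            return "Code (Flash)";
        case REGION_SRAM:            return "SRAM";
        case REGION_PERIPHERAL:      return "Peripherals";
        case REGION_EXTERNAL_RAM:    return "External RAM";
        case REGION_EXTERNAL_DEVICE: return "External Device";
        case REGION_PPB:             return "Private Peripheral Bus";
        default:                     return "Vendor-specific";
    }
}
```

For example, 0x08000000 (where STM32 parts place their Flash) falls in the Code region, and 0xE000E100 (the first NVIC enable register) falls in the Private Peripheral Bus.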

Interrupts & Exception Handling

Interrupt Basics

Interrupts allow the processor to respond to events asynchronously—without continuously polling for them.

Flowchart showing the interrupt handling process from event trigger through ISR execution and context restoration — Interrupt handling flow: an event triggers the interrupt controller, the CPU saves context, executes the ISR, then restores context

                        
                        Interrupt Flow:
                        Event occurs (button press, timer overflow, data received)
Hardware signals interrupt request to CPU
CPU completes current instruction
CPU saves context (registers, PC) to stack
CPU jumps to Interrupt Service Routine (ISR)
ISR executes and returns
CPU restores context and resumes main code

                    

NVIC (Nested Vectored Interrupt Controller)

ARM Cortex-M processors include the NVIC—a powerful interrupt controller with these features:

Vectored: Each interrupt has a dedicated handler address
Nested: Higher priority interrupts can preempt lower priority ones
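Programmable priorities: Up to 256 priority levels in an 8-bit field, although many devices implement only some of the bits (e.g., 4 bits for 16 levels on STM32F4); a lower number means higher priority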
Low latency: Tail-chaining and late arrival optimization

// Enable and configure NVIC interrupt
void setup_interrupt(void) {
    // Set priority (0 = highest, 15 = lowest for 4-bit priority)
    NVIC_SetPriority(EXTI0_IRQn, 2);
    
    // Enable the interrupt
    NVIC_EnableIRQ(EXTI0_IRQn);
}

// Interrupt Service Routine
void EXTI0_IRQHandler(void) {
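    // Clear the pending flag first. PR is write-1-to-clear, so write only
    // this bit; a read-modify-write (|=) would also clear other pending lines
    EXTI->PR = EXTI_PR_PR0;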
    
    // Handle the interrupt
    toggle_led();
}
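The priority passed to NVIC_SetPriority is stored in the upper bits of an 8-bit register, which is why a device with 4 priority bits accepts values 0 to 15. The host-side sketch below mirrors that encoding. The function names are illustrative, and the preemption check assumes all implemented bits are used as preemption priority (no sub-priority grouping).

```c
#include <stdint.h>

// Encode a logical priority into the 8-bit register value for a core
// that implements prio_bits priority bits (1..8)
uint8_t nvic_encode_priority(uint32_t priority, uint32_t prio_bits) {
    return (uint8_t)((priority << (8u - prio_bits)) & 0xFFu);
}

// Returns 1 if an interrupt at priority `incoming` may preempt a handler
// running at priority `running` (lower number = higher priority)
int nvic_can_preempt(uint32_t incoming, uint32_t running) {
    return incoming < running;
}
```

On a 4-bit device, priority 2 is stored as 0x20 and the lowest priority, 15, as 0xF0. An interrupt of equal priority does not preempt; it waits until the running handler returns.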

ISR Design Best Practices

                        
                        Golden Rules for ISRs:
                        Keep it short: Do minimal work, defer processing to main loop
Clear flags early: Prevent re-triggering
Use volatile: For variables shared with main code
Avoid blocking: Never use delays or wait loops
Be reentrant: Don't use non-reentrant library functions

                    

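// Good ISR pattern
volatile uint8_t data_ready = 0;
volatile uint8_t rx_buffer[64];
volatile uint8_t rx_index = 0;

void USART1_IRQHandler(void) {
    if (USART1->SR & USART_SR_RXNE) {
        uint8_t byte = USART1->DR; // Reading DR clears RXNE
        if (rx_index < sizeof(rx_buffer)) {
            rx_buffer[rx_index++] = byte; // Drop bytes rather than overflow
        }
        data_ready = 1; // Signal main loop
    }
}

// Main loop processes data
// (process_data() is assumed to consume the bytes and reset rx_index)
int main(void) {
    while (1) {
        if (data_ready) {
            // Clear the flag first so a byte arriving during processing sets it again
            data_ready = 0;
            process_data(rx_buffer);
        }
    }
}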
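A single flag and a linear buffer work for simple cases, but a ring buffer lets the ISR keep receiving while the main loop drains earlier bytes. The following is a minimal single-producer, single-consumer sketch; the type and function names are illustrative. It relies on the ISR being the only writer of `head` and the main loop being the only writer of `tail`, and on aligned 16-bit accesses being atomic, which holds on a single-core Cortex-M.

```c
#include <stdint.h>

#define RB_SIZE 64u  // Must be a power of two; usable capacity is RB_SIZE - 1

typedef struct {
    volatile uint8_t  buf[RB_SIZE];
    volatile uint16_t head;  // Written only by the producer (ISR)
    volatile uint16_t tail;  // Written only by the consumer (main loop)
} ring_buffer_t;

// Producer side: call from the ISR. Returns 0 if the buffer is full.
int rb_put(ring_buffer_t *rb, uint8_t byte) {
    uint16_t next = (uint16_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail) return 0;  // Full: drop the byte
    rb->buf[rb->head] = byte;
    rb->head = next;                 // Publish only after the data is stored
    return 1;
}

// Consumer side: call from the main loop. Returns 0 if the buffer is empty.
int rb_get(ring_buffer_t *rb, uint8_t *out) {
    if (rb->tail == rb->head) return 0;  // Empty
    *out = rb->buf[rb->tail];
    rb->tail = (uint16_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    return 1;
}
```

In the UART example, the ISR would call `rb_put()` with the received byte, and the main loop would call `rb_get()` until it returns 0.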

Real-Time Constraints

Hard vs Soft Real-Time

Real-time systems must respond to events within specified time constraints. The consequences of missing deadlines distinguish two categories:

Hard Real-Time

Critical Deadlines

Missing a deadline = system failure. Consequences can be catastrophic.

Automotive airbag deployment (must deploy within milliseconds)
Aircraft flight control systems
Industrial robot arm positioning
Pacemaker pulse timing

Soft Real-Time

Flexible Deadlines

Missing deadlines degrades quality but doesn't cause failure.

Video streaming (dropped frames = quality loss)
Audio processing (glitches are annoying, not fatal)
User interface responsiveness
Network packet processing

Timing Analysis

Key timing metrics for embedded systems:

Interrupt latency: Time from interrupt signal to ISR start (12 cycles on Cortex-M3/M4 and 15-16 cycles on Cortex-M0+/M0, assuming zero-wait-state memory)
Response time: Total time from event to system response
Jitter: Variation in response time
WCET (Worst-Case Execution Time): Maximum time a code section can take

// Measuring execution time
#include "stm32f4xx.h"

void measure_timing(void) {
    // Enable DWT cycle counter
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
    
    uint32_t start = DWT->CYCCNT;
    
    // Code to measure
    critical_function();
    
    uint32_t cycles = DWT->CYCCNT - start;
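    // Convert in floating point so clocks that are not a whole number of MHz
    // are not truncated by integer division
    float time_us = (float)cycles * 1000000.0f / (float)SystemCoreClock;
    (void)time_us; // Log or store the result as needed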
}
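Once cycle counts have been collected over many runs, the metrics above can be summarized on the target or offline. This sketch uses illustrative names and plain arrays. Note that the largest measured value is only an observed worst case; it is a lower bound on the true WCET, not a proof of it.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t min_cycles;
    uint32_t max_cycles;     // Observed worst case (not a guaranteed WCET)
    uint32_t jitter_cycles;  // max - min
} timing_stats_t;

// Elapsed cycles between two counter readings; unsigned subtraction
// stays correct across a single 32-bit counter wrap
uint32_t cycles_elapsed(uint32_t start, uint32_t end) {
    return end - start;
}

// Summarize a set of measurements
timing_stats_t timing_summarize(const uint32_t *samples, size_t count) {
    timing_stats_t stats = {0, 0, 0};
    if (count == 0) return stats;
    stats.min_cycles = samples[0];
    stats.max_cycles = samples[0];
    for (size_t i = 1; i < count; i++) {
        if (samples[i] < stats.min_cycles) stats.min_cycles = samples[i];
        if (samples[i] > stats.max_cycles) stats.max_cycles = samples[i];
    }
    stats.jitter_cycles = stats.max_cycles - stats.min_cycles;
    return stats;
}

// Convert cycles to microseconds for a given core clock
double cycles_to_us(uint32_t cycles, uint32_t core_clock_hz) {
    return (double)cycles * 1e6 / (double)core_clock_hz;
}
```

For instance, samples of 120, 118, 131, and 119 cycles give a jitter of 13 cycles, and 168 cycles at 168 MHz is 1 microsecond.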

Determinism in Embedded Systems

Deterministic behavior means the system's response time is predictable and bounded.

                        
                        Achieving Determinism:
                        Avoid dynamic memory allocation (malloc/free have variable timing)
Use fixed iteration counts in loops
Disable interrupts for critical sections (minimize duration)
Use priority-based scheduling with bounded priorities
Avoid caches or understand their behavior
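One common substitute for malloc/free is a fixed-block memory pool, where allocation and release each take a constant number of steps. The sketch below is a minimal version with illustrative names and sizes. It is not interrupt-safe as written; sharing it with an ISR would need a short critical section around the list updates.

```c
#include <stdint.h>
#include <stddef.h>

#define POOL_BLOCK_SIZE  32u
#define POOL_BLOCK_COUNT 8u

// A free block stores the link to the next free block in its own memory
typedef union pool_block {
    union pool_block *next;
    uint8_t           data[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t  pool_storage[POOL_BLOCK_COUNT];  // Statically allocated
static pool_block_t *pool_free_list;

// Build the free list; the loop runs a fixed number of iterations
void pool_init(void) {
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

// O(1): pop the head of the free list, or return NULL if exhausted
void *pool_alloc(void) {
    pool_block_t *block = pool_free_list;
    if (block != NULL) pool_free_list = block->next;
    return block;
}

// O(1): push the block back onto the free list
void pool_free(void *ptr) {
    if (ptr == NULL) return;
    pool_block_t *block = (pool_block_t *)ptr;
    block->next = pool_free_list;
    pool_free_list = block;
}
```

Because every block has the same size, the pool cannot fragment, and running out of memory shows up as a NULL return at a known point rather than as unpredictable allocator latency.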

                    

Development Tools & Workflow

Toolchains & Cross-Compilation

Embedded development uses cross-compilation—compiling on one platform (host PC) for another (target MCU).

ARM GCC Toolchain Components

arm-none-eabi-gcc

arm-none-eabi-gcc: C/C++ compiler
arm-none-eabi-as: Assembler
arm-none-eabi-ld: Linker
arm-none-eabi-objcopy: Convert ELF to binary/hex
arm-none-eabi-gdb: Debugger
arm-none-eabi-size: Show memory usage

# Basic compilation command
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 \
    -c main.c -o main.o

# Linking with linker script
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    -T linker.ld -o firmware.elf main.o startup.o

# Convert to binary for flashing
arm-none-eabi-objcopy -O binary firmware.elf firmware.bin

# Check memory usage
arm-none-eabi-size firmware.elf

Debuggers & JTAG/SWD

Hardware debuggers connect your PC to the target MCU:

JTAG: Older standard, 4-5 wire, supports multiple devices in chain
SWD (Serial Wire Debug): ARM-specific, 2-wire, faster for single target

Popular Debug Probes

Hardware Debuggers

ST-Link: Bundled with STM32 development boards
J-Link: Professional grade, fast, feature-rich
Black Magic Probe: Open-source, GDB server built-in
DAPLink: Open-source, drag-and-drop programming

IDEs for Embedded Development

                        
                        Popular IDEs:
                        STM32CubeIDE: Free, Eclipse-based, excellent STM32 support
Keil MDK: Industry standard, ARM compiler, commercial
IAR Embedded Workbench: Professional, excellent optimization
PlatformIO: VS Code extension, multi-platform
Eclipse + GNU MCU: Free, open-source, flexible

                    

Bare-Metal Programming Basics

Bare-metal programming means writing code that runs directly on hardware without an operating system. Your code has complete control—and complete responsibility.

// Minimal bare-metal program structure
#include "stm32f4xx.h"

// Vector table (defined in startup code)
// Reset handler is the entry point

int main(void) {
    // 1. Initialize system clock
    SystemInit();
    
    // 2. Enable peripheral clocks
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    
    // 3. Configure peripherals
    GPIOA->MODER &= ~GPIO_MODER_MODER5;  // Clear both PA5 mode bits
    GPIOA->MODER |= GPIO_MODER_MODER5_0; // PA5 as general-purpose output (01)
    
    // 4. Main loop
    while (1) {
        GPIOA->ODR ^= GPIO_ODR_OD5; // Toggle LED
        for (volatile int i = 0; i < 100000; i++); // Delay
    }
}

                        
                        Bare-Metal Essentials:
                        Startup code: Initializes stack, copies data, zeroes BSS
Linker script: Defines memory layout (Flash, RAM regions)
Vector table: Array of function pointers for exception handlers
System initialization: Clock configuration, peripheral enables
Super loop: Main while(1) loop with polling or interrupt-driven logic
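The startup-code item above (copy data, zero BSS) can be sketched as two small routines. On a real target the pointers come from linker-script symbols; the names in the comment (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`) follow a convention used in many GCC linker scripts and are an assumption here, so check your own script.

```c
#include <stdint.h>

// Copy initialized variables from their load address (Flash)
// to their run address (RAM), one word at a time
void startup_copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end) {
    while (start < end) {
        *start++ = *load++;
    }
}

// Zero the BSS section so uninitialized globals start at 0
void startup_zero_bss(uint32_t *start, const uint32_t *end) {
    while (start < end) {
        *start++ = 0;
    }
}

// On a target, a reset handler would typically do something like:
//   startup_copy_data(&_sidata, &_sdata, &_edata);
//   startup_zero_bss(&_sbss, &_ebss);
//   SystemInit();
//   main();
```

Because the routines take plain pointers, they can be exercised on a host PC with ordinary arrays before being wired to the linker symbols.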

                    

Conclusion & What's Next

You've now built a solid foundation in embedded systems fundamentals. You understand the key differences between microcontrollers and microprocessors, how Harvard and Von Neumann architectures work, the memory types available in MCUs, interrupt handling patterns, and real-time system requirements.

                        
                        Key Takeaways:
                        Microcontrollers are self-contained systems; microprocessors need external support
Harvard architecture enables parallel instruction/data access
Flash stores code, SRAM stores runtime data, EEPROM stores persistent configuration
Keep ISRs short, clear flags early, use volatile for shared variables
Hard real-time systems cannot miss deadlines; soft real-time tolerates some misses

                    

In Part 2, we'll dive deep into STM32 and ARM Cortex-M development—configuring peripherals, using HAL libraries, and building real embedded applications.
