Embedded Systems Series Part 1: Fundamentals & Architecture

January 25, 2026 Wasil Zafar 35 min read

Master the building blocks of embedded systems—microcontroller vs microprocessor architectures, memory types, Harvard vs Von Neumann, interrupts, and real-time constraints.

Table of Contents

  1. Introduction: What Are Embedded Systems?
  2. Microcontroller vs Microprocessor
  3. Processor Architectures
  4. Memory Types in Embedded Systems
  5. Interrupts & Exception Handling
  6. Real-Time Constraints
  7. Development Tools & Workflow
  8. Bare-Metal Programming Basics
  9. Conclusion & Next Steps

Introduction: What Are Embedded Systems?

Series Navigation: This is Part 1 of the 12-part Embedded Systems Series. Start here to build a solid foundation in embedded development.

An embedded system is a computer system designed to perform dedicated functions within a larger mechanical or electronic system. Unlike general-purpose computers, embedded systems are optimized for specific tasks—from controlling your car's engine to managing a smart thermostat.

Embedded systems are everywhere: your microwave, washing machine, smartphone, car (a modern car contains 50 to 100 or more embedded controllers), medical devices, industrial robots, and aircraft autopilots. The global embedded systems market is commonly estimated at more than $100 billion annually, making this one of the most important domains in computing.

Key Characteristics of Embedded Systems:
  • Task-specific: Designed for one or a few dedicated functions
  • Resource-constrained: Limited memory, processing power, and energy
  • Real-time requirements: Must respond within strict timing deadlines
  • Reliability: Must operate continuously without crashes or reboots
  • Cost-sensitive: Often produced in high volumes with tight margins

Microcontroller vs Microprocessor

Understanding the difference between microcontrollers (MCUs) and microprocessors (MPUs) is fundamental to embedded systems design. Both are integrated circuits that execute instructions, but their architecture and use cases differ significantly.

Microcontroller Architecture

A microcontroller is a self-contained "system on a chip" (SoC) that integrates:

  • CPU core: The processing unit (e.g., ARM Cortex-M, AVR, PIC)
  • Flash memory: Non-volatile storage for program code (typically 16KB to 2MB)
  • SRAM: Volatile memory for runtime data (typically 4KB to 512KB)
  • Peripherals: GPIO, timers, UART, SPI, I2C, ADC, DAC
  • Clock system: Internal oscillators and PLLs

Popular Microcontroller Families

  • STM32 (ARM Cortex-M): Industry standard, extensive ecosystem, 32-bit
  • ESP32: WiFi/Bluetooth built-in, great for IoT projects
  • ATmega328 (AVR): Powers Arduino Uno, beginner-friendly
  • PIC: Microchip's family, popular in industrial applications
  • Nordic nRF52: Low-power Bluetooth, wearables and sensors

Microprocessor Architecture

A microprocessor is primarily a CPU that depends on external components for memory and storage, and offers higher performance in return:

  • External RAM: DDR3/DDR4 memory (GBs of capacity)
  • External storage: eMMC, SD card, NVMe for OS and data
  • Support chips: Power management ICs and other board-level support circuitry
  • Higher performance: Higher clock speeds, caches, and an MMU for running a full OS

Popular Microprocessor Families

  • ARM Cortex-A (A53, A72, A78): Smartphones, tablets, Raspberry Pi
  • Qualcomm Snapdragon: Mobile SoCs with integrated GPU, DSP
  • Intel Atom: Low-power x86 for embedded PCs
  • RISC-V: Open-standard, royalty-free ISA gaining traction

When to Use Which

Choose a Microcontroller When:
  • Real-time response is critical (motor control, safety systems)
  • Power consumption must be minimal (battery-powered, always-on)
  • Cost per unit is a major concern (high-volume production)
  • Simple, dedicated functionality (sensors, actuators, basic UI)
  • Instant boot time required (no OS loading)
Choose a Microprocessor When:
  • Running a full operating system (Linux, Android)
  • Complex UI with graphics and touchscreen
  • Network connectivity with full TCP/IP stack
  • Multitasking with many simultaneous processes
  • Large data processing or storage requirements

Processor Architectures

Harvard vs Von Neumann Architecture

These two fundamental architectures define how processors access instructions and data.

Von Neumann Architecture

Shared Memory Single Bus

Key characteristic: Single memory space for both instructions and data.

  • Single bus: Instructions and data share the same memory bus
  • Bottleneck: CPU can't fetch instruction while accessing data (Von Neumann bottleneck)
  • Flexibility: Self-modifying code possible, simpler design
  • Examples: x86 processors, ARM Cortex-A (at memory level)

Harvard Architecture

Separate Memory Dual Bus

Key characteristic: Separate memory and buses for instructions and data.

  • Parallel access: Can fetch instruction while reading/writing data
  • Higher throughput: No bus contention, better performance
  • Common in MCUs: Flash for code, SRAM for data
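  • Examples: AVR, PIC, and ARM Cortex-M3/M4/M7 (separate instruction and data buses over one unified address space); Cortex-M0/M0+ use a single shared bus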

Modified Harvard: Many modern processors use a hybrid approach—separate L1 caches for instructions and data (Harvard-style), but unified main memory (Von Neumann-style).
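To make the bottleneck concrete, here is a deliberately simplified cycle model. It assumes every instruction needs one fetch, some instructions also need one data access, and each bus transfer takes exactly one cycle. The function names are illustrative, and the model ignores pipelines, caches, and wait states, so treat it as a back-of-the-envelope sketch rather than a performance predictor.

```c
#include <stdint.h>

/* Toy model: one cycle per bus transfer, no caches or wait states. */

/* Shared bus: instruction fetches and data accesses are serialized. */
uint32_t von_neumann_cycles(uint32_t instructions, uint32_t data_accesses)
{
    return instructions + data_accesses;
}

/* Separate buses: a data access overlaps with an instruction fetch,
 * so the busier of the two buses determines the total. */
uint32_t harvard_cycles(uint32_t instructions, uint32_t data_accesses)
{
    return (instructions > data_accesses) ? instructions : data_accesses;
}
```

Under these assumptions, a program with 1000 instructions of which 300 touch data memory needs 1300 bus cycles on the shared bus and 1000 with separate buses.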

RISC vs CISC

Instruction set architecture (ISA) philosophies differ in how they approach CPU design:

RISC (Reduced Instruction Set Computer)

Examples: ARM, RISC-V, MIPS
  • Simple instructions: Most execute in a single clock cycle
  • Load-store architecture: Only load/store instructions access memory
  • Many registers: Reduce memory access
  • Fixed instruction length: Easier pipelining (compressed encodings such as ARM Thumb-2 mix 16- and 32-bit instructions)
  • Compiler complexity: More instructions per task, simpler hardware

CISC (Complex Instruction Set Computer)

Examples: x86, x86-64
  • Complex instructions: Single instruction can do multiple operations
  • Memory operands: Instructions can operate directly on memory
  • Variable instruction length: More compact code
  • Fewer instructions: Hardware complexity trades for code density
  • Modern x86: CISC externally, RISC-like execution internally
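The load-store difference is easier to see with a single operation: adding a value to a variable in memory. The sketch below models both styles in plain C and counts the instruction-level steps. The pseudo-assembly in the comments is illustrative and not tied to any real instruction set.

```c
#include <stdint.h>

/* RISC style: memory is touched only by explicit load and store steps. */
uint32_t increment_load_store(uint32_t *mem, uint32_t amount)
{
    uint32_t steps = 0;
    uint32_t r0;

    r0 = *mem;         steps++;  /* LOAD  r0, [mem]       */
    r0 = r0 + amount;  steps++;  /* ADD   r0, r0, #amount */
    *mem = r0;         steps++;  /* STORE r0, [mem]       */
    return steps;
}

/* CISC style: one instruction reads, modifies, and writes memory. */
uint32_t increment_memory_operand(uint32_t *mem, uint32_t amount)
{
    *mem += amount;              /* ADD [mem], #amount    */
    return 1;
}
```

Both functions leave the same value in memory. The RISC version uses three simple steps, the CISC version one complex step that the hardware has to break down internally.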

ARM Architecture Overview

ARM (originally Acorn RISC Machine, later Advanced RISC Machines) dominates embedded systems. Understanding ARM's processor families is essential:

ARM Cortex Families:
  • Cortex-A (Application): High-performance, runs Linux/Android, MMU included. Examples: A53, A72, A78
  • Cortex-R (Real-time): Deterministic real-time, safety-critical systems. Examples: R4, R5, R52
  • Cortex-M (Microcontroller): Low-power, cost-effective, no MMU. Examples: M0, M3, M4, M7, M33

Memory Types in Embedded Systems

Flash Memory (Program Storage)

Flash memory stores your program code. It's non-volatile (retains data without power) and can be electrically erased and reprogrammed.

  • NOR Flash: Execute-in-place (XIP), random access, used for code storage
  • NAND Flash: Higher density, sequential access, used for mass storage
  • Endurance: Limited write cycles (typically 10,000-100,000)
  • Typical sizes: 16KB to 2MB in MCUs
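// Flash is normally treated as read-only at runtime
// (rewriting it goes through the MCU's flash programming interface)
// On most ARM MCUs, code executes directly from Flash (XIP)
#include <stdint.h>

const uint32_t lookup_table[] = {0, 1, 4, 9, 16, 25}; // const data: typically placed in Flash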
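The endurance figure matters as soon as you log data or store settings in Flash. Here is a rough lifetime estimate, assuming wear is spread evenly across a number of sectors (a very simple form of wear leveling). The function name and parameters are illustrative. Take the real endurance figure from your device's datasheet.

```c
#include <stdint.h>

/* Estimated days until a Flash region wears out.
 * endurance_cycles : rated erase/program cycles per sector
 * sectors          : sectors the writes are rotated across
 * erases_per_day   : sector erases the application performs per day */
uint32_t flash_lifetime_days(uint32_t endurance_cycles,
                             uint32_t sectors,
                             uint32_t erases_per_day)
{
    if (erases_per_day == 0) {
        return UINT32_MAX;  /* never erased: no wear */
    }
    uint64_t total_erases = (uint64_t)endurance_cycles * sectors;
    uint64_t days = total_erases / erases_per_day;
    return (days > UINT32_MAX) ? UINT32_MAX : (uint32_t)days;
}
```

With 10,000 rated cycles and one erase per hour, a single sector lasts about 416 days. Rotating across four sectors stretches that to about 1,666 days.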

SRAM (Data Memory)

Static RAM is fast, volatile memory used for runtime data—variables, stack, and heap.

  • Speed: Typically single-cycle (zero wait state) access at the core clock
  • Volatility: Loses data when power is removed
  • No refresh: Unlike DRAM, doesn't need periodic refresh
  • Typical sizes: 4KB to 512KB in MCUs
// SRAM stores runtime variables
#include <stdint.h>

uint32_t counter = 0;        // Global variable in SRAM
uint8_t buffer[256];         // Array in SRAM

void function(void) {
    uint32_t local_var = 10; // Stack (also SRAM)
}

EEPROM (Non-Volatile Data)

Electrically Erasable Programmable ROM stores configuration data that must survive power cycles.

  • Byte-erasable: Can erase single bytes (unlike Flash which erases sectors)
  • Slower writes: Milliseconds per byte
  • Higher endurance: Often 1 million write cycles
  • Use cases: Calibration data, user settings, device IDs
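// EEPROM for persistent configuration
// (STM32L0/L1 HAL shown; other families use different APIs)
#include "stm32l0xx_hal.h"      // or stm32l1xx_hal.h

#define EEPROM_BASE 0x08080000  // Start of data EEPROM on STM32L0/L1

HAL_StatusTypeDef save_calibration(uint16_t value) {
    HAL_StatusTypeDef status;

    // Unlock the data EEPROM, program one half-word, then re-lock
    HAL_FLASHEx_DATAEEPROM_Unlock();
    status = HAL_FLASHEx_DATAEEPROM_Program(
        FLASH_TYPEPROGRAMDATA_HALFWORD,
        EEPROM_BASE,
        value
    );
    HAL_FLASHEx_DATAEEPROM_Lock();  // Re-lock even if programming failed

    return status;                  // Let the caller detect a failed write
}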
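Persistent settings should be validated when they are read back, because a power loss in the middle of a write can leave a half-updated record. The sketch below stores a small settings record with a one-byte checksum. To keep it runnable on a PC, a RAM array stands in for the EEPROM. The struct fields, the function names, and the 0xA5 check value are all assumptions made for illustration, and a CRC would catch more error patterns than this simple sum.

```c
#include <stdint.h>
#include <string.h>

#define CHECK_VALUE 0xA5u   /* byte sum of a valid record (assumed constant) */

/* Hypothetical settings record; 4 bytes, no padding. */
typedef struct {
    uint16_t calibration;
    uint8_t  brightness;
    uint8_t  checksum;
} settings_t;

/* Stand-in for the EEPROM so the sketch runs on a PC. */
static uint8_t fake_eeprom[sizeof(settings_t)];

static uint8_t sum_bytes(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    while (n--) {
        sum = (uint8_t)(sum + *p++);
    }
    return sum;
}

void settings_save(settings_t s)
{
    s.checksum = 0;
    /* Choose the checksum so that all bytes add up to CHECK_VALUE. */
    s.checksum = (uint8_t)(CHECK_VALUE - sum_bytes((const uint8_t *)&s, sizeof s));
    memcpy(fake_eeprom, &s, sizeof s);
}

/* Returns 1 and fills *out if the stored record is intact, else 0. */
int settings_load(settings_t *out)
{
    if (sum_bytes(fake_eeprom, sizeof fake_eeprom) != CHECK_VALUE) {
        return 0;
    }
    memcpy(out, fake_eeprom, sizeof *out);
    return 1;
}
```

Using a non-zero check value means a blank (all-zero) record is rejected, so the application can fall back to defaults on first boot.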

Memory Map & Addressing

A memory map defines how the processor sees all addressable resources—memory, peripherals, and system registers.

Typical ARM Cortex-M Memory Map

32-bit address space, 4 GB total
Address Range        | Region
---------------------|---------------------------
0x00000000-0x1FFFFFFF | Code (Flash)
0x20000000-0x3FFFFFFF | SRAM
0x40000000-0x5FFFFFFF | Peripherals
0x60000000-0x9FFFFFFF | External RAM
0xA0000000-0xDFFFFFFF | External Device
0xE0000000-0xE00FFFFF | Private Peripheral Bus (NVIC, SysTick)
0xE0100000-0xFFFFFFFF | Vendor-specific
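Because the regions are fixed, you can tell what an address refers to just by looking at its value. This is handy when reading a fault address in the debugger. The sketch below encodes the table above as a lookup function. The enum and function names are illustrative.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_EXTERNAL_RAM,
    REGION_EXTERNAL_DEVICE,
    REGION_PRIVATE_PERIPHERAL_BUS,
    REGION_VENDOR_SPECIFIC
} mem_region_t;

/* Map an address to its region by comparing against each upper bound. */
mem_region_t classify_address(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) return REGION_CODE;
    if (addr <= 0x3FFFFFFFu) return REGION_SRAM;
    if (addr <= 0x5FFFFFFFu) return REGION_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return REGION_EXTERNAL_RAM;
    if (addr <= 0xDFFFFFFFu) return REGION_EXTERNAL_DEVICE;
    if (addr <= 0xE00FFFFFu) return REGION_PRIVATE_PERIPHERAL_BUS;
    return REGION_VENDOR_SPECIFIC;
}
```

For example, 0x20000000 falls in SRAM, 0x40020000 in the peripheral region, and 0xE000E100 on the Private Peripheral Bus.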

Interrupts & Exception Handling

Interrupt Basics

Interrupts allow the processor to respond to events asynchronously—without continuously polling for them.

Interrupt Flow:
  1. Event occurs (button press, timer overflow, data received)
  2. Hardware signals interrupt request to CPU
  3. CPU completes current instruction
  4. CPU saves context (registers, PC) to stack
  5. CPU jumps to Interrupt Service Routine (ISR)
  6. ISR executes and returns
  7. CPU restores context and resumes main code
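Steps 5 and 6 can be modeled on a PC with a table of function pointers, which is essentially what a hardware vector table is. This is a simulation for illustration only: on a real MCU the hardware performs the lookup and the context save, and the table size and handler names below are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_IRQS 4u   /* hypothetical: real MCUs have dozens of sources */

typedef void (*isr_t)(void);

static volatile uint32_t button_presses;
static volatile uint32_t timer_ticks;

static void button_isr(void) { button_presses++; }
static void timer_isr(void)  { timer_ticks++; }

/* The vector table: one handler address per interrupt number. */
static const isr_t vector_table[NUM_IRQS] = {
    [0] = button_isr,
    [1] = timer_isr,
    /* entries 2 and 3 stay NULL: no handler installed */
};

/* Look up the handler for an interrupt number and run it.
 * Returns 1 if a handler ran, 0 if the request was ignored. */
int dispatch_irq(uint32_t irq)
{
    if (irq >= NUM_IRQS || vector_table[irq] == NULL) {
        return 0;
    }
    vector_table[irq]();
    return 1;
}
```

Calling dispatch_irq(1) runs timer_isr() and increments timer_ticks, while an interrupt number without an installed handler is simply ignored in this model.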

NVIC (Nested Vectored Interrupt Controller)

ARM Cortex-M processors include the NVIC—a powerful interrupt controller with these features:

  • Vectored: Each interrupt has a dedicated handler address
  • Nested: Higher priority interrupts can preempt lower priority ones
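  • Programmable priorities: Up to 256 levels in an 8-bit field (lower number = higher priority); the number actually implemented is device-specific, e.g. 16 levels on STM32F4
  • Low latency: Tail-chaining and late-arrival optimizations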
// Enable and configure NVIC interrupt
void setup_interrupt(void) {
    // Set priority (0 = highest, 15 = lowest for 4-bit priority)
    NVIC_SetPriority(EXTI0_IRQn, 2);
    
    // Enable the interrupt
    NVIC_EnableIRQ(EXTI0_IRQn);
}

// Interrupt Service Routine
void EXTI0_IRQHandler(void) {
    // Clear the pending flag FIRST. PR is write-1-to-clear, so write only
    // this bit; "|=" would also clear every other pending line.
    EXTI->PR = EXTI_PR_PR0;

    // Handle the interrupt
    toggle_led();
}
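The "lower number wins" rule and the way priorities are stored can be sketched in a few lines. The model below assumes 4 implemented priority bits, which are kept in the upper bits of an 8-bit field. It ignores priority grouping (the split into preempt priority and sub-priority), and the function names are illustrative, not part of CMSIS.

```c
#include <stdint.h>

/* Implemented priority bits. 4 is an assumption that matches many
 * Cortex-M3/M4 parts; other devices implement fewer or more. */
#define PRIO_BITS 4u

/* Value stored in the 8-bit priority field: the implemented bits
 * occupy the most significant positions. */
uint8_t encode_priority(uint32_t priority)
{
    return (uint8_t)((priority << (8u - PRIO_BITS)) & 0xFFu);
}

/* A pending interrupt preempts the running one only if its priority
 * number is strictly lower (numerically lower means more urgent). */
int can_preempt(uint32_t pending_priority, uint32_t active_priority)
{
    return encode_priority(pending_priority) < encode_priority(active_priority);
}
```

With these assumptions, priority 2 is stored as 0x20. An interrupt at priority 1 preempts a handler running at priority 2, while one at the same priority waits its turn.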

ISR Design Best Practices

Golden Rules for ISRs:
  • Keep it short: Do minimal work, defer processing to main loop
  • Clear flags early: Prevent re-triggering
  • Use volatile: For variables shared with main code
  • Avoid blocking: Never use delays or wait loops
  • Be reentrant: Don't use non-reentrant library functions
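// Good ISR pattern
#define RX_BUFFER_SIZE 64

volatile uint8_t data_ready = 0;
volatile uint8_t rx_buffer[RX_BUFFER_SIZE];
volatile uint8_t rx_index = 0;

void USART1_IRQHandler(void) {
    if (USART1->SR & USART_SR_RXNE) {
        uint8_t byte = (uint8_t)USART1->DR; // Reading DR clears RXNE
        if (rx_index < RX_BUFFER_SIZE) {    // Never write past the buffer
            rx_buffer[rx_index++] = byte;
        }
        data_ready = 1; // Signal main loop
    }
}

// Main loop processes data
int main(void) {
    while (1) {
        if (data_ready) {
            data_ready = 0;           // Clear first so a new byte is not missed
            process_data(rx_buffer);  // Application code; resets rx_index when done
        }
    }
}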
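A single flag plus a linear buffer works for simple cases, but a ring buffer is the more common way to hand bytes from an ISR to the main loop. Below is a minimal single-producer, single-consumer sketch. It assumes one ISR writes, one main loop reads, and that aligned 32-bit accesses are atomic (true on single-core Cortex-M, not guaranteed everywhere). The names and the tiny buffer size are illustrative.

```c
#include <stdint.h>

#define RB_SIZE 8u                 /* must be a power of two */
#define RB_MASK (RB_SIZE - 1u)

typedef struct {
    volatile uint8_t  data[RB_SIZE];
    volatile uint32_t head;        /* written only by the producer (ISR) */
    volatile uint32_t tail;        /* written only by the consumer (main loop) */
} ring_buffer_t;

/* Call from the ISR. Returns 0 if the buffer is full (byte dropped). */
int rb_put(ring_buffer_t *rb, uint8_t byte)
{
    if ((rb->head - rb->tail) == RB_SIZE) {
        return 0;
    }
    rb->data[rb->head & RB_MASK] = byte;
    rb->head++;                    /* publish only after the data is stored */
    return 1;
}

/* Call from the main loop. Returns 0 if no byte is available. */
int rb_get(ring_buffer_t *rb, uint8_t *byte)
{
    if (rb->head == rb->tail) {
        return 0;
    }
    *byte = rb->data[rb->tail & RB_MASK];
    rb->tail++;
    return 1;
}
```

Because each index has exactly one writer, the main loop never needs to disable interrupts to read from the buffer. The ISR would call rb_put() with the received byte, and the main loop would drain the buffer with rb_get() until it returns 0.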

Real-Time Constraints

Hard vs Soft Real-Time

Real-time systems must respond to events within specified time constraints. The consequences of missing deadlines distinguish two categories:

Hard Real-Time

Critical Deadlines

Missing a deadline = system failure. Consequences can be catastrophic.

  • Automotive airbag deployment (must deploy within milliseconds)
  • Aircraft flight control systems
  • Industrial robot arm positioning
  • Pacemaker pulse timing

Soft Real-Time

Flexible Deadlines

Missing deadlines degrades quality but doesn't cause failure.

  • Video streaming (dropped frames = quality loss)
  • Audio processing (glitches are annoying, not fatal)
  • User interface responsiveness
  • Network packet processing

Timing Analysis

Key timing metrics for embedded systems:

  • Interrupt latency: Time from interrupt signal to ISR start (12 cycles on Cortex-M3/M4 and 15-16 cycles on Cortex-M0+/M0, assuming zero-wait-state memory)
  • Response time: Total time from event to system response
  • Jitter: Variation in response time
  • WCET (Worst-Case Execution Time): Maximum time a code section can take
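// Measuring execution time
#include "stm32f4xx.h"

extern void critical_function(void); // Code under test (defined elsewhere)

// Returns the execution time of critical_function() in microseconds
float measure_timing(void) {
    // Enable DWT cycle counter
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

    uint32_t start = DWT->CYCCNT;

    // Code to measure
    critical_function();

    // Unsigned subtraction stays correct across one counter wrap
    uint32_t cycles = DWT->CYCCNT - start;

    // Convert in floating point so non-integer-MHz clocks are not truncated
    return (float)cycles * 1000000.0f / (float)SystemCoreClock;
}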
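Once you have cycle counts, the metrics above are simple arithmetic. The sketch below converts cycles to nanoseconds and derives the observed minimum, maximum, and jitter from a set of measurements. The names are illustrative. Note that the observed maximum is only a lower bound on the true WCET, since measurements cannot prove that the worst case was actually triggered.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds using 64-bit integer math. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_clock_hz)
{
    return ((uint64_t)cycles * 1000000000u) / core_clock_hz;
}

typedef struct {
    uint32_t min;     /* best case observed */
    uint32_t max;     /* worst case observed (not a proven WCET) */
    uint32_t jitter;  /* max - min */
} timing_stats_t;

/* Scan the samples once and record the extremes. */
timing_stats_t timing_stats(const uint32_t *samples, size_t count)
{
    timing_stats_t s = { UINT32_MAX, 0, 0 };
    for (size_t i = 0; i < count; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
    }
    s.jitter = (count > 0) ? (s.max - s.min) : 0;
    return s;
}
```

At a 168 MHz core clock, 168 cycles correspond to 1000 ns, and a 12-cycle interrupt entry takes roughly 71 ns.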

Determinism in Embedded Systems

Deterministic behavior means the system's response time is predictable and bounded.

Achieving Determinism:
  • Avoid dynamic memory allocation (malloc/free have variable timing)
  • Use fixed iteration counts in loops
  • Disable interrupts only for critical sections, and keep those as short as possible
  • Use fixed-priority scheduling with a bounded number of priority levels
  • Avoid caches, or understand and account for their worst-case behavior
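A common replacement for malloc/free is a pool of fixed-size blocks, where allocation and release both take constant time and fragmentation cannot occur. The sketch below is one minimal way to do it. The block count, block size, and function names are arbitrary choices, and the code is not interrupt-safe as written: if an ISR and the main loop share the pool, the calls need a critical section.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     4u
#define POOL_BLOCK_SIZE 32u

/* A free block stores the pointer to the next free block in its own
 * storage, so the free list costs no extra memory. */
typedef union block {
    union block *next;
    uint8_t      payload[POOL_BLOCK_SIZE];
} block_t;

static block_t  pool[POOL_BLOCKS];
static block_t *free_list;

/* Chain all blocks into the free list. Call once at startup. */
void pool_init(void)
{
    for (size_t i = 0; i + 1 < POOL_BLOCKS; i++) {
        pool[i].next = &pool[i + 1];
    }
    pool[POOL_BLOCKS - 1].next = NULL;
    free_list = &pool[0];
}

/* O(1): pop the head of the free list, or return NULL when exhausted. */
void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

/* O(1): push the block back. The pointer must come from pool_alloc(). */
void pool_free(void *p)
{
    block_t *b = (block_t *)p;
    b->next = free_list;
    free_list = b;
}
```

Every call does the same small amount of work regardless of allocation history, which is exactly the bounded behavior a real-time system needs.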

Development Tools & Workflow

Toolchains & Cross-Compilation

Embedded development uses cross-compilation—compiling on one platform (host PC) for another (target MCU).

ARM GCC Toolchain Components

  • arm-none-eabi-gcc: C compiler driver (arm-none-eabi-g++ for C++)
  • arm-none-eabi-as: Assembler
  • arm-none-eabi-ld: Linker
  • arm-none-eabi-objcopy: Convert ELF to binary/hex
  • arm-none-eabi-gdb: Debugger
  • arm-none-eabi-size: Show memory usage
# Basic compilation command
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 \
    -c main.c -o main.o

# Linking with linker script
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    -T linker.ld -o firmware.elf main.o startup.o

# Convert to binary for flashing
arm-none-eabi-objcopy -O binary firmware.elf firmware.bin

# Check memory usage
arm-none-eabi-size firmware.elf

Debuggers & JTAG/SWD

Hardware debuggers connect your PC to the target MCU:

  • JTAG: Older standard, 4-5 wire, supports multiple devices in chain
  • SWD (Serial Wire Debug): ARM-specific, 2-wire, faster for single target

Popular Debug Probes

Hardware Debuggers
  • ST-Link: Bundled with STM32 development boards
  • J-Link: Professional grade, fast, feature-rich
  • Black Magic Probe: Open-source, GDB server built-in
  • DAPLink: Open-source, drag-and-drop programming

IDEs for Embedded Development

Popular IDEs:
  • STM32CubeIDE: Free, Eclipse-based, excellent STM32 support
  • Keil MDK: Industry standard, ARM compiler, commercial
  • IAR Embedded Workbench: Professional, excellent optimization
  • PlatformIO: VS Code extension, multi-platform
  • Eclipse Embedded CDT (formerly GNU MCU Eclipse): Free, open-source, flexible

Bare-Metal Programming Basics

Bare-metal programming means writing code that runs directly on hardware without an operating system. Your code has complete control—and complete responsibility.

// Minimal bare-metal program structure
#include "stm32f4xx.h"

// Vector table (defined in startup code)
// Reset handler is the entry point

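int main(void) {
    // 1. Initialize system clock
    //    (many startup files already call SystemInit() before main)
    SystemInit();

    // 2. Enable peripheral clocks
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;

    // 3. Configure peripherals
    GPIOA->MODER &= ~GPIO_MODER_MODER5;  // Clear PA5 mode bits
    GPIOA->MODER |= GPIO_MODER_MODER5_0; // PA5 as output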
    
    // 4. Main loop
    while (1) {
        GPIOA->ODR ^= GPIO_ODR_OD5; // Toggle LED
        for (volatile int i = 0; i < 100000; i++); // Delay
    }
}
Bare-Metal Essentials:
  • Startup code: Initializes stack, copies data, zeroes BSS
  • Linker script: Defines memory layout (Flash, RAM regions)
  • Vector table: Array of function pointers for exception handlers
  • System initialization: Clock configuration, peripheral enables
  • Super loop: Main while(1) loop with polling or interrupt-driven logic
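The "copies data, zeroes BSS" step of the startup code is short enough to show in full. In a real reset handler the section boundaries come from symbols defined in the linker script (their names vary between vendors), so the sketch below takes them as parameters instead. That keeps it runnable on a PC and makes no assumption about a particular linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of .data from their load address (in Flash)
 * to their run address (in RAM), one word at a time. */
void copy_data(const uint32_t *load_start,
               uint32_t *data_start,
               uint32_t *data_end)
{
    while (data_start < data_end) {
        *data_start++ = *load_start++;
    }
}

/* Zero the .bss section so uninitialized globals start at 0,
 * as the C language requires. */
void zero_bss(uint32_t *bss_start, uint32_t *bss_end)
{
    while (bss_start < bss_end) {
        *bss_start++ = 0;
    }
}
```

A reset handler would typically call both functions with the linker-provided addresses and then jump to main(). Until that has happened, initialized globals hold whatever happened to be in RAM at power-up.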

Conclusion & What's Next

You've now built a solid foundation in embedded systems fundamentals. You understand the key differences between microcontrollers and microprocessors, how Harvard and Von Neumann architectures work, the memory types available in MCUs, interrupt handling patterns, and real-time system requirements.

Key Takeaways:
  • Microcontrollers are self-contained systems; microprocessors need external support
  • Harvard architecture enables parallel instruction/data access
  • Flash stores code, SRAM stores runtime data, EEPROM stores persistent configuration
  • Keep ISRs short, clear flags early, use volatile for shared variables
  • Hard real-time systems cannot miss deadlines; soft real-time tolerates some misses

In Part 2, we'll dive deep into STM32 and ARM Cortex-M development—configuring peripherals, using HAL libraries, and building real embedded applications.
