Introduction: What Are Embedded Systems?
An embedded system is a computer system designed to perform dedicated functions within a larger mechanical or electronic system. Unlike general-purpose computers, embedded systems are optimized for specific tasks, from controlling your car's engine to managing a smart thermostat.
Embedded systems are everywhere: your microwave, washing machine, and smartphone, cars (a modern car contains 50 to 100+ embedded controllers), medical devices, industrial robots, and aircraft autopilots. The global embedded systems market exceeds $100 billion annually, making this one of the most important domains in computing.
Key characteristics of embedded systems:
- Task-specific: Designed for one or a few dedicated functions
- Resource-constrained: Limited memory, processing power, and energy
- Real-time requirements: Must respond within strict timing deadlines
- Reliability: Must operate continuously without crashes or reboots
- Cost-sensitive: Often produced in high volumes with tight margins
Microcontroller vs Microprocessor
Understanding the difference between microcontrollers (MCUs) and microprocessors (MPUs) is fundamental to embedded systems design. Both are integrated circuits that execute instructions, but their architecture and use cases differ significantly.
Microcontroller Architecture
A microcontroller is a self-contained "system on a chip" (SoC) that integrates:
- CPU core: The processing unit (e.g., ARM Cortex-M, AVR, PIC)
- Flash memory: Non-volatile storage for program code (typically 16KB to 2MB)
- SRAM: Volatile memory for runtime data (typically 4KB to 512KB)
- Peripherals: GPIO, timers, UART, SPI, I2C, ADC, DAC
- Clock system: Internal oscillators and PLLs
Popular Microcontroller Families
- STM32 (ARM Cortex-M): Industry standard, extensive ecosystem, 32-bit
- ESP32: WiFi/Bluetooth built-in, great for IoT projects
- ATmega328 (AVR): Powers Arduino Uno, beginner-friendly
- PIC: Microchip's family, popular in industrial applications
- Nordic nRF52: Low-power Bluetooth, wearables and sensors
Microprocessor Architecture
A microprocessor is primarily a CPU (often with several cores) that relies on external components and trades power and simplicity for performance:
- External RAM: DDR3/DDR4 memory modules (GBs of capacity)
- External storage: eMMC, SD card, NVMe for OS and data
- Support chips: Power management ICs (PMICs) and other board-level components
- Higher performance: Higher clock speeds, deep pipelines, caches, and an MMU
Popular Microprocessor Families
- ARM Cortex-A (A53, A72, A78): Smartphones, tablets, Raspberry Pi
- Qualcomm Snapdragon: Mobile SoCs with integrated GPU, DSP
- Intel Atom: Low-power x86 for embedded PCs
- RISC-V: Open ISA gaining traction in application processors (RISC-V cores also appear in MCUs)
When to Use Which
Choose a microcontroller when:
- Real-time response is critical (motor control, safety systems)
- Power consumption must be minimal (battery-powered, always-on)
- Cost per unit is a major concern (high-volume production)
- Functionality is simple and dedicated (sensors, actuators, basic UI)
- Near-instant boot is required (no OS to load)
Choose a microprocessor when:
- You need a full operating system (Linux, Android)
- The product has a complex UI with graphics and a touchscreen
- It needs network connectivity with a full TCP/IP stack
- Many processes must run concurrently
- The workload involves large data processing or storage requirements
Processor Architectures
Harvard vs Von Neumann Architecture
These two fundamental architectures define how processors access instructions and data.
Von Neumann Architecture
Key characteristic: Single memory space for both instructions and data.
- Single bus: Instructions and data share the same memory bus
- Bottleneck: The CPU cannot fetch an instruction while it is accessing data (the Von Neumann bottleneck)
- Flexibility: Self-modifying code possible, simpler design
- Examples: x86 processors, ARM Cortex-A (at the main-memory level)
Harvard Architecture
Key characteristic: Separate memory and buses for instructions and data.
- Parallel access: Can fetch instruction while reading/writing data
- Higher throughput: No bus contention, better performance
- Common in MCUs: Flash for code, SRAM for data
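- Examples: ARM Cortex-M3/M4/M7 (separate instruction and data buses over a unified address space; Cortex-M0/M0+ use a single bus), AVR, PIC microcontrollers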
Modified Harvard: Many modern processors use a hybrid approach—separate L1 caches for instructions and data (Harvard-style), but unified main memory (Von Neumann-style).
RISC vs CISC
Instruction set architecture (ISA) philosophies differ in how they approach CPU design:
RISC (Reduced Instruction Set Computer)
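- Simple instructions: Most execute in a single clock cycle in a pipelined core
- Load-store architecture: Only load/store instructions access memory
- Many registers: Reduce memory access
- Fixed (or nearly fixed) instruction length: Easier decoding and pipelining (ARM Thumb-2 mixes 16- and 32-bit encodings)
- Trade-off: Programs need more instructions; complexity moves from the hardware to the compiler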
CISC (Complex Instruction Set Computer)
- Complex instructions: Single instruction can do multiple operations
- Memory operands: Instructions can operate directly on memory
- Variable instruction length: More compact code
- Fewer instructions per program: More complex decode hardware buys higher code density
- Modern x86: CISC instruction set externally, decoded into RISC-like micro-operations internally
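To make the load-store distinction concrete, here is a one-line C function. The comments show the instruction sequences a compiler typically emits for each style. The listings are representative of GCC at -O2 and are not guaranteed, because exact output depends on the compiler version, flags, and target.

```c
#include <stdint.h>

// Increment a counter that lives in memory.
//
// RISC (ARM Thumb-2, load-store): only loads and stores touch memory,
// so the update typically takes three instructions:
//     ldr  r3, [r0]      @ load *counter into a register
//     adds r3, r3, #1    @ add inside the register file
//     str  r3, [r0]      @ store the result back
//
// CISC (x86-64): arithmetic may use a memory operand directly,
// so one instruction can do the whole read-modify-write, typically:
//     addl $1, (%rdi)
void increment(uint32_t *counter) {
    *counter += 1;
}
```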
ARM Architecture Overview
ARM (Advanced RISC Machines) dominates embedded systems. Understanding ARM's processor families is essential:
- Cortex-A (Application): High-performance, runs Linux/Android, MMU included. Examples: A53, A72, A78
- Cortex-R (Real-time): Deterministic real-time, safety-critical systems. Examples: R4, R5, R52
- Cortex-M (Microcontroller): Low-power, cost-effective, no MMU (most cores offer an optional MPU). Examples: M0, M3, M4, M7, M33
Memory Types in Embedded Systems
Flash Memory (Program Storage)
Flash memory stores your program code. It's non-volatile (retains data without power) and can be electrically erased and reprogrammed.
- NOR Flash: Execute-in-place (XIP), random access, used for code storage
- NAND Flash: Higher density, sequential access, used for mass storage
- Endurance: Limited erase/program cycles (typically 10,000-100,000)
- Typical sizes: 16KB to 2MB in MCUs
#include <stdint.h>
// Flash is read-only during normal execution (writes go through the flash controller)
// Code executes directly from Flash (XIP)
const uint32_t lookup_table[] = {0, 1, 4, 9, 16, 25}; // const data is placed in Flash (.rodata)
SRAM (Data Memory)
Static RAM is fast, volatile memory used for runtime data—variables, stack, and heap.
- Speed: Single clock cycle access
- Volatility: Loses data when power is removed
- No refresh: Unlike DRAM, doesn't need periodic refresh
- Typical sizes: 4KB to 512KB in MCUs
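#include <stdint.h>
// SRAM stores runtime variables
uint32_t counter = 0;   // Zero-initialized global: .bss in SRAM, zeroed by startup code
uint32_t boot_mode = 1; // Initialized global: .data in SRAM, initial value copied from Flash
uint8_t buffer[256];    // Array in SRAM (.bss)
void function(void) {
    uint32_t local_var = 10; // Stack (also SRAM)
    (void)local_var;
}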
EEPROM (Non-Volatile Data)
Electrically Erasable Programmable ROM stores configuration data that must survive power cycles.
- Byte-erasable: Can erase single bytes (unlike Flash which erases sectors)
- Slower writes: Milliseconds per byte
- Higher endurance: Often 1 million write cycles
- Use cases: Calibration data, user settings, device IDs
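// EEPROM for persistent configuration
// Device-specific: uses the STM32L0/L1 HAL for the on-chip data EEPROM
#include "stm32l1xx_hal.h"     // Adjust to your device family's HAL header
#define EEPROM_BASE 0x08080000 // Start of data EEPROM on STM32L0/L1 parts
HAL_StatusTypeDef save_calibration(uint16_t value) {
    HAL_StatusTypeDef status;
    HAL_FLASHEx_DATAEEPROM_Unlock(); // Remove write protection
    status = HAL_FLASHEx_DATAEEPROM_Program(
        FLASH_TYPEPROGRAMDATA_HALFWORD, // Write 16 bits
        EEPROM_BASE,
        value
    );
    HAL_FLASHEx_DATAEEPROM_Lock(); // Restore write protection
    return status;
}
// Reading needs no HAL call: the data EEPROM is memory-mapped
uint16_t load_calibration(void) {
    return *(volatile const uint16_t *)EEPROM_BASE;
}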
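Endurance figures translate directly into product lifetime. The following is a minimal sketch that assumes every update rewrites the same cell, with no wear leveling. It divides the rated cycle count by the number of writes per day. The ratings used below (10,000 cycles for Flash and 1,000,000 for EEPROM) are the figures quoted above, not datasheet values for any specific part. The function name is hypothetical.

```c
#include <stdint.h>

// Estimated lifetime (in days) of one memory cell that is rewritten
// writes_per_day times, given its rated erase/write endurance.
// Assumes every update hits the same cell (no wear leveling).
// Returns -1.0 if the cell is never written (endurance is not a limit).
double endurance_days(uint32_t rated_cycles, uint32_t writes_per_day) {
    if (writes_per_day == 0) {
        return -1.0;
    }
    return (double)rated_cycles / (double)writes_per_day;
}
```

For example, saving a setting once per minute means 1,440 writes per day. At that rate, a 10,000-cycle Flash sector wears out in about 7 days, while a 1,000,000-cycle EEPROM cell lasts about 694 days.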
Memory Map & Addressing
A memory map defines how the processor sees all addressable resources—memory, peripherals, and system registers.
Typical ARM Cortex-M Memory Map
Address Range         | Region
----------------------|---------------------------------------
0x00000000-0x1FFFFFFF | Code (Flash)
0x20000000-0x3FFFFFFF | SRAM
0x40000000-0x5FFFFFFF | Peripherals
0x60000000-0x9FFFFFFF | External RAM
0xA0000000-0xDFFFFFFF | External Device
0xE0000000-0xE00FFFFF | Private Peripheral Bus (NVIC, SysTick)
0xE0100000-0xFFFFFFFF | Vendor-specific
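Peripherals share one address space with memory. Firmware therefore reaches a register by casting its address to a `volatile` pointer, and a fault handler or debugger can classify any address by its range. The sketch below encodes the table above. `REG32`, `mem_region_t`, and `region_of` are illustrative names, not part of CMSIS. On STM32 parts, for example, Flash starts at 0x08000000, which falls inside the Code region.

```c
#include <stdint.h>

// Access a 32-bit memory-mapped register. volatile forces a real bus
// access on every read/write instead of a value cached by the compiler.
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

typedef enum {
    REGION_CODE,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_EXTERNAL_RAM,
    REGION_EXTERNAL_DEVICE,
    REGION_PPB,
    REGION_VENDOR
} mem_region_t;

// Classify an address using the Cortex-M memory map table above.
mem_region_t region_of(uint32_t addr) {
    if (addr <= 0x1FFFFFFFu) return REGION_CODE;
    if (addr <= 0x3FFFFFFFu) return REGION_SRAM;
    if (addr <= 0x5FFFFFFFu) return REGION_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return REGION_EXTERNAL_RAM;
    if (addr <= 0xDFFFFFFFu) return REGION_EXTERNAL_DEVICE;
    if (addr <= 0xE00FFFFFu) return REGION_PPB;
    return REGION_VENDOR;
}
```

On an STM32F4, `REG32(0x40020014) ^= (1u << 5);` toggles PA5. GPIOA's base address is 0x40020000 and its ODR register sits at offset 0x14, so this is the same operation as `GPIOA->ODR ^= GPIO_ODR_OD5` with the CMSIS names.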
Interrupts & Exception Handling
Interrupt Basics
Interrupts allow the processor to respond to events asynchronously, without continuously polling for them. A typical interrupt sequence:
- Event occurs (button press, timer overflow, data received)
- Hardware signals interrupt request to CPU
- CPU completes current instruction
- CPU saves context (registers, PC) to stack
- CPU jumps to Interrupt Service Routine (ISR)
- ISR executes and returns
- CPU restores context and resumes main code
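On Cortex-M, hardware performs the "saves context" step. On exception entry the core pushes eight registers onto the active stack: R0-R3, R12, LR, PC, and xPSR. A longer frame is used when floating-point state must also be saved. Because the hardware preserves these caller-saved registers, an ISR can be an ordinary C function. The struct below sketches the layout of the basic frame in memory, lowest address first, as a fault handler might use it to read the interrupted PC. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Basic (non-FP) exception stack frame pushed by Cortex-M hardware,
// listed from the lowest address (the stack pointer after entry) upward.
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   // Link register of the interrupted code
    uint32_t pc;   // Stacked PC: where the interrupted code resumes
    uint32_t xpsr; // Program status register
} exception_frame_t;

// Compile-time checks that the layout matches the 8-word hardware frame.
_Static_assert(sizeof(exception_frame_t) == 32, "basic frame is 8 words");
_Static_assert(offsetof(exception_frame_t, pc) == 24, "PC is stacked at SP+24");

// Given the stacked SP (e.g. read from MSP/PSP in a fault handler), return
// the stacked PC (for most faults, the address of the faulting instruction).
uint32_t interrupted_pc(const exception_frame_t *frame) {
    return frame->pc;
}
```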
NVIC (Nested Vectored Interrupt Controller)
ARM Cortex-M processors include the NVIC—a powerful interrupt controller with these features:
- Vectored: Each interrupt has a dedicated handler address
- Nested: Higher priority interrupts can preempt lower priority ones
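- Programmable priorities: An 8-bit priority field per interrupt (lower number = higher priority). Most devices implement only the upper 3-4 bits, so an STM32F4 has 16 levels
- Low latency: Tail-chaining (back-to-back ISRs without unstacking and restacking) and late-arrival optimization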
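// Enable and configure an EXTI0 interrupt on PA0 (STM32F4, CMSIS register names)
#include "stm32f4xx.h"
void toggle_led(void); // Application-defined elsewhere
void setup_interrupt(void) {
    // Route PA0 to EXTI line 0 and trigger on a rising edge
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;        // GPIOA clock (PA0 is an input after reset)
    RCC->APB2ENR |= RCC_APB2ENR_SYSCFGEN;       // SYSCFG clock (EXTI routing)
    SYSCFG->EXTICR[0] &= ~SYSCFG_EXTICR1_EXTI0; // EXTI0 source = port A
    EXTI->IMR |= EXTI_IMR_MR0;                  // Unmask line 0
    EXTI->RTSR |= EXTI_RTSR_TR0;                // Rising-edge trigger
    // Set priority (0 = highest, 15 = lowest with 4 priority bits)
    NVIC_SetPriority(EXTI0_IRQn, 2);
    // Enable the interrupt
    NVIC_EnableIRQ(EXTI0_IRQn);
}
// Interrupt Service Routine (name must match the vector table entry)
void EXTI0_IRQHandler(void) {
    if (EXTI->PR & EXTI_PR_PR0) {
        // Clear the pending flag FIRST. PR is write-1-to-clear, so assign
        // instead of using |=, which would also clear other pending lines.
        EXTI->PR = EXTI_PR_PR0;
        // Handle the interrupt
        toggle_led();
    }
}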
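The value passed to `NVIC_SetPriority` is a logical priority. The NVIC stores an 8-bit field per interrupt, but a device implements only the most-significant `__NVIC_PRIO_BITS` bits (4 on STM32F4). CMSIS therefore shifts the value left and masks it to 8 bits before writing the register. The helper below reproduces that arithmetic for illustration. `nvic_priority_byte` is a hypothetical name, not a CMSIS function.

```c
#include <stdint.h>

// Convert a logical priority to the byte stored in the NVIC priority
// register, mirroring the shift-and-mask that CMSIS NVIC_SetPriority()
// performs. prio_bits is the number of implemented priority bits (1..8).
uint8_t nvic_priority_byte(uint32_t priority, uint32_t prio_bits) {
    return (uint8_t)((priority << (8u - prio_bits)) & 0xFFu);
}
```

With 4 implemented bits, logical priority 2 is stored as 0x20 and 15 as 0xF0. An out-of-range value such as 16 is not rejected. Its bits are shifted out, so it is stored as 0x00, which silently becomes the highest priority.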
ISR Design Best Practices
- Keep it short: Do minimal work, defer processing to main loop
- Clear flags early: Prevent re-triggering
- Use volatile: For variables shared with main code
- Avoid blocking: Never use delays or wait loops
- Avoid non-reentrant calls: Functions such as printf, malloc, or strtok may keep internal state and are not safe to call from an ISR
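// Good ISR pattern: the ISR only captures data and signals the main loop
#include "stm32f4xx.h"
#define RX_BUF_SIZE 64
void process_data(const uint8_t *data, uint32_t len); // Application-defined
volatile uint8_t data_ready = 0;
volatile uint8_t rx_buffer[RX_BUF_SIZE];
volatile uint32_t rx_index = 0;
void USART1_IRQHandler(void) {
    if (USART1->SR & USART_SR_RXNE) {
        uint8_t byte = (uint8_t)USART1->DR; // Reading DR clears RXNE
        if (rx_index < RX_BUF_SIZE) {       // Never write past the buffer
            rx_buffer[rx_index++] = byte;
        }
        data_ready = 1; // Signal main loop
    }
}
// Main loop processes data (clock, USART1, and NVIC setup omitted)
int main(void) {
    uint8_t local[RX_BUF_SIZE];
    while (1) {
        if (data_ready) {
            // Short critical section: snapshot the buffer and reset it atomically
            __disable_irq();
            uint32_t len = rx_index;
            for (uint32_t i = 0; i < len; i++) {
                local[i] = rx_buffer[i];
            }
            rx_index = 0;
            data_ready = 0;
            __enable_irq();
            process_data(local, len); // Heavy work runs with interrupts enabled
        }
    }
}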
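A single `data_ready` flag works when the main loop always keeps up. When bytes can arrive faster than they are processed, a common alternative is a single-producer, single-consumer ring buffer. The ISR only advances `head` and the main loop only advances `tail`, so neither side needs to disable interrupts. This is a minimal sketch that assumes a single-core MCU, where aligned 32-bit reads and writes are atomic. On a multicore system you would use C11 atomics with acquire/release ordering instead of plain `volatile`. The names and buffer size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 64u // Must be a power of two for the index mask

typedef struct {
    volatile uint8_t buf[RING_SIZE];
    volatile uint32_t head; // Written only by the producer (ISR)
    volatile uint32_t tail; // Written only by the consumer (main loop)
} ring_t;

// Producer side (call from the ISR). Returns false and drops the byte if full.
bool ring_put(ring_t *r, uint8_t byte) {
    uint32_t head = r->head;
    if (head - r->tail == RING_SIZE) {
        return false; // Full (free-running indices, so the difference is the fill level)
    }
    r->buf[head & (RING_SIZE - 1u)] = byte;
    r->head = head + 1u; // Publish only after the byte is stored
    return true;
}

// Consumer side (call from the main loop). Returns false if empty.
bool ring_get(ring_t *r, uint8_t *out) {
    uint32_t tail = r->tail;
    if (r->head == tail) {
        return false; // Empty
    }
    *out = r->buf[tail & (RING_SIZE - 1u)];
    r->tail = tail + 1u; // Free the slot only after reading it
    return true;
}
```

In the UART example, `USART1_IRQHandler` would call `ring_put(&rx_ring, (uint8_t)USART1->DR)`, and the main loop would drain the buffer with `while (ring_get(&rx_ring, &b)) { ... }`. Here `rx_ring` is a hypothetical global `ring_t`.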
Real-Time Constraints
Hard vs Soft Real-Time
Real-time systems must respond to events within specified time constraints. The consequences of missing deadlines distinguish two categories:
Hard Real-Time
Missing a deadline = system failure. Consequences can be catastrophic.
- Automotive airbag deployment (must deploy within milliseconds)
- Aircraft flight control systems
- Industrial robot arm positioning
- Pacemaker pulse timing
Soft Real-Time
Missing deadlines degrades quality but doesn't cause failure.
- Video streaming (dropped frames = quality loss)
- Audio processing (glitches are annoying, not fatal)
- User interface responsiveness
- Network packet processing
Timing Analysis
Key timing metrics for embedded systems:
- Interrupt latency: Time from the interrupt signal to the first ISR instruction (12 cycles on Cortex-M3/M4 and 15-16 on Cortex-M0+/M0, assuming zero-wait-state memory)
- Response time: Total time from event to system response
- Jitter: Variation in response time
- WCET (Worst-Case Execution Time): Maximum time a code section can take
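// Measuring execution time with the DWT cycle counter
// (Cortex-M3/M4/M7; Cortex-M0/M0+ have no cycle counter)
#include "stm32f4xx.h"
void critical_function(void); // Code under test (application-defined)
float measure_timing_us(void) {
    // Enable the trace blocks, then start the DWT cycle counter
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
    uint32_t start = DWT->CYCCNT;
    // Code to measure
    critical_function();
    uint32_t cycles = DWT->CYCCNT - start; // Unsigned subtraction handles wraparound
    // Convert cycles to microseconds (SystemCoreClock is in Hz)
    return (float)cycles / ((float)SystemCoreClock / 1000000.0f);
}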
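A single measurement says little about determinism. Repeating a measurement and keeping the extremes gives the observed best case, the observed worst case, and the jitter between them. Below is a minimal sketch over an array of cycle counts, for example collected with the DWT code above. The largest *observed* value is only a lower bound on the true WCET, which needs static analysis or exhaustive testing to establish. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_cycles; // Observed best case
    uint32_t max_cycles; // Observed worst case (a lower bound on WCET)
    uint32_t jitter;     // max - min: spread of the measured times
} timing_stats_t;

// Summarize n >= 1 cycle-count samples.
timing_stats_t timing_stats(const uint32_t *samples, size_t n) {
    timing_stats_t s = { samples[0], samples[0], 0 };
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < s.min_cycles) s.min_cycles = samples[i];
        if (samples[i] > s.max_cycles) s.max_cycles = samples[i];
    }
    s.jitter = s.max_cycles - s.min_cycles;
    return s;
}

// Convert cycles to microseconds for a given core clock in Hz.
double cycles_to_us(uint32_t cycles, uint32_t core_hz) {
    return (double)cycles * 1e6 / (double)core_hz;
}
```

At a 168 MHz core clock, for example, 1,680 cycles correspond to 10 µs.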
Determinism in Embedded Systems
Deterministic behavior means the system's response time is predictable and bounded. Common techniques:
- Avoid dynamic memory allocation (malloc/free have variable, history-dependent timing)
- Use loops with fixed or bounded iteration counts
- Keep interrupt-disabled critical sections short and bounded
- Use fixed-priority preemptive scheduling, and bound how long any task or ISR can block another
- Avoid caches in timing-critical paths, or account for their behavior when estimating WCET
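A common replacement for `malloc`/`free` in deterministic code is a fixed-block pool. Memory is reserved statically, and allocation and release each pop or push one node on a free list, so both take constant time regardless of allocation history. This is a minimal sketch that assumes every allocation fits in one block. The block size, block count, and function names are hypothetical. As written it is not ISR- or thread-safe, so wrap calls in a short critical section if more than one context uses the pool.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  32u // Bytes per block (hypothetical)
#define POOL_BLOCK_COUNT 16u // Number of blocks (hypothetical)

typedef union block {
    union block *next;             // Valid while the block is free
    uint8_t data[POOL_BLOCK_SIZE]; // Payload while allocated
} block_t;

static block_t pool_storage[POOL_BLOCK_COUNT]; // Reserved at link time (.bss)
static block_t *free_list = NULL;

// Thread every block onto the free list. O(n), called once at startup.
void pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

// O(1): pop the head of the free list, or return NULL if the pool is exhausted.
void *pool_alloc(void) {
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

// O(1): push the block back onto the free list.
void pool_free(void *p) {
    if (p == NULL) {
        return;
    }
    block_t *b = (block_t *)p;
    b->next = free_list;
    free_list = b;
}
```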
Development Tools & Workflow
Toolchains & Cross-Compilation
Embedded development uses cross-compilation—compiling on one platform (host PC) for another (target MCU).
ARM GCC Toolchain Components
- arm-none-eabi-gcc: C/C++ compiler
- arm-none-eabi-as: Assembler
- arm-none-eabi-ld: Linker
- arm-none-eabi-objcopy: Convert ELF to binary/hex
- arm-none-eabi-gdb: Debugger
- arm-none-eabi-size: Show memory usage
# Basic compilation command
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 \
-c main.c -o main.o
# Linking with linker script (newlib-nano, stub system calls)
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb --specs=nano.specs --specs=nosys.specs \
-T linker.ld -o firmware.elf main.o startup.o
# Convert to binary for flashing
arm-none-eabi-objcopy -O binary firmware.elf firmware.bin
# Check memory usage
arm-none-eabi-size firmware.elf
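`arm-none-eabi-size` prints `text`, `data`, `bss`, and `dec` columns, in bytes. Turning these into Flash and RAM usage takes one extra step. Initialized data occupies both memories, because its initial values are stored in Flash and copied into SRAM at startup. The small helper below makes the arithmetic explicit. The names are hypothetical. Stack and heap are counted only if the linker script reserves them in a section that `size` reports.

```c
#include <stdint.h>

typedef struct {
    uint32_t flash_bytes; // Stored in non-volatile memory
    uint32_t ram_bytes;   // Statically allocated SRAM
} mem_usage_t;

// text: code + constants (Flash only)
// data: initialized variables (initial values in Flash, live copy in SRAM)
// bss:  zero-initialized variables (SRAM only)
mem_usage_t mem_usage(uint32_t text, uint32_t data, uint32_t bss) {
    mem_usage_t u;
    u.flash_bytes = text + data;
    u.ram_bytes = data + bss;
    return u;
}

// Percentage of a memory region in use, e.g. against 128 KB of SRAM.
double percent_used(uint32_t used, uint32_t capacity) {
    return 100.0 * (double)used / (double)capacity;
}
```

For a hypothetical build reporting `text`=12000, `data`=200, and `bss`=3000, this gives 12,200 bytes of Flash and 3,200 bytes of static RAM, about 2.4% of a 128 KB SRAM.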
Debuggers & JTAG/SWD
Hardware debuggers connect your PC to the target MCU:
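- JTAG: Older standard (IEEE 1149.1), 4-5 signals (TCK, TMS, TDI, TDO, optional TRST), supports multiple devices in a chain
- SWD (Serial Wire Debug): ARM-specific, 2 signals (SWDIO, SWCLK), fewer pins with comparable speed for a single target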
Popular Debug Probes
- ST-Link: Bundled with STM32 development boards
- J-Link: Professional grade, fast, feature-rich
- Black Magic Probe: Open-source, GDB server built-in
- DAPLink: Open-source CMSIS-DAP probe firmware, drag-and-drop programming
IDEs for Embedded Development
- STM32CubeIDE: Free, Eclipse-based, excellent STM32 support
- Keil MDK: Industry standard, ARM compiler, commercial
- IAR Embedded Workbench: Professional, excellent optimization
- PlatformIO: VS Code extension, multi-platform
- Eclipse + Embedded CDT (formerly GNU MCU Eclipse): Free, open-source, flexible
Bare-Metal Programming Basics
Bare-metal programming means writing code that runs directly on hardware without an operating system. Your code has complete control—and complete responsibility.
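// Minimal bare-metal program structure (STM32F4, LED on PA5 as on many Nucleo boards)
#include "stm32f4xx.h"
// The vector table and Reset_Handler (the entry point) live in the startup code
int main(void) {
    // 1. System clock: ST's CMSIS startup code already calls SystemInit()
    //    before main(); configure the PLL here if you need a faster clock
    // 2. Enable peripheral clocks
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    // 3. Configure peripherals: PA5 as general-purpose output
    GPIOA->MODER &= ~GPIO_MODER_MODER5;  // Clear both mode bits
    GPIOA->MODER |= GPIO_MODER_MODER5_0; // 01 = output
    // 4. Main loop
    while (1) {
        GPIOA->ODR ^= GPIO_ODR_OD5; // Toggle LED
        for (volatile uint32_t i = 0; i < 100000; i++); // Crude busy-wait delay
    }
}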
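Key pieces of a bare-metal project:
- Startup code: Copies .data initial values from Flash to SRAM, zeroes .bss, then calls main() (Cortex-M hardware loads the initial stack pointer from the vector table)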
- Linker script: Defines memory layout (Flash, RAM regions)
- Vector table: Array of function pointers for exception handlers
- System initialization: Clock configuration, peripheral enables
- Super loop: Main while(1) loop with polling or interrupt-driven logic
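The startup code's memory work can be written in C. Below is a minimal sketch of the two loops a reset handler runs before `main()`: copy the `.data` initial values from Flash into SRAM, then zero `.bss`. To keep it host-testable, the section boundaries are passed as parameters. In real firmware they come from linker-script symbols. ST's GCC linker scripts name them `_sidata`, `_sdata`, `_edata`, `_sbss`, and `_ebss`, but other scripts use different names. `init_memory_sections` is a hypothetical name.

```c
#include <stdint.h>

// Copy .data initializers from their load address (Flash) to their run
// address (SRAM), then zero .bss. All pointers are word-aligned section bounds.
void init_memory_sections(const uint32_t *data_load,
                          uint32_t *data_start, uint32_t *data_end,
                          uint32_t *bss_start, uint32_t *bss_end) {
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}

/* On target, a C reset handler would look roughly like this
 * (symbol names as in ST's GCC linker scripts):
 *
 * extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
 * void Reset_Handler(void) {
 *     init_memory_sections(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
 *     SystemInit();
 *     main();
 *     for (;;) {}
 * }
 */
```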
Conclusion & What's Next
You've now built a solid foundation in embedded systems fundamentals. You understand the key differences between microcontrollers and microprocessors, how Harvard and Von Neumann architectures work, the memory types available in MCUs, interrupt handling patterns, and real-time system requirements.
Key takeaways:
- Microcontrollers are self-contained systems; microprocessors need external support
- Harvard architecture enables parallel instruction/data access
- Flash stores code, SRAM stores runtime data, EEPROM stores persistent configuration
- Keep ISRs short, clear flags early, use volatile for shared variables
- Hard real-time systems cannot miss deadlines; soft real-time tolerates some misses
In Part 2, we'll dive deep into STM32 and ARM Cortex-M development—configuring peripherals, using HAL libraries, and building real embedded applications.