Part 3: MCU & System Architecture

April 17, 2026 Wasil Zafar 50 min read

Deep dive into microcontroller internals — GPIO register configuration, ADC/DAC conversion, hardware timers, interrupt handling, power budgeting, and the four essential serial protocols: I2C, SPI, UART, and CAN.

Table of Contents

  1. MCU Block Diagram
  2. GPIO Configuration
  3. ADC & DAC
  4. Hardware Timers
  5. Interrupts & NVIC
  6. Communication Protocols
  7. Power Budgeting
  8. Exercises
  9. Design Review Tool
  10. Conclusion & Next Steps

MCU Block Diagram

Analogy: A microcontroller is like a self-contained factory. The CPU is the factory manager: it reads work orders (instructions from Flash) and directs operations. SRAM is the workbench where the manager lays out parts currently being assembled. The peripherals (ADC, UART, SPI, timers) are specialised machines on the factory floor: the ADC is a precision measuring station, the UART is the mailroom for serial communication, and the timers are the factory clocks that coordinate production schedules. The bus system (AHB, APB) is the network of conveyor belts connecting everything. Just as a factory can run multiple machines simultaneously without the manager touching each one, the DMA controller moves data between peripherals and memory while the CPU works on something else entirely.

A Brief History of Microcontrollers

1971: Intel releases the 4004, the first commercial microprocessor (4-bit, 740 kHz). Designed for a Japanese calculator, it launches the era of programmable silicon.
1976: Intel introduces the 8048, one of the first true microcontrollers, with CPU, RAM, ROM, and I/O all on a single chip. It powers the IBM PC keyboard and millions of consumer products.
1993: Atmel releases the AT89C51, the first MCU with in-system-programmable Flash memory. Engineers can now reprogram chips without removing them from the board, a revolution in development speed.
2004: ARM Cortex-M3 architecture announced. It brings 32-bit performance to the microcontroller market at 8-bit prices, eventually dominating the embedded world. STM32 launches in 2007 based on this core.
2021: Raspberry Pi RP2040 introduces a dual-core Cortex-M0+ at $1. Around the same time, RISC-V MCUs (ESP32-C3, GD32V) emerge as open-architecture alternatives, breaking ARM’s monopoly on 32-bit embedded.

Every microcontroller is built from the same fundamental blocks: a processor core, memory, peripherals, and a bus system that connects them. Understanding this architecture is what separates someone who copies example code from someone who can design reliable embedded systems.

STM32F4 Block Diagram (Simplified)
flowchart TB
    subgraph Core["ARM Cortex-M4 Core"]
        CPU["CPU<br/>ALU + FPU"]
        NVIC["NVIC<br/>Interrupt Controller"]
        SysTick["SysTick<br/>System Timer"]
    end
    subgraph Memory["Memory"]
        FLASH["Flash<br/>512KB (Code)"]
        SRAM["SRAM<br/>128KB (Data)"]
    end
    subgraph AHB["AHB Bus (High Speed)"]
        DMA["DMA Controller"]
        GPIO["GPIO Ports<br/>A, B, C, D, E"]
    end
    subgraph APB2["APB2 Bus (Fast Peripherals)"]
        ADC1["ADC1"]
        TIM1["TIM1 (Advanced)"]
        SPI1["SPI1"]
        USART1["USART1"]
    end
    subgraph APB1["APB1 Bus (Standard Peripherals)"]
        I2C1["I2C1"]
        UART2["USART2"]
        TIM2["TIM2/3/4/5"]
        CAN1["CAN1"]
        DAC1["DAC"]
    end
    Core --> Memory
    Core --> AHB
    AHB --> APB2
    AHB --> APB1

Bus Matrix & Memory Map

The STM32’s memory map gives every peripheral a unique address range. When you write to a GPIO register, you’re writing to a specific memory address:

| Address Range | Region | Contents |
|---|---|---|
| 0x0800_0000–0x0807_FFFF | Code | Flash memory (program code, 512KB; aliased to 0x0000_0000 when booting from Flash) |
| 0x2000_0000–0x2001_FFFF | SRAM | Stack, heap, global variables |
| 0x4000_0000–0x4000_7FFF | APB1 | TIM2–5, USART2/3, I2C1/2, CAN, DAC |
| 0x4001_0000–0x4001_4BFF | APB2 | TIM1/8, USART1/6, ADC, SPI1, SYSCFG |
| 0x4002_0000–0x4007_FFFF | AHB1 | GPIOA–E, RCC, DMA1/2 |
| 0xE000_0000–0xE00F_FFFF | Cortex-M | NVIC, SysTick, MPU, Debug |
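
To make the idea of memory-mapped peripherals concrete, here is a minimal Python sketch that looks up which region an address falls into. The SRAM and peripheral ranges are copied from the table above; the GPIOA base address (0x4002_0000) and ODR offset (0x14) are the values quoted in Exercise 2. The function and table names are hypothetical helpers for illustration, not part of any vendor tool.

```python
# Address ranges copied from the memory-map table (SRAM and peripheral rows only).
REGIONS = [
    (0x2000_0000, 0x2001_FFFF, "SRAM"),
    (0x4000_0000, 0x4000_7FFF, "APB1"),
    (0x4001_0000, 0x4001_4BFF, "APB2"),
    (0x4002_0000, 0x4007_FFFF, "AHB1"),
    (0xE000_0000, 0xE00F_FFFF, "Cortex-M"),
]


def region_of(address):
    """Return the region name containing `address`, or None if it is not listed."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    return None


GPIOA_BASE = 0x4002_0000   # from Exercise 2
GPIO_ODR_OFFSET = 0x14     # from Exercise 2

odr_address = GPIOA_BASE + GPIO_ODR_OFFSET
print(f"GPIOA->ODR at 0x{odr_address:08X} is in {region_of(odr_address)}")
# prints: GPIOA->ODR at 0x40020014 is in AHB1
```

A register access in C is the same arithmetic: base address plus offset, cast to a volatile pointer.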

GPIO Configuration

Pin Modes & Registers

Each GPIO port has four configuration registers that control pin behaviour, plus three data registers for reading and driving the pins. Each register holds one or two bits per pin. Understanding these registers is crucial for bare-metal programming:

| Register | Abbreviation | Purpose | Bits per Pin |
|---|---|---|---|
| Mode | MODER | Input, Output, Alternate Function, Analog | 2 |
| Output Type | OTYPER | Push-Pull or Open-Drain | 1 |
| Speed | OSPEEDR | Low, Medium, Fast, High (slew rate) | 2 |
| Pull-up/down | PUPDR | None, Pull-up, Pull-down | 2 |
| Input Data | IDR | Read pin state (read-only) | 1 |
| Output Data | ODR | Set output HIGH or LOW | 1 |
| Bit Set/Reset | BSRR | Atomic set/reset individual bits | 1 (set) + 1 (reset) |
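
The bit arithmetic behind these registers can be tried on a PC before touching hardware. The following Python sketch models a 2-bit MODER field update and the effect of a BSRR write on ODR. The helper names are hypothetical, the starting MODER value is just an example, and the rule that a set bit wins over a reset bit for the same pin reflects the STM32 reference manual as I understand it.

```python
MODE_INPUT, MODE_OUTPUT, MODE_ALTERNATE, MODE_ANALOG = 0b00, 0b01, 0b10, 0b11


def set_mode(moder, pin, mode):
    """Return a 32-bit MODER value with the 2-bit field for `pin` set to `mode`."""
    shift = 2 * pin
    cleared = moder & ~(0b11 << shift)        # clear both bits of the field first
    return (cleared | (mode << shift)) & 0xFFFF_FFFF


def bsrr_set(pin):
    """BSRR word that drives `pin` HIGH (lower 16 bits)."""
    return 1 << pin


def bsrr_reset(pin):
    """BSRR word that drives `pin` LOW (upper 16 bits)."""
    return 1 << (pin + 16)


def apply_bsrr(odr, bsrr):
    """Model the hardware effect of one BSRR write on a 16-bit ODR value."""
    set_bits = bsrr & 0xFFFF
    reset_bits = (bsrr >> 16) & 0xFFFF
    return ((odr & ~reset_bits) | set_bits) & 0xFFFF   # set wins over reset


moder = set_mode(0xA800_0000, pin=5, mode=MODE_OUTPUT)  # example starting value
print(f"MODER = 0x{moder:08X}")                          # prints: MODER = 0xA8000400
print(f"ODR   = 0x{apply_bsrr(0x0000, bsrr_set(5)):04X}")  # prints: ODR   = 0x0020
```

Because a BSRR write names only the pins it changes, no other bit of ODR is read or rewritten, which is why it is safe to use from both ISRs and the main loop.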

HAL vs Register-Level GPIO

/* GPIO: HAL Abstraction vs Direct Register Access
 * Both examples configure PA5 as push-pull output and toggle it
 * Board: STM32F411RE Nucleo
 */

/* ---- Method 1: HAL (beginner-friendly, portable) ---- */
#include "stm32f4xx_hal.h"

void gpio_hal_example(void)
{
    __HAL_RCC_GPIOA_CLK_ENABLE();

    GPIO_InitTypeDef gpio = {0};
    gpio.Pin   = GPIO_PIN_5;
    gpio.Mode  = GPIO_MODE_OUTPUT_PP;
    gpio.Pull  = GPIO_NOPULL;
    gpio.Speed = GPIO_SPEED_FREQ_LOW;
    HAL_GPIO_Init(GPIOA, &gpio);

    HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
}

/* ---- Method 2: Register (fast, precise, low overhead) ---- */
void gpio_register_example(void)
{
    /* Enable GPIOA clock via RCC AHB1ENR */
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;

    /* Configure PA5: MODER bits [11:10] = 01 (output) */
    GPIOA->MODER &= ~(3U << 10);  /* Clear bits 11:10 */
    GPIOA->MODER |=  (1U << 10);  /* Set bit 10 = output mode */

    /* Push-pull: OTYPER bit 5 = 0 (default) */
    GPIOA->OTYPER &= ~(1U << 5);

    /* Toggle PA5: read ODR, then write BSRR. The BSRR write itself is atomic and leaves other pins untouched */
    if (GPIOA->ODR & (1U << 5))
        GPIOA->BSRR = (1U << 21);  /* Reset (upper 16 bits) */
    else
        GPIOA->BSRR = (1U << 5);   /* Set (lower 16 bits) */
}

When to use which? HAL for quick prototyping and portability across STM32 families. Direct registers for tight timing requirements (ISRs, bit-banging), minimal code size (bootloaders), or when you need to understand exactly what’s happening at the silicon level.

ADC & DAC

ADC Configuration

The Analog-to-Digital Converter transforms real-world analog voltages into digital values your code can process. Key parameters:

| Parameter | STM32F411 | Impact |
|---|---|---|
| Resolution | 6, 8, 10, or 12-bit | Higher = more precision, slower conversion |
| Sampling Time | 3–480 cycles | Longer = more accurate for high-impedance sources |
| VREF | Typically 3.3V | Sets the full-scale voltage range |
| Conversion Speed | Up to 2.4 MSPS | Rated at 12-bit resolution with the shortest sampling time; lower resolutions need fewer cycles per conversion |

/* ADC Single-Channel Read — STM32 HAL
 * Read potentiometer on PA0 (ADC1_IN0)
 * Board: STM32F411RE Nucleo
 */

#include "stm32f4xx_hal.h"
#include <stdio.h>

ADC_HandleTypeDef hadc1;

void adc_init(void)
{
    __HAL_RCC_ADC1_CLK_ENABLE();
    __HAL_RCC_GPIOA_CLK_ENABLE();

    /* Configure PA0 as analog input */
    GPIO_InitTypeDef gpio = {0};
    gpio.Pin  = GPIO_PIN_0;
    gpio.Mode = GPIO_MODE_ANALOG;
    gpio.Pull = GPIO_NOPULL;
    HAL_GPIO_Init(GPIOA, &gpio);

    /* Configure ADC1 */
    hadc1.Instance                   = ADC1;
    hadc1.Init.Resolution            = ADC_RESOLUTION_12B;
    hadc1.Init.ScanConvMode          = DISABLE;
    hadc1.Init.ContinuousConvMode    = DISABLE;
    hadc1.Init.ExternalTrigConvEdge  = ADC_EXTERNALTRIGCONVEDGE_NONE;
    hadc1.Init.DataAlign             = ADC_DATAALIGN_RIGHT;
    hadc1.Init.NbrOfConversion       = 1;
    HAL_ADC_Init(&hadc1);

    /* Configure channel 0, sampling time 84 cycles */
    ADC_ChannelConfTypeDef ch = {0};
    ch.Channel      = ADC_CHANNEL_0;
    ch.Rank         = 1;
    ch.SamplingTime = ADC_SAMPLETIME_84CYCLES;
    HAL_ADC_ConfigChannel(&hadc1, &ch);
}

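uint16_t adc_read(void)
{
    uint16_t raw = 0;  /* Returned as 0 if the conversion times out */

    HAL_ADC_Start(&hadc1);
    if (HAL_ADC_PollForConversion(&hadc1, 10) == HAL_OK)  /* 10 ms timeout */
        raw = (uint16_t)HAL_ADC_GetValue(&hadc1);
    HAL_ADC_Stop(&hadc1);
    return raw;
}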

/* Usage:
 *   adc_init();
 *   uint16_t raw = adc_read();           // 0-4095
 *   float voltage = raw * 3.3f / 4095;   // Convert to volts
 *   printf("ADC: %d (%.2fV)\r\n", raw, voltage);
 */
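
The two calculations that matter when tuning this ADC, code-to-voltage scaling and conversion time, can be sketched in Python. The scaling uses the same `raw * VREF / 4095` convention as the usage comment above. The conversion-time model (sampling cycles plus one cycle per bit of resolution) follows the STM32F4 reference manual as I understand it, and the 21 MHz ADC clock is an assumed example value, not something the code above configures.

```python
def raw_to_volts(raw, vref=3.3, bits=12):
    """Convert a right-aligned ADC code to volts (same scaling as the C usage note)."""
    full_scale = (1 << bits) - 1
    if not 0 <= raw <= full_scale:
        raise ValueError(f"raw must be within 0..{full_scale}")
    return raw * vref / full_scale


def conversion_time_us(sampling_cycles, bits, adc_clock_hz):
    """Total conversion time: sampling cycles plus one ADC clock per result bit."""
    return (sampling_cycles + bits) / adc_clock_hz * 1e6


ADC_CLOCK_HZ = 21_000_000  # assumption: e.g. an 84 MHz APB2 clock divided by 4

print(f"Mid-scale code 2048 -> {raw_to_volts(2048):.3f} V")
print(f"84-cycle sampling   -> {conversion_time_us(84, 12, ADC_CLOCK_HZ):.2f} us per sample")
```

Lengthening the sampling time improves accuracy on high-impedance sources at the direct cost of sample rate, so the two numbers are worth checking together.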

DAC Output

/* DAC Output — Generate a voltage on PA4 (DAC1_OUT1)
 * STM32F411 has no DAC; use STM32F446 or F303 for this example
 * Board: STM32F446RE Nucleo
 */

#include "stm32f4xx_hal.h"

DAC_HandleTypeDef hdac;

void dac_init(void)
{
    __HAL_RCC_DAC_CLK_ENABLE();
    __HAL_RCC_GPIOA_CLK_ENABLE();

    /* PA4 as analog (DAC output) */
    GPIO_InitTypeDef gpio = {0};
    gpio.Pin  = GPIO_PIN_4;
    gpio.Mode = GPIO_MODE_ANALOG;
    gpio.Pull = GPIO_NOPULL;
    HAL_GPIO_Init(GPIOA, &gpio);

    hdac.Instance = DAC;
    HAL_DAC_Init(&hdac);

    DAC_ChannelConfTypeDef cfg = {0};
    cfg.DAC_Trigger      = DAC_TRIGGER_NONE;
    cfg.DAC_OutputBuffer = DAC_OUTPUTBUFFER_ENABLE;
    HAL_DAC_ConfigChannel(&hdac, &cfg, DAC_CHANNEL_1);

    HAL_DAC_Start(&hdac, DAC_CHANNEL_1);
}

void dac_set_voltage(float volts)
{
    /* DAC is 12-bit: 0-4095 maps to 0-VREF (3.3V) */
    if (volts < 0.0f) volts = 0.0f;  /* Converting a negative float to uint32_t is undefined */
    uint32_t dac_val = (uint32_t)(volts / 3.3f * 4095);
    if (dac_val > 4095) dac_val = 4095;
    HAL_DAC_SetValue(&hdac, DAC_CHANNEL_1, DAC_ALIGN_12B_R, dac_val);
}

/* Usage:
 *   dac_init();
 *   dac_set_voltage(1.65f);  // Output 1.65V on PA4 (half of 3.3V)
 */

Hardware Timers

Timer Fundamentals

Hardware timers are free-running counters driven by the system clock. They are the backbone of precise timing in embedded systems: generating PWM signals, measuring pulse widths, triggering periodic interrupts, and counting external events.

Timer Configuration Chain
flowchart LR
    CLK["System Clock<br/>e.g., 84 MHz"] --> PSC["Prescaler (PSC)<br/>÷ (PSC+1)"]
    PSC --> CNT["Counter (CNT)<br/>Counts 0 → ARR"]
    CNT --> ARR["Auto-Reload (ARR)<br/>Overflow → Reset"]
    ARR --> EVT{"Event"}
    EVT -->|"Interrupt"| ISR["Timer ISR"]
    EVT -->|"PWM"| OC["Output Compare<br/>(CCRx)"]
    EVT -->|"DMA"| DMA["DMA Transfer"]

## Timer Prescaler Calculator
## Compute PSC and ARR for a desired timer frequency

# System parameters
system_clock_hz = 84_000_000  # 84 MHz (APB1 timer clock on STM32F411)
desired_freq_hz = 1000        # 1 kHz timer interrupt (1ms period)
timer_bits = 16               # 16-bit timer (max ARR = 65535)

max_count = (2 ** timer_bits) - 1  # 65535

# Calculate: freq = clock / ((PSC+1) * (ARR+1))
# So (PSC+1) * (ARR+1) = clock / freq
total_counts = system_clock_hz / desired_freq_hz
print(f"Total counts needed: {total_counts:.0f}")

# Find best PSC/ARR combination (prefer larger ARR for better resolution)
best_psc = 0
best_arr = 0
best_error = float('inf')

for psc in range(0, 65536):
    arr = total_counts / (psc + 1) - 1
    if arr < 0 or arr > max_count:
        continue
    arr_rounded = round(arr)
    actual_freq = system_clock_hz / ((psc + 1) * (arr_rounded + 1))
    error = abs(actual_freq - desired_freq_hz)
    if error < best_error:
        best_error = error
        best_psc = psc
        best_arr = arr_rounded
    if error == 0:
        break

actual = system_clock_hz / ((best_psc + 1) * (best_arr + 1))
print(f"\nBest configuration for {desired_freq_hz} Hz:")
print(f"  PSC = {best_psc}")
print(f"  ARR = {best_arr}")
print(f"  Actual frequency: {actual:.2f} Hz")
print(f"  Error: {abs(actual - desired_freq_hz):.4f} Hz")
Output
Total counts needed: 84000

Best configuration for 1000 Hz:
  PSC = 1
  ARR = 41999
  Actual frequency: 1000.00 Hz
  Error: 0.0000 Hz

PWM Generation

Pulse Width Modulation uses a timer’s output compare feature to generate variable-duty-cycle square waves. The duty cycle is set by the CCR (Capture/Compare Register) value relative to the ARR (Auto-Reload Register).

/* PWM Output — Drive an LED or servo motor
 * TIM2 Channel 1 on PA0, 1 kHz PWM, variable duty cycle
 * Board: STM32F411RE Nucleo
 */

#include "stm32f4xx_hal.h"

TIM_HandleTypeDef htim2;

void pwm_init(uint32_t freq_hz)
{
    __HAL_RCC_TIM2_CLK_ENABLE();
    __HAL_RCC_GPIOA_CLK_ENABLE();

    /* PA0 as TIM2_CH1 alternate function */
    GPIO_InitTypeDef gpio = {0};
    gpio.Pin       = GPIO_PIN_0;
    gpio.Mode      = GPIO_MODE_AF_PP;
    gpio.Pull      = GPIO_NOPULL;
    gpio.Speed     = GPIO_SPEED_FREQ_LOW;
    gpio.Alternate = GPIO_AF1_TIM2;
    HAL_GPIO_Init(GPIOA, &gpio);

    /* Timer base: 84 MHz / (83+1) = 1 MHz tick
     * ARR = (1 MHz / freq_hz) - 1
     */
    htim2.Instance               = TIM2;
    htim2.Init.Prescaler         = 83;
    htim2.Init.CounterMode       = TIM_COUNTERMODE_UP;
    htim2.Init.Period            = (1000000 / freq_hz) - 1;
    htim2.Init.ClockDivision     = TIM_CLOCKDIVISION_DIV1;
    HAL_TIM_PWM_Init(&htim2);

    /* Output compare channel 1 */
    TIM_OC_InitTypeDef oc = {0};
    oc.OCMode     = TIM_OCMODE_PWM1;
    oc.Pulse      = 0;  /* Initial duty = 0% */
    oc.OCPolarity = TIM_OCPOLARITY_HIGH;
    oc.OCFastMode = TIM_OCFAST_DISABLE;
    HAL_TIM_PWM_ConfigChannel(&htim2, &oc, TIM_CHANNEL_1);

    HAL_TIM_PWM_Start(&htim2, TIM_CHANNEL_1);
}

void pwm_set_duty(float percent)
{
    uint32_t arr = __HAL_TIM_GET_AUTORELOAD(&htim2);
    uint32_t ccr = (uint32_t)(percent / 100.0f * (arr + 1));
    __HAL_TIM_SET_COMPARE(&htim2, TIM_CHANNEL_1, ccr);
}

/* Usage:
 *   pwm_init(1000);        // 1 kHz PWM
 *   pwm_set_duty(50.0f);   // 50% duty cycle
 *   pwm_set_duty(25.0f);   // 25% duty cycle
 */
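
The register values used by this PWM setup follow directly from the timer formula. As a rough cross-check, this Python sketch computes ARR and CCR for the 1 kHz example above and for a 50 Hz servo signal, assuming the same 84 MHz timer clock and PSC = 83. The function names are hypothetical helpers.

```python
TIMER_CLOCK_HZ = 84_000_000   # timer clock assumed throughout this article
PSC = 83                      # gives a 1 MHz counter tick


def arr_for_frequency(freq_hz, timer_clock_hz=TIMER_CLOCK_HZ, psc=PSC):
    """ARR such that clock / ((PSC+1) * (ARR+1)) equals freq_hz (rounded)."""
    tick_hz = timer_clock_hz / (psc + 1)
    return round(tick_hz / freq_hz) - 1


def ccr_for_duty(percent, arr):
    """CCR for a duty cycle in percent, mirroring pwm_set_duty() above."""
    return int(percent / 100 * (arr + 1))


def ccr_for_pulse_us(pulse_us, timer_clock_hz=TIMER_CLOCK_HZ, psc=PSC):
    """CCR for a pulse width in microseconds (integer arithmetic)."""
    tick_hz = timer_clock_hz // (psc + 1)
    return pulse_us * tick_hz // 1_000_000


arr_1khz = arr_for_frequency(1000)
print(arr_1khz, ccr_for_duty(50.0, arr_1khz), ccr_for_duty(25.0, arr_1khz))
# prints: 999 500 250

arr_servo = arr_for_frequency(50)
print(arr_servo, ccr_for_pulse_us(1000), ccr_for_pulse_us(1500), ccr_for_pulse_us(2000))
# prints: 19999 1000 1500 2000
```

With a 1 MHz tick, CCR is simply the pulse width in microseconds, which is why PSC = 83 is a convenient choice for servo work.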

Input Capture

Input Capture measures the timing of external signals — pulse width, frequency, or duty cycle. The timer captures its counter value when an edge (rising, falling, or both) is detected on an input pin.

Common use cases: Measuring ultrasonic sensor echo pulse (HC-SR04), reading RC servo PWM input, measuring encoder pulse frequency (RPM calculation), and decoding IR remote signals.
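
The arithmetic applied to captured values is simple but easy to get wrong when the counter wraps between the two edges. The sketch below, in Python for easy experimentation, subtracts two captures modulo the counter size and converts the result to an HC-SR04 distance. The 1 MHz tick, the 16-bit counter, and the 343 m/s speed of sound are assumptions; the result is only valid when the pulse is shorter than one full counter period.

```python
def elapsed_ticks(first_capture, second_capture, timer_bits=16):
    """Ticks between two captures, correct across a single counter wraparound."""
    return (second_capture - first_capture) % (1 << timer_bits)


def pulse_width_us(rising_capture, falling_capture, tick_hz, timer_bits=16):
    """Pulse width in microseconds from rising-edge and falling-edge captures."""
    ticks = elapsed_ticks(rising_capture, falling_capture, timer_bits)
    return ticks * 1_000_000 / tick_hz


def hcsr04_distance_cm(echo_us, speed_of_sound_m_s=343.0):
    """Distance from an echo pulse; the sound travels out and back, hence / 2."""
    return echo_us * 1e-6 * speed_of_sound_m_s / 2 * 100


# The counter wrapped from 65535 to 0 between the two edges in this example.
width = pulse_width_us(rising_capture=65000, falling_capture=464, tick_hz=1_000_000)
print(f"Echo pulse: {width:.0f} us -> {hcsr04_distance_cm(width):.2f} cm")
# prints: Echo pulse: 1000 us -> 17.15 cm
```

In C the same wraparound handling falls out of unsigned subtraction on a `uint16_t`, provided the result is also stored in a `uint16_t`.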

Interrupts & NVIC

The Cortex-M Interrupt Model

Interrupts allow the MCU to respond to events immediately without polling. When an interrupt fires, the hardware automatically saves the current context, jumps to the Interrupt Service Routine (ISR), and restores context when done.

Interrupt Lifecycle
sequenceDiagram
    participant Main as Main Loop
    participant HW as Hardware Event
    participant NVIC as NVIC
    participant ISR as ISR Handler

    Main->>Main: Running normal code
    HW->>NVIC: Interrupt request (IRQ)
    NVIC->>NVIC: Check priority & enable
    NVIC->>ISR: Save context, jump to ISR
    ISR->>ISR: Handle event (keep short!)
    ISR->>ISR: Clear interrupt flag
    ISR->>Main: Restore context, resume
    Main->>Main: Continue normal code
                            
/* External Interrupt — Button press on PC13 (User button on Nucleo)
 * Board: STM32F411RE Nucleo
 */

#include "stm32f4xx_hal.h"
#include <stdio.h>

volatile uint32_t button_count = 0;  /* volatile: modified in ISR */

void exti_button_init(void)
{
    __HAL_RCC_GPIOC_CLK_ENABLE();

    /* PC13 as input with pull-up (button pulls to GND) */
    GPIO_InitTypeDef gpio = {0};
    gpio.Pin  = GPIO_PIN_13;
    gpio.Mode = GPIO_MODE_IT_FALLING;  /* Interrupt on falling edge */
    gpio.Pull = GPIO_PULLUP;
    HAL_GPIO_Init(GPIOC, &gpio);

    /* Enable EXTI15_10 interrupt in NVIC */
    HAL_NVIC_SetPriority(EXTI15_10_IRQn, 2, 0);
    HAL_NVIC_EnableIRQ(EXTI15_10_IRQn);
}

/* ISR — called by hardware on button press */
void EXTI15_10_IRQHandler(void)
{
    HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_13);
}

/* HAL callback — called from IRQHandler after flag clearing */
void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
{
    if (GPIO_Pin == GPIO_PIN_13)
    {
        button_count++;
        HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);  /* Toggle LED */
    }
}

/* In main loop:
 *   printf("Button pressed %lu times\r\n", button_count);
 */

Priority & Nesting

| Priority Level | Typical Use | Example |
|---|---|---|
| 0 (Highest) | Safety-critical (HardFault itself is fixed at −1, above all configurable levels) | Motor overcurrent, watchdog |
| 1 | Time-critical communication | CAN bus, high-speed SPI DMA |
| 2 | Periodic sampling | ADC timer, sensor polling |
| 3 | User input | Button press, UART receive |
| 4+ (Lower) | Background tasks | LED blinking, display updates |

ISR golden rules: (1) Keep ISRs as short as possible: set a flag and return. (2) Never call HAL_Delay() inside an ISR. (3) Never use printf() in an ISR (it blocks). (4) Use volatile for variables shared between ISR and main code. (5) Always clear the interrupt flag before returning.
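
To see how priority and nesting interact, here is a small Python model of the decision the NVIC makes. It is a simplified sketch under stated assumptions: all priority bits are treated as preemption priority (no sub-priority), a lower number means higher urgency, ties are broken by the lower IRQ number, and the IRQ numbers in the example are illustrative.

```python
def select_irq(pending, running_priority=None):
    """Pick the IRQ that should run next, or None if nothing may run yet.

    pending          -- list of (irq_number, priority) tuples
    running_priority -- priority of the ISR currently executing, or None
                        when the CPU is in the main loop
    """
    if not pending:
        return None
    # Lowest priority value wins; equal priorities fall back to lowest IRQ number.
    irq_number, priority = min(pending, key=lambda entry: (entry[1], entry[0]))
    if running_priority is not None and priority >= running_priority:
        return None  # not urgent enough to preempt; stays pending until the ISR returns
    return irq_number


pending = [(40, 3), (28, 2), (18, 2)]  # (IRQ number, priority), illustrative values

print(select_irq(pending))                        # prints: 18
print(select_irq(pending, running_priority=2))    # prints: None
print(select_irq([(28, 1)], running_priority=2))  # prints: 28
```

The second call shows why two interrupts at the same priority never nest: the pending one waits until the running ISR returns.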

Communication Protocols

Embedded systems communicate with sensors, displays, and other MCUs using serial protocols. Each protocol has trade-offs in speed, wiring, distance, and complexity.

UART (Universal Asynchronous Receiver/Transmitter)

UART is the simplest serial protocol — just TX and RX lines, no clock. Both sides must agree on the baud rate beforehand.

| Parameter | Typical Value | Notes |
|---|---|---|
| Wires | 2 (TX + RX) + GND | Point-to-point only |
| Speed | 9600–115200 baud (common) | Up to 4.5 Mbaud on STM32 |
| Frame | 8N1 (8 data, no parity, 1 stop) | Most common configuration |
| Direction | Full duplex | TX and RX simultaneously |
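
Baud rate is not the same as payload throughput, because every byte carries framing bits. This Python sketch counts the bits in a frame (one start bit plus data, optional parity, and stop bits) and derives the effective byte rate. The function names are hypothetical, and the model assumes back-to-back frames with no idle time between them.

```python
def frame_bits(data_bits=8, parity=False, stop_bits=1):
    """Bits on the wire per character: start + data + optional parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def bytes_per_second(baud, **frame):
    """Best-case payload rate with frames sent back to back."""
    return baud / frame_bits(**frame)


def transfer_time_ms(n_bytes, baud, **frame):
    """Time to send n_bytes characters at the given baud rate."""
    return n_bytes * frame_bits(**frame) / baud * 1000


print(frame_bits())                                      # prints: 10
print(bytes_per_second(115200))                          # prints: 11520.0
print(f"{transfer_time_ms(100, 9600):.1f} ms for 100 bytes at 9600 baud")
```

For 8N1 the overhead is 20%, so a 115200 baud link carries at most 11520 payload bytes per second.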

I2C (Inter-Integrated Circuit)

| Parameter | Value | Notes |
|---|---|---|
| Wires | 2 (SDA + SCL) + GND | Multi-device bus (7-bit addressing: 128 addresses, 112 usable after reserved ranges) |
| Speed | 100kHz / 400kHz / 1MHz / 3.4MHz | Standard / Fast / Fast+ / High-speed |
| Pull-ups | Required (4.7kΩ typical) | Open-drain outputs need external pull-ups |
| Direction | Half duplex | Master initiates all transfers |
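
Whether 4.7kΩ is actually suitable depends on the bus capacitance and speed. The Python sketch below applies the two limits from the I2C-bus specification (NXP UM10204) as I understand them: a minimum set by the 3 mA sink current at 0.4 V, and a maximum set by the allowed rise time. The 100 pF bus capacitance is an assumed example; measure or estimate your own.

```python
def pullup_min_ohms(vdd, vol_max=0.4, iol=0.003):
    """Smallest pull-up a device can still pull LOW: (VDD - VOL) / IOL."""
    return (vdd - vol_max) / iol


def pullup_max_ohms(rise_time_s, bus_capacitance_f):
    """Largest pull-up that still meets the rise time: tr / (0.8473 * Cb)."""
    return rise_time_s / (0.8473 * bus_capacitance_f)


VDD = 3.3
BUS_CAPACITANCE = 100e-12          # assumption: 100 pF of trace plus pin capacitance
RISE_TIME = {"Standard (100 kHz)": 1000e-9, "Fast (400 kHz)": 300e-9}

print(f"Minimum pull-up at {VDD} V: {pullup_min_ohms(VDD):.0f} ohm")
for mode, rise_time in RISE_TIME.items():
    print(f"{mode}: maximum {pullup_max_ohms(rise_time, BUS_CAPACITANCE):.0f} ohm")
```

Under these assumptions 4.7kΩ sits comfortably inside the Standard-mode window but exceeds the Fast-mode maximum of roughly 3.5kΩ, so a faster or more heavily loaded bus calls for a smaller resistor.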

SPI (Serial Peripheral Interface)

SPI Bus Topology
flowchart LR
    MASTER["MCU<br/>(Master)"] -->|"MOSI (data out)"| S1["Sensor 1<br/>CS1"]
    MASTER -->|"SCK (clock)"| S1
    S1 -->|"MISO (data in)"| MASTER
    MASTER -->|"MOSI"| S2["Display<br/>CS2"]
    MASTER -->|"SCK"| S2
    S2 -->|"MISO"| MASTER
    MASTER -->|"CS1 (select)"| S1
    MASTER -->|"CS2 (select)"| S2

| Parameter | Value | Notes |
|---|---|---|
| Wires | 3 shared + 1 CS per device | MOSI, MISO, SCK shared; one CS line per device |
| Speed | Up to 50+ MHz | Much faster than I2C |
| Direction | Full duplex | Simultaneous TX and RX |
| Best For | High-speed data | Displays, SD cards, flash memory, high-speed ADCs |
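
Full duplex on SPI means that every clock pulse moves one bit in each direction, so a transfer is really an exchange between two shift registers. This Python sketch models that exchange for a single byte, MSB first. It ignores clock polarity and phase (the four SPI modes), and the function name is hypothetical.

```python
def spi_exchange(master_byte, slave_byte, bits=8):
    """Simulate one SPI transfer; returns (master_received, slave_received)."""
    mask = (1 << bits) - 1
    master, slave = master_byte & mask, slave_byte & mask
    for _ in range(bits):                      # one loop iteration per SCK pulse
        mosi = (master >> (bits - 1)) & 1      # master shifts its MSB out on MOSI
        miso = (slave >> (bits - 1)) & 1       # slave shifts its MSB out on MISO
        master = ((master << 1) & mask) | miso # each side shifts the other's bit in
        slave = ((slave << 1) & mask) | mosi
    return master, slave


received_by_master, received_by_slave = spi_exchange(0xA5, 0x3C)
print(f"master got 0x{received_by_master:02X}, slave got 0x{received_by_slave:02X}")
# prints: master got 0x3C, slave got 0xA5
```

This is why reading from an SPI sensor requires the master to send dummy bytes: the clock only runs while the master is shifting something out.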

CAN Bus (Controller Area Network)

CAN is a robust, multi-master protocol designed for noisy environments like vehicles and industrial systems. It uses differential signalling for excellent noise immunity.

| Parameter | CAN 2.0B | Notes |
|---|---|---|
| Wires | 2 (CAN_H + CAN_L) | Differential pair, 120Ω termination |
| Speed | Up to 1 Mbit/s | Speed decreases with bus length |
| Payload | 0–8 bytes per frame | CAN FD extends to 64 bytes |
| Topology | Multi-master bus | Priority-based arbitration, no collisions |
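
Priority-based arbitration works because a dominant bit (0) overrides a recessive bit (1) on the wire, and every node reads back what it sends. The following Python sketch simulates that bit-by-bit contest for nodes that start transmitting at the same time. It assumes 11-bit standard identifiers with distinct values and ignores everything after the identifier field; the function name is hypothetical.

```python
def arbitrate(identifiers, id_bits=11):
    """Return the identifier that wins arbitration among simultaneous senders."""
    contenders = set(identifiers)
    if not contenders:
        raise ValueError("at least one identifier is required")
    if any(not 0 <= ident < (1 << id_bits) for ident in contenders):
        raise ValueError(f"identifiers must fit in {id_bits} bits")
    for bit in range(id_bits - 1, -1, -1):     # identifiers go out MSB first
        # The bus behaves like a wired-AND: any dominant 0 pulls the line to 0.
        bus_level = min((ident >> bit) & 1 for ident in contenders)
        # A node that sent recessive 1 but reads 0 has lost and stops sending.
        contenders = {i for i in contenders if ((i >> bit) & 1) == bus_level}
    (winner,) = contenders
    return winner


print(hex(arbitrate([0x65D, 0x123, 0x124])))   # prints: 0x123
```

The winner is always the numerically lowest identifier, and its frame is never corrupted by the contest, so losing nodes simply retry once the bus is idle.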

Protocol Comparison Quick Reference

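Cheat Sheet

| | UART | I2C | SPI | CAN |
|---|---|---|---|---|
| Wires | 2 | 2 | 4+ | 2 |
| Speed | ~115 kbps | ~400 kbps | ~50 Mbps | ~1 Mbps |
| Devices | 2 | 112 (7-bit addressing) | N (1 CS each) | 110+ |
| Use Case | Debug, GPS | Sensors | Displays, ADC | Automotive |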

Power Budgeting

Every embedded system must balance power consumption against performance. A power budget lists every component’s current draw and ensures the power supply can deliver enough current with safety margin.

## Power Budget Calculator
## Estimate total system current draw and battery life

# Component current draws (from datasheets, in mA)
components = {
    "STM32F411 (Run, 100MHz)":   30.0,
    "BME280 (measuring)":         0.714,
    "BH1750 (measuring)":         0.120,
    "SSD1306 OLED (active)":     20.0,
    "LED indicator (×2)":        10.0,
    "Voltage regulator quiescent": 5.0,
    "Pull-up resistors (I2C)":    0.66,
    "Miscellaneous / margin":    10.0,
}

supply_voltage = 3.3    # Volts
battery_mah = 2000      # mAh (e.g., 18650 Li-ion through regulator)
regulator_efficiency = 0.85  # 85% for typical LDO/buck

print("╔══════════════════════════════════════════════╗")
print("║         POWER BUDGET ESTIMATE                ║")
print("╠══════════════════════════════════════════════╣")

total_ma = 0
for name, current in components.items():
    total_ma += current
    print(f"║  {name:.<36} {current:>6.2f} mA ║")

print("╠══════════════════════════════════════════════╣")
print(f"║  TOTAL CURRENT DRAW{'':.<17} {total_ma:>6.2f} mA ║")

power_mw = total_ma * supply_voltage
print(f"║  Power consumption{'':.<18} {power_mw:>6.0f} mW ║")

# Battery life calculation
effective_mah = battery_mah * regulator_efficiency
hours = effective_mah / total_ma
days = hours / 24
print(f"║  Battery life ({battery_mah}mAh){'':.<12} {hours:>5.1f} hrs ║")
print(f"║  ({days:.1f} days)")
print("╚══════════════════════════════════════════════╝")
Output
╔══════════════════════════════════════════════╗
║         POWER BUDGET ESTIMATE                ║
╠══════════════════════════════════════════════╣
║  STM32F411 (Run, 100MHz)................  30.00 mA ║
║  BME280 (measuring).....................   0.71 mA ║
║  BH1750 (measuring).....................   0.12 mA ║
║  SSD1306 OLED (active)..................  20.00 mA ║
║  LED indicator (×2).....................  10.00 mA ║
║  Voltage regulator quiescent............   5.00 mA ║
║  Pull-up resistors (I2C)................   0.66 mA ║
║  Miscellaneous / margin.................  10.00 mA ║
╠══════════════════════════════════════════════╣
║  TOTAL CURRENT DRAW...................   76.49 mA ║
║  Power consumption....................    252 mW ║
║  Battery life (2000mAh)...............   22.2 hrs ║
║  (0.9 days)
╚══════════════════════════════════════════════╝

Sleep modes for battery life: STM32 Sleep mode reduces to ~2mA, Stop mode to ~10μA, Standby to ~2μA. A well-designed duty-cycled sensor node can run for months on a coin cell by spending 99% of its time in Stop mode.
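
The effect of duty cycling is easiest to see as a weighted average. This Python sketch estimates average current and battery life for a node that is active 1% of the time and in Stop mode otherwise. The Stop-mode figure (~10μA) is the one quoted above; the 10 mA active current and the 220 mAh coin-cell capacity are assumed example values, and the model ignores self-discharge and peak-current limits.

```python
def average_current_ma(active_ma, sleep_ma, active_fraction):
    """Time-weighted average of active and sleep current."""
    return active_ma * active_fraction + sleep_ma * (1.0 - active_fraction)


def battery_life_hours(capacity_mah, average_ma, efficiency=1.0):
    """Runtime from usable capacity divided by average current draw."""
    return capacity_mah * efficiency / average_ma


ACTIVE_MA = 10.0        # assumption: MCU plus sensor while awake
STOP_MODE_MA = 0.010    # ~10 uA Stop mode, as quoted above
COIN_CELL_MAH = 220.0   # assumption: typical CR2032-class capacity

avg = average_current_ma(ACTIVE_MA, STOP_MODE_MA, active_fraction=0.01)
hours = battery_life_hours(COIN_CELL_MAH, avg)
print(f"Average current: {avg * 1000:.1f} uA")
print(f"Estimated life:  {hours:.0f} h ({hours / 24:.0f} days)")
```

Under these assumptions the active 1% still dominates the average, so shortening the wake time usually buys more than chasing a lower sleep current.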

Case Study
Toyota Unintended Acceleration — When Firmware Meets Life Safety (2009–2011)

Between 2009 and 2011, Toyota recalled over 9 million vehicles worldwide after reports of sudden unintended acceleration (SUA) that were linked to 89 deaths. While Toyota initially blamed floor mats and sticky pedals, NASA engineers commissioned by the US government conducted a 10-month investigation of the electronic throttle control system’s firmware running on a Renesas V850 microcontroller.

The MCU architecture failures: NASA’s 2011 report did not pin the problem on the electronics, but software experts who later reviewed the source code for the 2013 Bookout v. Toyota trial reported critical design weaknesses: (1) The task scheduler lacked proper priority inversion protection, meaning a low-priority task could block the throttle-control task indefinitely. (2) Stack memory was undersized with no overflow detection, so corrupted stack frames could overwrite the throttle position variable stored in SRAM. (3) The watchdog timer was serviced by a dedicated task that could run even when the main throttle-control task had crashed, masking failures. (4) There was no independent hardware monitoring of the throttle position: the MCU trusted its own software output without cross-checking against a separate sensor path.

Engineering lesson: Every concept in this article (memory layout, interrupt priorities, timer configuration, and peripheral architecture) is directly relevant to safety-critical design. On a Cortex-M design, a deliberate NVIC priority scheme guards against task starvation. Stack monitoring (using MPU regions) detects corruption. A secondary watchdog that checks the output, not just the task scheduler, catches a dead control task. Toyota’s $1.2 billion criminal penalty plus $1.6 billion in settlements show what is at stake when the MCU architecture principles covered here are neglected.

$2.8B Total Cost · Stack Overflow · Watchdog Bypass · Task Starvation

Case Study
Therac-25 — The Race Condition That Killed (1985–1987)

The Therac-25 was a computer-controlled radiation therapy machine that massively overdosed six patients between 1985 and 1987, causing three deaths and severe injuries to others. It used a PDP-11 minicomputer (architecturally similar to modern MCUs in its single-threaded interrupt-driven design) to control beam energy and turntable position.

The concurrency failure: The software had a race condition between the operator input task and the beam-setup task. When the operator changed the beam type from “X-ray” to “Electron” within an 8-second window, the beam could fire at X-ray intensity (25 MeV mode) while the turntable sat in the electron position, without the X-ray target and flattening filter in the beam path. The result: a concentrated beam of radiation roughly 100× the intended dose. The system had no hardware interlock. Previous Therac models (Therac-6, Therac-20) had hardware interlocks that prevented this state, but the Therac-25 relied entirely on software.

Engineering lesson: This is the most cited example of why interrupt-driven MCU systems must handle shared resources correctly. In modern STM32 terms: critical sections must be protected with __disable_irq()/__enable_irq() or NVIC priority masking. Peripheral registers that must be written atomically need proper sequencing. And most importantly: safety-critical outputs must have independent hardware interlocks that do not depend on software correctness. Never assume your firmware is bug-free.

3 Deaths · Race Condition · No Hardware Interlock · Shared Resource Corruption

Exercises

Exercise 1: Timer Configuration

You need to generate a 50 Hz PWM signal (20ms period) to control a standard servo motor on an STM32F411 running at 84 MHz APB1 timer clock.

  1. Using the timer prescaler formula from this article, calculate PSC and ARR values for a 50 Hz base frequency. Show your working.
  2. For a servo that expects 1ms–2ms pulses (5%–10% duty cycle at 50 Hz), what CCR values correspond to 0°, 90°, and 180° positions?
  3. If you need 0.1° resolution (1800 steps from 0° to 180°), what minimum ARR value do you need? Can a 16-bit timer achieve this? What PSC would you use?
  4. You want to drive 4 servo motors simultaneously. Which STM32 timer has 4 output compare channels? Write the HAL initialization pseudocode.

Hint: 84 MHz / 50 Hz = 1,680,000 total counts. PSC=83 gives a 1 MHz tick → ARR=19999 (20 ms period at 1 µs resolution). CCR for 1 ms = 1000, 1.5 ms = 1500, 2 ms = 2000. For 1800 steps you need at least 1800 counts inside the 1 ms pulse window, i.e. a tick of 0.556 µs or shorter, so ARR ≥ 35999 over the 20 ms period. With ARR=19999 you get only 1000 counts per ms, which gives exactly 1000 steps (0.18°/step). For 0.1°: use PSC=41, ARR=39999 (0.5 µs tick, still 50 Hz, 2000 counts per ms = 2000 steps across the 1 ms range, 0.09°/step). Both ARR values fit a 16-bit timer. For four servos, TIM2 or TIM5 (32-bit) each provide 4 channels: CH1–CH4.

Exercise 2: Memory Map & Peripheral Access

Using the STM32F411 reference manual memory map from this article, answer the following:

  1. GPIOA’s base address is 0x40020000. The ODR (Output Data Register) offset is 0x14. Write a C expression using direct register access (no HAL) to set PA5 HIGH. What happens if you write to the entire ODR instead of using BSRR?
  2. Explain why APB1 peripherals (address 0x4000_xxxx) are typically clocked at half the AHB speed. Which peripherals benefit from slower clocking and why?
  3. The SRAM region starts at 0x2000_0000. If your MCU has 128KB of SRAM, what is the last valid address? Your linker script allocates 4KB for the stack at the top of SRAM. What is the initial stack pointer value?
  4. A DMA transfer moves 256 bytes from the ADC data register (ADC1_DR at 0x4001_204C) to a buffer at 0x2000_1000. Draw the data path: which buses does the data cross? Can the CPU access Flash while DMA is running?

Hint: (1) *(volatile uint32_t *)(0x40020000 + 0x14) |= (1U << 5); but this is a read-modify-write, not atomic. Use BSRR: *(volatile uint32_t *)(0x40020000 + 0x18) = (1U << 5); for an atomic set. Writing the full ODR risks changing other pins. (2) APB1 is limited to 50 MHz on the F411 (42 MHz when HCLK is 84 MHz). Low-speed peripherals (I2C at 400kHz, UART at ~1Mbps) don’t need fast clocks; a slower bus saves power. (3) 128KB = 0x20000, last address = 0x2001FFFF. SP = 0x20020000 (points one past the last byte; the ARM stack is full-descending). (4) ADC (APB2) → AHB-APB bridge → DMA2 → bus matrix → SRAM. Yes, the CPU can keep fetching from Flash over its I-bus while the DMA controller uses its own master port on the multi-layer bus matrix.

Exercise 3: Power Budget & Sleep Mode Design

You are designing a LoRaWAN soil moisture sensor that must run for 1 year on 2× AA batteries (3000 mAh total at 3V, through a 90%-efficient boost converter to 3.3V). The sensor reports once every 15 minutes.

  1. Calculate the maximum average current draw allowed for 1 year of operation.
  2. The LoRa radio draws 120mA for 200ms during transmission. What is the average current contribution of the radio over a 15-minute cycle?
  3. The STM32L4 MCU in Run mode draws 10mA and needs 50ms to wake up, read the ADC, and prepare the LoRa packet. In Stop 2 mode it draws 1.3μA. What is the MCU’s average current over the 15-minute cycle?
  4. Budget the remaining current for the moisture sensor, voltage regulator quiescent current, and leakage. Can you meet the 1-year target? Show your full power budget table.
  5. If the customer now wants 2-year battery life, what design changes would you make? Consider duty cycle, MCU sleep mode, and alternative radio protocols.

Hint: (1) Effective capacity = 3000 × 0.9 = 2700 mAh. 1 year = 8760 hours. Max avg = 2700/8760 = 0.308 mA = 308 µA. (2) Radio: 120mA × 0.2s = 24 mA·s per cycle. Cycle = 900s. Avg = 24/900 = 26.7 µA. (3) MCU active: 10mA × 0.05s = 0.5 mA·s. MCU sleep: 0.0013mA × 899.75s = 1.17 mA·s. Avg MCU = 1.67/900 = 1.85 µA. (4) Budget so far: 26.7 + 1.85 = 28.6 µA. Remaining for sensor + regulator: 308 - 28.6 = 279 µA — plenty of margin. (5) For 2 years: target drops to 154 µA. Still feasible. Could switch to Standby mode (0.4 µA) with RTC wake-up, or reduce TX to every 30 min.

Design Review Checklist Tool

Use this tool to create a hardware design review checklist for your MCU-based project. It generates a comprehensive document covering power, GPIO, peripherals, and communication.


Conclusion & Next Steps

You now understand the full architecture of an ARM Cortex-M microcontroller — from the bus matrix and memory map to GPIO registers, ADC/DAC converters, hardware timers, the NVIC interrupt controller, and the four essential serial protocols. This knowledge forms the foundation for everything that follows in the series.

Key takeaways: (1) Every peripheral is a set of memory-mapped registers. (2) HAL for portability, registers for performance. (3) Timers = prescaler × auto-reload → precise frequencies. (4) ISRs must be short — set a flag and return. (5) Choose UART for simplicity, I2C for multi-sensor, SPI for speed, CAN for robustness. (6) Always budget power before choosing a battery.

Next in the Series

In Part 4: Schematic Design, we’ll translate this MCU knowledge into professional schematics — KiCad and Altium workflows, hierarchical sheet organisation, component selection with lifecycle awareness, power supply design, and ESD protection circuits.