STM32 Part 9: Interrupt Management & NVIC

March 31, 2026 Wasil Zafar 26 min read

Poorly configured interrupt priorities are the hidden cause of most STM32 concurrency bugs — master priority grouping, preemption, ISR design rules, and HAL callback architecture to build rock-solid real-time firmware.

Table of Contents

  1. NVIC Architecture
  2. Priority Grouping
  3. Preemption & Nesting
  4. ISR Design Rules
  5. HAL Callback Architecture
  6. Latency Measurement
  7. Common Interrupt Pitfalls
  8. Exercises
  9. NVIC Configuration Tool
  10. Conclusion & Next Steps
Series Overview: This is Part 9 of the 18-part STM32 Unleashed series. We have covered architecture, GPIO, UART, timers, ADC, SPI, I2C, and DMA. Now we tackle the interrupt subsystem — the mechanism that makes every one of those peripherals reactive instead of polling-bound.

NVIC Architecture

The Nested Vectored Interrupt Controller (NVIC) is a standard component of all ARM Cortex-M processors. It provides hardware-managed interrupt prioritisation, preemption, nesting, and tail-chaining — capabilities that are absent in simpler 8-bit architectures and that make real-time firmware engineering on Cortex-M qualitatively different from what developers encounter on AVR or PIC platforms.

The NVIC on Cortex-M3/M4/M7 supports up to 240 external (peripheral) interrupts plus 16 internal system exceptions, for a total of 256 exception sources. The STM32F4 implements a subset of these: 82 maskable interrupt channels on the STM32F405/407 and 91 on the STM32F42x/43x, for example, with smaller parts implementing fewer. They are listed in the IRQn_Type enumeration of the device-specific CMSIS header (pulled in through stm32f4xx.h).

System Exceptions with Fixed Priority

Three system exceptions have negative (fixed) priority levels that cannot be changed: Reset at -3, NMI (Non-Maskable Interrupt) at -2, and HardFault at -1. These priorities are lower-numbered — and therefore higher-priority — than any configurable interrupt. This guarantees that a HardFault caused by memory access violation, division by zero, or invalid instruction can always fire, even when all configurable interrupts are masked. The NMI is typically connected to the Clock Security System (CSS) on STM32F4 — if the HSE oscillator fails, the CSS triggers an NMI to allow the firmware to gracefully fall back to the internal HSI before system behaviour becomes undefined.

Exception | Vector Offset | Default Priority | Maskable? | Purpose
Reset | 0x04 | -3 (fixed) | No | System startup, highest priority of all
NMI | 0x08 | -2 (fixed) | No | Clock security system, watchdog NMI
HardFault | 0x0C | -1 (fixed) | No | Escalated fault, always reachable
MemManage | 0x10 | Configurable | Escalates to HardFault if disabled or masked | MPU violation
BusFault | 0x14 | Configurable | Escalates to HardFault if disabled or masked | Precise/imprecise bus error
UsageFault | 0x18 | Configurable | Escalates to HardFault if disabled or masked | Undefined instruction, divide by zero
SVC | 0x2C | Configurable | Yes | Supervisor call (FreeRTOS uses it to start the first task)
PendSV | 0x38 | Configurable | Yes | Context switch trigger (FreeRTOS)
SysTick | 0x3C | Configurable | Yes | HAL timebase, RTOS tick
IRQ0–IRQn | 0x40+ | Configurable | Yes | Peripheral interrupts (EXTI, UART, TIM…)

Priority Register Width

The Cortex-M architecture allows up to 8 bits per priority register, but most STM32 devices implement only the 4 most significant bits. This gives values 0–15 for HAL_NVIC_SetPriority(); the unused lower 4 bits always read as zero. Cortex-M0/M0+ parts such as the STM32F0, L0, and G0 implement only 2 bits (4 priority levels) and have no priority grouping or BASEPRI register, so code written for the F4 must account for this when porting.
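
The mapping between a logical priority and the 8-bit register can be sketched in plain C. This is a host-side model rather than CMSIS code: the helper names prio_to_register, register_to_prio, and prio_levels are hypothetical, and the prio_bits parameter stands in for the device's __NVIC_PRIO_BITS.

```c
#include <stdint.h>

/* Hypothetical helpers modelling how a logical priority maps onto the
   8-bit NVIC priority register. prio_bits plays the role of
   __NVIC_PRIO_BITS (4 on STM32F4, 2 on Cortex-M0/M0+ parts). */
uint8_t prio_to_register(uint32_t prio, uint32_t prio_bits)
{
    /* Only the top prio_bits of the register are implemented. */
    return (uint8_t)((prio << (8u - prio_bits)) & 0xFFu);
}

uint32_t register_to_prio(uint8_t reg, uint32_t prio_bits)
{
    return (uint32_t)reg >> (8u - prio_bits);
}

/* Number of distinct priority levels the device can express. */
uint32_t prio_levels(uint32_t prio_bits)
{
    return 1u << prio_bits;
}
```

With 4 bits, priority 6 is stored as 0x60. With 2 bits, the same value 6 loses its top bit and reads back as 2, which is the kind of silent change to watch for when porting F4 priority tables to a Cortex-M0+ part.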

Interrupt Vector Table

The interrupt vector table is an array of 32-bit function pointers located at the start of Flash memory by default (address 0x08000000 on STM32F4). Each entry contains the address of the corresponding handler function. Entry 0 holds the initial stack pointer; entry 1 holds the Reset_Handler address. Peripheral interrupt handlers start at offset 0x40 (IRQ0 = entry 16).

The table is relocatable via the VTOR (Vector Table Offset Register) at 0xE000ED08. This is used by bootloaders that jump to an application: the application writes its own vector table address into VTOR so that subsequent interrupts dispatch to the application's handlers, not the bootloader's.
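
The table layout and the VTOR alignment rule reduce to a little arithmetic. The sketch below is a host-side illustration with hypothetical helper names (vector_offset_for_irq, vtor_alignment_bytes); it assumes the Cortex-M3/M4 rule that the table must be aligned to its entry count rounded up to a power of two, times 4 bytes, with a 128-byte minimum.

```c
#include <stdint.h>

/* Byte offset of a vector table entry. Exception numbers 0..15 are the
   system entries; device IRQn values start at exception number 16.
   Negative IRQn values address system exceptions (SysTick is -1). */
uint32_t vector_offset_for_irq(int32_t irqn)
{
    return (uint32_t)(16 + irqn) * 4u;
}

/* Minimum VTOR alignment in bytes for a table with num_irqs device
   interrupts: round the entry count up to a power of two and multiply
   by the 4-byte entry size. */
uint32_t vtor_alignment_bytes(uint32_t num_irqs)
{
    uint32_t entries = 16u + num_irqs;
    uint32_t pow2 = 32u;          /* 32 entries * 4 bytes = 128-byte minimum */
    while (pow2 < entries) {
        pow2 <<= 1;
    }
    return pow2 * 4u;
}
```

For a part with 82 IRQs the table has 98 entries, which rounds up to 128 entries and therefore needs 512-byte alignment.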

/* ─── Low-level CMSIS NVIC functions — raw priority management ──────────── */
#include "core_cm4.h"   /* or core_cm7.h, included via stm32f4xx.h */

/* Set interrupt priority for USART1 (IRQn = 37 on STM32F407) */
NVIC_SetPriority(USART1_IRQn, 6);   /* priority value 0–15 in Group 4 */

/* Enable the interrupt in NVIC (peripheral must also have its IE bit set) */
NVIC_EnableIRQ(USART1_IRQn);

/* Read back the currently configured priority */
uint32_t prio = NVIC_GetPriority(USART1_IRQn);   /* returns 6 */

/* Trigger a software interrupt (useful for testing) */
NVIC_SetPendingIRQ(USART1_IRQn);

/* Disable and clear pending */
NVIC_DisableIRQ(USART1_IRQn);
NVIC_ClearPendingIRQ(USART1_IRQn);

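/* Relocate vector table to SRAM (for runtime patching).
   g_pfnVectors is the Flash table from the startup file, so it must be
   copied into an aligned RAM array before VTOR is pointed at the copy.    */
#define VECTOR_COUNT (16u + 82u)        /* STM32F405/407: 16 system + 82 IRQs */
extern uint32_t g_pfnVectors[];         /* Flash vector table (startup file) */
static uint32_t ram_vectors[VECTOR_COUNT] __attribute__((aligned(512)));

void RelocateVectorTable(void)
{
    for (uint32_t i = 0; i < VECTOR_COUNT; i++)
    {
        ram_vectors[i] = g_pfnVectors[i];
    }
    __disable_irq();                    /* no dispatch while VTOR changes */
    SCB->VTOR = (uint32_t)ram_vectors;  /* must be 512-byte aligned */
    __DSB();                            /* data sync barrier */
    __enable_irq();
}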

Priority Grouping

The STM32F4 NVIC uses 4 bits for interrupt priorities, giving 16 levels (0–15). These 4 bits can be split between two concepts: preemption priority (also called group priority) and sub-priority. The split is controlled by the PRIGROUP field in the SCB Application Interrupt and Reset Control Register (AIRCR), configurable via HAL_NVIC_SetPriorityGrouping().

The most important rule: HAL_NVIC_SetPriorityGrouping() must be called exactly once, before any interrupt priorities are configured. Changing the grouping after setting priorities scrambles all previously assigned values. STM32 HAL defaults to NVIC_PRIORITYGROUP_4 (4 preemption bits, 0 sub-priority bits), which gives 16 distinct preemption levels and is the simplest configuration to reason about.

Priority Group | Preemption Bits | Sub-Priority Bits | Max Preemption Levels | Max Sub-Levels
NVIC_PRIORITYGROUP_0 | 0 | 4 | 1 (no preemption) | 16
NVIC_PRIORITYGROUP_1 | 1 | 3 | 2 | 8
NVIC_PRIORITYGROUP_2 | 2 | 2 | 4 | 4
NVIC_PRIORITYGROUP_3 | 3 | 1 | 8 | 2
NVIC_PRIORITYGROUP_4 | 4 | 0 | 16 | 1 (none)

Sub-priority only determines which of two equal-preemption interrupts is handled first when both are pending simultaneously. It does not cause preemption — an ISR with sub-priority 0 cannot preempt an ISR with the same preemption priority but sub-priority 3. This distinction is critical: if you need interrupt A to preempt interrupt B, they must have different preemption priorities.
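
The preemption rule can be checked on a host with a few lines of C. This is a model of the 4-bit split, not NVIC code: preempt_part, sub_part, and would_preempt are hypothetical names, and preempt_bits is the number of bits the chosen grouping gives to preemption (0 for NVIC_PRIORITYGROUP_0 up to 4 for NVIC_PRIORITYGROUP_4).

```c
#include <stdint.h>
#include <stdbool.h>

#define PRIO_BITS 4u   /* implemented priority bits on STM32F4 */

/* Preemption (group) part of a 4-bit priority value. */
uint32_t preempt_part(uint32_t prio, uint32_t preempt_bits)
{
    return prio >> (PRIO_BITS - preempt_bits);
}

/* Sub-priority part: the remaining low bits. */
uint32_t sub_part(uint32_t prio, uint32_t preempt_bits)
{
    uint32_t sub_bits = PRIO_BITS - preempt_bits;
    return prio & ((1u << sub_bits) - 1u);
}

/* True if a newly pending interrupt would preempt the running one.
   Only a strictly lower preemption value preempts; sub-priority is
   ignored for this decision. */
bool would_preempt(uint32_t pending_prio, uint32_t running_prio,
                   uint32_t preempt_bits)
{
    return preempt_part(pending_prio, preempt_bits)
         < preempt_part(running_prio, preempt_bits);
}
```

With two preemption bits, priorities 4 and 7 share preemption level 1, so neither preempts the other. With four preemption bits the same pair does preempt.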

Choosing the Right Priority Values in Practice

A practical priority assignment strategy for an STM32F4 project using FreeRTOS and NVIC_PRIORITYGROUP_4 (16 levels, 0 = highest):

  • 0: Reserved — never assign to a FreeRTOS-aware ISR. Use for non-maskable hardware safety functions (e.g., emergency shutdown GPIO) that must fire even during RTOS scheduler operation.
  • 1–4: High-urgency, non-FreeRTOS ISRs. These can preempt the RTOS scheduler and must never call any FreeRTOS API. Use for cycle-accurate timing, hardware encoder capture, or safety watchdog resets.
  • 5 (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY): The boundary. ISRs at this priority value or numerically higher (5–15, meaning equal or lower urgency) can safely use FreeRTOS FromISR API functions (xQueueSendFromISR, xSemaphoreGiveFromISR, etc.).
  • 5–10: Real-time peripheral ISRs (DMA complete, UART, SPI). Higher priority = lower number = more responsive.
  • 11–14: Non-urgent peripheral ISRs (slow UART logging, ADC batch complete).
  • 15 (lowest): SysTick (FreeRTOS tick) and PendSV (context switch). These are set automatically by FreeRTOS at the lowest possible configurable priority to ensure that all application ISRs can preempt the scheduler tick.
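
The boundary rule in the list above can be written down as a small check. This is a host-side sketch with assumed values: the three macros mirror settings that normally live in FreeRTOSConfig.h, and max_syscall_raw and may_call_fromisr are hypothetical helper names, not FreeRTOS functions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed configuration values; the real ones live in FreeRTOSConfig.h. */
#define PRIO_BITS                    4u
#define LIBRARY_MAX_SYSCALL_PRIORITY 5u   /* configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY */
#define LIBRARY_LOWEST_PRIORITY      15u  /* configLIBRARY_LOWEST_INTERRUPT_PRIORITY */

/* Raw BASEPRI value used for kernel critical sections: the library
   priority shifted into the implemented top bits of the register. */
uint32_t max_syscall_raw(void)
{
    return (LIBRARY_MAX_SYSCALL_PRIORITY << (8u - PRIO_BITS)) & 0xFFu;
}

/* An ISR may call ...FromISR() functions only if its priority value is
   numerically >= the boundary (equal or lower urgency). */
bool may_call_fromisr(uint32_t isr_prio)
{
    return isr_prio >= LIBRARY_MAX_SYSCALL_PRIORITY
        && isr_prio <= LIBRARY_LOWEST_PRIORITY;
}
```

A check like this fits naturally into a table-driven priority configuration, so that a misassigned priority is caught at start-up rather than as an intermittent fault.
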
/* ─── Priority configuration with NVIC_PRIORITYGROUP_4 (HAL default) ─────── */

/* HAL_Init() already selects this grouping; an explicit call is only needed
   for a different split and must precede every HAL_NVIC_SetPriority() call */
HAL_NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_4);

/* SysTick: HAL timebase — highest among configurables to keep HAL_Delay working */
HAL_NVIC_SetPriority(SysTick_IRQn, 0, 0);

/* USART1: data logging — low urgency */
HAL_NVIC_SetPriority(USART1_IRQn, 8, 0);
HAL_NVIC_EnableIRQ(USART1_IRQn);

/* TIM2 Update: 1 kHz scheduler tick — medium urgency */
HAL_NVIC_SetPriority(TIM2_IRQn, 5, 0);
HAL_NVIC_EnableIRQ(TIM2_IRQn);

/* EXTI0: safety-critical button (PA0) — high urgency, preempts TIM2 and USART1 */
HAL_NVIC_SetPriority(EXTI0_IRQn, 3, 0);
HAL_NVIC_EnableIRQ(EXTI0_IRQn);

/* FreeRTOS note: all FreeRTOS-aware ISRs must use priorities >= configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY
   (typically 5 on STM32). Priorities 0-4 are reserved for non-maskable ISRs
   that must never call FreeRTOS API functions.                               */

Preemption & Nesting

Interrupt nesting occurs when a higher-priority interrupt fires while a lower-priority ISR is already executing. The Cortex-M processor automatically pushes the current processor state (8 registers: PC, PSR, R0–R3, R12, LR) onto the stack, enters the higher-priority ISR, and then resumes the original ISR seamlessly when the higher-priority one returns. This entire context save/restore is performed by hardware — no software stack management is needed.

The depth of nesting is limited only by available stack space. Each nested interrupt consumes at least 32 bytes of stack (8 × 4 bytes), or 104 bytes when the FPU context is active and the extended frame adds 72 bytes for S0–S15, FPSCR, and a reserved word. Always size your main stack (in the linker script) to accommodate the maximum expected nesting depth.

Masking Interrupts: PRIMASK vs BASEPRI

__disable_irq() sets the PRIMASK register, which masks all configurable interrupts globally. It is used for the shortest possible critical sections — typically protecting a read-modify-write on a shared variable. Always restore PRIMASK immediately; leaving it set for more than a few microseconds can cause missed interrupts in time-critical systems.

BASEPRI is a more surgical mask: it blocks all interrupts with priority numerically greater than or equal to the BASEPRI value, while allowing higher-priority interrupts through. FreeRTOS uses this mechanism in taskENTER_CRITICAL() — it sets BASEPRI to configMAX_SYSCALL_INTERRUPT_PRIORITY, masking only FreeRTOS-aware interrupts while leaving high-priority hardware ISRs (priority 0–4) fully operational.
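
The difference between the two masks is easy to model. The function below is a host-side sketch, not CMSIS: irq_can_fire is a hypothetical name, and it deliberately ignores the priority of whatever is currently executing. Raw values are the 8-bit register encodings, so priority 5 on an F4 is 5 << 4.

```c
#include <stdint.h>
#include <stdbool.h>

/* Host-side model: may an interrupt with raw priority prio_raw be
   taken, given the current PRIMASK and BASEPRI register values? */
bool irq_can_fire(uint32_t prio_raw, uint32_t primask, uint32_t basepri_raw)
{
    if (primask & 1u) {
        return false;   /* PRIMASK blocks every configurable interrupt */
    }
    if (basepri_raw != 0u && prio_raw >= basepri_raw) {
        return false;   /* BASEPRI blocks equal and less urgent priorities */
    }
    return true;        /* BASEPRI == 0 means no masking at all */
}
```

With BASEPRI set to 5 << 4, a priority 3 interrupt still fires while priorities 5 and 8 are held pending. Writing 0 to BASEPRI removes the mask entirely, which is why 0 cannot be used as a masking level.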

/* ─── Demonstrating preemption: TIM2 ISR (priority 5) preempted by EXTI0 (priority 3) */

/* In stm32f4xx_it.c: */

/* TIM2 Update ISR — priority 5, runs at 1 kHz */
void TIM2_IRQHandler(void)
{
    __HAL_TIM_CLEAR_IT(&htim2, TIM_IT_UPDATE);

    /* Simulate non-trivial work: this ISR can be preempted */
    HAL_GPIO_WritePin(DEBUG_TIM2_GPIO_Port, DEBUG_TIM2_Pin, GPIO_PIN_SET);

    /* --- EXTI0 ISR can fire here if button pressed --- */

    HAL_GPIO_WritePin(DEBUG_TIM2_GPIO_Port, DEBUG_TIM2_Pin, GPIO_PIN_RESET);
}

/* EXTI0 ISR — priority 3, preempts TIM2_IRQHandler */
void EXTI0_IRQHandler(void)
{
    __HAL_GPIO_EXTI_CLEAR_IT(GPIO_PIN_0);
    HAL_GPIO_EXTI_Callback(GPIO_PIN_0);   /* dispatches to user callback */
}

/* Bare-metal critical section using BASEPRI (FreeRTOS-style) */
volatile uint32_t shared_counter;   /* also touched by ISRs at priority >= 5 */

void ProtectedOperation(void)
{
    uint32_t prev_basepri = __get_BASEPRI();
    /* Mask priorities >= 5. __set_BASEPRI_MAX only ever tightens the mask,
       so a stricter level set by an outer critical section is preserved.  */
    __set_BASEPRI_MAX(5 << (8 - __NVIC_PRIO_BITS));
    __DSB();
    __ISB();

    /* critical section: safe to access shared state */
    shared_counter++;

    __set_BASEPRI(prev_basepri);  /* restore previous mask level */
    __DSB();
    __ISB();
}

ISR Design Rules

The cardinal rule of ISR design is: do as little as possible inside the ISR. Every microsecond spent in an ISR is a microsecond during which lower-priority interrupts are blocked (or, if PRIMASK is set, all interrupts are blocked). High-latency ISRs cause missed events, overrun buffers, and system instability that manifests as intermittent bugs under load — exactly the class of bug that is hardest to reproduce and diagnose.

The recommended pattern is flag-and-defer: the ISR does the minimum hardware acknowledgement (clear the interrupt flag), sets a volatile flag variable or posts to a queue, and returns. The main loop (or a FreeRTOS task) checks the flag and performs the actual work. This pattern keeps a hand-written ISR to a few dozen cycles, well under a microsecond at 168 MHz (HAL's generic handlers take longer), while moving the computational work to a context where blocking, printf, and long processing are safe.
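
When a single flag is not enough because several events can arrive before the main loop runs, the "posts to a queue" variant can be a small single-producer, single-consumer ring buffer. The sketch below is one possible shape, assuming exactly one ISR writes and exactly one main-loop context reads on a single-core Cortex-M; ring_buffer_t, rb_put, rb_get, and RB_SIZE are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define RB_SIZE 16u                    /* must be a power of two */

typedef struct {
    volatile uint32_t head;            /* written only by the producer (ISR) */
    volatile uint32_t tail;            /* written only by the consumer (main loop) */
    volatile uint8_t  data[RB_SIZE];
} ring_buffer_t;

/* Producer side: call from the ISR. Returns false if the buffer is full. */
bool rb_put(ring_buffer_t *rb, uint8_t byte)
{
    uint32_t head = rb->head;
    if (head - rb->tail == RB_SIZE) {
        return false;                  /* full: caller decides whether to drop */
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    rb->head = head + 1u;              /* publish only after the data is stored */
    return true;
}

/* Consumer side: call from the main loop. Returns false if empty. */
bool rb_get(ring_buffer_t *rb, uint8_t *out)
{
    uint32_t tail = rb->tail;
    if (tail == rb->head) {
        return false;
    }
    *out = rb->data[tail & (RB_SIZE - 1u)];
    rb->tail = tail + 1u;
    return true;
}
```

Because each index has a single writer, neither side needs to disable interrupts. The free-running indices rely on unsigned wraparound, which is why the size has to be a power of two.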

Volatile and Memory Barriers

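Variables shared between ISR and non-ISR code must be declared volatile. Without volatile, the compiler may cache the variable in a register and never re-read it from memory, causing the main loop to see a stale value forever. Note that volatile does not make an access atomic: a read-modify-write such as counter++ can still be interrupted halfway. Additionally, on Cortex-M4 and M7, a memory barrier (__DSB()) is worth adding when clearing a peripheral interrupt flag is the last thing the ISR does. It ensures the write has reached the peripheral before the exception return, which avoids a spurious second entry into the same ISR.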

Stack Usage and ISR Stack Depth

Every interrupt entry pushes 8 registers onto the stack (PC, PSR, R0–R3, R12, LR = 32 bytes). Local variables in the ISR are allocated on top. If the FPU is in use, the extended frame reserves an additional 72 bytes (18 × 4 bytes for S0–S15, FPSCR, and one reserved word). With a nesting depth of N, the exception frames alone consume up to N × (32 + 72) = N × 104 bytes, plus up to 4 bytes of alignment padding per frame. For a bare-metal system with maximum nesting depth 4, that is 4 × 104 = 416 bytes (432 with padding), so reserve at least that much headroom, plus each ISR's own locals, beyond the application's normal stack usage.
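
The budget arithmetic can be captured in a helper so the linker-script figure is derived rather than guessed. This is a sketch under stated assumptions: isr_stack_budget is a hypothetical name, isr_locals_bytes is a per-ISR allowance you would have to measure, and the model adds the 4-byte padding the hardware may insert to keep the stack 8-byte aligned, so its totals sit slightly above a plain N × 104 estimate.

```c
#include <stdint.h>
#include <stdbool.h>

#define BASIC_FRAME_BYTES   32u  /* R0-R3, R12, LR, PC, xPSR */
#define FPU_FRAME_EXTRA     72u  /* S0-S15, FPSCR, one reserved word */
#define ALIGN_PADDING_BYTES  4u  /* possible padding for 8-byte SP alignment */

/* Worst-case bytes consumed by nested exception frames.
   isr_locals_bytes is an assumed allowance per ISR for locals and
   callee-saved registers; measure it rather than trusting a guess. */
uint32_t isr_stack_budget(uint32_t nesting_depth, bool fpu_in_use,
                          uint32_t isr_locals_bytes)
{
    uint32_t frame = BASIC_FRAME_BYTES + ALIGN_PADDING_BYTES;
    if (fpu_in_use) {
        frame += FPU_FRAME_EXTRA;
    }
    return nesting_depth * (frame + isr_locals_bytes);
}
```

For a nesting depth of 4 with the FPU active and no locals, the model gives 432 bytes; allowing 32 bytes of locals per ISR raises that to 560.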

Use the STM32CubeIDE stack analyser or a watermark pattern (0xDEADBEEF fill at startup, scan at runtime) to measure peak stack usage. A stack overflow on Cortex-M4 without an MPU is catastrophic — it silently corrupts adjacent memory and produces a HardFault far from the actual overflow point, making it one of the hardest classes of firmware bug to diagnose after the fact.
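
The watermark technique is simple enough to prototype on a host before wiring it to the real stack region. In this sketch the stack is just a word array; stack_paint and stack_unused_words are hypothetical names, and on target the base address and size would come from linker-script symbols, which are not shown here.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_FILL_PATTERN 0xDEADBEEFu

/* Fill a stack region with the pattern. On target this runs in startup
   code before the region is in use; here it is just a word array. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL_PATTERN;
    }
}

/* Cortex-M stacks grow downward, so untouched words sit at the low
   addresses. Counting them from the bottom gives the remaining headroom. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t unused = 0;
    while (unused < words && base[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused;
}
```

Peak usage is the region size minus the unused count. A result of zero unused words means the stack has already reached, and possibly passed, the bottom of its region.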

/* ─── Correct vs incorrect ISR patterns ─────────────────────────────────── */

/* SHARED STATE — always volatile */
volatile uint8_t  uart_rx_flag  = 0;
volatile uint8_t  uart_rx_byte  = 0;
volatile uint32_t button_events = 0;

/* === CORRECT: minimal ISR — flag only === */
void USART1_IRQHandler(void)
{
    /* HAL dispatches to callbacks internally */
    HAL_UART_IRQHandler(&huart1);
}

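void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART1)
    {
        uart_rx_flag = 1;   /* defer all processing to main loop */
        /* Re-arm single-byte receive. The next byte overwrites uart_rx_byte,
           so the main loop must consume it before another byte arrives.    */
        HAL_UART_Receive_IT(&huart1, (uint8_t *)&uart_rx_byte, 1);
    }
}

void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
{
    if (GPIO_Pin == BUTTON_Pin)
    {
        /* Read-modify-write, not atomic: code that reads and clears this
           counter outside the ISR must mask the interrupt while doing so. */
        button_events++;
    }
}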

/* === INCORRECT: long-running work in ISR (DO NOT DO THIS) === */
/*
void BAD_USART1_IRQHandler(void)
{
    uint8_t ch = USART1->DR & 0xFF;
    sprintf(log_buf, "RX: 0x%02X\r\n", ch);   // WRONG: sprintf in ISR
    HAL_UART_Transmit(&huart2, log_buf, ...);   // WRONG: blocking in ISR
    HAL_Delay(1);                               // WRONG: HAL_Delay in ISR
    process_protocol_frame(ch);                 // WRONG: long processing in ISR
}
*/

/* === MAIN LOOP: deferred processing === */
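void main_loop_tick(void)
{
    if (uart_rx_flag)
    {
        uart_rx_flag = 0;
        handle_received_byte(uart_rx_byte);   /* safe: not in ISR context */
    }
    if (button_events)
    {
        /* Read and clear as one unit: mask interrupts briefly so an EXTI
           event cannot slip in between the read and the clear.          */
        uint32_t primask = __get_PRIMASK();
        __disable_irq();
        uint32_t events = button_events;
        button_events = 0;
        __set_PRIMASK(primask);               /* restore previous mask state */
        handle_button_press(events);
    }
}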

HAL Callback Architecture

STM32 HAL implements a layered callback dispatch system. When a peripheral interrupt fires, the execution path is: hardware IRQ → IRQHandler → HAL peripheral handler → HAL callback. Each layer has a specific responsibility, and understanding all three is necessary for correctly extending HAL behaviour.

The IRQHandler (e.g., USART2_IRQHandler) is declared as a weak symbol in the device startup file (startup_stm32f407xx.s, for example), where it is aliased to Default_Handler. CubeMX generates the strong definition in stm32f4xx_it.c for every interrupt you enable; in a hand-written project you override the weak alias by providing a definition of the same name in your own source. By convention, the generated code calls the corresponding HAL handler (HAL_UART_IRQHandler()), which then reads the peripheral status flags, clears them, and dispatches to the appropriate callback.

Registering Callbacks at Runtime (HAL v1.8+)

Starting with STM32 HAL version 1.8 (released with CubeMX for STM32F4 v1.27.0), HAL supports a runtime callback registration API as an alternative to the weak-override pattern. When USE_HAL_UART_REGISTER_CALLBACKS is defined as 1 in stm32f4xx_hal_conf.h, you can use HAL_UART_RegisterCallback() to bind a function pointer to a specific event. This is useful for plugin architectures where the callback implementation is not known at link time.

/* ─── Runtime callback registration (HAL 1.8+) ──────────────────────────── */
/* stm32f4xx_hal_conf.h: #define USE_HAL_UART_REGISTER_CALLBACKS 1         */

static void MyRxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART2)
    {
        uart2_rx_flag = 1;
        HAL_UART_Receive_IT(huart, &uart2_rx_byte, 1);
    }
}

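void RegisterCallbacks(void)
{
    /* Register callback for RX complete event on UART2.
       Call this after HAL_UART_Init(): registration of event callbacks
       fails while the handle is still in the reset state.              */
    if (HAL_UART_RegisterCallback(&huart2, HAL_UART_RX_COMPLETE_CB_ID,
                                  MyRxCpltCallback) != HAL_OK)
    {
        Error_Handler();   /* CubeMX-generated handler in main.c */
    }

    /* Unregister (restore default empty weak implementation) */
    /* HAL_UART_UnRegisterCallback(&huart2, HAL_UART_RX_COMPLETE_CB_ID); */
}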

The runtime registration approach has a small overhead compared to the weak-override approach (one indirect function call through a pointer per callback invocation), but it enables fully modular driver architectures where peripheral drivers can be loaded and unloaded without recompilation. For most embedded applications, the weak-override approach is simpler and equally effective.

The HAL callbacks themselves are declared __weak in the HAL source (e.g., stm32f4xx_hal_uart.c). When you define a function with the same name in your code, the linker selects your non-weak version. This means you never need to modify HAL source files — a critical principle for maintaining vendor update compatibility.

/* ─── Complete IRQ dispatch chain: USART2 receive ───────────────────────── */

/* Level 1: Hardware IRQ → IRQHandler (in stm32f4xx_it.c, CubeMX generated)
   This is the actual vector table entry. Calls the HAL handler.           */
void USART2_IRQHandler(void)
{
    HAL_UART_IRQHandler(&huart2);
    /* After this returns, NVIC clears the active bit and returns from ISR  */
}

/* Level 2: HAL_UART_IRQHandler (in stm32f4xx_hal_uart.c — do NOT modify)
   Reads USART2->SR, determines which event occurred (RXNE, TC, ORE, etc.),
   clears interrupt flags, updates handle state machine, then calls:        */

/*   HAL_UART_RxCpltCallback()  — when complete transfer done               */
/*   HAL_UART_ErrorCallback()   — on framing/parity/overrun error           */

/* Level 3: User callback — override the __weak HAL version here            */
/* __weak void HAL_UART_RxCpltCallback(...) defined in stm32f4xx_hal_uart.c */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART2)
    {
        /* At this point huart->pRxBuffPtr has been incremented past the
           received data; use the buffer address you passed to Receive_IT   */
        uart2_rx_flag = 1;

        /* Re-arm for the next byte (interrupt mode: one byte at a time) */
        HAL_UART_Receive_IT(&huart2, &uart2_rx_byte, 1);
    }
}

/* Multiple UART callbacks can coexist safely — check huart->Instance */
void HAL_UART_ErrorCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART2)
    {
        /* Log the error type and attempt recovery */
        uint32_t error = HAL_UART_GetError(huart);
        if (error & HAL_UART_ERROR_ORE)
        {
            /* Overrun: clear the flag and re-arm receive. A full
               HAL_UART_Init() is too heavy for interrupt context. */
            __HAL_UART_CLEAR_OREFLAG(huart);
            HAL_UART_Receive_IT(huart, &uart2_rx_byte, 1);
        }
    }
}

Latency Measurement

Interrupt latency is the time elapsed between the hardware event that triggers an interrupt and the execution of the first instruction of the ISR. On Cortex-M4, the hardware interrupt response time is 12 clock cycles under ideal conditions (interrupt assertion while the processor is executing a single-cycle instruction, no wait states, no FPU state save). At 168 MHz, this corresponds to approximately 71 nanoseconds.

In practice, several factors increase latency above this theoretical minimum:

  • Flash wait states: STM32F4 at 168 MHz uses 5 wait states for Flash accesses. The instruction pre-fetch buffer (ART Accelerator) mitigates this for sequential code, but interrupt entry requires fetching the vector table from Flash, adding up to 5 × 6 ns = 30 ns.
  • FPU lazy stacking: If the interrupted code was using the FPU, the hardware reserves stack space for 17 additional registers (S0–S15 and FPSCR) but defers the actual save until the first FPU instruction in the ISR. When that save happens it costs roughly one extra cycle per register, so on the order of 17 cycles with zero-wait-state stack memory.
  • DMA bus contention: If DMA is actively transferring when the interrupt fires, the Cortex-M4 bus interface must wait for the current DMA burst to complete before the stack push can proceed.
  • Higher-priority ISR active: If an ISR of equal or higher priority is already executing, the new interrupt is pended and will not run until that ISR completes. Only an interrupt with strictly higher preemption priority preempts.
/* ─── GPIO latency measurement: toggle PA0 at interrupt entry and exit ────── */

/* Setup: connect an external signal to EXTI1 (PB1) and monitor both PB1
   (trigger) and PA0 (ISR entry marker) on a two-channel oscilloscope.
   The time between rising edge on PB1 and rising edge on PA0 is the
   total interrupt latency including vector fetch and stack push.          */

/* Enable DWT cycle counter for cycle-accurate measurement */
void DWT_Init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT       = 0;
    DWT->CTRL        |= DWT_CTRL_CYCCNTENA_Msk;
}

static uint32_t isr_entry_cycle;
static uint32_t isr_exit_cycle;
volatile uint32_t last_isr_duration_cycles;

void EXTI1_IRQHandler(void)
{
    /* First instructions: capture DWT cycle count and assert GPIO.
       A direct BSRR write avoids the call overhead of HAL_GPIO_WritePin. */
    isr_entry_cycle = DWT->CYCCNT;
    GPIOA->BSRR = GPIO_PIN_0;                        /* rising edge */

    __HAL_GPIO_EXTI_CLEAR_IT(GPIO_PIN_1);   /* clear pending flag */

    /* Minimal work */
    HAL_GPIO_EXTI_Callback(GPIO_PIN_1);

    /* Capture exit cycle: the difference is the ISR duration, not the
       entry latency (that is the PB1 edge to PA0 rising edge delay).  */
    isr_exit_cycle = DWT->CYCCNT;
    last_isr_duration_cycles = isr_exit_cycle - isr_entry_cycle;

    GPIOA->BSRR = (uint32_t)GPIO_PIN_0 << 16;        /* falling edge */
    /* Oscilloscope: PA0 pulse width in µs = ISR duration in cycles / 168
       (core clock of 168 MHz)                                           */
}

The DWT (Data Watchpoint and Trace) unit's cycle counter is the most accurate software timing mechanism on Cortex-M: it increments at the CPU clock frequency, costs only a single register read, and needs no interrupt-disable. It is present on Cortex-M3, M4, and M7 STM32 devices (not on Cortex-M0/M0+) and is the preferred tool for firmware latency profiling. At 168 MHz the 32-bit counter wraps about every 25.6 seconds, so keep measured intervals shorter than that.
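
Turning two CYCCNT samples into a time is a two-step calculation that is worth getting right once. The helpers below are a host-side sketch with hypothetical names (cycles_elapsed, cycles_to_ns); core_mhz is an assumed parameter for the core clock, 168 for an STM32F4 at full speed.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT samples. Unsigned subtraction
   gives the right answer across a single 32-bit wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert cycles to nanoseconds for a core clock given in MHz.
   The intermediate is 64-bit to avoid overflow; the result rounds down. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_mhz)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / core_mhz);
}
```

The 12-cycle ideal entry latency quoted earlier comes out as 71 ns at 168 MHz with this conversion.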

Tail-Chaining and Late-Arrival

Cortex-M hardware implements two additional optimisations that reduce interrupt overhead beyond the baseline 12-cycle entry cost:

  • Tail-chaining: When an ISR returns and another interrupt is already pending (necessarily of equal or lower priority, since a higher-priority one would have preempted), the processor does not restore the stack frame only to push a new one. Instead, it transitions directly to the next ISR in 6 cycles (the pop-push cycle is eliminated). For applications with many interrupts firing in rapid succession, tail-chaining can roughly halve the per-ISR overhead.
  • Late arrival: If a higher-priority interrupt arrives while the processor is still in the initial 12-cycle stack-push phase of a lower-priority interrupt entry, the processor switches to the higher-priority ISR first. The stack frame being pushed holds the interrupted code's context and is valid for either handler, so no extra stacking is needed; the lower-priority ISR then runs via tail-chain when the higher-priority ISR completes. This ensures that high-priority interrupts are serviced with the minimum possible latency even when they arrive during another interrupt's entry sequence.

Both mechanisms are entirely transparent to the programmer: no code changes are required. However, understanding them is important for accurately modelling worst-case interrupt latency in time-critical systems. Software that disables interrupts during critical sections delays every interrupt that fires inside the disabled window by the remaining length of that window, so the longest critical section adds directly to worst-case latency.

Tail-chaining is visible on a logic analyser: if you toggle a GPIO at the start of each ISR, you will see consecutive ISR GPIO pulses with no gap between them (or a minimal 6-cycle gap) when tail-chaining occurs, versus a full stack-pop + stack-push cycle (~24 cycles) when no tail-chaining happens. This is a useful sanity check that your system is operating as the Cortex-M hardware architecture documentation predicts.
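
A rough model makes the saving concrete. The sketch below uses the nominal cycle figures quoted in this article (12 to enter, about 12 to exit, 6 to tail-chain) and counts only entry and exit overhead, not the handler bodies. burst_overhead_cycles is a hypothetical name, and real figures vary with wait states and FPU state.

```c
#include <stdint.h>
#include <stdbool.h>

/* Nominal Cortex-M4 figures as quoted in the text; not guaranteed. */
#define ENTRY_CYCLES      12u
#define EXIT_CYCLES       12u
#define TAIL_CHAIN_CYCLES  6u

/* Entry/exit overhead for n_isrs interrupts serviced back to back,
   with or without tail-chaining between consecutive handlers. */
uint32_t burst_overhead_cycles(uint32_t n_isrs, bool tail_chained)
{
    if (n_isrs == 0u) {
        return 0u;
    }
    uint32_t between = tail_chained ? TAIL_CHAIN_CYCLES
                                    : (EXIT_CYCLES + ENTRY_CYCLES);
    return ENTRY_CYCLES + (n_isrs - 1u) * between + EXIT_CYCLES;
}
```

For four back-to-back interrupts the model gives 42 cycles of overhead with tail-chaining against 96 without, which is in line with the "roughly halve" estimate above.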

FPU and Interrupt Latency: Lazy Stacking

On Cortex-M4F and M7 (STM32F4, F7, H7), the FPU introduces a potential latency penalty at interrupt entry. The processor uses a technique called lazy stacking: when an interrupt fires, the hardware reserves space on the stack for the FPU context (S0–S15 and FPSCR) but does not immediately save the FPU registers. The save is deferred to the first instruction inside the ISR that actually uses the FPU. If the ISR never uses FPU registers, the save never occurs and the latency is the same as a non-FPU device.

If the ISR does execute a floating-point instruction, the deferred save occurs at that point, adding on the order of 17 cycles for the register dump (about one per register stored). To eliminate this uncertainty for a latency-critical ISR, you can either: (a) avoid all floating-point operations in that ISR, or (b) clear the LSPEN bit in FPU->FPCCR to disable lazy stacking and force eager stacking (the save cost becomes consistent but is paid on every interrupt entry while the FPU context is active). For most applications, lazy stacking is the right default.
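
The FPCCR change amounts to two bit operations. The sketch below models them on a plain 32-bit value so it can be checked on a host; the function names are hypothetical, and the bit positions (ASPEN at bit 31, LSPEN at bit 30) are the ARMv7-M definitions that CMSIS exposes as FPU_FPCCR_ASPEN_Msk and FPU_FPCCR_LSPEN_Msk.

```c
#include <stdint.h>

/* Bit positions in FPU->FPCCR as defined by the ARMv7-M architecture. */
#define FPCCR_ASPEN (1u << 31)   /* automatic FP state preservation */
#define FPCCR_LSPEN (1u << 30)   /* lazy state preservation */

/* New FPCCR value with lazy stacking off but automatic preservation
   kept, so FP registers are saved at exception entry. */
uint32_t fpccr_eager_stacking(uint32_t fpccr)
{
    return (fpccr | FPCCR_ASPEN) & ~FPCCR_LSPEN;
}

/* New FPCCR value with the default lazy behaviour. */
uint32_t fpccr_lazy_stacking(uint32_t fpccr)
{
    return fpccr | FPCCR_ASPEN | FPCCR_LSPEN;
}
```

On target, the result would be written back to FPU->FPCCR early in start-up, before any interrupt that depends on the setting is enabled.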

Common Interrupt Pitfalls

Most STM32 interrupt bugs fall into a small number of well-known categories. Understanding them in advance will save hours of debugging.

Pitfall | Symptom | Root Cause | Fix
Missing IRQHandler | Firmware hangs or watchdog triggers on first interrupt | No user-defined handler; Default_Handler loops forever | Define the handler in stm32f4xx_it.c or user code
Wrong IRQn | Different peripheral fires, or no interrupt at all | NVIC_EnableIRQ(TIM3_IRQn) when TIM2 is the source | Check IRQn_Type enum in stm32f4xx.h; use CubeMX generated code
Peripheral IE bit not set | ISR never fires despite NVIC being configured | TIM DIER update IE, USART CR1 RXNEIE, etc. not enabled | Use HAL init functions or set peripheral IE bits explicitly
EXTI line conflict | Two GPIO pins on same EXTI line; only one works | PA5 and PB5 both map to EXTI5; only one can be active | Reassign one GPIO to a line not in use; check SYSCFG_EXTICRx
Missing interrupt flag clear | ISR re-enters immediately in a tight loop; system appears hung | Interrupt pending flag not cleared before exception return | Always clear the source flag (EXTI PR, TIM SR, etc.) at ISR entry
HAL_Delay in ISR | SysTick-driven delay never completes; system deadlocks | SysTick priority equal to or lower than current ISR priority | Never call HAL_Delay() or any blocking function from an ISR
FreeRTOS priority violation | Hard fault or assert in FreeRTOS scheduler | ISR whose priority value is numerically below configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY (more urgent than the boundary) calls a FromISR API | Assign all FreeRTOS-aware ISRs priority values ≥ configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY

The EXTI line conflict deserves special attention because it is silent — the firmware compiles and links without error, and one of the two GPIOs simply never generates interrupts. The root cause is that EXTI lines 0–15 each correspond to a single GPIO pin number (PA0/PB0/PC0 all map to EXTI0, etc.), and only one port can be connected to each EXTI line at a time via the SYSCFG_EXTICRx multiplexer. CubeMX will warn about this conflict, but hand-written code will not.
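
Hand-written projects can catch this conflict with a small check over their pin table. The sketch below is host-side and illustrative: exti_pin_t, exticr_index, exticr_shift, and exti_conflict are hypothetical names, and the layout assumed is the STM32F4 one, with four 4-bit port selectors per SYSCFG_EXTICR register and port codes PA = 0, PB = 1, and so on.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t port;   /* 0 = GPIOA, 1 = GPIOB, 2 = GPIOC, ... */
    uint8_t pin;    /* 0..15, which is also the EXTI line number */
} exti_pin_t;

/* Index into the four EXTICR registers (EXTICR1..4 map to 0..3). */
uint32_t exticr_index(uint8_t pin)
{
    return pin / 4u;
}

/* Bit offset of the 4-bit port selector inside that register. */
uint32_t exticr_shift(uint8_t pin)
{
    return (pin % 4u) * 4u;
}

/* Two pins conflict when they need the same EXTI line from different ports. */
bool exti_conflict(exti_pin_t a, exti_pin_t b)
{
    return a.pin == b.pin && a.port != b.port;
}
```

PA5 and PB5 both select EXTI5 through the same 4-bit field (register index 1, bit offset 4), so whichever port is written last wins and the other pin goes silent.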

Debugging Stuck Interrupts with CoreSight

When an interrupt fires but the ISR never seems to execute, or the system appears to hang, the CoreSight debug infrastructure provides powerful diagnostic tools accessible from STM32CubeIDE's debug perspective:

  • NVIC registers view: In the Registers pane, expand NVIC. The ISER0–ISER7 registers show which IRQs are enabled. ISPR0–ISPR7 show which are currently pending. IABR0–IABR7 show which are active (currently executing or preempted). If ISPR shows an interrupt pending but IABR never shows it active, the interrupt is being blocked by an ISR of equal or higher priority or by PRIMASK/BASEPRI.
  • SysTick counter check: If the system is completely hung, check whether SysTick->VAL changes between successive debugger halts. If it never changes, the counter is disabled or its clock is stopped (for example in a low-power mode). If it is counting but SysTick callbacks are not firing, the SysTick exception is pending but masked, or an ISR of equal or higher priority never returns.
  • ITM/SWO trace: Insert ITM_SendChar() calls at the start of each ISR. If the trace shows characters from one ISR but not another, the missing ISR is either not enabled, its peripheral is not generating the request, or it is being starved by a higher-priority ISR that never returns.
/* ─── Diagnostic: dump NVIC pending and active state via UART ─────────────── */
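#include <stdio.h>    /* snprintf */
#include <string.h>   /* strlen */

/* Call from thread (main loop) context: it uses blocking UART transmit */
void NVIC_DiagnosticDump(void)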
{
    char buf[128];

    /* Read ICSR (Interrupt Control and State Register) */
    uint32_t icsr = SCB->ICSR;
    uint32_t vect_active  = icsr & SCB_ICSR_VECTACTIVE_Msk;   /* currently active exception number */
    uint32_t vect_pending = (icsr & SCB_ICSR_VECTPENDING_Msk) >> SCB_ICSR_VECTPENDING_Pos;
    uint8_t  primask      = __get_PRIMASK();
    uint32_t basepri      = __get_BASEPRI();

    snprintf(buf, sizeof(buf),
        "NVIC: active=%lu pending=%lu PRIMASK=%u BASEPRI=%lu\r\n",
        vect_active, vect_pending, primask, basepri);
    HAL_UART_Transmit(&huart2, (uint8_t *)buf, strlen(buf), 10);

    /* Check if any peripheral interrupt is stuck pending */
    for (int i = 0; i < 3; i++)
    {
        uint32_t ispr = NVIC->ISPR[i];   /* Interrupt Set-Pending Register */
        if (ispr)
        {
            snprintf(buf, sizeof(buf), "ISPR[%d] = 0x%08lX (stuck pending!)\r\n", i, ispr);
            HAL_UART_Transmit(&huart2, (uint8_t *)buf, strlen(buf), 10);
        }
    }
}
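
A raw ISPR word still has to be mapped back to IRQ numbers by hand before you can look them up in the device's IRQn_Type list. The helper below is one possible way to do that mapping. It is a sketch: nvic_decode_word is a hypothetical name (it is not a CMSIS or HAL function), and it assumes the standard NVIC layout in which bit b of register index n corresponds to IRQ number 32*n + b.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS or HAL.
 * Expands one 32-bit NVIC status word (ISER, ISPR or IABR) into IRQ numbers.
 * Assumption: bit b of register index n maps to IRQ number 32*n + b.
 * Stores at most max_out IRQ numbers in out[] and returns the total number
 * of set bits, which may be larger than max_out. */
size_t nvic_decode_word(uint32_t word, unsigned reg_index,
                        uint16_t *out, size_t max_out)
{
    size_t count = 0;

    for (unsigned bit = 0; bit < 32u; bit++)
    {
        if (word & (UINT32_C(1) << bit))
        {
            if (count < max_out)
            {
                out[count] = (uint16_t)(32u * reg_index + bit);
            }
            count++;
        }
    }
    return count;
}
```

On target, this could be called with NVIC->ISPR[i] and i from the loop above, printing each returned IRQ number instead of the raw hexadecimal word.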

Exercises

Practice Approach: A two-channel oscilloscope or logic analyser significantly improves visibility for these exercises. SWV (Serial Wire Viewer) in STM32CubeIDE can also be used to track interrupt counts in real time.

Exercise 1 (Beginner): EXTI Preempting TIM2

Configure an external interrupt on a button connected to PC13 with priority 5. Configure TIM2's update interrupt with priority 8 and set it to fire at 1 kHz. Press the button while TIM2 is generating interrupts. Using printf (via SWO or UART) inside each ISR, log which handler is executing. Verify from the logs that the EXTI callback runs to completion within the TIM2 ISR's execution window — confirming that priority 5 preempts priority 8. Then swap the priorities and confirm that the button interrupt no longer preempts TIM2.

Exercise 2 (Intermediate): SysTick Multi-Rate Task Scheduler

Implement a bare-metal cooperative task scheduler using only SysTick (priority 0) and a software flag array. In the SysTick handler (1 ms period), increment a counter and set flags for three different periods: 1 ms, 10 ms, and 100 ms. In the main loop, check each flag and call the corresponding task function (Task1: toggle LED, Task2: read ADC, Task3: transmit UART). Toggle a separate GPIO at the start and end of each task. Use an oscilloscope to verify timing accuracy and confirm that no task misses its deadline when all three are running simultaneously at full load.
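
As a possible starting point, the flag logic can be written so that it runs on a PC before any hardware is involved. The sketch below is an assumption-laden outline, not the reference solution: every name in it (scheduler_tick, scheduler_poll, run_task, scheduler_run_count) is hypothetical, the task bodies are placeholders that only count invocations, and on target scheduler_tick() would be called from the 1 ms SysTick handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this is a host-testable outline only. */
enum { TASK_1MS, TASK_10MS, TASK_100MS, TASK_COUNT };

static const uint32_t    task_period_ms[TASK_COUNT] = { 1u, 10u, 100u };
static volatile bool     task_due[TASK_COUNT];   /* set in the tick, cleared in the main loop */
static volatile uint32_t tick_ms;                /* milliseconds since start */
static uint32_t          task_runs[TASK_COUNT];  /* diagnostic run counters */

/* Call once per millisecond (on target: from the SysTick handler).
 * Only counts the tick and raises flags; no task work happens here. */
void scheduler_tick(void)
{
    uint32_t now = tick_ms + 1u;
    tick_ms = now;

    for (int i = 0; i < TASK_COUNT; i++)
    {
        if ((now % task_period_ms[i]) == 0u)
        {
            task_due[i] = true;
        }
    }
}

/* Placeholder task body. On target this is where the LED toggle, ADC read
 * or UART transmit would go, bracketed by the timing GPIO toggles. */
static void run_task(int i)
{
    task_runs[i]++;
}

/* Call repeatedly from the main loop: runs each task whose flag is set. */
void scheduler_poll(void)
{
    for (int i = 0; i < TASK_COUNT; i++)
    {
        if (task_due[i])
        {
            task_due[i] = false;   /* clear first so a tick during the task re-arms it */
            run_task(i);
        }
    }
}

/* Returns how many times task i has run so far. */
uint32_t scheduler_run_count(int i)
{
    return task_runs[i];
}
```

A single boolean per task cannot tell one missed period from several, so if a task overruns its period the extra activations are silently dropped. Replacing the flag with a counter is one way to make missed deadlines visible for the oscilloscope check.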

Exercise 3 (Advanced): Interrupt Latency Profile & Priority Inversion

Measure the interrupt latency for EXTI0 (PA0, driven by a signal generator at 10 kHz) against a background workload of: DMA2 Stream0 running continuous M2M copies, USART1 at 1 Mbit/s DMA, and TIM2 at 10 kHz. For each of the 16 priority levels assigned to EXTI0, record the worst-case latency using the DWT cycle counter technique described in Section 6. Plot latency vs priority level on a graph. Then deliberately create a priority inversion: assign a low-priority mutex-holding task a higher NVIC priority than EXTI0, observe the latency spike, and resolve it using BASEPRI masking to create a bounded critical section.

As a bonus step, enable the FPU and verify that the EXTI0 latency increases when the interrupted background code was executing a floating-point multiply-accumulate loop (demonstrating the lazy stacking cost). Then disable lazy stacking in FPU->FPCCR and confirm that the latency becomes deterministic (constant regardless of whether background code used the FPU), at the cost of a fixed overhead on every interrupt entry. Compare the two approaches in terms of worst-case and best-case latency, and form a conclusion about which is appropriate for your target application's timing budget.
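
Recording the worst case per priority level needs a little bookkeeping around the two cycle-counter readings. The following sketch shows one way to keep minimum, maximum and sample count for each of the 16 levels. The names (latency_record, latency_get, cycles_elapsed, latency_stats_t) are hypothetical, and it assumes that start and end are two readings of a free-running 32-bit cycle counter such as DWT->CYCCNT, captured however your Section 6 measurement captures them.

```c
#include <stdint.h>

#define PRIORITY_LEVELS 16u   /* assumption: 4 implemented priority bits */

/* Hypothetical bookkeeping type for one priority level. */
typedef struct {
    uint32_t min_cycles;
    uint32_t max_cycles;   /* worst-case latency seen so far */
    uint32_t samples;
} latency_stats_t;

static latency_stats_t latency_by_priority[PRIORITY_LEVELS];

/* Elapsed cycles between two readings of a free-running 32-bit counter.
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * when the counter overflows once between the two readings. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Record one latency sample for the priority level under test (0..15).
 * Out-of-range priorities are ignored. */
void latency_record(uint32_t priority, uint32_t start, uint32_t end)
{
    if (priority >= PRIORITY_LEVELS)
    {
        return;
    }

    latency_stats_t *s = &latency_by_priority[priority];
    uint32_t cycles = cycles_elapsed(start, end);

    if (s->samples == 0u || cycles < s->min_cycles) { s->min_cycles = cycles; }
    if (s->samples == 0u || cycles > s->max_cycles) { s->max_cycles = cycles; }
    s->samples++;
}

/* Returns the statistics for one priority level, or a null pointer if the
 * level is out of range. */
const latency_stats_t *latency_get(uint32_t priority)
{
    return (priority < PRIORITY_LEVELS) ? &latency_by_priority[priority] : 0;
}
```

Dividing max_cycles by the core clock frequency gives the worst-case latency in seconds for the plot. Keeping min_cycles alongside it makes the best-case versus worst-case comparison in the bonus step straightforward.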

STM32 NVIC Configuration Document Generator

Use this tool to document your interrupt priority assignments for a project. Fill in IRQ assignments, critical section strategy, and latency budgets, then export to Word, Excel, PDF, or PowerPoint. Drafts are saved automatically in your browser.

Conclusion & Next Steps

The NVIC is one of the most powerful — and most frequently misconfigured — features of the STM32 platform. In this article we covered:

  • NVIC architecture: Up to 240 external IRQs (each STM32 device implements a subset) + 16 system exceptions, fixed-priority system exceptions (Reset/NMI/HardFault), the interrupt vector table and VTOR relocation.
  • Priority grouping: The five PRIGROUP configurations, preemption vs sub-priority bits, HAL default NVIC_PRIORITYGROUP_4, and the critical rule of setting grouping once before any priorities.
  • Preemption and nesting: Hardware context save/restore, PRIMASK for global masking, BASEPRI for threshold masking (FreeRTOS pattern), stack sizing for nested interrupts.
  • ISR design rules: Flag-and-defer pattern, volatile shared variables, memory barriers, what never to do in an ISR (HAL_Delay, blocking I/O, long processing loops).
  • HAL callback architecture: The three-level dispatch chain (IRQHandler → HAL handler → callback), weak symbol override, and multi-peripheral callback demultiplexing.
  • Latency measurement: DWT cycle counter technique, GPIO toggle measurement, sources of latency above the 12-cycle theoretical minimum.
  • Common pitfalls: Missing handlers, wrong IRQn, peripheral IE bits, EXTI line conflicts, missing flag clears, FreeRTOS priority violations.

Next in the Series

In Part 10: Low-Power Modes, we will master Sleep, Stop, and Standby power modes, configure RTC wakeup from Standby, use the LP-UART to receive data while in Stop mode, and profile current consumption with a Nordic PPK2 or STM32CubeMonitor-Power — essential skills for any battery-powered or energy-constrained embedded design.
