
STM32 Part 15: Bootloader Development

March 31, 2026 Wasil Zafar 30 min read

Every production STM32 product needs a bootloader — design a fail-safe flash memory layout, implement UART and USB DFU firmware update, validate images with CRC, and build the jump-to-application mechanism that makes it all work.

Table of Contents

  1. Bootloader Fundamentals
  2. Flash Programming
  3. Jump to Application
  4. UART Firmware Update Protocol
  5. USB DFU
  6. CRC Image Validation
  7. Dual-Bank & Fail-Safe Updates
  8. Exercises
  9. Bootloader Design Tool
  10. Conclusion & Next Steps
Series Overview: This is Part 15 of our 18-part STM32 Unleashed series. With FreeRTOS mastered in Part 14, we now tackle the bootloader — the critical firmware layer that enables field updates, fail-safe recovery, and factory programming in every production device.

Bootloader Fundamentals

A bootloader is the first code that runs after reset. It decides whether to launch the application, enter a firmware update mode, or perform fail-safe recovery. Without a bootloader, updating firmware in a deployed product requires physical access and a debug probe — unacceptable for any product shipped in volume.

Why a custom bootloader rather than ST's internal one? The ST internal bootloader lives in protected system memory (accessed by pulling BOOT0 high) and supports UART, SPI, I2C, USB DFU, and CAN. But it offers no customisation: no proprietary update protocol, no authentication, no version checking, no dual-bank rollback. A custom IAP (In-Application Programming) bootloader runs from flash sector 0, owns the update logic, and gives you complete control.

Boot decision logic: The bootloader checks one or more conditions at startup to decide whether to enter update mode or jump to the application:

  • Magic word in backup register: the application writes a known value (e.g. 0xDEADBEEF) to RTC->BKP0R before resetting. The bootloader checks this register; if the magic value is present, it clears it and enters update mode. This allows OTA-triggered updates.
  • GPIO pin: a dedicated BOOT button held during reset forces update mode. Simpler but requires physical access.
  • CRC failure: if the stored CRC doesn't match the computed CRC of the application image, the bootloader cannot safely jump — it enters update mode automatically.
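The three conditions above collapse into one decision. A minimal sketch, assuming the hardware reads are done by the caller and passed in as plain values; `boot_decide`, `boot_path_t` and the parameter names are illustrative and not part of the HAL:

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_MAGIC_WORD 0xDEADBEEFUL  /* same example value as in the text */

typedef enum { BOOT_PATH_APP, BOOT_PATH_UPDATE } boot_path_t;

/* Hypothetical pure decision function; inputs are sampled by the caller.
 *   bkp0            value read from the RTC backup register
 *   button_pressed  true if the BOOT button was held at reset
 *   crc_ok          result of the application CRC check */
boot_path_t boot_decide(uint32_t bkp0, bool button_pressed, bool crc_ok)
{
    if (bkp0 == BOOT_MAGIC_WORD) return BOOT_PATH_UPDATE;  /* app asked for an update */
    if (button_pressed)          return BOOT_PATH_UPDATE;  /* operator forced an update */
    if (!crc_ok)                 return BOOT_PATH_UPDATE;  /* nothing safe to run */
    return BOOT_PATH_APP;
}
```

Keeping the decision free of register accesses makes it easy to unit-test on a PC before it ever runs on the target.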

Linker script changes for the application: the application must be compiled to start at APPLICATION_ADDRESS, not 0x08000000. Modify the FLASH region origin in the application's linker script. On startup, the application must also set the VTOR (Vector Table Offset Register) to point to its own vector table at APPLICATION_ADDRESS.

| Sector | Address Range | Size | Content | Notes |
|---|---|---|---|---|
| Sector 0 | 0x0800_0000 – 0x0800_3FFF | 16 KB | Bootloader code | Write-protected in production |
| Sector 1 | 0x0800_4000 – 0x0800_7FFF | 16 KB | Bootloader data / config | Version info, device UUID |
| Sector 2 | 0x0800_8000 – 0x0800_BFFF | 16 KB | Bootloader data (spare) | Reserved for future use |
| Sector 3 | 0x0800_C000 – 0x0800_FFFF | 16 KB | Bootloader data (spare) | Reserved |
| Sector 4 | 0x0801_0000 – 0x0801_FFFF | 64 KB | APPLICATION_ADDRESS | Application start |
| Sectors 5–11 | 0x0802_0000 – 0x080F_FFFF | 7 × 128 KB | Application code & data | Total 960 KB for application (with Sector 4) |

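Erase calls take sector numbers rather than addresses, so a bootloader usually needs a lookup from one to the other. A small sketch for the 1 MB layout in the table; the function name and the two address macros are made up for this example:

```c
#include <stdint.h>

#define FLASH_BASE_ADDR 0x08000000UL
#define FLASH_END_ADDR  0x080FFFFFUL   /* 1 MB part, as in the table */

/* Return the sector number (0-11) that contains addr, or -1 if addr is
 * outside flash. Layout assumed: 4 x 16 KB, 1 x 64 KB, 7 x 128 KB. */
int f4_sector_for_address(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR || addr > FLASH_END_ADDR)
        return -1;

    uint32_t off = addr - FLASH_BASE_ADDR;
    if (off < 0x10000UL) return (int)(off / 0x4000UL);   /* sectors 0-3  */
    if (off < 0x20000UL) return 4;                        /* sector 4     */
    return 4 + (int)(off / 0x20000UL);                    /* sectors 5-11 */
}
```

For example, `f4_sector_for_address(0x08010000UL)` yields 4, the first application sector.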
/* ---------------------------------------------------------------
 * Bootloader main() — check boot condition, jump or update.
 * Boot mode triggered by:
 *   1. Magic word 0xDEADBEEF in RTC backup register BKP0R
 *   2. BOOT button (USER_BUTTON_PIN) held at reset
 *   3. Application CRC failure (checked by verify_application())
 * --------------------------------------------------------------- */
#include "main.h"
#include "stm32f4xx_hal.h"
#include <stdbool.h>
#include <stdint.h>

#define APPLICATION_ADDRESS  0x08010000UL
#define BOOT_MAGIC_WORD      0xDEADBEEFUL

/* Forward declarations */
static bool should_enter_update_mode(void);
static bool verify_application(void);
static void jump_to_application(void);
static void bootloader_uart_update(void);

int main(void)
{
    HAL_Init();
    SystemClock_Config();
    MX_GPIO_Init();
    MX_USART1_UART_Init();    /* bootloader update UART */
    MX_RTC_Init();             /* access backup registers */

    /* Step 1: determine boot path */
    bool update_forced = should_enter_update_mode();
    bool app_valid     = verify_application();

    if (update_forced || !app_valid)
    {
        /* Signal update mode: blink LED rapidly */
        for (int i = 0; i < 6; i++) {
            HAL_GPIO_TogglePin(LED_GPIO_Port, LED_Pin);
            HAL_Delay(100);
        }
        bootloader_uart_update();  /* blocks until update complete */
    }

    /* Step 2: app is valid and no update requested — jump */
    jump_to_application();

    /* Should never reach here */
    while (1) {}
}

static bool should_enter_update_mode(void)
{
    /* Check RTC backup register for magic word.
     * hrtc is the RTC handle set up by MX_RTC_Init(); the HAL dereferences
     * the handle, so NULL must not be passed here. */
    HAL_PWR_EnableBkUpAccess();
    uint32_t magic = HAL_RTCEx_BKUPRead(&hrtc, RTC_BKP_DR0);
    if (magic == BOOT_MAGIC_WORD)
    {
        HAL_RTCEx_BKUPWrite(&hrtc, RTC_BKP_DR0, 0x00000000UL); /* clear it */
        return true;
    }
    /* Check boot button (active low with internal pull-up) */
    if (HAL_GPIO_ReadPin(USER_BUTTON_GPIO_Port, USER_BUTTON_Pin) == GPIO_PIN_RESET)
        return true;

    return false;
}

Flash Programming

Flash memory on STM32 must be erased before it can be written — you cannot change a bit from 0 to 1 without erasing the entire sector first (erase sets all bits to 1; programming clears selected bits to 0). The HAL provides a complete API for this, but you must follow the unlock/lock sequence strictly to avoid accidental writes.
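The erase-then-program rule comes down to a bitwise AND. The host-side model below is only a sketch of generic NOR behaviour with made-up function names; parts whose flash is ECC-protected restrict reprogramming of non-erased words further, so check the reference manual for your device:

```c
#include <stdbool.h>
#include <stdint.h>

/* Programming can only clear bits, so the cell ends up as old AND data. */
uint32_t flash_program_result(uint32_t old_value, uint32_t data)
{
    return old_value & data;
}

/* True if 'wanted' can be reached from 'current' without an erase,
 * i.e. no bit has to go from 0 back to 1. */
bool flash_can_program(uint32_t current, uint32_t wanted)
{
    return (current & wanted) == wanted;
}
```

Writing 0x12345678 over an erased word (0xFFFFFFFF) stores the value as intended, while writing it over 0x0000FFFF leaves 0x00005678, which is why a readback check after programming is worth the cycles.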

HAL_FLASH_Unlock() writes the two required keys to FLASH_KEYR (0x45670123, then 0xCDEF89AB), removing the hardware write protection. HAL_FLASH_Lock() re-enables it. Always lock after programming — a hardware fault while FLASH is unlocked could corrupt your firmware.

For STM32F4, erase is sector-based: HAL_FLASHEx_Erase() with FLASH_TYPEERASE_SECTORS, specifying the sector number and voltage range. The voltage range determines the programming parallelism (×8, ×16, ×32 or ×64 bits), which affects sector erase time. For STM32G0, L4, F0 and similar, erase is page-based with much smaller (1–2 KB) pages.

Error handling: Whenever a HAL_FLASH operation does not return HAL_OK, call HAL_FLASH_GetError(). Common errors include HAL_FLASH_ERROR_PGS (programming sequence error: the flash control register was not set up correctly for the write), HAL_FLASH_ERROR_WRP (write protection violation), HAL_FLASH_ERROR_PGA (alignment error) and HAL_FLASH_ERROR_PGP (parallelism error: the access width does not match the configured voltage range). Always clear the error flags with __HAL_FLASH_CLEAR_FLAG() before the next operation.

/* ---------------------------------------------------------------
 * Flash programming API: erase sectors, write words, verify.
 * Targets STM32F4 sector-based flash.
 * --------------------------------------------------------------- */
#include "stm32f4xx_hal.h"
#include <stdbool.h>
#include <string.h>   /* memcpy, memcmp */

#define APP_FIRST_SECTOR    4u      /* Sector 4 = 0x08010000 on F4  */
#define APP_SECTOR_COUNT    8u      /* Sectors 4–11 = ~960 KB        */

/* Erase all sectors assigned to the application */
HAL_StatusTypeDef flash_erase_app_sectors(void)
{
    FLASH_EraseInitTypeDef eraseInit;
    uint32_t sectorError = 0;

    eraseInit.TypeErase    = FLASH_TYPEERASE_SECTORS;
    eraseInit.VoltageRange = FLASH_VOLTAGE_RANGE_3;   /* 2.7–3.6 V: ×32 bit */
    eraseInit.Sector       = APP_FIRST_SECTOR;
    eraseInit.NbSectors    = APP_SECTOR_COUNT;

    HAL_FLASH_Unlock();
    HAL_StatusTypeDef status = HAL_FLASHEx_Erase(&eraseInit, &sectorError);
    HAL_FLASH_Lock();

    if (status != HAL_OK || sectorError != 0xFFFFFFFFU)
        return HAL_ERROR;

    return HAL_OK;
}

/* Write one 32-bit word to flash at the given address.
 * Caller must ensure address is 4-byte aligned and sector is erased. */
HAL_StatusTypeDef flash_write_word(uint32_t address, uint32_t data)
{
    HAL_FLASH_Unlock();

    HAL_StatusTypeDef status =
        HAL_FLASH_Program(FLASH_TYPEPROGRAM_WORD, address, (uint64_t)data);

    if (status != HAL_OK)
    {
        uint32_t err = HAL_FLASH_GetError();
        (void)err;   /* log err in production */
        __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_OPERR | FLASH_FLAG_WRPERR |
                               FLASH_FLAG_PGAERR | FLASH_FLAG_PGPERR |
                               FLASH_FLAG_PGSERR);
    }

    HAL_FLASH_Lock();
    return status;
}

/* Write a buffer of bytes to flash, one 32-bit word at a time.
 * The flash address and the length must be multiples of 4. The source
 * buffer may have any alignment because each word is copied with memcpy. */
HAL_StatusTypeDef flash_write_buffer(uint32_t address,
                                      const uint8_t *data,
                                      uint32_t length)
{
    if ((address & 0x03U) || (length & 0x03U))
        return HAL_ERROR;   /* alignment check */

    HAL_FLASH_Unlock();
    HAL_StatusTypeDef status = HAL_OK;

    for (uint32_t i = 0; i < length; i += 4)
    {
        uint32_t word;
        memcpy(&word, &data[i], 4);
        status = HAL_FLASH_Program(FLASH_TYPEPROGRAM_WORD, address + i,
                                   (uint64_t)word);
        if (status != HAL_OK) break;
    }

    HAL_FLASH_Lock();
    return status;
}

/* Verify written data by reading back and comparing */
bool flash_verify(uint32_t address, const uint8_t *expected, uint32_t length)
{
    return (memcmp((const void *)address, expected, length) == 0);
}

Jump to Application

The jump-to-application is the most safety-critical step in the bootloader. Done incorrectly, the CPU will fault immediately or run garbage. Done correctly, it is invisible — the application starts exactly as if it had been the first code to run after reset.

Validation before jumping: check two things. First, the initial stack pointer (stored at APPLICATION_ADDRESS + 0) must point into valid SRAM. On an STM32F4 with 128 KB of main SRAM at 0x20000000, the accepted range is 0x20000001–0x20020000: the stack is full-descending, so linker scripts normally set the initial MSP to the first address past the end of RAM (0x20020000), and the check must accept that value. Second, the reset handler address (stored at APPLICATION_ADDRESS + 4) must point into the flash region allocated to the application. If either check fails, do not jump.

Pre-jump cleanup: disable all interrupts (__disable_irq()), disable and de-initialise all peripherals the bootloader used (UART, DMA, timers), reset SysTick to prevent a tick interrupt from firing immediately after the jump, and disable all NVIC peripheral interrupts. The application will re-initialise everything from scratch.

VTOR relocation: the Cortex-M Vector Table Offset Register defaults to 0 (pointing to the vector table at 0x08000000 — the bootloader's vector table). The application must update SCB->VTOR to point to its own vector table at APPLICATION_ADDRESS. This is best done at the very start of SystemInit() in the application's startup code.
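One constraint on the choice of APPLICATION_ADDRESS is worth checking at this point: on Cortex-M3/M4 the vector table base must be aligned to the table size rounded up to a power of two. The sketch below computes that alignment; the function names are illustrative, and the figure of 82 external interrupts used in the usage note is my assumption for an STM32F407 (confirm the count in your device's reference manual):

```c
#include <stdbool.h>
#include <stdint.h>

/* Required VTOR alignment in bytes for a part with 'num_irqs' external
 * interrupts: (16 system exceptions + IRQs) words, rounded up to a power
 * of two, with a 32-word (128-byte) minimum. */
uint32_t vtor_alignment_bytes(uint32_t num_irqs)
{
    uint32_t words = 32U;
    while (words < 16U + num_irqs)
        words <<= 1;
    return words * 4U;
}

/* True if addr is a legal vector table base for that interrupt count. */
bool vtor_address_ok(uint32_t addr, uint32_t num_irqs)
{
    return (addr % vtor_alignment_bytes(num_irqs)) == 0U;
}
```

With 82 interrupts the table needs 512-byte alignment, so 0x08010000 is fine while an address such as 0x08010100 would not be.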

/* ---------------------------------------------------------------
 * jump_to_application(): validate, de-init, set MSP, call ResetHandler.
 * --------------------------------------------------------------- */
#include "stm32f4xx_hal.h"   /* also pulls in the CMSIS core header */
#include <stdbool.h>

#define APPLICATION_ADDRESS   0x08010000UL
#define SRAM1_BASE_ADDR       0x20000000UL
#define SRAM1_END_ADDR        0x2001FFFFUL   /* 128 KB; adjust for your part */
#define FLASH_APP_END_ADDR    0x080FFFFFUL

extern UART_HandleTypeDef huart1;
extern RTC_HandleTypeDef  hrtc;

typedef void (*AppResetHandler_t)(void);

static bool is_app_valid(void)
{
    uint32_t sp  = *(volatile uint32_t *)(APPLICATION_ADDRESS);
    uint32_t pc  = *(volatile uint32_t *)(APPLICATION_ADDRESS + 4U);

    /* Stack pointer must be in SRAM. The stack is full-descending, so an
     * initial value one past the last SRAM byte is valid. */
    if (sp <= SRAM1_BASE_ADDR || sp > (SRAM1_END_ADDR + 1U))
        return false;

    /* Reset vector must have the Thumb bit set... */
    if ((pc & 1U) == 0U)
        return false;

    /* ...and point into the application flash region */
    uint32_t reset_addr = pc & ~1U;
    if (reset_addr < APPLICATION_ADDRESS || reset_addr > FLASH_APP_END_ADDR)
        return false;

    return true;
}

void jump_to_application(void)
{
    if (!is_app_valid())
    {
        /* Log error and stay in bootloader */
        HAL_UART_Transmit(&huart1, (uint8_t *)"APP INVALID\r\n", 13, 100);
        return;
    }

    /* 1. Disable all interrupts */
    __disable_irq();

    /* 2. De-initialise all HAL peripherals used by the bootloader */
    HAL_UART_DeInit(&huart1);
    HAL_RTC_DeInit(&hrtc);
    HAL_RCC_DeInit();        /* restores default clock config */

    /* 3. Reset SysTick — prevent early tick interrupts in app */
    SysTick->CTRL = 0;
    SysTick->LOAD = 0;
    SysTick->VAL  = 0;

    /* 4. Disable all NVIC peripheral interrupts */
    for (int i = 0; i < 8; i++)
        NVIC->ICER[i] = 0xFFFFFFFFU;

    /* 5. Clear all pending NVIC interrupts */
    for (int i = 0; i < 8; i++)
        NVIC->ICPR[i] = 0xFFFFFFFFU;

    /* 6. Read both vector table entries before touching the stack pointer:
     *    locals may live on the stack that __set_MSP() is about to replace. */
    uint32_t app_sp    = *(volatile uint32_t *)(APPLICATION_ADDRESS);
    uint32_t app_reset = *(volatile uint32_t *)(APPLICATION_ADDRESS + 4U);
    AppResetHandler_t resetHandler = (AppResetHandler_t)(app_reset);

    /* 7. VTOR will be set by the application's SystemInit().
     *    Set it here too for safety. */
    SCB->VTOR = APPLICATION_ADDRESS;
    __DSB();
    __ISB();

    /* 8. Re-enable interrupts (PRIMASK would otherwise stay set in the
     *    application), switch to the application's stack and call its
     *    reset handler. */
    __enable_irq();
    __set_MSP(app_sp);
    resetHandler();   /* never returns */
}
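Application-side: set VTOR in SystemInit(). In the application's system_stm32f4xx.c, make sure SystemInit() writes SCB->VTOR = APPLICATION_ADDRESS; before any interrupt can be enabled. Recent ST-provided versions of this file already contain the write, guarded by the USER_VECT_TAB_ADDRESS and VECT_TAB_OFFSET macros; older versions set VTOR unconditionally from VECT_TAB_OFFSET, which points it back at the bootloader unless you change the offset. Without the correct value, all interrupts will vector to the bootloader's handlers once the NVIC is re-enabled.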

UART Firmware Update Protocol

A simple binary frame protocol over UART is the most reliable update mechanism for manufacturing and field service. It does not require USB drivers, works at any baud rate, and can be scripted from any host platform.

Protocol frame structure:

  • SYNC frame: host sends 0x7F byte; bootloader responds 0x79 (ACK) or 0x1F (NAK).
  • ERASE command: erase all application sectors. Bootloader responds ACK when done (may take seconds for large flash).
  • DATA frame: 128-byte payload + 2-byte CRC16/CCITT-FALSE. Bootloader verifies CRC, writes to flash, responds ACK or NAK.
  • VERIFY command: host sends expected CRC32 of entire image; bootloader verifies and responds ACK or NAK.
  • BOOT command: jump to new application.

CRC16/CCITT-FALSE (polynomial 0x1021, init 0xFFFF, no reflect) provides sufficient integrity for detecting transmission errors on noisy UART links. The PC-side tool (Python, C#, or any scripting language) computes the same CRC per frame and the bootloader verifies it.
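On the host side, a DATA frame can be assembled as sketched below. The CRC routine uses the parameters given above; the byte layout (command byte, 128-byte payload, CRC sent high byte first) and the 0xDA command code follow the bootloader listing in this section, while `build_data_frame` and the 0xFF padding of short payloads are choices made for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 128U
#define CMD_DATA   0xDAU

/* CRC16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, no reflection. */
uint16_t crc16_ccitt_false(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFU;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000U) ? (uint16_t)((crc << 1) ^ 0x1021U)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Build CMD_DATA + 128-byte payload + big-endian CRC16 into out[131].
 * Payloads shorter than 128 bytes are padded with 0xFF (erased flash).
 * Returns the frame length, or 0 if the payload is too long. */
size_t build_data_frame(const uint8_t *payload, size_t len,
                        uint8_t out[FRAME_SIZE + 3U])
{
    if (len > FRAME_SIZE)
        return 0;

    out[0] = CMD_DATA;
    memset(&out[1], 0xFF, FRAME_SIZE);
    memcpy(&out[1], payload, len);

    uint16_t crc = crc16_ccitt_false(&out[1], FRAME_SIZE);
    out[1U + FRAME_SIZE] = (uint8_t)(crc >> 8);
    out[2U + FRAME_SIZE] = (uint8_t)(crc & 0xFFU);
    return FRAME_SIZE + 3U;
}
```

A quick self-check for any CRC16/CCITT-FALSE implementation is the standard test string "123456789", which must produce 0x29B1.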

Timeout handling: the bootloader waits up to 5 seconds for the first SYNC byte. Within a transfer, each DATA frame must arrive within 2 seconds. On timeout, the bootloader sends NAK and the host retries up to 3 times before aborting.
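The retry rule can be sketched on the host as a small loop around a transport callback. Everything here except the ACK/NAK byte values is hypothetical: `send_frame_fn` stands in for whatever serial library the host tool uses, and `fake_link_send` is a test double that lets the logic run without hardware:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACK         0x79U
#define NAK         0x1FU
#define MAX_RETRIES 3U

/* Hypothetical transport hook: sends one frame and returns the reply byte,
 * or -1 if nothing arrived within the timeout. */
typedef int (*send_frame_fn)(const uint8_t *frame, size_t len, void *ctx);

/* One initial attempt plus up to MAX_RETRIES retries on NAK or timeout. */
bool send_with_retry(send_frame_fn send, void *ctx,
                     const uint8_t *frame, size_t len)
{
    for (unsigned attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        if (send(frame, len, ctx) == (int)ACK)
            return true;
    }
    return false;   /* caller aborts the transfer */
}

/* Test double: replies NAK a fixed number of times, then ACK. */
typedef struct {
    unsigned naks_left;   /* NAK replies still to be given */
    unsigned calls;       /* frames "sent" so far */
} fake_link_t;

int fake_link_send(const uint8_t *frame, size_t len, void *ctx)
{
    fake_link_t *link = (fake_link_t *)ctx;
    (void)frame;
    (void)len;
    link->calls++;
    if (link->naks_left > 0U) {
        link->naks_left--;
        return (int)NAK;
    }
    return (int)ACK;
}
```

With a link that answers NAK twice, the frame goes through on the third attempt; a link that never acknowledges is abandoned after four attempts in total.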

/* ---------------------------------------------------------------
 * UART update state machine: SYNC → ERASE → RX_DATA → VERIFY → BOOT
 * Each DATA frame: 128 bytes payload + 2-byte CRC16.
 * --------------------------------------------------------------- */
#include "stm32f4xx_hal.h"
#include <stdbool.h>
#include <stdint.h>

#define FRAME_SIZE        128U
#define CMD_SYNC          0x7FU
#define CMD_ERASE         0xEEU
#define CMD_DATA          0xDAU
#define CMD_VERIFY        0xCCU
#define CMD_BOOT          0xBBU
#define ACK               0x79U
#define NAK               0x1FU
#define RX_TIMEOUT_MS     2000U

extern UART_HandleTypeDef huart1;

static uint16_t crc16_ccitt(const uint8_t *data, uint16_t len)
{
    uint16_t crc = 0xFFFFU;
    for (uint16_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000U) ? (crc << 1) ^ 0x1021U : crc << 1;
    }
    return crc;
}

static bool uart_recv_byte(uint8_t *b, uint32_t timeout_ms)
{
    return HAL_UART_Receive(&huart1, b, 1, timeout_ms) == HAL_OK;
}

static void uart_send_byte(uint8_t b)
{
    HAL_UART_Transmit(&huart1, &b, 1, 100);
}

void bootloader_uart_update(void)
{
    uint8_t cmd;
    uint8_t frame[FRAME_SIZE + 2];   /* payload + 2-byte CRC */
    uint32_t write_addr  = APPLICATION_ADDRESS;
    uint32_t total_bytes = 0;

    /* STATE: SYNC — wait for host */
    while (true)
    {
        if (!uart_recv_byte(&cmd, 5000)) { uart_send_byte(NAK); continue; }
        if (cmd == CMD_SYNC) { uart_send_byte(ACK); break; }
        uart_send_byte(NAK);
    }

    /* STATE: ERASE */
    if (!uart_recv_byte(&cmd, RX_TIMEOUT_MS) || cmd != CMD_ERASE)
        { uart_send_byte(NAK); return; }

    if (flash_erase_app_sectors() != HAL_OK)
        { uart_send_byte(NAK); return; }

    uart_send_byte(ACK);

    /* STATE: RX_DATA — receive frames until CMD_VERIFY */
    while (true)
    {
        if (!uart_recv_byte(&cmd, RX_TIMEOUT_MS)) { uart_send_byte(NAK); return; }

        if (cmd == CMD_VERIFY) break;   /* move to verify state */

        if (cmd != CMD_DATA) { uart_send_byte(NAK); continue; }

        /* Receive 128 bytes + 2-byte CRC */
        if (HAL_UART_Receive(&huart1, frame, FRAME_SIZE + 2, RX_TIMEOUT_MS) != HAL_OK)
            { uart_send_byte(NAK); continue; }

        uint16_t rx_crc  = ((uint16_t)frame[FRAME_SIZE] << 8) | frame[FRAME_SIZE + 1];
        uint16_t cmp_crc = crc16_ccitt(frame, FRAME_SIZE);

        if (rx_crc != cmp_crc) { uart_send_byte(NAK); continue; }

        if (flash_write_buffer(write_addr, frame, FRAME_SIZE) != HAL_OK)
            { uart_send_byte(NAK); return; }

        write_addr  += FRAME_SIZE;
        total_bytes += FRAME_SIZE;
        uart_send_byte(ACK);
    }

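    /* STATE: VERIFY - receive expected CRC32 (4 bytes) */
    uint8_t crc_buf[4];
    if (HAL_UART_Receive(&huart1, crc_buf, 4, RX_TIMEOUT_MS) != HAL_OK)
        { uart_send_byte(NAK); return; }

    uint32_t expected_crc = ((uint32_t)crc_buf[0] << 24) |
                            ((uint32_t)crc_buf[1] << 16) |
                            ((uint32_t)crc_buf[2] <<  8) |
                             (uint32_t)crc_buf[3];

    /* Compare against a CRC32 of the bytes just written.
     * Assumptions: hcrc has been initialised with HAL_CRC_Init() (CRC clock
     * enabled), and the host computes the same word-wise CRC that the
     * STM32F4 CRC unit produces. */
    extern CRC_HandleTypeDef hcrc;
    uint32_t actual_crc = HAL_CRC_Calculate(&hcrc,
                                            (uint32_t *)APPLICATION_ADDRESS,
                                            total_bytes / 4U);
    if (actual_crc != expected_crc)
        { uart_send_byte(NAK); return; }

    uart_send_byte(ACK);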

    /* STATE: BOOT */
    if (!uart_recv_byte(&cmd, RX_TIMEOUT_MS) || cmd != CMD_BOOT)
        { uart_send_byte(NAK); return; }

    uart_send_byte(ACK);
    HAL_Delay(10);   /* allow ACK to transmit */
    jump_to_application();
}

USB DFU (Device Firmware Upgrade)

USB DFU is a standardised firmware update protocol defined in the USB Device Class Specification for Device Firmware Upgrade, revision 1.1. dfu-util is a free, cross-platform command-line tool that handles DFU transfers without installing proprietary software. On Linux and macOS it reaches the device through libusb with no extra driver; on Windows the DFU interface usually has to be bound to a WinUSB-type driver first (for example with Zadig).

ST DFU implementation: STM32Cube includes the DFU USB Device class under Middlewares/ST/STM32_USB_Device_Library/Class/DFU/. CubeMX can generate the scaffolding: enable USB_OTG_FS, select Device Only mode, add the DFU class. The key parameters are:

  • USBD_DFU_APP_DEFAULT_ADD — the start address of the application region (must match APPLICATION_ADDRESS).
  • USBD_DFU_XFER_SIZE — the DFU transfer block size (typically 2048 bytes for STM32F4).
  • The DFU descriptor's @Internal Flash string lists the accessible flash regions in the format expected by dfu-util.

dfu-util command line: once the device is in DFU mode, flash the firmware with:
dfu-util -a 0 -s 0x08010000:leave -D application.bin
The :leave suffix tells dfu-util to ask the device to leave DFU mode once programming has finished, so the device jumps to the application automatically.

/* ---------------------------------------------------------------
 * DFU bootloader entry — USB FS init + DFU polling loop.
 * Called when bootloader decides firmware update is needed via USB.
 * --------------------------------------------------------------- */
#include "usbd_core.h"
#include "usbd_dfu.h"
#include "usbd_dfu_if.h"    /* generated by CubeMX: flash read/write/erase */

extern USBD_HandleTypeDef hUsbDeviceFS;

/* DFU media interface (generated by CubeMX in usbd_dfu_if.c) */
extern USBD_DFU_MediaTypeDef USBD_DFU_fops_FS;

void bootloader_usb_dfu_enter(void)
{
    /* Initialise USB core */
    USBD_Init(&hUsbDeviceFS, &FS_Desc, DEVICE_FS);

    /* Register DFU class */
    USBD_RegisterClass(&hUsbDeviceFS, USBD_DFU_CLASS);

    /* Register DFU flash operations (read/write/erase callbacks) */
    USBD_DFU_RegisterMedia(&hUsbDeviceFS, &USBD_DFU_fops_FS);

    /* Connect to USB host */
    USBD_Start(&hUsbDeviceFS);

    /* Timestamp of the last moment a host was seen */
    uint32_t last_activity = HAL_GetTick();

    while (1)
    {
        /* Blink LED at 5 Hz to indicate DFU mode */
        HAL_GPIO_TogglePin(LED_GPIO_Port, LED_Pin);
        HAL_Delay(100);

        /* While a host has the device configured, keep pushing the timeout
         * forward so a transfer in progress is never cut short. */
        if (hUsbDeviceFS.dev_state == USBD_STATE_CONFIGURED)
            last_activity = HAL_GetTick();

        /* DFU timeout: 30 seconds with no host attached -> jump to app */
        if ((HAL_GetTick() - last_activity) > 30000UL)
        {
            USBD_Stop(&hUsbDeviceFS);
            USBD_DeInit(&hUsbDeviceFS);
            jump_to_application();
            break;   /* only reached if the application was invalid */
        }
    }
}

/* CubeMX-generated usbd_dfu_if.c must implement these callbacks:
 *   DFU_Init()        — enable flash access
 *   DFU_DeInit()      — release flash, set ready for jump
 *   DFU_Erase(addr)   — erase sector containing addr
 *   DFU_Write(src,dst,len) — write len bytes from src to flash at dst
 *   DFU_Read(src,dst,len)  — read len bytes from flash at src
 *   DFU_GetStatus()   — return status/polling timeout
 * USBD_DFU_APP_DEFAULT_ADD must match APPLICATION_ADDRESS.         */
Write-protect sector 0 in production. Set the write protection option byte for Sector 0 using STM32CubeProgrammer or HAL_FLASHEx_OBProgram(). This prevents accidental or malicious overwrite of the bootloader via the DFU interface — the DFU flash map should only include sectors 4 and above.

CRC Image Validation

Before jumping to an application, the bootloader should verify that the image is intact. An erased flash region (all 0xFF bytes) is not a valid application — jumping to it produces a HardFault immediately. A truncated or corrupted image causes unpredictable behaviour. CRC32 validation catches all of these cases reliably.

Compile-time CRC embedding: after the linker produces the .bin file, a post-build script (Python or the STM32CubeIDE post-build step) computes CRC32 of the binary and appends it as the last 4 bytes, or stores it at a known fixed offset (e.g. the last word of the application's flash region). The bootloader reads the stored CRC, computes the CRC32 of the image (excluding the stored value itself), and compares.

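Hardware CRC unit: STM32 devices include a dedicated CRC calculation unit (CRC peripheral). On the STM32F4 the polynomial is fixed at 0x04C11DB7 with an initial value of 0xFFFFFFFF, which corresponds to CRC-32/MPEG-2 computed over 32-bit words, most significant bit first; newer families make the polynomial configurable. Enable the clock with __HAL_RCC_CRC_CLK_ENABLE() and call HAL_CRC_Calculate() with the data buffer. The unit needs four AHB clock cycles per 32-bit word, so even a large image is checked quickly. The build tool must process the bytes in the same word order to arrive at a matching value.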
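To embed a CRC that the hardware will reproduce, the post-build tool needs a software equivalent of the CRC unit. The following is a sketch of the fixed-polynomial unit found on STM32F4-class parts (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no reflection, no final XOR, one 32-bit word per step). The function names are illustrative, and `image_store_crc` assumes the CRC occupies the last word of the image buffer:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Word-wise model of the CRC unit: each 32-bit word is XORed into the
 * register, followed by 32 shift steps, most significant bit first. */
uint32_t stm32_crc32_words(const uint32_t *words, size_t count)
{
    uint32_t crc = 0xFFFFFFFFU;
    for (size_t i = 0; i < count; i++) {
        crc ^= words[i];
        for (int b = 0; b < 32; b++)
            crc = (crc & 0x80000000U) ? ((crc << 1) ^ 0x04C11DB7U) : (crc << 1);
    }
    return crc;
}

/* Byte-wise CRC-32/MPEG-2, for comparison with PC-side tools. */
uint32_t crc32_mpeg2_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFU;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x80000000U) ? ((crc << 1) ^ 0x04C11DB7U) : (crc << 1);
    }
    return crc;
}

/* Store the CRC of words [0, total_words - 1) in the last word. */
void image_store_crc(uint32_t *image, size_t total_words)
{
    image[total_words - 1U] = stm32_crc32_words(image, total_words - 1U);
}

/* Same check the bootloader performs before jumping. */
bool image_crc_ok(const uint32_t *image, size_t total_words)
{
    return image[total_words - 1U] ==
           stm32_crc32_words(image, total_words - 1U);
}
```

The two routines agree only when the bytes of each word are fed to the byte-wise version from most to least significant. Because the Cortex-M stores words little-endian, a tool that runs a byte-wise CRC over the raw .bin file must reverse the bytes within each 4-byte group first.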

Preventing jumps to erased flash: a quick sanity check is that the first word of the application region (*(uint32_t*)APPLICATION_ADDRESS) must not be 0xFFFFFFFF (erased), because that would be an invalid stack pointer. The check costs a single flash read and catches the common "no firmware programmed" case before the full CRC computation starts.

/* ---------------------------------------------------------------
 * verify_application(): hardware CRC32 of application flash region.
 * CRC32 is stored at APPLICATION_ADDRESS + APP_SIZE - 4.
 * Returns true if image is valid and safe to jump to.
 * --------------------------------------------------------------- */
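#include "stm32f4xx_hal.h"
#include <stdbool.h>
#include <stdio.h>    /* snprintf */

#define APPLICATION_ADDRESS   0x08010000UL
#define APP_FLASH_SIZE        0x000F0000UL   /* 960 KB */
#define APP_CRC_OFFSET        (APP_FLASH_SIZE - 4U)

extern CRC_HandleTypeDef  hcrc;
extern UART_HandleTypeDef huart1;   /* used for the mismatch log below */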

bool verify_application(void)
{
    volatile uint32_t *app_start = (volatile uint32_t *)APPLICATION_ADDRESS;

    /* Quick check: first word must be a valid stack pointer */
    if (*app_start == 0xFFFFFFFFUL || *app_start == 0x00000000UL)
        return false;

    /* Read the CRC32 stored by the build tool at the end of the image */
    uint32_t stored_crc = *(volatile uint32_t *)(APPLICATION_ADDRESS +
                                                  APP_CRC_OFFSET);

    /* Compute CRC32 over the application image (excluding stored CRC word) */
    __HAL_RCC_CRC_CLK_ENABLE();

    /* Reset CRC peripheral (clears accumulator to initial value) */
    __HAL_CRC_DR_RESET(&hcrc);

    uint32_t computed_crc = HAL_CRC_Calculate(
        &hcrc,
        (uint32_t *)APPLICATION_ADDRESS,
        (APP_FLASH_SIZE - 4U) / 4U   /* length in 32-bit words */
    );

    __HAL_RCC_CRC_CLK_DISABLE();

    if (computed_crc != stored_crc)
    {
        /* Log mismatch: stored vs computed */
        char msg[64];
        int len = snprintf(msg, sizeof(msg),
                           "CRC FAIL: stored=%08lX computed=%08lX\r\n",
                           (unsigned long)stored_crc,
                           (unsigned long)computed_crc);
        HAL_UART_Transmit(&huart1, (uint8_t *)msg, len, 100);
        return false;
    }

    return true;
}

Dual-Bank & Fail-Safe Updates

Single-bank updates have an inherent vulnerability: the window between erasing the old application and successfully programming the new one. A power failure during this window leaves the device with no valid firmware — it must be recovered with a debug probe, which is unacceptable in deployed hardware.

Dual-bank flash (available on STM32H7, L5, U5, and some G0 variants, among others) solves this by providing two independent flash banks of equal size. The device boots from Bank 1 by default, or from whichever bank the bank-selection option bit designates as active (the bit is called BFB2 on families such as L4 and G4, and SWAP_BANK on H7). While the current application runs from the active bank, the new firmware is programmed into the inactive bank without interrupting operation. Only after full programming and CRC verification is the active bank switched, by changing that single option bit.

Bank swap mechanism: HAL_FLASHEx_OBProgram() with OptionType set to OPTIONBYTE_USER and the bank-selection bit as the user option changes the active bank. After the next reset, the CPU boots from the other bank. The entire programming and swap procedure is fail-safe: if power is lost before the swap, the original bank still contains the last valid firmware and the device boots normally.

Version downgrade protection: the bootloader should refuse to program a firmware image with a lower version number than what is currently installed, unless a manufacturer override is explicitly authorised. Store the running version in a backup register or a dedicated flash region. This prevents accidental rollback and blocks certain downgrade attacks.
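A downgrade check along these lines can be as small as one comparison. In this sketch the 8.8.16-bit packing of major.minor.patch and both function names are assumptions made for illustration; how the override is authorised (a signed token, a factory jumper) is outside its scope:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack major.minor.patch into one word so versions compare as integers. */
uint32_t fw_version_pack(uint8_t major, uint8_t minor, uint16_t patch)
{
    return ((uint32_t)major << 24) | ((uint32_t)minor << 16) | patch;
}

/* Accept equal or newer images; older ones only with an explicit override. */
bool fw_update_allowed(uint32_t installed, uint32_t candidate,
                       bool override_authorised)
{
    if (candidate >= installed)
        return true;
    return override_authorised;
}
```

The bootloader would call `fw_update_allowed()` with the version read from the candidate image header before erasing anything, so a rejected image never costs the device its working firmware.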

| Strategy | Devices | Power-fail safe? | Recovery mechanism | Complexity |
|---|---|---|---|---|
| Single-bank IAP | All STM32 | No (vulnerable during erase) | Debug probe / BOOT0 pin | Low |
| Single-bank + backup slot | All STM32 with enough flash | Partial (backup can restore) | Copy backup to main slot | Medium |
| Dual-bank swap (H7/L5/U5) | STM32H7, L5, U5 | Yes (atomic bank swap) | Automatic: inactive bank untouched | Medium-High |
| External flash staging | Any with QSPI/SD | Yes (internal flash unchanged until verified) | Re-download from external flash | High |

/* ---------------------------------------------------------------
 * Dual-bank update flow for STM32H7:
 * Detect active bank, program inactive bank, verify, swap, reset.
 * --------------------------------------------------------------- */
#include "stm32h7xx_hal.h"
#include <stdbool.h>
#include <string.h>

#define BANK1_BASE       0x08000000UL
#define BANK2_BASE       0x08100000UL   /* H7: Bank 2 starts at +1 MB */
#define FLASHWORD_BYTES  32U            /* H7 programs 256-bit flash words */

/* CRC32 check over a flash region, implemented elsewhere in the bootloader */
bool crc32_verify_region(uint32_t base, uint32_t size);

/* True if the SWAP_BANK option bit is currently set */
static bool banks_are_swapped(void)
{
    FLASH_OBProgramInitTypeDef ob = {0};
    ob.Banks = FLASH_BANK_1;   /* OBGetConfig reads per-bank fields for this bank */
    HAL_FLASHEx_OBGetConfig(&ob);
    return (ob.USERConfig & OB_SWAP_BANK_ENABLE) == OB_SWAP_BANK_ENABLE;
}

static uint32_t get_inactive_bank_base(void)
{
    /* Assumption to verify against the reference manual for your part:
     * SWAP_BANK exchanges the two banks in the memory map, so the running
     * bank always appears at BANK1_BASE and the inactive one at BANK2_BASE. */
    return BANK2_BASE;
}

HAL_StatusTypeDef dual_bank_update(const uint8_t *new_fw, uint32_t fw_size)
{
    uint32_t inactive_base = get_inactive_bank_base();

    /* 1. Erase inactive bank */
    FLASH_EraseInitTypeDef erase = {0};
    erase.TypeErase = FLASH_TYPEERASE_MASSERASE;
    erase.Banks     = FLASH_BANK_2;
    uint32_t err    = 0;

    HAL_FLASH_Unlock();
    if (HAL_FLASHEx_Erase(&erase, &err) != HAL_OK)
    { HAL_FLASH_Lock(); return HAL_ERROR; }

    /* 2. Program new firmware, one 32-byte flash word at a time.
     *    The final chunk is padded with 0xFF. */
    for (uint32_t i = 0; i < fw_size; i += FLASHWORD_BYTES)
    {
        uint32_t flashword[FLASHWORD_BYTES / 4U];
        uint32_t chunk = fw_size - i;
        if (chunk > FLASHWORD_BYTES) chunk = FLASHWORD_BYTES;
        memset(flashword, 0xFF, sizeof(flashword));
        memcpy(flashword, &new_fw[i], chunk);
        if (HAL_FLASH_Program(FLASH_TYPEPROGRAM_FLASHWORD,
                              inactive_base + i, (uint32_t)flashword) != HAL_OK)
        { HAL_FLASH_Lock(); return HAL_ERROR; }
    }
    HAL_FLASH_Lock();

    /* 3. Verify CRC32 of newly programmed bank */
    if (!crc32_verify_region(inactive_base, fw_size))
        return HAL_ERROR;   /* stay on current bank, do not swap */

    /* 4. Toggle SWAP_BANK so the other bank becomes active */
    bool swapped = banks_are_swapped();
    FLASH_OBProgramInitTypeDef ob_prog = {0};
    ob_prog.OptionType = OPTIONBYTE_USER;
    ob_prog.USERType   = OB_USER_SWAP_BANK;
    ob_prog.USERConfig = swapped ? OB_SWAP_BANK_DISABLE : OB_SWAP_BANK_ENABLE;

    HAL_FLASH_Unlock();
    HAL_FLASH_OB_Unlock();
    if (HAL_FLASHEx_OBProgram(&ob_prog) != HAL_OK ||
        HAL_FLASH_OB_Launch() != HAL_OK)
    { HAL_FLASH_OB_Lock(); HAL_FLASH_Lock(); return HAL_ERROR; }

    /* The new bank mapping takes effect at the next reset */
    NVIC_SystemReset();
    return HAL_OK;   /* not reached */
}

Exercises

Exercise 1 Beginner

Button-Triggered UART Bootloader

Write a bootloader that checks if the USER button is pressed at startup. If pressed, enter a simple UART update loop (polling, no CRC, 64-byte frames). If not pressed, verify the application stack pointer and jump to it. Test by flashing a "Blink v1" application (500 ms blink), then use a Python script to update to "Blink v2" (250 ms blink) via the UART bootloader. Confirm the LED blink rate changes after reset without connecting the debug probe.

IAP Bootloader UART Update jump_to_application Python Host Tool
Exercise 2 Intermediate

UART Protocol with CRC16 Frame Verification

Extend the Exercise 1 bootloader to use the full 128-byte frame protocol with CRC16/CCITT-FALSE verification. The PC-side Python tool computes CRC per frame and waits for ACK before sending the next. The bootloader verifies each frame and sends NAK on mismatch; the host retries up to 3 times. To test error handling: manually inject a deliberate bit error in one frame in the Python script (flip one byte). Verify the bootloader sends NAK, the host retries, and the correct data is eventually accepted. After the full transfer, verify the CRC32 of the complete image.

CRC16 Frame Protocol Error Injection NAK/Retry
Exercise 3 Advanced

Production STM32H7 Dual-Bank USB DFU Bootloader

Build a production-grade bootloader for STM32H7 using dual-bank flash. The bootloader lives in Bank 1, Sectors 0–1 (256 KB in total on parts with 128 KB sectors). The application occupies Bank 1 Sectors 2+ (minimum 384 KB for current app) and all of Bank 2 (1 MB for new firmware staging). Implement USB DFU for update transfers using the ST DFU middleware. After a full DFU transfer to the inactive bank: (1) verify CRC32 using the hardware CRC unit; (2) if valid, set the bank-swap option bit and reset; (3) on power-up, if the new application signature word is invalid, automatically revert by clearing the bank-swap option bit and resetting. Measure the total update time from dfu-util invocation to first blink of new firmware for a 512 KB binary.

STM32H7 Dual-Bank USB DFU CRC32 Option Bytes

Bootloader Design Tool

Document your STM32 bootloader design — target MCU, flash layout, update interface, image validation strategy, and fail-safe approach. Download as Word, Excel, PDF, or PPTX for design reviews or production handoff.

Conclusion & Next Steps

In this article we have built a complete bootloader toolkit for STM32:

  • The boot decision uses a magic word in the RTC backup register (for OTA-triggered updates), a GPIO pin (for manufacturing), or automatic CRC failure detection — combining all three covers every update scenario.
  • Flash programming requires the strict unlock → erase → write → lock sequence. Always verify the written data before declaring success.
  • The jump-to-application must validate the stack pointer and reset vector, disable all interrupts, de-initialise all peripherals, reset SysTick, and update MSP before calling the application's reset handler. The application must set SCB->VTOR to its own base address in SystemInit().
  • A UART frame protocol with CRC16 per frame gives robust, portable update capability with no USB driver requirement.
  • USB DFU is the cleanest end-user experience: a standard protocol, dfu-util on all platforms, and no custom host software needed.
  • CRC32 image validation using the hardware CRC unit prevents jumps to erased or corrupted firmware. Embed the CRC at build time using a post-build Python script.
  • Dual-bank updates on STM32H7/L5/U5 provide atomic, power-fail-safe firmware replacement — the gold standard for field-deployed systems.

Next in the Series

In Part 16: External Storage — SD & QSPI Flash, we mount FATFS on an SD card over SPI or SDMMC, read and write QSPI NOR flash at full quad-SPI speeds, enable memory-mapped execution directly from QSPI flash, and implement basic wear-levelling awareness when using NOR flash for configuration storage.
