
STM32 Part 16: External Storage — SD Card & QSPI Flash

March 31, 2026 Wasil Zafar 29 min read

When internal flash isn't enough — connect an SD card for gigabytes of file storage using FATFS, add a QSPI NOR flash chip for XIP (execute-in-place) capability, and implement wear-levelled data logging for production data recorders.

Table of Contents

  1. External Storage Options
  2. SD Card with SDMMC & FATFS
  3. FATFS File Operations
  4. QSPI NOR Flash
  5. Memory-Mapped (XIP) Mode
  6. Wear-Levelling for Data Logging
  7. FatFS on QSPI Flash
  8. Exercises
  9. Storage Configuration Tool
  10. Conclusion & Next Steps
Series Overview: This is Part 16 of our 18-part STM32 Unleashed series. We now tackle the full spectrum of external storage — SD cards, QSPI NOR flash, wear levelling, and memory-mapped execution — skills that are indispensable for data loggers, firmware update systems, and resource-rich embedded applications.

STM32 Unleashed: HAL Driver Development

Your 18-step learning path (currently on Step 16):

  1. Architecture & CubeMX Setup (STM32 family, clock tree, HAL vs LL, CubeMX workflow, first project)
  2. GPIO & Button Debounce (GPIO modes, pull-up/down, EXTI, software debounce, HAL_GPIO_ReadPin)
  3. UART Communication (polling, interrupt, DMA modes, printf retargeting, ring buffers)
  4. Timers, PWM & Input Capture (TIM basics, PWM generation, input capture, encoder mode)
  5. ADC & DAC (single/continuous conversion, DMA, injected channels, DAC waveforms)
  6. SPI Protocol (SPI master/slave, full-duplex, DMA transfers, sensor drivers)
  7. I2C Protocol (I2C master, 7/10-bit addressing, DMA, multi-master, error handling)
  8. DMA & Memory Efficiency (DMA streams, circular mode, memory-to-memory, zero-copy patterns)
  9. Interrupt Management & NVIC (priority grouping, preemption, ISR design, HAL callbacks, latency)
  10. Low-Power Modes (Sleep, Stop, Standby modes, RTC wakeup, LP UART, power profiling)
  11. RTC & Calendar (RTC configuration, alarms, backup registers, calendar subseconds)
  12. CAN Bus (FDCAN/bxCAN, filters, message frames, error handling, automotive use)
  13. USB CDC Virtual COM Port (USB FS/HS, CDC class, virtual serial, control transfers, descriptors)
  14. FreeRTOS Integration (tasks, queues, semaphores, mutexes, CMSIS-RTOS2 wrapper, stack sizing)
  15. Bootloader Development (custom IAP bootloader, UART/USB DFU, flash programming, jump-to-app)
  16. External Storage: SD & QSPI Flash (FATFS on SD card, QSPI NOR flash, memory-mapped execution, wear levelling), this article
  17. Ethernet & TCP/IP Stack (LwIP integration, DHCP, TCP server, HTTP, MQTT, Ethernet DMA descriptors)
  18. Production Readiness (watchdog, HardFault handler, flash option bytes, code signing, CI/CD)

External Storage Options for STM32

Every serious embedded application eventually outgrows the internal flash of its microcontroller. Whether you need to store megabytes of calibration data, log thousands of sensor readings per second, hold large audio or font assets, or execute code that simply won't fit in the MCU's internal flash, you need external storage. The STM32 family supports a rich set of external storage interfaces — each with different capacity, speed, endurance, and cost trade-offs.

Understanding the full spectrum before committing to a design is critical. The wrong choice leads to either wasted cost (EEPROM used for logging) or catastrophic wear-out (NOR flash used as a swap partition). Here is the complete landscape:

  • SD/microSD card (1 GB–2 TB): Removable, FAT32/exFAT filesystem, accessible from a PC out of the box. The STM32 SDMMC peripheral drives the card in 4-bit wide mode at 25 or 50 MHz. Alternatively, any SPI master can drive an SD card in 1-bit SPI mode, though throughput is significantly reduced.
  • SPI NOR Flash (W25Q series, 1 MB–256 MB): Byte-addressable reads, page (256-byte) programs, sector (4 KB) erases. Typical endurance is 100,000 erase cycles per sector. The Winbond W25Q128JV (16 MB) is one of the most widely used parts in embedded designs.
  • SPI NAND Flash: Higher density than NOR at lower cost per megabyte. Reads and programs work on whole pages (typically 2 KB) and erases on blocks (typically 128 KB), so it needs a wear-levelling layer, bad-block management and ECC (often built into the device). Because it is not randomly byte-addressable, it cannot be used for XIP; code must be copied to RAM before it runs.
  • QSPI/OctoSPI NOR Flash: The same NOR flash cell accessed over a 4-bit (QSPI) or 8-bit (OctoSPI) parallel interface. The STM32 QUADSPI peripheral supports memory-mapped mode, allowing the CPU to read flash at 0x90000000 as if it were normal memory — enabling execute-in-place (XIP) for code and read-only data.
  • EEPROM (I2C, SPI): Very small (typically 1 KB–512 KB), byte- or page-writable without a separate erase step, with high write endurance (typically 1,000,000 cycles per byte). Ideal for configuration data, calibration coefficients, and non-volatile counters, not for bulk data logging.

| Storage Type | Capacity | Interface | Write Unit | Erase Unit | Endurance | XIP? | Relative Cost |
|---|---|---|---|---|---|---|---|
| microSD (SDMMC) | 1 GB–2 TB | SDMMC 4-bit | 512-byte block | Handled by card controller | Managed internally | No | Low |
| microSD (SPI) | 1 GB–2 TB | SPI 1-bit | 512-byte block | Handled by card controller | Managed internally | No | Low |
| SPI NOR Flash | 1 MB–256 MB | SPI, up to 104 MHz | Page (256 B) | Sector (4 KB) | 100k cycles/sector | No (standard SPI) | Medium |
| QSPI NOR Flash | 1 MB–256 MB | QUADSPI 4-bit | Page (256 B) | Sector (4 KB) | 100k cycles/sector | Yes | Medium |
| OctoSPI NOR Flash | Up to 512 MB | OctoSPI 8-bit | Page (256 B) | Sector (4 KB) | 100k cycles/sector | Yes | High |
| SPI NAND Flash | 128 MB–8 GB | SPI, up to 120 MHz | Page (2 KB) | Block (128 KB) | 100k cycles/block | No | Low |
| I2C EEPROM | 1 KB–512 KB | I2C, up to 1 MHz | Byte | None (no erase step) | 1M+ cycles | No | Low |

The code below shows the handle type declarations for both SDMMC and QSPI — the two interfaces we will use throughout this article:

/* ================================================================
 * External Storage Handle Declarations
 * SDIO for microSD (4-bit, 24 MHz from a 48 MHz SDIOCLK)
 * QUADSPI for W25Q128JV NOR Flash (quad reads, 84 MHz)
 * Target: an STM32F4 with QUADSPI (e.g. F446, F469); many F4s lack it
 * ================================================================ */

#include "main.h"   /* CubeMX: pulls in stm32f4xx_hal.h (with the SD and
                       QSPI modules enabled in stm32f4xx_hal_conf.h) and
                       declares Error_Handler() */

/* SDIO handle: initialised by MX_SDIO_SD_Init() */
SD_HandleTypeDef hsd;

/* QUADSPI handle: initialised by MX_QUADSPI_Init() */
QSPI_HandleTypeDef hqspi;

/* SDIO init. HAL_SD_Init() identifies the card in 1-bit mode at
 * <= 400 kHz; the 4-bit bus is enabled only once the card is ready. */
void MX_SDIO_SD_Init(void)
{
    hsd.Instance                 = SDIO;
    hsd.Init.ClockEdge           = SDIO_CLOCK_EDGE_RISING;
    hsd.Init.ClockBypass         = SDIO_CLOCK_BYPASS_DISABLE;
    hsd.Init.ClockPowerSave      = SDIO_CLOCK_POWER_SAVE_DISABLE;
    hsd.Init.BusWide             = SDIO_BUS_WIDE_1B;   /* widened below */
    hsd.Init.HardwareFlowControl = SDIO_HARDWARE_FLOW_CONTROL_DISABLE;
    hsd.Init.ClockDiv            = 0;  /* 48 MHz SDIOCLK / (0+2) = 24 MHz;
                                          use DMA at this speed, or raise the
                                          divider if polled transfers report
                                          FIFO overrun/underrun errors */

    if (HAL_SD_Init(&hsd) != HAL_OK) {
        Error_Handler();
    }
    if (HAL_SD_ConfigWideBusOperation(&hsd, SDIO_BUS_WIDE_4B) != HAL_OK) {
        Error_Handler();
    }
}

/* QUADSPI init for W25Q128JV (16 MB, 3-byte addressing) */
void MX_QUADSPI_Init(void)
{
    hqspi.Instance                = QUADSPI;
    hqspi.Init.ClockPrescaler     = 1;   /* 168 MHz AHB / (1+1) = 84 MHz */
    hqspi.Init.FifoThreshold      = 4;
    hqspi.Init.SampleShifting     = QSPI_SAMPLE_SHIFTING_HALFCYCLE;
    hqspi.Init.FlashSize          = 23;  /* 2^(23+1) bytes = 16 MB */
    hqspi.Init.ChipSelectHighTime = QSPI_CS_HIGH_TIME_5_CYCLE; /* ~60 ns at
                                     84 MHz; W25Q needs >= 50 ns deselect
                                     after erase/program commands */
    hqspi.Init.ClockMode          = QSPI_CLOCK_MODE_0;
    hqspi.Init.FlashID            = QSPI_FLASH_ID_1;
    hqspi.Init.DualFlash          = QSPI_DUALFLASH_DISABLE;

    if (HAL_QSPI_Init(&hqspi) != HAL_OK) {
        Error_Handler();
    }
}

SD Card with SDMMC & FATFS

The STM32 SDMMC (or SDIO on F4) peripheral implements the SD bus protocol in hardware: command/response sequencing, CRC generation and checking, and 1-bit or 4-bit data transfers. With a 4-bit bus at 25 MHz, the raw bus rate is 4 bits × 25 MHz = 100 Mbit/s, about 12.5 MB/s, which is more than enough for audio recording, high-rate sensor logging, or firmware update image transfer. SPI mode at a 10 MHz clock moves only one bit per clock, about 1.25 MB/s. Sustained file throughput is lower than either raw figure, because every transfer carries command overhead and the card pauses while it programs its internal flash.

The FATFS middleware, included in STM32Cube, sits between your application and the block-level hardware driver. FATFS was written by ChaN and is a de-facto standard for embedded FAT/exFAT implementations. STM32CubeMX generates the necessary diskio.c glue layer that connects FATFS to the HAL SD driver.

CubeMX Setup

  1. In the Pinout & Configuration view, enable SDIO (F4) or SDMMC1 (F7/H7).
  2. Set Bus Width to 4 bits and the clock divider to achieve 25 MHz or lower.
  3. Under Middleware → FATFS, enable it and select SD Card as the physical drive.
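  4. In the FATFS configuration, set _USE_LFN to 2 (LFN working buffer on the stack) or 3 (on the heap) for long filename support, _VOLUMES to 1, and _FS_TINY to 0 so each open file gets its own sector buffer for best performance. Newer FatFs releases name these options FF_USE_LFN, FF_VOLUMES and FF_FS_TINY.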
  5. Enable DMA for SDIO/SDMMC (DMA2 Stream 3/6 for F4) to avoid CPU-blocking block transfers.
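The divider choice in step 2 and the throughput figures quoted earlier can be checked with plain C arithmetic. This is a minimal host-side sketch, assuming the STM32F4 SDIO clock relation SDIO_CK = SDIOCLK / (CLKDIV + 2) and a 48 MHz SDIOCLK from the PLL; sdio_clkdiv_for and sd_raw_bytes_per_s are hypothetical helper names, not HAL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest CLKDIV (0..255) for which SDIO_CK = sdioclk / (CLKDIV + 2)
 * does not exceed max_hz, or -1 if no divider is slow enough.
 * Compares by multiplication so integer rounding can never let a
 * too-fast clock through. */
static int sdio_clkdiv_for(uint32_t sdioclk_hz, uint32_t max_hz)
{
    for (int div = 0; div <= 255; div++) {
        if ((uint64_t)sdioclk_hz <= (uint64_t)max_hz * (uint64_t)(div + 2)) {
            return div;
        }
    }
    return -1;
}

/* Raw bus rate in bytes/s: clock x data lines / 8. Command overhead and
 * card busy time are not included, so real file throughput is lower. */
static uint32_t sd_raw_bytes_per_s(uint32_t sdio_ck_hz, uint32_t bus_width)
{
    assert(bus_width == 1 || bus_width == 4 || bus_width == 8);
    return (uint32_t)(((uint64_t)sdio_ck_hz * bus_width) / 8u);
}
```

For a 48 MHz SDIOCLK this yields CLKDIV = 0 (24 MHz) for data transfer and CLKDIV = 118 for the 400 kHz identification phase, which matches the SDIO_INIT_CLK_DIV value (0x76) in the F4 HAL.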

FATFS Core API

The essential FATFS functions your application will call:

  • f_mount(&fs, "0:", 1) — mount the filesystem on logical drive 0. The second argument is the path, the third forces immediate mounting.
  • f_open(&fil, "path/file.txt", mode) — open or create a file. Returns FR_OK on success.
  • f_write(&fil, buf, len, &bw) — write len bytes from buf; bw receives actual bytes written.
  • f_read(&fil, buf, len, &br) — read up to len bytes into buf.
  • f_close(&fil) — flush write buffers and release the file object.
  • f_mkdir("logs") — create a directory.
  • f_sync(&fil) — flush without closing, for periodic safety-flush during long logging sessions.
/* ================================================================
 * SD Card Init via FATFS + Write 10 sensor readings to CSV
 * Requires: FATFS middleware enabled, SDIO/SDMMC DMA configured,
 *           hadc1 configured by CubeMX (software-triggered conversion)
 * ================================================================ */

#include "fatfs.h"
#include "adc.h"        /* CubeMX: declares ADC_HandleTypeDef hadc1 */
#include <stdio.h>
#include <string.h>

/* CubeMX's fatfs.c already defines SDFatFS/SDFile, so use distinct names */
static FATFS sd_fs;     /* FATFS work area for logical drive 0 */
static FIL   sd_file;   /* File object */

/* One software-triggered ADC conversion */
static uint32_t read_adc_once(void)
{
    HAL_ADC_Start(&hadc1);
    HAL_ADC_PollForConversion(&hadc1, 10);   /* 10 ms timeout */
    uint32_t value = HAL_ADC_GetValue(&hadc1);
    HAL_ADC_Stop(&hadc1);
    return value;
}

void sd_write_sensor_data(void)
{
    char    line[64];
    UINT    bytes_written;
    FRESULT fres;

    /* Mount the filesystem; opt = 1 forces immediate card detection */
    fres = f_mount(&sd_fs, (TCHAR const*)"0:", 1);
    if (fres != FR_OK) {
        printf("f_mount failed: %d\r\n", fres);
        return;
    }

    /* Create (or overwrite) data.csv on the SD card root */
    fres = f_open(&sd_file, "0:/data.csv", FA_CREATE_ALWAYS | FA_WRITE);
    if (fres != FR_OK) {
        printf("f_open failed: %d\r\n", fres);
        f_mount(NULL, "0:", 0);
        return;
    }

    /* Write CSV header, then 10 sensor readings */
    const char *header = "tick_ms,adc_raw,voltage_mV\r\n";
    fres = f_write(&sd_file, header, strlen(header), &bytes_written);
    if (fres != FR_OK || bytes_written != strlen(header)) {
        printf("Header write failed: %d\r\n", fres);
    } else {
        for (int i = 0; i < 10; i++) {
            uint32_t tick = HAL_GetTick();
            uint32_t adc  = read_adc_once();
            uint32_t mv   = (adc * 3300UL) / 4095UL;   /* 12-bit, 3.3 V ref */

            int n = snprintf(line, sizeof(line), "%lu,%lu,%lu\r\n", tick, adc, mv);

            fres = f_write(&sd_file, line, (UINT)n, &bytes_written);
            if (fres != FR_OK || bytes_written != (UINT)n) {
                /* FR_OK with a short write means the volume is full */
                printf("Write error at record %d: %d\r\n", i, fres);
                break;
            }
            HAL_Delay(100);  /* 100 ms between samples */
        }
    }

    /* Close file: flushes cached data and updates the directory entry */
    fres = f_close(&sd_file);
    if (fres != FR_OK) {
        printf("f_close failed: %d\r\n", fres);
    }

    /* Unmount: good practice before power-off or card removal */
    f_mount(NULL, "0:", 0);

    printf("SD write complete\r\n");
}
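DMA and D-Cache Warning (H7): On STM32H7, CubeMX-generated startup code typically enables the D-cache (SCB_EnableDCache()). DMA masters, including the SDMMC internal DMA, read and write SRAM directly and never see the cache. Every buffer used for DMA must therefore either sit in a region the MPU marks non-cacheable, or be cleaned before a DMA transmit (SCB_CleanDCache_by_Addr()) and invalidated after a DMA receive (SCB_InvalidateDCache_by_Addr()), with the buffer 32-byte aligned and padded to whole cache lines. Also check which memories the DMA can reach: SDMMC1's internal DMA cannot access DTCM, and on STM32H74x/75x it is limited to D1-domain memories such as AXI SRAM (0x24000000), so consult the bus-matrix table in the reference manual before choosing a buffer location. Skipping these steps causes subtle corruption that looks like random bit errors in your SD card files.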
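Cache maintenance by address always operates on whole 32-byte lines, so an unaligned buffer drags its neighbours into the operation, and an invalidate can silently discard their pending CPU writes. The host-side sketch below computes the span actually touched; cache_span is a hypothetical helper, not a CMSIS function, and the 32-byte line size is the Cortex-M7 value.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE 32u   /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start;   /* first cache-line-aligned address covering the buffer */
    uint32_t  size;    /* byte count, a whole number of cache lines */
} cache_span_t;

/* Expand [addr, addr + len) outward to whole cache lines: this is the
 * range a clean/invalidate-by-address operation really affects. */
static cache_span_t cache_span(uintptr_t addr, uint32_t len)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE - 1u;
    cache_span_t s;
    s.start = addr & ~mask;                         /* round start down */
    uintptr_t end = (addr + len + mask) & ~mask;    /* round end up */
    s.size = (uint32_t)(end - s.start);
    return s;
}
```

If the span is larger than the buffer, other variables share its cache lines. The simpler fix is to declare DMA buffers with __attribute__((aligned(32))) and a size that is a multiple of 32, so the span and the buffer coincide.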

FATFS File Operations

FATFS exposes a POSIX-like file API. Mastering the open flags, error codes, and random-access functions gives you the flexibility to implement anything from a simple data logger to a structured filesystem application.

Open Flags

The f_open mode argument is a bitfield of the following flags:

  • FA_READ — open for reading. The file must exist.
  • FA_WRITE — open for writing. Can be combined with FA_READ for read/write.
  • FA_CREATE_ALWAYS — create a new file; if it exists, truncate it to zero length.
  • FA_CREATE_NEW — create a new file; fail if it already exists.
  • FA_OPEN_EXISTING — open existing file (default if no create flag given).
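  • FA_OPEN_APPEND: open the file if it exists, create it if it does not, and move the file pointer to the end. Combined with FA_WRITE it is equivalent to fopen("a"). (FA_OPEN_ALWAYS does the same without the seek.)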
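For readers coming from stdio, ChaN's f_open documentation lists the FatFs flag combination for each fopen() mode. The sketch below encodes that table. The flag values are copied from ff.h (R0.12 and later) only so it compiles on a host without FatFs, and fatfs_mode_from_fopen is a hypothetical helper, not part of FatFs.

```c
#include <assert.h>
#include <string.h>

/* Values as in ChaN's ff.h (R0.12+); guarded so ff.h wins if included. */
#ifndef FA_READ
#define FA_READ          0x01
#define FA_WRITE         0x02
#define FA_OPEN_EXISTING 0x00
#define FA_CREATE_NEW    0x04
#define FA_CREATE_ALWAYS 0x08
#define FA_OPEN_ALWAYS   0x10
#define FA_OPEN_APPEND   0x30
#endif

/* Translate an fopen()-style mode string into an f_open() mode byte.
 * Returns -1 for modes with no FatFs equivalent. */
static int fatfs_mode_from_fopen(const char *mode)
{
    static const struct { const char *fopen_mode; int fa; } table[] = {
        { "r",   FA_READ },
        { "r+",  FA_READ | FA_WRITE },
        { "w",   FA_CREATE_ALWAYS | FA_WRITE },
        { "w+",  FA_CREATE_ALWAYS | FA_WRITE | FA_READ },
        { "a",   FA_OPEN_APPEND | FA_WRITE },
        { "a+",  FA_OPEN_APPEND | FA_WRITE | FA_READ },
        { "wx",  FA_CREATE_NEW | FA_WRITE },
        { "w+x", FA_CREATE_NEW | FA_WRITE | FA_READ },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(mode, table[i].fopen_mode) == 0) {
            return table[i].fa;
        }
    }
    return -1;
}
```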

Useful Utility Functions

  • f_printf(&fil, fmt, ...) — formatted write, like fprintf. Requires _USE_STRFUNC >= 1.
  • f_lseek(&fil, offset) — move file pointer to absolute byte offset. Use f_size(&fil) to seek to end.
  • f_stat("path", &fno) — get file/directory metadata (size, date, attributes).
  • f_getfree("0:", &fre_clust, &pfs) — query free clusters. Multiply by cluster size to get free bytes.
  • f_unlink("path") — delete a file or empty directory.
  • f_rename("old", "new") — rename/move a file.
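f_getfree() returns a count of free clusters, and the cluster size is held in the returned FATFS object as csize, measured in sectors. Following the calculation in ChaN's f_getfree documentation, a host-side sketch (fatfs_free_bytes and fatfs_total_bytes are hypothetical helpers; on the target you would pass pfs->csize, pfs->n_fatent and a 512-byte sector size):

```c
#include <assert.h>
#include <stdint.h>

/* Free bytes = free clusters x sectors per cluster x bytes per sector.
 * 64-bit math so large cards do not overflow. */
static uint64_t fatfs_free_bytes(uint32_t fre_clust, uint32_t csize,
                                 uint32_t sector_size)
{
    return (uint64_t)fre_clust * csize * sector_size;
}

/* Total data space: FatFs counts two reserved FAT entries in n_fatent,
 * so the number of data clusters is n_fatent - 2. */
static uint64_t fatfs_total_bytes(uint32_t n_fatent, uint32_t csize,
                                  uint32_t sector_size)
{
    return (uint64_t)(n_fatent - 2u) * csize * sector_size;
}
```

On the target, f_getfree("0:", &fre_clust, &pfs) followed by fatfs_free_bytes(fre_clust, pfs->csize, 512) / (1024 * 1024) gives the free space in MiB that Exercise 1 asks for.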

FATFS Error Codes

Always check the return value of every FATFS call. Production code must handle all failure modes gracefully:

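| Error Code | Value | Meaning | Typical Cause & Response |
|---|---|---|---|
| FR_OK | 0 | Success | Operation completed normally |
| FR_DISK_ERR | 1 | Low-level disk I/O error | disk_read/disk_write failed (DMA error, CRC failure): retry once, then unmount |
| FR_INT_ERR | 2 | Internal assertion failed | FATFS work area or FAT structure corrupted: re-mount |
| FR_NOT_READY | 3 | Drive not ready | disk_initialize failed, card removed or not inserted: check the card-detect pin |
| FR_NO_FILE | 4 | File not found | Wrong name: check directory and filename spelling |
| FR_NO_PATH | 5 | Path not found | Intermediate directory missing: call f_mkdir first |
| FR_DENIED | 7 | Access denied | Read-only file attribute, directory full, or volume full when creating a file |
| FR_EXIST | 8 | Object already exists | f_mkdir on an existing directory (safe to ignore) or FA_CREATE_NEW on an existing file |
| FR_WRITE_PROTECTED | 10 | Drive write-protected | Write-protect tab engaged, if the diskio driver reports it |
| FR_NO_FILESYSTEM | 13 | No valid FAT volume | Card unformatted or using an unsupported filesystem: reformat |

FatFs has no dedicated "volume full" code for writes: when the card fills up, f_write returns FR_OK with *bw smaller than the requested byte count, so always compare bw against the length you asked to write, and call f_getfree before a long logging session.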
/* ================================================================
 * Production Data Logger: RTC timestamp + ADC to CSV
 * Flushes every 10 records; re-mounts and retries on FATFS errors.
 * Assumes hadc1 samples the internal temperature sensor channel
 * (typical F4 values: 760 mV at 25 °C, 2.5 mV/°C) with a software
 * trigger, and that hrtc is already configured.
 * ================================================================ */

#include "fatfs.h"
#include "rtc.h"        /* CubeMX: declares hrtc  */
#include "adc.h"        /* CubeMX: declares hadc1 */
#include <stdio.h>
#include <string.h>

#define LOG_FLUSH_INTERVAL   10   /* f_sync every N records */
#define LOG_MAX_RETRIES       3

static FATFS    fs;
static FIL      logfile;
static uint32_t record_count = 0;

/* Mount the card, create /logs if needed and open sensor.csv for
 * appending. Call once at startup. FA_OPEN_APPEND creates a missing
 * file instead of failing, so size 0 means "new file: write header". */
FRESULT logger_open(void)
{
    FRESULT fr = f_mount(&fs, "0:", 1);
    if (fr != FR_OK) return fr;

    fr = f_mkdir("0:/logs");
    if (fr != FR_OK && fr != FR_EXIST) return fr;

    fr = f_open(&logfile, "0:/logs/sensor.csv", FA_OPEN_APPEND | FA_WRITE);
    if (fr != FR_OK) return fr;

    if (f_size(&logfile) == 0 &&
        f_printf(&logfile, "date,time,adc_raw,voltage_mV,temperature_C\r\n") < 0) {
        return FR_DISK_ERR;
    }
    return FR_OK;
}

FRESULT logger_append_record(void)
{
    RTC_TimeTypeDef t;
    RTC_DateTypeDef d;
    UINT    bw = 0;
    char    line[80];
    FRESULT fr;

    /* Time first, then date: GetDate unlocks the RTC shadow registers */
    HAL_RTC_GetTime(&hrtc, &t, RTC_FORMAT_BIN);
    HAL_RTC_GetDate(&hrtc, &d, RTC_FORMAT_BIN);

    HAL_ADC_Start(&hadc1);
    HAL_ADC_PollForConversion(&hadc1, 10);           /* 10 ms timeout */
    uint32_t adc = HAL_ADC_GetValue(&hadc1);
    HAL_ADC_Stop(&hadc1);

    uint32_t mv   = (adc * 3300UL) / 4095UL;         /* 12-bit, 3.3 V ref */
    /* Signed math: mv - 760 would wrap around in uint32_t when mv < 760 */
    int32_t  temp = (int32_t)(((float)mv - 760.0f) / 2.5f + 25.0f);

    int n = snprintf(line, sizeof(line),
                     "20%02d-%02d-%02d,%02d:%02d:%02d,%lu,%lu,%ld\r\n",
                     d.Year, d.Month, d.Date,
                     t.Hours, t.Minutes, t.Seconds,
                     adc, mv, (long)temp);
    if (n < 0 || n >= (int)sizeof(line)) return FR_INVALID_PARAMETER;

    fr = f_write(&logfile, line, (UINT)n, &bw);
    for (int retries = 0; fr != FR_OK && retries < LOG_MAX_RETRIES; retries++) {
        /* Re-mount and reopen, then retry the same line */
        f_close(&logfile);
        f_mount(NULL, "0:", 0);
        HAL_Delay(20);
        fr = logger_open();
        if (fr == FR_OK) {
            fr = f_write(&logfile, line, (UINT)n, &bw);
        }
    }
    if (fr != FR_OK) return fr;
    if (bw != (UINT)n) return FR_DENIED;   /* FR_OK + short write: volume full */

    record_count++;
    if (record_count % LOG_FLUSH_INTERVAL == 0) {
        fr = f_sync(&logfile);  /* Safety flush: survive power loss */
    }
    return fr;
}

QSPI NOR Flash

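The QUADSPI peripheral (on STM32F7/H7 and on selected F4 parts such as the F446 and F469) provides a 4-bit SPI interface that moves up to four times as many bits per clock as standard SPI. With an 84 MHz QUADSPI clock in quad I/O mode, the raw read bandwidth is 84 MHz × 4 bits ÷ 8 = 42 MB/s. Each new read also spends clocks on the instruction, address, mode and dummy phases, so short random reads achieve less, but long sequential reads come close to that figure, which is enough to execute code directly from the flash chip.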

The Winbond W25Q128JV is a 16 MB (128 Mbit) NOR flash device, available in SOIC-8 and WSON-8 packages among others. It is pin-compatible with most other W25Q family members in the same package and supports standard, dual and quad SPI reads; unlike the older W25Q128FV, it has no 4-4-4 QPI mode. Key specifications: 2.7–3.6 V supply, 133 MHz maximum clock, 256-byte page program, 4 KB sector erase, 100,000 erase cycles per sector, and 20-year data retention.

Critical W25Q128JV Command Bytes

  • 0x06: Write Enable (WREN). Must precede every program, erase, or status-register write.
  • 0x04: Write Disable (WRDI). Clears the Write Enable Latch manually; the chip also clears WEL by itself when a program or erase completes.
  • 0x05: Read Status Register 1. Bit 0 = BUSY (poll until clear after program or erase); bit 1 = WEL.
  • 0x35: Read Status Register 2. Bit 1 = QE (Quad Enable), which must be 1 before any quad command.
  • 0x31: Write Status Register 2, used to set the QE bit (preceded by 0x06).
  • 0x9F: Read JEDEC ID: returns manufacturer (0xEF), memory type (0x40), capacity (0x18 = 2^24 bytes = 16 MB).
  • 0xEB: Fast Read Quad I/O (1-4-4): instruction on one line; address, mode byte and data on all four lines; 4 dummy clocks after the mode byte.
  • 0x02: Page Program (1-1-1): up to 256 bytes, which must not cross a 256-byte page boundary.
  • 0x20: Sector Erase (4 KB): sets all bits in the addressed sector to 1 (typ. 45 ms, max 400 ms).
  • 0xD8: Block Erase (64 KB): typ. 150 ms, max 2 s.
  • 0xC7: Chip Erase: erases the entire device (typ. 40 s, max 200 s for 16 MB).
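The JEDEC ID also tells the driver how big the chip is, which is exactly what the QUADSPI FlashSize field (2^(FlashSize+1) bytes) needs. A host-side sketch, assuming the W25Q convention that the capacity byte equals log2 of the size in bytes for parts up to the W25Q256; jedec_decode is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  manufacturer;     /* 0xEF = Winbond */
    uint8_t  mem_type;         /* 0x40 on the W25Q128JV-IQ/JQ */
    uint8_t  capacity_code;    /* 0x18 on the W25Q128JV */
    uint32_t size_bytes;       /* 0 if the code is outside the known range */
    uint8_t  qspi_flash_size;  /* value for hqspi.Init.FlashSize */
} jedec_info_t;

/* Decode the 3 ID bytes from 0x9F, packed as in qspi_read_jedec():
 * manufacturer in bits 23..16, memory type in 15..8, capacity in 7..0. */
static jedec_info_t jedec_decode(uint32_t id)
{
    jedec_info_t j = {0};
    j.manufacturer  = (uint8_t)(id >> 16);
    j.mem_type      = (uint8_t)(id >> 8);
    j.capacity_code = (uint8_t)id;

    /* Assumption: codes 0x11..0x19 mean 2^code bytes. Larger W25Q parts
     * break the pattern, so they are reported as unknown (size 0). */
    if (j.capacity_code >= 0x11 && j.capacity_code <= 0x19) {
        j.size_bytes      = (uint32_t)1u << j.capacity_code;
        j.qspi_flash_size = (uint8_t)(j.capacity_code - 1u);
    }
    return j;
}
```

For 0xEF4018 this gives 16 MB and FlashSize = 23, matching the value in MX_QUADSPI_Init().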
/* ================================================================
 * W25Q128JV QSPI Driver: Read JEDEC, Sector Erase,
 * Page Program, and Fast Read Quad I/O (1-4-4)
 * Quad reads need the QE bit (Status Register 2, bit 1) set first.
 * ================================================================ */

#include "stm32f4xx_hal.h"

extern QSPI_HandleTypeDef hqspi;   /* set up by MX_QUADSPI_Init() */

#define W25Q_CMD_JEDEC_ID      0x9F
#define W25Q_CMD_WRITE_EN      0x06
#define W25Q_CMD_READ_SR1      0x05
#define W25Q_CMD_SECTOR_ERASE  0x20
#define W25Q_CMD_PAGE_PROG     0x02
#define W25Q_CMD_FAST_READ_QIO 0xEB
#define W25Q_PAGE_SIZE         256U
#define W25Q_SR1_BUSY          0x01
#define W25Q_SR1_WEL           0x02
#define W25Q_TIMEOUT_MS        5000  /* covers sector (max 400 ms) and
                                        64 KB block (max 2 s) erase */

/* Read JEDEC ID: returns 0x00EF4018 for W25Q128JV, 0 on error */
uint32_t qspi_read_jedec(void)
{
    QSPI_CommandTypeDef cmd = {0};
    uint8_t buf[3];

    cmd.InstructionMode = QSPI_INSTRUCTION_1_LINE;
    cmd.Instruction     = W25Q_CMD_JEDEC_ID;
    cmd.AddressMode     = QSPI_ADDRESS_NONE;
    cmd.DataMode        = QSPI_DATA_1_LINE;
    cmd.NbData          = 3;
    cmd.DummyCycles     = 0;

    if (HAL_QSPI_Command(&hqspi, &cmd, W25Q_TIMEOUT_MS) != HAL_OK ||
        HAL_QSPI_Receive(&hqspi, buf, W25Q_TIMEOUT_MS) != HAL_OK) {
        return 0;
    }
    return ((uint32_t)buf[0] << 16) |
           ((uint32_t)buf[1] <<  8) |
            (uint32_t)buf[2];
}

/* Poll Status Register 1 in hardware until (SR1 & mask) == match */
static HAL_StatusTypeDef qspi_poll_sr1(uint8_t mask, uint8_t match,
                                       uint32_t timeout_ms)
{
    QSPI_CommandTypeDef     cmd = {0};
    QSPI_AutoPollingTypeDef cfg = {0};

    cmd.InstructionMode = QSPI_INSTRUCTION_1_LINE;
    cmd.Instruction     = W25Q_CMD_READ_SR1;
    cmd.DataMode        = QSPI_DATA_1_LINE;

    cfg.Match           = match;
    cfg.Mask            = mask;
    cfg.MatchMode       = QSPI_MATCH_MODE_AND;
    cfg.Interval        = 0x10;
    cfg.AutomaticStop   = QSPI_AUTOMATIC_STOP_ENABLE;
    cfg.StatusBytesSize = 1;

    return HAL_QSPI_AutoPolling(&hqspi, &cmd, &cfg, timeout_ms);
}

/* Wait until BUSY (SR1 bit 0) clears */
static HAL_StatusTypeDef qspi_wait_idle(uint32_t timeout_ms)
{
    return qspi_poll_sr1(W25Q_SR1_BUSY, 0x00, timeout_ms);
}

/* Write Enable, then confirm WEL (SR1 bit 1) is set */
static HAL_StatusTypeDef qspi_write_enable(void)
{
    QSPI_CommandTypeDef cmd = {0};
    cmd.InstructionMode = QSPI_INSTRUCTION_1_LINE;
    cmd.Instruction     = W25Q_CMD_WRITE_EN;

    if (HAL_QSPI_Command(&hqspi, &cmd, W25Q_TIMEOUT_MS) != HAL_OK) {
        return HAL_ERROR;
    }
    return qspi_poll_sr1(W25Q_SR1_WEL, W25Q_SR1_WEL, W25Q_TIMEOUT_MS);
}

/* Erase the 4 KB sector containing the given 24-bit address */
HAL_StatusTypeDef qspi_sector_erase(uint32_t addr)
{
    QSPI_CommandTypeDef cmd = {0};

    if (qspi_write_enable() != HAL_OK) return HAL_ERROR;

    cmd.InstructionMode = QSPI_INSTRUCTION_1_LINE;
    cmd.Instruction     = W25Q_CMD_SECTOR_ERASE;
    cmd.AddressMode     = QSPI_ADDRESS_1_LINE;
    cmd.AddressSize     = QSPI_ADDRESS_24_BITS;
    cmd.Address         = addr & ~0xFFFUL; /* Align to 4 KB */
    cmd.DataMode        = QSPI_DATA_NONE;

    if (HAL_QSPI_Command(&hqspi, &cmd, W25Q_TIMEOUT_MS) != HAL_OK) {
        return HAL_ERROR;
    }
    return qspi_wait_idle(W25Q_TIMEOUT_MS); /* Erase typ. 45 ms */
}

/* Program 1..256 bytes at addr. The range must be erased and must lie
 * within one 256-byte page: the chip wraps to the page start otherwise. */
HAL_StatusTypeDef qspi_page_program(uint32_t addr,
                                     const uint8_t *data,
                                     uint16_t len)
{
    QSPI_CommandTypeDef cmd = {0};

    if (len == 0 || (addr % W25Q_PAGE_SIZE) + len > W25Q_PAGE_SIZE) {
        return HAL_ERROR;
    }
    if (qspi_write_enable() != HAL_OK) return HAL_ERROR;

    cmd.InstructionMode = QSPI_INSTRUCTION_1_LINE;
    cmd.Instruction     = W25Q_CMD_PAGE_PROG;
    cmd.AddressMode     = QSPI_ADDRESS_1_LINE;
    cmd.AddressSize     = QSPI_ADDRESS_24_BITS;
    cmd.Address         = addr;
    cmd.DataMode        = QSPI_DATA_1_LINE;
    cmd.NbData          = len;

    if (HAL_QSPI_Command(&hqspi, &cmd, W25Q_TIMEOUT_MS) != HAL_OK ||
        HAL_QSPI_Transmit(&hqspi, (uint8_t *)data, W25Q_TIMEOUT_MS) != HAL_OK) {
        return HAL_ERROR;
    }
    return qspi_wait_idle(W25Q_TIMEOUT_MS); /* Program typ. 0.7 ms */
}

/* Fast Read Quad I/O (0xEB, 1-4-4): address on 4 lines, then the mode
 * byte 0xFF as alternate bytes (2 clocks on 4 lines, keeps continuous
 * read mode off), then 4 dummy clocks, then data on 4 lines. */
HAL_StatusTypeDef qspi_read_data(uint32_t addr,
                                  uint8_t *buf,
                                  uint32_t len)
{
    QSPI_CommandTypeDef cmd = {0};

    if (len == 0) return HAL_ERROR;

    cmd.InstructionMode    = QSPI_INSTRUCTION_1_LINE;
    cmd.Instruction        = W25Q_CMD_FAST_READ_QIO;
    cmd.AddressMode        = QSPI_ADDRESS_4_LINES;
    cmd.AddressSize        = QSPI_ADDRESS_24_BITS;
    cmd.Address            = addr;
    cmd.AlternateByteMode  = QSPI_ALTERNATE_BYTES_4_LINES;
    cmd.AlternateBytesSize = QSPI_ALTERNATE_BYTES_8_BITS;
    cmd.AlternateBytes     = 0xFF;
    cmd.DummyCycles        = 4;
    cmd.DataMode           = QSPI_DATA_4_LINES;
    cmd.NbData             = len;

    if (HAL_QSPI_Command(&hqspi, &cmd, W25Q_TIMEOUT_MS) != HAL_OK) {
        return HAL_ERROR;
    }
    return HAL_QSPI_Receive(&hqspi, buf, W25Q_TIMEOUT_MS);
}
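qspi_page_program() accepts only ranges inside a single 256-byte page, so writing a longer buffer means splitting it at page boundaries. The host-side sketch below shows the splitting logic with a hypothetical page_prog_fn callback; on the target you would pass a thin wrapper around qspi_page_program(), while here a mock records each call so the split can be checked on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOR_PAGE_SIZE 256u

/* Hypothetical single-page program callback; returns 0 on success. */
typedef int (*page_prog_fn)(uint32_t addr, const uint8_t *data, uint16_t len);

/* Program an arbitrary byte range, issuing one page-program per page so
 * no command ever wraps inside a page. */
static int nor_program_range(uint32_t addr, const uint8_t *data, size_t len,
                             page_prog_fn prog)
{
    while (len > 0) {
        uint32_t room  = NOR_PAGE_SIZE - (addr % NOR_PAGE_SIZE); /* left in page */
        uint16_t chunk = (uint16_t)(len < room ? len : room);
        int rc = prog(addr, data, chunk);
        if (rc != 0) return rc;
        addr += chunk;
        data += chunk;
        len  -= chunk;
    }
    return 0;
}

/* Host-side mock: checks the page rule and records each chunk length. */
static uint16_t call_lens[16];
static int      n_calls;

static int mock_page_prog(uint32_t addr, const uint8_t *data, uint16_t len)
{
    (void)data;
    assert((addr % NOR_PAGE_SIZE) + len <= NOR_PAGE_SIZE);
    if (n_calls < 16) call_lens[n_calls] = len;
    n_calls++;
    return 0;
}
```

Writing 600 bytes starting at address 200 produces four commands of 56, 256, 256 and 32 bytes.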

Memory-Mapped (XIP) Mode

One of the most powerful features of the STM32 QUADSPI peripheral is memory-mapped mode — also called XIP (execute-in-place). In this mode, the CPU accesses the QSPI flash as if it were normal read-only memory, starting at address 0x90000000 on STM32F4/F7 or 0x90000000/0x70000000 on H7. The QUADSPI hardware transparently issues read commands to the flash chip whenever the CPU (or DMA) reads from that address range.
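"Transparently" is not free: each fetch the QUADSPI peripheral cannot serve from its prefetch FIFO becomes a complete 0xEB transaction on the bus. This sketch counts the QSPI clocks using the W25Q128JV's 1-4-4 phase lengths; qspi_eb_read_clocks is a hypothetical helper, and peripheral overhead and chip-select high time are ignored, so real figures are somewhat worse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* QSPI clocks for one 0xEB read of n_bytes:
 *   instruction  8 bits on 1 line  -> 8 clocks (omitted when the flash is
 *                                     in continuous-read mode with SIOO)
 *   address     24 bits on 4 lines -> 6 clocks
 *   mode byte    8 bits on 4 lines -> 2 clocks
 *   dummy                          -> 4 clocks
 *   data      8*n bits on 4 lines  -> 2*n clocks */
static uint32_t qspi_eb_read_clocks(uint32_t n_bytes, bool instruction_sent)
{
    uint32_t clocks = 6u + 2u + 4u + 2u * n_bytes;
    if (instruction_sent) {
        clocks += 8u;
    }
    return clocks;
}

/* Convert a clock count to nanoseconds at the given QSPI clock. */
static uint32_t clocks_to_ns(uint32_t clocks, uint32_t qspi_hz)
{
    return (uint32_t)(((uint64_t)clocks * 1000000000ull) / qspi_hz);
}
```

At 84 MHz, a random 4-byte read takes 28 clocks (about 333 ns, roughly 56 cycles of a 168 MHz CPU), and a 32-byte line fill takes 84 clocks, about 1 µs.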

Practical applications of XIP mode:

  • Large lookup tables: Sine/cosine tables for motor control, gamma correction tables for displays, dithering matrices for audio DACs.
  • Font and graphics assets: Bitmap fonts, icons, UI images — particularly useful for colour TFT display projects where the internal flash would be entirely consumed by assets.
  • Audio samples: WAV or raw PCM data for tone generation or speech synthesis.
  • Executable code: The linker can place entire functions or modules in a custom .text_qspi section (defined in your linker script) and execute them directly from the flash, effectively giving you 16 MB of addressable program memory. Be aware of the fetch latency: each non-sequential access that misses the cache (the F4 has no cache on this path at all) costs a complete read transaction of about 20 QSPI clocks plus data, which is tens of CPU cycles. Keep ISRs and other latency-critical code in internal flash or RAM. The flash also cannot be erased or programmed while memory-mapped mode is active; you must leave it with HAL_QSPI_Abort() first, so the erase/program routines themselves must not run from QSPI.
/* ================================================================
 * Enable QUADSPI Memory-Mapped Mode (XIP)
 * Maps W25Q128JV to 0x90000000 on STM32F4/F7.
 * Needs the QE bit set. Leave the mode with HAL_QSPI_Abort() before
 * issuing any erase or program command.
 * ================================================================ */

#include "stm32f4xx_hal.h"
#include <stdio.h>

extern QSPI_HandleTypeDef hqspi;   /* set up by MX_QUADSPI_Init() */

#define QSPI_XIP_BASE  0x90000000UL

HAL_StatusTypeDef qspi_enable_memory_mapped(void)
{
    QSPI_CommandTypeDef      cmd  = {0};
    QSPI_MemoryMappedTypeDef cfg  = {0};

    /* Read command the memory-mapped engine issues for each fetch */
    cmd.InstructionMode    = QSPI_INSTRUCTION_1_LINE;
    cmd.Instruction        = 0xEB;            /* Fast Read Quad I/O */
    cmd.AddressMode        = QSPI_ADDRESS_4_LINES;
    cmd.AddressSize        = QSPI_ADDRESS_24_BITS;
    cmd.AlternateByteMode  = QSPI_ALTERNATE_BYTES_4_LINES;
    cmd.AlternateBytes     = 0xFF;  /* mode bits M5-4 != 10b: the flash
                                       does NOT enter continuous read mode */
    cmd.AlternateBytesSize = QSPI_ALTERNATE_BYTES_8_BITS;
    cmd.DummyCycles        = 4;
    cmd.DataMode           = QSPI_DATA_4_LINES;
    cmd.DdrMode            = QSPI_DDR_MODE_DISABLE;
    /* Send the instruction on every access. Sending it only once (SIOO)
     * works only if the mode byte puts the flash in continuous read mode
     * (M5-4 = 10b, e.g. 0x20), which 0xFF deliberately does not. */
    cmd.SIOOMode           = QSPI_SIOO_INST_EVERY_CMD;

    /* Timeout counter off: nCS stays active between accesses */
    cfg.TimeOutActivation = QSPI_TIMEOUT_COUNTER_DISABLE;

    HAL_StatusTypeDef ret =
        HAL_QSPI_MemoryMapped(&hqspi, &cmd, &cfg);

    if (ret != HAL_OK) {
        printf("XIP enable failed: %d\r\n", ret);
        return ret;
    }

    printf("XIP active: flash mapped at 0x%08lX\r\n",
           (unsigned long)QSPI_XIP_BASE);
    return HAL_OK;
}

/* Example: read 1 KB of data through the XIP memory map */
void xip_read_demo(void)
{
    /* After qspi_enable_memory_mapped(), just dereference the pointer */
    const uint8_t *flash_ptr = (const uint8_t *)QSPI_XIP_BASE;

    uint32_t checksum = 0;
    for (uint32_t i = 0; i < 1024; i++) {
        checksum += flash_ptr[i];   /* CPU read; QUADSPI fetches automatically */
    }

    printf("XIP 1 KB checksum: 0x%08lX\r\n", (unsigned long)checksum);

    /* Function pointer example: call a function stored in QSPI flash
     * (the function must be placed in .text_qspi by the linker script) */
    typedef void (*qspi_func_t)(void);
    qspi_func_t qspi_entry =
        (qspi_func_t)(QSPI_XIP_BASE | 0x00000001UL); /* Thumb bit */
    (void)qspi_entry;   /* qspi_entry(); -- uncomment if you have code flashed there */
}
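XIP and Caches on H7: On STM32H7, use the MPU to set attributes for the QSPI region (0x90000000–0x9FFFFFFF). For mapped flash that is only read, a cacheable Write-Through region is common; after you reprogram the flash, invalidate the affected D-cache lines (SCB_InvalidateDCache_by_Addr()) so the CPU does not read stale data, or mark the region Non-Cacheable if its contents change often. The Cortex-M7 instruction cache is not kept coherent with flash updates either: if code in the QSPI region changes, call SCB_InvalidateICache() before executing it. ST's H7 examples also cover the full 256 MB QSPI range with a no-access MPU region and enable only the populated flash size on top of it, which keeps speculative reads away from the peripheral while memory-mapped mode is off.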

Wear-Levelling for Data Logging

NOR flash endurance is finite: the W25Q128JV guarantees 100,000 erase cycles per 4 KB sector. A logger that erases and rewrites the same sector once per second would exhaust it after 100,000 s, about 28 hours. The solution is circular (round-robin) logging: fill sectors one after another and erase a sector only when the write pointer wraps back around to it. With N sectors in the ring, each sector is erased once per N sector-fills, so it absorbs only 1/N of the total erases.

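With 128 sectors (512 KB log area), the time until the first sector wears out is endurance × sectors × records per sector ÷ records per second. A 100 Hz logger writing 128-byte records fills a 4 KB sector (32 records, ignoring the header) every 0.32 s and laps the 128-sector ring every 41 s, so it reaches 100,000 erases after about 4.1 million seconds: roughly 47 days. A 1 Hz logger writing 64-byte records (about 64 per sector) in the same ring lasts roughly 26 years. Size the ring from your real data rate; if it is too small, use more of the chip or move high-rate data to an SD card.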
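The lifetime formula translates directly into code, which makes it easy to try other ring sizes and record rates. A host-side sketch with the hypothetical helper ring_lifetime_s; it assumes an integer number of records per second and ignores header overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until the most-worn sector of a round-robin log reaches its
 * rated erase count. Every sector is erased once per lap of the ring,
 * and one lap holds sectors_in_ring x records_per_sector records. */
static uint64_t ring_lifetime_s(uint64_t endurance_cycles,
                                uint32_t sectors_in_ring,
                                uint32_t records_per_sector,
                                uint32_t records_per_s)
{
    assert(records_per_s > 0);
    uint64_t records_per_lap = (uint64_t)sectors_in_ring * records_per_sector;
    /* laps until wear-out x seconds per lap */
    return endurance_cycles * records_per_lap / records_per_s;
}
```

ring_lifetime_s(100000, 128, 32, 100) returns 4,096,000 s (about 47 days), while ring_lifetime_s(100000, 4096, 32, 100), which uses the whole 16 MB chip, returns about 1.3 × 10^8 s, roughly 4 years.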

Sector Header Design

Each sector in the ring carries a small header that allows the firmware to find the current write position after power-cycling, without scanning every byte of flash:

  • Magic word (4 bytes): e.g., 0xDEADC0DE. A blank (erased) sector has 0xFFFFFFFF — instantly distinguishable.
  • Sequence number (4 bytes): Monotonically increasing 32-bit counter. The sector with the highest sequence number in the ring is the most recently written one.
  • Record count (2 bytes): How many records are currently stored in this sector.
  • CRC (2 bytes): CRC-16 over the preceding 10 header bytes; detects a header left partly programmed or partly erased by a power loss.
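The article does not fix a particular CRC-16 variant. The sketch below assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR), a common choice for small headers; crc16_ccitt is a hypothetical helper name. In the log it would be computed over the 10 bytes before header_crc (magic, sequence and record count).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE, bitwise (table-free) version: small and fast
 * enough for a 10-byte header. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

An erased header (all 0xFF) already fails the magic-word check; the CRC catches the harder case of a header that was only partly programmed when power dropped.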
/* ================================================================
 * Circular QSPI Log: wear-levelled logging with sequence numbers
 * Log area: 128 sectors x 4 KB = 512 KB starting at QSPI addr 0
 * Record size: 64 bytes (fixed). The header is padded to one 64-byte
 * slot so every record sits inside a single 256-byte program page.
 * Call circular_log_find_latest() once at boot before any write.
 * ================================================================ */

#include "stm32f4xx_hal.h"
#include <stdio.h>

/* QSPI driver functions from the W25Q128JV section */
HAL_StatusTypeDef qspi_sector_erase(uint32_t addr);
HAL_StatusTypeDef qspi_page_program(uint32_t addr, const uint8_t *data,
                                    uint16_t len);
HAL_StatusTypeDef qspi_read_data(uint32_t addr, uint8_t *buf, uint32_t len);

#define LOG_SECTOR_SIZE    4096U
#define LOG_SECTOR_COUNT   128U
#define LOG_RECORD_SIZE    64U
#define LOG_MAGIC          0xDEADC0DEUL
#define LOG_HEADER_SIZE    12U  /* magic(4) + seq(4) + reccount(2) + crc(2) */
#define LOG_HEADER_SLOT    64U  /* header padded to one record slot */
#define LOG_RECS_PER_SEC   ((LOG_SECTOR_SIZE - LOG_HEADER_SLOT) / LOG_RECORD_SIZE) /* 63 */

typedef struct {
    uint32_t magic;
    uint32_t sequence;
    uint16_t record_count;   /* written once as 0; see note in write */
    uint16_t header_crc;
} LogSectorHeader;

static uint32_t current_sector = 0;
static uint32_t current_seq    = 0;
static uint16_t current_rec    = 0;

static uint32_t sector_addr(uint32_t s) { return s * LOG_SECTOR_SIZE; }

static uint32_t record_addr(uint32_t s, uint32_t r)
{
    return sector_addr(s) + LOG_HEADER_SLOT + r * LOG_RECORD_SIZE;
}

/* A slot is free if every byte still reads 0xFF (erased).
 * Records must therefore never consist entirely of 0xFF bytes. */
static int slot_is_erased(uint32_t addr)
{
    uint8_t buf[LOG_RECORD_SIZE];
    if (qspi_read_data(addr, buf, sizeof(buf)) != HAL_OK) return 0;
    for (uint32_t i = 0; i < sizeof(buf); i++) {
        if (buf[i] != 0xFF) return 0;
    }
    return 1;
}

/* Power-on resume: find the sector with the highest sequence number,
 * then count its used slots by scanning for the first erased one. */
uint32_t circular_log_find_latest(void)
{
    uint32_t best_seq    = 0;
    uint32_t best_sector = 0;
    int      found       = 0;
    LogSectorHeader hdr;

    for (uint32_t s = 0; s < LOG_SECTOR_COUNT; s++) {
        if (qspi_read_data(sector_addr(s), (uint8_t *)&hdr, sizeof(hdr)) != HAL_OK) {
            continue;
        }
        /* TODO (Exercise 3): also reject headers whose CRC-16 fails */
        if (hdr.magic == LOG_MAGIC && (!found || hdr.sequence > best_seq)) {
            best_seq    = hdr.sequence;
            best_sector = s;
            found       = 1;
        }
    }

    if (!found) {
        /* Blank log: treat the last sector as full so the first write
         * erases sector 0 and stamps it with sequence 1. */
        current_sector = LOG_SECTOR_COUNT - 1;
        current_seq    = 0;
        current_rec    = LOG_RECS_PER_SEC;
        printf("Resume: blank log\r\n");
        return current_sector;
    }

    current_sector = best_sector;
    current_seq    = best_seq;
    current_rec    = 0;
    while (current_rec < LOG_RECS_PER_SEC &&
           !slot_is_erased(record_addr(best_sector, current_rec))) {
        current_rec++;
    }

    printf("Resume: sector=%lu seq=%lu rec=%u\r\n",
           (unsigned long)best_sector, (unsigned long)best_seq,
           (unsigned)current_rec);
    return best_sector;
}

/* Write one 64-byte record to the circular log */
HAL_StatusTypeDef circular_log_write(const uint8_t *record)
{
    /* If current sector is full, advance to the next sector */
    if (current_rec >= LOG_RECS_PER_SEC) {
        current_sector = (current_sector + 1) % LOG_SECTOR_COUNT;
        current_seq++;
        current_rec = 0;

        /* Erase new sector */
        if (qspi_sector_erase(sector_addr(current_sector)) != HAL_OK) {
            return HAL_ERROR;
        }

        /* Write sector header */
        LogSectorHeader hdr = {
            .magic        = LOG_MAGIC,
            .sequence     = current_seq,
            .record_count = 0,
            .header_crc   = 0  /* TODO (Exercise 3): CRC-16 over bytes 0..9 */
        };
        if (qspi_page_program(sector_addr(current_sector),
                              (const uint8_t *)&hdr, sizeof(hdr)) != HAL_OK) {
            return HAL_ERROR;
        }
    }

    /* Program the record into its page-aligned slot */
    HAL_StatusTypeDef ret =
        qspi_page_program(record_addr(current_sector, current_rec),
                          record, LOG_RECORD_SIZE);
    if (ret != HAL_OK) return ret;

    /* NOR bits can only go from 1 to 0, so the header's record_count
     * cannot be incremented in place without an erase. This version
     * leaves it at 0 and recovers the fill level at boot by scanning for
     * the first erased slot (see circular_log_find_latest). A bitfield
     * that clears one bit per record is a common alternative. */
    current_rec++;
    return HAL_OK;
}
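Why does the header size matter? A page-program command wraps at the 256-byte page boundary, so any record slot that straddles a boundary needs two commands (or it corrupts the start of its page). This host-side sketch, using the hypothetical helper count_page_crossings, counts such slots in one 4 KB sector for a given header size.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_SECTOR_BYTES 4096u
#define LOG_PAGE_BYTES   256u

/* Count fixed-size record slots in one sector that straddle a
 * program-page boundary when records follow a header of header_size. */
static uint32_t count_page_crossings(uint32_t header_size, uint32_t record_size)
{
    uint32_t crossings = 0;
    uint32_t n_records = (LOG_SECTOR_BYTES - header_size) / record_size;
    for (uint32_t i = 0; i < n_records; i++) {
        uint32_t start = header_size + i * record_size;
        uint32_t last  = start + record_size - 1u;   /* last byte of the slot */
        if (start / LOG_PAGE_BYTES != last / LOG_PAGE_BYTES) {
            crossings++;
        }
    }
    return crossings;
}
```

With a 12-byte header placed directly before 64-byte records, 15 of the 63 slots in each sector straddle a page boundary; padding the header to a full 64-byte slot brings that to zero at the cost of one record per sector.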

FatFS on QSPI Flash

For applications that need a proper filesystem on QSPI NOR flash — rather than the custom circular log above — there are two practical options:

LittleFS (Recommended for NOR Flash)

LittleFS is an open-source, fail-safe filesystem for microcontrollers, developed at Arm (originally for Mbed OS) and well suited to NOR flash. It provides:

  • Built-in wear levelling: Dynamic wear-levelling distributes erasures automatically without any application-level sector management.
  • Power-loss resilience: Copy-on-write (COW) metadata updates guarantee filesystem consistency even on sudden power loss.
  • Small footprint: Bounded RAM use, set by the configured cache and lookahead sizes (typically a few KB), and a configurable block size that works with 4 KB NOR sectors.
  • Simple port: Provide four function pointers (read, prog, erase, sync) pointing to your QSPI driver — that's the entire HAL layer.

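The LittleFS port to QSPI consists of four callbacks plus the geometry fields of the lfs_config structure. LittleFS passes each callback a block number and an offset, so each callback is a thin wrapper that computes the byte address (block × block_size + offset) and calls your QSPI driver:

  • lfs_cfg.read → wrapper around qspi_read_data()
  • lfs_cfg.prog → wrapper around qspi_page_program()
  • lfs_cfg.erase → wrapper around qspi_sector_erase()
  • lfs_cfg.sync → a no-op returning 0 for NOR flash (no write buffer to flush)
  • lfs_cfg.read_size / prog_size / block_size = 256 / 256 / 4096, plus block_count (4096 blocks for the whole 16 MB chip), cache_size, lookahead_size and block_cycles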
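Geometry mistakes are easy to make when porting, so it can help to check the numbers before the first format. The sketch below checks the size rules stated in the comments of littlefs v2's lfs.h (cache_size must be a multiple of read_size and prog_size and a factor of block_size), plus two assumptions of this port: blocks are whole 4 KB erase sectors, and the filesystem fits on the chip. lfs_geometry_t is a hypothetical stand-in for the relevant lfs_config fields so it runs on a host without the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the geometry fields of struct lfs_config. */
typedef struct {
    uint32_t read_size, prog_size, block_size, block_count, cache_size;
} lfs_geometry_t;

static bool lfs_geometry_ok(const lfs_geometry_t *g, uint32_t flash_bytes,
                            uint32_t erase_size)
{
    if (!g->read_size || !g->prog_size || !g->cache_size || !g->block_size) {
        return false;
    }
    /* lfs.h rules: cache is a multiple of read/prog size ... */
    if (g->cache_size % g->read_size || g->cache_size % g->prog_size) return false;
    /* ... and a factor of the block size */
    if (g->block_size % g->cache_size) return false;
    /* Port assumption: the erase callback erases whole NOR sectors */
    if (g->block_size % erase_size) return false;
    /* The filesystem must fit on the device */
    return (uint64_t)g->block_size * g->block_count <= flash_bytes;
}
```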

FATFS Diskio Layer on QSPI

If you need FAT compatibility (files readable by a PC after extraction, or compatibility with existing FATFS-based code), you can implement the FATFS diskio layer on top of your QSPI driver. The key consideration is that FATFS does not understand NOR flash erase semantics — a sector write always requires erase-then-program. The disk_write() implementation must:

  1. Read the 4 KB NOR sector into a RAM buffer.
  2. Modify the 512-byte FATFS logical sector within that buffer.
  3. Erase the 4 KB NOR sector.
  4. Re-program all 4 KB from the RAM buffer.

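This read-modify-erase-write cycle needs a 4 KB RAM buffer and costs a full 4 KB erase for every 512-byte logical sector written: writing the eight logical sectors of one NOR sector one at a time erases that NOR sector eight times. Because FATFS rewrites its FAT and directory sectors on most file updates, the NOR sectors holding them wear fastest. For small NOR flash devices used as configuration storage (not bulk logging), this is acceptable. For high-write applications, LittleFS or the custom circular log is preferable.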
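The four steps above can be sketched as a helper that writes one 512-byte logical sector. It runs against a RAM-simulated NOR device and counts erases per NOR sector so the wear cost is visible. Assumptions: the nor_* functions are hypothetical stand-ins for the QSPI driver, and in a real diskio.c the FATFS disk_write(pdrv, buff, sector, count) function would call this helper once per logical sector (or batch sectors that share a NOR sector).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAT_SECTOR_SIZE  512u
#define NOR_SECTOR_SIZE  4096u
#define NOR_PAGE_SIZE    256u
#define NOR_SECTOR_COUNT 8u   /* small simulated device */

static uint8_t  nor_sim[NOR_SECTOR_COUNT * NOR_SECTOR_SIZE];
static uint32_t erase_count[NOR_SECTOR_COUNT];  /* observe wear per NOR sector */
static uint8_t  rmw_buf[NOR_SECTOR_SIZE];       /* the 4 KB RAM cost */

/* Hypothetical RAM stand-ins for the QSPI driver */
static void nor_read(uint32_t addr, uint8_t *dst, uint32_t len) {
    memcpy(dst, &nor_sim[addr], len);
}
static void nor_erase_sector(uint32_t addr) {
    memset(&nor_sim[addr], 0xFF, NOR_SECTOR_SIZE);
    erase_count[addr / NOR_SECTOR_SIZE]++;
}
static void nor_program_page(uint32_t addr, const uint8_t *src) {
    for (uint32_t i = 0; i < NOR_PAGE_SIZE; i++) nor_sim[addr + i] &= src[i];
}

/* Write one 512-byte FATFS logical sector (LBA) by read-modify-erase-write */
static void nor_write_logical_sector(uint32_t lba, const uint8_t *data) {
    uint32_t byte_addr   = lba * FAT_SECTOR_SIZE;
    uint32_t sector_base = byte_addr - (byte_addr % NOR_SECTOR_SIZE);
    uint32_t offset      = byte_addr - sector_base;

    nor_read(sector_base, rmw_buf, NOR_SECTOR_SIZE);      /* 1. read 4 KB        */
    memcpy(&rmw_buf[offset], data, FAT_SECTOR_SIZE);      /* 2. modify 512 bytes */
    nor_erase_sector(sector_base);                        /* 3. erase 4 KB       */
    for (uint32_t p = 0; p < NOR_SECTOR_SIZE; p += NOR_PAGE_SIZE)
        nor_program_page(sector_base + p, &rmw_buf[p]);   /* 4. re-program 4 KB  */
}
```

Writing logical sectors 0 to 7 one call at a time erases NOR sector 0 eight times, which is exactly the write amplification described above.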

Choosing the Right Approach: Use SD card + FATFS for large files and PC interoperability. Use LittleFS on QSPI for wear-safe embedded storage. Use the circular log for high-throughput, fixed-size records. Use FATFS on QSPI only when you specifically need FAT compatibility on NOR flash and writes are infrequent.

Exercises

Exercise 1 (Beginner): Mount an SD Card and Read/Write a File

Mount a FAT32 microSD card using SDMMC (4-bit mode). Create a file called test.txt, write the string "Hello SD Card!", close the file, re-open it for reading, read it back, and print the contents over UART. Use f_getfree() to display the available space in megabytes. Verify the file is visible on a PC when the card is removed and inserted into a card reader.

Topics: FATFS, SDMMC, f_open, f_getfree
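For the free-space readout, FATFS's f_getfree(path, &fre_clust, &fs) returns the number of free clusters, and fs->csize gives sectors per cluster. A minimal conversion sketch, assuming 512-byte sectors (the usual SD-card configuration) and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Converts f_getfree() output to MiB (1 MiB = 1024 * 1024 bytes).
 * Assumes 512-byte sectors. The product is formed in 64 bits because
 * free space above 4 GiB would overflow a 32-bit byte count. */
static uint32_t free_mib(uint32_t free_clusters, uint32_t sectors_per_cluster) {
    uint64_t free_bytes = (uint64_t)free_clusters * sectors_per_cluster * 512u;
    return (uint32_t)(free_bytes / (1024u * 1024u));
}
```

On the target it would be called as free_mib(fre_clust, fs->csize) after a successful f_getfree("", &fre_clust, &fs).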
Exercise 2 (Intermediate): Triggered High-Rate Datalogger

Build a triggered datalogger. On the first button press, start recording ADC readings at 10 kHz to an SD card file (adc_log.bin), using DMA-driven ADC and a double-buffer scheme so that writing to the SD card does not interrupt the ADC stream. On the second button press, stop recording, append a CRC32 computed over all the recorded sample data as the file's final 4 bytes, and close the file. Verify the file and checksum integrity with a Python script on a PC.

Topics: DMA double buffer, ADC at 10 kHz, CRC32, Python verification
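For the checksum to verify easily on the PC, the firmware CRC32 should match Python's zlib.crc32(). A minimal bitwise sketch of that CRC (reflected polynomial 0xEDB88320, the IEEE 802.3 variant) follows; the function name is hypothetical. It supports chaining, so each half-buffer can be folded in from the DMA half-complete and complete callbacks as it is written to the card.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected poly 0xEDB88320), same result as
 * Python's zlib.crc32(). Start with crc = 0 and pass the previous return
 * value to continue over the next buffer. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

If the 4-byte result is appended little-endian, the PC-side check reduces to comparing `zlib.crc32(data[:-4])` with `int.from_bytes(data[-4:], "little")`.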
Exercise 3 (Advanced): Power-Loss–Safe Circular QSPI Log

Connect a W25Q128JV QSPI flash. Implement the wear-levelled circular log from Section 6, extended with: (a) a CRC-16 over the sector header to detect partial writes, (b) a sector sequence that wraps correctly from sector 127 to 0, and (c) a verified power-loss test: power-cycle the STM32 mid-write (remove VDD while the sector header is being written), then verify on restart that no records are missing and that the next write continues from the correct position. Log sensor data plus an RTC timestamp in each 64-byte record. Verify that 10,000 records are written and read back with zero corruption.

Topics: QSPI, wear levelling, CRC-16, power-loss safety
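Parts (a) and (b) can start from a sketch like the one below. It uses CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), which is one reasonable choice since the exercise does not name a variant. The ExHeader layout, the magic value in the usage, and all function names are hypothetical and separate from the article's LogSectorHeader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_NUM_SECTORS 128u   /* ring of sectors 0..127, as in the exercise */

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical header layout: the CRC covers every field before it */
typedef struct {
    uint32_t magic;
    uint32_t seq;
    uint16_t crc;
} ExHeader;

/* Rejects erased headers (wrong magic) and headers whose CRC does not match */
static int header_valid(const ExHeader *h, uint32_t expected_magic) {
    if (h->magic != expected_magic) return 0;
    return crc16_ccitt((const uint8_t *)h, offsetof(ExHeader, crc)) == h->crc;
}

/* Next sector in the ring: 127 wraps to 0 */
static uint32_t next_sector(uint32_t s) { return (s + 1u) % LOG_NUM_SECTORS; }
```

A header whose programming was cut off by the power-loss test will, with high probability, fail either the magic check or the CRC check, so the restart logic can treat that sector as never opened.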

STM32 Storage Configuration Tool

Use this tool to document your external storage design — storage type selection, SD card bus configuration, QSPI chip details, FATFS settings, and wear-levelling strategy. Download as Word, Excel, PDF, or PPTX for project documentation or design review.


Conclusion & Next Steps

In this article we have built a complete toolkit for STM32 external storage:

  • Storage landscape: The main external storage options (microSD over SDMMC or SPI, SPI NOR flash, QSPI NOR flash, OctoSPI flash, SPI NAND, and I2C EEPROM) were compared across capacity, interface speed, write/erase granularity, endurance, XIP capability, and cost. Choose based on your application's read/write pattern, not just capacity.
  • SDMMC + FATFS: The SDMMC peripheral with 4-bit DMA gives ~12 MB/s throughput. FATFS provides a robust FAT32/exFAT layer with a POSIX-like API. The combination is ideal for field-deployable data loggers where SD cards are removed for PC-side analysis.
  • FATFS file operations: Mastering open flags, error codes, f_sync() for power-loss safety, and f_getfree() for capacity monitoring transforms your code from a demo into a production-quality logger.
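  • QSPI NOR flash driver: The HAL_QSPI_Command / HAL_QSPI_Transmit / HAL_QSPI_Receive trio drives the W25Q128JV commands used in this article. Auto-polling hands BUSY-bit polling to the QUADSPI peripheral instead of a software loop, and the interrupt variant HAL_QSPI_AutoPolling_IT() leaves the CPU free until the status match occurs.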
  • XIP memory-mapped mode: HAL_QSPI_MemoryMapped() maps 16 MB of NOR flash to the CPU's address space, enabling direct pointer access to lookup tables, font data, audio assets, and even executable code without explicit read commands.
  • Wear-levelled circular log: Distributing erase cycles across N sectors via a sequence-number ring multiplies the log's erase budget by N, because each sector is erased only once per pass around the ring (the W25Q128JV is rated for 100,000 erase cycles per sector). Sector headers with magic words and sequence numbers let the firmware resume at power-on by reading only the headers instead of scanning every record.
  • Filesystem selection: LittleFS is the right choice for a proper filesystem on NOR flash: built-in wear levelling, power-loss safety, and a compact footprint make it a widely used choice for embedded non-volatile storage.

Next in the Series

In Part 17: Ethernet & TCP/IP Stack, we will wire up an external Ethernet PHY (LAN8720A), integrate LwIP, configure DHCP, build a UDP telemetry sender, implement a TCP server and HTTP endpoint, and publish sensor data over MQTT — transforming your STM32 into a fully networked embedded node.
