
STM32 Part 6: SPI Protocol

March 31, 2026 Wasil Zafar 25 min read

From bit-order and clock phase theory through DMA full-duplex transfers to a reusable multi-device SPI bus manager — everything you need to connect sensors, displays, and flash chips to STM32 via SPI.

Table of Contents

  1. SPI Protocol Fundamentals
  2. HAL SPI API
  3. SPI DMA Mode
  4. Multi-Device SPI Bus
  5. Sensor Driver: W25Q128 Flash
  6. SPI vs I2C vs UART
  7. Exercises
  8. SPI Design Tool
  9. Conclusion & Next Steps
Series Overview: This is Part 6 of our 18-part STM32 Unleashed series. In this part we master SPI in polling, interrupt, and DMA modes, then build a reusable multi-device bus manager and a complete W25Q128 NOR flash driver.

SPI Protocol Fundamentals

The Serial Peripheral Interface (SPI) is a synchronous, full-duplex serial communication protocol developed by Motorola in the 1980s. Unlike UART (no clock line) or I2C (open-drain, shared clock and data), SPI uses a dedicated clock driven by the master and separate data lines for transmit and receive. This makes SPI inherently faster and simpler at the hardware level, at the cost of requiring more pins and a dedicated chip-select line per slave device.

SPI is the protocol of choice for high-speed peripherals: SPI NOR flash (W25Q128 at up to 104 MHz), TFT displays (ST7789 at up to 62.5 MHz), ADCs (ADS8865 at up to 50 MHz), and RF transceivers (CC1101 at up to 6.5 MHz). Reliable driver development depends on understanding its timing model precisely.

4-Wire SPI: SCLK, MOSI, MISO, NSS

The classic SPI bus has four signals, each with a defined role:

  • SCLK (Serial Clock) — always driven by the master. Slaves use it to sample incoming data and to clock out their own data. The frequency is set by the master and must not exceed the slave's maximum rated clock speed.
  • MOSI (Master Out Slave In) — data driven by the master and received by the selected slave. On STM32, this maps to the SPI TX pin.
  • MISO (Master In Slave Out) — data driven by the selected slave and received by the master. On STM32, this maps to the SPI RX pin. In a half-duplex or simplex scenario this line may be absent.
  • NSS / CS (Chip Select) — active-low signal driven by the master to select a specific slave. When NSS is asserted low, the addressed slave drives MISO and accepts MOSI data; all unselected slaves tri-state their MISO output.
Multiple Slaves: With N slaves sharing SCLK, MOSI, and MISO, you need N individual CS lines. The STM32 hardware NSS pin can only manage one CS automatically; for multi-device buses, use software NSS (GPIO outputs) for every device, including the first.

Clock Polarity (CPOL) and Phase (CPHA) — 4 Modes

SPI defines two clock parameters that together determine when data is valid relative to clock edges. Getting these wrong is the single most common cause of SPI communication failure.

  • CPOL (Clock Polarity) defines the idle (inactive) state of SCLK. CPOL=0 means SCLK idles low; CPOL=1 means SCLK idles high.
  • CPHA (Clock Phase) defines which edge latches data. CPHA=0 means data is captured on the first (leading) clock edge; CPHA=1 means data is captured on the second (trailing) clock edge.

| SPI Mode | CPOL | CPHA | SCLK Idle State | Data Captured On | Common Devices |
|---|---|---|---|---|---|
| Mode 0 | 0 | 0 | Low | Rising edge | W25Q128 flash, MPU-9250, SD cards, nRF24L01+, MAX7219 LED driver, MAX31855 thermocouple |
| Mode 1 | 0 | 1 | Low | Falling edge | ADS1118 ADC, some other ADCs |
| Mode 2 | 1 | 0 | High | Falling edge | Rare; some Atmel AVR configurations as slave |
| Mode 3 | 1 | 1 | High | Rising edge | ADXL345 accelerometer, MCP4921 DAC (also Mode 0), W25Q128 flash (also Mode 0) |

The rule of thumb: always check the slave device datasheet timing diagram first. Look for CPOL and CPHA values, or look at the clock waveform showing which edge transitions data. If a datasheet shows "clock idles high, data valid on rising edge" — that is CPOL=1, CPHA=1 → Mode 3.
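The relationship between CPOL, CPHA, the mode number, and the capture edge is small enough to encode directly. The following is a minimal host-side sketch; the function names `spi_mode_number` and `spi_captures_on_rising` are hypothetical helpers for illustration, not part of the STM32 HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: combine CPOL and CPHA (each 0 or 1) into the
 * conventional SPI mode number 0..3, where mode = (CPOL << 1) | CPHA. */
static inline uint8_t spi_mode_number(uint8_t cpol, uint8_t cpha)
{
    assert(cpol <= 1U && cpha <= 1U);
    return (uint8_t)((cpol << 1) | cpha);
}

/* True when data is captured on a rising SCLK edge.
 * CPOL=0: the leading edge rises and the trailing edge falls.
 * CPOL=1: the leading edge falls and the trailing edge rises.
 * CPHA selects leading (0) or trailing (1), so capture happens on a
 * rising edge exactly when CPOL == CPHA. */
static inline bool spi_captures_on_rising(uint8_t cpol, uint8_t cpha)
{
    assert(cpol <= 1U && cpha <= 1U);
    return cpol == cpha;
}
```

For the datasheet example above (clock idles high, data valid on the rising edge), `spi_mode_number(1, 1)` yields 3 and `spi_captures_on_rising(1, 1)` is true, which matches Mode 3.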

NSS Hardware vs Software Management

The STM32 SPI peripheral offers two NSS modes. Hardware NSS (SSM=0, with SSOE=1 for a master): the peripheral drives the NSS pin itself. On STM32F4 the pin goes low when the SPI is enabled and stays low until it is disabled, rather than framing each transfer, so it only suits simple single-slave setups. Software NSS (SSM=1, SSI=1): the peripheral ignores the NSS pin, and you drive an ordinary GPIO manually with HAL_GPIO_WritePin() before and after each transaction. Software NSS is almost always the right choice for multi-device buses or when precise CS timing control is needed.

In CubeMX: set NSS Signal Type to Software. Configure the CS GPIO as an output push-pull, initially high (deasserted). Toggle it manually in your driver code surrounding each SPI transfer.

/* ---------------------------------------------------------------
 * SPI1 Init — Mode 0 (CPOL=0, CPHA=0), 10 MHz, Software NSS
 * Target: STM32F4, SPI1 on APB2 (84 MHz) → Prescaler /8 = 10.5 MHz
 * --------------------------------------------------------------- */
SPI_HandleTypeDef hspi1;

static void MX_SPI1_Init(void)
{
    hspi1.Instance               = SPI1;
    hspi1.Init.Mode              = SPI_MODE_MASTER;
    hspi1.Init.Direction         = SPI_DIRECTION_2LINES;   /* full-duplex */
    hspi1.Init.DataSize          = SPI_DATASIZE_8BIT;
    hspi1.Init.CLKPolarity       = SPI_POLARITY_LOW;       /* CPOL = 0   */
    hspi1.Init.CLKPhase          = SPI_PHASE_1EDGE;        /* CPHA = 0   */
    hspi1.Init.NSS               = SPI_NSS_SOFT;           /* software   */
    hspi1.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_8;/* ~10.5 MHz  */
    hspi1.Init.FirstBit          = SPI_FIRSTBIT_MSB;
    hspi1.Init.TIMode            = SPI_TIMODE_DISABLE;
    hspi1.Init.CRCCalculation    = SPI_CRCCALCULATION_DISABLE;
    hspi1.Init.CRCPolynomial     = 10;

    if (HAL_SPI_Init(&hspi1) != HAL_OK) {
        Error_Handler();
    }
}
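The prescaler choice in the init code follows from simple arithmetic: the SPI clock is the peripheral bus clock divided by a power of two between 2 and 256. As a rough sketch, the selection can be automated on the host. The helper name `spi_pick_divider` is hypothetical, and mapping the returned divider to a constant such as SPI_BAUDRATEPRESCALER_8 is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the smallest power-of-two divider
 * (2, 4, ... 256) for which pclk_hz / divider <= max_sclk_hz,
 * or 0 if even /256 is still too fast for the slave. */
static uint16_t spi_pick_divider(uint32_t pclk_hz, uint32_t max_sclk_hz)
{
    assert(max_sclk_hz > 0U);
    for (uint16_t div = 2U; div <= 256U; div = (uint16_t)(div * 2U)) {
        /* Compare in 64 bits so the product cannot overflow and no
         * fractional clock is lost to integer division. */
        if ((uint64_t)max_sclk_hz * div >= pclk_hz) {
            return div;
        }
    }
    return 0U;
}
```

With an 84 MHz APB2 clock and a 10.5 MHz limit this returns 8, matching the configuration above. A strict 10 MHz limit returns 16 (5.25 MHz), since /8 would overshoot.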

HAL SPI API

STM32 HAL provides three transfer models for SPI: polling (blocking, the CPU waits until the transfer completes), interrupt (non-blocking, the CPU runs other code between short per-frame ISRs, and completion is signalled via a callback), and DMA (non-blocking, the DMA engine moves the data and the CPU is only interrupted at completion). Each model has HAL functions for transmit-only, receive-only, and full-duplex transmit+receive.

Polling Mode

Polling mode is the simplest — call the function, it blocks until the transfer is complete (or a timeout fires), then returns. Use it for low-throughput, latency-insensitive operations or during driver bring-up.

  • HAL_SPI_Transmit(&hspi1, pTxData, Size, Timeout) — sends Size bytes from pTxData. MISO is ignored.
  • HAL_SPI_Receive(&hspi1, pRxData, Size, Timeout) — clocks out dummy bytes on MOSI while capturing MISO into pRxData.
  • HAL_SPI_TransmitReceive(&hspi1, pTxData, pRxData, Size, Timeout) — true full-duplex: sends from pTxData while simultaneously receiving into pRxData.

Interrupt Mode and Callbacks

Interrupt variants append _IT to the function name. They return immediately (HAL_OK if the transfer was started); the HAL SPI ISR runs in the background and calls a callback when done:

  • HAL_SPI_TxCpltCallback(SPI_HandleTypeDef *hspi): transmit-only complete.
  • HAL_SPI_RxCpltCallback(SPI_HandleTypeDef *hspi): receive-only complete.
  • HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi): full-duplex complete.
  • HAL_SPI_ErrorCallback(SPI_HandleTypeDef *hspi): error occurred; check HAL_SPI_GetError(hspi).

The code below demonstrates the most common SPI pattern in embedded firmware: a JEDEC ID read from a W25Q128 NOR flash. The JEDEC command (0x9F) returns three bytes: manufacturer ID, memory type, and capacity. This is the canonical "hello world" for any SPI NOR flash driver.

/* ---------------------------------------------------------------
 * Read JEDEC ID from W25Q128 NOR flash via SPI polling
 * W25Q128: Mode 0 or Mode 3, up to 104 MHz clock
 * Expected response: Manufacturer=0xEF, Type=0x40, Capacity=0x18
 * CS_PIN: PA4 configured as GPIO output push-pull, idle HIGH
 * --------------------------------------------------------------- */
#define W25Q_CS_GPIO_PORT   GPIOA
#define W25Q_CS_PIN         GPIO_PIN_4
#define W25Q_CS_LOW()       HAL_GPIO_WritePin(W25Q_CS_GPIO_PORT, W25Q_CS_PIN, GPIO_PIN_RESET)
#define W25Q_CS_HIGH()      HAL_GPIO_WritePin(W25Q_CS_GPIO_PORT, W25Q_CS_PIN, GPIO_PIN_SET)

#define W25Q_CMD_JEDEC_ID   0x9F

typedef struct {
    uint8_t manufacturer;   /* 0xEF for Winbond */
    uint8_t mem_type;       /* 0x40 for W25Q series */
    uint8_t capacity;       /* 0x18 = 16 MB (128 Mbit) */
} W25Q_JEDEC_t;

HAL_StatusTypeDef w25q_read_jedec_id(SPI_HandleTypeDef *hspi, W25Q_JEDEC_t *jedec)
{
    uint8_t cmd    = W25Q_CMD_JEDEC_ID;
    uint8_t rx[3]  = {0};
    HAL_StatusTypeDef ret;

    W25Q_CS_LOW();

    /* Send command byte */
    ret = HAL_SPI_Transmit(hspi, &cmd, 1, HAL_MAX_DELAY);
    if (ret != HAL_OK) {
        W25Q_CS_HIGH();
        return ret;
    }

    /* Receive 3-byte ID response */
    ret = HAL_SPI_Receive(hspi, rx, 3, HAL_MAX_DELAY);
    W25Q_CS_HIGH();

    if (ret == HAL_OK) {
        jedec->manufacturer = rx[0];
        jedec->mem_type     = rx[1];
        jedec->capacity     = rx[2];
    }
    return ret;
}
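The same JEDEC read can also be framed as one full-duplex transfer instead of a transmit followed by a receive. The sketch below illustrates the byte layout only. The `spi_xfer_fn` hook is a hypothetical abstraction (on target it would wrap CS handling and HAL_SPI_TransmitReceive()), and `fake_w25q_xfer` is a host-side stand-in for the flash so the logic can be checked without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer hook: shifts len bytes out of tx while filling rx.
 * Returns 0 on success. */
typedef int (*spi_xfer_fn)(const uint8_t *tx, uint8_t *rx, uint16_t len);

/* Read the 3-byte JEDEC ID in a single 4-byte full-duplex frame.
 * Byte 0 carries the 0x9F command; the slave has nothing to send yet,
 * so rx[0] is discarded and the ID arrives in rx[1..3]. */
static int jedec_read_full_duplex(spi_xfer_fn xfer, uint8_t id[3])
{
    assert(xfer != NULL && id != NULL);
    const uint8_t tx[4] = { 0x9F, 0x00, 0x00, 0x00 };  /* command + 3 dummy bytes */
    uint8_t rx[4] = { 0 };

    int err = xfer(tx, rx, (uint16_t)sizeof tx);
    if (err != 0) {
        return err;
    }
    id[0] = rx[1];
    id[1] = rx[2];
    id[2] = rx[3];
    return 0;
}

/* Host-side stand-in for the flash, for desk-checking only: answers a
 * 0x9F command with the ID bytes quoted above (0xEF, 0x40, 0x18). */
static int fake_w25q_xfer(const uint8_t *tx, uint8_t *rx, uint16_t len)
{
    static const uint8_t id[3] = { 0xEF, 0x40, 0x18 };
    for (uint16_t i = 0; i < len; i++) {
        /* Outside the ID window the line is modelled as idle (0xFF). */
        rx[i] = (i >= 1U && i <= 3U && tx[0] == 0x9F) ? id[i - 1U] : 0xFF;
    }
    return 0;
}
```

The trade-off is one HAL call instead of two, at the cost of a slightly larger buffer and an offset of one byte when unpacking the response.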

SPI DMA Mode

DMA-driven SPI is indispensable for bandwidth-intensive peripherals. Consider an ST7789 TFT display at 320×240 pixels with 16-bit colour depth: a full-frame push is 320 × 240 × 2 = 153,600 bytes, or 1,228,800 bits. At 40 MHz SPI, that takes roughly 30.7 ms, which caps the refresh rate at about 32 fps, and with polling the CPU is blocked for the whole transfer. At 30 fps you would spend about 92% of CPU time just pushing pixels. DMA hands that work to hardware and frees the CPU for rendering.
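Frame-time estimates like this are easy to get wrong by a factor of eight (bytes versus bits), so it can help to compute them. A minimal sketch, assuming 8 SCLK cycles per byte and no gaps between bytes; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds to clock `bytes` bytes at sclk_hz,
 * assuming 8 SCLK cycles per byte and no inter-byte gaps. */
static uint32_t spi_transfer_time_us(uint32_t bytes, uint32_t sclk_hz)
{
    assert(sclk_hz > 0U);
    return (uint32_t)(((uint64_t)bytes * 8U * 1000000U) / sclk_hz);
}

/* Percentage of each second the bus is busy at a given frame rate.
 * Values above 100 mean the frame rate is not achievable. */
static uint32_t spi_bus_load_percent(uint32_t frame_us, uint32_t fps)
{
    return (uint32_t)(((uint64_t)frame_us * fps) / 10000U);
}
```

For a 153,600-byte frame at 40 MHz, `spi_transfer_time_us` returns 30,720 µs. At 30 fps the bus load is 92%, and at 60 fps it is 184%, meaning a 60 fps full-frame refresh would need a faster clock or a smaller update region.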

The key insight is that HAL_SPI_TransmitReceive_DMA() configures the DMA controller to read from the TX buffer and write to the RX buffer simultaneously, fires the SPI peripheral, and returns immediately to the caller. The DMA generates an interrupt on completion, which the HAL uses to invoke your callback — at which point you deassert CS and signal your application layer.

Critical Rule: Never deassert CS (drive it high) before the DMA completion callback fires. Deasserting early truncates the transfer. The callback is the only safe place to release CS in DMA mode.

The double-buffering (ping-pong) pattern below is the professional approach for continuous streaming. While DMA is pushing buffer A to the display, the CPU renders the next frame into buffer B. Once the DMA callback has signalled completion, the next flush swaps the pointers, so the CPU only waits if it finishes rendering before the previous frame has left the bus, and the display never receives a half-rendered buffer.

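/* ---------------------------------------------------------------
 * DMA SPI transmit to ST7789 TFT: double-buffer ping-pong
 * SPI2, DMA1 Stream4 (TX), DMA1 Stream3 (RX dummy)
 * Two framebuffers: 320x240x2 = 153,600 bytes each (307,200 total).
 * Many STM32F4 parts have less internal SRAM than that; place the
 * buffers in external RAM or use strip buffers if they do not fit.
 * HAL_SPI_Transmit_DMA() takes a uint16_t length, so each frame is
 * sent in chunks of at most DMA_CHUNK_MAX bytes, chained from the
 * completion callback while CS stays low.
 * ---------------------------------------------------------------
 * Requires: SPI2 + DMA (normal mode) configured in CubeMX,
 *           HAL_SPI_TxCpltCallback weak override below
 * --------------------------------------------------------------- */
#define LCD_WIDTH      320
#define LCD_HEIGHT     240
#define FB_SIZE        (LCD_WIDTH * LCD_HEIGHT * 2)  /* 16-bit colour */
#define DMA_CHUNK_MAX  65534U                        /* even, fits uint16_t */

static uint8_t framebuf_A[FB_SIZE];
static uint8_t framebuf_B[FB_SIZE];

static uint8_t *active_fb = framebuf_A;   /* DMA is sending this */
static uint8_t *render_fb = framebuf_B;   /* CPU renders into this */
static uint8_t * volatile tx_ptr;         /* next byte to hand to DMA */
static volatile uint32_t  tx_remaining;   /* bytes of this frame not yet queued */
static volatile uint8_t   dma_busy = 0;

#define LCD_CS_LOW()   HAL_GPIO_WritePin(GPIOB, GPIO_PIN_12, GPIO_PIN_RESET)
#define LCD_CS_HIGH()  HAL_GPIO_WritePin(GPIOB, GPIO_PIN_12, GPIO_PIN_SET)
#define LCD_DC_DATA()  HAL_GPIO_WritePin(GPIOB, GPIO_PIN_13, GPIO_PIN_SET)

/* Queue the next chunk of the active frame (thread or ISR context) */
static HAL_StatusTypeDef lcd_send_next_chunk(SPI_HandleTypeDef *hspi)
{
    uint16_t n = (tx_remaining > DMA_CHUNK_MAX) ? (uint16_t)DMA_CHUNK_MAX
                                                : (uint16_t)tx_remaining;
    uint8_t *chunk = tx_ptr;
    tx_ptr       += n;
    tx_remaining -= n;
    return HAL_SPI_Transmit_DMA(hspi, chunk, n);
}

/* Call this after rendering a frame into render_fb */
void lcd_flush_frame(SPI_HandleTypeDef *hspi)
{
    while (dma_busy) { /* wait for previous DMA to finish */ }

    /* Swap buffers */
    uint8_t *tmp = active_fb;
    active_fb = render_fb;
    render_fb = tmp;

    tx_ptr       = active_fb;
    tx_remaining = FB_SIZE;
    dma_busy     = 1;
    LCD_DC_DATA();
    LCD_CS_LOW();

    /* Launch the first DMA chunk; returns immediately */
    if (lcd_send_next_chunk(hspi) != HAL_OK) {
        LCD_CS_HIGH();
        dma_busy = 0;
    }
}

/* DMA completion callback: deassert CS here, NOT before */
void HAL_SPI_TxCpltCallback(SPI_HandleTypeDef *hspi)
{
    if (hspi->Instance == SPI2) {
        if (tx_remaining > 0U && lcd_send_next_chunk(hspi) == HAL_OK) {
            return;   /* more of this frame still to send */
        }
        LCD_CS_HIGH();
        dma_busy = 0;
        /* Optionally signal a FreeRTOS semaphore here */
    }
}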

For full-duplex DMA (e.g. reading sensor data while simultaneously sending a command), use HAL_SPI_TransmitReceive_DMA() and override HAL_SPI_TxRxCpltCallback() instead. Ensure both TX and RX DMA streams are configured in CubeMX.

Multi-Device SPI Bus

Real applications routinely attach 3–6 devices to a single SPI bus: a NOR flash, an accelerometer, a DAC, an LCD, and an RF transceiver, each with its own CS pin. The bus manager's job is to ensure exactly one CS is asserted at any time, that CS setup/hold timing requirements are met, and that in an RTOS environment, concurrent access attempts from different tasks are serialised through a mutex.

CS timing requirements: most SPI devices require a minimum CS setup time (t_CS) of 5–20 ns before the first SCLK edge, and a minimum hold time after the last SCLK edge (t_CH) of 5–10 ns. At low SPI speeds (a few MHz) these constraints are easily satisfied by the GPIO write itself. At very high speeds (40+ MHz), you may need to insert a single NOP between CS assert and the HAL_SPI call.
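How many NOPs a given setup time needs depends on the core clock. The sketch below does the conversion, assuming one core cycle per NOP. That assumption is optimistic, because Cortex-M cores are not required to spend a full cycle on a NOP, so treat the result as an estimate and confirm the timing on a scope. The helper name `cycles_for_ns` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Core clock cycles needed to cover delay_ns, rounded up. */
static uint32_t cycles_for_ns(uint32_t delay_ns, uint32_t core_hz)
{
    assert(core_hz > 0U);
    uint64_t product = (uint64_t)delay_ns * core_hz;   /* ns * cycles/s */
    return (uint32_t)((product + 999999999ULL) / 1000000000ULL);
}
```

At a 168 MHz core clock, a 5 ns setup time needs 1 cycle and a 20 ns setup time needs 4. In practice the HAL call overhead between the GPIO write and the first SCLK edge is usually far longer than either.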

In FreeRTOS, the SPI bus is a shared resource. The canonical pattern is a binary semaphore or mutex that any task must acquire before touching the SPI bus and must release immediately after deasserting CS. This prevents one task from interleaving its SPI bytes with another task's transaction.

/* ---------------------------------------------------------------
 * Multi-device SPI bus manager
 * Supports up to 8 devices sharing one SPI peripheral
 * Thread-safe: uses a FreeRTOS mutex for bus arbitration
 * --------------------------------------------------------------- */
#include "FreeRTOS.h"
#include "semphr.h"

typedef struct {
    SPI_HandleTypeDef *hspi;      /* SPI peripheral handle         */
    GPIO_TypeDef      *cs_port;   /* GPIO port for chip select     */
    uint16_t           cs_pin;    /* GPIO pin mask for chip select */
    uint32_t           max_speed; /* max SCLK in Hz (for logging)  */
    const char        *name;      /* device name string            */
} spi_device_t;

static SemaphoreHandle_t spi_bus_mutex = NULL;

/* Initialise the bus manager — call once before any transfers */
void spi_bus_init(void)
{
    spi_bus_mutex = xSemaphoreCreateMutex();
    configASSERT(spi_bus_mutex != NULL);
}

/* Thread-safe SPI full-duplex transfer for a specific device.
 * pTx and pRx may point to the same buffer for in-place operations.
 * Returns HAL_OK on success, HAL_TIMEOUT if bus is not free in time. */
HAL_StatusTypeDef spi_transfer(spi_device_t *dev,
                                const uint8_t *pTx,
                                uint8_t       *pRx,
                                uint16_t       len,
                                uint32_t       timeout_ms)
{
    if (xSemaphoreTake(spi_bus_mutex,
                       pdMS_TO_TICKS(timeout_ms)) != pdTRUE) {
        return HAL_TIMEOUT;
    }

    HAL_GPIO_WritePin(dev->cs_port, dev->cs_pin, GPIO_PIN_RESET); /* CS low  */
    __NOP(); __NOP();   /* ≥5 ns setup time at 168 MHz core */

    HAL_StatusTypeDef ret =
        HAL_SPI_TransmitReceive(dev->hspi,
                                (uint8_t *)pTx, pRx, len,
                                timeout_ms);   /* bound the transfer so a hung bus cannot hold the mutex forever */

    __NOP(); __NOP();   /* hold time before CS release */
    HAL_GPIO_WritePin(dev->cs_port, dev->cs_pin, GPIO_PIN_SET);   /* CS high */

    xSemaphoreGive(spi_bus_mutex);
    return ret;
}
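The invariant the bus manager protects, that no more than one CS line is asserted at a time, can also be checked mechanically, for example against GPIO port snapshots or exported logic-analyser samples. A minimal sketch, assuming all CS pins sit on one port and are active-low; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count set bits (portable popcount). */
static unsigned count_bits(uint32_t v)
{
    unsigned n = 0;
    while (v != 0U) {
        v &= v - 1U;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* port_levels: snapshot of a GPIO port (1 = pin high).
 * cs_mask:     bits that are chip-select pins on that port.
 * CS is active-low, so an asserted line reads 0. Returns true when at
 * most one CS line in the mask is asserted. */
static bool spi_cs_state_valid(uint32_t port_levels, uint32_t cs_mask)
{
    assert(cs_mask != 0U);
    uint32_t asserted = ~port_levels & cs_mask;
    return count_bits(asserted) <= 1U;
}
```

With CS lines on pins 0, 1, and 4 (mask 0x13), a snapshot of 0xFFEF (only pin 4 low) is valid, while 0xFFEE (pins 0 and 4 low) is not.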

Sensor Driver Example — W25Q128 NOR Flash

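The W25Q128FV is a 128 Mbit (16 MB) SPI NOR flash that supports Mode 0 and Mode 3 and is rated for SPI clocks up to 104 MHz in standard, dual, and quad modes. The plain Read Data command (0x03) has a lower clock limit than the fast-read commands, so check the datasheet before running it at full speed. The part is ubiquitous in embedded designs for firmware storage, logging, and configuration data. Understanding its command set is the foundation for any SPI flash driver.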

Key register commands: 0x9F (JEDEC ID), 0x05 (Read Status Register-1), 0x06 (Write Enable — must precede every program or erase), 0x02 (Page Program — write up to 256 bytes), 0x20 (Sector Erase — erase 4 KB at an address), 0x03 (Read Data — read any length). All address arguments are 24-bit (3 bytes, MSB first).
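Every addressed command shares the same 4-byte header: the opcode followed by the 24-bit address, most significant byte first. A small sketch of that packing, using a hypothetical helper name `w25q_build_header`:

```c
#include <assert.h>
#include <stdint.h>

/* Fill out[4] with an opcode followed by a 24-bit address, MSB first. */
static void w25q_build_header(uint8_t opcode, uint32_t addr, uint8_t out[4])
{
    assert(addr <= 0xFFFFFFUL);   /* addresses are 24-bit */
    out[0] = opcode;
    out[1] = (uint8_t)(addr >> 16);
    out[2] = (uint8_t)(addr >> 8);
    out[3] = (uint8_t)addr;
}
```

For example, a Read Data command (0x03) at address 0x123456 produces the bytes 0x03, 0x12, 0x34, 0x56 on MOSI.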

The status register BUSY bit (bit 0 of SR1) must be polled after any write or erase operation: the W25Q128 sets this bit during internal operations and clears it when they complete. Most commands issued while BUSY=1 are silently ignored, so a driver that skips the poll loses data without seeing any error.

/* ---------------------------------------------------------------
 * W25Q128 NOR Flash Driver: core operations
 * Assumes: SPI handle hspi1, CS on GPIOA pin 4 (W25Q_CS_LOW/HIGH macros)
 * --------------------------------------------------------------- */
#define W25Q_CMD_READ_SR1     0x05
#define W25Q_CMD_WRITE_EN     0x06
#define W25Q_CMD_READ_DATA    0x03
#define W25Q_CMD_PAGE_PROG    0x02
#define W25Q_CMD_SECT_ERASE   0x20
#define W25Q_SR1_BUSY_MASK    0x01
#define W25Q_PAGE_SIZE        256U
#define W25Q_PROG_TIMEOUT_MS  10U   /* page program is typically 0.4-3 ms */

/* Poll the BUSY bit until the flash is idle or timeout_ms elapses */
static HAL_StatusTypeDef w25q_wait_ready(SPI_HandleTypeDef *hspi,
                                         uint32_t timeout_ms)
{
    uint8_t  cmd   = W25Q_CMD_READ_SR1;
    uint8_t  sr1   = 0;
    uint32_t start = HAL_GetTick();

    for (;;) {
        W25Q_CS_LOW();
        HAL_StatusTypeDef ret = HAL_SPI_Transmit(hspi, &cmd, 1, HAL_MAX_DELAY);
        if (ret == HAL_OK) {
            ret = HAL_SPI_Receive(hspi, &sr1, 1, HAL_MAX_DELAY);
        }
        W25Q_CS_HIGH();

        if (ret != HAL_OK)                       return ret;
        if ((sr1 & W25Q_SR1_BUSY_MASK) == 0U)    return HAL_OK;
        if ((HAL_GetTick() - start) > timeout_ms) return HAL_TIMEOUT;
    }
}

/* Read len bytes from 24-bit flash address into buf */
HAL_StatusTypeDef w25q_read_data(SPI_HandleTypeDef *hspi,
                                  uint32_t addr, uint8_t *buf, uint32_t len)
{
    uint8_t cmd[4] = {
        W25Q_CMD_READ_DATA,
        (addr >> 16) & 0xFF,
        (addr >>  8) & 0xFF,
        (addr      ) & 0xFF
    };
    W25Q_CS_LOW();
    HAL_StatusTypeDef ret = HAL_SPI_Transmit(hspi, cmd, 4, HAL_MAX_DELAY);

    /* HAL_SPI_Receive() takes a uint16_t size, so read in chunks; the
     * flash keeps streaming sequential bytes while CS stays low */
    while (ret == HAL_OK && len > 0U) {
        uint16_t n = (len > 0xFFFFU) ? 0xFFFFU : (uint16_t)len;
        ret  = HAL_SPI_Receive(hspi, buf, n, HAL_MAX_DELAY);
        buf += n;
        len -= n;
    }
    W25Q_CS_HIGH();
    return ret;
}

/* Program up to 256 bytes; the write must not cross a page boundary,
 * otherwise the flash wraps around to the start of the same page */
HAL_StatusTypeDef w25q_page_program(SPI_HandleTypeDef *hspi,
                                     uint32_t addr,
                                     const uint8_t *data, uint16_t len)
{
    if (len == 0U || (addr % W25Q_PAGE_SIZE) + len > W25Q_PAGE_SIZE) {
        return HAL_ERROR;
    }

    /* Issue Write Enable before every program operation */
    uint8_t wen = W25Q_CMD_WRITE_EN;
    W25Q_CS_LOW();
    HAL_StatusTypeDef ret = HAL_SPI_Transmit(hspi, &wen, 1, HAL_MAX_DELAY);
    W25Q_CS_HIGH();
    if (ret != HAL_OK) return ret;

    /* Send Page Program command + 24-bit address + data payload */
    uint8_t hdr[4] = {
        W25Q_CMD_PAGE_PROG,
        (addr >> 16) & 0xFF,
        (addr >>  8) & 0xFF,
        (addr      ) & 0xFF
    };
    W25Q_CS_LOW();
    ret = HAL_SPI_Transmit(hspi, hdr, 4, HAL_MAX_DELAY);
    if (ret == HAL_OK) {
        ret = HAL_SPI_Transmit(hspi, (uint8_t *)data, len, HAL_MAX_DELAY);
    }
    W25Q_CS_HIGH();
    if (ret != HAL_OK) return ret;

    /* Programming starts when CS rises; wait for BUSY to clear */
    return w25q_wait_ready(hspi, W25Q_PROG_TIMEOUT_MS);
}
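Because a page program cannot cross a 256-byte page boundary, writing an arbitrary buffer means splitting it into page-bounded chunks first. The sketch below shows only the splitting arithmetic. The names `w25q_chunk_len`, `w25q_count_chunks`, and `W25Q_PAGE_BYTES` are hypothetical; a real write loop would call the page-program routine once per chunk.

```c
#include <assert.h>
#include <stdint.h>

#define W25Q_PAGE_BYTES 256U   /* page size of the W25Q128 */

/* Largest number of bytes that can be programmed starting at addr
 * without crossing a page boundary, capped at `remaining`. */
static uint32_t w25q_chunk_len(uint32_t addr, uint32_t remaining)
{
    uint32_t room = W25Q_PAGE_BYTES - (addr % W25Q_PAGE_BYTES);
    return (remaining < room) ? remaining : room;
}

/* Count how many page-program operations a write of len bytes needs. */
static uint32_t w25q_count_chunks(uint32_t addr, uint32_t len)
{
    uint32_t chunks = 0;
    while (len > 0U) {
        uint32_t n = w25q_chunk_len(addr, len);
        assert(n > 0U && n <= W25Q_PAGE_BYTES);
        addr += n;     /* the next chunk starts where this one ended */
        len  -= n;
        chunks++;
    }
    return chunks;
}
```

A 100-byte write starting at 0x0000F0 is split into 16 bytes (to the end of the first page) followed by 84 bytes, so it needs two program operations, each preceded by its own Write Enable.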

SPI vs I2C vs UART — When to Choose

Every experienced embedded engineer asks the same question when picking up a new sensor or memory chip: which protocol should I use? The answer depends on speed requirements, board layout constraints, available pins, and how many devices need to coexist. The table below consolidates the key trade-offs.

| Protocol | Typical Speed | Wires | Multi-device | Full-duplex | Distance | Typical Use |
|---|---|---|---|---|---|---|
| SPI | Up to 80+ MHz | 4 (+ 1 CS per device) | CS per device | Yes | Short (<30 cm) | Flash, displays, ADC/DAC, RF ICs |
| I2C | 100 kHz–1 MHz | 2 (SDA + SCL) | 7/10-bit address | No (half-duplex) | Short (<1 m) | Sensors, EEPROMs, PMICs, RTC |
| UART | Up to ~15 Mbit/s | 2 (TX + RX) | Point-to-point only | Yes | Up to 15 m (RS-232) | GPS modules, BT/WiFi modules, PC debug |
| CAN | Up to 1 Mbit/s (5 Mbit/s FDCAN) | 2 differential | Up to 110 nodes | No | Up to 1 km | Automotive, industrial fieldbus |
| 1-Wire | 15.4 kbit/s | 1 (+ GND) | ROM-based addressing | No | Up to 100 m | DS18B20 temperature sensors |

Choose SPI when: you need maximum throughput (flash, displays, high-speed ADCs), when you have sufficient GPIO pins for CS lines, when the peripheral is physically close on the PCB, and when full-duplex operation simplifies your protocol.

Choose I2C when: you have many low-speed sensors sharing a 2-wire bus, pin count is critical, and the devices support addressable I2C (most sensors do). I2C's ACK mechanism also makes it easier to detect communication failures.

Avoid SPI when: you have more than 5–6 devices (CS proliferation becomes unmanageable), when long cable runs add capacitance that distorts the clock edges, or when the slave device offers only an I2C or UART interface.

Practical Decision Guide: In most embedded sensor designs, SPI handles the fast peripherals (flash, display, RF) and I2C handles the slow sensors (IMU, barometer, temperature, RTC). UART handles external module communication (GPS, WiFi, BT). These three protocols together cover the vast majority of embedded interface needs.

Exercises

Exercise 1 Beginner

W25Q NOR Flash JEDEC ID and Page Verify

Connect a W25Q64 or W25Q128 SPI NOR flash to your STM32 development board. Configure SPI1 in Mode 0 (CPOL=0, CPHA=0) at 8 MHz with software NSS. Read the JEDEC ID using the 0x9F command and verify the manufacturer byte (0xEF for Winbond). Read the Status Register-1 and verify the BUSY bit is clear. Write a 256-byte page at address 0x000000 (after erasing the sector with command 0x20) and read it back, performing a byte-for-byte comparison to confirm data integrity. Print all results over UART at 115200 baud.

Exercise 2 Intermediate

DMA SPI Display Refresh with Framebuffer

Drive an SPI TFT display (ST7789 240×240 or ILI9341 320×240) over DMA SPI. Allocate a 16-bit colour framebuffer in SRAM. Implement an lcd_fill_rect(x, y, w, h, colour) function that writes to the framebuffer, and a lcd_flush() function that kicks off a DMA transfer. Achieve at least 30 frames per second refresh by measuring the time between flush calls with a GPIO toggle and oscilloscope. Draw a moving rectangle that scrolls horizontally and verify no visible tearing occurs during the update.

Exercise 3 Advanced

3-Device SPI Bus Manager with FreeRTOS Mutex

Build a 3-device SPI bus manager with the spi_device_t structure and spi_transfer() wrapper function shown in Section 4. Attach three devices to one SPI bus: a W25Q NOR flash, an ADXL345 accelerometer (Mode 3), and an MCP4921 12-bit DAC (Mode 0). Create three FreeRTOS tasks, each responsible for one device, all running at 100 Hz. Each task must acquire the bus mutex before transferring. Verify correct operation by checking each device's response register contents. Use a logic analyser or GPIO toggles to confirm that no two CS lines are ever asserted simultaneously under simultaneous task scheduling pressure.


SPI Bus Design Tool

Use this tool to document your STM32 SPI configuration — peripheral instance, clock mode, transfer method, connected devices, and design notes. Download as Word, Excel, PDF, or PPTX for project documentation or hardware review.


Conclusion & Next Steps

In this part we have built a complete SPI skill set from the ground up:

  • The 4-wire SPI bus (SCLK, MOSI, MISO, NSS) and its full-duplex capability make it the fastest on-board serial protocol available — ideal for flash, displays, high-speed ADCs, and RF transceivers.
  • CPOL and CPHA define four clock modes. Always read the slave datasheet's timing diagram to determine the correct mode — a mismatch produces corrupted data that can look valid, making this a notoriously tricky bug.
  • HAL polling mode is sufficient for low-throughput, initialisation-time operations. HAL interrupt mode frees the CPU for short transactions. DMA mode is essential for any bulk data transfer — displays, flash page reads, streaming ADC data.
  • A multi-device SPI bus manager with per-device CS control and an RTOS mutex is the professional pattern for any design with more than one SPI slave.
  • The W25Q128 driver pattern (JEDEC ID read, status polling, write-enable before program/erase) applies to virtually every SPI NOR flash device on the market.

Next in the Series

In Part 7: I2C Protocol, we master the 2-wire I2C bus — address scanning, register read/write with HAL_I2C_Mem_Read, DMA I2C for high-rate sensor fusion, I2C multiplexers (TCA9548A) for devices with fixed addresses, and robust bus hang recovery through hardware bit-banging. We will build a complete BMP280 pressure and temperature driver using only 2 wires.
