STM32 Architecture & CubeMX Setup — STM32 Unleashed Series Part 1

                        
                        Series Overview: This is Part 1 of our 18-part STM32 Unleashed series. We journey from architectural fundamentals through professional HAL driver development — covering GPIO, UART, timers, ADC, SPI, I2C, DMA, interrupts, low-power modes, FreeRTOS, bootloaders, and production readiness.
                    

The STM32 Family

STMicroelectronics' STM32 portfolio is one of the most widely deployed families of 32-bit microcontrollers on the planet. It ranges from the ultra-low-power STM32L0 running at 32 MHz up to the STM32H7, whose Cortex-M7 reaches 550 MHz on the fastest parts (dual-core variants pair a Cortex-M7 with a Cortex-M4). There is an STM32 for virtually every embedded application. The common thread, and the key to your productivity, is the STM32Cube ecosystem: a unified HAL, a middleware stack, a code generation tool (CubeMX), and an IDE (CubeIDE) that span the entire family.

Before writing any code, you need to understand what you're actually targeting. The STM32 is not a single chip — it is a family architecture, and the decisions you make when selecting a product line and configuring its clocks will affect every peripheral driver you write for the rest of the project.

Product Lines & Naming

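STM32 part numbers follow a consistent pattern: STM32[family][product line][pin count][flash size][package][temperature range]. For example, STM32F407VGT6 decodes as family F, product line 407, V = 100 pins, G = 1 MB Flash, T = LQFP package, and 6 = -40 to +85 °C. Understanding this nomenclature lets you decode any STM32 datasheet immediately.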
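As a small illustration of the naming scheme, the sketch below decodes the pin-count and Flash-size letters. It assumes the common layout of "STM32" followed by a four-character family/product line (F407, L073, G431, WB55 and similar), so the pin code sits at index 9 and the Flash code at index 10. The lookup tables cover only a subset of ST's codes, and the function names are illustrative rather than part of any ST library.

```c
#include <stddef.h>
#include <string.h>

/* Result of decoding; -1 means "unknown or not covered by this sketch". */
typedef struct { int pins; int flash_kb; } stm32_part_info;

static int stm32_pin_count(char code)
{
    switch (code) {
    case 'F': return 20;  case 'G': return 28;  case 'K': return 32;
    case 'C': return 48;  case 'R': return 64;  case 'V': return 100;
    case 'Z': return 144; case 'I': return 176;
    default:  return -1;
    }
}

static int stm32_flash_kb(char code)
{
    switch (code) {
    case '4': return 16;   case '6': return 32;   case '8': return 64;
    case 'B': return 128;  case 'Z': return 192;  case 'C': return 256;
    case 'D': return 384;  case 'E': return 512;  case 'F': return 768;
    case 'G': return 1024; case 'H': return 1536; case 'I': return 2048;
    default:  return -1;
    }
}

/* Assumes "STM32" + 4-char family/line, then pin code, then Flash code. */
static stm32_part_info stm32_decode(const char *part)
{
    stm32_part_info info = { -1, -1 };
    if (part == NULL || strlen(part) < 11 || strncmp(part, "STM32", 5) != 0)
        return info;
    info.pins     = stm32_pin_count(part[9]);
    info.flash_kb = stm32_flash_kb(part[10]);
    return info;
}
```

For example, stm32_decode("STM32F401RET6") yields 64 pins and 512 KB, matching the Nucleo-F401RE's MCU.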

Family	Core	Max Freq	Key Strength	Typical Use Cases
F0	Cortex-M0	48 MHz	Ultra-low cost	Simple I/O, LED control, keyboard
F1	Cortex-M3	72 MHz	Widely available, legacy	Blue Pill boards, hobbyist projects
F3	Cortex-M4F	72 MHz	FPU + motor control timers	Motor drives, power conversion
F4	Cortex-M4F	180 MHz	Performance + peripherals	Audio, imaging, communications hub
F7	Cortex-M7	216 MHz	High performance, L1 cache	HMI, video, high-speed DSP
H7	Cortex-M7 (+M4)	550 MHz	Flagship performance	Industrial control, AI inference
G0	Cortex-M0+	64 MHz	Value + efficiency	Consumer electronics, SMPS
G4	Cortex-M4F	170 MHz	Motor + mixed signal	BLDC drives, inverters, UPS
L0/L1	Cortex-M0+/M3	32 MHz	Ultra-low power	Battery IoT nodes, meters, tags
L4/L5	Cortex-M4F/M33	80–120 MHz	Low power + performance	Wearables, portable instruments
U5	Cortex-M33	160 MHz	PSA Level 3 security	Connected IoT, payment terminals
WB/WL	Cortex-M4 (+M0+)	48–64 MHz	Integrated radio: BLE/802.15.4 (WB), sub-GHz LoRa (WL)	Wireless sensors, Thread, Zigbee, LoRaWAN

                        
                        Selection Rule of Thumb: Start with the STM32F4 (specifically the F401 or F411) for learning — it's well-documented, affordable, has an FPU, runs fast enough for any tutorial project, and its HAL patterns transfer directly to every other STM32 family. Graduate to the G4, H7, or U5 when your application demands it.
                    

Cortex-M Core Selection

The Cortex-M core inside your STM32 determines which instructions you can use, whether you have hardware floating-point, and how the memory protection unit (MPU) behaves. This is not academic — it affects your linker script, compiler flags, and runtime library selection.

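Core	ISA	FPU	DSP SIMD	TrustZone	Typical GCC flags
Cortex-M0/M0+	ARMv6-M	No	No	No	`-mcpu=cortex-m0` / `-mcpu=cortex-m0plus`
Cortex-M3	ARMv7-M	No	No	No	`-mcpu=cortex-m3`
Cortex-M4	ARMv7E-M	Optional (single precision)	Yes	No	`-mcpu=cortex-m4 -mfpu=fpv4-sp-d16`
Cortex-M7	ARMv7E-M	Optional (single or double precision)	Yes	No	`-mcpu=cortex-m7 -mfpu=fpv5-sp-d16` (SP) or `-mfpu=fpv5-d16` (DP)
Cortex-M33	ARMv8-M Mainline	Optional (single precision)	Optional	Optional	`-mcpu=cortex-m33 -mfpu=fpv5-sp-d16`

Always add -mthumb, and whenever you pass -mfpu also pass -mfloat-abi=hard (or softfp); otherwise arm-none-eabi-gcc defaults to the soft-float ABI and the FPU goes unused.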
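These flags are visible to your code, not just to the build system. GCC sets the ACLE (Arm C Language Extensions) predefined macros such as __ARM_ARCH_PROFILE, __ARM_FP, __ARM_FEATURE_DSP and __ARM_PCS_VFP from them, so firmware can check at compile time that it was built for the core it expects. This is a minimal sketch assuming GCC or Clang. The #error rule and the function name are illustrative and do not come from ST's code.

```c
/* Hypothetical project rule: an STM32F407 build must use the hard-float ABI. */
#if defined(STM32F407xx) && defined(__ARM_ARCH_PROFILE) && !defined(__ARM_PCS_VFP)
#  error "STM32F407 build without -mfloat-abi=hard: check -mfpu/-mfloat-abi"
#endif

/* Describes what the -mcpu/-mfpu/-mfloat-abi flags enabled for this build. */
static const char *describe_build_target(void)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
#  if defined(__ARM_FP) && (__ARM_FP & 0x8)
    return "Arm M-profile, double-precision FPU enabled";
#  elif defined(__ARM_FP)
    return "Arm M-profile, single-precision FPU enabled";
#  elif defined(__ARM_FEATURE_DSP)
    return "Arm M-profile, DSP extension, no FPU in use";
#  else
    return "Arm M-profile, integer-only";
#  endif
#else
    return "not an Arm M-profile build (e.g. a host unit-test build)";
#endif
}
```

On a Cortex-M4 build with -mfpu=fpv4-sp-d16 -mfloat-abi=hard, the function should report the single-precision FPU. A build that forgot the FPU flags falls through to the DSP-only branch.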

Memory Layout & Bus Architecture

Every STM32 shares the ARM Cortex-M fixed memory map, with vendor-specific peripheral placement on top. Understanding this map is critical for writing linker scripts and diagnosing address faults.

/* STM32F407 Representative Memory Map (see RM0090 for the full table) */

/* Code Region */
0x00000000 - 0x1FFFFFFF   /* Code region; 0x00000000 aliases Flash, system memory or SRAM per BOOT pins */
0x08000000 - 0x080FFFFF   /* Flash memory (1 MB on F407VG) */
0x10000000 - 0x1000FFFF   /* CCM data RAM (64 KB, CPU D-bus only, no DMA) */
0x1FFF0000 - 0x1FFF77FF   /* System memory (ST bootloader ROM) */
0x1FFF7800 - 0x1FFF7A0F   /* OTP (One-Time Programmable) area */

/* SRAM Region */
0x20000000 - 0x2001BFFF   /* SRAM1 (112 KB) */
0x2001C000 - 0x2001FFFF   /* SRAM2 (16 KB) */

/* Peripheral Region */
0x40000000 - 0x40007FFF   /* APB1 peripherals (TIM2-7, USART2/3, SPI2/3, I2C1-3...) */
0x40010000 - 0x40014BFF   /* APB2 peripherals (TIM1/8, USART1/6, SPI1, ADC...) */
0x40020000 - 0x4007FFFF   /* AHB1 peripherals (GPIO, CRC, RCC, DMA1/2, Ethernet, USB OTG HS) */
0x50000000 - 0x50060BFF   /* AHB2 peripherals (USB OTG FS, DCMI, RNG) */
0x60000000 - 0x9FFFFFFF   /* FSMC banks (external SRAM, NOR, NAND, PC Card) */
0xA0000000 - 0xA0000FFF   /* FSMC control registers */

/* System Region */
0xE0000000 - 0xE00FFFFF   /* Cortex-M private peripheral bus (NVIC, SysTick, DWT, ITM) */

                        
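CCM RAM Warning: The Core Coupled Memory (CCM) on the STM32F405/407 is connected only to the Cortex-M4 data bus (D-bus), not to the AHB bus matrix. As a result, the DMA controllers cannot access CCM RAM, and the CPU cannot execute code from it. Placing DMA buffers in CCM is a common source of silent data corruption. Keep them in SRAM1 or SRAM2, which is where the default linker script places .data and .bss.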
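In debug builds, a cheap safeguard is to check a buffer's address range before starting a transfer. The sketch below hard-codes the F405/407 SRAM1+SRAM2 and CCM ranges from the memory map. The function names are hypothetical and are not part of HAL. The check is deliberately conservative: it also rejects Flash and FSMC addresses, even though DMA can reach them.

```c
#include <stdbool.h>
#include <stdint.h>

/* STM32F405/407 address ranges as half-open intervals [start, end). */
#define F407_SRAM_START 0x20000000u   /* SRAM1 start */
#define F407_SRAM_END   0x20020000u   /* end of SRAM2 (112 KB + 16 KB) */
#define F407_CCM_START  0x10000000u
#define F407_CCM_END    0x10010000u   /* 64 KB */

/* True if [addr, addr + len) lies entirely inside [start, end). */
static bool range_inside(uint32_t addr, uint32_t len, uint32_t start, uint32_t end)
{
    uint64_t last = (uint64_t)addr + len;   /* 64-bit sum cannot wrap */
    return addr >= start && last <= end;
}

/* Hypothetical debug check: is the whole buffer in DMA-reachable SRAM? */
static bool f407_dma_buffer_ok(uint32_t addr, uint32_t len)
{
    return len > 0u && range_inside(addr, len, F407_SRAM_START, F407_SRAM_END);
}

/* True if the whole buffer sits in CPU-only CCM RAM. */
static bool f407_in_ccm(uint32_t addr, uint32_t len)
{
    return len > 0u && range_inside(addr, len, F407_CCM_START, F407_CCM_END);
}
```

On the target, you would call it as assert(f407_dma_buffer_ok((uint32_t)(uintptr_t)rx_buf, sizeof rx_buf)), where rx_buf is a placeholder for your own DMA buffer.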
                    

The AHB (Advanced High-performance Bus) matrix is the backbone of the STM32 interconnect. Multiple bus masters connect through it to bus slaves (Flash, SRAM, AHB peripherals and the APB bridges). On the F4, these masters are the CPU's instruction, data and system buses, DMA1, DMA2, the Ethernet DMA and the USB OTG HS DMA. Knowing which master can reach which slave, and at what bandwidth, is essential for high-throughput DMA design.

Clock System Deep Dive

The clock system is the heart of any STM32 project. Every peripheral — UART baud rate, SPI bit rate, timer tick, ADC sampling frequency — derives its clock from a source you configure. Getting the clock tree wrong produces subtle bugs: baud rates off by exactly 2×, timers ticking too fast, ADC conversions taking longer than expected.

HSI, HSE, LSI, LSE

STM32 devices offer multiple clock sources, each with different accuracy, power consumption, and startup time characteristics:

Source	Typical Freq	Accuracy	Power	Startup	Typical Use
HSI	8–64 MHz (family)	±1–2%	Low	~2 µs	Default boot source, no crystal needed
HSE	4–26 MHz (crystal)	±20–50 ppm	Medium	~2 ms	PLL source for max system clock accuracy
LSI	32 kHz (nominal)	±30–50%	Very low	~40 µs	IWDG, rough RTC (unreliable for timekeeping)
LSE	32.768 kHz (crystal)	±20 ppm	Very low	~200 ms	RTC, calendar, low-power wakeup
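To make the accuracy column concrete, here is a sketch that converts a frequency error into worst-case clock drift per day. It is plain arithmetic: one day is 86,400 s, and 1 ppm is 10^-6 of the nominal frequency. The function names are illustrative.

```c
/* Worst-case drift in seconds per day for a frequency error given in ppm. */
static double drift_seconds_per_day(double ppm)
{
    return ppm * 1e-6 * 86400.0;
}

/* Same calculation for an error given in percent (e.g. the LSI tolerance). */
static double drift_seconds_per_day_pct(double percent)
{
    return percent / 100.0 * 86400.0;
}
```

A ±20 ppm LSE crystal drifts by at most about 1.7 s per day. A ±30% LSI error corresponds to about 25,920 s (7.2 hours) per day, which is why the LSI is unsuitable for timekeeping.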

PLL Configuration

The Phase-Locked Loop multiplies a low-frequency input (HSI or HSE) to produce the high-frequency SYSCLK. On the STM32F4, the PLL is configured with three parameters: M (input divider), N (multiplier), and P (output divider for SYSCLK). Two additional outputs, Q (USB/SDIO/RNG) and R (some families), add flexibility.

The golden rule on the F4: the VCO input (the PLL source after the M divider) must be 1–2 MHz, and the VCO output must run between 100 MHz and 432 MHz. Work backwards from your target SYSCLK:

/* PLL calculation for STM32F407, HSE = 8 MHz, target SYSCLK = 168 MHz
 *
 *  VCO_in   = HSE / M     = 8 MHz / 8     = 1 MHz   (must be 1-2 MHz)
 *  VCO_out  = VCO_in * N  = 1 MHz * 336   = 336 MHz (must be 100-432 MHz)
 *  SYSCLK   = VCO_out / P = 336 MHz / 2   = 168 MHz
 *  USB/SDIO = VCO_out / Q = 336 MHz / 7   = 48 MHz  (USB needs exactly 48 MHz)
 */
#include "main.h"   /* CubeMX-generated: pulls in stm32f4xx_hal.h, declares Error_Handler() */

void SystemClock_Config(void)
{
    RCC_OscInitTypeDef osc = {0};
    RCC_ClkInitTypeDef clk = {0};

    /* 168 MHz requires regulator voltage scale 1 */
    __HAL_RCC_PWR_CLK_ENABLE();
    __HAL_PWR_VOLTAGESCALING_CONFIG(PWR_REGULATOR_VOLTAGE_SCALE1);

    osc.OscillatorType = RCC_OSCILLATORTYPE_HSE;
    osc.HSEState       = RCC_HSE_ON;
    osc.PLL.PLLState   = RCC_PLL_ON;
    osc.PLL.PLLSource  = RCC_PLLSOURCE_HSE;
    osc.PLL.PLLM       = 8;
    osc.PLL.PLLN       = 336;
    osc.PLL.PLLP       = RCC_PLLP_DIV2;
    osc.PLL.PLLQ       = 7;
    if (HAL_RCC_OscConfig(&osc) != HAL_OK) {
        Error_Handler();   /* HSE did not start or PLL did not lock */
    }

    clk.ClockType      = RCC_CLOCKTYPE_SYSCLK | RCC_CLOCKTYPE_HCLK |
                         RCC_CLOCKTYPE_PCLK1  | RCC_CLOCKTYPE_PCLK2;
    clk.SYSCLKSource   = RCC_SYSCLKSOURCE_PLLCLK;
    clk.AHBCLKDivider  = RCC_SYSCLK_DIV1;   /* HCLK  = 168 MHz */
    clk.APB1CLKDivider = RCC_HCLK_DIV4;     /* PCLK1 =  42 MHz (max 42 MHz) */
    clk.APB2CLKDivider = RCC_HCLK_DIV2;     /* PCLK2 =  84 MHz (max 84 MHz) */

    /* HAL_RCC_ClockConfig() programs this Flash latency before switching
     * SYSCLK to the faster PLL clock (and only afterwards when slowing down) */
    if (HAL_RCC_ClockConfig(&clk, FLASH_LATENCY_5) != HAL_OK) {
        Error_Handler();
    }
}
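Before committing PLL values to hardware, you can sanity-check them on the host. The sketch below applies the F4 limits quoted above (VCO input 1–2 MHz, VCO output 100–432 MHz) and the register field ranges (M 2–63, N 50–432, P in {2, 4, 6, 8}, Q 2–15). The struct and function names are hypothetical. The check does not test SYSCLK against the part's maximum, nor does it confirm an exact 48 MHz; the caller does that.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t vco_in_hz, vco_out_hz, sysclk_hz, q_clk_hz;
    bool     valid;    /* false if a field or a VCO limit is out of range */
} f4_pll_result;

static f4_pll_result f4_pll_check(uint32_t src_hz, uint32_t m, uint32_t n,
                                  uint32_t p, uint32_t q)
{
    f4_pll_result r = {0};
    bool p_ok = (p == 2u || p == 4u || p == 6u || p == 8u);
    if (m < 2u || m > 63u || n < 50u || n > 432u || !p_ok || q < 2u || q > 15u)
        return r;                                   /* register field out of range */

    r.vco_in_hz  = src_hz / m;
    r.vco_out_hz = (uint32_t)((uint64_t)src_hz * n / m);
    r.sysclk_hz  = r.vco_out_hz / p;
    r.q_clk_hz   = r.vco_out_hz / q;
    r.valid = r.vco_in_hz  >= 1000000u   && r.vco_in_hz  <= 2000000u &&
              r.vco_out_hz >= 100000000u && r.vco_out_hz <= 432000000u;
    return r;
}
```

With the values above (8 MHz, M = 8, N = 336, P = 2, Q = 7), the result is valid, with 168 MHz SYSCLK and 48 MHz on Q. Changing M to 2 gives a 4 MHz VCO input and is rejected.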

                        
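Flash Latency: At 168 MHz with a 3.3 V supply (2.7–3.6 V range), the STM32F407 Flash requires 5 wait states (FLASH_LATENCY_5). If SYSCLK is raised before the wait states are increased, the CPU reads Flash faster than the Flash can deliver valid data, which typically ends in a HardFault or erratic behaviour. HAL_RCC_ClockConfig() handles the ordering for you when you pass the latency as its second argument, and CubeMX fills that argument in automatically. Only when you program the RCC registers yourself must you set the latency first, for example with __HAL_FLASH_SET_LATENCY().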
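The F405/407 wait-state table for a 2.7–3.6 V supply adds one wait state for every started 30 MHz of HCLK: 0 WS up to 30 MHz, 1 WS up to 60 MHz, and so on up to 5 WS at 168 MHz. A sketch of that rule follows. It applies only to this voltage range and these parts. Other supply ranges and other F4 devices (the F401, for example) use different tables, so check the reference manual for your device.

```c
#include <stdint.h>

/* Minimum Flash wait states for STM32F405/407 at VDD = 2.7-3.6 V. */
static uint32_t f407_flash_wait_states(uint32_t hclk_hz)
{
    if (hclk_hz == 0u)
        return 0u;
    return (hclk_hz - 1u) / 30000000u;   /* one WS per started 30 MHz band */
}
```

The returned count corresponds to the matching FLASH_LATENCY_n constant: 168 MHz gives 5, 84 MHz gives 2, and the 16 MHz HSI needs 0.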
                    

AHB, APB1 & APB2 Prescalers

After SYSCLK is established, the clock tree splits into bus clocks through prescalers. Getting these wrong affects every peripheral you configure:

HCLK (AHB clock): drives the CPU, memory, DMA, and GPIO. It is usually left equal to SYSCLK (AHB prescaler = 1) for maximum performance.
PCLK1 (APB1 clock): drives USART2/3, UART4/5, SPI2/3, I2C1–3, the general-purpose timers TIM2–5 and TIM12–14, and the basic timers TIM6/7. Maximum 42 MHz on the F405/407.
PCLK2 (APB2 clock): drives USART1/6, SPI1, ADC1–3, the advanced timers TIM1/8, and the general-purpose timers TIM9–11. Maximum 84 MHz on the F405/407.

                        
                        Timer Clock Multiplier: When the APB prescaler is not 1, the timer input clock is doubled by the hardware. So with PCLK1 = 42 MHz (APB1 prescaler = 4), TIM2's clock source is 84 MHz — not 42 MHz. This catches many developers off guard when calculating timer periods.
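The rule and the resulting timer period can be written out explicitly. This sketch assumes RCC_DCKCFGR.TIMPRE = 0 (the reset default on F4 parts that have the bit). It uses the standard update-rate relation f_update = f_timer / ((PSC + 1) × (ARR + 1)). The function names are illustrative.

```c
#include <stdint.h>

/* Timer kernel clock on STM32F4 (TIMPRE = 0): x1 if APB prescaler is 1, else x2. */
static uint32_t f4_timer_clock_hz(uint32_t pclk_hz, uint32_t apb_prescaler)
{
    return (apb_prescaler == 1u) ? pclk_hz : 2u * pclk_hz;
}

/* Update-event rate for prescaler PSC and auto-reload ARR (64-bit to avoid overflow). */
static uint32_t timer_update_hz(uint32_t tim_clk_hz, uint32_t psc, uint32_t arr)
{
    uint64_t div = ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
    return (uint32_t)(tim_clk_hz / div);
}
```

With PCLK1 = 42 MHz and an APB1 prescaler of 4, TIM2 runs at 84 MHz. PSC = 8399 then yields a 10 kHz count rate, and ARR = 4999 produces an update every 0.5 s (2 Hz). If you plug in 42 MHz by mistake, the same PSC/ARR appear to give 1 Hz: the period calculation is off by a factor of two.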
                    

HAL vs LL vs Bare-Metal

STM32 gives you three distinct levels of hardware abstraction, and professional developers choose between them deliberately — not by default. Each level has a different trade-off between portability, performance, and code size.

HAL Architecture

The STM32 HAL (Hardware Abstraction Layer) is ST's high-level driver framework. Every HAL function follows predictable naming: HAL_[Peripheral]_[Action](handle, ...). HAL drivers maintain state in handle structures (UART_HandleTypeDef, SPI_HandleTypeDef, etc.) that persist across calls, enabling interrupt and DMA modes without global variables.

/* HAL UART transmit: polling and interrupt modes */
UART_HandleTypeDef huart2;   /* handle: configuration + driver state */

/* Initialise (typically generated by CubeMX in MX_USART2_UART_Init()) */
huart2.Instance          = USART2;
huart2.Init.BaudRate     = 115200;
huart2.Init.WordLength   = UART_WORDLENGTH_8B;
huart2.Init.StopBits     = UART_STOPBITS_1;
huart2.Init.Parity       = UART_PARITY_NONE;
huart2.Init.Mode         = UART_MODE_TX_RX;
huart2.Init.HwFlowCtl    = UART_HWCONTROL_NONE;
huart2.Init.OverSampling = UART_OVERSAMPLING_16;
if (HAL_UART_Init(&huart2) != HAL_OK) {
    Error_Handler();
}

/* Blocking transmit; HAL_MAX_DELAY means "wait forever" */
static uint8_t msg[] = "Hello STM32\r\n";   /* static: must outlive the IT transfer below */
HAL_UART_Transmit(&huart2, msg, sizeof(msg) - 1, HAL_MAX_DELAY);

/* Non-blocking interrupt mode: returns immediately (HAL_BUSY if a transfer
 * is already in progress); completion is signalled via HAL_UART_TxCpltCallback() */
HAL_UART_Transmit_IT(&huart2, msg, sizeof(msg) - 1);
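You never pass a baud-rate divider to HAL_UART_Init(). On the F4, it reads the current APB clock (HAL_RCC_GetPCLK1Freq() for USART2, HAL_RCC_GetPCLK2Freq() for USART1/6) and derives the BRR register from it. This is why a wrong clock tree shows up as a wrong baud rate. The sketch below approximates the 16x-oversampling calculation with a single rounding step. ST's HAL macros round the mantissa and fraction separately, so the HAL's BRR value can differ by one LSB. The function names are illustrative.

```c
#include <stdint.h>

/* BRR for 16x oversampling (OVER8 = 0): BRR = 16 * USARTDIV = f_ck / baud. */
static uint32_t usart_brr_over16(uint32_t fck_hz, uint32_t baud)
{
    return (fck_hz + baud / 2u) / baud;   /* round to nearest */
}

/* Baud rate the peripheral actually generates for a given BRR. */
static uint32_t usart_actual_baud(uint32_t fck_hz, uint32_t brr)
{
    return fck_hz / brr;
}
```

With PCLK1 = 42 MHz, BRR = 365 gives about 115,068 baud (-0.11%), well within UART tolerance. Suppose instead the divider is computed for 84 MHz (BRR = 729) while the hardware actually runs the bus at 42 MHz. The line then runs at about 57,613 baud, roughly half the intended rate.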

HAL advantages: readable code, easy portability between STM32 families, full interrupt and DMA support via callbacks, and direct CubeMX code generation. HAL disadvantage: overhead. HAL_GPIO_WritePin() costs a function call plus argument handling and a branch, on the order of ten or more cycles. A direct register write is a single store instruction. For GPIO bit-banging or time-critical ISRs, that overhead can be too much.

Low-Layer (LL) Drivers

LL drivers are ST's thin wrapper around registers — inline functions that map directly to peripheral register operations with almost zero overhead. They give you register-level speed with slightly better readability than raw register access, and they're still generated by CubeMX.

/* LL GPIO — set PA5 high, then low */
LL_GPIO_SetOutputPin(GPIOA, LL_GPIO_PIN_5);   /* ~1 cycle */
LL_GPIO_ResetOutputPin(GPIOA, LL_GPIO_PIN_5); /* ~1 cycle */

/* LL UART — wait for TX empty, then write */
while (!LL_USART_IsActiveFlag_TXE(USART2)) {}
LL_USART_TransmitData8(USART2, 'A');

Direct Register Access

Bare-metal register access uses CMSIS device headers directly, with no HAL and no LL. You access peripheral registers through structs defined in the device header (e.g., stm32f407xx.h). This is the fastest possible approach, but it requires you to work directly from the register descriptions in the reference manual.

#include "stm32f4xx.h"   /* CMSIS device header: register structs and bit masks */

/* Direct register access: enable GPIOA clock, configure PA5 as output */
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;     /* Enable GPIOA clock */
(void)RCC->AHB1ENR;                      /* read back: let the clock enable take effect */

/* Clear MODER5[1:0], then select general-purpose output (01) */
GPIOA->MODER &= ~(GPIO_MODER_MODER5);
GPIOA->MODER |=  (GPIO_MODER_MODER5_0);  /* Output mode */

/* Output type push-pull (reset default), speed high (OSPEEDR5 = 11) */
GPIOA->OTYPER  &= ~GPIO_OTYPER_OT5;      /* Push-pull */
GPIOA->OSPEEDR |=  GPIO_OSPEEDR_OSPEED5; /* High speed */

/* Drive PA5 high, then low, via the Bit Set/Reset Register (atomic) */
GPIOA->BSRR = GPIO_BSRR_BS5;  /* Set PA5 */
GPIOA->BSRR = GPIO_BSRR_BR5;  /* Reset PA5 */
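MODER, OSPEEDR and PUPDR all use two bits per pin, at bit positions [2·pin+1 : 2·pin]. The masks used above follow from that layout: GPIO_MODER_MODER5 is 3 << 10 and GPIO_MODER_MODER5_0 is 1 << 10. Here is a host-testable sketch of the same clear-then-set pattern. The helper name is illustrative.

```c
#include <stdint.h>

/* Return reg with the 2-bit field of `pin` replaced by `value` (0..3). */
static uint32_t gpio_field2_set(uint32_t reg, uint32_t pin, uint32_t value)
{
    uint32_t shift = 2u * pin;
    uint32_t mask  = 3u << shift;            /* for pin 5: 0x00000C00 */
    return (reg & ~mask) | ((value & 3u) << shift);
}
```

On the target you could write GPIOA->MODER = gpio_field2_set(GPIOA->MODER, 5u, 1u). This remains a read-modify-write, so it is not atomic with respect to an ISR that touches the same register.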

Choosing the Right Abstraction

Scenario	Recommended Level	Reason
Peripheral initialisation (UART, SPI, I2C)	HAL or CubeMX	Complex init sequences, portability
DMA transfers, interrupt-driven comms	HAL	Callback infrastructure, handle state management
Time-critical ISR GPIO toggling	LL or bare-metal	Single-cycle throughput required
Bit-bang protocols (WS2812, 1-Wire)	Bare-metal registers	Cycle-accurate timing needed
Clock and power configuration	HAL (CubeMX generated)	Complex sequencing, flash latency, voltage scaling
New developer, learning project	HAL	Readable, well-documented, CubeMX support
Production firmware, code size critical	LL + selective bare-metal	Smallest binary, predictable performance

CubeMX & CubeIDE Deep Dive

STM32CubeMX is the graphical configuration tool that generates initialisation code from your peripheral and clock settings. STM32CubeIDE is the Eclipse-based IDE that integrates CubeMX, the GCC toolchain, and debug support into a single environment. It uses the ST-LINK GDB server by default, with OpenOCD and SEGGER J-Link as alternatives. For most STM32 work, this is your primary development environment.

CubeMX Code Generation Workflow

The CubeMX workflow follows a consistent pattern regardless of which peripheral you're configuring:

Select target MCU — choose exact part number (e.g., STM32F407VGTx). This loads the correct pin count, flash/SRAM sizes, and available peripherals.
Pinout & Configuration — assign peripherals to pins using the graphical pinout view. CubeMX enforces alternate function constraints.
Clock Configuration — use the clock tree diagram to configure PLL, bus prescalers, and peripheral clocks. CubeMX validates that no maximum frequency is exceeded.
Project Manager — set project name, IDE (CubeIDE, Keil, IAR), and code generation options. The "Generate peripheral initialization as a pair of .c/.h files" option keeps generated code modular.
Generate Code: CubeMX writes main.c, stm32f4xx_it.c, stm32f4xx_hal_msp.c, the peripheral init files, and the linker script. Your own code goes between matching /* USER CODE BEGIN x */ and /* USER CODE END x */ markers (for example /* USER CODE BEGIN 2 */). Everything outside these markers is overwritten on the next generation.

                        
USER CODE Markers: CubeMX regenerates everything outside the user code markers. If you add code outside these sections, it will be deleted on the next "Generate Code". Always put your application logic, includes, and variables inside the markers.
                    

Configuring the Clock Tree

The CubeMX clock tree is a graphical representation of the RCC (Reset and Clock Control) register settings. You can either type in your target frequencies and let CubeMX solve the PLL parameters, or manually set M/N/P/Q values. CubeMX will highlight any violated constraint in red.
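To see what "let CubeMX solve the PLL parameters" involves, here is a toy brute-force search over the F4 register ranges. It requires an exact integer match for SYSCLK and an exact 48 MHz Q output. This is only a sketch of the idea, not CubeMX's actual algorithm, and the names are hypothetical. For an 8 MHz HSE and a 168 MHz target, it returns M = 4, N = 168, P = 2, Q = 7. That is an equally valid alternative to the M = 8, N = 336 pair, and it gives the 2 MHz VCO input that ST's reference manual recommends for limiting PLL jitter.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool found; uint32_t m, n, p, q; } f4_pll_solution;

/* Ranges: M 2-63, N 50-432, P in {2,4,6,8}, Q 2-15.
 * Limits: VCO input 1-2 MHz, VCO output 100-432 MHz.
 * Comparisons use VCO_out * M so everything stays exact in integers. */
static f4_pll_solution f4_pll_solve(uint32_t src_hz, uint32_t sysclk_hz)
{
    static const uint32_t p_opts[] = {2u, 4u, 6u, 8u};
    f4_pll_solution s = {false, 0u, 0u, 0u, 0u};

    for (uint32_t m = 2u; m <= 63u; ++m) {
        uint64_t mm = m;
        if (src_hz < mm * 1000000u || src_hz > mm * 2000000u)
            continue;                                  /* VCO input out of range */
        for (uint32_t n = 50u; n <= 432u; ++n) {
            uint64_t vco_x_m = (uint64_t)src_hz * n;   /* equals VCO_out * m */
            if (vco_x_m < mm * 100000000u || vco_x_m > mm * 432000000u)
                continue;                              /* VCO output out of range */
            for (uint32_t i = 0u; i < 4u; ++i) {
                uint32_t p = p_opts[i];
                if (vco_x_m != (uint64_t)sysclk_hz * p * mm)
                    continue;                          /* SYSCLK not exact */
                for (uint32_t q = 2u; q <= 15u; ++q) {
                    if (vco_x_m == 48000000ull * q * mm) {
                        s.found = true; s.m = m; s.n = n; s.p = p; s.q = q;
                        return s;                      /* first match: smallest M */
                    }
                }
            }
        }
    }
    return s;
}
```

The same search finds no solution for 180 MHz from an 8 MHz HSE. The only usable VCO frequency is then 360 MHz, and 360 / 48 is not an integer, so no Q divider gives exactly 48 MHz.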

For the STM32F407 running at maximum speed, a common clock tree configuration is:

# CubeMX Clock Configuration Summary (STM32F407, HSE=8MHz, max speed)
# Clock source: HSE (8 MHz crystal)
# PLL: M=8, N=336, P=2, Q=7
# SYSCLK  = 168 MHz
# HCLK    = 168 MHz (AHB  prescaler = 1)
# PCLK1   =  42 MHz (APB1 prescaler = 4)  → TIM2/3 input = 84 MHz
# PCLK2   =  84 MHz (APB2 prescaler = 2)  → TIM1/8 input = 168 MHz
# USB/OTG =  48 MHz (PLL Q = 7)           → exact 48 MHz for USB
# Flash latency: 5 wait states at 168 MHz, 3.3V

CubeMX Pitfalls & Best Practices

CubeMX accelerates development but introduces pitfalls that catch beginners and experienced developers alike:

Pitfall	Symptom	Fix
Code outside USER CODE markers	Custom code disappears after regeneration	Always use `/* USER CODE BEGIN */` blocks
HSE not enabled in oscillator config	Falls back to HSI, wrong baud rates	Enable HSE and verify RCC_OscInitTypeDef.HSEState
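DMA buffer in CCM RAM	DMA transfer completes but data is wrong	Keep DMA buffers in SRAM1/SRAM2 (where the default linker script places .data/.bss); use CCM only for CPU-only data
Changing the NVIC priority grouping after setting priorities	Nested interrupts behave unexpectedly	HAL_Init() already selects NVIC_PRIORITYGROUP_4 on F4; if you change it, do so once, right after HAL_Init() and before any interrupt priorities are set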
GPIO alternate function not assigned	SPI/UART pin outputs nothing	Check GPIO_InitTypeDef.Alternate matches peripheral
SysTick used by both HAL and FreeRTOS	HAL timeouts work incorrectly under RTOS	Move HAL timebase to TIM6 when using FreeRTOS

Build System & Toolchain

CubeIDE manages the build internally, but professional STM32 development increasingly uses command-line build systems for CI/CD pipelines, reproducible builds, and editor freedom. Understanding the toolchain lets you build STM32 firmware from any machine without CubeIDE installed.

arm-none-eabi-gcc Setup

The ARM GNU toolchain (arm-none-eabi-gcc) is the standard open-source compiler for bare-metal ARM targets. "none" means no OS, "eabi" means Embedded ABI. Install it from the ARM Developer website or your package manager:

# Ubuntu/Debian
sudo apt install gcc-arm-none-eabi binutils-arm-none-eabi

# macOS (Homebrew)
brew install --cask gcc-arm-embedded

# Verify installation (the version string varies by package and release)
arm-none-eabi-gcc --version

# Essential flags for STM32F407 (Cortex-M4F, hard FP), Makefile syntax
CPU      = -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=hard
CFLAGS   = $(CPU) -DSTM32F407xx -DUSE_HAL_DRIVER
CFLAGS  += -O2 -ffunction-sections -fdata-sections -Wall
# The linker needs the same CPU flags to pick the matching hard-float libraries
LDFLAGS  = $(CPU) -T STM32F407VGTx_FLASH.ld -Wl,--gc-sections -Wl,-Map=output.map
LDFLAGS += --specs=nano.specs --specs=nosys.specs

Makefile vs CMake

CubeMX can generate a Makefile project for command-line builds (select Makefile as the Toolchain/IDE in Project Manager). This works, but it scales poorly for large projects. CMake has become the preferred build system for professional STM32 work, especially when integrating multiple libraries, unit tests, and CI pipelines, and recent CubeMX releases can also generate a CMake project directly.

# Minimal CMake toolchain file for STM32F407 (arm-none-eabi.cmake)
set(CMAKE_SYSTEM_NAME Generic)
set(CMAKE_SYSTEM_PROCESSOR arm)

set(CMAKE_C_COMPILER arm-none-eabi-gcc)
set(CMAKE_CXX_COMPILER arm-none-eabi-g++)
set(CMAKE_ASM_COMPILER arm-none-eabi-gcc)
set(CMAKE_OBJCOPY arm-none-eabi-objcopy)
set(CMAKE_SIZE arm-none-eabi-size)

# Bare metal: CMake's compiler checks build a static library instead of
# linking a test executable, which would fail without startup code/syscalls
set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)

set(CPU_FLAGS "-mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=hard")
set(CMAKE_C_FLAGS "${CPU_FLAGS} -ffunction-sections -fdata-sections -Wall" CACHE STRING "")
set(CMAKE_ASM_FLAGS "${CPU_FLAGS}" CACHE STRING "")
set(CMAKE_EXE_LINKER_FLAGS "${CPU_FLAGS} -Wl,--gc-sections --specs=nano.specs --specs=nosys.specs" CACHE STRING "")

OpenOCD & ST-Link

OpenOCD (Open On-Chip Debugger) is the open-source debug server that connects your development machine to the STM32 via ST-Link or J-Link. CubeIDE uses it transparently, but you can also drive it directly for scripted flashing in CI pipelines:

# Flash firmware via OpenOCD (ST-Link V2)
openocd -f interface/stlink.cfg \
        -f target/stm32f4x.cfg \
        -c "program build/firmware.elf verify reset exit"

# Start debug server (GDB connects on port 3333)
openocd -f interface/stlink.cfg -f target/stm32f4x.cfg

# In another terminal, start GDB session
arm-none-eabi-gdb build/firmware.elf
(gdb) target remote :3333
(gdb) monitor reset halt
(gdb) load
(gdb) continue

First HAL Project: Blink

The canonical blink example reveals more about the STM32 than its simplicity suggests. Let's implement it two ways, with HAL and with direct register access, and compare what each one costs.

HAL Initialisation Sequence

Every CubeMX-generated main.c follows the same initialisation sequence. Understanding this sequence prevents you from calling HAL functions before the hardware is ready:

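#include "main.h"    /* CubeMX-generated: stm32f4xx_hal.h, LD2_Pin, LD2_GPIO_Port */
#include "gpio.h"    /* present when peripheral init is split into .c/.h pairs */
#include "usart.h"

void SystemClock_Config(void);

int main(void)
{
    /* 1. HAL_Init: configures Flash prefetch/caches, sets NVIC priority
     *    grouping, and starts a 1 ms SysTick timebase (still on HSI) */
    HAL_Init();

    /* 2. SystemClock_Config: configures HSE, PLL, bus prescalers and
     *    flash latency; HAL re-derives the 1 ms tick for the new clock.
     *    Generated entirely by CubeMX. */
    SystemClock_Config();

    /* 3. Peripheral MX_xxx_Init functions: configure each peripheral
     *    in the order CubeMX generates them. */
    MX_GPIO_Init();
    MX_USART2_UART_Init();

    /* 4. User application loop, laid out as CubeMX generates it */
    /* USER CODE BEGIN WHILE */
    while (1)
    {
        /* USER CODE END WHILE */

        /* USER CODE BEGIN 3 */
        HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
        HAL_Delay(500);
    }
    /* USER CODE END 3 */
}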

Blink with HAL_GPIO_TogglePin

The HAL GPIO configuration generated by CubeMX for a standard Nucleo-F401RE board (LED on PA5):

/* gpio.c: generated by CubeMX when peripheral init is split into .c/.h pairs */
#include "gpio.h"

void MX_GPIO_Init(void)
{
    GPIO_InitTypeDef GPIO_InitStruct = {0};

    /* Enable GPIOA clock */
    __HAL_RCC_GPIOA_CLK_ENABLE();

    /* Set PA5 output level low (LED off initially) */
    HAL_GPIO_WritePin(GPIOA, GPIO_PIN_5, GPIO_PIN_RESET);

    /* Configure PA5 as push-pull output, no pull, low speed */
    GPIO_InitStruct.Pin   = GPIO_PIN_5;
    GPIO_InitStruct.Mode  = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull  = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
    HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
}

/* In main() while(1) loop */
HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
HAL_Delay(500); /* 500 ms, blocks on SysTick */

Same Blink, Register-Level

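Here is the identical blink written with direct register access. Its application code occupies on the order of a hundred bytes of Flash, plus the vector table and startup code. The HAL version pulls in a few kilobytes of library code. Check the exact figures for your build with arm-none-eabi-size or the .map file.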

#include "stm32f4xx.h"  /* CMSIS device header: all register definitions */

static volatile uint32_t tick;   /* incremented every 1 ms by SysTick */

void SysTick_Handler(void)
{
    tick++;
}

static void delay_ms(uint32_t ms)
{
    uint32_t start = tick;
    while ((tick - start) < ms) {}   /* unsigned subtraction survives wrap-around */
}

int main(void)
{
    /* SysTick at 1 kHz. After reset the F4 runs from the 16 MHz HSI, and
     * system_stm32f4xx.c initialises SystemCoreClock to 16000000. */
    SysTick_Config(SystemCoreClock / 1000U);

    /* Enable GPIOA peripheral clock on AHB1 */
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    (void)RCC->AHB1ENR; /* dummy read: wait for the clock enable to take effect */

    /* Configure PA5: output, push-pull, no pull, low speed */
    GPIOA->MODER   = (GPIOA->MODER & ~GPIO_MODER_MODER5) | GPIO_MODER_MODER5_0;
    GPIOA->OTYPER  &= ~GPIO_OTYPER_OT5;
    GPIOA->OSPEEDR &= ~GPIO_OSPEEDR_OSPEED5;
    GPIOA->PUPDR   &= ~GPIO_PUPDR_PUPD5;

    for (;;)
    {
        GPIOA->BSRR = GPIO_BSRR_BS5;   /* Atomic set PA5 high */
        delay_ms(500);
        GPIOA->BSRR = GPIO_BSRR_BR5;   /* Atomic reset PA5 low */
        delay_ms(500);
    }
}
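The (tick - start) < ms test in delay_ms() is deliberate. Unsigned 32-bit subtraction is performed modulo 2^32, so it yields the true elapsed count even after the tick counter rolls over (about 49.7 days at 1 kHz). The host-testable sketch below contrasts it with a tempting alternative that breaks near the wrap point. Both function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe: elapsed = now - start (mod 2^32), compared with the delay. */
static bool delay_expired(uint32_t now, uint32_t start, uint32_t ms)
{
    return (uint32_t)(now - start) >= ms;
}

/* Anti-pattern: start + ms can wrap past zero, so the deadline looks
 * already reached when the counter is close to 2^32. */
static bool delay_expired_naive(uint32_t now, uint32_t start, uint32_t ms)
{
    return now >= start + ms;
}
```

With start = 0xFFFFFFF0 and a 20 ms delay, the naive version reports expiry only 5 ticks later. The wrap-safe version correctly waits until 21 ticks have passed, when the counter has rolled over to 5.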

                        
BSRR is Atomic: The GPIO Bit Set/Reset Register (BSRR) is a write-only, 32-bit register. Writing 1 to one of the lower 16 bits sets the corresponding pin, writing 1 to one of the upper 16 bits resets it, and zero bits are ignored. Because the change happens in a single write, there is no read-modify-write cycle that an ISR could interrupt. Prefer BSRR over read-modify-write on ODR whenever interrupts or RTOS tasks also drive pins on the same port.
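A host-side model makes the BSRR semantics easy to check. When both the set and reset bits of the same pin are written together, the reference manual gives the set bit priority, and the model follows that rule. It is an illustrative sketch, not ST code.

```c
#include <stdint.h>

/* Effect of one BSRR write on the output data register (ODR). */
static uint16_t odr_after_bsrr(uint16_t odr, uint32_t bsrr)
{
    uint16_t set   = (uint16_t)(bsrr & 0xFFFFu);   /* bits 15:0  -> set   */
    uint16_t reset = (uint16_t)(bsrr >> 16);       /* bits 31:16 -> reset */
    return (uint16_t)((odr & (uint16_t)~reset) | set);   /* set wins */
}
```

Here GPIO_BSRR_BS5 corresponds to 1 << 5 and GPIO_BSRR_BR5 to 1 << 21. Only the pins named in the write change; every other ODR bit is left as it was.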
                    

Exercises

Exercise 1 Beginner

CubeMX Clock Tree Exploration

Open CubeMX (or STM32CubeIDE's Device Configuration tool) and create a new project for any STM32F4 device. Navigate to the Clock Configuration tab. Configure the system to use HSE (8 MHz) with PLL to achieve: (a) 168 MHz SYSCLK, (b) exact 48 MHz for USB. Observe the flash latency setting CubeMX selects automatically and verify the APB timer clock multiplier rule applies to TIM2.

CubeMX Clock Tree PLL Configuration

Exercise 2 Intermediate

Compare HAL vs Register-Level Code Size

Create two CubeIDE projects targeting the same STM32F4 MCU. In Project A, implement a 500 ms blink using HAL_GPIO_TogglePin and HAL_Delay (full HAL enabled). In Project B, implement the same blink using only CMSIS register access with a SysTick delay. Compile both at -O2 and compare: (a) total Flash usage from the .map file, (b) the generated assembly for the toggle operation, (c) worst-case SysTick interrupt latency in each approach.

HAL Register Access Code Size Assembly

Exercise 3 Advanced

Command-Line Build Without CubeIDE

Regenerate one of the Exercise 2 projects with CubeMX's Makefile toolchain option and build it entirely from the command line using arm-none-eabi-gcc and make. Then convert the Makefile to a CMake project: write a CMakeLists.txt and a toolchain file, and add the HAL sources, CMSIS headers, and linker script. Flash the resulting .elf using OpenOCD or the STM32CubeProgrammer command-line interface without opening CubeIDE. Verify that the LED blinks with the correct 500 ms period by measuring the GPIO toggle with a logic analyser or oscilloscope.

CMake Makefile OpenOCD CI/CD Ready


Conclusion & Next Steps

In this opening article we have established the foundation every STM32 developer needs:

The STM32 family spans over a dozen product lines — F0 through H7, G0/G4, L0–U5, WB/WL — all sharing the ARM Cortex-M architecture and the STM32Cube HAL ecosystem, enabling skills transfer across families.
The clock system is the most critical configuration step: HSI/HSE → PLL → SYSCLK → AHB/APB bus clocks determine the performance and accuracy of every peripheral. Misconfigured clocks produce subtle, hard-to-diagnose bugs.
HAL, LL, and bare-metal register access each have their place. HAL for most application code, LL for performance-critical paths, direct registers for cycle-accurate timing and minimal footprint.
CubeMX accelerates development dramatically but requires discipline: always use USER CODE markers, verify clock settings, and be aware of DMA/CCM RAM incompatibility.
A common professional STM32 toolchain is arm-none-eabi-gcc + CMake + OpenOCD: reproducible, CI/CD friendly, and IDE-agnostic.

Next in the Series

In Part 2: GPIO & Button Debounce, we'll master every GPIO mode (input, output, alternate function, analog), implement a reliable software debounce algorithm for mechanical buttons, configure External Interrupts (EXTI) for edge-triggered events, and build a state machine that drives an LED pattern from button inputs — with all three abstraction levels compared.
