Embedded Systems Series Part 2: STM32 & ARM Cortex-M Development

January 25, 2026 Wasil Zafar 50 min read

Dive deep into ARM Cortex-M architecture, STM32 peripherals, HAL programming, and hands-on microcontroller development with practical examples.

Table of Contents

  1. ARM Cortex-M Architecture
  2. STM32 Microcontroller Families
  3. Development Environment Setup
  4. HAL Programming Fundamentals
  5. GPIO Configuration & Control
  6. Timers & PWM
  7. UART Communication
  8. ADC & DAC Peripherals
  9. DMA (Direct Memory Access)
  10. Conclusion & Next Steps

ARM Cortex-M Architecture

Series Navigation: This is Part 2 of the 12-part Embedded Systems Series. Start with Part 1: Fundamentals if you haven't already.

The ARM Cortex-M family is one of the most widely deployed microcontroller architectures, powering billions of devices from smartwatches to industrial controllers. Unlike application processors (Cortex-A), Cortex-M cores are optimized for real-time, low-power embedded applications.

Cortex-M Family Comparison

Feature           M0/M0+                        M3            M4              M7
Pipeline          3-stage (M0), 2-stage (M0+)   3-stage       3-stage         6-stage
DSP Instructions  No                            No            Yes             Yes
FPU               No                            No            Optional (SP)   Optional (SP or SP+DP)
Cache             No                            No            No              Optional I/D cache
Max Frequency     ~48 MHz                       ~120 MHz      ~180 MHz        ~400 MHz+
Typical Use       Simple sensors                General MCU   Motor, audio    High-perf DSP

Cortex-M Programmer's Model

All Cortex-M cores share a common programmer's model:

  • 16 core registers: R0-R12 (general purpose), R13 (SP), R14 (LR), R15 (PC)
  • Program Status Register (xPSR): Condition flags, exception number, Thumb state bit
  • Thumb-2 instruction set: Mix of 16-bit and 32-bit instructions
  • Two stack pointers: MSP (Main) and PSP (Process) for OS support
  • Two privilege levels: Privileged and Unprivileged (for RTOS)
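// Accessing core registers (CMSIS compiler intrinsics)
#include "stm32f4xx.h"  // Device header; pulls in the CMSIS core intrinsics

// Illustrative wrapper so the statements below compile inside a function
void core_register_demo(void) {
    uint32_t stack_ptr = __get_MSP();    // Get Main Stack Pointer
    uint32_t primask = __get_PRIMASK();  // Save current interrupt mask
    (void)stack_ptr;

    __disable_irq();         // Disable interrupts globally
    // Critical section
    __set_PRIMASK(primask);  // Restore the previous mask (re-enables only if it was enabled)

    __DSB();  // Data Synchronization Barrier
    __ISB();  // Instruction Synchronization Barrier
}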
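
The xPSR fields named in the list above can be pulled apart with plain shifts and masks. The following is a minimal host-side sketch; `xpsr_fields_t` and `decode_xpsr` are hypothetical names introduced for illustration, not CMSIS functions. On a target, the input would typically come from the CMSIS intrinsic `__get_xPSR()`.

```c
#include <stdbool.h>
#include <stdint.h>

// Decoded view of the combined program status register (xPSR).
// Hypothetical helper type, not part of CMSIS.
typedef struct {
    bool negative;              // N flag, bit 31
    bool zero;                  // Z flag, bit 30
    bool carry;                 // C flag, bit 29
    bool overflow;              // V flag, bit 28
    bool thumb;                 // T bit, bit 24
    uint16_t exception_number;  // ISR_NUMBER, bits 8:0 (0 means Thread mode)
} xpsr_fields_t;

// Split a raw xPSR value into its commonly inspected fields.
xpsr_fields_t decode_xpsr(uint32_t xpsr) {
    xpsr_fields_t f;
    f.negative = ((xpsr >> 31) & 1u) != 0u;
    f.zero = ((xpsr >> 30) & 1u) != 0u;
    f.carry = ((xpsr >> 29) & 1u) != 0u;
    f.overflow = ((xpsr >> 28) & 1u) != 0u;
    f.thumb = ((xpsr >> 24) & 1u) != 0u;
    f.exception_number = (uint16_t)(xpsr & 0x1FFu);
    return f;
}
```

Reading the exception number this way is a common trick for checking whether code is running in Thread mode or inside a handler.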

STM32 Microcontroller Families

STMicroelectronics' STM32 is one of the most widely used ARM Cortex-M microcontroller lines, with hundreds of variants organized into families:

STM32 Family Guide

  • STM32F0: Cortex-M0, entry-level, ultra-low-cost ($0.30+)
  • STM32F1: Cortex-M3, legacy mainstream, huge ecosystem
  • STM32F3: Cortex-M4, analog-rich (ADC, comparators)
  • STM32F4: Cortex-M4F, most popular high-performance line
  • STM32F7: Cortex-M7, higher speed, caches, LCD controller
  • STM32H7: Cortex-M7 at up to 480 MHz (550 MHz on some parts), dual-core options
  • STM32L0/L4/L5: Ultra-low-power variants
  • STM32G0/G4: Newer mainstream lines, positioned as successors to F0/F3
  • STM32U5: Ultra-low-power with TrustZone security
Choosing an STM32:
  • Cost-sensitive: STM32F0, STM32G0
  • General purpose: STM32F4 (best ecosystem)
  • Low power: STM32L4, STM32U5
  • High performance: STM32H7
  • Learning: STM32F4 Discovery, Nucleo boards

Development Environment Setup

STM32CubeIDE Installation

STM32CubeIDE is ST's free, all-in-one IDE combining Eclipse, the ARM GCC toolchain, and the STM32CubeMX code generator.

# Download from st.com/stm32cubeide (Windows/Mac/Linux)
# Includes:
# - Eclipse-based IDE
# - ARM GCC toolchain
# - ST-Link debugger support
# - STM32CubeMX project configurator

# Alternative: PlatformIO in VS Code
pip install platformio
# Then install STM32 platform in PlatformIO

Project Structure

MySTM32Project/
├── Core/
│   ├── Inc/              # Header files
│   │   ├── main.h
│   │   └── stm32f4xx_it.h
│   └── Src/              # Source files
│       ├── main.c
│       ├── stm32f4xx_it.c   # Interrupt handlers
│       └── system_stm32f4xx.c
├── Drivers/
│   ├── CMSIS/            # ARM CMSIS headers
│   └── STM32F4xx_HAL_Driver/  # HAL library
├── STM32F446RETX_FLASH.ld   # Linker script
└── Makefile or .project

HAL Programming Fundamentals

The Hardware Abstraction Layer (HAL) provides portable APIs across STM32 families. While it adds some overhead, it dramatically speeds development.

STM32 Software Layers

  • HAL (High-Level): Full abstraction, portable, callback-based
  • LL (Low-Level): Thin wrappers, close to hardware, faster
  • Direct Register: Maximum control, minimum overhead
// HAL initialization pattern
int main(void) {
    // 1. Reset peripherals, initialize Flash and SysTick
    HAL_Init();
    
    // 2. Configure system clock (generated by CubeMX)
    SystemClock_Config();
    
    // 3. Initialize peripherals
    MX_GPIO_Init();
    MX_USART2_UART_Init();
    MX_TIM2_Init();
    
    // 4. Application loop
    while (1) {
        // Your code here
        HAL_Delay(1000);  // Uses SysTick
    }
}

// HAL callback pattern (weak functions to override)
void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin) {
    if (GPIO_Pin == USER_BUTTON_Pin) {
        HAL_GPIO_TogglePin(LED_GPIO_Port, LED_Pin);
    }
}
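
To make the "Direct Register" layer more concrete, here is a small host-side sketch of how the GPIO bit set/reset register (BSRR) behaves. `mock_gpio_t` and the helper functions are hypothetical stand-ins introduced for illustration; they are not the real `GPIO_TypeDef`. The sketch assumes the usual STM32 layout: set bits in BSRR[15:0], reset bits in BSRR[31:16], with set taking priority when both are written.

```c
#include <stdint.h>

// Host-side stand-in for the two GPIO registers used here (not the real GPIO_TypeDef).
typedef struct {
    uint32_t ODR;   // Output data register
    uint32_t BSRR;  // Bit set/reset register (write-only on real hardware)
} mock_gpio_t;

// Value to write to BSRR to drive `pin` (0-15) high.
uint32_t bsrr_set_mask(unsigned pin) {
    return 1u << pin;
}

// Value to write to BSRR to drive `pin` (0-15) low.
uint32_t bsrr_reset_mask(unsigned pin) {
    return 1u << (pin + 16u);
}

// Emulates the hardware's reaction to a BSRR write: reset bits clear ODR bits,
// set bits set them, and set wins if both are requested for the same pin.
void mock_gpio_write_bsrr(mock_gpio_t *gpio, uint32_t value) {
    uint32_t set = value & 0xFFFFu;
    uint32_t reset = (value >> 16) & 0xFFFFu;
    gpio->ODR = (gpio->ODR & ~reset) | set;
    gpio->BSRR = value;
}
```

On a real target the equivalent would be a single store such as `GPIOA->BSRR = bsrr_set_mask(5);`, assuming a device header that exposes BSRR as one 32-bit register. Because the write is atomic, no read-modify-write of ODR is needed, which is where the lower overhead comes from.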

GPIO Configuration & Control

General Purpose Input/Output (GPIO) pins are the fundamental interface between your MCU and the external world.

GPIO Modes

STM32 GPIO Modes:
  • Input: Floating, Pull-up, Pull-down
  • Output: Push-Pull, Open-Drain
  • Alternate Function: UART, SPI, I2C, PWM, etc.
  • Analog: ADC/DAC input
// GPIO configuration with HAL
void MX_GPIO_Init(void) {
    GPIO_InitTypeDef GPIO_InitStruct = {0};
    
    // Enable GPIO clocks (every port configured below needs its clock on)
    __HAL_RCC_GPIOA_CLK_ENABLE();
    __HAL_RCC_GPIOC_CLK_ENABLE();
    
    // Configure PA5 as output (LED on Nucleo boards)
    GPIO_InitStruct.Pin = GPIO_PIN_5;
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;  // Push-Pull
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
    HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
    
    // Configure PC13 as input with interrupt (User button)
    GPIO_InitStruct.Pin = GPIO_PIN_13;
    GPIO_InitStruct.Mode = GPIO_MODE_IT_FALLING;  // Interrupt on falling edge
    GPIO_InitStruct.Pull = GPIO_PULLUP;
    HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
    
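    // Enable EXTI interrupt (EXTI15_10_IRQHandler in stm32f4xx_it.c must call
    // HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_13) for the HAL callback to run)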
    HAL_NVIC_SetPriority(EXTI15_10_IRQn, 0, 0);
    HAL_NVIC_EnableIRQ(EXTI15_10_IRQn);
}

// GPIO control
HAL_GPIO_WritePin(GPIOA, GPIO_PIN_5, GPIO_PIN_SET);    // Set high
HAL_GPIO_WritePin(GPIOA, GPIO_PIN_5, GPIO_PIN_RESET);  // Set low
HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);                 // Toggle

GPIO_PinState state = HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13);  // Read
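
Under the HAL, the four GPIO modes listed above end up as a two-bit field per pin in the port's MODER register. The sketch below shows that encoding on a plain integer so it can run on a host; `pin_mode_t` and `moder_with_pin_mode` are hypothetical names, and the encoding (00 input, 01 output, 10 alternate function, 11 analog) is assumed to match the STM32F4 layout.

```c
#include <stdint.h>

// Two-bit mode codes as used in the STM32F4 GPIOx_MODER register.
typedef enum {
    PIN_MODE_INPUT = 0u,
    PIN_MODE_OUTPUT = 1u,
    PIN_MODE_ALTERNATE = 2u,
    PIN_MODE_ANALOG = 3u
} pin_mode_t;

// Return a copy of `moder` with the field for `pin` (0-15) set to `mode`.
// All other pins keep their existing configuration.
uint32_t moder_with_pin_mode(uint32_t moder, unsigned pin, pin_mode_t mode) {
    unsigned shift = pin * 2u;
    moder &= ~(3u << shift);             // Clear the pin's two mode bits
    moder |= ((uint32_t)mode << shift);  // Insert the new mode
    return moder;
}
```

On hardware this would be used as `GPIOA->MODER = moder_with_pin_mode(GPIOA->MODER, 5, PIN_MODE_OUTPUT);`, which is roughly the part of `HAL_GPIO_Init` that selects the mode.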

Timers & PWM

STM32 timers are incredibly versatile—used for delays, PWM generation, input capture, and event counting.

Timer Types

  • Basic (TIM6, TIM7): Simple counting, DAC triggers
  • General Purpose (TIM2-5): 4 channels, capture/compare, PWM
  • Advanced (TIM1, TIM8): Motor control, complementary outputs, break input
// Timer-based delay (using TIM2)
TIM_HandleTypeDef htim2;

void TIM2_Init(void) {
    __HAL_RCC_TIM2_CLK_ENABLE();
    
    htim2.Instance = TIM2;
    htim2.Init.Prescaler = 84 - 1;      // Assumes an 84MHz timer clock: 84MHz / 84 = 1MHz (1µs ticks)
    htim2.Init.CounterMode = TIM_COUNTERMODE_UP;
    htim2.Init.Period = 0xFFFFFFFF;     // 32-bit timer
    htim2.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
    HAL_TIM_Base_Init(&htim2);
    HAL_TIM_Base_Start(&htim2);
}

void delay_us(uint32_t us) {
    uint32_t start = __HAL_TIM_GET_COUNTER(&htim2);
    while ((__HAL_TIM_GET_COUNTER(&htim2) - start) < us);
}
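
The subtraction in `delay_us` stays correct even when the free-running counter wraps from 0xFFFFFFFF back to 0, because unsigned arithmetic in C is modulo 2^32. A minimal host-side sketch of that reasoning follows; `ticks_elapsed` and `delay_expired` are hypothetical helper names, and a 32-bit up-counter is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

// Ticks elapsed from `start` to `now` on a free-running 32-bit up-counter.
// Unsigned subtraction wraps modulo 2^32, so this holds across one overflow.
uint32_t ticks_elapsed(uint32_t start, uint32_t now) {
    return now - start;
}

// True once at least `duration` ticks have passed since `start`.
bool delay_expired(uint32_t start, uint32_t now, uint32_t duration) {
    return ticks_elapsed(start, now) >= duration;
}
```

Comparing `now >= start + duration` instead would misbehave near the wrap point, which is why the elapsed-time form is generally preferred.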

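// PWM Generation (LED brightness control)
// Note: the TIM3_CH1 output pin must also be configured in alternate-function mode
TIM_HandleTypeDef htim3;

void PWM_Init(void) {
    TIM_OC_InitTypeDef sConfigOC = {0};
    
    __HAL_RCC_TIM3_CLK_ENABLE();        // Timer clock must be on before init
    
    htim3.Instance = TIM3;
    htim3.Init.Prescaler = 84 - 1;      // 1µs resolution (assumes 84MHz timer clock)
    htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
    htim3.Init.Period = 1000 - 1;       // 1000 ticks per period = 1kHz PWM frequency
    HAL_TIM_PWM_Init(&htim3);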
    
    sConfigOC.OCMode = TIM_OCMODE_PWM1;
    sConfigOC.Pulse = 500;              // 50% duty cycle
    sConfigOC.OCPolarity = TIM_OCPOLARITY_HIGH;
    HAL_TIM_PWM_ConfigChannel(&htim3, &sConfigOC, TIM_CHANNEL_1);
    
    HAL_TIM_PWM_Start(&htim3, TIM_CHANNEL_1);
}

// Change duty cycle at runtime
void set_brightness(uint16_t duty) {  // 0-1000
    __HAL_TIM_SET_COMPARE(&htim3, TIM_CHANNEL_1, duty);
}
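
The prescaler and period values above follow from the relation PWM frequency = timer clock / ((PSC + 1) × (ARR + 1)). The sketch below works through that arithmetic on a host; `pwm_timing_t`, `pwm_timing`, and `duty_to_compare` are hypothetical names, and the inputs are assumed to be nonzero and to divide evenly.

```c
#include <stdint.h>

// Register values for a PWM time base (hypothetical helper type).
typedef struct {
    uint32_t prescaler;  // Value for the PSC register
    uint32_t period;     // Value for the ARR register
} pwm_timing_t;

// Derive PSC and ARR from the timer input clock, the desired counter tick
// rate, and the desired PWM frequency. Assumes nonzero, evenly divisible inputs.
pwm_timing_t pwm_timing(uint32_t timer_clk_hz, uint32_t tick_hz, uint32_t pwm_hz) {
    pwm_timing_t t;
    t.prescaler = timer_clk_hz / tick_hz - 1u;
    t.period = tick_hz / pwm_hz - 1u;
    return t;
}

// Convert a duty cycle in percent (clamped to 0-100) into a compare value
// for a timer whose ARR is `period`.
uint32_t duty_to_compare(uint32_t period, uint32_t duty_percent) {
    if (duty_percent > 100u) {
        duty_percent = 100u;
    }
    return ((period + 1u) * duty_percent) / 100u;
}
```

With an 84 MHz timer clock, a 1 MHz tick, and a 1 kHz target, this yields the same 83 and 999 used in `PWM_Init`, and a 50% duty cycle maps to a compare value of 500.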

UART Communication

UART is the simplest serial protocol—essential for debugging, console output, and communication with sensors and modules.

// UART initialization (115200 baud, 8N1)
UART_HandleTypeDef huart2;

void MX_USART2_UART_Init(void) {
    huart2.Instance = USART2;
    huart2.Init.BaudRate = 115200;
    huart2.Init.WordLength = UART_WORDLENGTH_8B;
    huart2.Init.StopBits = UART_STOPBITS_1;
    huart2.Init.Parity = UART_PARITY_NONE;
    huart2.Init.Mode = UART_MODE_TX_RX;
    huart2.Init.HwFlowCtl = UART_HWCONTROL_NONE;
    HAL_UART_Init(&huart2);
}

// Blocking transmit (strlen requires #include <string.h>)
char msg[] = "Hello STM32!\r\n";
HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);

// Blocking receive
uint8_t rx_buffer[64];
HAL_UART_Receive(&huart2, rx_buffer, 10, 1000);  // 1s timeout

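// Interrupt-driven receive (non-blocking)
uint8_t rx_byte;  // Single-byte landing buffer for the receive interrupt

void start_uart_rx(void) {
    HAL_UART_Receive_IT(&huart2, &rx_byte, 1);
}

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) {
    if (huart == &huart2) {
        process_byte(rx_byte);  // Application-defined; keep it short, this runs in interrupt context
        HAL_UART_Receive_IT(&huart2, &rx_byte, 1);  // Restart
    }
}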

// printf redirect (for debugging)
int _write(int file, char *ptr, int len) {
    HAL_UART_Transmit(&huart2, (uint8_t*)ptr, len, HAL_MAX_DELAY);
    return len;
}
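
One common way to keep the receive callback short is to have it push each byte into a ring buffer that the main loop drains later. The following is a minimal single-producer, single-consumer sketch that runs on a host; `rx_ring_t`, `rx_ring_push`, and `rx_ring_pop` are hypothetical names, and the buffer size is assumed to be a power of two. One slot is kept empty to distinguish full from empty, so the usable capacity is 63 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_RING_SIZE 64u  // Must be a power of two for the index mask below

// Single-producer (ISR) / single-consumer (main loop) byte queue.
typedef struct {
    volatile uint16_t head;  // Next write index, advanced by the producer
    volatile uint16_t tail;  // Next read index, advanced by the consumer
    uint8_t data[RX_RING_SIZE];
} rx_ring_t;

// Store one byte; returns false (and drops the byte) if the buffer is full.
bool rx_ring_push(rx_ring_t *rb, uint8_t byte) {
    uint16_t next = (uint16_t)((rb->head + 1u) & (RX_RING_SIZE - 1u));
    if (next == rb->tail) {
        return false;
    }
    rb->data[rb->head] = byte;
    rb->head = next;
    return true;
}

// Fetch the oldest byte; returns false if the buffer is empty.
bool rx_ring_pop(rx_ring_t *rb, uint8_t *out) {
    if (rb->head == rb->tail) {
        return false;
    }
    *out = rb->data[rb->tail];
    rb->tail = (uint16_t)((rb->tail + 1u) & (RX_RING_SIZE - 1u));
    return true;
}
```

In this arrangement, `process_byte` could simply call `rx_ring_push`, while the main loop calls `rx_ring_pop` until it returns false and parses the bytes outside interrupt context.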

ADC & DAC Peripherals

ADC (Analog-to-Digital Converter)

STM32 ADCs convert analog voltages to digital values—essential for reading sensors like temperature, light, and potentiometers.

// ADC configuration (12-bit, single conversion)
ADC_HandleTypeDef hadc1;

void MX_ADC1_Init(void) {
    ADC_ChannelConfTypeDef sConfig = {0};
    
    hadc1.Instance = ADC1;
    hadc1.Init.ClockPrescaler = ADC_CLOCK_SYNC_PCLK_DIV4;
    hadc1.Init.Resolution = ADC_RESOLUTION_12B;
    hadc1.Init.ScanConvMode = DISABLE;
    hadc1.Init.ContinuousConvMode = DISABLE;
    hadc1.Init.ExternalTrigConvEdge = ADC_EXTERNALTRIGCONVEDGE_NONE;
    hadc1.Init.DataAlign = ADC_DATAALIGN_RIGHT;
    hadc1.Init.NbrOfConversion = 1;
    HAL_ADC_Init(&hadc1);
    
    // Configure channel (PA0 = ADC1_IN0)
    sConfig.Channel = ADC_CHANNEL_0;
    sConfig.Rank = 1;
    sConfig.SamplingTime = ADC_SAMPLETIME_84CYCLES;
    HAL_ADC_ConfigChannel(&hadc1, &sConfig);
}

// Read ADC value
uint16_t read_adc(void) {
    HAL_ADC_Start(&hadc1);
    HAL_ADC_PollForConversion(&hadc1, HAL_MAX_DELAY);
    return HAL_ADC_GetValue(&hadc1);  // 0-4095 for 12-bit
}

// Convert to voltage (assuming 3.3V reference)
float read_voltage(void) {
    return (read_adc() * 3.3f) / 4095.0f;
}
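
Where floating point is expensive (for example on a Cortex-M0 without an FPU), the same conversion can be done in integer millivolts, and a simple average helps smooth noisy readings. This is a host-runnable sketch; `adc_raw_to_millivolts` and `adc_average` are hypothetical names, and a 12-bit right-aligned result is assumed.

```c
#include <stdint.h>

#define ADC_FULL_SCALE 4095u  // Maximum code for a 12-bit conversion

// Convert a raw 12-bit reading to millivolts using integer math with rounding.
// `vref_mv` is the reference voltage in millivolts (for example 3300).
uint32_t adc_raw_to_millivolts(uint16_t raw, uint32_t vref_mv) {
    if (raw > ADC_FULL_SCALE) {
        raw = ADC_FULL_SCALE;  // Guard against out-of-range input
    }
    return ((uint32_t)raw * vref_mv + ADC_FULL_SCALE / 2u) / ADC_FULL_SCALE;
}

// Arithmetic mean of `count` samples; returns 0 for an empty set.
uint16_t adc_average(const uint16_t *samples, uint32_t count) {
    if (count == 0u) {
        return 0u;
    }
    uint32_t sum = 0u;
    for (uint32_t i = 0u; i < count; i++) {
        sum += samples[i];
    }
    return (uint16_t)(sum / count);
}
```

A typical use would be to collect several `read_adc()` results into an array, average them, and then convert the mean to millivolts once.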

DAC (Digital-to-Analog Converter)

// DAC configuration
DAC_HandleTypeDef hdac;

void MX_DAC_Init(void) {
    DAC_ChannelConfTypeDef sConfig = {0};
    
    hdac.Instance = DAC;
    HAL_DAC_Init(&hdac);
    
    sConfig.DAC_Trigger = DAC_TRIGGER_NONE;
    sConfig.DAC_OutputBuffer = DAC_OUTPUTBUFFER_ENABLE;
    HAL_DAC_ConfigChannel(&hdac, &sConfig, DAC_CHANNEL_1);
    
    HAL_DAC_Start(&hdac, DAC_CHANNEL_1);
}

// Output voltage (0-3.3V mapped to 0-4095)
void set_dac_voltage(float voltage) {
    uint32_t value = (uint32_t)((voltage / 3.3f) * 4095.0f);
    HAL_DAC_SetValue(&hdac, DAC_CHANNEL_1, DAC_ALIGN_12B_R, value);
}
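
Note that `set_dac_voltage` does not guard against requests above the reference voltage, which would produce a code larger than 4095. A hedged sketch of a clamped, integer-only conversion is shown below; `dac_code_from_millivolts` is a hypothetical helper, and a 12-bit right-aligned DAC is assumed.

```c
#include <stdint.h>

// Map a target voltage in millivolts to a 12-bit DAC code (0-4095),
// clamping requests above the reference and rounding to the nearest code.
uint32_t dac_code_from_millivolts(uint32_t target_mv, uint32_t vref_mv) {
    if (vref_mv == 0u) {
        return 0u;  // Avoid division by zero on a bad reference value
    }
    if (target_mv > vref_mv) {
        target_mv = vref_mv;  // Clamp to full scale
    }
    return (target_mv * 4095u + vref_mv / 2u) / vref_mv;
}
```

The result could then be passed directly as the last argument of `HAL_DAC_SetValue` with `DAC_ALIGN_12B_R`.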

DMA (Direct Memory Access)

DMA transfers data between memory and peripherals without CPU involvement—crucial for high-throughput applications like audio, ADC streaming, and communication.

Why Use DMA?
  • CPU is free to do other work during transfers
  • Higher throughput than interrupt-driven I/O
  • Essential for audio, video, high-speed comms
  • Lower latency for peripheral-to-memory transfers
// DMA-based UART transmit
DMA_HandleTypeDef hdma_usart2_tx;

void DMA_Init(void) {
    __HAL_RCC_DMA1_CLK_ENABLE();
    
    hdma_usart2_tx.Instance = DMA1_Stream6;
    hdma_usart2_tx.Init.Channel = DMA_CHANNEL_4;
    hdma_usart2_tx.Init.Direction = DMA_MEMORY_TO_PERIPH;
    hdma_usart2_tx.Init.PeriphInc = DMA_PINC_DISABLE;
    hdma_usart2_tx.Init.MemInc = DMA_MINC_ENABLE;
    hdma_usart2_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_usart2_tx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
    hdma_usart2_tx.Init.Mode = DMA_NORMAL;
    hdma_usart2_tx.Init.Priority = DMA_PRIORITY_LOW;
    HAL_DMA_Init(&hdma_usart2_tx);
    
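    __HAL_LINKDMA(&huart2, hdmatx, hdma_usart2_tx);
    
    // The completion callback only fires if the DMA stream interrupt is enabled
    // and DMA1_Stream6_IRQHandler calls HAL_DMA_IRQHandler(&hdma_usart2_tx)
    HAL_NVIC_SetPriority(DMA1_Stream6_IRQn, 0, 0);
    HAL_NVIC_EnableIRQ(DMA1_Stream6_IRQn);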
}

// Non-blocking transmit with DMA
uint8_t large_buffer[1024];
HAL_UART_Transmit_DMA(&huart2, large_buffer, 1024);

// Callback when DMA transfer complete
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart) {
    // Transfer complete, start next or signal completion
}

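// DMA-based ADC continuous conversion (circular mode)
// Requires hadc1 configured with ContinuousConvMode = ENABLE and
// DMAContinuousRequests = ENABLE, plus a linked DMA stream in DMA_CIRCULAR mode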
uint16_t adc_buffer[100];

void start_adc_dma(void) {
    HAL_ADC_Start_DMA(&hadc1, (uint32_t*)adc_buffer, 100);
}

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef* hadc) {
    // Buffer full, process or signal
}
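
With a circular DMA buffer, a common pattern is to process one half while the DMA fills the other, using the half-complete and complete callbacks as signals. The sketch below simulates that flow on a host; `adc_stream_t` and its functions are hypothetical names. On a target, the two notify functions would be called from `HAL_ADC_ConvHalfCpltCallback` and `HAL_ADC_ConvCpltCallback` respectively.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_BUF_LEN 100u
#define ADC_HALF_LEN (ADC_BUF_LEN / 2u)

// Circular sample buffer plus flags set from the DMA callbacks.
typedef struct {
    uint16_t samples[ADC_BUF_LEN];
    volatile bool first_half_ready;
    volatile bool second_half_ready;
} adc_stream_t;

// Call when the DMA has filled samples[0 .. ADC_HALF_LEN-1].
void adc_stream_on_half_complete(adc_stream_t *s) {
    s->first_half_ready = true;
}

// Call when the DMA has filled samples[ADC_HALF_LEN .. ADC_BUF_LEN-1].
void adc_stream_on_complete(adc_stream_t *s) {
    s->second_half_ready = true;
}

// Main-loop side: if a half is ready, clear its flag, write the mean of that
// half to `mean_out`, and return true. Returns false when nothing is pending.
bool adc_stream_poll_mean(adc_stream_t *s, uint16_t *mean_out) {
    const uint16_t *half;
    if (s->first_half_ready) {
        s->first_half_ready = false;
        half = &s->samples[0];
    } else if (s->second_half_ready) {
        s->second_half_ready = false;
        half = &s->samples[ADC_HALF_LEN];
    } else {
        return false;
    }
    uint32_t sum = 0u;
    for (size_t i = 0; i < ADC_HALF_LEN; i++) {
        sum += half[i];
    }
    *mean_out = (uint16_t)(sum / ADC_HALF_LEN);
    return true;
}
```

The main loop has to finish each half before the DMA wraps around to it again; if processing is slower than that, samples are silently overwritten.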

Conclusion & Next Steps

You've now covered the core STM32 development skills: understanding the ARM Cortex-M architecture and configuring GPIO, timers, UART, ADC/DAC, and DMA. These fundamentals carry over to other STM32 families and, in broad terms, to most ARM-based microcontrollers.

Key Takeaways:
  • Cortex-M0/M3/M4/M7 differ in performance and features (DSP, FPU)
  • STM32CubeIDE + HAL libraries accelerate development
  • GPIO modes: Input, Output, Alternate Function, Analog
  • Timers are versatile: delays, PWM, capture, counting
  • DMA offloads data transfers from the CPU

In Part 3, we'll dive into Real-Time Operating Systems (RTOS) with FreeRTOS and Zephyr—essential for complex multi-tasking embedded applications.
