Series Overview: This is Part 14 of our 18-part STM32 Unleashed series. We have covered GPIO, UART, timers, ADC, SPI, I2C, DMA, interrupts, low-power modes, RTC, CAN, and USB. Now we add an RTOS to orchestrate everything.
| Part | Title | Topics | Status |
|---|---|---|---|
| 1 | Architecture & CubeMX Setup | STM32 family, clock tree, HAL vs LL, CubeMX workflow, first project | Completed |
| 2 | GPIO & Button Debounce | GPIO modes, pull-up/down, EXTI, software debounce, HAL_GPIO_ReadPin | Completed |
| 3 | UART Communication | Polling, interrupt, DMA modes, printf retargeting, ring buffers | Completed |
| 4 | Timers, PWM & Input Capture | TIM basics, PWM generation, input capture, encoder mode | Completed |
| 5 | ADC & DAC | Single/continuous conversion, DMA, injected channels, DAC waveforms | Completed |
| 6 | SPI Protocol | SPI master/slave, full-duplex, DMA transfers, sensor drivers | Completed |
| 7 | I2C Protocol | I2C master, 7/10-bit addressing, DMA, multi-master, error handling | Completed |
| 8 | DMA & Memory Efficiency | DMA streams, circular mode, memory-to-memory, zero-copy patterns | Completed |
| 9 | Interrupt Management & NVIC | Priority grouping, preemption, ISR design, HAL callbacks, latency | Completed |
| 10 | Low-Power Modes | Sleep, Stop, Standby modes, RTC wakeup, LP UART, power profiling | Completed |
| 11 | RTC & Calendar | RTC configuration, alarms, backup registers, calendar subseconds | Completed |
| 12 | CAN Bus | FDCAN/bxCAN, filters, message frames, error handling, automotive use | Completed |
| 13 | USB CDC Virtual COM Port | USB FS/HS, CDC class, virtual serial, control transfers, descriptors | Completed |
| 14 | FreeRTOS Integration | Tasks, queues, semaphores, mutexes, CMSIS-RTOS2 wrapper, stack sizing | You Are Here |
| 15 | Bootloader Development | Custom IAP bootloader, UART/USB DFU, flash programming, jump-to-app | |
| 16 | External Storage: SD & QSPI Flash | FATFS on SD card, QSPI NOR flash, memory-mapped execution, wear levelling | |
| 17 | Ethernet & TCP/IP Stack | LwIP integration, DHCP, TCP server, HTTP, MQTT, Ethernet DMA descriptors | |
| 18 | Production Readiness | Watchdog, HardFault handler, flash option bytes, code signing, CI/CD | |
Why FreeRTOS on STM32
The superloop — a while(1) loop calling functions in sequence — is the natural starting point for any embedded project. It is predictable, debuggable, and has zero overhead. But as applications grow, the superloop reveals fundamental limitations that no amount of careful ordering or clever flag management can fully overcome.
The core problem is priority. In a superloop, every function call is implicitly the same priority. If your UART receive handler takes 2 ms and your motor control loop needs to run every 1 ms, you have a problem. You can partially work around this with interrupt service routines, but ISRs should be short — and the moment you start doing real work in ISRs, you have reinvented a poorly-designed RTOS without the documentation.
FreeRTOS fundamentals: FreeRTOS is a real-time operating system kernel for embedded systems. The core abstraction is the task: a function that runs as if it owns the CPU, but actually shares it with other tasks through preemptive scheduling. Each task has a Task Control Block (TCB) containing its stack pointer, state, priority, and name, plus runtime statistics when configGENERATE_RUN_TIME_STATS is enabled. The scheduler runs on every tick interrupt and whenever a task blocks, yields, or is unblocked by an API call; each time, it examines the ready list and context-switches to the highest-priority runnable task.
CubeMX integrates FreeRTOS seamlessly: enable it under Middleware, define tasks in the UI, and CubeMX generates freertos.c containing task creation calls and osKernelStart(). The heap is sized through configTOTAL_HEAP_SIZE in FreeRTOSConfig.h.
The CMSIS-RTOS2 wrapper provides a standardised API over FreeRTOS (and other kernels): osThreadNew() instead of xTaskCreate(), osMutexNew() instead of xSemaphoreCreateMutex(). CubeMX defaults to the CMSIS-RTOS2 layer, but you can use native FreeRTOS calls directly — and this article covers both so you understand what the wrapper is hiding.
| Aspect | Superloop | FreeRTOS | When to Use |
|---|---|---|---|
| Complexity | Low: single execution path | Higher: context switching, TCBs, heap | Superloop for simple, single-function devices |
| Jitter | High: tasks block each other | Low: preemption isolates tasks | FreeRTOS when timing guarantees are needed |
| Stack overhead | Single stack, shared by all code | Separate stack per task (configurable) | Superloop on 4–8 KB SRAM parts |
| Debugging | Easy: one call stack | Harder: multiple stacks, race conditions | Enable FreeRTOS runtime stats for production |
| Code size | Minimal | ~5–10 KB Flash, ~1–2 KB RAM for kernel | FreeRTOS when Flash > 32 KB |
| Inter-task comms | Shared globals (unsafe) | Queues, semaphores, mutexes (thread-safe) | FreeRTOS whenever data flows between contexts |
/* ---------------------------------------------------------------
 * FreeRTOS task creation: xTaskCreate and task function prototype
 * --------------------------------------------------------------- */
#include "main.h"      /* CubeMX project header: pulls in the HAL */
#include "FreeRTOS.h"
#include "task.h"

/* UART handle generated by CubeMX */
extern UART_HandleTypeDef huart2;

/* Task handles; pass NULL to xTaskCreate() instead if the task is
 * never referenced later */
static TaskHandle_t xLedTaskHandle = NULL;
static TaskHandle_t xUartTaskHandle = NULL;

/* Task function prototype: void name(void *pvParameters) */
void vLedTask(void *pvParameters);
void vUartTask(void *pvParameters);

int main(void)
{
    BaseType_t xStatus;

    HAL_Init();
    SystemClock_Config();            /* configures SYSCLK, AHB, APB */
    MX_GPIO_Init();
    MX_USART2_UART_Init();

    /* Create LED blink task
     * xTaskCreate(pvTaskCode, pcName, usStackDepth_words,
     *             pvParameters, uxPriority, pxCreatedTask) */
    xStatus = xTaskCreate(vLedTask,          /* task function        */
                          "LED",             /* name (debug only)    */
                          128,               /* stack depth in words */
                          NULL,              /* parameter passed in  */
                          1,                 /* priority (1 = low)   */
                          &xLedTaskHandle);  /* task handle out      */
    configASSERT(xStatus == pdPASS);         /* fails if the heap is too small */

    xStatus = xTaskCreate(vUartTask, "UART", 256, NULL, 2, &xUartTaskHandle);
    configASSERT(xStatus == pdPASS);
    (void)xStatus;

    vTaskStartScheduler();           /* never returns on success */
    while (1) {}                     /* should never reach here */
}

/* LED task: toggles LED every 500 ms */
void vLedTask(void *pvParameters)
{
    (void)pvParameters;
    for (;;)
    {
        HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
        vTaskDelay(pdMS_TO_TICKS(500));      /* blocks, yields CPU */
    }
}

/* UART task: prints heartbeat message every 1 s */
void vUartTask(void *pvParameters)
{
    (void)pvParameters;
    static const char msg[] = "Heartbeat\r\n";
    for (;;)
    {
        /* sizeof(msg) - 1 excludes the terminating NUL (11 bytes) */
        HAL_UART_Transmit(&huart2, (uint8_t *)msg, sizeof(msg) - 1, HAL_MAX_DELAY);
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}
Stack depth is in words, not bytes. On a 32-bit Cortex-M, one word = 4 bytes. A stack depth of 128 allocates 512 bytes. Always convert: if you want a 1 KB stack, pass 256.
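The words-versus-bytes conversion is easy to get wrong under time pressure, so it can help to keep it in one place. The helpers below are a minimal sketch, assuming a 32-bit Cortex-M port where one stack word is 4 bytes; the names `stack_bytes_to_words` and `stack_words_to_bytes` are hypothetical and not part of FreeRTOS.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit Cortex-M port, where one stack word is 4 bytes. */
#define STACK_WORD_BYTES 4u

/* Convert a stack budget in bytes to the word count xTaskCreate() expects.
 * Rounds up so the task never gets less than was requested. */
static inline uint32_t stack_bytes_to_words(uint32_t bytes)
{
    return (bytes + STACK_WORD_BYTES - 1u) / STACK_WORD_BYTES;
}

/* Inverse: how many bytes a given stack depth really allocates. */
static inline uint32_t stack_words_to_bytes(uint32_t words)
{
    assert(words <= UINT32_MAX / STACK_WORD_BYTES);  /* guard against overflow */
    return words * STACK_WORD_BYTES;
}
```

With these in place, a call site could read `xTaskCreate(vUartTask, "UART", stack_bytes_to_words(1024), ...)`, which makes the intended size visible in the source.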
HAL Timebase Conflict
This is the most common silent killer in STM32+FreeRTOS projects. The HAL library uses SysTick to implement HAL_GetTick() and HAL_Delay(). FreeRTOS also uses SysTick as its RTOS tick source. When both try to configure SysTick, one configuration silently overwrites the other.
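The failure mode is insidious: HAL_Delay() can work fine early in main() and then hang forever later. The reason is that HAL_Delay() polls HAL_GetTick() in a busy-wait loop, so it never returns once the HAL tick counter stops advancing. With a shared SysTick that can happen in several ways: the FreeRTOS port's SysTick handler replaces the HAL one, so HAL_IncTick() is never called; the tick interrupt is masked (FreeRTOS API calls made before vTaskStartScheduler() leave interrupts masked until the scheduler starts); or HAL_Delay() is called from an ISR whose priority is equal to or higher than that of SysTick, which FreeRTOS runs at the lowest priority.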
The solution is to move the HAL timebase to a basic timer — TIM6 is ideal because it has no input capture or output compare channels that might conflict with application code. In CubeMX: System Core → SYS → Timebase Source → TIM6.
NVIC priority rule: FreeRTOS defines configMAX_SYSCALL_INTERRUPT_PRIORITY — any ISR that calls FreeRTOS API functions must have a numerical priority value greater than or equal to this (lower urgency). The SysTick used by FreeRTOS itself must have the lowest priority of all ISRs in the system. When you move HAL timebase to TIM6, assign TIM6 a higher urgency (lower numerical value) than SysTick so HAL_GetTick() always advances.
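The priority rule above can be turned into a small check. The sketch below assumes a part with 4 implemented priority bits, all of them used for preemption, and a maximum syscall priority level of 5; both values are assumptions for illustration and must match your own FreeRTOSConfig.h. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed configuration (check FreeRTOSConfig.h on your project). */
#define PRIO_BITS             4u   /* implemented NVIC priority bits     */
#define MAX_SYSCALL_PRIO      5u   /* most urgent level allowed to call  */
                                   /* FreeRTOS ...FromISR() functions    */
#define LOWEST_PRIO          15u   /* least urgent level on a 4-bit part */

/* Logical level (0..15) -> raw 8-bit value as stored in the NVIC register. */
static inline uint8_t prio_to_raw(uint8_t level)
{
    assert(level <= LOWEST_PRIO);
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/* An ISR may call FromISR APIs only if its level is numerically
 * greater than or equal to the maximum syscall level. */
static inline bool isr_may_call_freertos(uint8_t level)
{
    return level >= MAX_SYSCALL_PRIO && level <= LOWEST_PRIO;
}

/* Lower number = higher urgency: a preempts b only if a < b. */
static inline bool preempts(uint8_t a, uint8_t b)
{
    return a < b;
}
```

Under these assumptions, an ISR at level 6 may call FreeRTOS APIs and also preempts the kernel tick at level 15, while an ISR at level 2 must not touch the kernel at all.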
/* ---------------------------------------------------------------
 * TIM6 timebase: HAL_InitTick() override using basic timer TIM6
 * Generated by CubeMX when Timebase Source = TIM6
 * In: stm32f4xx_hal_timebase_tim.c
 * --------------------------------------------------------------- */
#include "stm32f4xx_hal.h"

extern TIM_HandleTypeDef htim6;

/* Called by HAL_Init() to configure the tick time base.
 * We override the weak default which uses SysTick. */
HAL_StatusTypeDef HAL_InitTick(uint32_t TickPriority)
{
    RCC_ClkInitTypeDef clkconfig;
    uint32_t uwTimclock, uwAPB1Prescaler;
    uint32_t uwPrescalerValue;
    uint32_t pFLatency;

    /* Enable TIM6 clock */
    __HAL_RCC_TIM6_CLK_ENABLE();

    /* Get clock frequencies */
    HAL_RCC_GetClockConfig(&clkconfig, &pFLatency);
    uwAPB1Prescaler = clkconfig.APB1CLKDivider;

    /* Timer clock is PCLK1 when the APB1 divider is 1, else 2 x PCLK1 */
    if (uwAPB1Prescaler == RCC_HCLK_DIV1)
        uwTimclock = HAL_RCC_GetPCLK1Freq();
    else
        uwTimclock = 2UL * HAL_RCC_GetPCLK1Freq();

    /* 1 MHz timer clock: prescaler = TimClock / 1000000 - 1 */
    uwPrescalerValue = (uint32_t)((uwTimclock / 1000000U) - 1U);

    htim6.Instance = TIM6;
    htim6.Init.Period = (1000000U / 1000U) - 1U;   /* 1 ms period */
    htim6.Init.Prescaler = uwPrescalerValue;
    htim6.Init.ClockDivision = 0;
    htim6.Init.CounterMode = TIM_COUNTERMODE_UP;
    htim6.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE;

    if (HAL_TIM_Base_Init(&htim6) != HAL_OK)
        return HAL_ERROR;

    /* Configure TIM6 update interrupt at TickPriority */
    HAL_NVIC_SetPriority(TIM6_DAC_IRQn, TickPriority, 0U);
    HAL_NVIC_EnableIRQ(TIM6_DAC_IRQn);

    return HAL_TIM_Base_Start_IT(&htim6);
}

/* TIM6 interrupt handler: dispatches to the HAL, which calls the
 * period-elapsed callback below */
void TIM6_DAC_IRQHandler(void)
{
    HAL_TIM_IRQHandler(&htim6);
}

/* Increments uwTick once per TIM6 update (every 1 ms) */
void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim)
{
    if (htim->Instance == TIM6)
        HAL_IncTick();
}
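Priority numbers are backwards. On Cortex-M with 4-bit priority (16 levels), priority 0 is the highest urgency and 15 the lowest. FreeRTOS runs SysTick at configKERNEL_INTERRUPT_PRIORITY, which is the lowest level. That macro holds the raw 8-bit register value (255 in generic ports, 15 << 4 = 240 in CubeMX-generated configs), and both correspond to level 15 on a 4-bit part. Set TIM6 at priority 6 (in CubeMX NVIC settings) so it preempts SysTick and HAL_GetTick() always advances.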
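To sanity-check the prescaler and period arithmetic used in HAL_InitTick(), the host-side sketch below reproduces it with plain integers. The clock figures in the usage note (PCLK1 = 42 MHz with an APB1 divider of 4) are example values chosen for illustration, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t prescaler;  /* value for the PSC register */
    uint32_t period;     /* value for the ARR register */
} TickTimerCfg;

/* Timer kernel clock: PCLK1 if the APB1 divider is 1, otherwise 2 x PCLK1. */
static uint32_t timer_clock_hz(uint32_t pclk1_hz, uint32_t apb1_div)
{
    return (apb1_div == 1u) ? pclk1_hz : 2u * pclk1_hz;
}

/* Derive PSC/ARR for a 1 MHz counter clock and a 1 kHz update rate. */
static TickTimerCfg tick_timer_cfg(uint32_t timclk_hz)
{
    TickTimerCfg cfg;
    assert(timclk_hz >= 1000000u);               /* need at least 1 MHz in */
    cfg.prescaler = (timclk_hz / 1000000u) - 1u; /* counter runs at 1 MHz  */
    cfg.period    = (1000000u / 1000u) - 1u;     /* 1000 counts = 1 ms     */
    return cfg;
}

/* Resulting update (tick) frequency, for verification. */
static uint32_t update_rate_hz(uint32_t timclk_hz, TickTimerCfg cfg)
{
    return timclk_hz / ((cfg.prescaler + 1u) * (cfg.period + 1u));
}
```

For the example clocks, the timer runs from 84 MHz, the prescaler comes out as 83, the period as 999, and the update rate as 1000 Hz, which is the 1 ms HAL tick.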
Tasks & Scheduling
Every FreeRTOS task is always in one of four states:
- Running — currently executing on the CPU. Only one task can be Running at a time.
- Ready — eligible to run but waiting for the CPU (a higher-priority task is running).
- Blocked — waiting for a timeout, queue item, semaphore, or notification. Does not consume CPU time.
- Suspended — explicitly removed from scheduling with vTaskSuspend(). Does not unblock on timeouts.
Priority levels range from 0 (the Idle task) to configMAX_PRIORITIES - 1. The FreeRTOS scheduler is strictly priority-preemptive: the highest-priority Ready task always runs. Among tasks at the same priority, round-robin time-slicing occurs (configurable with configUSE_TIME_SLICING).
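The selection rule can be illustrated with a toy model. The sketch below is not how the kernel is implemented (FreeRTOS keeps per-priority ready lists rather than scanning an array); it only shows the rule that the highest-priority task that is Ready or Running wins, and it does not model round-robin between equal priorities. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED, TASK_SUSPENDED } TaskState;

typedef struct {
    const char *name;
    unsigned    prio;   /* 0 = lowest (Idle), larger = more urgent */
    TaskState   state;
} Task;

/* Return the index of the task that should run next, or -1 if none is
 * runnable. Blocked and Suspended tasks are never considered. */
static int pick_next(const Task *tasks, size_t n)
{
    int best = -1;
    assert(tasks != NULL);
    for (size_t i = 0; i < n; i++) {
        int runnable = (tasks[i].state == TASK_READY) ||
                       (tasks[i].state == TASK_RUNNING);
        if (runnable && (best < 0 || tasks[i].prio > tasks[(size_t)best].prio)) {
            best = (int)i;
        }
    }
    return best;
}
```

In this model, a blocked priority-3 task is ignored until it becomes Ready, at which point it immediately displaces any lower-priority task.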
Stack sizing: Each task requires its own stack. configMINIMAL_STACK_SIZE (typically 128 words = 512 bytes) is the bare minimum for the Idle task. Real tasks that call HAL functions, use printf, or have deep call stacks may need 512–2048 words. Use uxTaskGetStackHighWaterMark(NULL) to measure the actual high-water mark (the closest the stack came to overflow) during runtime testing.
vTaskDelayUntil() is the correct primitive for periodic tasks. Unlike vTaskDelay() which delays relative to the current time (accumulating drift), vTaskDelayUntil() blocks until an absolute tick count — guaranteeing a fixed period regardless of task execution time.
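The drift is easy to see with numbers. The host-side simulation below is a simplified model, not kernel code: it counts ticks for a task whose body takes `work` ticks, once with a relative delay (as vTaskDelay() behaves) and once with an absolute schedule (as vTaskDelayUntil() behaves). The function names are hypothetical, and the model assumes the work always finishes within one period.

```c
#include <assert.h>
#include <stdint.h>

/* Tick at which iteration n starts when each cycle is
 * "do work, then delay `period` ticks from now". */
static uint32_t wake_tick_relative(uint32_t n, uint32_t period, uint32_t work)
{
    uint32_t now = 0;
    for (uint32_t i = 0; i < n; i++) {
        now += work;     /* task body executes               */
        now += period;   /* relative delay starts from "now" */
    }
    return now;
}

/* Tick at which iteration n starts when the next wake time is always
 * "previous wake time + period", regardless of how long the work took. */
static uint32_t wake_tick_absolute(uint32_t n, uint32_t period, uint32_t work)
{
    uint32_t last_wake = 0;
    assert(work < period);   /* otherwise the deadline is already missed */
    for (uint32_t i = 0; i < n; i++) {
        last_wake += period; /* absolute schedule: execution time absorbed */
    }
    return last_wake;
}
```

With a 10-tick period and 2 ticks of work, 100 iterations take 1200 ticks under the relative model but exactly 1000 ticks under the absolute one, so the relative version has slipped by 20%.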
/* ---------------------------------------------------------------
 * Three periodic FreeRTOS tasks using vTaskDelayUntil()
 * SensorTask: reads ADC every 10 ms
 * ProcessTask: applies moving average filter
 * LogTask: outputs results via UART every 100 ms
 * --------------------------------------------------------------- */
#include <stdio.h>      /* snprintf */
#include "main.h"       /* CubeMX project header: pulls in the HAL */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

extern ADC_HandleTypeDef hadc1;
extern UART_HandleTypeDef huart2;

/* Both queues must be created with xQueueCreate() before the
 * scheduler starts. */
static QueueHandle_t xAdcQueue;   /* ADC readings: SensorTask → ProcessTask */
static QueueHandle_t xLogQueue;   /* filtered values: ProcessTask → LogTask */

/* ── SensorTask: 10 ms period ── */
void vSensorTask(void *pvParameters)
{
    (void)pvParameters;
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t xPeriod = pdMS_TO_TICKS(10);
    uint32_t adcValue;

    for (;;)
    {
        HAL_ADC_Start(&hadc1);
        /* Only forward the sample if the conversion actually finished */
        if (HAL_ADC_PollForConversion(&hadc1, 5) == HAL_OK)
        {
            adcValue = HAL_ADC_GetValue(&hadc1);
            /* non-blocking send; the sample is dropped if the queue is full */
            xQueueSend(xAdcQueue, &adcValue, 0);
        }
        HAL_ADC_Stop(&hadc1);

        vTaskDelayUntil(&xLastWakeTime, xPeriod);   /* fixed 10 ms period */
    }
}

/* ── ProcessTask: runs as fast as data arrives ── */
void vProcessTask(void *pvParameters)
{
    (void)pvParameters;
    uint32_t raw, filtered = 0;
    /* 8-tap moving average window */
    uint32_t buf[8] = {0};
    uint8_t idx = 0;

    for (;;)
    {
        xQueueReceive(xAdcQueue, &raw, portMAX_DELAY);
        buf[idx] = raw;
        idx = (idx + 1) & 0x07;                     /* wrap at 8 */

        filtered = 0;
        for (int i = 0; i < 8; i++) filtered += buf[i];
        filtered >>= 3;                             /* divide by 8 */

        xQueueSend(xLogQueue, &filtered, 0);
    }
}

/* ── LogTask: 100 ms period, logs to UART ── */
void vLogTask(void *pvParameters)
{
    (void)pvParameters;
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t xPeriod = pdMS_TO_TICKS(100);
    uint32_t value;
    char buf[64];
    int len;

    for (;;)
    {
        /* Drain everything that accumulated since the last wake-up */
        while (xQueueReceive(xLogQueue, &value, 0) == pdTRUE)
        {
            /* 12-bit reading scaled to millivolts at a 3.3 V reference */
            len = snprintf(buf, sizeof(buf), "ADC: %lu mV\r\n",
                           (unsigned long)((value * 3300UL) / 4095UL));
            HAL_UART_Transmit(&huart2, (uint8_t *)buf, (uint16_t)len, HAL_MAX_DELAY);
        }
        vTaskDelayUntil(&xLastWakeTime, xPeriod);
    }
}
Queues
A FreeRTOS queue is a thread-safe FIFO that works between tasks and between ISRs and tasks. It is the primary mechanism for passing data across execution contexts without shared globals or manual locking. Queues are type-safe by convention: you define the item size at creation time and always use the same type when sending and receiving.
xQueueCreate(length, itemSize) allocates from the FreeRTOS heap. The queue stores length items, each itemSize bytes. For a queue of structs, pass sizeof(MyStruct_t) — FreeRTOS copies the struct by value on send and receive.
From an ISR, you must use the FromISR variants: xQueueSendFromISR(), xQueueReceiveFromISR(). These never block and take an additional pxHigherPriorityTaskWoken parameter, a pointer to a BaseType_t that you initialise to pdFALSE. The call sets it to pdTRUE if the queue operation unblocked a task with a higher priority than the one that was interrupted. Pass the variable to portYIELD_FROM_ISR() at the end of the ISR: if it is pdTRUE, the macro requests an immediate context switch rather than waiting for the next tick.
Queue of structs vs queue of pointers: Copying large structs into a queue is safe but wastes cycles. For large payloads, allocate from a memory pool, write the data, send a pointer through the queue, and free after the receiver is done. This is the zero-copy pattern — the queue transmits only a pointer (4 bytes) regardless of payload size.
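The pool-plus-pointer pattern described above can be modelled on a host without the kernel. The sketch below uses a hypothetical fixed-block pool and a plain ring buffer standing in for the FreeRTOS queue; on target, the ring would be a queue created with an item size of `sizeof(Block *)`, and the pool would need a critical section or mutex because this model is not thread-safe. All names and sizes are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS 4u
#define BLOCK_BYTES 256u
#define QUEUE_LEN   4u

typedef struct { uint8_t data[BLOCK_BYTES]; } Block;

/* Fixed-block pool: a stack of free block pointers. */
static Block  pool[POOL_BLOCKS];
static Block *free_list[POOL_BLOCKS];
static size_t free_count;

static void pool_init(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) free_list[i] = &pool[i];
    free_count = POOL_BLOCKS;
}

static Block *pool_alloc(void)
{
    return (free_count > 0u) ? free_list[--free_count] : NULL;
}

static void pool_free(Block *b)
{
    assert(b != NULL && free_count < POOL_BLOCKS);
    free_list[free_count++] = b;
}

/* Ring buffer of pointers: only the pointer is copied, never the payload. */
static Block *ring[QUEUE_LEN];
static size_t head, tail, count;

static int ptr_queue_send(Block *b)
{
    if (count == QUEUE_LEN) return 0;        /* full: caller keeps ownership */
    ring[head] = b;
    head = (head + 1u) % QUEUE_LEN;
    count++;
    return 1;
}

static Block *ptr_queue_receive(void)
{
    if (count == 0u) return NULL;            /* empty */
    Block *b = ring[tail];
    tail = (tail + 1u) % QUEUE_LEN;
    count--;
    return b;                                /* receiver now owns the block */
}
```

Ownership is the important design point: the sender must not touch a block after a successful send, and the receiver is responsible for returning it to the pool.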
/* ---------------------------------------------------------------
 * ISR-to-task sensor queue:
 * UART ISR pushes a SensorReading_t into a queue of 10.
 * ProcessTask receives and logs each reading.
 * --------------------------------------------------------------- */
#include <stdio.h>      /* snprintf */
#include <stdlib.h>     /* abs */
#include "main.h"       /* CubeMX project header: pulls in the HAL */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

extern UART_HandleTypeDef huart2;

/* Struct holding one sensor measurement */
typedef struct {
    uint32_t timestamp_ms;     /* HAL_GetTick() at capture time */
    uint16_t raw_adc;          /* 12-bit ADC reading */
    int16_t  temperature_dC;   /* temperature in 0.1 °C units */
} SensorReading_t;

static QueueHandle_t xSensorQueue;

/* Called from main() before vTaskStartScheduler() */
void SensorQueue_Init(void)
{
    xSensorQueue = xQueueCreate(10, sizeof(SensorReading_t));
    configASSERT(xSensorQueue != NULL);
}

/* ── UART Rx Complete Callback (runs in ISR context) ── */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    SensorReading_t reading;

    (void)huart;
    reading.timestamp_ms = HAL_GetTick();
    reading.raw_adc = 2048;          /* placeholder: parse from DMA buffer */
    reading.temperature_dC = 235;    /* placeholder: 23.5 °C */

    /* FromISR calls never block and take no timeout; if the queue is
     * full the reading is dropped and errQUEUE_FULL is returned */
    xQueueSendFromISR(xSensorQueue, &reading, &xHigherPriorityTaskWoken);

    /* Request a context switch if the send woke a higher-priority task */
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

/* ── ProcessTask: receives readings and logs them ── */
void vProcessTask(void *pvParameters)
{
    (void)pvParameters;
    SensorReading_t reading;
    char buf[80];
    int len;

    for (;;)
    {
        /* Block indefinitely until a reading arrives */
        xQueueReceive(xSensorQueue, &reading, portMAX_DELAY);

        /* Print sign separately so values between -0.9 and -0.1 °C
         * are not shown as positive or as "0.-5" */
        len = snprintf(buf, sizeof(buf),
                       "[%6lu ms] ADC=%4u Temp=%s%d.%d C\r\n",
                       (unsigned long)reading.timestamp_ms,
                       (unsigned)reading.raw_adc,
                       (reading.temperature_dC < 0) ? "-" : "",
                       abs(reading.temperature_dC) / 10,
                       abs(reading.temperature_dC) % 10);
        HAL_UART_Transmit(&huart2, (uint8_t *)buf, (uint16_t)len, HAL_MAX_DELAY);
    }
}
Semaphores & Mutexes
Queues transmit data. Semaphores and mutexes synchronise access and events.
Binary semaphore is a signalling primitive. One context (an ISR or task) gives the semaphore; another task waits on it with xSemaphoreTake(). This is the correct pattern for "ISR signals task that work is ready" — far cleaner than polling a volatile flag, because the task blocks and yields the CPU while waiting.
Counting semaphore extends the binary semaphore to a count > 1. Use it to track a pool of resources (e.g. 4 available DMA channels) or to count events that arrive faster than they are consumed.
Mutex (mutual exclusion semaphore) protects a shared resource from concurrent access. It enforces the rule that only one task at a time can hold it. Unlike a binary semaphore, a mutex must be released by the same task that acquired it — the kernel tracks ownership. Attempting to release a mutex from a different task than the one that acquired it is undefined behaviour.
Priority inheritance: FreeRTOS mutexes implement priority inheritance automatically. If a high-priority task (H) is blocked waiting for a mutex held by a low-priority task (L), the kernel temporarily boosts L's priority to match H, preventing a medium-priority task (M) from starving H indefinitely — the classic priority inversion scenario.
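The H/M/L scenario can be traced with a small host-side model. This is a simplified sketch of the inheritance rule only, not the kernel's implementation: there is no blocking, no scheduler, and a single mutex. The `Task` and `Mutex` types and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned base_prio;  /* priority assigned at creation         */
    unsigned eff_prio;   /* priority the scheduler currently uses */
} Task;

typedef struct {
    Task *holder;        /* NULL when the mutex is free */
} Mutex;

/* Try to take the mutex. On failure the caller would block, and the
 * holder inherits the caller's priority if it is higher. */
static int mutex_try_take(Mutex *m, Task *t)
{
    if (m->holder == NULL) {
        m->holder = t;
        return 1;
    }
    if (t->eff_prio > m->holder->eff_prio) {
        m->holder->eff_prio = t->eff_prio;   /* priority inheritance */
    }
    return 0;
}

/* Release the mutex: the holder drops back to its base priority. */
static void mutex_give(Mutex *m)
{
    assert(m->holder != NULL);
    m->holder->eff_prio = m->holder->base_prio;
    m->holder = NULL;
}
```

While L holds the mutex and H is waiting, L runs at H's priority, so M (priority 5) can no longer preempt it; once L gives the mutex back, it returns to priority 1 and H can proceed.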
| Type | Create API | Use Case | Priority Inheritance | ISR-safe Give |
|---|---|---|---|---|
| Binary Semaphore | xSemaphoreCreateBinary() | ISR signals task | No | Yes (xSemaphoreGiveFromISR) |
| Counting Semaphore | xSemaphoreCreateCounting() | Resource pool, event counting | No | Yes (xSemaphoreGiveFromISR) |
| Mutex | xSemaphoreCreateMutex() | Shared peripheral protection | Yes | No: never use a mutex from an ISR |
| Recursive Mutex | xSemaphoreCreateRecursiveMutex() | Re-entrant code paths | Yes | No |
/* ---------------------------------------------------------------
 * SPI mutex protecting a shared SPI bus used by two tasks.
 * Both tasks acquire the mutex before driving CS low,
 * and release it after the transfer + CS deassert.
 * --------------------------------------------------------------- */
#include "FreeRTOS.h"
#include "semphr.h"

extern SPI_HandleTypeDef hspi1;

/* Mutex created once, before vTaskStartScheduler() */
static SemaphoreHandle_t xSpiMutex = NULL;

void SpiMutex_Init(void)
{
    xSpiMutex = xSemaphoreCreateMutex();
    configASSERT(xSpiMutex != NULL);
}

/* Generic SPI transfer with mutex protection
 * Returns HAL_OK on success, HAL_TIMEOUT if mutex not acquired */
HAL_StatusTypeDef SPI_Transfer(GPIO_TypeDef *csPort, uint16_t csPin,
                               uint8_t *txBuf, uint8_t *rxBuf,
                               uint16_t len, uint32_t timeoutMs)
{
    if (xSemaphoreTake(xSpiMutex, pdMS_TO_TICKS(timeoutMs)) != pdTRUE)
        return HAL_TIMEOUT;

    HAL_GPIO_WritePin(csPort, csPin, GPIO_PIN_RESET);   /* CS active low */
    HAL_StatusTypeDef status =
        HAL_SPI_TransmitReceive(&hspi1, txBuf, rxBuf, len, HAL_MAX_DELAY);
    HAL_GPIO_WritePin(csPort, csPin, GPIO_PIN_SET);     /* CS deassert */

    xSemaphoreGive(xSpiMutex);
    return status;
}

/* Task A: reads accelerometer over shared SPI */
void vAccelTask(void *pvParameters)
{
    (void)pvParameters;
    uint8_t tx[2] = {0x0F | 0x80, 0x00};   /* WHO_AM_I read command */
    uint8_t rx[2];

    for (;;)
    {
        SPI_Transfer(GPIOE, GPIO_PIN_3, tx, rx, 2, 100);
        vTaskDelay(pdMS_TO_TICKS(50));
    }
}

/* Task B: controls external flash over same SPI bus */
void vFlashTask(void *pvParameters)
{
    (void)pvParameters;
    uint8_t tx[4] = {0x03, 0x00, 0x00, 0x00};   /* READ command + addr */
    uint8_t rx[4];

    for (;;)
    {
        SPI_Transfer(GPIOB, GPIO_PIN_6, tx, rx, 4, 200);
        vTaskDelay(pdMS_TO_TICKS(200));
    }
}
Stack Sizing & Memory
Running out of stack in an RTOS task is one of the hardest bugs to diagnose — the failure can manifest as corrupt data, a HardFault hours after the stack overflowed, or completely silent wrong behaviour. The FreeRTOS toolkit provides everything you need to detect and prevent stack overflows before they reach production.
FreeRTOS heap allocators: Select via the heap_N.c file you compile. heap_4 is the most suitable for typical applications: it implements a first-fit allocator that coalesces adjacent free blocks, which limits fragmentation from repeated alloc/free cycles. configTOTAL_HEAP_SIZE in FreeRTOSConfig.h sets the total heap budget. With dynamic allocation, all task stacks, TCBs, queues, semaphores, and mutexes come from this pool.
uxTaskGetStackHighWaterMark(taskHandle) returns the minimum number of free words that have ever existed on a task's stack. A value near zero means the stack nearly overflowed. Run your system through all worst-case code paths (ISR storms, large printf calls, recursive decode) and then audit the high-water marks. Peak usage is the allocated depth minus the high-water mark; size each task's stack at least 20% above that peak.
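The sizing arithmetic is short enough to write down. The sketch below assumes you already know the depth a task was created with and the high-water mark measured on target; the function names and the percentage-margin parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Peak stack usage in words: allocated depth minus the smallest
 * amount of free space ever observed (the high-water mark). */
static uint32_t peak_stack_words(uint32_t allocated_words, uint32_t hwm_words)
{
    assert(hwm_words <= allocated_words);
    return allocated_words - hwm_words;
}

/* Recommended depth: peak usage plus a safety margin in percent,
 * rounded up to a whole word. */
static uint32_t recommended_stack_words(uint32_t allocated_words,
                                        uint32_t hwm_words,
                                        uint32_t margin_pct)
{
    uint32_t peak = peak_stack_words(allocated_words, hwm_words);
    return (peak * (100u + margin_pct) + 99u) / 100u;
}
```

For example, a task created with 256 words that reports a high-water mark of 56 peaked at 200 words, so a 20% margin suggests 240 words; the original 256 is adequate with little to spare.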
Stack overflow hook: Set configCHECK_FOR_STACK_OVERFLOW to 2 in FreeRTOSConfig.h. FreeRTOS fills each task's stack with a known pattern at creation and, each time the task is switched out, checks that the last 16 bytes of the stack still hold it. If they are corrupted, it calls vApplicationStackOverflowHook(); implement this to assert or log the offending task name before the system corrupts further.
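Both the high-water mark and the overflow check rest on the same idea: a fill pattern that task code overwrites as the stack grows. The host-side sketch below models that idea for a descending stack, where the unused region sits at the lowest addresses. It is an illustration under those assumptions rather than the kernel source; the 0xA5 fill byte matches the value FreeRTOS uses, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE   0xA5u   /* pattern written to an unused stack     */
#define GUARD_BYTES 16u     /* region checked by the overflow test    */

/* Fill the whole stack with the known pattern, as done at task creation. */
static void stack_fill(uint8_t *stack, size_t bytes)
{
    memset(stack, FILL_BYTE, bytes);
}

/* High-water mark: count untouched bytes from the stack limit (lowest
 * address) upwards and convert to 32-bit words. */
static size_t stack_free_words(const uint8_t *stack, size_t bytes)
{
    size_t n = 0;
    while (n < bytes && stack[n] == FILL_BYTE) n++;
    return n / sizeof(uint32_t);
}

/* Overflow check: the bytes nearest the stack limit must still hold
 * the fill pattern. Returns 1 if intact, 0 if overwritten. */
static int stack_guard_intact(const uint8_t *stack)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (stack[i] != FILL_BYTE) return 0;
    }
    return 1;
}
```

One limitation follows directly from the model: a task that happens to write 0xA5 into the guard region, or that jumps over it entirely, is not detected, which is why the high-water audit remains worthwhile.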
/* ---------------------------------------------------------------
 * Stack audit: print high-water mark for all tasks over UART.
 * vApplicationStackOverflowHook: assert + log on stack overflow.
 * Requires configUSE_TRACE_FACILITY = 1 for uxTaskGetSystemState().
 * --------------------------------------------------------------- */
#include <stdio.h>      /* snprintf */
#include "main.h"       /* CubeMX project header: pulls in the HAL */
#include "FreeRTOS.h"
#include "task.h"

extern UART_HandleTypeDef huart2;

/* Call this from a debug task or on command to audit all stacks */
void PrintStackAudit(void)
{
    TaskStatus_t *pxTaskStatusArray;
    UBaseType_t uxArraySize, i;
    char buf[80];
    int len;

    uxArraySize = uxTaskGetNumberOfTasks();
    pxTaskStatusArray = pvPortMalloc(uxArraySize * sizeof(TaskStatus_t));
    if (pxTaskStatusArray == NULL) return;

    /* Fill array with current task info */
    uxArraySize = uxTaskGetSystemState(pxTaskStatusArray,
                                       uxArraySize, NULL);

    len = snprintf(buf, sizeof(buf),
                   "\r\n--- Stack Audit ---\r\n"
                   "%-16s %-8s %-10s\r\n", "Task", "Pri", "HWM(words)");
    HAL_UART_Transmit(&huart2, (uint8_t *)buf, (uint16_t)len, HAL_MAX_DELAY);

    for (i = 0; i < uxArraySize; i++)
    {
        len = snprintf(buf, sizeof(buf), "%-16s %-8lu %-10lu\r\n",
                       pxTaskStatusArray[i].pcTaskName,
                       (unsigned long)pxTaskStatusArray[i].uxCurrentPriority,
                       (unsigned long)pxTaskStatusArray[i].usStackHighWaterMark);
        HAL_UART_Transmit(&huart2, (uint8_t *)buf, (uint16_t)len, HAL_MAX_DELAY);
    }

    len = snprintf(buf, sizeof(buf),
                   "Free heap: %lu bytes\r\n-------------------\r\n",
                   (unsigned long)xPortGetFreeHeapSize());
    HAL_UART_Transmit(&huart2, (uint8_t *)buf, (uint16_t)len, HAL_MAX_DELAY);

    vPortFree(pxTaskStatusArray);
}

/* Stack overflow hook: called by FreeRTOS when the stack check fails */
void vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName)
{
    (void)xTask;
    /* Log the task name before asserting. At this point the stack
     * is already corrupt, so keep this code as simple as possible. */
    char msg[64];
    int len = snprintf(msg, sizeof(msg),
                       "STACK OVERFLOW: task='%s'\r\n", pcTaskName);
    HAL_UART_Transmit(&huart2, (uint8_t *)msg, (uint16_t)len, 10);
    configASSERT(0);   /* triggers breakpoint in debug, resets via watchdog */
}
FreeRTOS + HAL Peripheral Integration
The most ergonomic integration pattern combines DMA-driven HAL peripherals with FreeRTOS semaphores or task notifications. The DMA transfer completes in hardware; the HAL callback fires in ISR context; the ISR signals a semaphore; the task wakes up, processes the data, and goes back to sleep — all without polling and without busy-waiting.
UART DMA + binary semaphore: Start a DMA TX with HAL_UART_Transmit_DMA(). When the transfer completes, HAL_UART_TxCpltCallback() fires in ISR context. Give a binary semaphore from the ISR. The task that initiated the transfer waits on the semaphore, allowing other tasks to run during the DMA operation.
ADC DMA + task notification: HAL_ADC_ConvCpltCallback() fires when a DMA buffer is full. Call xTaskNotifyFromISR() targeting the process task — this is lighter-weight than a semaphore for one-to-one signalling and carries a 32-bit notification value.
Accessing shared UART from multiple tasks: The DMA semaphore serialises the transfer itself. But if two tasks both call HAL_UART_Transmit_DMA() concurrently, the second call will fail with HAL_BUSY (UART state machine is not idle). Wrap the entire sequence — acquire mutex, start DMA, wait on semaphore, release mutex — so only one task at a time can initiate a UART transfer.
/* ---------------------------------------------------------------
 * DMA UART TX with mutex + completion semaphore.
 * Thread-safe UART write for multiple tasks.
 * --------------------------------------------------------------- */
#include "FreeRTOS.h"
#include "semphr.h"

extern UART_HandleTypeDef huart2;

static SemaphoreHandle_t xUartMutex = NULL;
static SemaphoreHandle_t xDmaDoneSem = NULL;

void UART_RTOS_Init(void)
{
    xUartMutex = xSemaphoreCreateMutex();
    xDmaDoneSem = xSemaphoreCreateBinary();
    configASSERT(xUartMutex != NULL);
    configASSERT(xDmaDoneSem != NULL);
}

/* Thread-safe DMA UART write: blocks until TX is complete */
HAL_StatusTypeDef UART_Write(const uint8_t *data, uint16_t len)
{
    HAL_StatusTypeDef status;

    /* Acquire mutex: only one task may use UART at a time */
    if (xSemaphoreTake(xUartMutex, pdMS_TO_TICKS(500)) != pdTRUE)
        return HAL_TIMEOUT;

    /* Discard any stale completion left over from an earlier
     * transfer that timed out and was aborted */
    (void)xSemaphoreTake(xDmaDoneSem, 0);

    /* Start non-blocking DMA transfer (cast keeps older HAL versions,
     * whose parameter is not const-qualified, warning-free) */
    status = HAL_UART_Transmit_DMA(&huart2, (uint8_t *)data, len);
    if (status != HAL_OK)
    {
        xSemaphoreGive(xUartMutex);
        return status;
    }

    /* Wait for DMA completion callback to signal the semaphore.
     * Timeout 1 s: allows detection of DMA hang. */
    if (xSemaphoreTake(xDmaDoneSem, pdMS_TO_TICKS(1000)) != pdTRUE)
    {
        HAL_UART_AbortTransmit(&huart2);
        xSemaphoreGive(xUartMutex);
        return HAL_TIMEOUT;
    }

    xSemaphoreGive(xUartMutex);
    return HAL_OK;
}

/* DMA TX complete callback: runs in ISR context */
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART2)
    {
        BaseType_t xHigherPriorityTaskWoken = pdFALSE;
        xSemaphoreGiveFromISR(xDmaDoneSem, &xHigherPriorityTaskWoken);
        portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
    }
}
Data must outlive the DMA transfer. Never pass a stack-allocated buffer to HAL_UART_Transmit_DMA() if the calling function might return before the DMA completes. Use a static buffer or a heap-allocated buffer, or block until the semaphore signals completion (as shown above).
Exercises
Exercise 1 (Beginner): Three Concurrent FreeRTOS Tasks
Create 3 FreeRTOS tasks: BlinkTask (500 ms period, toggles LED using vTaskDelayUntil), UartTask (prints "Task running [tick]" every 1 s over UART), ButtonTask (blocks on a binary semaphore signalled by an EXTI ISR on the user button). Move the HAL timebase to TIM6. Verify all three run concurrently: the LED blinks at exactly 500 ms, UART output appears every 1 s, and pressing the button prints a message without disrupting the other tasks.
Topics: xTaskCreate, vTaskDelayUntil, Binary Semaphore, TIM6 Timebase
Exercise 2 (Intermediate): Sensor Fusion Pipeline with Queues
Build a three-task sensor fusion pipeline: Task A reads MPU-6050 over I2C at 1 kHz using DMA and a binary semaphore. Task B receives raw accelerometer + gyroscope readings via a queue of 20, applies a complementary filter (alpha = 0.98) to compute roll and pitch angles. Task C receives filtered angles from a second queue and outputs them over UART at 100 Hz. After running for 60 seconds, print the queue high-water marks and verify no data loss occurred.
Topics: I2C DMA, Queue, Complementary Filter, High-Water Mark
Exercise 3 (Advanced): Diagnose Priority Inversion
Create 3 tasks to demonstrate and measure priority inversion: High (priority 10, acquires SPI mutex, performs a 1 ms SPI read), Medium (priority 5, CPU-bound loop incrementing a counter), Low (priority 1, holds the SPI mutex while performing a simulated 50 ms flash erase delay). First, run this scenario with a binary semaphore instead of a mutex and measure the latency from the High task becoming Ready to actually Running using a GPIO toggle and oscilloscope. Then switch to a proper mutex (with priority inheritance) and measure again. Document the latency difference and explain why priority inheritance bounds the inversion to the time Low holds the mutex.
Topics: Priority Inversion, Priority Inheritance, Mutex, Oscilloscope Measurement
Conclusion & Next Steps
In this article we have built a complete FreeRTOS foundation for STM32 development:
- The HAL timebase conflict is the most critical issue: always move the HAL tick source to TIM6 (or another basic timer) when using FreeRTOS, and configure NVIC priorities so TIM6 has higher urgency than SysTick.
- Tasks created with xTaskCreate() each have their own stack. Use vTaskDelayUntil() for periodic tasks to avoid accumulated drift, and uxTaskGetStackHighWaterMark() to size stacks correctly.
- Queues are the correct way to pass data between tasks or from ISR to task. Always use FromISR variants in interrupt context and call portYIELD_FROM_ISR() to trigger immediate context switches.
- Mutexes protect shared peripherals with priority inheritance. Never use a mutex from an ISR — use a binary semaphore for ISR-to-task signalling instead.
- DMA + semaphore is the right pattern for non-blocking peripheral access: start DMA, let the CPU do other work, wake on completion callback.
- Enable configCHECK_FOR_STACK_OVERFLOW 2 and implement vApplicationStackOverflowHook() for every production project.
Next in the Series
In Part 15: Bootloader Development, we design a fail-safe flash memory layout, implement a UART/USB DFU firmware update protocol, add CRC32 image validation, and build the jump-to-application mechanism — including dual-bank updates on STM32H7. Every production STM32 product needs a bootloader, and this is how to build one correctly.