STM32 Part 14: FreeRTOS Integration

                        
Series Overview: This is Part 14 of our 18-part STM32 Unleashed series. We have covered GPIO, UART, timers, ADC, SPI, I2C, DMA, interrupts, low-power modes, RTC, CAN, and USB. Now we add an RTOS to orchestrate everything.
                    

Why FreeRTOS on STM32

The superloop — a while(1) loop calling functions in sequence — is the natural starting point for any embedded project. It is predictable, debuggable, and has zero overhead. But as applications grow, the superloop reveals fundamental limitations that no amount of careful ordering or clever flag management can fully overcome.

The core problem is priority. In a superloop, every function implicitly runs at the same priority. If your UART receive handler takes 2 ms and your motor control loop must run every 1 ms, the control loop misses its deadline every time the handler runs. You can partially work around this with interrupt service routines, but ISRs should be short, and the moment you start doing real work in ISRs, you have reinvented a poorly designed RTOS without the documentation.

FreeRTOS fundamentals: FreeRTOS is a real-time operating system kernel for embedded systems. The core abstraction is the task: a function that runs as if it owns the CPU but actually shares it with other tasks through preemptive scheduling. Each task has a Task Control Block (TCB) holding its stack pointer, state, priority, name, and optional runtime statistics. The scheduler runs on every tick interrupt, and also whenever an API call blocks or unblocks a task, and switches context to the highest-priority task that is ready to run.

CubeMX integrates FreeRTOS directly: enable it under Middleware, define tasks in the UI, and CubeMX generates the task creation calls (in freertos.c or main.c, depending on the project's code-generation settings) together with the osKernelStart() call. The heap is sized through configTOTAL_HEAP_SIZE in FreeRTOSConfig.h.

The CMSIS-RTOS2 wrapper provides a standardised API over FreeRTOS (and other kernels): osThreadNew() instead of xTaskCreate(), osMutexNew() instead of xSemaphoreCreateMutex(). CubeMX defaults to the CMSIS-RTOS2 layer, but you can use native FreeRTOS calls directly — and this article covers both so you understand what the wrapper is hiding.

Aspect	Superloop	FreeRTOS	When to Use
Complexity	Low — single execution path	Higher — context switching, TCBs, heap	Superloop for simple, single-function devices
Jitter	High — tasks block each other	Low — preemption isolates tasks	FreeRTOS when timing guarantees are needed
Stack overhead	Single stack, shared by all code	Separate stack per task (configurable)	Superloop on 4–8 KB SRAM parts
Debugging	Easy — one call stack	Harder — multiple stacks, race conditions	Enable FreeRTOS runtime stats for production
Code size	Minimal	~5–10 KB Flash, ~1–2 KB RAM for kernel	FreeRTOS when Flash > 32 KB
Inter-task comms	Shared globals (unsafe)	Queues, semaphores, mutexes (thread-safe)	FreeRTOS whenever data flows between contexts

/* ---------------------------------------------------------------
 * FreeRTOS task creation — xTaskCreate and task function prototype
 * --------------------------------------------------------------- */
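#include "main.h"       /* HAL API and CubeMX prototypes; assumes the usual CubeMX project layout */
#include "FreeRTOS.h"
#include "task.h"

extern UART_HandleTypeDef huart2;   /* defined in the CubeMX-generated code */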

/* Task handle — NULL if we don't need to reference the task later */
static TaskHandle_t xLedTaskHandle = NULL;
static TaskHandle_t xUartTaskHandle = NULL;

/* Task function prototype: void name(void *pvParameters) */
void vLedTask(void *pvParameters);
void vUartTask(void *pvParameters);

int main(void)
{
    HAL_Init();
    SystemClock_Config();   /* configures SYSCLK, AHB, APB */
    MX_GPIO_Init();
    MX_USART2_UART_Init();

    /* Create LED blink task
     * xTaskCreate(pvTaskCode, pcName, usStackDepth_words,
     *             pvParameters, uxPriority, pxCreatedTask) */
    xTaskCreate(vLedTask,           /* task function          */
                "LED",              /* name (debug only)      */
                128,                /* stack depth in words   */
                NULL,               /* parameter passed in    */
                1,                  /* priority (1 = low)     */
                &xLedTaskHandle);   /* task handle out        */

    xTaskCreate(vUartTask, "UART", 256, NULL, 2, &xUartTaskHandle);

    vTaskStartScheduler();  /* never returns on success */
    while (1) {}            /* should never reach here  */
}

/* LED task: toggles LED every 500 ms */
void vLedTask(void *pvParameters)
{
    (void)pvParameters;
    for (;;)
    {
        HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
        vTaskDelay(pdMS_TO_TICKS(500));  /* blocks, yields CPU */
    }
}

/* UART task: prints heartbeat message every 1 s */
void vUartTask(void *pvParameters)
{
    (void)pvParameters;
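    static const char msg[] = "Heartbeat\r\n";
    for (;;)
    {
        /* sizeof(msg) - 1 excludes the terminating NUL, so the length
         * stays correct if the message text changes */
        HAL_UART_Transmit(&huart2, (uint8_t *)msg, sizeof(msg) - 1U, HAL_MAX_DELAY);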
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

                        
Stack depth is in words, not bytes. On a 32-bit Cortex-M, one word is 4 bytes, so a stack depth of 128 allocates 512 bytes. Always convert: if you want a 1 KB stack, pass 256.
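The conversion is easy to get wrong in a hurry, so it can help to wrap it in a pair of helpers. The sketch below is plain host-side C, not part of FreeRTOS: the helper names are hypothetical, and it assumes a 4-byte stack word as on 32-bit Cortex-M ports.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one stack word (StackType_t) is 4 bytes, as on Cortex-M. */
#define STACK_WORD_BYTES 4u

/* Hypothetical helper: bytes wanted -> stack depth in words, rounded up. */
static uint32_t stack_words_from_bytes(uint32_t bytes)
{
    assert(bytes <= UINT32_MAX - (STACK_WORD_BYTES - 1u));
    return (bytes + STACK_WORD_BYTES - 1u) / STACK_WORD_BYTES;
}

/* Hypothetical helper: stack depth in words -> bytes actually allocated. */
static uint32_t stack_bytes_from_words(uint32_t words)
{
    assert(words <= UINT32_MAX / STACK_WORD_BYTES);
    return words * STACK_WORD_BYTES;
}
```

With these, a call such as xTaskCreate(..., stack_words_from_bytes(1024), ...) states the intent in bytes while still passing words to the kernel.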
                    

HAL Timebase Conflict

This is the most common silent killer in STM32+FreeRTOS projects. The HAL library uses SysTick to implement HAL_GetTick() and HAL_Delay(). FreeRTOS also uses SysTick as its RTOS tick source. When both try to configure SysTick, one configuration silently overwrites the other.

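The failure mode is insidious: HAL_Delay() works fine early in main(), then can hang forever once the scheduler is running. The reason is that HAL_Delay() polls HAL_GetTick() in a busy-wait loop, and HAL_GetTick() only advances while the tick interrupt is being serviced. With FreeRTOS owning SysTick at the lowest interrupt priority, that is no longer guaranteed: the tick is held off inside kernel critical sections and inside any ISR of equal or higher urgency, and if the FreeRTOS port handler replaces the HAL handler, HAL_IncTick() may never be called at all. Any HAL timeout polled in one of those contexts spins forever.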

The solution is to move the HAL timebase to a basic timer — TIM6 is ideal because it has no input capture or output compare channels that might conflict with application code. In CubeMX: System Core → SYS → Timebase Source → TIM6.

NVIC priority rule: FreeRTOS defines configMAX_SYSCALL_INTERRUPT_PRIORITY — any ISR that calls FreeRTOS API functions must have a numerical priority value greater than or equal to this (lower urgency). The SysTick used by FreeRTOS itself must have the lowest priority of all ISRs in the system. When you move HAL timebase to TIM6, assign TIM6 a higher urgency (lower numerical value) than SysTick so HAL_GetTick() always advances.
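The priority rule can be written down as a few comparisons. The following is a host-side model rather than target code: the function names are hypothetical, it assumes a part with 4 implemented priority bits (levels 0 to 15), and the maximum syscall level of 5 is only an assumed example value, so check your own FreeRTOSConfig.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS         4u    /* assumption: 4 implemented priority bits  */
#define LOWEST_PRIO       15u   /* numerically largest = least urgent       */
#define MAX_SYSCALL_PRIO  5u    /* assumed example value, project specific  */

/* Level as written to the 8-bit NVIC priority field: the implemented
 * bits sit in the top of the byte, so level 15 becomes 0xF0. */
static uint8_t nvic_raw_priority(uint8_t level)
{
    assert(level <= LOWEST_PRIO);
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/* An ISR may call ...FromISR() functions only if its level is numerically
 * greater than or equal to the maximum syscall level. */
static bool isr_may_call_from_isr_api(uint8_t level)
{
    return level >= MAX_SYSCALL_PRIO;
}

/* Interrupt a preempts interrupt b when a is numerically lower. */
static bool isr_preempts(uint8_t a, uint8_t b)
{
    return a < b;
}
```

Under these assumptions a TIM6 interrupt at level 6 preempts SysTick at level 15 and would still be allowed to call FromISR functions, while an interrupt at level 2 would not.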

/* ---------------------------------------------------------------
 * TIM6 timebase — HAL_InitTick() override using basic timer TIM6
 * Generated by CubeMX when Timebase Source = TIM6
 * In: stm32f4xx_hal_timebase_tim.c
 * --------------------------------------------------------------- */
#include "stm32f4xx_hal.h"

extern TIM_HandleTypeDef htim6;

/* Called by HAL_Init() to configure the tick time base.
 * We override the weak default which uses SysTick. */
HAL_StatusTypeDef HAL_InitTick(uint32_t TickPriority)
{
    RCC_ClkInitTypeDef    clkconfig;
    uint32_t              uwTimclock, uwAPB1Prescaler;
    uint32_t              uwPrescalerValue;
    uint32_t              pFLatency;

    /* Enable TIM6 clock */
    __HAL_RCC_TIM6_CLK_ENABLE();

    /* Get clocks frequencies */
    HAL_RCC_GetClockConfig(&clkconfig, &pFLatency);

    uwAPB1Prescaler = clkconfig.APB1CLKDivider;
    if (uwAPB1Prescaler == RCC_HCLK_DIV1)
        uwTimclock = HAL_RCC_GetPCLK1Freq();
    else
        uwTimclock = 2UL * HAL_RCC_GetPCLK1Freq();

    /* 1 MHz timer clock: prescaler = TimClock / 1000000 - 1 */
    uwPrescalerValue = (uint32_t)((uwTimclock / 1000000U) - 1U);

    htim6.Instance               = TIM6;
    htim6.Init.Period            = (1000000U / 1000U) - 1U; /* 1 ms period */
    htim6.Init.Prescaler         = uwPrescalerValue;
    htim6.Init.ClockDivision     = 0;
    htim6.Init.CounterMode       = TIM_COUNTERMODE_UP;
    htim6.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE;

    if (HAL_TIM_Base_Init(&htim6) != HAL_OK)
        return HAL_ERROR;

    /* Configure TIM6 update interrupt at TickPriority */
    HAL_NVIC_SetPriority(TIM6_DAC_IRQn, TickPriority, 0U);
    HAL_NVIC_EnableIRQ(TIM6_DAC_IRQn);

    return HAL_TIM_Base_Start_IT(&htim6);
}

/* TIM6 interrupt handler — increments uwTick */
void TIM6_DAC_IRQHandler(void)
{
    HAL_TIM_IRQHandler(&htim6);
}

void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim)
{
    if (htim->Instance == TIM6)
        HAL_IncTick();
}
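
The clock arithmetic in HAL_InitTick() can be checked on a host before flashing. This is a small sketch with hypothetical function names; the 42 MHz PCLK1 with an APB1 divider of 4 used in the usage note is an assumed example configuration, not something the listing above fixes.

```c
#include <assert.h>
#include <stdint.h>

/* Timer kernel clock: equal to PCLK1 when the APB1 divider is 1,
 * otherwise twice PCLK1 (the same rule the listing applies). */
static uint32_t tim_clock_hz(uint32_t pclk1_hz, uint32_t apb1_divider)
{
    assert(apb1_divider >= 1u);
    return (apb1_divider == 1u) ? pclk1_hz : 2u * pclk1_hz;
}

/* Prescaler register value that divides the timer clock down to 1 MHz. */
static uint32_t prescaler_for_1mhz(uint32_t tim_hz)
{
    assert(tim_hz >= 1000000u);
    return (tim_hz / 1000000u) - 1u;
}

/* Update-event rate for given prescaler and auto-reload register values. */
static uint32_t update_rate_hz(uint32_t tim_hz, uint32_t prescaler,
                               uint32_t period)
{
    return tim_hz / ((prescaler + 1u) * (period + 1u));
}
```

For the assumed 42 MHz PCLK1 with divider 4, the timer runs at 84 MHz, the prescaler comes out as 83, and a period of 999 yields a 1 kHz update rate, which is the 1 ms HAL tick.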

                        
Priority numbers are backwards. On Cortex-M with 4-bit priority (16 levels), priority 0 is the highest urgency and 15 the lowest. FreeRTOS configures SysTick at configKERNEL_INTERRUPT_PRIORITY, the lowest level (level 15 here; generic FreeRTOSConfig.h files often write it as 255, of which a 4-bit part keeps only the top four bits). Set TIM6 to priority 6 in the CubeMX NVIC settings so it preempts SysTick and HAL_GetTick() always advances.
                    

Tasks & Scheduling

Every FreeRTOS task is always in one of four states:

Running — currently executing on the CPU. Only one task can be Running at a time.
Ready — eligible to run but waiting for the CPU (a higher-priority task is running).
Blocked — waiting for a timeout, queue item, semaphore, or notification. Does not consume CPU time.
Suspended — explicitly removed from scheduling with vTaskSuspend(). Does not unblock on timeouts.

Priority levels range from 0 (the Idle task) to configMAX_PRIORITIES - 1. The FreeRTOS scheduler is strictly priority-preemptive: the highest-priority Ready task always runs. Among tasks at the same priority, round-robin time-slicing occurs (configurable with configUSE_TIME_SLICING).
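
The selection rule can be modelled in a few lines. This is a simplified host-side sketch, not kernel code: the types and function name are hypothetical, and ties at equal priority simply go to the first task found, so round-robin time-slicing is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ST_RUNNING, ST_READY, ST_BLOCKED, ST_SUSPENDED } sched_state_t;

typedef struct {
    const char   *name;
    unsigned      priority;   /* higher number = higher priority */
    sched_state_t state;
} sched_task_t;

/* Return the index of the task that should run next: the highest-priority
 * task that is Running or Ready. Returns -1 if nothing is runnable, which
 * cannot happen on a real system because the Idle task is always ready. */
static int sched_pick_next(const sched_task_t *tasks, size_t count)
{
    assert(tasks != NULL);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        bool runnable = (tasks[i].state == ST_READY) ||
                        (tasks[i].state == ST_RUNNING);
        if (!runnable) {
            continue;   /* Blocked and Suspended tasks are never chosen */
        }
        if (best < 0 || tasks[i].priority > tasks[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

A Blocked task is skipped no matter how high its priority is; the moment it becomes Ready, it wins the next selection.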

Stack sizing: Each task requires its own stack. configMINIMAL_STACK_SIZE (typically 128 words = 512 bytes) is the bare minimum for the Idle task. Real tasks that call HAL functions, use printf, or have deep call stacks may need 512–2048 words. Use uxTaskGetStackHighWaterMark(NULL) to measure the actual high-water mark (the closest the stack came to overflow) during runtime testing.

vTaskDelayUntil() is the correct primitive for periodic tasks. vTaskDelay() delays relative to the moment it is called, so the task's own execution time is added to every cycle and the period drifts. vTaskDelayUntil() blocks until an absolute tick count, which gives a fixed period as long as each iteration finishes within one period.
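
The drift is simple arithmetic. The host-side sketch below compares the two delay styles in ticks; the function names are hypothetical and the model ignores preemption by other tasks.

```c
#include <assert.h>
#include <stdint.h>

/* Relative delay (vTaskDelay style): each cycle costs the execution
 * time plus the full delay, so the error grows with every cycle. */
static uint32_t wake_tick_relative(uint32_t period, uint32_t exec,
                                   uint32_t cycles)
{
    uint32_t now = 0u;
    for (uint32_t i = 0u; i < cycles; i++) {
        now += exec;     /* task body runs                      */
        now += period;   /* delay is counted from "now"         */
    }
    return now;
}

/* Absolute delay (vTaskDelayUntil style): the next wake time is derived
 * from the previous wake time, so execution time does not accumulate. */
static uint32_t wake_tick_absolute(uint32_t period, uint32_t exec,
                                   uint32_t cycles)
{
    assert(exec < period);   /* the body must fit inside one period */
    uint32_t last_wake = 0u;
    for (uint32_t i = 0u; i < cycles; i++) {
        last_wake += period;
    }
    return last_wake;
}
```

With a 10-tick period and a body that takes 2 ticks, 100 cycles end at tick 1200 with the relative delay and at tick 1000 with the absolute one.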

/* ---------------------------------------------------------------
 * Three periodic FreeRTOS tasks using vTaskDelayUntil()
 * SensorTask: reads ADC every 10 ms
 * ProcessTask: applies moving average filter
 * LogTask: outputs results via UART every 100 ms
 * --------------------------------------------------------------- */
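#include <stdio.h>      /* snprintf */
#include "main.h"       /* HAL types; assumes the usual CubeMX project layout */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"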

extern ADC_HandleTypeDef hadc1;
extern UART_HandleTypeDef huart2;

static QueueHandle_t xAdcQueue;   /* ADC readings: SensorTask → ProcessTask */
static QueueHandle_t xLogQueue;   /* filtered values: ProcessTask → LogTask */

/* ── SensorTask: 10 ms period ── */
void vSensorTask(void *pvParameters)
{
    (void)pvParameters;
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t xPeriod = pdMS_TO_TICKS(10);
    uint32_t adcValue;

    for (;;)
    {
        HAL_ADC_Start(&hadc1);
        HAL_ADC_PollForConversion(&hadc1, 5);
        adcValue = HAL_ADC_GetValue(&hadc1);
        HAL_ADC_Stop(&hadc1);

        /* non-blocking send; the sample is dropped if the queue is full */
        xQueueSend(xAdcQueue, &adcValue, 0);

        vTaskDelayUntil(&xLastWakeTime, xPeriod);  /* fixed 10 ms period */
    }
}

/* ── ProcessTask: runs as fast as data arrives ── */
void vProcessTask(void *pvParameters)
{
    (void)pvParameters;
    uint32_t raw, filtered = 0;
    /* 8-tap moving average accumulator */
    uint32_t buf[8] = {0};
    uint8_t  idx = 0;

    for (;;)
    {
        xQueueReceive(xAdcQueue, &raw, portMAX_DELAY);

        buf[idx] = raw;
        idx = (idx + 1) & 0x07;
        filtered = 0;
        for (int i = 0; i < 8; i++) filtered += buf[i];
        filtered >>= 3;  /* divide by 8 */

        xQueueSend(xLogQueue, &filtered, 0);
    }
}

/* ── LogTask: 100 ms period, logs to UART ── */
void vLogTask(void *pvParameters)
{
    (void)pvParameters;
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t xPeriod = pdMS_TO_TICKS(100);
    uint32_t value;
    char buf[64];
    int len;

    for (;;)
    {
        while (xQueueReceive(xLogQueue, &value, 0) == pdTRUE)
        {
            len = snprintf(buf, sizeof(buf), "ADC: %lu mV\r\n",
                           (value * 3300UL) / 4095UL);
            HAL_UART_Transmit(&huart2, (uint8_t *)buf, len, HAL_MAX_DELAY);
        }
        vTaskDelayUntil(&xLastWakeTime, xPeriod);
    }
}

Queues

A FreeRTOS queue is a thread-safe FIFO that works between tasks and between ISRs and tasks. It is the primary mechanism for passing data across execution contexts without shared globals or manual locking. Queues are type-safe only by convention: the kernel copies raw bytes, so you define the item size at creation time and must use the same type when sending and receiving.

xQueueCreate(length, itemSize) allocates from the FreeRTOS heap. The queue stores length items, each itemSize bytes. For a queue of structs, pass sizeof(MyStruct_t) — FreeRTOS copies the struct by value on send and receive.
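
Copy-by-value is worth seeing in isolation. The sketch below is a single-threaded host model of a fixed-item FIFO, not the FreeRTOS implementation: the type and function names are hypothetical, and there is no blocking or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t timestamp_ms;
    uint16_t raw_adc;
    int16_t  temperature_dC;
} reading_t;

#define MODEL_QUEUE_LENGTH 4u

typedef struct {
    uint8_t storage[MODEL_QUEUE_LENGTH * sizeof(reading_t)];
    size_t  head;    /* index of the oldest item      */
    size_t  count;   /* number of items currently held */
} model_queue_t;

/* Copy one item into the queue; returns false if the queue is full. */
static bool model_queue_send(model_queue_t *q, const reading_t *item)
{
    if (q->count == MODEL_QUEUE_LENGTH) {
        return false;
    }
    size_t slot = (q->head + q->count) % MODEL_QUEUE_LENGTH;
    memcpy(&q->storage[slot * sizeof(reading_t)], item, sizeof(reading_t));
    q->count++;
    return true;
}

/* Copy the oldest item out; returns false if the queue is empty. */
static bool model_queue_receive(model_queue_t *q, reading_t *out)
{
    if (q->count == 0u) {
        return false;
    }
    memcpy(out, &q->storage[q->head * sizeof(reading_t)], sizeof(reading_t));
    q->head = (q->head + 1u) % MODEL_QUEUE_LENGTH;
    q->count--;
    return true;
}
```

Because the item is copied on send, the sender can reuse or modify its local variable immediately; the receiver still gets the value as it was at send time.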

From an ISR, you must use the FromISR variants: xQueueSendFromISR(), xQueueReceiveFromISR(). These never block and take an additional pxHigherPriorityTaskWoken parameter. If the queue operation unblocked a task with a higher priority than the one that was interrupted, the kernel sets the variable to pdTRUE; pass it to portYIELD_FROM_ISR(xHigherPriorityTaskWoken) at the end of the ISR to trigger an immediate context switch rather than waiting for the next tick.

Queue of structs vs queue of pointers: Copying large structs into a queue is safe but wastes cycles. For large payloads, allocate from a memory pool, write the data, send a pointer through the queue, and free after the receiver is done. This is the zero-copy pattern — the queue transmits only a pointer (4 bytes) regardless of payload size.
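
The pointer-passing pattern might look like the following on a host. It is a single-threaded sketch with hypothetical names and sizes: the pool and the pointer FIFO stand in for a real memory pool and for a FreeRTOS queue created with an item size of sizeof(block_t *), and nothing here is interrupt-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS 4u
#define BLOCK_BYTES 256u
#define QUEUE_SLOTS 4u

typedef struct { uint8_t data[BLOCK_BYTES]; } block_t;

static block_t  pool[POOL_BLOCKS];
static bool     in_use[POOL_BLOCKS];
static block_t *slots[QUEUE_SLOTS];   /* the queue holds pointers only */
static size_t   head, tail, count;

/* Take a free block from the pool, or NULL if all are in use. */
static block_t *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!in_use[i]) { in_use[i] = true; return &pool[i]; }
    }
    return NULL;
}

/* Return a block to the pool; the receiver calls this when done. */
static void pool_free(block_t *b)
{
    assert(b >= &pool[0] && b < &pool[POOL_BLOCKS]);
    in_use[b - pool] = false;
}

/* Enqueue a pointer; the payload itself is never copied. */
static bool ptr_queue_send(block_t *b)
{
    if (count == QUEUE_SLOTS) { return false; }
    slots[tail] = b;
    tail = (tail + 1u) % QUEUE_SLOTS;
    count++;
    return true;
}

/* Dequeue the oldest pointer, or NULL if the queue is empty. */
static block_t *ptr_queue_receive(void)
{
    if (count == 0u) { return NULL; }
    block_t *b = slots[head];
    head = (head + 1u) % QUEUE_SLOTS;
    count--;
    return b;
}
```

The ownership rule matters more than the code: after sending, the producer must not touch the block again, and the consumer is the only one that frees it.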

/* ---------------------------------------------------------------
 * ISR-to-task sensor queue:
 * UART ISR pushes a SensorReading_t into a queue of 10.
 * ProcessTask receives and logs each reading.
 * --------------------------------------------------------------- */
#include "FreeRTOS.h"
#include "queue.h"

/* Struct holding one sensor measurement */
typedef struct {
    uint32_t timestamp_ms;   /* HAL_GetTick() at capture time */
    uint16_t raw_adc;        /* 12-bit ADC reading            */
    int16_t  temperature_dC; /* temperature in 0.1 °C units   */
} SensorReading_t;

static QueueHandle_t xSensorQueue;

/* Called from main() before vTaskStartScheduler() */
void SensorQueue_Init(void)
{
    xSensorQueue = xQueueCreate(10, sizeof(SensorReading_t));
    configASSERT(xSensorQueue != NULL);
}

/* ── UART Rx Complete Callback (runs in ISR context) ── */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    SensorReading_t reading;

    reading.timestamp_ms   = HAL_GetTick();
    reading.raw_adc        = 2048;      /* placeholder: parse from DMA buffer */
    reading.temperature_dC = 235;       /* placeholder: 23.5 °C              */

    /* FromISR variants never block (there is no timeout argument); if the
     * queue is full the call returns errQUEUE_FULL and the reading is lost */
    xQueueSendFromISR(xSensorQueue, &reading, &xHigherPriorityTaskWoken);

    /* Trigger context switch if ProcessTask has higher priority */
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

/* ── ProcessTask: receives readings and logs them ── */
void vProcessTask(void *pvParameters)
{
    (void)pvParameters;
    SensorReading_t reading;
    char buf[80];
    int len;

    for (;;)
    {
        /* Block indefinitely until a reading arrives */
        xQueueReceive(xSensorQueue, &reading, portMAX_DELAY);

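        /* Split sign and magnitude so that e.g. -5 prints as -0.5 */
        int mag = (reading.temperature_dC < 0) ? -reading.temperature_dC
                                               :  reading.temperature_dC;
        len = snprintf(buf, sizeof(buf),
                       "[%6lu ms] ADC=%4u  Temp=%s%d.%d C\r\n",
                       (unsigned long)reading.timestamp_ms,
                       (unsigned)reading.raw_adc,
                       (reading.temperature_dC < 0) ? "-" : "",
                       mag / 10, mag % 10);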
        HAL_UART_Transmit(&huart2, (uint8_t *)buf, len, HAL_MAX_DELAY);
    }
}

Semaphores & Mutexes

Queues transmit data. Semaphores and mutexes synchronise access and events.

Binary semaphore is a signalling primitive. One context (an ISR or task) gives the semaphore; another task waits on it with xSemaphoreTake(). This is the correct pattern for "ISR signals task that work is ready" — far cleaner than polling a volatile flag, because the task blocks and yields the CPU while waiting.

Counting semaphore extends the binary semaphore to a count > 1. Use it to track a pool of resources (e.g. 4 available DMA channels) or to count events that arrive faster than they are consumed.

Mutex (mutual exclusion semaphore) protects a shared resource from concurrent access: only one task at a time can hold it. Unlike a binary semaphore, a mutex must be released by the same task that acquired it, because the kernel tracks the holder for priority inheritance. Giving a mutex from any other task is a programming error and leaves that bookkeeping in an inconsistent state.

Priority inheritance: FreeRTOS mutexes implement priority inheritance automatically. If a high-priority task (H) is blocked waiting for a mutex held by a low-priority task (L), the kernel temporarily boosts L's priority to match H, preventing a medium-priority task (M) from starving H indefinitely — the classic priority inversion scenario.
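
The boost-and-restore mechanism can be shown with a toy model. This is a host-side sketch with hypothetical names, not how the kernel is implemented: it tracks one mutex and one waiter and leaves out blocking and scheduling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned base_priority;        /* priority assigned at creation   */
    unsigned effective_priority;   /* priority the scheduler would use */
} pi_task_t;

typedef struct {
    pi_task_t *holder;             /* NULL when the mutex is free */
} pi_mutex_t;

/* Try to take the mutex. On failure the holder inherits the caller's
 * priority if that is higher; the caller would then block. */
static bool pi_mutex_take(pi_mutex_t *m, pi_task_t *t)
{
    if (m->holder == NULL) {
        m->holder = t;
        return true;
    }
    if (t->effective_priority > m->holder->effective_priority) {
        m->holder->effective_priority = t->effective_priority;
    }
    return false;
}

/* Release the mutex: only the holder may give, and it drops back to
 * its base priority. */
static void pi_mutex_give(pi_mutex_t *m, pi_task_t *t)
{
    assert(m->holder == t);
    t->effective_priority = t->base_priority;
    m->holder = NULL;
}
```

While L holds the mutex and H waits, L runs at H's priority, so M cannot preempt it; as soon as L gives the mutex, it falls back to its own priority.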

Type	Create API	Use Case	Priority Inheritance	ISR-safe Give
Binary Semaphore	xSemaphoreCreateBinary()	ISR signals task	No	Yes (xSemaphoreGiveFromISR)
Counting Semaphore	xSemaphoreCreateCounting()	Resource pool, event counting	No	Yes (xSemaphoreGiveFromISR)
Mutex	xSemaphoreCreateMutex()	Shared peripheral protection	Yes	No (never use a mutex from an ISR)
Recursive Mutex	xSemaphoreCreateRecursiveMutex()	Re-entrant code paths	Yes	No

/* ---------------------------------------------------------------
 * SPI mutex protecting a shared SPI bus used by two tasks.
 * Both tasks acquire the mutex before driving CS low,
 * and release it after the transfer + CS deassert.
 * --------------------------------------------------------------- */
#include "FreeRTOS.h"
#include "semphr.h"

extern SPI_HandleTypeDef hspi1;

/* Mutex created once — before vTaskStartScheduler() */
static SemaphoreHandle_t xSpiMutex = NULL;

void SpiMutex_Init(void)
{
    xSpiMutex = xSemaphoreCreateMutex();
    configASSERT(xSpiMutex != NULL);
}

/* Generic SPI transfer with mutex protection
 * Returns HAL_OK on success, HAL_TIMEOUT if mutex not acquired */
HAL_StatusTypeDef SPI_Transfer(GPIO_TypeDef *csPort, uint16_t csPin,
                                uint8_t *txBuf, uint8_t *rxBuf,
                                uint16_t len, uint32_t timeoutMs)
{
    if (xSemaphoreTake(xSpiMutex, pdMS_TO_TICKS(timeoutMs)) != pdTRUE)
        return HAL_TIMEOUT;

    HAL_GPIO_WritePin(csPort, csPin, GPIO_PIN_RESET);  /* CS active low */

    HAL_StatusTypeDef status =
        HAL_SPI_TransmitReceive(&hspi1, txBuf, rxBuf, len, HAL_MAX_DELAY);

    HAL_GPIO_WritePin(csPort, csPin, GPIO_PIN_SET);    /* CS deassert   */
    xSemaphoreGive(xSpiMutex);

    return status;
}

/* Task A: reads accelerometer over shared SPI */
void vAccelTask(void *pvParameters)
{
    (void)pvParameters;
    uint8_t tx[2] = {0x0F | 0x80, 0x00};  /* WHO_AM_I read command */
    uint8_t rx[2];
    for (;;)
    {
        SPI_Transfer(GPIOE, GPIO_PIN_3, tx, rx, 2, 100);
        vTaskDelay(pdMS_TO_TICKS(50));
    }
}

/* Task B: controls external flash over same SPI bus */
void vFlashTask(void *pvParameters)
{
    (void)pvParameters;
    uint8_t tx[4] = {0x03, 0x00, 0x00, 0x00};  /* READ command + addr */
    uint8_t rx[4];
    for (;;)
    {
        SPI_Transfer(GPIOB, GPIO_PIN_6, tx, rx, 4, 200);
        vTaskDelay(pdMS_TO_TICKS(200));
    }
}

Stack Sizing & Memory

Running out of stack in an RTOS task is one of the hardest bugs to diagnose — the failure can manifest as corrupt data, a HardFault hours after the stack overflowed, or completely silent wrong behaviour. The FreeRTOS toolkit provides everything you need to detect and prevent stack overflows before they reach production.

FreeRTOS heap allocators: Select one by compiling the matching heap_N.c file. heap_4 suits most applications: it is a first-fit allocator that coalesces adjacent free blocks, which limits fragmentation from repeated alloc/free cycles. configTOTAL_HEAP_SIZE in FreeRTOSConfig.h sets the total heap budget; every dynamically created task stack, queue, semaphore, and mutex comes from this pool.

uxTaskGetStackHighWaterMark(taskHandle) returns the minimum number of free words that have ever existed on a task's stack. A value near zero means the stack nearly overflowed. Run your system through all worst-case code paths (ISR storms, large printf calls, recursive decode) and then audit the high-water marks. Peak usage is the allocated depth minus the high-water mark; size each task's stack at least 20% above that peak.
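
The measurement works because the kernel pre-fills each stack with a known byte (0xA5) and later counts how much of that fill is still untouched. The host-side sketch below models that idea on words rather than bytes and adds the 20% margin calculation; the function names are hypothetical, and it assumes a stack that grows downward, so index 0 is the far end of the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FreeRTOS fills task stacks with the byte 0xA5; as a word that is: */
#define STACK_FILL_WORD 0xA5A5A5A5u

/* Count untouched words from the far end of the stack (index 0).
 * The result corresponds to the high-water mark in words. */
static size_t high_water_mark_words(const uint32_t *stack, size_t depth)
{
    size_t free_words = 0;
    while (free_words < depth && stack[free_words] == STACK_FILL_WORD) {
        free_words++;
    }
    return free_words;
}

/* Peak usage plus a 20% margin, rounded up to a whole word. */
static size_t recommended_depth_words(size_t depth, size_t hwm)
{
    assert(hwm <= depth);
    size_t peak = depth - hwm;
    return peak + (peak + 4u) / 5u;
}
```

A task created with 128 words that reports a high-water mark of 28 peaked at 100 words, so 120 words is the smallest depth that keeps the 20% margin.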

Stack overflow hook: Set configCHECK_FOR_STACK_OVERFLOW to 2 in FreeRTOSConfig.h. FreeRTOS fills each task's stack with a known pattern at creation and, each time the task is switched out, checks that the last 16 bytes still hold it. If they do not, it calls vApplicationStackOverflowHook(); implement this to assert or log the offending task name before the corruption spreads.
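
As a starting point, the relevant FreeRTOSConfig.h settings might look like this. It is a fragment, not a complete configuration; CubeMX exposes the same options in its FreeRTOS configuration UI, and the other values in your file stay as they are.

```c
/* FreeRTOSConfig.h fragment: stack diagnostics */
#define configCHECK_FOR_STACK_OVERFLOW       2   /* pattern check on every switch-out        */
#define configUSE_TRACE_FACILITY             1   /* required by uxTaskGetSystemState()       */
#define INCLUDE_uxTaskGetStackHighWaterMark  1   /* enables uxTaskGetStackHighWaterMark()    */
```

The overflow check adds a little time to every context switch, so some teams keep it for development and test builds and decide separately whether to ship it.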

/* ---------------------------------------------------------------
 * Stack audit: print high-water mark for all tasks over UART.
 * vApplicationStackOverflowHook: assert + log on stack overflow.
 * --------------------------------------------------------------- */
#include "FreeRTOS.h"
#include "task.h"
#include <stdio.h>      /* snprintf */

extern UART_HandleTypeDef huart2;

/* Call this from a debug task or on command to audit all stacks */
void PrintStackAudit(void)
{
    TaskStatus_t *pxTaskStatusArray;
    UBaseType_t   uxArraySize, i;
    char          buf[80];
    int           len;

    uxArraySize = uxTaskGetNumberOfTasks();
    pxTaskStatusArray = pvPortMalloc(uxArraySize * sizeof(TaskStatus_t));
    if (pxTaskStatusArray == NULL) return;

    /* Fill array with current task info */
    uxArraySize = uxTaskGetSystemState(pxTaskStatusArray,
                                       uxArraySize, NULL);

    len = snprintf(buf, sizeof(buf),
                   "\r\n--- Stack Audit ---\r\n"
                   "%-16s %-8s %-10s\r\n", "Task", "Pri", "HWM(words)");
    HAL_UART_Transmit(&huart2, (uint8_t *)buf, len, HAL_MAX_DELAY);

    for (i = 0; i < uxArraySize; i++)
    {
        len = snprintf(buf, sizeof(buf), "%-16s %-8lu %-10lu\r\n",
                       pxTaskStatusArray[i].pcTaskName,
                       (unsigned long)pxTaskStatusArray[i].uxCurrentPriority,
                       (unsigned long)pxTaskStatusArray[i].usStackHighWaterMark);
        HAL_UART_Transmit(&huart2, (uint8_t *)buf, len, HAL_MAX_DELAY);
    }

    len = snprintf(buf, sizeof(buf),
                   "Free heap: %lu bytes\r\n-------------------\r\n",
                   (unsigned long)xPortGetFreeHeapSize());
    HAL_UART_Transmit(&huart2, (uint8_t *)buf, len, HAL_MAX_DELAY);

    vPortFree(pxTaskStatusArray);
}

/* Stack overflow hook — called by FreeRTOS when stack sentinel is corrupted */
void vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName)
{
    (void)xTask;
    /* Log the task name before asserting — at this point the stack
     * is already corrupt, so keep this code as simple as possible. */
    char msg[64];
    int len = snprintf(msg, sizeof(msg),
                       "STACK OVERFLOW: task='%s'\r\n", pcTaskName);
    HAL_UART_Transmit(&huart2, (uint8_t *)msg, len, 10);
    configASSERT(0);  /* triggers breakpoint in debug, resets via watchdog */
}

FreeRTOS + HAL Peripheral Integration

The most ergonomic integration pattern combines DMA-driven HAL peripherals with FreeRTOS semaphores or task notifications. The DMA transfer completes in hardware; the HAL callback fires in ISR context; the ISR signals a semaphore; the task wakes up, processes the data, and goes back to sleep — all without polling and without busy-waiting.

UART DMA + binary semaphore: Start a DMA TX with HAL_UART_Transmit_DMA(). When the transfer completes, HAL_UART_TxCpltCallback() fires in ISR context. Give a binary semaphore from the ISR. The task that initiated the transfer waits on the semaphore, allowing other tasks to run during the DMA operation.

ADC DMA + task notification: HAL_ADC_ConvCpltCallback() fires when a DMA buffer is full. Call xTaskNotifyFromISR() targeting the process task — this is lighter-weight than a semaphore for one-to-one signalling and carries a 32-bit notification value.

Accessing shared UART from multiple tasks: The DMA semaphore serialises the transfer itself. But if two tasks both call HAL_UART_Transmit_DMA() concurrently, the second call will fail with HAL_BUSY (UART state machine is not idle). Wrap the entire sequence — acquire mutex, start DMA, wait on semaphore, release mutex — so only one task at a time can initiate a UART transfer.

/* ---------------------------------------------------------------
 * DMA UART TX with mutex + completion semaphore.
 * Thread-safe UART write for multiple tasks.
 * --------------------------------------------------------------- */
#include "FreeRTOS.h"
#include "semphr.h"

extern UART_HandleTypeDef huart2;

static SemaphoreHandle_t xUartMutex  = NULL;
static SemaphoreHandle_t xDmaDoneSem = NULL;

void UART_RTOS_Init(void)
{
    xUartMutex  = xSemaphoreCreateMutex();
    xDmaDoneSem = xSemaphoreCreateBinary();
    configASSERT(xUartMutex  != NULL);
    configASSERT(xDmaDoneSem != NULL);
}

/* Thread-safe DMA UART write — blocks until TX is complete */
HAL_StatusTypeDef UART_Write(const uint8_t *data, uint16_t len)
{
    HAL_StatusTypeDef status;

    /* Acquire mutex: only one task may use UART at a time */
    if (xSemaphoreTake(xUartMutex, pdMS_TO_TICKS(500)) != pdTRUE)
        return HAL_TIMEOUT;

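    /* Discard any stale completion token left behind by an earlier transfer
     * that timed out and finished late, then start the non-blocking DMA */
    (void)xSemaphoreTake(xDmaDoneSem, 0);
    status = HAL_UART_Transmit_DMA(&huart2, data, len);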
    if (status != HAL_OK)
    {
        xSemaphoreGive(xUartMutex);
        return status;
    }

    /* Wait for DMA completion callback to signal the semaphore.
     * Timeout 1 s: allows detection of DMA hang. */
    if (xSemaphoreTake(xDmaDoneSem, pdMS_TO_TICKS(1000)) != pdTRUE)
    {
        HAL_UART_AbortTransmit(&huart2);
        xSemaphoreGive(xUartMutex);
        return HAL_TIMEOUT;
    }

    xSemaphoreGive(xUartMutex);
    return HAL_OK;
}

/* DMA TX complete callback — runs in ISR context */
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART2)
    {
        BaseType_t xHigherPriorityTaskWoken = pdFALSE;
        xSemaphoreGiveFromISR(xDmaDoneSem, &xHigherPriorityTaskWoken);
        portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
    }
}

                        
Data must outlive the DMA transfer. Never pass a stack-allocated buffer to HAL_UART_Transmit_DMA() if the calling function might return before the DMA completes. Use a static buffer or a heap-allocated buffer, or block until the semaphore signals completion (as shown above).
                    

Exercises

Exercise 1 Beginner

Three Concurrent FreeRTOS Tasks

Create 3 FreeRTOS tasks: BlinkTask (500 ms period, toggles LED using vTaskDelayUntil), UartTask (prints "Task running [tick]" every 1 s over UART), ButtonTask (blocks on a binary semaphore signalled by an EXTI ISR on the user button). Move the HAL timebase to TIM6. Verify all three run concurrently: the LED blinks at exactly 500 ms, UART output appears every 1 s, and pressing the button prints a message without disrupting the other tasks.

xTaskCreate vTaskDelayUntil Binary Semaphore TIM6 Timebase

Exercise 2 Intermediate

Sensor Fusion Pipeline with Queues

Build a three-task sensor fusion pipeline: Task A reads MPU-6050 over I2C at 1 kHz using DMA and a binary semaphore. Task B receives raw accelerometer + gyroscope readings via a queue of 20, applies a complementary filter (alpha = 0.98) to compute roll and pitch angles. Task C receives filtered angles from a second queue and outputs them over UART at 100 Hz. After running for 60 seconds, print the queue high-water marks and verify no data loss occurred.

I2C DMA Queue Complementary Filter High-Water Mark

Exercise 3 Advanced

Diagnose Priority Inversion

Create 3 tasks to demonstrate and measure priority inversion: High (priority 10, acquires SPI mutex, performs a 1 ms SPI read), Medium (priority 5, CPU-bound loop incrementing a counter), Low (priority 1, holds the SPI mutex while performing a simulated 50 ms flash erase delay). First, run this scenario with a binary semaphore instead of a mutex, and measure the latency from the High task becoming Ready to actually Running using a GPIO toggle and an oscilloscope. Then switch to a proper mutex (with priority inheritance) and measure again. Document the latency difference and explain why priority inheritance bounds the inversion to the time Low holds the mutex.

Priority Inversion Priority Inheritance Mutex Oscilloscope Measurement

Conclusion & Next Steps

In this article we have built a complete FreeRTOS foundation for STM32 development:

The HAL timebase conflict is the most critical issue: always move the HAL tick source to TIM6 (or another basic timer) when using FreeRTOS, and configure NVIC priorities so TIM6 has higher urgency than SysTick.
Tasks created with xTaskCreate() each have their own stack. Use vTaskDelayUntil() for periodic tasks to avoid accumulated drift, and uxTaskGetStackHighWaterMark() to size stacks correctly.
Queues are the correct way to pass data between tasks or from ISR to task. Always use FromISR variants in interrupt context and call portYIELD_FROM_ISR() to trigger immediate context switches.
Mutexes protect shared peripherals with priority inheritance. Never use a mutex from an ISR — use a binary semaphore for ISR-to-task signalling instead.
DMA + semaphore is the right pattern for non-blocking peripheral access: start DMA, let the CPU do other work, wake on completion callback.
Set configCHECK_FOR_STACK_OVERFLOW to 2 and implement vApplicationStackOverflowHook() for every production project.

Next in the Series

In Part 15: Bootloader Development, we design a fail-safe flash memory layout, implement a UART/USB DFU firmware update protocol, add CRC32 image validation, and build the jump-to-application mechanism — including dual-bank updates on STM32H7. Every production STM32 product needs a bootloader, and this is how to build one correctly.
