Series Progress: This is Part 5 of our 20-part CMSIS Mastery Series. Parts 1–4 covered the ecosystem, CMSIS-Core registers, startup code, and RTOS2 thread primitives. Here we complete the RTOS2 IPC picture with queues and event flags.
| Part | Title | Topics | Status |
|---|---|---|---|
| 1 | Overview & ARM Cortex-M Ecosystem | CMSIS layers, Cortex-M families, memory map, toolchains | Completed |
| 2 | CMSIS-Core: Registers, NVIC & SysTick | core_cmX.h, register access, interrupt controller, SysTick timer | Completed |
| 3 | Startup Code, Linker Scripts & Vector Table | Reset handler, BSS init, scatter files, boot process | Completed |
| 4 | CMSIS-RTOS2: Threads, Mutexes & Semaphores | Thread management, synchronization primitives, scheduling | Completed |
| 5 | CMSIS-RTOS2: Message Queues & Event Flags | Inter-thread comms, ISR-to-thread, real-time design patterns | You Are Here |
| 6 | CMSIS-DSP: Filters, FFT & Math Functions | FIR/IIR filters, FFT, SIMD optimizations | |
| 7 | CMSIS-Driver: UART, SPI & I2C | Driver abstraction layer, callbacks, DMA integration | |
| 8 | CMSIS-Pack & Software Components | Pack files, device support, dependency management | |
| 9 | Debugging with CMSIS-DAP & CoreSight | SWD/JTAG, HardFault analysis, ITM tracing | |
| 10 | Portable Firmware: Multi-Vendor Projects | HAL vs CMSIS, cross-platform BSPs, reusable libraries | |
| 11 | Interrupts, Concurrency & Real-Time Constraints | Interrupt latency, critical sections, lock-free programming | |
| 12 | Memory Management in Embedded Systems | Static vs dynamic, heap fragmentation, memory pools | |
| 13 | Low Power & Energy Optimization | Sleep modes, clock gating, tickless RTOS, power profiling | |
| 14 | DMA & High-Performance Data Handling | DMA basics, peripheral transfers, zero-copy techniques | |
| 15 | Security: ARMv8-M & TrustZone | Secure/non-secure worlds, secure boot, firmware protection | |
| 16 | Bootloaders & Firmware Updates | OTA updates, dual-bank flash, fail-safe strategies | |
| 17 | Testing & Validation | Unity/Ceedling unit tests, HIL testing, integration testing | |
| 18 | Performance Optimization | Compiler flags, inline assembly, cache (M7/M33), profiling | |
| 19 | Embedded Software Architecture | Layered design, event-driven, state machines, component-based | |
| 20 | Tooling & Workflow (Professional Level) | CI/CD for embedded, MISRA, static analysis, Doxygen | |
Message Queues
When two RTOS threads need to exchange structured data safely — without shared globals and without the data races that come with them — the right tool is a message queue. A CMSIS-RTOS2 message queue is a kernel-managed FIFO buffer of fixed-size messages. The producer thread puts messages in; the consumer thread gets them out. If the queue is full, the producer blocks (or times out). If the queue is empty, the consumer blocks. The kernel handles the synchronisation transparently.
Message queues differ fundamentally from mutexes and semaphores: they carry data, not just signals. Each message slot holds exactly msg_size bytes — the same size for every message in that queue. This fixed-size discipline enables the kernel to allocate the queue from a statically-sized memory block with no heap fragmentation.
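To make the fixed-size, copy-in/copy-out behaviour concrete, here is a minimal host-side model of such a FIFO. It is only a sketch of the idea, not the kernel's implementation: `ModelQueue_t`, `model_queue_put`, `model_queue_get`, and the two size macros are hypothetical names, and the model ignores blocking, message priority, and thread safety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MQ_SLOTS    4U   /* hypothetical queue depth */
#define MQ_MSG_SIZE 8U   /* hypothetical msg_size in bytes */

typedef struct {
    uint8_t  storage[MQ_SLOTS][MQ_MSG_SIZE]; /* statically sized slot array */
    uint32_t head;                           /* index of the oldest message */
    uint32_t count;                          /* number of occupied slots */
} ModelQueue_t;

/* Copy one message into the next free slot; false means "queue full". */
static bool model_queue_put(ModelQueue_t *q, const void *msg)
{
    assert(q != NULL && msg != NULL);
    if (q->count == MQ_SLOTS) {
        return false;
    }
    uint32_t tail = (q->head + q->count) % MQ_SLOTS;
    memcpy(q->storage[tail], msg, MQ_MSG_SIZE);
    q->count++;
    return true;
}

/* Copy the oldest message out and release its slot; false means "queue empty". */
static bool model_queue_get(ModelQueue_t *q, void *msg_out)
{
    assert(q != NULL && msg_out != NULL);
    if (q->count == 0U) {
        return false;
    }
    memcpy(msg_out, q->storage[q->head], MQ_MSG_SIZE);
    q->head = (q->head + 1U) % MQ_SLOTS;
    q->count--;
    return true;
}
```

Because every slot has the same size, the storage is a plain array whose RAM cost is known at build time, and the sender can reuse its own buffer immediately after the put because the queue holds a copy.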
Design Rule: Keep message sizes small — typically a pointer plus a few metadata bytes. For large payloads, pass a pointer to a statically allocated buffer or a memory pool block, not the data itself. This avoids copying large arrays through the queue and keeps queue RAM bounded.
The core API is three functions: osMessageQueueNew() to create the queue, osMessageQueuePut() to send, and osMessageQueueGet() to receive. Both put and get accept a timeout in RTOS ticks — use osWaitForever for permanent blocking or 0 for a non-blocking poll.
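Since timeouts are expressed in ticks rather than milliseconds, a small conversion helper can keep call sites readable. The sketch below is an assumption-laden illustration: `ms_to_ticks` is a hypothetical name, and the tick frequency argument is expected to come from `osKernelGetTickFreq()` on the target. It rounds up so a timeout is never shorter than requested, and it clamps just under the `osWaitForever` value (0xFFFFFFFF) so a finite request never turns into an infinite wait.

```c
#include <stdint.h>

/* Convert milliseconds to kernel ticks, rounding up.
 * tick_hz: kernel tick frequency in Hz (on target: osKernelGetTickFreq()). */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_hz)
{
    uint64_t ticks = ((uint64_t)ms * tick_hz + 999U) / 1000U;

    /* 0xFFFFFFFF means "wait forever" in CMSIS-RTOS2, so stay below it. */
    return (ticks > UINT32_MAX - 1U) ? (UINT32_MAX - 1U) : (uint32_t)ticks;
}
```

With a 1 kHz tick the result equals the millisecond value; with a 100 Hz tick, a 15 ms request becomes 2 ticks.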
/* ── message_queue_example.c ─────────────────────────────────────────────
* Producer–consumer with a fixed-size message struct.
* Demonstrates osMessageQueueNew / Put / Get with blocking timeout.
* ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>
#include <stddef.h>
/* ── Message struct (must be a fixed, POD type) ─────────────────────── */
typedef struct {
uint8_t sensor_id; /* which sensor fired */
int16_t raw_value; /* ADC reading, -32768..32767 */
uint32_t timestamp_ms; /* osKernelGetTickCount() at capture */
} SensorMsg_t;
/* ── Queue handle (global, created before threads start) ────────────── */
static osMessageQueueId_t g_sensor_queue;
/* ── Producer: simulates ADC sampling at 100 Hz ─────────────────────── */
static void producer_thread(void *arg)
{
(void)arg;
SensorMsg_t msg;
uint32_t sample_count = 0U;
for (;;) {
/* Build a message */
msg.sensor_id = 0x01U;
msg.raw_value = (int16_t)(sample_count & 0x7FFFU); /* fake data */
msg.timestamp_ms = osKernelGetTickCount();
/* Put with 10 ms timeout — if queue is full we skip this sample */
osStatus_t status = osMessageQueuePut(g_sensor_queue, &msg,
0U, /* msg priority */
10U); /* timeout ticks */
if (status == osErrorTimeout) {
/* Queue full: consumer is too slow — log or increment overrun counter */
}
sample_count++;
osDelay(10U); /* 10 ticks = 10 ms (100 Hz) with a 1 kHz kernel tick */
}
}
/* ── Consumer: processes sensor messages ─────────────────────────────── */
static void consumer_thread(void *arg)
{
(void)arg;
SensorMsg_t msg;
for (;;) {
/* Block indefinitely until a message is available */
osStatus_t status = osMessageQueueGet(g_sensor_queue, &msg,
NULL, /* priority out */
osWaitForever);
if (status == osOK) {
/* Process: apply calibration, log, transmit, etc. */
int32_t calibrated = (int32_t)msg.raw_value * 1000 / 4096;
(void)calibrated; /* use in real application */
}
}
}
/* ── Initialisation (call from main or an init thread) ──────────────── */
void ipc_init(void)
{
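/* 16-slot queue, each slot = sizeof(SensorMsg_t) bytes */
g_sensor_queue = osMessageQueueNew(16U, sizeof(SensorMsg_t), NULL);
if (g_sensor_queue == NULL) {
return; /* creation failed (e.g. out of kernel memory): do not start threads */
}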
/* Static thread attributes for deterministic stack allocation */
static uint64_t producer_stack[256]; /* 256 * 8 = 2 KB */
static uint64_t consumer_stack[256];
static osThreadAttr_t prod_attr = {
.name = "Producer",
.stack_mem = producer_stack,
.stack_size = sizeof(producer_stack),
.priority = osPriorityNormal
};
static osThreadAttr_t cons_attr = {
.name = "Consumer",
.stack_mem = consumer_stack,
.stack_size = sizeof(consumer_stack),
.priority = osPriorityBelowNormal
};
osThreadNew(producer_thread, NULL, &prod_attr);
osThreadNew(consumer_thread, NULL, &cons_attr);
}
Notice how the consumer runs at a lower priority than the producer. The kernel will preempt the consumer when the producer becomes ready, but the queue provides the decoupling: even if the producer bursts several messages before the consumer runs, they are safely buffered in the FIFO. The queue depth (16 slots here) is the burst-absorbing capacity — size it for your worst-case burst, not just the average rate.
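As a rough aid for sizing against the worst-case burst, the following back-of-envelope helper estimates a minimum depth. It is a sketch under stated assumptions, not a scheduling analysis: `min_queue_depth` is a hypothetical name, the producer is assumed to emit `burst_len` messages at a fixed period, the consumer is assumed to finish one message every `consume_period_ticks`, and jitter and preemption by other threads are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate how many slots a burst needs.
 * The burst spans (burst_len - 1) * produce_period_ticks; every message the
 * consumer finishes inside that window does not need to stay buffered. */
static uint32_t min_queue_depth(uint32_t burst_len,
                                uint32_t produce_period_ticks,
                                uint32_t consume_period_ticks)
{
    assert(consume_period_ticks > 0U);
    if (burst_len == 0U) {
        return 0U;
    }
    uint64_t window  = (uint64_t)(burst_len - 1U) * produce_period_ticks;
    uint64_t drained = window / consume_period_ticks;
    if (drained >= burst_len) {
        return 1U;  /* consumer keeps up: one slot is the practical minimum */
    }
    return (uint32_t)(burst_len - drained);
}
```

For a 16-message burst arriving one per tick with a consumer that needs 4 ticks per message, the estimate is 13 slots, so the 16-slot queue above would have some margin under these assumptions. Treat the result as a lower bound and add headroom.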
Mail Queues & Memory Pools
Standard message queues copy data into the kernel's internal buffer. For large or variable-length payloads this is wasteful: you pay for the copy and must size every queue slot for the largest possible message. The solution is the zero-copy pattern: use a memory pool to allocate a block, fill it with data, then pass only the pointer through a queue of pointers. (CMSIS-RTOS v1 packaged this combination as a mail queue; RTOS2 has no mail-queue object, so you build it from a memory pool and a message queue.)
CMSIS-RTOS2 provides osMemoryPoolNew() for exactly this purpose. A memory pool is a fixed-count, fixed-size block allocator backed by a statically defined array. Unlike malloc(), pool allocation is O(1), deterministic, and never fragments: it simply takes or returns a block from a free list. The pool is sized at creation time; if all blocks are in use, osMemoryPoolAlloc() blocks for up to the given timeout and returns NULL if no block is freed in that time (with a timeout of 0 it returns NULL immediately).
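The free-list idea behind a fixed-block pool can be sketched on a host in a few lines. This is an illustrative model only, with hypothetical names (`ModelPool_t`, `model_pool_init`, `model_pool_alloc`, `model_pool_free`); it is not how any particular kernel implements `osMemoryPool`, and it has no locking or timeout handling, which the real kernel provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BLOCKS     4U   /* hypothetical block count */
#define MODEL_BLOCK_SIZE 32U  /* hypothetical block size in bytes */

/* A free block stores the link to the next free block inside itself. */
typedef union ModelBlock {
    union ModelBlock *next;                       /* valid while free */
    uint8_t           payload[MODEL_BLOCK_SIZE];  /* valid while allocated */
} ModelBlock_t;

typedef struct {
    ModelBlock_t  blocks[MODEL_BLOCKS];  /* statically sized backing array */
    ModelBlock_t *free_list;             /* head of the free list */
} ModelPool_t;

static void model_pool_init(ModelPool_t *p)
{
    assert(p != NULL);
    p->free_list = NULL;
    for (size_t i = 0; i < MODEL_BLOCKS; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* O(1): pop the head of the free list, or NULL when the pool is exhausted. */
static void *model_pool_alloc(ModelPool_t *p)
{
    ModelBlock_t *b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
    }
    return b;
}

/* O(1): push the block back onto the free list. */
static void model_pool_free(ModelPool_t *p, void *block)
{
    ModelBlock_t *b = block;
    assert(b != NULL);
    b->next = p->free_list;
    p->free_list = b;
}
```

Allocation and release each touch a single pointer, which is why the cost is constant and the pool cannot fragment: every block is the same size and returns to the same list.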
/* ── memory_pool_queue.c ─────────────────────────────────────────────────
* Zero-copy message pattern: memory pool + pointer queue.
* Avoids copying large payloads through the queue.
* ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>
#include <string.h> /* memset */
#define POOL_BLOCKS 8U /* maximum in-flight messages */
#define PAYLOAD_BYTES 128U /* bytes per message block */
typedef struct {
uint8_t channel;
uint16_t length; /* actual payload length */
uint8_t data[PAYLOAD_BYTES]; /* variable-use payload area */
} NetPacket_t;
static osMemoryPoolId_t g_pkt_pool; /* block allocator */
static osMessageQueueId_t g_pkt_queue; /* queue carries NetPacket_t * */
/* ── Receiver: fills a pool block, enqueues pointer ─────────────────── */
static void net_rx_thread(void *arg)
{
(void)arg;
for (;;) {
/* Allocate a block — blocks up to 50 ms if pool exhausted */
NetPacket_t *pkt = osMemoryPoolAlloc(g_pkt_pool, 50U);
if (pkt == NULL) {
/* Pool exhausted — drop incoming data, increment error counter */
continue;
}
/* Simulate receiving data from a DMA buffer */
pkt->channel = 2U;
pkt->length = 64U;
memset(pkt->data, 0xABU, pkt->length);
/* Enqueue the pointer — zero copy, no memcpy of payload */
osStatus_t st = osMessageQueuePut(g_pkt_queue, &pkt,
0U, osWaitForever);
if (st != osOK) {
/* Queue put failed — free block immediately */
osMemoryPoolFree(g_pkt_pool, pkt);
}
}
}
/* ── Processor: dequeues pointer, processes, frees ───────────────────── */
static void net_process_thread(void *arg)
{
(void)arg;
NetPacket_t *pkt;
for (;;) {
osStatus_t st = osMessageQueueGet(g_pkt_queue, &pkt,
NULL, osWaitForever);
if (st == osOK) {
/* Process without copying — work directly on pool block */
uint32_t checksum = 0U;
for (uint16_t i = 0; i < pkt->length; i++) {
checksum += pkt->data[i];
}
(void)checksum;
/* CRITICAL: always free back to pool after use */
osMemoryPoolFree(g_pkt_pool, pkt);
}
}
}
void pool_queue_init(void)
{
g_pkt_pool = osMemoryPoolNew(POOL_BLOCKS, sizeof(NetPacket_t), NULL);
/* Queue carries pointers — msg_size = sizeof(NetPacket_t *) */
g_pkt_queue = osMessageQueueNew(POOL_BLOCKS, sizeof(NetPacket_t *), NULL);
osThreadNew(net_rx_thread, NULL, NULL);
osThreadNew(net_process_thread, NULL, NULL);
}
Common Mistake: Forgetting to call osMemoryPoolFree() after processing. Each unreturned block is a pool leak. Once all POOL_BLOCKS blocks are outstanding, every subsequent osMemoryPoolAlloc() call will time out, stalling the producer. Always free in the consumer, even on error paths.
Event Flags
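Message queues transfer data. Event flags transfer signals. An event flag group is a bitmask held in a 32-bit word, where each bit is an independent boolean signal. The most significant bit is reserved for error reporting, so at most 31 flags are usable, and some implementations offer fewer (FreeRTOS event groups, for example, provide 24 bits with 32-bit ticks). Threads can set any combination of bits using osEventFlagsSet() and wait for any combination using osEventFlagsWait(). The wait call can block until any of the specified bits are set (OR mode) or until all of them are set simultaneously (AND mode).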
The practical advantage over semaphores: a single event group replaces multiple semaphores when you have many independent conditions. A UI task, for example, might wait for any of: button pressed, timer expired, BLE data arrived, or sensor alert — four conditions expressed as four bits, waited with a single osEventFlagsWait() call with osFlagsWaitAny.
/* ── event_flags_example.c ───────────────────────────────────────────────
* Demonstrates osEventFlagsNew, Set, and Wait with OR and AND patterns.
* ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>
/* Bit definitions — each condition gets its own bit position */
#define EVT_BUTTON_PRESSED (1UL << 0) /* bit 0 */
#define EVT_TIMER_EXPIRED (1UL << 1) /* bit 1 */
#define EVT_BLE_DATA_READY (1UL << 2) /* bit 2 */
#define EVT_SENSOR_ALERT (1UL << 3) /* bit 3 */
/* AND example: both conditions required before proceeding */
#define EVT_INIT_A_DONE (1UL << 8) /* bit 8 */
#define EVT_INIT_B_DONE (1UL << 9) /* bit 9 */
static osEventFlagsId_t g_ui_events;
static osEventFlagsId_t g_init_sync;
/* ── UI task: waits for any of four conditions ───────────────────────── */
static void ui_task(void *arg)
{
(void)arg;
for (;;) {
/* OR wait: unblock when ANY of the four bits is set.
* With osFlagsWaitAny alone, the waited-for flags are cleared
* automatically on return; add osFlagsNoClear to keep them set. */
uint32_t flags = osEventFlagsWait(
g_ui_events,
EVT_BUTTON_PRESSED | EVT_TIMER_EXPIRED |
EVT_BLE_DATA_READY | EVT_SENSOR_ALERT,
osFlagsWaitAny, /* unblock on ANY matching bit */
osWaitForever);
if (flags & osFlagsError) { continue; } /* handle errors */
if (flags & EVT_BUTTON_PRESSED) {
/* Handle button press — debounce, state machine transition */
}
if (flags & EVT_TIMER_EXPIRED) {
/* Handle periodic tick — update display, run state checks */
}
if (flags & EVT_BLE_DATA_READY) {
/* Handle incoming BLE notification */
}
if (flags & EVT_SENSOR_ALERT) {
/* Handle out-of-range sensor alert */
}
}
}
/* ── Init supervisor: waits for ALL init tasks to complete ──────────── */
static void supervisor_task(void *arg)
{
(void)arg;
/* AND wait: block until BOTH init_A and init_B bits are set */
uint32_t flags = osEventFlagsWait(
g_init_sync,
EVT_INIT_A_DONE | EVT_INIT_B_DONE,
osFlagsWaitAll, /* ALL bits must be set before returning */
5000U); /* 5-second timeout — fault if exceeded */
if (flags & osFlagsError) {
/* Initialisation timed out — enter safe state or reset */
for (;;) { osDelay(1000U); }
}
/* Both inits complete — start application */
}
/* ── Peripheral init A ───────────────────────────────────────────────── */
static void init_a_task(void *arg)
{
(void)arg;
osDelay(200U); /* simulate hardware init time */
osEventFlagsSet(g_init_sync, EVT_INIT_A_DONE);
osThreadTerminate(osThreadGetId());
}
/* ── Peripheral init B ───────────────────────────────────────────────── */
static void init_b_task(void *arg)
{
(void)arg;
osDelay(350U); /* simulate longer hardware init */
osEventFlagsSet(g_init_sync, EVT_INIT_B_DONE);
osThreadTerminate(osThreadGetId());
}
void event_flags_init(void)
{
g_ui_events = osEventFlagsNew(NULL);
g_init_sync = osEventFlagsNew(NULL);
osThreadNew(ui_task, NULL, NULL);
osThreadNew(supervisor_task, NULL, NULL);
osThreadNew(init_a_task, NULL, NULL);
osThreadNew(init_b_task, NULL, NULL);
}
Auto-Clear vs Manual-Clear: By default, osEventFlagsWait() clears the flags it was waiting for before returning (auto-clear). Add osFlagsNoClear to the options parameter if multiple tasks must see the same event, then clear explicitly with osEventFlagsClear() after all consumers have handled it.
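The OR/AND matching and the clear-on-return behaviour can be modelled as a pure function, which is handy for reasoning about a design on a host. This is a sketch with hypothetical names (`ModelFlags_t`, `model_flags_check`); it assumes, as the CMSIS-RTOS2 documentation describes, that the wait returns the flag value from before clearing and that auto-clear removes the flags named in the wait mask. It does not model blocking, timeouts, or error codes.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t bits;  /* current state of the flag group */
} ModelFlags_t;

/* Check a wait condition against the group.
 * Returns the flags as they were before clearing, or 0 if the condition
 * is not met (a real wait would block in that case). */
static uint32_t model_flags_check(ModelFlags_t *f, uint32_t mask,
                                  bool wait_all, bool no_clear)
{
    uint32_t current = f->bits;
    bool met = wait_all ? ((current & mask) == mask)   /* AND mode */
                        : ((current & mask) != 0U);    /* OR mode  */
    if (!met) {
        return 0U;
    }
    if (!no_clear) {
        f->bits &= ~mask;  /* auto-clear the waited-for flags */
    }
    return current;
}
```

Note that the returned value can include bits outside the wait mask, so handlers should test individual bits against their own definitions, as the UI task does.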
ISR-to-Thread Communication
Interrupt service routines run outside the RTOS scheduler context. Most CMSIS-RTOS2 functions are not safe to call from an ISR: they may attempt to acquire a mutex or block, which has no meaning in interrupt context. The API documentation states for each function whether it may be called from interrupt service routines; only functions explicitly documented as ISR-callable may be used in interrupt handlers.
The general pattern for ISR-to-thread communication is deferred interrupt handling: the ISR does the minimum work necessary (read hardware status, acknowledge the interrupt, pass data), then wakes a thread to do the heavy processing. This keeps interrupt latency low and keeps all complex logic in the safe, schedulable thread context.
| Mechanism | ISR Safe? | Carries Data? | Blocking in ISR? | Typical Use |
|---|---|---|---|---|
| osMessageQueuePut | Yes (timeout=0) | Yes | No (use timeout=0) | Pass data struct from ISR to handler thread |
| osEventFlagsSet | Yes | No (signals only) | No | Signal thread that event occurred |
| osSemaphoreRelease | Yes | No | No | Count-based signaling (DMA complete, etc.) |
| osThreadFlagsSet | Yes | No | No | Wake a specific thread by handle |
| osMutexAcquire | No | N/A | N/A | Never call from ISR |
| osDelay | No | N/A | N/A | Never call from ISR |
/* ── isr_to_thread.c ─────────────────────────────────────────────────────
* UART Rx ISR posts raw bytes to a message queue.
* Handler thread processes characters in thread context.
* ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include "stm32f407xx.h" /* device header for USART registers */
#include <stdint.h>
typedef struct {
uint8_t byte;
uint32_t timestamp_ticks;
} UartRxMsg_t;
static osMessageQueueId_t g_uart_rx_queue;
/* ── UART ISR — runs at interrupt priority, minimal work ─────────────── */
void USART2_IRQHandler(void)
{
if (USART2->SR & USART_SR_RXNE) { /* Rx not empty flag */
UartRxMsg_t msg;
msg.byte = (uint8_t)(USART2->DR & 0xFFU);
msg.timestamp_ticks = osKernelGetTickCount(); /* ISR-safe */
/* Non-blocking put (timeout = 0) — ISR must never block */
osMessageQueuePut(g_uart_rx_queue, &msg, 0U, 0U);
/* If queue full, byte is dropped — monitor with a drop counter */
}
}
/* ── UART Rx handler thread — full processing in thread context ───────── */
static void uart_rx_task(void *arg)
{
(void)arg;
UartRxMsg_t msg;
uint8_t line_buf[128];
uint8_t pos = 0U;
for (;;) {
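if (osMessageQueueGet(g_uart_rx_queue, &msg, NULL, osWaitForever) != osOK) {
continue; /* no valid message: do not touch msg */
}
if (msg.byte != '\n' && pos < (sizeof(line_buf) - 1U)) {
line_buf[pos++] = msg.byte; /* store first so the byte that fills the buffer is kept */
}
if (msg.byte == '\n' || pos >= (sizeof(line_buf) - 1U)) {
line_buf[pos] = '\0';
/* Process complete line: parse command, update state, etc. */
pos = 0U;
}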
}
}
void uart_ipc_init(void)
{
/* 32-slot queue, one UartRxMsg_t per slot: absorbs bursts between scheduler ticks */
g_uart_rx_queue = osMessageQueueNew(32U, sizeof(UartRxMsg_t), NULL);
osThreadNew(uart_rx_task, NULL, NULL);
/* Configure USART2 Rx interrupt — vendor-specific, not shown */
NVIC_SetPriority(USART2_IRQn, 5U); /* on FreeRTOS: numerically >= the configMAX_SYSCALL_INTERRUPT_PRIORITY level (lower urgency) */
NVIC_EnableIRQ(USART2_IRQn);
}
FreeRTOS Users: When using CMSIS-RTOS2 over FreeRTOS, the NVIC priority of any ISR that calls an RTOS function must be numerically greater than or equal to configMAX_SYSCALL_INTERRUPT_PRIORITY (i.e., lower hardware priority). ISRs with higher priority (lower numerical value) must never call any OS function, even ISR-safe ones.
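Because Cortex-M priorities are inverted (a lower number is more urgent), the rule is easy to get backwards. A tiny predicate makes the direction explicit. This is a hypothetical helper, not part of FreeRTOS or CMSIS; it assumes both arguments use the same representation, for example the unshifted level passed to `NVIC_SetPriority()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both arguments are priority levels in the same representation,
 * where 0 is the most urgent level. An ISR may call ISR-safe RTOS
 * functions only if it is NOT more urgent than the syscall ceiling. */
static bool isr_may_call_rtos(uint32_t irq_priority,
                              uint32_t max_syscall_priority)
{
    return irq_priority >= max_syscall_priority;
}
```

With a ceiling of 5 (an assumed value for illustration), an ISR at level 5 or 6 may post to a queue, while an ISR at level 4 or below must not call the RTOS at all.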
Real-Time Design Patterns
The IPC primitives we've covered — message queues, memory pools, event flags — are building blocks. Professional RTOS firmware combines them into well-known design patterns that have proven reliable across thousands of production embedded systems. Understanding these patterns lets you design firmware architectures that are maintainable, testable, and composable.
Producer-Consumer Pipeline
The canonical RTOS pattern. One or more producer threads acquire data (from sensors, interfaces, or timers) and post it to a queue. One or more consumer threads drain the queue, performing processing (filtering, encoding, formatting). Queues provide backpressure: if the consumer falls behind, the queue fills and the producer blocks for up to its put timeout instead of overwriting data. Once that timeout expires the put fails, so check its return value and count the drop rather than discarding data silently.
/* ── producer_consumer_pipeline.c ────────────────────────────────────────
* Three-thread pipeline: Sampler → Processor → Transmitter
* Two queues connect the three stages.
* ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>
typedef struct { int16_t raw[8]; uint32_t ts; } RawFrame_t;
typedef struct { float filtered[8]; uint32_t ts; } ProcessedFrame_t;
static osMessageQueueId_t g_raw_q; /* Sampler → Processor */
static osMessageQueueId_t g_proc_q; /* Processor → Transmitter */
static osSemaphoreId_t g_tx_sem; /* Transmitter DMA-complete */
/* Stage 1: Sampler — acquires 8-channel ADC frame at 1 kHz */
static void sampler_task(void *arg)
{
(void)arg;
RawFrame_t frame;
uint32_t n = 0U;
for (;;) {
/* Simulate ADC read */
for (int i = 0; i < 8; i++) { frame.raw[i] = (int16_t)(n + i); }
frame.ts = osKernelGetTickCount();
if (osMessageQueuePut(g_raw_q, &frame, 0U, 5U) != osOK) {
/* Queue still full after 5 ticks: frame dropped, count it here */
}
n++;
osDelay(1U); /* 1 ms period */
}
}
/* Stage 2: Processor — applies calibration and low-pass filter */
static void processor_task(void *arg)
{
(void)arg;
RawFrame_t raw;
ProcessedFrame_t proc;
static float prev[8] = {0.0f};
const float alpha = 0.1f; /* IIR coefficient */
for (;;) {
osMessageQueueGet(g_raw_q, &raw, NULL, osWaitForever);
for (int i = 0; i < 8; i++) {
float s = (float)raw.raw[i] * (3.3f / 4096.0f); /* scale to volts */
prev[i] = alpha * s + (1.0f - alpha) * prev[i]; /* IIR filter */
proc.filtered[i] = prev[i];
}
proc.ts = raw.ts;
osMessageQueuePut(g_proc_q, &proc, 0U, 10U);
}
}
/* Stage 3: Transmitter — serialises and sends over UART DMA */
static void transmitter_task(void *arg)
{
(void)arg;
ProcessedFrame_t proc;
for (;;) {
osMessageQueueGet(g_proc_q, &proc, NULL, osWaitForever);
/* Kick DMA transfer (vendor-specific), then wait for completion */
/* DMA_start_transfer((uint8_t*)&proc, sizeof(proc)); */
osSemaphoreAcquire(g_tx_sem, 100U); /* posted by DMA ISR */
}
}
void pipeline_init(void)
{
g_raw_q = osMessageQueueNew(8U, sizeof(RawFrame_t), NULL);
g_proc_q = osMessageQueueNew(4U, sizeof(ProcessedFrame_t), NULL);
g_tx_sem = osSemaphoreNew(1U, 0U, NULL); /* initially unavailable */
static const osThreadAttr_t sampler_attr = {
.priority = osPriorityHigh, .stack_size = 512U };
static const osThreadAttr_t proc_attr = {
.priority = osPriorityNormal, .stack_size = 1024U };
static const osThreadAttr_t tx_attr = {
.priority = osPriorityBelowNormal, .stack_size = 512U };
osThreadNew(sampler_task, NULL, &sampler_attr);
osThreadNew(processor_task, NULL, &proc_attr);
osThreadNew(transmitter_task,NULL, &tx_attr);
}
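The per-channel filter used in processor_task is a first-order IIR low-pass, often called an exponential moving average. Isolating the update as a pure function makes it easy to check on a host before it runs on the target. The name `ema_step` is hypothetical; the formula is the one used in the listing.

```c
/* One update of a first-order IIR low-pass (exponential moving average):
 *   y[n] = alpha * x[n] + (1 - alpha) * y[n-1]
 * alpha near 0 smooths heavily; alpha near 1 follows the input closely. */
static float ema_step(float prev, float sample, float alpha)
{
    return alpha * sample + (1.0f - alpha) * prev;
}
```

Starting from zero with alpha = 0.5 and a constant input of 1.0, successive outputs are 0.5 and 0.75, converging towards the input. With the listing's alpha of 0.1 the convergence is slower, which is the smoothing effect.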
Event-Loop Pattern
Many embedded UI and state-machine tasks are best structured as an event loop: a single thread waits for any incoming event (button, timer, network, sensor) using osEventFlagsWait() with osFlagsWaitAny, then dispatches to a handler function based on which bits are set. This eliminates the need for multiple polling threads and keeps state in one place.
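One way to structure the dispatch step is a table that maps each event bit to a handler. The sketch below is host-runnable and uses hypothetical names throughout (`EventEntry_t`, `dispatch_events`, the two sample handlers and their counters); in firmware, the `flags` argument would be the value returned by `osEventFlagsWait()`.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*EventHandler_t)(void);

typedef struct {
    uint32_t       bit;      /* single event bit */
    EventHandler_t handler;  /* called when that bit is set */
} EventEntry_t;

/* Run the handler of every table entry whose bit is set in flags.
 * Returns the number of handlers that ran. */
static uint32_t dispatch_events(uint32_t flags,
                                const EventEntry_t *table, size_t n)
{
    uint32_t handled = 0U;
    for (size_t i = 0; i < n; i++) {
        if ((flags & table[i].bit) != 0U && table[i].handler != NULL) {
            table[i].handler();
            handled++;
        }
    }
    return handled;
}

/* Sample handlers for illustration: they only count invocations. */
static uint32_t g_button_count;
static uint32_t g_timer_count;
static void on_button(void) { g_button_count++; }
static void on_timer(void)  { g_timer_count++; }

static const EventEntry_t k_ui_table[] = {
    { 1UL << 0, on_button },  /* bit 0: button pressed */
    { 1UL << 1, on_timer  },  /* bit 1: timer expired  */
};
```

Table order defines handling order when several bits arrive together, so place the most urgent events first.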
Work Queue Pattern
A work queue is a message queue where each message is a function pointer (or a small struct containing one). A pool of worker threads drains the queue, calling each function in turn. This pattern defers non-urgent work — logging, display updates, non-time-critical computations — from high-priority contexts to lower-priority workers. It is the RTOS equivalent of a thread pool.
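A work item can be as small as a function pointer plus a context pointer. The host-side sketch below shows the submit and drain logic with hypothetical names (`WorkItem_t`, `WorkQueue_t`, `work_submit`, `work_drain`, `bump_counter`). In firmware, the ring buffer would typically be replaced by a message queue created with a `msg_size` of `sizeof(WorkItem_t)`, and the drain loop would be the body of each worker thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*WorkFn_t)(void *ctx);

typedef struct {
    WorkFn_t fn;   /* function to run in worker context */
    void    *ctx;  /* argument passed to fn */
} WorkItem_t;

#define WORK_DEPTH 4U  /* hypothetical queue depth */

typedef struct {
    WorkItem_t items[WORK_DEPTH];
    uint32_t   head;   /* index of the oldest pending item */
    uint32_t   count;  /* number of pending items */
} WorkQueue_t;

/* Queue a work item; false means the queue is full and the item was rejected. */
static bool work_submit(WorkQueue_t *q, WorkFn_t fn, void *ctx)
{
    assert(q != NULL && fn != NULL);
    if (q->count == WORK_DEPTH) {
        return false;
    }
    uint32_t tail = (q->head + q->count) % WORK_DEPTH;
    q->items[tail].fn  = fn;
    q->items[tail].ctx = ctx;
    q->count++;
    return true;
}

/* Run all pending items in FIFO order; returns how many ran. */
static uint32_t work_drain(WorkQueue_t *q)
{
    uint32_t ran = 0U;
    while (q->count > 0U) {
        WorkItem_t item = q->items[q->head];
        q->head = (q->head + 1U) % WORK_DEPTH;
        q->count--;
        item.fn(item.ctx);
        ran++;
    }
    return ran;
}

/* Sample work function: increments the counter that ctx points to. */
static void bump_counter(void *ctx)
{
    uint32_t *counter = ctx;
    (*counter)++;
}
```

The context pointer must stay valid until the worker runs the item, so it should refer to static storage or a pool block rather than the submitter's stack.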
| Mechanism | Capacity | Blocking | ISR Safe | Carries Typed Data | Best For |
|---|---|---|---|---|---|
| osMessageQueue | N fixed-size slots | Put & Get | Yes (timeout=0) | Yes (fixed-size struct) | Thread-to-thread data transfer |
| osMemoryPool | N fixed-size blocks | Alloc only | Yes (timeout=0) | Yes (any type) | Zero-copy large payload queue |
| osEventFlags | Up to 31 flags | Wait only | Yes (Set) | No (signals only) | Multi-condition wake, state signaling |
| osMutex | 1 (binary) | Acquire | No | No | Shared resource exclusive access |
| osSemaphore | N (counting) | Acquire | Yes (Release) | No | Resource counting, rate limiting |
Exercises
Exercise 1
Beginner
UART Rx ISR Posting to a Message Queue
Implement a UART receive system where the ISR posts each received byte (plus a timestamp) to a CMSIS-RTOS2 message queue using osMessageQueuePut() with timeout=0. Create a handler thread that reads from the queue and buffers bytes into a line array, processing each complete line terminated by '\n'. Test with a terminal sending strings at 115200 baud. Verify no bytes are lost at normal typing speed and characterise the maximum burst rate before drops occur (hint: queue depth matters).
Tags: ISR Safety, osMessageQueuePut, UART
Exercise 2
Intermediate
Event-Flag-Driven LED State Machine
Design a three-state LED state machine (IDLE, BLINK_SLOW, BLINK_FAST) driven by event flags. Define three bits: EVT_BTN_SHORT, EVT_BTN_LONG, and EVT_TIMEOUT. A button ISR measures the press duration and sets the matching flag. A timer thread sets EVT_TIMEOUT every 5 seconds. The LED task uses osEventFlagsWait() with osFlagsWaitAny and transitions the state machine on each event. Debounce the button with a 20 ms timer, and set the flag only after the debounce interval has elapsed.
Tags: osEventFlagsWait, State Machine, Debounce
Exercise 3
Advanced
Three-Stage Pipeline with Work Queues
Implement the three-stage pipeline from the article (Sampler → Processor → Transmitter) and extend it with a work queue: after the transmitter stage, post a logging work item (function pointer + data) to a work queue of depth 4. A low-priority logger thread drains the work queue and writes formatted lines to a UART. Benchmark: use a hardware cycle counter (for example DWT->CYCCNT on Cortex-M3/M4/M7, since SysTick is normally owned by the RTOS tick) to measure end-to-end latency from sample capture to UART byte transmitted. Profile what happens when you reduce the depth of the Processor stage's input queue (g_raw_q) from 8 to 2. Document the trade-off between latency, memory, and throughput.
Tags: Pipeline Architecture, Work Queue, Latency Profiling
Conclusion & Next Steps
In this article we have completed the CMSIS-RTOS2 IPC toolkit:
- Message queues provide fixed-size, buffered, blocking data transfer between threads. Size them for worst-case burst, not average rate.
- Memory pools with pointer queues eliminate payload copying for large messages, giving O(1) allocation and bounded memory usage.
- Event flags replace multiple semaphores with a single bitmask of up to 31 flags: use OR waits for multi-condition dispatch, AND waits for synchronisation barriers.
- ISR-to-thread communication follows one rule: do the minimum in the ISR, post to a queue or set a flag, and let the scheduler dispatch a thread to handle it — always use timeout=0 in ISR context.
- The producer-consumer, event-loop, and work-queue patterns are the recurring structural elements of professional RTOS firmware — learn to recognise and apply them instinctively.
Next in the Series
In Part 6: CMSIS-DSP — Filters, FFT & Math Functions, we leave the RTOS domain and enter signal processing: sampling theory, FIR and IIR filter design with arm_fir_f32 and arm_biquad_cascade_df2T_f32, real FFT for spectral analysis, and the SIMD optimisations that make CMSIS-DSP dramatically faster than naive C on Cortex-M4/M7.
Related Articles in This Series
Part 4: CMSIS-RTOS2 — Threads, Mutexes & Semaphores
Master thread creation, priority management, mutexes with priority inheritance, and counting semaphores as the complement to the queues and flags covered here.
Part 11: Interrupts, Concurrency & Real-Time Constraints
Deep-dive into interrupt latency budgeting, critical sections, lock-free ring buffers, and the real-time analysis techniques that validate your RTOS design.
Part 14: DMA & High-Performance Data Handling
Extend the zero-copy queue pattern to DMA-backed transfers, scatter-gather descriptors, and techniques for achieving near-zero CPU overhead on high-bandwidth peripherals.