CMSIS Part 5: CMSIS-RTOS2 — Message Queues & Event Flags — CMSIS Mastery Series Part 5

                        
                        Series Progress: This is Part 5 of our 20-part CMSIS Mastery Series. Parts 1–4 covered the ecosystem, CMSIS-Core registers, startup code, and RTOS2 thread primitives. Here we complete the RTOS2 IPC picture with queues and event flags.
                    

Message Queues

When two RTOS threads need to exchange structured data safely — without shared globals and without the data races that come with them — the right tool is a message queue. A CMSIS-RTOS2 message queue is a kernel-managed FIFO buffer of fixed-size messages. The producer thread puts messages in; the consumer thread gets them out. If the queue is full, the producer blocks (or times out). If the queue is empty, the consumer blocks. The kernel handles the synchronisation transparently.

Message queues differ fundamentally from mutexes and semaphores: they carry data, not just signals. Each message slot holds exactly msg_size bytes — the same size for every message in that queue. This fixed-size discipline enables the kernel to allocate the queue from a statically-sized memory block with no heap fragmentation.
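To make the fixed-size-slot idea concrete, the sketch below estimates how much payload RAM a queue needs for a given depth and message size. The helper names slot_bytes and queue_payload_bytes are hypothetical, and the 4-byte slot rounding is an assumption. Real kernels add per-message bookkeeping that varies by implementation, so treat the result as a lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example message type, similar in shape to a small sensor record. */
typedef struct {
    uint8_t  sensor_id;
    int16_t  raw_value;
    uint32_t timestamp;
} DemoMsg_t;

/* Round a message size up to a multiple of 4 bytes (assumed slot alignment). */
static size_t slot_bytes(size_t msg_size)
{
    return (msg_size + 3U) & ~(size_t)3U;
}

/* Lower bound on payload storage: depth slots of equal, rounded-up size.
 * Kernel bookkeeping per slot is implementation-specific and not included. */
static size_t queue_payload_bytes(size_t depth, size_t msg_size)
{
    assert(depth > 0U && msg_size > 0U);
    return depth * slot_bytes(msg_size);
}
```

Calling queue_payload_bytes(16U, sizeof(DemoMsg_t)) gives the payload budget for a 16-slot queue of this struct, which is useful when you supply the queue memory yourself through the attributes argument.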

                        
                        Design Rule: Keep message sizes small — typically a pointer plus a few metadata bytes. For large payloads, pass a pointer to a statically allocated buffer or a memory pool block, not the data itself. This avoids copying large arrays through the queue and keeps queue RAM bounded.
                    

The core API is three functions: osMessageQueueNew() to create the queue, osMessageQueuePut() to send, and osMessageQueueGet() to receive. Both put and get accept a timeout in RTOS ticks — use osWaitForever for permanent blocking or 0 for a non-blocking poll.
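Because timeouts are expressed in ticks rather than milliseconds, a small conversion helper keeps call sites honest when the tick rate is not 1 kHz. The sketch below is one possible approach; ms_to_ticks and DEMO_MAX_FINITE_TICKS are hypothetical names, and the tick frequency is passed in as a parameter (on target it would typically come from osKernelGetTickFreq()).

```c
#include <assert.h>
#include <stdint.h>

/* Largest finite timeout: 0xFFFFFFFF is the value of osWaitForever. */
#define DEMO_MAX_FINITE_TICKS 0xFFFFFFFEUL

/* Convert milliseconds to kernel ticks, rounding up so that a wait is
 * never shorter than requested. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_hz)
{
    assert(tick_hz > 0U);
    uint64_t ticks = ((uint64_t)ms * tick_hz + 999U) / 1000U;
    if (ticks > DEMO_MAX_FINITE_TICKS) {
        ticks = DEMO_MAX_FINITE_TICKS; /* clamp: never alias "forever" */
    }
    return (uint32_t)ticks;
}
```

Rounding up is a deliberate choice here: at a 100 Hz tick, a 1 ms request becomes one full tick instead of silently turning into a non-blocking poll.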

/* ── message_queue_example.c ─────────────────────────────────────────────
 * Producer–consumer with a fixed-size message struct.
 * Demonstrates osMessageQueueNew / Put / Get with blocking timeout.
 * ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>
#include <stddef.h>

/* ── Message struct (must be a fixed, POD type) ─────────────────────── */
typedef struct {
    uint8_t  sensor_id;    /* which sensor fired               */
    int16_t  raw_value;    /* ADC reading, -32768..32767       */
    uint32_t timestamp_ms; /* osKernelGetTickCount() at capture */
} SensorMsg_t;

/* ── Queue handle (global, created before threads start) ────────────── */
static osMessageQueueId_t g_sensor_queue;

/* ── Producer: simulates ADC sampling at 100 Hz ─────────────────────── */
static void producer_thread(void *arg)
{
    (void)arg;
    SensorMsg_t msg;
    uint32_t    sample_count = 0U;

    for (;;) {
        /* Build a message */
        msg.sensor_id    = 0x01U;
        msg.raw_value    = (int16_t)(sample_count & 0x7FFFU); /* fake data */
        msg.timestamp_ms = osKernelGetTickCount();

        /* Put with a 10-tick timeout (10 ms at a 1 kHz tick).
         * If the queue is still full after that, this sample is dropped. */
        osStatus_t status = osMessageQueuePut(g_sensor_queue, &msg,
                                              0U,   /* msg priority */
                                              10U); /* timeout ticks */
        if (status != osOK) {
            /* Queue full (osErrorTimeout) or another error:
             * log it or increment an overrun counter here. */
        }

        sample_count++;
        osDelay(10U); /* 10 ticks: 100 Hz at a 1 kHz tick */
    }
}

/* ── Consumer: processes sensor messages ─────────────────────────────── */
static void consumer_thread(void *arg)
{
    (void)arg;
    SensorMsg_t msg;

    for (;;) {
        /* Block indefinitely until a message is available */
        osStatus_t status = osMessageQueueGet(g_sensor_queue, &msg,
                                              NULL,          /* priority out */
                                              osWaitForever);
        if (status == osOK) {
            /* Process: apply calibration, log, transmit, etc. */
            int32_t calibrated = (int32_t)msg.raw_value * 1000 / 4096;
            (void)calibrated; /* use in real application */
        }
    }
}

/* ── Initialisation (call from main or an init thread) ──────────────── */
void ipc_init(void)
{
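    /* 16-slot queue, each slot = sizeof(SensorMsg_t) bytes */
    g_sensor_queue = osMessageQueueNew(16U, sizeof(SensorMsg_t), NULL);
    if (g_sensor_queue == NULL) {
        return; /* creation failed: do not start threads without a queue */
    }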

    /* Static thread attributes for deterministic stack allocation */
    static uint64_t producer_stack[256]; /* 256 * 8 = 2 KB */
    static uint64_t consumer_stack[256];

    static osThreadAttr_t prod_attr = {
        .name       = "Producer",
        .stack_mem  = producer_stack,
        .stack_size = sizeof(producer_stack),
        .priority   = osPriorityNormal
    };
    static osThreadAttr_t cons_attr = {
        .name       = "Consumer",
        .stack_mem  = consumer_stack,
        .stack_size = sizeof(consumer_stack),
        .priority   = osPriorityBelowNormal
    };

    osThreadNew(producer_thread, NULL, &prod_attr);
    osThreadNew(consumer_thread, NULL, &cons_attr);
}

Notice how the consumer runs at a lower priority than the producer. The kernel will preempt the consumer when the producer becomes ready, but the queue provides the decoupling: even if the producer bursts several messages before the consumer runs, they are safely buffered in the FIFO. The queue depth (16 slots here) is the burst-absorbing capacity — size it for your worst-case burst, not just the average rate.
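One way to turn "size it for your worst-case burst" into a number is to ask how many messages pile up while the consumer cannot run. The sketch below assumes a strictly periodic producer and a consumer that drains faster than the producer once it is scheduled again; required_depth and its parameters are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Messages that accumulate while the consumer is stalled for stall_ms and
 * the producer posts one message every period_ms, plus a safety margin. */
static uint32_t required_depth(uint32_t stall_ms, uint32_t period_ms,
                               uint32_t margin)
{
    assert(period_ms > 0U);
    uint32_t backlog = (stall_ms + period_ms - 1U) / period_ms; /* ceiling */
    return backlog + margin;
}
```

For a 100 Hz producer (10 ms period) and a consumer that may be blocked for up to 120 ms, required_depth(120U, 10U, 2U) suggests 14 slots, so a 16-slot queue leaves a little headroom.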

Mail Queues & Memory Pools

Standard message queues copy data into the kernel's internal buffer. For large or variable-length payloads this is wasteful — you pay for the copy and must size every queue slot for the largest possible message. The solution is the zero-copy pattern: use a memory pool to allocate a block, fill it with data, then pass only the pointer through a queue of pointers.

CMSIS-RTOS2 provides osMemoryPoolNew() for exactly this purpose. A memory pool is a fixed-count, fixed-size block allocator. Its storage comes either from a user-supplied static buffer (passed through the attributes argument) or from kernel-managed memory when the attributes are NULL. Unlike malloc(), pool allocation is O(1), deterministic, and never fragments: it simply takes a block from, or returns one to, a free list. The pool is sized at creation time; if all blocks are in use, osMemoryPoolAlloc() blocks for up to the given timeout and returns NULL if no block is freed in that time.
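The free-list mechanism behind O(1) allocation can be modelled on a host in a few lines. This is a sketch of the general technique, not the kernel's actual implementation: the names DemoPool, demo_pool_init, demo_pool_alloc and demo_pool_free are hypothetical, and the model has no locking and no timeout, both of which a real kernel provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCK_COUNT 2U
#define DEMO_BLOCK_SIZE  32U

/* While a block is free, its first bytes hold the free-list link. */
typedef union DemoBlock {
    union DemoBlock *next;
    uint8_t          payload[DEMO_BLOCK_SIZE];
} DemoBlock;

typedef struct {
    DemoBlock  storage[DEMO_BLOCK_COUNT]; /* fixed backing array */
    DemoBlock *free_list;                 /* head of the free list */
} DemoPool;

/* Thread every block onto the free list. */
static void demo_pool_init(DemoPool *p)
{
    assert(p != NULL);
    p->free_list = NULL;
    for (size_t i = 0U; i < DEMO_BLOCK_COUNT; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* O(1): pop the head of the free list, or return NULL when exhausted. */
static void *demo_pool_alloc(DemoPool *p)
{
    DemoBlock *b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
    }
    return b;
}

/* O(1): push the block back onto the free list. */
static void demo_pool_free(DemoPool *p, void *blk)
{
    DemoBlock *b = blk;
    assert(b != NULL);
    b->next = p->free_list;
    p->free_list = b;
}
```

Because blocks are all the same size and only ever move between "in use" and the free list, there is nothing to coalesce or split, which is why fragmentation cannot occur.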

/* ── memory_pool_queue.c ─────────────────────────────────────────────────
 * Zero-copy message pattern: memory pool + pointer queue.
 * Avoids copying large payloads through the queue.
 * ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>
#include <string.h>

#define POOL_BLOCKS   8U    /* maximum in-flight messages      */
#define PAYLOAD_BYTES 128U  /* bytes per message block         */

typedef struct {
    uint8_t  channel;
    uint16_t length;               /* actual payload length          */
    uint8_t  data[PAYLOAD_BYTES];  /* variable-use payload area      */
} NetPacket_t;

static osMemoryPoolId_t  g_pkt_pool;   /* block allocator               */
static osMessageQueueId_t g_pkt_queue; /* queue carries NetPacket_t *   */

/* ── Receiver: fills a pool block, enqueues pointer ─────────────────── */
static void net_rx_thread(void *arg)
{
    (void)arg;

    for (;;) {
        /* Allocate a block: waits up to 50 ticks if the pool is exhausted */
        NetPacket_t *pkt = osMemoryPoolAlloc(g_pkt_pool, 50U);
        if (pkt == NULL) {
            /* Pool exhausted — drop incoming data, increment error counter */
            continue;
        }

        /* Simulate receiving data from a DMA buffer */
        pkt->channel = 2U;
        pkt->length  = 64U;
        memset(pkt->data, 0xABU, pkt->length);

        /* Enqueue the pointer — zero copy, no memcpy of payload */
        osStatus_t st = osMessageQueuePut(g_pkt_queue, &pkt,
                                          0U, osWaitForever);
        if (st != osOK) {
            /* Queue put failed — free block immediately */
            osMemoryPoolFree(g_pkt_pool, pkt);
        }
    }
}

/* ── Processor: dequeues pointer, processes, frees ───────────────────── */
static void net_process_thread(void *arg)
{
    (void)arg;
    NetPacket_t *pkt;

    for (;;) {
        osStatus_t st = osMessageQueueGet(g_pkt_queue, &pkt,
                                          NULL, osWaitForever);
        if (st == osOK) {
            /* Process without copying — work directly on pool block */
            uint32_t checksum = 0U;
            for (uint16_t i = 0; i < pkt->length; i++) {
                checksum += pkt->data[i];
            }
            (void)checksum;

            /* CRITICAL: always free back to pool after use */
            osMemoryPoolFree(g_pkt_pool, pkt);
        }
    }
}

void pool_queue_init(void)
{
    g_pkt_pool  = osMemoryPoolNew(POOL_BLOCKS, sizeof(NetPacket_t), NULL);
    /* Queue carries pointers — msg_size = sizeof(NetPacket_t *) */
    g_pkt_queue = osMessageQueueNew(POOL_BLOCKS, sizeof(NetPacket_t *), NULL);

    osThreadNew(net_rx_thread,      NULL, NULL);
    osThreadNew(net_process_thread, NULL, NULL);
}

                        
                        Common Mistake: Forgetting to call osMemoryPoolFree() after processing. Each unreturned block is a pool leak. Once all POOL_BLOCKS blocks are outstanding, every subsequent osMemoryPoolAlloc() call will time out, stalling the producer. Always free in the consumer, even on error paths.
                    

Event Flags

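Message queues transfer data. Event flags transfer signals. An event flag group is a bitmask held in a 32-bit word, in which each bit is an independent boolean signal. The most significant bit is reserved for error reporting (see osFlagsError), so at most 31 flags are usable, and some kernels offer fewer; check your implementation's limit. Threads can set any combination of bits using osEventFlagsSet() and wait for any combination using osEventFlagsWait(). The wait call can block until any of the specified bits are set (OR mode) or until all of them are set simultaneously (AND mode).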

The practical advantage over semaphores: a single event group replaces multiple semaphores when you have many independent conditions. A UI task, for example, might wait for any of: button pressed, timer expired, BLE data arrived, or sensor alert — four conditions expressed as four bits, waited with a single osEventFlagsWait() call with osFlagsWaitAny.
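The difference between the OR and AND wait modes comes down to one comparison on the masked bits. The host-side sketch below shows that predicate in isolation; wait_satisfied is a hypothetical name, and the function models only the wake-up condition, not blocking or clearing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a waiter on 'mask' would be released given the group's
 * current bits. wait_all selects AND mode; otherwise OR mode applies. */
static bool wait_satisfied(uint32_t current, uint32_t mask, bool wait_all)
{
    assert(mask != 0U);
    uint32_t hit = current & mask;
    return wait_all ? (hit == mask) : (hit != 0U);
}
```

With two bits in the mask and only one of them set, the OR form returns true while the AND form keeps waiting, which is exactly the distinction between multi-condition dispatch and a synchronisation barrier.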

/* ── event_flags_example.c ───────────────────────────────────────────────
 * Demonstrates osEventFlagsNew, Set, and Wait with OR and AND patterns.
 * ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>

/* Bit definitions — each condition gets its own bit position */
#define EVT_BUTTON_PRESSED  (1UL << 0)  /* bit 0 */
#define EVT_TIMER_EXPIRED   (1UL << 1)  /* bit 1 */
#define EVT_BLE_DATA_READY  (1UL << 2)  /* bit 2 */
#define EVT_SENSOR_ALERT    (1UL << 3)  /* bit 3 */

/* AND example: both conditions required before proceeding */
#define EVT_INIT_A_DONE     (1UL << 8)  /* bit 8 */
#define EVT_INIT_B_DONE     (1UL << 9)  /* bit 9 */

static osEventFlagsId_t g_ui_events;
static osEventFlagsId_t g_init_sync;

/* ── UI task: waits for any of four conditions ───────────────────────── */
static void ui_task(void *arg)
{
    (void)arg;

    for (;;) {
        /* OR wait: unblock when ANY of the four bits is set.
         * With osFlagsWaitAny alone, the waited-for bits are cleared
         * automatically on return; OR in osFlagsNoClear to keep them. */
        uint32_t flags = osEventFlagsWait(
            g_ui_events,
            EVT_BUTTON_PRESSED | EVT_TIMER_EXPIRED |
            EVT_BLE_DATA_READY | EVT_SENSOR_ALERT,
            osFlagsWaitAny,     /* unblock on ANY matching bit */
            osWaitForever);

        if (flags & osFlagsError) { continue; } /* handle errors */

        if (flags & EVT_BUTTON_PRESSED) {
            /* Handle button press — debounce, state machine transition */
        }
        if (flags & EVT_TIMER_EXPIRED) {
            /* Handle periodic tick — update display, run state checks */
        }
        if (flags & EVT_BLE_DATA_READY) {
            /* Handle incoming BLE notification */
        }
        if (flags & EVT_SENSOR_ALERT) {
            /* Handle out-of-range sensor alert */
        }
    }
}

/* ── Init supervisor: waits for ALL init tasks to complete ──────────── */
static void supervisor_task(void *arg)
{
    (void)arg;

    /* AND wait: block until BOTH init_A and init_B bits are set */
    uint32_t flags = osEventFlagsWait(
        g_init_sync,
        EVT_INIT_A_DONE | EVT_INIT_B_DONE,
        osFlagsWaitAll,     /* ALL bits must be set before returning */
        5000U);             /* 5-second timeout — fault if exceeded  */

    if (flags & osFlagsError) {
        /* Initialisation timed out — enter safe state or reset */
        for (;;) { osDelay(1000U); }
    }

    /* Both inits complete — start application */
}

/* ── Peripheral init A ───────────────────────────────────────────────── */
static void init_a_task(void *arg)
{
    (void)arg;
    osDelay(200U); /* simulate hardware init time */
    osEventFlagsSet(g_init_sync, EVT_INIT_A_DONE);
    osThreadTerminate(osThreadGetId());
}

/* ── Peripheral init B ───────────────────────────────────────────────── */
static void init_b_task(void *arg)
{
    (void)arg;
    osDelay(350U); /* simulate longer hardware init */
    osEventFlagsSet(g_init_sync, EVT_INIT_B_DONE);
    osThreadTerminate(osThreadGetId());
}

void event_flags_init(void)
{
    g_ui_events = osEventFlagsNew(NULL);
    g_init_sync = osEventFlagsNew(NULL);

    osThreadNew(ui_task,         NULL, NULL);
    osThreadNew(supervisor_task, NULL, NULL);
    osThreadNew(init_a_task,     NULL, NULL);
    osThreadNew(init_b_task,     NULL, NULL);
}

                        
                        Auto-Clear vs Manual-Clear: By default, osEventFlagsWait() clears the flags it was asked to wait for before returning (auto-clear). Add osFlagsNoClear to the options parameter if multiple tasks must see the same event, then clear explicitly with osEventFlagsClear() after all consumers have handled it.
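The effect of auto-clear versus osFlagsNoClear can be seen in a small host-side model. This is a sketch under stated assumptions, not kernel code: DemoFlags and demo_wait are hypothetical names, the model returns 0 where a real kernel would block, and it follows the documented behaviour of returning the flags as they were before clearing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of one flag group; not the kernel's data structure. */
typedef struct {
    uint32_t bits;
} DemoFlags;

/* Returns the flags as they were before any clearing, or 0 if the wait
 * condition is not met (a real kernel would block instead). */
static uint32_t demo_wait(DemoFlags *g, uint32_t mask,
                          bool wait_all, bool no_clear)
{
    assert(g != NULL && mask != 0U);
    uint32_t before = g->bits;
    uint32_t hit    = before & mask;
    bool     ok     = wait_all ? (hit == mask) : (hit != 0U);

    if (!ok) {
        return 0U;
    }
    if (!no_clear) {
        g->bits &= ~mask; /* auto-clear: drop the waited-for bits */
    }
    return before;
}
```

In the auto-clear case a second waiter on the same bit would find it already gone, which is why broadcast-style events need the no-clear option plus an explicit clear later.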
                    

ISR-to-Thread Communication

Interrupt service routines run outside the RTOS scheduler context. Most CMSIS-RTOS2 functions are not safe to call from an ISR: they may attempt to acquire a mutex or block, which has no meaning in interrupt context. The API documentation states for each function whether it may be called from interrupt service routines; only those functions may be used in interrupt handlers, and any timeout parameter must be 0.

The general pattern for ISR-to-thread communication is deferred interrupt handling: the ISR does the minimum work necessary (read hardware status, acknowledge the interrupt, pass data), then wakes a thread to do the heavy processing. This keeps interrupt latency low and keeps all complex logic in the safe, schedulable thread context.

Mechanism	ISR Safe?	Carries Data?	Blocking in ISR?	Typical Use
osMessageQueuePut	Yes (timeout=0)	Yes	No (use timeout=0)	Pass data struct from ISR to handler thread
osEventFlagsSet	Yes	No (signals only)	No	Signal thread that event occurred
osSemaphoreRelease	Yes	No	No	Count-based signaling (DMA complete, etc.)
osThreadFlagsSet	Yes	No	No	Wake a specific thread by handle
osMutexAcquire	No	N/A	N/A	Never call from ISR
osDelay	No	N/A	N/A	Never call from ISR

/* ── isr_to_thread.c ─────────────────────────────────────────────────────
 * UART Rx ISR posts raw bytes to a message queue.
 * Handler thread processes characters in thread context.
 * ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include "stm32f407xx.h"  /* device header for USART registers */
#include <stdint.h>

typedef struct {
    uint8_t  byte;
    uint32_t timestamp_ticks;
} UartRxMsg_t;

static osMessageQueueId_t g_uart_rx_queue;

/* ── UART ISR — runs at interrupt priority, minimal work ─────────────── */
void USART2_IRQHandler(void)
{
    if (USART2->SR & USART_SR_RXNE) { /* Rx not empty flag */
        UartRxMsg_t msg;
        msg.byte             = (uint8_t)(USART2->DR & 0xFFU);
        msg.timestamp_ticks  = osKernelGetTickCount(); /* ISR-safe */

        /* Non-blocking put (timeout = 0) — ISR must never block */
        osMessageQueuePut(g_uart_rx_queue, &msg, 0U, 0U);
        /* If queue full, byte is dropped — monitor with a drop counter */
    }
}

/* ── UART Rx handler thread — full processing in thread context ───────── */
static void uart_rx_task(void *arg)
{
    (void)arg;
    UartRxMsg_t msg;
    uint8_t     line_buf[128];
    uint8_t     pos = 0U;

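    for (;;) {
        if (osMessageQueueGet(g_uart_rx_queue, &msg, NULL,
                              osWaitForever) != osOK) {
            continue; /* msg is not valid on error */
        }

        if (msg.byte != '\n') {
            line_buf[pos++] = msg.byte; /* store before the full check so no byte is lost */
        }
        /* Flush on newline, or when only the terminator slot is left */
        if (msg.byte == '\n' || pos >= (sizeof(line_buf) - 1U)) {
            line_buf[pos] = '\0';
            /* Process complete line: parse command, update state, etc. */
            pos = 0U;
        }
    }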
}

void uart_ipc_init(void)
{
    /* 32-slot queue (32 x sizeof(UartRxMsg_t) bytes): absorbs bursts until the handler thread runs */
    g_uart_rx_queue = osMessageQueueNew(32U, sizeof(UartRxMsg_t), NULL);
    osThreadNew(uart_rx_task, NULL, NULL);

    /* Configure USART2 Rx interrupt — vendor-specific, not shown */
    NVIC_SetPriority(USART2_IRQn, 5U); /* FreeRTOS ports: must not be more urgent than configMAX_SYSCALL_INTERRUPT_PRIORITY */
    NVIC_EnableIRQ(USART2_IRQn);
}
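How deep the Rx queue must be depends on how many bytes can arrive before the handler thread gets to run. The sketch below estimates that count from the baud rate; uart_bytes_in_window is a hypothetical helper, and it assumes back-to-back frames with no idle gaps, which is the worst case.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case bytes arriving during window_us at a given baud rate.
 * bits_per_frame is 10 for 8N1 (start + 8 data + stop). Rounds up. */
static uint32_t uart_bytes_in_window(uint32_t baud, uint32_t bits_per_frame,
                                     uint32_t window_us)
{
    assert(baud > 0U && bits_per_frame > 0U);
    uint64_t bits_x_1e6 = (uint64_t)baud * window_us;
    uint64_t denom      = (uint64_t)bits_per_frame * 1000000U;
    return (uint32_t)((bits_x_1e6 + denom - 1U) / denom);
}
```

At 115200 baud with 8N1 framing, up to 12 bytes can arrive in 1 ms, so a 32-slot queue covers a little under 3 ms of continuous traffic before bytes start to drop.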

                        
                        FreeRTOS Users: When using CMSIS-RTOS2 over FreeRTOS, the NVIC priority of any ISR that calls an RTOS function must be numerically greater than or equal to configMAX_SYSCALL_INTERRUPT_PRIORITY (i.e., equal or lower urgency, because on Cortex-M a larger number means a lower priority). ISRs with higher priority (lower numerical value) must never call any OS function, even ISR-safe ones.
                    

Real-Time Design Patterns

The IPC primitives we've covered — message queues, memory pools, event flags — are building blocks. Professional RTOS firmware combines them into well-known design patterns that have proven reliable across thousands of production embedded systems. Understanding these patterns lets you design firmware architectures that are maintainable, testable, and composable.

Producer-Consumer Pipeline

The canonical RTOS pattern. One or more producer threads acquire data (from sensors, interfaces, or timers) and post it to a queue. One or more consumer threads drain the queue, performing processing (filtering, encoding, formatting). Queues provide backpressure: if the queue fills because the consumer is busy, a producer that puts with a non-zero timeout blocks instead of discarding data silently. If that timeout expires, the put returns an error status, so check it and count the dropped item.

/* ── producer_consumer_pipeline.c ────────────────────────────────────────
 * Three-thread pipeline: Sampler → Processor → Transmitter
 * Two queues connect the three stages.
 * ──────────────────────────────────────────────────────────────────────── */
#include "cmsis_os2.h"
#include <stdint.h>

typedef struct { int16_t raw[8]; uint32_t ts; } RawFrame_t;
typedef struct { float   filtered[8]; uint32_t ts; } ProcessedFrame_t;

static osMessageQueueId_t  g_raw_q;       /* Sampler → Processor  */
static osMessageQueueId_t  g_proc_q;      /* Processor → Transmitter */
static osSemaphoreId_t     g_tx_sem;      /* Transmitter DMA-complete */

/* Stage 1: Sampler — acquires 8-channel ADC frame at 1 kHz */
static void sampler_task(void *arg)
{
    (void)arg;
    RawFrame_t frame;
    uint32_t   n = 0U;

    for (;;) {
        /* Simulate ADC read */
        for (int i = 0; i < 8; i++) { frame.raw[i] = (int16_t)(n + i); }
        frame.ts = osKernelGetTickCount();
        if (osMessageQueuePut(g_raw_q, &frame, 0U, 5U) != osOK) {
            /* Queue still full after 5 ticks: frame dropped, count it */
        }
        n++;
        osDelay(1U); /* 1 ms period */
    }
}

/* Stage 2: Processor — applies calibration and low-pass filter */
static void processor_task(void *arg)
{
    (void)arg;
    RawFrame_t      raw;
    ProcessedFrame_t proc;
    static float    prev[8] = {0.0f};
    const float     alpha   = 0.1f; /* IIR coefficient */

    for (;;) {
        osMessageQueueGet(g_raw_q, &raw, NULL, osWaitForever);

        for (int i = 0; i < 8; i++) {
            float s = (float)raw.raw[i] * (3.3f / 4096.0f); /* scale to volts */
            prev[i] = alpha * s + (1.0f - alpha) * prev[i]; /* IIR filter */
            proc.filtered[i] = prev[i];
        }
        proc.ts = raw.ts;

        if (osMessageQueuePut(g_proc_q, &proc, 0U, 10U) != osOK) {
            /* Transmitter is behind: frame dropped, count it */
        }
    }
}

/* Stage 3: Transmitter — serialises and sends over UART DMA */
static void transmitter_task(void *arg)
{
    (void)arg;
    ProcessedFrame_t proc;

    for (;;) {
        osMessageQueueGet(g_proc_q, &proc, NULL, osWaitForever);

        /* Kick DMA transfer (vendor-specific), then wait for completion */
        /* DMA_start_transfer((uint8_t*)&proc, sizeof(proc)); */
        osSemaphoreAcquire(g_tx_sem, 100U); /* posted by DMA ISR */
    }
}

void pipeline_init(void)
{
    g_raw_q  = osMessageQueueNew(8U,  sizeof(RawFrame_t),       NULL);
    g_proc_q = osMessageQueueNew(4U,  sizeof(ProcessedFrame_t), NULL);
    g_tx_sem = osSemaphoreNew(1U, 0U, NULL); /* initially unavailable */

    static const osThreadAttr_t sampler_attr = {
        .priority = osPriorityHigh, .stack_size = 512U };
    static const osThreadAttr_t proc_attr = {
        .priority = osPriorityNormal, .stack_size = 1024U };
    static const osThreadAttr_t tx_attr = {
        .priority = osPriorityBelowNormal, .stack_size = 512U };

    osThreadNew(sampler_task,    NULL, &sampler_attr);
    osThreadNew(processor_task,  NULL, &proc_attr);
    osThreadNew(transmitter_task,NULL, &tx_attr);
}

Event-Loop Pattern

Many embedded UI and state-machine tasks are best structured as an event loop: a single thread waits for any incoming event (button, timer, network, sensor) using osEventFlagsWait() with osFlagsWaitAny, then dispatches to a handler function based on which bits are set. This eliminates the need for multiple polling threads and keeps state in one place.
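The dispatch step of an event loop can be made table-driven so that adding an event means adding one row. The sketch below shows only the dispatch logic and runs on a host; DispatchEntry, dispatch_events, and the two counting handlers are hypothetical names. In firmware, the flags argument would be the value returned by osEventFlagsWait() after its error check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*EventHandler)(void);

typedef struct {
    uint32_t     mask;    /* single flag bit this entry responds to */
    EventHandler handler; /* function to run when the bit is set    */
} DispatchEntry;

/* Call the handler of every entry whose bit is set; return how many ran. */
static size_t dispatch_events(uint32_t flags, const DispatchEntry *table,
                              size_t count)
{
    assert(table != NULL);
    size_t ran = 0U;
    for (size_t i = 0U; i < count; i++) {
        if ((flags & table[i].mask) != 0U && table[i].handler != NULL) {
            table[i].handler();
            ran++;
        }
    }
    return ran;
}

/* Hypothetical handlers that only count invocations for demonstration. */
static unsigned g_button_hits;
static unsigned g_timer_hits;
static void on_button(void) { g_button_hits++; }
static void on_timer(void)  { g_timer_hits++; }

static const DispatchEntry g_ui_table[] = {
    { 1UL << 0, on_button },
    { 1UL << 1, on_timer  },
};
```

Table order doubles as handling order when several bits arrive together, so place the most urgent events first.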

Work Queue Pattern

A work queue is a message queue where each message is a function pointer (or a small struct containing one). A pool of worker threads drains the queue, calling each function in turn. This pattern defers non-urgent work — logging, display updates, non-time-critical computations — from high-priority contexts to lower-priority workers. It is the RTOS equivalent of a thread pool.
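A work item is just a function pointer plus an argument, small enough to travel through a queue by value. The host-side sketch below models the pattern with a tiny FIFO standing in for the kernel queue; WorkItem, WorkFifo, work_post, work_drain and increment_job are hypothetical names, and the FIFO is not thread-safe. On target, posting would be a put on a queue of WorkItem messages and each worker would loop on a blocking get.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*WorkFn)(void *arg);

/* One queue message: small and fixed-size, so it fits a message slot. */
typedef struct {
    WorkFn fn;
    void  *arg;
} WorkItem;

#define WORK_DEPTH 4U

/* Host-side FIFO standing in for the kernel queue (not thread-safe). */
typedef struct {
    WorkItem items[WORK_DEPTH];
    size_t   head;
    size_t   count;
} WorkFifo;

/* Enqueue a job; returns false when the FIFO is full. */
static bool work_post(WorkFifo *q, WorkFn fn, void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->count == WORK_DEPTH) {
        return false;
    }
    size_t tail = (q->head + q->count) % WORK_DEPTH;
    q->items[tail] = (WorkItem){ fn, arg };
    q->count++;
    return true;
}

/* Run every pending job in FIFO order; returns how many were executed. */
static size_t work_drain(WorkFifo *q)
{
    assert(q != NULL);
    size_t ran = 0U;
    while (q->count > 0U) {
        WorkItem it = q->items[q->head];
        q->head = (q->head + 1U) % WORK_DEPTH;
        q->count--;
        it.fn(it.arg);
        ran++;
    }
    return ran;
}

/* Hypothetical job: add 1 to the integer passed by pointer. */
static void increment_job(void *arg) { (*(int *)arg)++; }
```

The argument pointer must stay valid until the worker has run the job, so point it at static data or a pool block, never at the poster's stack.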

Mechanism	Capacity	Blocking	ISR Safe	Carries Typed Data	Best For
osMessageQueue	N fixed-size slots	Put & Get	Yes (timeout=0)	Yes (fixed-size struct)	Thread-to-thread data transfer
osMemoryPool	N fixed-size blocks	Alloc only	Yes (timeout=0)	Yes (any type)	Zero-copy large payload queue
osEventFlags	Up to 31 flags	Wait only	Yes (Set)	No (signals only)	Multi-condition wake, state signaling
osMutex	1 (binary)	Acquire	No	No	Shared resource exclusive access
osSemaphore	N (counting)	Acquire	Yes (Release)	No	Resource counting, rate limiting

Exercises

Exercise 1 Beginner

UART Rx ISR Posting to a Message Queue

Implement a UART receive system where the ISR posts each received byte (plus a timestamp) to a CMSIS-RTOS2 message queue using osMessageQueuePut() with timeout=0. Create a handler thread that reads from the queue and buffers bytes into a line array, processing each complete line terminated by '\n'. Test with a terminal sending strings at 115200 baud. Verify no bytes are lost at normal typing speed and characterise the maximum burst rate before drops occur (hint: queue depth matters).

Tags: ISR Safety, osMessageQueuePut, UART

Exercise 2 Intermediate

Event-Flag-Driven LED State Machine

Design a three-state LED state machine (IDLE, BLINK_SLOW, BLINK_FAST) driven by event flags. Define three bits: EVT_BTN_SHORT, EVT_BTN_LONG, and EVT_TIMEOUT. A button ISR sets the appropriate flag based on press duration. A timer thread sets EVT_TIMEOUT every 5 seconds. The LED task calls osEventFlagsWait() with osFlagsWaitAny and advances the state machine on each event. Debounce the button: start a 20 ms timer on each edge and set the flag only if the input is still stable when the timer expires.

Tags: osEventFlagsWait, State Machine, Debounce

Exercise 3 Advanced

Three-Stage Pipeline with Work Queues

Implement the three-stage pipeline from the article (Sampler → Processor → Transmitter) and extend it with a work queue: after the transmitter stage, post a logging work item (function pointer + data) to a work queue of depth 4. A low-priority logger thread drains the work queue and writes formatted lines to a UART. Benchmark: use a hardware cycle counter (for example DWT->CYCCNT on cores that implement it) to measure end-to-end latency from sample capture to the first UART byte transmitted. Profile what happens when you reduce the depth of the Processor stage's input queue from 8 to 2. Document the trade-off between latency, memory, and throughput.

Tags: Pipeline Architecture, Work Queue, Latency Profiling
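For the latency benchmark in Exercise 3, two small helpers are usually enough once you have start and end counter readings. The sketch below assumes a free-running 32-bit up-counter (such as DWT->CYCCNT where the core provides it) and a known core clock; cycles_elapsed and cycles_to_us are hypothetical names. It does not apply to SysTick directly, which is a 24-bit down-counter.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit up-counter.
 * Unsigned subtraction stays correct across a single wrap-around. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to whole microseconds (truncating). */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0U);
    return (uint32_t)(((uint64_t)cycles * 1000000U) / core_hz);
}
```

Capture the start value in the sampler when the frame is timestamped and carry it inside the frame struct, so the transmitter can compute the delta without any shared global.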


Conclusion & Next Steps

In this article we have completed the CMSIS-RTOS2 IPC toolkit:

Message queues provide buffered, blocking transfer of fixed-size messages between threads; size them for worst-case burst, not average rate.
Memory pools with pointer queues eliminate payload copying for large messages, giving O(1) allocation and bounded memory usage.
Event flags replace multiple semaphores with a single bitmask of up to 31 flags: use OR waits for multi-condition dispatch, AND waits for synchronisation barriers.
ISR-to-thread communication follows one rule: do the minimum in the ISR, post to a queue or set a flag, and let the scheduler dispatch a thread to handle it — always use timeout=0 in ISR context.
The producer-consumer, event-loop, and work-queue patterns are the recurring structural elements of professional RTOS firmware — learn to recognise and apply them instinctively.

Next in the Series

In Part 6: CMSIS-DSP — Filters, FFT & Math Functions, we leave the RTOS domain and enter signal processing: sampling theory, FIR and IIR filter design with arm_fir_f32 and arm_biquad_cascade_df2T_f32, real FFT for spectral analysis, and the SIMD optimisations that make CMSIS-DSP dramatically faster than naive C on Cortex-M4/M7.
