Series Context: This is Part 4 of the 20-part CMSIS Mastery Series. Parts 1–3 established the hardware foundation. Now we build upward: multi-threading, synchronisation, and real-time scheduling via the CMSIS-RTOS2 API.
RTOS Fundamentals
An RTOS is not about speed — it is about determinism. A bare-metal super-loop can run faster than an RTOS on a single task, but it cannot guarantee that a critical task will always complete within a deadline when multiple activities compete for the CPU. An RTOS provides a scheduler that makes and enforces those guarantees.
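To put a number on that claim, here is a minimal host-side sketch under a deliberately simple model: tasks in a super-loop run to completion in a fixed order, so a task that becomes due just after it was polled waits for every other task. The function name and the timings in the note below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Worst-case delay before task `self` starts in a run-to-completion
 * super-loop: every other task may execute in full first. */
uint32_t superloop_worst_wait_us(const uint32_t *wcet_us, size_t count, size_t self)
{
    uint32_t wait = 0U;
    for (size_t i = 0U; i < count; i++) {
        if (i != self) {
            wait += wcet_us[i];   /* another task runs to completion */
        }
    }
    return wait;
}
```

With hypothetical worst-case times of 200, 1500 and 50 µs, the third task can wait 1700 µs before it starts. Under preemptive scheduling the same task, given the highest priority, would wait only for the context switch and any interrupt-masked sections.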
Preemptive vs Cooperative Scheduling
The kernels commonly used behind CMSIS-RTOS2 (RTX5, FreeRTOS, Zephyr) all default to preemptive, priority-based scheduling: the highest-priority ready thread always runs, and the scheduler preempts a lower-priority thread as soon as a higher-priority one becomes ready, whether that happens on a tick or because an ISR or another thread released it. Cooperative scheduling (a thread runs until it explicitly yields or blocks) is also available on some kernels, but it is uncommon in production firmware because worst-case latency then depends on the longest stretch any thread runs without yielding.
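The selection rule can be sketched as a small host-side model. This is not how any particular kernel stores its ready list; the type and function names are hypothetical, and ties are resolved by array order as a stand-in for queue order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint8_t     priority;  /* larger value = more urgent, as in CMSIS-RTOS2 */
    bool        ready;     /* false while blocked on a delay, mutex, etc. */
} sim_thread_t;

/* Return the index of the highest-priority ready thread, or -1 if none.
 * On equal priority the earlier entry wins. */
int sim_pick_next(const sim_thread_t *threads, size_t count)
{
    int best = -1;
    for (size_t i = 0U; i < count; i++) {
        if (!threads[i].ready) {
            continue;
        }
        if (best < 0 || threads[i].priority > threads[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

A preemptive kernel effectively re-evaluates this choice whenever a thread's ready state changes, which is why a newly released high-priority thread runs at once rather than at the next yield.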
Context Switching on Cortex-M
A context switch on Cortex-M is performed in the PendSV exception, which the kernel configures to the lowest exception priority so that the switch never delays a real interrupt. The scheduler sets PendSV pending; when the handler runs, it saves R4–R11 to the outgoing thread's stack (R0–R3, R12, LR, PC and xPSR were already stacked by hardware on exception entry) and restores the next thread's saved state. On a Cortex-M4F with the FPU enabled, lazy stacking reserves space for S0–S15 and FPSCR on exception entry but defers the actual save until the handler executes a floating-point instruction, and kernels typically save S16–S31 only for threads that have used the FPU. Threads that never touch the FPU therefore avoid most of that cost.
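As a rough guide to what a switch costs in stack space, the sketch below adds up the register words involved. The counts follow the Armv7-M exception model; the macro and function names are illustrative, and the optional alignment padding word and port-specific extras (some ports also save EXC_RETURN) are not counted.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORD_BYTES         4U
#define HW_BASIC_WORDS     8U   /* R0-R3, R12, LR, PC, xPSR (stacked by hardware) */
#define HW_FP_EXTRA_WORDS  18U  /* S0-S15, FPSCR, one reserved word */
#define SW_CORE_WORDS      8U   /* R4-R11 (saved by the PendSV handler) */
#define SW_FP_WORDS        16U  /* S16-S31 (saved by the handler if FPU used) */

/* Bytes of thread stack consumed by one saved context. */
uint32_t context_bytes(bool thread_used_fpu)
{
    uint32_t words = HW_BASIC_WORDS + SW_CORE_WORDS;
    if (thread_used_fpu) {
        words += HW_FP_EXTRA_WORDS + SW_FP_WORDS;
    }
    return words * WORD_BYTES;
}
```

Under these assumptions an integer-only thread needs 64 bytes per saved context and an FPU-using thread 200 bytes, which is worth remembering when sizing small stacks.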
CMSIS-RTOS2 Portability: The CMSIS-RTOS2 API (cmsis_os2.h) is the native API of Keil RTX5 and a thin wrapper over the native API of other kernels. On FreeRTOS, osThreadNew() maps to xTaskCreate() (or xTaskCreateStatic() when the attributes supply static memory); on RTX5 it is implemented directly by the kernel. Application code that sticks to cmsis_os2.h stays the same when you change kernels; only the wrapper layer, the kernel configuration, and the linker configuration change.
Thread Management
Every concurrent activity in a CMSIS-RTOS2 application is a thread (RTOS2's term; FreeRTOS calls them tasks). Each thread has its own stack, its own priority, and an execution state (Running, Ready, Blocked, Terminated). The scheduler manages all state transitions.
Complete Thread Creation with osThreadNew
/*
 * CMSIS-RTOS2 thread creation: complete example.
 * Producer thread samples a sensor; LED thread blinks a status LED.
 *
 * Include: cmsis_os2.h, plus the device header that declares SystemInit()
 * and SystemCoreClockUpdate() (and the RTOS-specific header, e.g. FreeRTOS.h,
 * only if you need kernel-specific features).
 */
#include "cmsis_os2.h"

/* Board-specific helpers, assumed to be provided elsewhere in the project */
extern void     Error_Handler(void);
extern uint16_t ADC_Read(void);
extern void     LED_Toggle(void);

/* Static stack allocation (preferred over dynamic for predictability) */
#define PRODUCER_STACK_SIZE 512U /* bytes; tune with watermark analysis */
#define LED_STACK_SIZE      256U

/* CMSIS-RTOS2 requires 64-bit aligned stack memory, hence uint64_t arrays */
static uint64_t producer_stack[PRODUCER_STACK_SIZE / sizeof(uint64_t)];
static uint64_t led_stack[LED_STACK_SIZE / sizeof(uint64_t)];

/* Thread handles returned by osThreadNew(). The control blocks are still
 * allocated by the kernel because .cb_mem is not set below. */
static osThreadId_t h_producer;
static osThreadId_t h_led;

/* Thread function prototypes */
static void producer_thread(void *argument);
static void led_thread(void *argument);

/* Thread attribute structures */
static const osThreadAttr_t producer_attr = {
    .name       = "Producer",
    .stack_mem  = producer_stack,
    .stack_size = PRODUCER_STACK_SIZE,
    .priority   = osPriorityNormal, /* numeric value 24 */
    .tz_module  = 0U,               /* TrustZone: non-secure (default) */
};

static const osThreadAttr_t led_attr = {
    .name       = "LED",
    .stack_mem  = led_stack,
    .stack_size = LED_STACK_SIZE,
    .priority   = osPriorityLow,    /* below Producer */
};

/* Application entry point (first thread, created in main) */
void app_start(void *argument) {
    (void)argument;
    /* Create threads; both start in the Ready state */
    h_producer = osThreadNew(producer_thread, NULL, &producer_attr);
    h_led      = osThreadNew(led_thread, NULL, &led_attr);
    if (h_producer == NULL || h_led == NULL) {
        /* Thread creation failed: insufficient memory or invalid parameters */
        Error_Handler();
    }
    /* This thread (app_start) can terminate; the RTOS continues running */
    osThreadExit();
}

/* Producer thread: samples the ADC into a shared variable every 10 ticks */
static volatile uint16_t g_sensor_value = 0U;

static void producer_thread(void *argument) {
    (void)argument;
    for (;;) {
        g_sensor_value = ADC_Read();
        osDelay(10U); /* 10 ticks (10 ms at a 1 kHz tick); lets lower-priority threads run */
    }
}

/* LED thread: blinks at 1 Hz to indicate system health */
static void led_thread(void *argument) {
    (void)argument;
    for (;;) {
        LED_Toggle();
        osDelay(500U); /* 500 ms on, 500 ms off at a 1 kHz tick */
    }
}

/* main(): initialise kernel and start scheduler */
int main(void) {
    SystemInit();            /* usually already called by the startup code */
    SystemCoreClockUpdate(); /* refresh SystemCoreClock for the kernel's tick setup */
    /* No SysTick_Config() here: the kernel normally configures its own
     * tick timer (SysTick by default) when osKernelStart() is called. */

    /* Initialise CMSIS-RTOS2 kernel */
    osKernelInitialize();
    /* Create the first application thread */
    osThreadNew(app_start, NULL, NULL);
    /* Start the scheduler; does not return on success */
    if (osKernelStart() != osOK) {
        for (;;) {}
    }
    return 0;
}
CMSIS-RTOS2 Thread Priority Levels
| Enum Value | Numeric Value | Priority Level | Typical Use |
| osPriorityIdle | 1 | Lowest | Idle hook, background analytics |
| osPriorityLow | 8 | Low | Logging, display updates |
| osPriorityBelowNormal | 16 | Below Normal | Non-critical background tasks |
| osPriorityNormal | 24 | Normal | Most application tasks |
| osPriorityAboveNormal | 32 | Above Normal | Sensor acquisition, communication |
| osPriorityHigh | 40 | High | Control loops, safety monitoring |
| osPriorityRealtime | 48 | Realtime | Hard real-time tasks, motor control |
| osPriorityISR | 56 | Highest | Deferred interrupt processing only |
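CMSIS-RTOS2 priority values run from 1 to 56: osPriorityIdle (1), six bands of eight levels each (8–55), and osPriorityISR (56). Each band has sub-levels, for example osPriorityNormal1 through osPriorityNormal7, for fine-grained ordering within the band. RTX5 uses these values directly, and the CMSIS-FreeRTOS wrapper maps them onto FreeRTOS priorities, which requires configMAX_PRIORITIES to be set to 56.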
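Because the sub-levels are consecutive integers, a priority inside a band is just the band's base value plus an offset. The host-side sketch below mirrors the base values in the table above with local constants; on a target you would use the osPriority_t enumerators from cmsis_os2.h instead. The names are hypothetical.

```c
#include <stdint.h>

/* Local copies of three band base values from the priority table. */
enum {
    SIM_PRIORITY_LOW    = 8,
    SIM_PRIORITY_NORMAL = 24,
    SIM_PRIORITY_HIGH   = 40
};

/* Return band_base + step for step 0..7, or -1 if step leaves the band. */
int32_t sim_priority_in_band(int32_t band_base, int32_t step)
{
    if (step < 0 || step > 7) {
        return -1;
    }
    return band_base + step;
}
```

For example, step 3 in the Normal band gives 27, the numeric value of osPriorityNormal3.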
Synchronization Primitives
Shared resources between threads require synchronisation. The two primary primitives are mutexes (mutual exclusion — binary, owner-based) and semaphores (counting, no ownership). Choosing the wrong primitive is a common source of priority inversion and deadlocks.
Mutexes with Priority Inheritance
Priority Inversion Scenario: Thread H (high priority) waits for mutex held by Thread L (low priority). Thread M (medium priority) preempts Thread L. Now Thread H is blocked behind Thread M indefinitely — a priority inversion. The fix is Priority Inheritance Protocol (PIP): temporarily raise Thread L to Thread H's priority while it holds the mutex. CMSIS-RTOS2 enables this via osMutexPrioInherit in the mutex attribute.
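The effect of inheritance on the scenario above can be checked with a tiny host-side model. It only captures the priority comparison, not real blocking or scheduling; the function names are hypothetical, and the priorities 8, 24 and 40 in the note below stand for Threads L, M and H.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority the mutex owner runs at while a higher-priority thread waits. */
uint8_t sim_owner_priority(uint8_t owner_base, uint8_t top_waiter, bool inherit)
{
    if (inherit && top_waiter > owner_base) {
        return top_waiter;   /* owner is boosted to the waiter's priority */
    }
    return owner_base;
}

/* True if a ready thread at priority `other` can preempt the mutex owner. */
bool sim_can_preempt_owner(uint8_t other, uint8_t owner_base,
                           uint8_t top_waiter, bool inherit)
{
    return other > sim_owner_priority(owner_base, top_waiter, inherit);
}
```

Without inheritance, Thread M (24) can preempt Thread L (8) while H (40) waits. With inheritance, L runs at 40 until it releases the mutex, so M cannot cut in.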
/*
 * Mutex with priority inheritance: complete example.
 * Protects a shared UART transmit buffer.
 */
#include "cmsis_os2.h"
#include "rtx_os.h" /* RTX5-specific: defines osRtxMutexCbSize */

/* Project helpers, assumed to be provided elsewhere */
extern void Error_Handler(void);
extern void UART_Transmit(const uint8_t *data, uint32_t len);

/* Mutex handle and static control block (sized for RTX5; other kernels
 * need their own control block type, or leave cb_mem NULL) */
static osMutexId_t h_uart_mutex;
static uint32_t uart_mutex_cb[osRtxMutexCbSize / sizeof(uint32_t)];

/* Mutex attributes: enable priority inheritance to prevent inversion */
static const osMutexAttr_t uart_mutex_attr = {
    .name      = "UART_Mutex",
    .attr_bits = osMutexRecursive    /* Allow same thread to re-acquire */
               | osMutexPrioInherit  /* Priority inheritance protocol */
               | osMutexRobust,      /* Detect abandoned mutex (owner killed) */
    .cb_mem    = uart_mutex_cb,
    .cb_size   = sizeof(uart_mutex_cb),
};

void app_init_mutexes(void) {
    h_uart_mutex = osMutexNew(&uart_mutex_attr);
    if (h_uart_mutex == NULL) { Error_Handler(); }
}

/* Thread-safe UART transmit; any thread can call this */
osStatus_t UART_SendSafe(const uint8_t *data, uint32_t len) {
    osStatus_t status;
    /* Acquire mutex; blocks until available (timeout of 100 ticks) */
    status = osMutexAcquire(h_uart_mutex, 100U);
    if (status != osOK) {
        return status; /* Timeout or error */
    }
    /* Critical section: exclusive UART access */
    UART_Transmit(data, len);
    /* Release mutex; must always be called after a successful acquire */
    osMutexRelease(h_uart_mutex);
    return osOK;
}

/*
 * Recursive mutex usage: a function that calls itself or
 * another function that also acquires the same mutex.
 */
void nested_uart_operation(void) {
    osMutexAcquire(h_uart_mutex, osWaitForever);
    /* ... first operation ... */
    osMutexAcquire(h_uart_mutex, osWaitForever); /* Safe with osMutexRecursive */
    /* ... nested operation ... */
    osMutexRelease(h_uart_mutex);
    osMutexRelease(h_uart_mutex); /* Must release once per acquire */
}
Semaphores: Counting and Binary
/*
 * Semaphore examples:
 * 1. Counting semaphore: resource pool (N identical resources)
 * 2. Binary semaphore: ISR-to-thread signaling
 *
 * Assumes an STM32-style device header that defines USART1 and USART_SR_RXNE.
 */
#include "cmsis_os2.h"

/* Application helpers, assumed to be provided elsewhere */
extern uint8_t *get_free_buffer(void);
extern void     return_buffer_to_pool(uint8_t *buf);
extern void     process_received_byte(uint8_t byte);

/* 1. Counting semaphore: DMA buffer pool with 4 slots */
#define DMA_BUFFER_COUNT 4U
static osSemaphoreId_t h_dma_pool_sem;

void init_dma_pool(void) {
    /* Initial count = max count = DMA_BUFFER_COUNT */
    h_dma_pool_sem = osSemaphoreNew(DMA_BUFFER_COUNT, DMA_BUFFER_COUNT, NULL);
}

uint8_t *acquire_dma_buffer(void) {
    /* Decrement count; blocks if no buffers are available */
    if (osSemaphoreAcquire(h_dma_pool_sem, osWaitForever) != osOK) {
        return NULL;
    }
    return get_free_buffer(); /* Your pool allocator */
}

void release_dma_buffer(uint8_t *buf) {
    return_buffer_to_pool(buf);
    osSemaphoreRelease(h_dma_pool_sem); /* Increment count */
}

/* 2. Binary semaphore: UART receive ISR signals processing thread */
static osSemaphoreId_t h_uart_rx_sem;
static volatile uint8_t g_rx_byte; /* last received byte, written by the ISR */

void init_uart_rx_sem(void) {
    /* Max count = 1, initial count = 0 (thread blocks until the ISR posts) */
    h_uart_rx_sem = osSemaphoreNew(1U, 0U, NULL);
}

/* Called from the UART RX interrupt. Never block inside an ISR:
 * osSemaphoreAcquire() is only ISR-safe with a timeout of 0. */
void USART1_IRQHandler(void) {
    if (USART1->SR & USART_SR_RXNE) {
        g_rx_byte = (uint8_t)(USART1->DR & 0xFFU);
        /* Release (post) the semaphore from ISR context */
        osSemaphoreRelease(h_uart_rx_sem);
    }
}

/* Processing thread: waits for the ISR to post */
static void uart_rx_thread(void *arg) {
    (void)arg;
    for (;;) {
        /* Block until the ISR posts; no CPU time is used while waiting */
        osSemaphoreAcquire(h_uart_rx_sem, osWaitForever);
        process_received_byte(g_rx_byte);
    }
}

/*
 * KEY RULE: Mutexes have an owner; only the thread that acquired
 * a mutex can release it. Semaphores have no owner; any thread
 * (or ISR) can release a semaphore. Never use a mutex for ISR signaling.
 */
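The token arithmetic behind both uses can be modelled on a host in a few lines. This is only a sketch of the counting behaviour, without blocking or wake-up; the sim_ names are hypothetical and the clamping in sim_sem_init is a choice of this model, not a statement about any kernel.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t count;      /* tokens currently available */
    uint32_t max_count;  /* upper bound fixed at creation */
} sim_sem_t;

void sim_sem_init(sim_sem_t *s, uint32_t max_count, uint32_t initial)
{
    s->max_count = max_count;
    s->count = (initial > max_count) ? max_count : initial;
}

/* Non-blocking acquire: take a token if one exists. */
bool sim_sem_try_acquire(sim_sem_t *s)
{
    if (s->count == 0U) {
        return false;
    }
    s->count--;
    return true;
}

/* Release: hand back a token unless the semaphore is already full. */
bool sim_sem_release(sim_sem_t *s)
{
    if (s->count >= s->max_count) {
        return false;
    }
    s->count++;
    return true;
}
```

With a maximum of 1 and an initial count of 0, a second release before the thread has run fails, which corresponds to osSemaphoreRelease() returning osErrorResource on a target. A binary semaphore therefore records that something happened, not how many times, so bursts of events need a counting semaphore or a queue.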
Timing Services
CMSIS-RTOS2 provides several timing mechanisms: blocking delays (osDelay), absolute-deadline delays (osDelayUntil), and software timers (osTimerNew) that run a callback function from the timer task context. Each serves a different real-time design pattern.
osDelay vs osDelayUntil — Drift Prevention
/*
 * osDelay(ticks): delays relative to NOW.
 * If the task body takes variable time, the period drifts.
 *
 * osDelayUntil(ticks): blocks until an ABSOLUTE kernel tick value.
 * Advancing that value by a fixed period each loop absorbs the time
 * already spent in the task body, so no drift accumulates.
 * Comparable to FreeRTOS vTaskDelayUntil().
 */
#include <stdio.h>
#include "cmsis_os2.h"

extern void do_work(void); /* application workload, assumed to exist */

/* BAD: period = task_execution_time + delay, so it drifts */
static void bad_periodic_task(void *arg) {
    (void)arg;
    for (;;) {
        do_work();     /* Variable duration */
        osDelay(100U); /* 100 ticks from the end of do_work */
    }
}

/* GOOD: fixed period enforced by osDelayUntil */
static void good_periodic_task(void *arg) {
    (void)arg;
    uint32_t tick = osKernelGetTickCount();
    for (;;) {
        tick += 100U;       /* Next deadline = previous deadline + 100 ticks */
        do_work();
        osDelayUntil(tick); /* Blocks until the absolute tick is reached */
    }
}

/* Read the kernel tick count (1 tick = 1 / osKernelGetTickFreq() seconds) */
void log_timing(void) {
    uint32_t ticks = osKernelGetTickCount();
    uint32_t freq  = osKernelGetTickFreq();
    /* 64-bit intermediate avoids overflow of ticks * 1000 */
    uint32_t ms = (freq > 0U)
                ? (uint32_t)(((uint64_t)ticks * 1000U) / freq)
                : ticks;
    printf("System time: %lu ms\r\n", (unsigned long)ms);
}
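How quickly the relative-delay version drifts is simple arithmetic, sketched here as a host-side model. It assumes a constant work time per loop and no preemption, which real tasks will not satisfy exactly; the function names and the figures in the note below are hypothetical.

```c
#include <stdint.h>

/* Start tick of activation n (n = 0 is the first) when each loop does
 * work_ticks of work and then sleeps delay_ticks relative to "now". */
uint32_t sim_relative_start(uint32_t n, uint32_t work_ticks, uint32_t delay_ticks)
{
    return n * (work_ticks + delay_ticks);
}

/* Start tick of activation n when the loop sleeps until an absolute
 * deadline that advances by period_ticks each time (work fits in the period). */
uint32_t sim_absolute_start(uint32_t n, uint32_t period_ticks)
{
    return n * period_ticks;
}

/* Accumulated lateness of the relative scheme after n activations. */
uint32_t sim_drift(uint32_t n, uint32_t work_ticks, uint32_t period_ticks)
{
    return sim_relative_start(n, work_ticks, period_ticks)
         - sim_absolute_start(n, period_ticks);
}
```

With 3 ticks of work and a 100-tick delay, the relative version is 3000 ticks late after 1000 activations, that is 3 seconds at a 1 kHz tick, while the absolute version stays on schedule.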
osTimerNew — Periodic and One-Shot Software Timers
/*
 * Software timers run in the context of the timer task (daemon thread).
 * Keep callbacks short and non-blocking; never call osDelay from a timer.
 * For long processing, use the timer callback to post a semaphore/flag.
 */
#include "cmsis_os2.h"

extern void LED_Toggle(void); /* board helper, assumed to exist */

static osTimerId_t h_heartbeat_timer;
static osTimerId_t h_timeout_timer;
static osSemaphoreId_t h_timeout_sem; /* posted when the timeout expires */

/* Timer callback: runs in timer daemon thread context */
static void heartbeat_callback(void *arg) {
    (void)arg;
    LED_Toggle(); /* Quick, non-blocking operation */
}

static void timeout_callback(void *arg) {
    (void)arg;
    /* Signal the waiting thread that the timeout occurred */
    osSemaphoreRelease(h_timeout_sem);
}

void init_timers(void) {
    /* Binary semaphore used by timeout_callback; starts empty */
    h_timeout_sem = osSemaphoreNew(1U, 0U, NULL);
    /* Periodic timer: fires every 1000 ticks (LED heartbeat) */
    h_heartbeat_timer = osTimerNew(heartbeat_callback, osTimerPeriodic, NULL, NULL);
    osTimerStart(h_heartbeat_timer, 1000U);
    /* One-shot timer: created here, armed in communication_start() */
    h_timeout_timer = osTimerNew(timeout_callback, osTimerOnce, NULL, NULL);
}

void communication_start(void) {
    osTimerStart(h_timeout_timer, 5000U); /* Arm 5 second timeout (1 kHz tick) */
}

void communication_complete(void) {
    osTimerStop(h_timeout_timer); /* Disarm: response received in time */
}

/* Restart a running timer (extend deadline) */
void extend_timeout(uint32_t additional_ms) {
    /* osTimerStart restarts the timer even if it is already running */
    osTimerStart(h_timeout_timer, 5000U + additional_ms);
}
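The difference between periodic and one-shot behaviour, and the effect of restarting a running timer, can be seen in a small host-side model driven one tick at a time. It is a sketch only: real kernels keep timers in sorted lists and run callbacks in the timer thread, and all sim_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t period;     /* ticks between expiries */
    uint32_t remaining;  /* ticks until next expiry; 0 = stopped */
    bool     periodic;   /* true = reload after each expiry */
    uint32_t fired;      /* number of expiries so far */
} sim_timer_t;

/* Start or restart: a running timer is simply reloaded. */
void sim_timer_start(sim_timer_t *t, uint32_t ticks, bool periodic)
{
    t->period = ticks;
    t->remaining = ticks;
    t->periodic = periodic;
}

void sim_timer_stop(sim_timer_t *t)
{
    t->remaining = 0U;
}

/* Advance the model by one tick and count an expiry if one is due. */
void sim_timer_tick(sim_timer_t *t)
{
    if (t->remaining == 0U) {
        return;                       /* stopped */
    }
    t->remaining--;
    if (t->remaining == 0U) {
        t->fired++;
        if (t->periodic) {
            t->remaining = t->period; /* periodic timers re-arm themselves */
        }
    }
}
```

In this model a 5000-tick one-shot that is restarted at tick 4999 never fires at tick 5000; it fires once, 5000 ticks after the restart, which is the behaviour the timeout-extension pattern relies on.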
Choosing a CMSIS-RTOS2 Kernel
The CMSIS-RTOS2 API is kernel-agnostic, but your choice of underlying kernel affects RAM footprint, timing resolution, safety certifications, and ecosystem support. Here is a comparison of the four most common options:
| Kernel | Min RAM | Max Tick Rate | Tickless | Safety Cert. | Licence | Best For |
| FreeRTOS | ~5 KB | 1 kHz (configurable) | Yes (built-in) | None (SAFERTOS for ISO) | MIT | Open-source, AWS IoT, wide ecosystem |
| Keil RTX5 | ~4 KB | 1 kHz | Yes | None (MISRA-C compliant) | Apache 2.0 | CMSIS-native, Keil MDK, MISRA compliance |
| Zephyr RTOS | ~8 KB | 10 kHz | Yes | Working toward IEC 61508 | Apache 2.0 | Connectivity-heavy IoT, large device support |
| Azure RTOS (ThreadX) | ~2 KB | Unlimited | Yes | IEC 61508 SIL 4, DO-178C | Proprietary (free for Azure) | Safety-critical, medical, aerospace |
FreeRTOS Strengths
FreeRTOS — The Industry Standard
Largest community, most tutorials, native AWS IoT integration, and excellent FreeRTOS+TCP and FreeRTOS+FAT middleware. The CMSIS-RTOS2 wrapper (cmsis_os2.c on GitHub) is battle-tested. Downside: dynamic allocation by default — requires careful configuration for safety-critical use.
Safety-Critical Choice
Azure RTOS (ThreadX)
Smallest footprint, deterministic execution, and the only option here with IEC 61508 SIL 4 and DO-178C certifications out-of-the-box. The CMSIS-RTOS2 wrapper is provided by Microsoft. Ideal for medical devices, industrial safety systems, and avionics where certification documentation is mandatory.
Exercises
Exercise 1
Beginner
Two Threads Sharing a Resource with a Mutex
Create two threads, a Writer (osPriorityNormal) and a Reader (osPriorityBelowNormal), that both access a shared 32-byte ring buffer. Protect the buffer with an osMutexPrioInherit mutex. Verify there are no torn reads/writes by having the Writer insert a sequence number and the Reader check sequence continuity. Log any detected corruption via UART.
osMutexNew
Shared Buffer
Data Integrity
Exercise 2
Intermediate
Rate-Monotonic Scheduler with 3 Periodic Tasks
Implement three periodic tasks with periods T1=10 ms (highest priority), T2=25 ms (medium), T3=50 ms (lowest) using osDelayUntil(). Assign priorities according to Rate-Monotonic Scheduling theory (shortest period = highest priority). Measure actual period jitter by timestamping each activation with osKernelGetTickCount() and log max jitter. Verify CPU utilisation stays below the RMS bound (~0.780 for 3 tasks).
Rate-Monotonic
osDelayUntil
Jitter Measurement
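The utilisation bound quoted in Exercise 2 comes from the Liu and Layland test for rate-monotonic scheduling. The worked numbers below use the exercise's periods; the execution times C_i (2, 5 and 10 ms) are hypothetical values chosen only to illustrate the check.

```latex
U = \sum_{i=1}^{n} \frac{C_i}{T_i} \;\le\; n\left(2^{1/n} - 1\right)

n = 3:\quad 3\left(2^{1/3} - 1\right) \approx 3 \times 0.25992 = 0.77976

\text{Example:}\quad
U = \frac{2}{10} + \frac{5}{25} + \frac{10}{50} = 0.2 + 0.2 + 0.2 = 0.6 \;\le\; 0.780
```

The test is sufficient but not necessary: a task set above the bound may still be schedulable, but one at or below it is guaranteed to meet its deadlines under the usual rate-monotonic assumptions (independent tasks, deadlines equal to periods).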
Exercise 3
Advanced
Diagnose and Fix a Priority Inversion Bug
Implement the following buggy scenario: Thread H (priority 40) acquires the mutex, releases it, and then blocks in osDelay. While H sleeps, Thread L (priority 8) acquires the same mutex, and Thread M (priority 24) is then created and starts running. When H wakes and tries to reacquire the mutex, its acquisition is delayed because Thread M preempts Thread L, which still holds the mutex. Now add osMutexPrioInherit to the mutex attributes and observe the fix. Document the maximum observed blocking time before and after the fix using DWT cycle counts.
Priority Inversion
osMutexPrioInherit
DWT Timing
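For the timing part of Exercise 3, the arithmetic on two cycle-counter samples can be sketched as below. On a target the samples would typically be read from DWT->CYCCNT after enabling the counter; here they are plain parameters so the logic can be checked on a host. The function names and the 168 MHz clock in the note are hypothetical.

```c
#include <stdint.h>

/* Cycles between two 32-bit counter samples. Unsigned subtraction stays
 * correct across a single wrap of the counter. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert cycles to microseconds, rounding down. The 64-bit intermediate
 * avoids overflow of cycles * 1000000. */
uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    if (core_hz == 0U) {
        return 0U;
    }
    return (uint32_t)(((uint64_t)cycles * 1000000ULL) / core_hz);
}
```

At a hypothetical 168 MHz core clock, 168000 cycles correspond to 1000 µs. The 32-bit counter wraps after roughly 25 seconds at that clock, so intervals longer than that cannot be measured from two samples alone.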
Conclusion & Next Steps
In this article we have built a complete foundation for multi-threaded embedded development with CMSIS-RTOS2:
- Preemptive scheduling via PendSV ensures the highest-priority runnable thread always executes, but only if priorities are assigned correctly and critical sections are bounded.
- osThreadNew() with a full osThreadAttr_t struct (specifying static stack memory, control block storage, and priority) eliminates heap fragmentation in production firmware.
- Mutexes with osMutexPrioInherit are the correct primitive for protecting shared resources; binary semaphores are the correct primitive for ISR-to-thread signaling. Confusing these is the most common source of priority inversion bugs.
- osDelayUntil() rather than osDelay() is required for accurate periodic task periods: the difference compounds over time and causes real-time deadline misses at high utilisation.
- The choice of underlying kernel (FreeRTOS, RTX5, ThreadX, Zephyr) is independent of your application code when using the CMSIS-RTOS2 API, but it determines your RAM budget, certification path, and middleware ecosystem.
Next in the Series
In Part 5: CMSIS-RTOS2 — Message Queues & Event Flags, we complete the CMSIS-RTOS2 synchronisation toolkit: osMessageQueueNew() for typed inter-thread messaging, osEventFlagsNew() for broadcasting conditions to multiple waiters, and the design patterns that prevent the most common real-time bugs — including the ISR-to-thread handoff that every sensor driver needs.