
CMSIS Part 4: CMSIS-RTOS2 — Threads, Mutexes & Semaphores

March 31, 2026 Wasil Zafar 30 min read

How CMSIS-RTOS2 gives you a kernel-agnostic API for multi-threaded embedded firmware — create threads, synchronise with mutexes, signal with semaphores, and never deadlock again.

Table of Contents

  1. RTOS Fundamentals
  2. Thread Management
  3. Synchronization Primitives
  4. Timing Services
  5. Choosing a CMSIS-RTOS2 Kernel
  6. Exercises
  7. RTOS Thread Design Canvas
  8. Conclusion & Next Steps
Series Context: This is Part 4 of the 20-part CMSIS Mastery Series. Parts 1–3 established the hardware foundation. Now we build upward: multi-threading, synchronisation, and real-time scheduling via the CMSIS-RTOS2 API.

CMSIS Mastery Series

Your 20-step learning path • Currently on Step 4
  1. Overview & ARM Cortex-M Ecosystem: CMSIS layers, Cortex-M families, memory map, toolchains (Completed)
  2. CMSIS-Core: Registers, NVIC & SysTick: core_cmX.h, register access, interrupt controller, SysTick timer (Completed)
  3. Startup Code, Linker Scripts & Vector Table: Reset handler, BSS init, scatter files, boot process (Completed)
  4. CMSIS-RTOS2: Threads, Mutexes & Semaphores: Thread management, synchronization primitives, scheduling (You Are Here)
  5. CMSIS-RTOS2: Message Queues & Event Flags: Inter-thread comms, ISR-to-thread, real-time design patterns
  6. CMSIS-DSP: Filters, FFT & Math Functions: FIR/IIR filters, FFT, SIMD optimizations
  7. CMSIS-Driver: UART, SPI & I2C: Driver abstraction layer, callbacks, DMA integration
  8. CMSIS-Pack & Software Components: Pack files, device support, dependency management
  9. Debugging with CMSIS-DAP & CoreSight: SWD/JTAG, HardFault analysis, ITM tracing
  10. Portable Firmware: Multi-Vendor Projects: HAL vs CMSIS, cross-platform BSPs, reusable libraries
  11. Interrupts, Concurrency & Real-Time Constraints: Interrupt latency, critical sections, lock-free programming
  12. Memory Management in Embedded Systems: Static vs dynamic, heap fragmentation, memory pools
  13. Low Power & Energy Optimization: Sleep modes, clock gating, tickless RTOS, power profiling
  14. DMA & High-Performance Data Handling: DMA basics, peripheral transfers, zero-copy techniques
  15. Security: ARMv8-M & TrustZone: Secure/non-secure worlds, secure boot, firmware protection
  16. Bootloaders & Firmware Updates: OTA updates, dual-bank flash, fail-safe strategies
  17. Testing & Validation: Unity/Ceedling unit tests, HIL testing, integration testing
  18. Performance Optimization: Compiler flags, inline assembly, cache (M7/M33), profiling
  19. Embedded Software Architecture: Layered design, event-driven, state machines, component-based
  20. Tooling & Workflow (Professional Level): CI/CD for embedded, MISRA, static analysis, Doxygen

RTOS Fundamentals

An RTOS is not about speed — it is about determinism. A bare-metal super-loop can run faster than an RTOS on a single task, but it cannot guarantee that a critical task will always complete within a deadline when multiple activities compete for the CPU. An RTOS provides a scheduler that makes and enforces those guarantees.

Preemptive vs Cooperative Scheduling

CMSIS-RTOS2 implementations (FreeRTOS, RTX5, Zephyr) all default to preemptive priority-based scheduling: the highest-priority ready thread always runs, and the scheduler preempts a lower-priority thread as soon as a higher-priority thread becomes ready, whether that happens on a tick, in an interrupt handler, or when a mutex or semaphore is released. Cooperative scheduling (a thread runs until it explicitly yields or blocks) is also supported by some kernels but is uncommon in production firmware because it cannot bound worst-case latency.
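The selection rule described above can be sketched in a few lines of host-runnable C. This is an illustrative model only, not the code of any real kernel; `model_thread_t` and `pick_next` are hypothetical names introduced for the sketch.

```c
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a thread as a scheduler sees it. */
typedef struct {
    const char *name;
    uint8_t     priority;  /* Larger value = more urgent */
    bool        ready;     /* true if Ready or Running, false if Blocked */
} model_thread_t;

/* Return the index of the highest-priority ready thread, or -1 if none.
 * Ties keep the earlier entry; a real kernel applies its own tie rule. */
static int pick_next(const model_thread_t *threads, size_t count) {
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!threads[i].ready) { continue; }
        if (best < 0 || threads[i].priority > threads[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

In a preemptive kernel this decision is re-evaluated every time a thread's ready state changes, which is why a blocked high-priority thread resumes as soon as the event it waits for arrives.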

Context Switching on Cortex-M

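A context switch on Cortex-M uses PendSV, which RTOS ports configure as the lowest-priority exception so that it runs only after all other interrupt handlers have finished. The scheduler sets PendSV pending; when the handler runs, it saves R4–R11 to the outgoing thread's stack (R0–R3, R12, LR, PC, and xPSR were already stacked by hardware on exception entry) and restores the next thread's saved state. On a Cortex-M4F, lazy stacking reserves space for the floating-point registers on exception entry but defers the actual save until a handler executes a floating-point instruction, and the kernel saves S16–S31 only for threads that have used the FPU. Integer-only threads therefore pay nothing for the FPU, which is a significant performance optimisation.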
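To make the stack cost of a context switch concrete, the register sets listed above can be written as structs and summed. This is a host-side sketch with hypothetical type names; actual ports may store a few extra words (for example the EXC_RETURN value), so treat the totals as a lower bound.

```c
#include <stdint.h>

/* Registers pushed automatically by the core on exception entry
 * (basic frame, no floating-point context). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} hw_frame_t;

/* Callee-saved registers the PendSV handler stores in software. */
typedef struct {
    uint32_t r4, r5, r6, r7, r8, r9, r10, r11;
} sw_frame_t;

/* Extra words when the thread has an active FPU context:
 * hardware: S0-S15, FPSCR and one reserved word; software: S16-S31. */
enum { FPU_HW_WORDS = 18, FPU_SW_WORDS = 16 };

/* Bytes of thread stack consumed by one saved context. */
static uint32_t context_bytes(int uses_fpu) {
    uint32_t bytes = (uint32_t)(sizeof(hw_frame_t) + sizeof(sw_frame_t));
    if (uses_fpu) {
        bytes += (uint32_t)(FPU_HW_WORDS + FPU_SW_WORDS) * (uint32_t)sizeof(uint32_t);
    }
    return bytes;
}
```

Every thread stack must have at least this much headroom on top of its deepest call chain, because the context is saved on the thread's own stack.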

CMSIS-RTOS2 Portability: On FreeRTOS, the CMSIS-RTOS2 API (cmsis_os2.h) is a thin wrapper over the native API: osThreadNew() maps to xTaskCreate() or, when static memory is supplied, xTaskCreateStatic(). On RTX5, CMSIS-RTOS2 is the native API. Application code that sticks to cmsis_os2.h stays the same when changing kernels; only the wrapper layer, the kernel configuration, and any kernel-specific extras (such as control block sizes) change.

Thread Management

Every concurrent activity in a CMSIS-RTOS2 application is a thread (RTOS2's term; FreeRTOS calls them tasks). Each thread has its own stack, its own priority, and an execution state (Running, Ready, Blocked, Terminated). The scheduler manages all state transitions.

Complete Thread Creation with osThreadNew

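/*
 * CMSIS-RTOS2 Thread Creation: complete example.
 * Producer thread samples a sensor; LED thread blinks status.
 *
 * Include: cmsis_os2.h (plus the device header for SystemCoreClockUpdate()).
 * ADC_Read(), LED_Toggle() and Error_Handler() are board-specific helpers
 * assumed to be provided elsewhere in the project.
 */
#include "cmsis_os2.h"

/* ── Static stack allocation (preferred over dynamic for predictability) */
#define PRODUCER_STACK_SIZE   512U   /* bytes; tune with watermark analysis */
#define LED_STACK_SIZE        256U

/* uint64_t storage keeps the stacks 8-byte aligned, as the AAPCS requires */
static uint64_t producer_stack[PRODUCER_STACK_SIZE / sizeof(uint64_t)];
static uint64_t led_stack[LED_STACK_SIZE / sizeof(uint64_t)];

/* Thread handles returned by osThreadNew(). Control blocks are not supplied
 * here (no .cb_mem), so the kernel allocates them from its own memory. A
 * fully static setup also sets .cb_mem/.cb_size, which some wrappers
 * (for example the FreeRTOS one) require whenever .stack_mem is given. */
static osThreadId_t h_producer;
static osThreadId_t h_led;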

/* ── Thread function prototypes */
static void producer_thread(void *argument);
static void led_thread(void *argument);

/* ── Thread attribute structures */
static const osThreadAttr_t producer_attr = {
    .name       = "Producer",
    .stack_mem  = producer_stack,
    .stack_size = PRODUCER_STACK_SIZE,
    .priority   = osPriorityNormal,    /* Priority level 24 of 56 */
    .tz_module  = 0U,                  /* TrustZone: non-secure (default) */
};

static const osThreadAttr_t led_attr = {
    .name       = "LED",
    .stack_mem  = led_stack,
    .stack_size = LED_STACK_SIZE,
    .priority   = osPriorityLow,       /* Below Producer */
};

/* ── Application entry point (called from main after kernel init) */
void app_start(void *argument) {
    (void)argument;
    /* Create threads; both start in Ready state */
    h_producer = osThreadNew(producer_thread, NULL, &producer_attr);
    h_led      = osThreadNew(led_thread,      NULL, &led_attr);

    if (h_producer == NULL || h_led == NULL) {
        /* Thread creation failed — insufficient memory or invalid params */
        Error_Handler();
    }

    /* This thread (app_start) can terminate — RTOS continues running */
    osThreadExit();
}

/* ── Producer thread: samples the ADC into a shared variable */
static volatile uint16_t g_sensor_value = 0U;

static void producer_thread(void *argument) {
    (void)argument;
    for (;;) {
        g_sensor_value = ADC_Read();
        osDelay(10U);  /* 10 ticks = 10 ms at a 1 kHz tick; lets lower-priority threads run */
    }
}

/* ── LED thread: blinks at 1 Hz to indicate system health */
static void led_thread(void *argument) {
    (void)argument;
    for (;;) {
        LED_Toggle();
        osDelay(500U);  /* 500 ms on, 500 ms off */
    }
}

/* ── main(): initialise kernel and start scheduler */
int main(void) {
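    /* SystemInit() has already run from the startup code; refresh the
     * SystemCoreClock variable so the kernel can derive its tick from it.
     * Do not call SysTick_Config() here: the kernel configures SysTick. */
    SystemCoreClockUpdate();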

    /* Initialise CMSIS-RTOS2 kernel */
    osKernelInitialize();

    /* Create the first application thread */
    osThreadNew(app_start, NULL, NULL);

    /* Start the scheduler — never returns */
    if (osKernelStart() != osOK) {
        for (;;) {}
    }
    return 0;
}
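The stack sizes in the listing are marked "tune with watermark analysis". A minimal host-runnable sketch of that technique follows: paint the stack with a known pattern, let the system run, then count how many words were never overwritten. The fill value and function names are assumptions made for this sketch; real kernels choose their own pattern and expose their own query.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xCCCCCCCCU  /* Assumed fill value for this sketch */

/* Paint the whole stack with the pattern before the thread first runs. */
static void stack_paint(uint32_t *stack, size_t words) {
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/* Cortex-M stacks grow downward, so the lowest addresses are used last.
 * Count untouched words from the bottom to get the remaining headroom. */
static size_t stack_unused_bytes(const uint32_t *stack, size_t words) {
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_FILL_PATTERN) {
        untouched++;
    }
    return untouched * sizeof(uint32_t);
}
```

On target, osThreadGetStackSpace() from cmsis_os2.h reports the remaining stack of a thread when the underlying kernel supports stack watermarking, so hand-rolled painting is usually needed only for the main/interrupt stack.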

CMSIS-RTOS2 Thread Priority Levels

Enum Value | Numeric Value | Priority Level | Typical Use
osPriorityIdle | 1 | Lowest | Idle hook, background analytics
osPriorityLow | 8 | Low | Logging, display updates
osPriorityBelowNormal | 16 | Below Normal | Non-critical background tasks
osPriorityNormal | 24 | Normal | Most application tasks
osPriorityAboveNormal | 32 | Above Normal | Sensor acquisition, communication
osPriorityHigh | 40 | High | Control loops, safety monitoring
osPriorityRealtime | 48 | Realtime | Hard real-time tasks, motor control
osPriorityISR | 56 | Highest | Deferred interrupt processing only

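CMSIS-RTOS2 priority values run from 1 to 56: osPriorityIdle is 1, six bands of eight levels occupy 8 to 55, and osPriorityISR is 56 (values 2 to 7 are unused). The FreeRTOS wrapper maps these onto its configMAX_PRIORITIES range; RTX5 uses them directly. Each band also has variants, such as osPriorityNormal1 through osPriorityNormal7, for fine-grained ordering within the band.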
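The band layout can be checked with a small helper. The sketch below mirrors a few numeric values from the table in a local enum so that it builds on a PC; on target you would use the osPriority_t enumerators from cmsis_os2.h directly. `priority_in_band` is a hypothetical helper, not part of the API.

```c
#include <stdint.h>

/* Local mirror of a few cmsis_os2.h values so the sketch builds on a PC. */
enum {
    PRIO_IDLE     = 1,
    PRIO_LOW      = 8,
    PRIO_NORMAL   = 24,
    PRIO_REALTIME = 48,
    PRIO_ISR      = 56
};

/* Return band_base + step for step 0..7, or -1 if the arguments do not
 * name a valid position inside one of the eight-level bands. */
static int32_t priority_in_band(int32_t band_base, int32_t step) {
    if (band_base < PRIO_LOW || band_base > PRIO_REALTIME) { return -1; }
    if ((band_base % 8) != 0 || step < 0 || step > 7)      { return -1; }
    return band_base + step;
}
```

For example, `priority_in_band(PRIO_NORMAL, 3)` yields 27, the value of osPriorityNormal3, which sits above osPriorityNormal but below osPriorityAboveNormal.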

Synchronization Primitives

Shared resources between threads require synchronisation. The two primary primitives are mutexes (mutual exclusion — binary, owner-based) and semaphores (counting, no ownership). Choosing the wrong primitive is a common source of priority inversion and deadlocks.

Mutexes with Priority Inheritance

Priority Inversion Scenario: Thread H (high priority) waits for mutex held by Thread L (low priority). Thread M (medium priority) preempts Thread L. Now Thread H is blocked behind Thread M indefinitely — a priority inversion. The fix is Priority Inheritance Protocol (PIP): temporarily raise Thread L to Thread H's priority while it holds the mutex. CMSIS-RTOS2 enables this via osMutexPrioInherit in the mutex attribute.
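The inheritance rule itself is simple enough to model on a PC: while a thread owns the mutex, it runs at the highest priority among itself and all threads blocked on that mutex. The struct and function below are hypothetical names for illustration and do not reflect how any particular kernel stores this state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a mutex owner and the threads waiting on it. */
typedef struct {
    uint8_t        owner_base_priority;  /* Priority assigned at creation */
    const uint8_t *waiter_priorities;    /* Priorities of blocked waiters */
    size_t         waiter_count;
} model_mutex_t;

/* With priority inheritance, the owner runs at the highest priority among
 * itself and every thread currently blocked on the mutex. */
static uint8_t owner_effective_priority(const model_mutex_t *m) {
    uint8_t effective = m->owner_base_priority;
    for (size_t i = 0; i < m->waiter_count; i++) {
        if (m->waiter_priorities[i] > effective) {
            effective = m->waiter_priorities[i];
        }
    }
    return effective;
}
```

With Thread L at priority 8 holding the mutex and Thread H at priority 40 waiting, the owner's effective priority becomes 40, so Thread M at priority 24 can no longer preempt it. Once the mutex is released and nobody waits, the owner drops back to 8.
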
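/*
 * Mutex with Priority Inheritance: complete example.
 * Protects a shared UART transmit buffer.
 * The static control block below uses osRtxMutexCbSize, which is specific
 * to Keil RTX5 (rtx_os.h); other kernels need their own size, or NULL
 * for .cb_mem to let the kernel allocate the control block.
 */
#include "cmsis_os2.h"
#include "rtx_os.h"   /* RTX5 only: provides osRtxMutexCbSize */

/* Mutex handle and static control block */
static osMutexId_t h_uart_mutex;
static uint32_t    uart_mutex_cb[osRtxMutexCbSize / sizeof(uint32_t)];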

/* Mutex attributes: enable priority inheritance to prevent inversion */
static const osMutexAttr_t uart_mutex_attr = {
    .name      = "UART_Mutex",
    .attr_bits = osMutexRecursive   /* Allow same thread to re-acquire */
               | osMutexPrioInherit /* Priority inheritance protocol    */
               | osMutexRobust,     /* Detect abandoned mutex (owner killed) */
    .cb_mem    = uart_mutex_cb,
    .cb_size   = sizeof(uart_mutex_cb),
};

void app_init_mutexes(void) {
    h_uart_mutex = osMutexNew(&uart_mutex_attr);
    if (h_uart_mutex == NULL) { Error_Handler(); }
}

/* Thread-safe UART transmit — any thread can call this */
osStatus_t UART_SendSafe(const uint8_t *data, uint32_t len) {
    osStatus_t status;

    /* Acquire mutex: blocks until available, for at most 100 ticks
     * (100 ms at a 1 kHz tick) */
    status = osMutexAcquire(h_uart_mutex, 100U);
    if (status != osOK) {
        return status;  /* Timeout or error */
    }

    /* Critical section: exclusive UART access */
    UART_Transmit(data, len);

    /* Release mutex — must always be called after successful acquire */
    osMutexRelease(h_uart_mutex);
    return osOK;
}

/*
 * Recursive mutex usage — a function that calls itself or
 * another function that also acquires the same mutex:
 */
void nested_uart_operation(void) {
    osMutexAcquire(h_uart_mutex, osWaitForever);
    /* ... first operation ... */
    osMutexAcquire(h_uart_mutex, osWaitForever);  /* Safe with osMutexRecursive */
    /* ... nested operation ... */
    osMutexRelease(h_uart_mutex);
    osMutexRelease(h_uart_mutex);  /* Must release once per acquire */
}

Semaphores: Counting and Binary

/*
 * Semaphore examples:
 * 1. Counting semaphore: resource pool (N identical resources)
 * 2. Binary semaphore: ISR-to-thread signaling
 */
#include "cmsis_os2.h"

/* ── 1. Counting semaphore: DMA buffer pool with 4 slots ── */
#define DMA_BUFFER_COUNT  4U

static osSemaphoreId_t h_dma_pool_sem;

void init_dma_pool(void) {
    /* Initial count = max count = DMA_BUFFER_COUNT */
    h_dma_pool_sem = osSemaphoreNew(DMA_BUFFER_COUNT, DMA_BUFFER_COUNT, NULL);
}

uint8_t *acquire_dma_buffer(void) {
    /* Decrement count — blocks if no buffers available */
    if (osSemaphoreAcquire(h_dma_pool_sem, osWaitForever) != osOK) {
        return NULL;
    }
    return get_free_buffer();  /* Your pool allocator */
}

void release_dma_buffer(uint8_t *buf) {
    return_buffer_to_pool(buf);
    osSemaphoreRelease(h_dma_pool_sem);  /* Increment count */
}

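/* ── 2. Binary semaphore: UART receive ISR signals processing thread ── */
static osSemaphoreId_t  h_uart_rx_sem;
static volatile uint8_t g_rx_byte;   /* Last byte received, written by the ISR */

void init_uart_rx_sem(void) {
    /* Max count = 1, initial count = 0 (thread will block until ISR posts) */
    h_uart_rx_sem = osSemaphoreNew(1U, 0U, NULL);
}

/* Called from UART RX interrupt. An ISR must NEVER block: releasing is
 * allowed, acquiring only with a timeout of 0. The register names below
 * follow STM32F1/F4-style USART peripherals. */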
void USART1_IRQHandler(void) {
    if (USART1->SR & USART_SR_RXNE) {
        g_rx_byte = (uint8_t)(USART1->DR & 0xFFU);
        /* Release (post) the semaphore from ISR context */
        osSemaphoreRelease(h_uart_rx_sem);
    }
}

/* Processing thread — waits for ISR to post */
static void uart_rx_thread(void *arg) {
    (void)arg;
    for (;;) {
        /* Block until ISR posts — zero CPU usage while waiting */
        osSemaphoreAcquire(h_uart_rx_sem, osWaitForever);
        process_received_byte(g_rx_byte);
    }
}

/*
 * KEY RULE: Mutexes have an owner — only the thread that acquired
 * a mutex can release it. Semaphores have no owner — any thread
 * (or ISR) can release a semaphore. Never use a mutex for ISR signaling.
 */
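The counting rules behind both examples can be modelled without a kernel. The sketch below is a plain token counter with hypothetical names; it omits blocking and wake-up entirely and only shows when an acquire or a release succeeds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token counter that mirrors semaphore counting rules. */
typedef struct {
    uint32_t count;      /* Tokens currently available */
    uint32_t max_count;  /* Upper limit fixed at creation */
} model_sem_t;

/* Non-blocking acquire: take a token if one is available. */
static bool model_sem_try_acquire(model_sem_t *s) {
    if (s->count == 0U) { return false; }  /* A real thread would block here */
    s->count--;
    return true;
}

/* Release: add a token unless the semaphore is already full. */
static bool model_sem_release(model_sem_t *s) {
    if (s->count >= s->max_count) { return false; }  /* Token is dropped */
    s->count++;
    return true;
}
```

The failed second release matters for the binary semaphore above: if two bytes arrive before the thread runs, osSemaphoreRelease() returns osErrorResource on the second interrupt and g_rx_byte has already been overwritten. When every event must be delivered, a message queue is the more suitable primitive.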

Timing Services

CMSIS-RTOS2 provides several timing mechanisms: blocking delays (osDelay), absolute-deadline delays (osDelayUntil), and software timers (osTimerNew) that run a callback function from the timer task context. Each serves a different real-time design pattern.
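One detail worth keeping in mind: CMSIS-RTOS2 delay and timeout arguments are expressed in kernel ticks, not milliseconds. The two are equal only at a 1 kHz tick rate. A small conversion helper, sketched here with a hypothetical name and with the tick frequency passed in so it runs on a PC, keeps the code correct if the tick rate changes.

```c
#include <stdint.h>

/* Convert milliseconds to kernel ticks, rounding up so a delay is never
 * shorter than requested. tick_hz is what osKernelGetTickFreq() would
 * return on target. The result is capped at 0xFFFFFFFE because
 * 0xFFFFFFFF is reserved for osWaitForever. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_hz) {
    uint64_t ticks = ((uint64_t)ms * tick_hz + 999U) / 1000U;
    return (ticks > 0xFFFFFFFEU) ? 0xFFFFFFFEU : (uint32_t)ticks;
}
```

On target this would be used as `osDelay(ms_to_ticks(10U, osKernelGetTickFreq()))`.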

osDelay vs osDelayUntil — Drift Prevention

/*
 * osDelay(ticks): delays relative to NOW.
 * If the task body takes variable time, the period drifts.
 *
 * osDelayUntil(ticks): delays until an ABSOLUTE kernel tick value.
 * Time already spent in the task body is absorbed, so drift does not
 * accumulate. Comparable to FreeRTOS vTaskDelayUntil().
 */
#include <stdio.h>      /* printf() in log_timing() */
#include "cmsis_os2.h"

/* BAD: period = task_execution_time + 100 ticks, so the schedule drifts */
static void bad_periodic_task(void *arg) {
    (void)arg;
    for (;;) {
        do_work();           /* Variable duration (application-defined) */
        osDelay(100U);       /* 100 ticks from the end of do_work() */
    }
}

/* GOOD: fixed period, as long as do_work() finishes before each deadline */
static void good_periodic_task(void *arg) {
    (void)arg;
    uint32_t tick = osKernelGetTickCount();
    for (;;) {
        tick += 100U;        /* Next deadline = previous deadline + 100 ticks */
        do_work();
        osDelayUntil(tick);  /* Blocks until the absolute tick is reached */
    }
}

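/* Read the kernel tick count and convert it to milliseconds.
 * One tick lasts 1 / osKernelGetTickFreq() seconds. */
void log_timing(void) {
    uint32_t ticks = osKernelGetTickCount();
    uint32_t freq  = osKernelGetTickFreq();
    /* 64-bit intermediate: ticks * 1000 overflows 32 bits after ~71 minutes at 1 kHz */
    uint32_t ms    = (freq > 0U)
                   ? (uint32_t)(((uint64_t)ticks * 1000U) / freq)
                   : ticks;
    printf("System time: %lu ms\r\n", (unsigned long)ms);
}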
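The size of the drift is easy to quantify with a host-side simulation. The sketch below computes the start time of the last loop iteration under each delay style, given made-up execution times per iteration. Function names are hypothetical and no kernel is involved.

```c
#include <stdint.h>

/* work[] holds the execution time of each iteration in ticks; both
 * functions return the tick at which the last of `cycles` iterations
 * starts, with the first iteration starting at tick 0. */

/* Relative delay: next start = end of work + period (osDelay pattern). */
static uint32_t last_start_relative(const uint32_t *work, uint32_t cycles,
                                    uint32_t period) {
    uint32_t now = 0U;
    for (uint32_t i = 0U; i + 1U < cycles; i++) {
        now += work[i] + period;
    }
    return now;
}

/* Absolute delay: next start = previous deadline + period
 * (osDelayUntil pattern), valid while every work[i] < period. */
static uint32_t last_start_absolute(const uint32_t *work, uint32_t cycles,
                                    uint32_t period) {
    (void)work;  /* Execution time does not shift the schedule */
    return (cycles == 0U) ? 0U : (cycles - 1U) * period;
}
```

With execution times of 3, 7, 5 and 2 ticks and a 100-tick period, the fourth iteration starts at tick 315 with the relative delay and at tick 300 with the absolute delay. The 15-tick difference is the accumulated execution time, and it keeps growing with every cycle.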

osTimerNew — Periodic and One-Shot Software Timers

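/*
 * Software timers run in the context of the timer task (daemon thread).
 * Keep callbacks short and non-blocking; never call osDelay from a timer.
 * For long processing, use the timer callback to post a semaphore/flag.
 * Timer periods are given in kernel ticks (1 tick = 1 ms at a 1 kHz tick).
 */
#include "cmsis_os2.h"

static osTimerId_t     h_heartbeat_timer;
static osTimerId_t     h_timeout_timer;
static osSemaphoreId_t h_timeout_sem;  /* Created elsewhere: osSemaphoreNew(1U, 0U, NULL) */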

/* Timer callback — runs in timer daemon thread context */
static void heartbeat_callback(void *arg) {
    (void)arg;
    LED_Toggle();  /* Quick, non-blocking operation */
}

static void timeout_callback(void *arg) {
    (void)arg;
    /* Signal main thread that timeout occurred */
    osSemaphoreRelease(h_timeout_sem);
}

void init_timers(void) {
    /* Periodic timer: fires every 1000 ms (LED heartbeat) */
    h_heartbeat_timer = osTimerNew(heartbeat_callback, osTimerPeriodic, NULL, NULL);
    osTimerStart(h_heartbeat_timer, 1000U);

    /* One-shot timer: fires 5000 ms after start (communication timeout) */
    h_timeout_timer = osTimerNew(timeout_callback, osTimerOnce, NULL, NULL);
}

void communication_start(void) {
    osTimerStart(h_timeout_timer, 5000U);  /* Arm 5 second timeout */
}

void communication_complete(void) {
    osTimerStop(h_timeout_timer);  /* Disarm: response received in time */
}

/* Restart a running timer (extend deadline) */
void extend_timeout(uint32_t additional_ms) {
    /* osTimerStart restarts even if already running */
    osTimerStart(h_timeout_timer, 5000U + additional_ms);
}

Choosing a CMSIS-RTOS2 Kernel

The CMSIS-RTOS2 API is kernel-agnostic, but your choice of underlying kernel affects RAM footprint, timing resolution, safety certifications, and ecosystem support. Here is a comparison of the four most common options:

Kernel | Min RAM | Max Tick Rate | Tickless | Safety Cert. | Licence | Best For
FreeRTOS | ~5 KB | 1 kHz (configurable) | Yes (built-in) | None (SAFERTOS for ISO) | MIT | Open-source, AWS IoT, wide ecosystem
Keil RTX5 | ~4 KB | 1 kHz | Yes | None (MISRA-C compliant) | Apache 2.0 | CMSIS-native, Keil MDK, MISRA compliance
Zephyr RTOS | ~8 KB | 10 kHz | Yes | Working toward IEC 61508 | Apache 2.0 | Connectivity-heavy IoT, large device support
Azure RTOS (ThreadX) | ~2 KB | Unlimited | Yes | IEC 61508 SIL 4, DO-178C | MIT (as Eclipse ThreadX; formerly proprietary) | Safety-critical, medical, aerospace
FreeRTOS Strengths

FreeRTOS — The Industry Standard

Largest community, most tutorials, native AWS IoT integration, and excellent FreeRTOS+TCP and FreeRTOS+FAT middleware. The CMSIS-RTOS2 wrapper (cmsis_os2.c on GitHub) is battle-tested. Downside: dynamic allocation by default — requires careful configuration for safety-critical use.

Safety-Critical Choice

Azure RTOS (ThreadX)

Smallest footprint, deterministic execution, and the only option here with IEC 61508 SIL 4 and DO-178C certifications out-of-the-box. The CMSIS-RTOS2 wrapper is provided by Microsoft. Ideal for medical devices, industrial safety systems, and avionics where certification documentation is mandatory.

Exercises

Exercise 1 Beginner

Two Threads Sharing a Resource with a Mutex

Create two threads, a Writer (osPriorityNormal) and a Reader (osPriorityBelowNormal), that both access a shared 32-byte ring buffer. Protect the buffer with a mutex created with the osMutexPrioInherit attribute. Verify there are no torn reads or writes by having the Writer insert a sequence number and the Reader check sequence continuity. Log any detected corruption via UART.

Tags: osMutexNew, Shared Buffer, Data Integrity
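As a starting point for the Reader's continuity check in Exercise 1, here is a minimal host-runnable sketch. It assumes the Writer stores a 32-bit sequence number that increases by one per entry; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Count breaks in a stream of sequence numbers read from the buffer.
 * A break is any value that is not exactly previous + 1; wrap-around
 * from 0xFFFFFFFF to 0 is treated as continuous. */
static size_t count_sequence_breaks(const uint32_t *seq, size_t n) {
    size_t breaks = 0;
    for (size_t i = 1; i < n; i++) {
        if (seq[i] != (uint32_t)(seq[i - 1] + 1U)) {
            breaks++;
        }
    }
    return breaks;
}
```

A non-zero result with the mutex in place points to a path that touches the buffer without holding the lock.
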
Exercise 2 Intermediate

Rate-Monotonic Scheduler with 3 Periodic Tasks

Implement three periodic tasks with periods T1=10 ms (highest priority), T2=25 ms (medium), T3=50 ms (lowest) using osDelayUntil(). Assign priorities according to Rate-Monotonic Scheduling theory (shortest period = highest priority). Measure actual period jitter by timestamping each activation with osKernelGetTickCount() and log max jitter. Verify CPU utilisation stays below the RMS bound (~0.780 for 3 tasks).

Tags: Rate-Monotonic, osDelayUntil, Jitter Measurement
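The utilisation check in Exercise 2 can be done on a PC before touching the target. The sketch below evaluates the Liu and Layland bound n(2^(1/n) - 1) and compares it with the sum of C_i/T_i. The n-th root is computed by bisection so that no math library is required; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* n-th root of 2 by bisection on [1, 2]. */
static double nth_root_of_two(unsigned n) {
    double lo = 1.0, hi = 2.0;
    for (int iter = 0; iter < 60; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (unsigned k = 0; k < n; k++) { p *= mid; }
        if (p < 2.0) { lo = mid; } else { hi = mid; }
    }
    return 0.5 * (lo + hi);
}

/* Liu and Layland bound: U <= n * (2^(1/n) - 1). */
static double rms_bound(unsigned n) {
    return (n == 0U) ? 0.0 : (double)n * (nth_root_of_two(n) - 1.0);
}

/* Sum of C_i / T_i over all tasks (same time unit in both arrays). */
static double utilisation(const double *exec, const double *period, size_t n) {
    double u = 0.0;
    for (size_t i = 0; i < n; i++) { u += exec[i] / period[i]; }
    return u;
}

/* Sufficient (not necessary) schedulability test for RMS. */
static bool rms_schedulable(const double *exec, const double *period, size_t n) {
    return utilisation(exec, period, n) <= rms_bound((unsigned)n);
}
```

For the periods 10, 25 and 50 ms with assumed worst-case execution times of 2, 5 and 10 ms, the utilisation is 0.6, which is below the three-task bound of about 0.780. A task set that exceeds the bound is not necessarily unschedulable; it just needs an exact response-time analysis.
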
Exercise 3 Advanced

Diagnose and Fix a Priority Inversion Bug

Implement the following buggy scenario with a mutex X created without osMutexPrioInherit: Thread L (priority 8) acquires mutex X and starts a long operation. Thread H (priority 40) wakes from osDelay and blocks trying to acquire X. Thread M (priority 24) is then created and runs a CPU-bound loop. Observe that Thread H's acquisition is delayed for as long as Thread M runs, because Thread M preempts Thread L, which holds the mutex. Now change the mutex attribute to add osMutexPrioInherit and observe the fix. Document the maximum observed blocking time before and after the fix using DWT cycle counts.

Tags: Priority Inversion, osMutexPrioInherit, DWT Timing
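For the DWT measurement in Exercise 3, the arithmetic on the cycle counter can be sketched and tested on a PC. The helpers below are hypothetical names; on target, `start` and `end` would be two reads of DWT->CYCCNT, taken after the counter has been enabled through CoreDebug->DEMCR (TRCENA) and DWT->CTRL (CYCCNTENA). The DWT cycle counter is not available on Cortex-M0/M0+.

```c
#include <stdint.h>

/* Elapsed cycles between two cycle-counter samples. Unsigned subtraction
 * gives the right answer across one 32-bit wrap of the counter. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end) {
    return end - start;
}

/* Convert a cycle count to microseconds for a core clock given in Hz. */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz) {
    return (uint32_t)(((uint64_t)cycles * 1000000U) / core_hz);
}
```

At 168 MHz the 32-bit counter wraps roughly every 25 seconds, so a single blocking interval longer than that cannot be measured this way.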

RTOS Thread Design Canvas

Use this tool to document your RTOS thread architecture — kernel selection, thread list with priorities and stack sizes, mutex/semaphore inventory, and timer configuration. Download as Word, Excel, PDF, or PPTX for design review and team onboarding.


Conclusion & Next Steps

In this article we have built a complete foundation for multi-threaded embedded development with CMSIS-RTOS2:

  • Preemptive scheduling via PendSV ensures the highest-priority runnable thread always executes — but only if priorities are assigned correctly and critical sections are bounded.
  • osThreadNew() with a full osThreadAttr_t struct — specifying static stack memory, control block storage, and priority — eliminates heap fragmentation in production firmware.
  • Mutexes with osMutexPrioInherit are the correct primitive for protecting shared resources; binary semaphores are the correct primitive for ISR-to-thread signaling. Confusing these is the most common source of priority inversion bugs.
  • osDelayUntil() rather than osDelay() is required for accurate periodic task periods — the difference compounds over time and causes real-time deadline misses at high utilisation.
  • The choice of underlying kernel (FreeRTOS, RTX5, ThreadX, Zephyr) is independent of your application code when using the CMSIS-RTOS2 API — but it determines your RAM budget, certification path, and middleware ecosystem.

Next in the Series

In Part 5: CMSIS-RTOS2 — Message Queues & Event Flags, we complete the CMSIS-RTOS2 synchronisation toolkit: osMessageQueueNew() for typed inter-thread messaging, osEventFlagsNew() for broadcasting conditions to multiple waiters, and the design patterns that prevent the most common real-time bugs — including the ISR-to-thread handoff that every sensor driver needs.
