USB Part 11: RTOS + USB Integration with FreeRTOS

March 31, 2026 Wasil Zafar 30 min read

Running TinyUSB alongside FreeRTOS requires more than just calling tud_task() in a loop — you must design task priorities to prevent inversion, protect CDC with mutexes, build producer/consumer queues for safe data handoff, and handle D-cache coherency on Cortex-M7 before USB DMA corrupts your data silently.

Table of Contents

  1. TinyUSB + FreeRTOS Integration Overview
  2. USB Task Design
  3. FreeRTOS Task Priorities for USB
  4. Thread-Safe CDC with FreeRTOS
  5. Producer/Consumer Queue Pattern
  6. DMA Cache Coherency on Cortex-M7
  7. STM32H7 USB + FreeRTOS Complete Example
  8. TinyUSB RTOS Hooks
  9. USB Suspend/Resume with RTOS
  10. Practical Exercises
  11. USB RTOS Design Generator
  12. Conclusion & Next Steps
Series Context: This is Part 11 of the 17-part USB Development Mastery series. Parts 1–10 covered fundamentals through USB debugging. This part assumes you have a working bare-metal TinyUSB device from earlier parts. We now add FreeRTOS to that device and make USB work correctly in a multitasking environment.

TinyUSB + FreeRTOS Integration Overview

TinyUSB is designed to be RTOS-agnostic. It supports bare-metal operation (polling in the main loop) as well as FreeRTOS, Mynewt, RT-Thread, the Raspberry Pi Pico SDK and others through its OS abstraction layer (OSAL) in src/osal/, where the FreeRTOS port lives in osal_freertos.h. The RTOS is a compile-time choice made in tusb_config.h.

The key configuration is:

// In tusb_config.h: select FreeRTOS as the RTOS abstraction layer
// Options: OPT_OS_NONE      (bare-metal polling, the default)
//          OPT_OS_FREERTOS
//          OPT_OS_MYNEWT
//          OPT_OS_PICO      (RP2040 SDK)
//          OPT_OS_RTTHREAD
#define CFG_TUSB_OS   OPT_OS_FREERTOS

When CFG_TUSB_OS is set to OPT_OS_FREERTOS, TinyUSB's OSAL (OS Abstraction Layer) maps its internal primitives to FreeRTOS equivalents:

TinyUSB OSAL primitive | FreeRTOS equivalent            | Used for
---------------------- | ------------------------------ | ---------------------------------------------
osal_queue_t           | QueueHandle_t                  | USB event queue between ISR and USB task
osal_semaphore_t       | SemaphoreHandle_t (binary)     | Signalling a waiting task (e.g. from the USB ISR)
osal_mutex_t           | SemaphoreHandle_t (mutex)      | Protecting shared USB resources
osal_task_delay(ms)    | vTaskDelay(pdMS_TO_TICKS(ms))  | Timed waits inside TinyUSB without busy-waiting
Critical Rule: Never call TinyUSB API functions (tud_cdc_write(), tud_hid_report(), etc.) from an interrupt handler; all TinyUSB API calls must come from task context. TinyUSB's callbacks (tud_cdc_rx_cb(), etc.) are invoked from within tud_task(), so they run in the USB task's context, not in an ISR. Use the regular FreeRTOS task APIs there (xQueueSend(), xEventGroupSetBits(), ...) rather than the ...FromISR() variants, but never block inside a callback: while a callback waits, the USB task cannot process any other USB event.

USB Task Design

The fundamental rule of TinyUSB + FreeRTOS integration is: tud_task() must run in a dedicated task. You cannot call tud_task() from multiple tasks, from the idle task, or from within an ISR. One task, one tud_task(), always.

Basic USB Task Structure

// usb_task.c — the dedicated USB device task
#include "tusb.h"
#include "FreeRTOS.h"
#include "task.h"

// Stack size: 512 words (2048 bytes) is the recommended minimum.
// Increase to 1024 words if using complex class callbacks or debug output.
#define USB_DEVICE_TASK_STACK_SIZE   512
#define USB_DEVICE_TASK_PRIORITY     (configMAX_PRIORITIES - 1)

// Forward declaration
static void usb_device_task(void *param);

void usb_task_init(void)
{
    // Only create the task here. tusb_init() runs inside the task, after
    // the scheduler has started (as TinyUSB's own FreeRTOS examples do):
    // the USB ISR posts to a FreeRTOS queue and should not fire earlier.
    BaseType_t ok = xTaskCreate(
        usb_device_task,              // Task function
        "usb_dev",                    // Task name (for debugging)
        USB_DEVICE_TASK_STACK_SIZE,   // Stack depth in words
        NULL,                         // Task parameter (unused)
        USB_DEVICE_TASK_PRIORITY,     // Priority
        NULL                          // Task handle (not needed here)
    );
    configASSERT(ok == pdPASS);
    (void)ok;                         // unused when configASSERT is disabled
}

static void usb_device_task(void *param)
{
    (void)param;

    // Initialise the device stack now that the kernel is running
    tusb_init();

    // tud_task() processes USB events:
    //   - Calls endpoint callbacks (tud_cdc_rx_cb, tud_hid_set_report_cb, etc.)
    //   - Handles control requests (enumeration, class commands)
    //   - Manages endpoint state machines
    //
    // With CFG_TUSB_OS set to OPT_OS_FREERTOS, tud_task() internally calls
    // osal_queue_receive(), which blocks on the USB event queue.
    // The USB ISR posts events to this queue, unblocking the task.
    // This means the USB task sleeps when idle: no CPU waste.

    while (1) {
        tud_task();
        // No vTaskDelay needed here: tud_task() blocks internally
        // via osal_queue_receive() with portMAX_DELAY when no events.
        // Adding vTaskDelay(1) here would REDUCE responsiveness.
    }
}

Stack Size Considerations

The 512-word (2048-byte) stack recommendation covers:

  • TinyUSB internal control request processing: ~200 bytes
  • Descriptor callbacks (your code): typically 100–300 bytes
  • Class driver callbacks (CDC, HID, MSC): 100–400 bytes each
  • Debug output via tu_printf if enabled: add 256 bytes for the format buffer

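If you see stack overflow crashes, measure the remaining headroom with uxTaskGetStackHighWaterMark(NULL), called from inside the task (it returns the minimum free stack ever observed, in words), and increase the stack to 1024 words if the margin is thin. Set configCHECK_FOR_STACK_OVERFLOW to 2 in FreeRTOSConfig.h and implement vApplicationStackOverflowHook() to catch stack overflows at runtime.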

portYIELD vs vTaskDelay

When CFG_TUSB_OS is OPT_OS_FREERTOS, never add vTaskDelay(1) after tud_task(). The FreeRTOS OSAL already blocks the task inside the queue receive, so an extra delay only adds up to 1 tick (1 ms at the common configTICK_RATE_HZ of 1000) of latency to every USB event. A delay only makes sense when tud_task() does not block (OPT_OS_NONE, where it returns immediately and the loop would otherwise spin); in RTOS mode the event-driven blocking is already efficient.

FreeRTOS Task Priorities for USB

Priority assignment is the most critical and most frequently wrong aspect of USB + RTOS integration. Getting it wrong causes intermittent disconnects, dropped data, and enumeration failures that are nearly impossible to debug without understanding the priority relationship.

The Priority Ladder

Priority ladder (highest to lowest):
══════════════════════════════════════════════════════════════
Priority 7 (configMAX_PRIORITIES-1): USB device task (tud_task)
             → Should service USB events promptly (~1 ms, one FS frame)
             → Starved for tens of ms during control requests → host timeout → bus reset → re-enumeration

Priority 6: Time-critical application tasks
             → Tasks that produce/consume USB data at high rate

Priority 5: CDC receive processing task
             → Reads from USB receive queue and processes data

Priority 4: General application tasks
             → Non-time-critical application logic

Priority 3: Logging / diagnostic tasks
             → Low priority — can be delayed without affecting USB

Priority 2: (reserved — avoid)

Priority 1 (tskIDLE_PRIORITY+1): Lowest user task priority

Priority 0 (tskIDLE_PRIORITY): FreeRTOS idle task (automatic)
══════════════════════════════════════════════════════════════
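Why USB Must Be Highest: The USB 2.0 specification gives a device 50 ms to complete a standard control request that has no data stage (and 500 ms per data packet for requests that have one). The 6.5-bit-time packet turnaround at full speed is met by the USB controller hardware, which NAKs an IN token when no data is queued, so a slow USB task first shows up as NAKs and stalled transfers. If the task is starved by other work for tens of milliseconds while the host is issuing control requests, however, those deadlines are missed and the host may reset the bus. This manifests as the device disconnecting and re-enumerating at random, a notoriously difficult problem to diagnose.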

Priority Inversion Scenario

Priority inversion occurs when a high-priority task (USB task) is blocked waiting for a resource held by a low-priority task (application task). FreeRTOS provides priority inheritance in mutexes to mitigate this — but only for xSemaphoreCreateMutex(), not xSemaphoreCreateBinary(). The scenario:

Priority Inversion in USB + FreeRTOS:
────────────────────────────────────────────────────────
Time 0ms: App task (priority 4) takes CDC write mutex
Time 1ms: USB task (priority 7) needs CDC write mutex → BLOCKED
Time 1ms: Medium-priority task (priority 5) preempts App task
           → App task cannot release mutex
           → USB task stays blocked
           → Host sees USB device not responding → bus reset
────────────────────────────────────────────────────────
FreeRTOS priority inheritance fix:
  When USB task (P7) blocks on mutex held by App task (P4),
  FreeRTOS temporarily raises App task priority to P7.
  This prevents medium-priority tasks from preempting App task.
  App task completes, releases mutex, returns to P4.
  USB task resumes at P7.
────────────────────────────────────────────────────────
Requirement: Use xSemaphoreCreateMutex() (not binary semaphore)
             for any mutex that can be held by application tasks
             and waited on by the USB task.

Creating Tasks with Correct Priorities

// main.c — creating all tasks with correct priorities
#include "FreeRTOS.h"
#include "task.h"
#include "usb_task.h"
#include "app_task.h"

// Task handle declarations
TaskHandle_t usb_task_handle    = NULL;
TaskHandle_t app_task_handle    = NULL;
TaskHandle_t cdc_rx_task_handle = NULL;

int main(void)
{
    // Hardware initialization: HAL first, then clocks (GPIO, UART, etc. follow)
    HAL_Init();
    SystemClock_Config();

    // USB device task at configMAX_PRIORITIES - 1.
    // usb_task.h must export usb_device_task() (i.e. not static), and the
    // task must call tusb_init() itself once the scheduler is running.
    xTaskCreate(usb_device_task, "usb_dev", 512, NULL,
                configMAX_PRIORITIES - 1, &usb_task_handle);

    // CDC receive processing: lower than USB, higher than app
    xTaskCreate(cdc_rx_processing_task, "cdc_rx", 256, NULL,
                configMAX_PRIORITIES - 3, &cdc_rx_task_handle);

    // Application task: lowest user priority
    xTaskCreate(app_main_task, "app", 512, NULL,
                tskIDLE_PRIORITY + 1, &app_task_handle);

    // Start scheduler — does not return
    vTaskStartScheduler();

    // Should never reach here
    while (1) { __NOP(); }
}

Thread-Safe CDC with FreeRTOS

The core problem with CDC in a multitasking system: tud_cdc_write() and tud_cdc_write_flush() operate on one transmit FIFO shared by every caller. If two tasks call tud_cdc_write() at the same time, their messages interleave at arbitrary points and you get garbled output; a write followed by a flush is also not atomic across tasks.

The solution is a mutex that serialises all CDC write operations:

// cdc_thread_safe.h — thread-safe CDC write interface
#ifndef CDC_THREAD_SAFE_H
#define CDC_THREAD_SAFE_H

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

// Initialize the CDC mutex — call once before starting tasks
void cdc_ts_init(void);

// Thread-safe CDC write — blocks until mutex available
// Returns number of bytes actually written (may be less than len if FIFO full)
uint32_t cdc_ts_write(const void *buf, uint32_t len);

// Thread-safe CDC printf — convenience wrapper
int cdc_ts_printf(const char *fmt, ...);

// Thread-safe CDC write for multi-interface builds (itf 0 = first CDC interface)
uint32_t cdc_ts_write_n(uint8_t itf, const void *buf, uint32_t len);

#endif // CDC_THREAD_SAFE_H
// cdc_thread_safe.c — implementation
#include "cdc_thread_safe.h"
#include "tusb.h"
#include "FreeRTOS.h"
#include "semphr.h"
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

// Mutex handle — created at init, never deleted
static SemaphoreHandle_t s_cdc_mutex = NULL;

void cdc_ts_init(void)
{
    // Create a mutex with priority inheritance.
    // xSemaphoreCreateMutex() enables FreeRTOS priority inheritance,
    // which prevents priority inversion between USB task and app tasks.
    s_cdc_mutex = xSemaphoreCreateMutex();
    configASSERT(s_cdc_mutex != NULL);
}

uint32_t cdc_ts_write(const void *buf, uint32_t len)
{
    uint32_t written = 0;

    // Take the mutex, waiting forever. Only application tasks take this
    // mutex; priority inheritance lifts a low-priority holder while a
    // higher-priority writer is waiting.
    if (xSemaphoreTake(s_cdc_mutex, portMAX_DELAY) == pdTRUE) {

        // Check that CDC is connected and ready
        if (tud_cdc_connected()) {
            written = tud_cdc_write(buf, len);

            // Flush immediately. Without this, data sits in the TinyUSB
            // transmit FIFO until enough has accumulated for a full bulk
            // packet or something else calls tud_cdc_write_flush().
            // For low-latency output, always flush after write.
            tud_cdc_write_flush();
        }

        xSemaphoreGive(s_cdc_mutex);
    }

    return written;
}

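int cdc_ts_printf(const char *fmt, ...)
{
    // 256-byte buffer lives on the caller's stack: size task stacks accordingly
    char buf[256];
    va_list args;
    va_start(args, fmt);
    int len = vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    if (len <= 0) {
        return len;                         // encoding error or empty string
    }
    // vsnprintf returns the untruncated length; clamp to what was stored
    if ((size_t)len >= sizeof(buf)) {
        len = (int)(sizeof(buf) - 1);
    }
    cdc_ts_write(buf, (uint32_t)len);
    return len;
}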

uint32_t cdc_ts_write_n(uint8_t itf, const void *buf, uint32_t len)
{
    uint32_t written = 0;

    if (xSemaphoreTake(s_cdc_mutex, portMAX_DELAY) == pdTRUE) {
        if (tud_cdc_n_connected(itf)) {
            written = tud_cdc_n_write(itf, buf, len);
            tud_cdc_n_write_flush(itf);
        }
        xSemaphoreGive(s_cdc_mutex);
    }

    return written;
}
Timeout Note: cdc_ts_write() waits with portMAX_DELAY, so a caller blocks for as long as another task holds the mutex. In this design only application tasks take the mutex, and they hold it only for a non-blocking write and flush, so waits are short. For real-time tasks that cannot afford to block indefinitely, use a finite timeout such as pdMS_TO_TICKS(10) and handle the timeout case gracefully (for example, by dropping or deferring the message).

Producer/Consumer Queue Pattern

The cleanest architecture for CDC receive with FreeRTOS is a producer/consumer queue pattern: the USB callback (running inside the USB task) produces messages to a queue, and a separate application task consumes from the queue. This completely decouples USB timing from application processing speed.

// cdc_rx_queue.h — CDC receive queue interface
#ifndef CDC_RX_QUEUE_H
#define CDC_RX_QUEUE_H

#include <stdint.h>
#include "FreeRTOS.h"   // BaseType_t and TickType_t are used in the prototypes below

// Maximum bytes per received USB packet (CDC bulk EP max = 64 bytes at FS)
#define CDC_RX_PACKET_MAX  64

// Number of packets the queue can hold before dropping
// Queue storage: CDC_RX_QUEUE_DEPTH × sizeof(cdc_rx_packet_t)
// = 8 × 68 = 544 bytes (plus the queue control block)
#define CDC_RX_QUEUE_DEPTH  8

// Structure for one CDC receive message
typedef struct {
    uint8_t  data[CDC_RX_PACKET_MAX];
    uint32_t len;
} cdc_rx_packet_t;

// Initialize the receive queue — call once at startup
void cdc_rx_queue_init(void);

// Called from tud_cdc_rx_cb() — posts packet to queue
// Returns pdTRUE if posted, pdFALSE if queue full (data dropped)
// NOTE: called from USB task context, NOT from ISR
BaseType_t cdc_rx_queue_post(const uint8_t *data, uint32_t len);

// Called from application task — blocks until packet available
BaseType_t cdc_rx_queue_receive(cdc_rx_packet_t *pkt,
                                TickType_t timeout_ticks);

#endif // CDC_RX_QUEUE_H
// cdc_rx_queue.c — implementation
#include "cdc_rx_queue.h"
#include "FreeRTOS.h"
#include "queue.h"
#include <string.h>

static QueueHandle_t s_cdc_rx_queue = NULL;

void cdc_rx_queue_init(void)
{
    // xQueueCreate(uxQueueLength, uxItemSize)
    // Each item is a cdc_rx_packet_t struct (64 + 4 = 68 bytes)
    s_cdc_rx_queue = xQueueCreate(CDC_RX_QUEUE_DEPTH,
                                   sizeof(cdc_rx_packet_t));
    configASSERT(s_cdc_rx_queue != NULL);
}

BaseType_t cdc_rx_queue_post(const uint8_t *data, uint32_t len)
{
    cdc_rx_packet_t pkt;

    // Clamp to maximum packet size
    if (len > CDC_RX_PACKET_MAX) {
        len = CDC_RX_PACKET_MAX;
    }

    memcpy(pkt.data, data, len);
    pkt.len = len;

    // xQueueSend with timeout 0 — do not block.
    // This function is called from the USB task (inside tud_cdc_rx_cb).
    // We must not block the USB task here — if the queue is full,
    // drop the packet (application is too slow to consume).
    return xQueueSend(s_cdc_rx_queue, &pkt, 0);
}

BaseType_t cdc_rx_queue_receive(cdc_rx_packet_t *pkt,
                                TickType_t timeout_ticks)
{
    return xQueueReceive(s_cdc_rx_queue, pkt, timeout_ticks);
}
// tusb_callbacks.c: TinyUSB CDC receive callback
#include "tusb.h"
#include "FreeRTOS.h"
#include "cdc_rx_queue.h"

// Diagnostic counter: packets dropped because the application queue was full
static uint32_t s_cdc_rx_drops = 0;

// Called by tud_task() when data arrives on CDC bulk OUT endpoint
void tud_cdc_rx_cb(uint8_t itf)
{
    uint8_t buf[CDC_RX_PACKET_MAX];

    // Read all available bytes from this interface's CDC FIFO.
    // tud_cdc_n_available() returns a byte count, not a packet count.
    while (tud_cdc_n_available(itf)) {
        uint32_t count = tud_cdc_n_read(itf, buf, sizeof(buf));
        if (count == 0) {
            break;
        }
        // Post to application queue: non-blocking from the USB task
        if (cdc_rx_queue_post(buf, count) != pdTRUE) {
            // Queue full: count the drop for diagnostics, never block here
            s_cdc_rx_drops++;
        }
    }
}

// Application-specific command parser (hypothetical, not shown)
void process_command(const uint8_t *data, uint32_t len);

// Application task: consumes CDC data from the queue
void cdc_rx_processing_task(void *param)
{
    (void)param;
    cdc_rx_packet_t pkt;

    while (1) {
        // Block until a packet arrives (or 100 ms timeout)
        if (cdc_rx_queue_receive(&pkt, pdMS_TO_TICKS(100)) == pdTRUE) {
            // Process the received data, e.g. with a command parser
            process_command(pkt.data, pkt.len);
        }
        // On timeout (100 ms with no data): check connection, status, etc.
    }
}

DMA Cache Coherency on Cortex-M7

DMA cache coherency is the most insidious USB problem on STM32H7 and STM32F7 devices. It causes data corruption that appears random, shows up only at certain buffer addresses, and often disappears when you add debug print statements (the extra code changes timing and memory access patterns, which can evict the stale cache lines). Understanding it is essential before deploying USB on any Cortex-M7 platform.

The Problem: D-Cache and DMA

The Cortex-M7 has a data cache (D-cache) organised in 32-byte lines (16 KB in total on the STM32H743). When your firmware reads memory, the CPU loads the whole 32-byte line containing that address into the cache. If the DMA controller later writes new data to the same RAM address, the CPU may keep returning the old cached value, because nothing tells the cache that memory has changed. This is the cache coherency problem.

Without cache maintenance (BROKEN):
──────────────────────────────────────────────────────────────
Time 0: CPU reads USB RX buffer[0..31] → loads into D-cache
Time 1: USB DMA receives new packet → writes new data to buffer[0..31]
Time 2: CPU reads USB RX buffer[0..31] → returns STALE CACHED data!
         → Your firmware processes old data, thinking it's new

With SCB_InvalidateDCache_by_Addr() (CORRECT):
──────────────────────────────────────────────────────────────
Time 0: USB DMA receives new packet → writes to buffer[0..31]
Time 1: DMA complete callback fires
Time 2: Call SCB_InvalidateDCache_by_Addr(buffer, 32)
         → CPU cache for buffer[0..31] is invalidated (marked stale)
Time 3: CPU reads USB RX buffer[0..31]
         → Cache miss → CPU loads fresh data from RAM → correct!

With SCB_CleanDCache_by_Addr() (for TX direction):
──────────────────────────────────────────────────────────────
Time 0: CPU fills USB TX buffer[0..31] with new data
         → data is in D-cache, MAY NOT be written to RAM yet
Time 1: Call SCB_CleanDCache_by_Addr(buffer, 32)
         → Forces cache to write pending data to RAM
Time 2: DMA reads buffer from RAM → gets correct data
Time 3: USB transmits correct data to host

Cortex-M4 and M3 Are Not Affected

The STM32F4 and STM32F3 (Cortex-M4) and STM32F1/F2 (Cortex-M3) series have no data cache in front of SRAM, so the CPU and DMA always see the same memory contents. Among the STM32 families discussed here, cache coherency is a Cortex-M7 issue (STM32H7, STM32F7). If you develop on an STM32F4 and later port to STM32H7, you will meet this problem for the first time during the port.

Solution 1: Non-Cached SRAM Region

The cleanest solution is to place all USB DMA buffers in a memory region that the D-cache does not cache. On STM32H7, SRAM4 (0x38000000, 64 KB, in the D3 domain) is a good candidate because it is accessible by the USB DMA. Configure the MPU (Memory Protection Unit) to mark this region as non-cacheable:

// mpu_config.c — configure MPU for non-cached SRAM4 on STM32H7
#include "stm32h7xx_hal.h"

void MPU_Config(void)
{
    MPU_Region_InitTypeDef mpu_init = {0};

    // Disable MPU before reconfiguring
    HAL_MPU_Disable();

    // Configure SRAM4 (0x38000000, 64 KB) as non-cacheable
    // All USB DMA buffers will be placed in this region
    mpu_init.Enable           = MPU_REGION_ENABLE;
    mpu_init.BaseAddress      = 0x38000000;
    mpu_init.Size             = MPU_REGION_SIZE_64KB;
    mpu_init.AccessPermission = MPU_REGION_FULL_ACCESS;
    mpu_init.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;  // TEX=1, C=0, B=0: normal, non-cacheable
    mpu_init.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;  // KEY
    mpu_init.IsShareable      = MPU_ACCESS_SHAREABLE;
    mpu_init.Number           = MPU_REGION_NUMBER0;
    mpu_init.TypeExtField     = MPU_TEX_LEVEL1;
    mpu_init.SubRegionDisable = 0x00;
    mpu_init.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
    HAL_MPU_ConfigRegion(&mpu_init);

    // Re-enable MPU with privileged default memory map
    HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
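// Place USB DMA buffers in non-cached SRAM4 using the
// CFG_TUSB_MEM_SECTION attribute in tusb_config.h.

// In tusb_config.h:
#define CFG_TUSB_MEM_SECTION  __attribute__((section(".sram4_bss")))
#define CFG_TUSB_MEM_ALIGN    __attribute__((aligned(32)))

// In your linker script (.ld file), declare SRAM4 and add a section there:
// MEMORY { ... SRAM4 (rw) : ORIGIN = 0x38000000, LENGTH = 64K }
// .sram4_bss (NOLOAD) :
// {
//   . = ALIGN(32);
//   *(.sram4_bss)
//   . = ALIGN(32);
// } >SRAM4
// NOLOAD sections are not zeroed by the standard startup code.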

Solution 2: Manual Cache Maintenance

If placing buffers in SRAM4 is not possible (e.g., buffer is shared with other peripherals in cached SRAM1), add explicit cache maintenance calls around DMA operations:

// Manual cache maintenance for USB DMA buffers
// Buffers MUST be aligned to 32 bytes (the Cortex-M7 D-cache line size)
// and sized in whole lines, so maintenance never touches neighbouring data.
#include <stdint.h>
#include <string.h>
#include "stm32h7xx.h"   // CMSIS core_cm7.h: SCB_*DCache_by_Addr()

// Application hook (not shown)
void process_received_data(const uint8_t *data, uint32_t len);

static uint8_t usb_rx_buf[64] __attribute__((aligned(32)));
static uint8_t usb_tx_buf[64] __attribute__((aligned(32)));

// Before starting a DMA receive (USB DMA will write to usb_rx_buf):
// invalidate so no dirty line can later be written back over DMA data.
void usb_prepare_dma_receive(void)
{
    SCB_InvalidateDCache_by_Addr((uint32_t *)usb_rx_buf, sizeof(usb_rx_buf));
    // ... start the DMA receive here
}

// After the DMA receive completes, before the CPU reads the buffer:
// invalidate AGAIN, because the Cortex-M7 may speculatively reload
// lines while the DMA transfer is in progress.
void usb_dma_receive_complete_cb(uint32_t len)
{
    SCB_InvalidateDCache_by_Addr((uint32_t *)usb_rx_buf, sizeof(usb_rx_buf));
    if (len > sizeof(usb_rx_buf)) {
        len = sizeof(usb_rx_buf);
    }
    process_received_data(usb_rx_buf, len);
}

// Before starting a DMA transmit (USB DMA will read from usb_tx_buf):
// clean the cache so pending CPU writes reach RAM first.
void usb_prepare_dma_transmit(const uint8_t *data, uint32_t len)
{
    if (len > sizeof(usb_tx_buf)) {
        len = sizeof(usb_tx_buf);
    }
    memcpy(usb_tx_buf, data, len);
    SCB_CleanDCache_by_Addr((uint32_t *)usb_tx_buf, sizeof(usb_tx_buf));
    // ... now start the DMA transmit; it reads fresh data from RAM
}

STM32H7 USB + FreeRTOS Complete Example

This section brings everything together: a complete STM32H7 project with FreeRTOS + TinyUSB CDC, mutex-protected writes, a producer/consumer queue for receives, non-cached SRAM4 buffers, and the correct USB clock configuration.

tusb_config.h for STM32H7

// tusb_config.h — STM32H7 with FreeRTOS
#ifndef _TUSB_CONFIG_H_
#define _TUSB_CONFIG_H_

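// STM32H743 USB OTG in device mode at full speed (internal PHY, 12 Mbit/s)
// For an external ULPI PHY (480 Mbit/s), use OPT_MODE_HIGH_SPEED instead
#define CFG_TUSB_MCU          OPT_MCU_STM32H7
#define CFG_TUD_ENABLED       1
#define CFG_TUD_MAX_SPEED     OPT_MODE_FULL_SPEED
// (Older TinyUSB releases use CFG_TUSB_RHPORT0_MODE instead of the two lines above)

// FreeRTOS RTOS backend: enables event-driven blocking in tud_task()
#define CFG_TUSB_OS           OPT_OS_FREERTOS

// Debug level: 0 = off, higher values (up to 3) are progressively more verbose.
// Set to 0 in production.
#define CFG_TUSB_DEBUG        0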

// Memory placement: non-cached SRAM4 on STM32H7
// Linker script must have .sram4_bss section pointing to SRAM4
#define CFG_TUSB_MEM_SECTION  __attribute__((section(".sram4_bss")))
#define CFG_TUSB_MEM_ALIGN    __attribute__((aligned(32)))

// Enable CDC device class
#define CFG_TUD_CDC           1
#define CFG_TUD_CDC_RX_BUFSIZE 256
#define CFG_TUD_CDC_TX_BUFSIZE 256

// Disable unused classes
#define CFG_TUD_HID           0
#define CFG_TUD_MSC           0
#define CFG_TUD_MIDI          0
#define CFG_TUD_AUDIO         0
#define CFG_TUD_VENDOR        0

#endif // _TUSB_CONFIG_H_

STM32H7 USB Clock Configuration

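// clock_config.c: STM32H7 USB clock from HSI48
// STM32H7 USB requires exactly 48 MHz for the FS PHY.
// Source options: HSI48 (internal 48 MHz RC) or PLL3Q (from external HSE).
// HSI48 is the simpler option; for USB-grade accuracy, consider trimming
// it with the CRS (clock recovery system) synchronised to USB SOF packets.
#include "stm32h7xx_hal.h"

void SystemClock_Config(void)
{
    RCC_OscInitTypeDef osc = {0};
    RCC_ClkInitTypeDef clk = {0};
    RCC_PeriphClkInitTypeDef periph = {0};

    // Core supply and voltage scale for 400 MHz (board-dependent: LDO vs SMPS)
    HAL_PWREx_ConfigSupply(PWR_LDO_SUPPLY);
    __HAL_PWR_VOLTAGESCALING_CONFIG(PWR_REGULATOR_VOLTAGE_SCALE1);
    while (!__HAL_PWR_GET_FLAG(PWR_FLAG_VOSRDY)) {}

    // HSE (25 MHz crystal) feeds PLL1 for the system clock; HSI48 feeds USB
    osc.OscillatorType      = RCC_OSCILLATORTYPE_HSI48 | RCC_OSCILLATORTYPE_HSE;
    osc.HSI48State          = RCC_HSI48_ON;
    osc.HSEState            = RCC_HSE_ON;
    osc.PLL.PLLState        = RCC_PLL_ON;
    osc.PLL.PLLSource       = RCC_PLLSOURCE_HSE;
    osc.PLL.PLLM            = 5;    // 25 MHz / 5 = 5 MHz PLL input
    osc.PLL.PLLN            = 160;  // × 160 = 800 MHz VCO output
    osc.PLL.PLLP            = 2;    // / 2 = 400 MHz system clock
    osc.PLL.PLLQ            = 4;    // / 4 = 200 MHz (not used for USB)
    osc.PLL.PLLR            = 2;    // / 2 = 400 MHz
    osc.PLL.PLLRGE          = RCC_PLL1VCIRANGE_2;  // 4–8 MHz input range
    osc.PLL.PLLVCOSEL       = RCC_PLL1VCOWIDE;
    osc.PLL.PLLFRACN        = 0;
    if (HAL_RCC_OscConfig(&osc) != HAL_OK) { while (1) {} }

    // SYSCLK = PLL1P = 400 MHz, HCLK (AXI) = 200 MHz, APB buses = 100 MHz
    clk.ClockType      = RCC_CLOCKTYPE_SYSCLK | RCC_CLOCKTYPE_HCLK |
                         RCC_CLOCKTYPE_D1PCLK1 | RCC_CLOCKTYPE_PCLK1 |
                         RCC_CLOCKTYPE_PCLK2 | RCC_CLOCKTYPE_D3PCLK1;
    clk.SYSCLKSource   = RCC_SYSCLKSOURCE_PLLCLK;
    clk.SYSCLKDivider  = RCC_SYSCLK_DIV1;
    clk.AHBCLKDivider  = RCC_HCLK_DIV2;
    clk.APB3CLKDivider = RCC_APB3_DIV2;
    clk.APB1CLKDivider = RCC_APB1_DIV2;
    clk.APB2CLKDivider = RCC_APB2_DIV2;
    clk.APB4CLKDivider = RCC_APB4_DIV2;
    // Conservative flash wait states; check the reference manual table
    if (HAL_RCC_ClockConfig(&clk, FLASH_LATENCY_4) != HAL_OK) { while (1) {} }

    // USB kernel clock: HSI48 directly → 48 MHz
    periph.PeriphClockSelection = RCC_PERIPHCLK_USB;
    periph.UsbClockSelection    = RCC_USBCLKSOURCE_HSI48;
    if (HAL_RCCEx_PeriphCLKConfig(&periph) != HAL_OK) { while (1) {} }

    // TinyUSB's DWC2 driver does not enable the OTG peripheral clock:
    // board init must call e.g. __HAL_RCC_USB2_OTG_FS_CLK_ENABLE(), set up
    // the USB GPIO alternate functions, and enable the OTG interrupt.
}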

Main Application with FreeRTOS Tasks

// main.c — STM32H7 + FreeRTOS + TinyUSB CDC echo example
#include "stm32h7xx_hal.h"
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"
#include "tusb.h"
#include "cdc_thread_safe.h"
#include "cdc_rx_queue.h"

// ── DMA-safe buffer in non-cached SRAM4 ──────────────────────────────────
// TinyUSB places its internal USB buffers here via CFG_TUSB_MEM_SECTION.
// Application buffers that directly interface with DMA should also go here.
static uint8_t usb_echo_buf[64] __attribute__((section(".sram4_bss"),
                                               aligned(32)));

// ── Forward declarations ──────────────────────────────────────────────────
static void usb_device_task(void *param);
static void app_echo_task(void *param);

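// Defined in clock_config.c and mpu_config.c
void SystemClock_Config(void);
void MPU_Config(void);

// ── Main entry point ─────────────────────────────────────────────────────
int main(void)
{
    MPU_Config();           // mark SRAM4 non-cacheable BEFORE enabling the caches
    SCB_EnableICache();
    SCB_EnableDCache();

    HAL_Init();
    SystemClock_Config();   // configures HSI48 for USB
    HAL_PWREx_EnableUSBVoltageDetector();  // required before the USB PHY powers up

    // Initialize CDC mutex and receive queue BEFORE creating tasks
    cdc_ts_init();
    cdc_rx_queue_init();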

    // Create USB task at highest user priority
    xTaskCreate(usb_device_task, "usb",    512, NULL,
                configMAX_PRIORITIES - 1, NULL);

    // Create echo application task at lower priority. 512 words leaves room
    // for cdc_ts_printf()'s 256-byte buffer plus vsnprintf() itself.
    xTaskCreate(app_echo_task, "echo", 512, NULL,
                tskIDLE_PRIORITY + 2, NULL);

    // Start FreeRTOS scheduler — this function does not return
    vTaskStartScheduler();
    while (1) { __NOP(); }
}

// ── USB device task ───────────────────────────────────────────────────────
static void usb_device_task(void *param)
{
    (void)param;
    tusb_init();

    while (1) {
        // tud_task() blocks internally on the USB event queue (FreeRTOS mode).
        // It wakes up when the USB ISR posts an event.
        // No vTaskDelay needed — adding one reduces responsiveness.
        tud_task();
    }
}

// ── CDC echo application task ─────────────────────────────────────────────
static void app_echo_task(void *param)
{
    (void)param;
    cdc_rx_packet_t pkt;

    while (1) {
        // Block waiting for CDC data (500 ms timeout)
        if (cdc_rx_queue_receive(&pkt, pdMS_TO_TICKS(500)) == pdTRUE) {
            // Echo received data back to host (thread-safe)
            cdc_ts_write(pkt.data, pkt.len);

        } else {
            // Timeout — send a heartbeat to show we're alive
            if (tud_cdc_connected()) {
                cdc_ts_printf("[heartbeat]\r\n");
            }
        }
    }
}
Common STM32H7 USB Gotcha: On STM32H7, the USB OTG PHY runs from a separate 3.3 V USB supply domain (VDD33USB). You must enable the USB voltage level detector with HAL_PWREx_EnableUSBVoltageDetector() after HAL_Init(), and, if your board generates VDD33USB with the internal USB regulator, also enable that regulator with HAL_PWREx_EnableUSBReg(). Without this, the USB PHY does not power up and the device is completely invisible to the host. This board bring-up step is specific to STM32H7; it is not required on STM32F4.

TinyUSB RTOS Hooks

TinyUSB's RTOS abstraction layer defines a set of functions that TinyUSB calls internally for synchronisation and timing. Understanding these helps you diagnose problems and customise behaviour.

Key OSAL Functions (FreeRTOS Mapping)

TinyUSB OSAL function                | FreeRTOS implementation                                       | Called when
------------------------------------ | ------------------------------------------------------------- | -------------------------------------------------------
osal_task_delay(ms)                  | vTaskDelay(pdMS_TO_TICKS(ms))                                 | TinyUSB needs to wait without blocking the CPU
osal_queue_create(depth, size)       | xQueueCreateStatic() (xQueueCreate() without static allocation) | TinyUSB initialises its internal event queue
osal_queue_send(queue, data, in_isr) | xQueueSendFromISR() or xQueueSend()                           | USB ISR posts an event to the USB task's queue
osal_queue_receive(queue, data, ms)  | xQueueReceive(queue, data, pdMS_TO_TICKS(ms))                 | USB task waits for the next event (blocks here when idle)
osal_semaphore_create()              | xSemaphoreCreateBinary()                                      | TinyUSB creates sync semaphores for endpoint transfers
osal_semaphore_post(sem, in_isr)     | xSemaphoreGiveFromISR() or xSemaphoreGive()                   | Transfer complete: signal the waiting task
osal_semaphore_wait(sem, ms)         | xSemaphoreTake(sem, pdMS_TO_TICKS(ms))                        | Wait for endpoint transfer completion

What Happens Without CFG_TUSB_OS (Bare-Metal Mode in RTOS)

If you forget to set CFG_TUSB_OS to OPT_OS_FREERTOS and leave it at the default OPT_OS_NONE, TinyUSB's OSAL never blocks: tud_task() polls and returns immediately when no events are pending, so the USB task's while (1) loop spins continuously. In a FreeRTOS system, this means:

  • The USB task runs at 100% CPU utilisation, starving every other task
  • The FreeRTOS idle task never runs → FreeRTOS tickless idle does not work → no power saving
  • Tasks below the USB task's priority get no CPU time at all (FreeRTOS never runs a lower-priority task while a higher-priority one is ready) → application appears frozen
  • Watchdog timeouts if you have a watchdog that requires the idle task to run
Debugging Tip: If your FreeRTOS application appears to freeze after USB is connected, and you have no other explanation, check CFG_TUSB_OS first. Leaving it at OPT_OS_NONE while running FreeRTOS is the second most common TinyUSB + FreeRTOS mistake (after incorrect task priority).

Handling USB Suspend/Resume with RTOS

USB suspend happens when the host stops sending SOF (Start-of-Frame) packets for more than 3 ms, typically when the computer sleeps or when the host driver selectively suspends an idle device to save power. The USB 2.0 specification limits a bus-powered device to 2.5 mA while suspended.

TinyUSB Suspend and Resume Callbacks

// tusb_suspend_cb.c: suspend/resume with a FreeRTOS event group
#include "tusb.h"
#include "FreeRTOS.h"
#include "task.h"
#include "event_groups.h"
#include "cdc_rx_queue.h"
#include "cdc_thread_safe.h"

// Event group for USB state changes
// Bit 0: USB suspended
// Bit 1: USB resumed / connected
static EventGroupHandle_t s_usb_events = NULL;
#define USB_EVENT_SUSPENDED  (1 << 0)
#define USB_EVENT_RESUMED    (1 << 1)

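// Create the event group before the USB task calls tusb_init(),
// so the callbacks below never see a NULL handle
void usb_events_init(void)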
{
    s_usb_events = xEventGroupCreate();
    configASSERT(s_usb_events != NULL);
}

// Called from tud_task() (USB task context) when the host stops sending SOF for >3 ms
void tud_suspend_cb(bool remote_wakeup_en)
{
    // remote_wakeup_en: true if host allowed device to request wakeup
    (void)remote_wakeup_en;

    // Signal all tasks that USB is suspended
    xEventGroupSetBits(s_usb_events, USB_EVENT_SUSPENDED);
    xEventGroupClearBits(s_usb_events, USB_EVENT_RESUMED);
}

// Called by TinyUSB when host resumes the bus (SOF packets restart)
void tud_resume_cb(void)
{
    // Signal all tasks that USB has resumed
    xEventGroupSetBits(s_usb_events, USB_EVENT_RESUMED);
    xEventGroupClearBits(s_usb_events, USB_EVENT_SUSPENDED);
}

// Application task — respects USB suspend to save power
static void app_echo_task_with_suspend(void *param)
{
    (void)param;
    cdc_rx_packet_t pkt;

    while (1) {
        // Check if USB is suspended
        EventBits_t bits = xEventGroupGetBits(s_usb_events);

        if (bits & USB_EVENT_SUSPENDED) {
            // USB is suspended — pause this task to save CPU
            // Wait until USB resumes (block with no timeout)
            xEventGroupWaitBits(s_usb_events,
                                USB_EVENT_RESUMED,
                                pdFALSE,    // Don't clear on exit
                                pdTRUE,     // Wait for all bits
                                portMAX_DELAY);
        }

        // USB is active — process data
        if (cdc_rx_queue_receive(&pkt, pdMS_TO_TICKS(100)) == pdTRUE) {
            cdc_ts_write(pkt.data, pkt.len);
        }
    }
}

STM32 Stop Mode During USB Suspend

For battery-powered applications, entering STM32 Stop mode while USB is suspended dramatically reduces current consumption. Stop mode halts the clocks the USB controller runs on, so the challenge is ensuring that resume signalling from the host can still wake the MCU:

// Low-power suspend handling on STM32 with FreeRTOS.
// This tud_suspend_cb() replaces the version above: a program can define
// each TinyUSB callback only once. tud_resume_cb() is unchanged.
// Enter Stop mode only when USB is suspended AND no application task
// needs CPU time. The USB wakeup EXTI interrupt brings the MCU out of Stop.

void tud_suspend_cb(bool remote_wakeup_en)
{
    (void)remote_wakeup_en;

    // Enable the USB wakeup interrupt so a host resume can end Stop mode.
    // This is an STM32 HAL macro; the exact macro/register depends on
    // the MCU family
    __HAL_USB_OTG_FS_WAKEUP_EXTI_ENABLE_IT();

    // Signal FreeRTOS tasks to pause. This callback runs in the USB task
    // (inside tud_task()), not in an ISR, so use the task-level API;
    // xEventGroupSetBitsFromISR() is only for interrupt handlers.
    xEventGroupClearBits(s_usb_events, USB_EVENT_RESUMED);
    xEventGroupSetBits(s_usb_events, USB_EVENT_SUSPENDED);

    // Do not enter Stop mode here: do it in the idle hook below or in
    // configPRE_SLEEP_PROCESSING() when tickless idle is enabled.
}

// FreeRTOS idle hook (requires configUSE_IDLE_HOOK = 1). The idle task
// calls it whenever no higher-priority task is ready to run; it must
// never block. (This is not the tickless idle mechanism, which uses
// configUSE_TICKLESS_IDLE and configPRE_SLEEP_PROCESSING().)
void vApplicationIdleHook(void)
{
    EventBits_t bits = xEventGroupGetBits(s_usb_events);

    if (bits & USB_EVENT_SUSPENDED) {
        // No task is ready AND USB is suspended: safe to enter Stop mode
        // HAL_PWR_EnterSTOPMode(PWR_MAINREGULATOR_ON, PWR_STOPENTRY_WFI);
        // Note: this is a simplified illustration; on wakeup from Stop the
        // system clock (PLL) must be reconfigured before USB can run again.
        __WFI();  // At minimum, use Wait-For-Interrupt
    }
}

Practical Exercises

Exercise 1 Beginner

Priority Starvation Demonstration

Build a TinyUSB CDC device on an STM32F4 (Cortex-M4, so no D-cache issues) with FreeRTOS. Create two scenarios and measure the difference: (a) USB task at priority 1, application task at priority 5. Run a CDC throughput test and measure KB/s, then observe what happens to CDC communication when the application task runs a busy loop. (b) USB task at priority 6 (the highest application priority; this requires configMAX_PRIORITIES of at least 7), application task at priority 2. Repeat the throughput test. The difference in throughput and reliability demonstrates why the USB task needs the highest priority. Document the exact priority configuration, measured throughputs, and any error counts observed in each scenario.

Task Priority FreeRTOS CDC Throughput
Exercise 2 Intermediate

Thread-Safe CDC with Race Condition Detection

Build a FreeRTOS project with 3 tasks all writing to CDC simultaneously: Task A writes "AAAA...A\r\n" (64 A's), Task B writes "BBBB...B\r\n" (64 B's), Task C writes "CCCC...C\r\n" (64 C's). First run WITHOUT a mutex and capture the output; observe the interleaving and corruption. Then add the mutex-protected CDC write wrapper and run again, verifying that lines are now complete and never interleaved. Use a Python script on the host to count complete lines vs broken lines and calculate the error rate for each scenario. Also log uxTaskGetStackHighWaterMark() for each task to confirm that every stack keeps enough headroom.

Thread Safety Mutex Race Condition
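Exercise 2 suggests a Python script for the host side; since this series uses C throughout, here is an equivalent checker as a minimal sketch. The names line_is_complete(), count_lines() and error_rate_percent() are hypothetical, and the code assumes each line is exactly 64 copies of one letter followed by \r\n, as the exercise specifies; anything else counts as broken.

```c
// line_check.c: host-side checker for Exercise 2 (hypothetical helpers)
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PAYLOAD_LEN 64  // letters per line before "\r\n"

// A line is complete if it is exactly 64 copies of 'A', 'B' or 'C'
// followed by "\r\n" (fgets() keeps the trailing '\n').
bool line_is_complete(const char *line)
{
    char c = line[0];
    if (c != 'A' && c != 'B' && c != 'C') {
        return false;
    }
    if (strlen(line) != PAYLOAD_LEN + 2) {
        return false;
    }
    for (size_t i = 1; i < PAYLOAD_LEN; i++) {
        if (line[i] != c) {
            return false;   // another task's bytes were interleaved
        }
    }
    return line[PAYLOAD_LEN] == '\r' && line[PAYLOAD_LEN + 1] == '\n';
}

// Count complete and broken lines in a captured log. Lines longer than
// the buffer are split by fgets() and therefore counted as broken.
void count_lines(FILE *in, unsigned long *complete, unsigned long *broken)
{
    char buf[256];
    *complete = 0;
    *broken = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (line_is_complete(buf)) {
            (*complete)++;
        } else {
            (*broken)++;
        }
    }
}

// Percentage of broken lines; 0 when nothing was captured
double error_rate_percent(unsigned long complete, unsigned long broken)
{
    unsigned long total = complete + broken;
    return total ? 100.0 * (double)broken / (double)total : 0.0;
}
```

Call count_lines() from a small main() on the captured log. Open the capture file in binary mode ("rb") so the \r\n terminators reach the checker unchanged; a text-mode stream on Windows turns them into \n, and every line would then be reported as broken.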
Exercise 3 Advanced

DMA Cache Coherency on STM32H7

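This exercise requires an STM32H7 Nucleo or Discovery board with USB OTG_FS. It assumes the USB controller moves packet data by DMA; if your driver copies packets through the FIFO with the CPU, every access goes through the cache and you should not see this corruption. Build a CDC loopback device that echoes every received byte. First, place the USB receive buffer in normal cached RAM (e.g., AXI SRAM at 0x24000000) and observe data corruption: send 1 KB of sequential byte values (0x00, 0x01, 0x02...) and compare the sent and echoed data. You should see corruption in 32-byte blocks aligned to cache-line boundaries, since the Cortex-M7 D-cache line is 32 bytes. Then move the buffer to SRAM4 (0x38000000) with the MPU configured as non-cacheable and repeat; the corruption should disappear. Finally, move the buffer back to AXI SRAM, aligned to 32 bytes and sized in whole cache lines, and call SCB_InvalidateDCache_by_Addr() on it after each DMA receive completes and before the CPU reads the data; verify that this also eliminates corruption. Document the exact corruption pattern, the memory addresses involved, and which fix you would use in a production design and why.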

STM32H7 DMA Cache Cortex-M7
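Cache maintenance on the Cortex-M7 works on whole 32-byte lines, so if a receive buffer in cached AXI SRAM shares a line with another variable, invalidating the buffer also discards that variable's cached (possibly dirty) data. Below is a minimal, host-testable sketch of the alignment rule; the helper names and the buffer usb_rx_buf are hypothetical, the 32-byte line size is the Cortex-M7 value, and the aligned attribute is GCC/Clang syntax.

```c
// cache_align.c: host-testable helpers for Cortex-M7 cache maintenance
// (hypothetical names). Assumes the Cortex-M7's 32-byte D-cache line.
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u

// Start address of the cache line containing addr
static inline uintptr_t line_start(uintptr_t addr)
{
    return addr & ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
}

// Number of cache lines that maintenance on [addr, addr + size) affects;
// every one of them is cleaned or invalidated as a whole, including any
// bytes that lie outside the buffer.
size_t lines_touched(uintptr_t addr, size_t size)
{
    if (size == 0) {
        return 0;
    }
    uintptr_t first = line_start(addr);
    uintptr_t last = line_start(addr + size - 1u);
    return (size_t)((last - first) / DCACHE_LINE_SIZE) + 1u;
}

// A DMA buffer is safe to invalidate only if it owns every line it
// touches: its start is line-aligned and its length is whole lines.
bool dma_buffer_is_cache_safe(uintptr_t addr, size_t size)
{
    return (addr % DCACHE_LINE_SIZE) == 0u &&
           size != 0u &&
           (size % DCACHE_LINE_SIZE) == 0u;
}

// Example receive buffer for cached RAM: line-aligned and sized in whole
// lines, so invalidating it can never discard neighbouring variables.
uint8_t usb_rx_buf[512] __attribute__((aligned(DCACHE_LINE_SIZE)));
```

On the target, after each receive DMA completes you would pass the checked buffer to SCB_InvalidateDCache_by_Addr(usb_rx_buf, sizeof usb_rx_buf). Aligning and padding the buffer up front is usually simpler than trying to protect neighbouring data around an unaligned one.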

USB RTOS Design Generator

Use this tool to document your USB + RTOS architecture — project, MCU, RTOS version, USB task priority and stack size, queue configuration, and design notes. Download as Word, Excel, PDF, or PPTX for project documentation or design review.

Conclusion & Next Steps

Integrating TinyUSB with FreeRTOS is straightforward when you follow the design rules correctly. The key points from this part:

  • Set CFG_TUSB_OS to OPT_OS_FREERTOS: Without this, TinyUSB busy-waits and the USB task starves every lower-priority FreeRTOS task. This single configuration change turns tud_task() from a non-blocking poll into a call that blocks until a USB event arrives.
  • USB task at highest user priority: The USB task must be able to preempt all application tasks. If control requests (during enumeration, or class requests such as CDC line-coding changes) go unanswered within the host's timeout, the host may reset the port and re-enumerate the device, which appears as random disconnects.
  • Use xSemaphoreCreateMutex() for CDC writes: Binary semaphores do not have priority inheritance. Only mutex-type semaphores protect against priority inversion between the high-priority USB task and low-priority application tasks that hold the write lock.
  • Producer/consumer queue for receives: Never block the USB task in a receive callback. Post to a FreeRTOS queue with timeout zero — if the queue is full, drop the packet gracefully. A separate consumer task processes received data at its own pace.
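  • DMA cache coherency on Cortex-M7: On STM32H7, place USB DMA buffers in SRAM4 (0x38000000) and configure that region as non-cacheable with the MPU. If you must use cached SRAM, align each buffer to the 32-byte cache line and size it in whole lines, call SCB_InvalidateDCache_by_Addr() on the receive buffer after each DMA receive completes (before the CPU reads it), and call SCB_CleanDCache_by_Addr() before each DMA transmit. Forgetting this causes silent data corruption that is extremely difficult to debug.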
  • Handle suspend with event groups: Use xEventGroupSetBits() in tud_suspend_cb() to notify application tasks to pause. Use tud_resume_cb() to resume them. This enables proper power management and prevents application tasks from timing out while the host is asleep.

Next in the Series

In Part 12: Advanced USB Topics, we venture into USB host mode, OTG (On-The-Go) dual-role operation on STM32, isochronous transfers for USB audio, the UAC (USB Audio Class) descriptor hierarchy, and USB video class. These topics require solid understanding of everything covered in Parts 1–11 — the foundation you have built through this series.
