Series Context: This is Part 11 of the 17-part USB Development Mastery series. Parts 1–10 covered fundamentals through USB debugging. This part assumes you have a working bare-metal TinyUSB device from earlier parts. We now add FreeRTOS to that device and make USB work correctly in a multitasking environment.
TinyUSB + FreeRTOS Integration Overview
TinyUSB is designed to be RTOS-agnostic. It supports bare-metal (polling in main loop), FreeRTOS, Azure RTOS (ThreadX), Zephyr, and others through an abstraction layer defined in osal/osal_freertos.h. Selecting the correct RTOS is a compile-time choice made in tusb_config.h.
The key configuration is:
// In tusb_config.h — select FreeRTOS as the RTOS abstraction layer
// Options: CFG_TUSB_OS_NONE (bare-metal polling)
// CFG_TUSB_OS_FREERTOS
// CFG_TUSB_OS_MYNEWT
// CFG_TUSB_OS_PICO (RP2040 SDK)
// CFG_TUSB_OS_RTTHREAD
#define CFG_TUSB_OS CFG_TUSB_OS_FREERTOS
When CFG_TUSB_OS_FREERTOS is selected, TinyUSB's OSAL (OS Abstraction Layer) maps its internal primitives to FreeRTOS equivalents:
| TinyUSB OSAL Primitive | FreeRTOS Equivalent | Used For |
| --- | --- | --- |
| osal_queue_t | QueueHandle_t | USB event queue between ISR and USB task |
| osal_semaphore_t | SemaphoreHandle_t | Signalling USB task from USB ISR |
| osal_mutex_t | SemaphoreHandle_t (mutex) | Protecting shared USB resources |
| osal_task_delay(ms) | vTaskDelay(ms / portTICK_PERIOD_MS) | Yielding in USB task when no events pending |
Critical Rule: Never call TinyUSB API functions (tud_cdc_write(), tud_hid_report(), etc.) from an interrupt handler. All TinyUSB API calls must come from task context. TinyUSB's callbacks (tud_cdc_rx_cb(), etc.) are invoked from within tud_task(), so they run in the USB task's context as well. Inside a callback, use the regular task-level FreeRTOS APIs (xQueueSend(), xSemaphoreGive(), xEventGroupSetBits()) with a zero or very short timeout, not the ...FromISR() variants, and never block for long, because that stalls all USB processing.
USB Task Design
The fundamental rule of TinyUSB + FreeRTOS integration is: tud_task() must run in a dedicated task. You cannot call tud_task() from multiple tasks, from the idle task, or from within an ISR. One task, one tud_task(), always.
Basic USB Task Structure
// usb_task.c — the dedicated USB device task
#include "tusb.h"
#include "FreeRTOS.h"
#include "task.h"
// Stack size: 512 words (2048 bytes) is the recommended minimum.
// Increase to 1024 words if using complex class callbacks or debug output.
#define USB_DEVICE_TASK_STACK_SIZE 512
#define USB_DEVICE_TASK_PRIORITY (configMAX_PRIORITIES - 1)
// Forward declaration
static void usb_device_task(void *param);
void usb_task_init(void)
{
// Do NOT call tusb_init() here. The USB interrupt handler posts to a
// FreeRTOS queue, so the stack is initialized inside the task, once
// the scheduler is running.
// Create the USB device task
xTaskCreate(
usb_device_task, // Task function
"usb_dev", // Task name (for debugging)
USB_DEVICE_TASK_STACK_SIZE, // Stack depth in words
NULL, // Task parameter (unused)
USB_DEVICE_TASK_PRIORITY, // Priority
NULL // Task handle (not needed here)
);
}
static void usb_device_task(void *param)
{
(void)param;
// Initialize TinyUSB from task context, after the scheduler has started
tusb_init();
// tud_task() processes USB events:
// - Calls endpoint callbacks (tud_cdc_rx_cb, tud_hid_set_report_cb, etc.)
// - Handles control requests (enumeration, class commands)
// - Manages endpoint state machines
//
// With CFG_TUSB_OS_FREERTOS, tud_task() internally calls
// osal_queue_receive() which blocks on the USB event queue.
// The USB ISR posts events to this queue, unblocking the task.
// This means the USB task sleeps when idle — no CPU waste.
while (1) {
tud_task();
// No vTaskDelay needed here — tud_task() blocks internally
// via osal_queue_receive() with portMAX_DELAY when no events.
// Adding vTaskDelay(1) here would REDUCE responsiveness.
}
}
Stack Size Considerations
The 512-word (2048-byte) stack recommendation covers:
- TinyUSB internal control request processing: ~200 bytes
- Descriptor callbacks (your code): typically 100–300 bytes
- Class driver callbacks (CDC, HID, MSC): 100–400 bytes each
- Debug output via tu_printf if enabled: add 256 bytes for the format buffer
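If you see stack overflow crashes, check the remaining headroom with uxTaskGetStackHighWaterMark(NULL), which returns the smallest amount of free stack the calling task has ever had, in words. If that value approaches zero, increase the stack to 1024 words. Set configCHECK_FOR_STACK_OVERFLOW to 2 in FreeRTOSConfig.h and implement vApplicationStackOverflowHook() to catch stack overflows at runtime.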
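The budget in the list above can be turned into a quick calculation. The sketch below is a hypothetical host-side helper (stack_budget_words is not a FreeRTOS or TinyUSB function) and assumes a 32-bit port where one stack word is 4 bytes. It sums the per-component estimates, adds a safety margin, and rounds up to whole words.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit port, so one FreeRTOS stack word is 4 bytes. */
#define STACK_WORD_BYTES 4u

/* Sum per-component byte estimates, add a safety margin in percent,
 * and round the result up to whole stack words. */
uint32_t stack_budget_words(const uint32_t *bytes, size_t count,
                            uint32_t margin_percent)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += bytes[i];
    }
    /* Add the margin, rounding up to the next byte. */
    total += (total * margin_percent + 99u) / 100u;
    /* Round up to whole words. */
    return (total + STACK_WORD_BYTES - 1u) / STACK_WORD_BYTES;
}
```

With the upper estimates from the list (200, 300, 400 and 256 bytes) and a 50% margin, the result is 434 words, which fits inside the 512-word recommendation. Treat the estimates as starting points and confirm them on hardware with the high-water mark.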
portYIELD vs vTaskDelay
When using CFG_TUSB_OS_FREERTOS, never add vTaskDelay(1) after tud_task(). The FreeRTOS OSAL already blocks the task on the event queue, so a delay only adds up to one tick (1 ms at a 1 kHz tick rate) of latency to every USB event. A delay or yield in the loop is only appropriate when TinyUSB is built with CFG_TUSB_OS_NONE, where tud_task() returns immediately instead of blocking. In RTOS mode the event-driven blocking is already efficient.
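How much latency one tick represents depends on the tick rate. As a minimal sketch, the two hypothetical helpers below reproduce the arithmetic on the host: tick_rate_hz stands in for configTICK_RATE_HZ, and ms_to_ticks uses the same multiply-then-divide form as FreeRTOS's default pdMS_TO_TICKS() macro.

```c
#include <stdint.h>

/* Length of one RTOS tick in microseconds.
 * tick_rate_hz stands in for configTICK_RATE_HZ (assumed non-zero). */
uint32_t tick_period_us(uint32_t tick_rate_hz)
{
    return 1000000u / tick_rate_hz;
}

/* Milliseconds to ticks, multiplying before dividing so that tick
 * rates that do not divide 1000 evenly are still handled sensibly. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}
```

At a 100 Hz tick, tick_period_us() gives 10000, so a stray vTaskDelay(1) can cost up to 10 ms per USB event rather than 1 ms. Note also that ms_to_ticks(5, 100) is 0: a delay shorter than one tick period rounds down to no delay at all.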
FreeRTOS Task Priorities for USB
Priority assignment is the most critical and most frequently wrong aspect of USB + RTOS integration. Getting it wrong causes intermittent disconnects, dropped data, and enumeration failures that are nearly impossible to debug without understanding the priority relationship.
The Priority Ladder
Priority ladder (highest to lowest):
══════════════════════════════════════════════════════════════
Priority 7 (configMAX_PRIORITIES-1): USB device task (tud_task)
→ Must respond to USB events within ~1 ms
→ If delayed longer, host sees timeout → bus reset → re-enumeration
Priority 6: Time-critical application tasks
→ Tasks that produce/consume USB data at high rate
Priority 5: CDC receive processing task
→ Reads from USB receive queue and processes data
Priority 4: General application tasks
→ Non-time-critical application logic
Priority 3: Logging / diagnostic tasks
→ Low priority — can be delayed without affecting USB
Priority 2: (reserved — avoid)
Priority 1 (tskIDLE_PRIORITY+1): Lowest user task priority
Priority 0 (tskIDLE_PRIORITY): FreeRTOS idle task (automatic)
══════════════════════════════════════════════════════════════
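Why USB Must Be Highest: The USB specification gives a device 50 ms to complete a control request that has no data stage, and requires a full-speed device to answer an IN token within 6.5 bit times. The token-level deadline is met by the USB peripheral hardware, which NAKs automatically until firmware has data ready. Everything above that level (SETUP handling, re-arming endpoints, class requests) is done by tud_task(). If the USB task is kept off the CPU by other tasks for long stretches during active transfers, throughput collapses and control requests can time out, at which point the host may reset the port. This manifests as the device disconnecting and re-enumerating randomly, a notoriously difficult problem to diagnose.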
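The two deadlines quoted above live on very different time scales. As a worked example, the sketch below converts bit times to nanoseconds; bit_times_to_ns is a hypothetical helper, not part of TinyUSB. The count is passed in tenths of a bit so that 6.5 bit times can be written as 65 without floating point.

```c
#include <stdint.h>

/* Duration of a number of bit times in nanoseconds (truncated).
 * tenths_of_bits: bit count multiplied by 10, so 6.5 bits is 65.
 * bits_per_second: bus signalling rate, 12000000 for full speed. */
uint32_t bit_times_to_ns(uint32_t tenths_of_bits, uint32_t bits_per_second)
{
    return (uint32_t)(((uint64_t)tenths_of_bits * 1000000000ull)
                      / (10ull * bits_per_second));
}
```

At the full-speed rate of 12 Mbit/s, 6.5 bit times comes out at 541 ns after truncation. That is far shorter than a 1 ms scheduler tick, so no task could ever meet it; the peripheral answers at that level, and task priority matters for the millisecond-scale work that follows.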
Priority Inversion Scenario
Priority inversion occurs when a high-priority task (USB task) is blocked waiting for a resource held by a low-priority task (application task). FreeRTOS provides priority inheritance in mutexes to mitigate this — but only for xSemaphoreCreateMutex(), not xSemaphoreCreateBinary(). The scenario:
Priority Inversion in USB + FreeRTOS:
────────────────────────────────────────────────────────
Time 0ms: App task (priority 4) takes CDC write mutex
Time 1ms: USB task (priority 7) needs CDC write mutex → BLOCKED
Time 1ms: Medium-priority task (priority 5) preempts App task
→ App task cannot release mutex
→ USB task stays blocked
→ Host sees USB device not responding → bus reset
────────────────────────────────────────────────────────
FreeRTOS priority inheritance fix:
When USB task (P7) blocks on mutex held by App task (P4),
FreeRTOS temporarily raises App task priority to P7.
This prevents medium-priority tasks from preempting App task.
App task completes, releases mutex, returns to P4.
USB task resumes at P7.
────────────────────────────────────────────────────────
Requirement: Use xSemaphoreCreateMutex() (not binary semaphore)
for any mutex that can be held by application tasks
and waited on by the USB task.
Creating Tasks with Correct Priorities
// main.c — creating all tasks with correct priorities
#include "FreeRTOS.h"
#include "task.h"
#include "usb_task.h"
#include "app_task.h"
// Task handle declarations
TaskHandle_t usb_task_handle = NULL;
TaskHandle_t app_task_handle = NULL;
TaskHandle_t cdc_rx_task_handle = NULL;
int main(void)
{
// Hardware initialization (HAL first, then clocks, GPIO, UART, etc.)
HAL_Init();
SystemClock_Config();
// Start USB: usb_task_init() (usb_task.c) sets up TinyUSB and creates
// the USB device task at configMAX_PRIORITIES - 1
usb_task_init();
// CDC receive processing: lower than USB, higher than app
xTaskCreate(cdc_rx_processing_task, "cdc_rx", 256, NULL,
configMAX_PRIORITIES - 3, &cdc_rx_task_handle);
// Application task: lowest user priority
xTaskCreate(app_main_task, "app", 512, NULL,
tskIDLE_PRIORITY + 1, &app_task_handle);
// Start scheduler — does not return
vTaskStartScheduler();
// Should never reach here
while (1) { __NOP(); }
}
Thread-Safe CDC with FreeRTOS
The core problem with CDC in a multitasking system: tud_cdc_write() and tud_cdc_write_flush() access a shared transmit buffer. If two tasks both call tud_cdc_write() simultaneously, the data interleaves randomly — you get garbled output and potential buffer corruption.
The solution is a mutex that serialises all CDC write operations:
// cdc_thread_safe.h — thread-safe CDC write interface
#ifndef CDC_THREAD_SAFE_H
#define CDC_THREAD_SAFE_H
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
// Initialize the CDC mutex — call once before starting tasks
void cdc_ts_init(void);
// Thread-safe CDC write — blocks until mutex available
// Returns number of bytes actually written (may be less than len if FIFO full)
uint32_t cdc_ts_write(const void *buf, uint32_t len);
// Thread-safe CDC printf — convenience wrapper
int cdc_ts_printf(const char *fmt, ...);
// Thread-safe CDC write for multi-port TinyUSB (port 0 = default)
uint32_t cdc_ts_write_n(uint8_t itf, const void *buf, uint32_t len);
#endif // CDC_THREAD_SAFE_H
// cdc_thread_safe.c — implementation
#include "cdc_thread_safe.h"
#include "tusb.h"
#include "FreeRTOS.h"
#include "semphr.h"
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
// Mutex handle — created at init, never deleted
static SemaphoreHandle_t s_cdc_mutex = NULL;
void cdc_ts_init(void)
{
// Create a mutex with priority inheritance.
// xSemaphoreCreateMutex() enables FreeRTOS priority inheritance,
// which prevents priority inversion between USB task and app tasks.
s_cdc_mutex = xSemaphoreCreateMutex();
configASSERT(s_cdc_mutex != NULL);
}
uint32_t cdc_ts_write(const void *buf, uint32_t len)
{
uint32_t written = 0;
// Take the mutex — wait forever (USB task at higher priority,
// so app task will be promoted via priority inheritance if needed)
if (xSemaphoreTake(s_cdc_mutex, portMAX_DELAY) == pdTRUE) {
// Check that CDC is connected and ready
if (tud_cdc_connected()) {
written = tud_cdc_write(buf, len);
// Flush immediately — without this, data sits in the
// TinyUSB transmit FIFO until the FIFO is full or the
// USB task triggers a flush automatically.
// For low-latency output, always flush after write.
tud_cdc_write_flush();
}
xSemaphoreGive(s_cdc_mutex);
}
return written;
}
int cdc_ts_printf(const char *fmt, ...)
{
char buf[256];
va_list args;
va_start(args, fmt);
int len = vsnprintf(buf, sizeof(buf), fmt, args);
va_end(args);
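if (len > 0) {
// vsnprintf returns the untruncated length; clamp to what is in buf
if ((size_t)len >= sizeof(buf)) {
len = (int)sizeof(buf) - 1;
}
cdc_ts_write(buf, (uint32_t)len);
}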
return len;
}
uint32_t cdc_ts_write_n(uint8_t itf, const void *buf, uint32_t len)
{
uint32_t written = 0;
if (xSemaphoreTake(s_cdc_mutex, portMAX_DELAY) == pdTRUE) {
if (tud_cdc_n_connected(itf)) {
written = tud_cdc_n_write(itf, buf, len);
tud_cdc_n_write_flush(itf);
}
xSemaphoreGive(s_cdc_mutex);
}
return written;
}
Timeout Note: Using portMAX_DELAY in application tasks means they will block indefinitely if the mutex is never released (e.g., if the USB task crashes while holding the mutex — which cannot happen with this design since only application tasks hold the write mutex). For real-time tasks that cannot afford to block indefinitely, use a finite timeout like pdMS_TO_TICKS(10) and handle the timeout case gracefully.
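One detail of any printf-style wrapper that formats into a fixed buffer deserves a closer look: vsnprintf() returns the length the output would have had, not the number of bytes it stored. The sketch below is a host-runnable illustration, with format_clamped as a hypothetical helper name, that returns the number of bytes actually present in the buffer and is therefore safe to pass on to a write function.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Format into buf and return the number of bytes really stored in buf
 * (excluding the terminating NUL), even when the output was truncated. */
size_t format_clamped(char *buf, size_t cap, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int len = vsnprintf(buf, cap, fmt, args);
    va_end(args);

    if (len < 0 || cap == 0) {
        return 0;              /* encoding error, or no room at all */
    }
    if ((size_t)len >= cap) {
        return cap - 1;        /* truncated: only cap - 1 bytes stored */
    }
    return (size_t)len;
}
```

Passing the unclamped return value as a length would make the writer read past the end of the buffer whenever a message is longer than the buffer.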
Producer/Consumer Queue Pattern
The cleanest architecture for CDC receive with FreeRTOS is a producer/consumer queue pattern: the USB callback (running inside the USB task) produces messages to a queue, and a separate application task consumes from the queue. This completely decouples USB timing from application processing speed.
// cdc_rx_queue.h — CDC receive queue interface
#ifndef CDC_RX_QUEUE_H
#define CDC_RX_QUEUE_H
#include <stdint.h>
#include "FreeRTOS.h" // BaseType_t, TickType_t
// Maximum bytes per received USB packet (CDC bulk EP max = 64 bytes at FS)
#define CDC_RX_PACKET_MAX 64
// Number of packets the queue can hold before dropping
// Size calculation: CDC_RX_QUEUE_DEPTH × sizeof(cdc_rx_packet_t)
// = 8 × (64 + 4) = 544 bytes of queue storage
#define CDC_RX_QUEUE_DEPTH 8
// Structure for one CDC receive message
typedef struct {
uint8_t data[CDC_RX_PACKET_MAX];
uint32_t len;
} cdc_rx_packet_t;
// Initialize the receive queue — call once at startup
void cdc_rx_queue_init(void);
// Called from tud_cdc_rx_cb() — posts packet to queue
// Returns pdTRUE if posted, pdFALSE if queue full (data dropped)
// NOTE: called from USB task context, NOT from ISR
BaseType_t cdc_rx_queue_post(const uint8_t *data, uint32_t len);
// Called from application task — blocks until packet available
BaseType_t cdc_rx_queue_receive(cdc_rx_packet_t *pkt,
TickType_t timeout_ticks);
#endif // CDC_RX_QUEUE_H
// cdc_rx_queue.c — implementation
#include "cdc_rx_queue.h"
#include "FreeRTOS.h"
#include "queue.h"
#include <string.h>
static QueueHandle_t s_cdc_rx_queue = NULL;
void cdc_rx_queue_init(void)
{
// xQueueCreate(uxQueueLength, uxItemSize)
// Each item is a cdc_rx_packet_t struct (64 + 4 = 68 bytes)
s_cdc_rx_queue = xQueueCreate(CDC_RX_QUEUE_DEPTH,
sizeof(cdc_rx_packet_t));
configASSERT(s_cdc_rx_queue != NULL);
}
BaseType_t cdc_rx_queue_post(const uint8_t *data, uint32_t len)
{
cdc_rx_packet_t pkt;
// Clamp to maximum packet size
if (len > CDC_RX_PACKET_MAX) {
len = CDC_RX_PACKET_MAX;
}
memcpy(pkt.data, data, len);
pkt.len = len;
// xQueueSend with timeout 0 — do not block.
// This function is called from the USB task (inside tud_cdc_rx_cb).
// We must not block the USB task here — if the queue is full,
// drop the packet (application is too slow to consume).
return xQueueSend(s_cdc_rx_queue, &pkt, 0);
}
BaseType_t cdc_rx_queue_receive(cdc_rx_packet_t *pkt,
TickType_t timeout_ticks)
{
return xQueueReceive(s_cdc_rx_queue, pkt, timeout_ticks);
}
// tusb_callbacks.c — TinyUSB CDC receive callback
#include "tusb.h"
#include "cdc_rx_queue.h"
// Called by tud_task() when data arrives on CDC bulk OUT endpoint
void tud_cdc_rx_cb(uint8_t itf)
{
(void)itf;
uint8_t buf[64];
uint32_t count;
// Read all available bytes from TinyUSB CDC FIFO
// tud_cdc_available() returns byte count, not packet count
while (tud_cdc_available()) {
count = tud_cdc_read(buf, sizeof(buf));
if (count > 0) {
// Post to application queue — non-blocking from USB task
BaseType_t result = cdc_rx_queue_post(buf, count);
if (result != pdTRUE) {
// Queue full — increment drop counter for diagnostics
// Do not block — just drop this packet
}
}
}
}
// Application task — consumes CDC data from queue
void cdc_rx_processing_task(void *param)
{
(void)param;
cdc_rx_packet_t pkt;
while (1) {
// Block until a packet arrives (or 100ms timeout)
if (cdc_rx_queue_receive(&pkt, pdMS_TO_TICKS(100)) == pdTRUE) {
// Process the received data
// Example: command parser
process_command(pkt.data, pkt.len);
}
// If timeout (100ms with no data): check connection, status, etc.
}
}
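The callback above leaves the drop counter as a comment. To show the intended behaviour in isolation, here is a host-runnable model of the same pattern: a fixed-depth packet queue whose post operation never blocks and counts rejected packets. All names (pkt_queue_t, pkt_queue_post, pkt_queue_receive) are hypothetical, and the FreeRTOS queue is replaced by a plain ring buffer, so this is a sketch of the logic rather than firmware code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PKT_MAX 64u   /* bytes per packet, as in CDC_RX_PACKET_MAX */
#define DEPTH   8u    /* packets in the queue, as in CDC_RX_QUEUE_DEPTH */

typedef struct {
    uint8_t  data[PKT_MAX];
    uint32_t len;
} pkt_t;

typedef struct {
    pkt_t    slots[DEPTH];
    uint32_t head;      /* index of the next slot to read */
    uint32_t count;     /* number of slots in use */
    uint32_t dropped;   /* posts rejected because the queue was full */
} pkt_queue_t;

/* Producer side: never blocks. A full queue costs one dropped packet. */
bool pkt_queue_post(pkt_queue_t *q, const uint8_t *data, uint32_t len)
{
    if (q->count == DEPTH) {
        q->dropped++;
        return false;
    }
    if (len > PKT_MAX) {
        len = PKT_MAX;
    }
    pkt_t *slot = &q->slots[(q->head + q->count) % DEPTH];
    memcpy(slot->data, data, len);
    slot->len = len;
    q->count++;
    return true;
}

/* Consumer side: returns false when empty (a real task would block). */
bool pkt_queue_receive(pkt_queue_t *q, pkt_t *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1u) % DEPTH;
    q->count--;
    return true;
}
```

Posting ten packets without consuming any leaves eight queued and two counted as dropped. In firmware, exposing that counter over a diagnostic command is a cheap way to find out whether the consumer task keeps up.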
DMA Cache Coherency on Cortex-M7
DMA cache coherency is the most insidious USB problem on STM32H7 and STM32F7 devices. It causes data corruption that appears random, manifests only at certain buffer addresses, and often disappears when you add debug print statements (the extra code changes timing and which lines sit in the cache, so the stale data happens to get evicted). Understanding this is essential before deploying USB on any Cortex-M7 platform.
The Problem: D-Cache and DMA
The Cortex-M7 has a data cache (D-cache) organised in 32-byte lines. When your firmware reads memory, the CPU loads the whole 32-byte line from RAM and keeps it in the cache. If the DMA controller later writes new data to the same RAM address, the CPU may still read the old cached value, because it has no way of knowing that the DMA changed the memory. This is the cache coherency problem.
Without cache maintenance (BROKEN):
──────────────────────────────────────────────────────────────
Time 0: CPU reads USB RX buffer[0..31] → loads into D-cache
Time 1: USB DMA receives new packet → writes new data to buffer[0..31]
Time 2: CPU reads USB RX buffer[0..31] → returns STALE CACHED data!
→ Your firmware processes old data, thinking it's new
With SCB_InvalidateDCache_by_Addr() (CORRECT):
──────────────────────────────────────────────────────────────
Time 0: USB DMA receives new packet → writes to buffer[0..31]
Time 1: DMA complete callback fires
Time 2: Call SCB_InvalidateDCache_by_Addr(buffer, 32)
→ CPU cache for buffer[0..31] is invalidated (marked stale)
Time 3: CPU reads USB RX buffer[0..31]
→ Cache miss → CPU loads fresh data from RAM → correct!
With SCB_CleanDCache_by_Addr() (for TX direction):
──────────────────────────────────────────────────────────────
Time 0: CPU fills USB TX buffer[0..31] with new data
→ data is in D-cache, MAY NOT be written to RAM yet
Time 1: Call SCB_CleanDCache_by_Addr(buffer, 32)
→ Forces cache to write pending data to RAM
Time 2: DMA reads buffer from RAM → gets correct data
Time 3: USB transmits correct data to host
Cortex-M4 and M3 Are Not Affected
The STM32F4 and STM32F3 (Cortex-M4) and the STM32F1 (Cortex-M3) do not have a D-cache, so all data accesses go directly to RAM without caching. Among the STM32 families used in this series, cache coherency is a Cortex-M7 issue (STM32H7, STM32F7). If you are developing on an STM32F4 and then port to STM32H7, you will encounter this problem for the first time during the port.
Solution 1: Non-Cached SRAM Region
The cleanest solution is to place all USB DMA buffers in a memory region that is not covered by the D-cache. On STM32H7, SRAM4 (0x38000000, 64 KB) is the ideal candidate — it is accessible by the USB DMA and is in the AHB bus fabric. Configure the MPU (Memory Protection Unit) to mark this region as non-cacheable:
// mpu_config.c — configure MPU for non-cached SRAM4 on STM32H7
#include "stm32h7xx_hal.h"
void MPU_Config(void)
{
MPU_Region_InitTypeDef mpu_init = {0};
// Disable MPU before reconfiguring
HAL_MPU_Disable();
// Configure SRAM4 (0x38000000, 64 KB) as non-cacheable
// All USB DMA buffers will be placed in this region
mpu_init.Enable = MPU_REGION_ENABLE;
mpu_init.BaseAddress = 0x38000000;
mpu_init.Size = MPU_REGION_SIZE_64KB;
mpu_init.AccessPermission = MPU_REGION_FULL_ACCESS;
mpu_init.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE; // with TEX=1, C=0: Normal, non-cacheable
mpu_init.IsCacheable = MPU_ACCESS_NOT_CACHEABLE; // KEY
mpu_init.IsShareable = MPU_ACCESS_SHAREABLE;
mpu_init.Number = MPU_REGION_NUMBER0;
mpu_init.TypeExtField = MPU_TEX_LEVEL1;
mpu_init.SubRegionDisable = 0x00;
mpu_init.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
HAL_MPU_ConfigRegion(&mpu_init);
// Re-enable MPU with privileged default memory map
HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
// Place USB DMA buffers in non-cached SRAM4
// Use the CFG_TUSB_MEM_SECTION attribute in tusb_config.h
// In tusb_config.h:
#define CFG_TUSB_MEM_SECTION __attribute__((section(".sram4_bss")))
#define CFG_TUSB_MEM_ALIGN __attribute__((aligned(32)))
// In your linker script (.ld file), add a matching section in SRAM4
// (assumes a memory region named SRAM4 is defined under MEMORY):
// .sram4_bss (NOLOAD) :
// {
// *(.sram4_bss)
// } >SRAM4
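An ARMv7-M MPU region cannot have an arbitrary size: it must be a power of two of at least 32 bytes, and the base address must be a multiple of the size. The hardware SIZE field encodes a region of 2^(SIZE+1) bytes. The sketch below uses two hypothetical helpers (not HAL functions) to check a candidate region on the host before it is committed to the MPU configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU: a region covers 2^(SIZE+1) bytes, minimum 32 bytes.
 * Returns the SIZE field value, or -1 if bytes is not a valid size. */
int mpu_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return -1;               /* not a power of two >= 32 */
    }
    int log2 = 0;
    while ((bytes >> log2) != 1u) {
        log2++;
    }
    return log2 - 1;
}

/* The region base address must be a multiple of the region size. */
bool mpu_base_is_aligned(uint32_t base, uint32_t bytes)
{
    return mpu_size_field(bytes) >= 0 && (base & (bytes - 1u)) == 0u;
}
```

For the 64 KB SRAM4 region at 0x38000000, mpu_size_field(65536) returns 15, which matches the value of the HAL constant MPU_REGION_SIZE_64KB, and the base address passes the alignment check. A 48 KB buffer area would be rejected and has to be rounded up to 64 KB or split into several regions.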
Solution 2: Manual Cache Maintenance
If placing buffers in SRAM4 is not possible (e.g., buffer is shared with other peripherals in cached SRAM1), add explicit cache maintenance calls around DMA operations:
// Manual cache maintenance for USB DMA buffers
// Buffer MUST be aligned to 32 bytes (D-cache line size on Cortex-M7)
// USB DMA buffer — aligned to 32 bytes for cache line operations
static uint8_t usb_rx_buf[64] __attribute__((aligned(32)));
static uint8_t usb_tx_buf[64] __attribute__((aligned(32)));
// Before starting a DMA receive (USB DMA will write to usb_rx_buf):
// Invalidate so that no dirty cache line covering the buffer can be
// written back over the data the DMA is about to deposit in RAM
void usb_prepare_dma_receive(void)
{
// Buffer is 32-byte aligned and a multiple of 32 bytes long
SCB_InvalidateDCache_by_Addr((uint32_t*)usb_rx_buf,
sizeof(usb_rx_buf));
}
// After DMA receive complete, before the CPU reads the buffer:
// invalidate again, because the CPU may have speculatively refilled
// cache lines from the buffer while the DMA transfer was in flight
void usb_dma_receive_complete_cb(void)
{
SCB_InvalidateDCache_by_Addr((uint32_t*)usb_rx_buf,
sizeof(usb_rx_buf));
// The next CPU access misses the cache and loads fresh data from RAM.
// Process usb_rx_buf safely here.
process_received_data(usb_rx_buf, sizeof(usb_rx_buf));
}
// Before starting a DMA transmit (USB DMA will read from usb_tx_buf):
// Clean the cache to flush pending writes from cache to RAM
void usb_prepare_dma_transmit(const uint8_t *data, uint32_t len)
{
if (len > sizeof(usb_tx_buf)) {
len = sizeof(usb_tx_buf); // never write past the DMA buffer
}
memcpy(usb_tx_buf, data, len); // requires <string.h>
// Force cache write-back to RAM before DMA reads
SCB_CleanDCache_by_Addr((uint32_t*)usb_tx_buf, sizeof(usb_tx_buf));
// Now start DMA: it will read fresh data from RAM
}
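The 32-byte alignment requirement exists because cache maintenance always operates on whole lines. If a buffer starts or ends in the middle of a line, invalidating it also discards whatever unrelated variables share that line. The sketch below computes the line-aligned range that a by-address operation really touches; cache_range_for and cache_range_is_exclusive are hypothetical host-side helpers, not CMSIS functions.

```c
#include <stdint.h>

#define DCACHE_LINE 32u   /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;   /* line-aligned start address */
    uint32_t  size;    /* multiple of DCACHE_LINE covering the buffer */
} cache_range_t;

/* Expand [addr, addr + len) outward to whole cache lines. */
cache_range_t cache_range_for(uintptr_t addr, uint32_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE - 1u);
    cache_range_t r;
    r.start = addr & mask;
    uintptr_t end = (addr + len + DCACHE_LINE - 1u) & mask;
    r.size = (uint32_t)(end - r.start);
    return r;
}

/* True when the buffer owns every cache line it touches, which is the
 * condition that makes invalidation safe for neighbouring data. */
int cache_range_is_exclusive(uintptr_t addr, uint32_t len)
{
    return (addr % DCACHE_LINE) == 0u && (len % DCACHE_LINE) == 0u;
}
```

A 40-byte buffer at address 0x24000010 touches two full lines (64 bytes starting at 0x24000000), so invalidating it would also throw away 24 bytes of neighbouring data. Declaring DMA buffers with aligned(32) and a size that is a multiple of 32 avoids this.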
STM32H7 USB + FreeRTOS Complete Example
This section brings everything together: a complete STM32H7 project with FreeRTOS + TinyUSB CDC, mutex-protected writes, a producer/consumer queue for receives, non-cached SRAM4 buffers, and the correct USB clock configuration.
tusb_config.h for STM32H7
// tusb_config.h — STM32H7 with FreeRTOS
#ifndef _TUSB_CONFIG_H_
#define _TUSB_CONFIG_H_
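// STM32H743 USB OTG_FS port (root hub port 0 in TinyUSB), internal PHY, 12 Mbit/s
// For an external ULPI PHY (480 Mbit/s), use OPT_MODE_HIGH_SPEED on the HS port
#define CFG_TUSB_MCU OPT_MCU_STM32H7
#define CFG_TUSB_RHPORT0_MODE (OPT_MODE_DEVICE | OPT_MODE_FULL_SPEED)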
// FreeRTOS RTOS backend — enables event-driven blocking in tud_task()
#define CFG_TUSB_OS CFG_TUSB_OS_FREERTOS
// Debug level: 0=off, 1=errors, 2=all transactions (set to 0 in production)
#define CFG_TUSB_DEBUG 0
// Memory placement: non-cached SRAM4 on STM32H7
// Linker script must have .sram4_bss section pointing to SRAM4
#define CFG_TUSB_MEM_SECTION __attribute__((section(".sram4_bss")))
#define CFG_TUSB_MEM_ALIGN __attribute__((aligned(32)))
// Enable CDC device class
#define CFG_TUD_CDC 1
#define CFG_TUD_CDC_RX_BUFSIZE 256
#define CFG_TUD_CDC_TX_BUFSIZE 256
// Disable unused classes
#define CFG_TUD_HID 0
#define CFG_TUD_MSC 0
#define CFG_TUD_MIDI 0
#define CFG_TUD_AUDIO 0
#define CFG_TUD_VENDOR 0
#endif // _TUSB_CONFIG_H_
STM32H7 USB Clock Configuration
// clock_config.c — STM32H7 USB clock from HSI48
// STM32H7 USB requires exactly 48 MHz for FS PHY.
// Source options: HSI48 (internal 48 MHz RC) or PLL3Q (from external HSE).
// HSI48 is the simpler option for most projects.
void SystemClock_Config(void)
{
RCC_OscInitTypeDef osc = {0};
RCC_ClkInitTypeDef clk = {0};
RCC_PeriphClkInitTypeDef periph = {0};
// Enable HSI48 for USB clock
osc.OscillatorType = RCC_OSCILLATORTYPE_HSI48 | RCC_OSCILLATORTYPE_HSE;
osc.HSI48State = RCC_HSI48_ON;
osc.HSEState = RCC_HSE_ON; // 25 MHz crystal for main clock
osc.PLL.PLLState = RCC_PLL_ON;
osc.PLL.PLLSource = RCC_PLLSOURCE_HSE;
osc.PLL.PLLM = 5; // 25 MHz / 5 = 5 MHz VCO input
osc.PLL.PLLN = 160; // × 160 = 800 MHz VCO output
osc.PLL.PLLP = 2; // / 2 = 400 MHz system clock
osc.PLL.PLLQ = 4; // / 4 = 200 MHz (not used for USB)
osc.PLL.PLLR = 2; // / 2 = 400 MHz
HAL_RCC_OscConfig(&osc);
// Configure USB clock: use HSI48 directly → 48 MHz
periph.PeriphClockSelection = RCC_PERIPHCLK_USB;
periph.UsbClockSelection = RCC_USBCLKSOURCE_HSI48;
HAL_RCCEx_PeriphCLKConfig(&periph);
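// Note: TinyUSB does not enable the USB peripheral clock. Board code must
// do that before tusb_init(), for example with the HAL macro
// __HAL_RCC_USB_OTG_FS_CLK_ENABLE(). STM32H7 uses TinyUSB's dwc2 driver.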
}
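The divider comments in the listing can be checked with a few lines of arithmetic. The sketch below is a hypothetical host-side helper (pll_compute is not a HAL function) that follows the chain source / M × N / {P, Q, R} and assumes all dividers are non-zero.

```c
#include <stdint.h>

typedef struct {
    uint32_t ref_hz;   /* PLL reference input, after the /M divider */
    uint32_t vco_hz;   /* VCO output, after the xN multiplier */
    uint32_t p_hz;     /* P output */
    uint32_t q_hz;     /* Q output */
    uint32_t r_hz;     /* R output */
} pll_freqs_t;

/* Compute PLL frequencies. All dividers are assumed to be non-zero. */
pll_freqs_t pll_compute(uint32_t src_hz, uint32_t m, uint32_t n,
                        uint32_t p, uint32_t q, uint32_t r)
{
    pll_freqs_t f;
    f.ref_hz = src_hz / m;
    f.vco_hz = f.ref_hz * n;
    f.p_hz = f.vco_hz / p;
    f.q_hz = f.vco_hz / q;
    f.r_hz = f.vco_hz / r;
    return f;
}

/* A full-speed USB PHY needs exactly 48 MHz. */
int is_usb_clock(uint32_t hz)
{
    return hz == 48000000u;
}
```

With a 25 MHz source and M=5, N=160, P=2, Q=4, R=2, the reference is 5 MHz, the VCO runs at 800 MHz, P and R give 400 MHz and Q gives 200 MHz. None of these is 48 MHz, which is why the listing routes HSI48 to the USB peripheral instead of a PLL output.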
Main Application with FreeRTOS Tasks
// main.c — STM32H7 + FreeRTOS + TinyUSB CDC echo example
#include "stm32h7xx_hal.h"
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"
#include "tusb.h"
#include "cdc_thread_safe.h"
#include "cdc_rx_queue.h"
// ── DMA-safe buffer in non-cached SRAM4 ──────────────────────────────────
// TinyUSB places its internal USB buffers here via CFG_TUSB_MEM_SECTION.
// Application buffers that directly interface with DMA should also go here.
static uint8_t usb_echo_buf[64] __attribute__((section(".sram4_bss"),
aligned(32)));
// ── Forward declarations ──────────────────────────────────────────────────
static void usb_device_task(void *param);
static void app_echo_task(void *param);
// ── Main entry point ─────────────────────────────────────────────────────
int main(void)
{
HAL_Init();
HAL_PWREx_EnableUSBVoltageDetector(); // required on STM32H7, see the gotcha note below
SystemClock_Config(); // configures HSI48 for USB
MPU_Config(); // marks SRAM4 as non-cacheable
// Initialize CDC mutex and receive queue BEFORE creating tasks
cdc_ts_init();
cdc_rx_queue_init();
// Create USB task at highest user priority
xTaskCreate(usb_device_task, "usb", 512, NULL,
configMAX_PRIORITIES - 1, NULL);
// Create echo application task at lower priority
xTaskCreate(app_echo_task, "echo", 256, NULL,
tskIDLE_PRIORITY + 2, NULL);
// Start FreeRTOS scheduler — this function does not return
vTaskStartScheduler();
while (1) { __NOP(); }
}
// ── USB device task ───────────────────────────────────────────────────────
static void usb_device_task(void *param)
{
(void)param;
tusb_init();
while (1) {
// tud_task() blocks internally on the USB event queue (FreeRTOS mode).
// It wakes up when the USB ISR posts an event.
// No vTaskDelay needed — adding one reduces responsiveness.
tud_task();
}
}
// ── CDC echo application task ─────────────────────────────────────────────
static void app_echo_task(void *param)
{
(void)param;
cdc_rx_packet_t pkt;
while (1) {
// Block waiting for CDC data (500 ms timeout)
if (cdc_rx_queue_receive(&pkt, pdMS_TO_TICKS(500)) == pdTRUE) {
// Echo received data back to host (thread-safe)
cdc_ts_write(pkt.data, pkt.len);
} else {
// Timeout — send a heartbeat to show we're alive
if (tud_cdc_connected()) {
cdc_ts_printf("[heartbeat]\r\n");
}
}
}
}
Common STM32H7 USB Gotcha: On STM32H7, the USB transceiver is powered from a dedicated 3.3 V supply (VDD33USB) that is monitored by a USB voltage detector. You must enable the detector with HAL_PWREx_EnableUSBVoltageDetector() after HAL_Init(). Without this, the USB PHY does not power up and the device is completely invisible to the host. This is a board bring-up gotcha specific to STM32H7 and is not required on STM32F4.
TinyUSB RTOS Hooks
TinyUSB's RTOS abstraction layer defines a set of functions that TinyUSB calls internally for synchronisation and timing. Understanding these helps you diagnose problems and customise behaviour.
Key OSAL Functions (FreeRTOS Mapping)
| TinyUSB OSAL Function | FreeRTOS Implementation | Called When |
| --- | --- | --- |
| osal_task_delay(ms) | vTaskDelay(ms / portTICK_PERIOD_MS) | TinyUSB needs to wait without blocking CPU |
| osal_queue_create(depth, size) | xQueueCreate(depth, size) | TinyUSB initialises its internal event queue |
| osal_queue_send(queue, data, in_isr) | xQueueSendFromISR() or xQueueSend() | USB ISR posts event to USB task queue |
| osal_queue_receive(queue, data, ms) | xQueueReceive(queue, data, pdMS_TO_TICKS(ms)) | USB task waits for next event (blocks here when idle) |
| osal_semaphore_create() | xSemaphoreCreateBinary() | TinyUSB creates sync semaphores for endpoint transfers |
| osal_semaphore_post(sem, in_isr) | xSemaphoreGiveFromISR() or xSemaphoreGive() | Transfer complete: signal waiting task |
| osal_semaphore_wait(sem, ms) | xSemaphoreTake(sem, pdMS_TO_TICKS(ms)) | Wait for endpoint transfer completion |
What Happens Without CFG_TUSB_OS (Bare-Metal Mode in RTOS)
If you forget to set CFG_TUSB_OS CFG_TUSB_OS_FREERTOS and leave it at the default CFG_TUSB_OS_NONE, TinyUSB uses a busy-wait loop for all synchronisation. The tud_task() call never blocks — it polls continuously and returns immediately when no events are pending. In a FreeRTOS system, this means:
- The USB task runs at 100% CPU utilisation, starving every other task
- The FreeRTOS idle task never runs → FreeRTOS tickless idle does not work → no power saving
- Lower-priority tasks get almost no CPU time → application appears frozen
- Watchdog timeouts if you have a watchdog that requires the idle task to run
Debugging Tip: If your FreeRTOS application appears to freeze after USB is connected, and you have no other explanation, check CFG_TUSB_OS first. Setting it to CFG_TUSB_OS_NONE while running FreeRTOS is the second most common TinyUSB + FreeRTOS mistake (after incorrect task priority).
Handling USB Suspend/Resume with RTOS
USB suspend happens when the bus has been idle, with no SOF (Start-of-Frame) packets from the host, for more than 3 ms. This typically occurs when the computer sleeps or when the host driver selectively suspends an idle device to save power. The USB specification requires bus-powered devices to draw no more than 2.5 mA during suspend.
TinyUSB Suspend and Resume Callbacks
// tusb_suspend_cb.c — suspend/resume with FreeRTOS task notifications
#include "tusb.h"
#include "FreeRTOS.h"
#include "task.h"
#include "event_groups.h"
// Event group for USB state changes
// Bit 0: USB suspended
// Bit 1: USB resumed / connected
static EventGroupHandle_t s_usb_events = NULL;
#define USB_EVENT_SUSPENDED (1 << 0)
#define USB_EVENT_RESUMED (1 << 1)
void usb_events_init(void)
{
s_usb_events = xEventGroupCreate();
configASSERT(s_usb_events != NULL);
}
// Called by TinyUSB when host stops sending SOF for >3 ms
void tud_suspend_cb(bool remote_wakeup_en)
{
// remote_wakeup_en: true if host allowed device to request wakeup
(void)remote_wakeup_en;
// Signal all tasks that USB is suspended
xEventGroupSetBits(s_usb_events, USB_EVENT_SUSPENDED);
xEventGroupClearBits(s_usb_events, USB_EVENT_RESUMED);
}
// Called by TinyUSB when host resumes the bus (SOF packets restart)
void tud_resume_cb(void)
{
// Signal all tasks that USB has resumed
xEventGroupSetBits(s_usb_events, USB_EVENT_RESUMED);
xEventGroupClearBits(s_usb_events, USB_EVENT_SUSPENDED);
}
// Application task — respects USB suspend to save power
static void app_echo_task_with_suspend(void *param)
{
(void)param;
cdc_rx_packet_t pkt;
while (1) {
// Check if USB is suspended
EventBits_t bits = xEventGroupGetBits(s_usb_events);
if (bits & USB_EVENT_SUSPENDED) {
// USB is suspended — pause this task to save CPU
// Wait until USB resumes (block with no timeout)
xEventGroupWaitBits(s_usb_events,
USB_EVENT_RESUMED,
pdFALSE, // Don't clear on exit
pdTRUE, // Wait for all bits
portMAX_DELAY);
}
// USB is active — process data
if (cdc_rx_queue_receive(&pkt, pdMS_TO_TICKS(100)) == pdTRUE) {
cdc_ts_write(pkt.data, pkt.len);
}
}
}
STM32 Stop Mode During USB Suspend
For battery-powered applications, entering STM32 Stop mode while USB is suspended dramatically reduces current consumption. The challenge is ensuring the USB controller can wake the MCU when the host resumes:
// Low-power suspend handling on STM32 with FreeRTOS.
// Enter Stop mode only when USB is suspended AND no application task
// needs CPU time. The USB wakeup interrupt brings the MCU out of Stop.
// This version replaces the simpler tud_suspend_cb() shown earlier.
void tud_suspend_cb(bool remote_wakeup_en)
{
    (void)remote_wakeup_en;

    // Enable the USB wakeup interrupt so the MCU can exit Stop mode.
    // This is STM32-specific; the exact macro depends on the MCU family.
    __HAL_USB_OTG_FS_WAKEUP_EXTI_ENABLE_IT();

    // Signal FreeRTOS tasks to pause (via event group).
    // This callback runs in USB task context, not in an ISR, so the
    // regular event group API is used rather than the FromISR variant.
    xEventGroupClearBits(s_usb_events, USB_EVENT_RESUMED);
    xEventGroupSetBits(s_usb_events, USB_EVENT_SUSPENDED);

    // Note: actual entry to Stop mode should happen in a low-priority
    // task or in the configPRE_SLEEP_PROCESSING() hook, not directly here.
}

// FreeRTOS idle hook (requires configUSE_IDLE_HOOK = 1): called repeatedly
// by the idle task whenever no other task is ready to run.
// It must never block; xEventGroupGetBits() only reads the current bits.
void vApplicationIdleHook(void)
{
    EventBits_t bits = xEventGroupGetBits(s_usb_events);
    if (bits & USB_EVENT_SUSPENDED) {
        // All tasks are idle AND USB is suspended:
        // safe to enter Stop mode
        // HAL_PWR_EnterSTOPMode(PWR_MAINREGULATOR_ON, PWR_STOPENTRY_WFI);
        // Note: this is a simplified illustration; actual Stop mode
        // requires careful handling of peripheral clocks on wakeup.
        __WFI(); // At minimum, use Wait-For-Interrupt
    }
}
Practical Exercises
Exercise 1
Beginner
Priority Starvation Demonstration
Build a TinyUSB CDC device on an STM32F4 (no cache issues) with FreeRTOS. Create two scenarios and measure the difference: (a) USB task at priority 1, application task at priority 5 — run a CDC throughput test and measure KB/s; then observe what happens to CDC communication when the application task runs a busy loop. (b) USB task at priority 6 (highest), application task at priority 2. Repeat the throughput test. The difference in throughput and reliability demonstrates why USB priority must be highest. Document the exact priority configuration, measured throughputs, and any error counts observed in each scenario.
Tags: Task Priority, FreeRTOS, CDC Throughput
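Exercise 1 asks for throughput in KB/s. A minimal sketch of one way to compute that figure on the device without floating point is shown here; it assumes you already count transferred bytes and measure elapsed time in milliseconds (for example from the FreeRTOS tick count). The helper name `throughput_kib_per_s` is hypothetical and not part of TinyUSB or FreeRTOS.

```c
#include <stdint.h>

// Hypothetical helper: converts a byte count and an elapsed time in
// milliseconds into KiB per second (1 KiB = 1024 bytes).
// Returns 0 when no time has elapsed, to avoid dividing by zero.
static uint32_t throughput_kib_per_s(uint32_t bytes, uint32_t elapsed_ms)
{
    if (elapsed_ms == 0u) {
        return 0u;
    }
    // Widen to 64 bits so bytes * 1000 cannot overflow on long test runs.
    uint64_t bytes_per_s = ((uint64_t)bytes * 1000u) / elapsed_ms;
    return (uint32_t)(bytes_per_s / 1024u);
}
```

Using the same helper in both scenarios keeps the two measurements directly comparable; take the byte count and the timestamps from the same task so the busy loop in scenario (a) cannot skew only one of them.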
Exercise 2
Intermediate
Thread-Safe CDC with Race Condition Detection
Build a FreeRTOS project with 3 tasks all writing to CDC simultaneously: Task A writes "AAAA...A\r\n" (64 A's), Task B writes "BBBB...B\r\n" (64 B's), Task C writes "CCCC...C\r\n" (64 C's). First run WITHOUT a mutex and capture the output — observe interleaving and corruption. Then add the mutex-protected CDC write wrapper and run again — verify that lines are now complete and never interleaved. Use a Python script on the host to count complete lines vs broken lines and calculate the error rate for each scenario. Log your findings with uxTaskGetStackHighWaterMark() for each task to verify stack depth.
Tags: Thread Safety, Mutex, Race Condition
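The line check that Exercise 2 describes can be sketched as follows. The exercise suggests a Python script on the host; the same logic is shown in C here to match the rest of this article's code, and it ports directly. It assumes the captured output is already in memory, and the names `line_is_intact`, `count_lines`, and `line_stats_t` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define LINE_PAYLOAD_LEN 64u  // Exercise 2 writes 64 identical characters per line

typedef struct {
    unsigned intact;  // lines made of exactly 64 copies of one task letter
    unsigned broken;  // everything else: interleaved, truncated, or too long
} line_stats_t;

// Returns true if `line` (without its trailing "\r\n") is exactly
// 64 copies of a single task letter 'A', 'B', or 'C'.
static bool line_is_intact(const char *line, size_t len)
{
    if (len != LINE_PAYLOAD_LEN) {
        return false;
    }
    char first = line[0];
    if (first != 'A' && first != 'B' && first != 'C') {
        return false;
    }
    for (size_t i = 1; i < len; i++) {
        if (line[i] != first) {
            return false;
        }
    }
    return true;
}

// Splits a captured buffer on "\r\n" and tallies intact vs broken lines.
// A trailing partial line with no terminator is ignored.
static line_stats_t count_lines(const char *buf, size_t len)
{
    line_stats_t stats = {0, 0};
    size_t start = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            if (line_is_intact(buf + start, i - start)) {
                stats.intact++;
            } else {
                stats.broken++;
            }
            start = i + 2;
            i++;  // skip the '\n' of the terminator just consumed
        }
    }
    return stats;
}
```

The error rate for each scenario is then `broken / (intact + broken)`. With the mutex in place, `broken` should stay at zero for the whole capture.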
Exercise 3
Advanced
DMA Cache Coherency on STM32H7
This exercise requires an STM32H7 Nucleo or Discovery board with USB OTG_FS. Build a CDC loopback device that echoes every received byte. First, place the USB receive buffer in normal cached SRAM (e.g., AXI SRAM at 0x24000000) and observe data corruption: send 1 KB of sequential byte values (0x00, 0x01, 0x02...) and compare the sent data with the echoed data. You should see corruption at 32-byte boundaries, which matches the Cortex-M7 data cache line size. Then move the buffer to SRAM4 (0x38000000) with the MPU configured as non-cacheable and repeat; the corruption should disappear. Finally, move the buffer back to AXI SRAM but call SCB_InvalidateDCache_by_Addr() on it after each DMA receive completes and before the CPU reads the data, then verify that this also eliminates corruption. Document the exact corruption pattern, the memory addresses involved, and which fix you would use in a production design and why.
Tags: STM32H7, DMA Cache, Cortex-M7
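Two small helpers can support Exercise 3; both are sketches under stated assumptions rather than part of TinyUSB or CMSIS. The first expands an arbitrary buffer range to whole 32-byte cache lines, which is the range you would pass to SCB_InvalidateDCache_by_Addr() or SCB_CleanDCache_by_Addr(). The second finds the first differing byte between sent and echoed data, so you can check whether the corruption starts on a cache-line boundary. The names `cache_align_range`, `cache_range_t`, and `first_mismatch` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  // Cortex-M7 data cache line size in bytes

typedef struct {
    uintptr_t addr;  // start address, rounded down to a cache line
    size_t    size;  // length in bytes, a multiple of the cache line size
} cache_range_t;

// Expands [addr, addr + len) outwards to whole cache lines.
// Caution: if the buffer shares a cache line with other variables,
// invalidating the widened range also discards their cached contents,
// so DMA buffers should themselves be 32-byte aligned and padded.
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    uintptr_t start = addr & ~mask;
    uintptr_t end   = (addr + len + mask) & ~mask;
    cache_range_t range = { start, (size_t)(end - start) };
    return range;
}

// Returns the index of the first byte that differs between the two
// buffers, or `len` if they are identical.
static size_t first_mismatch(const uint8_t *sent, const uint8_t *echoed,
                             size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (sent[i] != echoed[i]) {
            return i;
        }
    }
    return len;
}
```

If `first_mismatch()` keeps returning offsets that are multiples of `DCACHE_LINE_SIZE` relative to the buffer start, that is strong evidence of a stale cache line rather than a protocol or application bug.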
Conclusion & Next Steps
Integrating TinyUSB with FreeRTOS is straightforward when you follow the design rules correctly. The key points from this part:
- Set CFG_TUSB_OS to OPT_OS_FREERTOS: Without this, TinyUSB busy-waits and starves every other FreeRTOS task. This single configuration change transforms TinyUSB from a CPU-wasting poller into an efficient event-driven task.
- USB task at highest user priority: The USB task must be able to preempt all application tasks. A USB host that gets no response within its timeout window will reset the bus — causing the device to re-enumerate, which appears as random disconnects.
- Use xSemaphoreCreateMutex() for CDC writes: Binary semaphores do not have priority inheritance. Only mutex-type semaphores protect against priority inversion between the high-priority USB task and low-priority application tasks that hold the write lock.
- Producer/consumer queue for receives: Never block the USB task in a receive callback. Post to a FreeRTOS queue with timeout zero — if the queue is full, drop the packet gracefully. A separate consumer task processes received data at its own pace.
- DMA cache coherency on Cortex-M7: Place USB DMA buffers in non-cached SRAM4 on STM32H7 (0x38000000). If you must use cached SRAM, call SCB_CleanDCache_by_Addr() before each DMA transmit and SCB_InvalidateDCache_by_Addr() before the CPU reads each DMA-received buffer. Forgetting this causes silent data corruption that is extremely difficult to debug.
- Handle suspend with event groups: Use xEventGroupSetBits() in tud_suspend_cb() to notify application tasks to pause, and tud_resume_cb() to resume them. This enables proper power management and prevents application tasks from timing out while the host is asleep.
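The priority rule in the list above can be checked at compile time, so that a later change to the task table cannot quietly put an application task above USB. This is a minimal sketch: the priority macro names are hypothetical, and the value given for configMAX_PRIORITIES is an assumption that would normally come from your own FreeRTOSConfig.h.

```c
// Assumed value for illustration only; in a real project this macro
// is defined in FreeRTOSConfig.h and must not be redefined here.
#define configMAX_PRIORITIES    7

// Hypothetical priority names for this sketch.
#define USB_TASK_PRIORITY       (configMAX_PRIORITIES - 1)  // highest usable priority
#define APP_ECHO_TASK_PRIORITY  2
#define APP_LOG_TASK_PRIORITY   1

// FreeRTOS priorities run from 0 to configMAX_PRIORITIES - 1.
_Static_assert(USB_TASK_PRIORITY < configMAX_PRIORITIES,
               "USB task priority is out of range");

// The USB task must be able to preempt every application task.
_Static_assert(USB_TASK_PRIORITY > APP_ECHO_TASK_PRIORITY,
               "USB task must outrank the echo task");
_Static_assert(USB_TASK_PRIORITY > APP_LOG_TASK_PRIORITY,
               "USB task must outrank the log task");
```

Keeping the checks next to the priority definitions means the build fails with a readable message as soon as someone raises an application task to the USB task's level.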
Next in the Series
In Part 12: Advanced USB Topics, we venture into USB host mode, OTG (On-The-Go) dual-role operation on STM32, isochronous transfers for USB audio, the UAC (USB Audio Class) descriptor hierarchy, and USB video class. These topics require a solid understanding of everything covered in Parts 1–11, the foundation you have built through this series.