
USB Part 12: Advanced Topics

March 31, 2026 Wasil Zafar ~18 min read

Go beyond standard device classes — implement USB Audio Class 2.0, design a DFU bootloader, configure OTG host mode, manage hub topologies, master suspend/resume power management, and understand USB PD and SuperSpeed architectures.

Table of Contents

  1. USB Audio Class 2.0
  2. Isochronous Transfers Deep Dive
  3. DFU Bootloader Design
  4. STM32 Built-in DFU Bootloader
  5. USB OTG Host Mode
  6. USB Hub Support
  7. Suspend & Remote Wakeup
  8. USB Power Delivery
  9. USB 3.x SuperSpeed Overview
  10. Exercises
  11. Advanced Config Generator
  12. Conclusion & Next Steps
Series Context: This is Part 12 of 17 in the USB Development Mastery series. Parts 1–11 covered USB fundamentals through RTOS integration. Here we tackle advanced features — UAC2 audio, DFU bootloaders, OTG host mode, hub support, and USB PD — that appear in production embedded systems.

USB Audio Class 2.0 (UAC2)

USB Audio Class 2.0 is the specification that enables professional-grade audio over USB: 24-bit/192 kHz stereo, low-latency monitoring, and multi-channel configurations that the original UAC1 specification simply cannot support. Understanding why UAC2 demands High Speed USB and how its audio function topology works is essential before attempting any implementation.

Why UAC2 Requires High Speed

USB Audio Class 1.0 was designed in the USB 1.1 era and is limited to Full Speed (12 Mbit/s). The practical audio ceiling for UAC1 is approximately 24-bit/96 kHz stereo, roughly 4.6 Mbit/s, which still fits within FS isochronous bandwidth (one packet of at most 1023 bytes per 1 ms frame). UAC2 targets professional audio requirements:

| Configuration | Bit Depth | Sample Rate | Channels | Bandwidth Required | USB Speed Needed |
|---|---|---|---|---|---|
| CD Quality | 16-bit | 44.1 kHz | 2 | ~1.4 Mbit/s | Full Speed (UAC1) |
| Studio Standard | 24-bit | 96 kHz | 2 | ~4.6 Mbit/s | Full Speed (UAC1 limit) |
| High-Res Audio | 24-bit | 192 kHz | 2 | ~9.2 Mbit/s | High Speed (UAC2) |
| Multi-Channel Studio | 24-bit | 96 kHz | 8 | ~18.4 Mbit/s | High Speed (UAC2) |
| Pro Audio Interface | 32-bit float | 192 kHz | 16 | ~98 Mbit/s | High Speed (UAC2) |
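
The bandwidth column is plain arithmetic: bits per sample × sample rate × channels. The sketch below reproduces it and adds a rough Full Speed feasibility check. The function names are illustrative, and the check assumes one isochronous packet per 1 ms frame with one spare sample of headroom; it ignores protocol overhead and bus sharing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full Speed allows one isochronous packet of at most 1023 bytes per 1 ms frame. */
#define FS_ISO_MAX_BYTES_PER_FRAME 1023u

/* Raw PCM payload rate in bits per second (no USB protocol overhead). */
static uint64_t pcm_bitrate_bps(uint32_t bits_per_sample,
                                uint32_t sample_rate_hz,
                                uint32_t channels)
{
    return (uint64_t)bits_per_sample * sample_rate_hz * channels;
}

/* True if one frame of audio, plus one spare sample per channel,
 * fits into a single Full Speed isochronous packet. */
static bool fits_full_speed_iso(uint32_t bits_per_sample,
                                uint32_t sample_rate_hz,
                                uint32_t channels)
{
    uint32_t bytes_per_sample  = (bits_per_sample + 7u) / 8u;
    uint32_t samples_per_frame = sample_rate_hz / 1000u + 1u;
    uint64_t bytes_per_frame   = (uint64_t)samples_per_frame * bytes_per_sample * channels;
    return bytes_per_frame <= FS_ISO_MAX_BYTES_PER_FRAME;
}
```

With these definitions, 24-bit/96 kHz stereo needs 582 bytes per frame and fits, while 24-bit/192 kHz stereo needs 1158 bytes and does not, which matches the Full Speed/High Speed split in the table.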

Audio Function Topology

UAC2 models the audio signal path through a topology of Units and Terminals described in class-specific descriptors. The minimal playback topology is: Clock Source Entity → Input Terminal (USB streaming, receiving audio from the host) → Feature Unit (volume/mute controls) → Output Terminal (speaker/line out). Each entity has a numeric ID that downstream entities reference.

/* UAC2 Clock Source descriptor (ID = 1) */
/* bmAttributes: internal fixed clock */
/* Sampling frequency (48000 or 96000 Hz) is read by the host with CUR/RANGE */
/* requests on the Clock Source's CS_SAM_FREQ_CONTROL                         */

/* UAC2 Input Terminal descriptor (ID = 2) — USB streaming */
/* wTerminalType: 0x0101 = USB streaming */
/* bCSourceID: 1 (references Clock Source above) */
/* bNrChannels: 2 (stereo) */
/* bmChannelConfig: left front + right front */

/* UAC2 Feature Unit descriptor (ID = 3) */
/* bSourceID: 2 (Input Terminal) */
/* Controls: bmaControls[0] = volume + mute on master channel */
/* Controls: bmaControls[1] = volume on left, bmaControls[2] = volume on right */

/* UAC2 Output Terminal descriptor (ID = 4): analog headphone out */
/* wTerminalType: 0x0302 = headphones */
/* bSourceID: 3 (Feature Unit) */
/* bCSourceID: 1 (Clock Source) */

/* TinyUSB UAC2 descriptor macro */
#define TUD_AUDIO_HEADSET_STEREO_DESCRIPTOR(itf, ...)         \
  /* Standard Interface Association Descriptor */             \
  TUD_AUDIO_DESC_IAD(itf, 3, AUDIO_FUNC_PROTOCOL_CODE_V2),   \
  /* AudioControl Interface */                                 \
  TUD_AUDIO_DESC_STD_AC(itf, 0, 0),                          \
  /* Class-Specific AudioControl Interface */                  \
  TUD_AUDIO_DESC_CS_AC(9+8+17+6+12+9+9+9+12,                 \
      AUDIO_FUNC_PROTOCOL_CODE_V2, 1, 1),                     \
  /* Clock Source: ID=1, internal fixed */                    \
  TUD_AUDIO_DESC_CLK_SRC(1, 1, 0, 0, 0),                     \
  /* Input Terminal: ID=2, USB streaming */                   \
  TUD_AUDIO_DESC_INPUT_TERM(2, AUDIO_TERM_TYPE_USB_STREAMING, \
      0, 1, 2, AUDIO_CHANNEL_CONFIG_NON_PREDEFINED, 0, 0, 0, 0), \
  /* Feature Unit: ID=3 */                                    \
  TUD_AUDIO_DESC_FEATURE_UNIT_TWO_CHANNEL(3, 2, 3, 3, 3, 0), \
  /* Output Terminal: ID=4, headphones */                     \
  TUD_AUDIO_DESC_OUTPUT_TERM(4, AUDIO_TERM_TYPE_OUT_HEADPHONES, 0, 3, 1, 0, 0)

Interface Alternate Settings

UAC2 audio streaming uses alternate settings on the streaming interface. Alternate setting 0 carries no data and reserves zero isochronous bandwidth — the host switches to alt setting 0 when no audio application is active. Alternate setting 1 activates the isochronous endpoint with the full bandwidth reservation. This design prevents USB from reserving isochronous bandwidth on every USB frame when audio playback is idle.

/* Alternate Setting 0: zero-bandwidth (no isochronous endpoint) */
/* The host selects this when no audio app is running.           */
/* No endpoint descriptor is present — bandwidth = 0.           */

/* Alternate Setting 1: active streaming */
/* For a 1 ms service interval:                                             */
/* wMaxPacketSize = (sample_rate / 1000 + 1) * bytes_per_sample * channels  */
/* 48 kHz / 24-bit stereo: (48 + 1) * 3 * 2 = 294 bytes max                 */
/* 96 kHz / 24-bit stereo: (96 + 1) * 3 * 2 = 582 bytes max                 */
/* 588 (2 x 294) covers both rates with a little headroom                   */

tusb_desc_endpoint_t iso_ep = {
    .bLength          = sizeof(tusb_desc_endpoint_t),
    .bDescriptorType  = TUSB_DESC_ENDPOINT,
    .bEndpointAddress = 0x01,          /* EP1 OUT (playback) */
    .bmAttributes = {
        .xfer          = TUSB_XFER_ISOCHRONOUS,
        .sync          = 0x01,         /* Asynchronous */
        .usage         = 0x00,         /* Data endpoint */
    },
    .wMaxPacketSize   = 588,           /* HS: up to 1024 bytes per transaction */
    .bInterval        = 4,             /* HS: 2^(4-1) = 8 microframes = 1 ms, matching the sizing above */
};
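
The packet-size formula in the comments above generalises to any service interval. The helper below is a small sketch of that arithmetic; the function name and the `intervals_per_second` parameter are illustrative (1000 for a 1 ms interval, 8000 for one packet per 125 µs microframe).

```c
#include <stdint.h>

/* Worst-case isochronous payload for one service interval:
 * nominal samples per interval (integer part) plus one spare sample,
 * multiplied by the size of one sample frame across all channels. */
static uint32_t iso_max_packet_bytes(uint32_t sample_rate_hz,
                                     uint32_t bytes_per_sample,
                                     uint32_t channels,
                                     uint32_t intervals_per_second)
{
    uint32_t samples = sample_rate_hz / intervals_per_second + 1u;
    return samples * bytes_per_sample * channels;
}
```

For 48 kHz 24-bit stereo this yields 294 bytes per 1 ms interval but only 42 bytes if the endpoint is serviced every microframe, so wMaxPacketSize and bInterval have to be chosen together.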

Feedback Endpoint for Clock Synchronisation

Asynchronous UAC2 devices maintain their own audio clock (typically a crystal oscillator or crystal-referenced PLL) independent of the USB SOF clock. Since the USB host pushes samples at a rate derived from its own clock, the device must inform the host of its actual consumption rate via a feedback endpoint. The feedback endpoint is an isochronous IN endpoint (direction: device to host) that reports the current sample rate as a fixed-point number: 10.14 format in samples per 1 ms frame at Full Speed, or 16.16 format in samples per 125 µs microframe at High Speed.

If the device buffer is filling up (host sending too fast), the feedback value decreases slightly, causing the host to send fewer samples per microframe. If the buffer is draining too quickly, the feedback increases. This closed-loop system prevents buffer overflow and underflow without requiring the device to synthesise audio at precisely the host's rate.
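
The closed loop described above can be sketched as a proportional controller around the nominal rate. This is a minimal illustration for High Speed (16.16 format, samples per microframe), not a tuned implementation: the function names, the gain parameter, and the clamp of one sample per microframe either side of nominal are all assumptions made for the example.

```c
#include <stdint.h>

#define HS_MICROFRAMES_PER_SECOND 8000u

/* Nominal feedback value: samples per microframe in unsigned 16.16 fixed point. */
static uint32_t uac2_nominal_feedback(uint32_t sample_rate_hz)
{
    return (uint32_t)(((uint64_t)sample_rate_hz << 16) / HS_MICROFRAMES_PER_SECOND);
}

/* Proportional correction: a buffer fuller than the target lowers the
 * reported rate, an emptier buffer raises it. gain_lsb_per_sample is the
 * number of 16.16 LSBs applied per sample of fill error (hypothetical knob). */
static uint32_t uac2_feedback_from_fill(uint32_t sample_rate_hz,
                                        int32_t fill_samples,
                                        int32_t target_samples,
                                        uint32_t gain_lsb_per_sample)
{
    int64_t nominal = (int64_t)uac2_nominal_feedback(sample_rate_hz);
    int64_t error   = (int64_t)fill_samples - (int64_t)target_samples;
    int64_t fb      = nominal - error * (int64_t)gain_lsb_per_sample;

    /* Clamp to one sample per microframe either side of nominal. */
    int64_t lo = nominal - 0x10000;
    int64_t hi = nominal + 0x10000;
    if (fb < lo) fb = lo;
    if (fb > hi) fb = hi;
    return (uint32_t)fb;
}
```

A real device would normally measure its audio clock against SOF as well, and low-pass filter the result, rather than rely on buffer fill alone.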

TinyUSB UAC2 Status: As of TinyUSB 0.16, UAC2 support is present but marked experimental. Full asynchronous feedback endpoint support requires careful integration. Synchronous mode (device slaves its clock to USB SOF) is simpler to implement but introduces jitter. For production USB audio products, evaluate the AudioClass library for STM32 or consider an MCU with High Speed USB and I2S (e.g., Microchip SAM E70).

Isochronous Transfers Deep Dive

Isochronous transfers are the most misunderstood USB transfer type. They provide guaranteed bandwidth reservation — the USB host commits to allocating a fixed slice of each frame/microframe to the endpoint, regardless of whether the device has data. The trade-off is that there is no retry on error: if a packet is lost due to line noise or a CRC failure, it is simply gone. For audio and video, a dropped sample is far preferable to the variable latency that retransmission would introduce.

Bandwidth Reservation Model

Each USB frame (FS: 1 ms) or microframe (HS: 125 µs) has a finite bandwidth budget. Isochronous endpoints reserve their share at SET_INTERFACE time — when the host switches to alternate setting 1, the bandwidth is committed and unavailable to other devices. This is why a USB hub with a high-bandwidth isochronous device (like a 4K webcam) may show degraded throughput for bulk transfers on other ports.

| USB Speed | Frame/Microframe | Max Iso Packet Size | Max Transactions per (Micro)frame | Max Bandwidth |
|---|---|---|---|---|
| Full Speed | 1 ms frame | 1023 bytes | 1 | ~1 MB/s |
| High Speed | 125 µs microframe | 1024 bytes | 3 (high-bandwidth) | ~3 × 1024 × 8000 = 24.6 MB/s |
| HS High-BW Iso | 125 µs microframe | 3072 bytes (3×1024) | 3 | ~24.6 MB/s per endpoint |

High-bandwidth isochronous endpoints advertise their multi-transaction capability in the upper bits of wMaxPacketSize: bits 12:11 encode (transactions per microframe − 1), allowing 1, 2, or 3 transactions per 125 µs microframe.
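
The bit layout just described is easy to get wrong by one, so here is a small encode/decode sketch. The helper names are illustrative; the layout assumed is bits 10:0 for the per-transaction size and bits 12:11 for the additional transactions.

```c
#include <stdint.h>

/* Build a High Speed isochronous wMaxPacketSize value.
 * Returns 0 for arguments outside the ranges the field can express. */
static uint16_t hs_iso_wmaxpacketsize(uint16_t bytes_per_transaction,
                                      uint8_t transactions_per_uframe)
{
    if (bytes_per_transaction > 1024u ||
        transactions_per_uframe < 1u || transactions_per_uframe > 3u) {
        return 0;
    }
    return (uint16_t)(((uint16_t)(transactions_per_uframe - 1u) << 11) |
                      bytes_per_transaction);
}

/* Bits 10:0: payload size of a single transaction. */
static uint16_t wmaxpacketsize_bytes(uint16_t w)
{
    return (uint16_t)(w & 0x07FFu);
}

/* Bits 12:11: additional transactions per microframe (0, 1 or 2). */
static uint8_t wmaxpacketsize_transactions(uint16_t w)
{
    return (uint8_t)(((w >> 11) & 0x3u) + 1u);
}
```

A full high-bandwidth endpoint (3 × 1024 bytes) therefore reports 0x1400 in its descriptor.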

Synchronisation Types

The bmAttributes sync field of an isochronous endpoint descriptor specifies the synchronisation model:

| Sync Type | Value | Description | Feedback Endpoint | Use Case |
|---|---|---|---|---|
| No Synchronisation | 0b00 | Free-running, no sync | No | Non-audio isochronous data |
| Asynchronous | 0b01 | Device has independent clock, reports rate via feedback | Yes (explicit feedback) | High-quality USB audio DAC |
| Adaptive | 0b10 | Device locks its sample clock to the data rate delivered by the host | No (implicit via SRC) | USB audio ADC (capture) |
| Synchronous | 0b11 | Device slaves clock directly to USB SOF | No | Low-cost audio, introduces jitter |
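
In the endpoint descriptor these sync values occupy bits 3:2 of bmAttributes, next to the transfer type in bits 1:0 and the usage type in bits 5:4. The sketch below packs the three fields; the enum and function names are made up for the example.

```c
#include <stdint.h>

/* Synchronisation type, bmAttributes bits 3:2. */
enum iso_sync {
    ISO_SYNC_NONE     = 0,
    ISO_SYNC_ASYNC    = 1,
    ISO_SYNC_ADAPTIVE = 2,
    ISO_SYNC_SYNC     = 3
};

/* Usage type, bmAttributes bits 5:4. */
enum iso_usage {
    ISO_USAGE_DATA              = 0,
    ISO_USAGE_FEEDBACK          = 1,
    ISO_USAGE_IMPLICIT_FEEDBACK = 2
};

#define XFER_TYPE_ISOCHRONOUS 0x01u   /* bmAttributes bits 1:0 */

/* Pack transfer type, sync type and usage type into one bmAttributes byte. */
static uint8_t iso_bm_attributes(enum iso_sync sync, enum iso_usage usage)
{
    return (uint8_t)(XFER_TYPE_ISOCHRONOUS |
                     ((unsigned)sync << 2) |
                     ((unsigned)usage << 4));
}
```

An asynchronous data endpoint comes out as 0x05, and an explicit feedback endpoint (no sync, feedback usage) as 0x11.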

Isochronous Descriptor Configuration in C

/* Standard Isochronous Endpoint Descriptor for UAC2 playback */
/* This is the AS endpoint on alternate setting 1             */

static const uint8_t as_iso_ep_desc[] = {
    /* Standard AS Isochronous Audio Data Endpoint Descriptor */
    7,                              /* bLength */
    TUSB_DESC_ENDPOINT,             /* bDescriptorType */
    0x01,                           /* bEndpointAddress: EP1 OUT */
    (0x01 << 2) | 0x01,            /* bmAttributes: iso + async */
    U16_TO_U8S_LE(294),            /* wMaxPacketSize: 48 kHz 24-bit stereo */
    0x04,                           /* bInterval: 2^(4-1) = 8 microframes = 1 ms (HS), matches the 294-byte sizing */

    /* Class-Specific AS Isochronous Audio Data Endpoint Descriptor */
    8,                              /* bLength */
    TUSB_DESC_CS_ENDPOINT,          /* bDescriptorType */
    0x01,                           /* bDescriptorSubtype: EP_GENERAL */
    0x00,                           /* bmAttributes: MaxPacketsOnly = 0 (short packets allowed) */
    0x00,                           /* bmControls: none */
    0x00,                           /* bLockDelayUnits: undefined */
    U16_TO_U8S_LE(0x0000),         /* wLockDelay */
};

/* Feedback Endpoint Descriptor (for async sync type) */
static const uint8_t feedback_ep_desc[] = {
    7,                              /* bLength */
    TUSB_DESC_ENDPOINT,
    0x81,                           /* bEndpointAddress: EP1 IN (feedback) */
    (0x01 << 4) | 0x01,            /* bmAttributes: iso + no sync + feedback usage (bits 5:4 = 01) */
    U16_TO_U8S_LE(4),              /* wMaxPacketSize: 4 bytes for feedback value */
    0x01,                           /* bInterval */
};

/* Feedback value format: 16.16 fixed point (samples per microframe) */
/* For 48 kHz: 48000 / 8000 = 6.0 samples/microframe = 0x00060000 */
/* For 96 kHz: 96000 / 8000 = 12.0 = 0x000C0000                    */
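void send_feedback_value(uint32_t samples_per_uframe_16_16) {
    /* static: the transfer completes after this function returns, */
    /* so the buffer must outlive the call                         */
    static uint8_t fb[4];
    /* Serialise little-endian, as USB requires */
    fb[0] = (samples_per_uframe_16_16 >>  0) & 0xFF;
    fb[1] = (samples_per_uframe_16_16 >>  8) & 0xFF;
    fb[2] = (samples_per_uframe_16_16 >> 16) & 0xFF;
    fb[3] = (samples_per_uframe_16_16 >> 24) & 0xFF;
    /* usbd_edpt_xfer() is TinyUSB's internal endpoint API; */
    /* check its argument list against your TinyUSB release  */
    usbd_edpt_xfer(0, 0x81, fb, 4);
}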

DFU Bootloader Design

Device Firmware Upgrade (DFU) is a standardised USB class protocol defined in the USB DFU Specification 1.1 that allows host-side tools to download new firmware into a device over USB without any proprietary protocol or out-of-band communication channel. A well-designed DFU bootloader enables field firmware updates without special hardware beyond the USB cable already present.

DFU Runtime Class vs DFU Mode

DFU operates in two distinct modes. In DFU Runtime mode, the application firmware exposes a DFU interface alongside its normal application interface (CDC, HID, etc.). The runtime interface tells the host "I support DFU, please send a DFU_DETACH request when you want to update me." When the host sends DFU_DETACH, the device resets and boots into DFU mode — a separate firmware image (the bootloader) that implements the full DFU download/upload protocol.

/* DFU Runtime Interface bmAttributes */
/* Bit 0: bitCanDnload: supports downloading firmware to the device          */
/* Bit 1: bitCanUpload: supports reading firmware back from the device       */
/* Bit 2: bitManifestationTolerant: device can still talk USB after manifestation */
/* Bit 3: bitWillDetach: device detaches and re-attaches itself on DFU_DETACH */

#define DFU_ATTR_CAN_DOWNLOAD            (1u << 0)
#define DFU_ATTR_CAN_UPLOAD              (1u << 1)
#define DFU_ATTR_MANIFESTATION_TOLERANT  (1u << 2)
#define DFU_ATTR_WILL_DETACH             (1u << 3)

/* Typical runtime descriptor */
static const uint8_t dfu_runtime_func_desc[] = {
    9,                              /* bLength */
    0x21,                           /* bDescriptorType: DFU FUNCTIONAL */
    DFU_ATTR_CAN_DOWNLOAD |
    DFU_ATTR_CAN_UPLOAD   |
    DFU_ATTR_WILL_DETACH  |
    DFU_ATTR_MANIFESTATION_TOLERANT, /* bmAttributes */
    U16_TO_U8S_LE(1000),           /* wDetachTimeout: 1000 ms */
    U16_TO_U8S_LE(2048),           /* wTransferSize: max bytes per DFU_DNLOAD block */
    U16_TO_U8S_LE(0x0110),         /* bcdDFUVersion: 1.1 (0x011A identifies ST's DfuSe extension) */
};

DFU State Machine

The DFU specification defines a precise state machine that both device firmware and host software must follow. Understanding the states is critical for debugging DFU failures:

| State | Value | Description | Next States |
|---|---|---|---|
| appIDLE | 0 | Application running, no DFU activity | appDETACH (on DFU_DETACH) |
| appDETACH | 1 | Waiting for USB reset to enter DFU mode | dfuIDLE (after reset) |
| dfuIDLE | 2 | DFU mode active, ready to receive download | dfuDNLOAD-SYNC (on DFU_DNLOAD), dfuUPLOAD-IDLE (on DFU_UPLOAD) |
| dfuDNLOAD-SYNC | 3 | Block received, waiting for DFU_GETSTATUS | dfuDNBUSY or dfuDNLOAD-IDLE |
| dfuDNBUSY | 4 | Device programming flash (bwPollTimeout in effect) | dfuDNLOAD-SYNC |
| dfuDNLOAD-IDLE | 5 | Block programmed, ready for next block | dfuDNLOAD-SYNC or dfuMANIFEST-SYNC |
| dfuMANIFEST-SYNC | 6 | Zero-length DFU_DNLOAD received (end of image) | dfuMANIFEST, or dfuIDLE once manifestation is complete |
| dfuMANIFEST | 7 | Device applying/verifying complete firmware image | dfuMANIFEST-SYNC (manifestation-tolerant) or dfuMANIFEST-WAIT-RESET |
| dfuMANIFEST-WAIT-RESET | 8 | Manifestation done, waiting for USB reset to boot new FW | appIDLE (after reset) |
| dfuUPLOAD-IDLE | 9 | Upload in progress, host reading blocks with DFU_UPLOAD | dfuUPLOAD-IDLE or dfuIDLE (after a short block) |
| dfuERROR | 10 | Error occurred, requires DFU_CLRSTATUS to recover | dfuIDLE (after DFU_CLRSTATUS) |
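
The download path through this table can be written as a small transition function. The sketch below is illustrative only: the event names are invented for the example, it models a device that is not manifestation-tolerant (it always ends in the wait-for-reset state), it leaves out the upload path and detach timeout, and any event it does not expect in DFU mode is treated as an error.

```c
#include <stdint.h>

typedef enum {
    APP_IDLE = 0, APP_DETACH = 1, DFU_IDLE = 2, DFU_DNLOAD_SYNC = 3,
    DFU_DNBUSY = 4, DFU_DNLOAD_IDLE = 5, DFU_MANIFEST_SYNC = 6,
    DFU_MANIFEST = 7, DFU_MANIFEST_WAIT_RESET = 8, DFU_UPLOAD_IDLE = 9,
    DFU_ERROR = 10
} dfu_state_t;

/* Hypothetical event names for this sketch. */
typedef enum {
    EV_DETACH,          /* DFU_DETACH request                          */
    EV_USB_RESET,       /* bus reset from the host                     */
    EV_DNLOAD_DATA,     /* DFU_DNLOAD with wLength > 0                 */
    EV_DNLOAD_ZERO,     /* DFU_DNLOAD with wLength == 0 (end of image) */
    EV_GETSTATUS_BUSY,  /* DFU_GETSTATUS while work is still pending   */
    EV_GETSTATUS_DONE,  /* DFU_GETSTATUS after the block is programmed */
    EV_POLL_TIMEOUT,    /* bwPollTimeout elapsed                       */
    EV_CLRSTATUS        /* DFU_CLRSTATUS request                       */
} dfu_event_t;

static dfu_state_t dfu_next_state(dfu_state_t s, dfu_event_t ev)
{
    switch (s) {
    case APP_IDLE:
        return (ev == EV_DETACH) ? APP_DETACH : APP_IDLE;
    case APP_DETACH:
        return (ev == EV_USB_RESET) ? DFU_IDLE : APP_DETACH;
    case DFU_IDLE:
        if (ev == EV_DNLOAD_DATA) return DFU_DNLOAD_SYNC;
        break;
    case DFU_DNLOAD_SYNC:
        if (ev == EV_GETSTATUS_BUSY) return DFU_DNBUSY;
        if (ev == EV_GETSTATUS_DONE) return DFU_DNLOAD_IDLE;
        break;
    case DFU_DNBUSY:
        if (ev == EV_POLL_TIMEOUT) return DFU_DNLOAD_SYNC;
        break;
    case DFU_DNLOAD_IDLE:
        if (ev == EV_DNLOAD_DATA) return DFU_DNLOAD_SYNC;
        if (ev == EV_DNLOAD_ZERO) return DFU_MANIFEST_SYNC;
        break;
    case DFU_MANIFEST_SYNC:
        if (ev == EV_GETSTATUS_BUSY) return DFU_MANIFEST;
        break;
    case DFU_MANIFEST:
        if (ev == EV_POLL_TIMEOUT) return DFU_MANIFEST_WAIT_RESET;
        break;
    case DFU_MANIFEST_WAIT_RESET:
        return (ev == EV_USB_RESET) ? APP_IDLE : DFU_MANIFEST_WAIT_RESET;
    case DFU_ERROR:
        return (ev == EV_CLRSTATUS) ? DFU_IDLE : DFU_ERROR;
    default:
        break;
    }
    return DFU_ERROR;   /* unexpected event in DFU mode */
}
```

Logging the state before and after each request in this style makes it much easier to see where a failing dfu-util session diverges from the expected sequence.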

wTransferSize and Flash Page Alignment

The wTransferSize field in the DFU functional descriptor specifies the maximum number of bytes the device accepts in a single DFU_DNLOAD control transfer. Choose a value that divides the flash page (or minimum erase unit) evenly so that no block straddles an erase boundary. On an STM32F4, whose first sectors are 16 KB each, a common choice is 2048 bytes: large enough to transfer efficiently, small enough to buffer in RAM, and eight blocks fill one 16 KB sector exactly.
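
A few lines of arithmetic show how wTransferSize relates to the image and to the erase unit. The helper names below are illustrative and not part of any DFU library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of DFU_DNLOAD data blocks needed for an image (the final
 * zero-length block that ends the download is not counted). */
static uint32_t dfu_block_count(uint32_t image_bytes, uint16_t transfer_size)
{
    if (transfer_size == 0u) {
        return 0u;
    }
    return image_bytes / transfer_size + ((image_bytes % transfer_size) ? 1u : 0u);
}

/* Length of the last data block; equals transfer_size when the image
 * is an exact multiple of it. */
static uint16_t dfu_last_block_len(uint32_t image_bytes, uint16_t transfer_size)
{
    if (transfer_size == 0u || image_bytes == 0u) {
        return 0u;
    }
    uint32_t rem = image_bytes % transfer_size;
    return (uint16_t)(rem ? rem : transfer_size);
}

/* True if whole blocks tile the erase unit, so no block straddles it. */
static bool dfu_blocks_tile_erase_unit(uint32_t erase_unit_bytes, uint16_t transfer_size)
{
    return transfer_size != 0u && (erase_unit_bytes % transfer_size) == 0u;
}
```

For example, a 100000-byte image with 2048-byte blocks needs 49 data blocks, the last one carrying 1696 bytes.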

/* host-side dfu-util commands */
/*
 * List DFU devices:
 *   dfu-util -l
 *
 * Download firmware (force DFU mode if in runtime):
 *   dfu-util -D firmware.bin --detach
 *
 * Download to a specific memory region (alt interface):
 *   dfu-util -a 0 -D firmware.bin -s 0x08000000:leave
 *
 * Upload firmware from device:
 *   dfu-util -a 0 -U backup.bin -s 0x08000000:0x80000
 *
 * Reset to application after download:
 *   dfu-util -D firmware.bin -s 0x08000000:leave
 *
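 * Flags:
 *   -s addr[:modifiers] - DfuSe target address, followed by optional modifiers
 *   :<length>      - number of bytes to read (upload only)
 *   :leave         - leave DFU mode and start the application after the transfer
 *   :force         - confirm risky DfuSe operations such as unprotect or mass-erase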
 */

STM32 Built-in DFU Bootloader

STMicroelectronics programs a bootloader into the system memory of its STM32 microcontrollers, and on most USB-capable parts that bootloader includes USB DFU. It is stored in a protected ROM region (starting at 0x1FFF0000 on STM32F4) and is activated by holding BOOT0 high at reset. It requires no custom firmware and is ready to use on a supported STM32 straight from the factory.

Activating the System Bootloader

/*
 * STM32 System Memory bootloader activation:
 *
 * Method 1: Hardware BOOT0 pin
 *   - Hold BOOT0 HIGH (to VDD) before power-on or reset
 *   - BOOT1 (PB2) must be LOW (selects system memory vs SRAM)
 *   - Device appears as "STM Device in DFU Mode" (VID=0483, PID=DF11)
 *
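 * Method 2: Software jump from application
 *   - System memory address and remap details differ between families
 *     (see ST application note AN2606); the values below are for STM32F4
 *   - Call HAL_RCC_DeInit() and remap system memory before the jump
 */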
void jump_to_system_bootloader(void) {
    /* Stop interrupts and return the clock tree to its reset state */
    __disable_irq();
    HAL_RCC_DeInit();

    /* De-init SysTick */
    SysTick->CTRL = 0;
    SysTick->LOAD = 0;
    SysTick->VAL  = 0;

    /* Disable and clear all pending NVIC IRQs */
    for (int i = 0; i < 8; i++) {
        NVIC->ICER[i] = 0xFFFFFFFF;
        NVIC->ICPR[i] = 0xFFFFFFFF;
    }

    /* STM32F4 system memory base address (other families differ) */
    #define SYSTEM_MEMORY_BASE  0x1FFF0000UL

    /* Read initial stack pointer and reset handler from the system memory vector table */
    uint32_t sp  = *((volatile uint32_t *)(SYSTEM_MEMORY_BASE));
    uint32_t pc  = *((volatile uint32_t *)(SYSTEM_MEMORY_BASE + 4));

    /* Remap system memory to 0x00000000 */
    __HAL_RCC_SYSCFG_CLK_ENABLE();
    SYSCFG->MEMRMP = 0x01;  /* Map system flash to 0x00000000 */
    __DSB();
    __ISB();

    /* Restore PRIMASK to its reset value (interrupts enabled) so the */
    /* bootloader starts from a reset-like state                      */
    __enable_irq();

    /* Set MSP and jump */
    __set_MSP(sp);
    ((void (*)(void))pc)();

    /* Never returns */
    while (1) { }
}

Memory Layout and Alt Interfaces

The STM32 DFU bootloader exposes multiple DFU alternate interface settings, each mapping to a different memory region. Alternate 0 is typically user flash (0x08000000), alternate 1 is option bytes, and alternate 2 is OTP area (on some devices). The iInterface string for each alt setting describes the memory map in a specific format that dfu-util parses:

/* STM32F407 DFU interface strings (from built-in bootloader) */
/* @Internal Flash  /0x08000000/04*016Kg,01*064Kg,07*128Kg */
/*                                                          */
/* Format: @Name /start_addr/sectors                        */
/* 04*016Kg = 4 sectors of 16 KB (erasable)                 */
/* 01*064Kg = 1 sector of 64 KB                             */
/* 07*128Kg = 7 sectors of 128 KB                           */

/* Using STM32CubeProgrammer (GUI/CLI) */
/*   STM32_Programmer_CLI -c port=USB1              */
/*   STM32_Programmer_CLI -c port=USB1 -d fw.bin 0x08000000 -v -rst */
/*   STM32_Programmer_CLI -c port=USB1 -r 0x08000000 0x100000 dump.bin */
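
To make the memory-map string format concrete, here is a minimal parser sketch for one sector group such as `04*016Kg`. It assumes the DfuSe convention that the unit character is `K`, `M`, `B` or a space (bytes), and that the low three bits of the trailing letter flag readable, erasable and writable. The struct and function names are invented for this example and are not dfu-util's API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t count;         /* number of sectors in the group */
    uint32_t sector_bytes;  /* size of each sector in bytes   */
    bool readable;
    bool erasable;
    bool writable;
} dfuse_segment_t;

/* Parse one "NN*SSSUt" group, e.g. "04*016Kg". Returns false on malformed input. */
static bool dfuse_parse_segment(const char *text, dfuse_segment_t *out)
{
    unsigned count = 0, size = 0;
    char unit = 0, type = 0;

    if (sscanf(text, "%u*%u%c%c", &count, &size, &unit, &type) != 4) {
        return false;
    }
    switch (unit) {
    case 'K': size *= 1024u;          break;
    case 'M': size *= 1024u * 1024u;  break;
    case 'B':
    case ' ':                         break;   /* plain bytes */
    default:  return false;
    }
    if (type < 'a' || type > 'g') {
        return false;
    }
    out->count        = count;
    out->sector_bytes = size;
    out->readable     = (type & 0x1) != 0;
    out->erasable     = (type & 0x2) != 0;
    out->writable     = (type & 0x4) != 0;
    return true;
}
```

Summing `count * sector_bytes` over the three groups of the STM32F407 string (4 × 16 KB + 1 × 64 KB + 7 × 128 KB) gives the expected 1 MB of flash.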

/* Combining TinyUSB DFU with application code:              */
/* Place bootloader at 0x08000000 (128 KB)                   */
/* Place application at 0x08020000 (FLASH_BASE + 128 KB)     */
/* Bootloader checks magic value in RAM or BOOT0 pin,        */
/* then either jumps to application or waits for DFU         */

#define APP_START_ADDR   0x08020000UL
#define BOOT_MAGIC       0xDEADBEEFUL
#define BOOT_MAGIC_ADDR  (SRAM1_BASE + SRAM1_SIZE - 4)

void bootloader_main(void) {
    if (*((volatile uint32_t *)BOOT_MAGIC_ADDR) != BOOT_MAGIC) {
        /* Check if valid application exists at APP_START_ADDR */
        uint32_t app_sp = *((volatile uint32_t *)APP_START_ADDR);
        if ((app_sp & 0xFF000000) == 0x20000000) {
            /* Valid stack pointer — jump to application */
            jump_to_address(APP_START_ADDR);
        }
    }
    /* Clear magic and run DFU */
    *((volatile uint32_t *)BOOT_MAGIC_ADDR) = 0;
    run_dfu_mode();
}
Dual-Boot Consideration: When using a custom DFU bootloader alongside application code, ensure the bootloader's interrupt vector table at 0x08000000 is correct and the application's vector table offset register (SCB->VTOR) is updated to APP_START_ADDR before the application initialises. Missing this step causes the application to handle interrupts with the bootloader's vector table, producing unpredictable behaviour.
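
The bootloader above accepts an application if the top byte of its initial stack pointer looks like SRAM. A slightly stricter check also inspects the reset vector. The sketch below is written as pure arithmetic so it can be unit-tested on a host; the `mem_map_t` type, the function name, and the example address ranges are assumptions to adapt to the actual part and linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address ranges the application is allowed to use (end addresses exclusive). */
typedef struct {
    uint32_t app_flash_start;
    uint32_t app_flash_end;
    uint32_t ram_start;
    uint32_t ram_end;
} mem_map_t;

/* Check the first two words of an application's vector table. */
static bool app_vectors_plausible(uint32_t initial_sp,
                                  uint32_t reset_handler,
                                  const mem_map_t *map)
{
    /* The stack grows downwards, so the initial SP may equal ram_end,
     * but it must be word-aligned and above the start of RAM. */
    bool sp_ok = (initial_sp % 4u == 0u) &&
                 initial_sp > map->ram_start &&
                 initial_sp <= map->ram_end;

    /* Cortex-M code is Thumb-only: bit 0 of the reset vector must be set,
     * and the handler must live inside the application's flash region. */
    bool pc_ok = (reset_handler & 1u) != 0u &&
                 reset_handler >= map->app_flash_start &&
                 reset_handler < map->app_flash_end;

    return sp_ok && pc_ok;
}
```

Erased flash reads as 0xFFFFFFFF for both words and is rejected by the alignment test, so a blank application slot falls through to DFU mode.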

USB OTG Host Mode

USB On-The-Go (OTG) allows a device to dynamically assume either the host or device role, eliminating the requirement for a dedicated PC host in embedded systems. An STM32 or RP2040 with OTG support can act as a USB host — enumerating and driving a USB keyboard, USB flash drive, or other USB peripheral without any PC in the loop.

A-Device vs B-Device (ID Pin Detection)

OTG determines the initial role from the ID pin of the Mini-AB or Micro-AB receptacle. A cable with the ID pin grounded (Mini/Micro-A plug) designates the A-device, which assumes the host role and drives VBUS. A cable with the ID pin floating (Mini/Micro-B plug) designates the B-device, which assumes the peripheral role. USB-C has no ID pin; there the host/device role is negotiated through the CC pins and the DFP/UFP configuration.

/* TinyUSB host mode configuration (tusb_config.h) */
#define CFG_TUH_ENABLED         1
#define CFG_TUH_MAX_SPEED       OPT_MODE_HIGH_SPEED  /* or FULL_SPEED */
#define CFG_TUH_HUB             1   /* support hub */
#define CFG_TUH_CDC             1   /* host CDC driver */
#define CFG_TUH_HID             4   /* up to 4 HID devices */
#define CFG_TUH_MSC             1   /* mass storage host */
#define CFG_TUH_VENDOR          1   /* vendor class host */

/* Host task — must be called from main loop or RTOS task */
void usb_host_task(void *param) {
    (void)param;
    while (1) {
        tuh_task();  /* Process all pending USB host events */
        /* Add application logic here */
    }
}

/* Callback: device mounted (enumerated successfully) */
void tuh_mount_cb(uint8_t dev_addr) {
    printf("USB device mounted at address %u\n", dev_addr);

    /* Query device descriptor to identify device */
    tusb_desc_device_t dev_desc;
    tuh_descriptor_get_device_sync(dev_addr, &dev_desc, sizeof(dev_desc));
    printf("  VID: %04X  PID: %04X\n", dev_desc.idVendor, dev_desc.idProduct);
}

/* Callback: device unmounted */
void tuh_umount_cb(uint8_t dev_addr) {
    printf("USB device at address %u disconnected\n", dev_addr);
}

/* Host CDC callbacks */
static uint8_t cdc_rx_buf[64];

void tuh_cdc_mount_cb(uint8_t idx) {
    printf("CDC device mounted at index %u\n", idx);
    /* Open CDC interface and start receiving.                          */
    /* NOTE: tuh_cdc_receive() comes from older TinyUSB host APIs;      */
    /* check whether your release still provides it before relying on it */
    tuh_cdc_receive(idx, cdc_rx_buf, sizeof(cdc_rx_buf), true);
}

void tuh_cdc_rx_cb(uint8_t idx) {
    uint32_t count = tuh_cdc_read(idx, cdc_rx_buf, sizeof(cdc_rx_buf));
    printf("CDC RX %lu bytes: %.*s\n",
           (unsigned long)count, (int)count, (const char *)cdc_rx_buf);
    /* Re-arm receive */
    tuh_cdc_receive(idx, cdc_rx_buf, sizeof(cdc_rx_buf), true);
}

/* Host HID callbacks */
void tuh_hid_mount_cb(uint8_t dev_addr, uint8_t instance,
                      uint8_t const *desc_report, uint16_t desc_len) {
    printf("HID mounted: dev=%u inst=%u protocol=%u\n",
           dev_addr, instance, tuh_hid_get_protocol(dev_addr, instance));
    tuh_hid_receive_report(dev_addr, instance);
}

void tuh_hid_report_received_cb(uint8_t dev_addr, uint8_t instance,
                                 uint8_t const *report, uint16_t len) {
    /* Process incoming HID report (keyboard, mouse, gamepad) */
    process_hid_report(tuh_hid_interface_protocol(dev_addr, instance),
                       report, len);
    tuh_hid_receive_report(dev_addr, instance);
}

SRP and HNP

Session Request Protocol (SRP) allows the B-device to request a USB session from the A-device without the A-device continuously powering VBUS. The B-device pulses D+ briefly to signal it wants to connect. Host Negotiation Protocol (HNP) allows the A-device to suspend the bus and grant the B-device a chance to become the host — enabling bidirectional host role swapping. HNP is rarely used in practice and not required for most embedded OTG host applications.

Embedded Host Use Cases: OTG host mode enables compelling embedded applications: a handheld instrument that reads data from a USB flash drive; a machine controller that accepts input from a USB keyboard or barcode scanner; an embedded display system that drives a USB HID touchscreen without needing a full Linux system. The STM32H7 and STM32F4 series with OTG_HS are the most capable options for embedded USB host applications.

USB Hub Support

A USB hub is a class 9 device that extends the bus by providing multiple downstream ports from a single upstream connection. In embedded host mode, supporting hubs dramatically increases the flexibility of the system — allowing a single OTG port to drive a keyboard, mouse, and flash drive simultaneously through an external hub.

Transaction Translator (TT)

A high-speed hub that has Full Speed or Low Speed devices on its downstream ports must include a Transaction Translator (TT). The TT bridges the timing gap: it receives HS transactions from the host and converts them to FS/LS transactions for the downstream device, buffering the protocol translation. Without a TT, FS devices cannot coexist on a HS hub.

/* TinyUSB hub support configuration */
/* In tusb_config.h: */
#define CFG_TUH_HUB     1    /* Enable hub class driver */

/* Hub class codes */
#define HUB_CLASS_CODE          0x09
#define HUB_SUBCLASS_CODE       0x00
#define HUB_PROTOCOL_FS         0x00   /* Full/Low-speed hub */
#define HUB_PROTOCOL_HS_SINGLE  0x01   /* HS hub, single TT */
#define HUB_PROTOCOL_HS_MULTI   0x02   /* HS hub, multiple TT */

/* Port Status Change Endpoint */
/* Hub exposes a single Interrupt IN endpoint (typically EP1 IN) */
/* Reports bitmap: bit 0 = hub status change,                    */
/* bit N (N >= 1) = port N has a status change                   */
/* A 1-byte bitmap covers the hub plus up to 7 downstream ports  */

/* Device address assignment through hub:                */
/* Each device behind a hub gets its own USB address     */
/* TinyUSB tracks hub address + port number for routing  */

/* Hub descriptor key fields */
/* Packed: wHubCharacteristics sits at byte offset 3 in the wire format */
typedef struct __attribute__((packed)) {
    uint8_t  bDescLength;
    uint8_t  bDescriptorType;   /* 0x29 for hub */
    uint8_t  bNbrPorts;         /* Number of downstream ports */
    uint16_t wHubCharacteristics;
    uint8_t  bPwrOn2PwrGood;    /* Power-on to power-good time in 2 ms units */
    uint8_t  bHubContrCurrent;  /* Hub controller current in mA */
    /* Variable length: DeviceRemovable and PortPwrCtrlMask bitmaps */
} hub_desc_t;
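
Decoding the status-change bitmap and the power-good delay takes only a few lines. The sketch below assumes the standard layout, where bit 0 reports a change on the hub itself and bit N reports a change on port N; the helper names are illustrative and not TinyUSB's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in the status-change bitmap: one bit for the hub plus one per port. */
static size_t hub_bitmap_bytes(uint8_t num_ports)
{
    return (size_t)num_ports / 8u + 1u;
}

/* Bit 0 reports a change on the hub itself. */
static bool hub_itself_changed(const uint8_t *bitmap)
{
    return (bitmap[0] & 0x01u) != 0u;
}

/* Bit N (N >= 1) reports a change on downstream port N. */
static bool hub_port_changed(const uint8_t *bitmap, uint8_t port)
{
    return ((bitmap[port / 8u] >> (port % 8u)) & 0x01u) != 0u;
}

/* bPwrOn2PwrGood is expressed in 2 ms units. */
static uint16_t hub_power_good_ms(uint8_t bPwrOn2PwrGood)
{
    return (uint16_t)(2u * bPwrOn2PwrGood);
}
```

A host typically walks ports 1 to bNbrPorts after each interrupt transfer, issues GET_STATUS for every port whose bit is set, and waits for the power-good delay after powering a port before it expects a connection.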

Embedded Hub Limitations

Hub support in embedded USB hosts has practical limitations that firmware developers must understand:

  • Address table size: CFG_TUH_DEVICE_MAX limits total attached devices (hub ports count against this). Increase it if supporting multi-port hubs.
  • Enumeration serialisation: TinyUSB enumerates one device at a time. A hub with 4 devices takes 4× the enumeration time of a single device.
  • Power management: The embedded host must supply at least 100 mA per port to bus-powered downstream devices. Calculate the total power budget of the hub and everything attached to it before settling on a supply design.
  • Compound devices: USB devices with built-in hubs (e.g., keyboard with USB passthrough) work as regular hubs but must be detected and enumerated as compound hub + function.
  • TT interaction: When a HS hub has FS devices attached, the host must use split transactions (SSPLIT/CSPLIT tokens) for FS device communication. With hub support enabled this is handled below the application layer, provided the host controller supports split transactions.

USB Suspend & Remote Wakeup

USB suspend is the mechanism by which the host conserves power by halting bus activity when a device is idle. For battery-powered embedded devices, correct suspend implementation is mandatory — both to respect the USB specification and to achieve the ultra-low power consumption that modern embedded products require.

Suspend Condition

The USB bus enters suspend when it has been idle, with no Start-of-Frame (SOF) packets, for 3 ms. The device must detect this condition and reduce its consumption to the suspend limit within 10 ms of the last bus activity. While suspended, the device must not draw more than 2.5 mA from VBUS (for a configured device) or 500 µA (for an unconfigured device).

/* TinyUSB suspend/resume callbacks */

static volatile bool g_remote_wakeup_enabled;

/* Called when USB bus activity ceases for 3 ms (suspend detected) */
void tud_suspend_cb(bool remote_wakeup_en) {
    /* remote_wakeup_en: true if the host enabled remote wakeup via SET_FEATURE */
    g_remote_wakeup_enabled = remote_wakeup_en;

    /* Stop the SysTick interrupt so it does not wake the core immediately */
    HAL_SuspendTick();

    /* Wake from Stop mode on USB resume signalling:              */
    /* OTG FS wakeup EXTI line (EXTI18 on STM32F4)                */
    __HAL_USB_OTG_FS_WAKEUP_EXTI_ENABLE_RISING_EDGE();
    __HAL_USB_OTG_FS_WAKEUP_EXTI_ENABLE_IT();

    /* Enter Stop mode (roughly: tens of mA down to the sub-mA range) */
    HAL_PWR_EnterSTOPMode(PWR_LOWPOWERREGULATOR_ON, PWR_STOPENTRY_WFI);

    /* Execution resumes here after wakeup (USB activity or remote wakeup) */
    /* Re-initialise system clocks after Stop mode */
    SystemClock_Config();
    HAL_ResumeTick();
}

/* Called when USB bus activity resumes (host wakes device) */
void tud_resume_cb(void) {
    /* Normal USB operation resumed */
    /* Re-enable peripherals that were stopped during suspend */
    HAL_ResumeTick();
}

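/* Remote Wakeup: device wakes the host by driving resume (K state) signalling */
/* The descriptor must advertise the capability (bmAttributes bit 5) and the   */
/* host must have enabled it with SET_FEATURE(DEVICE_REMOTE_WAKEUP); the bus   */
/* must also have been suspended for at least 5 ms before wakeup is signalled  */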
void device_request_remote_wakeup(void) {
    if (!g_remote_wakeup_enabled) {
        printf("Remote wakeup not permitted by host\n");
        return;
    }
    if (!tud_suspended()) {
        printf("Device not suspended — cannot remote wakeup\n");
        return;
    }
    /* Assert K state on D+/D- for 1–15 ms */
    tud_remote_wakeup();
}

/* In Configuration Descriptor — enable remote wakeup capability */
/* bmAttributes: bit 6 = self-powered, bit 5 = remote wakeup    */
#define CONFIG_ATTRIBUTES   (0x80 | 0x20)  /* bus-powered + remote wakeup */

Suspend Current Budget

Meeting the 2.5 mA suspended current limit requires careful hardware design. The STM32 USB OTG peripheral itself consumes approximately 1.5–3 mA in normal operation. In Stop mode with the USB clock reduced to the minimum for VBUS detection, the USB-related current drops to approximately 200–400 µA, leaving margin for a low-power MCU core state. Ensure that non-essential loads such as indicator LEDs, sensor pull-ups, and external oscillators are switched off during suspend; the D+ pull-up itself must stay connected so the host still sees the device.
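
A simple way to keep track of the suspend budget is to list every load that stays powered and sum it against the limit. The sketch below does that in microamps; the helper names and the example figures in the note that follows are placeholders, not measurements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Suspend limit for a configured device, in microamps. */
#define USB_SUSPEND_LIMIT_UA 2500u

/* Sum of all loads that remain powered from VBUS during suspend. */
static uint32_t suspend_total_ua(const uint32_t *loads_ua, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += loads_ua[i];
    }
    return total;
}

/* True if the summed loads stay within the suspend limit. */
static bool suspend_budget_ok(const uint32_t *loads_ua, size_t count)
{
    return suspend_total_ua(loads_ua, count) <= USB_SUSPEND_LIMIT_UA;
}
```

As an example, a 1.5 kΩ D+ pull-up from 3.3 V into the host's 15 kΩ pull-down draws about 200 µA on its own. Adding 400 µA for the USB block and 100 µA for the core gives 700 µA, comfortably inside the limit, whereas a single indicator LED left on at 2 mA would break it.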

USB Power Delivery (USB PD)

USB Power Delivery is a specification that dramatically extends the power capability of USB-C connections: from the default 5V/0.9A (4.5W) to up to 20V/5A (100W), with the Extended Power Range (EPR) introduced in USB PD 3.1 reaching 48V/5A (240W). USB PD negotiation is conducted over the CC1/CC2 pins using a separate protocol that is entirely independent of the USB 2.0 data lines, so a device can negotiate 20V/3A (60W) over USB PD while simultaneously transferring data at USB 2.0 High Speed.

PD Negotiation Flow

When a USB-C cable is connected, the Source (charger/port) and Sink (device) detect each other through pull-up (Rp, Source side) and pull-down (Rd, Sink side) resistors on the CC pins. Once connection is confirmed, the Source advertises its Power Data Objects (PDOs) in a Source_Capabilities message: a list of voltage/current combinations it can supply. The Sink selects the most suitable PDO and sends a Request message containing a Request Data Object (RDO). The Source replies with Accept, transitions VBUS to the negotiated voltage, and then sends PS_RDY to signal that the new supply is ready.
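The Sink's side of this exchange can be sketched as decoding the advertised PDOs and packing a request. The example below is a minimal illustration that handles Fixed Supply PDOs only, using the bit layout from the USB PD specification. The function names are hypothetical, and a real stack would also set the RDO flag bits and handle Variable, Battery, and PPS objects.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed Supply PDO layout:
 *   bits 31..30 = 00 (Fixed Supply)
 *   bits 19..10 = voltage in 50 mV units
 *   bits  9..0  = maximum current in 10 mA units */
static int pdo_is_fixed(uint32_t pdo)      { return (pdo >> 30) == 0u; }
static uint32_t pdo_fixed_mv(uint32_t pdo) { return ((pdo >> 10) & 0x3FFu) * 50u; }
static uint32_t pdo_fixed_ma(uint32_t pdo) { return (pdo & 0x3FFu) * 10u; }

/* Pick the highest-voltage Fixed PDO that does not exceed max_mv.
 * Returns the 1-based object position, or 0 if nothing fits. */
static uint8_t pd_select_fixed_pdo(const uint32_t *pdos, size_t count,
                                   uint32_t max_mv)
{
    uint8_t  best    = 0;
    uint32_t best_mv = 0;
    for (size_t i = 0; i < count; i++) {
        if (!pdo_is_fixed(pdos[i])) {
            continue;
        }
        uint32_t mv = pdo_fixed_mv(pdos[i]);
        if (mv <= max_mv && mv > best_mv) {
            best_mv = mv;
            best    = (uint8_t)(i + 1u);
        }
    }
    return best;
}

/* Fixed RDO layout (flag bits left at zero in this sketch):
 *   bits 30..28 = object position
 *   bits 19..10 = operating current in 10 mA units
 *   bits  9..0  = maximum operating current in 10 mA units */
static uint32_t pd_build_fixed_rdo(uint8_t position, uint32_t operating_ma,
                                   uint32_t max_ma)
{
    return ((uint32_t)(position & 0x7u) << 28)
         | (((operating_ma / 10u) & 0x3FFu) << 10)
         | ((max_ma / 10u) & 0x3FFu);
}
```

Selecting by a voltage ceiling keeps the policy simple for a board whose regulator has a fixed maximum input. A production sink would also check that the chosen PDO's current covers its load before sending the request.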

/* USB PD implementation options for embedded systems */

/* Option 1: FUSB302 — dedicated USB PD PHY (I2C interface) */
/* FUSB302 handles CC pin monitoring, message framing, CRC  */
/* MCU sends/receives USB PD messages over I2C             */

/* FUSB302 basic initialisation */
void fusb302_init(void) {
    /* Software reset */
    fusb302_write(FUSB302_REG_RESET, FUSB302_RESET_SW_RST);
    HAL_Delay(10);

    /* Configure CC pins as a Sink (UFP): present Rd pull-downs on both CC lines */
    fusb302_write(FUSB302_REG_SWITCHES0,
        FUSB302_SW0_CC1_PD_EN |    /* Enable pull-down on CC1 */
        FUSB302_SW0_CC2_PD_EN);    /* Enable pull-down on CC2 */

    /* Enable automatic retries of unacknowledged messages (up to 3);  */
    /* automatic GoodCRC replies are a separate setting in SWITCHES1   */
    fusb302_write(FUSB302_REG_CONTROL3,
        FUSB302_CTRL3_AUTO_RETRY | FUSB302_CTRL3_N_RETRIES(3));

    /* Unmask only the VBUSOK interrupt; all other sources stay masked */
    fusb302_write(FUSB302_REG_MASK1, (uint8_t)~FUSB302_M1_VBUSOK);
    fusb302_write(FUSB302_REG_MASKA, 0xFF);
    fusb302_write(FUSB302_REG_MASKB, 0xFF);
}

/* Option 2: STM32G0 — built-in UCPD peripheral             */
/* No external chip required — CC pins connected directly   */
/* to UCPD1/UCPD2 peripheral pins on STM32G0B1/G0C1        */

/* STM32G0 UCPD initialisation */
/* ST ships UCPD support as a low-level (LL) driver, stm32g0xx_ll_ucpd.h; */
/* verify the calls below against your STM32Cube package version.         */
void ucpd_init(void) {
    /* Enable UCPD1 clock */
    __HAL_RCC_UCPD1_CLK_ENABLE();

    /* Configure CC pins as analog (no GPIO alternate function) */
    /* PA8 = CC1, PB15 = CC2 on STM32G0B1 */

    LL_UCPD_InitTypeDef ucpd_cfg;
    LL_UCPD_StructInit(&ucpd_cfg);   /* Load the driver's default prescaler and CC timing */
    LL_UCPD_Init(UCPD1, &ucpd_cfg);  /* Apply the configuration to UCPD1 */
    LL_UCPD_Enable(UCPD1);           /* Enable the peripheral */
}

/* Achievable power levels via USB PD negotiation */
/* PDO types: Fixed, Variable, Battery, Augmented (PPS)    */
/* Fixed PDO: 5V, 9V, 12V, 15V, 20V at up to 3A (20V x 3A = 60W); */
/* currents above 3A require a 5A e-marked cable                  */
/* PPS (Programmable Power Supply): 3.3–21V adjustable     */
USB PD in Embedded Systems: USB PD is transformative for power-hungry embedded systems. An STM32H7 board driving a large display, multiple sensors, and a servo motor can negotiate 20V/3A (60W) from a USB-C PD charger that offers a 20V PDO, eliminating the need for a dedicated power brick. The FUSB302 + STM32 combination is the most popular approach; the STM32G0's built-in UCPD peripheral is the most integrated. The open-source PD Buddy firmware and the USB Power Delivery library from Chromium EC are useful reference implementations.

USB 3.x SuperSpeed Overview

USB 3.x adds a SuperSpeed (SS) channel alongside the USB 2.0 differential pair, using separate transmit and receive differential pairs (SSTX±/SSRX±) with 8b/10b (USB 3.2 Gen 1) or 128b/132b (USB 3.2 Gen 2) encoding. The two channels are electrically independent, so a USB 3.x device keeps its USB 2.0 interface for compatibility, but a peripheral operates on only one of them at a time: it attempts SuperSpeed link training first and falls back to USB 2.0 if training fails or the port or cable lacks the SS pairs.

Generation | Marketing Name | Raw Bit Rate | Encoding | Effective Throughput | Common Use Case
USB 3.2 Gen 1 | USB 3.0 / SuperSpeed | 5 Gbps | 8b/10b | ~400 MB/s | Flash drives, external SSD
USB 3.2 Gen 2 | USB 3.1 / SuperSpeed+ | 10 Gbps | 128b/132b | ~900 MB/s | NVMe enclosures, cameras
USB 3.2 Gen 2×2 | SuperSpeed+ 20G | 20 Gbps (2 lanes) | 128b/132b × 2 | ~1800 MB/s | High-speed storage arrays
USB4 Gen 3×2 | USB4 40Gbps | 40 Gbps | 128b/132b × 2 | ~3500 MB/s + Thunderbolt | eGPU, 8K display, docking
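The gap between raw bit rate and usable bandwidth starts with the line encoding. As a rough illustration, the hypothetical helper below computes the byte rate left after encoding alone; the effective throughput figures quoted for each generation are lower still because packet framing, link commands, and protocol overhead also take their share.

```c
#include <stdint.h>

/* Bytes per second left after line encoding, before any packet or
 * protocol overhead. payload_bits/coded_bits is 8/10 for 8b/10b
 * and 128/132 for 128b/132b. */
static uint64_t encoded_bytes_per_sec(uint64_t raw_bits_per_sec,
                                      uint32_t payload_bits,
                                      uint32_t coded_bits)
{
    return raw_bits_per_sec * payload_bits / coded_bits / 8u;
}
```

For a 5 Gbps link with 8b/10b this gives 500 MB/s, and for a 10 Gbps link with 128b/132b roughly 1212 MB/s, which shows how much of the 8b/10b penalty the newer encoding recovers.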

LTSSM State Machine

USB 3.x link management is handled by the Link Training and Status State Machine (LTSSM). Unlike USB 2.0 where enumeration is entirely software-driven, the SS link goes through hardware-level training before software enumeration begins:

  • Rx.Detect: Host and device detect each other's low-impedance receiver terminations on the SS pairs (a high-impedance receiver means no link partner is present)
  • Polling: Link partners exchange LFPS bursts, then TSEQ, TS1, and TS2 training sequences to agree on link parameters
  • U0 (Active): Link is operational and packets flow normally
  • U1/U2 (Low-Power Link States): Partial power savings while the link is idle; U1 recovers in microseconds, and U2 saves more power at the cost of a longer exit latency
  • U3 (Suspend): Deep power savings — equivalent to USB 2.0 suspend, takes milliseconds to recover
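The states listed above can be sketched as a small transition function. This is a heavily simplified illustration with hypothetical names: it adds the Recovery state, which the link passes through when leaving U1, U2, or U3, and it omits timeouts, error paths, and the remaining LTSSM states (SS.Disabled, SS.Inactive, Compliance, Loopback, Hot Reset).

```c
typedef enum {
    LTSSM_RX_DETECT,
    LTSSM_POLLING,
    LTSSM_U0,
    LTSSM_U1,
    LTSSM_U2,
    LTSSM_U3,
    LTSSM_RECOVERY
} ltssm_state_t;

typedef enum {
    EV_TERMINATION_DETECTED, /* far-end receiver termination seen      */
    EV_TRAINING_DONE,        /* training sequence handshake completed  */
    EV_ENTER_U1,             /* link partners agreed to enter U1       */
    EV_ENTER_U2,             /* link partners agreed to enter U2       */
    EV_ENTER_U3,             /* link partners agreed to enter U3       */
    EV_WAKE                  /* exit handshake from a low-power state  */
} ltssm_event_t;

/* Return the next state; unrecognised events leave the state unchanged. */
static ltssm_state_t ltssm_next(ltssm_state_t state, ltssm_event_t ev)
{
    switch (state) {
    case LTSSM_RX_DETECT:
        return (ev == EV_TERMINATION_DETECTED) ? LTSSM_POLLING : state;
    case LTSSM_POLLING:
        return (ev == EV_TRAINING_DONE) ? LTSSM_U0 : state;
    case LTSSM_U0:
        if (ev == EV_ENTER_U1) return LTSSM_U1;
        if (ev == EV_ENTER_U2) return LTSSM_U2;
        if (ev == EV_ENTER_U3) return LTSSM_U3;
        return state;
    case LTSSM_U1:
    case LTSSM_U2:
    case LTSSM_U3:
        /* Low-power exit retrains the link through Recovery. */
        return (ev == EV_WAKE) ? LTSSM_RECOVERY : state;
    case LTSSM_RECOVERY:
        return (ev == EV_TRAINING_DONE) ? LTSSM_U0 : state;
    }
    return state;
}
```

In real silicon this machine lives entirely in the PHY and link layer hardware; a model like this is mainly useful for interpreting link-state traces from a protocol analyser.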

Why Most Embedded MCUs Still Use USB 2.0

USB 3.x SuperSpeed is conspicuously absent from the majority of embedded microcontrollers. This is not an oversight — it reflects genuine engineering constraints:

  • Analog complexity: 5 Gbps serial links require precision analog front ends (CDR circuits, equalisation) that consume significant die area and power, making them impractical for general-purpose MCUs.
  • PCB requirements: SuperSpeed pairs require controlled differential impedance (90Ω), via stubs removed, with strict length matching — feasible on a 4-layer PCB but challenging at MCU prototype scale.
  • Throughput ceiling rarely hit: The vast majority of embedded USB use cases (serial, HID, MSC with SD card) never saturate USB 2.0 High Speed's 40 MB/s practical limit.
  • Exceptions: USB 3.x is necessary for 4K UVC (USB video) streaming, high-speed data acquisition at >200 MB/s, and NVMe storage. Parts that offer USB 3.x, such as NXP i.MX 8 and Rockchip RK3568, are application processors rather than microcontrollers; the alternative is to pair the MCU with a dedicated SuperSpeed controller or PHY.

Exercises

Exercise 1 Beginner

Activate STM32 System Bootloader via BOOT0

Using an STM32F4 or STM32G0 development board: (a) short the BOOT0 pin to VDD (3.3V) and reset the board; (b) run dfu-util -l on your host machine and confirm the "STM Device in DFU Mode" appears with VID=0483, PID=DF11; (c) note the alternate interface strings and decode the memory map format; (d) use dfu-util -a 0 -D blink.bin -s 0x08000000:leave to flash a known-good blink firmware; (e) remove the BOOT0 short, reset the board, and confirm the blink firmware runs. Document the exact command sequence and timing.

DFU Bootloader STM32 dfu-util
Exercise 2 Intermediate

Implement USB Suspend with Stop Mode Entry

Starting from a working TinyUSB CDC device on an STM32 board: (a) implement tud_suspend_cb() that enters STM32 Stop mode; (b) implement tud_resume_cb() that re-initialises the system clock after Stop mode; (c) measure the VBUS current draw before suspend, during Stop mode suspend, and after resume using a USB power meter or bench supply with current measurement; (d) verify the device resumes correctly after the PC wakes from sleep (which generates resume signalling), ideally without a full re-enumeration, and that the CDC port still works; (e) implement remote wakeup: pressing a button while the PC is asleep should wake the host if permission was granted via SET_FEATURE(DEVICE_REMOTE_WAKEUP).

USB Suspend Low Power Remote Wakeup Stop Mode
Exercise 3 Advanced

Build a Dual-Mode DFU Bootloader + Application

Design and implement a complete dual-boot system on an STM32F4: (a) write a minimal DFU bootloader that occupies the first 32 KB of flash (0x08000000–0x08007FFF), implementing the TinyUSB DFU class with wTransferSize = 2048; (b) place the application starting at 0x08008000, updating SCB->VTOR in the application startup; (c) implement the boot decision logic — if a GPIO button is held at reset, stay in DFU mode; otherwise validate the application CRC32 and jump; (d) add a magic-value mechanism so the application can trigger DFU update programmatically (write magic to a specific RAM address, then reset); (e) test the full cycle: build application, DFU download via dfu-util, verify application runs, trigger DFU via magic value from running application, DFU download again. Measure total DFU transfer time for a 512 KB image.
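As a starting point for step (c) and step (d), the boot decision can be written as a pure function that is easy to test on a host PC before it goes anywhere near flash. The sketch below is one possible shape, not a reference solution: `DFU_MAGIC`, `boot_decide()`, and the idea of storing the expected CRC alongside the image are assumptions, and the CRC is a plain software CRC-32 (reflected polynomial 0xEDB88320) rather than the STM32 hardware CRC unit, whose default configuration produces different values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DFU_MAGIC 0xB00710ADu  /* hypothetical marker the application leaves in RAM */

typedef enum { BOOT_RUN_APP, BOOT_ENTER_DFU } boot_action_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, same result as zlib). */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the LSB was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Stay in DFU if the button is held, the application requested an
 * update, or the application image fails its CRC check. */
static boot_action_t boot_decide(bool button_held, uint32_t magic_word,
                                 const uint8_t *app, size_t app_len,
                                 uint32_t expected_crc)
{
    if (button_held) {
        return BOOT_ENTER_DFU;
    }
    if (magic_word == DFU_MAGIC) {
        return BOOT_ENTER_DFU;
    }
    if (crc32_calc(app, app_len) != expected_crc) {
        return BOOT_ENTER_DFU;
    }
    return BOOT_RUN_APP;
}
```

On the target, the bootloader would clear the magic word after reading it so the next reset boots normally, and the actual jump (loading the stack pointer and reset vector from 0x08008000) is deliberately left out because it is hardware-specific.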

DFU Bootloader TinyUSB Flash Programming Dual-Boot CRC Validation

Advanced USB Configuration Generator

Use this tool to document your advanced USB configuration — target MCU, advanced feature selection, isochronous transfer type, remote wakeup requirement, and implementation notes. Download as Word, Excel, PDF, or PPTX for project documentation and design review.

Conclusion & Next Steps

Part 12 has taken you through nine advanced USB topics that separate intermediate USB developers from senior engineers:

  • USB Audio Class 2.0 requires High Speed USB, a carefully constructed audio function topology (clock source, input terminal, feature unit, output terminal), alternate settings for bandwidth management, and feedback endpoints for asynchronous clock synchronisation.
  • Isochronous transfers provide guaranteed bandwidth at the cost of no retransmission — the correct choice for real-time audio and video where latency matters more than perfect reliability. High-bandwidth HS isochronous can deliver up to 3072 bytes per 125 µs microframe.
  • DFU bootloaders enable field firmware updates over the existing USB cable. The DFU state machine (appIDLE → dfuIDLE → dfuDNLOAD-IDLE → dfuMANIFEST) must be implemented precisely. STM32's factory DFU bootloader in system memory is ready to use without any custom code.
  • OTG host mode allows embedded MCUs to act as USB hosts, enumerating keyboards, flash drives, and CDC devices through TinyUSB's host-mode drivers (tuh_task(), tuh_cdc_*, tuh_hid_*).
  • Hub support extends embedded host flexibility. Transaction translators bridge High Speed hosts to Full Speed devices. TinyUSB handles split transactions automatically with CFG_TUH_HUB = 1.
  • Suspend support is mandatory for USB compliance; remote wakeup is optional but lets a device wake the host from sleep. Proper implementation cuts bus current from tens of milliamps to below the 2.5 mA suspend limit.
  • USB PD over USB-C CC pins enables up to 100W power negotiation (240W with Extended Power Range) using chips like the FUSB302 or the STM32G0's built-in UCPD peripheral, coexisting seamlessly with USB 2.0 data.
  • USB 3.x SuperSpeed exists in application processors but in very few embedded MCUs due to analog complexity. Most embedded designs never need it, but understanding LTSSM and when you do need it (4K UVC, >200 MB/s acquisition) prevents architecture mistakes.

Next in the Series

In Part 13: USB Performance Optimization, we focus entirely on maximising USB throughput. We will calculate the theoretical limits of Full Speed and High Speed bulk transfers, implement double buffering and DMA for zero-copy transfers, tune endpoint buffer sizes, measure real throughput with Python benchmarks, optimise the TX and RX data paths for CDC and MSC, and identify bottlenecks using GPIO timing and the DWT cycle counter.
