CMSIS Part 6: CMSIS-DSP — Filters, FFT & Math Functions — CMSIS Mastery Series Part 6

                        
                        Series Progress: This is Part 6 of our 20-part CMSIS Mastery Series. Parts 1–5 covered the ecosystem, CMSIS-Core, startup code, RTOS threads, and RTOS IPC. Now we enter the signal processing domain with CMSIS-DSP.
                    

DSP Fundamentals

Before writing a single line of CMSIS-DSP code, you need a firm grasp of the three concepts that underpin all digital signal processing on microcontrollers: the sampling theorem, quantisation noise, and the discrete-time system model. Getting these right means your filters will work. Getting them wrong means wasted silicon and firmware that silently produces garbage.

The Nyquist-Shannon sampling theorem states that to digitise an analogue signal without aliasing you must sample at least twice the highest frequency present in the signal. An audio microphone capturing voice (bandwidth 3.4 kHz) needs a sample rate of at least 6.8 kHz; in practice 8 kHz is the telephony standard. An accelerometer for vibration analysis with content up to 5 kHz needs at least 10 kHz sampling. Miss this requirement and high-frequency content folds back into the baseband, appearing as phantom low-frequency signals that no software filter can remove.

                        
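                        Anti-Aliasing Filter: Always pair your ADC with an analogue anti-aliasing filter (a simple RC low-pass is the minimum) before sampling. Because a first-order RC filter rolls off at only 20 dB per decade, place its cut-off well below fs/2 so that content above fs/2 is already attenuated when it reaches the ADC. No digital filter can undo aliasing after the fact; it must be prevented in hardware.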
                    

Quantisation noise arises because a finite-bit ADC rounds the true analogue value to the nearest representable level. A 12-bit ADC introduces quantisation noise with SNR ≈ 6.02N + 1.76 dB ≈ 74 dB. A 16-bit ADC gives ≈ 98 dB. For audio quality or precision measurements, the ADC bit depth sets the noise floor no amount of filtering can overcome.
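The SNR rule of thumb can be written as a one-line function. This is a minimal sketch of the ideal formula only (full-scale sine input, no other noise sources); the function name is an assumption for illustration.

```c
/* Ideal SNR in dB of an N-bit quantiser driven by a full-scale sine.
 * Real converters fall short of this because of thermal noise and
 * non-linearity. Hypothetical helper, not a CMSIS-DSP function. */
double ideal_adc_snr_db(unsigned bits)
{
    return 6.02 * (double)bits + 1.76;
}
```

Calling it with 12 and 16 reproduces the figures quoted above, about 74 dB and 98 dB.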

Fixed-point vs floating-point is the constant trade-off in MCU DSP. Cortex-M4 and M7 cores with the optional FPU execute single-precision float operations in hardware; on the M4 this ranges from 1 cycle for an add or multiply to 14 cycles for a divide or square root. Cortex-M0/M3 cores have no FPU and emulate float arithmetic in software, which costs tens of cycles or more per operation. CMSIS-DSP provides float32 and fixed-point (Q7, Q15, Q31) variants of most algorithms, letting you choose the right format for your core and precision requirements.

CMSIS-DSP Library Architecture

CMSIS-DSP is a compiled library — not header-only like CMSIS-Core. It ships as a pre-built static library (libarm_cortexM4lf_math.a for Cortex-M4 with FPU, for example) and as source that you can compile yourself with the exact flags matching your target. The single include is arm_math.h, which pulls in type definitions, function declarations, and the instance structs that CMSIS-DSP algorithms use to hold their persistent state.

Category	Example Functions	Approx. Function Count
Basic Math	`arm_add_f32`, `arm_mult_q15`, `arm_scale_f32`	~30
Complex Math	`arm_cmplx_mult_cmplx_f32`, `arm_cmplx_mag_f32`	~15
Filtering	`arm_fir_f32`, `arm_biquad_cascade_df2T_f32`, `arm_lms_f32`	~40
Transforms	`arm_rfft_fast_f32`, `arm_cfft_f32`, `arm_dct4_f32`	~20
Statistics	`arm_mean_f32`, `arm_rms_f32`, `arm_var_f32`, `arm_max_f32`	~25
Matrix Operations	`arm_mat_mult_f32`, `arm_mat_inverse_f32`	~20
SVM / Bayes	`arm_svm_linear_predict_f32`, `arm_gaussian_naive_bayes_predict_f32`	~10

The instance struct model is central to CMSIS-DSP design. Stateful algorithms (FIR, IIR, FFT) keep their internal state — delay line, twiddle factors, coefficient array — in a struct that you allocate in your application and pass to every function call. This eliminates hidden global state and makes it trivial to run multiple independent filter instances simultaneously, each with its own state buffer.
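The pattern can be shown without the library. The sketch below uses a made-up one-pole smoother (smooth_instance_f32 and its functions are hypothetical names, not CMSIS-DSP APIs) that follows the same init-then-process shape with caller-owned state.

```c
#include <stddef.h>

/* Hypothetical one-pole smoother used only to illustrate the pattern. */
typedef struct {
    float alpha;   /* smoothing factor, 0 < alpha <= 1      */
    float state;   /* previous output, owned by the caller  */
} smooth_instance_f32;

/* One-time setup: store the parameter and clear the state. */
void smooth_init_f32(smooth_instance_f32 *s, float alpha)
{
    s->alpha = alpha;
    s->state = 0.0f;
}

/* Process a block; all persistent data lives in *s, none in globals. */
void smooth_f32(smooth_instance_f32 *s, const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        s->state += s->alpha * (src[i] - s->state);
        dst[i] = s->state;
    }
}
```

Two instances initialised with different alpha values can process the same input buffer without interfering with each other, which is the property the CMSIS-DSP instance structs provide for FIR, IIR, and FFT.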

# Add CMSIS-DSP to a CMake project using the CMSIS-DSP source tree
# (CMSIS-DSP 1.15+ supports CMake natively via add_subdirectory)

# In CMakeLists.txt:
# set(DISABLEFLOAT16 ON)   # disable fp16 if not needed
# add_subdirectory(CMSIS-DSP/Source CMSISDSPBinary)
# target_link_libraries(my_firmware PRIVATE CMSISDSP)
# target_compile_definitions(my_firmware PRIVATE ARM_MATH_CM4 __FPU_PRESENT=1)

# Or link the pre-built library for Cortex-M4F hard-float ABI:
# target_link_libraries(my_firmware PRIVATE
#     ${CMSIS_DSP_LIB_DIR}/libarm_cortexM4lf_math.a)

# Verify the library exports the expected symbols
arm-none-eabi-nm libarm_cortexM4lf_math.a | grep arm_fir_init_f32

                        
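                        Compile Flag Requirement: Define ARM_MATH_CM4 (or the variant for your core) and __FPU_PRESENT=1 when targeting a core with an FPU, and build with matching compiler options (for GCC on a Cortex-M4F: -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard). The compiler options decide whether float arithmetic uses FPU instructions or software emulation; the macros tell the headers which core features exist so the optimised code paths can be selected. Recent CMSIS-DSP releases derive most of this from the device header, so check the documentation for the version you are using.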
                    

FIR Filters

A Finite Impulse Response (FIR) filter computes each output sample as a weighted sum of the current input sample and the previous N-1 input samples, where N is the number of taps (the filter order is N-1). FIR filters are inherently stable (no feedback), can achieve exactly linear phase when the coefficients are symmetric (vital for audio and communications), and map directly onto the multiply-accumulate instructions of the Cortex-M4/M7 DSP extension.
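As a reference point, the weighted sum can be written directly in scalar C. This is a sketch of the textbook direct form, not the CMSIS-DSP implementation: it keeps no state between calls and treats samples before the start of the block as zero.

```c
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k], k = 0..num_taps-1.
 * Samples before the start of x are treated as zero. Reference sketch only. */
void fir_reference(const float *h, size_t num_taps,
                   const float *x, float *y, size_t num_samples)
{
    for (size_t n = 0; n < num_samples; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < num_taps && k <= n; k++) {
            acc += h[k] * x[n - k];   /* one multiply-accumulate per tap */
        }
        y[n] = acc;
    }
}
```

Feeding a unit impulse through this function returns the coefficients themselves, which is a quick way to sanity-check any FIR implementation against a known-good one.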

The standard design workflow: (1) specify the filter in the frequency domain (passband, stopband, transition width, attenuation); (2) compute coefficients using a windowed sinc method (Hamming window for audio, Kaiser window for tighter specifications); (3) pass the coefficients and state buffer to arm_fir_init_f32(); (4) call arm_fir_f32() once per block of samples.

/* ── fir_audio_lowpass.c ─────────────────────────────────────────────────
 * 64-tap FIR low-pass filter for audio at 48 kHz sample rate.
 * Cut-off: 8 kHz. Designed with Hamming window (windowed sinc).
 * Coefficients generated by SciPy: scipy.signal.firwin(64, 8000/24000)
 * ──────────────────────────────────────────────────────────────────────── */
#include "arm_math.h"

#define BLOCK_SIZE  64U   /* samples processed per call — match DMA buffer  */
#define NUM_TAPS    64U   /* filter order + 1                                 */

/* FIR coefficients (symmetric, Hamming-windowed sinc, fc = 8 kHz @ 48 kHz) */
static const float32_t g_fir_coeffs[NUM_TAPS] = {
    /* Illustrative placeholder values: a 40-tap symmetric kernel zero-padded to 64.
     * They do not sum to 1 (unity DC gain); regenerate all 64 taps offline before use. */
     0.00000f,  0.00019f, -0.00048f, -0.00063f,  0.00000f,  0.00159f,
     0.00247f,  0.00000f, -0.00489f, -0.00647f,  0.00000f,  0.01103f,
     0.01371f,  0.00000f, -0.02221f, -0.02756f,  0.00000f,  0.05011f,
     0.07568f,  0.09003f,  0.09003f,  0.07568f,  0.05011f,  0.00000f,
    -0.02756f, -0.02221f,  0.00000f,  0.01371f,  0.01103f,  0.00000f,
    -0.00647f, -0.00489f,  0.00000f,  0.00247f,  0.00159f,  0.00000f,
    -0.00063f, -0.00048f,  0.00019f,  0.00000f,  0.00000f,  0.00000f,
     0.00000f,  0.00000f,  0.00000f,  0.00000f,  0.00000f,  0.00000f,
     0.00000f,  0.00000f,  0.00000f,  0.00000f,  0.00000f,  0.00000f,
     0.00000f,  0.00000f,  0.00000f,  0.00000f,  0.00000f,  0.00000f,
     0.00000f,  0.00000f,  0.00000f,  0.00000f
};

/* State buffer: NUM_TAPS + BLOCK_SIZE - 1 elements */
static float32_t g_fir_state[NUM_TAPS + BLOCK_SIZE - 1U];

/* Instance struct — holds pointers to coefficients and state */
static arm_fir_instance_f32 g_fir;

/* ── One-time initialisation ─────────────────────────────────────────── */
void fir_init(void)
{
    arm_fir_init_f32(
        &g_fir,         /* instance struct (persistent state)  */
        NUM_TAPS,       /* number of filter taps               */
        g_fir_coeffs,   /* pointer to coefficient array        */
        g_fir_state,    /* pointer to state buffer             */
        BLOCK_SIZE);    /* block size for block processing     */
}

/* ── Called every DMA interrupt or RTOS block tick ──────────────────── */
void fir_process(float32_t *p_input, float32_t *p_output)
{
    /* Process BLOCK_SIZE samples in a single block call.
     * The float32 kernel gains its speed from loop unrolling and the FPU;
     * the packed SIMD MAC instructions apply to the Q15/Q7 variants.   */
    arm_fir_f32(&g_fir, p_input, p_output, BLOCK_SIZE);
}

                        
                        State Buffer Sizing: The state buffer must hold at least NUM_TAPS + BLOCK_SIZE - 1 float32 elements. An undersized buffer is a silent bug: the filter writes past the end of the array and corrupts adjacent memory with no immediate fault. Always derive the size at compile time from the same macros, as in the listing above.
                    

IIR Biquad Filters

Where FIR filters require many taps to achieve steep roll-off, Infinite Impulse Response (IIR) filters achieve the same response with far fewer coefficients by using feedback. The trade-off: IIR filters can be unstable if poorly implemented, and they introduce non-linear phase shift. For most sensor and audio applications these are acceptable trade-offs — a 4-stage biquad cascade achieves an 8th-order Butterworth response with only 20 coefficients.

CMSIS-DSP implements the biquad in Direct Form II Transposed (arm_biquad_cascade_df2T_f32), which has superior numerical properties compared to Direct Form I — it requires fewer delay elements and is less prone to coefficient quantisation noise. Each stage has five coefficients: b0, b1, b2 (feedforward), a1, a2 (feedback), stored in the order [b0, b1, b2, a1, a2].

/* ── iir_biquad_dc_removal.c ─────────────────────────────────────────────
 * Two-stage biquad cascade for DC removal from a sensor signal.
 * Stage 1: High-pass 1 Hz (removes DC offset and slow drift).
 * Stage 2: Notch at 50 Hz (removes mains interference).
 * Coefficients for 1 kHz sample rate, computed with scipy.signal.
 * ──────────────────────────────────────────────────────────────────────── */
#include "arm_math.h"

#define NUM_STAGES  2U   /* cascade of 2 biquad sections */

/* Coefficients in CMSIS-DSP order: [b0, b1, b2, a1, a2] per stage.
 * CMSIS-DSP adds the feedback terms, so a1 and a2 are the negated
 * SciPy denominator values: a1 = -a[1], a2 = -a[2] (with a[0] = 1). */
static float32_t g_biquad_coeffs[5U * NUM_STAGES] = {
    /* Stage 1: 1 Hz high-pass (2nd-order Butterworth, fs=1000 Hz) */
     0.995567f, -1.991134f,  0.995567f,  /* b0, b1, b2 */
     1.991114f, -0.991154f,              /* a1, a2 (negated SciPy a[1], a[2]) */
    /* Stage 2: 50 Hz notch (fs=1000 Hz); b1 = -2*cos(2*pi*50/1000)*b0 */
     0.97204f, -1.84893f,  0.97204f,  /* b0, b1, b2 */
     1.84893f, -0.94408f              /* a1, a2 */
};

/* State buffer: 2 elements per stage */
static float32_t g_biquad_state[2U * NUM_STAGES];

/* Instance struct */
static arm_biquad_cascade_df2T_instance_f32 g_biquad;

void biquad_init(void)
{
    arm_biquad_cascade_df2T_init_f32(
        &g_biquad,
        NUM_STAGES,
        g_biquad_coeffs,
        g_biquad_state);
}

void biquad_process(float32_t *p_src, float32_t *p_dst, uint32_t block_size)
{
    arm_biquad_cascade_df2T_f32(&g_biquad, p_src, p_dst, block_size);
}

                        
                        Coefficient Sign Convention: CMSIS-DSP computes y[n] = b0·x[n] + b1·x[n-1] + b2·x[n-2] + a1·y[n-1] + a2·y[n-2], adding the feedback terms. SciPy and MATLAB return denominator coefficients for the form a[0]·y[n] = ... - a[1]·y[n-1] - a[2]·y[n-2], with a[0] = 1. Pass a1 = -a[1] and a2 = -a[2] to CMSIS-DSP and drop a[0]; if you copy the values unchanged, the filter has the wrong response and is usually unstable.
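The conversion and the recurrence it feeds can be sketched in portable C. Both functions below are illustrative helpers with made-up names, not CMSIS-DSP APIs; the step function is a scalar model of a Direct Form II Transposed section that adds its feedback terms, matching the coefficient layout described above.

```c
/* Convert one SciPy-style section (b[0..2], a[0..2]) to the layout
 * {b0, b1, b2, a1, a2}: normalise by a[0], negate a[1] and a[2]. */
void sos_to_cmsis(const double b[3], const double a[3], float out[5])
{
    out[0] = (float)(b[0] / a[0]);
    out[1] = (float)(b[1] / a[0]);
    out[2] = (float)(b[2] / a[0]);
    out[3] = (float)(-a[1] / a[0]);   /* sign flipped */
    out[4] = (float)(-a[2] / a[0]);   /* sign flipped */
}

/* One sample through a Direct Form II Transposed biquad that ADDS the
 * feedback terms. state[] holds the two delay elements d1 and d2. */
float biquad_df2t_step(const float c[5], float state[2], float x)
{
    const float y = c[0] * x + state[0];
    state[0] = c[1] * x + c[3] * y + state[1];
    state[1] = c[2] * x + c[4] * y;
    return y;
}
```

A useful check after conversion is to run a constant input through the section: a notch or low-pass stage should settle to the input value, and a high-pass stage should settle to zero. A sign error typically makes the output grow without bound instead.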
                    

FFT & Spectral Analysis

The Fast Fourier Transform converts a block of time-domain samples into a complex frequency-domain spectrum, from which you compute magnitudes. For embedded systems, the most common application is identifying dominant frequencies in a vibration, acoustic, or physiological signal without knowing in advance what those frequencies are. CMSIS-DSP provides arm_rfft_fast_f32() for real-valued input at roughly half the computational cost of a complex FFT of the same length.

For N real input samples, arm_rfft_fast_f32() writes N floats: N/2 interleaved (real, imaginary) pairs. Bin k corresponds to frequency k*fs/N Hz, so bin 0 is the DC component and bin 1 is fs/N Hz. Because the DC and Nyquist (bin N/2) terms are purely real, the library packs both into the first pair: element 0 holds Re[0] and element 1 holds Re[N/2]. Bins 1 to N/2-1 follow as ordinary complex pairs. Use arm_cmplx_mag_f32() to compute magnitudes and arm_max_f32() to find the peak bin.

/* ── fft_vibration_analysis.c ────────────────────────────────────────────
 * Real FFT for vibration spectrum analysis on accelerometer data.
 * FFT size: 1024 samples. Sample rate: 10 kHz.
 * Frequency resolution: 10000/1024 ≈ 9.77 Hz per bin.
 * ──────────────────────────────────────────────────────────────────────── */
#include "arm_math.h"
#include <stdint.h>   /* uint32_t */

#define FFT_SIZE    1024U
#define SAMPLE_RATE 10000.0f   /* Hz */

/* Hann window coefficients (computed offline, stored in flash) */
static const float32_t g_hann_window[FFT_SIZE] = {
    /* w[n] = 0.5 * (1 - cos(2*pi*n/(N-1))) for n = 0..N-1 */
    /* Abbreviated — full 1024-element array in production code */
    0.0f, /* [0]   */
    /* ... */
};

/* Input buffer and FFT output buffer */
static float32_t g_input_buf[FFT_SIZE];    /* windowed time-domain samples  */
static float32_t g_fft_output[FFT_SIZE];   /* complex output (N floats)     */
static float32_t g_magnitude[FFT_SIZE/2U]; /* magnitude spectrum            */

/* RFFT instance struct */
static arm_rfft_fast_instance_f32 g_rfft;

void fft_init(void)
{
    /* Initialise for FFT_SIZE-point transform */
    arm_rfft_fast_init_f32(&g_rfft, FFT_SIZE);
}

float32_t fft_find_peak_frequency(const float32_t *p_accel_samples)
{
    /* Step 1: Apply Hann window to reduce spectral leakage */
    arm_mult_f32(p_accel_samples, g_hann_window, g_input_buf, FFT_SIZE);

    /* Step 2: Compute real FFT (forward transform, ifftFlag = 0) */
    arm_rfft_fast_f32(&g_rfft, g_input_buf, g_fft_output,
                      0U /* ifftFlag=0 for forward FFT */);

    /* Step 3: Compute magnitude of each complex bin
     * g_fft_output layout: [Re0, Re_N/2, Re1, Im1, Re2, Im2, ...] */
    /* Skip bin 0 (DC) and bin N/2 (Nyquist) — start from index 2 */
    arm_cmplx_mag_f32(&g_fft_output[2], &g_magnitude[1],
                      (FFT_SIZE / 2U) - 1U);

    /* Step 4: Find the bin with the maximum magnitude */
    float32_t max_val;
    uint32_t  max_idx;
    arm_max_f32(&g_magnitude[1], (FFT_SIZE / 2U) - 1U,
                &max_val, &max_idx);
    max_idx += 1U; /* adjust for skipped DC bin */

    /* Step 5: Convert bin index to frequency in Hz */
    float32_t peak_freq_hz = (float32_t)max_idx * SAMPLE_RATE / (float32_t)FFT_SIZE;

    return peak_freq_hz;
}

Performance Optimisation

CMSIS-DSP's hand-optimised fixed-point code uses the SIMD instructions of the Armv7E-M DSP extension through intrinsics such as __SMLAD and __PKHBT, which operate on two 16-bit values packed into one 32-bit register. On Cortex-M4/M7 this typically gives a 2–4x speedup over equivalent scalar C for Q15 data. On Cortex-M55 and M85 with the M-Profile Vector Extension (MVE / Helium), CMSIS-DSP 1.10+ uses 128-bit vectors that hold eight Q15 values, for up to 8x throughput on Q15 operations.
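To make the packing idea concrete, the sketch below models a dual 16-bit multiply-accumulate in portable C. It is only a model for reasoning on a host PC: the function names are made up, and on the target the equivalent work is done by a single instruction with its own overflow behaviour.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (low half first). */
uint32_t pack_q15_pair(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Portable model of a dual 16-bit multiply-accumulate:
 * acc + x.lo * y.lo + x.hi * y.hi. Illustration only. */
int32_t dual_mac_model(uint32_t x, uint32_t y, int32_t acc)
{
    const int16_t x_lo = (int16_t)(x & 0xFFFFU);
    const int16_t x_hi = (int16_t)(x >> 16);
    const int16_t y_lo = (int16_t)(y & 0xFFFFU);
    const int16_t y_hi = (int16_t)(y >> 16);
    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}
```

A Q15 FIR inner loop built on this idea loads two samples and two coefficients per 32-bit read and retires two taps per multiply-accumulate, which is where the speedup over scalar code comes from.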

/* ── perf_comparison.c ───────────────────────────────────────────────────
 * Compare Q15 vs float32 FIR performance using DWT cycle counter.
 * Run on Cortex-M4F at 168 MHz (STM32F407).
 * ──────────────────────────────────────────────────────────────────────── */
#include "arm_math.h"
#include "stm32f4xx.h"  /* device header: brings in core_cm4.h (DWT, CoreDebug) */
#include <stdint.h>
#include <stdio.h>

#define TAPS       64U
#define BLOCK      256U

/* Enable DWT cycle counter (one-time setup) */
static void dwt_enable(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT  = 0U;
    DWT->CTRL   |= DWT_CTRL_CYCCNTENA_Msk;
}

static uint32_t dwt_cycles(void) { return DWT->CYCCNT; }

static float32_t  g_f32_coeffs[TAPS];
static float32_t  g_f32_state[TAPS + BLOCK - 1U];
static float32_t  g_f32_in[BLOCK], g_f32_out[BLOCK];
static arm_fir_instance_f32 g_f32_fir;

static q15_t g_q15_coeffs[TAPS];
static q15_t g_q15_state[TAPS + BLOCK - 1U];
static q15_t g_q15_in[BLOCK], g_q15_out[BLOCK];
static arm_fir_instance_q15 g_q15_fir;

void perf_compare(void)
{
    dwt_enable();

    arm_fir_init_f32(&g_f32_fir, TAPS, g_f32_coeffs, g_f32_state, BLOCK);
    arm_fir_init_q15(&g_q15_fir, TAPS, g_q15_coeffs, g_q15_state, BLOCK);

    /* Benchmark float32 FIR */
    uint32_t t0 = dwt_cycles();
    arm_fir_f32(&g_f32_fir, g_f32_in, g_f32_out, BLOCK);
    uint32_t f32_cycles = dwt_cycles() - t0;

    /* Benchmark Q15 FIR */
    t0 = dwt_cycles();
    arm_fir_fast_q15(&g_q15_fir, g_q15_in, g_q15_out, BLOCK);
    uint32_t q15_cycles = dwt_cycles() - t0;

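    /* Each call performs 64 x 256 = 16,384 multiply-accumulates, so expect
     * cycle counts in the tens of thousands on a Cortex-M4, not thousands.
     * The Q15 path is usually the faster of the two because SMLAD performs
     * two 16-bit MACs per instruction. Results vary with flash wait states
     * and compiler options; average several runs on your own hardware.   */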
    printf("float32: %lu cycles | Q15: %lu cycles\r\n",
           (unsigned long)f32_cycles, (unsigned long)q15_cycles);
}
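Two small details of cycle-counter arithmetic are worth spelling out: the subtraction stays correct across a counter wrap, and the result only becomes a time once divided by the core clock. The helpers below are a portable sketch with hypothetical names.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT readings. Unsigned subtraction gives
 * the right answer across a single 32-bit counter wrap. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to microseconds for a given core clock in Hz. */
double cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    return (double)cycles * 1.0e6 / (double)core_hz;
}
```

At 168 MHz the 32-bit counter wraps roughly every 25.6 seconds, so a single measured interval must be shorter than that for the subtraction to be meaningful.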

Format	Bit Width	Range	Precision	When to Use
Q7	8-bit	-1.0 to +0.992	~40 dB SNR	RAM-constrained M0, coarse classification
Q15	16-bit	-1.0 to +0.99997	~90 dB SNR	M4/M7 SIMD, audio processing, sensor fusion
Q31	32-bit	-1.0 to +1.0	~186 dB SNR	High precision without FPU on M3/M4
float32	32-bit IEEE 754	±3.4×10³⁸	~150 dB SNR	M4F/M7F with FPU, convenience, rapid prototyping

Exercises

Exercise 1 Beginner

Design and Implement a 50 Hz Notch Filter

Design a second-order IIR notch filter centred at 50 Hz for a 1 kHz sample rate using SciPy (scipy.signal.iirnotch(50, 30, 1000)). Extract the b and a coefficients, convert them to CMSIS-DSP DF2T format (remembering the sign convention for a1 and a2), and implement the filter using arm_biquad_cascade_df2T_f32(). Test with synthetic data: a 10 Hz sine wave corrupted with a 50 Hz sine wave. Verify that the output amplitude at 50 Hz is attenuated by at least 30 dB while the 10 Hz component is essentially unchanged. Plot the input and output spectra in Python using the captured data.

IIR Biquad Coefficient Conversion Notch Filter

Exercise 2 Intermediate

Peak-Frequency Detection Using RFFT and arm_max_f32

Implement a complete spectral analysis pipeline: capture 1024 samples from an ADC at a known sample rate, apply a Hann window using arm_mult_f32(), compute the real FFT with arm_rfft_fast_f32(), compute magnitudes with arm_cmplx_mag_f32(), and locate the peak frequency bin using arm_max_f32(). Test by driving the MCU ADC input with a function generator at three known frequencies (100 Hz, 500 Hz, 2 kHz). For each, verify the detected peak frequency is within ±(fs/N) of the true frequency. Explain why the resolution improves if you increase N from 512 to 1024.

arm_rfft_fast_f32 Windowing Spectral Analysis

Exercise 3 Advanced

Benchmark FIR Filter in Q15 vs float32 on Real Hardware

Implement a 64-tap low-pass FIR filter in both float32 and Q15 variants using arm_fir_f32() and arm_fir_fast_q15(). Use the DWT cycle counter (CoreDebug/DWT registers) to measure the exact clock cycles consumed by each variant for a 256-sample block. Run the benchmark at three optimisation levels: -O0, -O2, and -Os. Record results in a table. Convert Q15 coefficients from float32 using arm_float_to_q15() and verify output equivalence: both filters applied to the same input should produce outputs within 0.01% of each other (Q15 quantisation error). Discuss the trade-off between execution time, RAM usage, and numerical accuracy.

DWT Profiling Q15 vs float32 Optimisation Levels

Conclusion & Next Steps

In this article we have worked through the full CMSIS-DSP toolkit from fundamentals to implementation:

Sampling theory: the Nyquist theorem, aliasing prevention with analogue anti-aliasing filters, and quantisation noise set hard limits that no digital processing can overcome.
CMSIS-DSP architecture: instance structs keep state external, enabling multiple independent filter instances; building with the correct core and FPU options enables the optimised code paths.
FIR filters: arm_fir_init_f32 + arm_fir_f32 implement windowed-sinc designs in block mode; state buffer sizing (NUM_TAPS + BLOCK_SIZE − 1) is critical.
IIR biquad filters: arm_biquad_cascade_df2T_f32 achieves high-order responses with few coefficients; mind the CMSIS sign convention for feedback coefficients.
FFT: arm_rfft_fast_f32 + arm_cmplx_mag_f32 + arm_max_f32 form a complete spectral analysis pipeline; windowing is essential to suppress spectral leakage.
Performance: Q15 on Cortex-M4 with SIMD is typically 2–3x faster than float32; the DWT cycle counter is a precise, low-overhead way to measure it.

Next in the Series

In Part 7: CMSIS-Driver — UART, SPI & I2C, we shift focus to the peripheral abstraction layer: the ARM_DRIVER_xx struct pattern, asynchronous callback events, DMA-backed transfers, and how RTOS semaphores turn non-blocking drivers into clean blocking APIs for application code.
