
CMSIS Part 19: Embedded Software Architecture

March 31, 2026 Wasil Zafar 28 min read

Architecture is the difference between firmware that scales and firmware that collapses — layered design, event-driven FSMs, and component contracts that make embedded software maintainable across hardware revisions.

Table of Contents

  1. Layered Architecture
  2. Hardware Abstraction Layer Design
  3. Event-Driven Design
  4. Hierarchical State Machines
  5. Component-Based Design
  6. Publish-Subscribe Event Bus
  7. Exercises
  8. Architecture Design Canvas
  9. Conclusion & Next Steps
Series Context: This is Part 19 of the 20-part CMSIS Mastery Series — the penultimate article. Everything built so far (CMSIS-Core, RTOS2, DSP, drivers, security, testing, optimisation) is raw material. This part shows you how to organise it into firmware that survives hardware revisions, team growth, and requirements change.

CMSIS Mastery Series

Your 20-step learning path • Currently on Step 19
  1. Overview & ARM Cortex-M Ecosystem: CMSIS layers, Cortex-M families, memory map, toolchains
  2. CMSIS-Core: Registers, NVIC & SysTick: core_cmX.h, register access, interrupt controller, SysTick timer
  3. Startup Code, Linker Scripts & Vector Table: Reset handler, BSS init, scatter files, boot process
  4. CMSIS-RTOS2: Threads, Mutexes & Semaphores: Thread management, synchronization primitives, scheduling
  5. CMSIS-RTOS2: Message Queues & Event Flags: Inter-thread comms, ISR-to-thread, real-time design patterns
  6. CMSIS-DSP: Filters, FFT & Math Functions: FIR/IIR filters, FFT, SIMD optimizations
  7. CMSIS-Driver: UART, SPI & I2C: Driver abstraction layer, callbacks, DMA integration
  8. CMSIS-Pack & Software Components: Pack files, device support, dependency management
  9. Debugging with CMSIS-DAP & CoreSight: SWD/JTAG, HardFault analysis, ITM tracing
  10. Portable Firmware: Multi-Vendor Projects: HAL vs CMSIS, cross-platform BSPs, reusable libraries
  11. Interrupts, Concurrency & Real-Time Constraints: Interrupt latency, critical sections, lock-free programming
  12. Memory Management in Embedded Systems: Static vs dynamic, heap fragmentation, memory pools
  13. Low Power & Energy Optimization: Sleep modes, clock gating, tickless RTOS, power profiling
  14. DMA & High-Performance Data Handling: DMA basics, peripheral transfers, zero-copy techniques
  15. Security: ARMv8-M & TrustZone: Secure/non-secure worlds, secure boot, firmware protection
  16. Bootloaders & Firmware Updates: OTA updates, dual-bank flash, fail-safe strategies
  17. Testing & Validation: Unity/Ceedling unit tests, HIL testing, integration testing
  18. Performance Optimization: Compiler flags, inline assembly, cache (M7/M33), profiling
  19. Embedded Software Architecture (you are here): Layered design, event-driven, state machines, component-based
  20. Tooling & Workflow (Professional Level): CI/CD for embedded, MISRA, static analysis, Doxygen

Layered Architecture

The most important architectural rule in embedded firmware is directional dependency: upper layers call lower layers, never the reverse. An application layer calls a service layer. A service layer calls a driver layer. A driver layer calls a HAL layer. The HAL calls CMSIS. CMSIS accesses the hardware registers. No layer reaches across or skips a layer.

This rule makes firmware testable (mock the layer below in unit tests), portable (swap the HAL layer to retarget to a new MCU), and maintainable (change the application layer without touching drivers). Violating it is how firmware accumulates technical debt that eventually makes the codebase unmaintainable.

Pattern         | Testability | Complexity                    | Portability           | RTOS Suitability
Monolithic      | Very Low    | Low initially, high over time | Very Low              | Poor
Layered         | High        | Medium                        | High (swap HAL layer) | Good
Event-Driven    | High        | Medium-High                   | Medium                | Excellent
Component-Based | Very High   | High (upfront design)         | Very High             | Excellent

/**
 * Layered architecture in C — four-layer firmware structure.
 * Rule: each layer only calls functions from the layer directly below it.
 *
 * Layer 4: Application   — business logic, use-case orchestration
 * Layer 3: Service       — reusable services (logging, config, comms manager)
 * Layer 2: Driver        — peripheral-level abstraction (UART driver, SPI driver)
 * Layer 1: HAL           — vendor-specific register access (wraps CMSIS device headers)
 */

#include <stdint.h>
#include <string.h>   /* strlen, used by the logging service */

/* ── Layer 1: HAL (Hardware Abstraction Layer) ── */
/* hal_uart.h */
typedef struct {
    void (*const init)(uint32_t baud);
    void (*const send_byte)(uint8_t byte);
    int  (*const recv_byte)(uint8_t *byte, uint32_t timeout_ms);
} hal_uart_t;

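/* hal_uart_stm32f4.c: STM32F4-specific implementation */
/* send_byte/recv_byte live in the same file; their bodies are omitted here */
static void hal_uart_stm32f4_send_byte(uint8_t byte);
static int  hal_uart_stm32f4_recv_byte(uint8_t *byte, uint32_t timeout_ms);

static void hal_uart_stm32f4_init(uint32_t baud) {
    /* Directly accesses CMSIS device header registers.
     * GPIO pin configuration for TX/RX is omitted for brevity. */
    RCC->APB1ENR |= RCC_APB1ENR_USART2EN;
    /* USART2 is clocked from APB1 (PCLK1). This divisor is only correct when
     * PCLK1 equals SystemCoreClock (APB1 prescaler of 1) with 16x oversampling;
     * otherwise derive it from the actual PCLK1 frequency. */
    USART2->BRR   = SystemCoreClock / baud;
    USART2->CR1   = USART_CR1_TE | USART_CR1_RE | USART_CR1_UE;
}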

const hal_uart_t HAL_UART2 = {
    .init      = hal_uart_stm32f4_init,
    .send_byte = hal_uart_stm32f4_send_byte,
    .recv_byte = hal_uart_stm32f4_recv_byte,
};

/* ── Layer 2: Driver — UART framing, buffering ── */
/* uart_driver.h — calls only HAL functions */
typedef struct {
    const hal_uart_t *hal;
    ring_buffer_t     rx_buf;
    ring_buffer_t     tx_buf;
} uart_driver_t;

void uart_driver_init(uart_driver_t *drv, const hal_uart_t *hal, uint32_t baud) {
    drv->hal = hal;
    hal->init(baud);  /* Calls down to HAL — never up to Service */
}

int32_t uart_driver_write(uart_driver_t *drv, const uint8_t *data, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        drv->hal->send_byte(data[i]);
    }
    return (int32_t)len;
}

/* ── Layer 3: Service — logging service ── */
/* log_service.h — calls only Driver functions */
typedef struct {
    uart_driver_t *uart;
    uint8_t        level;
} log_service_t;

void log_service_write(log_service_t *svc, uint8_t level, const char *msg) {
    if (level < svc->level) { return; }   /* Drop messages below the configured level */
    uart_driver_write(svc->uart, (const uint8_t *)msg, (uint32_t)strlen(msg));
}

/* ── Layer 4: Application — never calls HAL or CMSIS directly ── */
void app_run(void) {
    log_service_write(&g_log, LOG_INFO, "System started\r\n");
    /* All hardware interaction through services and drivers */
}
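
To illustrate the testability benefit of the dependency rule, the following sketch injects a mock HAL into the UART driver and checks, entirely on the host, which bytes the driver sends. The mock (HAL_UART_MOCK and the mock_* functions) and the test function name are hypothetical and not part of the listing above. The driver is reduced to the two functions already shown, without the ring buffers.

```c
#include <stdint.h>
#include <stddef.h>

/* Same shape as the hal_uart_t interface in the layered listing. */
typedef struct {
    void (*const init)(uint32_t baud);
    void (*const send_byte)(uint8_t byte);
    int  (*const recv_byte)(uint8_t *byte, uint32_t timeout_ms);
} hal_uart_t;

/* Reduced driver: only the parts needed to show the injection. */
typedef struct {
    const hal_uart_t *hal;
} uart_driver_t;

void uart_driver_init(uart_driver_t *drv, const hal_uart_t *hal, uint32_t baud) {
    drv->hal = hal;
    hal->init(baud);
}

int32_t uart_driver_write(uart_driver_t *drv, const uint8_t *data, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        drv->hal->send_byte(data[i]);
    }
    return (int32_t)len;
}

/* ── Hypothetical mock HAL: records calls instead of touching registers ── */
#define MOCK_TX_CAPACITY 32U

static uint32_t s_mock_baud;
static uint8_t  s_mock_tx[MOCK_TX_CAPACITY];
static uint32_t s_mock_tx_len;

static void mock_init(uint32_t baud) {
    s_mock_baud   = baud;
    s_mock_tx_len = 0U;
}

static void mock_send_byte(uint8_t byte) {
    if (s_mock_tx_len < MOCK_TX_CAPACITY) {
        s_mock_tx[s_mock_tx_len++] = byte;
    }
}

static int mock_recv_byte(uint8_t *byte, uint32_t timeout_ms) {
    (void)byte;
    (void)timeout_ms;
    return -1;   /* The mock never receives anything */
}

static const hal_uart_t HAL_UART_MOCK = {
    .init      = mock_init,
    .send_byte = mock_send_byte,
    .recv_byte = mock_recv_byte,
};

/* Returns 0 when the driver forwarded exactly the bytes it was given. */
int test_uart_driver_write(void) {
    uart_driver_t drv;
    const uint8_t msg[] = { 'O', 'K' };

    uart_driver_init(&drv, &HAL_UART_MOCK, 115200U);
    int32_t written = uart_driver_write(&drv, msg, 2U);

    if (written != 2)           { return 1; }
    if (s_mock_baud != 115200U) { return 2; }
    if (s_mock_tx_len != 2U)    { return 3; }
    if (s_mock_tx[0] != 'O' || s_mock_tx[1] != 'K') { return 4; }
    return 0;
}
```

Because the driver only ever sees a const hal_uart_t pointer, the same driver source compiles unchanged for the target and for the host test build; only the struct handed to uart_driver_init() differs.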

Hardware Abstraction Layer Design

A well-designed HAL uses function pointer structs as interfaces — the C equivalent of virtual function tables. Each platform provides its own concrete implementation of the struct. The layers above hold a pointer to the interface and call through it, never knowing which platform they are running on. This is the same pattern CMSIS-Driver uses at the driver level.

CMSIS-Driver as Your HAL Model: Examine Driver_USART.h — it is exactly a HAL interface as function pointers in a struct (ARM_DRIVER_USART). When you design your own HAL, follow this pattern. The driver struct is allocated statically at link time (no heap), const-qualified to allow ROM placement, and injected via pointer into the layer above.

Event-Driven Design

Event-driven firmware replaces the classic super-loop (while(1) with sequential polling) with an event dispatcher: events are produced by ISRs or timers, placed in a queue, and consumed by state machines or task handlers. This decouples producers from consumers and enables the firmware to respond to multiple concurrent stimulus sources without tight polling.

In an RTOS context (CMSIS-RTOS2), each logical subsystem is a thread that blocks on a message queue or event flags. The RTOS scheduler becomes the event dispatcher. In a bare-metal context, you implement a lightweight event queue in the super-loop and dispatch events to handler functions — keeping interrupt service routines minimal.
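
A minimal sketch of the bare-metal variant might look like the following. It assumes a single producer (one ISR) and a single consumer (the super-loop); with several producers at different priorities, evq_post() would need a critical section. All names here (event_t, evq_post, evq_get, evq_dispatch_pending, the two demo events) are illustrative and not taken from CMSIS.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define EVQ_CAPACITY 8U   /* Must be a power of two for the index mask */

typedef struct {
    uint8_t  id;
    uint32_t param;
} event_t;

typedef struct {
    event_t  buf[EVQ_CAPACITY];
    uint32_t head;   /* Written only by the producer (ISR) */
    uint32_t tail;   /* Written only by the consumer (main loop) */
} event_queue_t;

/* volatile: both contexts must re-read the indices and slots every time */
static volatile event_queue_t s_queue;

/* Called from the ISR: enqueue and return immediately. */
bool evq_post(uint8_t id, uint32_t param) {
    uint32_t head = s_queue.head;
    if ((head - s_queue.tail) >= EVQ_CAPACITY) { return false; }  /* Full: drop */
    uint32_t slot = head & (EVQ_CAPACITY - 1U);
    s_queue.buf[slot].id    = id;
    s_queue.buf[slot].param = param;
    s_queue.head = head + 1U;   /* Publish only after the slot is written */
    return true;
}

/* Called from the main loop: dequeue one event if available. */
bool evq_get(event_t *out) {
    uint32_t tail = s_queue.tail;
    if (tail == s_queue.head) { return false; }                   /* Empty */
    uint32_t slot = tail & (EVQ_CAPACITY - 1U);
    out->id    = s_queue.buf[slot].id;
    out->param = s_queue.buf[slot].param;
    s_queue.tail = tail + 1U;
    return true;
}

/* ── Dispatch: one handler per event id ── */
typedef void (*event_handler_t)(uint32_t param);

enum { EVT_BUTTON = 0, EVT_TICK, EVT_ID_COUNT };

static uint32_t s_button_presses;
static uint32_t s_last_tick;

static void on_button(uint32_t param) { (void)param; s_button_presses++; }
static void on_tick(uint32_t param)   { s_last_tick = param; }

static const event_handler_t s_handlers[EVT_ID_COUNT] = {
    [EVT_BUTTON] = on_button,
    [EVT_TICK]   = on_tick,
};

/* One pass of the super-loop: drain the queue, return the number handled. */
uint32_t evq_dispatch_pending(void) {
    event_t  evt;
    uint32_t handled = 0U;
    while (evq_get(&evt)) {
        if (evt.id < EVT_ID_COUNT && s_handlers[evt.id] != NULL) {
            s_handlers[evt.id](evt.param);
            handled++;
        }
    }
    return handled;
}

/* Accessors so the demo state can be inspected from a test. */
uint32_t demo_button_presses(void) { return s_button_presses; }
uint32_t demo_last_tick(void)      { return s_last_tick; }
```

In firmware, the ISR would call evq_post() and the super-loop would call evq_dispatch_pending() on every iteration, so all handler code runs in thread context and the ISR stays a few instructions long.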

Hierarchical State Machines in C

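The finite state machine (FSM) is the fundamental building block of event-driven embedded firmware. Any subsystem with distinct operating modes (idle, initialising, running, fault, sleep) should be modelled as an FSM. In C, the cleanest implementation uses a state enum, an event enum, and a transition table: a function-pointer dispatch table that can be written to conform to MISRA C:2012. A hierarchical state machine (HSM) extends this by nesting states so that a sub-state inherits the transitions of its parent; the listing below is the flat, table-driven form.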

Feature            | FSM                          | HSM                                                  | Statechart (UML)
Transitions        | Flat, any state to any state | Hierarchical (sub-states inherit parent transitions) | Full UML semantics
History States     | No                           | Yes (shallow and deep history)                       | Yes
Entry/Exit Actions | Manual                       | Built-in per state                                   | Built-in
Code Complexity    | Low                          | Medium                                               | High (requires codegen tool)

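/**
 * Table-driven finite state machine in C (flat FSM, no state nesting).
 * Example: Connection Manager FSM.
 *
 * States: DISCONNECTED -> CONNECTING -> CONNECTED -> DISCONNECTING
 * Events: EVT_CONNECT, EVT_CONNECTED, EVT_DISCONNECT, EVT_TIMEOUT, EVT_ERROR
 *
 * network_start_connect(), network_close(), log_error() and
 * event_bus_publish() are assumed to be declared elsewhere.
 */
#include <stdint.h>
#include <stddef.h>   /* NULL */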

/* ── State and event enumerations ── */
typedef enum {
    STATE_DISCONNECTED  = 0,
    STATE_CONNECTING,
    STATE_CONNECTED,
    STATE_DISCONNECTING,
    STATE_COUNT
} conn_state_t;

typedef enum {
    EVT_CONNECT         = 0,
    EVT_CONNECTED,
    EVT_DISCONNECT,
    EVT_TIMEOUT,
    EVT_ERROR,
    EVT_COUNT
} conn_event_t;

/* ── FSM context ── */
typedef struct {
    conn_state_t state;
    uint32_t     retry_count;
    uint32_t     timeout_ms;
} conn_fsm_t;

/* ── Action function type ── */
typedef conn_state_t (*fsm_action_fn)(conn_fsm_t *ctx);

/* ── Transition table entry ── */
typedef struct {
    conn_state_t  next_state;
    fsm_action_fn action;
} fsm_transition_t;

/* ── Action implementations ── */
static conn_state_t action_start_connect(conn_fsm_t *ctx) {
    ctx->retry_count = 0;
    network_start_connect();           /* Side effect: initiate connection */
    return STATE_CONNECTING;
}

static conn_state_t action_connected(conn_fsm_t *ctx) {
    (void)ctx;
    event_bus_publish(EVT_BUS_NETWORK_UP, NULL);
    return STATE_CONNECTED;
}

static conn_state_t action_disconnect(conn_fsm_t *ctx) {
    (void)ctx;
    network_close();
    return STATE_DISCONNECTED;
}

static conn_state_t action_retry(conn_fsm_t *ctx) {
    ctx->retry_count++;
    if (ctx->retry_count >= 3U) {
        log_error("Max retries — giving up");
        return STATE_DISCONNECTED;
    }
    network_start_connect();
    return STATE_CONNECTING;
}

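/* ── Transition table [current_state][event] ── */
/* NULL action = ignore event in this state.
 * The dispatcher takes the next state from the action's return value;
 * next_state records the nominal target (action_retry can override it).
 * STATE_DISCONNECTING has no row, so it is zero-initialised and ignores all events. */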
static const fsm_transition_t s_transitions[STATE_COUNT][EVT_COUNT] = {
    /* DISCONNECTED */
    [STATE_DISCONNECTED] = {
        [EVT_CONNECT]    = { STATE_CONNECTING,    action_start_connect },
        [EVT_CONNECTED]  = { STATE_DISCONNECTED,  NULL },
        [EVT_DISCONNECT] = { STATE_DISCONNECTED,  NULL },
        [EVT_TIMEOUT]    = { STATE_DISCONNECTED,  NULL },
        [EVT_ERROR]      = { STATE_DISCONNECTED,  NULL },
    },
    /* CONNECTING */
    [STATE_CONNECTING] = {
        [EVT_CONNECT]    = { STATE_CONNECTING,    NULL },
        [EVT_CONNECTED]  = { STATE_CONNECTED,     action_connected },
        [EVT_DISCONNECT] = { STATE_DISCONNECTED,  action_disconnect },
        [EVT_TIMEOUT]    = { STATE_CONNECTING,    action_retry },
        [EVT_ERROR]      = { STATE_DISCONNECTED,  action_disconnect },
    },
    /* CONNECTED */
    [STATE_CONNECTED] = {
        [EVT_CONNECT]    = { STATE_CONNECTED,     NULL },
        [EVT_CONNECTED]  = { STATE_CONNECTED,     NULL },
        [EVT_DISCONNECT] = { STATE_DISCONNECTED,  action_disconnect },
        [EVT_TIMEOUT]    = { STATE_CONNECTED,     NULL },
        [EVT_ERROR]      = { STATE_DISCONNECTED,  action_disconnect },
    },
};

/* ── Event dispatcher ── */
void conn_fsm_process(conn_fsm_t *ctx, conn_event_t evt) {
    if (ctx->state >= STATE_COUNT || evt >= EVT_COUNT) { return; }

    const fsm_transition_t *t = &s_transitions[ctx->state][evt];

    if (t->action != NULL) {
        ctx->state = t->action(ctx);   /* Execute action, which returns next state */
    }
}
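
The listing above is flat. To show what "sub-states inherit parent transitions" from the comparison table means in code, here is a minimal sketch of hierarchical dispatch: each state has a handler and a parent pointer, and an event the current state does not consume bubbles up to its parent. The state names (ST_OPERATIONAL, ST_CONNECTED, ST_FAULT), event names, and functions are hypothetical, and entry/exit actions and history states are left out.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { HSM_EVT_DATA = 0, HSM_EVT_FAULT } hsm_event_t;

struct hsm;   /* Forward declaration for the handler signature */

typedef struct hsm_state {
    const struct hsm_state *parent;                  /* NULL for a top-level state */
    bool (*handle)(struct hsm *m, hsm_event_t evt);  /* true = event consumed */
} hsm_state_t;

typedef struct hsm {
    const hsm_state_t *current;
    uint32_t           data_count;
} hsm_t;

static bool operational_handle(hsm_t *m, hsm_event_t evt);
static bool connected_handle(hsm_t *m, hsm_event_t evt);
static bool fault_handle(hsm_t *m, hsm_event_t evt);

/* ST_OPERATIONAL is the parent of ST_CONNECTED; ST_FAULT is top-level. */
static const hsm_state_t ST_OPERATIONAL = { NULL,            operational_handle };
static const hsm_state_t ST_CONNECTED   = { &ST_OPERATIONAL, connected_handle   };
static const hsm_state_t ST_FAULT       = { NULL,            fault_handle       };

/* Parent state: owns the fault transition for every sub-state. */
static bool operational_handle(hsm_t *m, hsm_event_t evt) {
    if (evt == HSM_EVT_FAULT) {
        m->current = &ST_FAULT;
        return true;
    }
    return false;
}

/* Sub-state: handles data only, everything else is left to the parent. */
static bool connected_handle(hsm_t *m, hsm_event_t evt) {
    if (evt == HSM_EVT_DATA) {
        m->data_count++;
        return true;
    }
    return false;
}

/* Fault state: absorbs every event until the machine is re-initialised. */
static bool fault_handle(hsm_t *m, hsm_event_t evt) {
    (void)m;
    (void)evt;
    return true;
}

void hsm_init(hsm_t *m) {
    m->current    = &ST_CONNECTED;
    m->data_count = 0U;
}

/* Offer the event to the current state, then to each ancestor in turn. */
bool hsm_dispatch(hsm_t *m, hsm_event_t evt) {
    for (const hsm_state_t *s = m->current; s != NULL; s = s->parent) {
        if (s->handle(m, evt)) { return true; }
    }
    return false;   /* No state in the chain wanted the event */
}

bool hsm_in_fault(const hsm_t *m) {
    return m->current == &ST_FAULT;
}
```

The fault transition is written once, in the parent, rather than once per row as in the flat table; that is the main saving an HSM offers as the number of sub-states grows.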

Component-Based Design

Component-based design takes layered architecture one step further: each module exposes a standardised lifecycle interface (init(), process(), deinit()) and registers itself with a component registry. The application layer iterates the registry to initialise and run all components, without knowing their internal implementation.

This pattern is directly analogous to Linux kernel modules, AUTOSAR SWCs (Software Components), and the CMSIS-Pack component model. It enables component libraries to be composed at link time or via build system configuration, without modifying the application layer.

/**
 * Component-based design in C.
 * Each component exposes an init/process/deinit interface.
 * The application iterates a static component registry.
 */

#include <stdint.h>

/* ── Component interface ── */
typedef struct {
    const char  *name;
    int32_t    (*init)   (void);
    void       (*process)(void);
    void       (*deinit) (void);
} component_t;

/* ── Component implementations ── */
static int32_t  temperature_sensor_init(void)    { /* init I2C, configure sensor */ return 0; }
static void     temperature_sensor_process(void) { /* read sensor, update state  */ }
static void     temperature_sensor_deinit(void)  { /* power down sensor          */ }

static int32_t  display_init(void)    { /* init SPI, clear display */ return 0; }
static void     display_process(void) { /* refresh display buffer  */ }
static void     display_deinit(void)  { /* blank display, power off */ }

static int32_t  comm_manager_init(void)    { /* init UART, connect */ return 0; }
static void     comm_manager_process(void) { /* poll/send messages  */ }
static void     comm_manager_deinit(void)  { /* close connections    */ }

/* ── Static component registry ── */
static const component_t s_components[] = {
    { "TempSensor",  temperature_sensor_init,  temperature_sensor_process,  temperature_sensor_deinit  },
    { "Display",     display_init,             display_process,             display_deinit             },
    { "CommManager", comm_manager_init,         comm_manager_process,        comm_manager_deinit        },
};
static const uint32_t COMPONENT_COUNT =
    sizeof(s_components) / sizeof(s_components[0]);

/* ── Application layer — iterates registry, never calls component functions directly ── */
int main(void) {
    /* Initialise all components */
    for (uint32_t i = 0; i < COMPONENT_COUNT; i++) {
        if (s_components[i].init() != 0) {
            log_error("Component %s failed to initialise", s_components[i].name);
            /* Handle gracefully — continue or halt */
        }
    }

    /* Main application loop */
    for (;;) {
        for (uint32_t i = 0; i < COMPONENT_COUNT; i++) {
            s_components[i].process();
        }
        /* RTOS variant: each component runs in its own osThread */
    }
}
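
The main() above leaves the failure policy open and never calls deinit(). One possible policy, sketched here under the assumption that a failed start-up should leave no component half-running, is to deinitialise the already-started components in reverse order and report which one failed. The function components_init_all(), the trace helper, and the three demo components are hypothetical.

```c
#include <stdint.h>

typedef struct {
    const char *name;
    int32_t   (*init)(void);
    void      (*process)(void);
    void      (*deinit)(void);
} component_t;

/* Returns -1 if every component initialised, otherwise the index of the
 * first failure. On failure, components [0, index) are deinitialised in
 * reverse order; the failing component itself is not deinitialised. */
int32_t components_init_all(const component_t *list, uint32_t count) {
    for (uint32_t i = 0U; i < count; i++) {
        if (list[i].init() != 0) {
            for (uint32_t j = i; j > 0U; j--) {
                list[j - 1U].deinit();
            }
            return (int32_t)i;
        }
    }
    return -1;
}

/* ── Demo components that record the call order in a short trace ── */
static char     s_trace[16];
static uint32_t s_trace_len;

static void trace(char c) {
    if (s_trace_len < (sizeof(s_trace) - 1U)) {
        s_trace[s_trace_len++] = c;
        s_trace[s_trace_len]   = '\0';
    }
}

static int32_t sensor_init(void)    { trace('a'); return 0;  }
static void    sensor_deinit(void)  { trace('A'); }
static int32_t radio_init(void)     { trace('b'); return -1; }  /* Simulated failure */
static void    radio_deinit(void)   { trace('B'); }
static int32_t display_init(void)   { trace('c'); return 0;  }
static void    display_deinit(void) { trace('C'); }
static void    no_process(void)     { }

static const component_t s_demo[] = {
    { "Sensor",  sensor_init,  no_process, sensor_deinit  },
    { "Radio",   radio_init,   no_process, radio_deinit   },
    { "Display", display_init, no_process, display_deinit },
};

/* Runs the demo and returns the index reported by components_init_all(). */
int32_t demo_init_with_rollback(void) {
    s_trace_len = 0U;
    s_trace[0]  = '\0';
    return components_init_all(s_demo, sizeof(s_demo) / sizeof(s_demo[0]));
}

const char *demo_trace(void) { return s_trace; }
```

Whether to roll back, continue in a degraded mode, or halt is a product decision; the value of the registry is that whichever policy is chosen lives in one loop instead of being scattered across every component.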

Publish-Subscribe Event Bus

The publish-subscribe (observer) pattern completely decouples producers and consumers of events. The sensor component publishes EVT_TEMP_UPDATED — it has no knowledge of which components subscribe to it. The display component and the alarm component both subscribe independently. Removing one subscriber requires no change to the sensor component.

In bare-metal firmware, this is implemented as a static callback table per event type. In RTOS firmware, each subscriber is a task waiting on a message queue or event flag, and the event bus posts to all registered queues. The key invariant: the event bus must be ISR-safe — publishers are often ISRs or interrupt-context callbacks.

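/**
 * Lightweight publish-subscribe event bus for embedded firmware.
 * ISR-safe: subscribe updates the callback list inside a critical section;
 * publish only reads the list and takes no lock.
 * Suitable for bare-metal and RTOS. If an RTOS build swaps the critical section
 * for a mutex, subscribe must then be called from thread context only.
 */
#include <stdint.h>
#include <stddef.h>           /* NULL */
#include "cmsis_compiler.h"   /* PRIMASK and IRQ enable/disable intrinsics */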

#define EVENT_BUS_MAX_EVENTS      16U
#define EVENT_BUS_MAX_SUBSCRIBERS  8U

typedef void (*event_callback_t)(uint32_t event_id, const void *data);

typedef struct {
    event_callback_t callbacks[EVENT_BUS_MAX_SUBSCRIBERS];
    uint32_t         count;
} event_subscribers_t;

static event_subscribers_t s_subscribers[EVENT_BUS_MAX_EVENTS];

/* ── Subscribe ── */
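int32_t event_bus_subscribe(uint32_t event_id, event_callback_t cb) {
    if (event_id >= EVENT_BUS_MAX_EVENTS || cb == NULL) { return -1; }

    event_subscribers_t *subs = &s_subscribers[event_id];
    int32_t result = -2;               /* Assume the subscriber list is full */

    uint32_t primask = __get_PRIMASK();
    __disable_irq();                   /* Critical section enter */
    if (subs->count < EVENT_BUS_MAX_SUBSCRIBERS) {
        /* Capacity is checked inside the critical section so two callers
         * cannot both claim the last slot. Store the callback before
         * raising count so a publisher never sees an unwritten slot. */
        subs->callbacks[subs->count] = cb;
        subs->count++;
        result = 0;
    }
    __set_PRIMASK(primask);            /* Critical section exit: restore previous mask */

    return result;
}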

/* ── Publish (ISR-safe) ── */
void event_bus_publish(uint32_t event_id, const void *data) {
    if (event_id >= EVENT_BUS_MAX_EVENTS) { return; }

    const event_subscribers_t *subs = &s_subscribers[event_id];

    /* Iterate callbacks outside any critical section so that subscriber
     * code never runs with interrupts masked. Copy count, then call. */
    uint32_t count = subs->count;   /* Atomic read on Cortex-M */

    for (uint32_t i = 0; i < count; i++) {
        if (subs->callbacks[i] != NULL) {
            subs->callbacks[i](event_id, data);
        }
    }
}

/* ── Usage example: decouple sensor task from display task ── */
#define EVT_TEMP_UPDATED    0U
#define EVT_ALARM_TRIGGERED 1U

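/* Sensor task: publishes, never calls display directly.
 * osDelay() comes from cmsis_os2.h and counts kernel ticks. */
static void sensor_task(void *arg) {
    (void)arg;
    float temperature;
    for (;;) {
        temperature = read_temperature_sensor();
        /* Callbacks run synchronously, so pointing at a local is safe here */
        event_bus_publish(EVT_TEMP_UPDATED, &temperature);
        osDelay(100U);   /* 100 ticks = 100 ms at a 1 kHz kernel tick */
    }
}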

/* Display task — subscribes to temperature event */
static void on_temp_updated(uint32_t id, const void *data) {
    (void)id;
    const float *temp = (const float *)data;
    display_update_temperature(*temp);
}

void display_task_init(void) {
    event_bus_subscribe(EVT_TEMP_UPDATED, on_temp_updated);
}
Callback Safety: Do not call long-running operations inside event bus callbacks when publishing from ISR context. The callback executes in the publisher's context. For ISR-to-task communication, have the callback post to an RTOS message queue and return immediately.

Exercises

Exercise 1 Intermediate

Refactor a Monolithic main() into Layered Architecture

Take an existing embedded project (or the blink example from Part 1) where all code lives in main.c. Refactor it into four layers: HAL (register access via function pointer structs), Driver (peripheral abstraction), Service (reusable services such as logging or configuration), Application (business logic; only calls services). Verify that no layer calls more than one layer below it. Draw a dependency diagram and confirm zero upward dependencies.

Tags: Layered Architecture, HAL Design, Dependency Rules

Exercise 2 Intermediate

Implement a Moore FSM for a Debounced Button + LED

Design and implement a Moore FSM with 4 states: IDLE (LED off), DEBOUNCE_PRESS (50 ms timer running), LED_ON (LED lit), DEBOUNCE_RELEASE (50 ms timer running). Events: BTN_RAW_PRESSED, BTN_RAW_RELEASED, TIMER_EXPIRED. Use a transition table (function pointer array) — do not use a switch/case chain. Write Unity test cases for every transition by calling fsm_process() directly with injected events.

Tags: Moore FSM, Transition Table, Debounce, Unity Tests

Exercise 3 Advanced

Publish-Subscribe Event Bus — Decouple Two RTOS Tasks

Implement the event bus from this article. Create two CMSIS-RTOS2 tasks: a producer task that generates sensor data every 100 ms and publishes it via event_bus_publish(), and a consumer task that subscribes to the event and processes the data. Verify decoupling: remove the consumer task completely and confirm the producer task still runs without modification. Measure latency from publish to callback execution using the DWT cycle counter.

Tags: Publish-Subscribe, CMSIS-RTOS2, Decoupling, DWT Latency

Architecture Design Canvas

Use this tool to document your embedded firmware architecture — pattern selection, layer definitions, event sources, state machines, and component interfaces. Download as Word, Excel, PDF, or PPTX for architecture review meetings or design documentation.


Conclusion & Next Steps

In this part we have established the architectural foundations that make firmware maintainable, testable, and portable:

  • Layered architecture with strict directional dependency — application calls services, services call drivers, drivers call HAL, HAL calls CMSIS. Never upward, never skipping.
  • Function pointer structs as HAL interfaces: the same pattern CMSIS-Driver uses, letting the platform implementation be chosen at link time and injected by pointer.
  • FSM transition tables over switch/case chains: uniformly structured, straightforward to keep within MISRA C:2012, and trivially unit-testable by injecting events directly.
  • Component-based design with a standardised init/process/deinit lifecycle and a static component registry iterated by the application layer.
  • Publish-subscribe event bus that decouples producers and consumers of events — both ISR-safe and RTOS-compatible depending on the critical section implementation.

Next: The Final Part

In Part 20: Tooling & Workflow — Professional Embedded Development, we complete the series with the pipeline that wraps everything together — a complete GitHub Actions CI/CD workflow, MISRA-C static analysis with cppcheck, code coverage gating, Doxygen documentation generation, semantic versioning, and automated firmware release artefacts.
