Part 9: Advanced Topics

July 14, 2025 Wasil Zafar 50 min read

Sensor fusion algorithms, Kalman filter implementation, TinyML on microcontrollers, ultra-low power strategies, and building fault-tolerant embedded systems.

Sensor Fusion
TinyML
- TensorFlow Lite Micro
- ML Workflow
Power Optimization
- Sleep Modes
- Power Budgeting
Fault Tolerance
- Watchdog Timers
- Sensor Redundancy
Conclusion & Next Steps

Sensor Fusion

Sensor fusion combines data from multiple sensors to produce a more accurate, reliable, or complete measurement than any single sensor can provide. The classic example is IMU fusion: accelerometers give an accurate tilt reference (pitch and roll from gravity) under static conditions but are noisy under vibration, while gyroscopes provide a smooth angular rate whose integrated angle drifts over time.

Sensor Fusion Concept

flowchart LR
    ACC["📐 Accelerometer<br/>Accurate but noisy"] --> FUSE
    GYRO["🔄 Gyroscope<br/>Smooth but drifts"] --> FUSE
    MAG["🧭 Magnetometer<br/>Heading reference"] --> FUSE
    FUSE["🧠 Fusion<br/>Algorithm"] --> OUT["✅ Fused Estimate<br/>Accurate + Stable"]
    style FUSE fill:#3B9797,stroke:#3B9797,color:#fff
    style OUT fill:#132440,stroke:#132440,color:#fff
    style ACC fill:#e8f4f4,stroke:#3B9797,color:#132440
    style GYRO fill:#f0f4f8,stroke:#16476A,color:#132440
    style MAG fill:#e8f4f4,stroke:#3B9797,color:#132440

Complementary Filter

The complementary filter is the simplest fusion algorithm. It low-pass filters the accelerometer angle and high-pass filters the integrated gyroscope angle, then blends the two with a tuning parameter $\alpha$ (typically 0.96–0.98):

$$\theta_{fused} = \alpha \cdot (\theta_{prev} + \omega_{gyro} \cdot \Delta t) + (1 - \alpha) \cdot \theta_{accel}$$

// Complementary filter for pitch/roll from IMU
#include <math.h>

#define ALPHA 0.98f
#define RAD_TO_DEG 57.2957795f

typedef struct {
    float pitch;   // degrees
    float roll;    // degrees
} Orientation;

static Orientation orient = {0, 0};

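// ax, ay, az: acceleration in any consistent unit (only ratios are used)
// gx, gy: angular rate in deg/s (angles are kept in degrees); dt in seconds
// gz is unused: yaw cannot be corrected from gravity alone
// Axis signs depend on how the sensor is mounted
Orientation complementary_filter(float ax, float ay, float az,
                                  float gx, float gy, float gz,
                                  float dt) {
    (void)gz;

    // Gyroscope integration (high-pass: captures fast changes)
    orient.pitch += gx * dt;
    orient.roll  += gy * dt;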

    // Accelerometer angle (low-pass: captures gravity direction)
    float accel_pitch = atan2f(ay, sqrtf(ax * ax + az * az)) * RAD_TO_DEG;
    float accel_roll  = atan2f(-ax, az) * RAD_TO_DEG;

    // Fuse: trust gyro for fast changes, accel for steady-state
    orient.pitch = ALPHA * orient.pitch + (1.0f - ALPHA) * accel_pitch;
    orient.roll  = ALPHA * orient.roll  + (1.0f - ALPHA) * accel_roll;

    return orient;
}
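
To get a feel for $\alpha$, it helps to convert it into a time constant. For a first-order blend like the one above, the crossover time constant is approximately the expression below. This assumes a fixed sample period $\Delta t$; the 100 Hz update rate used here is an illustrative assumption, not a value from this article.

$$\tau \approx \frac{\alpha \, \Delta t}{1 - \alpha} = \frac{0.98 \times 0.01\,\text{s}}{1 - 0.98} = 0.49\,\text{s}$$

Roughly speaking, motion faster than about half a second is taken from the gyroscope and slower trends from the accelerometer. Raising $\alpha$ or lowering the update rate lengthens $\tau$, which rejects more vibration but lets more gyro drift through.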

Kalman Filter

The Kalman filter is the optimal estimator for linear systems with Gaussian noise. It operates in two phases: predict (project state forward using the model) and update (correct with new measurement).

$$\hat{x}_{k|k-1} = F \hat{x}_{k-1|k-1} + B u_k \quad \text{(Predict)}$$
$$P_{k|k-1} = F P_{k-1|k-1} F^T + Q$$
$$K_k = P_{k|k-1} H^T (H P_{k|k-1} H^T + R)^{-1} \quad \text{(Kalman Gain)}$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - H \hat{x}_{k|k-1}) \quad \text{(Update)}$$
$$P_{k|k} = (I - K_k H) P_{k|k-1}$$
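
For a single scalar state with no control input, the matrices collapse to numbers. Assuming $F = 1$, $B = 0$, and $H = 1$, and writing $p$, $q$, $r$ for the scalar versions of $P$, $Q$, $R$ (the lowercase names are chosen here to match the struct fields in the code that follows), the five equations reduce to:

$$p_{k|k-1} = p_{k-1|k-1} + q$$
$$K_k = \frac{p_{k|k-1}}{p_{k|k-1} + r}$$
$$\hat{x}_{k|k} = \hat{x}_{k-1|k-1} + K_k \left(z_k - \hat{x}_{k-1|k-1}\right)$$
$$p_{k|k} = (1 - K_k)\, p_{k|k-1}$$

The state prediction disappears because $F = 1$ leaves the estimate unchanged between measurements. A larger $q$ relative to $r$ pushes $K_k$ toward 1 (trust the measurement); a smaller ratio pushes it toward 0 (trust the previous estimate).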

// 1D Kalman filter for sensor smoothing
typedef struct {
    float x;     // State estimate
    float p;     // Estimate covariance
    float q;     // Process noise
    float r;     // Measurement noise
    float k;     // Kalman gain
} Kalman1D;

void kalman_init(Kalman1D *kf, float initial, float p, float q, float r) {
    kf->x = initial;
    kf->p = p;
    kf->q = q;
    kf->r = r;
    kf->k = 0.0f;  // Gain is computed on the first update
}

float kalman_update(Kalman1D *kf, float measurement) {
    // Predict
    kf->p += kf->q;

    // Update
    kf->k = kf->p / (kf->p + kf->r);
    kf->x = kf->x + kf->k * (measurement - kf->x);
    kf->p = (1.0f - kf->k) * kf->p;

    return kf->x;
}

Madgwick / Mahony Filters

                            
IMU Orientation Filters:
- Madgwick: Gradient descent optimization. Single tuning parameter (beta). Handles magnetic distortion. 40–100 µs on Cortex-M4
- Mahony: Complementary filter with PI correction. Two tuning parameters (Kp, Ki). Lighter than Madgwick. Popular in flight controllers
- Extended Kalman Filter (EKF): Linearizes nonlinear system. Most accurate but heaviest. Used in high-end INS/GPS fusion
- Unscented Kalman Filter (UKF): Uses sigma points instead of linearization. Better for highly nonlinear systems

                        

TinyML

TensorFlow Lite for Microcontrollers

TinyML brings machine learning inference to microcontrollers with as little as 16 KB of RAM. Common applications include keyword spotting, gesture recognition, anomaly detection, and predictive maintenance — all running locally without cloud connectivity.

TinyML Platform Comparison

Framework	Min RAM	MCU Support	Features
TF Lite Micro	16 KB	Cortex-M0+ to M7	Quantized models, CMSIS-NN acceleration
Edge Impulse	Varies	Arduino, STM32, Nordic	End-to-end pipeline, data collection
STM32Cube.AI	Varies	STM32 only	Direct TF/ONNX import, optimized for STM32
microTVM (Apache TVM)	Varies	Multiple	Compiler-based optimization

ML Workflow for Embedded

                            
TinyML Development Pipeline:
- Collect: Gather sensor data from target hardware (accelerometer, microphone, etc.)
- Train: Train model in Python (TensorFlow/Keras) on PC or cloud
- Quantize: Convert float32 weights to int8 (4x size reduction, 2-4x speed improvement)
- Convert: Export as .tflite FlatBuffer model file
- Deploy: Include as C array in firmware, run inference via TF Lite Micro interpreter
- Validate: Compare on-device accuracy vs PC baseline

                        

TinyML Development Pipeline

flowchart LR
    subgraph PC["☁️ PC / Cloud"]
        A["📊 Collect<br/>Training Data"] --> B["🧠 Train<br/>Model"]
        B --> C["📦 Quantize<br/>INT8/Float16"]
        C --> D["🔄 Convert<br/>TF Lite Micro"]
    end
    subgraph MCU["🔌 Microcontroller"]
        E["⬇️ Deploy<br/>C Array in Flash"]
        F["✅ Validate<br/>On-device Accuracy"]
    end
    D --> E --> F
    F -.->|"Accuracy OK?"| G{"Pass?"}
    G -->|"Yes"| H["🚀 Ship"]
    G -->|"No"| A
    style A fill:#3B9797,stroke:#3B9797,color:#fff
    style H fill:#132440,stroke:#132440,color:#fff
    style G fill:#fff5f5,stroke:#BF092F,color:#132440

Power Optimization

Sleep Modes

STM32 Low-Power Modes

Mode	Current	Wake Sources	Resume Time
Run	10–100 mA	N/A	N/A
Sleep	1–10 mA	Any interrupt	<1 µs
Stop	2–100 µA	EXTI, RTC, LPUART	5–50 µs
Standby	0.3–3 µA	WKUP pin, RTC	50 µs (reset)
Shutdown	20–100 nA	WKUP pin only	Full reboot

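// STM32 Stop 2 mode with RTC wakeup every N seconds
// Assumes the RTC is already clocked (LSE or LSI), backup-domain write access
// is enabled, and the RTC wakeup EXTI line and IRQ are configured elsewhere.
#include "stm32l4xx.h"

void SystemClock_Config(void);  // Board-specific clock setup (e.g. CubeMX-generated)

void enter_stop_mode(uint32_t seconds) {
    if (seconds == 0) return;  // Avoid underflow in WUTR below

    // Configure RTC wakeup timer
    RTC->WPR = 0xCA;
    RTC->WPR = 0x53;  // Unlock RTC write protection
    RTC->CR &= ~RTC_CR_WUTE;  // Disable wakeup timer
    while (!(RTC->ISR & RTC_ISR_WUTWF));  // Wait until WUTR is writable

    RTC->WUTR = seconds - 1;  // Wakeup after N seconds (16-bit counter)
    RTC->ISR &= ~RTC_ISR_WUTF;  // Clear any stale wakeup flag
    RTC->CR = (RTC->CR & ~RTC_CR_WUCKSEL) | RTC_CR_WUCKSEL_2;  // ck_spre, 1 Hz clock
    RTC->CR |= RTC_CR_WUTIE | RTC_CR_WUTE;
    RTC->WPR = 0xFF;  // Re-lock write protection

    // Configure STOP2 mode
    SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;
    PWR->CR1 = (PWR->CR1 & ~PWR_CR1_LPMS) | PWR_CR1_LPMS_STOP2;

    __WFI();  // Wait for interrupt: enters STOP2

    // Wakes up here on the default oscillator; restore normal operation
    SCB->SCR &= ~SCB_SCR_SLEEPDEEP_Msk;
    SystemClock_Config();
}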

// Usage: duty-cycle sensor reading
int main(void) {
    SystemInit();
    sensor_init();

    while (1) {
        float temp = read_temperature();
        transmit_data(temp);

        enter_stop_mode(5);  // Sleep 5 seconds → 2 µA
    }
}

Power Budgeting

                            
Power Budget Example (Battery-Powered Sensor Node):
- MCU (STM32L4, Stop2): 2 µA × 99% = 1.98 µA average
- MCU (Run, 10 ms every 5 s): 5 mA × 0.2% = 10 µA average
- Sensor (BME280 forced mode): 0.1 µA sleep + 350 µA × 0.01% = 0.14 µA average
- Radio (LoRa TX, 100 ms every 60 s): 120 mA × 0.17% = 200 µA average
- Total: ~212 µA average
- Battery life (2000 mAh CR123A): 2000 mAh / 0.212 mA = ~9,400 hours = ~13 months
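
The same arithmetic can be kept in code so the budget updates whenever a duty cycle or current changes. This is a minimal sketch: the `Load` type, the function names, and the `NODE_LOADS` table are hypothetical, the numbers are the ones from the list above, and the radio's idle current is assumed to be zero because the list does not give one.

```c
#include <stddef.h>

// One consumer in the budget. Currents are in microamps.
typedef struct {
    const char *name;
    double active_uA;   // Current while active
    double sleep_uA;    // Current while idle
    double duty;        // Fraction of time spent active (0..1)
} Load;

// Time-weighted average current of one load
double load_average_uA(const Load *l) {
    return l->active_uA * l->duty + l->sleep_uA * (1.0 - l->duty);
}

// Sum of the average currents of all loads
double budget_total_uA(const Load *loads, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += load_average_uA(&loads[i]);
    }
    return total;
}

// Ideal battery life, ignoring self-discharge and capacity derating
double battery_life_hours(double capacity_mAh, double average_uA) {
    return capacity_mAh * 1000.0 / average_uA;
}

// Values from the example budget (radio idle current assumed to be 0)
static const Load NODE_LOADS[] = {
    { "MCU (STM32L4)",     5000.0, 2.0, 0.010 / 5.0  },  // 10 ms run every 5 s
    { "Sensor (BME280)",    350.0, 0.1, 0.0001       },  // 0.01% duty
    { "Radio (LoRa TX)", 120000.0, 0.0, 0.100 / 60.0 },  // 100 ms every 60 s
};
```

Calling `budget_total_uA(NODE_LOADS, 3)` gives about 212 µA, and `battery_life_hours(2000.0, ...)` on that total gives roughly 9,400 hours, matching the hand calculation. The radio accounts for about 94% of the total, so sending less often or in shorter bursts is the most effective change.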

                        

Fault Tolerance

Watchdog Timers

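// Independent Watchdog (IWDG) on STM32
#include "stm32f4xx.h"

// LSI is nominally 32 kHz but varies between parts, so the timeout
// is approximate; leave margin when choosing timeout_ms.
void iwdg_init(uint32_t timeout_ms) {
    uint32_t reload = timeout_ms / 2;      // 500 Hz tick → 2 ms per count
    if (reload > 0xFFF) reload = 0xFFF;    // RLR is 12 bits (max ≈ 8.2 s)

    IWDG->KR  = 0x5555;                   // Enable register access
    IWDG->PR  = 4;                         // Prescaler /64 → 500 Hz
    IWDG->RLR = reload;                    // Reload value
    IWDG->KR  = 0xAAAA;                   // Load reload value into the counter
    IWDG->KR  = 0xCCCC;                   // Start watchdog
}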

void iwdg_refresh(void) {
    IWDG->KR = 0xAAAA;  // Reset countdown
}

// Main loop must call iwdg_refresh() within timeout period
// If software hangs, watchdog resets the MCU
int main(void) {
    SystemInit();
    iwdg_init(2000);  // 2-second timeout

    while (1) {
        read_sensors();
        run_control();
        drive_actuators();

        iwdg_refresh();  // Feed the watchdog
    }
}

Sensor Redundancy

                            
Redundancy Strategies:
- Dual Modular Redundancy (DMR): Two identical sensors. Detects disagreement but cannot determine which is faulty
- Triple Modular Redundancy (TMR): Three sensors with majority voting. Tolerates one sensor failure
- Dissimilar Redundancy: Different sensor technologies measuring same quantity (e.g., accelerometer + gyro for orientation). Protects against systematic sensor failures
- Analytical Redundancy: Use mathematical models to estimate expected sensor values. Detect faults by comparing model predictions vs actual readings
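
For analog readings, TMR majority voting is often done with a median rather than an exact-match vote. The sketch below shows one way to do that, along with the DMR disagreement check. All names are hypothetical, the tolerance is application-specific, and NaN inputs are not handled.

```c
#include <stdbool.h>
#include <math.h>

typedef struct {
    float value;    // Voted output (median of the three readings)
    bool  valid;    // false if fewer than two sensors agree
    int   suspect;  // Index (0..2) of the outlier, or -1 if none
} VoteResult;

// Median of three values
static float median3(float a, float b, float c) {
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}

// TMR: take the median, then flag any reading too far from it.
VoteResult tmr_vote(float s0, float s1, float s2, float tolerance) {
    float s[3] = { s0, s1, s2 };
    VoteResult r = { median3(s0, s1, s2), true, -1 };
    int outliers = 0;

    for (int i = 0; i < 3; i++) {
        if (fabsf(s[i] - r.value) > tolerance) {
            r.suspect = i;
            outliers++;
        }
    }
    if (outliers > 1) {   // No majority: the median cannot be trusted
        r.valid = false;
        r.suspect = -1;
    }
    return r;
}

// DMR: can only report that the pair disagrees, not which one is wrong.
bool dmr_agree(float a, float b, float tolerance) {
    return fabsf(a - b) <= tolerance;
}
```

With readings of 25.0, 25.1, and 80.0 and a tolerance of 0.5, the voter returns 25.1 and flags sensor 2 as suspect. If all three readings differ by more than the tolerance, `valid` is false and the caller should fall back to a safe state.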

                        

Conclusion & Next Steps

Advanced embedded systems go beyond basic sensor reading and actuator driving. Sensor fusion extracts maximum accuracy from imperfect sensors, TinyML enables on-device intelligence, power optimization extends battery life from days to years, and fault tolerance ensures systems operate safely under adverse conditions.

                            
Key Takeaways:
- Complementary filters are simple and effective for IMU fusion (the fusion step is two lines of code)
- Kalman filters provide optimal estimation for linear Gaussian systems
- TinyML enables ML inference on MCUs with as little as 16 KB RAM
- Stop mode plus duty-cycling brings average MCU current down to the microamp range (about 12 µA in the budget above); the radio usually dominates what remains
- Watchdog timers are essential for reliable production embedded systems

                        

In Part 10, we cover System Design & Architecture — PCB design for sensor systems, software architecture patterns, testing methodologies, and debugging tools.
