Bring-Up Strategy
Board bring-up is the systematic process of verifying a newly assembled PCB. Never power on a new board without a plan — a single short circuit can destroy expensive components. Follow a structured checklist from visual inspection to full functional test.
A Brief History of Hardware Debugging
flowchart TD
    A["Visual Inspection"] --> B["Continuity Check"]
    B --> C["Current-Limited<br/>Power-On"]
    C --> D{"Smoke?<br/>Overcurrent?"}
    D -->|Yes| E["Power OFF<br/>Investigate"]
    D -->|No| F["Measure Power Rails"]
    F --> G{"All Rails<br/>Within Spec?"}
    G -->|No| E
    G -->|Yes| H["Flash Firmware<br/>via SWD"]
    H --> I["Verify Clock &<br/>Oscillator"]
    I --> J["Test Peripherals<br/>(UART, I2C, SPI)"]
    J --> K["Full Functional<br/>Test"]
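The gating in the flowchart (stop at the first failed step, then power off and investigate) can be sketched as an ordered list of checks. This is only an illustration: the step names mirror the flowchart, while `run_bring_up` and the pass/fail dictionary are hypothetical stand-ins for checks an engineer performs by hand.

```python
# Ordered bring-up steps, mirroring the flowchart.
STEPS = [
    "Visual Inspection",
    "Continuity Check",
    "Current-Limited Power-On",
    "Measure Power Rails",
    "Flash Firmware via SWD",
    "Verify Clock & Oscillator",
    "Test Peripherals (UART, I2C, SPI)",
    "Full Functional Test",
]


def run_bring_up(results):
    """Return (completed_steps, failed_step); failed_step is None if all pass.

    `results` maps step name -> bool (hypothetical: filled in by the operator).
    A missing entry counts as a failure, so a skipped step cannot slip through.
    """
    completed = []
    for step in STEPS:
        if not results.get(step, False):
            return completed, step  # power off and investigate here
        completed.append(step)
    return completed, None


if __name__ == "__main__":
    results = {step: True for step in STEPS}
    results["Measure Power Rails"] = False  # e.g. one rail out of spec
    done, failed = run_bring_up(results)
    print("Completed:", done)
    print("Stopped at:", failed)
```

Treating a missing result as a failure is a deliberate choice here: it forces every step to be recorded before the next one counts.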
Essential Equipment
| Equipment | Budget Pick | Pro Pick | Purpose |
|---|---|---|---|
| Bench PSU | Wanptek 30V/5A | Rigol DP832 | Current-limited power |
| Multimeter | UNI-T UT61E+ | Fluke 87V | Voltage, resistance, continuity |
| Oscilloscope | Rigol DS1054Z | Siglent SDS2104X+ | Waveforms, noise, timing |
| Logic analyzer | Saleae Logic 8 | Saleae Logic Pro 16 | Digital protocol decode |
| SWD debugger | ST-Link V2 clone | J-Link EDU | Firmware flash & debug |
| Microscope | USB microscope | Amscope stereo | Solder joint inspection |
Smoke Test
Visual Inspection Checklist
- Solder bridges between IC pins (especially QFP/QFN)
- Missing components or tombstoned passives
- Reversed polarity on diodes, capacitors, ICs
- Correct component values (check markings vs BOM)
- Clean solder joints (shiny, concave fillets)
- No flux residue on high-impedance nodes
First Power-On Procedure
# First power-on checklist with current monitoring

# Bench power supply settings
VOLTAGE_LIMIT = 5.0    # V, match your input voltage
CURRENT_LIMIT = 0.100  # A, start very low (100mA)

# Expected quiescent currents in amps (no firmware running)
expected_currents = {
    "STM32F411 (sleep)": 0.002,  # ~2mA quiescent
    "LDO (3.3V reg)": 0.005,     # ~5mA quiescent
    "LEDs (off)": 0.000,         # 0mA when off
    "Total expected": 0.010,     # ~10mA quiescent
}

print("FIRST POWER-ON PROCEDURE")
print("=" * 50)
print(f"Step 1: Set bench PSU to {VOLTAGE_LIMIT}V, {CURRENT_LIMIT*1000:.0f}mA limit")
print("Step 2: Connect DMM in series to monitor current")
print("Step 3: Power ON — observe current immediately")
print()

# Print each expected current in mA
for component, current in expected_currents.items():
    print(f"  {component:>25}: {current*1000:>6.1f} mA")

print(f"\nStep 4: If current > {CURRENT_LIMIT*1000:.0f}mA → POWER OFF immediately")
print("Step 5: If current stable at ~10mA → proceed to rail checks")
print("Step 6: Touch IC packages — nothing should be hot")
print("\n⚠ If any component is hot, power off and inspect!")
FIRST POWER-ON PROCEDURE
==================================================
Step 1: Set bench PSU to 5.0V, 100mA limit
Step 2: Connect DMM in series to monitor current
Step 3: Power ON — observe current immediately
STM32F411 (sleep): 2.0 mA
LDO (3.3V reg): 5.0 mA
LEDs (off): 0.0 mA
Total expected: 10.0 mA
Step 4: If current > 100mA → POWER OFF immediately
Step 5: If current stable at ~10mA → proceed to rail checks
Step 6: Touch IC packages — nothing should be hot
⚠ If any component is hot, power off and inspect!
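The go/no-go decision after power-on can be written down as a small classifier. This is a sketch under stated assumptions: the `margin` multiplier and the "almost no current" threshold are illustrative values, not figures from a datasheet, and `classify_current` is a hypothetical helper. It uses `>=` for the limit check because a current-limited supply clamps at its limit rather than exceeding it.

```python
def classify_current(measured_a, expected_a, limit_a, margin=3.0):
    """Classify a first power-on current reading (all values in amps).

    margin is an assumed multiplier: readings above margin * expected_a
    are flagged even when they stay below the supply's current limit.
    """
    if measured_a >= limit_a:
        return "POWER OFF: supply is in current limit"
    if measured_a > margin * expected_a:
        return "SUSPECT: well above expected quiescent current"
    if measured_a < 0.1 * expected_a:
        return "SUSPECT: almost no current, check the power path"
    return "OK: proceed to rail checks"


if __name__ == "__main__":
    EXPECTED_A = 0.010  # ~10mA quiescent, from the checklist
    LIMIT_A = 0.100     # 100mA bench supply limit, from the checklist
    for reading in (0.012, 0.045, 0.100, 0.0005):
        verdict = classify_current(reading, EXPECTED_A, LIMIT_A)
        print(f"{reading * 1000:6.1f} mA -> {verdict}")
```

A reading far below the expected value is flagged as well, since it often means the regulator input or a series protection component is open.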
Power Rail Verification
Measuring Each Rail
# Power rail verification log
rails = [
    {"name": "VBUS (USB 5V)", "expected": 5.00, "tolerance": 0.25, "test_point": "TP1"},
    {"name": "VCC_3V3", "expected": 3.30, "tolerance": 0.10, "test_point": "TP2"},
    {"name": "VCC_1V8 (core)", "expected": 1.80, "tolerance": 0.05, "test_point": "TP3"},
    {"name": "VREF_ADC", "expected": 3.30, "tolerance": 0.03, "test_point": "TP4"},
    {"name": "VBAT (backup)", "expected": 3.00, "tolerance": 0.30, "test_point": "TP5"},
]

print("Power Rail Verification")
print("=" * 75)
print(f"{'Rail':>18} | {'Expected':>9} | {'Tolerance':>10} | {'TP':>5} | {'Status':>8}")
print("-" * 75)

# Simulated measurements in volts (replace with actual readings)
measured = [5.02, 3.28, 1.79, 3.31, 2.98]

for rail, meas in zip(rails, measured):
    exp = rail["expected"]
    tol = rail["tolerance"]
    within = abs(meas - exp) <= tol  # pass if inside the symmetric tolerance band
    status = "✓ PASS" if within else "✗ FAIL"
    print(f"{rail['name']:>18} | {exp:>7.2f} V | ±{tol:>6.2f} V | {rail['test_point']:>5} | {status:>8}")
    if within:
        print(f"{'':>18} Measured: {meas:.3f} V (Δ = {abs(meas-exp)*1000:.1f} mV)")

print("\nTip: Measure with scope for AC ripple (DC offset + AC coupling)")
Power Rail Verification
===========================================================================
Rail | Expected | Tolerance | TP | Status
---------------------------------------------------------------------------
VBUS (USB 5V) | 5.00 V | ± 0.25 V | TP1 | ✓ PASS
Measured: 5.020 V (Δ = 20.0 mV)
VCC_3V3 | 3.30 V | ± 0.10 V | TP2 | ✓ PASS
Measured: 3.280 V (Δ = 20.0 mV)
VCC_1V8 (core) | 1.80 V | ± 0.05 V | TP3 | ✓ PASS
Measured: 1.790 V (Δ = 10.0 mV)
VREF_ADC | 3.30 V | ± 0.03 V | TP4 | ✓ PASS
Measured: 3.310 V (Δ = 10.0 mV)
VBAT (backup) | 3.00 V | ± 0.30 V | TP5 | ✓ PASS
Measured: 2.980 V (Δ = 20.0 mV)
Tip: Measure with scope for AC ripple (DC offset + AC coupling)
Toyota Unintended Acceleration (2009–2011) — The Debugging Investigation
Between 2009 and 2011, Toyota recalled roughly 9 million vehicles over unintended acceleration, which was linked in complaints to as many as 89 deaths. Toyota attributed the problem to floor mats and sticky pedals, and NASA engineers were brought in to investigate the electronic throttle control (ETC) system in one of the most thorough embedded hardware debugging investigations in history.
The investigation process: NASA’s team spent 10 months performing: (1) board-level X-ray and visual inspection of 58 ECU boards, (2) power rail analysis under electromagnetic interference (EMI), (3) JTAG boundary scan of the Renesas V850 MCU to verify memory contents, (4) oscilloscope capture of throttle position sensor signals during simulated fault conditions, and (5) firmware source code review of 280,000 lines of C code.
What they found: While NASA couldn’t definitively prove a software bug caused the unintended acceleration, independent experts later identified a single-bit RAM corruption risk (no ECC on the V850), task stack overflow vulnerabilities, and insufficient watchdog coverage. They also flagged the throttle position input path as lacking hardware redundancy at the A/D converter, so that in their analysis a single corrupted ADC reading could command full throttle.
Bring-up lesson: During board bring-up, always verify that: (1) ADC readings stay valid under noise and EMI, (2) the watchdog timer fires correctly, (3) critical signals have hardware redundancy (dual ADC channels, voting logic), and (4) firmware handles sensor faults gracefully. Bring-up of the first board is the earliest point at which a gap such as missing ADC redundancy can be caught on real hardware.
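The redundancy checks named in the lesson (dual ADC channels, voting logic) can be sketched in a few lines. This is an illustrative sketch, not the logic of any real throttle controller: `check_redundant_adc`, `vote_median3`, the `max_diff` threshold, and the choice of 0 as the safe fallback value are all assumptions.

```python
def check_redundant_adc(primary, secondary, max_diff=40, adc_max=4095):
    """Dual-channel plausibility check for a 12-bit ADC.

    Returns (value, fault). On any fault the value falls back to 0,
    which is assumed here to be the safe state (e.g. idle).
    """
    for raw in (primary, secondary):
        if not 0 <= raw <= adc_max:
            return 0, "out of range"
    if abs(primary - secondary) > max_diff:
        return 0, "channel mismatch"
    return (primary + secondary) // 2, None


def vote_median3(a, b, c):
    """Median-of-three voter: one corrupted channel cannot win."""
    return sorted((a, b, c))[1]


if __name__ == "__main__":
    print(check_redundant_adc(2000, 2010))  # channels agree
    print(check_redundant_adc(4095, 2000))  # one channel corrupted
    print(vote_median3(2000, 4095, 2004))   # outlier is ignored
```

During bring-up, both paths can be exercised deliberately by injecting a fault on one channel (for example, shorting it to a rail through a resistor) and confirming the firmware reports the mismatch.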
Ripple & Noise Measurement
Firmware Flashing
SWD Connection
# Verify SWD connection with OpenOCD
# Connect ST-Link V2 to board's SWD header
# SWDIO → Pin 2, SWCLK → Pin 4, GND → Pin 3, VCC → Pin 1
# Test connection (STM32F411)
openocd -f interface/stlink.cfg \
    -f target/stm32f4x.cfg \
    -c "init; targets; exit"
# Expected output:
# Info : STLINK V2J37S7 (API v2) VID:PID 0483:3748
# Info : Target voltage: 3.294346
# Info : stm32f4x.cpu: Cortex-M4 r0p1 processor detected
# Info : stm32f4x.cpu: target state: halted
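When the connection test is scripted, the reported target voltage is a useful early sanity check. The sketch below parses a captured log for the `Target voltage:` line shown in the expected output. `parse_target_voltage` and `voltage_ok` are hypothetical helpers, the sample text is copied from the expected output above, and the 3.30V ± 0.10V window is an assumption borrowed from the VCC_3V3 rail tolerance.

```python
import re

# Sample log lines, copied from the expected OpenOCD output shown above.
SAMPLE_LOG = """\
Info : STLINK V2J37S7 (API v2) VID:PID 0483:3748
Info : Target voltage: 3.294346
Info : stm32f4x.cpu: Cortex-M4 r0p1 processor detected
"""


def parse_target_voltage(log_text):
    """Return the reported target voltage in volts, or None if absent."""
    match = re.search(r"Target voltage:\s*(\d+(?:\.\d+)?)", log_text)
    return float(match.group(1)) if match else None


def voltage_ok(volts, expected=3.30, tolerance=0.10):
    """True if a voltage was reported and lies inside the assumed window."""
    return volts is not None and abs(volts - expected) <= tolerance


if __name__ == "__main__":
    volts = parse_target_voltage(SAMPLE_LOG)
    print("Target voltage:", volts, "OK" if voltage_ok(volts) else "CHECK WIRING")
```

A missing or very low reading usually points at the VCC sense pin of the SWD header rather than at the MCU itself.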
OpenOCD Flash & Debug
# Flash firmware via OpenOCD + GDB
# Step 1: Start OpenOCD server
openocd -f interface/stlink.cfg -f target/stm32f4x.cfg &
# Step 2: Connect with GDB
arm-none-eabi-gdb firmware.elf
# In GDB:
# (gdb) target remote :3333
# (gdb) monitor reset halt
# (gdb) load
# (gdb) monitor reset run
# Alternative: Flash directly with OpenOCD
# (stop the OpenOCD server from Step 1 first; only one process can own the debug probe)
openocd -f interface/stlink.cfg \
    -f target/stm32f4x.cfg \
    -c "program firmware.bin verify reset exit 0x08000000"
# Using STM32CubeProgrammer CLI
STM32_Programmer_CLI -c port=SWD freq=4000 \
    -w firmware.bin 0x08000000 -v -rst
Debug Instruments
Oscilloscope Techniques
Top 5 Scope Measurements for Bring-Up
- Power rail ripple: AC-coupled, 20MHz BW limit, short ground lead. Verify <50mV p-p.
- Clock signal: Check oscillator output frequency, duty cycle (target 45–55%), rise/fall times.
- UART waveform: Verify baud rate by measuring bit period (e.g., 115200 baud = 8.68µs/bit).
- I2C timing: Check SCL frequency, SDA setup/hold times, pull-up strength (rise time <300ns for 400kHz).
- SPI signals: Verify SCLK frequency, CPOL/CPHA mode, CS timing vs first clock edge.
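The I2C rise-time limit in the list above ties directly to pull-up value and bus capacitance. For an RC charging curve measured between the 30% and 70% thresholds, the rise time is ln(0.7/0.3) × R × C, or about 0.8473 × R × C. The sketch below applies that relation; the function names and the 100pF bus capacitance in the example are assumptions for illustration.

```python
import math

# Factor for the 30% -> 70% rise of an RC charging curve: ln(0.7 / 0.3)
RISE_FACTOR = math.log(0.7 / 0.3)  # about 0.8473


def i2c_rise_time_ns(pullup_ohms, bus_capacitance_pf):
    """Estimated SDA/SCL rise time in nanoseconds."""
    return RISE_FACTOR * pullup_ohms * bus_capacitance_pf * 1e-12 * 1e9


def max_pullup_ohms(rise_time_ns, bus_capacitance_pf):
    """Largest pull-up that still meets the given rise-time limit."""
    return (rise_time_ns * 1e-9) / (RISE_FACTOR * bus_capacitance_pf * 1e-12)


if __name__ == "__main__":
    BUS_PF = 100.0   # assumed total bus capacitance
    LIMIT_NS = 300.0 # rise-time limit for 400kHz, from the list above
    for r in (2200, 4700):
        t = i2c_rise_time_ns(r, BUS_PF)
        verdict = "OK" if t < LIMIT_NS else "too slow"
        print(f"{r} ohm, {BUS_PF:.0f} pF: {t:.0f} ns ({verdict})")
    print(f"Max pull-up: {max_pullup_ohms(LIMIT_NS, BUS_PF):.0f} ohm")
```

If the rise time measured on the scope is much slower than this estimate, the bus capacitance is likely higher than assumed (long traces, cables, or many devices).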
Logic Analyzer Protocol Decode
# Logic analyzer capture analysis: UART decode example
# Using sigrok/PulseView or Saleae Logic

# UART timing calculator
baud_rates = [9600, 115200, 460800, 921600, 1000000]

print("UART Bit Timing Reference")
print("=" * 55)
print(f"{'Baud Rate':>12} | {'Bit Period':>12} | {'Byte Time':>12} | {'1KB Time':>10}")
print("-" * 55)

for baud in baud_rates:
    bit_us = 1e6 / baud            # microseconds per bit
    byte_us = 10 * bit_us          # 10 bits per byte (start + 8 data + stop)
    kb_ms = 1024 * byte_us / 1000  # ms to send 1KB
    print(f"{baud:>12,} | {bit_us:>9.2f} µs | {byte_us:>9.1f} µs | {kb_ms:>7.1f} ms")

print("\nTip: Measure bit period on scope to verify actual baud rate")
print("Tip: Use logic analyzer protocol decoder for I2C/SPI/UART")
UART Bit Timing Reference
=======================================================
Baud Rate | Bit Period | Byte Time | 1KB Time
-------------------------------------------------------
9,600 | 104.17 µs | 1041.7 µs | 1066.7 ms
115,200 | 8.68 µs | 86.8 µs | 88.9 ms
460,800 | 2.17 µs | 21.7 µs | 22.2 ms
921,600 | 1.09 µs | 10.9 µs | 11.1 ms
1,000,000 | 1.00 µs | 10.0 µs | 10.2 ms
Tip: Measure bit period on scope to verify actual baud rate
Tip: Use logic analyzer protocol decoder for I2C/SPI/UART
Mars Pathfinder Priority Inversion Bug (1997) — Debugging from 190 Million km Away
On July 4, 1997, NASA’s Mars Pathfinder lander successfully touched down on Mars. But days later, the system began randomly resetting — losing data and interrupting the Sojourner rover’s mission. The reset occurred approximately every few hours with no predictable pattern.
Remote debugging: With the lander 190 million km away, engineers had to debug entirely through telemetry logs and a replica system on Earth. Using the replica, they reproduced the fault: a classic priority inversion in the VxWorks RTOS. A low-priority meteorological task held a mutex protecting the shared information bus, while a high-priority data distribution task waited for that mutex. A medium-priority communications task would then preempt the low-priority task, preventing it from releasing the mutex. The watchdog timer eventually fired, resetting the entire system.
The fix: Engineers uploaded a 6KB patch (yes, over radio to Mars) that enabled priority inheritance on the shared mutex. With inheritance enabled, the low-priority task temporarily runs at the priority of the high-priority task waiting on the mutex, so the medium-priority task can no longer preempt it. VxWorks had supported priority inheritance on mutex semaphores all along (the SEM_INVERSION_SAFE option); it simply had not been enabled for this mutex in the initial configuration.
Bring-up lesson: During firmware bring-up, always stress-test RTOS configurations under load. Run all tasks simultaneously at high frequency and monitor for watchdog resets, stack overflows, and mutex timeouts. The Pathfinder bug was reproducible on the ground; it just had not been exercised under realistic concurrent load.
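The inversion and its fix can be reproduced in a toy model. The sketch below is a minimal fixed-priority scheduler that runs one unit of work per tick; it is not VxWorks, and the task set, timings, and watchdog deadline are hypothetical values chosen only to make the effect visible.

```python
# Toy fixed-priority scheduler, one unit of work per tick.
# Task set and timings are hypothetical, chosen only to show the effect.
WATCHDOG_DEADLINE = 6  # assumed: the high-priority task must finish by this tick


def simulate(inherit):
    """Return the tick at which the high-priority task finishes."""
    tasks = {
        "low":  {"prio": 1, "release": 0, "work": 3, "mutex": True},
        "med":  {"prio": 2, "release": 2, "work": 5, "mutex": False},
        "high": {"prio": 3, "release": 1, "work": 1, "mutex": True},
    }
    owner = None  # name of the task holding the mutex, if any

    for tick in range(100):
        ready = [n for n, t in tasks.items()
                 if t["release"] <= tick and t["work"] > 0]
        # A task that needs the mutex while another task holds it is blocked.
        blocked = [n for n in ready
                   if tasks[n]["mutex"] and owner not in (None, n)]
        runnable = [n for n in ready if n not in blocked]
        if not runnable:
            continue

        def effective_prio(name):
            prio = tasks[name]["prio"]
            if inherit and name == owner:
                # The owner borrows the highest priority among tasks it blocks.
                prio = max([prio] + [tasks[b]["prio"] for b in blocked])
            return prio

        current = max(runnable, key=effective_prio)
        if tasks[current]["mutex"]:
            owner = current  # take (or keep) the mutex for the whole job
        tasks[current]["work"] -= 1
        if tasks[current]["work"] == 0:
            if owner == current:
                owner = None  # job done, release the mutex
            if current == "high":
                return tick + 1
    return None


if __name__ == "__main__":
    for inherit in (False, True):
        done = simulate(inherit)
        status = "WATCHDOG RESET" if done > WATCHDOG_DEADLINE else "ok"
        print(f"inheritance={inherit}: high task done at tick {done} ({status})")
```

In this model the high-priority task finishes at tick 9 without inheritance, because the medium-priority task runs its full 5 ticks while the mutex owner is starved, and at tick 4 with inheritance enabled.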
Multimeter Diagnostics
- Continuity: Verify GND connections between all ground pins. Check power rail continuity from connector to IC supply pins.
- Resistance: Measure between VCC and GND — should be >1kΩ (low resistance = short circuit). Check pull-up/pull-down values.
- Diode check: Verify protection diodes in correct orientation. Check ESD diode clamp voltages.
- Voltage: DC rails with 4.5-digit accuracy. Use relative mode to measure small differences between rails.
Bring-Up Log Tool
Generate a structured bring-up checklist and test log for your board, documenting each step from visual inspection through functional testing.
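A structured log does not need a dedicated tool. As a minimal sketch, the snippet below renders log entries as CSV text that opens in any spreadsheet. The column names in `FIELDS` and the `format_log` helper are assumptions for illustration, not a prescribed format.

```python
import csv
import io
from datetime import date

# Assumed log columns; adjust to your own process.
FIELDS = ["date", "step", "result", "measurement", "notes"]


def format_log(entries):
    """Render bring-up log entries (a list of dicts) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        # Keys missing from an entry are written as empty cells.
        writer.writerow(entry)
    return buf.getvalue()


if __name__ == "__main__":
    today = date.today().isoformat()
    log = [
        {"date": today, "step": "Visual Inspection", "result": "PASS"},
        {"date": today, "step": "VCC_3V3 at TP2", "result": "PASS",
         "measurement": "3.28 V", "notes": "expected 3.30 V +/- 0.10 V"},
    ]
    print(format_log(log))
```

Keeping one row per step, with the measured value next to the expected one, makes it easy to compare board revisions later.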
Exercises
Exercise 1: Diagnose a Failed Bring-Up
You power on a new STM32F4 board with a current-limited supply at 5V/100mA. The supply immediately hits the 100mA current limit and the voltage drops to 2.1V. The board draws 100mA even before firmware runs.
- What are the three most likely causes? (Think: shorts, wrong components, assembly defects)
- Describe the exact measurement sequence you would use to isolate the fault (which test points, which instruments)
- You find the 3.3V rail measures 0.15V. What does this tell you?
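- Using a thermal camera, you find the LDO regulator is at 95°C. The LDO is rated for 150mA and regulates 5V down to 3.3V. What is the most likely root cause?
Hint: A 0.15V rail means the LDO is in current-limit or short-circuit protection. Check for solder bridges on the 3.3V net, especially near decoupling capacitors. The LDO dissipation is P = (V_IN - 0.15) × I_SC, where V_IN is the voltage actually present at the LDO input: 2.1V while the bench supply is in current limit, 5.0V from a supply that does not sag. Estimate the dissipation at the measured current and compare it with the 95°C reading.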
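The dissipation formula in the hint can be evaluated directly. The sketch below computes P = (V_IN - V_OUT) × I for two cases: the 2.1V that the current-limited bench supply sags to in this exercise, and the nominal 5.0V with the LDO at its 150mA rating. `ldo_dissipation_w` is a hypothetical helper; turning the result into a temperature would additionally need the package thermal resistance, which the exercise does not give.

```python
def ldo_dissipation_w(v_in, v_out, current_a):
    """Power dissipated in a linear regulator's pass element, in watts."""
    return (v_in - v_out) * current_a


if __name__ == "__main__":
    V_OUT = 0.15  # measured 3.3V rail during the fault
    cases = [
        ("Bench supply in current limit", 2.1, 0.100),
        ("Nominal 5.0V input at 150mA rating", 5.0, 0.150),
    ]
    for label, v_in, amps in cases:
        p = ldo_dissipation_w(v_in, V_OUT, amps)
        print(f"{label}: {p * 1000:.0f} mW")
```

The gap between the two results shows why the current limit matters: the same short dissipates several times more power from an unlimited 5V source.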
Exercise 2: UART Debug Investigation
You’ve flashed firmware that should print “Hello World” at 115200 baud on UART2 (PA2/PA3). Your USB-UART adapter shows garbage characters in the terminal.
- List all possible causes in order of likelihood (baud rate, pin mapping, voltage levels, clock source)
- You measure the TX pin with a scope and see: bit period = 9.6µs, voltage swings 0–3.3V. Is the baud rate correct? Calculate the actual baud rate.
- The 9.6µs bit period suggests what clock misconfiguration? (HSI vs HSE, PLL settings)
- Write the debug steps to verify the HSE oscillator is running correctly
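Hint: 9.6µs/bit = 104,167 baud, not 115,200, so the UART clock is running about 10% slower than the firmware assumes (115200/104167 ≈ 1.106). An error this large points to a clock tree mismatch rather than crystal tolerance: for example, an HSE value or PLL divider setting in firmware that does not match the crystal actually fitted, or a fallback to the internal HSI oscillator (16MHz on the STM32F4) after the HSE failed to start.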
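The arithmetic in the hint is easy to script for any scope reading. This sketch converts a measured bit period into a baud rate and a percentage error against the nominal rate; the function names are illustrative.

```python
def baud_from_bit_period(bit_period_us):
    """Baud rate implied by a measured bit period in microseconds."""
    return 1e6 / bit_period_us


def baud_error_percent(measured_baud, nominal_baud):
    """Signed error of the measured rate relative to the nominal rate."""
    return 100.0 * (measured_baud - nominal_baud) / nominal_baud


if __name__ == "__main__":
    measured = baud_from_bit_period(9.6)  # scope reading from the exercise
    error = baud_error_percent(measured, 115200)
    print(f"Measured baud: {measured:,.0f}")
    print(f"Error vs 115200: {error:+.1f}%")
```

UART receivers generally tolerate only a few percent of mismatch, so an error near 10% is consistent with the garbage characters seen in the terminal.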
Exercise 3: Power Rail Noise Analysis
Your 3.3V rail measures 3.31V DC with a multimeter (✓ PASS). But the ADC readings on PA0 (connected to a voltage divider from a potentiometer) show random ±50 LSB fluctuations on a 12-bit ADC.
- Convert the ±50 LSB fluctuation to millivolts (VREF = 3.3V, 12-bit = 4096 steps)
- Set up your oscilloscope to measure the 3.3V rail ripple. What coupling mode, bandwidth limit, and probe grounding technique should you use?
- You find 120mV peak-to-peak ripple at 500kHz on the 3.3V rail. What is the most likely source of this noise?
- Propose three hardware fixes to reduce the ADC noise to ±5 LSB
Hint: ±50 LSB = ±(50 × 3.3/4096) = ±40.3mV. The 500kHz ripple frequency is characteristic of a switching regulator’s switching frequency. Fixes: add ferrite bead + capacitor LC filter on VDDA, add a separate LDO for the ADC reference, increase decoupling (10µF + 100nF + 10nF) close to VDDA pin.
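The LSB-to-millivolt conversion in the hint generalizes to any reference voltage and resolution. The sketch below implements it in both directions, using the exercise values (VREF = 3.3V, 12 bits) as defaults; the function names are illustrative.

```python
def lsb_to_mv(lsb, vref=3.3, bits=12):
    """Convert an ADC count (or count delta) to millivolts."""
    return lsb * vref / (2 ** bits) * 1000.0


def mv_to_lsb(mv, vref=3.3, bits=12):
    """Convert millivolts to ADC counts."""
    return mv / 1000.0 * (2 ** bits) / vref


if __name__ == "__main__":
    print(f"50 LSB  = {lsb_to_mv(50):.1f} mV")   # observed fluctuation
    print(f"5 LSB   = {lsb_to_mv(5):.1f} mV")    # target after fixes
    print(f"120 mV  = {mv_to_lsb(120):.0f} LSB") # measured rail ripple, p-p
```

Expressing the measured 120mV ripple in counts shows it is large enough to account for the observed ADC noise, and the ±5 LSB target corresponds to roughly ±4mV at the ADC input.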
Conclusion & Next Steps
Board bring-up is where your design meets reality. By following a systematic approach — visual inspection, current-limited power-on, rail verification, firmware flash, and peripheral testing — you’ll catch issues early and build confidence in your hardware. Document everything in your bring-up log for future revisions.
Next in the Series
In Part 10: Embedded Firmware Integration, we’ll write production firmware using STM32 HAL, configure peripherals, set up FreeRTOS, and implement communication protocols.