
ARM Assembly Part 25: Cross-Compilation & Build Systems

June 25, 2026 Wasil Zafar 21 min read

Cross-compiling for ARM64 on an x86 host seems straightforward until ABI mismatches, missing sysroots, and LLVM triple confusion bite you. This part covers correct toolchain selection for both bare-metal and Linux targets, CMake toolchain files, LLVM/Clang cross setup, and automated firmware generation pipelines.

Table of Contents

  1. Toolchain Taxonomy
  2. GCC Cross-Toolchain Setup
  3. Clang/LLVM Cross-Compilation
  4. Sysroots & ABI Selection
  5. CMake Toolchain Files
  6. Bare-Metal Firmware Build
  7. CI Pipeline for ARM64
  8. Case Study: Cross-Compilation in the Real World
  9. Hands-On Exercises
  10. Conclusion & Next Steps

Toolchain Taxonomy

Series Overview: Part 25 of 28. Related: Part 20 (bare-metal kernel build), Part 24 (linkers/loaders), Part 26 (real systems that consume these builds).

Analogy — The Foreign Kitchen: Cross-compilation is like cooking a meal for someone with a completely different kitchen. Your local kitchen (x86 host) has electric stoves, metric measuring cups, and different power outlets. The recipient's kitchen (ARM target) has gas burners, imperial measuring cups, and 220V outlets. The toolchain is your recipe translator — it takes your ingredients (source code) and adjusts every instruction for the target kitchen. The sysroot is a sample of their ingredients and utensils, so you can test that your recipe fits their equipment. The target triple (aarch64-linux-gnu) is the address label specifying exactly which kitchen you're shipping to. A CMake toolchain file is a master instruction sheet that tells every sous chef (compiler, linker, assembler) exactly which translation rules to follow. And CI pipelines are like having a robot taste-tester that runs QEMU to try your recipe in a simulated version of their kitchen before you ship.
Toolchain Target Triples — Quick Reference:
aarch64-linux-gnu-gcc — Linux user-space with glibc (most common)
aarch64-linux-musl-gcc — Linux user-space with musl libc (static-friendly)
aarch64-none-elf-gcc — Bare-metal, no OS (Arm's prebuilt packages bundle newlib as the C library)
aarch64-none-linux-gnu-gcc — ARM's own GNU toolchain distribution
clang --target=aarch64-linux-gnu — LLVM Clang, same target triples
Triple format: <arch>-<vendor>-<os>-<libc/ABI> (the vendor field is often omitted, as in aarch64-linux-gnu)
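To make the field layout concrete, here is a minimal sketch of a shell helper that splits a triple into its parts. The function name `split_triple` is hypothetical, and it makes a simplifying assumption: a three-field triple is read as arch-os-abi with the vendor omitted. Tools such as config.sub and LLVM normalise some triples differently (notably aarch64-none-elf), so treat the output as an informal reading rather than the canonical one.

```shell
#!/usr/bin/env bash
# split_triple TRIPLE: print the fields of a GNU-style target triple.
# Assumption: three fields mean arch-os-abi (vendor omitted),
# four fields mean arch-vendor-os-abi.
split_triple() {
    local arch vendor os abi
    local IFS=-
    # Unquoted on purpose: word splitting on "-" yields the fields.
    set -- $1
    case $# in
        3) arch=$1 vendor=unknown os=$2 abi=$3 ;;
        4) arch=$1 vendor=$2 os=$3 abi=$4 ;;
        *) echo "unrecognized triple" >&2; return 1 ;;
    esac
    echo "arch=$arch vendor=$vendor os=$os abi=$abi"
}

split_triple aarch64-linux-gnu
# arch=aarch64 vendor=unknown os=linux abi=gnu
split_triple aarch64-none-linux-gnu
# arch=aarch64 vendor=none os=linux abi=gnu
```

The second call shows why Arm's aarch64-none-linux-gnu toolchain still targets Linux: "none" sits in the vendor slot, not the OS slot.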

GCC Cross-Toolchain Setup

# ── Install GCC cross-toolchain on Ubuntu/Debian ──
sudo apt-get install -y \
    gcc-aarch64-linux-gnu \
    g++-aarch64-linux-gnu \
    binutils-aarch64-linux-gnu \
    libgcc-12-dev-arm64-cross \
    linux-libc-dev-arm64-cross

# Verify installation:
aarch64-linux-gnu-gcc --version
# aarch64-linux-gnu-gcc (Ubuntu 12.3.0-6ubuntu4) 12.3.0

# ── Build a simple program ──
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("Hello ARM64!\n"); return 0; }
EOF

aarch64-linux-gnu-gcc -O2 -march=armv8-a -o hello hello.c
file hello
# hello: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV)

# Run on x86 using QEMU user-mode emulation:
sudo apt-get install -y qemu-user
qemu-aarch64 -L /usr/aarch64-linux-gnu ./hello
# Hello ARM64!
# ── ARM Architecture-specific flags ──
# Target a specific Cortex core for optimal code generation:
#   -march  ARMv8.2-A plus the crypto, FP16 and RCpc extensions
#   -mtune  tune scheduling for the Cortex-A78 pipeline
#   -O3     already enables GCC's auto-vectorizer (-ftree-vectorize);
#           -fvectorize is the Clang spelling and is rejected by GCC
aarch64-linux-gnu-gcc \
    -march=armv8.2-a+crypto+fp16+rcpc \
    -mtune=cortex-a78 \
    -O3 \
    -o app app.c

# ARMv9 / Neoverse (ARMv9-A with SVE2):
aarch64-linux-gnu-gcc \
    -march=armv9-a+sve2 \
    -mtune=neoverse-n2 \
    -O3 -o server_app server_app.c

# Check which target options the compiler has selected (AArch64 has no -mfpu):
aarch64-linux-gnu-gcc -march=armv8-a -Q --help=target | grep -E "march|mtune|mcpu"
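A complementary check is to ask the preprocessor which feature-test macros a given -march string switches on. The sketch below filters a `-dM -E` macro dump down to the `__ARM_*` names; the helper name `list_arm_macros` is hypothetical, and the compiler call is skipped when the cross-toolchain is not installed.

```shell
#!/usr/bin/env bash
# list_arm_macros: read a "-dM -E" preprocessor dump on stdin and print
# only the ARM feature-test macro names, one per line, sorted.
list_arm_macros() {
    awk '$1 == "#define" && $2 ~ /^__ARM_(FEATURE|NEON|ARCH)/ { print $2 }' | sort
}

# Query the cross-compiler only when it is actually installed.
if command -v aarch64-linux-gnu-gcc >/dev/null 2>&1; then
    aarch64-linux-gnu-gcc -march=armv8.2-a+crypto -dM -E - </dev/null | list_arm_macros
fi
```

Running it with two different -march values and diffing the results is a quick way to see what an extension such as +crypto or +sve2 actually changes for conditional code.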

Clang/LLVM Cross-Compilation

# ── LLVM Clang cross-compilation (single binary, multi-target) ──
sudo apt-get install -y clang lld llvm

# Clang uses --target= flag instead of a prefixed binary:
clang \
    --target=aarch64-linux-gnu \
    -march=armv8.2-a \
    --sysroot=/usr/aarch64-linux-gnu \
    -fuse-ld=lld \
    -O2 -o hello hello.c

# Key difference: Clang does not bundle a target sysroot, so without --sysroot
# it may search host include and library paths.
# Always set --sysroot explicitly for cross-compilation.

# Verify target:
llvm-readobj --file-headers hello | grep Machine
# Machine: AArch64
# ── Clang bare-metal (no libc, no OS) ──
clang \
    --target=aarch64-none-elf \
    -march=armv8-a \
    -nostdlib -nostartfiles -ffreestanding \
    -fuse-ld=lld \
    -T linker.ld \
    -O2 -o kernel.elf boot.S kernel.c

# Generate assembly listing to inspect output
# (-masm= selects x86 assembler dialects and does not apply to AArch64):
clang \
    --target=aarch64-none-elf \
    -march=armv8-a -O2 \
    -S \
    -o kernel.s kernel.c

Sysroots & ABI Selection

# ── Build a proper sysroot for cross-compilation ──
# Option 1: Use the system sysroot from the Debian cross-tools package
ls /usr/aarch64-linux-gnu/
# bin/  include/  lib/  lib64/  libexec/  share/

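# Option 2: Bootstrap a Debian arm64 root filesystem to use as the sysroot
# (with qemu-user-static and binfmt-support installed, debootstrap can run
# the foreign-architecture second stage itself; the separate
# qemu-debootstrap wrapper is deprecated)
sudo apt-get install -y debootstrap qemu-user-static binfmt-support
sudo debootstrap \
    --arch=arm64 \
    bookworm \
    /opt/aarch64-sysroot \
    http://deb.debian.org/debian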

# Compile against custom sysroot:
aarch64-linux-gnu-gcc \
    --sysroot=/opt/aarch64-sysroot \
    -I/opt/aarch64-sysroot/usr/include \
    -L/opt/aarch64-sysroot/usr/lib/aarch64-linux-gnu \
    -O2 -o app app.c
# ── ABI variants on ARM64 ──
# AAPCS64 (default): hard-float, 64-bit pointers, LP64 data model
# ILP32 (AArch64-ILP32 / "ARM64_32"): 32-bit pointers on 64-bit ISA (Apple watchOS)
# Soft-float: no FP hardware (rare on AArch64, common on ARM32)

# Check the ELF class and byte order of an existing binary
# (ELF64 with the default AAPCS64 ABI implies the LP64 data model):
aarch64-linux-gnu-readelf -h app | grep "Class\|Data\|OS"
# Class:                             ELF64
# Data:                              2's complement, little endian
# OS/ABI:                            UNIX - System V
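When binutils for the target is not available (for example in a slim CI image), the same three facts can be read straight from the ELF header bytes. The following is a sketch using only od(1); the function name `elf_summary` is hypothetical. It relies on the standard ELF layout: EI_CLASS at offset 4, EI_DATA at offset 5, and the 16-bit e_machine field at offset 18, where the value 183 denotes AArch64.

```shell
#!/usr/bin/env bash
# elf_summary FILE: print class, byte order and machine number of an ELF file.
elf_summary() {
    local f=$1 magic class data machine
    magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not an ELF file" >&2; return 1; }
    class=$(od -An -tu1 -j4 -N1 "$f" | tr -d ' \n')   # EI_CLASS: 1=ELF32, 2=ELF64
    data=$(od -An -tu1 -j5 -N1 "$f" | tr -d ' \n')    # EI_DATA:  1=little, 2=big
    # Read e_machine bytewise so the result does not depend on host byte order.
    set -- $(od -An -tu1 -j18 -N2 "$f")
    if [ "$data" = 1 ]; then
        machine=$(( $1 + $2 * 256 ))
    else
        machine=$(( $1 * 256 + $2 ))
    fi
    echo "class=$class data=$data machine=$machine"
}

# Usage (a little-endian AArch64 ELF64 binary prints machine=183):
#   elf_summary ./app
```

A build script can compare the output against the expected string and fail early if a host-architecture binary slipped into a cross-build.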

CMake Toolchain Files

# aarch64-linux-gnu.cmake — CMake toolchain file for Linux target
cat > aarch64-linux-gnu.cmake <<'EOF'
# Cross-compilation toolchain for AArch64 Linux (GCC)
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

# Toolchain prefix
set(CROSS_COMPILE aarch64-linux-gnu-)
set(CMAKE_C_COMPILER   ${CROSS_COMPILE}gcc)
set(CMAKE_CXX_COMPILER ${CROSS_COMPILE}g++)
set(CMAKE_ASM_COMPILER ${CROSS_COMPILE}gcc)
set(CMAKE_OBJCOPY      ${CROSS_COMPILE}objcopy)
set(CMAKE_STRIP        ${CROSS_COMPILE}strip)

# Sysroot
set(CMAKE_SYSROOT /usr/aarch64-linux-gnu)
set(CMAKE_FIND_ROOT_PATH /usr/aarch64-linux-gnu)

# Search rules: headers/libs from target, programs from host
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)

# Architecture flags
set(CMAKE_C_FLAGS_INIT   "-march=armv8-a -mtune=cortex-a72")
set(CMAKE_CXX_FLAGS_INIT "-march=armv8-a -mtune=cortex-a72")
EOF

# Use the toolchain file (run from the project root):
cmake -S . -B build \
    -DCMAKE_TOOLCHAIN_FILE=aarch64-linux-gnu.cmake \
    -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# aarch64-none-elf.cmake — CMake toolchain file for bare-metal
cat > aarch64-none-elf.cmake <<'EOF'
set(CMAKE_SYSTEM_NAME Generic)    # "Generic" = no OS
set(CMAKE_SYSTEM_PROCESSOR aarch64)

set(CMAKE_C_COMPILER   aarch64-none-elf-gcc)
set(CMAKE_CXX_COMPILER aarch64-none-elf-g++)
set(CMAKE_ASM_COMPILER aarch64-none-elf-gcc)

# Skip the test link during compiler checks: a bare-metal executable cannot
# link without the linker script, so CMake builds a static library instead.
set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)

set(CMAKE_C_FLAGS_INIT   "-march=armv8-a -ffreestanding -nostdlib")
set(CMAKE_EXE_LINKER_FLAGS_INIT "-T${CMAKE_SOURCE_DIR}/linker.ld -nostartfiles")
EOF
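
The Linux GCC toolchain file above has a natural Clang counterpart, which Exercise 2 later refers to as toolchains/aarch64-clang.cmake. The following is a sketch under stated assumptions rather than a tested configuration: the file path follows the exercise's naming, the sysroot simply mirrors the GCC file and may need adjusting for your distribution's layout, and lld and llvm-objdump are assumed to be installed. `CMAKE_<LANG>_COMPILER_TARGET` is the standard CMake variable that becomes `--target=` for Clang.

```shell
#!/usr/bin/env bash
mkdir -p toolchains
cat > toolchains/aarch64-clang.cmake <<'EOF'
# Cross-compilation toolchain for AArch64 Linux (Clang + lld)
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

set(CMAKE_C_COMPILER   clang)
set(CMAKE_CXX_COMPILER clang++)
set(CMAKE_ASM_COMPILER clang)

# CMake passes these to Clang as --target=<triple>
set(CMAKE_C_COMPILER_TARGET   aarch64-linux-gnu)
set(CMAKE_CXX_COMPILER_TARGET aarch64-linux-gnu)
set(CMAKE_ASM_COMPILER_TARGET aarch64-linux-gnu)

# Assumption: same sysroot as the GCC toolchain file
set(CMAKE_SYSROOT /usr/aarch64-linux-gnu)
set(CMAKE_FIND_ROOT_PATH /usr/aarch64-linux-gnu)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)

# Link with lld and disassemble with the LLVM tool
set(CMAKE_EXE_LINKER_FLAGS_INIT    "-fuse-ld=lld")
set(CMAKE_SHARED_LINKER_FLAGS_INIT "-fuse-ld=lld")
set(CMAKE_OBJDUMP llvm-objdump)

set(CMAKE_C_FLAGS_INIT   "-march=armv8-a -mtune=cortex-a72")
set(CMAKE_CXX_FLAGS_INIT "-march=armv8-a -mtune=cortex-a72")
EOF
```

Keeping the two files identical apart from the compiler selection makes any difference in the generated code attributable to the compiler rather than to the flags.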

Bare-Metal Firmware Build

# Complete Makefile for bare-metal AArch64 kernel + firmware image
cat > Makefile <<'EOF'
CROSS   := aarch64-none-elf-
CC      := $(CROSS)gcc
LD      := $(CROSS)ld
OBJCOPY := $(CROSS)objcopy
OBJDUMP := $(CROSS)objdump

ARCH_FLAGS := -march=armv8-a -mcpu=cortex-a72
OPT_FLAGS  := -O2 -pipe
WARN_FLAGS := -Wall -Wextra -Wshadow
C_FLAGS    := $(ARCH_FLAGS) $(OPT_FLAGS) $(WARN_FLAGS) \
              -ffreestanding -nostdlib -fno-stack-protector \
              -fno-common -ffunction-sections -fdata-sections

SRCS := boot.S uart.S vectors.S context.S kernel.c mm.c sched.c
OBJS := $(patsubst %.S,%.o,$(patsubst %.c,%.o,$(SRCS)))

.PHONY: all clean flash

all: kernel.elf kernel.bin kernel.lst

kernel.elf: $(OBJS) linker.ld
	$(LD) -T linker.ld --gc-sections -o $@ $(OBJS)

kernel.bin: kernel.elf
	$(OBJCOPY) -O binary $< $@

kernel.lst: kernel.elf
	$(OBJDUMP) -D $< > $@

%.o: %.c
	$(CC) $(C_FLAGS) -c -o $@ $<

%.o: %.S
	$(CC) $(C_FLAGS) -c -o $@ $<

clean:
	rm -f *.o kernel.elf kernel.bin kernel.lst

flash: kernel.bin
	openocd -f board/my_board.cfg -c "program kernel.bin 0x40000000 verify reset exit"
EOF
# linker.ld — minimal AArch64 bare-metal linker script
cat > linker.ld <<'EOF'
OUTPUT_FORMAT("elf64-littleaarch64")
OUTPUT_ARCH(aarch64)
ENTRY(_start)

MEMORY {
    RAM (rwx) : ORIGIN = 0x40000000, LENGTH = 128M
}

SECTIONS {
    . = 0x40000000;

    .text.boot : { KEEP(*(.text.boot)) }   /* boot.S must be first; KEEP protects it from --gc-sections */
    .text       : { *(.text .text.*) }
    .rodata     : { *(.rodata .rodata.*) }

    . = ALIGN(4096);
    .data       : { *(.data .data.*) }

    . = ALIGN(16);
    .bss (NOLOAD) : {
        _bss_start = .;
        *(.bss .bss.*)
        *(COMMON)
        _bss_end = .;
    }

    . = ALIGN(4096);
    _heap_start = .;

    /* 32 KB stack at end of RAM; the stack grows down from _stack_top */
    _stack_bottom = ORIGIN(RAM) + LENGTH(RAM) - 0x8000;
    _stack_top    = ORIGIN(RAM) + LENGTH(RAM);
}
EOF
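
It helps to work out the addresses this script implies before booting anything. The sketch below repeats the arithmetic in shell using the values from the MEMORY block (origin 0x40000000, length 128M) and the 0x8000 bytes the script reserves at the end of RAM. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
RAM_ORIGIN=0x40000000                  # ORIGIN(RAM) in linker.ld
RAM_LENGTH=$((128 * 1024 * 1024))      # LENGTH(RAM) = 128M
STACK_SIZE=0x8000                      # bytes reserved at the end of RAM

ram_end=$((RAM_ORIGIN + RAM_LENGTH))
stack_base=$((ram_end - STACK_SIZE))

printf 'RAM end        = 0x%X\n' "$ram_end"
# RAM end        = 0x48000000
printf 'last 32 KiB    = 0x%X..0x%X\n' "$stack_base" "$ram_end"
# last 32 KiB    = 0x47FF8000..0x48000000
echo "minimum QEMU -m: $((RAM_LENGTH / 1024 / 1024))M"
# minimum QEMU -m: 128M
```

The last line matters when testing on QEMU's virt machine, whose RAM also starts at 0x40000000: an emulated memory size below 128M leaves the top of this region, and therefore the stack, outside RAM.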

CI Pipeline for ARM64

# .github/workflows/arm64-build.yml — GitHub Actions cross-build
cat > .github/workflows/arm64-build.yml <<'EOF'
name: ARM64 Build & Test

on: [push, pull_request]

jobs:
  build-linux:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4

      - name: Install cross-toolchain
        run: |
          sudo apt-get update -qq
          sudo apt-get install -y gcc-aarch64-linux-gnu qemu-user

      - name: Build
        run: |
          aarch64-linux-gnu-gcc -O2 -march=armv8-a \
              -o app src/main.c src/lib.c
          file app

      - name: Test with QEMU user-mode
        run: |
          qemu-aarch64 -L /usr/aarch64-linux-gnu ./app

  build-bare-metal:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4

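      # Ubuntu does not package aarch64-none-elf-gcc, so the Linux-targeted
      # GCC is used in freestanding mode (-ffreestanding -nostdlib).
      - name: Install toolchain for bare-metal build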
        run: |
          sudo apt-get update -qq
          sudo apt-get install -y gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu \
              qemu-system-arm

      - name: Build kernel
        run: make -C firmware/ CROSS=aarch64-linux-gnu-

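      - name: Boot test in QEMU
        run: |
          # A bare-metal kernel normally never exits, so timeout is expected
          # to kill QEMU. Pass or fail is decided by the UART log (the kernel
          # is assumed to print "Boot successful"), not by the exit status.
          # -m matches the 128M RAM region declared in linker.ld.
          timeout 10 qemu-system-aarch64 \
              -machine virt -cpu cortex-a57 -m 128M \
              -kernel firmware/kernel.elf \
              -serial file:uart.log -display none \
              -no-reboot || true
          grep -q "Boot successful" uart.log
          echo "QEMU boot test passed"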
EOF

Case Study: Cross-Compilation in the Real World

Industry Case Study

Yocto Project & the Automotive Industry

Modern cars can contain 100 or more ECUs (Electronic Control Units), many running on ARM Cortex-A and Cortex-R processors. The Yocto Project is one of the most widely used build systems for creating custom embedded Linux distributions, and cross-compilation is its foundation. Automotive manufacturers and their suppliers use Yocto-based stacks to cross-compile entire Linux distributions (kernel, drivers, middleware, and applications) on x86 build servers for AArch64 SoCs such as the TI Jacinto (TDA4VM) and NXP S32G.

A large automotive Yocto build can cross-compile thousands of packages from source, producing a complete bootable image. The build system manages multiple sysroots (one per target architecture), resolves cross-compilation dependencies automatically, and generates SDK tarballs so application developers can cross-compile without setting up the full Yocto environment. As a rough order of magnitude, a build that would take 48+ hours natively on ARM hardware can complete in 2-3 hours on a 64-core x86 build server.

Historical Evolution

From Manual Makefiles to Modern Build Systems (1990–2024)

The history of ARM cross-compilation mirrors the evolution of embedded software engineering:

1990s — The Wild West: ARM's own SDT (Software Development Toolkit) ran on Windows only. Cross-compiling meant purchasing expensive proprietary tools, such as ARM Ltd's own compilers or Metrowerks CodeWarrior. Builds were driven by hand-written shell scripts with hardcoded paths. There were no package managers and no sysroot concept, so developers manually copied headers from target boards.

2000s — GCC Matures: The ARM GCC port became production-quality. Buildroot (2001) and OpenEmbedded (2003, later becoming Yocto's foundation) automated cross-compilation of entire Linux distributions. ARM released their own GNU toolchain packages. CMake 2.6 (2008) introduced the CMAKE_TOOLCHAIN_FILE concept that standardized cross-compilation configuration.

2010s — LLVM Revolution: Clang's multi-target architecture eliminated the need for separate compiler binaries per target. A single clang binary could target AArch64, ARM32, x86, RISC-V, and more. Linaro (founded 2010) standardized ARM toolchain releases with quarterly GNU Toolchain packages. Docker containers made reproducible cross-compilation environments trivial to share.

2020s — Cloud-Scale Cross-Builds: GitHub Actions and GitLab CI made ARM64 cross-compilation a standard CI/CD step. Rust's cross tool automated Docker-based cross-compilation. The Arm GNU Toolchain replaced Linaro's releases. Meson (cross files) and Bazel (platform definitions) introduced declarative cross-compilation configuration that some teams find clearer than CMake's approach. QEMU user-mode emulation became fast enough for test suites, and Apple's Rosetta 2 proved that binary translation could be nearly transparent.

Hands-On Exercises

Exercise 1 — Beginner

Cross-Compile & Run with QEMU User-Mode

Install the GCC AArch64 cross-toolchain and the QEMU user-mode emulator. Write a C program that prints sizeof() for common types (int, long, pointer, double) and the byte order. Cross-compile it, run it with qemu-aarch64, and compare the output with a native x86 build.

# Install tools
sudo apt-get install -y gcc-aarch64-linux-gnu qemu-user

# Create test program
cat > abi_probe.c <<'EOF'
#include <stdio.h>
#include <stdint.h>

int main(void) {
    printf("=== ABI Probe ===\n");
    printf("sizeof(int)     = %zu\n", sizeof(int));
    printf("sizeof(long)    = %zu\n", sizeof(long));
    printf("sizeof(void*)   = %zu\n", sizeof(void*));
    printf("sizeof(double)  = %zu\n", sizeof(double));
    printf("sizeof(size_t)  = %zu\n", sizeof(size_t));

    // Endianness check
    uint32_t val = 0x01020304;
    uint8_t *p = (uint8_t *)&val;
    printf("Endian: %s\n",
        p[0] == 0x04 ? "Little-endian" : "Big-endian");

    return 0;
}
EOF

# Cross-compile and run
aarch64-linux-gnu-gcc -O2 -o abi_probe_arm64 abi_probe.c
qemu-aarch64 -L /usr/aarch64-linux-gnu ./abi_probe_arm64

# Compare with native
gcc -O2 -o abi_probe_x86 abi_probe.c
./abi_probe_x86

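Expected Learning: Both targets use the LP64 data model (long = 8, pointer = 8) and are little-endian, so the two outputs match. The ABIs differ elsewhere: on AArch64 Linux plain char is unsigned and long double is a 128-bit IEEE quad, whereas x86-64 Linux uses a signed char and the 80-bit x87 extended format.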
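
Two ABI properties that the probe above does not print are the signedness of plain char and the format of long double. The sketch below writes a second, hypothetical probe file (abi_probe2.c) that reports both, and runs it under QEMU only when the cross-toolchain and emulator are installed.

```shell
#!/usr/bin/env bash
cat > abi_probe2.c <<'EOF'
#include <float.h>
#include <limits.h>
#include <stdio.h>

int main(void) {
    /* CHAR_MIN is 0 when plain char is unsigned. */
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    printf("sizeof(long double) = %zu\n", sizeof(long double));
    /* Mantissa bits: 64 for x87 extended, 113 for IEEE binary128. */
    printf("LDBL_MANT_DIG       = %d\n", LDBL_MANT_DIG);
    return 0;
}
EOF

# Cross-compile and run only when the tools are available.
if command -v aarch64-linux-gnu-gcc >/dev/null 2>&1 &&
   command -v qemu-aarch64 >/dev/null 2>&1; then
    aarch64-linux-gnu-gcc -O2 -o abi_probe2_arm64 abi_probe2.c
    qemu-aarch64 -L /usr/aarch64-linux-gnu ./abi_probe2_arm64
fi
```

Compiling the same file natively with gcc and comparing the two runs shows why code that assumes a signed char, or that serialises long double, can behave differently after a port even though every sizeof() in the first probe agrees.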

Exercise 2 — Intermediate

CMake Multi-Target Build System

Create a CMake project with a simple math library (add, multiply, dot product) that builds for both native x86 and cross-compiled AArch64. Write two toolchain files (GCC and Clang) and compare the generated assembly for the dot product function.

# Project structure
mkdir -p cross-cmake/{src,toolchains,build-x86,build-arm64-gcc,build-arm64-clang}

# Create CMakeLists.txt
cat > cross-cmake/CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.20)
project(MathLib C)

add_library(mathlib STATIC src/mathlib.c)
target_include_directories(mathlib PUBLIC src/)

add_executable(bench src/bench.c)
target_link_libraries(bench mathlib)

# Generate assembly listing
add_custom_command(TARGET bench POST_BUILD
    COMMAND ${CMAKE_OBJDUMP} -d $<TARGET_FILE:bench> > bench.lst
    COMMENT "Generating disassembly listing"
)
EOF

# Build native, then cross-compile with both toolchains
cd cross-cmake
cmake -B build-x86 -DCMAKE_BUILD_TYPE=Release
cmake --build build-x86

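# Use Release for the cross builds too; unoptimized code hides the differences
cmake -B build-arm64-gcc \
    -DCMAKE_TOOLCHAIN_FILE=toolchains/aarch64-linux-gnu.cmake \
    -DCMAKE_BUILD_TYPE=Release
cmake --build build-arm64-gcc

cmake -B build-arm64-clang \
    -DCMAKE_TOOLCHAIN_FILE=toolchains/aarch64-clang.cmake \
    -DCMAKE_BUILD_TYPE=Release
cmake --build build-arm64-clang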

# Compare generated assembly for dot_product():
diff <(grep -A30 'dot_product' build-arm64-gcc/bench.lst) \
     <(grep -A30 'dot_product' build-arm64-clang/bench.lst)

Expected Learning: GCC and Clang generate different instruction sequences for the same function, and their auto-vectorizers can make different choices about when to use NEON/ASIMD instructions.
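
The `grep -A30` comparison above takes a fixed 30 lines, which can cut a long function short or spill into the next one. A sketch of a more precise extractor follows; the name `extract_func` is hypothetical, and it assumes the usual objdump -d layout in which each function starts with an "address <name>:" header line and ends at the next blank line.

```shell
#!/usr/bin/env bash
# extract_func SYMBOL LISTING: print one function's disassembly from a
# listing produced by objdump -d (or llvm-objdump -d).
extract_func() {
    awk -v sym="<$1>:" '
        $2 == sym           { printing = 1 }   # header: "<address> <name>:"
        printing && NF == 0 { exit }           # blank line ends the function
        printing            { print }
    ' "$2"
}

# Usage, once both cross builds exist:
#   diff <(extract_func dot_product build-arm64-gcc/bench.lst) \
#        <(extract_func dot_product build-arm64-clang/bench.lst)
```

Because the addresses on each line differ between the two builds, piping both sides through a filter that strips the address column first gives a cleaner diff of the instructions themselves.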

Exercise 3 — Advanced

Docker-Based Reproducible Cross-Compilation Pipeline

Create a complete Docker-based cross-compilation environment that builds a bare-metal AArch64 kernel image, runs it in QEMU system-mode inside the container, captures UART output, and verifies it passes a boot test. Package it as a GitHub Actions workflow.

# Dockerfile for reproducible ARM64 cross-compilation
cat > Dockerfile.arm64-builder <<'EOF'
FROM ubuntu:22.04 AS builder
ENV DEBIAN_FRONTEND=noninteractive

# Install cross-toolchain + QEMU
RUN apt-get update && apt-get install -y \
    gcc-aarch64-linux-gnu \
    binutils-aarch64-linux-gnu \
    qemu-system-arm \
    make cmake \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /build
COPY . .

# Build bare-metal firmware
RUN make CROSS=aarch64-linux-gnu- kernel.bin

# Boot test: run QEMU, capture UART output, check for success string
RUN timeout 10 qemu-system-aarch64 \
    -machine virt -cpu cortex-a72 -m 128M \
    -kernel kernel.elf \
    -serial file:/tmp/uart.log \
    -display none -no-reboot || true \
    && grep -q "Boot successful" /tmp/uart.log \
    && echo "BOOT TEST PASSED"
EOF

# Build and run the Docker image
docker build -f Dockerfile.arm64-builder -t arm64-firmware .

# Extract artifacts
docker create --name extract arm64-firmware
docker cp extract:/build/kernel.bin ./kernel.bin
docker cp extract:/build/kernel.elf ./kernel.elf
docker rm extract

Expected Learning: Docker containers ensure every developer and CI runner uses identical toolchain versions, sysroots, and QEMU releases — eliminating "works on my machine" across the entire team.

Conclusion & Next Steps

A well-structured cross-compilation setup removes the "works on my machine" problem from ARM64 firmware and embedded Linux development. The CMake toolchain file approach is especially valuable: it encodes all target-specific knowledge in one place, making CI pipelines, Docker containers, and new developer onboarding trivially reproducible.

From the Yocto Project building entire automotive Linux distributions to Docker-based CI pipelines running QEMU boot tests, cross-compilation is the backbone of modern ARM development. The evolution from hand-written Makefiles with hardcoded paths to declarative toolchain files and multi-target Clang binaries reflects the broader maturation of embedded software engineering. Mastering these tools turns any x86 workstation into a full ARM development environment.

Next in the Series

In Part 26: ARM in Real Systems, we see how everything in this series manifests in production: Android's NDK and ART JIT, FreeRTOS and Zephyr RTOS task scheduling, U-Boot bootloader, and Trusted Firmware-A's role in the secure boot chain.
