
ARM Assembly Part 12: MMU, Page Tables & Virtual Memory

March 26, 2026 Wasil Zafar 25 min read

The AArch64 MMU implements a four-level page table walk (or three/two levels with large granules) driven by TTBR0_EL1 and TTBR1_EL1, configured via TCR_EL1 and MAIR_EL1. This part walks through descriptor formats, memory attributes, TLB maintenance, and building a minimal identity-map at boot.

Table of Contents

  1. Introduction & Translation Overview
  2. TCR_EL1 — Translation Control
  3. MAIR_EL1 — Memory Attributes
  4. Page Table Descriptors
  5. Building an Identity Map at Boot
  6. Huge Pages & Contiguous Hint
  7. TLB Maintenance
  8. Hands-On Exercises
  9. Page Table Config Generator
  10. Conclusion & Next Steps

Introduction & Translation Overview

Series Overview: This is Part 12 of our 28-part ARM Assembly Mastery Series. In Part 11 we covered the exception model. Now we configure what every AArch64 OS must set up before running user space: the Memory Management Unit. Without the MMU there is no process isolation, no kernel/user separation, and no demand paging.

ARM Assembly Mastery

Your 28-step learning path • Currently on Step 12

  1. Architecture History & Core Concepts — ARMv1→v9, RISC philosophy, profiles
  2. ARM32 Instruction Set Fundamentals — ARM vs Thumb, registers, CPSR, barrel shifter
  3. AArch64 Registers, Addressing & Data Movement — X/W regs, addressing modes, load/store pairs
  4. Arithmetic, Logic & Bit Manipulation — ADD/SUB, bitfield extract/insert, CLZ
  5. Branching, Loops & Conditional Execution — Branch types, link register, jump tables
  6. Stack, Subroutines & AAPCS — Calling conventions, prologue/epilogue
  7. Memory Model, Caches & Barriers — Weak ordering, DMB/DSB/ISB, TLB
  8. NEON & Advanced SIMD — Vector ops, intrinsics, media processing
  9. SVE & SVE2 Scalable Vector Extensions — Predicate regs, gather/scatter, HPC/ML
  10. Floating-Point & VFP Instructions — IEEE-754, scalar FP, rounding modes
  11. Exception Levels, Interrupts & Vector Tables — EL0–EL3, GIC, fault debugging
  12. MMU, Page Tables & Virtual Memory — Stage-1 translation, permissions, huge pages ← You Are Here
  13. TrustZone & ARM Security Extensions — Secure monitor, world switching, TF-A
  14. Cortex-M Assembly & Bare-Metal Embedded — NVIC, SysTick, linker scripts, low-power
  15. Cortex-A System Programming & Boot — EL3→EL1 transitions, MMU setup, PSCI
  16. Apple Silicon & macOS ABI — ARM64e PAC, Mach-O, dyld, perf counters
  17. Inline Assembly, GCC/Clang & C Interop — Constraints, clobbers, compiler interaction
  18. Performance Profiling & Micro-Optimization — Pipeline hazards, PMU, benchmarking
  19. Reverse Engineering & ARM Binary Analysis — ELF, disassembly, CFR, iOS/Android quirks
  20. Building a Bare-Metal OS Kernel — Bootloader, UART, scheduler, context switch
  21. ARM Microarchitecture Deep Dive — OOO pipelines, reorder buffers, branch predict
  22. Virtualization Extensions — EL2 hypervisor, stage-2 translation, KVM
  23. Debugging & Tooling Ecosystem — GDB, OpenOCD/JTAG, ETM/ITM, QEMU
  24. Linkers, Loaders & Binary Format Internals — ELF deep dive, relocations, PIC, crt0
  25. Cross-Compilation & Build Systems — GCC/Clang toolchains, CMake, firmware gen
  26. ARM in Real Systems — Android, FreeRTOS/Zephyr, U-Boot, TF-A
  27. Security Research & Exploitation — ASLR, PAC attacks, ROP/JOP, kernel exploit
  28. Emerging ARMv9 & Future Directions — MTE, SME, confidential compute, AI accel

AArch64 virtual address translation converts a 64-bit virtual address to a physical address through a multi-level page table walk. Two independent address spaces exist: TTBR0_EL1 covers user space (upper VA bits all zero) and TTBR1_EL1 covers kernel space (upper VA bits all one). Both range sizes and granule choices are encoded in TCR_EL1. Memory attributes (cacheability, shareability, device type) come from MAIR_EL1, indexed by a 3-bit AttrIndx field in each page table descriptor.

Real-World Analogy — The Library Index Card System: Think of the MMU as a library card catalogue. Books (data) are stored on physical shelves at specific physical addresses. But patrons never navigate by shelf number — they look up a virtual reference (call number) in the catalogue. The catalogue has multiple levels: first you find the floor directory (L0 table), then the aisle (L1), then the shelf section (L2), then the exact shelf position (L3/page). Two separate catalogues (TTBR0 and TTBR1) exist — one for public readers (user space) and a restricted one for librarians (kernel space). Access permissions on each card (AP bits) determine who can read which books — and the PXN/UXN bits prevent patrons from executing the library code manuals. The TLB is the librarian's memory of recently looked-up books, avoiding the full catalogue walk every time.
Walkthrough: Step-by-Step Address Translation (4KB Granule)

Consider translating virtual address 0x0000_0040_1234_5678 with T0SZ=16 (48-bit VA) and 4KB granule:

Step 1 — Top-bit check: Bits[63:48] are all zero → use TTBR0_EL1 (user space).

Step 2 — L0 index: VA[47:39] = 0 → read the entry at TTBR0 + 0×8 bytes. This is a Table entry pointing to the L1 table base.

Step 3 — L1 index: VA[38:30] = 0b100000000 = 256 → read L1[256]. If this is a 1 GB Block entry, translation is done (PA = Block base | VA[29:0]). Otherwise, it's a Table entry pointing to L2.

Step 4 — L2 index: VA[29:21] = 0b010010001 = 145 → read L2[145]. If a 2 MB Block entry, PA = Block base | VA[20:0]. Otherwise, Table → L3.

Step 5 — L3 index: VA[20:12] = 0b101000101 = 325 → read L3[325]. This is a 4KB Page entry. Final PA = Page base | VA[11:0] (offset 0x678).

The full walk costs 4 sequential, dependent memory reads — easily tens of cycles when the table entries also miss in the caches. The TLB caches the final translation, so subsequent accesses to the same page skip the walk entirely.
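To sanity-check a walk by hand, the index extraction can be sketched in a few lines of Python (a hypothetical helper, not part of any kernel; the field positions follow the 48-bit, 4KB-granule layout above):

```python
# Hypothetical helper: split a 48-bit VA into 4KB-granule walk indices.
def split_va(va: int) -> dict:
    return {
        "l0": (va >> 39) & 0x1FF,   # VA[47:39] — L0 table index
        "l1": (va >> 30) & 0x1FF,   # VA[38:30] — L1 table index
        "l2": (va >> 21) & 0x1FF,   # VA[29:21] — L2 table index
        "l3": (va >> 12) & 0x1FF,   # VA[20:12] — L3 table index
        "off": va & 0xFFF,          # VA[11:0]  — byte offset within the 4KB page
    }

idx = split_va(0x0000_0040_1234_5678)
assert idx == {"l0": 0, "l1": 256, "l2": 145, "l3": 325, "off": 0x678}
```

Each index is 9 bits because a 4KB table holds 512 eight-byte descriptors; that is exactly why the granule choice fixes the index widths.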

Kernel/User Isolation via TTBR0/TTBR1

ARM's split address space design provides natural kernel/user isolation. User processes use TTBR0_EL1 (lower VA range, 0x0000...) and can only access pages marked with AP=01 (EL1/EL0 RW) or AP=11 (EL1/EL0 RO). The kernel's TTBR1_EL1 (upper VA range, 0xFFFF...) maps with AP=00 (EL1 only). On context switch, the kernel changes TTBR0_EL1 to the new process's page table — TTBR1_EL1 stays constant since kernel mappings are shared. Post-Spectre, Linux on ARM64 implements KPTI (Kernel Page Table Isolation): when running in EL0, the kernel TTBR1 page tables are replaced with a minimal "trampoline" table that only maps the exception vectors. On EL1 entry, the handler quickly switches to the full kernel tables. This prevents Meltdown-style attacks where EL0 speculatively reads kernel memory through TTBR1.

TCR_EL1 — Translation Control

T0SZ / T1SZ Address Range

TCR_EL1.T0SZ defines the size of the TTBR0 address space as 2^(64−T0SZ) bytes. Linux built for 39-bit virtual addresses uses T0SZ=25, giving a 512 GB user address space (0x0000_0000_0000_0000 to 0x0000_007F_FFFF_FFFF); the common 48-bit configuration uses T0SZ=16. TCR_EL1.T1SZ mirrors this for the kernel TTBR1 range. Addresses outside both ranges generate a translation fault.
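The 2^(64−T0SZ) rule is easy to check numerically (a throwaway sketch; the function name is ours):

```python
# Throwaway sketch of the 2^(64 - T0SZ) rule for the TTBR0 region size.
def ttbr0_span(t0sz: int) -> int:
    return 1 << (64 - t0sz)

assert ttbr0_span(25) == 512 * 2**30                 # T0SZ=25 → 512 GB
assert ttbr0_span(25) - 1 == 0x0000_007F_FFFF_FFFF   # top user VA at 39 bits
assert ttbr0_span(16) == 256 * 2**40                 # T0SZ=16 → 48-bit, 256 TB
```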

Granule Selection (TG0/TG1)

TG0/TG1 select the page granule. For TG0: 00=4KB, 01=64KB, 10=16KB (TG1 uses a different encoding: 10=4KB, 01=16KB, 11=64KB). The 4KB granule is the most common (used by Linux) and gives 512-entry tables at each level. The 64KB granule reduces walk depth and TLB pressure for workloads with large contiguous allocations (common in HPC/server systems like Cavium ThunderX).

Walk Levels Derived from T0SZ

T0SZ also fixes the starting level of the walk: with a 4KB granule, T0SZ=16 (48-bit VA) requires the full L0→L1→L2→L3 walk, while T0SZ=25 (39-bit VA) skips L0 entirely and TTBR0 points directly at an L1 table.

// TCR_EL1 setup for a Linux-style 39-bit VA config:
// T0SZ=25, T1SZ=25, 4KB granules, inner shareable, write-back cacheable
LDR  x0, =0x00000005B5193519
// Bit breakdown:
//  [5:0]   T0SZ  = 25 (0x19)
//  [7]     EPD0  = 0   (TTBR0 walk enabled)
//  [9:8]   IRGN0 = 01  (inner WB WA cacheable)
//  [11:10] ORGN0 = 01  (outer WB WA cacheable)
//  [13:12] SH0   = 11  (inner shareable)
//  [15:14] TG0   = 00  (4KB granule)
//  [21:16] T1SZ  = 25 (0x19)
//  [23]    EPD1  = 0
//  [25:24] IRGN1 = 01
//  [27:26] ORGN1 = 01
//  [29:28] SH1   = 11
//  [31:30] TG1   = 10  (4KB granule for TTBR1 — encoding differs from TG0!)
//  [34:32] IPS   = 101 (48-bit phys addr space)
MSR  TCR_EL1, x0
ISB
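As a cross-check, the same field packing can be reproduced offline (a Python sketch; the helper name and default arguments are ours, and the field offsets follow the breakdown above):

```python
# Sketch: pack TCR_EL1 from named fields instead of a hand-built constant.
def tcr_el1(t0sz, t1sz, irgn=0b01, orgn=0b01, sh=0b11,
            tg0=0b00, tg1=0b10, ips=0b101):
    v  = t0sz << 0          # [5:0]   T0SZ
    v |= irgn << 8          # [9:8]   IRGN0
    v |= orgn << 10         # [11:10] ORGN0
    v |= sh   << 12         # [13:12] SH0
    v |= tg0  << 14         # [15:14] TG0 (00 = 4KB)
    v |= t1sz << 16         # [21:16] T1SZ
    v |= irgn << 24         # [25:24] IRGN1
    v |= orgn << 26         # [27:26] ORGN1
    v |= sh   << 28         # [29:28] SH1
    v |= tg1  << 30         # [31:30] TG1 (10 = 4KB for TTBR1!)
    v |= ips  << 32         # [34:32] IPS (101 = 48-bit PA)
    return v

assert tcr_el1(25, 25) == 0x5B5193519
```

Generating the constant this way makes the TG0/TG1 encoding asymmetry hard to get wrong.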

MAIR_EL1 — Memory Attributes

Attribute Index (AttrIndx)

MAIR_EL1 holds eight 8-bit attribute encodings (indices 0–7). Each page table descriptor references one via its 3-bit AttrIndx field. A typical kernel setup uses: index 0 = Device-nGnRnE (MMIO, strictly ordered), index 1 = Normal, Inner/Outer Write-Back, Read-Allocate, Write-Allocate (cached RAM), index 2 = Normal, Non-Cacheable (uncached DMA buffers).

// MAIR_EL1 encoding for three common attribute types
// Attr[0] = 0x00 = Device-nGnRnE (strictly-ordered MMIO)
// Attr[1] = 0xFF = Normal, Inner/Outer WB/WA/RA (cacheable)
// Attr[2] = 0x44 = Normal, Inner/Outer Non-Cacheable

LDR  x0, =0x000000000044FF00
MSR  MAIR_EL1, x0
ISB

Device vs Normal Memory

Device memory (Device-nGnRnE / nGnRE / nGRE / GRE) is used for memory-mapped I/O registers. Accesses to Device memory are always in program order relative to other Device accesses to the same peripheral, preventing the hardware from reordering reads or writes. Normal memory allows full out-of-order and speculative access; without explicit barriers it may be reordered by both the CPU and memory system.

Page Table Descriptors

Block vs Page vs Table Entries

A 4KB-granule descriptor is 8 bytes. Bit[0] is the valid bit and bit[1] selects the type. At levels 0–2, bits[1:0] = 0b11 mark a Table entry (pointer to the next-level table) and 0b01 a Block entry (1 GB at L1, 2 MB at L2). At level 3, bits[1:0] = 0b11 mark a Page entry (4 KB); 0b01 is reserved there and faults. Block entries eliminate one or two levels of walk, reducing TLB miss latency for large contiguous regions. Linux uses 2 MB mappings extensively for the kernel image and large anonymous huge pages.
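The per-level type rules can be captured in a small decoder (illustrative Python, not a real tool; the classification follows the bits[1:0] table above):

```python
# Sketch: classify an 8-byte descriptor given its walk level (0-3).
def classify(desc: int, level: int) -> str:
    low = desc & 0b11
    if not (low & 1):                      # bit[0] clear → not valid
        return "invalid"
    if level == 3:                         # leaf level: only 0b11 is a page
        return "page" if low == 0b11 else "reserved"
    return "table" if low == 0b11 else "block"   # levels 0-2

assert classify(0x0000000040000701, 1) == "block"   # 1 GB block at L1
assert classify(0x0000000040000003, 0) == "table"   # pointer to next level
assert classify(0x0000000040000707, 3) == "page"    # 4 KB page at L3
assert classify(0x0, 2) == "invalid"
```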

AP, UXN, PXN, AF, SH bits

// 4KB page descriptor bit fields (simplified):
// [0]    Valid bit
// [1]    Type — at L0–L2: 0=block, 1=table; at L3: 1=page
// [4:2]  AttrIndx — index into MAIR_EL1
// [5]    NS  — Non-Secure (Secure state only)
// [7:6]  AP  — Access Permissions
//               00 = EL1 RW, EL0 no access
//               01 = EL1/EL0 RW
//               10 = EL1 RO, EL0 no access
//               11 = EL1/EL0 RO
// [9:8]  SH  — Shareability (10=outer, 11=inner shareable)
// [10]   AF  — Access Flag (must be 1, or the first access takes an AF fault)
// [11]   nG  — Not Global (ASID-tagged if set)
// [47:12] OA — Output (physical) address[47:12]
// [51:48] RES0 (for a 48-bit PA)
// [53]   PXN — Privileged Execute Never
// [54]   UXN — Unprivileged Execute Never

// Example: EL1 RW, EL0 no access, cached, inner shareable, AF=1, no-exec
// AttrIndx=1 (Normal cached), AP=00, SH=11, AF=1, UXN=PXN=1
// Descriptor bits: UXN|PXN | OA | AF|SH|AP|AttrIndx | valid+type
// = 0x0060000000000707  (for OA=0x0, L3 page)

MOV  x0, #0x707            // bits[11:0]: AF=1, SH=11, AP=00, AttrIndx=1, valid+page
ORR  x0, x0, #(3 << 53)    // PXN=1 (bit 53), UXN=1 (bit 54)
ORR  x0, x0, x5            // merge in physical page address (x5 = 4KB-aligned PA)
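A quick way to avoid hand-computed descriptor constants is to rebuild them from named fields (a hypothetical Python helper; with AttrIndx=1, AP=00, SH=11, AF=1, UXN=PXN=1, and OA=0 it yields 0x0060000000000707):

```python
# Sketch: assemble an L3 page descriptor from its fields.
def page_desc(pa: int, attridx: int, ap: int, sh: int = 0b11,
              af: int = 1, uxn: int = 1, pxn: int = 1) -> int:
    assert pa & 0xFFF == 0, "PA must be 4KB aligned"
    d  = 0b11                # valid + page (bits[1:0] = 11 at level 3)
    d |= attridx << 2        # [4:2]  MAIR_EL1 attribute index
    d |= ap << 6             # [7:6]  access permissions
    d |= sh << 8             # [9:8]  shareability
    d |= af << 10            # [10]   access flag
    d |= pa                  # [47:12] output address
    d |= pxn << 53           # [53]   privileged execute never
    d |= uxn << 54           # [54]   unprivileged execute never
    return d

assert page_desc(0x0, attridx=1, ap=0b00) == 0x0060000000000707
```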

Building an Identity Map at Boot

Level-0 & Level-1 Tables

An identity map (VA == PA) is the simplest starting point for boot: the MMU is enabled while code is still running at physical addresses. With T0SZ=16 (48-bit VA) and 4KB granule, the walk is L0→L1→L2→L3 for 4KB pages, or terminates at L1 for 1 GB blocks; with T0SZ=25 the L0 level disappears and TTBR0 points directly at the L1 table. A minimal boot identity map uses a single L0 table (4KB, 512×8-byte entries) pointing to a few L1 block entries.

// Minimal identity map: map 0x0–0x40000000 (1 GB) as Device,
// and 0x40000000–0x80000000 (1 GB) as Normal cached.
// x9 = L0 table base, x10 = L1 table base (both 4KB aligned, zero-filled)

ADR   x9, l0_table
ADR   x10, l1_table

// L0[0] → L1 table
ORR   x0, x10, #3          // Table descriptor (valid=1, type=table)
STR   x0, [x9]             // L0 entry 0

// L1[0] = 1 GB block: PA 0x0, Device-nGnRnE (AttrIndx=0, AP=00, AF=1)
MOV   x1, #(1 << 10)       // AF=1 (shareability is ignored for Device memory)
ORR   x1, x1, #1           // valid=1, block (bits[1:0]=01 at L1)
STR   x1, [x10]            // l1_table[0]

// L1[1] = 1 GB block: PA 0x40000000, Normal cached (AttrIndx=1)
MOV   x2, #0x40000000      // physical base
ORR   x2, x2, #(1 << 10)   // AF=1
ORR   x2, x2, #(3 << 8)    // SH=11 (inner shareable)
ORR   x2, x2, #(1 << 2)    // AttrIndx=1
ORR   x2, x2, #1           // valid=1, block
STR   x2, [x10, #8]        // l1_table[1]
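The boot descriptors can be computed offline to double-check the assembly (illustrative Python; the helper names are ours):

```python
# Sketch: the descriptors the boot identity-map code stores, built in Python.
def block_desc(pa: int, attridx: int, sh: int = 0b11, af: int = 1) -> int:
    d  = 0b01                # valid + block (bits[1:0] = 01 at L1/L2)
    d |= attridx << 2        # MAIR_EL1 attribute index
    d |= sh << 8             # shareability (ignored for Device memory)
    d |= af << 10            # access flag
    return d | pa

def table_desc(next_table_pa: int) -> int:
    return next_table_pa | 0b11   # valid + table pointer

assert block_desc(0x0, attridx=0, sh=0) == 0x401         # Device 1 GB block
assert block_desc(0x40000000, attridx=1) == 0x40000705   # Normal cached block
```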

Enabling the MMU (SCTLR_EL1.M)

// Enable MMU: write TTBR0, TTBR1, TCR, MAIR, then set SCTLR_EL1.M
ADR   x0, l0_table
MSR   TTBR0_EL1, x0
ADR   x0, kernel_l0_table
MSR   TTBR1_EL1, x0

// Ensure all page table writes are visible before enabling MMU
DSB   ISH
TLBI  VMALLE1IS              // Invalidate all TLB entries
DSB   ISH
ISB

MRS   x0, SCTLR_EL1
ORR   x0, x0, #(1 << 0)    // M=1: enable MMU
ORR   x0, x0, #(1 << 2)    // C=1: data cache enable
ORR   x0, x0, #(1 << 12)   // I=1: instruction cache enable
MSR   SCTLR_EL1, x0
ISB                          // Instruction barrier after MMU enable

Huge Pages & Contiguous Hint

Beyond 2 MB L2 blocks, AArch64 supports a Contiguous Hint (bit[52] in descriptors): when 16 consecutive, naturally aligned L3 page descriptors all have Contiguous=1, the TLB may cache them as a single 64 KB entry, reducing TLB pressure for hot large-page regions. Linux exploits this through contiguous-PTE ("cont") mappings, such as 64 KB hugetlb pages on a 4 KB granule. The kernel also supports 1 GB L1 block mappings for regions like device RAM on server SoCs where the full gigabyte is physically contiguous.
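Whether a run of pages may legally carry the hint reduces to an alignment check (a sketch under the 4KB-granule, 16-entry assumption above; the function name is ours):

```python
# Sketch: a 16-entry L3 group may set the Contiguous hint only when it is
# naturally aligned in both VA and PA (a 64 KB span with a 4KB granule).
def cont_eligible(va: int, pa: int, npages: int = 16, page: int = 4096) -> bool:
    span = npages * page                      # 64 KB for 16 x 4 KB
    return va % span == 0 and pa % span == 0

assert cont_eligible(0x80000, 0x40010000) is True
assert cont_eligible(0x81000, 0x40010000) is False   # VA not 64 KB aligned
```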

TLB Maintenance

TLBI Operations

// TLBI syntax: TLBI <op>{IS}{nXS}, {Xt}
// IS  = inner-shareable broadcast (affects all cores in the shareability domain)
// nXS = not extended shareability (FEAT_XS)

// Invalidate all EL1 entries (broadcast to all PEs)
TLBI  VMALLE1IS

// Invalidate a specific VA at EL1, non-broadcast (current PE only)
// Xt format: bits[63:48] = ASID, bits[43:0] = VA[55:12]
LSR   x0, x5, #12          // VA page number (x5 = VA)
TLBI  VAE1, x0             // Invalidate by VA at EL1

// Invalidate all entries for a specific ASID
MOVZ  x0, #1234, LSL #48   // ASID in bits[63:48]
TLBI  ASIDE1IS, x0

// Always bracket TLBI with barriers
DSB   ISHST                // Ensure preceding stores (page table writes) complete
TLBI  VMALLE1IS
DSB   ISH                  // Ensure the TLBI completes before continuing
ISB
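The Xt operand for VA-based TLBI operations can be assembled the same way (a hypothetical helper mirroring the bit layout in the comments above):

```python
# Sketch: build the Xt operand for TLBI VAE1 / VAE1IS.
# Layout: bits[63:48] = ASID, bits[43:0] = VA[55:12].
def tlbi_va_operand(va: int, asid: int = 0) -> int:
    return (asid << 48) | ((va >> 12) & ((1 << 44) - 1))

assert tlbi_va_operand(0x0000_0040_1234_5678) == 0x4012345
assert tlbi_va_operand(0x0000_0040_1234_5678, asid=7) >> 48 == 7
```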

Break-Before-Make Rule

When changing a live page table entry from one valid mapping to another (e.g., changing permissions or PA), the ARM architecture requires Break-Before-Make: first write an invalid descriptor, execute a DSB, issue the appropriate TLBI, issue another DSB, then write the new valid descriptor. Skipping this can expose a window where both old and new mappings coexist in different cores' TLBs, causing undefined behaviour. For read-only promotions (adding a restriction) the rule is relaxed, but erring on the side of BBM is the safe default.

Key Insight: The Access Flag (AF, bit[10]) is deliberately omitted from descriptors at first mapping creation in some OS designs. The first access to the page causes an AF fault taken to EL1. The kernel's AF fault handler sets the AF bit and returns — this records which pages have been recently accessed for the page replacement algorithm. It's a hardware-assisted LRU approximation that costs zero kernel-side tracking overhead.
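The zero-overhead accounting is easy to simulate (a toy Python model, not kernel code: descriptors live in a dict and the "fault handler" just sets bit[10] and retries):

```python
# Toy model: AF-fault counting for a linear scan of pages mapped with AF=0.
PAGE = 4096

def scan(descs: dict, start: int, length: int) -> int:
    faults = 0
    for va in range(start, start + length, 64):   # cache-line-sized reads
        page = va // PAGE
        if not (descs[page] >> 10) & 1:           # AF (bit[10]) clear?
            descs[page] |= 1 << 10                # "handler" sets AF, retries
            faults += 1
    return faults

descs = {p: 0x307 for p in range(16)}             # 16 x 4 KB pages, AF=0
assert scan(descs, 0, 16 * PAGE) == 16            # exactly one fault per page
```

A second scan of the same region takes zero faults, which is precisely the signal a page-replacement algorithm harvests by periodically clearing AF bits.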

Hands-On Exercises

Practice Exercises: Test your understanding of ARM virtual memory and page tables.
Exercise 1 Identity Map with Permissions

Task: Extend the boot identity map to cover 4 regions: 0x00000000–0x3FFFFFFF as Device-nGnRnE (MMIO), 0x40000000–0x7FFFFFFF as Normal RW cached (kernel data), 0x80000000–0xBFFFFFFF as Normal RO+PXN cached (user code mapped as read-only), and 0xC0000000–0xFFFFFFFF as Normal RW Non-Cacheable (DMA buffers). Use L1 block entries and verify each region's attributes by reading back the descriptors in GDB.

Hint: You need four L1 entries. Set AttrIndx=0 for Device, AttrIndx=1 for cached Normal, AttrIndx=2 for non-cacheable. Use AP[7:6] to control read/write permissions.

Exercise 2 TLB Shootdown Sequence

Task: Write a function that implements the Break-Before-Make pattern: given a pointer to a live L3 page table entry and a new descriptor value, (1) store an invalid entry, (2) DSB ISH, (3) TLBI VAE1IS with the correct VA, (4) DSB ISH, (5) store the new valid entry, (6) DSB ISH + ISB. Test by changing a page from RW to RO while another thread reads it — verify the reader takes a permission fault after the change.

Hint: The TLBI operand format is: bits[63:48] = ASID, bits[43:0] = VA >> 12. Zero the ASID field for global pages.

Exercise 3 Demand Paging via AF Faults

Task: Create page table entries with AF=0 for a 64KB region. Write an AF fault handler (ESR_EL1 EC=0b100100 for a data abort from EL0, ISS DFSC=0b001011 for a level-3 Access Flag fault) that sets the AF bit in the faulting descriptor, issues TLBI+DSB, and returns to the faulting instruction. Count how many AF faults you take when linearly scanning the region — verify it equals exactly 16 (one per 4KB page).

Hint: The FAR_EL1 gives the faulting VA. Walk the page table from TTBR0 to find the L3 entry, then OR in bit[10]. Remember to use atomic ORR if other cores might access the same descriptor.

Page Table Configuration Generator

Use this tool to plan and document your MMU configuration — address space layout, granule sizes, memory regions with attributes, and TLBI maintenance strategy. Download as Word, Excel, or PDF for your system design documentation.


Conclusion & Next Steps

We configured the AArch64 MMU from scratch: TCR_EL1 T0SZ/T1SZ/TG0/TG1/IPS, MAIR_EL1 attribute indices for Device and Normal memory, page table descriptor fields (AP, UXN, PXN, AF, SH, AttrIndx), the step-by-step address translation walk, kernel/user isolation via split TTBR0/TTBR1 address spaces and KPTI, building a boot identity map with L1 block entries, enabling the MMU via SCTLR_EL1.M with proper cache and barrier sequencing, huge pages and the Contiguous Hint, and TLBI operations with the Break-Before-Make rule.

Next in the Series

In Part 13: TrustZone & ARM Security Extensions, we explore how ARM hardware partitions the system into Secure and Non-Secure worlds — the secure monitor, world switching via SMC, and how Trusted Firmware-A (TF-A) implements EL3 services.
