x86 Assembly Series Part 7: Memory Addressing Modes

February 6, 2026 Wasil Zafar 28 min read

Master all x86/x64 memory addressing modes (immediate, register, direct, indirect, indexed, base+displacement, and RIP-relative) and learn effective address calculation with the LEA instruction.

Table of Contents

  1. Addressing Modes Overview
  2. Immediate Addressing
  3. Register Addressing
  4. Direct Memory Addressing
  5. Indirect Addressing
  6. RIP-Relative Addressing
  7. LEA Instruction
  8. Effective Address Calculation

Addressing Modes Overview

Key Concept: Addressing modes determine how the CPU calculates the location of operands. x86 offers rich addressing capabilities that enable efficient array access, struct navigation, and position-independent code.

General Effective Address Formula

Effective Address = Base + (Index × Scale) + Displacement

Where:
- Base: Any general-purpose register
- Index: Any GP register except RSP
- Scale: 1, 2, 4, or 8
- Displacement: 8- or 32-bit signed constant

Immediate Addressing

mov rax, 42              ; Immediate value in instruction
mov rbx, 0xDEADBEEF      ; Hex immediate
add rcx, 100             ; Add immediate to register

Register Addressing

mov rax, rbx             ; Register to register
add rcx, rdx             ; Both operands are registers
xor rax, rax             ; Clear register (common idiom)

Direct Memory Addressing

Direct memory addressing uses a fixed address encoded directly in the instruction. Think of it like having a hardcoded street address—simple but inflexible.

[Figure: Direct memory addressing — the operand address is encoded directly in the instruction, pointing to a fixed memory location.]

section .data
    my_var dq 42           ; 8-byte variable
    buffer times 64 db 0   ; 64-byte buffer

section .text
    ; Direct addressing (32-bit mode style)
    mov eax, [my_var]      ; Load from fixed address
    mov [buffer], bl       ; Store byte at buffer

    ; In 64-bit mode, this becomes RIP-relative!
    ; Assembler converts [my_var] to [rip + offset_to_my_var]

64-bit Gotcha: In x86-64, "direct" addressing of labels assembles to RIP-relative. A true 64-bit absolute address is only available through the special MOV forms that use RAX/AL (NASM: mov rax, [qword abs my_var]), or by loading the address into a register and using it as a base.

Real-World Use Case

section .data
    ; Global configuration values
    debug_mode   db 1
    buffer_size  dq 4096
    error_count  dd 0

section .text
global _start
_start:
    ; Check debug flag
    cmp byte [debug_mode], 0
    je .no_debug
    ; ... debug output ...
.no_debug:

    ; Increment error counter
    inc dword [error_count]

    ; Exit
    mov rax, 60
    xor edi, edi
    syscall
Save & Compile: direct_addressing.asm

Linux

nasm -f elf64 direct_addressing.asm -o direct_addressing.o
ld direct_addressing.o -o direct_addressing
./direct_addressing

macOS (change _start to _main, use macOS syscall numbers)

nasm -f macho64 direct_addressing.asm -o direct_addressing.o
ld -macosx_version_min 10.13 -e _main -static direct_addressing.o -o direct_addressing

Windows (use Win64 API instead of Linux syscalls)

nasm -f win64 direct_addressing.asm -o direct_addressing.obj
link /subsystem:console /entry:_start direct_addressing.obj /out:direct_addressing.exe

Indirect Addressing

[Figure: Indirect addressing modes — base register indirect, indexed (base + index × scale), and base + displacement for structs and arrays.]

Base Register Indirect

mov rax, [rbx]           ; Load from address in RBX
mov [rcx], rdx           ; Store RDX at address in RCX

Indexed Addressing (Base + Index × Scale)

; Array access: array[i] where element size = 8 bytes
mov rax, [rbx + rcx*8]   ; rbx = base, rcx = index, 8 = scale

; Common scales:
; *1 = byte array
; *2 = word array (int16)
; *4 = dword array (int32, float)
; *8 = qword array (int64, double, pointers)

Base + Displacement

Displacement is a constant offset added to the base—perfect for struct field access. It's like knowing "the kitchen is 20 feet from the front door."

; C struct equivalent:
; struct Person {
;     char name[32];   ; offset 0
;     int age;         ; offset 32
;     double salary;   ; offset 36 (assume packed)
; };

; RBX points to struct Person
mov eax, [rbx + 32]       ; Load age field
movsd [rbx + 36], xmm0    ; Store salary (as double)

; Stack local variables (negative displacement from RBP)
mov rax, [rbp - 8]        ; First local variable
mov [rbp - 16], rcx       ; Second local variable

Complete Addressing Form

The most general x86 addressing mode combines all elements:

Effective Address = Base + (Index × Scale) + Displacement

┌─────────────────────────────────────────────────────────┐
│ [base + index*scale + displacement]                     │
│                                                         │
│ Base:         Any GPR (RAX-R15, including RBP/RSP)      │
│ Index:        Any GPR except RSP                        │
│ Scale:        1, 2, 4, or 8                             │
│ Displacement: 8-bit or 32-bit signed constant           │
└─────────────────────────────────────────────────────────┘

Exercise: 2D Array Access

Access element matrix[row][col] in a 10×10 integer (4-byte) matrix:

section .bss
    matrix resd 100           ; 10x10 int array

section .text
    ; row in RCX, col in RDX
    ; address = matrix + (row * 10 + col) * 4
    
    lea rax, [rcx + rcx*4]    ; rax = row * 5
    lea rax, [rax*2]          ; rax = row * 10
    add rax, rdx              ; rax = row * 10 + col
    mov eax, [matrix + rax*4] ; Load matrix[row][col]
                              ; (label as 32-bit absolute displacement:
                              ;  non-PIE builds only, since RIP-relative
                              ;  addressing cannot take an index register)

RIP-Relative Addressing (x86-64)

[Figure: RIP-relative addressing in x86-64 — data is referenced as an offset from the current instruction pointer, enabling position-independent code.]

Position-Independent Code: In 64-bit mode, RIP-relative addressing is the default for accessing global data, enabling position-independent executables (PIE).
section .data
    global_var dq 12345

section .text
    mov rax, [rel global_var]    ; RIP-relative (explicit)
    mov rax, [global_var]        ; RIP-relative (default in x64)

LEA Instruction

Load Effective Address (LEA) computes an address and stores that address itself rather than loading the memory contents at it. It's like getting directions to a restaurant instead of the food.

[Figure: LEA vs MOV — LEA computes and stores the effective address itself, while MOV dereferences the address to load memory contents.]

Key Insight: LEA doesn't access memory! It's a pure arithmetic operation that uses the address calculation hardware. This makes it perfect for fast multiply-add math.

Address Calculation Use

section .data
    array dq 10, 20, 30, 40, 50

section .text
    mov rcx, 3                      ; index = 3
    lea rax, [array + rcx*8]        ; rax = address of array[3]
    mov rbx, [rax]                  ; rbx = array[3] = 40

    ; Get address of struct field
    ; RDI points to struct, salary at offset 36
    lea rsi, [rdi + 36]             ; rsi = address of person->salary

Arithmetic "Trick"

LEA performs up to 2 adds and 1 shift in a single instruction—faster than separate operations:

; Multiply by constants using LEA
lea rax, [rbx + rbx]              ; rax = rbx * 2
lea rax, [rbx + rbx*2]            ; rax = rbx * 3
lea rax, [rbx*4]                  ; rax = rbx * 4
lea rax, [rbx + rbx*4]            ; rax = rbx * 5
lea rax, [rbx + rbx*8]            ; rax = rbx * 9

; Add two registers plus constant (3-operand addition!)
lea rax, [rbx + rcx + 10]         ; rax = rbx + rcx + 10

; Combine for complex expressions
; rax = rbx * 5 + 7
lea rax, [rbx + rbx*4 + 7]

LEA vs MOV Performance

┌─────────────────┬──────────────────────────────┬────────────────────────────────┐
│ Task            │ Using MOV/ADD/IMUL           │ Using LEA                      │
├─────────────────┼──────────────────────────────┼────────────────────────────────┤
│ rax = rbx * 5   │ imul rax, rbx, 5 (3 cycles)  │ lea rax, [rbx+rbx*4] (1 cycle) │
│ rax = rbx + rcx │ mov rax, rbx / add rax, rcx  │ lea rax, [rbx+rcx]             │
│ rax = rax + 1   │ inc rax (affects flags)      │ lea rax, [rax+1] (no flags)    │
└─────────────────┴──────────────────────────────┴────────────────────────────────┘

Pro Tip: Use LEA when you need the result without affecting FLAGS, or when combining operations. Compilers often use LEA for x = a + b + constant patterns.

Effective Address Calculation

When the CPU decodes a memory operand, dedicated Address Generation Units (AGUs) compute the effective address in parallel with other operations.

[Figure: Effective address calculation pipeline — the Address Generation Unit (AGU) computes the address, followed by TLB translation and cache lookup.]

Hardware Pipeline

Instruction: mov rax, [rbx + rcx*4 + 16]

┌──────────────────────────────────────────────────────┐
│ 1. DECODE: Extract base=RBX, index=RCX, scale=4,    │
│            displacement=16                           │
├──────────────────────────────────────────────────────┤
│ 2. AGU CALCULATION:                                  │
│    ┌─────┐   ┌─────────┐   ┌────────────┐           │
│    │ RCX │──▶│ × 4     │──▶│            │           │
│    └─────┘   └─────────┘   │   ADDER    │──▶ EA     │
│    ┌─────┐                 │   CIRCUIT  │           │
│    │ RBX │────────────────▶│            │           │
│    └─────┘                 │            │           │
│    ┌─────┐                 │            │           │
│    │ 16  │────────────────▶│            │           │
│    └─────┘                 └────────────┘           │
├──────────────────────────────────────────────────────┤
│ 3. TLB LOOKUP: Virtual address → Physical address   │
├──────────────────────────────────────────────────────┤
│ 4. CACHE CHECK: L1 → L2 → L3 → RAM                  │
└──────────────────────────────────────────────────────┘

AGU Latency Considerations

┌──────────────────────────┬─────────────────────┬─────────────────────┐
│ Addressing Mode          │ Typical AGU Latency │ Notes               │
├──────────────────────────┼─────────────────────┼─────────────────────┤
│ [reg]                    │ 0-1 cycles          │ Base only, fastest  │
│ [reg + disp]             │ 0-1 cycles          │ Simple addition     │
│ [reg + reg*scale]        │ 1 cycle             │ Requires SIB decode │
│ [reg + reg*scale + disp] │ 1 cycle             │ Full form           │
│ [rip + disp32]           │ 1 cycle             │ 64-bit RIP-relative │
└──────────────────────────┴─────────────────────┴─────────────────────┘

Cache and Memory Hierarchy Impact

Memory Access Latency (approximate):

┌─────────────┬────────────────┬────────────────┐
│ Level       │ Latency        │ Typical Size   │
├─────────────┼────────────────┼────────────────┤
│ Register    │ 0 cycles       │ 16 × 64-bit    │
│ L1 Cache    │ 4-5 cycles     │ 32-64 KB       │
│ L2 Cache    │ 12-14 cycles   │ 256 KB - 1 MB  │
│ L3 Cache    │ 30-50 cycles   │ 8-32 MB        │
│ Main RAM    │ 100-300 cycles │ GBs            │
│ SSD         │ ~10,000 cycles │ TBs            │
└─────────────┴────────────────┴────────────────┘

Exercise: Prefetch Optimization

When processing large arrays, use explicit prefetch hints:

; Process array with prefetch
process_array:
    mov rcx, 1000           ; array length
    xor rsi, rsi            ; index = 0
.loop:
    ; Prefetch data 64 bytes ahead (next cache line)
    prefetcht0 [rdi + rsi + 64]
    
    ; Process current element
    mov rax, [rdi + rsi]
    ; ... operations on rax ...
    mov [rdi + rsi], rax
    
    add rsi, 8
    dec rcx
    jnz .loop
    ret

Benchmark this with and without prefetch on arrays larger than L3 cache!