
x86 Assembly Series Part 7: Memory Addressing Modes

February 6, 2026 Wasil Zafar 28 min read

Master the x86/x64 addressing modes (immediate, register, direct, indirect, indexed, base+displacement, and RIP-relative) and learn effective address calculation with LEA.

Table of Contents

  1. Addressing Modes Overview
  2. Immediate Addressing
  3. Register Addressing
  4. Direct Memory Addressing
  5. Indirect Addressing
  6. RIP-Relative Addressing
  7. LEA Instruction
  8. Effective Address Calculation

Addressing Modes Overview

Key Concept: Addressing modes determine how the CPU calculates the location of operands. x86 offers rich addressing capabilities that enable efficient array access, struct navigation, and position-independent code.

General Effective Address Formula

Effective Address = Base + (Index × Scale) + Displacement

Where:
- Base: Any general-purpose register
- Index: Any GP register except RSP
- Scale: 1, 2, 4, or 8
- Displacement: 8-bit or 32-bit signed constant (16-bit displacements exist only with 16-bit addressing)
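
As a quick check of the formula, the sketch below plugs in concrete numbers. The register values are arbitrary and chosen only for illustration, and it uses LEA (covered later in this part) so the computed address lands in a register without any memory access.

```nasm
; Worked example of Base + (Index × Scale) + Displacement.
; The values below are illustrative assumptions, not from a real program.
mov rbx, 0x1000              ; base
mov rcx, 3                   ; index
lea rax, [rbx + rcx*8 + 16]  ; rax = 0x1000 + 3*8 + 16 = 0x1028
```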

Immediate Addressing

mov rax, 42              ; Immediate value in instruction
mov rbx, 0xDEADBEEF      ; Hex immediate
add rcx, 100             ; Add immediate to register

Register Addressing

mov rax, rbx             ; Register to register
add rcx, rdx             ; Both operands are registers
xor rax, rax             ; Clear register (common idiom)

Direct Memory Addressing

Direct memory addressing uses a fixed address encoded directly in the instruction. Think of it like having a hardcoded street address—simple but inflexible.

section .data
    my_var dq 42           ; 8-byte variable
    buffer times 64 db 0   ; 64-byte buffer

section .text
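    ; Direct addressing (32-bit mode style)
    mov eax, [my_var]      ; Load the low 32 bits of my_var from a fixed address
    mov [buffer], bl       ; Store byte at buffer

    ; In 64-bit mode, NASM 2.x still encodes these as 32-bit absolute
    ; addresses. Write [rel my_var], or put "default rel" at the top of
    ; the file, to get [rip + offset_to_my_var] instead.

64-bit Gotcha: In x86-64, position-independent code reaches labels through RIP-relative addressing, so use default rel (or [rel my_var]) when you want that encoding. A full 64-bit absolute address can only be encoded in a MOV to or from the accumulator, e.g. mov rax, [qword my_var]; everywhere else, load the address into a register and use it as the base.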

Real-World Use Case

section .data
    ; Global configuration values
    debug_mode   db 1
    buffer_size  dq 4096
    error_count  dd 0

section .text
global _start
_start:
    ; Check debug flag
    cmp byte [debug_mode], 0
    je .no_debug
    ; ... debug output ...
.no_debug:

    ; Increment error counter
    inc dword [error_count]

    ; Exit
    mov rax, 60
    xor edi, edi
    syscall
Save & Compile: direct_addressing.asm

Linux

nasm -f elf64 direct_addressing.asm -o direct_addressing.o
ld direct_addressing.o -o direct_addressing
./direct_addressing

macOS (change _start to _main, use macOS syscall numbers)

nasm -f macho64 direct_addressing.asm -o direct_addressing.o
ld -macos_version_min 10.13 -e _main -static direct_addressing.o -o direct_addressing

Windows (use Win64 API instead of Linux syscalls)

nasm -f win64 direct_addressing.asm -o direct_addressing.obj
link /subsystem:console /entry:_start direct_addressing.obj /out:direct_addressing.exe

Indirect Addressing

Base Register Indirect

mov rax, [rbx]           ; Load from address in RBX
mov [rcx], rdx           ; Store RDX at address in RCX

Indexed Addressing (Base + Index × Scale)

; Array access: array[i] where element size = 8 bytes
mov rax, [rbx + rcx*8]   ; rbx = base, rcx = index, 8 = scale

; Common scales:
; *1 = byte array
; *2 = word array (int16)
; *4 = dword array (int32, float)
; *8 = qword array (int64, double, pointers)
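
To show the scaled index doing real work, here is a minimal sketch of a loop that sums a qword array. The routine name and the register assignments (array base in RBX, element count in RDX) are assumptions made for this example, not a calling convention.

```nasm
; Sum a qword array with scaled-index addressing.
; Assumed inputs (hypothetical): rbx = array base, rdx = element count (> 0).
sum_qwords:
    xor eax, eax             ; running sum = 0 (writing EAX zeroes all of RAX)
    xor ecx, ecx             ; i = 0
.next:
    add rax, [rbx + rcx*8]   ; sum += array[i], 8-byte elements
    inc rcx                  ; i++
    cmp rcx, rdx
    jb  .next                ; loop while i < count (unsigned compare)
    ret                      ; result in rax
```

The index register counts elements, not bytes; the ×8 scale converts it to a byte offset inside the address calculation, so no separate shift is needed.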

Base + Displacement

Displacement is a constant offset added to the base—perfect for struct field access. It's like knowing "the kitchen is 20 feet from the front door."

; C struct equivalent:
; struct Person {
;     char name[32];   ; offset 0
;     int age;         ; offset 32
;     double salary;   ; offset 36 (assume packed)
; };

; RBX points to struct Person
mov eax, [rbx + 32]       ; Load age field
movsd [rbx + 36], xmm0    ; Store salary (as double)

; Stack local variables (negative displacement from RBP)
mov rax, [rbp - 8]        ; First local variable
mov [rbp - 16], rcx       ; Second local variable
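
Hard-coded offsets like 32 and 36 are easy to get wrong when a struct changes. One possible alternative, sketched here, is NASM's struc directive, which generates the offsets as symbols. The field names mirror the C struct above; the assumption that RBX points at a Person instance is carried over from the original listing.

```nasm
; NASM's struc lays fields out back to back with no padding,
; which matches the packed layout assumed above.
struc Person
    .name:   resb 32         ; offset 0
    .age:    resd 1          ; offset 32
    .salary: resq 1          ; offset 36
endstruc

section .text
    ; rbx is assumed to point at a Person instance
    mov   eax,  [rbx + Person.age]      ; same as [rbx + 32]
    movsd xmm0, [rbx + Person.salary]   ; same as [rbx + 36]
```

NASM also defines Person_size (44 here), which is useful for reserving storage or stepping through an array of structs.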

Complete Addressing Form

The most general x86 addressing mode combines all elements:

Effective Address = Base + (Index × Scale) + Displacement

┌─────────────────────────────────────────────────────────┐
│ [base + index*scale + displacement]                    │
│                                                         │
│ Base:         any GPR (RAX-RDI, R8-R15, RBP, RSP)      │
│ Index:        RAX-RDI, R8-R15 (NOT RSP!)               │
│ Scale:        1, 2, 4, or 8                            │
│ Displacement: 8-bit or 32-bit signed constant          │
└─────────────────────────────────────────────────────────┘

Exercise: 2D Array Access

Access element matrix[row][col] in a 10×10 integer (4-byte) matrix:

section .bss
    matrix resd 100           ; 10x10 int array

section .text
    ; row in RCX, col in RDX
    ; address = matrix + (row * 10 + col) * 4
    
    lea rax, [rcx + rcx*4]    ; rax = row * 5
    lea rax, [rax*2]          ; rax = row * 10
    add rax, rdx              ; rax = row * 10 + col
    mov eax, [matrix + rax*4] ; Load matrix[row][col] (matrix is an absolute disp32 here, so not PIE-safe)

RIP-Relative Addressing (x86-64)

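Position-Independent Code: In 64-bit mode, RIP-relative addressing is the standard way to access global data, enabling position-independent executables (PIE). Compilers emit it by default; NASM 2.x does so only when asked, via the rel keyword or a default rel directive.

default rel                      ; make [label] operands RIP-relative from here on

section .data
    global_var dq 12345

section .text
    mov rax, [rel global_var]    ; RIP-relative (explicit)
    mov rax, [global_var]        ; RIP-relative (because of default rel)
    mov rax, [abs global_var]    ; 32-bit absolute (explicit override)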
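
A RIP-relative operand encodes only RIP plus a 32-bit displacement; it cannot also carry a base or index register. A common way around this for indexed access to a global, sketched below, is to load the address with LEA first. The table name, its size, and the use of RCX as the index are assumptions for this example.

```nasm
default rel                      ; make [label] operands RIP-relative

section .bss
    table resd 16                ; hypothetical table of sixteen dwords

section .text
    ; rcx is assumed to hold an index in the range 0..15
    lea rbx, [table]             ; rbx = address of table (RIP-relative)
    mov eax, [rbx + rcx*4]       ; eax = table[rcx]
```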

LEA Instruction

Load Effective Address computes an address but stores the address itself, not the memory contents. It's like getting directions to a restaurant instead of the food.

Key Insight: LEA doesn't access memory! It performs only the address arithmetic and writes the result to a register, which makes it handy for fast multiply-add math.

Address Calculation Use

section .data
    array dq 10, 20, 30, 40, 50

section .text
    mov rcx, 3                      ; index = 3
    lea rax, [array + rcx*8]        ; rax = address of array[3]
    mov rbx, [rax]                  ; rbx = array[3] = 40

    ; Get address of struct field
    ; RDI points to struct, salary at offset 36
    lea rsi, [rdi + 36]             ; rsi = address of person->salary

Arithmetic "Trick"

LEA performs up to two additions and one shift in a single instruction, which is often cheaper than issuing those operations separately:

; Multiply by constants using LEA
lea rax, [rbx + rbx]              ; rax = rbx * 2
lea rax, [rbx + rbx*2]            ; rax = rbx * 3
lea rax, [rbx*4]                  ; rax = rbx * 4
lea rax, [rbx + rbx*4]            ; rax = rbx * 5
lea rax, [rbx + rbx*8]            ; rax = rbx * 9

; Add two registers plus constant (3-operand addition!)
lea rax, [rbx + rcx + 10]         ; rax = rbx + rcx + 10

; Combine for complex expressions
; rax = rbx * 5 + 7
lea rax, [rbx + rbx*4 + 7]

LEA vs MOV Performance

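┌─────────────────┬─────────────────────────────┬────────────────────────────────┐
│ Task            │ Using MOV/ADD/IMUL          │ Using LEA                      │
├─────────────────┼─────────────────────────────┼────────────────────────────────┤
│ rax = rbx * 5   │ imul rax, rbx, 5 (3 cycles) │ lea rax, [rbx+rbx*4] (1 cycle) │
│ rax = rbx + rcx │ mov rax, rbx; add rax, rcx  │ lea rax, [rbx+rcx]             │
│ rax = rax + 1   │ inc rax (affects flags)     │ lea rax, [rax+1] (no flags)    │
└─────────────────┴─────────────────────────────┴────────────────────────────────┘
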
Pro Tip: Use LEA when you need the result without affecting FLAGS, or when combining operations. Compilers often use LEA for x = a + b + constant patterns.
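
The flag-preserving property matters most when an update has to sit between a compare and the branch that consumes it. The sketch below is one illustration; the routine name, the register roles, and the 0/1 return value are assumptions for this example.

```nasm
; Advance a pointer between a compare and its branch without
; disturbing the flags the branch depends on.
; Assumed inputs (hypothetical): rsi = pointer to bytes, al = byte to match.
scan_step:
    cmp  al, [rsi]           ; sets ZF if the current byte matches
    lea  rsi, [rsi + 1]      ; advance pointer; FLAGS are left untouched
    je   .found              ; still tests the result of the cmp above
    xor  eax, eax            ; no match: return 0
    ret
.found:
    mov  eax, 1              ; match: return 1
    ret
```

Replacing the LEA with inc rsi or add rsi, 1 would overwrite ZF, and the je would then test the pointer update instead of the comparison.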

Effective Address Calculation

When the CPU decodes a memory operand, dedicated Address Generation Units (AGUs) compute the effective address in parallel with other operations.

Hardware Pipeline

Instruction: mov rax, [rbx + rcx*4 + 16]

┌──────────────────────────────────────────────────────┐
│ 1. DECODE: Extract base=RBX, index=RCX, scale=4,    │
│            displacement=16                           │
├──────────────────────────────────────────────────────┤
│ 2. AGU CALCULATION:                                  │
│    ┌─────┐   ┌─────────┐   ┌────────────┐           │
│    │ RCX │──▶│ × 4     │──▶│            │           │
│    └─────┘   └─────────┘   │   ADDER    │──▶ EA     │
│    ┌─────┐                 │   CIRCUIT  │           │
│    │ RBX │────────────────▶│            │           │
│    └─────┘                 │            │           │
│    ┌─────┐                 │            │           │
│    │ 16  │────────────────▶│            │           │
│    └─────┘                 └────────────┘           │
├──────────────────────────────────────────────────────┤
│ 3. TLB LOOKUP: Virtual address → Physical address   │
├──────────────────────────────────────────────────────┤
│ 4. CACHE CHECK: L1 → L2 → L3 → RAM                  │
└──────────────────────────────────────────────────────┘

AGU Latency Considerations

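┌──────────────────────────┬─────────────────────┬─────────────────────┐
│ Addressing Mode          │ Typical AGU Latency │ Notes               │
├──────────────────────────┼─────────────────────┼─────────────────────┤
│ [reg]                    │ 0-1 cycles          │ Base only, fastest  │
│ [reg + disp]             │ 0-1 cycles          │ Simple addition     │
│ [reg + reg*scale]        │ 1 cycle             │ Requires SIB decode │
│ [reg + reg*scale + disp] │ 1 cycle             │ Full form           │
│ [rip + disp32]           │ 1 cycle             │ 64-bit RIP-relative │
└──────────────────────────┴─────────────────────┴─────────────────────┘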

Cache and Memory Hierarchy Impact

Memory Access Latency (approximate):

┌─────────────┬──────────────────┬────────────────┐
│ Level       │ Latency          │ Typical Size   │
├─────────────┼──────────────────┼────────────────┤
│ Register    │ 0 cycles         │ 16 × 64-bit    │
│ L1 Cache    │ 4-5 cycles       │ 32-64 KB       │
│ L2 Cache    │ 12-14 cycles     │ 256 KB - 1 MB  │
│ L3 Cache    │ 30-50 cycles     │ 8-32 MB        │
│ Main RAM    │ 100-300 cycles   │ GBs            │
│ SSD         │ ~100,000+ cycles │ TBs            │
└─────────────┴──────────────────┴────────────────┘

Exercise: Prefetch Optimization

When processing large arrays, use explicit prefetch hints:

; Process array with prefetch
; Input: rdi = pointer to the array (1000 qwords)
process_array:
    mov rcx, 1000           ; array length
    xor rsi, rsi            ; index = 0
.loop:
    ; Prefetch data 64 bytes ahead (next cache line)
    prefetcht0 [rdi + rsi + 64]
    
    ; Process current element
    mov rax, [rdi + rsi]
    ; ... operations on rax ...
    mov [rdi + rsi], rax
    
    add rsi, 8
    dec rcx
    jnz .loop
    ret

Benchmark this with and without prefetch on arrays larger than L3 cache!
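
One way to set up that comparison, sketched below, is to make the prefetch a build-time switch so both variants come from the same source. The macro names USE_PREFETCH and PREFETCH_DIST, the routine name, the 256-byte starting distance, and the placeholder work are all assumptions for this example; the timing harness is left to you.

```nasm
; Build twice and time each binary:
;   nasm -f elf64 -DUSE_PREFETCH bench.asm    (with prefetch)
;   nasm -f elf64 bench.asm                   (without)
%define PREFETCH_DIST 256        ; bytes ahead; an assumed value to tune

; Assumed inputs (hypothetical): rdi = qword array, rcx = element count (> 0).
process_array_cmp:
    xor esi, esi                 ; i = 0
.loop:
%ifdef USE_PREFETCH
    prefetcht0 [rdi + rsi*8 + PREFETCH_DIST]  ; hint only; never faults
%endif
    mov rax, [rdi + rsi*8]       ; load element i
    inc rax                      ; placeholder work
    mov [rdi + rsi*8], rax       ; store it back
    inc rsi
    cmp rsi, rcx
    jb  .loop                    ; loop while i < count
    ret
```

For a simple sequential walk like this, the hardware prefetcher may already hide most of the latency, so the two builds can come out close. Irregular access patterns are where an explicit hint is more likely to show a difference.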