x86 Assembly Series Part 7: Memory Addressing Modes

Addressing Modes Overview

                        
Key Concept: Addressing modes determine how the CPU calculates the location of operands. x86 offers rich addressing capabilities that enable efficient array access, struct navigation, and position-independent code.
                    

Formula

General Effective Address Formula

Effective Address = Base + (Index × Scale) + Displacement

Where:
- Base: Any general-purpose register
- Index: Any GP register except RSP
- Scale: 1, 2, 4, or 8
- Displacement: 8-bit or 32-bit signed constant (16-bit displacements exist only with 16-bit addressing)
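To make the formula concrete, here is a minimal sketch that plugs in sample values and checks the result. It assumes Linux x86-64 and NASM; the file name and the values (base 0x1000, index 3, scale 8, displacement 16) are illustrative only. It uses LEA, covered later in this part, so the address is computed without touching memory.

```nasm
; ea_formula.asm - check Base + (Index * Scale) + Displacement by hand
; Assumed sample values: base = 0x1000, index = 3, scale = 8, disp = 16
; By hand: 0x1000 + (3 * 8) + 16 = 0x1028

section .text
global _start
_start:
    mov rbx, 0x1000              ; base
    mov rcx, 3                   ; index
    lea rax, [rbx + rcx*8 + 16]  ; rax = effective address, no memory access

    xor edi, edi                 ; exit status 0 = formula matched
    cmp rax, 0x1028
    je .exit
    mov edi, 1                   ; exit status 1 = mismatch
.exit:
    mov eax, 60                  ; sys_exit (Linux x86-64)
    syscall
```

After running it, `echo $?` should report 0.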

Immediate Addressing

mov rax, 42              ; Immediate value in instruction
mov rbx, 0xDEADBEEF      ; Hex immediate
add rcx, 100             ; Add immediate to register

Register Addressing

mov rax, rbx             ; Register to register
add rcx, rdx             ; Both operands are registers
xor rax, rax             ; Clear register (common idiom)

Direct Memory Addressing

Direct memory addressing uses a fixed address encoded directly in the instruction. Think of it like having a hardcoded street address—simple but inflexible.

section .data
    my_var dq 42           ; 8-byte variable
    buffer times 64 db 0   ; 64-byte buffer

section .text
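    ; Direct addressing: the address of the operand is fixed in the instruction
    mov eax, [my_var]      ; Load the low 32 bits of my_var
    mov [buffer], bl       ; Store byte at buffer

    ; In 64-bit code you usually want RIP-relative addressing instead.
    ; NASM emits [rip + offset_to_my_var] for [rel my_var], or for a plain
    ; [my_var] when "default rel" is in effect.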

                        
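64-bit Gotcha: In x86-64, labels are normally reached through RIP-relative addressing, but NASM only emits it for `[rel my_var]` or when `default rel` is in effect (older releases default to `default abs`); otherwise `[my_var]` is a sign-extended 32-bit absolute address. A full 64-bit absolute address fits only the accumulator form of MOV (`mov rax, [qword my_var]` in NASM); in every other case, load the address into a register and use that register as the base.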
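The difference is easier to see side by side. This sketch reaches the same variable three ways and checks that all three loads agree. It assumes NASM on Linux and a static, non-PIE link with plain `ld` (as in the build commands in this part), since the 32-bit absolute form needs the variable to sit in the low 2 GB. The file name is illustrative.

```nasm
; abs_vs_rel.asm - three ways to reach the same variable in 64-bit NASM
section .data
    my_var dq 42

section .text
global _start
_start:
    mov rax, [rel my_var]        ; RIP-relative: disp32 from the next instruction
    mov rbx, [abs my_var]        ; absolute: sign-extended 32-bit address
    mov rcx, my_var              ; 64-bit immediate holding the full address
    mov rcx, [rcx]               ; then a plain register-indirect load

    xor edi, edi                 ; exit status 0 = all three loads agree
    cmp rax, rbx
    jne .fail
    cmp rax, rcx
    je .exit
.fail:
    mov edi, 1
.exit:
    mov eax, 60                  ; sys_exit
    syscall
```

Disassembling the result with `objdump -d` shows the three encodings next to each other.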
                    

Real-World Use Case

section .data
    ; Global configuration values
    debug_mode   db 1
    buffer_size  dq 4096
    error_count  dd 0

section .text
global _start
_start:
    ; Check debug flag
    cmp byte [debug_mode], 0
    je .no_debug
    ; ... debug output ...
.no_debug:

    ; Increment error counter
    inc dword [error_count]

    ; Exit
    mov rax, 60
    xor edi, edi
    syscall

Save & Compile: `direct_addressing.asm`

Linux

nasm -f elf64 direct_addressing.asm -o direct_addressing.o
ld direct_addressing.o -o direct_addressing
./direct_addressing

macOS (change _start → _main, use macOS syscall numbers)

nasm -f macho64 direct_addressing.asm -o direct_addressing.o
ld -macos_version_min 10.13 -e _main -static direct_addressing.o -o direct_addressing

Windows (use Win64 API instead of Linux syscalls)

nasm -f win64 direct_addressing.asm -o direct_addressing.obj
link /subsystem:console /entry:_start direct_addressing.obj /out:direct_addressing.exe

Indirect Addressing

Base Register Indirect

mov rax, [rbx]           ; Load from address in RBX
mov [rcx], rdx           ; Store RDX at address in RCX
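A typical use of register-indirect addressing is walking memory with a pointer. The following sketch counts the bytes of a NUL-terminated string by reading through RDI and advancing it. The string and file name are made up for illustration, and the length is returned as the exit status for easy checking.

```nasm
; indirect.asm - count the bytes of a NUL-terminated string via [rdi]
section .data
    msg db "hello", 0

section .text
global _start
_start:
    mov rdi, msg                 ; rdi = address of the first byte
    xor eax, eax                 ; length = 0
.next:
    cmp byte [rdi], 0            ; read the byte rdi points at
    je .done
    inc rdi                      ; advance the pointer itself
    inc eax
    jmp .next
.done:
    mov edi, eax                 ; exit status = length (5 for "hello")
    mov eax, 60                  ; sys_exit
    syscall
```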

Indexed Addressing (Base + Index × Scale)

; Array access: array[i] where element size = 8 bytes
mov rax, [rbx + rcx*8]   ; rbx = base, rcx = index, 8 = scale

; Common scales:
; *1 = byte array
; *2 = word array (int16)
; *4 = dword array (int32, float)
; *8 = qword array (int64, double, pointers)
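As a sketch of how the scale factor lines up with element size, this loop sums a small dword array using `[base + index*4]`. The array contents and file name are illustrative; the sum is returned as the exit status.

```nasm
; indexed.asm - sum a dword array with [base + index*4]
section .data
    values dd 3, 5, 7, 11        ; four 32-bit integers
    count  equ 4

section .text
global _start
_start:
    mov rbx, values              ; base = start of the array
    xor ecx, ecx                 ; index i = 0
    xor eax, eax                 ; running sum
.loop:
    add eax, [rbx + rcx*4]       ; values[i]; scale 4 matches dword elements
    inc rcx
    cmp rcx, count
    jb .loop

    mov edi, eax                 ; 3 + 5 + 7 + 11 = 26
    mov eax, 60                  ; sys_exit
    syscall
```

Note that the index counts elements, not bytes; the scale does the conversion.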

Base + Displacement

Displacement is a constant offset added to the base—perfect for struct field access. It's like knowing "the kitchen is 20 feet from the front door."

; C struct equivalent:
; struct Person {
;     char name[32];   ; offset 0
;     int age;         ; offset 32
;     double salary;   ; offset 36 (assume packed)
; };

; RBX points to struct Person
mov eax, [rbx + 32]       ; Load age field
movsd [rbx + 36], xmm0    ; Store salary (doubles in XMM registers move with MOVSD, not MOV)

; Stack local variables (negative displacement from RBP)
mov rax, [rbp - 8]        ; First local variable
mov [rbp - 16], rcx       ; Second local variable
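Hard-coded offsets like 32 and 36 are easy to get wrong. One option is NASM's `struc` macro, which turns them into named constants. The sketch below mirrors the packed `Person` layout; the variable names (`someone`, `pay`), the sample values, and the file name are hypothetical.

```nasm
; struct_fields.asm - named displacements for the packed Person layout
struc Person
    .name:   resb 32             ; offset 0
    .age:    resd 1              ; offset 32
    .salary: resq 1              ; offset 36 (packed, no padding)
endstruc

section .data
    pay dq 52000.0               ; sample salary (hypothetical value)

section .bss
    someone resb Person_size     ; room for one Person (44 bytes)

section .text
global _start
_start:
    mov rbx, someone                     ; rbx -> struct
    mov dword [rbx + Person.age], 30     ; someone.age = 30
    movsd xmm0, [pay]
    movsd [rbx + Person.salary], xmm0    ; someone.salary = pay

    mov edi, [rbx + Person.age]          ; read the field back: 30
    mov eax, 60                          ; sys_exit
    syscall
```

`Person.age` and `Person.salary` assemble to the plain constants 32 and 36, so the generated instructions are identical to the hand-written displacements.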

Complete Addressing Form

The most general x86 addressing mode combines all elements:

Effective Address = Base + (Index × Scale) + Displacement

┌─────────────────────────────────────────────────────────┐
│ [base + index*scale + displacement]                    │
│                                                         │
│ Base:         Any general-purpose register             │
│ Index:        Any GP register except RSP               │
│ Scale:        1, 2, 4, or 8                            │
│ Displacement: 8-bit or 32-bit signed constant          │
└─────────────────────────────────────────────────────────┘
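The full form pays off when an array holds small records: the index and scale select the record, and the displacement selects the field. This sketch assumes a hypothetical 8-byte record `{ int32 id; int32 value }` and sums the `value` fields. The data and file name are illustrative.

```nasm
; full_form.asm - [base + index*scale + disp] over an array of 8-byte records
; Each record is { int32 id; int32 value } (a hypothetical layout).
section .data
    records dd 1, 10             ; record 0: id = 1, value = 10
            dd 2, 20             ; record 1: id = 2, value = 20
            dd 3, 12             ; record 2: id = 3, value = 12
    nrecords equ 3

section .text
global _start
_start:
    mov rbx, records             ; base = start of the array
    xor ecx, ecx                 ; record index i = 0
    xor eax, eax                 ; running sum
.loop:
    add eax, [rbx + rcx*8 + 4]   ; base + i*sizeof(record) + offset of value
    inc rcx
    cmp rcx, nrecords
    jb .loop

    mov edi, eax                 ; 10 + 20 + 12 = 42
    mov eax, 60                  ; sys_exit
    syscall
```

This only works directly when the record size is 1, 2, 4, or 8 bytes; for other sizes the byte offset has to be computed separately.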

Exercise: 2D Array Access

Access element matrix[row][col] in a 10×10 integer (4-byte) matrix:

section .bss
    matrix resd 100           ; 10x10 int array

section .text
    ; row in RCX, col in RDX
    ; address = matrix + (row * 10 + col) * 4
    
    lea rax, [rcx + rcx*4]    ; rax = row * 5
    lea rax, [rax*2]          ; rax = row * 10
    add rax, rdx              ; rax = row * 10 + col
    mov eax, [matrix + rax*4] ; Load matrix[row][col]

RIP-Relative Addressing (x86-64)

Figure: RIP-relative addressing in x86-64. Data is referenced as an offset from the instruction pointer (the address of the next instruction), enabling position-independent code.

                        
Position-Independent Code: In 64-bit mode, RIP-relative addressing is the standard way to access global data. Compilers emit it by default, and it is what makes position-independent executables (PIE) practical.
                    

section .data
    global_var dq 12345

section .text
    mov rax, [rel global_var]    ; RIP-relative (explicit)
    mov rax, [global_var]        ; RIP-relative only if "default rel" is in effect
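One limitation to keep in mind: a RIP-relative operand has no room for an index register, so indexing into a global table takes two steps. The sketch below puts `default rel` at the top of the file, fetches the table address with LEA, and indexes from that register. The table contents and file name are illustrative.

```nasm
; rip_relative.asm - position-independent access to a table
default rel                      ; make [label] RIP-relative from here on

section .data
    table dq 100, 200, 300

section .text
global _start
_start:
    ; RIP cannot be combined with an index register, so fetch the table
    ; address first and index from that register.
    lea rbx, [table]             ; rbx = runtime address of table
    mov ecx, 2                   ; element index
    mov rax, [rbx + rcx*8]       ; table[2] = 300

    xor edi, edi                 ; exit status 0 = loaded the expected value
    cmp rax, 300
    je .exit
    mov edi, 1
.exit:
    mov eax, 60                  ; sys_exit
    syscall
```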

LEA Instruction

Load Effective Address computes an address but stores the address itself, not the memory contents. It's like getting directions to a restaurant instead of the food.

Figure: LEA vs MOV. LEA computes and stores the effective address itself, while MOV dereferences the address to load memory contents.

                        
Key Insight: LEA doesn't access memory! It is a pure arithmetic operation that borrows the addressing-mode encoding, so it never faults on a bad address and never changes FLAGS. This makes it well suited to fast multiply-add math.
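A quick way to convince yourself is to hand LEA an operand that would be illegal to dereference. In this sketch RBX is zero, so `[rbx + 8]` points at an address that is not mapped in a typical Linux process, yet LEA completes normally. The file name is illustrative.

```nasm
; lea_no_access.asm - LEA computes, it never dereferences
section .text
global _start
_start:
    xor ebx, ebx                 ; rbx = 0, not a valid pointer
    lea rax, [rbx + 8]           ; fine: rax = 8, nothing is read
    ; mov rax, [rbx + 8]         ; the same operand with MOV would normally fault

    mov edi, eax                 ; exit status = 8
    mov eax, 60                  ; sys_exit
    syscall
```

Uncommenting the MOV line is a useful experiment: the program should then die with a segmentation fault instead of exiting with status 8.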
                    

Address Calculation Use

section .data
    array dq 10, 20, 30, 40, 50

section .text
    mov rcx, 3                      ; index = 3
    lea rax, [array + rcx*8]        ; rax = address of array[3]
    mov rbx, [rax]                  ; rbx = array[3] = 40

    ; Get address of struct field
    ; RDI points to struct, salary at offset 36
    lea rsi, [rdi + 36]             ; rsi = address of person->salary

Arithmetic "Trick"

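LEA performs up to two additions and one shift in a single instruction, which is often faster than issuing them separately (on some microarchitectures the three-component form has higher latency than the two-component form):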

; Multiply by constants using LEA
lea rax, [rbx + rbx]              ; rax = rbx * 2
lea rax, [rbx + rbx*2]            ; rax = rbx * 3
lea rax, [rbx*4]                  ; rax = rbx * 4
lea rax, [rbx + rbx*4]            ; rax = rbx * 5
lea rax, [rbx + rbx*8]            ; rax = rbx * 9

; Add two registers plus constant (3-operand addition!)
lea rax, [rbx + rcx + 10]         ; rax = rbx + rcx + 10

; Combine for complex expressions
; rax = rbx * 5 + 7
lea rax, [rbx + rbx*4 + 7]
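Multipliers that a single LEA cannot reach can often be built by chaining two steps. The sketch below computes x * 10 and x * 45 this way and checks both results; the input value 7 and the file name are arbitrary choices for illustration.

```nasm
; lea_multiply.asm - constant multiplies built from chained LEA steps
section .text
global _start
_start:
    mov ebx, 7                   ; x = 7 (upper half of rbx is zeroed)

    lea rax, [rbx + rbx*4]       ; rax = x * 5
    add rax, rax                 ; rax = x * 10 = 70

    lea rcx, [rbx + rbx*8]       ; rcx = x * 9
    lea rcx, [rcx + rcx*4]       ; rcx = x * 45 = 315

    xor edi, edi                 ; exit status 0 = both products correct
    cmp rax, 70
    jne .fail
    cmp rcx, 315
    je .exit
.fail:
    mov edi, 1
.exit:
    mov eax, 60                  ; sys_exit
    syscall
```

Whether a chain like this beats a single IMUL depends on the processor, so treat it as something to measure, not a rule.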

LEA vs MOV Performance

Task	Using MOV/ADD/IMUL	Using LEA
rax = rbx * 5	`imul rax, rbx, 5` (about 3 cycles latency)	`lea rax, [rbx+rbx*4]` (about 1 cycle latency)
rax = rbx + rcx	`mov rax, rbx` `add rax, rcx`	`lea rax, [rbx+rcx]`
rax = rax + 1	`inc rax` (affects flags)	`lea rax, [rax+1]` (no flags)

                        
Pro Tip: Use LEA when you need the result without affecting FLAGS, or when combining operations. Compilers often use LEA for x = a + b + constant patterns.
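A classic case where flag preservation matters is multi-word addition: the carry from one ADC has to survive until the next. In this sketch a 128-bit add is done in two 64-bit limbs, and the pointers advance with LEA so the carry flag is left alone. The operands, labels, and file name are illustrative.

```nasm
; lea_flags.asm - 128-bit add where LEA advances pointers without touching CF
section .data
    a   dq 0xFFFFFFFFFFFFFFFF, 0 ; a = 2^64 - 1 (low limb first)
    b   dq 1, 0                  ; b = 1

section .bss
    sum resq 2

section .text
global _start
_start:
    mov rsi, a
    mov rdi, b
    mov rdx, sum
    mov ecx, 2                   ; two 64-bit limbs
    clc                          ; start with carry = 0
.limb:
    mov rax, [rsi]
    adc rax, [rdi]               ; add limb plus carry from the previous limb
    mov [rdx], rax
    lea rsi, [rsi + 8]           ; ADD here would overwrite CF; LEA does not
    lea rdi, [rdi + 8]
    lea rdx, [rdx + 8]
    dec ecx                      ; DEC leaves CF alone as well
    jnz .limb

    ; expect sum = 2^64: low limb 0, high limb 1
    xor edi, edi                 ; exit status 0 = carry propagated correctly
    cmp qword [sum], 0
    jne .fail
    cmp qword [sum + 8], 1
    je .exit
.fail:
    mov edi, 1
.exit:
    mov eax, 60                  ; sys_exit
    syscall
```

Replacing any of the three LEA lines with `add reg, 8` clears the carry before the second ADC, and the high limb comes out as 0 instead of 1.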
                    

Effective Address Calculation

When the CPU decodes a memory operand, dedicated Address Generation Units (AGUs) compute the effective address in parallel with other operations.

Hardware Pipeline

Instruction: mov rax, [rbx + rcx*4 + 16]

┌──────────────────────────────────────────────────────┐
│ 1. DECODE: Extract base=RBX, index=RCX, scale=4,    │
│            displacement=16                           │
├──────────────────────────────────────────────────────┤
│ 2. AGU CALCULATION:                                  │
│    ┌─────┐   ┌─────────┐   ┌────────────┐           │
│    │ RCX │──▶│ × 4     │──▶│            │           │
│    └─────┘   └─────────┘   │   ADDER    │──▶ EA     │
│    ┌─────┐                 │   CIRCUIT  │           │
│    │ RBX │────────────────▶│            │           │
│    └─────┘                 │            │           │
│    ┌─────┐                 │            │           │
│    │ 16  │────────────────▶│            │           │
│    └─────┘                 └────────────┘           │
├──────────────────────────────────────────────────────┤
│ 3. TLB LOOKUP: Virtual address → Physical address   │
├──────────────────────────────────────────────────────┤
│ 4. CACHE CHECK: L1 → L2 → L3 → RAM                  │
└──────────────────────────────────────────────────────┘

AGU Latency Considerations

Addressing Mode	Typical AGU Latency	Notes
`[reg]`	0-1 cycles	Base only, fastest
`[reg + disp]`	0-1 cycles	Simple addition
`[reg + reg*scale]`	1 cycle	Requires SIB decode
`[reg + reg*scale + disp]`	1 cycle	Full form
`[rip + disp32]`	1 cycle	64-bit RIP-relative

Cache and Memory Hierarchy Impact

Memory Access Latency (approximate):

┌─────────────┬───────────────┬────────────────┐
│ Level       │ Latency       │ Typical Size   │
├─────────────┼───────────────┼────────────────┤
│ Register    │ 0 cycles      │ 16 × 64-bit    │
│ L1 Cache    │ 4-5 cycles    │ 32-64 KB       │
│ L2 Cache    │ 12-14 cycles  │ 256 KB - 1 MB  │
│ L3 Cache    │ 30-50 cycles  │ 8-32 MB        │
│ Main RAM    │ 100-300 cycles│ GBs            │
│ SSD         │ ~10^5 cycles  │ TBs            │
└─────────────┴───────────────┴────────────────┘

Exercise: Prefetch Optimization

When processing large arrays, use explicit prefetch hints:

; Process array with prefetch
process_array:
    mov rcx, 1000           ; array length
    xor rsi, rsi            ; index = 0
.loop:
    ; Prefetch data 64 bytes ahead (next cache line)
    prefetcht0 [rdi + rsi + 64]
    
    ; Process current element
    mov rax, [rdi + rsi]
    ; ... operations on rax ...
    mov [rdi + rsi], rax
    
    add rsi, 8
    dec rcx
    jnz .loop
    ret

Benchmark this with and without prefetch on arrays larger than L3 cache!