Register Addressing
mov rax, rbx ; Register to register
add rcx, rdx ; Both operands are registers
xor rax, rax ; Clear register (common idiom)
Direct Memory Addressing
Direct memory addressing uses a fixed address encoded directly in the instruction. Think of it like having a hardcoded street address—simple but inflexible.
section .data
my_var dq 42 ; 8-byte variable
buffer times 64 db 0 ; 64-byte buffer
section .text
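; Direct addressing: the address is a constant encoded in the instruction
mov eax, [my_var] ; Load the low 32 bits of my_var from its fixed address
mov [buffer], bl ; Store byte at buffer
; In 64-bit mode, NASM still encodes [my_var] as a 32-bit absolute address by default.
; Write [rel my_var], or put "default rel" at the top of the file, to get [rip + offset_to_my_var].
64-bit Gotcha: In x86-64, most toolchains address labels RIP-relative, but NASM (at least in the 2.x series) only emits that form for [rel label] or after default rel. A plain [label] becomes a 32-bit absolute address, which is rejected for Mach-O 64 and for position-independent Linux builds. A full 64-bit absolute address needs either the accumulator-only form mov rax, [qword address] or the address loaded into a register first (mov rbx, address, then mov rax, [rbx]).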
Real-World Use Case
section .data
; Global configuration values
debug_mode db 1
buffer_size dq 4096
error_count dd 0
section .text
global _start
_start:
; Check debug flag
cmp byte [debug_mode], 0
je .no_debug
; ... debug output ...
.no_debug:
; Increment error counter
inc dword [error_count]
; Exit
mov rax, 60
xor edi, edi
syscall
Save & Compile: direct_addressing.asm
Linux
nasm -f elf64 direct_addressing.asm -o direct_addressing.o
ld direct_addressing.o -o direct_addressing
./direct_addressing
macOS (change _start → _main, use macOS syscall numbers, and add default rel because Mach-O 64 does not accept 32-bit absolute addresses)
nasm -f macho64 direct_addressing.asm -o direct_addressing.o
ld -macos_version_min 10.13 -e _main -static direct_addressing.o -o direct_addressing
Windows (use Win64 API instead of Linux syscalls)
nasm -f win64 direct_addressing.asm -o direct_addressing.obj
link /subsystem:console /entry:_start direct_addressing.obj /out:direct_addressing.exe
Indirect Addressing
Base Register Indirect
mov rax, [rbx] ; Load from address in RBX
mov [rcx], rdx ; Store RDX at address in RCX
Indexed Addressing (Base + Index × Scale)
; Array access: array[i] where element size = 8 bytes
mov rax, [rbx + rcx*8] ; rbx = base, rcx = index, 8 = scale
; Common scales:
; *1 = byte array
; *2 = word array (int16)
; *4 = dword array (int32, float)
; *8 = qword array (int64, double, pointers)
Base + Displacement
Displacement is a constant offset added to the base—perfect for struct field access. It's like knowing "the kitchen is 20 feet from the front door."
; C struct equivalent:
; struct Person {
; char name[32]; ; offset 0
; int age; ; offset 32
; double salary; ; offset 36 (assume packed)
; };
; RBX points to struct Person
mov eax, [rbx + 32] ; Load age field
movsd [rbx + 36], xmm0 ; Store salary (as double); plain mov cannot take an XMM register
; Stack local variables (negative displacement from RBP)
mov rax, [rbp - 8] ; First local variable
mov [rbp - 16], rcx ; Second local variable
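The offsets 32 and 36 only hold because the struct is assumed packed. A minimal C sketch of that assumption follows, written in C rather than assembly so the compiler can check the layout. The names PersonPacked, PersonNatural, load_age, and load_salary are invented for this illustration, and the sketch assumes a 4-byte int and a compiler that accepts #pragma pack (GCC, Clang, and MSVC do).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

/* Packed layout: matches the offsets in the assembly comments (0, 32, 36). */
#pragma pack(push, 1)
struct PersonPacked {
    char   name[32];
    int    age;
    double salary;
};
#pragma pack(pop)

/* Natural layout: the compiler may insert padding before salary to align it. */
struct PersonNatural {
    char   name[32];
    int    age;
    double salary;
};

/* Same idea as: mov eax, [rbx + 32] */
int load_age(const void *base)
{
    int age;
    memcpy(&age, (const char *)base + offsetof(struct PersonPacked, age), sizeof age);
    return age;
}

/* Same idea as: movsd xmm0, [rbx + 36] */
double load_salary(const void *base)
{
    double salary;
    memcpy(&salary, (const char *)base + offsetof(struct PersonPacked, salary), sizeof salary);
    return salary;
}
```

On a typical x86-64 ABI, offsetof(struct PersonNatural, salary) is 40 rather than 36, so assembly that talks to compiler-generated structs should take its displacements from the real layout, not from a hand count.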
Complete Addressing Form
The most general x86 addressing mode combines all elements:
Effective Address = Base + (Index × Scale) + Displacement
┌─────────────────────────────────────────────────────────┐
│ [base + index*scale + displacement]                     │
│                                                         │
│ Base:         any general-purpose register (GPR)        │
│ Index:        any GPR except RSP                        │
│ Scale:        1, 2, 4, or 8                             │
│ Displacement: 8-bit or 32-bit signed constant           │
└─────────────────────────────────────────────────────────┘
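The formula can be written out as a small C model. This is only an illustrative sketch: effective_address is a made-up helper name, and it models 64-bit address arithmetic (wrapping modulo 2^64, displacement sign-extended), not any particular CPU's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Effective Address = Base + (Index * Scale) + Displacement, modulo 2^64. */
uint64_t effective_address(uint64_t base, uint64_t index, unsigned scale, int32_t disp)
{
    /* Only these four scale factors can be encoded in the SIB byte. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* The 8-bit or 32-bit displacement is sign-extended before the add. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For example, effective_address(0x1000, 3, 8, 16) models [rbx + rcx*8 + 16] with rbx = 0x1000 and rcx = 3, and a negative displacement such as -8 models a stack local at [rbp - 8].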
Exercise: 2D Array Access
Access element matrix[row][col] in a 10×10 integer (4-byte) matrix:
section .bss
matrix resd 100 ; 10x10 int array
section .text
; row in RCX, col in RDX
; address = matrix + (row * 10 + col) * 4
lea rax, [rcx + rcx*4] ; rax = row * 5
lea rax, [rax*2] ; rax = row * 10
add rax, rdx ; rax = row * 10 + col
mov eax, [matrix + rax*4] ; Load matrix[row][col] (absolute disp32, so non-PIE builds only)
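As a cross-check of the index arithmetic, here is a short C sketch that mirrors the LEA sequence step by step. The names flat_index and load_element are hypothetical, and the matrix is treated as a flat array of 100 32-bit integers, as in the resd 100 declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 10, COLS = 10 };

/* Mirrors the instruction sequence: row*5, doubled to row*10, plus col. */
size_t flat_index(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    size_t i = row + row * 4;   /* lea rax, [rcx + rcx*4] */
    i = i + i;                  /* lea rax, [rax*2]       */
    return i + col;             /* add rax, rdx           */
}

/* Mirrors: mov eax, [matrix + rax*4]; indexing an int32_t array scales by 4. */
int32_t load_element(const int32_t *matrix, size_t row, size_t col)
{
    return matrix[flat_index(row, col)];
}
```

With row = 7 and col = 3 the index is 73, so the load reads the dword at byte offset 292 from the start of the matrix.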
RIP-Relative Addressing (x86-64)
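Position-Independent Code: In 64-bit mode, RIP-relative addressing is the standard way to access global data. Compilers emit it by default, and it is what makes position-independent executables (PIE) possible. NASM needs to be told to use it.
default rel ; make [label] RIP-relative from here on (NASM's default is absolute)
section .data
global_var dq 12345
section .text
mov rax, [rel global_var] ; RIP-relative (explicit, works without default rel)
mov rax, [global_var] ; RIP-relative only because of default rel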
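The displacement in a RIP-relative operand is measured from the address of the next instruction, not the current one. The C sketch below illustrates the arithmetic an assembler or linker performs; the function names and the example addresses are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address referenced by [rip + disp32]; RIP is the address of the NEXT instruction. */
uint64_t rip_relative_target(uint64_t next_instruction, int32_t disp32)
{
    return next_instruction + (uint64_t)(int64_t)disp32;
}

/* Computes the disp32 that reaches target. Returns false when the target is
   outside the signed 32-bit range (about +/- 2 GiB) and cannot be encoded. */
bool rip_relative_disp(uint64_t next_instruction, uint64_t target, int32_t *disp32)
{
    assert(disp32 != NULL);
    int64_t delta = (int64_t)(target - next_instruction);
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *disp32 = (int32_t)delta;
    return true;
}
```

If a 7-byte mov starts at 0x401000, the next instruction is at 0x401007, and a variable at 0x404000 is reached with a displacement of 0x2FF9. Because only the distance is encoded, the code still works when the whole image is loaded at a different base address.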
LEA Instruction
Load Effective Address computes an address but stores the address itself, not the memory contents. It's like getting directions to a restaurant instead of the food.
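Key Insight: LEA doesn't access memory! It's a pure arithmetic operation that borrows the addressing-mode encoding; on many current cores it executes on ordinary ALU ports rather than in the load/store address units. This makes it well suited to fast multiply-add math.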
Address Calculation Use
section .data
array dq 10, 20, 30, 40, 50
section .text
mov rcx, 3 ; index = 3
lea rax, [array + rcx*8] ; rax = address of array[3]
mov rbx, [rax] ; rbx = array[3] = 40
; Get address of struct field
; RDI points to struct, salary at offset 36
lea rsi, [rdi + 36] ; rsi = address of person->salary
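In C terms, LEA corresponds to taking an address and MOV from memory corresponds to dereferencing it. A minimal sketch, with element_address and load_from as hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Like: lea rax, [array + rcx*8]  (computes an address, reads nothing) */
const int64_t *element_address(const int64_t *array, size_t index)
{
    assert(array != NULL);
    return array + index;   /* pointer arithmetic scales by sizeof(int64_t), i.e. 8 */
}

/* Like: mov rbx, [rax]  (this is the step that actually touches memory) */
int64_t load_from(const int64_t *address)
{
    assert(address != NULL);
    return *address;
}
```

For the array 10, 20, 30, 40, 50 and index 3, element_address returns a pointer 24 bytes past the start, and load_from on that pointer yields 40.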
Arithmetic "Trick"
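LEA performs up to 2 adds and 1 shift in a single instruction, which is often faster than separate operations. On several Intel microarchitectures the three-component form (base + index*scale + displacement) has a higher latency than the simpler forms, so measure when it matters: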
; Multiply by constants using LEA
lea rax, [rbx + rbx] ; rax = rbx * 2
lea rax, [rbx + rbx*2] ; rax = rbx * 3
lea rax, [rbx*4] ; rax = rbx * 4
lea rax, [rbx + rbx*4] ; rax = rbx * 5
lea rax, [rbx + rbx*8] ; rax = rbx * 9
; Add two registers plus constant (3-operand addition!)
lea rax, [rbx + rcx + 10] ; rax = rbx + rcx + 10
; Combine for complex expressions
; rax = rbx * 5 + 7
lea rax, [rbx + rbx*4 + 7]
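The multipliers that fit a single LEA are limited by the scale factors 1, 2, 4, and 8. The C sketch below spells out the shift-and-add identity behind each form; mul_by_single_lea and lea_times5_plus7 are made-up names, and unsigned 64-bit arithmetic is used so that wraparound matches the register behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplies x by k using the same shift-and-add shape as one LEA. */
uint64_t mul_by_single_lea(uint64_t x, unsigned k)
{
    switch (k) {
    case 2: return x + x;          /* lea rax, [rbx + rbx]   */
    case 3: return x + (x << 1);   /* lea rax, [rbx + rbx*2] */
    case 4: return x << 2;         /* lea rax, [rbx*4]       */
    case 5: return x + (x << 2);   /* lea rax, [rbx + rbx*4] */
    case 8: return x << 3;         /* lea rax, [rbx*8]       */
    case 9: return x + (x << 3);   /* lea rax, [rbx + rbx*8] */
    default:
        assert(!"multiplier does not fit a single LEA");
        return 0;
    }
}

/* lea rax, [rbx + rbx*4 + 7]  computes rbx*5 + 7 */
uint64_t lea_times5_plus7(uint64_t rbx)
{
    return rbx + (rbx << 2) + 7;
}
```

Multipliers such as 6, 7, or 10 need two instructions, which is why the 2D array exercise uses one LEA for row*5 and a second step to double it.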
LEA vs MOV Performance
| Task | Using MOV/ADD/IMUL | Using LEA |
|---|---|---|
| rax = rbx * 5 | imul rax, rbx, 5 (3 cycles) | lea rax, [rbx+rbx*4] (1 cycle) |
| rax = rbx + rcx | mov rax, rbx then add rax, rcx | lea rax, [rbx+rcx] |
| rax = rax + 1 | inc rax (affects flags) | lea rax, [rax+1] (no flags) |
Pro Tip: Use LEA when you need the result without affecting FLAGS, or when combining operations. Compilers often use LEA for x = a + b + constant patterns.
Effective Address Calculation
When the CPU decodes a memory operand, dedicated Address Generation Units (AGUs) compute the effective address in parallel with other operations.
Hardware Pipeline
Instruction: mov rax, [rbx + rcx*4 + 16]
┌──────────────────────────────────────────────────────┐
│ 1. DECODE: Extract base=RBX, index=RCX, scale=4,     │
│            displacement=16                           │
├──────────────────────────────────────────────────────┤
│ 2. AGU CALCULATION:                                  │
│    ┌─────┐   ┌─────────┐   ┌────────────┐            │
│    │ RCX │──▶│   × 4   │──▶│            │            │
│    └─────┘   └─────────┘   │   ADDER    │──▶ EA      │
│    ┌─────┐                 │  CIRCUIT   │            │
│    │ RBX │────────────────▶│            │            │
│    └─────┘                 │            │            │
│    ┌─────┐                 │            │            │
│    │ 16  │────────────────▶│            │            │
│    └─────┘                 └────────────┘            │
├──────────────────────────────────────────────────────┤
│ 3. TLB LOOKUP: Virtual address → Physical address    │
├──────────────────────────────────────────────────────┤
│ 4. CACHE CHECK: L1 → L2 → L3 → RAM                   │
└──────────────────────────────────────────────────────┘
AGU Latency Considerations
| Addressing Mode | Typical AGU Latency | Notes |
|---|---|---|
| [reg] | 0-1 cycles | Base only, fastest |
| [reg + disp] | 0-1 cycles | Simple addition |
| [reg + reg*scale] | 1 cycle | Requires SIB decode |
| [reg + reg*scale + disp] | 1 cycle | Full form |
| [rip + disp32] | 1 cycle | 64-bit RIP-relative |
Cache and Memory Hierarchy Impact
Memory Access Latency (approximate):
┌─────────────┬───────────────┬────────────────┐
│ Level │ Latency │ Typical Size │
├─────────────┼───────────────┼────────────────┤
│ Register │ 0 cycles │ 16 × 64-bit │
│ L1 Cache │ 4-5 cycles │ 32-64 KB │
│ L2 Cache │ 12-14 cycles │ 256 KB - 1 MB │
│ L3 Cache │ 30-50 cycles │ 8-32 MB │
│ Main RAM │ 100-300 cycles│ GBs │
│ SSD         │ ~100K+ cycles │ TBs            │
└─────────────┴───────────────┴────────────────┘
Exercise: Prefetch Optimization
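When processing large arrays, explicit prefetch hints can hide some memory latency. Hardware prefetchers already handle simple sequential scans well, so treat this as an experiment and measure: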
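; Process array with prefetch
; In: rdi = pointer to an array of 1000 qwords
process_array:
mov rcx, 1000 ; array length (elements)
xor rsi, rsi ; byte offset into the array = 0
.loop:
; Prefetch data 64 bytes ahead (next cache line).
; Prefetch never faults, so reaching past the end of the array is harmless.
prefetcht0 [rdi + rsi + 64]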
; Process current element
mov rax, [rdi + rsi]
; ... operations on rax ...
mov [rdi + rsi], rax
add rsi, 8
dec rcx
jnz .loop
ret
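Benchmark this with and without prefetch on arrays larger than L3 cache! The 1000-element array above is only 8 KB and fits in L1, so raise the element count (and try larger prefetch distances) before expecting a measurable difference.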
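For comparison, the same loop can be sketched in C with the GCC/Clang builtin __builtin_prefetch, which makes it easy to vary the prefetch distance. The function name add_to_all and its parameters are assumptions for this illustration; ahead is the distance in elements, and 0 disables prefetching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds delta to every element, optionally prefetching `ahead` elements in front. */
void add_to_all(int64_t *data, size_t count, int64_t delta, size_t ahead)
{
    assert(data != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Stay inside the array so the pointer arithmetic itself is well defined. */
        if (ahead != 0 && ahead < count - i) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels),
               which GCC and Clang typically lower to prefetcht0 on x86. */
            __builtin_prefetch(&data[i + ahead], 0, 3);
        }
        data[i] += delta;
    }
}
```

Calling it once with ahead = 0 and once with, say, ahead = 64 on the same large buffer gives a baseline and a prefetching run whose timings can be compared; both must produce identical data.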