Stack Fundamentals
Stack Direction: The x86 stack grows downward (toward lower addresses). In 64-bit mode PUSH decrements RSP by 8 and POP increments it by 8. Both the System V and Windows x64 ABIs require RSP to be 16-byte aligned at the point of every CALL instruction.
PUSH & POP Instructions
Stack Operations
; PUSH: Decrement RSP, store value
push rax ; RSP -= 8; [RSP] = RAX
push qword 42 ; Push immediate (imm32, sign-extended to 64 bits)
push qword [var] ; Push memory value
; POP: Load value, increment RSP
pop rbx ; RBX = [RSP]; RSP += 8
pop qword [var] ; Pop to memory
; Equivalent operations (except that SUB/ADD modify RFLAGS, while PUSH/POP do not):
sub rsp, 8 ; Equivalent to...
mov [rsp], rax ; ...push rax
mov rbx, [rsp] ; Equivalent to...
add rsp, 8 ; ...pop rbx
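The PUSH/POP semantics above can be modelled in a few lines of C. This is only a toy sketch: `SimStack`, `sim_push`, and `sim_pop` are hypothetical names, and the "stack" is a small array rather than real process memory.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_STACK_SLOTS 16  /* hypothetical size of the toy stack */

/* Toy machine: a small memory array plus a stack pointer measured in bytes. */
typedef struct {
    uint64_t mem[SIM_STACK_SLOTS];
    uint64_t rsp;   /* byte offset into mem; starts just past the highest slot */
} SimStack;

void sim_init(SimStack *s) {
    s->rsp = sizeof s->mem;          /* empty stack: RSP at the top */
}

/* push: RSP -= 8, then [RSP] = value */
void sim_push(SimStack *s, uint64_t value) {
    assert(s->rsp >= 8);             /* overflow check for the toy model only */
    s->rsp -= 8;
    s->mem[s->rsp / 8] = value;
}

/* pop: value = [RSP], then RSP += 8 */
uint64_t sim_pop(SimStack *s) {
    assert(s->rsp < sizeof s->mem);  /* underflow check for the toy model only */
    uint64_t value = s->mem[s->rsp / 8];
    s->rsp += 8;
    return value;
}
```

Pushing two values and popping them returns them in reverse order and leaves the simulated RSP where it started, which is the same last-in, first-out behavior the instructions provide.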
RSP Management & Alignment
The x86-64 ABIs require RSP to be 16-byte aligned at the point of a CALL instruction. Because CALL pushes an 8-byte return address, RSP is 8 bytes off a 16-byte boundary (RSP mod 16 = 8) on entry to the called function.
Before CALL: RSP = 0x7FFF0010 (16-byte aligned: last nibble is 0)
After CALL: RSP = 0x7FFF0008 (misaligned by 8 due to return address)
After PUSH: RSP = 0x7FFF0000 (aligned again)
Alignment Rule: Before calling any function (including library functions), ensure RSP is 16-byte aligned. A misaligned stack can crash the callee with a segmentation fault when it executes SSE instructions that require aligned memory operands, such as MOVAPS or MOVDQA.
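The three RSP values in the walkthrough above can be checked with simple arithmetic. The following C sketch reproduces them; the helper names (`is_aligned16`, `rsp_after_call`, `rsp_after_push`) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the address is a multiple of 16 (low four bits are zero). */
bool is_aligned16(uint64_t rsp) {
    return (rsp & 0xF) == 0;
}

/* CALL pushes an 8-byte return address, so RSP drops by 8. */
uint64_t rsp_after_call(uint64_t rsp) {
    assert(is_aligned16(rsp));   /* the ABI expects alignment at the CALL */
    return rsp - 8;
}

/* PUSH of a 64-bit register also drops RSP by 8. */
uint64_t rsp_after_push(uint64_t rsp) {
    return rsp - 8;
}
```

Starting from 0x7FFF0010, `rsp_after_call` yields 0x7FFF0008 (not aligned) and a following `rsp_after_push` yields 0x7FFF0000 (aligned again).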
; Proper stack alignment patterns:
; Pattern 1: Push an odd number of 8-byte values
my_function:
push rbp ; RSP now 16-byte aligned
push rbx ; RSP misaligned by 8 (saves callee-saved)
push r12 ; Third push: RSP 16-byte aligned again
sub rsp, 32 ; Allocate locals; a multiple of 16 keeps RSP aligned
; Pattern 2: Use AND to force alignment
mov rbp, rsp ; Keep the original RSP so it can be restored later
and rsp, -16 ; Force 16-byte alignment (clear low 4 bits)
; Pattern 3: Adjust allocation size (no pushes since function entry)
sub rsp, 40 ; 32 for shadow space + 8 for alignment
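When deciding how much to subtract from RSP, the bookkeeping can be written down once and reused. The C sketch below assumes RSP mod 16 = 8 on function entry and uses hypothetical helper names; it computes the smallest SUB amount that reserves the requested bytes and leaves RSP aligned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes to subtract from RSP after `pushes` 8-byte pushes so that at least
 * `locals` bytes are reserved and RSP ends up 16-byte aligned.
 * Assumes RSP % 16 == 8 on function entry (return address just pushed).
 */
uint64_t aligned_frame_size(unsigned pushes, uint64_t locals) {
    uint64_t used = 8 + 8 * (uint64_t)pushes;               /* return address + pushes */
    uint64_t total = (used + locals + 15) & ~(uint64_t)15;  /* round up to 16 */
    assert(total % 16 == 0);                                /* postcondition */
    return total - used;
}

/* C equivalent of `and rsp, -16`: clear the low four bits. */
uint64_t align_down16(uint64_t rsp) {
    return rsp & ~(uint64_t)15;
}
```

For example, three pushes plus 24 bytes of locals need a 32-byte adjustment, while no pushes plus 32 bytes of shadow space need 40.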
Stack Frames
Function Prologue
; Standard prologue (with frame pointer)
my_function:
push rbp ; Save caller's frame pointer
mov rbp, rsp ; Establish new frame pointer
sub rsp, 32 ; Allocate local variables + alignment
; Function body...
Function Epilogue
; End of function body...
mov rsp, rbp ; Deallocate locals
pop rbp ; Restore caller's frame pointer
ret ; Return to caller
; Or use LEAVE instruction:
leave ; Equivalent to: mov rsp,rbp + pop rbp
ret
System V AMD64 ABI (Linux/macOS)
Parameter Passing
System V Register Usage
| Purpose | Registers |
| Integer Args (1-6) | RDI, RSI, RDX, RCX, R8, R9 |
| Float Args (1-8) | XMM0-XMM7 |
| Return Value | RAX (int), XMM0 (float) |
| Callee-Saved | RBX, RBP, R12-R15 |
| Caller-Saved | RAX, RCX, RDX, RSI, RDI, R8-R11 |
Return Values
Functions return values in registers based on type:
; Integer return (up to 64 bits)
my_int_func:
mov rax, 42 ; Return value in RAX
ret
; 128-bit integer return
my_128bit_func:
mov rax, low_part ; Low 64 bits in RAX
mov rdx, high_part ; High 64 bits in RDX
ret
; Floating-point return
my_float_func:
movsd xmm0, [my_double] ; Return double in XMM0
ret
; Struct return
; Structs ≤ 16 bytes made of integer fields: low half in RAX, high half in RDX
; (floating-point fields are returned in XMM0/XMM1 instead)
; Larger structs: caller passes hidden pointer as first arg
return_large_struct:
; RDI contains hidden pointer to return buffer
mov [rdi], rax ; Store struct data
mov [rdi+8], rbx
mov rax, rdi ; Return the pointer in RAX
ret
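From the C side, the struct-return rules look roughly like the sketch below. The type and function names are hypothetical, and the size test is a deliberate simplification: the full ABI classification has more cases than "larger than 16 bytes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16 bytes of integer fields: fits the RAX (low) / RDX (high) pair. */
typedef struct { int64_t lo; int64_t hi; } SmallPair;

/* 24 bytes: too large for two registers, so it is returned through memory. */
typedef struct { int64_t a; int64_t b; int64_t c; } LargeTriple;

static_assert(sizeof(SmallPair) == 16, "two 8-byte fields");
static_assert(sizeof(LargeTriple) == 24, "three 8-byte fields");

/* Simplified rule: aggregates larger than 16 bytes use the hidden pointer. */
bool returned_via_hidden_pointer(size_t size) {
    return size > 16;
}

/* On a System V x86-64 target the compiler returns lo in RAX and hi in RDX. */
SmallPair make_pair(int64_t lo, int64_t hi) {
    SmallPair p = { lo, hi };
    return p;
}

/* Here the compiler writes through the hidden pointer received in RDI. */
LargeTriple make_triple(int64_t a, int64_t b, int64_t c) {
    LargeTriple t = { a, b, c };
    return t;
}
```

Compiling these two functions and inspecting the generated assembly is a quick way to see both return paths side by side.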
Callee-Saved Registers
If your function uses callee-saved registers, you must preserve their values:
; Function that uses callee-saved registers
process_data:
; Save registers we'll use
push rbx ; Must preserve
push r12 ; Must preserve
push r13 ; Must preserve
; Now safe to use RBX, R12, R13
mov rbx, rdi ; Save parameter
mov r12, rsi
xor r13, r13 ; Counter
.loop:
; ... processing ...
inc r13
cmp r13, r12
jl .loop
mov rax, r13 ; Return value
; Restore in reverse order!
pop r13
pop r12
pop rbx
ret
Memory Trick: Callee-saved = "Belongs to caller, Borrowed by callee." Think of it as borrowing a book—you must return it exactly as received.
Windows x64 Calling Convention
Shadow Space (Home Space)
Windows Requirement: Callers must allocate 32 bytes (4 × 8) of "shadow space" on the stack before every CALL, even if the function takes fewer than 4 arguments.
; Calling a Windows function
sub rsp, 40 ; 32 bytes shadow + 8 for alignment
mov rcx, param1 ; First parameter
mov rdx, param2 ; Second parameter
call SomeFunction
add rsp, 40 ; Clean up
Parameter Passing
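Windows x64 passes the first four arguments in registers. Registers are assigned by argument position: the Nth argument uses either the integer register or the XMM register in row N, depending on its type, and the other register in that row goes unused.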
| Argument | Integer/Pointer | Float/Double |
| 1st | RCX | XMM0 |
| 2nd | RDX | XMM1 |
| 3rd | R8 | XMM2 |
| 4th | R9 | XMM3 |
| 5th+ | Stack: [RSP+40], [RSP+48], ... on entry to the callee ([RSP+32], [RSP+40], ... as seen by the caller just before CALL) |
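The stack offsets for the fifth and later arguments follow directly from the 32-byte shadow space and the 8-byte return address. A small C sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

enum { WIN64_SHADOW_BYTES = 32, WIN64_SLOT_BYTES = 8 };

/*
 * Offset from RSP of stack argument `index` (1-based, index >= 5) as the
 * caller sees it just before CALL: the slots sit directly above the shadow space.
 */
uint64_t win64_arg_offset_at_call(unsigned index) {
    assert(index >= 5);   /* arguments 1-4 travel in registers */
    return WIN64_SHADOW_BYTES + WIN64_SLOT_BYTES * (uint64_t)(index - 5);
}

/* Same argument as seen on entry to the callee: CALL pushed 8 more bytes. */
uint64_t win64_arg_offset_at_entry(unsigned index) {
    return win64_arg_offset_at_call(index) + 8;
}
```

So a caller stores the fifth argument at [RSP+32] before the CALL, and the callee finds it at [RSP+40] on entry.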
; Call Windows API: MessageBoxA(hWnd, lpText, lpCaption, uType)
extern MessageBoxA
global main ; Export the entry point for the linker
section .data
msg db "Hello, Windows!", 0
title db "My App", 0
section .text
main:
sub rsp, 40 ; Shadow space (32) + alignment (8) = 40
xor ecx, ecx ; hWnd = NULL
lea rdx, [rel msg] ; lpText (RIP-relative address)
lea r8, [rel title] ; lpCaption
xor r9d, r9d ; uType = MB_OK (0)
call MessageBoxA
add rsp, 40
ret
Key Differences: System V vs Windows x64
| Aspect | System V (Linux) | Windows x64 |
| Args in registers | 6 (RDI, RSI, RDX, RCX, R8, R9) | 4 (RCX, RDX, R8, R9) |
| Shadow space | No | Yes (32 bytes mandatory) |
| Red zone | Yes (128 bytes) | No |
| Callee-saved XMM | None | XMM6-XMM15 |
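One practical consequence of the register lists is that no integer argument position uses the same register under both conventions. The lookup below is a sketch covering integer and pointer arguments only; `Abi`, `int_arg_register`, and `same_int_arg_register` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ABI_SYSV, ABI_WIN64 } Abi;

/* Integer/pointer argument registers in order, per the tables above. */
static const char *const SYSV_INT_REGS[]  = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };
static const char *const WIN64_INT_REGS[] = { "RCX", "RDX", "R8", "R9" };

/* Register for 1-based integer argument `index`, or NULL if it goes on the stack. */
const char *int_arg_register(Abi abi, unsigned index) {
    assert(index >= 1);
    const char *const *regs = (abi == ABI_SYSV) ? SYSV_INT_REGS : WIN64_INT_REGS;
    size_t count = (abi == ABI_SYSV) ? 6 : 4;
    return (index <= count) ? regs[index - 1] : NULL;
}

/* True when both ABIs pass integer argument `index` in the same register. */
bool same_int_arg_register(unsigned index) {
    const char *sysv = int_arg_register(ABI_SYSV, index);
    const char *win = int_arg_register(ABI_WIN64, index);
    return sysv != NULL && win != NULL && strcmp(sysv, win) == 0;
}
```

`same_int_arg_register` returns false for every position, which is why hand-written assembly needs separate register handling for each platform.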
Leaf Functions
A leaf function is one that doesn't call other functions. These can be heavily optimized:
- No need to set up a stack frame
- No need for frame pointer (RBP)
- Can use red zone on System V
- Minimal prologue/epilogue
; Non-leaf function (calls printf)
; Must follow full convention
print_value:
push rbp
mov rbp, rsp
sub rsp, 16 ; Alignment + locals
mov rsi, rdi ; Arg for printf
lea rdi, [rel fmt] ; Format string (RIP-relative address)
xor eax, eax ; 0 vector args
call printf
leave
ret
; Leaf function (optimized, no calls)
; Can skip frame setup entirely
add_three:
lea rax, [rdi + rsi] ; rax = a + b
add rax, rdx ; rax += c
ret ; No prologue/epilogue needed!
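Compiler Optimization: With optimization enabled, GCC and Clang typically omit the frame pointer on x86-64 in leaf and non-leaf functions alike. -fomit-frame-pointer requests this explicitly, and -fno-omit-frame-pointer keeps RBP-based frames, which makes debugging and profiling easier.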
Red Zone (System V)
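The red zone is a 128-byte area immediately below RSP that a function can use for temporary data without adjusting RSP. The System V ABI guarantees that signal and interrupt handlers leave it intact. Any CALL or PUSH overwrites it, though, so in practice only leaf functions rely on it.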
Stack Layout with Red Zone:
│ │
├─────────────┤
│ Return Addr │ ← RSP points here
├─────────────┤
│ │
│ Red Zone │ ← 128 bytes, usable without SUB RSP
│ (128 bytes)│
│ │
├─────────────┤
│ Danger Zone │ ← Below red zone, may be clobbered
└─────────────┘
Using the Red Zone
; Leaf function using red zone (System V only!)
compute_hash:
; No stack frame needed - use red zone for locals
mov [rsp - 8], rbx ; Save RBX in red zone
mov [rsp - 16], r12 ; Save R12 in red zone
; Use RBX and R12 freely...
mov rbx, rdi
xor r12, r12
; ... computation ...
; Restore and return
mov rbx, [rsp - 8]
mov r12, [rsp - 16]
ret
Windows Warning: Windows x64 has NO red zone! The area below RSP can be clobbered by interrupts, exception handlers, or debuggers. Always allocate stack space explicitly on Windows.
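For System V code, whether a given store lands inside the red zone is a simple range check. The C sketch below models it; `in_red_zone` is a hypothetical helper, and addresses are treated as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { RED_ZONE_BYTES = 128 };

/* True when [addr, addr + size) lies entirely inside the red zone below rsp. */
bool in_red_zone(uint64_t rsp, uint64_t addr, uint64_t size) {
    assert(rsp >= RED_ZONE_BYTES);   /* keep the toy arithmetic from wrapping */
    return size > 0 && size <= RED_ZONE_BYTES
        && addr >= rsp - RED_ZONE_BYTES
        && addr <= rsp - size;
}
```

An 8-byte store at [rsp - 8] or [rsp - 128] passes the check, while one at [rsp - 136] falls below the red zone and may be clobbered.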
Exercise: Cross-Platform Function
Write a function that works on both Linux and Windows:
; Cross-platform compatible function
multiply_add:
; Works on both Linux and Windows (assemble with -DWINDOWS for the Windows build)
; Linux: args in RDI, RSI, RDX
; Windows: args in RCX, RDX, R8
%ifdef WINDOWS
mov rax, rcx
imul rax, rdx
add rax, r8
%else
mov rax, rdi
imul rax, rsi
add rax, rdx
%endif
ret
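To check a solution to the exercise, it helps to compare it against a reference written in C, since the compiler applies the right calling convention on each platform automatically. The sketch below is one possible harness; `multiply_add_ref`, `MulAddFn`, and `check_multiply_add` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference for the assembly routine: a * b + c with 64-bit wraparound,
 * matching what IMUL/ADD leave in RAX. Unsigned arithmetic is used because
 * signed overflow is undefined behaviour in C.
 */
uint64_t multiply_add_ref(uint64_t a, uint64_t b, uint64_t c) {
    return a * b + c;
}

typedef uint64_t (*MulAddFn)(uint64_t, uint64_t, uint64_t);

/* Compares an implementation against the reference on a few inputs. */
void check_multiply_add(MulAddFn impl) {
    assert(impl(3, 4, 5) == multiply_add_ref(3, 4, 5));
    assert(impl(0, 9, 7) == multiply_add_ref(0, 9, 7));
    assert(impl(UINT64_MAX, 2, 3) == multiply_add_ref(UINT64_MAX, 2, 3));
}
```

Assuming the assembly routine is exported as a global symbol and linked in, declaring it as `extern uint64_t multiply_add(uint64_t, uint64_t, uint64_t);` and passing it to `check_multiply_add` exercises the register handling on whichever platform the program is built for.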