x86 Assembly Series Part 8: Stack Operations & Calling Conventions
February 6, 2026 | Wasil Zafar | 35 min read
Master stack operations (push/pop), stack frames, function prologues and epilogues, System V AMD64 ABI for Linux, Windows x64 calling convention with shadow space, and register preservation rules.
x86 stack fundamentals — the stack grows downward in memory, with PUSH decrementing RSP and POP incrementing it.
Stack Direction: The x86 stack grows downward (toward lower addresses). PUSH decrements RSP, POP increments RSP. The stack must be 16-byte aligned before CALL instructions.
; PUSH: Decrement RSP, store value
push rax ; RSP -= 8; [RSP] = RAX
push qword 42 ; Push immediate
push qword [var] ; Push memory value
; POP: Load value, increment RSP
pop rbx ; RBX = [RSP]; RSP += 8
pop qword [var] ; Pop to memory
; Equivalent operations:
sub rsp, 8 ; Equivalent to...
mov [rsp], rax ; ...push rax
mov rbx, [rsp] ; Equivalent to...
add rsp, 8 ; ...pop rbx
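Because the stack is last-in, first-out, values come back in the reverse of the order they were pushed. The following is a minimal sketch of that property; the label `swap_demo` is hypothetical and chosen only for illustration.

```nasm
; Hypothetical demo: swap RAX and RCX using the stack's LIFO order
swap_demo:
    mov  rax, 1          ; RAX = 1
    mov  rcx, 2          ; RCX = 2
    push rax             ; top of stack: 1
    push rcx             ; top of stack: 2, with 1 beneath it
    pop  rax             ; RAX = 2 (last pushed, first popped)
    pop  rcx             ; RCX = 1
    ret                  ; pushes and pops balance, so RET sees the return address
```

Every PUSH must be matched by a POP (or an equivalent ADD to RSP) before RET, otherwise RET jumps to whatever value is left on top of the stack.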
RSP Management & Alignment
Both the System V and Windows x64 ABIs require RSP to be 16-byte aligned at the point of a CALL instruction. Because CALL pushes an 8-byte return address, RSP is 8 bytes off 16-byte alignment (RSP mod 16 = 8) on entry to the called function.
Before CALL: RSP = 0x7FFF0010 (16-byte aligned: last nibble is 0)
After CALL: RSP = 0x7FFF0008 (misaligned by 8 due to return address)
After PUSH RBP: RSP = 0x7FFF0000 (aligned again)
Alignment Rule: Before calling any function (including library functions), ensure RSP is 16-byte aligned. A misaligned stack can cause a segmentation fault when the callee uses SSE instructions that require aligned memory operands, such as MOVAPS.
; Proper stack alignment patterns:
; Pattern 1: Push an odd number of 8-byte values
my_function:
push rbp ; RSP now 16-byte aligned
push rbx ; Callee-saved (if needed); RSP is off by 8 again
push r12 ; Third push: RSP is 16-byte aligned again
sub rsp, 32 ; Allocate locals; a multiple of 16 keeps the alignment
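; Pattern 2: Use AND to force alignment
mov rbp, rsp ; Keep the old RSP in RBP (after saving RBP) so the epilogue can restore it
and rsp, -16 ; Force 16-byte alignment (clear low 4 bits)
; Pattern 3: Adjust allocation size (at function entry, before any pushes)
sub rsp, 40 ; 32 for shadow space + 8 for alignment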
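While debugging, it can help to verify the alignment rule at a call site instead of assuming it. The following is a minimal sketch; `checked_call` and `some_function` are hypothetical names, and the INT3 breakpoint is just one possible way to flag the error.

```nasm
; Hypothetical debug check: trap if RSP is not 16-byte aligned before a call
extern some_function         ; hypothetical callee

checked_call:
    sub  rsp, 8              ; entry RSP is 8 mod 16, so this restores alignment
    test rsp, 0xF            ; the low 4 bits must all be zero
    jz   .aligned
    int3                     ; breakpoint trap: stops in the debugger if misaligned
.aligned:
    call some_function
    add  rsp, 8              ; undo the adjustment
    ret
```

TEST only sets flags, so the check does not disturb RSP or any argument register.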
Stack Frames
Stack frame layout — return address, saved frame pointer (RBP), and local variables arranged in the function’s stack frame.
Function Prologue
; Standard prologue (with frame pointer)
my_function:
push rbp ; Save caller's frame pointer
mov rbp, rsp ; Establish new frame pointer
sub rsp, 32 ; Allocate local variables + alignment
; Function body...
Function Epilogue
; End of function body...
mov rsp, rbp ; Deallocate locals
pop rbp ; Restore caller's frame pointer
ret ; Return to caller
; Or use LEAVE instruction:
leave ; Equivalent to: mov rsp,rbp + pop rbp
ret
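Putting the prologue and epilogue together, locals live at negative offsets from RBP. The following is a sketch of a complete function under System V argument registers; the function name and its two locals are hypothetical, and an optimizing compiler would keep these values in registers instead of memory.

```nasm
; long sum_of_squares(long a, long b)   (hypothetical example)
sum_of_squares:
    push rbp
    mov  rbp, rsp
    sub  rsp, 16             ; two 8-byte locals; keeps RSP 16-byte aligned
    mov  [rbp - 8], rdi      ; local 1 = a
    mov  [rbp - 16], rsi     ; local 2 = b
    mov  rax, [rbp - 8]
    imul rax, rax            ; RAX = a * a
    mov  rcx, [rbp - 16]
    imul rcx, rcx            ; RCX = b * b
    add  rax, rcx            ; RAX = a*a + b*b
    leave                    ; mov rsp, rbp + pop rbp
    ret
```

Because RBP stays fixed for the whole function body, the offsets to the locals do not change even if the body pushes or pops additional values.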
System V AMD64 ABI (Linux/macOS)
System V AMD64 ABI calling convention — integer arguments passed in RDI, RSI, RDX, RCX, R8, R9 with callee-saved registers RBX, RBP, R12–R15.
Parameter Passing
Reference: System V Register Usage

| Purpose | Registers |
|---|---|
| Integer args (1-6) | RDI, RSI, RDX, RCX, R8, R9 |
| Float args (1-8) | XMM0-XMM7 |
| Return value | RAX (int), XMM0 (float) |
| Callee-saved | RBX, RBP, R12-R15 |
| Caller-saved | RAX, RCX, RDX, RSI, RDI, R8-R11 |
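When a function takes more than six integer arguments, the extra ones are passed on the stack. The following is a sketch with a hypothetical seven-argument function `sum7` and a caller `call_sum7`; on entry to the callee, the seventh argument sits just above the return address.

```nasm
; long sum7(long a, long b, long c, long d, long e, long f, long g)
; Hypothetical callee: args 1-6 in registers, arg 7 on the stack
sum7:
    lea  rax, [rdi + rsi]    ; a + b
    add  rax, rdx            ; + c
    add  rax, rcx            ; + d
    add  rax, r8             ; + e
    add  rax, r9             ; + f
    add  rax, [rsp + 8]      ; + g (7th arg, above the 8-byte return address)
    ret

; Hypothetical caller: computes sum7(1, 2, 3, 4, 5, 6, 7)
call_sum7:
    mov  edi, 1
    mov  esi, 2
    mov  edx, 3
    mov  ecx, 4
    mov  r8d, 5
    mov  r9d, 6
    push 7                   ; 7th arg; also brings RSP to 16-byte alignment
    call sum7                ; RAX = 28
    add  rsp, 8              ; caller removes the stack argument
    ret
```

The single PUSH does double duty here: the caller enters with RSP at 8 mod 16, so pushing one 8-byte argument leaves RSP aligned for the CALL. With an even number of stack arguments, an extra `sub rsp, 8` would be needed first.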
Return Values
Functions return values in registers based on type:
; Integer return (up to 64 bits)
my_int_func:
mov rax, 42 ; Return value in RAX
ret
; 128-bit integer return
my_128bit_func:
mov rax, low_part ; Low 64 bits in RAX
mov rdx, high_part ; High 64 bits in RDX
ret
; Floating-point return
my_float_func:
movsd xmm0, [my_double] ; Return double in XMM0
ret
; Struct return (small structs)
; Integer-class structs ≤ 16 bytes: first 8 bytes in RAX, next 8 bytes in RDX
; (floating-point members are returned in XMM0/XMM1 instead)
; Larger structs: caller passes hidden pointer as first arg
return_large_struct:
; RDI contains hidden pointer to return buffer
mov [rdi], rax ; Store struct data
mov [rdi+8], rbx
mov rax, rdi ; Return the pointer in RAX
ret
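A two-register struct return falls out naturally from signed division. The following is a sketch that assumes a hypothetical C type `struct divmod { long quot; long rem; }`, which is 16 bytes of integer members and so comes back in RAX and RDX.

```nasm
; struct divmod { long quot; long rem; };     (hypothetical type)
; struct divmod divmod(long a, long b);       (hypothetical function)
divmod:
    mov  rax, rdi            ; dividend = a
    cqo                      ; sign-extend RAX into RDX:RAX
    idiv rsi                 ; RAX = a / b (quot), RDX = a % b (rem)
    ret                      ; first 8 bytes in RAX, second 8 bytes in RDX
```

IDIV raises a divide error if `b` is zero or if the quotient overflows, so a production version would validate its inputs first.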
Callee-Saved Registers
If your function uses callee-saved registers, you must preserve their values:
; Function that uses callee-saved registers
process_data:
; Save registers we'll use
push rbx ; Must preserve
push r12 ; Must preserve
push r13 ; Must preserve
; Now safe to use RBX, R12, R13
mov rbx, rdi ; Save parameter
mov r12, rsi
xor r13, r13 ; Counter
.loop:
; ... processing ...
inc r13
cmp r13, r12
jl .loop
mov rax, r13 ; Return value
; Restore in reverse order!
pop r13
pop r12
pop rbx
ret
Memory Trick: Callee-saved = "Belongs to caller, Borrowed by callee." Think of it as borrowing a book—you must return it exactly as received.
Windows x64 Calling Convention
Shadow Space (Home Space)
Windows Requirement: Callers must allocate 32 bytes (4 × 8) of "shadow space" on the stack before every CALL, even if the function takes fewer than 4 arguments.
; Calling a Windows function
sub rsp, 40 ; 32 bytes shadow + 8 for alignment
mov rcx, param1 ; First parameter
mov rdx, param2 ; Second parameter
call SomeFunction
add rsp, 40 ; Clean up
Parameter Passing
Windows x64 uses a four-register calling convention:
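| Argument | Integer/Pointer | Float/Double |
|---|---|---|
| 1st | RCX | XMM0 |
| 2nd | RDX | XMM1 |
| 3rd | R8 | XMM2 |
| 4th | R9 | XMM3 |
| 5th+ | Stack: [RSP+32], [RSP+40], ... at the call site ([RSP+40], [RSP+48], ... on entry to the callee) | Stack (same slots) |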
; Call Windows API: MessageBoxA(hWnd, lpText, lpCaption, uType)
default rel ; Use RIP-relative addressing for [msg] and [title]
extern MessageBoxA
global main
section .data
msg db "Hello, Windows!", 0
title db "My App", 0
section .text
main:
sub rsp, 40 ; Shadow space (32) + alignment (8) = 40
xor ecx, ecx ; hWnd = NULL
lea rdx, [msg] ; lpText
lea r8, [title] ; lpCaption
mov r9d, 0 ; uType = MB_OK
call MessageBoxA
add rsp, 40
ret
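A fifth argument goes on the stack directly above the 32 bytes of shadow space. The following is a sketch with a hypothetical five-argument function `sum5` and a caller `call_sum5`, showing the same slot from both sides of the call.

```nasm
; Hypothetical caller: computes sum5(1, 2, 3, 4, 5) under Windows x64
call_sum5:
    sub  rsp, 40             ; 32 shadow + 8 for the 5th arg; RSP is now aligned
    mov  ecx, 1              ; 1st arg
    mov  edx, 2              ; 2nd arg
    mov  r8d, 3              ; 3rd arg
    mov  r9d, 4              ; 4th arg
    mov  qword [rsp + 32], 5 ; 5th arg: first slot above the shadow space
    call sum5                ; RAX = 15
    add  rsp, 40
    ret

; long sum5(long a, long b, long c, long d, long e)   (hypothetical callee)
sum5:
    lea  rax, [rcx + rdx]    ; a + b
    add  rax, r8             ; + c
    add  rax, r9             ; + d
    add  rax, [rsp + 40]     ; + e: 8 (return address) + 32 (shadow space)
    ret
```

Here the 40 bytes serve a different purpose than in the MessageBoxA example: the final 8 bytes hold an argument instead of padding, and the total still keeps RSP 16-byte aligned at the CALL.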
Key Differences: System V vs Windows x64
| Aspect | System V (Linux) | Windows x64 |
|---|---|---|
| Args in registers | 6 (RDI, RSI, RDX, RCX, R8, R9) | 4 (RCX, RDX, R8, R9) |
| Shadow space | No | Yes (32 bytes mandatory) |
| Red zone | Yes (128 bytes) | No |
| Callee-saved XMM | None | XMM6-XMM15 |
Leaf Functions
A leaf function is one that doesn't call other functions. These can be heavily optimized:
Leaf vs non-leaf functions — leaf functions that make no calls can skip stack frame setup for significant performance gains.
- No need to set up a stack frame
- No need for frame pointer (RBP)
- Can use red zone on System V
- Minimal prologue/epilogue
; Non-leaf function (calls printf)
; Must follow full convention
print_value:
push rbp
mov rbp, rsp
sub rsp, 16 ; Alignment + locals
mov rsi, rdi ; Arg for printf
lea rdi, [rel fmt] ; Format string (RIP-relative, works in position-independent builds)
xor eax, eax ; 0 vector args
call printf
leave
ret
; Leaf function (optimized, no calls)
; Can skip frame setup entirely
add_three:
lea rax, [rdi + rsi] ; rax = a + b
add rax, rdx ; rax += c
ret ; No prologue/epilogue needed!
Compiler Optimization: At -O1 and above, GCC omits the frame pointer by default in any function that does not need one, leaf or not. Pass -fomit-frame-pointer to request this explicitly, or -fno-omit-frame-pointer to keep frame pointers, which makes debugging and profiling easier.
Red Zone (System V)
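The red zone is a 128-byte area below RSP that leaf functions can use without adjusting RSP. The ABI guarantees that signal and interrupt handlers will not overwrite it, so data stored there survives asynchronous events. A CALL or PUSH does overwrite it, which is why only leaf functions can rely on it.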
The System V red zone — a 128-byte area below RSP that leaf functions can use without adjusting the stack pointer.
Stack Layout with Red Zone:
│ │
├─────────────┤
│ Return Addr │ ← RSP points here
├─────────────┤
│ │
│ Red Zone │ ← 128 bytes, usable without SUB RSP
│ (128 bytes)│
│ │
├─────────────┤
│ Danger Zone │ ← Below red zone, may be clobbered
└─────────────┘
Using the Red Zone
; Leaf function using red zone (System V only!)
compute_hash:
; No stack frame needed - use red zone for locals
mov [rsp - 8], rbx ; Save RBX in red zone
mov [rsp - 16], r12 ; Save R12 in red zone
; Use RBX and R12 freely...
mov rbx, rdi
xor r12, r12
; ... computation ...
; Restore and return
mov rbx, [rsp - 8]
mov r12, [rsp - 16]
ret
Windows Warning: Windows x64 has NO red zone! The area below RSP can be clobbered by interrupts, exception handlers, or debuggers. Always allocate stack space explicitly on Windows.
Exercise: Cross-Platform Function
Write a function that works on both Linux and Windows:
; Cross-platform compatible function
multiply_add:
; Works on both Linux and Windows
; Linux: args in RDI, RSI, RDX
; Windows: args in RCX, RDX, R8
%ifdef WINDOWS
mov rax, rcx
imul rax, rdx
add rax, r8
%else
mov rax, rdi
imul rax, rsi
add rax, rdx
%endif
ret
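One way to avoid duplicating the function body is to hide the register differences behind preprocessor macros. The following is a sketch of that idea; the macro names `ARG1` to `ARG3` are hypothetical, and it assumes `WINDOWS` is defined at assembly time (for example with NASM's `-DWINDOWS` option) when targeting Windows.

```nasm
; Hypothetical argument macros: one body for both conventions
%ifdef WINDOWS
  %define ARG1 rcx
  %define ARG2 rdx
  %define ARG3 r8
%else
  %define ARG1 rdi
  %define ARG2 rsi
  %define ARG3 rdx
%endif

; long multiply_add(long a, long b, long c)  ->  a * b + c
multiply_add:
    mov  rax, ARG1           ; RAX = a
    imul rax, ARG2           ; RAX = a * b
    add  rax, ARG3           ; RAX = a * b + c
    ret
```

This only papers over the argument registers. Functions that call others, use callee-saved XMM registers, or touch the red zone still need convention-specific code for shadow space and register preservation.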
Continue the Series
Part 7: Memory Addressing Modes
Master all x86 addressing modes including RIP-relative addressing.