x86 Assembly Series Part 8: Stack Operations & Calling Conventions

February 6, 2026 Wasil Zafar 35 min read

Master stack operations (push/pop), stack frames, function prologues and epilogues, System V AMD64 ABI for Linux, Windows x64 calling convention with shadow space, and register preservation rules.

Table of Contents

  1. Stack Fundamentals
  2. Stack Frames
  3. System V AMD64 ABI
  4. Windows x64 Convention
  5. Leaf Functions
  6. Red Zone (System V)

Stack Fundamentals

x86 stack fundamentals — the stack grows downward in memory, with PUSH decrementing RSP and POP incrementing it.
Stack Direction: The x86 stack grows downward (toward lower addresses). PUSH decrements RSP, POP increments RSP. The stack must be 16-byte aligned before CALL instructions.

PUSH & POP Instructions

Basics

Stack Operations

; PUSH: Decrement RSP, store value
push rax                ; RSP -= 8; [RSP] = RAX
push qword 42           ; Push immediate
push qword [var]        ; Push memory value

; POP: Load value, increment RSP
pop rbx                 ; RBX = [RSP]; RSP += 8
pop qword [var]         ; Pop to memory

; Equivalent operations (except that SUB/ADD also modify the flags):
sub rsp, 8              ; Equivalent to...
mov [rsp], rax          ; ...push rax

mov rbx, [rsp]          ; Equivalent to...
add rsp, 8              ; ...pop rbx
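
Because the stack is last-in, first-out, values come back in the reverse of the order they were pushed. The sketch below makes that visible by deliberately popping in the same order as the pushes, which swaps two registers. It is an illustration only (the register choice and constants are arbitrary, and XCHG does the same job directly):

```nasm
; Illustration only: swap RAX and RBX through the stack
mov rax, 1
mov rbx, 2
push rax                ; stack top: 1
push rbx                ; stack top: 2, with 1 beneath it
pop rax                 ; RAX = 2 (the last value pushed)
pop rbx                 ; RBX = 1; RSP is back where it started
```

Restoring saved registers correctly therefore means popping in the opposite order to the pushes.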

RSP Management & Alignment

Both x86-64 ABIs covered here (System V and Windows x64) require RSP to be 16-byte aligned at the point of a CALL. Because CALL then pushes an 8-byte return address, RSP is 8 bytes off a 16-byte boundary on entry to the called function, and the prologue has to account for that.

Before CALL:      RSP = 0x7FFF0010  (16-byte aligned: last nibble is 0)
After CALL:       RSP = 0x7FFF0008  (misaligned by 8 due to return address)
After PUSH RBP:   RSP = 0x7FFF0000  (aligned again)
Alignment Rule: Before calling any function (including library functions), ensure RSP is 16-byte aligned. A misaligned stack can crash inside the callee: aligned SSE loads and stores such as MOVAPS fault on a misaligned address, which typically shows up as a segmentation fault.
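; Proper stack alignment patterns (RSP is 8 mod 16 on function entry):

; Pattern 1: Push an odd number of 8-byte values, then allocate a multiple of 16
my_function:
    push rbp            ; 1 push: RSP now 16-byte aligned
    push rbx            ; 2 pushes: misaligned by 8 (saves callee-saved)
    push r12            ; 3 pushes: aligned again
    sub rsp, 32         ; Locals: a multiple of 16 keeps RSP aligned

; Pattern 2: Use AND to force alignment
    mov rbp, rsp        ; Remember the unaligned RSP (RBP already saved)
    and rsp, -16        ; Force 16-byte alignment (clear low 4 bits)
    ; ... restore later with: mov rsp, rbp

; Pattern 3: No pushes, so the allocation itself must be 8 mod 16
    sub rsp, 40         ; 32 for shadow space + 8 for alignment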

Stack Frames

Stack frame layout — return address, saved frame pointer (RBP), and local variables arranged in the function’s stack frame.

Function Prologue

; Standard prologue (with frame pointer)
my_function:
    push rbp            ; Save caller's frame pointer
    mov rbp, rsp        ; Establish new frame pointer
    sub rsp, 32         ; Allocate local variables + alignment

    ; Function body...

Function Epilogue

    ; End of function body...
    mov rsp, rbp        ; Deallocate locals
    pop rbp             ; Restore caller's frame pointer
    ret                 ; Return to caller

; Or use LEAVE instruction:
    leave               ; Equivalent to: mov rsp,rbp + pop rbp
    ret
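
To see the prologue and epilogue working together, here is a minimal sketch of a complete function that keeps two locals in its frame. The name sum_of_squares and its signature are made up for illustration, and it assumes the System V argument registers (a in RDI, b in RSI). Locals sit at negative offsets from RBP; the saved RBP is at [rbp] and the return address at [rbp + 8]:

```nasm
; Hypothetical example: int64 sum_of_squares(int64 a, int64 b), System V
global sum_of_squares
section .text
sum_of_squares:
    push rbp                ; save caller's frame pointer
    mov rbp, rsp            ; RBP now anchors this frame
    sub rsp, 16             ; two 8-byte locals; RSP stays 16-byte aligned

    mov rax, rdi
    imul rax, rdi
    mov [rbp - 8], rax      ; local 1 = a * a

    mov rax, rsi
    imul rax, rsi
    mov [rbp - 16], rax     ; local 2 = b * b

    mov rax, [rbp - 8]
    add rax, [rbp - 16]     ; return a*a + b*b in RAX

    leave                   ; mov rsp, rbp / pop rbp
    ret
```

A real compiler would keep both products in registers; the locals are there only to show how frame offsets are addressed.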

System V AMD64 ABI (Linux/macOS)

System V AMD64 ABI calling convention — integer arguments passed in RDI, RSI, RDX, RCX, R8, R9 with callee-saved registers RBX, RBP, R12–R15.

Parameter Passing

Reference

System V Register Usage

Purpose             Registers
Integer Args (1-6)  RDI, RSI, RDX, RCX, R8, R9
Float Args (1-8)    XMM0-XMM7
Return Value        RAX (int), XMM0 (float)
Callee-Saved        RBX, RBP, R12-R15
Caller-Saved        RAX, RCX, RDX, RSI, RDI, R8-R11
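
The table covers the first six integer arguments. Under System V, any further ones go on the stack, pushed right to left, and the caller removes them after the call. A hedged sketch with two hypothetical routines, sum8 and call_sum8 (neither comes from a real library), shows both sides:

```nasm
; Hypothetical: int64 sum8(a, b, c, d, e, f, g, h), System V AMD64
global sum8
global call_sum8
section .text

sum8:
    push rbp
    mov rbp, rsp
    lea rax, [rdi + rsi]    ; a + b
    add rax, rdx            ; + c
    add rax, rcx            ; + d
    add rax, r8             ; + e
    add rax, r9             ; + f
    add rax, [rbp + 16]     ; + g (7th argument, first stack slot)
    add rax, [rbp + 24]     ; + h (8th argument)
    pop rbp
    ret

call_sum8:
    sub rsp, 8              ; entry RSP is 8 mod 16; pad so the CALL is aligned
    push 8                  ; h: stack arguments are pushed right to left
    push 7                  ; g
    mov edi, 1              ; a
    mov esi, 2              ; b
    mov edx, 3              ; c
    mov ecx, 4              ; d
    mov r8d, 5              ; e
    mov r9d, 6              ; f
    call sum8               ; RAX = 36
    add rsp, 24             ; drop two stack arguments + the padding
    ret
```

Inside sum8, [rbp] holds the saved RBP and [rbp + 8] the return address, which is why the first stack argument starts at [rbp + 16].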

Return Values

Functions return values in registers based on type:

; Integer return (up to 64 bits)
my_int_func:
    mov rax, 42         ; Return value in RAX
    ret

; 128-bit integer return
my_128bit_func:
    mov rax, low_part   ; Low 64 bits in RAX
    mov rdx, high_part  ; High 64 bits in RDX
    ret

; Floating-point return
my_float_func:
    movsd xmm0, [my_double]  ; Return double in XMM0
    ret

; Struct return
; Structs ≤ 16 bytes with integer fields: low 8 bytes in RAX, high 8 bytes in RDX
; (fields classified as floating point use XMM0/XMM1 instead)
; Larger structs: caller passes hidden pointer as first arg
return_large_struct:
    ; RDI contains hidden pointer to return buffer
    mov [rdi], rax      ; Store struct data (first two fields shown)
    mov [rdi+8], rbx
    mov rax, rdi        ; Return the pointer in RAX
    ret

Callee-Saved Registers

If your function uses callee-saved registers, you must preserve their values:

; Function that uses callee-saved registers
process_data:
    ; Save registers we'll use
    push rbx            ; Must preserve
    push r12            ; Must preserve
    push r13            ; Must preserve
    
    ; Now safe to use RBX, R12, R13
    mov rbx, rdi        ; Save parameter
    mov r12, rsi
    xor r13, r13        ; Counter
    
.loop:
    ; ... processing ...
    inc r13
    cmp r13, r12
    jl .loop
    
    mov rax, r13        ; Return value
    
    ; Restore in reverse order!
    pop r13
    pop r12
    pop rbx
    ret
Memory Trick: Callee-saved = "Belongs to caller, Borrowed by callee." Think of it as borrowing a book—you must return it exactly as received.
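
The same rule is what lets a function keep a value alive across a call. In the sketch below, total_len is a hypothetical helper that returns strlen(a) + strlen(b). It assumes Linux, ELF64 output and linking against libc (the `wrt ..plt` suffix is NASM's ELF syntax for calling through the PLT in position-independent executables). RSI is caller-saved, so b is parked in RBX before the first call:

```nasm
; Hypothetical: size_t total_len(const char *a, const char *b), System V
extern strlen
global total_len
section .text
total_len:
    push rbx                ; save callee-saved RBX; RSP is now 16-byte aligned
    mov rbx, rsi            ; b must survive the call, RSI may be clobbered
    call strlen wrt ..plt   ; RAX = strlen(a); RDI already holds a
    mov rdi, rbx            ; b becomes the first argument
    mov rbx, rax            ; strlen(a) must survive the next call
    call strlen wrt ..plt   ; RAX = strlen(b)
    add rax, rbx            ; RAX = strlen(a) + strlen(b)
    pop rbx                 ; restore the caller's RBX
    ret
```

The single PUSH does double duty here: it preserves RBX and brings RSP back to a 16-byte boundary before each CALL.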

Windows x64 Calling Convention

Shadow Space (Home Space)

Windows Requirement: Callers must allocate 32 bytes (4 × 8) of "shadow space" on the stack before every CALL, even if the function takes fewer than 4 arguments.
; Calling a Windows function
sub rsp, 40             ; 32 bytes shadow + 8 for alignment
mov rcx, param1         ; First parameter
mov rdx, param2         ; Second parameter
call SomeFunction
add rsp, 40             ; Clean up

Parameter Passing

Windows x64 uses a four-register calling convention:

Argument  Integer/Pointer  Float/Double
1st       RCX              XMM0
2nd       RDX              XMM1
3rd       R8               XMM2
4th       R9               XMM3
5th+      Stack: [RSP+32], [RSP+40], ... at the call site; [RSP+40], [RSP+48], ... as seen by the callee on entry
; Call Windows API: MessageBoxA(hWnd, lpText, lpCaption, uType)
default rel                 ; RIP-relative addressing for [msg] and [title]
extern MessageBoxA
global main

section .data
    msg db "Hello, Windows!", 0
    title db "My App", 0

section .text
main:
    sub rsp, 40             ; Shadow space (32) + alignment (8) = 40

    xor ecx, ecx            ; hWnd = NULL
    lea rdx, [msg]          ; lpText
    lea r8, [title]         ; lpCaption
    xor r9d, r9d            ; uType = MB_OK (0)
    call MessageBoxA

    xor eax, eax            ; Return 0 from main
    add rsp, 40
    ret
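
Arguments beyond the fourth go on the stack just above the shadow space. The following is a minimal sketch with two hypothetical routines, sum6 and call_sum6, written for illustration only. It omits the unwind data that production Windows code should supply for any function that adjusts RSP:

```nasm
; Hypothetical: int64 sum6(a, b, c, d, e, f), Windows x64
global sum6
global call_sum6
section .text

sum6:                       ; leaf: RSP untouched, offsets are from entry RSP
    lea rax, [rcx + rdx]    ; a + b
    add rax, r8             ; + c
    add rax, r9             ; + d
    add rax, [rsp + 40]     ; + e: 8 (return address) + 32 (shadow space)
    add rax, [rsp + 48]     ; + f
    ret

call_sum6:
    sub rsp, 56             ; 32 shadow + 16 stack arguments + 8 alignment
    mov ecx, 1              ; a
    mov edx, 2              ; b
    mov r8d, 3              ; c
    mov r9d, 4              ; d
    mov qword [rsp + 32], 5 ; e sits just above the shadow space
    mov qword [rsp + 40], 6 ; f
    call sum6               ; RAX = 21
    add rsp, 56
    ret
```

The caller writes the fifth argument at [rsp + 32]; once CALL has pushed the return address, the callee finds the same slot at [rsp + 40].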

Key Differences: System V vs Windows x64

Aspect             System V (Linux)                Windows x64
Args in registers  6 (RDI, RSI, RDX, RCX, R8, R9)  4 (RCX, RDX, R8, R9)
Shadow space       No                              Yes (32 bytes mandatory)
Red zone           Yes (128 bytes)                 No
Callee-saved XMM   None                            XMM6-XMM15
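
The callee-saved XMM row matters as soon as a Windows routine wants XMM6 or higher as scratch. A hedged sketch follows; add_scaled is a made-up function, and using XMM6 here is contrived purely to show the save and restore (a volatile register such as XMM3 would avoid the work entirely):

```nasm
; Hypothetical: double add_scaled(double x, double y, double k), Windows x64
; Returns x*k + y*k. Arguments arrive in XMM0, XMM1, XMM2.
global add_scaled
section .text
add_scaled:
    sub rsp, 24             ; entry RSP is 8 mod 16; 8 + 24 = 32, so aligned
    movaps [rsp], xmm6      ; save callee-saved XMM6 (aligned 16-byte store)

    movaps xmm6, xmm2       ; XMM6 = k
    mulsd xmm0, xmm6        ; x * k
    mulsd xmm1, xmm6        ; y * k
    addsd xmm0, xmm1        ; return value in XMM0

    movaps xmm6, [rsp]      ; restore XMM6 before returning
    add rsp, 24
    ret
```

MOVAPS requires a 16-byte-aligned address, which is why the allocation is 24 rather than 16 bytes. On System V the same routine could overwrite XMM6 without saving it.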

Leaf Functions

A leaf function is one that doesn't call other functions. These can be heavily optimized:

Leaf vs non-leaf functions — leaf functions that make no calls can skip stack frame setup for significant performance gains.
  • No need to set up a stack frame
  • No need for frame pointer (RBP)
  • Can use red zone on System V
  • Minimal prologue/epilogue
; Non-leaf function (calls printf)
; Must follow full convention
print_value:
    push rbp
    mov rbp, rsp
    sub rsp, 16         ; Alignment + locals
    
    mov rsi, rdi        ; Arg for printf
    lea rdi, [rel fmt]  ; Format string (RIP-relative, works in PIE builds)
    xor eax, eax        ; 0 vector args
    call printf
    
    leave
    ret

; Leaf function (optimized, no calls)
; Can skip frame setup entirely
add_three:
    lea rax, [rdi + rsi]    ; rax = a + b
    add rax, rdx            ; rax += c
    ret                     ; No prologue/epilogue needed!
Compiler Optimization: In optimized builds for x86-64, GCC and Clang typically omit the frame pointer for leaf and non-leaf functions alike. -fomit-frame-pointer requests this explicitly; -fno-omit-frame-pointer keeps RBP as a frame pointer, which makes debugging and profiling easier.

Red Zone (System V)

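The red zone is a 128-byte area below RSP that leaf functions can use without adjusting RSP. The System V ABI guarantees that signal and interrupt handlers will not overwrite it in user-mode code. Kernel code gets no such guarantee, which is why kernels are typically built with -mno-red-zone.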

The System V red zone — a 128-byte area below RSP that leaf functions can use without adjusting the stack pointer.
Stack Layout with Red Zone:

        │             │
        ├─────────────┤
        │ Return Addr │ ← RSP points here
        ├─────────────┤
        │             │
        │  Red Zone   │ ← 128 bytes, usable without SUB RSP
        │  (128 bytes)│
        │             │
        ├─────────────┤
        │ Danger Zone │ ← Below red zone, may be clobbered
        └─────────────┘

Using the Red Zone

; Leaf function using red zone (System V only!)
compute_hash:
    ; No stack frame needed - use red zone for locals
    mov [rsp - 8], rbx      ; Save RBX in red zone
    mov [rsp - 16], r12     ; Save R12 in red zone
    
    ; Use RBX and R12 freely...
    mov rbx, rdi
    xor r12, r12
    
    ; ... computation ...
    
    ; Restore and return
    mov rbx, [rsp - 8]
    mov r12, [rsp - 16]
    ret
Windows Warning: Windows x64 has NO red zone! The area below RSP can be clobbered by interrupts, exception handlers, or debuggers. Always allocate stack space explicitly on Windows.

Exercise: Cross-Platform Function

Write a function that works on both Linux and Windows:

; Cross-platform compatible function
multiply_add:
    ; Assemble with -DWINDOWS for Windows x64; otherwise System V is assumed
    ; Linux: args in RDI, RSI, RDX
    ; Windows: args in RCX, RDX, R8
%ifdef WINDOWS
    mov rax, rcx
    imul rax, rdx
    add rax, r8
%else
    mov rax, rdi
    imul rax, rsi
    add rax, rdx
%endif
    ret
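
One possible variation, sketched under the same -DWINDOWS assumption, hides the register differences behind macro names so the body is written once. ARG1 to ARG3 are hypothetical names chosen for this example, not NASM built-ins:

```nasm
; Sketch: map argument positions to registers once, then share the body.
; Assemble with -DWINDOWS for Windows x64; the default branch is System V.
%ifdef WINDOWS
    %define ARG1 rcx
    %define ARG2 rdx
    %define ARG3 r8
%else
    %define ARG1 rdi
    %define ARG2 rsi
    %define ARG3 rdx
%endif

global multiply_add
section .text
multiply_add:               ; returns ARG1 * ARG2 + ARG3
    mov rax, ARG1
    imul rax, ARG2
    add rax, ARG3
    ret
```

This only stays correct while the body avoids clobbering a register that is an argument on one platform but not the other. RDX is ARG2 on Windows and ARG3 on System V, so a one-operand MUL, which overwrites RDX, would need extra care.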