x86 Assembly Series Part 8: Stack Operations & Calling Conventions

February 6, 2026 Wasil Zafar 35 min read

Master stack operations (push/pop), stack frames, function prologues and epilogues, System V AMD64 ABI for Linux, Windows x64 calling convention with shadow space, and register preservation rules.

Table of Contents

  1. Stack Fundamentals
  2. Stack Frames
  3. System V AMD64 ABI
  4. Windows x64 Convention
  5. Leaf Functions
  6. Red Zone (System V)

Stack Fundamentals

Stack Direction: The x86 stack grows downward (toward lower addresses). PUSH decrements RSP, POP increments RSP. The stack must be 16-byte aligned before CALL instructions.

PUSH & POP Instructions

Basics

Stack Operations

; PUSH: Decrement RSP, store value
push rax                ; RSP -= 8; [RSP] = RAX
push qword 42           ; Push immediate
push qword [var]        ; Push memory value

; POP: Load value, increment RSP
pop rbx                 ; RBX = [RSP]; RSP += 8
pop qword [var]         ; Pop to memory

; Equivalent operations (except that SUB/ADD modify the flags; PUSH/POP do not):
sub rsp, 8              ; Equivalent to...
mov [rsp], rax          ; ...push rax

mov rbx, [rsp]          ; Equivalent to...
add rsp, 8              ; ...pop rbx
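
The push/pop semantics above can be modeled in a few lines of C. This is only an illustrative sketch: `ModelStack`, `model_push`, and `model_pop` are hypothetical names, and the 64-byte array stands in for real stack memory, with a byte offset playing the role of RSP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a tiny "stack" that grows toward lower offsets. */
enum { MODEL_STACK_BYTES = 64 };

typedef struct {
    uint8_t mem[MODEL_STACK_BYTES];
    size_t  rsp;                      /* byte offset into mem; stands in for RSP */
} ModelStack;

static void model_init(ModelStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = MODEL_STACK_BYTES;       /* empty stack: RSP at the highest offset */
}

/* push: RSP -= 8, then store the value at [RSP] */
static void model_push(ModelStack *s, uint64_t value) {
    assert(s->rsp >= 8);              /* otherwise the model stack is full */
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, 8);
}

/* pop: load the value at [RSP], then RSP += 8 */
static uint64_t model_pop(ModelStack *s) {
    uint64_t value;
    assert(s->rsp + 8 <= MODEL_STACK_BYTES);   /* otherwise the stack is empty */
    memcpy(&value, &s->mem[s->rsp], 8);
    s->rsp += 8;
    return value;
}
```

Pushing two values and popping them returns them in reverse order, and the offset moves by 8 bytes per operation, just as RSP does.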

RSP Management & Alignment

Both 64-bit ABIs (System V and Windows x64) require RSP to be 16-byte aligned at the point of a CALL instruction. Because CALL pushes an 8-byte return address, RSP is 8 bytes past a 16-byte boundary on entry to the called function.

Before CALL:  RSP = 0x7FFF0010  (16-byte aligned: last nibble is 0)
After CALL:   RSP = 0x7FFF0008  (misaligned by 8 due to return address)
After PUSH:   RSP = 0x7FFF0000  (aligned again)
Alignment Rule: Before calling any function (including library functions), ensure RSP is 16-byte aligned. A misaligned stack can crash the callee with a segmentation fault when it uses SSE instructions that require aligned memory operands (such as MOVAPS) on stack data.
; Proper stack alignment patterns:

; Pattern 1: Push an odd number of 8-byte values
my_function:
    push rbp            ; RSP now 16-byte aligned
    push rbx            ; Callee-saved; RSP misaligned by 8 again
    push r12            ; Callee-saved; RSP aligned again (3 pushes)
    sub rsp, 32         ; Allocate locals in a multiple of 16 to stay aligned
    
; Pattern 2: Use AND to force alignment
    mov rbp, rsp
    and rsp, -16        ; Force 16-byte alignment (clear low 4 bits)
    
; Pattern 3: Adjust allocation size
    sub rsp, 40         ; 32 for shadow space + 8 for alignment
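
The arithmetic behind these patterns can be written down once and reused. The following C helper is a sketch, not part of any ABI: `aligned_sub_amount` is a hypothetical name, and it assumes the function was entered via CALL from a correctly aligned caller, so RSP is 8 bytes past a 16-byte boundary on entry.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the smallest SUB RSP amount that holds local_bytes and leaves
   RSP 16-byte aligned, given how many 8-byte PUSHes ran since entry.
   Assumption: on entry, RSP is 8 bytes past a 16-byte boundary. */
static size_t aligned_sub_amount(size_t pushes, size_t local_bytes) {
    size_t used   = 8 + 8 * pushes;                  /* return address + pushes */
    size_t total  = used + local_bytes;
    size_t padded = (total + 15) & ~(size_t)15;      /* round up to multiple of 16 */
    size_t sub    = padded - used;
    assert((used + sub) % 16 == 0);                  /* RSP ends up aligned */
    return sub;
}
```

For example, three pushes and 24 bytes of locals give 32, while no pushes and 32 bytes of shadow space give 40.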

Stack Frames

Function Prologue

; Standard prologue (with frame pointer)
my_function:
    push rbp            ; Save caller's frame pointer
    mov rbp, rsp        ; Establish new frame pointer
    sub rsp, 32         ; Allocate local variables + alignment

    ; Function body...

Function Epilogue

    ; End of function body...
    mov rsp, rbp        ; Deallocate locals
    pop rbp             ; Restore caller's frame pointer
    ret                 ; Return to caller

; Or use LEAVE instruction:
    leave               ; Equivalent to: mov rsp,rbp + pop rbp
    ret

System V AMD64 ABI (Linux/macOS)

Parameter Passing

Reference

System V Register Usage

Purpose              Registers
Integer Args (1-6)   RDI, RSI, RDX, RCX, R8, R9
Float Args (1-8)     XMM0-XMM7
Return Value         RAX (int), XMM0 (float)
Callee-Saved         RBX, RBP, R12-R15
Caller-Saved         RAX, RCX, RDX, RSI, RDI, R8-R11
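
System V counts integer and floating-point arguments separately, so the third parameter of a function can still land in the second integer register. The C sketch below illustrates that rule for simple scalar arguments only (no structs or varargs); `SysvArgClass`, `sysv_assign`, and the table names are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SYSV_ARG_INT, SYSV_ARG_FLOAT } SysvArgClass;

static const char *const SYSV_INT_REGS[6] =
    { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const SYSV_SSE_REGS[8] =
    { "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7" };
static const char *const SYSV_ON_STACK = "stack";

/* Fills out[i] with the location of argument i.
   Integer and float registers are handed out from independent counters. */
static void sysv_assign(const SysvArgClass *args, size_t n, const char **out) {
    size_t next_int = 0, next_sse = 0;
    assert(n == 0 || (args != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (args[i] == SYSV_ARG_INT)
            out[i] = next_int < 6 ? SYSV_INT_REGS[next_int++] : SYSV_ON_STACK;
        else
            out[i] = next_sse < 8 ? SYSV_SSE_REGS[next_sse++] : SYSV_ON_STACK;
    }
}
```

For a signature like f(long, double, long), this assigns RDI, XMM0, and RSI; a seventh integer argument falls through to the stack.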

Return Values

Functions return values in registers based on type:

; Integer return (up to 64 bits)
my_int_func:
    mov rax, 42         ; Return value in RAX
    ret

; 128-bit integer return
my_128bit_func:
    mov rax, low_part   ; Low 64 bits in RAX
    mov rdx, high_part  ; High 64 bits in RDX
    ret

; Floating-point return
my_float_func:
    movsd xmm0, [my_double]  ; Return double in XMM0
    ret

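; Struct return
; Structs ≤ 16 bytes with integer fields: returned in RAX (bytes 0-7) and RDX (bytes 8-15)
; Larger structs: caller passes a hidden pointer as the first argument
return_large_struct:
    ; RDI contains hidden pointer to return buffer,
    ; so the explicit arguments start at RSI
    mov [rdi], rsi      ; Store first field (from first explicit arg)
    mov [rdi+8], rdx    ; Store second field (from second explicit arg)
    mov rax, rdi        ; Return the pointer in RAX
    ret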

Callee-Saved Registers

If your function uses callee-saved registers, you must preserve their values:

; Function that uses callee-saved registers
process_data:
    ; Save registers we'll use
    push rbx            ; Must preserve
    push r12            ; Must preserve
    push r13            ; Must preserve
    
    ; Now safe to use RBX, R12, R13
    mov rbx, rdi        ; Save parameter
    mov r12, rsi
    xor r13, r13        ; Counter
    
.loop:
    ; ... processing ...
    inc r13
    cmp r13, r12
    jl .loop
    
    mov rax, r13        ; Return value
    
    ; Restore in reverse order!
    pop r13
    pop r12
    pop rbx
    ret
Memory Trick: Callee-saved = "Belongs to caller, Borrowed by callee." Think of it as borrowing a book—you must return it exactly as received.

Windows x64 Calling Convention

Shadow Space (Home Space)

Windows Requirement: Callers must allocate 32 bytes (4 × 8) of "shadow space" on the stack before every CALL, even if the function takes fewer than 4 arguments.
; Calling a Windows function
sub rsp, 40             ; 32 bytes shadow + 8 for alignment
mov rcx, param1         ; First parameter
mov rdx, param2         ; Second parameter
call SomeFunction
add rsp, 40             ; Clean up
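
The 40 in this example is the result of a small calculation that changes once a call has more than four arguments. The C sketch below works it out under stated assumptions: the calling function pushes nothing, has no locals of its own, and was entered with RSP 8 bytes past a 16-byte boundary. The names `win64_outgoing_bytes` and `win64_sub_amount` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { WIN64_SHADOW_BYTES = 32 };

/* Bytes the caller must reserve for one call: shadow space plus
   one 8-byte slot for every argument beyond the fourth. */
static size_t win64_outgoing_bytes(size_t nargs) {
    size_t stack_args = nargs > 4 ? nargs - 4 : 0;
    return WIN64_SHADOW_BYTES + 8 * stack_args;
}

/* SUB RSP amount that reserves those bytes and aligns RSP for the CALL.
   Assumption: no pushes or locals, entry RSP is 8 past a 16-byte boundary. */
static size_t win64_sub_amount(size_t nargs) {
    size_t bytes = win64_outgoing_bytes(nargs);
    if (bytes % 16 != 8)
        bytes += 8;                     /* 8 bytes of alignment padding */
    assert((8 + bytes) % 16 == 0);      /* RSP is aligned at the CALL */
    return bytes;
}
```

Calls with up to five arguments need 40 bytes under these assumptions, and a six-argument call needs 56.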

Parameter Passing

Windows x64 uses a four-register calling convention:

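Argument   Integer/Pointer   Float/Double
1st        RCX               XMM0
2nd        RDX               XMM1
3rd        R8                XMM2
4th        R9                XMM3
5th+       Stack: [RSP+40], [RSP+48], ... as seen by the callee on entry ([RSP+32], [RSP+40], ... in the caller just before CALL)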
; Call Windows API: MessageBoxA(hWnd, lpText, lpCaption, uType)
default rel                 ; Use RIP-relative addressing for [msg] and [title]
extern MessageBoxA
global main

section .data
    msg db "Hello, Windows!", 0
    title db "My App", 0

section .text
main:
    sub rsp, 40             ; Shadow space (32) + alignment (8) = 40
    
    xor ecx, ecx            ; hWnd = NULL
    lea rdx, [msg]          ; lpText
    lea r8, [title]         ; lpCaption
    mov r9d, 0              ; uType = MB_OK
    call MessageBoxA
    
    add rsp, 40
    ret
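
Windows x64 assigns registers by argument position, so a double in the second position uses XMM1 even when no other argument is a float. The C sketch below captures that rule for simple scalar arguments, along with the stack slot each argument owns on callee entry; `Win64ArgClass`, `Win64ArgLoc`, and `win64_locate` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WIN64_ARG_INT, WIN64_ARG_FLOAT } Win64ArgClass;

typedef struct {
    const char *reg;          /* register name, or NULL if passed on the stack */
    size_t      entry_offset; /* the argument's slot is [RSP + entry_offset]
                                 on callee entry (home slot for register args) */
} Win64ArgLoc;

static const char *const WIN64_INT_REGS[4] = { "rcx", "rdx", "r8", "r9" };
static const char *const WIN64_SSE_REGS[4] = { "xmm0", "xmm1", "xmm2", "xmm3" };

/* index is the zero-based argument position. */
static Win64ArgLoc win64_locate(Win64ArgClass cls, size_t index) {
    Win64ArgLoc loc;
    assert(cls == WIN64_ARG_INT || cls == WIN64_ARG_FLOAT);
    loc.entry_offset = 8 + 8 * index;   /* skip the return address at [RSP] */
    if (index < 4)
        loc.reg = (cls == WIN64_ARG_INT) ? WIN64_INT_REGS[index]
                                         : WIN64_SSE_REGS[index];
    else
        loc.reg = NULL;                 /* 5th and later arguments */
    return loc;
}
```

The stack slots of the fifth and later arguments simply continue the sequence of the four shadow-space home slots, which is why the fifth argument sits at RSP+40 on entry.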

Key Differences: System V vs Windows x64

Aspect              System V (Linux)                 Windows x64
Args in registers   6 (RDI, RSI, RDX, RCX, R8, R9)   4 (RCX, RDX, R8, R9)
Shadow space        No                               Yes (32 bytes mandatory)
Red zone            Yes (128 bytes)                  No
Callee-saved XMM    None                             XMM6-XMM15

Leaf Functions

A leaf function is one that doesn't call other functions. These can be heavily optimized:

  • No need to set up a stack frame
  • No need for frame pointer (RBP)
  • Can use red zone on System V
  • Minimal prologue/epilogue
; Non-leaf function (calls printf)
; Must follow full convention
print_value:
    push rbp
    mov rbp, rsp
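    sub rsp, 16         ; Room for locals (a multiple of 16 keeps alignment)
    
    mov rsi, rdi        ; Arg for printf
    lea rdi, [rel fmt]  ; Format string (assumed to be defined in .data)
    xor eax, eax        ; AL = 0: no vector registers used by this variadic call
    call printf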
    
    leave
    ret

; Leaf function (optimized, no calls)
; Can skip frame setup entirely
add_three:
    lea rax, [rdi + rsi]    ; rax = a + b
    add rax, rdx            ; rax += c
    ret                     ; No prologue/epilogue needed!
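Compiler Optimization: GCC and Clang typically omit the frame pointer in optimized builds on x86-64, for leaf and non-leaf functions alike (GCC enables -fomit-frame-pointer at -O1 and higher). Pass -fno-omit-frame-pointer to keep RBP as a frame pointer when you need easier debugging or profiling.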

Red Zone (System V)

The red zone is a 128-byte area below RSP that leaf functions can use without adjusting RSP. In user-mode code, the ABI guarantees that signal and interrupt handlers will not overwrite it.

Stack Layout with Red Zone:

        │             │
        ├─────────────┤
        │ Return Addr │ ← RSP points here
        ├─────────────┤
        │             │
        │  Red Zone   │ ← 128 bytes, usable without SUB RSP
        │  (128 bytes)│
        │             │
        ├─────────────┤
        │ Danger Zone │ ← Below red zone, may be clobbered
        └─────────────┘
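
Whether a given access stays inside the red zone is a simple range check. The C sketch below spells it out; `in_red_zone` is a hypothetical helper name, and the addresses are plain integers rather than real pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { RED_ZONE_BYTES = 128 };

/* Returns true if the access [addr, addr + size) lies entirely inside
   the 128 bytes immediately below rsp. */
static bool in_red_zone(uint64_t rsp, uint64_t addr, uint64_t size) {
    assert(rsp >= RED_ZONE_BYTES);            /* keeps the subtractions in range */
    return size > 0
        && size <= RED_ZONE_BYTES
        && addr >= rsp - RED_ZONE_BYTES       /* not below the red zone */
        && addr <= rsp - size;                /* ends at or below RSP */
}
```

An 8-byte store to [RSP - 8] or [RSP - 128] passes the check, while [RSP - 136] falls into the area that may be clobbered.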

Using the Red Zone

; Leaf function using red zone (System V only!)
compute_hash:
    ; No stack frame needed - use red zone for locals
    mov [rsp - 8], rbx      ; Save RBX in red zone
    mov [rsp - 16], r12     ; Save R12 in red zone
    
    ; Use RBX and R12 freely...
    mov rbx, rdi
    xor r12, r12
    
    ; ... computation ...
    
    ; Restore and return
    mov rbx, [rsp - 8]
    mov r12, [rsp - 16]
    ret
Windows Warning: Windows x64 has NO red zone! The area below RSP can be clobbered by interrupts, exception handlers, or debuggers. Always allocate stack space explicitly on Windows.

Exercise: Cross-Platform Function

Write a function that works on both Linux and Windows:

; Cross-platform compatible function
multiply_add:
    ; Works on both Linux and Windows (assemble with -DWINDOWS for Windows)
    ; Linux: args in RDI, RSI, RDX
    ; Windows: args in RCX, RDX, R8
%ifdef WINDOWS
    mov rax, rcx
    imul rax, rdx
    add rax, r8
%else
    mov rax, rdi
    imul rax, rsi
    add rax, rdx
%endif
    ret
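
When checking the exercise, it helps to have a reference to compare against. The C sketch below is one possible reference for multiply_add, assuming 64-bit signed arguments that wrap on overflow like IMUL and ADD do; `multiply_add_ref`, `MulAddFn`, and `check_multiply_add` are hypothetical names, and the test inputs are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: a * b + c with 64-bit wraparound. Unsigned arithmetic is used
   so that overflow wraps instead of being undefined behavior in C. */
static int64_t multiply_add_ref(int64_t a, int64_t b, int64_t c) {
    uint64_t r = (uint64_t)a * (uint64_t)b + (uint64_t)c;
    return (int64_t)r;
}

typedef int64_t (*MulAddFn)(int64_t, int64_t, int64_t);

/* Asserts that fn agrees with the reference on a few sample inputs. */
static void check_multiply_add(MulAddFn fn) {
    static const int64_t cases[][3] = {
        { 2, 3, 4 }, { -5, 7, 1 }, { 0, 123, -9 }, { INT64_MAX, 2, 1 }
    };
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
        assert(fn(cases[i][0], cases[i][1], cases[i][2]) ==
               multiply_add_ref(cases[i][0], cases[i][1], cases[i][2]));
}
```

Declaring the assembly routine as `extern int64_t multiply_add(int64_t, int64_t, int64_t);` and passing it to `check_multiply_add` lets the C compiler handle the platform's calling convention on the caller side.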