x86 Assembly Series Part 9: Control Flow Instructions

February 6, 2026 Wasil Zafar 30 min read

Master x86 control flow: unconditional jumps, flag-based conditional branches, the CMP and TEST instructions, loop constructs, and branch prediction for performance.

Table of Contents

  1. Unconditional Jumps
  2. CMP & TEST
  3. Conditional Jumps
  4. Common Patterns
  5. LOOP Instructions
  6. Branch Prediction
  7. CMOVcc (Branchless)

Unconditional Jumps

jmp label           ; Jump to label (relative)
jmp rax             ; Jump to address in RAX (indirect)
jmp [table + rax*8] ; Jump table access

CMP & TEST Instructions

CMP vs TEST — CMP sets flags based on subtraction (dst − src) while TEST uses bitwise AND, both without storing the result.
Key Difference: CMP performs a subtraction (flags reflect dst - src), while TEST performs a bitwise AND (commonly used to check whether a register is zero or whether specific bits are set). Neither instruction modifies its operands.

cmp rax, rbx        ; Set flags based on RAX - RBX
cmp rax, 10         ; Compare RAX to immediate

test rax, rax       ; Is RAX zero? (sets ZF if RAX==0)
test rax, 1         ; Is bit 0 set? (check odd/even)
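
To make the difference concrete, here is a minimal sketch of a complete program that uses TEST to check a single bit and CMP to check for equality. The flag names FLAG_READ and FLAG_WRITE are hypothetical, and the build line assumes NASM on x86-64 Linux; the exit status is only used as a convenient way to observe which paths ran.

```nasm
; Assumed build (Linux x86-64): nasm -f elf64 flags_demo.asm && ld flags_demo.o -o flags_demo
global _start

FLAG_READ  equ 1 << 0           ; hypothetical bit flags, for illustration only
FLAG_WRITE equ 1 << 1

section .text
_start:
    mov eax, FLAG_READ          ; sample value: only the READ bit is set
    xor edi, edi                ; edi collects the result bits

    test eax, FLAG_WRITE        ; eax AND 2 == 0, so ZF=1; eax is unchanged
    jz .no_write                ; taken: the WRITE bit is clear
    or edi, 1                   ; bit 0 = "WRITE bit was set" (skipped here)
.no_write:
    cmp eax, FLAG_READ          ; eax - 1 == 0, so ZF=1; eax is unchanged
    jne .exit                   ; not taken: the values are equal
    or edi, 2                   ; bit 1 = "value equals FLAG_READ"
.exit:
    mov eax, 60                 ; Linux sys_exit
    syscall                     ; exit status = edi (2 for this input)
```

Running it and then `echo $?` should print 2: the TEST path reports the WRITE bit as clear, and the CMP path reports equality.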

Conditional Jumps

Conditional jump instruction families — signed (JG, JGE, JL, JLE) and unsigned (JA, JAE, JB, JBE) variants based on CPU flag states.

Signed Comparisons

Signed Jump Instructions

Instruction | Condition        | Flags
------------|------------------|----------------
JG / JNLE   | Greater than     | ZF=0 AND SF=OF
JGE / JNL   | Greater or equal | SF=OF
JL / JNGE   | Less than        | SF≠OF
JLE / JNG   | Less or equal    | ZF=1 OR SF≠OF

Unsigned Comparisons

Unsigned Jump Instructions

Instruction     | Condition      | Flags
----------------|----------------|----------------
JA / JNBE       | Above          | CF=0 AND ZF=0
JAE / JNB / JNC | Above or equal | CF=0
JB / JNAE / JC  | Below          | CF=1
JBE / JNA       | Below or equal | CF=1 OR ZF=1
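
The two families matter because the same bit pattern compares differently depending on interpretation. The following is a small sketch, assuming NASM on x86-64 Linux, that compares -1 with 1 using both a signed and an unsigned jump; the exit status records which comparison held.

```nasm
; Assumed build (Linux x86-64): nasm -f elf64 cmp_demo.asm && ld cmp_demo.o -o cmp_demo
global _start

section .text
_start:
    xor edi, edi            ; edi collects the result bits
    mov rax, -1             ; 0xFFFFFFFFFFFFFFFF
    mov rbx, 1

    cmp rax, rbx
    jnl .skip_signed        ; signed view: -1 < 1, so this jump is NOT taken
    or edi, 1               ; bit 0 = "signed less-than held"
.skip_signed:

    cmp rax, rbx            ; compare again: OR above changed the flags
    jnb .skip_unsigned      ; unsigned view: 2^64-1 >= 1, so this jump IS taken
    or edi, 2               ; bit 1 = "unsigned below held" (skipped here)
.skip_unsigned:

    mov eax, 60             ; Linux sys_exit
    syscall                 ; exit status = edi (1 for these inputs)
```

The exit status is 1: the value is less than 1 as a signed number, but not below 1 as an unsigned number. Choosing JL where JB was intended (or the reverse) is a common source of bugs.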

Flag-Based Jumps

These jumps test individual CPU flags directly:

Instruction | Meaning                      | Flag Test | Common Use
------------|------------------------------|-----------|---------------------------
JZ / JE     | Jump if Zero / Equal         | ZF=1      | After CMP, TEST
JNZ / JNE   | Jump if Not Zero / Not Equal | ZF=0      | Loop until zero
JS          | Jump if Sign (negative)      | SF=1      | Check negative result
JNS         | Jump if Not Sign             | SF=0      | Check non-negative result
JO          | Jump if Overflow             | OF=1      | Signed overflow check
JNO         | Jump if No Overflow          | OF=0      | Safe arithmetic path
JP / JPE    | Jump if Parity Even          | PF=1      | Rarely used (FP compares)
JNP / JPO   | Jump if Parity Odd           | PF=0      | Rarely used
JC          | Jump if Carry                | CF=1      | Unsigned overflow
JNC         | Jump if No Carry             | CF=0      | No borrow in subtraction

; Check for negative result
    sub rax, rbx
    js .handle_negative     ; Jump if result is negative (SF=1)
    ; ... non-negative path ...
    jmp .sub_done           ; Skip the handler so we don't fall into it
.handle_negative:
    neg rax                 ; Make it positive
.sub_done:

; Check for overflow in addition
    add eax, ebx
    jo .overflow_error      ; Signed overflow occurred!
    ; ... continue normally ...
    jmp .add_done           ; Skip the error handler on the normal path
.overflow_error:
    ; Handle overflow...
.add_done:

Common Control Flow Patterns

Common control flow patterns in assembly — if/else branching, counted loops, and switch/jump table implementations.

If/Else in Assembly

; C: if (rax == 10) { rbx = 1; } else { rbx = 0; }
    cmp rax, 10
    jne .else
    mov rbx, 1
    jmp .endif
.else:
    mov rbx, 0
.endif:
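
Compound conditions follow the same shape. As a sketch, here is how a short-circuit test such as `if (x > 0 && x < 10)` might be written: each sub-condition jumps out as soon as it fails. The function name in_range is hypothetical, and the System V AMD64 calling convention (argument in RDI, result in EAX) is assumed.

```nasm
; in_range: hypothetical helper, System V AMD64 calling convention assumed
;   rdi = x (signed); returns eax = 1 if 0 < x < 10, else 0
global in_range

section .text
in_range:
    xor eax, eax            ; result = 0 (the "false" answer)
    cmp rdi, 0
    jle .done               ; x <= 0: first condition fails, skip the second
    cmp rdi, 10
    jge .done               ; x >= 10: second condition fails
    mov eax, 1              ; both conditions held
.done:
    ret
```

Note that each jump uses the inverse of the C condition, exactly as `jne .else` does for `==` in the if/else example.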

Switch / Jump Tables

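For switch statements with dense, sequential cases, a jump table dispatches with a single indexed indirect jump, so its cost stays constant while a chain of if-else comparisons grows with the number of cases: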

section .data
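; Jump table: array of code addresses.
; NASM local labels belong to the preceding non-local label,
; so the cases are referenced by their full names (my_switch.case_N).
jump_table:
    dq my_switch.case_0
    dq my_switch.case_1
    dq my_switch.case_2
    dq my_switch.case_3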

section .text
; RDI contains switch value (0-3)
my_switch:
    ; Bounds check first!
    cmp rdi, 3
    ja .default             ; If > 3, go to default
    
    ; Load address from jump table and jump
    lea rax, [rel jump_table]
    jmp [rax + rdi*8]       ; Jump to case handler

.case_0:
    mov rax, 100
    jmp .end_switch
.case_1:
    mov rax, 200
    jmp .end_switch
.case_2:
    mov rax, 300
    jmp .end_switch
.case_3:
    mov rax, 400
    jmp .end_switch
.default:
    mov rax, -1
.end_switch:
    ret
Compiler Insight: GCC and Clang typically compile dense switch statements to jump tables when optimizing. Compile with the -S flag to inspect the assembly output!

For/While Loops

; C: for (int i = 0; i < 10; i++) { sum += array[i]; }
    xor ecx, ecx        ; i = 0
    xor eax, eax        ; sum = 0
.loop:
    cmp ecx, 10
    jge .done           ; if i >= 10, exit
    add eax, [array + rcx*4]
    inc ecx
    jmp .loop
.done:
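
The loop above executes two jumps per iteration (the exit test at the top and the `jmp` at the bottom). When the trip count is known to be at least one, the test can move to the bottom so each iteration has a single conditional branch. The following is a sketch of that form as a complete program, assuming NASM on x86-64 Linux; the array contents are made-up sample data.

```nasm
; Assumed build (Linux x86-64): nasm -f elf64 sum_demo.asm && ld sum_demo.o -o sum_demo
global _start

section .data
array: dd 1, 2, 3, 4, 5, 6, 7, 8, 9, 10    ; hypothetical sample data

section .text
_start:
    xor ecx, ecx            ; i = 0
    xor edi, edi            ; sum = 0
    lea rsi, [rel array]    ; base address of the array
.loop:
    add edi, [rsi + rcx*4]  ; sum += array[i]
    inc ecx                 ; i++
    cmp ecx, 10
    jl .loop                ; single branch per iteration: repeat while i < 10
    mov eax, 60             ; Linux sys_exit
    syscall                 ; exit status = sum (55)
```

If the count could be zero, a guard before the loop (for example `test`/`jz` on the count) is still needed, because the body here always runs at least once.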

LOOP Instructions

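The LOOP family decrements RCX (ECX with a 32-bit address size) and jumps if the result is non-zero. The decrement itself does not modify any flags:

Instruction     | Action                                  | Equivalent / Typical Use
----------------|-----------------------------------------|---------------------------------------------
LOOP label      | Decrement RCX, jump if RCX ≠ 0          | dec rcx; jnz label (but flags are preserved)
LOOPE / LOOPZ   | Decrement RCX, jump if RCX ≠ 0 AND ZF=1 | Find first non-match
LOOPNE / LOOPNZ | Decrement RCX, jump if RCX ≠ 0 AND ZF=0 | Find first match
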
; Sum array using LOOP (simple but slow!)
    lea rsi, [array]
    mov rcx, 10             ; Loop count
    xor eax, eax            ; sum = 0
.sum_loop:
    add eax, [rsi]
    add rsi, 4
    loop .sum_loop          ; Decrement RCX, jump if != 0

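; LOOPE example: Count leading zeros in array
    lea rsi, [array]
    mov rcx, 10
.count_zeros:
    cmp dword [rsi], 0      ; Sets ZF if element is 0
    lea rsi, [rsi + 4]      ; Advance with LEA: unlike ADD, it leaves ZF intact
    loope .count_zeros      ; Continue while ZF=1 and RCX != 0
    ; LOOPE does not change flags, so ZF still reflects the last CMP
    mov eax, 10
    jz .done_counting       ; ZF=1: all 10 elements were zero
    mov eax, 9
    sub eax, ecx            ; Stopped on a non-zero: zeros seen = 10 - RCX - 1
.done_counting: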
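Performance Warning: On many modern CPUs, Intel's in particular, LOOP decodes into several micro-ops and is slower than dec rcx; jnz, which can macro-fuse into a single operation. The LOOP family survives mainly for backward compatibility. Use explicit counter decrements in performance-critical code.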
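
As a sketch of the recommended replacement, here is the array sum written with an explicit dec/jnz pair. The function name sum_array and its register interface are hypothetical; the System V AMD64 calling convention is assumed.

```nasm
; sum_array: hypothetical helper, System V AMD64 calling convention assumed
;   rdi = pointer to 32-bit integers, rsi = element count
;   returns the sum in eax
global sum_array

section .text
sum_array:
    xor eax, eax            ; sum = 0
    test rsi, rsi
    jz .done                ; count == 0: skip the loop entirely
.sum_loop:
    add eax, [rdi]          ; sum += *ptr
    add rdi, 4              ; advance to the next element
    dec rsi                 ; count--, sets ZF when it reaches 0
    jnz .sum_loop           ; explicit dec/jnz in place of LOOP
.done:
    ret
```

The zero-count guard matters for both forms: LOOP decrements before testing, so entering a LOOP-based loop with RCX = 0 would wrap around and run 2^64 iterations.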

Branch Prediction

Modern CPUs predict branch outcomes to keep the pipeline full. A misprediction discards the speculatively executed work and refills the pipeline, typically costing around 15-20 cycles depending on the microarchitecture.

Branch prediction impact — correct prediction keeps the pipeline flowing, while misprediction causes a costly pipeline flush (15–20 cycles).
Branch Prediction Pipeline Impact:

Correct Prediction (typical):
  Fetch → Decode → Execute → Retire
    T1     T2       T3        T4      (seamless flow)

Misprediction (costly!):
  Fetch → Decode → Execute → WRONG! → Flush → Refetch
    T1     T2       T3         T4       T5-T20  (15-20 cycles lost)

Branch Prediction Tips

  • Favor predictable branches: Loops are highly predictable (taken until exit)
  • Avoid data-dependent branches: if (array[i] > 50) is hard to predict
  • Use CMOV: Conditional moves don't branch at all
  • Sort data when possible: Sorted data makes comparisons predictable

Branch Predictor Demo

; Unpredictable branch (random data)
    cmp byte [rsi], 128
    ja .greater             ; Unsigned compare (bytes are 0-255); ~50% taken - hard to predict!
    inc r8                  ; Count <= 128
    jmp .next
.greater:
    inc r9                  ; Count > 128
.next:

; Hint: Use sorted data!
; If array is sorted, branch becomes almost perfectly predictable:
; All values <= 128 first, then all > 128
Profile with perf: Run perf stat -e branches,branch-misses ./program to count branches and mispredictions. A high miss rate relative to the total number of branches points to an optimization opportunity!
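
When sorting is not an option, the data-dependent branch can sometimes be removed altogether. The following sketch counts bytes above 128 using SETcc to turn the comparison result into a 0 or 1 that is simply added to the counter. The function name count_above is hypothetical, the System V AMD64 calling convention is assumed, and bytes are treated as unsigned.

```nasm
; count_above: hypothetical helper, System V AMD64 calling convention assumed
;   rdi = pointer to bytes, rsi = length
;   returns in rax the number of bytes > 128 (unsigned)
global count_above

section .text
count_above:
    xor eax, eax            ; count = 0 (also clears the upper half of rax)
    test rsi, rsi
    jz .done                ; empty buffer: nothing to count
.next:
    cmp byte [rdi], 128     ; flags from (byte - 128)
    seta dl                 ; dl = 1 if byte > 128 (unsigned), else 0
    movzx edx, dl           ; zero-extend so the full rdx holds 0 or 1
    add rax, rdx            ; count += 0 or 1, no branch on the data
    inc rdi                 ; next byte
    dec rsi
    jnz .next               ; only the loop branch remains: highly predictable
.done:
    ret
```

Whether this beats the branching version depends on the data and the CPU, so it is worth measuring both with perf rather than assuming.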

CMOVcc (Branchless Conditional Move)

CMOV moves data only if the condition is true—no branching, no misprediction penalty!

CMOVcc branchless conditional move — eliminates branch misprediction by conditionally moving data without changing control flow.
; Traditional branch (can mispredict)
    cmp rax, rbx
    jle .else
    mov rcx, rax            ; if (rax > rbx) rcx = rax
    jmp .endif
.else:
    mov rcx, rbx            ; else rcx = rbx
.endif:

; Branchless with CMOV (no misprediction possible!)
    cmp rax, rbx
    mov rcx, rbx            ; Assume rcx = rbx
    cmovg rcx, rax          ; If rax > rbx, override with rax

CMOVcc Variants

Signed         | Unsigned       | Condition
---------------|----------------|---------------------
CMOVG, CMOVNLE | CMOVA, CMOVNBE | Greater / Above
CMOVGE, CMOVNL | CMOVAE, CMOVNB | ≥ / Above-Equal
CMOVL, CMOVNGE | CMOVB, CMOVNAE | Less / Below
CMOVLE, CMOVNG | CMOVBE, CMOVNA | ≤ / Below-Equal
CMOVE, CMOVZ   | (same)         | Equal / Zero
CMOVNE, CMOVNZ | (same)         | Not Equal

When to Use CMOV vs Branch

; CMOV is best when:
; 1. Branch is unpredictable (random data)
; 2. Both paths are simple (just a move)

; Branch is best when:
; 1. Branch is predictable (loops, sorted data)
; 2. One path has expensive computation

; Example: Compute absolute value
; CMOV version (branchless):
    mov rbx, rax            ; Save the original value
    neg rax                 ; rax = -rax
    test rbx, rbx           ; SF = sign of the original
    cmovns rax, rbx         ; If original was non-negative, restore it

; Alternative using a sign mask (also branchless):
    mov rbx, rax
    sar rbx, 63             ; All 1s if negative, all 0s otherwise
    xor rax, rbx            ; Flip bits if negative
    sub rax, rbx            ; Subtract -1 (add 1) if negative: two's complement negate

Exercise: Branchless Min/Max

Write branchless functions for min(a, b) and max(a, b):

; min(rdi, rsi) -> rax
min_func:
    cmp rdi, rsi
    mov rax, rsi            ; Assume rsi is smaller
    cmovl rax, rdi          ; If rdi < rsi, use rdi
    ret

; max(rdi, rsi) -> rax
max_func:
    cmp rdi, rsi
    mov rax, rdi            ; Assume rdi is larger
    cmovl rax, rsi          ; If rdi < rsi, use rsi instead
    ret
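
The same pattern extends naturally to clamping a value into a range, which is just a max followed by a min. This is a sketch only: the name clamp_func is hypothetical, the System V AMD64 calling convention is assumed, the values are treated as signed, and the caller is assumed to pass lo <= hi.

```nasm
; clamp_func: hypothetical helper, System V AMD64 calling convention assumed
;   rdi = x, rsi = lo, rdx = hi (all signed, lo <= hi assumed)
;   returns x limited to the range [lo, hi] in rax
global clamp_func

section .text
clamp_func:
    mov rax, rdi            ; start with x
    cmp rax, rsi
    cmovl rax, rsi          ; x < lo: use lo   (max(x, lo))
    cmp rax, rdx
    cmovg rax, rdx          ; result > hi: use hi   (min(result, hi))
    ret
```

For unsigned values the same code works with CMOVB and CMOVA in place of CMOVL and CMOVG.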