x86 Assembly Series Part 9: Control Flow Instructions

Unconditional Jumps

jmp label           ; Jump to label (relative)
jmp rax             ; Jump to address in RAX (indirect)
jmp [table + rax*8] ; Jump table access

CMP & TEST Instructions

[Figure: CMP vs TEST. CMP sets flags from the subtraction dst − src, TEST from a bitwise AND; neither stores the result.]

Key Difference: CMP performs a subtraction (flags reflect dst − src), while TEST performs a bitwise AND (commonly used to check whether a register is zero or to test specific bits).


cmp rax, rbx        ; Set flags based on RAX - RBX
cmp rax, 10         ; Compare RAX to immediate

test rax, rax       ; Is RAX zero? (sets ZF if RAX==0)
test rax, 1         ; Is bit 0 set? (check odd/even)
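The flag behavior of the two instructions can be modeled in a few lines of C. This is only a sketch: the `Flags` struct and the `model_cmp` / `model_test` names are hypothetical, and only ZF, SF, CF and OF are modeled (PF and AF are left out).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four status flags used in this article. */
typedef struct {
    bool zf, sf, cf, of;
} Flags;

/* Model of `cmp dst, src`: compute dst - src and keep only the flags. */
Flags model_cmp(uint64_t dst, uint64_t src) {
    uint64_t r = dst - src;                          /* wraps modulo 2^64 */
    Flags f;
    f.zf = (r == 0);                                 /* operands were equal */
    f.sf = (r >> 63) != 0;                           /* top bit of the result */
    f.cf = dst < src;                                /* unsigned borrow */
    f.of = (((dst ^ src) & (dst ^ r)) >> 63) != 0;   /* signed overflow */
    return f;
}

/* Model of `test a, b`: compute a & b and keep only the flags.
   TEST always clears CF and OF. */
Flags model_test(uint64_t a, uint64_t b) {
    uint64_t r = a & b;
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) != 0;
    f.cf = false;
    f.of = false;
    return f;
}
```

For example, `model_test(x, x).zf` is true exactly when `x` is zero, which is why `test rax, rax` is the usual zero check, and `model_test(x, 1).zf` is true exactly when `x` is even.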

Conditional Jumps

[Figure: Conditional jump instruction families: signed (JG, JGE, JL, JLE) and unsigned (JA, JAE, JB, JBE) variants, selected by CPU flag states.]

Signed Comparisons

Reference

Signed Jump Instructions

Instruction	Condition	Flags
JG / JNLE	Greater than	ZF=0 AND SF=OF
JGE / JNL	Greater or equal	SF=OF
JL / JNGE	Less than	SF≠OF
JLE / JNG	Less or equal	ZF=1 OR SF≠OF

Unsigned Comparisons

Reference

Unsigned Jump Instructions

Instruction	Condition	Flags
JA / JNBE	Above	CF=0 AND ZF=0
JAE / JNB / JNC	Above or equal	CF=0
JB / JNAE / JC	Below	CF=1
JBE / JNA	Below or equal	CF=1 OR ZF=1
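The same bit pattern can compare differently depending on which family you use. The sketch below models an 8-bit `cmp` and the conditions from the two tables above; the `cmp8` and `cond_*` names are made up for illustration, and 8-bit operands are chosen only to keep the values small.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags left behind by an 8-bit `cmp a, b` (illustrative model). */
typedef struct { bool zf, sf, cf, of; } Flags;

Flags cmp8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    Flags f;
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;
    f.cf = a < b;                                /* unsigned borrow */
    f.of = (((a ^ b) & (a ^ r)) & 0x80) != 0;    /* signed overflow */
    return f;
}

/* Signed conditions read SF, OF and ZF. */
bool cond_g (Flags f) { return !f.zf && f.sf == f.of; }   /* JG  */
bool cond_ge(Flags f) { return f.sf == f.of; }            /* JGE */
bool cond_l (Flags f) { return f.sf != f.of; }            /* JL  */
bool cond_le(Flags f) { return f.zf || f.sf != f.of; }    /* JLE */

/* Unsigned conditions read CF and ZF. */
bool cond_a (Flags f) { return !f.cf && !f.zf; }          /* JA  */
bool cond_ae(Flags f) { return !f.cf; }                   /* JAE */
bool cond_b (Flags f) { return f.cf; }                    /* JB  */
bool cond_be(Flags f) { return f.cf || f.zf; }            /* JBE */
```

Comparing 0xFF with 0x01 makes the point: read as signed bytes this is -1 versus 1, so `cond_l` holds, while read as unsigned bytes it is 255 versus 1, so `cond_a` holds. Picking the wrong family after a CMP is a common source of bugs.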

Flag-Based Jumps

These jumps test individual CPU flags directly:

Instruction	Meaning	Flag Test	Common Use
JZ / JE	Jump if Zero/Equal	ZF=1	After CMP, TEST
JNZ / JNE	Jump if Not Zero	ZF=0	Loop until zero
JS	Jump if Sign (negative)	SF=1	Check negative result
JNS	Jump if Not Sign	SF=0	Check non-negative result
JO	Jump if Overflow	OF=1	Signed overflow check
JNO	Jump if No Overflow	OF=0	Safe arithmetic path
JP / JPE	Jump if Parity Even	PF=1	Rarely used (FP compares)
JNP / JPO	Jump if Parity Odd	PF=0	Rarely used
JC	Jump if Carry	CF=1	Unsigned overflow
JNC	Jump if No Carry	CF=0	No borrow in subtraction

; Check for negative result
    sub rax, rbx
    js .handle_negative     ; Jump if result is negative (SF=1)
    ; ... non-negative path ...
    jmp .after_sub          ; Skip the negative handler
.handle_negative:
    neg rax                 ; Make it positive
.after_sub:

; Check for overflow in addition
    add eax, ebx
    jo .overflow_error      ; Signed overflow occurred (OF=1)
    ; ... continue normally ...
    jmp .after_add          ; Skip the error handler
.overflow_error:
    ; Handle overflow...
.after_add:

Common Control Flow Patterns

[Figure: Common control flow patterns in assembly: if/else branching, counted loops, and switch/jump table implementations.]

If/Else in Assembly

; C: if (rax == 10) { rbx = 1; } else { rbx = 0; }
    cmp rax, 10
    jne .else
    mov rbx, 1
    jmp .endif
.else:
    mov rbx, 0
.endif:

Switch / Jump Tables

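For switch statements with dense, sequential cases, a jump table is usually faster than a chain of if-else comparisons: dispatch costs one bounds check and one indirect jump, however many cases there are: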

section .data
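; Jump table: array of code addresses.
; The entries are written as my_switch.case_N because NASM attaches a
; local label to the most recent non-local label, which here is jump_table.
jump_table:
    dq my_switch.case_0
    dq my_switch.case_1
    dq my_switch.case_2
    dq my_switch.case_3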

section .text
; RDI contains switch value (0-3)
my_switch:
    ; Bounds check first!
    cmp rdi, 3
    ja .default             ; If > 3, go to default
    
    ; Load address from jump table and jump
    lea rax, [rel jump_table]
    jmp [rax + rdi*8]       ; Jump to case handler

.case_0:
    mov rax, 100
    jmp .end_switch
.case_1:
    mov rax, 200
    jmp .end_switch
.case_2:
    mov rax, 300
    jmp .end_switch
.case_3:
    mov rax, 400
    jmp .end_switch
.default:
    mov rax, -1
.end_switch:
    ret
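For comparison, here is a rough C counterpart of `my_switch` built on an explicit table of function pointers. It is a sketch rather than what a compiler emits: the handler names and `my_switch_c` are hypothetical, and the table is indexed with an indirect call instead of an indirect jump.

```c
/* Handlers standing in for .case_0 through .case_3 (hypothetical names). */
static long case_0(void) { return 100; }
static long case_1(void) { return 200; }
static long case_2(void) { return 300; }
static long case_3(void) { return 400; }

/* The table of code addresses, analogous to jump_table. */
static long (*const handlers[])(void) = { case_0, case_1, case_2, case_3 };

long my_switch_c(unsigned long value) {
    /* Unsigned bounds check, same role as `cmp rdi, 3` / `ja .default`. */
    if (value >= sizeof handlers / sizeof handlers[0])
        return -1;
    /* Indexed indirect transfer, analogous to jmp [rax + rdi*8]. */
    return handlers[value]();
}
```

Because the index is unsigned, a "negative" argument shows up as a very large value and fails the bounds check, which is the same reason the assembly version uses JA rather than JG.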

                        
Compiler Insight: GCC and Clang typically generate jump tables for dense switch statements when optimizing. Compile with the -S flag to inspect the assembly output.

For/While Loops

; C: for (int i = 0; i < 10; i++) { sum += array[i]; }
    xor ecx, ecx        ; i = 0
    xor eax, eax        ; sum = 0
.loop:
    cmp ecx, 10
    jge .done           ; if i >= 10, exit
    add eax, [array + rcx*4]
    inc ecx
    jmp .loop
.done:

LOOP Instructions

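The LOOP family decrements RCX (ECX with a 32-bit address size) and jumps if the counter is non-zero. Unlike DEC, LOOP does not modify any flags:

Instruction	Action	Equivalent To / Typical Use
LOOP label	Decrement RCX, jump if RCX ≠ 0	`dec rcx; jnz label` (except that LOOP leaves the flags unchanged)
LOOPE / LOOPZ	Decrement RCX, jump if RCX ≠ 0 AND ZF=1	Find first non-match
LOOPNE / LOOPNZ	Decrement RCX, jump if RCX ≠ 0 AND ZF=0	Find first match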

; Sum array using LOOP (simple but slow!)
    lea rsi, [array]
    mov rcx, 10             ; Loop count
    xor eax, eax            ; sum = 0
.sum_loop:
    add eax, [rsi]
    add rsi, 4
    loop .sum_loop          ; Decrement RCX, jump if != 0

; LOOPE example: Count leading zero elements in array
    lea rsi, [array]
    mov rcx, 10
    xor eax, eax
.count_zeros:
    cmp dword [rsi], 0      ; Sets ZF if element is 0
    jne .done_counting
    lea eax, [rax + 1]      ; Count this zero (LEA leaves ZF intact, INC would not)
    lea rsi, [rsi + 4]      ; Advance pointer (ADD would overwrite ZF)
    loope .count_zeros      ; Continue while ZF=1 and RCX != 0
.done_counting:
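One detail of LOOP is easy to miss: the counter is decremented before it is tested, so the loop behaves like a do-while. The sketch below models this in C with an 8-bit counter standing in for RCX; that substitution is an assumption made purely so the wraparound is cheap to observe, and `loop_iterations` is a hypothetical name.

```c
#include <stdint.h>

/* Model of a LOOP-terminated loop. An 8-bit counter stands in for RCX. */
unsigned loop_iterations(uint8_t count) {
    unsigned iterations = 0;
    do {
        iterations++;          /* loop body runs at least once */
        count--;               /* LOOP decrements first ... */
    } while (count != 0);      /* ... then jumps if the counter is non-zero */
    return iterations;
}
```

With a starting count of 0 the model runs 256 times, because the counter wraps to 255 on the first decrement. With the real 64-bit RCX the same mistake means 2^64 iterations, so a possibly-zero count should be guarded before the loop, for example with `jrcxz` or `test rcx, rcx` followed by `jz`.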

                        
Performance Warning: On many modern CPUs (Intel cores in particular), LOOP is slower than dec rcx; jnz. It survives mainly for legacy compatibility, so prefer an explicit counter decrement in performance-critical code.

Branch Prediction

Modern CPUs predict branch outcomes to keep the pipeline full. A misprediction discards the speculatively executed work, typically costing around 15-20 cycles.

Branch Prediction Pipeline Impact:

Correct Prediction (typical):
  Fetch → Decode → Execute → Retire
    T1     T2       T3        T4      (seamless flow)

Misprediction (costly!):
  Fetch → Decode → Execute → WRONG! → Flush → Refetch
    T1     T2       T3         T4       T5-T20  (15-20 cycles lost)

Branch Prediction Tips

Favor predictable branches: Loops are highly predictable (taken until exit)
Avoid data-dependent branches: if (array[i] > 50) is hard to predict
Use CMOV: Conditional moves don't branch at all
Sort data when possible: Sorted data makes comparisons predictable

Branch Predictor Demo

; Unpredictable branch (random unsigned byte data)
    cmp byte [rsi], 128
    ja .greater             ; Unsigned compare (JG would read the byte 128 as -128)
                            ; ~50% taken on random data - hard to predict!
    inc r8                  ; Count <= 128
    jmp .next
.greater:
    inc r9                  ; Count > 128
.next:

; Hint: Use sorted data!
; If the array is sorted, the branch becomes almost perfectly predictable:
; all values <= 128 come first, then all values > 128
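Another way to sidestep the unpredictable branch is to remove it from the loop body. A minimal C sketch, assuming unsigned byte data and using the hypothetical name `count_above_128`, adds the 0-or-1 result of the comparison directly to the counter:

```c
#include <stddef.h>
#include <stdint.h>

/* Counts bytes above 128 without a data-dependent branch in the loop body. */
size_t count_above_128(const uint8_t *data, size_t len) {
    size_t above = 0;
    for (size_t i = 0; i < len; i++)
        above += (data[i] > 128);   /* the comparison yields 0 or 1 */
    return above;
}
```

The count of values at or below 128 is then `len - above`. Optimizing compilers commonly turn this pattern into flag-based or vectorized code rather than a conditional jump, though that depends on the compiler and options, so it is worth checking the generated assembly.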

                        
Profile with perf: Use perf stat -e branch-misses ./program to measure branch mispredictions. A high miss count relative to the total number of branches indicates optimization opportunities.

CMOVcc (Branchless Conditional Move)

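CMOV moves data only if the condition is true. There is no branch, so there is nothing to mispredict. The trade-off is that the result depends on both the flags and the source operand, so both must be ready before it completes.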

[Figure: CMOVcc branchless conditional move, which avoids branch misprediction by conditionally moving data without changing control flow.]

; Traditional branch (can mispredict)
    cmp rax, rbx
    jle .else
    mov rcx, rax            ; if (rax > rbx) rcx = rax
    jmp .endif
.else:
    mov rcx, rbx            ; else rcx = rbx
.endif:

; Branchless with CMOV (no misprediction possible!)
    cmp rax, rbx
    mov rcx, rbx            ; Assume rcx = rbx
    cmovg rcx, rax          ; If rax > rbx, override with rax

CMOVcc Variants

Signed	Unsigned	Condition
CMOVG, CMOVNLE	CMOVA, CMOVNBE	Greater / Above
CMOVGE, CMOVNL	CMOVAE, CMOVNB	≥ / Above-Equal
CMOVL, CMOVNGE	CMOVB, CMOVNAE	Less / Below
CMOVLE, CMOVNG	CMOVBE, CMOVNA	≤ / Below-Equal
CMOVE, CMOVZ		Equal / Zero
CMOVNE, CMOVNZ		Not Equal

When to Use CMOV vs Branch

; CMOV is best when:
; 1. Branch is unpredictable (random data)
; 2. Both paths are simple (just a move)

; Branch is best when:
; 1. Branch is predictable (loops, sorted data)
; 2. One path has expensive computation

; Example: Compute absolute value
; CMOV version:
    mov rbx, rax            ; Save the original value
    neg rax                 ; rax = -rax
    test rbx, rbx           ; SF=1 if the original was negative
    cmovns rax, rbx         ; If original was non-negative, restore it

; Alternative using a sign mask (also branchless):
    mov rbx, rax
    sar rbx, 63             ; All 1s if negative, all 0s otherwise
    xor rax, rbx            ; Flip bits if negative
    sub rax, rbx            ; Add 1 if negative (subtracting -1)
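The sign-mask trick carries over to C almost line for line. The sketch below uses the hypothetical name `abs_branchless`; it builds the mask from a comparison instead of an arithmetic right shift, because shifting a negative signed value right is implementation-defined in C.

```c
#include <stdint.h>

/* Branchless absolute value with a sign mask, mirroring SAR / XOR / SUB. */
uint64_t abs_branchless(int64_t x) {
    uint64_t ux = (uint64_t)x;
    uint64_t mask = (uint64_t)0 - (uint64_t)(x < 0);   /* all ones if negative, else zero */
    return (ux ^ mask) - mask;                         /* flip bits, then add 1 if negative */
}
```

Returning an unsigned value is a deliberate choice here: the magnitude of the most negative 64-bit integer is 2^63, which does not fit in a signed 64-bit result. The assembly versions leave that one input unchanged for the same reason.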

Exercise: Branchless Min/Max

Write branchless functions for min(a, b) and max(a, b):

; min(rdi, rsi) -> rax
min_func:
    cmp rdi, rsi
    mov rax, rsi            ; Assume rsi is smaller
    cmovl rax, rdi          ; If rdi < rsi, use rdi
    ret

; max(rdi, rsi) -> rax
max_func:
    cmp rdi, rsi
    mov rax, rdi            ; Assume rdi is larger
    cmovl rax, rsi          ; If rdi < rsi, use rsi instead
    ret
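The same selection can be written without CMOV by using a mask, which is handy on targets or in languages where a conditional move is not available. This is only an illustrative sketch for signed 64-bit values, and the function names are hypothetical:

```c
#include <stdint.h>

/* Signed min: picks a when a < b, otherwise b, without branching. */
int64_t min_branchless(int64_t a, int64_t b) {
    int64_t mask = -(int64_t)(a < b);   /* -1 (all ones) if a < b, else 0 */
    return b ^ ((a ^ b) & mask);        /* b ^ (a ^ b) == a when mask is all ones */
}

/* Signed max: picks b when a < b, otherwise a, without branching. */
int64_t max_branchless(int64_t a, int64_t b) {
    int64_t mask = -(int64_t)(a < b);
    return a ^ ((a ^ b) & mask);        /* a ^ (a ^ b) == b when mask is all ones */
}
```

For unsigned operands the comparison would use unsigned types in C, which corresponds to swapping CMOVL for CMOVB in the assembly solutions.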