
x86 Assembly Series Part 4: Instruction Encoding & Binary Layout

February 6, 2026 Wasil Zafar 30 min read

Dive deep into how x86 assembly instructions are encoded into machine code. Learn about opcodes, prefixes, the ModRM and SIB bytes, displacements, immediates, and how disassemblers decode variable-length instructions.

Table of Contents

  1. Instruction Format
  2. ModRM & SIB
  3. Displacement & Immediate
  4. Little-Endian Representation
  5. REX Prefixes (x86-64)
  6. Variable-Length Decoding
  7. How Disassemblers Work

Instruction Format

Key Concept: x86 instructions are variable-length (1-15 bytes). Understanding encoding is essential for writing shellcode, analyzing malware, and understanding compiler output.

Format Overview

Structure

x86 Instruction Layout

[Prefixes] [REX] [Opcode] [ModRM] [SIB] [Displacement] [Immediate]
 0-4 bytes  0-1   1-3      0-1     0-1    0,1,2,4        0,1,2,4,8

Each component is optional depending on the instruction. Only the opcode is always present.

Opcodes

The opcode identifies the operation to perform. Opcodes can be 1, 2, or 3 bytes:

1-byte opcodes: Most common instructions
  90       = NOP
  C3       = RET
  50-57    = PUSH reg (reg encoded in opcode itself)
  B8-BF    = MOV reg, imm32 (reg encoded in low 3 bits)

2-byte opcodes: 0F prefix
  0F 84    = JE rel32 (conditional jump)
  0F AF    = IMUL r32, r/m32
  0F B6    = MOVZX

3-byte opcodes: 0F 38 or 0F 3A prefix
  0F 38 F0 = MOVBE (byte-swap load)
  0F 3A 0F = PALIGNR (SSSE3 packed align right)

Opcode Maps

Primary opcode byte (partial map):

        x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
  0x  ADD  ADD  ADD  ADD  ADD  ADD  PUSH POP  OR   OR   OR   OR   OR   OR  PUSH 2-byte
  1x  ADC  ADC  ADC  ADC  ADC  ADC  PUSH POP  SBB  SBB  SBB  SBB  SBB  SBB  PUSH POP
  ...
  5x  PUSH PUSH PUSH PUSH PUSH PUSH PUSH PUSH POP  POP  POP  POP  POP  POP  POP  POP
  ...
  Bx  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV  MOV

Exercise: Decode an Opcode

# Disassemble a single instruction
echo -ne '\xB8\x2A\x00\x00\x00' | ndisasm -b 64 -
# Output: mov eax,0x2a

# Breakdown: B8 = MOV EAX, imm32 (B8 + register 0)
# 2A 00 00 00 = 0x0000002A in little-endian = 42
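
The register-in-opcode idea from the breakdown above can be checked in a few lines of Python. This is only a sketch: `decode_mov_imm32` and the `REG32` table are names invented here for illustration, and the function handles nothing but prefix-free `B8`-`BF` encodings.

```python
import struct

# Register numbers 0-7 as encoded in the low 3 bits of the opcode (no REX prefix).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_imm32(code: bytes) -> str:
    """Decode a prefix-free MOV r32, imm32 (opcodes B8-BF). Illustrative helper."""
    opcode = code[0]
    if not 0xB8 <= opcode <= 0xBF:
        raise ValueError(f"opcode {opcode:#04x} is not MOV r32, imm32")
    reg = opcode & 0b111                        # register number lives in the opcode
    (imm,) = struct.unpack_from("<I", code, 1)  # 4-byte little-endian immediate
    return f"mov {REG32[reg]},{imm:#x}"

print(decode_mov_imm32(bytes.fromhex("b82a000000")))  # mov eax,0x2a
```

Changing the first byte to `BB` selects register 3 (EBX) while the rest of the decoding stays the same.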

Instruction Prefixes

Prefixes modify instruction behavior. They must appear before the opcode:

Legacy Prefix Groups

Group                  | Bytes                  | Purpose
Group 1 (Lock/Rep)     | F0, F2, F3             | LOCK, REPNE/REPNZ, REP/REPE/REPZ
Group 2 (Segment)      | 26, 2E, 36, 3E, 64, 65 | ES, CS, SS, DS, FS, GS override
Group 3 (Operand Size) | 66                     | Toggle 16/32-bit operand size
Group 4 (Address Size) | 67                     | Toggle address size (64 to 32-bit in 64-bit mode, 32/16-bit otherwise)

Operand-Size Override (66h)

; In 64-bit mode, default operand size is 32-bit
mov eax, [rbx]           ; 8B 03           (32-bit, no prefix needed)
mov ax, [rbx]            ; 66 8B 03        (66h makes it 16-bit)
mov rax, [rbx]           ; 48 8B 03        (REX.W makes it 64-bit)

Rep Prefixes for String Operations

; F3 = REP prefix
rep movsb    ; F3 A4 - Copy RCX bytes from [RSI] to [RDI]
rep stosq    ; F3 48 AB - Fill RCX quadwords at [RDI] with RAX

; F2 = REPNE prefix
repne scasb  ; F2 AE - Scan for AL in [RDI], stop when found

LOCK Prefix for Atomics

; F0 = LOCK prefix (guarantees atomic read-modify-write)
lock inc dword [counter]     ; F0 FF 05 ... - Atomic increment
lock cmpxchg [mutex], ecx    ; F0 0F B1 0D ... - Atomic compare-exchange
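
Prefix Rules:
  • Only one prefix from each group should be used
  • Legacy prefixes from different groups may appear in any order, but a REX prefix must come immediately before the opcode
  • Invalid LOCK usage causes a #UD (Invalid Opcode) exception
  • 66h, F2h, and F3h also act as mandatory prefixes that select SSE instruction variants (e.g., 66h selects the packed double-precision forms in SSE2)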
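
As a rough illustration of how the prefix groups in the table are recognized, the following Python sketch peels prefixes off the front of an instruction. The names `LEGACY_PREFIX_GROUPS` and `split_prefixes` are made up for this example, and it assumes 64-bit mode, where 0x40-0x4F are REX prefixes.

```python
# Legacy prefix byte -> group number, taken from the table above.
LEGACY_PREFIX_GROUPS = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,
    0x26: 2, 0x2E: 2, 0x36: 2, 0x3E: 2, 0x64: 2, 0x65: 2,
    0x66: 3,
    0x67: 4,
}

def split_prefixes(code: bytes):
    """Return (legacy_prefixes, rex_or_None, remaining_bytes). Assumes 64-bit mode."""
    pos = 0
    legacy = []
    while pos < len(code) and code[pos] in LEGACY_PREFIX_GROUPS:
        legacy.append(code[pos])
        pos += 1
    rex = None
    if pos < len(code) and 0x40 <= code[pos] <= 0x4F:
        rex = code[pos]      # REX sits directly before the opcode
        pos += 1
    return legacy, rex, code[pos:]

# rep stosq = F3 48 AB: one Group 1 prefix, REX.W, then the opcode AB
print(split_prefixes(bytes.fromhex("f348ab")))
```

For `F3 48 AB` the result is one legacy prefix (`F3`), a REX byte (`48`), and the single opcode byte `AB`.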

ModRM & SIB Bytes

ModRM Byte Encoding

Encoding

ModRM Structure (8 bits)

| Mod (2 bits) | Reg (3 bits) | R/M (3 bits) |
|    7-6       |     5-3      |     2-0      |
  • Mod: Addressing mode (00=memory, 01/10=memory+disp, 11=register)
  • Reg: Register operand or opcode extension
  • R/M: Register/Memory operand
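
The three fields can be pulled out of a ModRM byte with shifts and masks. A minimal Python sketch (the function name `split_modrm` is chosen here for illustration):

```python
def split_modrm(byte: int):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # bits 7-6
    reg = (byte >> 3) & 0b111   # bits 5-3
    rm = byte & 0b111           # bits 2-0
    return mod, reg, rm

print(split_modrm(0xD8))  # (3, 3, 0): register mode, reg=3, r/m=0
print(split_modrm(0x04))  # (0, 0, 4): memory mode, r/m=100b means a SIB byte follows
```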

SIB Byte (Scale-Index-Base)

The SIB byte enables complex addressing modes: [base + index*scale + disp]

SIB Structure (8 bits):
| Scale (2 bits) | Index (3 bits) | Base (3 bits) |
|     7-6        |      5-3       |      2-0      |

Scale values:
  00 = ×1 (no scaling)
  01 = ×2
  10 = ×4
  11 = ×8

Special cases:
  Index = 100 (RSP): No index register used
  Base = 101 with Mod=00: No base register, disp32 only (RIP-relative addressing is a different case: ModRM Mod=00, R/M=101 with no SIB, in 64-bit mode)

SIB Examples

; Array access: arr[i*4]
mov eax, [rbx + rcx*4]        ; SIB = 10_001_011 = 0x8B
                               ; Scale=10(×4), Index=001(RCX), Base=011(RBX)

; 2D array: arr[row][col] where sizeof(row) = 8
mov eax, [rdi + rsi*8 + 16]   ; Base=RDI, Index=RSI, Scale=8, Disp=16

; No base, just scaled index + displacement
mov eax, [rcx*4 + table]      ; ModRM Mod=00, R/M=100 triggers SIB
                               ; SIB Base=101 means no base, use disp32

Decode Challenge

# Instruction: 8B 04 8D 00 00 00 00
# 8B = MOV r32, r/m32
# ModRM 04 = Mod=00, Reg=000(EAX), R/M=100(SIB follows)
# SIB 8D = Scale=10(×4), Index=001(RCX in 64-bit mode, ECX in 32-bit mode), Base=101(disp32, no base)
# Disp32 = 00 00 00 00
# Result: mov eax, [rcx*4 + 0x0]   (as decoded in 64-bit mode)

Why SIB Matters: Array indexing (arr[i]) compiles to scaled addressing. Understanding SIB helps you read disassembly and optimize memory access patterns for cache efficiency.
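
The SIB rules and special cases above can be sketched in Python as follows. `describe_sib` is a hypothetical helper written for this article: it ignores REX.X/REX.B (so only RAX-RDI appear) and prints placeholder text for the displacement instead of reading it.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def describe_sib(mod: int, sib: int) -> str:
    """Describe the address form selected by ModRM.mod plus a SIB byte (no REX)."""
    scale = 1 << (sib >> 6)          # 00/01/10/11 -> 1/2/4/8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    no_base = (base == 0b101 and mod == 0b00)
    parts = []
    if not no_base:
        parts.append(REG64[base])
    if index != 0b100:               # index 100b means "no index register"
        parts.append(f"{REG64[index]}*{scale}")
    if no_base or mod == 0b10:
        parts.append("disp32")
    elif mod == 0b01:
        parts.append("disp8")
    return "[" + " + ".join(parts) + "]"

print(describe_sib(0b00, 0x8B))  # [rbx + rcx*4]
print(describe_sib(0b00, 0x8D))  # [rcx*4 + disp32]
```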

Displacement & Immediate

These fields encode constant values embedded in the instruction.

Displacement (Memory Offset)

Displacement size depends on ModRM Mod field:
  Mod = 00: No displacement (except R/M=101 or SIB base=101, which take a disp32)
  Mod = 01: 8-bit signed displacement (sign-extended)
  Mod = 10: 32-bit displacement (or 16-bit in 16-bit mode)
  Mod = 11: No memory access (register-to-register)

; No displacement
mov eax, [rbx]            ; ModRM = 03 (Mod=00)

; 8-bit displacement (efficient for small offsets)
mov eax, [rbx + 8]        ; ModRM = 43 (Mod=01), Disp8 = 08

; 32-bit displacement (required for large offsets)
mov eax, [rbx + 0x1000]   ; ModRM = 83 (Mod=10), Disp32 = 00 10 00 00

; The assembler picks the smallest encoding automatically
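
The Mod-to-displacement mapping, including the two Mod=00 special cases, fits in one small function. This is a sketch assuming 32-bit or 64-bit addressing (not 16-bit); `displacement_size` and its parameters are illustrative names.

```python
from typing import Optional

def displacement_size(mod: int, rm: int, sib_base: Optional[int] = None) -> int:
    """Bytes of displacement implied by ModRM (and SIB base, if a SIB is present)."""
    if mod == 0b11:
        return 0                      # register operand, no memory access
    if mod == 0b01:
        return 1                      # disp8, sign-extended
    if mod == 0b10:
        return 4                      # disp32
    # mod == 00: normally no displacement, with two exceptions
    if rm == 0b101:
        return 4                      # disp32 (RIP-relative in 64-bit mode)
    if rm == 0b100 and sib_base == 0b101:
        return 4                      # SIB with no base register
    return 0

print(displacement_size(0b01, 0b011))         # 1  -> [rbx + 8]
print(displacement_size(0b00, 0b100, 0b101))  # 4  -> [rcx*4 + table]
```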

Immediate Values

; Immediate size matches operand size (usually)
mov al, 42               ; B0 2A  (8-bit immediate)
mov ax, 1000             ; 66 B8 E8 03  (16-bit, little-endian)
mov eax, 0x12345678      ; B8 78 56 34 12  (32-bit)
mov rax, 0x123456789ABC  ; 48 B8 BC 9A 78 56 34 12 00 00  (64-bit!!)

; Sign-extended immediates save space
add rax, 1               ; 48 83 C0 01  (8-bit sign-extended to 64)
add rax, 0x7FFFFFFF      ; 48 05 FF FF FF 7F  (32-bit sign-ext)
; Note: Can't add 64-bit immediate! Must use mov first

64-bit Immediate Limitation: Only MOV reg64, imm64 supports full 64-bit immediates. Other instructions use 32-bit immediates sign-extended to 64 bits. This is why a sequence like mov rax, big_constant followed by add rbx, rax is sometimes needed.
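
To make the size choice concrete, here is a small Python sketch that picks between the two `add rax, imm` encodings shown above. `fits_signed` and `encode_add_rax` are hypothetical helpers, not an assembler API, and only these two encodings are covered.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def encode_add_rax(value: int) -> bytes:
    """Pick the shortest ADD RAX, imm encoding, as an assembler typically would."""
    if fits_signed(value, 8):
        # REX.W 83 /0 ib : imm8 sign-extended to 64 bits
        return bytes([0x48, 0x83, 0xC0]) + value.to_bytes(1, "little", signed=True)
    if fits_signed(value, 32):
        # REX.W 05 id : imm32 sign-extended to 64 bits
        return bytes([0x48, 0x05]) + value.to_bytes(4, "little", signed=True)
    raise ValueError("no imm64 form of ADD; load the constant with MOV first")

print(encode_add_rax(1).hex())           # 4883c001
print(encode_add_rax(0x7FFFFFFF).hex())  # 4805ffffff7f
```

A value such as 0x80000000 does not fit in a signed 32-bit field, so the sketch raises instead of silently producing a wrong encoding.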

Little-Endian Representation

x86 stores multi-byte values with the least significant byte first. This affects how you read hex dumps.

Little-Endian vs Big-Endian

Value: 0x12345678

Little-Endian (x86):        Big-Endian (Network/MIPS):
Address  Byte               Address  Byte
0x100    78 (LSB)           0x100    12 (MSB)
0x101    56                 0x101    34
0x102    34                 0x102    56
0x103    12 (MSB)           0x103    78 (LSB)

Memory view: 78 56 34 12    Memory view: 12 34 56 78

Practical Implications

; Instruction: mov eax, 0xDEADBEEF
; Encoding: B8 EF BE AD DE
;           ^^ opcode
;              ^^ ^^ ^^ ^^ immediate in little-endian!

; When you see this in a hex dump:
; 48 C7 C0 2A 00 00 00
; It's: mov rax, 0x0000002A (42 decimal)
; NOT: mov rax, 0x2A000000
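
The byte orders above are easy to confirm with Python's standard `struct` module; the variable names below are just for illustration.

```python
import struct

value = 0xDEADBEEF
little = struct.pack("<I", value)   # byte order used by x86
big = struct.pack(">I", value)      # big-endian (network byte order)

print(little.hex())  # efbeadde
print(big.hex())     # deadbeef

# Reading the immediate out of the dump 48 C7 C0 2A 00 00 00:
imm = int.from_bytes(bytes.fromhex("2a000000"), "little")
print(imm)           # 42
```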

Exercise: Read Addresses in Hex Dumps

# Disassemble with address display
echo -ne '\xE9\x1B\x00\x00\x00' | ndisasm -b 64 -
# Output: jmp near 0x20

# The offset 0x0000001B + instruction length (5) = 0x20
# Bytes E9 1B 00 00 00 = JMP rel32
# 1B 00 00 00 in little-endian = 0x0000001B
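
The target calculation in this exercise can be written out as a short Python sketch. `jmp_rel32_target` is an illustrative name; the key points are that the offset is a signed little-endian value and that it is relative to the end of the 5-byte instruction.

```python
import struct

def jmp_rel32_target(code: bytes, address: int = 0) -> int:
    """Compute the destination of a JMP rel32 (opcode E9) located at `address`."""
    if code[0] != 0xE9:
        raise ValueError("not a JMP rel32")
    (rel,) = struct.unpack_from("<i", code, 1)   # signed 32-bit, little-endian
    return address + 5 + rel                     # relative to the next instruction

print(hex(jmp_rel32_target(bytes.fromhex("e91b000000"))))  # 0x20
```

A negative offset such as `FB FF FF FF` (-5) makes the jump target the jump instruction itself.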

Network Programming: Network protocols use big-endian ("network byte order"). Use the bswap instruction or the htons()/ntohs() functions when sending or receiving multi-byte values over the network.

REX Prefixes (x86-64)

REX prefixes enable 64-bit operands and access to registers R8-R15.

REX Byte Structure

REX prefix: 0100 WRXB (0x40-0x4F)

  Bit 3 (W): 64-bit operand size (instead of default 32-bit)
  Bit 2 (R): Extends ModRM.reg to 4 bits (access R8-R15)
  Bit 1 (X): Extends SIB.index to 4 bits
  Bit 0 (B): Extends ModRM.r/m or SIB.base to 4 bits

REX prefix values:
  40 = REX (enables new 8-bit registers like SIL)
  48 = REX.W (64-bit operand)
  41 = REX.B (extended R/M)
  44 = REX.R (extended Reg)
  4D = REX.WRB (64-bit + extended Reg + extended R/M)
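
The bit layout can be decoded mechanically. In this Python sketch, `decode_rex` and `extend` are names invented for illustration; `extend` shows how a REX bit becomes the fourth (high) bit of a 3-bit register field.

```python
def decode_rex(byte: int):
    """Return the W, R, X, B bits of a REX prefix, or None if byte is not REX."""
    if not 0x40 <= byte <= 0x4F:
        return None
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1,
            "X": (byte >> 1) & 1, "B": byte & 1}

def extend(rex_bit: int, field: int) -> int:
    """Combine a REX bit with a 3-bit ModRM/SIB field into a register number 0-15."""
    return (rex_bit << 3) | field

rex = decode_rex(0x4D)
print(rex)                      # {'W': 1, 'R': 1, 'X': 0, 'B': 1}
print(extend(rex["R"], 0b000))  # 8, i.e. R8 in the Reg field
```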

REX Examples

; Without REX
mov eax, ebx              ; 89 D8 (32-bit, uses low 8 registers)

; REX.W for 64-bit operand
mov rax, rbx              ; 48 89 D8 (64-bit)

; REX.R extends the Reg field (the source operand for opcode 89)
mov eax, r8d              ; 44 89 C0 (R8D, 32-bit)
mov rax, r8               ; 4C 89 C0 (R8, 64-bit, REX.WR)

; REX.B extends the R/M field (the destination operand for opcode 89)
mov r8d, eax              ; 41 89 C0
mov r8, rax               ; 49 89 C0 (REX.WB)

; REX.X for SIB index extension
mov eax, [rbx + r8*4]     ; 42 8B 04 83 (REX.X)

REX and 8-bit Registers

; REX presence changes 8-bit register encoding!
; Without REX: AH, CH, DH, BH accessible (codes 4-7)
; With REX: SPL, BPL, SIL, DIL accessible (codes 4-7)

mov ah, 5                 ; B4 05 (no REX, AH = code 4)
mov spl, 5                ; 40 B4 05 (REX enables SPL = code 4)
mov r8b, 5                ; 41 B0 05 (REX.B for R8B)

Incompatibility: You cannot use AH, BH, CH, DH in the same instruction with R8-R15 or the new low-byte registers (SIL, DIL, BPL, SPL). REX presence blocks the high-byte encodings.
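
The way REX changes the meaning of 8-bit register codes 4-7 can be expressed as a lookup. The tables and the function name `reg8_name` below are illustrative, using NASM-style register names.

```python
# 8-bit register names for codes 0-7 when no REX prefix is present.
WITHOUT_REX = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
# With any REX prefix, codes 4-7 select the new low-byte registers; 8-15 need REX.R/X/B.
WITH_REX = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"] + \
           [f"r{n}b" for n in range(8, 16)]

def reg8_name(number: int, rex_present: bool) -> str:
    """Name of the 8-bit register selected by a (possibly REX-extended) number."""
    if rex_present:
        return WITH_REX[number]
    if number > 7:
        raise ValueError("registers 8-15 require a REX prefix")
    return WITHOUT_REX[number]

print(reg8_name(4, rex_present=False))  # ah
print(reg8_name(4, rex_present=True))   # spl
```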

Variable-Length Decoding

x86's variable-length instructions (1-15 bytes) create unique challenges for both CPUs and disassemblers.

CPU Decoding Process

Modern x86 Decoder Pipeline:

1. Fetch: Load 16-32 bytes from instruction stream
          (aligned fetch, branch prediction critical)

2. Pre-decode: Find instruction boundaries
              - Scan for prefixes (0x40-0x4F = REX, 0x66, 0xF2, etc.)
              - Identify opcode (1, 2, or 3 bytes: 0F, 0F38, 0F3A)
              - Determine if ModRM/SIB/Disp/Imm follow

3. Decode: Convert to micro-ops
          - Simple: 1 µop (mov, add register)
          - Complex: Multiple µops via microcode ROM

4. Queue: Place µops in execution queue

Instruction Length Determination

Length Calculation Algorithm:

length = 0

// Count prefixes (Groups 1-4)
while (byte is legacy prefix or REX):
    length++
    advance

// Opcode (1-3 bytes)
if byte == 0x0F:
    length++  // 0F escape byte
    if next == 0x38 or 0x3A:
        length++  // second escape byte (3-byte opcode)
    length++  // final opcode byte
else:
    length++  // 1-byte opcode

// ModRM present? (depends on opcode)
if opcode_needs_modrm:
    parse_modrm()
    length++
    if modrm.mod != 11b and modrm.rm == 100b:  // SIB follows
        parse_sib()
        length++
    length += displacement_size(modrm.mod, modrm.rm, sib.base)

// Immediate? (depends on opcode and on operand-size prefixes)
length += immediate_size(opcode, prefixes)
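
A runnable version of this algorithm, restricted to a handful of opcodes used in this article, might look like the Python sketch below. The tables `ONE_BYTE` and `TWO_BYTE` are hand-picked for illustration and are nowhere near a full opcode map; the function assumes 64-bit mode and raises `KeyError` for anything it does not know.

```python
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67}

# opcode -> (needs ModRM, immediate bytes). Tiny illustrative subset only.
ONE_BYTE = {0x90: (False, 0), 0xC3: (False, 0), 0xA4: (False, 0), 0xAB: (False, 0),
            0xAE: (False, 0), 0x89: (True, 0), 0x8B: (True, 0), 0x83: (True, 1),
            0xEB: (False, 1), 0xE9: (False, 4)}
TWO_BYTE = {0x84: (False, 4), 0xAF: (True, 0), 0xB1: (True, 0), 0xB6: (True, 0)}

def instruction_length(code: bytes) -> int:
    """Length in bytes of the first instruction in `code` (64-bit mode, subset)."""
    pos = 0
    operand16 = rex_w = False
    while code[pos] in LEGACY_PREFIXES:          # legacy prefixes, any order
        operand16 = operand16 or code[pos] == 0x66
        pos += 1
    if 0x40 <= code[pos] <= 0x4F:                # optional REX, right before opcode
        rex_w = bool(code[pos] & 0x08)
        pos += 1
    opcode = code[pos]
    pos += 1
    if opcode == 0x0F:                           # two-byte opcode
        needs_modrm, imm = TWO_BYTE[code[pos]]
        pos += 1
    elif 0xB8 <= opcode <= 0xBF:                 # MOV reg, imm16/32/64
        needs_modrm, imm = False, (8 if rex_w else 2 if operand16 else 4)
    elif 0x50 <= opcode <= 0x5F:                 # PUSH/POP reg
        needs_modrm, imm = False, 0
    else:
        needs_modrm, imm = ONE_BYTE[opcode]
    if needs_modrm:
        modrm = code[pos]
        pos += 1
        mod, rm = modrm >> 6, modrm & 0b111
        if mod != 0b11 and rm == 0b100:          # SIB byte follows
            sib_base = code[pos] & 0b111
            pos += 1
            if mod == 0b00 and sib_base == 0b101:
                pos += 4                         # no base register: disp32
        if mod == 0b01:
            pos += 1                             # disp8
        elif mod == 0b10 or (mod == 0b00 and rm == 0b101):
            pos += 4                             # disp32
    return pos + imm

print(instruction_length(bytes.fromhex("8b048d00000000")))  # 7
```

Note how the immediate size of `B8` depends on the prefixes seen earlier: 4 bytes by default, 2 with `66`, and 8 with REX.W.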

Decoding Challenges

Challenge        | Problem                                                      | Solution
Variable length  | Can't know length until parsing                              | Speculative pre-decode, large fetch
Prefix stacking  | Up to 4 legacy prefixes plus REX (or a VEX prefix in place of REX) | Prefix decoder state machine
Opcode ambiguity | Same byte means different things                             | Context-dependent decode tables
Branch targets   | May land mid-instruction                                     | Decoding restarts at the target address; decoded-instruction caches are looked up by address

Self-Modifying Code: x86 allows code to modify itself, but stale prefetched or decoded copies of the old instructions must not be executed. A serializing instruction such as CPUID is often used as a fence after code modification, particularly when another core will run the new code.

How Disassemblers Work

Disassemblers face the inverse problem: given machine code bytes, recover the original assembly.

Linear Sweep Disassembly

Algorithm:
1. Start at the beginning of the code section
2. Decode instruction, advance by its length
3. Repeat until end of section

Pros: Simple, fast
Cons: Fooled by:
  - Data embedded in code
  - Jump tables
  - Anti-disassembly tricks

# objdump uses linear sweep
objdump -d ./binary

# Problem: if code jumps over data, linear sweep reads data as code
# 00001000: jmp 0x1007
# 00001002: db "HELLO"   ← Linear sweep tries to decode as instructions!
# 00001007: mov eax, 1  ← Real code continues

Recursive Descent Disassembly

Algorithm:
1. Start at entry point, add to work list
2. While work list not empty:
   a. Pop address
   b. Decode instruction
   c. If unconditional jump: add target to work list, stop this path
   d. If conditional jump: add both paths (target and fall-through)
   e. If call: add target AND next instruction
   f. If ret: stop this path
   g. Otherwise: continue with the next instruction

Pros: Follows actual control flow, skips embedded data
Cons:
  - Indirect jumps (jmp rax) can't be resolved statically
  - May miss dead code
  - Doesn't handle computed jump tables easily

Anti-Disassembly Techniques

; Opaque predicate: the outcome is fixed (ZF is never set here), but the disassembler doesn't know
mov eax, 1
test eax, eax
jz fake_branch        ; Never taken, but disassembler follows it
                      ; Real code here
fake_branch:
  db 0xE8             ; Looks like CALL, corrupts next instruction decode

; Jump into middle of instruction
mov eax, 0x90909090   ; B8 90 90 90 90
jmp $-3               ; $ is this jmp's own address, so $-3 lands on the second 90 (a NOP) inside the mov

Disassembler Tools

Tool             | Method                 | Best For
objdump          | Linear sweep           | Quick inspection, well-formed binaries
ndisasm          | Linear sweep           | Raw binary blobs, boot sectors
IDA Pro / Ghidra | Recursive + heuristics | Malware, obfuscated code, full RE
Capstone library | On-demand              | Building custom tools, dynamic analysis

Exercise: Compare Disassemblers

# Create a binary with embedded data
echo -ne '\xEB\x05HELLO\xB8\x01\x00\x00\x00\xC3' > test.bin

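# Linear sweep (ndisasm), decoding the blob as 32-bit code
ndisasm -b 32 test.bin
# 00000000  EB05      jmp short 0x7
# 00000002  48        dec eax        ← 'H' decoded as instruction!
# ...

# The jmp should skip "HELLO" but linear sweep decodes it
# (In 64-bit mode the bytes 0x40-0x4F are REX prefixes, so -b 64 would decode "HELLO" differently)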
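
The difference between the two strategies can also be reproduced without external tools. The Python sketch below runs a linear sweep and a recursive descent over the same bytes as test.bin, treating them as 32-bit code. Everything here (`decode`, `linear_sweep`, `recursive_descent`, `BLOB`) is invented for illustration, and the toy decoder knows only the five instruction forms that occur in this blob.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
BLOB = b"\xEB\x05HELLO\xB8\x01\x00\x00\x00\xC3"   # same bytes as test.bin

def decode(code: bytes, pos: int):
    """Toy 32-bit decoder. Returns (length, text, kind, jump_target)."""
    op = code[pos]
    if op == 0xEB:                                   # jmp short rel8
        rel = int.from_bytes(code[pos + 1:pos + 2], "little", signed=True)
        target = pos + 2 + rel
        return 2, f"jmp short {target:#x}", "jmp", target
    if 0x40 <= op <= 0x47:                           # inc r32 (32-bit mode only)
        return 1, f"inc {REG32[op & 7]}", "other", None
    if 0x48 <= op <= 0x4F:                           # dec r32 (32-bit mode only)
        return 1, f"dec {REG32[op & 7]}", "other", None
    if 0xB8 <= op <= 0xBF:                           # mov r32, imm32
        imm = int.from_bytes(code[pos + 1:pos + 5], "little")
        return 5, f"mov {REG32[op & 7]},{imm:#x}", "other", None
    if op == 0xC3:
        return 1, "ret", "ret", None
    raise ValueError(f"opcode {op:#04x} not handled by this sketch")

def linear_sweep(code: bytes):
    """Decode every byte in order, regardless of control flow."""
    pos, listing = 0, []
    while pos < len(code):
        length, text, _, _ = decode(code, pos)
        listing.append((pos, text))
        pos += length
    return listing

def recursive_descent(code: bytes, entry: int = 0):
    """Decode only addresses reachable by following control flow from `entry`."""
    work, seen = [entry], {}
    while work:
        pos = work.pop()
        while pos not in seen and 0 <= pos < len(code):
            length, text, kind, target = decode(code, pos)
            seen[pos] = text
            if kind == "jmp":
                work.append(target)   # unconditional: follow target, stop this path
                break
            if kind == "ret":
                break
            pos += length
    return sorted(seen.items())

print(linear_sweep(BLOB)[1])      # (2, 'dec eax'): the 'H' of "HELLO"
print(recursive_descent(BLOB))    # [(0, 'jmp short 0x7'), (7, 'mov eax,0x1'), (12, 'ret')]
```

The linear sweep reports five bogus one-byte instructions for "HELLO", while the recursive descent never visits offsets 2-6 because nothing in the control flow leads there.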