x86 Assembly Series Part 1: Assembly Language Fundamentals & Toolchain Setup

February 6, 2026 Wasil Zafar 25 min read

Understand what assembly language really is, how it relates to machine code and micro-ops, and master the build pipeline from source to executable. Write your first assembly programs for Linux and Windows.

Table of Contents

  1. What is Assembly?
  2. The Build Pipeline
  3. Object File Formats
  4. First Programs

What Assembly Language Really Is

Core Understanding: Assembly language is a human-readable representation of machine code. Each assembly instruction typically maps to one CPU instruction, making it the lowest-level programming language that's still readable by humans.

When you write in a high-level language like Python or C++, your code gets transformed through multiple stages before the CPU can execute it. Assembly language sits just one step above raw binary machine code.

Concept

The Language Hierarchy

From human to machine:

  1. High-Level Languages (Python, Java, C++) → Human-friendly abstractions
  2. Assembly Language → Human-readable CPU instructions
  3. Machine Code → Binary opcodes the CPU decodes
  4. Micro-operations → Internal CPU operations (invisible to programmers)

Assembly vs Machine Code vs Micro-ops

These three levels of code representation are often confused. Let's clarify each with concrete examples:

Real-World Analogy: Think of it like cooking instructions:
  • Assembly = "Sauté the onions for 5 minutes" (human instructions)
  • Machine Code = Recipe in a foreign language you can't read (encoded instructions)
  • Micro-ops = Individual muscle movements to chop, stir, adjust flame (internal breakdown)

Assembly Language

Assembly is the human-readable representation using mnemonics (short names) for operations:

; Assembly instruction
mov     rax, 42         ; Move the value 42 into register RAX
add     rbx, rax        ; Add RAX to RBX, store result in RBX

Machine Code

Machine code is the binary encoding of those instructions—what the CPU actually reads:

Assembly:        mov rax, 42
Machine Code:    48 C7 C0 2A 00 00 00  (7 bytes)

Breakdown:
  48          REX.W prefix (selects 64-bit operand size)
  C7          Opcode for MOV r/m64, imm32
  C0          ModRM byte (mod=11, reg=/0, rm=000 selects RAX)
  2A 00 00 00 Immediate value 42 (0x2A), little-endian, sign-extended to 64 bits
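
To make the byte breakdown concrete, here is a minimal Python sketch that decodes this one encoding by hand. The function name decode_mov_r64_imm32 and the REGISTERS table are illustrative choices, not part of any library, and the sketch only handles the register-direct form of MOV r/m64, imm32 shown above. Note that the third byte (C0) is treated as a ModRM byte whose low three bits select the register.

```python
import struct

# Register numbers 0-7 as encoded in the ModRM r/m field (without REX.B)
REGISTERS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_mov_r64_imm32(code: bytes) -> dict:
    """Decode REX.W + C7 /0 imm32 (MOV r/m64, imm32), register-direct form only."""
    if len(code) != 7:
        raise ValueError("this sketch only handles the 7-byte encoding")
    rex, opcode, modrm = code[0], code[1], code[2]
    if (rex & 0xF0) != 0x40:
        raise ValueError("expected a REX prefix (0x40-0x4F)")
    if opcode != 0xC7:
        raise ValueError("expected opcode C7")

    mod = modrm >> 6             # 11 = register-direct operand
    reg = (modrm >> 3) & 0b111   # opcode extension, must be /0 for MOV
    rm = modrm & 0b111           # low 3 bits of the destination register
    if mod != 0b11 or reg != 0:
        raise ValueError("not the register-direct C7 /0 form")

    rex_w = bool(rex & 0b1000)   # REX.W: 64-bit operand size
    rex_b = rex & 0b0001         # REX.B: extends r/m to reach r8-r15
    index = rm | (rex_b << 3)
    dest = REGISTERS[index] if index < 8 else f"r{index}"

    # The immediate is a signed 32-bit little-endian value
    (imm,) = struct.unpack_from("<i", code, 3)
    return {"rex_w": rex_w, "dest": dest, "imm": imm}

print(decode_mov_r64_imm32(bytes.fromhex("48C7C02A000000")))
```

Feeding it the seven bytes from the example should report RAX as the destination and 42 as the immediate.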

See It Yourself: Disassemble Machine Code

# Write machine code bytes to a file
echo -n -e '\x48\xc7\xc0\x2a\x00\x00\x00' > raw.bin

Linux

# Disassemble with objdump (-M intel selects Intel syntax; the default is AT&T)
objdump -D -b binary -m i386:x86-64 -M intel raw.bin

# Output shows:
# 0:  48 c7 c0 2a 00 00 00    mov    rax,0x2a

All Platforms (ships with NASM)

# Disassemble raw binary with ndisasm (comes with NASM)
ndisasm -b 64 raw.bin

# Output shows:
# 00000000  48C7C02A000000    mov rax,0x2a

macOS

# View raw bytes with hexdump
hexdump -C raw.bin

# Disassemble a compiled Mach-O binary with otool
# (assumes a project whose Makefile produces ./main)
make clean && make
otool -t -v ./main

Micro-operations (μops)

Modern x86 CPUs internally break down complex CISC instructions into simpler RISC-like micro-operations. This is invisible to programmers but affects performance:

Assembly instruction:   add [rbx], rax
                        (Add RAX to memory at address in RBX)

Internal μops (simplified):
  1. μop: Load value from memory address in RBX → temp register
  2. μop: Add RAX + temp → result register  
  3. μop: Store result → memory address in RBX

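This single assembly instruction becomes at least 3 micro-operations internally! (Real cores often split the store further into separate store-address and store-data μops.)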
Why Micro-ops Matter: Instructions that look simple might decompose into many μops, affecting:
  • Execution latency (cycles to complete)
  • Throughput (how many per cycle)
  • Out-of-order execution scheduling
We'll explore this deeply in Part 22: Advanced Optimization.
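
As a rough mental model only, the load / add / store split can be mimicked in Python. This is a toy: the function add_mem_reg, the dict-based memory, and the trace list are all hypothetical, and a real CPU's μop breakdown is microarchitecture-specific.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def add_mem_reg(memory: dict, regs: dict, addr_reg: str, src_reg: str) -> list:
    """Model `add [addr_reg], src_reg` as three simplified internal steps."""
    trace = []
    addr = regs[addr_reg]

    temp = memory[addr]                       # step 1: load from memory
    trace.append("load")

    result = (temp + regs[src_reg]) & MASK64  # step 2: add, wrapping at 64 bits
    trace.append("add")

    memory[addr] = result                     # step 3: store back to memory
    trace.append("store")
    return trace

memory = {0x1000: 5}
regs = {"rbx": 0x1000, "rax": 7}
print(add_mem_reg(memory, regs, "rbx", "rax"), memory[0x1000])
```

The point is only that one architectural instruction implies several dependent internal steps, each of which takes time and execution resources.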

Quick Comparison Table

Aspect        | Assembly              | Machine Code      | Micro-ops
--------------|-----------------------|-------------------|-------------------------
Form          | Text mnemonics        | Binary bytes      | Internal CPU signals
Created by    | Programmer            | Assembler         | CPU decoder
Visible to    | Programmers           | CPU, debuggers    | CPU only (mostly)
Relationship  | 1:1 with machine code | 1:1 with assembly | 1:many from machine code
Documentation | Intel/AMD manuals     | Intel/AMD manuals | Agner Fog, uops.info

Why and When Assembly Matters

Use Cases

When You Need Assembly

  • OS Kernels: Boot code, interrupt handlers, context switching
  • Embedded Systems: Direct hardware control, minimal footprint
  • Performance-Critical Code: SIMD optimizations, cryptography
  • Reverse Engineering: Malware analysis, security research
  • Compiler Development: Understanding code generation

The Build Pipeline

Understanding how assembly code becomes an executable is crucial. The pipeline consists of distinct stages:

Pipeline: Source (.asm) → Assemble → Object (.o/.obj) → Link → Executable → Load → Execute

Stage 1: Assemble

The assembler converts your assembly source into an object file containing machine code and metadata:

# NASM assembling to ELF64 object file
nasm -f elf64 program.asm -o program.o

# NASM assembling to Win64 object file
nasm -f win64 program.asm -o program.obj

Stage 2: Link

The linker combines object files and resolves symbols to create an executable:

# Linux: Link with ld
ld -o program program.o

# Windows: Link with MSVC linker
link /SUBSYSTEM:CONSOLE program.obj
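
What "resolves symbols" means can be sketched with a toy linker in Python. Everything here is a simplifying assumption: the dict-based "object file" layout, the link function, and the single relocation type (a 4-byte absolute address) are made up for illustration. Real linkers also handle sections, alignment, and many relocation kinds.

```python
import struct

def link(objects: list, base: int = 0x401000):
    """Lay objects out back to back, then patch each relocation slot
    with the final address of the symbol it refers to."""
    symbols = {}
    placed = []
    addr = base
    # Pass 1: assign addresses and collect every defined symbol
    for obj in objects:
        placed.append(obj)
        for name, offset in obj["symbols"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = addr + offset
        addr += len(obj["code"])

    # Pass 2: patch relocations (4-byte little-endian absolute addresses)
    image = bytearray()
    for obj in placed:
        code = bytearray(obj["code"])
        for offset, name in obj["relocs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            struct.pack_into("<I", code, offset, symbols[name])
        image += code
    return bytes(image), symbols

# "mov eax, imm32; ret" with the immediate left as a hole for the address of msg
main_obj = {"code": b"\xB8\x00\x00\x00\x00\xC3",
            "symbols": {"_start": 0},
            "relocs": [(1, "msg")]}
data_obj = {"code": b"Hi\n", "symbols": {"msg": 0}, "relocs": []}

image, symbols = link([main_obj, data_obj])
print(hex(symbols["msg"]), image.hex())
```

The two-pass shape (collect definitions first, patch references second) is the part that carries over to real tools; the data layout does not.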

Stage 3: Load & Execute

When you run a program, the operating system's loader performs several crucial steps before your code executes:

What the Loader Does:
  1. Parse executable header (ELF/PE) to understand memory layout
  2. Create process with new virtual address space
  3. Map sections into memory with correct permissions (read/write/execute)
  4. Perform relocations (adjust addresses if loaded at different base)
  5. Load shared libraries and resolve dynamic symbols
  6. Set up stack with command-line arguments and environment
  7. Transfer control to entry point (_start; in C programs the runtime's _start then calls main)

Process Memory Layout

After loading, your process has a specific memory layout:

High Address (e.g., 0x7FFF...)
    ┌─────────────────────────┐
    │       Stack             │  ← Grows downward (RSP)
    │         ↓               │     Local variables, return addresses
    │                         │
    │      (unmapped)         │
    │                         │
    │         ↑               │
    │       Heap              │  ← Grows upward (brk/mmap)
    │                         │     Dynamic allocations
    ├─────────────────────────┤
    │       .bss              │  ← Uninitialized data (zeroed)
    ├─────────────────────────┤
    │       .data             │  ← Initialized data (read/write)
    ├─────────────────────────┤
    │       .rodata           │  ← Read-only data (constants, strings)
    ├─────────────────────────┤
    │       .text             │  ← Code section (read/execute)
    └─────────────────────────┘
Low Address (e.g., 0x400000)

Examining the Load Process

# Trace system calls during program load (Linux)
strace ./hello 2>&1 | head -30

# For the statically linked hello from this article you'll see just three calls:
# execve("./hello", ["./hello"], ...) = 0  - Kernel loads and maps the ELF segments
# write(1, "Hello, World!\n", 14) = 14     - Our actual syscall!
# exit(0)                                  - Our exit syscall (number 60)
#
# Dynamically linked programs show more: the dynamic linker issues
# openat/mmap/mprotect calls to load shared libraries before main runs.

Exercise: Watch Your Program Load

# Use GDB to observe the loader
gdb ./hello

(gdb) starti              # Stop at the very first instruction
                          # (For a dynamically linked program this is in the
                          #  dynamic linker; for our static hello it is _start)

(gdb) info proc mappings  # Show memory map
(gdb) info files          # Show loaded sections

(gdb) break _start        # Break at your entry point
(gdb) continue            # Run to _start
(gdb) x/10i $rip          # Examine your code!

Entry Point vs Main

There's often confusion about where execution actually begins:

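Program Type                  | True Entry Point     | Notes
------------------------------|----------------------|------------------------
Pure assembly (no libc)       | _start               | Directly to your code
C program (with libc)         | _start (in crt0)     | C runtime calls main()
Dynamically linked executable | Dynamic linker first | Then to _start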

Object File Formats

ELF (Executable and Linkable Format)

Used by Linux, BSD, and many Unix-like systems. ELF files contain organized sections for code, data, symbols, and relocation information.

ELF Structure Overview

┌─────────────────────────────────┐
│         ELF Header              │  64 bytes (64-bit)
│    Magic: 7F 45 4C 46           │  Identifies as ELF
│    Class: 64-bit                │  
│    Entry Point: 0x401000        │  Where execution begins
├─────────────────────────────────┤
│      Program Headers            │  Describe segments for loading
│   (how to load into memory)     │  
├─────────────────────────────────┤
│      Section Headers            │  Describe sections for linking
│   (logical organization)        │  
├─────────────────────────────────┤
│         .text                   │  Executable code
├─────────────────────────────────┤
│        .rodata                  │  Read-only data (strings)
├─────────────────────────────────┤
│         .data                   │  Initialized read/write data
├─────────────────────────────────┤
│         .bss                    │  Uninitialized data (zeroed)
├─────────────────────────────────┤
│        .symtab                  │  Symbol table
├─────────────────────────────────┤
│        .strtab                  │  String table (symbol names)
└─────────────────────────────────┘

Examining ELF Files

# View ELF header
readelf -h hello

# Output shows:
#   Magic:   7f 45 4c 46 02 01 01 00 ...
#   Type:    EXEC (Executable file)
#   Entry point address: 0x401000

# View section headers
readelf -S hello

# View program headers (segments)
readelf -l hello

# Disassemble .text section
objdump -d hello

# View all symbols
nm hello

Exercise: Decode the ELF Magic

# Read first 16 bytes of any ELF file
xxd -l 16 hello

# 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
#
# Byte 0      7f        Magic: 0x7F
# Byte 1      45        'E'
# Byte 2      4c        'L'
# Byte 3      46        'F'
# Byte 4      02        Class (2 = 64-bit)
# Byte 5      01        Endianness (1 = little)
# Byte 6      01        ELF version (1)
# Byte 7      00        OS/ABI (0 = System V)
# Bytes 8-15  00 ...    ABI version and padding
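
The same decoding can be done programmatically. The sketch below unpacks the 64-byte ELF64 header with Python's struct module. The function name parse_elf64_header and the returned dict keys are illustrative, and the sample header is built by hand with assumed values (entry point 0x401000, one program header) so the example runs without a real binary.

```python
import struct

# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

def parse_elf64_header(data: bytes) -> dict:
    """Parse a little-endian ELF64 file header (first 64 bytes of the file)."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("need at least 64 bytes")
    fields = ELF64_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad magic, not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    return {
        "class": ident[4],      # 2 = 64-bit
        "data": ident[5],       # 1 = little-endian
        "version": ident[6],
        "osabi": ident[7],      # 0 = System V
        "type": fields[1],      # 2 = ET_EXEC
        "machine": fields[2],   # 62 (0x3E) = x86-64
        "entry": fields[4],
        "phoff": fields[5],
        "phnum": fields[10],
    }

# Hand-built sample header (assumed values, for demonstration only)
sample = ELF64_HEADER.pack(
    b"\x7fELF\x02\x01\x01\x00" + bytes(8),  # e_ident
    2, 62, 1,                               # ET_EXEC, EM_X86_64, version 1
    0x401000, 64, 0,                        # entry, phoff, shoff
    0, 64, 56, 1, 64, 0, 0)                 # flags, sizes and counts

header = parse_elf64_header(sample)
print(hex(header["entry"]), header["machine"])
```

To try it on a real file, pass the first 64 bytes instead, for example `parse_elf64_header(open("hello", "rb").read(64))`.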

PE (Portable Executable)

Used by Windows for .exe and .dll files. PE evolved from COFF format and maintains a DOS stub for backward compatibility.

PE Structure Overview

┌─────────────────────────────────┐
│         DOS Header              │  64 bytes
│    Magic: 4D 5A ("MZ")          │  Mark Zbikowski's initials!
│    PE offset at 0x3C            │  
├─────────────────────────────────┤
│         DOS Stub                │  "This program cannot be run..."
│    (Legacy compatibility)       │  
├─────────────────────────────────┤
│       PE Signature              │  "PE\0\0" (50 45 00 00)
├─────────────────────────────────┤
│       COFF Header               │  Machine type, section count
├─────────────────────────────────┤
│    Optional Header              │  Entry point, image base, etc.
│    (not actually optional!)     │  
├─────────────────────────────────┤
│     Section Headers             │  .text, .data, .rdata, etc.
├─────────────────────────────────┤
│         .text                   │  Executable code
├─────────────────────────────────┤
│        .rdata                   │  Read-only data, imports
├─────────────────────────────────┤
│         .data                   │  Initialized data
├─────────────────────────────────┤
│        .idata                   │  Import table (DLL references)
└─────────────────────────────────┘
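
The chain from the DOS header to the COFF header can be followed in a few lines of Python. This is a minimal sketch: parse_pe_headers is a hypothetical helper, and the sample bytes are synthesized with assumed values (PE offset 0x80, three sections) rather than taken from a real executable.

```python
import struct

def parse_pe_headers(data: bytes) -> dict:
    """Follow DOS header -> PE signature -> COFF header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The 4-byte field at 0x3C holds the file offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature
    (machine, num_sections, _timestamp, _symtab, _nsyms,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {"pe_offset": pe_offset,
            "machine": machine,            # 0x8664 = x86-64
            "sections": num_sections,
            "optional_header_size": opt_header_size,
            "characteristics": characteristics}

# Synthetic image: DOS header and stub area, then signature and COFF header
dos = bytearray(0x80)
dos[0:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x80)
coff = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 240, 0x22)
sample = bytes(dos) + b"PE\0\0" + coff

info = parse_pe_headers(sample)
print(hex(info["machine"]), info["sections"])
```

On a real file the same function could be given the bytes of hello.exe; the optional header and section headers that follow are not parsed here.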

Examining PE Files

# Windows: Use dumpbin (from Visual Studio)
dumpbin /headers hello.exe
dumpbin /disasm hello.exe
dumpbin /imports hello.exe

# Linux: Use objdump with PE support
objdump -x hello.exe

# Or use pe-parse/pefile (Python)
pip install pefile
python -c "import pefile; pe = pefile.PE('hello.exe'); print(pe.dump_info())"
History Note: "MZ" in the DOS header stands for Mark Zbikowski, an architect of MS-DOS. Every Windows executable still starts with these bytes—40+ years of backward compatibility!

ELF vs PE Quick Comparison

Feature              | ELF (Linux)        | PE (Windows)
---------------------|--------------------|---------------------
Magic                | 7F 45 4C 46 (.ELF) | 4D 5A (MZ) + PE\0\0
Code section         | .text              | .text
Read-only data       | .rodata            | .rdata
Typical base address | 0x400000           | 0x140000000 (64-bit)
Analysis tool        | readelf, objdump   | dumpbin, PE-bear

Your First Assembly Programs

Hello World (Linux x86-64)

; hello.asm - Linux x86-64 Hello World
; Assemble: nasm -f elf64 hello.asm -o hello.o
; Link: ld hello.o -o hello
; Run: ./hello

section .data
    msg db "Hello, World!", 10    ; String with newline
    len equ $ - msg               ; Calculate string length

section .text
    global _start

_start:
    ; Write syscall
    mov rax, 1          ; syscall: write
    mov rdi, 1          ; fd: stdout
    mov rsi, msg        ; buffer address
    mov rdx, len        ; buffer length
    syscall

    ; Exit syscall
    mov rax, 60         ; syscall: exit
    xor rdi, rdi        ; status: 0
    syscall
Save & Compile: hello.asm

Linux

nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello

macOS (change _start to _main; syscall numbers differ, see Part 0)

nasm -f macho64 hello.asm -o hello.o
ld -macos_version_min 10.13 -e _main -static hello.o -o hello

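Windows (these commands assemble and link, but the result still issues Linux syscalls and will not run correctly on Windows; see hello_win.asm below for the native Windows version)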

nasm -f win64 hello.asm -o hello.obj
link /subsystem:console /entry:_start hello.obj /out:hello.exe
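
For readers more at home in a high-level language, the Linux program's write call corresponds closely to the Python sketch below. os.write is a thin wrapper over the same write system call; the names MSG, LEN and write_message are illustrative. The length works out to 14 bytes: 13 characters plus the newline.

```python
import os

MSG = b"Hello, World!\n"   # db "Hello, World!", 10
LEN = len(MSG)             # len equ $ - msg

def write_message(fd: int = 1) -> int:
    """Equivalent of: rax=1 (write), rdi=fd, rsi=msg, rdx=len, syscall."""
    return os.write(fd, MSG[:LEN])

if __name__ == "__main__":
    written = write_message()   # fd 1 is stdout
    raise SystemExit(0)         # plays the role of the exit syscall, status 0
```

The return value of os.write is the number of bytes written, which is what the kernel leaves in RAX after the syscall instruction.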

Hello World (Windows x86-64)

Windows uses a completely different approach—instead of direct syscalls, we call Win32 API functions through DLL imports:

; hello_win.asm - Windows x64 Console Hello World
; Assemble: nasm -f win64 hello_win.asm -o hello_win.obj
; Link: link /SUBSYSTEM:CONSOLE /ENTRY:main hello_win.obj kernel32.lib
; Or with GoLink: golink /console hello_win.obj kernel32.dll

bits 64
default rel

section .data
    msg db "Hello, World!", 13, 10, 0    ; CRLF + null terminator
    msg_len equ $ - msg - 1                ; Length without null

section .bss
    written resq 1                         ; Bytes written (output)

section .text
    global main
    extern GetStdHandle
    extern WriteConsoleA
    extern ExitProcess

main:
    ; Set up stack frame (shadow space required by Win64 ABI)
    sub rsp, 40                  ; 32 bytes shadow + 8 for the 5th argument (keeps RSP 16-byte aligned)

    ; GetStdHandle(STD_OUTPUT_HANDLE)
    mov rcx, -11                 ; STD_OUTPUT_HANDLE = -11
    call GetStdHandle
    mov rbx, rax                 ; Save handle in rbx

    ; WriteConsoleA(handle, buffer, length, &written, NULL)
    mov rcx, rbx                 ; Handle
    lea rdx, [msg]               ; Buffer pointer
    mov r8d, msg_len             ; Number of chars to write
    lea r9, [written]            ; Pointer to bytes written
    mov qword [rsp+32], 0        ; Reserved (5th param on stack)
    call WriteConsoleA

    ; ExitProcess(0)
    xor rcx, rcx                 ; Exit code 0
    call ExitProcess
Save & Compile: hello_win.asm

Windows

nasm -f win64 hello_win.asm -o hello_win.obj
link /SUBSYSTEM:CONSOLE /ENTRY:main hello_win.obj kernel32.lib
hello_win.exe

Alternative: golink /console hello_win.obj kernel32.dll

Win64 Calling Convention:
  • Parameters: RCX, RDX, R8, R9 for first 4 arguments
  • Shadow space: Always reserve 32 bytes on stack before calls
  • Stack alignment: Must be 16-byte aligned at CALL instruction
  • Return value: RAX
  • Caller-saved: RAX, RCX, RDX, R8-R11

Alternative approach using MASM syntax:

; hello_masm.asm - MASM syntax version
; Build: ml64 /c hello_masm.asm
; Link: link /SUBSYSTEM:CONSOLE /ENTRY:main hello_masm.obj kernel32.lib

extern GetStdHandle : proc
extern WriteConsoleA : proc  
extern ExitProcess : proc

.data
msg     db "Hello, World!", 13, 10, 0
msgLen  equ $ - msg - 1

.data?
written dq ?

.code
main proc
    sub rsp, 40                  ; Shadow space + alignment
    
    mov rcx, -11
    call GetStdHandle
    
    mov rcx, rax                 ; Handle
    lea rdx, msg                 ; Buffer
    mov r8d, msgLen              ; Length
    lea r9, written              ; Output count
    mov qword ptr [rsp+32], 0    ; Reserved
    call WriteConsoleA
    
    xor ecx, ecx
    call ExitProcess
main endp
end

Running on Bare Metal

The ultimate assembly experience: code that runs with no OS, directly on hardware (or emulator). This boot sector prints "Hi!" to the screen using BIOS interrupts:

; boot.asm - Simple boot sector (512 bytes, runs at 0x7C00)
; Assemble: nasm -f bin boot.asm -o boot.bin
; Run: qemu-system-x86_64 -drive format=raw,file=boot.bin

bits 16                     ; 16-bit real mode
org 0x7C00                  ; BIOS loads us here

start:
    ; Set up segments (BIOS doesn't guarantee these)
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00          ; Stack below our code

    ; Clear screen (BIOS video interrupt)
    mov ax, 0x0003          ; 80x25 text mode
    int 0x10

    ; Print 'H'
    mov ah, 0x0E            ; Teletype output
    mov al, 'H'
    int 0x10

    ; Print 'i'
    mov al, 'i'
    int 0x10

    ; Print '!'
    mov al, '!'
    int 0x10

.halt:
    hlt                     ; Halt CPU (saves power)
    jmp .halt               ; Loop forever if interrupted

; Boot sector signature (must be at bytes 510-511)
times 510 - ($ - $$) db 0   ; Pad with zeros
dw 0xAA55                   ; Boot signature
Save & Compile: boot.asm

All Platforms (flat binary — no OS-specific linking)

nasm -f bin boot.asm -o boot.bin
qemu-system-x86_64 -drive format=raw,file=boot.bin
Boot Process Explained:
  1. Power on: CPU starts in 16-bit real mode at 0xFFFF0 (BIOS ROM)
  2. BIOS POST: Initializes hardware, tests memory
  3. Boot search: BIOS reads first 512 bytes from boot device
  4. Signature check: Last two bytes must be 0x55, 0xAA
  5. Load & jump: BIOS loads sector to 0x7C00 and jumps there
  6. Your code runs! You now control the entire machine
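
The padding and signature rules from steps 3 and 4 can be checked outside the assembler. The Python sketch below builds a 512-byte sector around a tiny code fragment and validates it. The helpers make_boot_sector and is_bootable are illustrative names, and the three code bytes (hlt followed by a short jump back to it) stand in for a real program.

```python
import struct

SECTOR_SIZE = 512

def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes and append the boot signature,
    mirroring `times 510 - ($ - $$) db 0` and `dw 0xAA55`."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code does not fit in a boot sector")
    padding = bytes(SECTOR_SIZE - 2 - len(code))
    signature = struct.pack("<H", 0xAA55)   # little-endian: bytes 55 AA
    return code + padding + signature

def is_bootable(sector: bytes) -> bool:
    """Apply the BIOS check: 512 bytes, ending in 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xAA"

# F4 = hlt, EB FD = jmp short -3 (back to the hlt)
sector = make_boot_sector(b"\xF4\xEB\xFD")
print(len(sector), sector[-2:].hex(), is_bootable(sector))
```

This also shows why the source writes the signature as the word 0xAA55 while the list above gives the bytes as 0x55, 0xAA: x86 stores the low byte first.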

Slightly More Practical Boot Sector

; boot_msg.asm - Boot sector with string printing
; Assemble & run same as above

bits 16
org 0x7C00

start:
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00          ; Stack below our code (call and pusha need one)

    ; Print welcome message
    mov si, welcome_msg
    call print_string

    ; Print hex value demo
    mov si, hex_msg
    call print_string
    mov ax, 0xDEAD
    call print_hex

    jmp $                   ; Infinite loop ($ = current address)

; Print null-terminated string from SI
print_string:
    pusha                   ; Save all registers
    mov ah, 0x0E            ; BIOS teletype
.loop:
    lodsb                   ; Load byte from [SI] into AL, increment SI
    test al, al             ; Check for null terminator
    jz .done
    int 0x10                ; Print character
    jmp .loop
.done:
    popa
    ret

; Print AX as 4 hex digits
print_hex:
    pusha
    mov cx, 4               ; 4 hex digits
.loop:
    rol ax, 4               ; Rotate left, bringing high nibble to low
    mov bx, ax
    and bx, 0x0F            ; Isolate low nibble
    mov bl, [hex_chars + bx]; Convert to ASCII
    push ax
    mov ah, 0x0E
    mov al, bl
    int 0x10
    pop ax
    loop .loop
    popa
    ret

hex_chars: db "0123456789ABCDEF"
welcome_msg: db "Boot sector loaded!", 13, 10, 0
hex_msg: db "Value: 0x", 0

times 510 - ($ - $$) db 0
dw 0xAA55
Save & Compile: boot_msg.asm

All Platforms (flat binary — no OS-specific linking)

nasm -f bin boot_msg.asm -o boot_msg.bin
qemu-system-x86_64 -drive format=raw,file=boot_msg.bin
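
The rotate-and-mask trick in print_hex is easier to follow when traced in a high-level language. This Python sketch models the loop under the assumption of a 16-bit AX; rol16 and hex_digits are illustrative names.

```python
HEX_CHARS = "0123456789ABCDEF"

def rol16(value: int, count: int) -> int:
    """Rotate a 16-bit value left by count bits, like `rol ax, count`."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & 0xFFFF

def hex_digits(ax: int) -> str:
    """Mirror print_hex: four rounds of rotate, mask the low nibble, look up."""
    out = []
    for _ in range(4):                        # mov cx, 4 ... loop .loop
        ax = rol16(ax, 4)                     # high nibble moves to the low nibble
        out.append(HEX_CHARS[ax & 0x0F])      # and bx, 0x0F ; [hex_chars + bx]
    return "".join(out)

print(hex_digits(0xDEAD))
```

Because each rotate brings the next-highest nibble into the low position, the digits come out most significant first, and after four rotations AX is back to its original value.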

Exercise: Your First Boot Sector

# Create and test your boot sector
nasm -f bin boot.asm -o boot.bin

# Verify size (should be exactly 512 bytes)
ls -la boot.bin

# Verify boot signature
xxd boot.bin | tail -1
# Should end with: .... 55aa

# Run in QEMU (no OS, no drivers - pure bare metal!)
qemu-system-x86_64 -drive format=raw,file=boot.bin

# Debug with QEMU + GDB
qemu-system-x86_64 -drive format=raw,file=boot.bin -s -S &
gdb -ex "target remote localhost:1234" -ex "set architecture i8086"

Challenge: Modify the boot sector to print your name, then print it in a different color (hint: use AH=0x09 with BL for color attribute).

Next Steps

Now that you understand what assembly is and how the build pipeline works, we'll dive into the CPU architecture that executes these instructions.
