x86 Assembly Series Part 1: Assembly Language Fundamentals & Toolchain Setup

February 6, 2026 Wasil Zafar 25 min read

Understand what assembly language really is, how it relates to machine code and micro-ops, and master the build pipeline from source to executable. Write your first assembly programs for Linux and Windows.

What is Assembly?
The Build Pipeline
Object File Formats
- ELF (Linux)
- PE (Windows)
First Programs

What Assembly Language Really Is

                        
Core Understanding: Assembly language is a human-readable representation of machine code. Each assembly instruction typically maps to one CPU instruction, making it the lowest-level programming language that's still readable by humans.
                    


When you write in a high-level language like C++, the compiler transforms your code through multiple stages before the CPU can execute it (interpreted languages like Python run inside an interpreter instead). Assembly language sits just one step above raw binary machine code.

[Figure: The programming language hierarchy, from human-readable high-level languages down through assembly to binary machine code executed by the CPU]

Concept

The Language Hierarchy

From human to machine:

High-Level Languages (Python, Java, C++) → Human-friendly abstractions
Assembly Language → Human-readable CPU instructions
Machine Code → Binary opcodes the CPU decodes
Micro-operations → Internal CPU operations (invisible to programmers)

Assembly vs Machine Code vs Micro-ops

These three levels of code representation are often confused. Let's clarify each with concrete examples:

                        
Real-World Analogy: Think of it like cooking instructions:
Assembly = "Sauté the onions for 5 minutes" (human instructions)
Machine Code = Recipe in a foreign language you can't read (encoded instructions)
Micro-ops = Individual muscle movements to chop, stir, adjust flame (internal breakdown)

Assembly Language

Assembly is the human-readable representation using mnemonics (short names) for operations:

; Assembly instruction
mov     rax, 42         ; Move the value 42 into register RAX
add     rbx, rax        ; Add RAX to RBX, store result in RBX

Machine Code

Machine code is the binary encoding of those instructions—what the CPU actually reads:

Assembly:        mov rax, 42
Machine Code:    48 C7 C0 2A 00 00 00  (7 bytes)

Breakdown:
  48          REX prefix with W=1 (64-bit operand size)
  C7          Opcode: MOV r/m64, imm32 (immediate sign-extended to 64 bits)
  C0          ModR/M byte: mod=11 (register operand), reg=000 (/0), r/m=000 (RAX)
  2A 00 00 00 Immediate value 42 (0x2A) in little-endian
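The same breakdown can be reproduced in a few lines of code. The Python sketch below is an illustration, not an assembler: the helper name `encode_mov_r64_imm32` is hypothetical, and it only handles the eight legacy registers RAX through RDI, because R8 to R15 would also need the REX.B bit.

```python
import struct

# Register numbers placed in the ModR/M r/m field (legacy registers only).
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

def encode_mov_r64_imm32(reg: str, imm: int) -> bytes:
    """Encode 'mov reg, imm' as REX.W + C7 /0 + imm32 (sign-extended to 64 bits)."""
    if reg not in REG64:
        raise ValueError("sketch only handles rax..rdi (r8-r15 need REX.B)")
    if not -2**31 <= imm < 2**31:
        raise ValueError("immediate must fit in a signed 32-bit field")
    rex_w = 0x48                                   # 0100 W=1 R=0 X=0 B=0
    opcode = 0xC7                                  # MOV r/m64, imm32
    modrm = (0b11 << 6) | (0 << 3) | REG64[reg]    # mod=11, reg=/0, r/m=register
    return bytes([rex_w, opcode, modrm]) + struct.pack("<i", imm)  # little-endian imm32

print(encode_mov_r64_imm32("rax", 42).hex(" "))    # 48 c7 c0 2a 00 00 00
```

Changing only the register changes only the ModR/M byte: `mov rbx, 42` comes out as `48 C7 C3 2A 00 00 00`.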

See It Yourself: Disassemble Machine Code

# Write machine code bytes to a file
echo -n -e '\x48\xc7\xc0\x2a\x00\x00\x00' > raw.bin

Linux

# Disassemble with objdump (-M intel selects Intel syntax; the default is AT&T)
objdump -D -b binary -m i386:x86-64 -M intel raw.bin

# Output shows:
# 0:  48 c7 c0 2a 00 00 00    mov    rax,0x2a

All Platforms (ships with NASM)

# Disassemble raw binary with ndisasm (comes with NASM)
ndisasm -b 64 raw.bin

# Output shows:
# 00000000  48C7C02A000000    mov rax,0x2a

macOS

# View raw bytes with hexdump
hexdump -C raw.bin

# otool reads Mach-O files, not raw.bin: rebuild a Mach-O program
# (this assumes your own Makefile producing ./main), then disassemble its code
make clean && make
otool -t -v ./main

Micro-operations (μops)

Modern x86 CPUs internally break down complex CISC instructions into simpler RISC-like micro-operations. This is invisible to programmers but affects performance:

Assembly instruction:   add [rbx], rax
                        (Add RAX to memory at address in RBX)

Internal μops (simplified):
  1. μop: Load value from memory address in RBX → temp register
  2. μop: Add RAX + temp → result register  
  3. μop: Store result → memory address in RBX

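Conceptually, this single assembly instruction performs three internal steps. The exact μop count depends on the microarchitecture: many Intel cores split the store into separate store-address and store-data μops, and micro-fusion can then track pairs of them as a single entry.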
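A toy model can make the load/add/store split tangible. The Python sketch below is purely illustrative: `add_mem_reg_uops` is a hypothetical helper, registers and memory are plain dictionaries, and real hardware does not expose μops this way. It executes `add [rbx], rax` as three explicit steps and records each one.

```python
MASK64 = (1 << 64) - 1   # registers and memory slots wrap at 64 bits

def add_mem_reg_uops(regs: dict, memory: dict, addr_reg: str, src_reg: str) -> list:
    """Toy model: run 'add [addr_reg], src_reg' as load, ALU, and store steps."""
    trace = []
    addr = regs[addr_reg]
    tmp = memory.get(addr, 0)                      # step 1: load [addr_reg] into a temp
    trace.append(f"load  tmp <- [{addr:#x}] = {tmp}")
    result = (tmp + regs[src_reg]) & MASK64        # step 2: ALU add, wrapped to 64 bits
    trace.append(f"add   res <- tmp + {src_reg} = {result}")
    memory[addr] = result                          # step 3: store the result back
    trace.append(f"store [{addr:#x}] <- res")
    return trace

regs = {"rax": 5, "rbx": 0x1000}
memory = {0x1000: 37}
for step in add_mem_reg_uops(regs, memory, "rbx", "rax"):
    print(step)
print(memory[0x1000])   # 42
```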

Why Micro-ops Matter: Instructions that look simple might decompose into many μops, affecting:

Execution latency (cycles to complete)
Throughput (how many per cycle)
Out-of-order execution scheduling

We'll explore this deeply in Part 22: Advanced Optimization.

Quick Comparison Table

Aspect	Assembly	Machine Code	Micro-ops
Form	Text mnemonics	Binary bytes	Internal CPU signals
Created by	Programmer	Assembler	CPU decoder
Visible to	Programmers	CPU, debuggers	CPU only (mostly)
Relationship	Usually 1:1 with machine code	Usually 1:1 with assembly	Often 1:many (macro-fusion can also merge two instructions into one μop)
Documentation	Intel/AMD manuals	Intel/AMD manuals	Agner Fog, uops.info

Why and When Assembly Matters

Use Cases

When You Need Assembly

OS Kernels: Boot code, interrupt handlers, context switching
Embedded Systems: Direct hardware control, minimal footprint
Performance-Critical Code: SIMD optimizations, cryptography
Reverse Engineering: Malware analysis, security research
Compiler Development: Understanding code generation

The Build Pipeline

Understanding how assembly code becomes an executable is crucial. The pipeline consists of distinct stages:

[Figure: The assembly build pipeline: source (.asm) is assembled into object files (.o), then linked into a final executable ready for the OS loader]

                        
Pipeline: Source (.asm) → Assemble → Object (.o/.obj) → Link → Executable → Load → Execute
                    

Stage 1: Assemble

The assembler converts your assembly source into an object file containing machine code and metadata:

# NASM assembling to ELF64 object file
nasm -f elf64 program.asm -o program.o

# NASM assembling to Win64 object file
nasm -f win64 program.asm -o program.obj
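Both commands emit the same machine code wrapped in different containers, and the container can be identified from its first bytes. The Python helper below is a hedged sketch (the name `identify_object` is hypothetical) that recognizes the formats used in this series: ELF (`7F 45 4C 46`), an MZ/PE executable, 64-bit Mach-O (`CF FA ED FE` on disk), and an x86-64 COFF object, whose first field is the machine type 0x8664.

```python
import struct

def identify_object(data: bytes) -> str:
    """Guess the container format from the first bytes of an object file or executable."""
    if data[:4] == b"\x7fELF":
        ei_class = data[4] if len(data) > 4 else None
        return "ELF" + {1: "32", 2: "64"}.get(ei_class, "?")   # EI_CLASS byte
    if data[:2] == b"MZ":
        return "PE/MZ executable"
    if data[:4] == b"\xcf\xfa\xed\xfe":                       # 0xFEEDFACF stored little-endian
        return "Mach-O 64"
    if len(data) >= 2 and struct.unpack_from("<H", data)[0] == 0x8664:
        return "COFF object (x86-64)"                          # IMAGE_FILE_MACHINE_AMD64
    return "unknown"
```

Passing the first 16 bytes of `program.o` should report `ELF64` for the `-f elf64` output, and the COFF result for the `-f win64` output.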

Stage 2: Link

The linker combines object files and resolves symbols to create an executable:

# Linux: Link with ld
ld -o program program.o

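# Windows: Link with the MSVC linker (without the C runtime, /ENTRY must
# name your entry label, as the Windows examples below do)
link /SUBSYSTEM:CONSOLE /ENTRY:main program.obj kernel32.lib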
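Symbol resolution is essentially a lookup across all input objects. The Python sketch below models only that step (real linkers also merge sections and apply relocations); the `ObjectFile` class, the `resolve` function, and the `main.o`/`util.o`/`print_msg` names are hypothetical examples.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    name: str
    defines: set = field(default_factory=set)   # global symbols this object provides
    needs: set = field(default_factory=set)     # undefined symbols it references

def resolve(objects: list) -> dict:
    """Toy linker pass: map every referenced symbol to the object that defines it."""
    table = {}
    for obj in objects:
        for sym in obj.defines:
            if sym in table:
                raise ValueError(f"duplicate symbol {sym!r} in {table[sym]} and {obj.name}")
            table[sym] = obj.name
    missing = sorted({s for obj in objects for s in obj.needs if s not in table})
    if missing:
        raise ValueError(f"undefined reference(s): {', '.join(missing)}")
    return {s: table[s] for obj in objects for s in obj.needs}

objs = [ObjectFile("main.o", defines={"_start"}, needs={"print_msg"}),
        ObjectFile("util.o", defines={"print_msg"})]
print(resolve(objs))   # {'print_msg': 'util.o'}
```

The two error paths mirror the classic linker failures: a symbol defined twice, or an "undefined reference" that no object provides.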

Stage 3: Load & Execute

When you run a program, the operating system's loader performs several crucial steps before your code executes:

[Figure: How the OS loader maps an executable into memory: parsing headers, creating the process address space, and setting up stack, heap, and code segments]

                        
What the Loader Does:
Parse executable header (ELF/PE) to understand memory layout
Create process with new virtual address space
Map sections into memory with correct permissions (read/write/execute)
Perform relocations (adjust addresses if loaded at different base)
Load shared libraries and resolve dynamic symbols
Set up stack with command-line arguments and environment
Transfer control to entry point (_start or main)
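The relocation step is the least intuitive one. The Python sketch below models one common kind of fixup, an `R_X86_64_RELATIVE`-style relocation, whose value is simply "load base + addend" written as a little-endian 64-bit pointer. The function name, the fake 16-byte image, and the base address are hypothetical; real loaders read relocation entries from tables stored in the executable.

```python
import struct

def apply_relative_relocs(image: bytearray, relocs: list, load_base: int) -> None:
    """Patch 64-bit slots in place: image[offset:offset+8] = load_base + addend.

    Models an R_X86_64_RELATIVE-style fixup; each reloc is (offset, addend).
    """
    for offset, addend in relocs:
        value = (load_base + addend) & (2**64 - 1)
        struct.pack_into("<Q", image, offset, value)   # little-endian 64-bit pointer

image = bytearray(16)                  # pretend 16-byte data section
relocs = [(8, 0x2000)]                 # slot at offset 8 must point to base + 0x2000
apply_relative_relocs(image, relocs, 0x555555554000)
print(hex(struct.unpack_from("<Q", image, 8)[0]))   # 0x555555556000
```

Loading the same image at a different base only changes `load_base`; the addends stored in the file stay the same.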

                    

Process Memory Layout

After loading, your process has a specific memory layout:

High Address (e.g., 0x7FFF...)
    ┌─────────────────────────┐
    │       Stack             │  ← Grows downward (RSP)
    │         ↓               │     Local variables, return addresses
    │                         │
    │      (unmapped)         │
    │                         │
    │         ↑               │
    │       Heap              │  ← Grows upward (brk/mmap)
    │                         │     Dynamic allocations
    ├─────────────────────────┤
    │       .bss              │  ← Uninitialized data (zeroed)
    ├─────────────────────────┤
    │       .data             │  ← Initialized data (read/write)
    ├─────────────────────────┤
    │       .rodata           │  ← Read-only data (constants, strings)
    ├─────────────────────────┤
    │       .text             │  ← Code section (read/execute)
    └─────────────────────────┘
Low Address (e.g., 0x400000)

Examining the Load Process

# Trace system calls during program load (Linux)
strace ./hello 2>&1 | head -30

# For the static, libc-free hello from this article you should see only:
# execve("./hello", ...)          - The kernel itself maps the ELF segments
# write(1, "Hello, World!\n", 14) - Our actual syscall!
# exit(0)                         - Syscall 60 (libc programs use exit_group)
#
# A dynamically linked program shows more: the dynamic linker opening
# shared libraries (openat ... libc.so.6) and mapping them with mmap(...).

Exercise: Watch Your Program Load

# Use GDB to observe the loader
gdb ./hello

(gdb) starti              # Stop at the very first user-space instruction
                          # (for this static hello that is already _start;
                          #  a dynamically linked program stops in the dynamic linker)

(gdb) info proc mappings  # Show memory map
(gdb) info files          # Show loaded sections

(gdb) x/10i $rip          # Examine the instructions about to run

# Dynamically linked program? Run to your own entry point first:
(gdb) break _start
(gdb) continue
(gdb) x/10i $rip

Entry Point vs Main

There's often confusion about where execution actually begins:

Program Type	True Entry Point	Notes
Pure assembly (no libc)	`_start`	Directly to your code
C program (with libc)	`_start` (in crt0)	C runtime calls `main()`
Dynamically linked (e.g., typical PIE)	Dynamic linker (`ld.so`) first	Then to `_start`

Object File Formats

ELF (Executable and Linkable Format)

Used by Linux, BSD, and many Unix-like systems. ELF files contain organized sections for code, data, symbols, and relocation information.

[Figure: ELF (Executable and Linkable Format) internal structure: header, program headers, section headers, and key sections (.text, .data, .bss, .symtab)]

ELF Structure Overview

┌─────────────────────────────────┐
│         ELF Header              │  64 bytes (64-bit)
│    Magic: 7F 45 4C 46           │  Identifies as ELF
│    Class: 64-bit                │  
│    Entry Point: 0x401000        │  Where execution begins
├─────────────────────────────────┤
│      Program Headers            │  Describe segments for loading
│   (how to load into memory)     │  
├─────────────────────────────────┤
│         .text                   │  Executable code
├─────────────────────────────────┤
│        .rodata                  │  Read-only data (strings)
├─────────────────────────────────┤
│         .data                   │  Initialized read/write data
├─────────────────────────────────┤
│         .bss                    │  Uninitialized data (no file bytes; zeroed at load)
├─────────────────────────────────┤
│        .symtab                  │  Symbol table
├─────────────────────────────────┤
│        .strtab                  │  String table (symbol names)
├─────────────────────────────────┤
│      Section Headers            │  Describe sections for linking
│   (usually at end of file)      │  
└─────────────────────────────────┘

Examining ELF Files

# View ELF header
readelf -h hello

# Output shows:
#   Magic:   7f 45 4c 46 02 01 01 00 ...
#   Type:    EXEC (Executable file)
#   Entry point address: 0x401000

# View section headers
readelf -S hello

# View program headers (segments)
readelf -l hello

# Disassemble .text section
objdump -d hello

# View all symbols
nm hello

Exercise: Decode the ELF Magic

# Read first 16 bytes of any ELF file
xxd -l 16 hello

# 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
#
# Offset  Bytes      Meaning
# 0       7F         Magic byte 0x7F
# 1-3     45 4C 46   'E' 'L' 'F'
# 4       02         Class (2 = 64-bit)
# 5       01         Data encoding (1 = little-endian)
# 6       01         ELF version (1)
# 7       00         OS/ABI (0 = System V)
# 8-15    00 ...     ABI version, then padding
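The same fields can be decoded programmatically. This Python sketch (the function name `parse_elf_ident` is hypothetical) checks the magic, reads bytes 4 to 7 (class, data encoding, version, OS/ABI), and then reads `e_type`, `e_machine`, and `e_entry`, which in an ELF64 header sit at offsets 16, 18, and 24. It assumes a 64-bit little-endian file and rejects anything else.

```python
import struct

def parse_elf_ident(header: bytes) -> dict:
    """Decode e_ident plus e_type, e_machine, and e_entry from an ELF64 header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_version, ei_osabi = header[4:8]
    if ei_class != 2 or ei_data != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    e_type, e_machine = struct.unpack_from("<HH", header, 16)   # right after e_ident
    (e_entry,) = struct.unpack_from("<Q", header, 24)           # after 4-byte e_version
    return {
        "class": "ELF64",
        "data": "little-endian",
        "version": ei_version,
        "osabi": ei_osabi,
        "type": {1: "REL", 2: "EXEC", 3: "DYN"}.get(e_type, e_type),
        "machine": "x86-64" if e_machine == 62 else e_machine,  # EM_X86_64 = 62
        "entry": hex(e_entry),
    }
```

Given the first 64 bytes of the static `hello` binary, it should report type `EXEC` and the same entry point that `readelf -h` prints.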

PE (Portable Executable)

Used by Windows for .exe and .dll files. PE evolved from COFF format and maintains a DOS stub for backward compatibility.

PE Structure Overview

┌─────────────────────────────────┐
│         DOS Header              │  64 bytes
│    Magic: 4D 5A ("MZ")          │  Mark Zbikowski's initials!
│    PE offset at 0x3C            │  
├─────────────────────────────────┤
│         DOS Stub                │  "This program cannot be run..."
│    (Legacy compatibility)       │  
├─────────────────────────────────┤
│       PE Signature              │  "PE\0\0" (50 45 00 00)
├─────────────────────────────────┤
│       COFF Header               │  Machine type, section count
├─────────────────────────────────┤
│    Optional Header              │  Entry point, image base, etc.
│    (not actually optional!)     │  
├─────────────────────────────────┤
│     Section Headers             │  .text, .data, .rdata, etc.
├─────────────────────────────────┤
│         .text                   │  Executable code
├─────────────────────────────────┤
│        .rdata                   │  Read-only data, imports
├─────────────────────────────────┤
│         .data                   │  Initialized data
├─────────────────────────────────┤
│        .idata                   │  Import table (DLL references)
└─────────────────────────────────┘

Examining PE Files

# Windows: Use dumpbin (from Visual Studio)
dumpbin /headers hello.exe
dumpbin /disasm hello.exe
dumpbin /imports hello.exe

# Linux: Use objdump with PE support
objdump -x hello.exe

# Or use pe-parse/pefile (Python)
pip install pefile
python -c "import pefile; pe = pefile.PE('hello.exe'); print(pe.dump_info())"
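Tools like `dumpbin` and `pefile` follow a short chain of offsets to find the PE headers. The Python sketch below walks that chain by hand: check `MZ`, read the 32-bit `e_lfanew` value at offset 0x3C, confirm `PE\0\0` there, then read the first two COFF header fields (Machine and NumberOfSections). The function name is hypothetical, and the sketch stops before the optional header.

```python
import struct

def parse_pe_headers(data: bytes) -> dict:
    """Follow e_lfanew from the DOS header to the PE signature and COFF file header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)            # offset of PE header
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE\\0\\0 signature")
    machine, num_sections = struct.unpack_from("<HH", data, e_lfanew + 4)  # COFF header
    return {
        "pe_offset": e_lfanew,
        "machine": "x86-64" if machine == 0x8664 else hex(machine),
        "sections": num_sections,
    }
```

Passing the full contents of `hello.exe` should report machine `x86-64`; the section count depends on the linker.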

                        
History Note: "MZ" in the DOS header stands for Mark Zbikowski, an architect of MS-DOS. Every Windows executable still starts with these bytes, a legacy of 40+ years of backward compatibility!
                    

ELF vs PE Quick Comparison

Feature	ELF (Linux)	PE (Windows)
Magic	`7F 45 4C 46` (.ELF)	`4D 5A` (MZ) + PE\0\0
Code section	`.text`	`.text`
Read-only data	`.rodata`	`.rdata`
Typical base address	`0x400000`	`0x140000000` (64-bit)
Analysis tool	`readelf`, `objdump`	`dumpbin`, PE-bear

Your First Assembly Programs

Hello World (Linux x86-64)

; hello.asm - Linux x86-64 Hello World
; Assemble: nasm -f elf64 hello.asm -o hello.o
; Link: ld hello.o -o hello
; Run: ./hello

section .data
    msg db "Hello, World!", 10    ; String with newline
    len equ $ - msg               ; Calculate string length

section .text
    global _start

_start:
    ; Write syscall
    mov rax, 1          ; syscall: write
    mov rdi, 1          ; fd: stdout
    mov rsi, msg        ; buffer address
    mov rdx, len        ; buffer length
    syscall

    ; Exit syscall
    mov rax, 60         ; syscall: exit
    xor rdi, rdi        ; status: 0
    syscall

Save & Compile: `hello.asm`

Linux

nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello

macOS (change _start → _main, syscall numbers differ — see Part 0)

nasm -f macho64 hello.asm -o hello.o
ld -macos_version_min 10.13 -e _main -static hello.o -o hello

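Windows (this assembles and links, but Linux syscall numbers mean nothing to the Windows kernel, so it will not run correctly; see hello_win.asm below for a native Windows version)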

nasm -f win64 hello.asm -o hello.obj
link /subsystem:console /entry:_start hello.obj /out:hello.exe

Hello World (Windows x86-64)

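Windows takes a completely different approach. Its system call numbers are undocumented and change between releases, so instead of issuing syscall directly, we call Win32 API functions imported from DLLs (here kernel32.dll):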

[Figure: Linux vs Windows system call approaches: Linux uses direct syscall instructions while Windows routes through Win32 API DLL imports]

; hello_win.asm - Windows x64 Console Hello World
; Assemble: nasm -f win64 hello_win.asm -o hello_win.obj
; Link: link /SUBSYSTEM:CONSOLE /ENTRY:main hello_win.obj kernel32.lib
; Or with GoLink: golink /console hello_win.obj kernel32.dll

bits 64
default rel

section .data
    msg db "Hello, World!", 13, 10, 0    ; CRLF + null terminator
    msg_len equ $ - msg - 1                ; Length without null

section .bss
    written resq 1                         ; Bytes written (output)

section .text
    global main
    extern GetStdHandle
    extern WriteConsoleA
    extern ExitProcess

main:
    ; Set up stack frame (shadow space required by Win64 ABI)
    sub rsp, 40                  ; 32 bytes shadow + 8 for alignment

    ; GetStdHandle(STD_OUTPUT_HANDLE)
    mov rcx, -11                 ; STD_OUTPUT_HANDLE = -11
    call GetStdHandle
    mov rbx, rax                 ; Save handle in rbx

    ; WriteConsoleA(handle, buffer, length, &written, NULL)
    mov rcx, rbx                 ; Handle
    lea rdx, [msg]               ; Buffer pointer
    mov r8d, msg_len             ; Number of chars to write
    lea r9, [written]            ; Pointer to bytes written
    mov qword [rsp+32], 0        ; Reserved (5th param on stack)
    call WriteConsoleA

    ; ExitProcess(0)
    xor rcx, rcx                 ; Exit code 0
    call ExitProcess

Save & Compile: `hello_win.asm`

Windows

nasm -f win64 hello_win.asm -o hello_win.obj
link /SUBSYSTEM:CONSOLE /ENTRY:main hello_win.obj kernel32.lib
hello_win.exe

Alternative: golink /console hello_win.obj kernel32.dll

                        
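Win64 Calling Convention:
Parameters: RCX, RDX, R8, R9 for the first 4 integer/pointer arguments; the 5th onward go on the stack above the shadow space ([rsp+32], [rsp+40], ... at the call)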
Shadow space: Always reserve 32 bytes on stack before calls
Stack alignment: Must be 16-byte aligned at CALL instruction
Return value: RAX
Caller-saved: RAX, RCX, RDX, R8-R11
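The shadow-space and alignment rules combine into a small calculation. The Python sketch below (hypothetical helper `win64_stack_reserve`) assumes a frame with no pushes in the prologue, so RSP is 8 bytes past a 16-byte boundary on entry because CALL pushed the return address. Size it for the call with the most arguments in the function.

```python
def win64_stack_reserve(num_args: int, locals_bytes: int = 0) -> int:
    """Bytes for 'sub rsp, N' so every call sees shadow space and 16-byte alignment.

    Assumes no pushes in the prologue: on entry RSP % 16 == 8 (return address).
    """
    shadow = 32                                  # home space for RCX, RDX, R8, R9
    stack_args = 8 * max(0, num_args - 4)        # 5th+ arguments at [rsp+32], [rsp+40], ...
    needed = shadow + stack_args + locals_bytes
    entry_misalignment = 8                       # the pushed return address
    pad = (16 - (entry_misalignment + needed) % 16) % 16
    return needed + pad

print(win64_stack_reserve(5))   # 40
```

For hello_win.asm the largest call is `WriteConsoleA` with five arguments, giving 32 + 8 = 40 bytes, which also restores 16-byte alignment (40 + 8 = 48); that matches the `sub rsp, 40` in the listing.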

                    

Alternative approach using MASM syntax:

; hello_masm.asm - MASM syntax version
; Build: ml64 /c hello_masm.asm
; Link: link /SUBSYSTEM:CONSOLE /ENTRY:main hello_masm.obj kernel32.lib

extern GetStdHandle : proc
extern WriteConsoleA : proc  
extern ExitProcess : proc

.data
msg     db "Hello, World!", 13, 10, 0
msgLen  equ $ - msg - 1

.data?
written dq ?

.code
main proc
    sub rsp, 40                  ; Shadow space + alignment
    
    mov rcx, -11
    call GetStdHandle
    
    mov rcx, rax                 ; Handle
    lea rdx, msg                 ; Buffer
    mov r8d, msgLen              ; Length
    lea r9, written              ; Output count
    mov qword ptr [rsp+32], 0    ; Reserved
    call WriteConsoleA
    
    xor ecx, ecx
    call ExitProcess
main endp
end

Running on Bare Metal

The ultimate assembly experience: code that runs with no OS, directly on hardware (or an emulator). This boot sector prints "Hi!" to the screen using BIOS interrupts:

; boot.asm - Simple boot sector (512 bytes, runs at 0x7C00)
; Assemble: nasm -f bin boot.asm -o boot.bin
; Run: qemu-system-x86_64 -drive format=raw,file=boot.bin

bits 16                     ; 16-bit real mode
org 0x7C00                  ; BIOS loads us here

start:
    ; Set up segments (BIOS doesn't guarantee these)
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00          ; Stack below our code

    ; Clear screen (BIOS video interrupt)
    mov ax, 0x0003          ; 80x25 text mode
    int 0x10

    ; Print 'H'
    mov ah, 0x0E            ; Teletype output
    mov al, 'H'
    int 0x10

    ; Print 'i'
    mov al, 'i'
    int 0x10

    ; Print '!'
    mov al, '!'
    int 0x10

.halt:
    hlt                     ; Halt CPU (saves power)
    jmp .halt               ; Loop forever if interrupted

; Boot sector signature (must be at bytes 510-511)
times 510 - ($ - $$) db 0   ; Pad with zeros
dw 0xAA55                   ; Boot signature
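The last two lines of boot.asm build the required 512-byte layout. The Python sketch below (hypothetical helper `make_boot_sector`) reproduces it: zero-pad the code to 510 bytes, then append 0xAA55 stored little-endian, which is why the file ends in the bytes 55 AA. The three-byte payload in the example is `hlt` (F4) followed by a short jump back to it (EB FD), a minimal version of the `.halt` loop.

```python
def make_boot_sector(code: bytes) -> bytes:
    """Mimic 'times 510 - ($ - $$) db 0' followed by 'dw 0xAA55'."""
    if len(code) > 510:
        raise ValueError(f"code is {len(code)} bytes; only 510 fit before the signature")
    padded = code + bytes(510 - len(code))           # zero padding up to offset 510
    return padded + (0xAA55).to_bytes(2, "little")   # dw stores 0xAA55 as 55 AA

sector = make_boot_sector(bytes.fromhex("f4ebfd"))   # hlt; jmp short back to hlt
print(len(sector), sector[-2:].hex())                # 512 55aa
```

Writing `sector` to a file (for instance a hypothetical `tiny.bin`) should give an image that QEMU boots the same way as boot.bin.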

Save & Compile: `boot.asm`

All Platforms (flat binary — no OS-specific linking)

nasm -f bin boot.asm -o boot.bin
qemu-system-x86_64 -drive format=raw,file=boot.bin

                        
Boot Process Explained:
Power on: CPU starts in 16-bit real mode at the reset vector in the BIOS ROM (physical 0xFFFF0 on the original 8086; 0xFFFFFFF0 on modern CPUs)
BIOS POST: Initializes hardware, tests memory
Boot search: BIOS reads first 512 bytes from boot device
Signature check: Last two bytes must be 0x55, 0xAA
Load & jump: BIOS loads sector to 0x7C00 and jumps there
Your code runs! You now control the entire machine
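Real-mode addresses are computed as segment * 16 + offset, so many segment:offset pairs name the same byte. The Python sketch below (hypothetical helper name) shows the arithmetic. Because some BIOSes are known to enter the boot sector as 0000:7C00 and others as 07C0:0000, the examples zero DS and ES and rely on `org 0x7C00`, so label offsets line up with physical addresses.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Real-mode translation: physical = segment * 16 + offset.

    The original 8086 wrapped results above 0xFFFFF; that detail is ignored here.
    """
    return (segment << 4) + offset

print(hex(real_mode_physical(0xF000, 0xFFF0)))  # 0xffff0  (classic reset vector)
print(hex(real_mode_physical(0x0000, 0x7C00)))  # 0x7c00   (DS=0 with org 0x7C00)
print(hex(real_mode_physical(0x07C0, 0x0000)))  # 0x7c00   (same byte, different pair)
```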

                    

Slightly More Practical Boot Sector

; boot_msg.asm - Boot sector with string printing
; Assemble & run same as above

bits 16
org 0x7C00

start:
    xor ax, ax
    mov ds, ax
    mov es, ax
    cld                     ; Clear direction flag so lodsb increments SI

    ; Print welcome message
    mov si, welcome_msg
    call print_string

    ; Print hex value demo
    mov si, hex_msg
    call print_string
    mov ax, 0xDEAD
    call print_hex

    jmp $                   ; Infinite loop ($ = current address)

; Print null-terminated string from SI
print_string:
    pusha                   ; Save all registers
    mov ah, 0x0E            ; BIOS teletype
.loop:
    lodsb                   ; Load byte from [SI] into AL, increment SI
    test al, al             ; Check for null terminator
    jz .done
    int 0x10                ; Print character
    jmp .loop
.done:
    popa
    ret

; Print AX as 4 hex digits
print_hex:
    pusha
    mov cx, 4               ; 4 hex digits
.loop:
    rol ax, 4               ; Rotate left, bringing high nibble to low
    mov bx, ax
    and bx, 0x0F            ; Isolate low nibble
    mov bl, [hex_chars + bx]; Convert to ASCII
    push ax
    mov ah, 0x0E
    mov al, bl
    int 0x10
    pop ax
    loop .loop
    popa
    ret

hex_chars: db "0123456789ABCDEF"
welcome_msg: db "Boot sector loaded!", 13, 10, 0
hex_msg: db "Value: 0x", 0

times 510 - ($ - $$) db 0
dw 0xAA55
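The rotate-and-mask trick in `print_hex` is easiest to check off the machine. The Python function below (a hypothetical helper, not part of the boot sector) performs the same four `rol`/`and`/lookup steps on a 16-bit value; after four 4-bit rotations the value is back where it started.

```python
def print_hex_digits(ax: int) -> str:
    """Python mirror of print_hex: rotate left 4 bits, emit the low nibble, 4 times."""
    hex_chars = "0123456789ABCDEF"
    out = []
    for _ in range(4):                            # mov cx, 4 / loop .loop
        ax = ((ax << 4) | (ax >> 12)) & 0xFFFF    # rol ax, 4 (16-bit rotate)
        out.append(hex_chars[ax & 0x0F])          # and bx, 0x0F / mov bl, [hex_chars + bx]
    return "".join(out)

print(print_hex_digits(0xDEAD))   # DEAD
```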

Save & Compile: `boot_msg.asm`

All Platforms (flat binary — no OS-specific linking)

nasm -f bin boot_msg.asm -o boot_msg.bin
qemu-system-x86_64 -drive format=raw,file=boot_msg.bin

Exercise: Your First Boot Sector

# Create and test your boot sector
nasm -f bin boot.asm -o boot.bin

# Verify size (should be exactly 512 bytes)
ls -la boot.bin

# Verify boot signature
xxd boot.bin | tail -1
# Should end with: .... 55aa

# Run in QEMU (no OS, no drivers - pure bare metal!)
qemu-system-x86_64 -drive format=raw,file=boot.bin

# Debug with QEMU + GDB
qemu-system-x86_64 -drive format=raw,file=boot.bin -s -S &
gdb -ex "target remote localhost:1234" -ex "set architecture i8086"

Challenge: Modify the boot sector to print your name, then print it in a different color (hint: use AH=0x09 with BL for color attribute).

Next Steps

Now that you understand what assembly is and how the build pipeline works, we'll dive into the CPU architecture that executes these instructions.
