What Assembly Language Really Is
Core Understanding: Assembly language is a human-readable representation of machine code. Each assembly instruction typically maps to one CPU instruction, making it the lowest-level programming language that's still readable by humans.
When you write in a high-level language like C++, a compiler translates your code through several stages into machine code before the CPU can execute it (Python adds an interpreter layer instead). Assembly language sits just one step above raw binary machine code.
The Language Hierarchy
From human to machine:
- High-Level Languages (Python, Java, C++) → Human-friendly abstractions
- Assembly Language → Human-readable CPU instructions
- Machine Code → Binary instruction encodings the CPU decodes
- Micro-operations → Internal CPU operations (invisible to programmers)
Assembly vs Machine Code vs Micro-ops
These three levels of code representation are often confused. Let's clarify each with concrete examples:
Real-World Analogy: Think of it like cooking instructions:
- Assembly = "Sauté the onions for 5 minutes" (human instructions)
- Machine Code = Recipe in a foreign language you can't read (encoded instructions)
- Micro-ops = Individual muscle movements to chop, stir, adjust flame (internal breakdown)
Assembly Language
Assembly is the human-readable representation using mnemonics (short names) for operations:
; Assembly instructions
mov rax, 42 ; Move the value 42 into register RAX
add rbx, rax ; Add RAX to RBX, store result in RBX
Machine Code
Machine code is the binary encoding of those instructions—what the CPU actually reads:
Assembly: mov rax, 42
Machine Code: 48 C7 C0 2A 00 00 00 (7 bytes; one valid encoding)
Breakdown:
48 REX.W prefix (indicates 64-bit operand)
C7 Opcode for MOV r/m64, imm32 (immediate sign-extended to 64 bits)
C0 ModRM byte: mod=11 (register operand), reg=/0, rm=000 (RAX)
2A 00 00 00 Immediate value 42 (0x2A) in little-endian
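To see how those fields fit together, here is a small Python sketch that hand-assembles just this one instruction form, `MOV r64, imm32` (REX.W + C7 /0) with a register destination. The register table follows the standard x86-64 register numbering (RAX=0 through R15=15); the function name is illustrative, not part of any assembler's API.

```python
import struct

# x86-64 register numbers as used in ModRM/REX encodings
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def encode_mov_r64_imm32(reg: str, imm: int) -> bytes:
    """Encode MOV r/m64, imm32 (REX.W + C7 /0) with a register destination."""
    if not -2**31 <= imm < 2**31:
        raise ValueError("imm32 is sign-extended, so it must fit in a signed 32-bit value")
    num = REG64.index(reg)
    rex = 0x48 | (num >> 3)        # 0100 W=1 R=0 X=0, B = high bit of register number
    modrm = 0xC0 | (num & 7)       # mod=11 (register), reg=/0 (opcode extension), rm=register
    return bytes([rex, 0xC7, modrm]) + struct.pack("<i", imm)  # imm32, little-endian

if __name__ == "__main__":
    print(encode_mov_r64_imm32("rax", 42).hex(" "))  # 48 c7 c0 2a 00 00 00
```

Assemblers are free to pick a different, equally valid encoding. NASM, for example, typically shortens `mov rax, 42` to the 5-byte `mov eax, 42` (`B8 2A 00 00 00`), which zero-extends into RAX, so your own object file may not match these bytes exactly.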
See It Yourself: Disassemble Machine Code
# Write machine code bytes to a file
echo -n -e '\x48\xc7\xc0\x2a\x00\x00\x00' > raw.bin
Linux
# Disassemble with objdump
objdump -D -b binary -m i386:x86-64 -M intel raw.bin
# Output shows:
# 0: 48 c7 c0 2a 00 00 00 mov rax,0x2a
All Platforms (ships with NASM)
# Disassemble raw binary with ndisasm (comes with NASM)
ndisasm -b 64 raw.bin
# Output shows:
# 00000000 48C7C02A000000 mov rax,0x2a
macOS
# View raw bytes with hexdump
hexdump -C raw.bin
# otool reads Mach-O files, not raw bytes, so disassemble a compiled program instead
# (assumes a project Makefile that builds ./main)
make clean && make
otool -t -v ./main
Micro-operations (μops)
Modern x86 CPUs internally break down complex CISC instructions into simpler RISC-like micro-operations. This is invisible to programmers but affects performance:
Assembly instruction: add [rbx], rax
(Add RAX to memory at address in RBX)
Internal μops (simplified):
1. μop: Load value from memory address in RBX → temp register
2. μop: Add RAX + temp → result register
3. μop: Store result → memory address in RBX
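This single assembly instruction becomes roughly 3 micro-operations internally; the exact count and how the work is split vary by microarchitecture (uops.info lists measured counts per CPU).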
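The three steps can be modeled in Python to make the read-modify-write data flow explicit. This is only a sketch under simplifying assumptions: memory is a little-endian `bytearray`, registers are a dict, and flags are ignored. Real hardware splits the work differently depending on the microarchitecture.

```python
MASK64 = (1 << 64) - 1

def add_mem_reg(mem: bytearray, regs: dict, addr_reg: str = "rbx", src_reg: str = "rax") -> None:
    """Model `add [addr_reg], src_reg` as load, add, store (flags are not modeled)."""
    addr = regs[addr_reg]
    temp = int.from_bytes(mem[addr:addr + 8], "little")  # μop 1: load qword from memory
    result = (temp + regs[src_reg]) & MASK64              # μop 2: add, wrapping at 64 bits
    mem[addr:addr + 8] = result.to_bytes(8, "little")     # μop 3: store result back

if __name__ == "__main__":
    memory = bytearray(16)
    memory[8:16] = (5).to_bytes(8, "little")   # qword 5 at address 8
    regs = {"rax": 37, "rbx": 8}
    add_mem_reg(memory, regs)
    print(int.from_bytes(memory[8:16], "little"))  # 42
```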
Why Micro-ops Matter: Instructions that look simple might decompose into many μops, affecting:
- Execution latency (cycles to complete)
- Throughput (how many per cycle)
- Out-of-order execution scheduling
We'll explore this in depth in Part 22: Advanced Optimization.
Quick Comparison Table
| Aspect | Assembly | Machine Code | Micro-ops |
| Form | Text mnemonics | Binary bytes | Internal CPU signals |
| Created by | Programmer (or compiler) | Assembler | CPU decoder |
| Visible to | Programmers | CPU, debuggers | CPU only (mostly) |
| Relationship | Nearly 1:1 with machine code | Nearly 1:1 with assembly | One instruction → one or more μops |
| Documentation | Intel/AMD manuals | Intel/AMD manuals | Agner Fog, uops.info |
Why and When Assembly Matters
When You Need Assembly
- OS Kernels: Boot code, interrupt handlers, context switching
- Embedded Systems: Direct hardware control, minimal footprint
- Performance-Critical Code: SIMD optimizations, cryptography
- Reverse Engineering: Malware analysis, security research
- Compiler Development: Understanding code generation
The Build Pipeline
Understanding how assembly code becomes an executable is crucial. The pipeline consists of distinct stages:
Pipeline: Source (.asm) → Assemble → Object (.o/.obj) → Link → Executable → Load → Execute
Stage 1: Assemble
The assembler converts your assembly source into an object file containing machine code and metadata:
# NASM assembling to ELF64 object file
nasm -f elf64 program.asm -o program.o
# NASM assembling to Win64 object file
nasm -f win64 program.asm -o program.obj
Stage 2: Link
The linker combines object files and resolves symbols to create an executable:
# Linux: Link with ld
ld -o program program.o
# Windows: Link with MSVC linker
link /SUBSYSTEM:CONSOLE program.obj
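Part of "resolving symbols" is patching relocations. A minimal sketch, assuming an x86-64 `call rel32` whose 4-byte displacement carries a PC-relative ELF relocation (R_X86_64_PC32 or R_X86_64_PLT32), which the x86-64 psABI computes as S + A - P: the symbol's address, the addend (typically -4 here), and the address of the field being patched. The addresses in the example are made up for illustration.

```python
import struct

def patch_pc32(code: bytearray, field_offset: int, section_addr: int,
               symbol_addr: int, addend: int = -4) -> None:
    """Apply an S + A - P relocation to a 4-byte little-endian field."""
    place = section_addr + field_offset       # P: run-time address of the field
    value = symbol_addr + addend - place      # S + A - P
    if not -2**31 <= value < 2**31:
        raise OverflowError("target out of rel32 range")
    struct.pack_into("<i", code, field_offset, value)

if __name__ == "__main__":
    # E8 xx xx xx xx = call rel32, placed at hypothetical address 0x401000
    code = bytearray(b"\xe8\x00\x00\x00\x00")
    patch_pc32(code, 1, 0x401000, symbol_addr=0x401020)
    print(code.hex(" "))  # e8 1b 00 00 00
```

The -4 addend exists because the CPU measures the displacement from the end of the instruction (0x401005 here), so the stored value is 0x401020 - 0x401005 = 0x1B.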
Stage 3: Load & Execute
When you run a program, the operating system's loader performs several crucial steps before your code executes:
What the Loader Does:
- Parse executable header (ELF/PE) to understand memory layout
- Create process with new virtual address space
- Map sections into memory with correct permissions (read/write/execute)
- Perform relocations (adjust addresses if loaded at different base)
- Load shared libraries and resolve dynamic symbols
- Set up stack with command-line arguments and environment
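- Transfer control to the entry point: the dynamic linker's entry for dynamically linked programs, otherwise the executable's own entry (usually _start; main is called later by the C runtime)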
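A simplified sketch of the permission-mapping step, assuming 4 KiB pages and Linux constant values. Loaders work on segments (program headers) rather than sections, and ELF's `p_flags` bits are ordered differently from mmap's `PROT_*` bits, so a translation is needed; mmap also maps only whole pages, so each segment's range is rounded out to page boundaries. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86-64

# ELF program-header flag bits (p_flags)
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4
# Linux mmap protection bits (note the different bit order)
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

def segment_prot(p_flags: int) -> int:
    """Translate an ELF segment's p_flags into mmap PROT_* bits."""
    prot = 0
    if p_flags & PF_R:
        prot |= PROT_READ
    if p_flags & PF_W:
        prot |= PROT_WRITE
    if p_flags & PF_X:
        prot |= PROT_EXEC
    return prot

def page_range(p_vaddr: int, p_memsz: int) -> tuple:
    """Round [p_vaddr, p_vaddr + p_memsz) outward to whole pages."""
    start = p_vaddr & ~(PAGE_SIZE - 1)
    end = (p_vaddr + p_memsz + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    return start, end

if __name__ == "__main__":
    # A hypothetical read/write data segment
    print(segment_prot(PF_R | PF_W), [hex(a) for a in page_range(0x402010, 0x30)])
```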
Process Memory Layout
After loading, your process has a specific memory layout:
High Address (e.g., 0x7FFF...)
┌─────────────────────────┐
│ Stack │ ← Grows downward (RSP)
│ ↓ │ Local variables, return addresses
│ │
│ (unmapped) │
│ │
│ ↑ │
│ Heap │ ← Grows upward (brk/mmap)
│ │ Dynamic allocations
├─────────────────────────┤
│ .bss │ ← Uninitialized data (zeroed)
├─────────────────────────┤
│ .data │ ← Initialized data (read/write)
├─────────────────────────┤
│ .rodata │ ← Read-only data (constants, strings)
├─────────────────────────┤
│ .text │ ← Code section (read/execute)
└─────────────────────────┘
Low Address (e.g., 0x400000)
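On Linux you can inspect this layout for a running process in `/proc/<pid>/maps`, the same information GDB's `info proc mappings` shows. The parser below is a sketch that assumes the usual `start-end perms offset dev inode [path]` line format; it takes the text as input so it can be tried on sample lines, and reading `/proc/self/maps` works only on Linux.

```python
from typing import NamedTuple

class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str

def parse_maps(text: str) -> list:
    """Parse lines of the form 'start-end perms offset dev inode [path]'."""
    maps = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""  # anonymous mappings have no path
        maps.append(Mapping(start, end, fields[1], path))
    return maps

if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:  # Linux only
            for m in parse_maps(f.read()):
                print(f"{m.start:#x}-{m.end:#x} {m.perms} {m.path}")
    except FileNotFoundError:
        print("/proc/self/maps is not available on this system")
```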
Examining the Load Process
# Trace system calls during program load (Linux)
strace ./hello 2>&1 | head -30
# For the static hello built in this article, the output is short:
# execve("./hello", ...) - The kernel maps the ELF segments itself during execve
# write(1, "Hello, World!\n", 14) - Our actual syscall!
# exit(0) - Syscall 60 (libc's exit() uses exit_group instead)
# A dynamically linked program also shows the dynamic linker opening
# shared libraries and mapping them with mmap/mprotect before main runs.
Exercise: Watch Your Program Load
# Use GDB to observe the loader
gdb ./hello
(gdb) starti # Stop at the very first instruction
# (For a dynamically linked program this is inside the dynamic linker;
# for the static hello it is already your _start)
(gdb) info proc mappings # Show memory map
(gdb) info files # Show loaded sections
(gdb) break _start # Break at your entry point
(gdb) continue # Run to _start
(gdb) x/10i $rip # Examine your code!
Entry Point vs Main
There's often confusion about where execution actually begins:
| Program Type | True Entry Point | Notes |
| Pure assembly (no libc), statically linked | _start | Directly to your code |
| C program (with libc) | _start (in crt1.o / crt0) | C runtime start-up code calls main() |
| Dynamically linked executable (PIE or not) | Dynamic linker (ld.so) first | Then to the program's _start |
Object File Formats
ELF (Executable and Linkable Format)
Used by Linux, BSD, and many Unix-like systems. ELF files contain organized sections for code, data, symbols, and relocation information.
ELF Structure Overview
┌─────────────────────────────────┐
│ ELF Header │ 64 bytes (64-bit)
│ Magic: 7F 45 4C 46 │ Identifies as ELF
│ Class: 64-bit │
│ Entry Point: 0x401000 │ Where execution begins
├─────────────────────────────────┤
│ Program Headers │ Describe segments for loading
│ (how to load into memory) │
├─────────────────────────────────┤
│ Section Headers │ Describe sections for linking
│ (logical organization) │
├─────────────────────────────────┤
│ .text │ Executable code
├─────────────────────────────────┤
│ .rodata │ Read-only data (strings)
├─────────────────────────────────┤
│ .data │ Initialized read/write data
├─────────────────────────────────┤
│ .bss │ Uninitialized data (zeroed)
├─────────────────────────────────┤
│ .symtab │ Symbol table
├─────────────────────────────────┤
│ .strtab │ String table (symbol names)
└─────────────────────────────────┘
Examining ELF Files
# View ELF header
readelf -h hello
# Output shows:
# Magic: 7f 45 4c 46 02 01 01 00 ...
# Type: EXEC (Executable file)
# Entry point address: 0x401000
# View section headers
readelf -S hello
# View program headers (segments)
readelf -l hello
# Disassemble .text section
objdump -d hello
# View all symbols
nm hello
Exercise: Decode the ELF Magic
# Read first 16 bytes of any ELF file
xxd -l 16 hello
# 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
# Byte 0:     7f         Magic (0x7F)
# Bytes 1-3:  45 4c 46   'E' 'L' 'F'
# Byte 4:     02         Class (2 = 64-bit)
# Byte 5:     01         Data encoding (1 = little-endian)
# Byte 6:     01         ELF version (1)
# Byte 7:     00         OS/ABI (0 = System V)
# Bytes 8-15: 00 ...     ABI version, then padding
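The same header fields can be decoded programmatically. This Python sketch uses only `struct` and handles just 64-bit little-endian ELF files (such as x86-64 Linux binaries); it reads `e_ident` plus the `e_type`, `e_machine`, `e_version` and `e_entry` fields that immediately follow it.

```python
import struct

def parse_elf64_header(data: bytes) -> dict:
    """Decode the start of an ELF64 little-endian header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_version, ei_osabi = data[4:8]
    if ei_class != 2 or ei_data != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # e_type (2), e_machine (2), e_version (4), e_entry (8) follow the 16-byte e_ident
    e_type, e_machine, e_version, e_entry = struct.unpack_from("<HHIQ", data, 16)
    return {"class": ei_class, "data": ei_data, "osabi": ei_osabi,
            "type": e_type, "machine": e_machine, "entry": e_entry}

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            hdr = parse_elf64_header(f.read(64))
        print(f"type={hdr['type']} machine={hdr['machine']:#x} entry={hdr['entry']:#x}")
```

Run as `python3 elfhdr.py hello` (the script name is hypothetical), it should report type 2 (ET_EXEC), machine 0x3e (EM_X86_64) and the same entry point that `readelf -h` prints.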
PE (Portable Executable)
Used by Windows for .exe and .dll files. PE evolved from COFF format and maintains a DOS stub for backward compatibility.
PE Structure Overview
┌─────────────────────────────────┐
│ DOS Header │ 64 bytes
│ Magic: 4D 5A ("MZ") │ Mark Zbikowski's initials!
│ PE offset at 0x3C │
├─────────────────────────────────┤
│ DOS Stub │ "This program cannot be run..."
│ (Legacy compatibility) │
├─────────────────────────────────┤
│ PE Signature │ "PE\0\0" (50 45 00 00)
├─────────────────────────────────┤
│ COFF Header │ Machine type, section count
├─────────────────────────────────┤
│ Optional Header │ Entry point, image base, etc.
│ (not actually optional!) │
├─────────────────────────────────┤
│ Section Headers │ .text, .data, .rdata, etc.
├─────────────────────────────────┤
│ .text │ Executable code
├─────────────────────────────────┤
│ .rdata │ Read-only data, imports
├─────────────────────────────────┤
│ .data │ Initialized data
├─────────────────────────────────┤
│ .idata │ Import table (DLL references)
└─────────────────────────────────┘
Examining PE Files
# Windows: Use dumpbin (from Visual Studio)
dumpbin /headers hello.exe
dumpbin /disasm hello.exe
dumpbin /imports hello.exe
# Linux: Use objdump with PE support
objdump -x hello.exe
# Or use the pefile module (Python)
pip install pefile
python -c "import pefile; pe = pefile.PE('hello.exe'); print(pe.dump_info())"
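`pefile` does far more, but the first hops through a PE file are easy to follow by hand. This sketch is limited to PE32+ (64-bit) images: it reads `e_lfanew` at offset 0x3C, checks the `PE\0\0` signature, then pulls the machine type and section count from the 20-byte COFF header and the entry-point RVA and image base from the optional header.

```python
import struct

def parse_pe_headers(data: bytes) -> dict:
    """Follow e_lfanew to the PE signature and read a few header fields."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)           # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    machine, num_sections = struct.unpack_from("<HH", data, pe_off + 4)
    opt = pe_off + 4 + 20                                       # optional header follows the COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != 0x20B:
        raise ValueError("this sketch only handles PE32+ (64-bit) images")
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)     # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<Q", data, opt + 24)    # ImageBase (8 bytes in PE32+)
    return {"machine": machine, "sections": num_sections,
            "entry_rva": entry_rva, "image_base": image_base}

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            info = parse_pe_headers(f.read())
        print({k: hex(v) for k, v in info.items()})
```

An x64 image should report machine 0x8664 (AMD64). The entry point is an RVA, so the actual entry address is `image_base + entry_rva` when the image loads at its preferred base.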
History Note: "MZ" in the DOS header stands for Mark Zbikowski, an architect of MS-DOS. Every Windows executable still starts with these bytes—40+ years of backward compatibility!
ELF vs PE Quick Comparison
| Feature | ELF (Linux) | PE (Windows) |
| Magic | 7F 45 4C 46 (.ELF) | 4D 5A (MZ) + PE\0\0 |
| Code section | .text | .text |
| Read-only data | .rodata | .rdata |
| Typical base address | 0x400000 (non-PIE x86-64) | 0x140000000 (64-bit EXE) |
| Analysis tools | readelf, objdump | dumpbin, PE-bear |
Your First Assembly Programs
Hello World (Linux x86-64)
; hello.asm - Linux x86-64 Hello World
; Assemble: nasm -f elf64 hello.asm -o hello.o
; Link: ld hello.o -o hello
; Run: ./hello
section .data
msg db "Hello, World!", 10 ; String with newline
len equ $ - msg ; Calculate string length
section .text
global _start
_start:
; Write syscall
mov rax, 1 ; syscall: write
mov rdi, 1 ; fd: stdout
mov rsi, msg ; buffer address
mov rdx, len ; buffer length
syscall
; Exit syscall
mov rax, 60 ; syscall: exit
xor rdi, rdi ; status: 0
syscall
Save & Compile: hello.asm
Linux
nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello
macOS (change _start → _main, syscall numbers differ — see Part 0)
nasm -f macho64 hello.asm -o hello.o
ld -macos_version_min 10.13 -e _main -static hello.o -o hello
Windows (Linux syscalls do not exist on Windows, so this build will not run correctly; use hello_win.asm below for a native Windows version)
nasm -f win64 hello.asm -o hello.obj
link /subsystem:console /entry:_start hello.obj /out:hello.exe
Hello World (Windows x86-64)
Windows uses a completely different approach—instead of direct syscalls, we call Win32 API functions through DLL imports:
; hello_win.asm - Windows x64 Console Hello World
; Assemble: nasm -f win64 hello_win.asm -o hello_win.obj
; Link: link /SUBSYSTEM:CONSOLE /ENTRY:main hello_win.obj kernel32.lib
; Or with GoLink: golink /console hello_win.obj kernel32.dll
bits 64
default rel
section .data
msg db "Hello, World!", 13, 10, 0 ; CRLF + null terminator
msg_len equ $ - msg - 1 ; Length without null
section .bss
written resq 1 ; Bytes written (output)
section .text
global main
extern GetStdHandle
extern WriteConsoleA
extern ExitProcess
main:
; Set up stack frame (shadow space required by Win64 ABI)
sub rsp, 40 ; 32 bytes shadow + 8 for alignment
; GetStdHandle(STD_OUTPUT_HANDLE)
mov rcx, -11 ; STD_OUTPUT_HANDLE = -11
call GetStdHandle
mov rbx, rax ; Save handle in rbx
; WriteConsoleA(handle, buffer, length, &written, NULL)
mov rcx, rbx ; Handle
lea rdx, [msg] ; Buffer pointer
mov r8d, msg_len ; Number of chars to write
lea r9, [written] ; Pointer to bytes written
mov qword [rsp+32], 0 ; Reserved (5th param on stack)
call WriteConsoleA
; ExitProcess(0)
xor rcx, rcx ; Exit code 0
call ExitProcess
Save & Compile: hello_win.asm
Windows
nasm -f win64 hello_win.asm -o hello_win.obj
link /SUBSYSTEM:CONSOLE /ENTRY:main hello_win.obj kernel32.lib
hello_win.exe
Alternative: golink /console hello_win.obj kernel32.dll
Win64 Calling Convention:
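- Parameters: RCX, RDX, R8, R9 for the first 4 integer/pointer arguments (XMM0-XMM3 for floating point); further arguments go on the stack above the shadow space
- Shadow space: Always reserve 32 bytes on the stack before calls
- Stack alignment: RSP must be 16-byte aligned at the CALL instruction
- Return value: RAX
- Caller-saved (volatile): RAX, RCX, RDX, R8-R11
- Callee-saved (nonvolatile): RBX, RBP, RDI, RSI, R12-R15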
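The shadow-space and alignment rules combine into simple arithmetic. This sketch assumes a function whose prologue only does `sub rsp, N` (no pushes) and that passes integer or pointer arguments; it computes N and where each argument goes. The helper names are illustrative.

```python
ARG_REGS = ["rcx", "rdx", "r8", "r9"]

def arg_location(index: int) -> str:
    """Where integer/pointer argument `index` (0-based) goes just before CALL."""
    if index < 4:
        return ARG_REGS[index]
    # Stack arguments sit above the 32-byte shadow space.
    return f"[rsp+{32 + 8 * (index - 4)}]"

def frame_size(num_args: int, locals_bytes: int = 0) -> int:
    """N for `sub rsp, N` at function entry (assumes no pushes in the prologue)."""
    needed = 32 + 8 * max(0, num_args - 4) + locals_bytes
    # At entry RSP % 16 == 8 (the caller's CALL pushed a return address),
    # so N + 8 must be a multiple of 16 for RSP to be aligned at our own CALLs.
    return needed + (-(needed + 8)) % 16

if __name__ == "__main__":
    print(frame_size(5), arg_location(4))  # 40 [rsp+32]
```

For the five-argument WriteConsoleA call this gives `sub rsp, 40` with the fifth argument at `[rsp+32]`, which matches hello_win.asm. Each push in a prologue shifts RSP by 8, so the formula would need adjusting if you save registers that way.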
Alternative approach using MASM syntax:
; hello_masm.asm - MASM syntax version
; Build: ml64 /c hello_masm.asm
; Link: link /SUBSYSTEM:CONSOLE /ENTRY:main hello_masm.obj kernel32.lib
extern GetStdHandle : proc
extern WriteConsoleA : proc
extern ExitProcess : proc
.data
msg db "Hello, World!", 13, 10, 0
msgLen equ $ - msg - 1
.data?
written dq ?
.code
main proc
sub rsp, 40 ; Shadow space + alignment
mov rcx, -11
call GetStdHandle
mov rcx, rax ; Handle
lea rdx, msg ; Buffer
mov r8d, msgLen ; Length
lea r9, written ; Output count
mov qword ptr [rsp+32], 0 ; Reserved
call WriteConsoleA
xor ecx, ecx
call ExitProcess
main endp
end
Bare-Metal Boot Sector
The ultimate assembly experience: code that runs with no OS, directly on hardware (or an emulator). This boot sector prints "Hi!" to the screen using BIOS interrupts:
; boot.asm - Simple boot sector (512 bytes, runs at 0x7C00)
; Assemble: nasm -f bin boot.asm -o boot.bin
; Run: qemu-system-x86_64 -drive format=raw,file=boot.bin
bits 16 ; 16-bit real mode
org 0x7C00 ; BIOS loads us here
start:
; Set up segments (BIOS doesn't guarantee these)
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00 ; Stack below our code
; Clear screen (BIOS video interrupt)
mov ax, 0x0003 ; 80x25 text mode
int 0x10
; Print 'H'
mov ah, 0x0E ; Teletype output
xor bh, bh ; Display page 0 (BH is not guaranteed at boot)
mov al, 'H'
int 0x10
; Print 'i'
mov al, 'i'
int 0x10
; Print '!'
mov al, '!'
int 0x10
.halt:
hlt ; Halt CPU (saves power)
jmp .halt ; Loop forever if interrupted
; Boot sector signature (must be at bytes 510-511)
times 510 - ($ - $$) db 0 ; Pad with zeros
dw 0xAA55 ; Boot signature
Save & Compile: boot.asm
All Platforms (flat binary — no OS-specific linking)
nasm -f bin boot.asm -o boot.bin
qemu-system-x86_64 -drive format=raw,file=boot.bin
Boot Process Explained:
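- Power on: CPU starts in 16-bit real mode and fetches its first instruction from the reset vector in the BIOS ROM (0xFFFF0 on the 8086; 0xFFFFFFF0 on the 386 and later)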
- BIOS POST: Initializes hardware, tests memory
- Boot search: BIOS reads first 512 bytes from boot device
- Signature check: Last two bytes must be 0x55, 0xAA
- Load & jump: BIOS loads sector to 0x7C00 and jumps there
- Your code runs! You now control the entire machine
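The padding and signature lines at the end of boot.asm can be reproduced in Python, which also shows why `dw 0xAA55` produces the bytes 55 AA: x86 stores 16-bit words little-endian. A minimal sketch; `make_boot_sector` and `looks_bootable` are illustrative names.

```python
import struct

SECTOR_SIZE = 512
BOOT_SIGNATURE = struct.pack("<H", 0xAA55)  # little-endian word: bytes 55 AA

def make_boot_sector(code: bytes) -> bytes:
    """Pad code to 510 bytes and append the signature (times 510-($-$$) db 0 / dw 0xAA55)."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError(f"code is {len(code)} bytes; at most 510 fit before the signature")
    return code.ljust(SECTOR_SIZE - 2, b"\x00") + BOOT_SIGNATURE

def looks_bootable(sector: bytes) -> bool:
    """Check for exactly 512 bytes ending in 0x55, 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"

if __name__ == "__main__":
    image = make_boot_sector(b"\xeb\xfe")  # EB FE = jmp $ (spin forever)
    print(len(image), image[-2:].hex(" "), looks_bootable(image))
```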
Slightly More Practical Boot Sector
; boot_msg.asm - Boot sector with string printing
; Assemble & run same as above
bits 16
org 0x7C00
start:
xor ax, ax
mov ds, ax
mov es, ax
; Print welcome message
mov si, welcome_msg
call print_string
; Print hex value demo
mov si, hex_msg
call print_string
mov ax, 0xDEAD
call print_hex
jmp $ ; Infinite loop ($ = current address)
; Print null-terminated string from SI
print_string:
pusha ; Save all registers
mov ah, 0x0E ; BIOS teletype
xor bh, bh ; Display page 0
.loop:
lodsb ; Load byte from [SI] into AL, increment SI
test al, al ; Check for null terminator
jz .done
int 0x10 ; Print character
jmp .loop
.done:
popa
ret
; Print AX as 4 hex digits
print_hex:
pusha
mov cx, 4 ; 4 hex digits
.loop:
rol ax, 4 ; Rotate left, bringing high nibble to low
mov bx, ax
and bx, 0x0F ; Isolate low nibble
mov bl, [hex_chars + bx]; Convert to ASCII
push ax
mov ah, 0x0E
mov al, bl
int 0x10
pop ax
loop .loop
popa
ret
hex_chars: db "0123456789ABCDEF"
welcome_msg: db "Boot sector loaded!", 13, 10, 0
hex_msg: db "Value: 0x", 0
times 510 - ($ - $$) db 0
dw 0xAA55
Save & Compile: boot_msg.asm
All Platforms (flat binary — no OS-specific linking)
nasm -f bin boot_msg.asm -o boot_msg.bin
qemu-system-x86_64 -drive format=raw,file=boot_msg.bin
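The `print_hex` loop is easier to follow when translated step by step. This Python sketch mirrors it under the same 16-bit assumption: each `rol ax, 4` brings the next-highest nibble into the low four bits, which then index `hex_chars`. After four rotations AX is back to its original value, just as in the assembly.

```python
HEX_CHARS = "0123456789ABCDEF"

def rol16(value: int, count: int) -> int:
    """Rotate a 16-bit value left by `count` bits (like `rol ax, count`)."""
    value &= 0xFFFF
    return ((value << count) | (value >> (16 - count))) & 0xFFFF

def format_hex16(ax: int) -> str:
    """Mirror print_hex: emit AX as 4 hex digits, most significant first."""
    digits = []
    for _ in range(4):                       # mov cx, 4 ... loop .loop
        ax = rol16(ax, 4)                    # rol ax, 4
        digits.append(HEX_CHARS[ax & 0x0F])  # and bx, 0x0F / mov bl, [hex_chars + bx]
    return "".join(digits)

if __name__ == "__main__":
    print(format_hex16(0xDEAD))  # DEAD
```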
Exercise: Your First Boot Sector
# Create and test your boot sector
nasm -f bin boot.asm -o boot.bin
# Verify size (should be exactly 512 bytes)
ls -la boot.bin
# Verify boot signature
xxd boot.bin | tail -1
# Should end with: .... 55aa
# Run in QEMU (no OS, no drivers - pure bare metal!)
qemu-system-x86_64 -drive format=raw,file=boot.bin
# Debug with QEMU + GDB
qemu-system-x86_64 -drive format=raw,file=boot.bin -s -S &
gdb -ex "target remote localhost:1234" -ex "set architecture i8086"
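Challenge: Modify the boot sector to print your name, then print it in a different color (hint: INT 0x10 with AH=0x09 writes the character in AL with the color attribute in BL, CX times, at the cursor; unlike AH=0x0E it does not advance the cursor).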
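The value in BL is a text-mode attribute byte: bits 0-3 select the foreground color, bits 4-6 the background, and bit 7 means blink or bright background depending on the video configuration. A small sketch for computing it from the standard 16-color text palette; the color names are just dictionary keys chosen for readability.

```python
# Standard CGA/VGA text-mode color numbers (0-15)
COLORS = {"black": 0, "blue": 1, "green": 2, "cyan": 3, "red": 4,
          "magenta": 5, "brown": 6, "light_gray": 7, "dark_gray": 8,
          "light_blue": 9, "light_green": 10, "light_cyan": 11,
          "light_red": 12, "light_magenta": 13, "yellow": 14, "white": 15}

def text_attribute(fg: str, bg: str = "black") -> int:
    """Build the BL attribute byte: low nibble = foreground, bits 4-6 = background."""
    fg_num, bg_num = COLORS[fg], COLORS[bg]
    if bg_num > 7:
        raise ValueError("bit 7 is blink/bright background, so use background colors 0-7")
    return (bg_num << 4) | fg_num

if __name__ == "__main__":
    print(hex(text_attribute("yellow", "blue")))  # 0x1e
```

For example, `text_attribute("yellow", "blue")` gives 0x1E, so `mov bl, 0x1E` before the INT 0x10 call should draw yellow text on a blue background.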
Next Steps
Now that you understand what assembly is and how the build pipeline works, we'll dive into the CPU architecture that executes these instructions.
Continue the Series
Part 0: Development Environment, Tooling & Workflow
Set up your complete assembly development environment with assemblers, debuggers, and build tools.
Part 2: x86 CPU Architecture Overview
Understand x86 evolution, execution modes, privilege rings, and CPU internals for effective assembly programming.
Part 3: Registers – Complete Deep Dive
Master all x86/x64 registers including general-purpose, segment, control, and debug registers.