Introduction
Threads are lightweight units of execution within a process. They enable concurrent programming, allowing applications to perform multiple tasks simultaneously and take full advantage of multi-core processors.
Series Context: This is Part 10 of 24 in the Computer Architecture & Operating Systems Mastery series. Building on process fundamentals, we now explore how multiple threads of execution share process resources.
- Part 1: Foundations of Computer Systems - System overview, architectures, OS role
- Part 2: Digital Logic & CPU Building Blocks - Gates, registers, datapath, microarchitecture
- Part 3: Instruction Set Architecture (ISA) - RISC vs CISC, instruction formats, addressing
- Part 4: Assembly Language & Machine Code - Registers, stack, calling conventions
- Part 5: Assemblers, Linkers & Loaders - Object files, ELF, dynamic linking
- Part 6: Compilers & Program Translation - Lexing, parsing, code generation
- Part 7: CPU Execution & Pipelining - Fetch-decode-execute, hazards, prediction
- Part 8: OS Architecture & Kernel Design - Monolithic, microkernel, system calls
- Part 9: Processes & Program Execution - Process lifecycle, PCB, fork/exec
- Part 10: Threads & Concurrency - Threading models, pthreads, race conditions ← You are here
- Part 11: CPU Scheduling Algorithms - FCFS, RR, CFS, real-time scheduling
- Part 12: Synchronization & Coordination - Locks, semaphores, classic problems
- Part 13: Deadlocks & Prevention - Coffman conditions, Banker's algorithm
- Part 14: Memory Hierarchy & Cache - L1/L2/L3, cache coherence, NUMA
- Part 15: Memory Management Fundamentals - Address spaces, fragmentation, allocation
- Part 16: Virtual Memory & Paging - Page tables, TLB, demand paging
- Part 17: File Systems & Storage - Inodes, journaling, ext4, NTFS
- Part 18: I/O Systems & Device Drivers - Interrupts, DMA, disk scheduling
- Part 19: Multiprocessor Systems - SMP, NUMA, cache coherence
- Part 20: OS Security & Protection - Privilege levels, ASLR, sandboxing
- Part 21: Virtualization & Containers - Hypervisors, namespaces, cgroups
- Part 22: Advanced Kernel Internals - Linux subsystems, kernel debugging
- Part 23: Case Studies - Linux vs Windows vs macOS
- Part 24: Capstone Projects - Shell, thread pool, paging simulator
Why Threads?
While processes provide isolation, they're heavyweight—creating a process and switching between them is expensive. Threads are lightweight execution units that share the same address space, enabling efficient parallelism within a single program.
Analogy: If a process is like a house (with its own address, utilities, walls), threads are like roommates sharing that house. They can move around independently but share the same kitchen, bathroom, and living room—efficient but requiring coordination!
Thread Basics
Threads vs Processes
Process vs Thread Comparison
Process vs Thread:
══════════════════════════════════════════════════════════════
PROCESS                           THREAD
────────────────────────────      ────────────────────────────
Own address space                 Shared address space
Own file descriptors              Shared file descriptors
Own signal handlers               Shared signal handlers
Own heap                          Shared heap
Own stack                         Own stack (each thread)
Own registers/PC                  Own registers/PC (each thread)
Own process ID (PID)              Own thread ID (TID)
Memory Layout Comparison:
══════════════════════════════════════════════════════════════
SEPARATE PROCESSES:              SINGLE PROCESS, MULTIPLE THREADS:

Process A    Process B           Shared Process Space
┌─────────┐  ┌─────────┐         ┌─────────────────────────┐
│ Stack A │  │ Stack B │         │   Stack (Thread 1)      │
├─────────┤  ├─────────┤         ├─────────────────────────┤
│ Heap A  │  │ Heap B  │         │   Stack (Thread 2)      │
├─────────┤  ├─────────┤         ├─────────────────────────┤
│ Data A  │  │ Data B  │         │   Stack (Thread 3)      │
├─────────┤  ├─────────┤         ├═════════════════════════┤
│ Code A  │  │ Code B  │         │   Heap (SHARED)         │
└─────────┘  └─────────┘         ├─────────────────────────┤
                                 │   Data (SHARED)         │
  Isolated!                      ├─────────────────────────┤
                                 │   Code (SHARED)         │
                                 └─────────────────────────┘
Benefits of Threads
Benefits of Multithreading:
══════════════════════════════════════════════════════════════
1. RESPONSIVENESS
Main thread handles UI, worker threads do computation
→ UI stays responsive during heavy processing
Example: Browser
• Main thread: Render page, handle clicks
• Network thread: Fetch resources
• JavaScript thread: Run scripts
2. RESOURCE SHARING
Threads share memory → Easy communication
• No need for pipes, sockets, or shared memory setup
• Just read/write shared variables (with synchronization!)
3. ECONOMY
┌─────────────────────────────────────────────────────────┐
│ Operation         Process      Thread                   │
│ ─────────────────────────────────────────────────────── │
│ Creation time     ~10ms        ~100μs (100× faster)     │
│ Context switch    ~5-10μs      ~1-2μs (5× faster)       │
│ Memory overhead   MBs          KBs (stack only)         │
└─────────────────────────────────────────────────────────┘
4. SCALABILITY
Threads can run on multiple CPU cores simultaneously
→ True parallelism on multicore systems
4-core CPU, 4 threads → 4× speedup (ideal case)
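To make the scalability point concrete, here is a small sketch of data-parallel work with pthreads: an array sum split across four threads, each summing its own chunk before the results are combined. The names (`parallel_sum`, `sum_chunk`, `struct chunk`) are illustrative, not from any library. Compile with `gcc -pthread`.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4

struct chunk {
    const int *a;       /* shared, read-only array */
    size_t lo, hi;      /* this thread's slice [lo, hi) */
    long sum;           /* per-thread result: no sharing, no locks */
};

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    long s = 0;
    for (size_t i = c->lo; i < c->hi; i++)
        s += c->a[i];
    c->sum = s;         /* each thread writes only its own struct */
    return NULL;
}

long parallel_sum(const int *a, size_t n) {
    pthread_t t[NTHREADS];
    struct chunk c[NTHREADS];
    size_t per = n / NTHREADS;

    for (int i = 0; i < NTHREADS; i++) {
        c[i].a = a;
        c[i].lo = (size_t)i * per;
        c[i].hi = (i == NTHREADS - 1) ? n : c[i].lo + per;
        pthread_create(&t[i], NULL, sum_chunk, &c[i]);
    }

    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += c[i].sum;   /* combine after join: no race */
    }
    return total;
}
```

Because each thread touches disjoint data and results are combined only after `pthread_join`, no synchronization is needed; on a 4-core machine the chunks can genuinely run in parallel.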
Challenges
Threading is Hard! Shared memory is a double-edged sword:
- Race conditions: Results depend on timing
- Deadlocks: Threads wait for each other forever
- Non-determinism: Bugs may not reproduce
- Debugging: Hard to track interleaved execution
Threading Models
Threads can be implemented in different ways, depending on where thread management happens: in user space, in the kernel, or both.
User-Level Threads
User-Level Threading
User-Level Threads (N:1 Model):
══════════════════════════════════════════════════════════════
User Space
┌─────────────────────────────────────────────────────────────┐
│ ┌─────────────────────────────────────────────────────┐   │
│ │ Thread Library (e.g., Green Threads)                │   │
│ │   ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐                │   │
│ │   │ T1  │  │ T2  │  │ T3  │  │ T4  │  User threads  │   │
│ │   └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘                │   │
│ │      └────────┴───┬────┴────────┘                   │   │
│ │                   │                                 │   │
│ │           Thread Scheduler                          │   │
│ │            (in user space)                          │   │
│ └───────────────────┬─────────────────────────────────┘   │
└─────────────────────┼───────────────────────────────────────┘
══════════════════════╪════════════════════════════════════════
Kernel Space          ▼
┌─────────────────────────────────────────────────────────────┐
│                    Single Kernel Thread                     │
│                   (OS sees one "process")                   │
└─────────────────────────────────────────────────────────────┘
Advantages:
✓ Fast context switch (no kernel involvement)
✓ Can implement custom scheduling
✓ Works on any OS (no kernel support needed)
Disadvantages:
✗ One thread blocks → entire process blocks
✗ Can't use multiple CPUs (all run on one kernel thread)
✗ Page fault in one thread blocks all
Examples: Java green threads (historical), Ruby 1.8 green threads, GNU Pth
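The essence of user-level threading is cooperative context switching without kernel involvement. The toy below sketches it with the (obsolescent but still available on Linux/glibc) `ucontext` API: two "fibers" yield to each other with `swapcontext()` while the kernel sees a single thread. All names (`run_two_fibers`, `task_a`, `task_b`) are made up for illustration.

```c
#include <ucontext.h>

static ucontext_t main_ctx, a_ctx, b_ctx;
static char a_stack[64 * 1024], b_stack[64 * 1024];  /* user-space stacks */
static int trace[8], pos = 0;                        /* records run order */

static void task_a(void) {
    trace[pos++] = 1;
    swapcontext(&a_ctx, &b_ctx);   /* cooperative yield to B */
    trace[pos++] = 1;
    swapcontext(&a_ctx, &b_ctx);   /* yield again; never resumed */
}

static void task_b(void) {
    trace[pos++] = 2;
    swapcontext(&b_ctx, &a_ctx);   /* yield back to A */
    trace[pos++] = 2;
    /* returning here transfers to uc_link (the main context) */
}

int run_two_fibers(void) {
    getcontext(&a_ctx);
    a_ctx.uc_stack.ss_sp = a_stack;
    a_ctx.uc_stack.ss_size = sizeof a_stack;
    a_ctx.uc_link = &main_ctx;
    makecontext(&a_ctx, task_a, 0);

    getcontext(&b_ctx);
    b_ctx.uc_stack.ss_sp = b_stack;
    b_ctx.uc_stack.ss_size = sizeof b_stack;
    b_ctx.uc_link = &main_ctx;
    makecontext(&b_ctx, task_b, 0);

    swapcontext(&main_ctx, &a_ctx);  /* start A; returns when B finishes */
    return pos;                      /* number of scheduling steps taken */
}
```

Note there is no preemption: if a fiber never calls `swapcontext`, the other one never runs. That is exactly the "one thread blocks, all block" weakness listed above.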
Kernel-Level Threads
Kernel-Level Threads (1:1 Model):
══════════════════════════════════════════════════════════════
User Space
┌─────────────────────────────────────────────────────────────┐
│   ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐                        │
│   │ T1  │  │ T2  │  │ T3  │  │ T4  │   User threads         │
│   └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘                        │
└──────┼────────┼────────┼────────┼───────────────────────────┘
═══════╪════════╪════════╪════════╪════════════════════════════
Kernel │Space   │        │        │
┌──────┼────────┼────────┼────────┼───────────────────────────┐
│   ┌──▼──┐  ┌──▼──┐  ┌──▼──┐  ┌──▼──┐                        │
│   │ KT1 │  │ KT2 │  │ KT3 │  │ KT4 │   Kernel threads       │
│   └─────┘  └─────┘  └─────┘  └─────┘   (1:1 mapping)        │
│                                                             │
│                   Kernel Thread Scheduler                   │
└─────────────────────────────────────────────────────────────┘
Advantages:
✓ True parallelism on multicore
✓ One thread blocking doesn't block others
✓ Kernel can schedule on multiple CPUs
Disadvantages:
✗ Context switch requires kernel involvement (~1-2μs)
✗ Each thread uses kernel resources
✗ Thread creation slower
Examples: Linux pthreads, Windows threads, macOS threads
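On Linux you can observe the 1:1 mapping directly: each pthread is a separate kernel task with its own kernel thread ID, visible under /proc/PID/task/. The sketch below (Linux-specific; `tids_differ` is an illustrative helper, and `gettid` is reached via `syscall` for portability across glibc versions) shows that the main thread and a worker get distinct kernel TIDs.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

static pid_t worker_tid;

static void *worker(void *arg) {
    (void)arg;
    /* gettid(2): the kernel's ID for THIS thread (a schedulable task) */
    worker_tid = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Returns 1 if main thread and worker have distinct kernel thread IDs. */
int tids_differ(void) {
    pthread_t t;
    pid_t main_tid = (pid_t)syscall(SYS_gettid);

    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);

    return worker_tid != main_tid;  /* 1 on Linux: two kernel tasks */
}
```

For the first thread of a process, gettid() equals getpid(); every additional pthread gets a fresh TID, which is what makes kernel-level scheduling of each thread possible.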
Hybrid Models
Hybrid M:N Model:
══════════════════════════════════════════════════════════════
M user threads mapped to N kernel threads (M > N)
User Space
┌─────────────────────────────────────────────────────────────┐
│  ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐   (M user threads)    │
│  │UT1│ │UT2│ │UT3│ │UT4│ │UT5│ │UT6│                       │
│  └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘                       │
│    └─────┴──┬──┴─────┴──┬──┴─────┘                          │
│             │           │                                   │
│         User-Level Scheduler                                │
└─────────────┼───────────┼───────────────────────────────────┘
══════════════╪═══════════╪════════════════════════════════════
Kernel Space  ▼           ▼
┌─────────────────────────────────────────────────────────────┐
│          ┌─────┐     ┌─────┐                                │
│          │ KT1 │     │ KT2 │    (N kernel threads)          │
│          └─────┘     └─────┘                                │
│                                                             │
│                      Kernel Scheduler                       │
└─────────────────────────────────────────────────────────────┘
Examples: Go runtime (goroutines → OS threads), Solaris LWPs
Best of both worlds:
✓ Many lightweight user threads
✓ Multiple kernel threads for parallelism
✓ User-level scheduling is fast
POSIX Threads (pthreads)
pthreads is the standard threading API for Unix-like systems (Linux, macOS, BSD). It provides 1:1 kernel-level threads.
Thread Creation
Creating Threads with pthreads
// Basic pthread example
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

// Thread function - runs in new thread
void *thread_function(void *arg) {
    int thread_num = *(int *)arg;
    printf("Thread %d: Starting\n", thread_num);
    sleep(1);  // Simulate work
    printf("Thread %d: Finishing\n", thread_num);
    return NULL;
}

int main() {
    pthread_t threads[3];
    int thread_args[3] = {1, 2, 3};

    // Create 3 threads
    for (int i = 0; i < 3; i++) {
        int ret = pthread_create(
            &threads[i],      // Thread ID (output)
            NULL,             // Attributes (NULL = default)
            thread_function,  // Function to run
            &thread_args[i]   // Argument to function
        );
        if (ret != 0) {
            // pthread_create returns an error number; it does NOT set
            // errno, so perror() would print the wrong message here
            fprintf(stderr, "pthread_create failed: error %d\n", ret);
            return 1;
        }
    }

    printf("Main: All threads created\n");

    // Wait for all threads to complete
    for (int i = 0; i < 3; i++) {
        pthread_join(threads[i], NULL);
    }

    printf("Main: All threads finished\n");
    return 0;
}

// Compile: gcc -pthread program.c -o program
Output (order may vary!):
Main: All threads created
Thread 1: Starting
Thread 3: Starting
Thread 2: Starting
Thread 2: Finishing
Thread 1: Finishing
Thread 3: Finishing
Main: All threads finished
Joining & Detaching
// Thread joining - wait for completion and get return value
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>  // malloc/free

void *compute_sum(void *arg) {
    int *result = malloc(sizeof(int));
    *result = 42;   // Computed result
    return result;  // Return pointer (lives on heap!)
}

int main() {
    pthread_t thread;
    void *retval;

    pthread_create(&thread, NULL, compute_sum, NULL);

    // Join: block until thread finishes, get return value
    pthread_join(thread, &retval);

    int *result = (int *)retval;
    printf("Thread returned: %d\n", *result);  // 42
    free(result);
    return 0;
}

// Detached threads - run independently, can't be joined
void *background_work(void *arg);  // defined elsewhere

void start_background_task() {
    pthread_t thread;
    pthread_create(&thread, NULL, background_work, NULL);

    // Detach: thread resources freed automatically on exit
    pthread_detach(thread);
    // Can't join a detached thread!
}

// Or set detached at creation:
pthread_t thread;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&thread, &attr, func, NULL);
pthread_attr_destroy(&attr);
Thread Attributes
Common Thread Attributes:
- Stack size: Default ~8MB, can be reduced for many threads
- Detach state: Joinable (default) or detached
- Scheduling policy: SCHED_FIFO, SCHED_RR, SCHED_OTHER
- Priority: Scheduling priority within policy
Race Conditions
A race condition occurs when the behavior of a program depends on the relative timing of events (like thread scheduling). The result is non-deterministic and often buggy.
Critical Sections
Race Condition Example
// BUGGY CODE - Race condition!
int counter = 0;  // Shared variable

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++;  // NOT atomic!
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    // Expected: 2,000,000
    // Actual: Something less (race condition!)
    printf("Counter: %d\n", counter);
    return 0;
}
Why it's broken:
══════════════════════════════════════════════════════════════
counter++ is NOT atomic! It's actually 3 operations:
Thread 1 Thread 2
───────────────────────── ─────────────────────────
1. LOAD counter (= 5)
2. ADD 1 (= 6) 1. LOAD counter (= 5) ← Same value!
3. STORE counter (= 6) 2. ADD 1 (= 6)
3. STORE counter (= 6)
Both threads read 5, both write 6 → Lost update!
We incremented twice but counter only increased by 1.
Runs:
$ ./program → Counter: 1847293
$ ./program → Counter: 1923847
$ ./program → Counter: 1788234
Different every time! Non-deterministic bug.
Data Races
Data Race Definition:
══════════════════════════════════════════════════════════════
A data race occurs when:
1. Two or more threads access the same memory location
2. At least one access is a write
3. The accesses are not synchronized
Data races lead to undefined behavior in C/C++!
The compiler can assume no data races and optimize accordingly.
Examples of data races:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
int x = 0; // Shared variable
Thread 1: Thread 2: Race?
───────── ───────── ──────
x = 1; y = x; YES - write/read without sync
x = 1; x = 2; YES - write/write without sync
y = x; z = x; NO - both reads, no writes
CRITICAL SECTION:
A code region that accesses shared resources and must not be
executed by more than one thread at a time.
Atomicity
// Solution 1: Atomic operations (simple cases)
#include <stdatomic.h>

atomic_int counter = 0;  // Atomic integer

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        atomic_fetch_add(&counter, 1);  // Atomic increment
        // Or: counter++;  // Also atomic for atomic_int!
    }
    return NULL;
}

// Now counter will always be exactly 2,000,000!

// Solution 2: Mutex (for complex critical sections)
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    // Enter critical section
        counter++;                    // Safe!
        pthread_mutex_unlock(&lock);  // Exit critical section
    }
    return NULL;
}
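To verify the fix end to end, here is the mutex version assembled into a complete, checkable program (`run_demo` is a helper name added for this sketch, not part of the listings above). Unlike the buggy version, it produces 2,000,000 on every run.

```c
#include <pthread.h>

static int counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    // only one thread at a time in here
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

// Runs two incrementing threads to completion and returns the final count.
int run_demo(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;  // deterministic: always 2,000,000
}
```

The lock makes the load-add-store sequence effectively atomic: the interleaving from the diagram above (both threads loading 5) can no longer happen.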
Thread Safety Patterns
A function or class is thread-safe if it can be safely called from multiple threads simultaneously.
Thread Safety Techniques
| Technique | Description | Example |
|---|---|---|
| Immutability | Data never changes after creation | const strings, config objects |
| Thread-Local Storage | Each thread has its own copy | __thread int x; |
| Mutual Exclusion | Locks protect shared data | Mutexes, spinlocks |
| Atomic Operations | Hardware-supported indivisible ops | atomic_int, CAS |
| Message Passing | Communicate by sending messages | Go channels, queues |
// Thread-Local Storage example
#include <pthread.h>
#include <stdio.h>

// Each thread gets its own 'counter'
__thread int counter = 0;  // GCC extension
// Or: _Thread_local int counter = 0;  // C11 standard

void *thread_func(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < 5; i++) {
        counter++;  // Each thread increments its OWN copy
        printf("Thread %d: counter = %d\n", id, counter);
    }
    return NULL;
}

// Output: Each thread counts 1,2,3,4,5 independently!
Design Principle: "Share nothing if possible." If you can avoid sharing state between threads, you avoid race conditions entirely. When sharing is necessary, minimize it and protect it carefully.
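The message-passing row of the table can also be sketched in C: a tiny bounded queue guarded by one mutex and two condition variables, the classic building block behind channels and work queues. The `msgq`/`mq_*` names and the capacity of 16 are illustrative, not a real API.

```c
#include <pthread.h>

#define QCAP 16

struct msgq {
    int buf[QCAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void mq_init(struct msgq *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void mq_send(struct msgq *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)                      // full: wait for space
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);           // wake one receiver
    pthread_mutex_unlock(&q->lock);
}

int mq_recv(struct msgq *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                         // empty: wait for data
        pthread_cond_wait(&q->not_empty, &q->lock);
    int v = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);            // wake one sender
    pthread_mutex_unlock(&q->lock);
    return v;
}
```

Threads that communicate only through `mq_send`/`mq_recv` never touch each other's data directly: the sharing (and locking) is confined to the queue, which is the whole appeal of the pattern.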
Conclusion & Next Steps
Threads enable parallelism within a process, but shared memory requires careful synchronization. Key takeaways:
- Threads vs Processes: Threads share address space (fast but dangerous), processes are isolated (safe but slower)
- Threading Models: User-level (N:1), kernel-level (1:1), and hybrid (M:N) each have trade-offs
- pthreads: Standard API for thread creation, joining, and attributes
- Race Conditions: Non-atomic operations on shared data cause unpredictable bugs
- Thread Safety: Achieved through locks, atomics, immutability, or thread-local storage
Golden Rule: If multiple threads access the same data and at least one writes, you MUST synchronize. The next part covers synchronization primitives in detail.
Next in the Series
In Part 11: CPU Scheduling Algorithms, we'll explore how operating systems decide which process or thread runs next—FCFS, Round Robin, priority scheduling, and the Linux CFS scheduler.
Continue the Computer Architecture & OS Series
Part 9: Processes & Program Execution
Process lifecycle, PCB, fork/exec, and process states.
Part 11: CPU Scheduling Algorithms
FCFS, Round Robin, CFS, and real-time scheduling.
Part 12: Synchronization & Coordination
Locks, semaphores, and classic synchronization problems.