
Part 10: Threads & Concurrency

January 31, 2026 Wasil Zafar 32 min read

Master concurrent programming—from threading models and pthreads to race conditions and thread safety patterns.

Table of Contents

  1. Introduction
  2. Thread Basics
  3. Threading Models
  4. POSIX Threads (pthreads)
  5. Race Conditions
  6. Thread Safety Patterns
  7. Conclusion & Next Steps

Introduction

Threads are lightweight units of execution within a process. They enable concurrent programming, allowing applications to perform multiple tasks simultaneously and take full advantage of multi-core processors.

Series Context: This is Part 10 of 24 in the Computer Architecture & Operating Systems Mastery series. Building on process fundamentals, we now explore how multiple threads of execution share process resources.

Computer Architecture & OS Mastery — your 24-step learning path (currently on Step 10):

  1. Foundations of Computer Systems: system overview, architectures, OS role
  2. Digital Logic & CPU Building Blocks: gates, registers, datapath, microarchitecture
  3. Instruction Set Architecture (ISA): RISC vs CISC, instruction formats, addressing
  4. Assembly Language & Machine Code: registers, stack, calling conventions
  5. Assemblers, Linkers & Loaders: object files, ELF, dynamic linking
  6. Compilers & Program Translation: lexing, parsing, code generation
  7. CPU Execution & Pipelining: fetch-decode-execute, hazards, prediction
  8. OS Architecture & Kernel Design: monolithic, microkernel, system calls
  9. Processes & Program Execution: process lifecycle, PCB, fork/exec
  10. Threads & Concurrency: threading models, pthreads, race conditions ← You are here
  11. CPU Scheduling Algorithms: FCFS, RR, CFS, real-time scheduling
  12. Synchronization & Coordination: locks, semaphores, classic problems
  13. Deadlocks & Prevention: Coffman conditions, Banker's algorithm
  14. Memory Hierarchy & Cache: L1/L2/L3, cache coherence, NUMA
  15. Memory Management Fundamentals: address spaces, fragmentation, allocation
  16. Virtual Memory & Paging: page tables, TLB, demand paging
  17. File Systems & Storage: inodes, journaling, ext4, NTFS
  18. I/O Systems & Device Drivers: interrupts, DMA, disk scheduling
  19. Multiprocessor Systems: SMP, NUMA, cache coherence
  20. OS Security & Protection: privilege levels, ASLR, sandboxing
  21. Virtualization & Containers: hypervisors, namespaces, cgroups
  22. Advanced Kernel Internals: Linux subsystems, kernel debugging
  23. Case Studies: Linux vs Windows vs macOS
  24. Capstone Projects: shell, thread pool, paging simulator

Why Threads?

While processes provide isolation, they're heavyweight—creating a process and switching between them is expensive. Threads are lightweight execution units that share the same address space, enabling efficient parallelism within a single program.

Analogy: If a process is like a house (with its own address, utilities, walls), threads are like roommates sharing that house. They can move around independently but share the same kitchen, bathroom, and living room—efficient but requiring coordination!

Thread Basics

Threads vs Processes

Process vs Thread:
══════════════════════════════════════════════════════════════

PROCESS                              THREAD
────────────────────────────────     ────────────────────────────────
Own address space                    Shared address space
Own file descriptors                 Shared file descriptors
Own signal handlers                  Shared signal handlers
Own heap                             Shared heap
                                     
Own stack                            Own stack (each thread)
Own registers/PC                     Own registers/PC (each thread)
Own process ID (PID)                 Own thread ID (TID)

Memory Layout Comparison:
══════════════════════════════════════════════════════════════

SEPARATE PROCESSES:                  SINGLE PROCESS, MULTIPLE THREADS:

Process A        Process B           Shared Process Space
┌─────────┐     ┌─────────┐         ┌─────────────────────────┐
│ Stack A │     │ Stack B │         │ Stack (Thread 1)        │
├─────────┤     ├─────────┤         ├─────────────────────────┤
│ Heap A  │     │ Heap B  │         │ Stack (Thread 2)        │
├─────────┤     ├─────────┤         ├─────────────────────────┤
│ Data A  │     │ Data B  │         │ Stack (Thread 3)        │
├─────────┤     ├─────────┤         ├═════════════════════════┤
│ Code A  │     │ Code B  │         │ Heap (SHARED)           │
└─────────┘     └─────────┘         ├─────────────────────────┤
                                    │ Data (SHARED)           │
Isolated!                           ├─────────────────────────┤
                                    │ Code (SHARED)           │
                                    └─────────────────────────┘

Benefits of Threads

Benefits of Multithreading:
══════════════════════════════════════════════════════════════

1. RESPONSIVENESS
   Main thread handles UI, worker threads do computation
   → UI stays responsive during heavy processing
   
   Example: Browser
   • Main thread: Render page, handle clicks
   • Network thread: Fetch resources
   • JavaScript thread: Run scripts

2. RESOURCE SHARING
   Threads share memory → Easy communication
   • No need for pipes, sockets, or shared memory setup
   • Just read/write shared variables (with synchronization!)
   
3. ECONOMY
   ┌─────────────────────────────────────────────────────────┐
   │ Operation              Process      Thread              │
   │ ─────────────────────────────────────────────────────── │
   │ Creation time          ~10ms        ~100μs (100× faster)│
   │ Context switch         ~5-10μs      ~1-2μs (5× faster) │
   │ Memory overhead        MBs          KBs (stack only)   │
   └─────────────────────────────────────────────────────────┘

4. SCALABILITY
   Threads can run on multiple CPU cores simultaneously
   → True parallelism on multicore systems
   
   4-core CPU, 4 threads → 4× speedup (ideal case)

Challenges

Threading is Hard! Shared memory is a double-edged sword:
  • Race conditions: Results depend on timing
  • Deadlocks: Threads wait for each other forever
  • Non-determinism: Bugs may not reproduce
  • Debugging: Hard to track interleaved execution

Threading Models

There are different ways to implement threads, differing in where thread management happens.

User-Level Threads

User-Level Threads (N:1 Model):
══════════════════════════════════════════════════════════════

User Space
┌─────────────────────────────────────────────────────────────┐
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Thread Library (e.g., Green Threads)    │   │
│  │  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐               │   │
│  │  │ T1  │  │ T2  │  │ T3  │  │ T4  │  User threads │   │
│  │  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘               │   │
│  │     └────────┴────────┴────────┘                    │   │
│  │                    │                                │   │
│  │              Thread Scheduler                       │   │
│  │              (in user space)                        │   │
│  └──────────────────────┬──────────────────────────────┘   │
└─────────────────────────┼───────────────────────────────────┘
══════════════════════════╪═══════════════════════════════════
Kernel Space              ▼
┌─────────────────────────────────────────────────────────────┐
│                Single Kernel Thread                         │
│                (OS sees one "process")                      │
└─────────────────────────────────────────────────────────────┘

Advantages:
✓ Fast context switch (no kernel involvement)
✓ Can implement custom scheduling
✓ Works on any OS (no kernel support needed)

Disadvantages:
✗ One thread blocks → entire process blocks
✗ Can't use multiple CPUs (all run on one kernel thread)
✗ Page fault in one thread blocks all

Examples: Java green threads (historical), GNU Portable Threads (Pth)

Kernel-Level Threads

Kernel-Level Threads (1:1 Model):
══════════════════════════════════════════════════════════════

User Space
┌─────────────────────────────────────────────────────────────┐
│  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐                        │
│  │ T1  │  │ T2  │  │ T3  │  │ T4  │  User threads          │
│  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘                        │
└─────┼───────┼───────┼───────┼──────────────────────────────┘
══════╪═══════╪═══════╪═══════╪══════════════════════════════
Kernel│Space  │       │       │
┌─────┼───────┼───────┼───────┼──────────────────────────────┐
│  ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐                          │
│  │ KT1 │ │ KT2 │ │ KT3 │ │ KT4 │  Kernel threads          │
│  └─────┘ └─────┘ └─────┘ └─────┘  (1:1 mapping)           │
│                                                             │
│         Kernel Thread Scheduler                            │
└─────────────────────────────────────────────────────────────┘

Advantages:
✓ True parallelism on multicore
✓ One thread blocking doesn't block others
✓ Kernel can schedule on multiple CPUs

Disadvantages:
✗ Context switch requires kernel involvement (~1-2μs)
✗ Each thread uses kernel resources
✗ Thread creation slower

Examples: Linux pthreads, Windows threads, macOS threads

Hybrid Models

Hybrid M:N Model:
══════════════════════════════════════════════════════════════
M user threads mapped to N kernel threads (M > N)

User Space
┌─────────────────────────────────────────────────────────────┐
│  ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐  (M user threads)    │
│  │UT1│ │UT2│ │UT3│ │UT4│ │UT5│ │UT6│                      │
│  └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘                      │
│    └─────┴──┬──┴─────┴──┬──┴─────┘                         │
│             │           │                                   │
│      User-Level Scheduler                                   │
└─────────────┼───────────┼───────────────────────────────────┘
══════════════╪═══════════╪═══════════════════════════════════
Kernel Space  ▼           ▼
┌─────────────────────────────────────────────────────────────┐
│         ┌─────┐     ┌─────┐                                 │
│         │ KT1 │     │ KT2 │    (N kernel threads)          │
│         └─────┘     └─────┘                                 │
│                                                             │
│         Kernel Scheduler                                    │
└─────────────────────────────────────────────────────────────┘

Examples: Go runtime (goroutines → OS threads), Solaris LWPs

Best of both worlds:
✓ Many lightweight user threads
✓ Multiple kernel threads for parallelism
✓ User-level scheduling is fast

POSIX Threads (pthreads)

pthreads is the standard threading API for Unix-like systems (Linux, macOS, BSD). It provides 1:1 kernel-level threads.

Thread Creation

Creating Threads with pthreads

// Basic pthread example
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

// Thread function - runs in new thread
void *thread_function(void *arg) {
    int thread_num = *(int *)arg;
    printf("Thread %d: Starting\n", thread_num);
    sleep(1);  // Simulate work
    printf("Thread %d: Finishing\n", thread_num);
    return NULL;
}

int main() {
    pthread_t threads[3];
    int thread_args[3] = {1, 2, 3};
    
    // Create 3 threads
    for (int i = 0; i < 3; i++) {
        int ret = pthread_create(
            &threads[i],      // Thread ID (output)
            NULL,             // Attributes (NULL = default)
            thread_function,  // Function to run
            &thread_args[i]   // Argument to function
        );
        if (ret != 0) {
            // Note: pthread functions return an error code directly;
            // they do NOT set errno, so perror() would be wrong here.
            fprintf(stderr, "pthread_create failed (error %d)\n", ret);
            return 1;
        }
    }
    
    printf("Main: All threads created\n");
    
    // Wait for all threads to complete
    for (int i = 0; i < 3; i++) {
        pthread_join(threads[i], NULL);
    }
    
    printf("Main: All threads finished\n");
    return 0;
}

// Compile: gcc -pthread program.c -o program
Output (order may vary!):
Main: All threads created
Thread 1: Starting
Thread 3: Starting
Thread 2: Starting
Thread 2: Finishing
Thread 1: Finishing
Thread 3: Finishing
Main: All threads finished

Joining & Detaching

// Thread joining - wait for completion and get return value
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *compute_sum(void *arg) {
    int *result = malloc(sizeof(int));
    *result = 42;  // Computed result
    return result;  // Return pointer (lives on heap!)
}

int main() {
    pthread_t thread;
    void *retval;
    
    pthread_create(&thread, NULL, compute_sum, NULL);
    
    // Join: block until thread finishes, get return value
    pthread_join(thread, &retval);
    
    int *result = (int *)retval;
    printf("Thread returned: %d\n", *result);  // 42
    free(result);
    
    return 0;
}

// Detached threads - run independently, can't be joined
void *background_work(void *arg);  // some long-running task, defined elsewhere

void start_background_task() {
    pthread_t thread;
    pthread_create(&thread, NULL, background_work, NULL);
    
    // Detach: thread resources freed automatically on exit
    pthread_detach(thread);
    // Can't join a detached thread!
}

// Or set detached at creation:
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&thread, &attr, func, NULL);

Thread Attributes

Common Thread Attributes:
  • Stack size: Default ~8MB, can be reduced for many threads
  • Detach state: Joinable (default) or detached
  • Scheduling policy: SCHED_FIFO, SCHED_RR, SCHED_OTHER
  • Priority: Scheduling priority within policy

Race Conditions

A race condition occurs when the behavior of a program depends on the relative timing of events (like thread scheduling). The result is non-deterministic and often buggy.

Critical Sections

Race Condition Example

// BUGGY CODE - Race condition!
int counter = 0;  // Shared variable

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++;  // NOT atomic!
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    
    // Expected: 2,000,000
    // Actual: Something less (race condition!)
    printf("Counter: %d\n", counter);
    return 0;
}
Why it's broken:
══════════════════════════════════════════════════════════════
counter++ is NOT atomic! It's actually 3 operations:

Thread 1                   Thread 2
─────────────────────────  ─────────────────────────
1. LOAD counter (= 5)      
2. ADD 1 (= 6)             1. LOAD counter (= 5)  ← Same value!
3. STORE counter (= 6)     2. ADD 1 (= 6)
                           3. STORE counter (= 6)

Both threads read 5, both write 6 → Lost update!
We incremented twice but counter only increased by 1.

Runs:
$ ./program → Counter: 1847293
$ ./program → Counter: 1923847
$ ./program → Counter: 1788234
Different every time! Non-deterministic bug.

Data Races

Data Race Definition:
══════════════════════════════════════════════════════════════
A data race occurs when:
1. Two or more threads access the same memory location
2. At least one access is a write
3. The accesses are not synchronized

Data races lead to undefined behavior in C/C++!
The compiler can assume no data races and optimize accordingly.

Examples of data races:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
int x = 0;                  // Shared variable

Thread 1:     Thread 2:     Race?
─────────     ─────────     ──────
x = 1;        y = x;        YES - write/read without sync
x = 1;        x = 2;        YES - write/write without sync
y = x;        z = x;        NO  - both reads, no writes

CRITICAL SECTION:
A code region that accesses shared resources and must not be
executed by more than one thread at a time.

Atomicity

// Solution 1: Atomic operations (simple cases)
#include <stdatomic.h>

atomic_int counter = 0;  // Atomic integer

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        atomic_fetch_add(&counter, 1);  // Atomic increment
        // Or: counter++;  // Also atomic for atomic_int!
    }
    return NULL;
}

// Now counter will always be exactly 2,000,000!


// Solution 2: Mutex (for complex critical sections)
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    // Enter critical section
        counter++;                     // Safe!
        pthread_mutex_unlock(&lock);  // Exit critical section
    }
    return NULL;
}

Thread Safety Patterns

A function or class is thread-safe if it can be safely called from multiple threads simultaneously.

Thread Safety Techniques

Technique               Description                           Example
──────────────────────  ────────────────────────────────────  ─────────────────────────────
Immutability            Data never changes after creation     const strings, config objects
Thread-Local Storage    Each thread has its own copy          __thread int x;
Mutual Exclusion        Locks protect shared data             Mutexes, spinlocks
Atomic Operations       Hardware-supported indivisible ops    atomic_int, CAS
Message Passing         Communicate by sending messages       Go channels, queues

// Thread-Local Storage example
#include <pthread.h>
#include <stdio.h>

// Each thread gets its own 'counter'
__thread int counter = 0;  // GCC extension
// Or: _Thread_local int counter = 0;  // C11 standard

void *thread_func(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < 5; i++) {
        counter++;  // Each thread increments its OWN copy
        printf("Thread %d: counter = %d\n", id, counter);
    }
    return NULL;
}

// Output: Each thread counts 1,2,3,4,5 independently!
Design Principle: "Share nothing if possible." If you can avoid sharing state between threads, you avoid race conditions entirely. When sharing is necessary, minimize it and protect it carefully.

Conclusion & Next Steps

Threads enable parallelism within a process, but shared memory requires careful synchronization. Key takeaways:

  • Threads vs Processes: Threads share address space (fast but dangerous), processes are isolated (safe but slower)
  • Threading Models: User-level (N:1), kernel-level (1:1), and hybrid (M:N) each have trade-offs
  • pthreads: Standard API for thread creation, joining, and attributes
  • Race Conditions: Non-atomic operations on shared data cause unpredictable bugs
  • Thread Safety: Achieved through locks, atomics, immutability, or thread-local storage

Golden Rule: If multiple threads access the same data and at least one writes, you MUST synchronize. The next part covers synchronization primitives in detail.

Next in the Series

In Part 11: CPU Scheduling Algorithms, we'll explore how operating systems decide which process or thread runs next—FCFS, Round Robin, priority scheduling, and the Linux CFS scheduler.