
Complete Protocols Master Part 19: Emerging Protocols

January 31, 2026 Wasil Zafar 50 min read

Explore the future of networking: QUIC replaces TCP+TLS, HTTP/3 brings speed, and WebTransport enables low-latency bidirectional communication.

Table of Contents

  1. Introduction
  2. QUIC Protocol
  3. HTTP/3
  4. WebTransport
  5. Future Trends
  6. Data Centre Interconnects
  7. Summary

Introduction: The Next Generation

TCP has served us for 40+ years, but its limitations are showing. QUIC is a ground-up redesign of transport—faster connections, no head-of-line blocking, and built-in encryption.

[Diagram: The next generation of internet protocols — QUIC replaces TCP+TLS at the transport layer, enabling HTTP/3 and WebTransport for faster, more resilient connections]
Series Context: This is Part 19 of 20 in the Complete Protocols Master series. These protocols are reshaping the transport and application layers.

Why Replace TCP?

TCP Limitations:

1. HEAD-OF-LINE BLOCKING
   One lost packet blocks ALL streams
   HTTP/2 multiplexing limited by TCP
   
2. SLOW CONNECTION SETUP
   TCP: 1 RTT (SYN, SYN-ACK, ACK)
   TLS: 1-2 RTT additional
   Total: 2-3 RTT before data
   
3. OSSIFICATION
   Middleboxes inspect TCP headers
   Hard to deploy new TCP features
   
4. NO ENCRYPTION BY DEFAULT
   TCP metadata visible
   Optional TLS adds latency

5. CONNECTION TIED TO IP
   Mobile users lose connection on network switch
   VPN reconnection issues

QUIC Fixes:
✅ Independent stream multiplexing
✅ 0-RTT connection establishment
✅ Encryption mandatory (metadata too)
✅ Connection migration
✅ UDP-based (bypasses middleboxes)

QUIC Protocol

QUIC is a transport protocol built on UDP. Google developed it under the name "Quick UDP Internet Connections", though the IETF standard (RFC 9000) treats QUIC as a name rather than an acronym. Over 25% of internet traffic now uses QUIC.

[Diagram: QUIC combines transport and encryption into a single 1-RTT handshake, compared to TCP+TLS 1.3 which requires separate handshakes totaling 2-3 round trips]
Key Insight: QUIC isn't "UDP but reliable"—it's a complete transport redesign that happens to use UDP as its substrate.

QUIC vs TCP+TLS Stack

Traditional Stack:
┌─────────────┐
│   HTTP/2    │
├─────────────┤
│    TLS 1.3  │
├─────────────┤
│     TCP     │
├─────────────┤
│     IP      │
└─────────────┘

QUIC Stack:
┌─────────────┐
│   HTTP/3    │
├─────────────┤
│    QUIC     │ ← Transport + Crypto combined
├─────────────┤
│     UDP     │
├─────────────┤
│     IP      │
└─────────────┘

QUIC includes:
• Reliable delivery
• Congestion control
• TLS 1.3 encryption
• Stream multiplexing
• Connection migration

QUIC Connection Establishment

QUIC Handshake (1-RTT):

Client                                 Server
   |                                      |
   |--- Initial [CRYPTO] ---------------->| 
   |    (ClientHello, key share)          |
   |                                      |
   |<-- Initial [CRYPTO] -----------------|
   |<-- Handshake [CRYPTO] ---------------|
   |    (ServerHello, cert, finished)     |
   |                                      |
   |--- Handshake [CRYPTO] -------------->|
   |--- 1-RTT [STREAM data] ------------->|
   |    (Application data!)               |
   |                                      |

Compare TCP + TLS 1.3:
• TCP: 1 RTT (SYN/SYN-ACK)
• TLS: 1 RTT (ClientHello/ServerHello)
• Total: 2 RTT before app data

QUIC 0-RTT Resumption:
• Returning clients send data immediately
• Server verifies with resumption token
• Risk: Replay attacks (mitigated)
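The round-trip arithmetic above is easy to make concrete. A small sketch, assuming a 50 ms round-trip time (an illustrative figure, not a measurement):

```python
# Time before the first application byte, as a function of handshake
# round trips. Illustrative model only: ignores processing time and loss.

def time_to_first_byte(handshake_rtts: int, rtt_ms: float) -> float:
    """Client waits `handshake_rtts` full round trips before sending data."""
    return handshake_rtts * rtt_ms

RTT_MS = 50  # assumed mobile-network round-trip time

scenarios = {
    "TCP + TLS 1.3 (2 RTT)": time_to_first_byte(2, RTT_MS),
    "QUIC 1-RTT handshake":  time_to_first_byte(1, RTT_MS),
    "QUIC 0-RTT resumption": time_to_first_byte(0, RTT_MS),
}

for name, delay in scenarios.items():
    print(f"{name}: {delay:.0f} ms before application data")
```

On a high-latency link the savings scale linearly: every round trip removed saves one full RTT before the page can even start loading.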
# Check if site supports QUIC/HTTP/3

# Using curl (must be compiled with HTTP/3 support)
curl --http3 -I https://cloudflare.com
# alt-svc: h3=":443"

# Check using online tools
# https://http3check.net/

# Chrome DevTools
# Network tab → Protocol column shows "h3"

# Firefox about:networking
# Shows QUIC connections

# Wireshark filter
quic
# QUIC concepts demonstration

def quic_features():
    """Explain QUIC's key features"""
    
    print("QUIC Key Features")
    print("=" * 50)
    
    features = {
        "Stream Multiplexing": """
            Multiple independent streams in one connection.
            Lost packet on stream 1 doesn't block stream 2.
            
            HTTP/2 over TCP:
            Stream 1: [===X===] ← Packet lost
            Stream 2: [=======] ← Blocked waiting!
            
            HTTP/3 over QUIC:
            Stream 1: [===X===] ← Retransmit
            Stream 2: [=======] ← Continues!
        """,
        
        "Connection Migration": """
            Connection ID instead of IP:port tuple.
            Mobile user switches WiFi → cellular:
            
            TCP: Connection lost, reconnect
            QUIC: Same connection, new path
        """,
        
        "Encryption": """
            All headers encrypted (except first byte).
            Middleboxes can't inspect or modify.
            Connection ID visible for routing only.
        """,
        
        "Congestion Control": """
            Pluggable algorithms (CUBIC, BBR).
            Per-stream flow control.
            Connection-level flow control.
        """
    }
    
    for name, desc in features.items():
        print(f"\n{name}:")
        print(desc)

quic_features()
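Connection migration comes down to what the lookup key is. A toy sketch, illustrative only and not a real QUIC implementation:

```python
# Toy illustration: TCP-style demultiplexing keys connections by the
# (source, destination) address pair, so a new client address means a lost
# connection. QUIC keys by connection ID, so the same session survives.

tcp_table = {}   # keyed by (src_addr, dst_addr)
quic_table = {}  # keyed by connection ID

def tcp_lookup(src, dst):
    return tcp_table.get((src, dst))

def quic_lookup(conn_id):
    return quic_table.get(conn_id)

# Client connects over WiFi
tcp_table[("10.0.0.5:5555", "1.2.3.4:443")] = "session-state"
quic_table["abc123"] = "session-state"

# Client switches to cellular: new source address, same connection ID
print(tcp_lookup("172.16.9.9:6000", "1.2.3.4:443"))  # None: connection lost
print(quic_lookup("abc123"))                          # session survives
```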

QUIC Adoption Status

Provider     Status            Notes
Google       ✅ Full           YouTube, Search, all services
Cloudflare   ✅ Full           All plans, enabled by default
Facebook     ✅ Full           Apps and web
Akamai       ✅ Available      CDN edge support
nginx        ✅ Experimental   Since 1.25.0
Apache       ⏳ In progress    mod_http3

HTTP/3

HTTP/3 is HTTP over QUIC. Same semantics as HTTP/2 (headers, streams, push), but without TCP's limitations.

[Diagram: HTTP version evolution — HTTP/1.1 uses sequential TCP connections, HTTP/2 adds multiplexing over TCP, and HTTP/3 eliminates head-of-line blocking via QUIC]

HTTP Version Evolution

HTTP Version History:

HTTP/1.0 (1996):
• One request per TCP connection
• Connection: close

HTTP/1.1 (1997):
• Keep-alive connections
• Pipelining (rarely used)
• Still head-of-line blocking

HTTP/2 (2015):
• Binary framing
• Multiplexed streams
• Header compression (HPACK)
• Server push
• TCP head-of-line blocking remains

HTTP/3 (2022):
• QUIC transport (UDP-based)
• No head-of-line blocking
• 0-RTT connection resumption
• Header compression (QPACK)
• Connection migration
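The head-of-line blocking difference between HTTP/2-over-TCP and HTTP/3 can be simulated with a toy delivery model (a sketch, not a real transport):

```python
# Toy model: four packets carry data for two streams; the packet at index 1
# is lost and retransmitted later. TCP delivers the byte stream strictly in
# order, so everything after the hole waits. QUIC orders per stream, so only
# the stream that lost data stalls.

packets = [("s1", 0), ("s1", 1), ("s2", 0), ("s2", 1)]  # (stream, seq)
lost_index = 1  # the second packet on the wire is dropped

def delivered_before_retransmit(per_stream_ordering: bool):
    delivered = []
    blocked = False
    for i, (stream, seq) in enumerate(packets):
        if i == lost_index:
            blocked = True           # a hole appears at this point
            continue
        if per_stream_ordering:      # QUIC: only stream "s1" is waiting
            if stream != "s1" or not blocked:
                delivered.append((stream, seq))
        elif not blocked:            # TCP: everything after the hole waits
            delivered.append((stream, seq))
    return delivered

print("TCP :", delivered_before_retransmit(False))  # only data before the hole
print("QUIC:", delivered_before_retransmit(True))   # stream 2 unaffected
```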
# Enable HTTP/3 on Cloudflare
# Dashboard → Speed → Optimization → HTTP/3 (with QUIC)

# nginx HTTP/3 configuration (experimental)
server {
    listen 443 quic reuseport;
    listen 443 ssl;
    
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    
    # Advertise HTTP/3 support
    add_header Alt-Svc 'h3=":443"; ma=86400';
    
    # HTTP/3 specific
    http3 on;
    quic_retry on;
}

# Test HTTP/3
curl --http3 -v https://your-site.com
# Checking HTTP/3 support with httpx

def http3_example():
    """Check whether a server advertises HTTP/3"""
    
    print("HTTP/3 Support Check")
    print("=" * 50)
    
    print("""
    # httpx does not currently support HTTP/3; with http2=True
    # it negotiates HTTP/1.1 or HTTP/2. It can still detect
    # HTTP/3 support by reading the Alt-Svc response header.
    # pip install httpx[http2]
    
    import httpx
    
    response = httpx.get('https://cloudflare.com')
    print(f"HTTP Version: {response.http_version}")
    print(f"Status: {response.status_code}")
    
    # Check Alt-Svc header for HTTP/3 support
    alt_svc = response.headers.get('alt-svc', '')
    print(f"Alt-Svc: {alt_svc}")
    # h3=":443" indicates HTTP/3 support
    
    # For actual HTTP/3 requests in Python, use aioquic,
    # which implements QUIC and HTTP/3 directly.
    """)
    
    print("\nHTTP/3 Benefits:")
    print("• Faster page loads (0-RTT)")
    print("• Better on lossy networks (mobile)")
    print("• No head-of-line blocking")
    print("• Seamless network switching")

http3_example()
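Servers advertise HTTP/3 in the Alt-Svc response header, so detection is just header parsing. A stdlib-only sketch, simplified relative to RFC 7838 (it ignores quoting edge cases and parameters):

```python
# Minimal Alt-Svc parser: returns the set of advertised ALPN protocol IDs.
# Simplified relative to RFC 7838.

def alt_svc_protocols(header_value: str) -> set:
    protocols = set()
    for entry in header_value.split(","):
        entry = entry.strip()
        if "=" in entry:
            # The protocol ID is everything before the first '='
            protocols.add(entry.split("=", 1)[0].strip())
    return protocols

# Example header as sent by an HTTP/3-capable server
header = 'h3=":443"; ma=86400, h3-29=":443"; ma=86400'
protos = alt_svc_protocols(header)
print("HTTP/3 advertised:", "h3" in protos)  # True
```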

WebTransport

WebTransport is a web API for low-latency bidirectional communication over HTTP/3. For real-time applications it is often a better fit than WebSockets.

[Diagram: WebTransport supports multiple independent streams (reliable and unreliable) over QUIC, while WebSocket is limited to a single ordered TCP stream]

WebTransport vs WebSocket

WebTransport vs WebSocket:

WebSocket:
• TCP-based (head-of-line blocking)
• Single ordered stream
• Reliable delivery only
• Established, wide support

WebTransport:
• QUIC-based (no HOL blocking)
• Multiple streams (ordered/unordered)
• Reliable OR unreliable delivery
• Newer, growing support

Use WebTransport for:
• Real-time gaming
• Live streaming
• Collaborative editing
• IoT data streams

Use WebSocket for:
• Simple chat applications
• Broad browser support needed
• Proxy/firewall traversal
// WebTransport client (browser)

async function webTransportDemo() {
    // Connect to WebTransport server
    const transport = new WebTransport('https://example.com/wt');
    
    await transport.ready;
    console.log('Connected!');
    
    // Bidirectional stream (reliable, ordered)
    const stream = await transport.createBidirectionalStream();
    const writer = stream.writable.getWriter();
    const reader = stream.readable.getReader();
    
    // Send data
    await writer.write(new TextEncoder().encode('Hello!'));
    
    // Receive data
    const { value } = await reader.read();
    console.log('Received:', new TextDecoder().decode(value));
    
    // Unidirectional stream (for one-way data)
    const uniStream = await transport.createUnidirectionalStream();
    
    // Datagrams (unreliable, unordered - for real-time)
    const datagramWriter = transport.datagrams.writable.getWriter();
    await datagramWriter.write(new Uint8Array([1, 2, 3]));
    
    // Close
    transport.close();
}

// Server-sent events via WebTransport
async function receiveUpdates(transport) {
    const reader = transport.incomingUnidirectionalStreams.getReader();
    
    while (true) {
        const { value: stream, done } = await reader.read();
        if (done) break;
        
        // Process incoming stream
        const streamReader = stream.getReader();
        const { value } = await streamReader.read();
        console.log('Update:', new TextDecoder().decode(value));
    }
}

Future Trends

[Diagram: Emerging protocols push toward privacy and performance — MASQUE tunnels any protocol over HTTP/3, OHTTP hides client identity, and Multipath QUIC bonds network paths]

What's Coming Next

Emerging Protocols & Trends:

1. MASQUE (Multiplexed Application Substrate)
   • Tunnel any protocol over HTTP/3
   • Modern VPN alternative
   • Apple iCloud Private Relay uses it

2. OHTTP (Oblivious HTTP)
   • Privacy-preserving HTTP
   • Client → Relay → Gateway → Origin
   • Hides client IP from origin

3. DNS over QUIC (DoQ)
   • DNS queries over QUIC
   • Faster than DoH/DoT
   • RFC 9250

4. WebCodecs + WebTransport
   • Low-latency video streaming
   • Game streaming (Stadia-like)

5. Multipath QUIC
   • Use multiple network paths
   • WiFi + cellular simultaneously

6. Encrypted Client Hello (ECH)
   • Hide SNI (server name indicator)
   • Privacy enhancement for TLS
# Protocol adoption timeline

def protocol_timeline():
    """Show protocol evolution"""
    
    timeline = [
        ("1983", "TCP/IP", "Foundation of internet"),
        ("1996", "HTTP/1.0", "Web begins"),
        ("1999", "TLS 1.0", "Secure web"),
        ("2015", "HTTP/2", "Multiplexing"),
        ("2018", "TLS 1.3", "Faster, simpler"),
        ("2021", "QUIC RFC", "New transport"),
        ("2022", "HTTP/3 RFC", "HTTP over QUIC"),
        ("2023+", "WebTransport", "Real-time web"),
        ("Future", "MASQUE/OHTTP", "Privacy-first"),
    ]
    
    print("Protocol Evolution Timeline")
    print("=" * 50)
    
    for year, protocol, description in timeline:
        print(f"{year:6} │ {protocol:15} │ {description}")

protocol_timeline()

High-Performance Data Centre Interconnects

While QUIC and HTTP/3 speed up the public internet, a parallel revolution is happening inside data centres and supercomputers. Traditional TCP/IP was designed for wide-area networks where packets traverse many hops, but within a data centre rack (or between GPUs on the same board) that overhead is wasteful. A family of technologies has emerged to move data between servers, GPUs, and memory pools with microsecond latency instead of millisecond latency—often bypassing the operating system entirely.

RDMA (Remote Direct Memory Access)

RDMA lets one computer read or write the memory of another computer directly, without involving the remote CPU or operating system. In normal networking, data must travel: Application → OS kernel → NIC → wire → NIC → OS kernel → Application. RDMA eliminates the kernel on both sides—the network adapter (NIC) reads from or writes to application memory in a single operation. This achieves latencies under 2 microseconds and bandwidths exceeding 400 Gbps.

Why RDMA Matters

Every kernel crossing (system call) adds ~1–5 µs of latency and consumes CPU cycles for copying data. For high-frequency trading, distributed databases (like SAP HANA or Oracle RAC), and AI training that exchanges gradients millions of times per second, these microseconds compound into seconds of wasted time. RDMA removes that overhead entirely.

RDMA Operations

Operation        Description                                          CPU Involvement
RDMA Read        Local NIC reads from remote memory                   Remote CPU not notified
RDMA Write       Local NIC writes to remote memory                    Remote CPU not notified
Send / Receive   Two-sided message passing                            Both CPUs involved
Atomic           Compare-and-swap or fetch-and-add on remote memory   Executed by remote NIC hardware

Key concept: RDMA Read and Write are one-sided—the remote server's CPU never knows it happened. This is what makes RDMA so fast: zero context switches, zero memory copies, zero interrupts on the remote side.
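The one-sided idea can be illustrated with a toy model (purely illustrative; real RDMA goes through the ibverbs API and registered memory regions):

```python
# Toy model of one-sided RDMA semantics (not a real RDMA stack).
# The "NIC" accesses registered remote memory directly; the remote CPU's
# interrupt counter shows it was never involved in one-sided operations.

class RemoteNode:
    def __init__(self, size: int):
        self.memory = bytearray(size)   # stand-in for a registered region
        self.cpu_interrupts = 0         # incremented only for two-sided ops

    def recv(self, data: bytes):        # two-sided: remote CPU must handle it
        self.cpu_interrupts += 1
        self.memory[:len(data)] = data

def rdma_write(node: RemoteNode, offset: int, data: bytes):
    node.memory[offset:offset + len(data)] = data  # NIC-only path

def rdma_read(node: RemoteNode, offset: int, length: int) -> bytes:
    return bytes(node.memory[offset:offset + length])

node = RemoteNode(64)
rdma_write(node, 0, b"gradient")
print(rdma_read(node, 0, 8))       # b'gradient'
print(node.cpu_interrupts)         # 0: remote CPU never woken
node.recv(b"message")              # two-sided send/receive
print(node.cpu_interrupts)         # 1
```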

InfiniBand

InfiniBand is a dedicated high-performance network technology purpose-built for RDMA. Unlike Ethernet (which was designed for office LANs and later adapted for data centres), InfiniBand was designed from day one for ultra-low latency, lossless delivery, and direct memory access. It uses its own switches, cables, and host channel adapters (HCAs) instead of standard Ethernet NICs.

InfiniBand Speed Tiers

Generation   Per-Lane Speed   4× Link Speed   Typical Latency
SDR          2.5 Gbps         10 Gbps         ~5 µs
FDR          14 Gbps          56 Gbps         ~1.3 µs
HDR          50 Gbps          200 Gbps        ~0.6 µs
NDR          100 Gbps         400 Gbps        ~0.5 µs
XDR (2025)   200 Gbps         800 Gbps        <0.5 µs

Where it's used: HPC clusters (Top500 supercomputers), AI training farms (NVIDIA DGX systems), financial trading platforms, and large-scale storage systems. NVIDIA acquired Mellanox (the dominant InfiniBand vendor) in 2020 for $6.9B, signalling how critical this technology is for AI infrastructure.
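The 4× link column is just the per-lane signalling rate times four lanes. A quick arithmetic sketch (note that HDR signals at 50 Gbps per lane, and usable throughput is slightly lower once encoding overhead is included):

```python
# InfiniBand link speed = per-lane signalling rate x lane count.

per_lane_gbps = {"SDR": 2.5, "FDR": 14, "HDR": 50, "NDR": 100, "XDR": 200}
lanes = 4

for gen, lane_speed in per_lane_gbps.items():
    print(f"{gen}: {lane_speed * lanes:g} Gbps over a {lanes}x link")
```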

RoCE (RDMA over Converged Ethernet)

RoCE (pronounced "rocky") brings RDMA capabilities to standard Ethernet networks, eliminating the need for specialised InfiniBand hardware. This is significant because most data centres already have Ethernet infrastructure—RoCE lets them gain RDMA performance benefits without replacing every switch and cable.

RoCE Versions

Version   Transport                 Routable?              Use Case
RoCE v1   Ethernet L2 frames only   No (same subnet)       Single rack / small cluster
RoCE v2   UDP/IP encapsulation      Yes (across subnets)   Data centre–wide RDMA

InfiniBand vs RoCE: When to Choose What

  • InfiniBand: Best for dedicated HPC/AI clusters where maximum performance matters and you control the entire network (e.g., NVIDIA DGX SuperPOD)
  • RoCE v2: Best when RDMA is needed over existing Ethernet infrastructure, or when sharing the network with non-RDMA traffic (e.g., Azure cloud instances with RDMA)
  • Key trade-off: InfiniBand guarantees lossless delivery in hardware; RoCE requires careful Ethernet configuration (PFC, ECN) to avoid packet drops that devastate RDMA performance

DPDK (Data Plane Development Kit)

DPDK takes a different approach to high-speed networking. Instead of hardware-level RDMA, DPDK is a software framework that bypasses the Linux kernel's networking stack entirely. It gives user-space applications direct access to the NIC hardware via poll-mode drivers, eliminating interrupts and context switches. Originally developed by Intel, DPDK is now the foundation for software-defined networking equipment, virtual switches, and telecom infrastructure.

How DPDK Bypasses the Kernel

Traditional packet path (Linux kernel):
  NIC → IRQ → Kernel driver → sk_buff alloc → TCP/IP stack
  → socket buffer → copy to user space → Application
  ⏱ Overhead: ~10-20 µs per packet, CPU-intensive

DPDK packet path (kernel bypass):
  NIC → DMA to user-space hugepage memory → Poll-mode driver
  → Application processes packet directly
  ⏱ Overhead: ~1-2 µs per packet, line-rate processing

Key DPDK techniques:
  • Poll Mode Drivers (PMD) — no interrupts, CPU polls NIC
  • Hugepages (2 MB / 1 GB) — reduced TLB misses
  • Lockless ring buffers — zero-copy between cores
  • CPU affinity — pin threads to cores, no scheduling jitter
  • Batch processing — handle 32+ packets per function call

Where it's used: Software routers (VPP/FD.io), virtual switches (Open vSwitch with DPDK), 5G user-plane (UPF), NFV appliances, packet capture tools, and cloud provider network infrastructure. DPDK can process 100+ million packets per second on a single server.
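The payoff of batch processing is easy to quantify with a toy cost model (the overhead numbers below are assumptions for illustration, not DPDK measurements):

```python
# Toy cost model: each driver invocation pays a fixed overhead, plus a small
# per-packet cost. Batching amortises the fixed cost over the whole batch.

CALL_OVERHEAD_US = 2.0   # assumed fixed cost per driver invocation
PER_PACKET_US = 0.05     # assumed work per packet

def cost_us(packets: int, batch_size: int) -> float:
    calls = -(-packets // batch_size)   # ceiling division
    return calls * CALL_OVERHEAD_US + packets * PER_PACKET_US

n = 1_000_000
print(f"batch=1 : {cost_us(n, 1) / 1e6:.2f} s for {n:,} packets")
print(f"batch=32: {cost_us(n, 32) / 1e6:.2f} s for {n:,} packets")
```

With these assumed numbers, processing a million packets one at a time spends most of its time on invocation overhead; batches of 32 cut the total by more than an order of magnitude.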

DSM (Distributed Shared Memory)

Distributed Shared Memory creates the illusion that multiple machines share a single, unified memory space—even though physically each machine has its own local RAM. Applications can read and write to any address as if it were local memory, and the DSM system transparently handles fetching data from remote machines. This simplifies programming dramatically: instead of writing explicit network send/receive code, developers use familiar pointers and memory operations.

How DSM Works

Physical reality:
  Machine A: [RAM 0x0000 - 0x3FFF]  (local)
  Machine B: [RAM 0x0000 - 0x3FFF]  (local)

DSM virtual view (what the application sees):
  Unified address space: [0x0000 - 0x7FFF]
  • Address 0x1000 → Machine A's local RAM  ⚡ fast
  • Address 0x5000 → Machine B's RAM        📡 fetched via network

Consistency models:
  • Sequential — all nodes see same order (slow, simple)
  • Release — sync only at lock/unlock points (fast, complex)
  • Lazy release — defer sync until data actually needed (fastest)

DSM in Practice

Pure software DSM (like TreadMarks or Grappa) suffered from high latency over traditional networks. Modern DSM revival is driven by RDMA and CXL: RDMA-based DSM systems (like FaRM from Microsoft Research) achieve single-digit microsecond remote reads, and CXL 3.0's hardware-coherent shared memory makes DSM practical at rack scale. Today, DSM concepts underpin disaggregated memory architectures in hyperscale data centres.
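The unified address space can be sketched as a simple address router (a toy model; real DSM systems also handle caching, write propagation, and coherence):

```python
# Toy DSM: one flat address space backed by two machines' local memories.
# Reads in the remote range go "over the network" (counted); local reads don't.

LOCAL_SIZE = 0x4000

class ToyDSM:
    def __init__(self):
        self.machine_a = bytearray(LOCAL_SIZE)  # addresses 0x0000-0x3FFF
        self.machine_b = bytearray(LOCAL_SIZE)  # addresses 0x4000-0x7FFF
        self.remote_fetches = 0

    def read(self, addr: int) -> int:
        if addr < LOCAL_SIZE:                   # local to machine A: fast
            return self.machine_a[addr]
        self.remote_fetches += 1                # network round trip needed
        return self.machine_b[addr - LOCAL_SIZE]

dsm = ToyDSM()
dsm.machine_b[0x1000] = 42
print(dsm.read(0x5000))        # 42, transparently fetched from machine B
print(dsm.remote_fetches)      # 1
print(dsm.read(0x1000))        # 0, local, no network traffic
print(dsm.remote_fetches)      # still 1
```

The application just dereferences addresses; the placement of data decides the cost, which is why consistency models and locality matter so much in DSM.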

CXL (Compute Express Link)

CXL is an open industry standard built on PCIe physical layer that enables cache-coherent communication between CPUs, accelerators (GPUs, FPGAs), and memory devices. Think of it as giving devices a shared, coherent view of memory—the CPU and a GPU can both read and write the same memory region with automatic hardware cache synchronisation, eliminating the need for explicit data copies between host and device memory.

CXL Protocol Types

Sub-Protocol   Purpose                                      Example Use
CXL.io         Standard PCIe I/O (discovery, config, DMA)   Device enumeration, driver communication
CXL.cache      Device caches host memory with coherency     GPU/FPGA accelerator caches hot data from host RAM
CXL.mem        Host accesses device-attached memory         Memory expander adds 512 GB to server via CXL DIMM

CXL Versions & Capabilities

Version   PCIe Base   Key Addition
CXL 1.1   PCIe 5.0    Single host ↔ single device coherency
CXL 2.0   PCIe 5.0    CXL switches — multiple hosts share a memory pool
CXL 3.0   PCIe 6.0    Multi-level switching, hardware-coherent shared memory across racks, peer-to-peer
CXL 3.1   PCIe 6.0    Enhanced security, port-based routing, TSP (security protocol)

Impact: CXL enables "memory disaggregation"—instead of buying servers with fixed RAM, data centres can add memory independently as CXL-attached pools. A server that needs 2 TB of RAM for a brief analytics job can dynamically allocate CXL memory from a shared pool, then release it for other workloads. Intel, AMD, ARM, Samsung, and Microsoft are all shipping CXL products.
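Memory disaggregation behaves like a pool allocator shared across hosts. A toy sketch (names and numbers are illustrative; real CXL pools are managed by a fabric manager, not application code):

```python
# Toy model of a shared CXL memory pool: hosts borrow capacity for a job,
# then return it for other workloads. Illustrative only.

class MemoryPool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations = {}            # host -> GB currently borrowed

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host: str, gb: int) -> bool:
        if gb <= self.free_gb():
            self.allocations[host] = self.allocations.get(host, 0) + gb
            return True
        return False                     # pool exhausted

    def release(self, host: str):
        self.allocations.pop(host, None)

pool = MemoryPool(capacity_gb=4096)
print(pool.allocate("analytics-server", 2048))  # True
print(pool.free_gb())                           # 2048 GB left for others
pool.release("analytics-server")                # job done, memory returned
print(pool.free_gb())                           # 4096
```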

NVLink

NVLink is NVIDIA's proprietary high-bandwidth interconnect designed specifically for GPU-to-GPU and GPU-to-CPU communication. Standard PCIe bottlenecks multi-GPU AI training because gradient synchronisation requires terabytes of data to flow between GPUs every second. NVLink solves this with a dedicated, high-bandwidth, cache-coherent link that makes multiple GPUs behave almost like a single, larger GPU.

NVLink Evolution

Generation   GPU Architecture   Bandwidth (per GPU)   Notable Feature
NVLink 1.0   Pascal (P100)      160 GB/s              First GPU-to-GPU high-speed link
NVLink 2.0   Volta (V100)       300 GB/s              CPU-GPU coherency (IBM POWER9)
NVLink 3.0   Ampere (A100)      600 GB/s              NVSwitch for all-to-all topology
NVLink 4.0   Hopper (H100)      900 GB/s              NVLink Switch — 256 GPUs fully connected
NVLink 5.0   Blackwell (B200)   1,800 GB/s            NVLink-C2C chip-to-chip, 576 GPUs

NVLink vs PCIe: The Bandwidth Gap

PCIe 5.0 x16 delivers ~64 GB/s per direction. NVLink 4.0 delivers 900 GB/s—over 14× more bandwidth. For large language model training where 8 GPUs must synchronise billions of parameters every iteration, this difference means the network is no longer the bottleneck. NVSwitch extends this further: instead of each GPU connecting to a few neighbours, a dedicated NVLink switch fabric provides full bisection bandwidth so every GPU can talk to every other GPU at full speed simultaneously.
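The bandwidth gap translates directly into gradient-exchange time. A rough sketch, assuming a 70B-parameter model with fp16 gradients (illustrative numbers; it ignores all-reduce algorithms, overlap with compute, and protocol overhead):

```python
# Rough time to move one full copy of a model's gradients between GPUs,
# over PCIe 5.0 x16 vs NVLink 4.0.

PARAMS = 70e9            # assumed 70B-parameter model
BYTES_PER_PARAM = 2      # fp16 gradients
PCIE5_X16_GBS = 64       # GB/s per direction
NVLINK4_GBS = 900        # GB/s per GPU

payload_gb = PARAMS * BYTES_PER_PARAM / 1e9   # 140 GB of gradients

for name, bw in [("PCIe 5.0 x16", PCIE5_X16_GBS), ("NVLink 4.0", NVLINK4_GBS)]:
    print(f"{name}: {payload_gb / bw * 1000:.0f} ms per gradient exchange")
```

Even in this simplified model, the exchange drops from seconds to well under a second, which is the difference between the interconnect dominating the training step and the GPUs staying busy.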

Interconnect Comparison Summary

Technology        Type                 Max Bandwidth          Latency       Primary Use
Ethernet (100G)   Network              100 Gbps               ~10-50 µs     General data centre networking
RoCE v2           Network (RDMA)       400 Gbps               ~2-5 µs       RDMA over existing Ethernet
InfiniBand NDR    Network (RDMA)       400 Gbps               ~0.5 µs       HPC, AI training clusters
DPDK              Software framework   Line-rate              ~1-2 µs       Software routers, NFV, 5G
CXL 3.0           CPU-device link      64 GB/s (PCIe 6.0)     ~100-300 ns   Memory pooling, accelerators
NVLink 4.0        GPU-GPU link         900 GB/s               ~50 ns        Multi-GPU AI training
DSM               Programming model    Depends on transport   Varies        Shared memory illusion across nodes

Summary & Next Steps

Key Takeaways:
  • QUIC: New transport on UDP, replaces TCP+TLS
  • HTTP/3: HTTP over QUIC, no HOL blocking
  • 0-RTT: Immediate data on resumed connections
  • WebTransport: Low-latency bidirectional streams
  • Connection migration: Seamless network switching

Test Your Knowledge

  1. QUIC transport layer? (UDP)
  2. HTTP/3 solves what TCP problem? (Head-of-line blocking)
  3. 0-RTT tradeoff? (Replay attack risk)
  4. WebTransport vs WebSocket? (Unreliable delivery, multiple streams)