TCP has served us for 40+ years, but its limitations are showing. QUIC is a ground-up redesign of transport—faster connections, no head-of-line blocking, and built-in encryption.
The next generation of internet protocols — QUIC replaces TCP+TLS at the transport layer, enabling HTTP/3 and WebTransport for faster, more resilient connections
Series Context: This is Part 19 of 20 in the Complete Protocols Master series. These protocols are reshaping the transport and application layers.
TCP Limitations:
1. HEAD-OF-LINE BLOCKING
   • One lost packet blocks ALL streams
   • HTTP/2 multiplexing limited by TCP
2. SLOW CONNECTION SETUP
   • TCP: 1 RTT (SYN, SYN-ACK, ACK)
   • TLS: 1-2 RTT additional
   • Total: 2-3 RTT before data
3. OSSIFICATION
   • Middleboxes inspect TCP headers
   • Hard to deploy new TCP features
4. NO ENCRYPTION BY DEFAULT
   • TCP metadata visible
   • Optional TLS adds latency
5. CONNECTION TIED TO IP
   • Mobile users lose connections on network switch
   • VPN reconnection issues
QUIC Fixes:
✅ Independent stream multiplexing
✅ 0-RTT connection establishment
✅ Encryption mandatory (metadata too)
✅ Connection migration
✅ UDP-based with encrypted headers (resists middlebox ossification)
QUIC Protocol
QUIC is a transport protocol built on UDP. Google developed it (originally as an acronym for "Quick UDP Internet Connections"); the IETF standardised it as RFC 9000, where QUIC is simply the protocol's name, not an acronym. Over 25% of internet traffic now uses QUIC.
QUIC combines transport and encryption into a single 1-RTT handshake. TCP+TLS needs separate handshakes before application data can flow: 2 RTTs with TLS 1.3, 3 RTTs with TLS 1.2.
Key Insight: QUIC isn't "UDP but reliable"—it's a complete transport redesign that happens to use UDP as its substrate.
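The handshake saving is easy to quantify. A minimal sketch, assuming a 50 ms round-trip time (the RTT value is an illustrative assumption, not from the text):

```python
# Back-of-the-envelope handshake cost: RTTs spent before the first byte of
# application data can be sent. Counts follow the text: TCP+TLS 1.2 = 3,
# TCP+TLS 1.3 = 2, fresh QUIC = 1, resumed QUIC with 0-RTT = 0.

def handshake_delay_ms(rtt_ms: float, handshake_rtts: int) -> float:
    return handshake_rtts * rtt_ms

RTT_MS = 50.0  # illustrative: a typical long-haul internet path

for name, rtts in [("TCP + TLS 1.2", 3), ("TCP + TLS 1.3", 2),
                   ("QUIC (fresh)", 1), ("QUIC (0-RTT resume)", 0)]:
    print(f"{name:<20} {handshake_delay_ms(RTT_MS, rtts):5.0f} ms before data")
```

On such a path, QUIC saves a full 50-100 ms on every new connection, and 0-RTT resumption removes handshake delay entirely (at the replay-attack cost noted in the quiz below).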
While QUIC and HTTP/3 speed up the public internet, a parallel revolution is happening inside data centres and supercomputers. Traditional TCP/IP was designed for wide-area networks where packets traverse many hops, but within a data centre rack (or between GPUs on the same board) that overhead is wasteful. A family of technologies has emerged to move data between servers, GPUs, and memory pools with microsecond latency instead of millisecond latency—often bypassing the operating system entirely.
RDMA (Remote Direct Memory Access)
RDMA lets one computer read or write the memory of another computer directly, without involving the remote CPU or operating system. In normal networking, data must travel: Application → OS kernel → NIC → wire → NIC → OS kernel → Application. RDMA eliminates the kernel on both sides—the network adapter (NIC) reads from or writes to application memory in a single operation. This achieves latencies under 2 microseconds and bandwidths exceeding 400 Gbps.
Why RDMA Matters
Every kernel crossing (system call) adds ~1–5 µs of latency and consumes CPU cycles for copying data. For high-frequency trading, distributed databases (like SAP HANA or Oracle RAC), and AI training that exchanges gradients millions of times per second, these microseconds compound into seconds of wasted time. RDMA removes that overhead entirely.
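The compounding effect is simple arithmetic. A hedged sketch using the rough per-crossing cost quoted above (these are illustrative figures, not measurements):

```python
# How per-syscall overhead compounds at scale. Figures are the rough
# numbers from the text (~1-5 µs per kernel crossing), not benchmarks.

SYSCALL_OVERHEAD_US = 3.0        # midpoint of the ~1-5 µs range
MESSAGES_PER_SECOND = 1_000_000  # e.g. gradient exchanges during AI training

# Two crossings per message on the kernel path (send side + receive side).
wasted_us_per_sec = 2 * SYSCALL_OVERHEAD_US * MESSAGES_PER_SECOND
print(f"Kernel path: {wasted_us_per_sec / 1e6:.1f} CPU-seconds burned per wall-clock second")

# RDMA's one-sided path removes both crossings entirely.
print("RDMA path: ~0 kernel crossings; the NIC writes straight into application memory")
```

At a million messages per second, kernel crossings alone consume several entire CPU cores' worth of cycles, which is exactly the overhead RDMA eliminates.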
RDMA Operations
| Operation | Description | CPU Involvement |
|---|---|---|
| RDMA Read | Local NIC reads from remote memory | Remote CPU not notified |
| RDMA Write | Local NIC writes to remote memory | Remote CPU not notified |
| Send / Receive | Two-sided message passing | Both CPUs involved |
| Atomic | Compare-and-swap or fetch-and-add on remote memory | Executed by remote NIC hardware |
Key concept: RDMA Read and Write are one-sided—the remote server's CPU never knows it happened. This is what makes RDMA so fast: zero context switches, zero memory copies, zero interrupts on the remote side.
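One-sided semantics can be illustrated with a toy model (a pure simulation of the behaviour described above; real RDMA goes through the ibverbs API and NIC hardware, not Python objects): the "remote CPU" keeps an interrupt counter, and one-sided operations never touch it.

```python
# Toy model of one-sided vs two-sided RDMA operations. This simulates only
# the semantics; no real RDMA verbs or NIC hardware are involved.

class RemoteNode:
    def __init__(self, size: int):
        self.memory = bytearray(size)
        self.cpu_interrupts = 0  # how often the remote CPU got involved

    # One-sided: the NIC services these; the CPU counter never moves.
    def nic_read(self, offset: int, length: int) -> bytes:
        return bytes(self.memory[offset:offset + length])

    def nic_write(self, offset: int, data: bytes) -> None:
        self.memory[offset:offset + len(data)] = data

    # Two-sided: a Send/Receive pair interrupts the remote CPU.
    def receive(self, data: bytes) -> None:
        self.cpu_interrupts += 1
        self.memory[0:len(data)] = data

node = RemoteNode(4096)
node.nic_write(100, b"hello")                  # RDMA Write
assert node.nic_read(100, 5) == b"hello"       # RDMA Read
print("interrupts after one-sided ops:", node.cpu_interrupts)  # 0
node.receive(b"msg")
print("interrupts after Send/Receive:", node.cpu_interrupts)   # 1
```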
InfiniBand
InfiniBand is a dedicated high-performance network technology purpose-built for RDMA. Unlike Ethernet (which was designed for office LANs and later adapted for data centres), InfiniBand was designed from day one for ultra-low latency, lossless delivery, and direct memory access. It uses its own switches, cables, and host channel adapters (HCAs) instead of standard Ethernet NICs.
InfiniBand Speed Tiers
| Generation | Per-Lane Speed | 4× Link Speed | Typical Latency |
|---|---|---|---|
| SDR | 2.5 Gbps | 10 Gbps | ~5 µs |
| FDR | 14 Gbps | 56 Gbps | ~1.3 µs |
| HDR | 50 Gbps | 200 Gbps | ~0.6 µs |
| NDR | 100 Gbps | 400 Gbps | ~0.5 µs |
| XDR (2025) | 200 Gbps | 800 Gbps | <0.5 µs |
Where it's used: HPC clusters (Top500 supercomputers), AI training farms (NVIDIA DGX systems), financial trading platforms, and large-scale storage systems. NVIDIA acquired Mellanox (the dominant InfiniBand vendor) in 2020 for $6.9B, signalling how critical this technology is for AI infrastructure.
RoCE (RDMA over Converged Ethernet)
RoCE (pronounced "rocky") brings RDMA capabilities to standard Ethernet networks, eliminating the need for specialised InfiniBand hardware. This is significant because most data centres already have Ethernet infrastructure—RoCE lets them gain RDMA performance benefits without replacing every switch and cable.
RoCE Versions
| Version | Transport | Routable? | Use Case |
|---|---|---|---|
| RoCE v1 | Ethernet L2 frames only | No (same subnet) | Single rack / small cluster |
| RoCE v2 | UDP/IP encapsulation | Yes (across subnets) | Data centre–wide RDMA |
InfiniBand vs RoCE: When to Choose What
• InfiniBand: Best for dedicated HPC/AI clusters where maximum performance matters and you control the entire network (e.g., NVIDIA DGX SuperPOD)
• RoCE v2: Best when RDMA is needed over existing Ethernet infrastructure, or when sharing the network with non-RDMA traffic (e.g., Azure cloud instances with RDMA)
• Key trade-off: InfiniBand guarantees lossless delivery in hardware; RoCE requires careful Ethernet configuration (PFC, ECN) to avoid packet drops that devastate RDMA performance
DPDK (Data Plane Development Kit)
DPDK takes a different approach to high-speed networking. Instead of hardware-level RDMA, DPDK is a software framework that bypasses the Linux kernel's networking stack entirely. It gives user-space applications direct access to the NIC hardware via poll-mode drivers, eliminating interrupts and context switches. Originally developed by Intel, DPDK is now the foundation for software-defined networking equipment, virtual switches, and telecom infrastructure.
How DPDK Bypasses the Kernel
Traditional packet path (Linux kernel):
NIC → IRQ → Kernel driver → sk_buff alloc → TCP/IP stack
→ socket buffer → copy to user space → Application
⏱ Overhead: ~10-20 µs per packet, CPU-intensive
DPDK packet path (kernel bypass):
NIC → DMA to user-space hugepage memory → Poll-mode driver
→ Application processes packet directly
⏱ Overhead: ~1-2 µs per packet, line-rate processing
Key DPDK techniques:
• Poll Mode Drivers (PMD) — no interrupts, CPU polls NIC
• Hugepages (2 MB / 1 GB) — reduced TLB misses
• Lockless ring buffers — zero-copy between cores
• CPU affinity — pin threads to cores, no scheduling jitter
• Batch processing — handle 32+ packets per function call
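Batching is the workhorse among these techniques: the fixed cost of each driver call is amortised over the burst. A rough model (the nanosecond costs are illustrative assumptions, not DPDK benchmarks):

```python
# Why batch processing wins: fixed per-call costs are amortised across the
# burst. Cost figures are illustrative assumptions, not DPDK measurements.

FIXED_COST_NS = 500   # per driver call (cache misses, descriptor bookkeeping)
PER_PACKET_NS = 50    # actual processing work per packet

def ns_per_packet(batch_size: int) -> float:
    """Amortised cost when an rx-burst style call returns batch_size packets."""
    return FIXED_COST_NS / batch_size + PER_PACKET_NS

for batch in (1, 4, 32):
    rate_mpps = 1e3 / ns_per_packet(batch)  # packets per µs → millions/s
    print(f"batch={batch:>2}: {ns_per_packet(batch):6.1f} ns/pkt "
          f"≈ {rate_mpps:5.1f} Mpps per core")
```

Under these assumed costs, moving from one packet per call to bursts of 32 improves per-core throughput by roughly 8×, which is why DPDK's receive path is built around burst APIs.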
Where it's used: Software routers (VPP/FD.io), virtual switches (Open vSwitch with DPDK), 5G user-plane (UPF), NFV appliances, packet capture tools, and cloud provider network infrastructure. DPDK can process 100+ million packets per second on a single server.
DSM (Distributed Shared Memory)
Distributed Shared Memory creates the illusion that multiple machines share a single, unified memory space—even though physically each machine has its own local RAM. Applications can read and write to any address as if it were local memory, and the DSM system transparently handles fetching data from remote machines. This simplifies programming dramatically: instead of writing explicit network send/receive code, developers use familiar pointers and memory operations.
How DSM Works
Physical reality:
Machine A: [RAM 0x0000 - 0x3FFF] (local)
Machine B: [RAM 0x0000 - 0x3FFF] (local)
DSM virtual view (what the application sees):
Unified address space: [0x0000 - 0x7FFF]
• Address 0x1000 → Machine A's local RAM ⚡ fast
• Address 0x5000 → Machine B's RAM 📡 fetched via network
Consistency models:
• Sequential — all nodes see same order (slow, simple)
• Release — sync only at lock/unlock points (fast, complex)
• Lazy release — defer sync until data actually needed (fastest)
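The address-space illusion above can be sketched as a page-directory lookup (a toy model; real DSM systems such as FaRM layer this over RDMA with caching and a coherence protocol on top):

```python
# Toy DSM: a unified address space backed by per-machine local memories.
# A directory maps each global page to its owner; reading a page owned by
# another machine would trigger a network fetch in a real system.

PAGE_SIZE = 0x1000  # 4 KiB pages

class ToyDSM:
    def __init__(self, machines: dict):
        self.machines = machines
        self.directory = {}  # global page -> (owner machine, local page index)
        g = 0
        for name, mem in machines.items():
            for local_page in range(len(mem) // PAGE_SIZE):
                self.directory[g] = (name, local_page)
                g += 1

    def read(self, addr: int, caller: str):
        """Return (byte value, was_remote)."""
        g, off = divmod(addr, PAGE_SIZE)
        owner, local_page = self.directory[g]
        value = self.machines[owner][local_page * PAGE_SIZE + off]
        return value, owner != caller

# Two machines with 16 KiB each -> one unified 32 KiB address space,
# mirroring the 0x0000-0x3FFF / 0x0000-0x7FFF picture above.
dsm = ToyDSM({"A": bytearray(0x4000), "B": bytearray(0x4000)})
dsm.machines["B"][0] = 42                      # B's first local byte
value, remote = dsm.read(0x4000, caller="A")   # global 0x4000 lives on B
print(value, "remote fetch" if remote else "local")  # → 42 remote fetch
```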
DSM in Practice
Pure software DSM (like TreadMarks or Grappa) suffered from high latency over traditional networks. The modern DSM revival is driven by RDMA and CXL: RDMA-based DSM systems (like FaRM from Microsoft Research) achieve single-digit microsecond remote reads, and CXL 3.0's hardware-coherent shared memory makes DSM practical at rack scale. Today, DSM concepts underpin disaggregated memory architectures in hyperscale data centres.
CXL (Compute Express Link)
CXL is an open industry standard built on PCIe physical layer that enables cache-coherent communication between CPUs, accelerators (GPUs, FPGAs), and memory devices. Think of it as giving devices a shared, coherent view of memory—the CPU and a GPU can both read and write the same memory region with automatic hardware cache synchronisation, eliminating the need for explicit data copies between host and device memory.
CXL Protocol Types
| Sub-Protocol | Purpose | Example Use |
|---|---|---|
| CXL.io | Standard PCIe I/O (discovery, config, DMA) | Device enumeration, driver communication |
| CXL.cache | Device caches host memory with coherency | GPU/FPGA accelerator caches hot data from host RAM |
| CXL.mem | Host accesses device-attached memory | Memory expander adds 512 GB to server via CXL DIMM |
CXL Versions & Capabilities
| Version | PCIe Base | Key Addition |
|---|---|---|
| CXL 1.1 | PCIe 5.0 | Single host ↔ single device coherency |
| CXL 2.0 | PCIe 5.0 | CXL switches — multiple hosts share a memory pool |
| CXL 3.0 | PCIe 6.0 | Multi-level switching, hardware-coherent shared memory across racks, peer-to-peer |
Impact: CXL enables "memory disaggregation"—instead of buying servers with fixed RAM, data centres can add memory independently as CXL-attached pools. A server that needs 2 TB of RAM for a brief analytics job can dynamically allocate CXL memory from a shared pool, then release it for other workloads. Intel, AMD, ARM, Samsung, and Microsoft are all shipping CXL products.
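The pooling model can be sketched as a simple lease-based allocator (purely illustrative; real CXL pools are orchestrated by a fabric manager over CXL 2.0/3.0 switches, and the class and method names here are invented for the sketch):

```python
# Toy CXL-style memory pool: hosts borrow capacity for a job, then return it.
# Purely illustrative; names and policy are invented, not a real CXL API.

class MemoryPool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.leases = {}  # host -> GB currently borrowed

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.leases.values())

    def allocate(self, host: str, gb: int) -> bool:
        if gb > self.free_gb:
            return False  # pool exhausted; host must fall back to local RAM
        self.leases[host] = self.leases.get(host, 0) + gb
        return True

    def release(self, host: str) -> None:
        self.leases.pop(host, None)

pool = MemoryPool(capacity_gb=8192)         # shared rack-level pool
assert pool.allocate("analytics-01", 2048)  # the brief 2 TB analytics job
print("free after allocate:", pool.free_gb)  # → 6144
pool.release("analytics-01")
print("free after release:", pool.free_gb)   # → 8192
```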
NVLink
NVLink is NVIDIA's proprietary high-bandwidth interconnect designed specifically for GPU-to-GPU and GPU-to-CPU communication. Standard PCIe bottlenecks multi-GPU AI training because gradient synchronisation requires terabytes of data to flow between GPUs every second. NVLink solves this with a dedicated, high-bandwidth, cache-coherent link that makes multiple GPUs behave almost like a single, larger GPU.
NVLink Evolution
| Generation | GPU Architecture | Bandwidth (per GPU) | Notable Feature |
|---|---|---|---|
| NVLink 1.0 | Pascal (P100) | 160 GB/s | First GPU-to-GPU high-speed link |
| NVLink 2.0 | Volta (V100) | 300 GB/s | CPU-GPU coherency (IBM POWER9) |
| NVLink 3.0 | Ampere (A100) | 600 GB/s | NVSwitch for all-to-all topology |
| NVLink 4.0 | Hopper (H100) | 900 GB/s | NVLink Switch — 256 GPUs fully connected |
| NVLink 5.0 | Blackwell (B200) | 1,800 GB/s | NVLink-C2C chip-to-chip, 576 GPUs |
NVLink vs PCIe: The Bandwidth Gap
PCIe 5.0 x16 delivers ~64 GB/s per direction. NVLink 4.0 delivers 900 GB/s—over 14× more bandwidth. For large language model training where 8 GPUs must synchronise billions of parameters every iteration, this difference means the network is no longer the bottleneck. NVSwitch extends this further: instead of each GPU connecting to a few neighbours, a dedicated NVLink switch fabric provides full bisection bandwidth so every GPU can talk to every other GPU at full speed simultaneously.
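The gap translates directly into synchronisation time. A rough calculation (the 7B-parameter model size and FP16 precision are illustrative assumptions, and per-direction vs aggregate bandwidth accounting is deliberately simplified):

```python
# Time to move one full copy of a model's gradients over each link.
# 7B parameters in FP16 is an illustrative assumption, not from the text.

PARAMS = 7_000_000_000
BYTES_PER_PARAM = 2  # FP16 gradients
payload_gb = PARAMS * BYTES_PER_PARAM / 1e9  # 14 GB per full exchange

for link, gb_per_s in [("PCIe 5.0 x16 (~64 GB/s)", 64),
                       ("NVLink 4.0 (900 GB/s)", 900)]:
    ms = payload_gb / gb_per_s * 1e3
    print(f"{link:<26} {ms:7.1f} ms per full gradient exchange")
```

Repeated every training iteration, the difference between ~220 ms and ~16 ms per exchange is what moves the bottleneck from the interconnect back to the GPUs themselves.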
Interconnect Comparison Summary
| Technology | Type | Max Bandwidth | Latency | Primary Use |
|---|---|---|---|---|
| Ethernet (100G) | Network | 100 Gbps | ~10-50 µs | General data centre networking |
| RoCE v2 | Network (RDMA) | 400 Gbps | ~2-5 µs | RDMA over existing Ethernet |
| InfiniBand NDR | Network (RDMA) | 400 Gbps | ~0.5 µs | HPC, AI training clusters |
| DPDK | Software framework | Line-rate | ~1-2 µs | Software routers, NFV, 5G |
| CXL 3.0 | CPU-device link | ~128 GB/s (PCIe 6.0 x16) | ~100-300 ns | Memory pooling, accelerators |
| NVLink 4.0 | GPU-GPU link | 900 GB/s | ~50 ns | Multi-GPU AI training |
| DSM | Programming model | Depends on transport | Varies | Shared memory illusion across nodes |
Summary & Next Steps
Key Takeaways:
QUIC: New transport on UDP, replaces TCP+TLS
HTTP/3: HTTP over QUIC, no HOL blocking
0-RTT: Immediate data on resumed connections
WebTransport: Low-latency bidirectional streams
Connection migration: Seamless network switching
Quiz
Test Your Knowledge
QUIC transport layer? (UDP)
HTTP/3 solves what TCP problem? (Head-of-line blocking)
0-RTT tradeoff? (Replay attack risk)
WebTransport vs WebSocket? (Unreliable delivery, multiple streams)