Infrastructure & Cloud Automation Series

Part 16: Multi-Cloud Architecture

May 14, 2026 Wasil Zafar 50 min read

Design portable, resilient infrastructure across AWS, Azure, and GCP — implement abstraction layers, manage cross-cloud networking, federate identity, and build truly cloud-agnostic systems without falling into the lowest-common-denominator trap.

Table of Contents

  1. Why Multi-Cloud
  2. Strategy Patterns
  3. Abstraction Layers
  4. Multi-Cloud Networking
  5. Identity & Security
  6. Data Strategies
  7. Terraform Multi-Cloud
  8. Multi-Cloud Kubernetes
  9. Cost & Governance
  10. Common Anti-Patterns
  11. Hands-On Exercises
  12. Conclusion & Next Steps

Why Multi-Cloud

Multi-cloud is one of the most debated topics in modern infrastructure. Analyst surveys routinely report that around 90% of enterprises have a multi-cloud strategy. Look closer, though, and many of them have accidental multi-cloud: different teams adopted different providers independently, with no unified architecture, no shared tooling, and no intentional design.

True multi-cloud architecture is an intentional design decision to distribute workloads across two or more cloud providers with a coherent strategy for networking, identity, data, and operations.

Definitions That Matter

| Term | Definition | Example |
|---|---|---|
| Multi-Cloud | Using 2+ public cloud providers intentionally | ML on GCP, core app on AWS |
| Hybrid Cloud | Combining public cloud with on-premises/private cloud | AWS + on-prem data center |
| Multi-Region | Same provider, different geographic regions | AWS us-east-1 + eu-west-1 |
| Poly-Cloud | Multi-cloud without a unifying strategy | Each team picks their own cloud |

Key Insight: Multi-cloud is not inherently better than single-cloud. It adds significant complexity to networking, identity, operations, and cost management. The question is not "should we go multi-cloud?" but "do our requirements justify the additional complexity?"

When Multi-Cloud Is Worth It

Multi-cloud is justified when at least one of these conditions is true:

  • Regulatory compliance — Data sovereignty requires specific providers in specific regions
  • Best-of-breed necessity — Critical workloads genuinely need unique capabilities (GCP BigQuery for analytics, AWS for broadest service catalog)
  • Acquisition integration — Merging companies on different providers with no compelling reason to migrate
  • Disaster recovery — True provider-level resilience (extremely rare requirement)
  • Negotiation leverage — Credible alternative keeps pricing competitive (works at very large scale)
The Honest Truth: For most organizations, multi-cloud adds substantial operational complexity (often two to three times that of a single cloud) while delivering only marginal resilience benefits. A single cloud provider with a multi-region deployment is usually more cost-effective and simpler to operate. Pursue multi-cloud only with an intentional strategy and adequate staffing.
Single Cloud vs Multi-Cloud vs Hybrid Architecture
flowchart TB
    subgraph Single["Single Cloud (Multi-Region)"]
        direction TB
        S1[Region A] --- S2[Region B]
        S1 --- S3[Region C]
    end

    subgraph Multi["Multi-Cloud"]
        direction TB
        M1["AWS<br/>Primary Compute"] --- M2["GCP<br/>ML & Analytics"]
        M1 --- M3["Azure<br/>Enterprise Apps"]
        M2 --- M3
    end

    subgraph Hybrid["Hybrid Cloud"]
        direction TB
        H1["Public Cloud<br/>AWS/Azure"] --- H2["On-Premises<br/>Data Center"]
        H2 --- H3["Edge<br/>IoT Devices"]
    end

Multi-Cloud Strategy Patterns

Not all multi-cloud is the same. The pattern you choose determines your architecture, tooling requirements, and operational burden. Understanding these patterns is the first step toward intentional multi-cloud design.

Pattern 1: Best-of-Breed

Use each cloud provider for what it does best. This is the most common intentional multi-cloud pattern and often the most pragmatic.

# Example: Best-of-breed workload mapping
workloads:
  compute_and_networking:
    provider: aws
    reason: "Broadest service catalog, mature VPC"
    services: [EKS, Lambda, API Gateway, CloudFront]
  
  machine_learning:
    provider: gcp
    reason: "TPUs, Vertex AI, BigQuery ML integration"
    services: [Vertex AI, BigQuery, Cloud Storage]
  
  enterprise_apps:
    provider: azure
    reason: "Entra ID (formerly Azure AD), Microsoft 365, Power Platform"
    services: [Entra ID, Logic Apps, Power Automate]

  data_analytics:
    provider: gcp
    reason: "BigQuery performance, Looker integration"
    services: [BigQuery, Dataflow, Looker]

Pattern 2: Active-Passive DR

Run production on one cloud, maintain a warm standby on another. This provides genuine provider-level resilience but at significant cost and operational complexity.

Pattern 3: Workload Distribution

Different business units or application tiers run on different clouds, with cross-cloud integration at defined boundaries.

Pattern 4: Cloud-Agnostic

Design workloads to run identically on any cloud provider. This demands heavy abstraction (Kubernetes, Terraform, cloud-agnostic databases) and typically sacrifices cloud-native optimization.

| Pattern | Complexity | Cloud Optimization | Portability | Best For |
|---|---|---|---|---|
| Best-of-Breed | Medium | High | Low | Leveraging unique capabilities |
| Active-Passive DR | High | High (primary) | Medium | Provider-level resilience |
| Workload Distribution | Medium | Medium-High | Low | Org-level autonomy |
| Cloud-Agnostic | Very High | Low | High | Exit strategy, vendor leverage |

Decision Framework for Multi-Cloud Strategy
flowchart TD
    A[Need Multi-Cloud?] -->|Regulatory| B[Workload Distribution]
    A -->|Best capabilities| C[Best-of-Breed]
    A -->|Provider failure DR| D[Active-Passive]
    A -->|Full portability| E[Cloud-Agnostic]
    A -->|No strong reason| F[Stay Single Cloud]
    
    B --> G[Define integration boundaries]
    C --> H[Map workloads to providers]
    D --> I[Accept 2x infrastructure cost]
    E --> J[Accept lowest common denominator]
    F --> K[Multi-region for resilience]
                            

Abstraction Layers

The key to managing multi-cloud complexity is abstraction — creating layers that hide provider-specific details behind consistent interfaces. The right abstraction layer depends on what you are abstracting.

The Abstraction Principle: Abstract operations (how you deploy, monitor, secure) but be cautious about abstracting capabilities (what each cloud uniquely offers). Over-abstraction leads to the "lowest common denominator" anti-pattern where you use no cloud well.
| Layer | Tool | What It Abstracts | Trade-off |
|---|---|---|---|
| Infrastructure | Terraform | Provisioning APIs across clouds | Provider-specific resources still differ |
| Compute | Kubernetes | Container orchestration | Managed K8s differs per cloud |
| Infrastructure Control Plane | Crossplane | Cloud resources as K8s objects | Additional control plane complexity |
| Networking | Service Mesh (Istio) | Service-to-service communication | Operational overhead of mesh |
| Secrets | HashiCorp Vault | Secrets management across clouds | Additional infrastructure to manage |
| Policy | OPA/Gatekeeper | Authorization and compliance | Policy language learning curve |
| Observability | Datadog/Grafana | Metrics, logs, traces across clouds | Vendor cost or self-hosted complexity |

Crossplane for Cloud-Agnostic Infrastructure

Crossplane extends Kubernetes with cloud resource management. You define cloud resources as Kubernetes custom resources, and Crossplane controllers reconcile them against cloud APIs.

# Crossplane Composite Resource Definition
# Abstracts a "database" concept across clouds
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XDatabase
    plural: xdatabases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    engine:
                      type: string
                      enum: [postgres, mysql]
                    size:
                      type: string
                      enum: [small, medium, large]
                    region:
                      type: string
                  required: [engine, size, region]
# Crossplane Composition - AWS Implementation
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases.aws.platform.example.com
  labels:
    provider: aws
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.crossplane.io/v1alpha1
        kind: DBInstance
        spec:
          forProvider:
            dbInstanceClass: db.t3.medium
            engine: postgres
            engineVersion: "15"
            masterUsername: admin
            allocatedStorage: 20
            publiclyAccessible: false
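          providerConfigRef:
            name: aws-provider
      patches:
        # Map the cloud-neutral XR parameters onto RDS fields
        - fromFieldPath: spec.parameters.region
          toFieldPath: spec.forProvider.region
        - fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.dbInstanceClass
          transforms:
            - type: map
              map:
                small: db.t3.small
                medium: db.t3.medium
                large: db.r6g.large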
# Crossplane Composition - Azure Implementation
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases.azure.platform.example.com
  labels:
    provider: azure
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XDatabase
  resources:
    - name: azure-db
      base:
        apiVersion: dbforpostgresql.azure.crossplane.io/v1alpha1
        kind: FlexibleServer
        spec:
          forProvider:
            version: "15"
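            skuName: B_Standard_B1ms      # Flexible Server SKU names carry a tier prefix
            storageMb: 32768
            administratorLogin: pgadmin   # "admin" is a reserved login on Azure PostgreSQL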
          providerConfigRef:
            name: azure-provider
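With the XRD and the two Compositions in place, a platform user requests one `XDatabase` and steers it to a cloud with a label selector (`spec.compositionSelector.matchLabels` on the composite resource). The Python sketch below models that contract outside Kubernetes. It checks parameters against the XRD's `required` and `enum` constraints, then picks the Composition whose labels match. It is illustrative only: in a real cluster, validation happens in the API server and selection happens in Crossplane. The request values (`us-east-1`, `small`) are hypothetical.

```python
# Sketch: validate an XDatabase request against the XRD schema above and pick
# a Composition by label, mirroring spec.compositionSelector.matchLabels.
# Real validation and selection happen in Kubernetes and Crossplane.

ALLOWED = {"engine": {"postgres", "mysql"}, "size": {"small", "medium", "large"}}
REQUIRED = ("engine", "size", "region")

# Composition names and labels copied from the YAML above
COMPOSITIONS = {
    "xdatabases.aws.platform.example.com": {"provider": "aws"},
    "xdatabases.azure.platform.example.com": {"provider": "azure"},
}


def validate_parameters(params: dict) -> None:
    """Mirror the XRD's `required` list and `enum` constraints."""
    for key in REQUIRED:
        if key not in params:
            raise ValueError(f"missing required parameter: {key}")
    for key, allowed in ALLOWED.items():
        if params[key] not in allowed:
            raise ValueError(f"{key}={params[key]!r} not in {sorted(allowed)}")


def select_composition(match_labels: dict) -> str:
    """Return the Composition whose labels contain every selector label.

    Crossplane does not promise which Composition wins when several match;
    this sketch raises instead so the ambiguity is visible.
    """
    matches = [
        name
        for name, labels in COMPOSITIONS.items()
        if all(labels.get(k) == v for k, v in match_labels.items())
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one Composition, got {matches}")
    return matches[0]


if __name__ == "__main__":
    # Hypothetical request: the same spec targets Azure by changing one label.
    spec = {
        "parameters": {"engine": "postgres", "size": "small", "region": "us-east-1"},
        "compositionSelector": {"matchLabels": {"provider": "aws"}},
    }
    validate_parameters(spec["parameters"])
    print(select_composition(spec["compositionSelector"]["matchLabels"]))
```

Switching the selector to `{"provider": "azure"}` returns the Azure Composition without changing the parameters. That is the point of the abstraction: the request stays cloud-neutral, and each Composition owns the provider-specific mapping.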
Multi-Cloud Abstraction Layer Stack
flowchart TB
    subgraph App["Application Layer"]
        A1[Microservices]
        A2[APIs]
        A3[ML Pipelines]
    end

    subgraph Abstraction["Abstraction Layer"]
        B1[Kubernetes - Compute]
        B2[Istio - Networking]
        B3[Vault - Secrets]
        B4[OPA - Policy]
        B5[Crossplane - Resources]
    end

    subgraph Infra["Infrastructure Layer"]
        C1[Terraform - Provisioning]
    end

    subgraph Clouds["Cloud Providers"]
        D1[AWS]
        D2[Azure]
        D3[GCP]
    end

    App --> Abstraction
    Abstraction --> Infra
    Infra --> Clouds
                            

Multi-Cloud Networking

Networking is the hardest problem in multi-cloud architecture. Each cloud has its own networking model, IP address scheme, firewall rules, and connectivity options. Connecting them securely and performantly requires careful planning.
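Overlapping address space is a common blocker when connecting clouds. VPN and interconnect routing cannot tell apart two networks that claim the same prefix. A quick pre-flight check with Python's standard `ipaddress` module catches this before any gateway is built. The sketch below uses the three ranges from the Terraform `locals` later in this part. The `on_prem` range is a hypothetical addition that shows a failure.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of network names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    plan = {
        "aws_vpc": "10.0.0.0/16",
        "azure_vnet": "10.1.0.0/16",
        "gcp_vpc": "10.2.0.0/16",
    }
    print(find_overlaps(plan))  # [] -> all three can be routed to each other
    plan["on_prem"] = "10.0.128.0/17"  # hypothetical range inside the AWS VPC
    print(find_overlaps(plan))  # [('aws_vpc', 'on_prem')]
```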

Cross-Cloud Connectivity Options

| Method | Bandwidth | Latency | Cost | Setup Complexity |
|---|---|---|---|---|
| Site-to-Site VPN | 1-2 Gbps | Variable (internet) | Low | Medium |
| Cloud Interconnect | 10-100 Gbps | Low (dedicated) | High | High |
| SD-WAN Overlay | Variable | Optimized | Medium | Medium |
| Service Mesh (Istio) | Application-level | Depends on transport | Low | High |
| API Gateway Federation | API-level | HTTP overhead | Low | Low |

Terraform: AWS-to-Azure VPN Tunnel

# AWS VPN Gateway
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "aws-to-azure-vpn-gw"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_customer_gateway" "azure" {
  bgp_asn    = 65515  # Azure default ASN
  ip_address = azurerm_public_ip.vpn_gw.ip_address
  type       = "ipsec.1"

  tags = {
    Name = "azure-customer-gw"
  }
}

resource "aws_vpn_connection" "to_azure" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.azure.id
  type                = "ipsec.1"
  static_routes_only  = false

  tunnel1_preshared_key = var.vpn_preshared_key
  tunnel1_ike_versions  = ["ikev2"]

  # BGP inside addresses for tunnel 1: AWS takes .1, the customer gateway
  # (Azure) takes .2. Azure only peers from APIPA addresses in
  # 169.254.21.0-169.254.22.255, so the /30 is chosen from that range.
  tunnel1_inside_cidr = "169.254.21.0/30"

  tags = {
    Name = "aws-to-azure-vpn"
  }
}
# Azure VPN Gateway
resource "azurerm_virtual_network_gateway" "main" {
  name                = "azure-to-aws-vpn-gw"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  type                = "Vpn"
  vpn_type            = "RouteBased"
  sku                 = "VpnGw2"
  active_active       = false
  enable_bgp          = true

  ip_configuration {
    name                          = "vpn-gw-config"
    public_ip_address_id          = azurerm_public_ip.vpn_gw.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  bgp_settings {
    asn = 65515

    # Custom APIPA address matching the customer-gateway side of tunnel 1
    peering_addresses {
      ip_configuration_name = "vpn-gw-config"
      apipa_addresses       = ["169.254.21.2"]
    }
  }
}

resource "azurerm_local_network_gateway" "aws" {
  name                = "aws-local-gw"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  gateway_address     = aws_vpn_connection.to_azure.tunnel1_address

  address_space = [var.aws_vpc_cidr]

  bgp_settings {
    asn = 64512 # AWS default ASN
    # AWS-side BGP peer IP inside tunnel 1
    bgp_peering_address = aws_vpn_connection.to_azure.tunnel1_vgw_inside_address
  }
}

resource "azurerm_virtual_network_gateway_connection" "to_aws" {
  name                       = "azure-to-aws-connection"
  location                   = azurerm_resource_group.main.location
  resource_group_name        = azurerm_resource_group.main.name
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
  local_network_gateway_id   = azurerm_local_network_gateway.aws.id
  shared_key                 = var.vpn_preshared_key
  enable_bgp                 = true
}
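BGP over the AWS tunnel needs an address on each side. AWS gives each tunnel a /30 "inside" CIDR from 169.254.0.0/16. The first usable address goes to the virtual private gateway and the second to the customer gateway (Azure, in this setup). Azure VPN gateways only accept custom BGP (APIPA) addresses between 169.254.21.0 and 169.254.22.255. The Python sketch below derives both peer addresses from a chosen /30 and rejects blocks Azure would not accept. Treat it as a planning aid, and confirm the addresses against the configuration AWS generates for your connection.

```python
import ipaddress

# Range Azure accepts for custom BGP (APIPA) addresses on a VPN gateway.
AZURE_APIPA_FIRST = ipaddress.ip_address("169.254.21.0")
AZURE_APIPA_LAST = ipaddress.ip_address("169.254.22.255")
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def tunnel_bgp_addresses(inside_cidr: str) -> dict[str, str]:
    """Split an AWS tunnel inside /30 into the two BGP peer addresses."""
    net = ipaddress.ip_network(inside_cidr)
    if net.prefixlen != 30 or not net.subnet_of(LINK_LOCAL):
        raise ValueError("AWS tunnel inside CIDRs are /30 blocks in 169.254.0.0/16")
    # First usable host: AWS virtual private gateway; second: customer gateway
    aws_side, azure_side = net.hosts()
    if not AZURE_APIPA_FIRST <= azure_side <= AZURE_APIPA_LAST:
        raise ValueError(f"{azure_side} is outside Azure's APIPA range")
    return {"aws_bgp_peer": str(aws_side), "azure_apipa": str(azure_side)}


if __name__ == "__main__":
    print(tunnel_bgp_addresses("169.254.21.0/30"))
    # {'aws_bgp_peer': '169.254.21.1', 'azure_apipa': '169.254.21.2'}
```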

DNS & Global Load Balancing

# Multi-cloud DNS with Route 53 as primary
# Health checks monitor endpoints on both clouds
resource "aws_route53_health_check" "aws_primary" {
  fqdn              = "aws-app.internal.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 10

  tags = {
    Name = "aws-primary-health"
  }
}

resource "aws_route53_health_check" "azure_secondary" {
  fqdn              = "azure-app.internal.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 10

  tags = {
    Name = "azure-secondary-health"
  }
}

resource "aws_route53_record" "app_primary" {
  zone_id = var.hosted_zone_id
  name    = "app.example.com"
  type    = "A"

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier  = "primary-aws"
  health_check_id = aws_route53_health_check.aws_primary.id

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}

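resource "aws_route53_record" "app_secondary" {
  zone_id = var.hosted_zone_id
  name    = "app.example.com"
  type    = "A"
  ttl     = 60

  failover_routing_policy {
    type = "SECONDARY"
  }

  set_identifier  = "secondary-azure"
  health_check_id = aws_route53_health_check.azure_secondary.id

  # Route 53 alias records can only target AWS resources, so the Azure side is
  # a plain A record pointing at a static Azure public IP (for example a Load
  # Balancer or Application Gateway frontend).
  # var.azure_app_public_ip is a hypothetical variable.
  records = [var.azure_app_public_ip]
}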
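Failover routing depends on how a health check changes state. The checks above use `failure_threshold = 3` and `request_interval = 10`. With those settings, an endpoint is marked unhealthy after three consecutive failed probes (about 30 seconds), and it needs three consecutive passes to recover. The Python sketch below models a single checker and the resulting primary/secondary answer. It is a simplification: real Route 53 aggregates results from checkers in several locations, and resolvers cache answers for the record TTL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthCheck:
    """One checker's view of an endpoint (simplified model).

    The status flips only after `failure_threshold` consecutive results that
    disagree with the current status, in either direction.
    """

    failure_threshold: int = 3
    healthy: bool = True
    streak: int = 0

    def observe(self, ok: bool) -> bool:
        if ok == self.healthy:
            self.streak = 0  # result agrees with current status
        else:
            self.streak += 1
            if self.streak >= self.failure_threshold:
                self.healthy = ok
                self.streak = 0
        return self.healthy


def failover_answer(primary: HealthCheck, secondary: HealthCheck) -> Optional[str]:
    """Return the set_identifier a failover policy would serve."""
    if primary.healthy:
        return "primary-aws"
    if secondary.healthy:
        return "secondary-azure"
    return None  # both down: Route 53 applies its own rules; not modelled here


if __name__ == "__main__":
    aws, azure = HealthCheck(), HealthCheck()
    for ok in (False, False, False):  # three consecutive failed probes
        aws.observe(ok)
    print(failover_answer(aws, azure))  # secondary-azure
```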
Multi-Cloud Network Topology
flowchart TB
    Users[Global Users] --> GLB["Global Load Balancer<br/>Cloudflare/Route53"]
    GLB --> AWS_LB["AWS ALB<br/>us-east-1"]
    GLB --> AZ_LB["Azure Front Door<br/>East US"]

    subgraph AWS["AWS VPC (10.0.0.0/16)"]
        AWS_LB --> AWS_APP[EKS Cluster]
        AWS_APP --> AWS_DB[(RDS PostgreSQL)]
    end

    subgraph Azure["Azure VNet (10.1.0.0/16)"]
        AZ_LB --> AZ_APP[AKS Cluster]
        AZ_APP --> AZ_DB[(Azure Database)]
    end

    AWS_APP <-->|"VPN Tunnel<br/>IPsec/BGP"| AZ_APP
    AWS_DB <-->|"Cross-Cloud<br/>Replication"| AZ_DB

Multi-Cloud Identity & Security

In a multi-cloud environment, identity is the new perimeter. Each cloud has its own IAM system (AWS IAM, Microsoft Entra ID, GCP IAM). Federating identities across them is essential for both human operators and machine-to-machine communication.

Federated Identity Architecture

# AWS: Trust Microsoft Entra ID (formerly Azure AD) as OIDC identity provider
resource "aws_iam_openid_connect_provider" "azure_ad" {
  url             = "https://login.microsoftonline.com/${var.azure_tenant_id}/v2.0"
  client_id_list  = [var.azure_app_client_id]
  thumbprint_list = [var.azure_ad_thumbprint]
}

# IAM Role that Azure workloads can assume via OIDC
resource "aws_iam_role" "azure_cross_cloud" {
  name = "azure-cross-cloud-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.azure_ad.arn
        }
        Action = "sts:AssumeRoleWithWebIdentity"
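        Condition = {
          StringEquals = {
            # IAM condition keys use the provider URL without the "https://" scheme
            "${replace(aws_iam_openid_connect_provider.azure_ad.url, "https://", "")}:aud" = var.azure_app_client_id
            "${replace(aws_iam_openid_connect_provider.azure_ad.url, "https://", "")}:sub" = var.azure_managed_identity_object_id
          }
        }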
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "cross_cloud_s3" {
  role       = aws_iam_role.azure_cross_cloud.name
  policy_arn = aws_iam_policy.cross_cloud_s3_access.arn
}
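Once STS has verified the token's signature against the Entra ID signing keys, the trust policy above reduces to a few claim comparisons. Pinning `aud` alone would admit any identity that can obtain a token for the same app registration. Pinning `sub` as well narrows access to one managed identity. The Python sketch below models only those comparisons plus expiry, and assumes signature validation has already happened. The claim values (`TENANT_ID`, `APP_CLIENT_ID`, `MI_OBJECT_ID`) are placeholders.

```python
import time
from typing import Optional


def trust_policy_allows(
    claims: dict,
    *,
    issuer: str,
    audience: str,
    subject: str,
    now: Optional[float] = None,
) -> bool:
    """Approximate the role's trust-policy check for an already-verified token."""
    now = time.time() if now is None else now
    return (
        claims.get("iss") == issuer          # token came from the registered provider
        and claims.get("exp", 0) > now       # token has not expired
        and claims.get("aud") == audience    # StringEquals on <provider>:aud
        and claims.get("sub") == subject     # StringEquals on <provider>:sub
    )


if __name__ == "__main__":
    iss = "https://login.microsoftonline.com/TENANT_ID/v2.0"
    token = {"iss": iss, "aud": "APP_CLIENT_ID", "sub": "MI_OBJECT_ID", "exp": time.time() + 3600}
    print(trust_policy_allows(token, issuer=iss, audience="APP_CLIENT_ID", subject="MI_OBJECT_ID"))
```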

HashiCorp Vault for Centralized Secrets

# Vault configuration for multi-cloud secrets
resource "vault_mount" "aws_secrets" {
  path        = "aws"
  type        = "aws"
  description = "AWS dynamic credentials"
}

resource "vault_aws_secret_backend_role" "deploy" {
  backend         = vault_mount.aws_secrets.path
  name            = "deploy-role"
  credential_type = "iam_user"

  policy_document = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Broad for brevity; scope actions and resources down for real deployments
        Effect   = "Allow"
        Action   = ["s3:*", "ec2:*", "eks:*"]
        Resource = "*"
      }
    ]
  })
}

# vault_azure_secret_backend mounts the Azure secrets engine itself at `path`,
# so no separate vault_mount is needed (one at the same path would conflict).
resource "vault_azure_secret_backend" "main" {
  path            = "azure"
  subscription_id = var.azure_subscription_id
  tenant_id       = var.azure_tenant_id
  client_id       = var.vault_azure_client_id
  client_secret   = var.vault_azure_client_secret
}

Unified Policy with OPA

# OPA Rego policy: enforce tagging across all clouds
# File: policies/multi-cloud-tagging.rego
# Input: Terraform plan JSON (terraform show -json plan.out > plan.json)
package multicloud.tagging

import rego.v1

# Required tags for all cloud resources
required_tags := {"environment", "team", "cost-center", "managed-by"}

# Managed resources in the root module (child_modules are not scanned here)
resources contains resource if {
    some resource in input.planned_values.root_module.resources
    resource.mode == "managed"
}

# Check AWS resources; types without a "tags" attribute are skipped
deny contains msg if {
    some resource in resources
    endswith(resource.provider_name, "hashicorp/aws")
    tags := resource.values.tags
    missing := required_tags - {key | some key, _ in tags}
    count(missing) > 0
    msg := sprintf("AWS resource %s missing tags: %v", [resource.address, missing])
}

# Check Azure resources
deny contains msg if {
    some resource in resources
    endswith(resource.provider_name, "hashicorp/azurerm")
    tags := resource.values.tags
    missing := required_tags - {key | some key, _ in tags}
    count(missing) > 0
    msg := sprintf("Azure resource %s missing tags: %v", [resource.address, missing])
}

# Check GCP resources (GCP calls them labels)
deny contains msg if {
    some resource in resources
    endswith(resource.provider_name, "hashicorp/google")
    labels := resource.values.labels
    missing := required_tags - {key | some key, _ in labels}
    count(missing) > 0
    msg := sprintf("GCP resource %s missing labels: %v", [resource.address, missing])
}
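The same tagging rule can be prototyped outside OPA, which helps when debugging why a plan fails. The Python sketch below reads `terraform show -json` output and identifies each resource's cloud from its `provider_name`. It skips resource types that have no `tags`/`labels` attribute, and it also walks `child_modules`, where resources created through modules appear. It is a testing companion, not a replacement for the policy.

```python
REQUIRED_TAGS = {"environment", "team", "cost-center", "managed-by"}

# provider source suffix -> (display name, attribute holding tags/labels)
TAG_ATTRIBUTE = {
    "hashicorp/aws": ("AWS", "tags"),
    "hashicorp/azurerm": ("Azure", "tags"),
    "hashicorp/google": ("GCP", "labels"),
}


def iter_managed(module: dict):
    """Yield managed resources from a plan module and all nested child modules."""
    for resource in module.get("resources", []):
        if resource.get("mode") == "managed":
            yield resource
    for child in module.get("child_modules", []):
        yield from iter_managed(child)


def tag_violations(plan: dict) -> list[str]:
    """Return one message per taggable resource that lacks required tags."""
    root = plan.get("planned_values", {}).get("root_module", {})
    messages = []
    for resource in iter_managed(root):
        provider = resource.get("provider_name", "")
        values = resource.get("values", {})
        for suffix, (cloud, attr) in TAG_ATTRIBUTE.items():
            if not provider.endswith(suffix) or attr not in values:
                continue  # other provider, or a type without tags/labels
            missing = sorted(REQUIRED_TAGS - set(values[attr] or {}))
            if missing:
                messages.append(
                    f"{cloud} resource {resource['address']} missing {attr}: {missing}"
                )
    return messages
```

Call it as `tag_violations(json.load(open("plan.json")))`. AWS provider `default_tags` are merged into `tags_all` rather than `tags`. A check on `tags` alone will therefore report resources that rely solely on `default_tags`.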

| Approach | Scope | Identity Type | Complexity | Best For |
|---|---|---|---|---|
| OIDC Federation | Cross-cloud workloads | Machine identity | Medium | Service-to-service auth |
| SAML via IdP | Human access | User identity | Low | Console/portal SSO |
| Vault Dynamic Creds | All credentials | Both | High | Short-lived, audited access |
| SPIFFE/SPIRE | Workload identity | Machine identity | High | Zero-trust service mesh |

Multi-Cloud Data Strategies

Data gravity is the concept that applications tend to move toward where their data resides. In multi-cloud, data gravity is one of the strongest forces determining your architecture — moving compute is easy, moving petabytes of data is not.

Data Gravity Rule: Cross-cloud data transfer typically costs $0.01-$0.09/GB, so moving 10 TB per day costs roughly $36,500-$328,500 per year. Run compute where the data lives, not the other way around.
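The cost range in the rule is simple arithmetic: daily volume times the per-GB rate times 365 days. The Python sketch below reproduces it, treating 10 TB as 10,000 GB and using a flat rate. Real egress pricing is tiered, and it depends on the source region, the destination, and whether traffic crosses the internet or a dedicated interconnect.

```python
def annual_transfer_cost(gb_per_day: float, usd_per_gb: float, days: int = 365) -> float:
    """Yearly cost of a steady cross-cloud transfer at a flat per-GB rate."""
    return gb_per_day * usd_per_gb * days


if __name__ == "__main__":
    daily_gb = 10 * 1_000  # 10 TB/day, counted as 10,000 GB
    for rate in (0.01, 0.09):
        print(f"${rate:.2f}/GB -> ${annual_transfer_cost(daily_gb, rate):,.0f}/year")
    # $0.01/GB -> $36,500/year
    # $0.09/GB -> $328,500/year
```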

Cross-Cloud Database Replication

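# CockroachDB multi-cloud topology (schematic)
# CockroachDB supports clusters spanning clouds via node locality settings.
# The Kubernetes operator manages nodes inside one Kubernetes cluster, so treat
# the per-cloud topology block below as illustrative, not operator schema; a real
# deployment runs nodes in each cloud's cluster and joins them over the network.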
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: multi-cloud-crdb
spec:
  dataStore:
    pvc:
      spec:
        storageClassName: premium-rwo
        resources:
          requests:
            storage: 100Gi
  nodes: 9
  topology:
    - cloud: aws
      region: us-east-1
      zones: [us-east-1a, us-east-1b, us-east-1c]
      nodes: 3
    - cloud: azure
      region: eastus
      zones: [1, 2, 3]
      nodes: 3
    - cloud: gcp
      region: us-east4
      zones: [us-east4-a, us-east4-b, us-east4-c]
      nodes: 3
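# Kafka/Confluent Cloud for cross-cloud event streaming
resource "confluent_kafka_cluster" "multi_cloud" {
  display_name = "multi-cloud-events"
  availability = "MULTI_ZONE"
  cloud        = "AWS"
  region       = "us-east-1"

  dedicated {
    cku = 2
  }

  environment {
    id = var.confluent_environment_id
  }
}

# Cluster linking for cross-cloud replication (destination-initiated link:
# the source is reached via its bootstrap endpoint, the destination via REST).
# The var.*_kafka_api_* variables are placeholders for cluster API keys.
resource "confluent_cluster_link" "aws_to_azure" {
  link_name = "aws-to-azure-mirror"

  source_kafka_cluster {
    id                 = confluent_kafka_cluster.multi_cloud.id
    bootstrap_endpoint = confluent_kafka_cluster.multi_cloud.bootstrap_endpoint
    credentials {
      key    = var.source_kafka_api_key
      secret = var.source_kafka_api_secret
    }
  }

  destination_kafka_cluster {
    id            = confluent_kafka_cluster.azure_cluster.id
    rest_endpoint = confluent_kafka_cluster.azure_cluster.rest_endpoint
    credentials {
      key    = var.destination_kafka_api_key
      secret = var.destination_kafka_api_secret
    }
  }
}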
| Approach | Consistency | Latency | Cost | Use Case |
|---|---|---|---|---|
| CockroachDB | Strong (serializable) | Cross-region penalty | High | Multi-cloud OLTP |
| Kafka Cluster Linking | Eventual | Seconds | Medium | Event streaming |
| Object Storage Sync | Eventual | Minutes | Transfer + storage | Data lake mirroring |
| Database CDC | Eventual | Seconds | Low | Read replicas |
| API-level Sync | Application-defined | Variable | Low | Selective data sharing |
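Strong consistency across clouds depends on quorum. CockroachDB replicates each range three times by default and commits writes through Raft, which needs a majority of a range's replicas. Assume each node's locality flags identify its cloud, so replicas spread one per cloud. Under that assumption, losing an entire provider still leaves two of three replicas. The Python sketch below checks that property for any replica placement. It models quorum arithmetic only, not CockroachDB's placement logic.

```python
def survives_cloud_loss(replicas_per_cloud: dict[str, int]) -> dict[str, bool]:
    """For each cloud, can a range still reach a write quorum if that cloud is lost?"""
    total = sum(replicas_per_cloud.values())
    quorum = total // 2 + 1  # strict majority, as Raft requires
    return {cloud: total - n >= quorum for cloud, n in replicas_per_cloud.items()}


if __name__ == "__main__":
    print(survives_cloud_loss({"aws": 1, "azure": 1, "gcp": 1}))  # all True
    print(survives_cloud_loss({"aws": 2, "azure": 1}))            # losing aws breaks quorum
```

The second placement shows why two clouds are not enough for provider-level survival with three replicas: whichever cloud holds the majority becomes a single point of failure.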

Multi-Cloud with Terraform

Terraform is the most widely adopted tool for multi-cloud infrastructure because it supports thousands of providers through a single workflow. However, using multiple providers in one project requires careful state and dependency management.

Multi-Provider Project Structure

# Recommended directory structure for multi-cloud Terraform
multi-cloud-infra/
├── modules/
│   ├── networking/
│   │   ├── aws/          # AWS-specific networking
│   │   ├── azure/        # Azure-specific networking
│   │   └── interface.tf  # Shared input/output contract
│   ├── compute/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── interface.tf
│   └── dns/
│       └── main.tf       # Cloud-agnostic DNS module
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── providers.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
├── shared/
│   ├── vpn-connections/  # Cross-cloud connectivity
│   └── dns-zones/        # Global DNS management
└── terragrunt.hcl        # DRY configuration
# providers.tf - Multi-cloud provider configuration
terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "multi-cloud/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = "multi-cloud-platform"
    }
  }
}

provider "azurerm" {
  features {}
  subscription_id = var.azure_subscription_id
}

provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}
# main.tf - Multi-cloud deployment orchestration
locals {
  aws_vpc_cidr    = "10.0.0.0/16"
  azure_vnet_cidr = "10.1.0.0/16"
  gcp_vpc_cidr    = "10.2.0.0/16"
}

# AWS Infrastructure
module "aws_networking" {
  source      = "../../modules/networking/aws"
  vpc_cidr    = local.aws_vpc_cidr
  environment = var.environment
}

module "aws_compute" {
  source     = "../../modules/compute/aws"
  vpc_id     = module.aws_networking.vpc_id
  subnet_ids = module.aws_networking.private_subnet_ids
}

# Azure Infrastructure
module "azure_networking" {
  source      = "../../modules/networking/azure"
  vnet_cidr   = local.azure_vnet_cidr
  environment = var.environment
}

module "azure_compute" {
  source    = "../../modules/compute/azure"
  vnet_id   = module.azure_networking.vnet_id
  subnet_id = module.azure_networking.private_subnet_id
}

# Cross-Cloud Connectivity
module "vpn_aws_to_azure" {
  source = "../../shared/vpn-connections"

  aws_vpc_id              = module.aws_networking.vpc_id
  aws_vpc_cidr            = local.aws_vpc_cidr
  azure_vnet_id           = module.azure_networking.vnet_id
  azure_vnet_cidr         = local.azure_vnet_cidr
  azure_gateway_subnet_id = module.azure_networking.gateway_subnet_id
  vpn_preshared_key       = var.vpn_preshared_key
}

# Outputs for cross-referencing
output "aws_endpoint" {
  value = module.aws_compute.load_balancer_dns
}

output "azure_endpoint" {
  value = module.azure_compute.load_balancer_ip
}
Terraform Multi-Cloud Workflow
flowchart LR
    subgraph Plan["terraform plan"]
        P1[AWS Provider] --> P2[Plan AWS Resources]
        P3[Azure Provider] --> P4[Plan Azure Resources]
        P5[GCP Provider] --> P6[Plan GCP Resources]
    end

    subgraph Apply["terraform apply"]
        A1[Create AWS VPC] --> A2[Create Azure VNet]
        A2 --> A3[Create VPN Connection]
        A3 --> A4[Create GCP VPC]
        A4 --> A5[Deploy K8s Clusters]
    end

    subgraph State["State Management"]
        S1["Single State File<br/>Multi-Provider"]
        S2["Split State<br/>Per Provider"]
    end

    Plan --> Apply
    Apply --> State

Multi-Cloud Kubernetes

Kubernetes is the most common compute abstraction layer for multi-cloud. Running workloads across EKS, AKS, and GKE provides a consistent deployment target regardless of the underlying cloud — but multi-cluster management introduces its own complexity.

Multi-Cluster Management Tools

| Tool | Vendor | Model | Best For | Complexity |
|---|---|---|---|---|
| Rancher | SUSE | Central management plane | Multi-cluster lifecycle | Medium |
| Azure Arc | Microsoft | Azure control plane extension | Azure-centric hybrid | Medium |
| Anthos (now GKE Enterprise) | Google | GCP control plane extension | GCP-centric hybrid | High |
| KubeFed (archived 2023) | Kubernetes SIG Multicluster | Federation v2 CRDs | Resource propagation | High |
| Loft/vCluster | Loft | Virtual clusters | Multi-tenancy | Low |
| Skupper | Red Hat | Application networking | Cross-cluster services | Low |

GitOps for Multi-Cluster Deployments

# ArgoCD ApplicationSet for multi-cluster deployment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: multi-cloud-app
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      name: "app-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/multi-cloud-app.git
        targetRevision: main
        path: "deploy/overlays/{{metadata.labels.cloud}}"
      destination:
        server: "{{server}}"
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
# Kustomize overlay for AWS-specific configuration
# deploy/overlays/aws/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

patches:
  - target:
      kind: Deployment
      name: api-server
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: CLOUD_PROVIDER
          value: "aws"
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: OBJECT_STORAGE_ENDPOINT
          value: "https://s3.us-east-1.amazonaws.com"
      - op: add
        path: /spec/template/spec/serviceAccountName
        value: app-irsa-sa

  - target:
      kind: Service
      name: api-server
    patch: |-
      - op: add
        path: /metadata/annotations
        value:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
# Kustomize overlay for Azure-specific configuration
# deploy/overlays/azure/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

patches:
  - target:
      kind: Deployment
      name: api-server
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: CLOUD_PROVIDER
          value: "azure"
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: OBJECT_STORAGE_ENDPOINT
          value: "https://storageaccount.blob.core.windows.net"
      - op: add
        path: /spec/template/metadata/labels/azure.workload.identity~1use
        value: "true"

  - target:
      kind: Service
      name: api-server
    patch: |-
      - op: add
        path: /metadata/annotations
        value:
          service.beta.kubernetes.io/azure-load-balancer-internal: "false"
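Both overlays use JSON Patch (RFC 6902) operations addressed by JSON Pointers (RFC 6901). Two details matter here. First, a `/` inside a key, as in the `azure.workload.identity/use` label, must be escaped as `~1`. Second, `add` to `.../env/-` appends to an existing array, so the base Deployment must already define `env` on the first container. The Python sketch below implements just pointer decoding and the `add` operation to make those rules concrete. Kustomize uses its own patch implementation.

```python
def decode_pointer(pointer: str) -> list[str]:
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    # '~1' -> '/' first, then '~0' -> '~', the order RFC 6901 specifies
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def patch_add(doc, pointer: str, value):
    """Apply an RFC 6902 'add' op in place. The parent container must exist."""
    *parents, last = decode_pointer(pointer)
    target = doc
    for token in parents:
        target = target[int(token)] if isinstance(target, list) else target[token]
    if isinstance(target, list):
        if last == "-":
            target.append(value)  # '-' means "append to the array"
        else:
            target.insert(int(last), value)
    else:
        target[last] = value  # adds or replaces an object member
    return doc


if __name__ == "__main__":
    deployment = {"spec": {"template": {"metadata": {"labels": {"app": "api"}},
                                        "spec": {"containers": [{"env": []}]}}}}
    patch_add(deployment, "/spec/template/metadata/labels/azure.workload.identity~1use", "true")
    patch_add(deployment, "/spec/template/spec/containers/0/env/-",
              {"name": "CLOUD_PROVIDER", "value": "azure"})
    print(deployment["spec"]["template"]["metadata"]["labels"])
```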
# ArgoCD cluster registration
# Register multiple clusters for multi-cloud GitOps
apiVersion: v1
kind: Secret
metadata:
  name: aws-production
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: production
    cloud: aws
    region: us-east-1
type: Opaque
stringData:
  name: aws-production
  server: https://eks-cluster.us-east-1.eks.amazonaws.com
  config: |
    {
      "execProviderConfig": {
        "command": "aws",
        "args": ["eks", "get-token", "--cluster-name", "production"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      }
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-production
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: production
    cloud: azure
    region: eastus
type: Opaque
stringData:
  name: azure-production
  server: https://aks-cluster.eastus.azmk8s.io
  config: |
    {
      "execProviderConfig": {
        "command": "kubelogin",
        "args": ["get-token", "--login", "azurecli", "--server-id", "APP_ID"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      }
    }
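The ApplicationSet and the cluster secrets meet in the cluster generator. Every secret whose labels match the selector becomes one Application, with `{{name}}`, `{{server}}`, and `{{metadata.labels.<key>}}` filled in from that secret. The Python sketch below emulates that expansion with plain string substitution, so you can see which Applications and overlay paths the configuration produces. It is a simplified model, not Argo CD's templating engine.

```python
import re

# Cluster secrets from above, reduced to the fields the generator reads.
CLUSTERS = [
    {
        "name": "aws-production",
        "server": "https://eks-cluster.us-east-1.eks.amazonaws.com",
        "labels": {"environment": "production", "cloud": "aws", "region": "us-east-1"},
    },
    {
        "name": "azure-production",
        "server": "https://aks-cluster.eastus.azmk8s.io",
        "labels": {"environment": "production", "cloud": "azure", "region": "eastus"},
    },
]

# The templated fields of the ApplicationSet above.
TEMPLATE = {
    "name": "app-{{name}}",
    "path": "deploy/overlays/{{metadata.labels.cloud}}",
    "server": "{{server}}",
}

PLACEHOLDER = re.compile(r"\{\{\s*([^}\s]+)\s*\}\}")


def render(text: str, params: dict) -> str:
    """Replace {{param}} placeholders; unknown parameters raise KeyError."""
    return PLACEHOLDER.sub(lambda m: params[m.group(1)], text)


def generate(match_labels: dict) -> list[dict]:
    """Emulate the cluster generator: one rendered Application per matching cluster."""
    apps = []
    for cluster in CLUSTERS:
        if all(cluster["labels"].get(k) == v for k, v in match_labels.items()):
            params = {"name": cluster["name"], "server": cluster["server"]}
            params.update({f"metadata.labels.{k}": v for k, v in cluster["labels"].items()})
            apps.append({field: render(t, params) for field, t in TEMPLATE.items()})
    return apps


if __name__ == "__main__":
    for app in generate({"environment": "production"}):
        print(app["name"], app["path"], app["server"])
```

A cluster that lacks the `cloud` label would make `render` raise here. That is a useful reminder that every registered cluster needs the labels the template references.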

Cost & Governance

Multi-cloud cost management is substantially harder than single-cloud. Each provider has different pricing models, different discount mechanisms (Reserved Instances and Savings Plans, Committed Use Discounts, Azure Reservations), and different billing APIs. Without unified governance, costs spiral.

Cost Reality Check: Organizations running multi-cloud can spend 20-35% more than on equivalent single-cloud deployments. The overhead comes from duplicate infrastructure, cross-cloud data transfer, multiple support contracts, and the inability to fully leverage volume discounts from any single provider.

FinOps Practices for Multi-Cloud

# Unified tagging policy across clouds
# Enforced via OPA/Sentinel at deploy time
tagging_standard:
  required:
    - key: "cost-center"
      format: "CC-[0-9]{4}"
      description: "Financial cost center code"
    - key: "environment"
      values: [dev, staging, production]
    - key: "team"
      description: "Owning team name"
    - key: "service"
      description: "Service/application name"
    - key: "managed-by"
      values: [terraform, crossplane, manual]
  
  provider_mapping:
    aws:
      tag_key_format: "PascalCase"
      example: "CostCenter: CC-1234"
    azure:
      tag_key_format: "camelCase"  
      example: "costCenter: CC-1234"
    gcp:
      label_key_format: "lowercase-hyphen"
      example: "cost-center: cc-1234"
# Terraform: enforce tagging via variable validation
variable "tags" {
  type = map(string)

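  validation {
    condition = alltrue([
      contains(keys(var.tags), "cost-center"),
      contains(keys(var.tags), "environment"),
      contains(keys(var.tags), "team"),
      contains(keys(var.tags), "service"),
      contains(keys(var.tags), "managed-by"),
      # can() turns a missing key or a bad format into false instead of an error
      can(regex("^CC-[0-9]{4}$", var.tags["cost-center"])),
      # Mirror the allowed environment values from the tagging standard
      contains(["dev", "staging", "production"], lookup(var.tags, "environment", ""))
    ])
    error_message = "Tags must include cost-center (CC-XXXX format), environment (dev, staging, or production), team, service, and managed-by."
  }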
}
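The YAML standard and the Terraform validation above check tags at deploy time, but the per-provider key conventions in provider_mapping still have to be applied somewhere. A minimal Python sketch of both steps, assuming tags are authored once in the canonical hyphenated form; the function names `validate` and `to_provider` are illustrative and not part of any tool:

```python
import re

# Canonical keys and allowed values, copied from the tagging standard above.
REQUIRED_KEYS = ("cost-center", "environment", "team", "service", "managed-by")
ALLOWED_ENVIRONMENTS = {"dev", "staging", "production"}
ALLOWED_MANAGED_BY = {"terraform", "crossplane", "manual"}
COST_CENTER_RE = re.compile(r"^CC-[0-9]{4}$")


def validate(tags: dict[str, str]) -> list[str]:
    """Return a list of violations; an empty list means the tags are compliant."""
    errors = [f"missing required tag: {k}" for k in REQUIRED_KEYS if k not in tags]
    if "cost-center" in tags and not COST_CENTER_RE.match(tags["cost-center"]):
        errors.append("cost-center must match CC-[0-9]{4}")
    if "environment" in tags and tags["environment"] not in ALLOWED_ENVIRONMENTS:
        errors.append("environment must be one of dev, staging, production")
    if "managed-by" in tags and tags["managed-by"] not in ALLOWED_MANAGED_BY:
        errors.append("managed-by must be one of terraform, crossplane, manual")
    return errors


def to_provider(tags: dict[str, str], provider: str) -> dict[str, str]:
    """Rewrite canonical hyphenated keys into each provider's convention."""
    def words(key: str) -> list[str]:
        return key.split("-")

    if provider == "aws":  # PascalCase keys: cost-center -> CostCenter
        return {"".join(w.capitalize() for w in words(k)): v for k, v in tags.items()}
    if provider == "azure":  # camelCase keys: cost-center -> costCenter
        return {
            words(k)[0] + "".join(w.capitalize() for w in words(k)[1:]): v
            for k, v in tags.items()
        }
    if provider == "gcp":  # GCP labels: lowercase keys and values
        return {k.lower(): v.lower() for k, v in tags.items()}
    raise ValueError(f"unknown provider: {provider}")
```

Keeping one canonical form and translating only at the provider boundary means cost exports from all three clouds can be joined on the same key set.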

Tool | Type | Clouds Supported | Best For
Kubecost | K8s-native cost | All (via K8s) | Container workload costs
Infracost | Pre-deploy estimation | AWS, Azure, GCP | Cost in PR reviews
CloudHealth | Cloud cost management | AWS, Azure, GCP | Enterprise FinOps
Spot.io | Cost optimization | AWS, Azure, GCP | Spot/preemptible automation
Vantage | Cost visibility | AWS, Azure, GCP, Datadog | Developer-friendly reporting
OpenCost | CNCF standard | All (via K8s) | Open-source K8s cost allocation

Common Anti-Patterns

Multi-cloud initiatives tend to fail in predictable ways. Recognizing these anti-patterns early avoids wasted infrastructure spending and years of accumulated technical debt.

Anti-Pattern #1: Accidental Multi-Cloud. No strategy, just drift. One team chose AWS, another chose Azure, and now you have "multi-cloud" with no shared tooling, no cross-cloud networking, and duplicate operational toil. This is poly-cloud, not multi-cloud.
Anti-Pattern #2: Lowest Common Denominator. Abstracting away all cloud-native capabilities to achieve portability. You end up using no cloud well — no managed databases, no serverless, no cloud-native AI services — building everything on raw VMs and Kubernetes. The result costs more and delivers less than using any single cloud natively.
Anti-Pattern #3: Duplicate Everything. Running the same workload on two clouds "for resilience" without a genuine failover mechanism. You pay 2x infrastructure cost, 2x operational burden, and when failover is actually needed, the secondary has never been tested under real load.
Anti-Pattern #4: No Shared Tooling. Each team picks their own CI/CD, monitoring, secrets management, and IaC tool per cloud. Operations teams cannot support the combinatorial explosion. Hire 3x the staff or consolidate tooling.
Anti-Pattern #5: Ignoring Data Gravity. Placing compute on Cloud B while the data lives on Cloud A, then wondering why cross-cloud data transfer costs $50,000/month and latency breaks SLAs. Always place compute where the data lives.
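The data-gravity rule reduces to simple arithmetic: running a job on cloud X means every byte stored elsewhere must egress to X. A hypothetical Python helper that picks the cheapest placement, using illustrative data volumes and flat per-GB egress rates rather than real price sheets:

```python
def transfer_cost(
    target: str, data_gb_by_cloud: dict[str, float], egress_per_gb: dict[str, float]
) -> float:
    """Cost of moving all data that is not already on `target` to `target`."""
    return sum(
        gb * egress_per_gb[source]
        for source, gb in data_gb_by_cloud.items()
        if source != target
    )


def cheapest_placement(
    data_gb_by_cloud: dict[str, float], egress_per_gb: dict[str, float]
) -> tuple[str, float]:
    """Return (cloud, monthly transfer cost) for the lowest-cost placement."""
    best = min(
        data_gb_by_cloud,
        key=lambda cloud: transfer_cost(cloud, data_gb_by_cloud, egress_per_gb),
    )
    return best, transfer_cost(best, data_gb_by_cloud, egress_per_gb)


# Illustrative inputs: 500 TB of data on AWS, 20 TB on GCP, flat egress rates.
data = {"aws": 500_000, "gcp": 20_000}
rates = {"aws": 0.09, "gcp": 0.12}
print(cheapest_placement(data, rates))
```

Even with a higher egress rate on the smaller side, moving 20 TB is far cheaper than moving 500 TB, which is the whole argument for computing next to the larger dataset.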

Hands-On Exercises

Exercise 1 Multi-Cloud Strategy Design

Design a Multi-Cloud Strategy

Scenario: A fintech company processes payments on AWS (for PCI compliance tooling), runs their data analytics on GCP BigQuery, and has acquired a company with all infrastructure on Azure. They need a unified strategy.

Tasks:

  1. Identify which multi-cloud pattern best fits this scenario
  2. Design the networking topology (VPN or interconnect between clouds)
  3. Define the identity federation strategy
  4. Create a unified tagging/labeling standard
  5. Propose a centralized observability approach
  6. Estimate the monthly cross-cloud data transfer cost for 5 TB/day
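# Calculate cross-cloud transfer costs (us-east-1 list prices; verify current rates)
# AWS egress to internet is tiered: $0.09/GB for the first 10 TB/month,
# $0.085/GB for the next 40 TB, $0.07/GB for the next 100 TB
# Azure ingress: free
# Estimate for 5 TB/day = 150 TB/month (~150,000 GB)
# Single quotes stop the shell from expanding $0, $1, $3 inside the strings
echo 'AWS egress at a flat $0.09/GB (upper bound): 150000 GB * $0.09 = $13,500/month'
echo 'AWS egress, tiered: 10000*$0.09 + 40000*$0.085 + 100000*$0.07 = $11,300/month'
echo 'With Direct Connect (~$0.02/GB, excluding port-hour fees): 150000 GB * $0.02 = $3,000/month'
echo 'Savings with dedicated interconnect: roughly $8,300-$10,500/month before port fees'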
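AWS bills internet egress in monthly volume tiers, so a single per-GB rate applied to all 150 TB is only an upper bound. A small Python calculator makes the tiering explicit. The tier boundaries and rates are assumptions based on us-east-1 list prices (using 1 TB = 1,000 GB and ignoring the monthly free allowance), and the $0.02/GB Direct Connect rate excludes port-hour and cross-connect fees; check current pricing before relying on the numbers:

```python
# Assumed AWS us-east-1 internet data-transfer-out tiers: (GB in tier, USD per GB).
INTERNET_EGRESS_TIERS = [
    (10_000, 0.09),        # first 10 TB/month
    (40_000, 0.085),       # next 40 TB
    (100_000, 0.07),       # next 100 TB
    (float("inf"), 0.05),  # beyond 150 TB
]


def tiered_cost(gb_per_month: float, tiers=INTERNET_EGRESS_TIERS) -> float:
    """Charge each slice of the monthly volume at its own tier rate."""
    remaining, total = gb_per_month, 0.0
    for tier_size, rate in tiers:
        billed = min(remaining, tier_size)
        total += billed * rate
        remaining -= billed
        if remaining <= 0:
            break
    return total


def flat_cost(gb_per_month: float, rate: float) -> float:
    """Single-rate estimate, e.g. for a dedicated interconnect."""
    return gb_per_month * rate


monthly_gb = 5_000 * 30  # 5 TB/day over a 30-day month, in GB
internet = tiered_cost(monthly_gb)
dedicated = flat_cost(monthly_gb, 0.02)  # assumed Direct Connect DTO rate
print(f"Internet egress (tiered): ${internet:,.0f}/month")
print(f"Direct Connect egress:    ${dedicated:,.0f}/month")
print(f"Estimated savings:        ${internet - dedicated:,.0f}/month")
```

Swapping in another provider's tier table is enough to compare egress paths out of Azure or GCP the same way.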
Strategy Architecture FinOps
Exercise 2 Terraform Multi-Provider Project

Build a Multi-Cloud Terraform Project

Objective: Create a Terraform project that provisions a VPC on AWS and a VNet on Azure, then connects them with a VPN tunnel.

# Exercise: Complete this multi-cloud configuration
# File: main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "azurerm" {
  features {}
}

# Task 1: Create AWS VPC with CIDR 10.0.0.0/16
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "multi-cloud-vpc" }
}

# Task 2: Create Azure Resource Group and VNet with CIDR 10.1.0.0/16
resource "azurerm_resource_group" "main" {
  name     = "multi-cloud-rg"
  location = "East US"
}

resource "azurerm_virtual_network" "main" {
  name                = "multi-cloud-vnet"
  address_space       = ["10.1.0.0/16"]
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}

# Task 3: Create subnets on both sides
# Task 4: Set up VPN gateways on both sides (on Azure, the gateway must be deployed into a subnet named exactly GatewaySubnet)
# Task 5: Establish the VPN connection
# Task 6: Verify connectivity with a test VM on each side

Validation:

# Validate Terraform configuration
terraform init
terraform validate
terraform plan -out=multicloud.plan

# Check resource count
terraform show multicloud.plan | grep "Plan:"
# Expected: "Plan: X to add, 0 to change, 0 to destroy."
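Routes can only be exchanged over the VPN if the two address spaces do not overlap. A quick pre-flight check using Python's standard `ipaddress` module, with the CIDRs taken from Tasks 1 and 2 (the dictionary keys are illustrative labels):

```python
import ipaddress
from itertools import combinations

# Address spaces from the exercise: AWS VPC (Task 1) and Azure VNet (Task 2).
NETWORKS = {
    "aws-vpc": "10.0.0.0/16",
    "azure-vnet": "10.1.0.0/16",
}


def find_overlaps(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named networks whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]


overlaps = find_overlaps(NETWORKS)
print(overlaps or "No overlapping CIDRs; routes can be exchanged over the VPN.")
```

Adding a third entry for a GCP VPC is all it takes to extend the check when more clouds join the topology.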
Terraform AWS Azure VPN
Exercise 3 Cross-Cloud VPN Connectivity

Set Up Cross-Cloud VPN Connectivity

Objective: Configure IPsec VPN between AWS and GCP with BGP dynamic routing.

# GCP side: Create HA VPN gateway
resource "google_compute_ha_vpn_gateway" "to_aws" {
  name    = "ha-vpn-to-aws"
  network = google_compute_network.main.id
  region  = "us-central1"
}

resource "google_compute_router" "vpn_router" {
  name    = "vpn-router"
  network = google_compute_network.main.id
  region  = "us-central1"

  bgp {
    asn               = 64513
    advertise_mode    = "CUSTOM"
    advertised_groups = ["ALL_SUBNETS"]
  }
}

# Task: Create the following resources
# 1. google_compute_external_vpn_gateway (represents AWS VPN endpoint)
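# 2. google_compute_vpn_tunnel (2 tunnels for HA, one per HA VPN interface; a fully redundant AWS setup uses 4 tunnels across two AWS VPN connections)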
# 3. google_compute_router_interface (BGP interface)
# 4. google_compute_router_peer (BGP peer configuration)

# AWS side: Create matching VPN infrastructure
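# 1. aws_vpn_gateway (attached to the AWS VPC)
# 2. aws_customer_gateway (ip_address = GCP HA VPN interface IP, bgp_asn = 64513 to match the Cloud Router)
# 3. aws_vpn_connection (with BGP enabled, i.e. static_routes_only = false)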
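The BGP sessions run over link-local /30 "inside" addresses, and both ends must agree on which address belongs to whom. A minimal sketch, assuming AWS's convention that the virtual private gateway takes the first usable address of each tunnel's inside CIDR and the customer gateway (here, the GCP Cloud Router) takes the second. The function name and dictionary keys are hypothetical, chosen to mirror the `ip_range` argument of `google_compute_router_interface` and the `peer_ip_address` argument of `google_compute_router_peer`:

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def bgp_inside_addresses(inside_cidr: str) -> dict[str, str]:
    """Split a tunnel inside CIDR (/30) into the two BGP session addresses.

    Assumes AWS's convention: the virtual private gateway uses the first
    usable address and the customer gateway (GCP Cloud Router) the second.
    """
    net = ipaddress.ip_network(inside_cidr)
    if net.version != 4 or net.prefixlen != 30 or not net.subnet_of(LINK_LOCAL):
        raise ValueError("inside CIDR must be an IPv4 /30 within 169.254.0.0/16")
    aws_ip, gcp_ip = list(net.hosts())
    return {
        # Value for google_compute_router_interface.ip_range (GCP side, with prefix)
        "gcp_router_interface_ip_range": f"{gcp_ip}/{net.prefixlen}",
        # Value for google_compute_router_peer.peer_ip_address (AWS side)
        "gcp_peer_ip_address": str(aws_ip),
        "aws_vgw_inside_ip": str(aws_ip),
    }


print(bgp_inside_addresses("169.254.10.0/30"))
```

Run it once per tunnel with the inside CIDR configured on the AWS VPN connection, and copy the results into the matching GCP interface and peer resources. AWS also reserves some 169.254.x.x/30 blocks, so confirm the chosen range is allowed.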
# Verify VPN tunnel status
# AWS
aws ec2 describe-vpn-connections \
  --filters "Name=tag:Name,Values=aws-to-gcp-vpn" \
  --query "VpnConnections[].VgwTelemetry[].Status"

# GCP
gcloud compute vpn-tunnels describe ha-vpn-to-aws-tunnel-0 \
  --region=us-central1 \
  --format="value(status)"

# Expected output: "UP" for each AWS tunnel and "ESTABLISHED" for the GCP tunnel
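For a scripted health check, the AWS side can be parsed from the JSON that `aws ec2 describe-vpn-connections --output json` prints. A minimal sketch (the helper name `tunnels_up` is hypothetical) that treats the connection as healthy only when every entry in `VgwTelemetry` reports `UP`; the sample payload is illustrative and trimmed to the fields the function reads:

```python
import json


def tunnels_up(describe_output: str) -> bool:
    """True if every tunnel in describe-vpn-connections JSON reports UP."""
    data = json.loads(describe_output)
    statuses = [
        tunnel["Status"]
        for connection in data.get("VpnConnections", [])
        for tunnel in connection.get("VgwTelemetry", [])
    ]
    # An empty list means no tunnels were found, which is not healthy either.
    return bool(statuses) and all(status == "UP" for status in statuses)


sample = json.dumps({
    "VpnConnections": [
        {"VgwTelemetry": [
            {"OutsideIpAddress": "203.0.113.10", "Status": "UP"},
            {"OutsideIpAddress": "203.0.113.11", "Status": "DOWN"},
        ]}
    ]
})
print(tunnels_up(sample))
```

Requiring every tunnel to be UP is deliberately strict: with one tunnel down, traffic still flows, but the HA redundancy the exercise is building has already been lost.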
VPN BGP GCP AWS
Exercise 4 Unified Multi-Cloud Monitoring

Implement Multi-Cloud Monitoring

Objective: Set up a unified Grafana dashboard that displays metrics from AWS CloudWatch, Azure Monitor, and GCP Cloud Monitoring.

# Grafana data sources for multi-cloud monitoring
# grafana/provisioning/datasources/multi-cloud.yaml
# ${VAR} values are read from Grafana's environment at startup
apiVersion: 1

datasources:
  - name: AWS CloudWatch
    type: cloudwatch
    access: proxy
    jsonData:
      # Static keys for the exercise; in production prefer the AWS SDK
      # default credential chain (authType: default), e.g. an IAM role
      authType: keys
      defaultRegion: us-east-1
    secureJsonData:
      accessKey: ${AWS_ACCESS_KEY}
      secretKey: ${AWS_SECRET_KEY}

  - name: Azure Monitor
    type: grafana-azure-monitor-datasource
    access: proxy
    jsonData:
      azureAuthType: clientsecret
      cloudName: azuremonitor
      tenantId: ${AZURE_TENANT_ID}
      clientId: ${AZURE_CLIENT_ID}
      subscriptionId: ${AZURE_SUBSCRIPTION_ID}
    secureJsonData:
      clientSecret: ${AZURE_CLIENT_SECRET}

  - name: GCP Cloud Monitoring
    type: stackdriver
    access: proxy
    jsonData:
      # gce uses the service account of the VM running Grafana;
      # use jwt with a service account key when Grafana runs outside GCP
      authenticationType: gce
      defaultProject: ${GCP_PROJECT_ID}
{
  "dashboard": {
    "title": "Multi-Cloud Overview",
    "panels": [
      {
        "title": "AWS EC2 CPU Utilization",
        "datasource": "AWS CloudWatch",
        "targets": [{
          "namespace": "AWS/EC2",
          "metricName": "CPUUtilization",
          "statistics": ["Average"],
          "period": "300"
        }]
      },
      {
        "title": "Azure VM CPU Percentage",
        "datasource": "Azure Monitor",
        "targets": [{
          "resourceGroup": "production-rg",
          "metricDefinition": "Microsoft.Compute/virtualMachines",
          "metricName": "Percentage CPU",
          "aggregation": "Average"
        }]
      },
      {
        "title": "GCP GCE CPU Usage",
        "datasource": "GCP Cloud Monitoring",
        "targets": [{
          "metricType": "compute.googleapis.com/instance/cpu/utilization",
          "filters": ["resource.type=\"gce_instance\""]
        }]
      }
    ]
  }
}
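The three CPU panels are not directly comparable: AWS `CPUUtilization` and Azure `Percentage CPU` are reported in percent (0-100), while GCP `compute.googleapis.com/instance/cpu/utilization` is a fraction between 0.0 and 1.0. A minimal Python sketch of the normalization needed before plotting them on one axis (the names are illustrative):

```python
# Multiplier that converts each provider's native CPU metric into percent.
SCALE_TO_PERCENT = {
    "aws": 1.0,     # CPUUtilization: already percent
    "azure": 1.0,   # Percentage CPU: already percent
    "gcp": 100.0,   # instance/cpu/utilization: fraction 0.0-1.0
}


def to_percent(provider: str, value: float) -> float:
    """Convert a provider's native CPU metric value to percent."""
    try:
        return value * SCALE_TO_PERCENT[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


samples = {"aws": 42.0, "azure": 37.5, "gcp": 0.5}
normalized = {p: to_percent(p, v) for p, v in samples.items()}
print(normalized)
```

Inside Grafana itself, the same effect can be had by setting the GCP panel's unit to Percent (0.0-1.0) (`percentunit`) and the other two panels to Percent (0-100).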
Grafana Monitoring Observability FinOps

Conclusion & Next Steps

Multi-cloud architecture is a powerful tool when applied intentionally, and a costly burden when adopted accidentally. The patterns, tooling, and practices in this article provide a framework for making informed decisions about when and how to distribute workloads across providers.

Key takeaways:

  • Strategy first — Choose a multi-cloud pattern (best-of-breed, active-passive, workload distribution, or cloud-agnostic) based on actual requirements, not buzzword compliance
  • Abstract operations, not capabilities — Terraform, Kubernetes, and service mesh abstract how you deploy; avoid abstracting away what each cloud uniquely offers
  • Networking is the hard part — Cross-cloud VPN/interconnect, DNS federation, and global load balancing require significant planning and testing
  • Federate identity early — OIDC federation and centralized secrets management (Vault) are prerequisites, not afterthoughts
  • Respect data gravity — Compute should move to data, not the other way around; cross-cloud transfer costs add up fast
  • Unified tooling is essential — GitOps (ArgoCD), observability (Grafana), and policy (OPA) must span all clouds consistently
  • Budget for complexity — Multi-cloud requires 2-3x the operational expertise of single-cloud; staff accordingly

Next in the Series

In Part 17: Service Mesh & Advanced Networking, we dive into Istio, Envoy proxies, mutual TLS, traffic management, circuit breaking, and the advanced networking patterns that enable secure, observable communication across multi-cloud Kubernetes clusters.