Part 16: Multi-Cloud Architecture

Why Multi-Cloud

Multi-cloud is one of the most debated topics in modern infrastructure. Analyst reports routinely claim that around 90% of enterprises have a multi-cloud strategy, yet on closer inspection most have accidental multi-cloud: different teams adopted different providers independently, with no unified architecture, no shared tooling, and no intentional design.

True multi-cloud architecture is an intentional design decision to distribute workloads across two or more cloud providers with a coherent strategy for networking, identity, data, and operations.

Definitions That Matter

Term	Definition	Example
Multi-Cloud	Using 2+ public cloud providers intentionally	ML on GCP, core app on AWS
Hybrid Cloud	Combining public cloud with on-premises/private cloud	AWS + on-prem data center
Multi-Region	Same provider, different geographic regions	AWS us-east-1 + eu-west-1
Poly-Cloud	Multi-cloud without a unifying strategy	Each team picks their own cloud

                            
Key Insight: Multi-cloud is not inherently better than single-cloud. It adds significant complexity to networking, identity, operations, and cost management. The question is not "should we go multi-cloud?" but "do our requirements justify the additional complexity?"
                        

When Multi-Cloud Is Worth It

Multi-cloud is justified when at least one of these conditions is true:

Regulatory compliance — Data sovereignty requires specific providers in specific regions
Best-of-breed necessity — Critical workloads genuinely need unique capabilities (GCP BigQuery for analytics, AWS for broadest service catalog)
Acquisition integration — Merging companies on different providers with no compelling reason to migrate
Disaster recovery — True provider-level resilience (extremely rare requirement)
Negotiation leverage — Credible alternative keeps pricing competitive (works at very large scale)

                            
The Honest Truth: For 80% of organizations, multi-cloud adds 2-3x operational complexity while delivering marginal resilience benefits. A single cloud provider with multi-region deployment is usually more cost-effective and simpler to operate. Only pursue multi-cloud with an intentional strategy and adequate staffing.
                        

Single Cloud vs Multi-Cloud vs Hybrid Architecture

flowchart TB
    subgraph Single["Single Cloud (Multi-Region)"]
        direction TB
        S1[Region A] --- S2[Region B]
        S1 --- S3[Region C]
    end

    subgraph Multi["Multi-Cloud"]
        direction TB
        M1["AWS<br/>Primary Compute"] --- M2["GCP<br/>ML & Analytics"]
        M1 --- M3["Azure<br/>Enterprise Apps"]
        M2 --- M3
    end

    subgraph Hybrid["Hybrid Cloud"]
        direction TB
        H1["Public Cloud<br/>AWS/Azure"] --- H2["On-Premises<br/>Data Center"]
        H2 --- H3["Edge<br/>IoT Devices"]
    end

Multi-Cloud Strategy Patterns

Not all multi-cloud is the same. The pattern you choose determines your architecture, tooling requirements, and operational burden. Understanding these patterns is the first step toward intentional multi-cloud design.

Pattern 1: Best-of-Breed

Use each cloud provider for what it does best. This is the most common intentional multi-cloud pattern and often the most pragmatic.

# Example: Best-of-breed workload mapping
workloads:
  compute_and_networking:
    provider: aws
    reason: "Broadest service catalog, mature VPC"
    services: [EKS, Lambda, API Gateway, CloudFront]
  
  machine_learning:
    provider: gcp
    reason: "TPUs, Vertex AI, BigQuery ML integration"
    services: [Vertex AI, BigQuery, Cloud Storage]
  
  enterprise_apps:
    provider: azure
    reason: "Active Directory, Office 365, Power Platform"
    services: [Azure AD, Logic Apps, Power Automate]

  data_analytics:
    provider: gcp
    reason: "BigQuery performance, Looker integration"
    services: [BigQuery, Dataflow, Looker]

Pattern 2: Active-Passive DR

Run production on one cloud, maintain a warm standby on another. This provides genuine provider-level resilience but at significant cost and operational complexity.

Pattern 3: Workload Distribution

Different business units or application tiers run on different clouds, with cross-cloud integration at defined boundaries.

Pattern 4: Cloud-Agnostic

Design workloads to run identically on any cloud provider. This demands heavy abstraction (Kubernetes, Terraform, cloud-agnostic databases) and typically sacrifices cloud-native optimization.

Pattern	Complexity	Cloud Optimization	Portability	Best For
Best-of-Breed	Medium	High	Low	Leveraging unique capabilities
Active-Passive DR	High	High (primary)	Medium	Provider-level resilience
Workload Distribution	Medium	Medium-High	Low	Org-level autonomy
Cloud-Agnostic	Very High	Low	High	Exit strategy, vendor leverage

Decision Framework for Multi-Cloud Strategy

flowchart TD
    A[Need Multi-Cloud?] -->|Regulatory| B[Workload Distribution]
    A -->|Best capabilities| C[Best-of-Breed]
    A -->|Provider failure DR| D[Active-Passive]
    A -->|Full portability| E[Cloud-Agnostic]
    A -->|No strong reason| F[Stay Single Cloud]
    
    B --> G[Define integration boundaries]
    C --> H[Map workloads to providers]
    D --> I[Accept 2x infrastructure cost]
    E --> J[Accept lowest common denominator]
    F --> K[Multi-region for resilience]
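
The branches of this flowchart can also be written as a small lookup, which is handy when the decision needs to be recorded in an architecture review tool. This is a minimal sketch: the driver keys (`regulatory`, `best_capabilities`, and so on) are hypothetical labels chosen for illustration, while the patterns and next steps are copied from the diagram.

```python
from typing import Dict, Optional, Tuple

# Each driver maps to (pattern, next step), mirroring the flowchart branches.
# The driver keys are hypothetical labels invented for this sketch.
DECISIONS: Dict[str, Tuple[str, str]] = {
    "regulatory": ("Workload Distribution", "Define integration boundaries"),
    "best_capabilities": ("Best-of-Breed", "Map workloads to providers"),
    "provider_failure_dr": ("Active-Passive", "Accept 2x infrastructure cost"),
    "full_portability": ("Cloud-Agnostic", "Accept lowest common denominator"),
}

# "No strong reason" branch of the flowchart.
DEFAULT: Tuple[str, str] = ("Stay Single Cloud", "Multi-region for resilience")


def recommend_pattern(driver: Optional[str]) -> Tuple[str, str]:
    """Return (pattern, next_step); unknown or missing drivers stay single cloud."""
    if driver is None:
        return DEFAULT
    return DECISIONS.get(driver, DEFAULT)


if __name__ == "__main__":
    for d in ["regulatory", "provider_failure_dr", None]:
        print(d, "->", recommend_pattern(d))
```

Defaulting to single cloud for any unrecognized driver reflects the point made earlier: multi-cloud needs a positive justification.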

Abstraction Layers

The key to managing multi-cloud complexity is abstraction — creating layers that hide provider-specific details behind consistent interfaces. The right abstraction layer depends on what you are abstracting.

                            
The Abstraction Principle: Abstract operations (how you deploy, monitor, secure) but be cautious about abstracting capabilities (what each cloud uniquely offers). Over-abstraction leads to the "lowest common denominator" anti-pattern where you use no cloud well.
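
To illustrate abstracting an operation rather than a capability, the sketch below puts object storage behind one small interface. Everything here is hypothetical: `ObjectStore`, `FakeS3Store`, `FakeBlobStore`, and `archive_report` are invented names, and the two stores are in-memory stand-ins where real adapters would call each provider's SDK.

```python
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Operational contract shared by every provider-specific adapter."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class FakeS3Store:
    """In-memory stand-in; a real adapter would call the AWS SDK here."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class FakeBlobStore:
    """In-memory stand-in; a real adapter would call the Azure SDK here."""

    def __init__(self, container: str) -> None:
        self._container = container
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[f"{self._container}/{key}"] = data

    def get(self, key: str) -> bytes:
        return self._blobs[f"{self._container}/{key}"]


def archive_report(store: ObjectStore, name: str, body: str) -> str:
    """Application code depends only on the interface, not on a provider."""
    key = f"reports/{name}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

The interface covers only what both providers do equally well (put and get). Provider-specific features such as lifecycle rules or event triggers stay outside it, so they can still be used natively.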
                        

Layer	Tool	What It Abstracts	Trade-off
Infrastructure	Terraform	Provisioning APIs across clouds	Provider-specific resources still differ
Compute	Kubernetes	Container orchestration	Managed K8s differs per cloud
Infrastructure Control Plane	Crossplane	Cloud resources as K8s objects	Additional control plane complexity
Networking	Service Mesh (Istio)	Service-to-service communication	Operational overhead of mesh
Secrets	HashiCorp Vault	Secrets management across clouds	Additional infrastructure to manage
Policy	OPA/Gatekeeper	Authorization and compliance	Policy language learning curve
Observability	Datadog/Grafana	Metrics, logs, traces across clouds	Vendor cost or self-hosted complexity

Crossplane for Cloud-Agnostic Infrastructure

Crossplane extends Kubernetes with cloud resource management. You define cloud resources as Kubernetes custom resources, and Crossplane controllers reconcile them against cloud APIs.
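
The reconcile idea itself is simple to sketch: compare the desired state with the observed state and derive the actions needed to converge. The code below is a generic illustration of that loop, not Crossplane's implementation; the `reconcile` function and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple

Spec = Dict[str, str]


def reconcile(desired: Dict[str, Spec], observed: Dict[str, Spec]) -> List[Tuple[str, str]]:
    """Return the (action, resource_name) steps that move observed toward desired."""
    actions: List[Tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name))   # declared but missing in the cloud
        elif observed[name] != spec:
            actions.append(("update", name))   # exists but has drifted
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name))   # exists but no longer declared
    return actions


if __name__ == "__main__":
    desired = {"db-a": {"size": "small"}, "db-b": {"size": "large"}}
    observed = {"db-b": {"size": "small"}, "db-c": {"size": "small"}}
    for step in reconcile(desired, observed):
        print(step)
```

A real controller runs this comparison continuously and calls the cloud API for each step, which is why manual changes made outside Kubernetes get reverted.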

# Crossplane Composite Resource Definition
# Abstracts a "database" concept across clouds
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XDatabase
    plural: xdatabases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    engine:
                      type: string
                      enum: [postgres, mysql]
                    size:
                      type: string
                      enum: [small, medium, large]
                    region:
                      type: string
                  required: [engine, size, region]

# Crossplane Composition - AWS Implementation
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases.aws.platform.example.com
  labels:
    provider: aws
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XDatabase
  resources:
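    - name: rds-instance
      base:
        apiVersion: rds.aws.crossplane.io/v1alpha1
        kind: DBInstance
        spec:
          forProvider:
            dbInstanceClass: db.t3.medium
            engine: postgres
            engineVersion: "15"
            # "admin" is a reserved word in RDS for PostgreSQL
            masterUsername: dbadmin
            allocatedStorage: 20
            publiclyAccessible: false
          providerConfigRef:
            name: aws-provider
      patches:
        # Copy the region chosen on the XDatabase into the RDS instance
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.region
          toFieldPath: spec.forProvider.region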

# Crossplane Composition - Azure Implementation
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases.azure.platform.example.com
  labels:
    provider: azure
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XDatabase
  resources:
    - name: azure-db
      base:
        apiVersion: dbforpostgresql.azure.crossplane.io/v1alpha1
        kind: FlexibleServer
        spec:
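          forProvider:
            version: "15"
            # Flexible Server SKUs are written as tier prefix + VM size
            skuName: B_Standard_B1ms
            storageMb: 32768
            # Azure rejects reserved logins such as "admin"
            administratorLogin: dbadmin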
          providerConfigRef:
            name: azure-provider

Multi-Cloud Abstraction Layer Stack

flowchart TB
    subgraph App["Application Layer"]
        A1[Microservices]
        A2[APIs]
        A3[ML Pipelines]
    end

    subgraph Abstraction["Abstraction Layer"]
        B1[Kubernetes - Compute]
        B2[Istio - Networking]
        B3[Vault - Secrets]
        B4[OPA - Policy]
        B5[Crossplane - Resources]
    end

    subgraph Infra["Infrastructure Layer"]
        C1[Terraform - Provisioning]
    end

    subgraph Clouds["Cloud Providers"]
        D1[AWS]
        D2[Azure]
        D3[GCP]
    end

    App --> Abstraction
    Abstraction --> Infra
    Infra --> Clouds

Multi-Cloud Networking

Networking is the hardest problem in multi-cloud architecture. Each cloud has its own networking model, IP address scheme, firewall rules, and connectivity options. Connecting them securely and performantly requires careful planning.
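
One planning step that is cheap to automate is checking that address ranges do not overlap before any tunnel is built, because overlapping CIDRs cannot be routed between clouds without NAT. The sketch below uses Python's standard `ipaddress` module; the `find_overlaps` helper is a hypothetical name, and the example ranges are the ones used elsewhere in this part.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named networks whose address ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    plan = {"aws": "10.0.0.0/16", "azure": "10.1.0.0/16", "gcp": "10.2.0.0/16"}
    print(find_overlaps(plan))  # no overlapping pairs in this plan

    bad_plan = {"aws": "10.0.0.0/16", "azure": "10.0.128.0/17"}
    print(find_overlaps(bad_plan))  # the azure range sits inside the aws range
```

Running a check like this in CI against the CIDR variables catches conflicts long before a VPN or interconnect is provisioned.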

Cross-Cloud Connectivity Options

Method	Bandwidth	Latency	Cost	Setup Complexity
Site-to-Site VPN	1-2 Gbps	Variable (internet)	Low	Medium
Cloud Interconnect	10-100 Gbps	Low (dedicated)	High	High
SD-WAN Overlay	Variable	Optimized	Medium	Medium
Service Mesh (Istio)	Application-level	Depends on transport	Low	High
API Gateway Federation	API-level	HTTP overhead	Low	Low

Terraform: AWS-to-Azure VPN Tunnel

# AWS VPN Gateway
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "aws-to-azure-vpn-gw"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_customer_gateway" "azure" {
  bgp_asn    = 65515  # Azure default ASN
  ip_address = azurerm_public_ip.vpn_gw.ip_address
  type       = "ipsec.1"

  tags = {
    Name = "azure-customer-gw"
  }
}

resource "aws_vpn_connection" "to_azure" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.azure.id
  type                = "ipsec.1"
  static_routes_only  = false

  tunnel1_preshared_key = var.vpn_preshared_key
  tunnel1_ike_versions  = ["ikev2"]

  tags = {
    Name = "aws-to-azure-vpn"
  }
}

# Azure VPN Gateway
resource "azurerm_virtual_network_gateway" "main" {
  name                = "azure-to-aws-vpn-gw"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  type                = "Vpn"
  vpn_type            = "RouteBased"
  sku                 = "VpnGw2"
  active_active       = false
  enable_bgp          = true

  ip_configuration {
    name                          = "vpn-gw-config"
    public_ip_address_id          = azurerm_public_ip.vpn_gw.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  bgp_settings {
    asn = 65515
  }
}

resource "azurerm_local_network_gateway" "aws" {
  name                = "aws-local-gw"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  gateway_address     = aws_vpn_connection.to_azure.tunnel1_address

  address_space = [var.aws_vpc_cidr]

  bgp_settings {
    asn                 = 64512  # AWS default ASN
    # Inside address of the AWS end of tunnel 1, used as the BGP peer
    bgp_peering_address = aws_vpn_connection.to_azure.tunnel1_vgw_inside_address
  }
}

resource "azurerm_virtual_network_gateway_connection" "to_aws" {
  name                       = "azure-to-aws-connection"
  location                   = azurerm_resource_group.main.location
  resource_group_name        = azurerm_resource_group.main.name
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
  local_network_gateway_id   = azurerm_local_network_gateway.aws.id
  shared_key                 = var.vpn_preshared_key
  enable_bgp                 = true
}

DNS & Global Load Balancing

# Multi-cloud DNS with Route 53 as primary
# Health checks monitor endpoints on both clouds. Route 53 health checkers
# probe from the public internet, so these names must resolve publicly.
resource "aws_route53_health_check" "aws_primary" {
  fqdn              = "aws-app.internal.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 10

  tags = {
    Name = "aws-primary-health"
  }
}

resource "aws_route53_health_check" "azure_secondary" {
  fqdn              = "azure-app.internal.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 10

  tags = {
    Name = "azure-secondary-health"
  }
}

# Route 53 alias records can only target AWS resources, and both records in a
# failover pair must share the same name and type, so both sides use CNAME.
resource "aws_route53_record" "app_primary" {
  zone_id = var.hosted_zone_id
  name    = "app.example.com"
  type    = "CNAME"
  ttl     = 60

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier  = "primary-aws"
  health_check_id = aws_route53_health_check.aws_primary.id
  records         = [aws_lb.main.dns_name]
}

resource "aws_route53_record" "app_secondary" {
  zone_id = var.hosted_zone_id
  name    = "app.example.com"
  type    = "CNAME"
  ttl     = 60

  failover_routing_policy {
    type = "SECONDARY"
  }

  set_identifier  = "secondary-azure"
  health_check_id = aws_route53_health_check.azure_secondary.id
  records         = ["azure-tm.trafficmanager.net"]
}
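
The failover behaviour configured above can be modelled in a few lines. This is a simplified sketch, not Route 53's actual algorithm: it assumes a single health checker, recovers after one successful probe, and answers with the primary when both sides are unhealthy. The `HealthCheck` class and `resolve` function are hypothetical names; the threshold and set identifiers come from the configuration.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """Tracks consecutive probe failures against a failure threshold."""

    failure_threshold: int = 3
    consecutive_failures: int = 0

    def record(self, success: bool) -> None:
        # A success resets the counter; a failure increments it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold


def resolve(primary: HealthCheck, secondary: HealthCheck) -> str:
    """Return the set identifier of the record that would answer queries."""
    if primary.healthy:
        return "primary-aws"
    if secondary.healthy:
        return "secondary-azure"
    return "primary-aws"  # both unhealthy: answer with the primary anyway


if __name__ == "__main__":
    aws, azure = HealthCheck(), HealthCheck()
    for _ in range(3):
        aws.record(False)       # three failed probes in a row
    print(resolve(aws, azure))  # traffic moves to the secondary
```

With `request_interval = 10` and `failure_threshold = 3`, a checker needs roughly 30 seconds of failed probes before it marks an endpoint unhealthy, and clients then wait out any cached DNS TTL on top of that.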

Multi-Cloud Network Topology

flowchart TB
    Users[Global Users] --> GLB["Global Load Balancer<br/>Cloudflare/Route53"]
    
    GLB --> AWS_LB["AWS ALB<br/>us-east-1"]
    GLB --> AZ_LB["Azure Front Door<br/>East US"]
    
    subgraph AWS["AWS VPC (10.0.0.0/16)"]
        AWS_LB --> AWS_APP[EKS Cluster]
        AWS_APP --> AWS_DB[(RDS PostgreSQL)]
    end
    
    subgraph Azure["Azure VNet (10.1.0.0/16)"]
        AZ_LB --> AZ_APP[AKS Cluster]
        AZ_APP --> AZ_DB[(Azure Database)]
    end
    
    AWS_APP <-->|"VPN Tunnel<br/>IPsec/BGP"| AZ_APP
    AWS_DB <-->|"Cross-Cloud<br/>Replication"| AZ_DB

Multi-Cloud Identity & Security

In a multi-cloud environment, identity is the new perimeter. Each cloud has its own IAM system (AWS IAM, Microsoft Entra ID, GCP IAM), and federating identities across them is essential for both human operators and machine-to-machine communication.

Federated Identity Architecture

# AWS: Trust Azure AD as OIDC identity provider
resource "aws_iam_openid_connect_provider" "azure_ad" {
  url             = "https://login.microsoftonline.com/${var.azure_tenant_id}/v2.0"
  client_id_list  = [var.azure_app_client_id]
  thumbprint_list = [var.azure_ad_thumbprint]
}

# IAM Role that Azure workloads can assume via OIDC
resource "aws_iam_role" "azure_cross_cloud" {
  name = "azure-cross-cloud-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.azure_ad.arn
        }
        Action = "sts:AssumeRoleWithWebIdentity"
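        Condition = {
          StringEquals = {
            # IAM condition keys use the issuer without the "https://" scheme
            "${replace(aws_iam_openid_connect_provider.azure_ad.url, "https://", "")}:aud" = var.azure_app_client_id
            "${replace(aws_iam_openid_connect_provider.azure_ad.url, "https://", "")}:sub" = var.azure_managed_identity_object_id
          }
        }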
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "cross_cloud_s3" {
  role       = aws_iam_role.azure_cross_cloud.name
  policy_arn = aws_iam_policy.cross_cloud_s3_access.arn
}

HashiCorp Vault for Centralized Secrets

# Vault configuration for multi-cloud secrets
resource "vault_mount" "aws_secrets" {
  path        = "aws"
  type        = "aws"
  description = "AWS dynamic credentials"
}

resource "vault_aws_secret_backend_role" "deploy" {
  backend         = vault_mount.aws_secrets.path
  name            = "deploy-role"
  credential_type = "iam_user"

  policy_document = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        # Broad for illustration; scope actions and resources down in production
        Action   = ["s3:*", "ec2:*", "eks:*"]
        Resource = "*"
      }
    ]
  })
}

# vault_azure_secret_backend mounts and configures the engine in one resource,
# so a separate vault_mount on the same path would conflict
resource "vault_azure_secret_backend" "main" {
  path            = "azure"
  description     = "Azure dynamic credentials"
  subscription_id = var.azure_subscription_id
  tenant_id       = var.azure_tenant_id
  client_id       = var.vault_azure_client_id
  client_secret   = var.vault_azure_client_secret
}

Unified Policy with OPA

# OPA Rego policy: enforce tagging across all clouds
# File: policies/multi-cloud-tagging.rego
package multicloud.tagging

import future.keywords.in

# Required tags for all cloud resources
required_tags := {"environment", "team", "cost-center", "managed-by"}

# Check AWS resources
deny[msg] {
    input.provider == "aws"
    resource := input.planned_values.root_module.resources[_]
    tags := object.get(resource.values, "tags", {})
    missing := required_tags - {key | tags[key]}
    count(missing) > 0
    msg := sprintf("AWS resource %s missing tags: %v", [resource.address, missing])
}

# Check Azure resources
deny[msg] {
    input.provider == "azure"
    resource := input.planned_values.root_module.resources[_]
    tags := object.get(resource.values, "tags", {})
    missing := required_tags - {key | tags[key]}
    count(missing) > 0
    msg := sprintf("Azure resource %s missing tags: %v", [resource.address, missing])
}

# Check GCP resources
deny[msg] {
    input.provider == "gcp"
    resource := input.planned_values.root_module.resources[_]
    labels := object.get(resource.values, "labels", {})
    missing := required_tags - {key | labels[key]}
    count(missing) > 0
    msg := sprintf("GCP resource %s missing labels: %v", [resource.address, missing])
}
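
For teams that want to prototype the same rule before adopting OPA, the check is easy to express in plain Python. This is a rough equivalent under the same assumption as the Rego above, namely that the input is Terraform plan JSON wrapped with a top-level `provider` field. The `missing_tag_violations` function is a hypothetical name, and unlike the Rego it treats any present key as satisfying the requirement, whatever its value.

```python
from typing import Any, Dict, List

REQUIRED_TAGS = {"environment", "team", "cost-center", "managed-by"}

# AWS and Azure call them tags; GCP calls them labels.
TAG_FIELD = {"aws": "tags", "azure": "tags", "gcp": "labels"}


def missing_tag_violations(plan_input: Dict[str, Any]) -> List[str]:
    """Return one message per planned resource that lacks a required tag."""
    field = TAG_FIELD[plan_input["provider"]]
    violations: List[str] = []
    for resource in plan_input["planned_values"]["root_module"]["resources"]:
        # A null or absent tag map counts as having no tags at all.
        present = set((resource["values"].get(field) or {}).keys())
        missing = REQUIRED_TAGS - present
        if missing:
            violations.append(
                f"{resource['address']} missing {field}: {sorted(missing)}"
            )
    return violations


if __name__ == "__main__":
    sample = {
        "provider": "aws",
        "planned_values": {"root_module": {"resources": [
            {"address": "aws_s3_bucket.logs",
             "values": {"tags": {"environment": "dev", "team": "platform"}}},
        ]}},
    }
    for message in missing_tag_violations(sample):
        print(message)
```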

Approach	Scope	Identity Type	Complexity	Best For
OIDC Federation	Cross-cloud workloads	Machine identity	Medium	Service-to-service auth
SAML via IdP	Human access	User identity	Low	Console/portal SSO
Vault Dynamic Creds	All credentials	Both	High	Short-lived, audited access
SPIFFE/SPIRE	Workload identity	Machine identity	High	Zero-trust service mesh

Multi-Cloud Data Strategies

Data gravity is the concept that applications tend to move toward where their data resides. In multi-cloud, data gravity is one of the strongest forces determining your architecture — moving compute is easy, moving petabytes of data is not.

                            
Data Gravity Rule: Cross-cloud data transfer costs $0.01-$0.09/GB. Moving 10 TB per day costs roughly $36,500-$328,500/year. Always compute where the data lives, not the other way around.
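
The annual figure follows directly from the per-GB price. A minimal worked version, assuming decimal units (1 TB = 1,000 GB), a 365-day year, and a flat rate with no tiered discounts; `annual_transfer_cost` is a hypothetical helper name:

```python
def annual_transfer_cost(tb_per_day: float, usd_per_gb: float, gb_per_tb: int = 1000) -> float:
    """Yearly cost of moving a fixed daily volume at a flat per-GB price."""
    gb_per_year = tb_per_day * gb_per_tb * 365
    return gb_per_year * usd_per_gb


if __name__ == "__main__":
    low = annual_transfer_cost(10, 0.01)
    high = annual_transfer_cost(10, 0.09)
    # 10 TB/day is 3,650,000 GB/year, so the range is about $36,500 to $328,500.
    print(f"${low:,.0f} to ${high:,.0f} per year")
```

Plugging in your own daily volume and the published egress rate for each provider gives a quick first estimate before any architecture is committed.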
                        

Cross-Cloud Database Replication

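# CockroachDB multi-cloud topology configuration (illustrative)
# CockroachDB supports multi-cloud deployment, but the topology block below is
# schematic: a Kubernetes operator manages one cluster, so a real deployment
# runs a cluster per cloud and joins the nodes over the cross-cloud network.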
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: multi-cloud-crdb
spec:
  dataStore:
    pvc:
      spec:
        storageClassName: premium-rwo
        resources:
          requests:
            storage: 100Gi
  nodes: 9
  topology:
    - cloud: aws
      region: us-east-1
      zones: [us-east-1a, us-east-1b, us-east-1c]
      nodes: 3
    - cloud: azure
      region: eastus
      zones: [1, 2, 3]
      nodes: 3
    - cloud: gcp
      region: us-east4
      zones: [us-east4-a, us-east4-b, us-east4-c]
      nodes: 3

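# Kafka/Confluent Cloud for cross-cloud event streaming
resource "confluent_kafka_cluster" "multi_cloud" {
  display_name = "multi-cloud-events"
  availability = "MULTI_ZONE"
  cloud        = "AWS"
  region       = "us-east-1"

  dedicated {
    cku = 2
  }

  # Every cluster belongs to a Confluent environment (assumed to be defined elsewhere)
  environment {
    id = confluent_environment.main.id
  }
}

# Cluster linking for cross-cloud replication
# (azure_cluster is a second confluent_kafka_cluster with cloud = "AZURE", not
# shown here; credentials blocks are also omitted for brevity)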
resource "confluent_cluster_link" "aws_to_azure" {
  link_name = "aws-to-azure-mirror"
  
  source_kafka_cluster {
    id            = confluent_kafka_cluster.multi_cloud.id
    rest_endpoint = confluent_kafka_cluster.multi_cloud.rest_endpoint
  }

  destination_kafka_cluster {
    id            = confluent_kafka_cluster.azure_cluster.id
    rest_endpoint = confluent_kafka_cluster.azure_cluster.rest_endpoint
  }
}

Approach	Consistency	Latency	Cost	Use Case
CockroachDB	Strong (serializable)	Cross-region penalty	High	Multi-cloud OLTP
Kafka Cluster Linking	Eventual	Seconds	Medium	Event streaming
Object Storage Sync	Eventual	Minutes	Transfer + storage	Data lake mirroring
Database CDC	Eventual	Seconds	Low	Read replicas
API-level Sync	Application-defined	Variable	Low	Selective data sharing

Multi-Cloud with Terraform

Terraform is the most widely adopted tool for multi-cloud infrastructure because it supports 3,000+ providers through a single workflow. However, using multiple providers in one project requires careful state and dependency management.

Multi-Provider Project Structure

# Recommended directory structure for multi-cloud Terraform
multi-cloud-infra/
├── modules/
│   ├── networking/
│   │   ├── aws/          # AWS-specific networking
│   │   ├── azure/        # Azure-specific networking
│   │   └── interface.tf  # Shared input/output contract
│   ├── compute/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── interface.tf
│   └── dns/
│       └── main.tf       # Cloud-agnostic DNS module
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── providers.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
├── shared/
│   ├── vpn-connections/  # Cross-cloud connectivity
│   └── dns-zones/        # Global DNS management
└── terragrunt.hcl        # DRY configuration

# providers.tf - Multi-cloud provider configuration
terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "multi-cloud/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = "multi-cloud-platform"
    }
  }
}

provider "azurerm" {
  features {}
  subscription_id = var.azure_subscription_id
}

provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

# main.tf - Multi-cloud deployment orchestration
locals {
  aws_vpc_cidr    = "10.0.0.0/16"
  azure_vnet_cidr = "10.1.0.0/16"
  gcp_vpc_cidr    = "10.2.0.0/16"
}

# AWS Infrastructure
module "aws_networking" {
  source      = "../../modules/networking/aws"
  vpc_cidr    = local.aws_vpc_cidr
  environment = var.environment
}

module "aws_compute" {
  source     = "../../modules/compute/aws"
  vpc_id     = module.aws_networking.vpc_id
  subnet_ids = module.aws_networking.private_subnet_ids
}

# Azure Infrastructure
module "azure_networking" {
  source      = "../../modules/networking/azure"
  vnet_cidr   = local.azure_vnet_cidr
  environment = var.environment
}

module "azure_compute" {
  source    = "../../modules/compute/azure"
  vnet_id   = module.azure_networking.vnet_id
  subnet_id = module.azure_networking.private_subnet_id
}

# Cross-Cloud Connectivity
module "vpn_aws_to_azure" {
  source = "../../shared/vpn-connections"

  aws_vpc_id              = module.aws_networking.vpc_id
  aws_vpc_cidr            = local.aws_vpc_cidr
  azure_vnet_id           = module.azure_networking.vnet_id
  azure_vnet_cidr         = local.azure_vnet_cidr
  azure_gateway_subnet_id = module.azure_networking.gateway_subnet_id
  vpn_preshared_key       = var.vpn_preshared_key
}

# Outputs for cross-referencing
output "aws_endpoint" {
  value = module.aws_compute.load_balancer_dns
}

output "azure_endpoint" {
  value = module.azure_compute.load_balancer_ip
}

Terraform Multi-Cloud Workflow

flowchart LR
    subgraph Plan["terraform plan"]
        P1[AWS Provider] --> P2[Plan AWS Resources]
        P3[Azure Provider] --> P4[Plan Azure Resources]
        P5[GCP Provider] --> P6[Plan GCP Resources]
    end

    subgraph Apply["terraform apply"]
        A1[Create AWS VPC] --> A2[Create Azure VNet]
        A2 --> A3[Create VPN Connection]
        A3 --> A4[Create GCP VPC]
        A4 --> A5[Deploy K8s Clusters]
    end

    subgraph State["State Management"]
        S1["Single State File<br/>Multi-Provider"]
        S2["Split State<br/>Per Provider"]
    end

    Plan --> Apply
    Apply --> State

Multi-Cloud Kubernetes

Kubernetes is the most common compute abstraction layer for multi-cloud. Running workloads across EKS, AKS, and GKE provides a consistent deployment target regardless of the underlying cloud — but multi-cluster management introduces its own complexity.

Multi-Cluster Management Tools

Tool	Vendor	Model	Best For	Complexity
Rancher	SUSE	Central management plane	Multi-cluster lifecycle	Medium
Azure Arc	Microsoft	Azure control plane extension	Azure-centric hybrid	Medium
Anthos	Google	GCP control plane extension	GCP-centric hybrid	High
KubeFed	Kubernetes SIG (archived)	Federation v2 CRDs	Resource propagation	High
Loft/vCluster	Loft	Virtual clusters	Multi-tenancy	Low
Skupper	Red Hat	Application networking	Cross-cluster services	Low

GitOps for Multi-Cluster Deployments

# ArgoCD ApplicationSet for multi-cluster deployment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: multi-cloud-app
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      name: "app-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/multi-cloud-app.git
        targetRevision: main
        path: "deploy/overlays/{{metadata.labels.cloud}}"
      destination:
        server: "{{server}}"
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
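
To make the generator's behaviour concrete, the sketch below imitates how the cluster generator selects clusters by label and fills the template once per match. It is a simplified stand-in for ArgoCD's templating, not its implementation: `generate_params` and `render` are hypothetical helpers, and the `aws-dev` cluster with its URL is invented to show the label filter.

```python
import re
from typing import Dict, List

# Registered clusters; the two production entries mirror the cluster
# secrets used in this part, the dev entry is a made-up example.
CLUSTERS = [
    {"name": "aws-production",
     "server": "https://eks-cluster.us-east-1.eks.amazonaws.com",
     "labels": {"environment": "production", "cloud": "aws"}},
    {"name": "azure-production",
     "server": "https://aks-cluster.eastus.azmk8s.io",
     "labels": {"environment": "production", "cloud": "azure"}},
    {"name": "aws-dev",
     "server": "https://dev.example.invalid",
     "labels": {"environment": "dev", "cloud": "aws"}},
]


def generate_params(clusters: List[dict], match_labels: Dict[str, str]) -> List[Dict[str, str]]:
    """Build one parameter set per cluster whose labels match the selector."""
    params = []
    for cluster in clusters:
        if all(cluster["labels"].get(k) == v for k, v in match_labels.items()):
            entry = {"name": cluster["name"], "server": cluster["server"]}
            for key, value in cluster["labels"].items():
                entry[f"metadata.labels.{key}"] = value
            params.append(entry)
    return params


def render(template: str, params: Dict[str, str]) -> str:
    """Replace each {{key}} placeholder with its value from params."""
    return re.sub(r"\{\{\s*([\w.\-]+)\s*\}\}", lambda m: params[m.group(1)], template)


if __name__ == "__main__":
    for p in generate_params(CLUSTERS, {"environment": "production"}):
        print(render("app-{{name}}", p),
              render("deploy/overlays/{{metadata.labels.cloud}}", p),
              render("{{server}}", p))
```

Each production cluster yields one Application pointing at its own overlay directory, which is how a single ApplicationSet fans out to every cloud.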

# Kustomize overlay for AWS-specific configuration
# deploy/overlays/aws/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

patches:
  - target:
      kind: Deployment
      name: api-server
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: CLOUD_PROVIDER
          value: "aws"
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: OBJECT_STORAGE_ENDPOINT
          value: "s3.us-east-1.amazonaws.com"
      - op: add
        path: /spec/template/spec/serviceAccountName
        value: app-irsa-sa

  - target:
      kind: Service
      name: api-server
    patch: |-
      - op: add
        path: /metadata/annotations
        value:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"

# Kustomize overlay for Azure-specific configuration
# deploy/overlays/azure/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

patches:
  - target:
      kind: Deployment
      name: api-server
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: CLOUD_PROVIDER
          value: "azure"
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: OBJECT_STORAGE_ENDPOINT
          value: "https://storageaccount.blob.core.windows.net"
      - op: add
        path: /spec/template/metadata/labels/azure.workload.identity~1use
        value: "true"

  - target:
      kind: Service
      name: api-server
    patch: |-
      - op: add
        path: /metadata/annotations
        value:
          service.beta.kubernetes.io/azure-load-balancer-internal: "false"

# ArgoCD cluster registration
# Register multiple clusters for multi-cloud GitOps
apiVersion: v1
kind: Secret
metadata:
  name: aws-production
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: production
    cloud: aws
    region: us-east-1
type: Opaque
stringData:
  name: aws-production
  server: https://eks-cluster.us-east-1.eks.amazonaws.com
  config: |
    {
      "execProviderConfig": {
        "command": "aws",
        "args": ["eks", "get-token", "--cluster-name", "production"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      }
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-production
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: production
    cloud: azure
    region: eastus
type: Opaque
stringData:
  name: azure-production
  server: https://aks-cluster.eastus.azmk8s.io
  config: |
    {
      "execProviderConfig": {
        "command": "kubelogin",
        "args": ["get-token", "--login", "azurecli", "--server-id", "APP_ID"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      }
    }

Cost & Governance

Multi-cloud cost management is considerably harder than single-cloud. Each provider has different pricing models, different discount mechanisms (Reserved Instances, Committed Use Discounts, Azure Reservations), and different billing APIs. Without unified governance, costs spiral.

                            
Cost Reality Check: Organizations running multi-cloud typically spend 20-35% more than equivalent single-cloud deployments because of duplicate infrastructure, cross-cloud data transfer, multiple support contracts, and the inability to fully leverage volume discounts from any single provider.
                        

FinOps Practices for Multi-Cloud

# Unified tagging policy across clouds
# Enforced via OPA/Sentinel at deploy time
tagging_standard:
  required:
    - key: "cost-center"
      format: "CC-[0-9]{4}"
      description: "Financial cost center code"
    - key: "environment"
      values: [dev, staging, production]
    - key: "team"
      description: "Owning team name"
    - key: "service"
      description: "Service/application name"
    - key: "managed-by"
      values: [terraform, crossplane, manual]
  
  provider_mapping:
    aws:
      tag_key_format: "PascalCase"
      example: "CostCenter: CC-1234"
    azure:
      tag_key_format: "camelCase"  
      example: "costCenter: CC-1234"
    gcp:
      label_key_format: "lowercase-hyphen"
      example: "cost-center: cc-1234"
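
The provider mapping above implies a translation step between the canonical tag keys and each cloud's convention. A minimal sketch of that step, assuming canonical keys are lowercase and hyphen-separated as in the `required` list; `to_provider_key` and `to_provider_tags` are hypothetical helper names, and GCP values are lowercased to match the `cc-1234` example.

```python
from typing import Dict


def to_provider_key(canonical: str, provider: str) -> str:
    """Convert a lowercase-hyphen key to the provider's naming convention."""
    words = canonical.split("-")
    if provider == "aws":      # PascalCase, e.g. CostCenter
        return "".join(word.capitalize() for word in words)
    if provider == "azure":    # camelCase, e.g. costCenter
        return words[0] + "".join(word.capitalize() for word in words[1:])
    if provider == "gcp":      # lowercase-hyphen, e.g. cost-center
        return canonical.lower()
    raise ValueError(f"unknown provider: {provider}")


def to_provider_tags(tags: Dict[str, str], provider: str) -> Dict[str, str]:
    """Translate a canonical tag map into provider-specific tags or labels."""
    return {
        to_provider_key(key, provider): value.lower() if provider == "gcp" else value
        for key, value in tags.items()
    }


if __name__ == "__main__":
    canonical = {"cost-center": "CC-1234", "managed-by": "terraform"}
    for cloud in ("aws", "azure", "gcp"):
        print(cloud, to_provider_tags(canonical, cloud))
```

Keeping one canonical form and translating at the edge means policy checks only need to know a single set of key names.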

# Terraform: enforce tagging via variable validation
variable "tags" {
  type = map(string)

  validation {
    condition = alltrue([
      contains(keys(var.tags), "cost-center"),
      contains(keys(var.tags), "environment"),
      contains(keys(var.tags), "team"),
      contains(keys(var.tags), "service"),
      can(regex("^CC-[0-9]{4}$", var.tags["cost-center"]))
    ])
    error_message = "Tags must include cost-center (CC-XXXX format), environment, team, and service."
  }
}
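
The provider_mapping in the tagging standard implies a key conversion per cloud. A minimal Python sketch of that conversion follows, assuming the canonical keys are written in lowercase-hyphen form. The names `to_provider_key` and `render_tags` are hypothetical helpers, not part of any provider SDK. GCP values are lowercased because GCP labels accept only lowercase characters, which is why the GCP example reads cc-1234.

```python
import re

# Required keys and cost-center format taken from the tagging standard above
REQUIRED_KEYS = ("cost-center", "environment", "team", "service")
COST_CENTER_RE = re.compile(r"^CC-[0-9]{4}$")


def to_provider_key(key: str, provider: str) -> str:
    """Convert a canonical lowercase-hyphen key to the provider's convention."""
    parts = key.split("-")
    if provider == "aws":      # PascalCase, e.g. CostCenter
        return "".join(p.capitalize() for p in parts)
    if provider == "azure":    # camelCase, e.g. costCenter
        return parts[0] + "".join(p.capitalize() for p in parts[1:])
    if provider == "gcp":      # lowercase-hyphen, e.g. cost-center
        return key
    raise ValueError(f"unknown provider: {provider}")


def render_tags(tags: dict[str, str], provider: str) -> dict[str, str]:
    """Validate canonical tags, then render them for one provider."""
    missing = [k for k in REQUIRED_KEYS if k not in tags]
    if missing:
        raise ValueError(f"missing required tags: {missing}")
    if not COST_CENTER_RE.match(tags["cost-center"]):
        raise ValueError("cost-center must match CC-XXXX")

    rendered = {}
    for key, value in tags.items():
        if provider == "gcp":
            value = value.lower()  # GCP label values must be lowercase
        rendered[to_provider_key(key, provider)] = value
    return rendered


canonical = {
    "cost-center": "CC-1234",
    "environment": "dev",
    "team": "payments",
    "service": "checkout-api",
}
for cloud in ("aws", "azure", "gcp"):
    print(cloud, render_tags(canonical, cloud))
```

Validating once against the canonical form and converting afterwards keeps the rule set in one place, rather than maintaining three provider-specific validators.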

Tool	Type	Clouds Supported	Best For
Kubecost	K8s-native cost	All (via K8s)	Container workload costs
Infracost	Pre-deploy estimation	AWS, Azure, GCP	Cost in PR reviews
CloudHealth	Cloud cost management	AWS, Azure, GCP	Enterprise FinOps
Spot.io	Cost optimization	AWS, Azure, GCP	Spot/preemptible automation
Vantage	Cost visibility	AWS, Azure, GCP, Datadog	Developer-friendly reporting
OpenCost	CNCF open-source project	All (via K8s)	Open-source K8s cost allocation

Common Anti-Patterns

Multi-cloud failures are more common than successes. Recognizing anti-patterns early can save substantial infrastructure spend and years of accumulated technical debt.

                            
                            Anti-Pattern #1: Accidental Multi-Cloud. No strategy, just drift. One team chose AWS, another chose Azure, and now you have "multi-cloud" with no shared tooling, no cross-cloud networking, and duplicate operational toil. This is poly-cloud, not multi-cloud.
                        

                            
                            Anti-Pattern #2: Lowest Common Denominator. Abstracting away all cloud-native capabilities to achieve portability. You end up using no cloud well — no managed databases, no serverless, no cloud-native AI services — building everything on raw VMs and Kubernetes. The result costs more and delivers less than using any single cloud natively.
                        

                            
                            Anti-Pattern #3: Duplicate Everything. Running the same workload on two clouds "for resilience" without a genuine failover mechanism. You pay 2x infrastructure cost, 2x operational burden, and when failover is actually needed, the secondary has never been tested under real load.
                        

                            
                            Anti-Pattern #4: No Shared Tooling. Each team picks their own CI/CD, monitoring, secrets management, and IaC tool per cloud. Operations teams cannot support the combinatorial explosion. Hire 3x the staff or consolidate tooling.
                        

                            
                            Anti-Pattern #5: Ignoring Data Gravity. Placing compute on Cloud B while data lives on Cloud A, then wondering why cross-cloud data transfer costs $50,000/month and latency breaks SLAs. Always compute where data lives.
                        

Hands-On Exercises

Exercise 1 Multi-Cloud Strategy Design

Design a Multi-Cloud Strategy

Scenario: A fintech company processes payments on AWS (for PCI compliance tooling), runs their data analytics on GCP BigQuery, and has acquired a company with all infrastructure on Azure. They need a unified strategy.

Tasks:

Identify which multi-cloud pattern best fits this scenario
Design the networking topology (VPN or interconnect between clouds)
Define the identity federation strategy
Create a unified tagging/labeling standard
Propose a centralized observability approach
Estimate the monthly cross-cloud data transfer cost for 5 TB/day

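# Calculate cross-cloud transfer costs (simplified flat-rate estimate)
# AWS egress to internet: $0.09/GB (first 10 TB tier; higher-volume tiers are cheaper)
# Azure ingress: free
# Estimate for 5 TB/day = 150 TB/month = 150,000 GB
# Single quotes stop the shell from expanding $0, $1, and $3 inside the strings
echo 'AWS egress cost: 150000 GB * $0.09 = $13,500/month'
echo 'With Direct Connect: 150000 GB * $0.02 = $3,000/month'
echo 'Savings with dedicated interconnect: $10,500/month'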
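
The flat $0.09/GB figure applies only to the first pricing tier, so it overstates the bill at this volume. The Python sketch below applies a tiered schedule instead. The tier sizes and rates in `EGRESS_TIERS` are assumptions for illustration (decimal GB, rates that resemble published internet egress tiers), so check the provider's current price list before relying on them.

```python
from decimal import Decimal

# Assumed tiers: (tier size in GB, USD per GB). None marks the open-ended last tier.
EGRESS_TIERS = [
    (10_000, Decimal("0.09")),
    (40_000, Decimal("0.085")),
    (100_000, Decimal("0.07")),
    (None, Decimal("0.05")),
]


def tiered_egress_cost(gb: int, tiers=EGRESS_TIERS) -> Decimal:
    """Bill each slice of traffic at the rate of the tier it falls into."""
    remaining = gb
    total = Decimal("0")
    for size, rate in tiers:
        billed = remaining if size is None else min(remaining, size)
        total += billed * rate
        remaining -= billed
        if remaining == 0:
            break
    return total


monthly_gb = 5 * 1_000 * 30               # 5 TB/day over a 30-day month
flat = monthly_gb * Decimal("0.09")       # the flat-rate estimate from the exercise
tiered = tiered_egress_cost(monthly_gb)
print(f"flat: ${flat:,.2f}  tiered: ${tiered:,.2f}")
```

Under these assumed tiers, 150,000 GB comes to $11,300 rather than $13,500, so the flat-rate number is best read as an upper bound.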

Strategy Architecture FinOps

Exercise 2 Terraform Multi-Provider Project

Build a Multi-Cloud Terraform Project

Objective: Create a Terraform project that provisions a VPC on AWS and a VNet on Azure, then connects them with a VPN tunnel.

# Exercise: Complete this multi-cloud configuration
# File: main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "azurerm" {
  features {}
}

# Task 1: Create AWS VPC with CIDR 10.0.0.0/16
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "multi-cloud-vpc" }
}

# Task 2: Create Azure Resource Group and VNet with CIDR 10.1.0.0/16
resource "azurerm_resource_group" "main" {
  name     = "multi-cloud-rg"
  location = "East US"
}

resource "azurerm_virtual_network" "main" {
  name                = "multi-cloud-vnet"
  address_space       = ["10.1.0.0/16"]
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}

# Task 3: Create subnets on both sides
# Task 4: Set up VPN gateways on both sides
# Task 5: Establish the VPN connection
# Task 6: Verify connectivity with a test VM on each side

Validation:

# Validate Terraform configuration
terraform init
terraform validate
terraform plan -out=multicloud.plan

# Check resource count
terraform show multicloud.plan | grep "Plan:"
# Expected: "Plan: X to add, 0 to change, 0 to destroy."
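
A site-to-site VPN can only route between the two networks if their address spaces do not overlap, which is why the exercise uses 10.0.0.0/16 on AWS and 10.1.0.0/16 on Azure. The Python sketch below checks an address plan with the standard `ipaddress` module. The `find_overlaps` helper and the network names are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named networks whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# CIDRs from the exercise configuration
address_plan = {
    "aws-vpc": "10.0.0.0/16",
    "azure-vnet": "10.1.0.0/16",
}
overlaps = find_overlaps(address_plan)
print("overlapping ranges:", overlaps or "none")
```

Running a check like this in CI before `terraform plan` catches address conflicts before any gateway is provisioned.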

Terraform AWS Azure VPN

Exercise 3 Cross-Cloud VPN Connectivity

Set Up Cross-Cloud VPN Connectivity

Objective: Configure IPsec VPN between AWS and GCP with BGP dynamic routing.

# GCP side: Create HA VPN gateway
resource "google_compute_ha_vpn_gateway" "to_aws" {
  name    = "ha-vpn-to-aws"
  network = google_compute_network.main.id
  region  = "us-central1"
}

resource "google_compute_router" "vpn_router" {
  name    = "vpn-router"
  network = google_compute_network.main.id
  region  = "us-central1"

  bgp {
    asn               = 64513
    advertise_mode    = "CUSTOM"
    advertised_groups = ["ALL_SUBNETS"]
  }
}

# Task: Create the following resources
# 1. google_compute_external_vpn_gateway (represents AWS VPN endpoint)
# 2. google_compute_vpn_tunnel (2 tunnels for HA)
# 3. google_compute_router_interface (BGP interface)
# 4. google_compute_router_peer (BGP peer configuration)

# AWS side: Create matching VPN infrastructure
# 1. aws_vpn_gateway
# 2. aws_customer_gateway (points to GCP external IP)
# 3. aws_vpn_connection (with BGP enabled)

# Verify VPN tunnel status
# AWS
aws ec2 describe-vpn-connections \
  --filters "Name=tag:Name,Values=aws-to-gcp-vpn" \
  --query "VpnConnections[].VgwTelemetry[].Status"

# GCP
gcloud compute vpn-tunnels describe ha-vpn-to-aws-tunnel-0 \
  --region=us-central1 \
  --format="value(status)"

# Expected output: "UP" for each AWS tunnel, "ESTABLISHED" for the GCP tunnel
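
The two CLIs report tunnel health with different vocabularies: AWS tunnel telemetry uses UP or DOWN, while GCP reports ESTABLISHED for a working tunnel. A small Python sketch that combines both checks is shown below. It assumes the AWS command is run with JSON output, so the `--query` result arrives as a JSON list of strings. The function name `vpn_healthy` is hypothetical.

```python
import json


def vpn_healthy(aws_query_output: str, gcp_status: str) -> bool:
    """True only if every AWS tunnel is UP and the GCP tunnel is ESTABLISHED."""
    aws_statuses = json.loads(aws_query_output)
    # An empty list means no tunnels matched the filter, which is not healthy
    aws_ok = bool(aws_statuses) and all(s == "UP" for s in aws_statuses)
    gcp_ok = gcp_status.strip() == "ESTABLISHED"
    return aws_ok and gcp_ok


# Sample inputs shaped like the CLI output, not captured from a real run
print(vpn_healthy('["UP", "UP"]', "ESTABLISHED\n"))
print(vpn_healthy('["UP", "DOWN"]', "ESTABLISHED\n"))
```

Treating an empty AWS result as unhealthy guards against a mistyped tag filter silently passing the check.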

VPN BGP GCP AWS

Exercise 4 Unified Multi-Cloud Monitoring

Implement Multi-Cloud Monitoring

Objective: Set up a unified Grafana dashboard that displays metrics from AWS CloudWatch, Azure Monitor, and GCP Cloud Monitoring.

# Grafana data sources for multi-cloud monitoring
# grafana/provisioning/datasources/multi-cloud.yaml
apiVersion: 1

datasources:
  - name: AWS CloudWatch
    type: cloudwatch
    access: proxy
    jsonData:
      authType: keys
      defaultRegion: us-east-1
    secureJsonData:
      accessKey: ${AWS_ACCESS_KEY}
      secretKey: ${AWS_SECRET_KEY}

  - name: Azure Monitor
    type: grafana-azure-monitor-datasource
    access: proxy
    jsonData:
      cloudName: azuremonitor
      tenantId: ${AZURE_TENANT_ID}
      clientId: ${AZURE_CLIENT_ID}
      subscriptionId: ${AZURE_SUBSCRIPTION_ID}
    secureJsonData:
      clientSecret: ${AZURE_CLIENT_SECRET}

  - name: GCP Cloud Monitoring
    type: stackdriver
    access: proxy
    jsonData:
      authenticationType: gce
      defaultProject: ${GCP_PROJECT_ID}

{
  "dashboard": {
    "title": "Multi-Cloud Overview",
    "panels": [
      {
        "title": "AWS EC2 CPU Utilization",
        "datasource": "AWS CloudWatch",
        "targets": [{
          "namespace": "AWS/EC2",
          "metricName": "CPUUtilization",
          "statistics": ["Average"],
          "period": "300"
        }]
      },
      {
        "title": "Azure VM CPU Percentage",
        "datasource": "Azure Monitor",
        "targets": [{
          "resourceGroup": "production-rg",
          "metricDefinition": "Microsoft.Compute/virtualMachines",
          "metricName": "Percentage CPU",
          "aggregation": "Average"
        }]
      },
      {
        "title": "GCP GCE CPU Usage",
        "datasource": "GCP Cloud Monitoring",
        "targets": [{
          "metricType": "compute.googleapis.com/instance/cpu/utilization",
          "filters": ["resource.type=\"gce_instance\""]
        }]
      }
    ]
  }
}
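
One detail to watch when these three panels sit side by side: CloudWatch CPUUtilization and Azure's Percentage CPU are reported as percentages, while GCP's compute.googleapis.com/instance/cpu/utilization is a fraction, typically between 0.0 and 1.0. The Python sketch below shows the normalization in isolation. The `PERCENT_SCALE` table and `to_percent` helper are hypothetical, and in Grafana itself the same effect would normally come from a panel unit setting or a math expression.

```python
# Multiplier that converts each source's CPU metric to a 0-100 percentage
PERCENT_SCALE = {
    "aws": 1.0,     # CloudWatch CPUUtilization: already a percentage
    "azure": 1.0,   # Azure Monitor Percentage CPU: already a percentage
    "gcp": 100.0,   # cpu/utilization: fraction, so scale by 100
}


def to_percent(provider: str, value: float) -> float:
    """Normalize a raw CPU reading to a percentage, rounded to 2 decimals."""
    return round(value * PERCENT_SCALE[provider], 2)


# Sample readings for illustration only
samples = [("aws", 63.5), ("azure", 41.0), ("gcp", 0.42)]
for provider, raw in samples:
    print(f"{provider}: {to_percent(provider, raw)}%")
```

Without this step, a GCP instance at 42% CPU would plot as 0.42 next to AWS and Azure values in the tens, making the GCP fleet look idle.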

Grafana Monitoring Observability FinOps

Conclusion & Next Steps

Multi-cloud architecture is a powerful tool when applied intentionally, and a costly burden when adopted accidentally. The patterns, tooling, and practices in this article provide a framework for making informed decisions about when and how to distribute workloads across providers.

Key takeaways:

Strategy first — Choose a multi-cloud pattern (best-of-breed, active-passive, workload distribution, or cloud-agnostic) based on actual requirements, not buzzword compliance
Abstract operations, not capabilities — Terraform, Kubernetes, and service mesh abstract how you deploy; avoid abstracting away what each cloud uniquely offers
Networking is the hard part — Cross-cloud VPN/interconnect, DNS federation, and global load balancing require significant planning and testing
Federate identity early — OIDC federation and centralized secrets management (Vault) are prerequisites, not afterthoughts
Respect data gravity — Compute should move to data, not the other way around; cross-cloud transfer costs add up fast
Unified tooling is essential — GitOps (ArgoCD), observability (Grafana), and policy (OPA) must span all clouds consistently
Budget for complexity — Multi-cloud requires 2-3x the operational expertise of single-cloud; staff accordingly

Next in the Series

In Part 17: Service Mesh & Advanced Networking, we dive into Istio, Envoy proxies, mutual TLS, traffic management, circuit breaking, and the advanced networking patterns that enable secure, observable communication across multi-cloud Kubernetes clusters.
