Part 14: FinOps & Cloud Economics

May 15, 2026 Wasil Zafar 30 min read

Master cloud cost optimisation — resource right-sizing, Kubernetes cost attribution, showback and chargeback models, spot instances, reserved capacity, and building a sustainable FinOps practice.

Table of Contents

  1. Introduction to FinOps
  2. Cost Visibility & Allocation
  3. Resource Right-Sizing
  4. Rate Optimisation
  5. Showback & Chargeback
  6. Cost Automation
  7. Conclusion & Next Steps

What is FinOps?

FinOps (a portmanteau of Finance and DevOps) is the practice of bringing financial accountability to cloud spending. It combines people, processes, and tools so that organisations can make informed trade-offs between speed, cost, and quality. FinOps doesn't mean spending less; it means getting maximum value from every dollar spent on cloud infrastructure.

Think of FinOps like fuel efficiency in a fleet of vehicles. The goal isn't to drive less — it's to ensure every trip delivers value, routes are optimised, engines are tuned, and vehicles are right-sized for their cargo. Some trucks need to be large; the waste is running a 40-tonne truck for a 500kg delivery.

Key Insight: Industry surveys commonly estimate that organisations waste around 30-35% of their cloud spend on idle, over-provisioned, or orphaned resources. FinOps isn't about cutting costs for their own sake; it's about eliminating waste so that budget flows to the workloads that drive business value.

The FinOps Framework

FinOps Lifecycle — Inform, Optimise, Operate
flowchart LR
    Inform["Inform<br/>Visibility & Allocation"] --> Optimize["Optimise<br/>Right-Size & Rates"]
    Optimize --> Operate["Operate<br/>Governance & Automation"]
    Operate -->|"Continuous"| Inform
    style Inform fill:#e8f4f4,stroke:#3B9797,color:#132440
    style Optimize fill:#f0f4f8,stroke:#16476A,color:#132440
    style Operate fill:#f0f4f8,stroke:#16476A,color:#132440

| Phase | Activities | Key Metrics |
|---|---|---|
| Inform | Tag resources, allocate costs to teams, build dashboards | Cost per team, cost per service, untagged spend % |
| Optimise | Right-size instances, purchase reservations, eliminate waste | CPU/memory utilisation, savings vs on-demand, idle resource cost |
| Operate | Budget alerts, automated scaling, policy enforcement | Budget variance, anomaly detection rate, cost per deployment |
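
The Inform-phase metrics can be made concrete with a small calculation. The sketch below is illustrative only: the line items, resource IDs, and the `summarise` helper are hypothetical, and a real bill would come from a cloud provider's cost export rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical billing line items: (resource_id, tags, cost_usd)
LINE_ITEMS = [
    ("i-01", {"team": "payments", "service": "api"}, 120.0),
    ("i-02", {"team": "payments", "service": "worker"}, 80.0),
    ("i-03", {"team": "frontend", "service": "web"}, 150.0),
    ("vol-09", {}, 50.0),  # untagged: nobody owns this spend
]


def summarise(line_items, owner_tag="team"):
    """Return (cost per owner, untagged spend as a percentage of total)."""
    per_owner = defaultdict(float)
    untagged = 0.0
    for _resource_id, tags, cost in line_items:
        owner = tags.get(owner_tag)
        if owner:
            per_owner[owner] += cost
        else:
            untagged += cost
    total = sum(per_owner.values()) + untagged
    untagged_pct = 100.0 * untagged / total if total else 0.0
    return dict(per_owner), untagged_pct


cost_per_team, untagged_pct = summarise(LINE_ITEMS)
print(cost_per_team)
print(f"untagged spend: {untagged_pct:.1f}%")
```

Tracking the untagged percentage over time is a simple way to see whether tagging discipline is improving.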

Cost Visibility & Allocation

Tagging Strategy

Tags are the foundation of cost allocation. Without consistent tagging, cloud bills are unattributable — just a large number that nobody owns. A robust tagging strategy requires mandatory tags enforced by policy.

# Kyverno policy: enforce cost allocation labels on workload resources
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
  annotations:
    policies.kyverno.io/title: Require Cost Allocation Labels
    policies.kyverno.io/severity: high
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-cost-labels
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet", "Job", "CronJob"]
      validate:
        message: >-
          Cost allocation labels are required: 'team', 'cost-center',
          'environment', and 'service'. Add these labels to metadata.labels.
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
              environment: "?*"
              service: "?*"
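
The same rule can also be checked before a manifest reaches the cluster, for example in CI. The following is a minimal sketch, assuming manifests have already been parsed into dictionaries; `missing_cost_labels` is a hypothetical helper and not part of Kyverno.

```python
# The four labels required by the policy above.
REQUIRED_LABELS = ("team", "cost-center", "environment", "service")


def missing_cost_labels(manifest):
    """Return the required labels that are absent or empty in metadata.labels."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


# Hypothetical, already-parsed Deployment manifest with two labels missing.
deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "payment-service",
        "labels": {"team": "payments", "service": "payment-service"},
    },
}

missing = missing_cost_labels(deployment)
if missing:
    print(f"{deployment['metadata']['name']}: missing labels {missing}")
```

A check like this gives developers feedback earlier, while the admission policy remains the enforcement point.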

Kubernetes Cost Attribution

Kubernetes makes cost attribution challenging because multiple workloads share node resources. A single EC2 instance might run pods from five different teams. Kubecost and OpenCost solve this by measuring actual CPU, memory, and storage consumption per pod and attributing node costs proportionally.
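
To illustrate what proportional attribution means, here is a deliberately simplified sketch. It considers CPU only, spreads the whole node cost (including idle capacity) across pods, and weights each pod by the larger of its request and its usage. These are assumptions of the example, not a description of how Kubecost or OpenCost are implemented; consult each tool's documentation for its actual model.

```python
def allocate_node_cost(node_hourly_cost, pods):
    """Split one node's hourly cost across pods by their share of CPU.

    pods maps pod name -> (cpu_request_cores, cpu_usage_cores).
    Each pod is weighted by max(request, usage), an assumption of this sketch.
    """
    weights = {name: max(req, used) for name, (req, used) in pods.items()}
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in pods}
    return {name: node_hourly_cost * w / total for name, w in weights.items()}


# Hypothetical node costing $0.40/hour shared by three teams' pods.
shares = allocate_node_cost(
    0.40,
    {
        "payments-api": (1.0, 0.4),   # requests more than it uses
        "frontend-web": (0.5, 1.0),   # bursts above its request
        "data-worker": (2.0, 1.5),
    },
)
for pod, cost in shares.items():
    print(f"{pod:15s} ${cost:.2f}/hour")
```

Weighting by the larger value means a team pays for capacity it reserves even when it leaves that capacity idle.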

# Install Kubecost via Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="YOUR_TOKEN" \
  --set prometheus.server.retention=30d

# Verify installation
kubectl get pods -n kubecost
# NAME                                    READY   STATUS    RESTARTS   AGE
# kubecost-cost-analyzer-xxx              1/1     Running   0          2m
# kubecost-prometheus-server-xxx          1/1     Running   0          2m

# Access the dashboard (port-forward runs in the foreground; stop it with Ctrl+C)
echo "Kubecost dashboard at http://localhost:9090"
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

# Query Kubecost API for team-level cost allocation (run in a second terminal)
curl -s "http://localhost:9090/model/allocation?window=7d&aggregate=label:team" | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
for alloc in data.get('data', [{}])[0].values():
    name = alloc.get('name', 'unallocated')
    total = alloc.get('totalCost', 0)
    cpu = alloc.get('cpuCost', 0)
    mem = alloc.get('ramCost', 0)
    print(f'{name:20s} Total: \${total:8.2f}  CPU: \${cpu:6.2f}  Memory: \${mem:6.2f}')
"
# team-payments         Total: $  342.50  CPU: $187.20  Memory: $155.30
# team-frontend         Total: $  218.75  CPU: $120.40  Memory: $ 98.35
# team-data             Total: $  567.80  CPU: $312.50  Memory: $255.30
# __unallocated__       Total: $   89.20  CPU: $ 45.10  Memory: $ 44.10

Resource Right-Sizing

Right-sizing is the single highest-impact FinOps optimisation. Most Kubernetes workloads request 2-5× more CPU and memory than they actually use — because developers set requests based on peak load estimates rather than actual consumption data.

Case Study: Lyft's Right-Sizing Programme

Lyft discovered that their Kubernetes clusters averaged 12% CPU utilisation — meaning 88% of compute capacity was paid for but unused. By deploying Vertical Pod Autoscaler (VPA) in recommendation mode, they identified that 73% of workloads had requests set 3× higher than p99 usage. A phased right-sizing campaign (automated recommendations → team review → gradual adjustment) reduced their compute bill by $4.2M annually while maintaining the same performance SLOs.

Vertical Pod Autoscaler (VPA)

# VPA in recommendation mode — analyse first, resize later
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Off"   # Recommendation only — no automatic resizing
  resourcePolicy:
    containerPolicies:
      - containerName: payment-service
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi

# View VPA recommendations
kubectl describe vpa payment-service-vpa -n production

# Status:
#   Recommendation:
#     Container Recommendations:
#       Container Name: payment-service
#       Lower Bound:    Cpu: 85m,  Memory: 128Mi
#       Target:         Cpu: 150m, Memory: 256Mi    ← Use this
#       Upper Bound:    Cpu: 400m, Memory: 512Mi
#       Uncapped Target: Cpu: 150m, Memory: 256Mi
#
# Summary (annotation, not part of the kubectl output):
# Current requests: Cpu: 500m, Memory: 1Gi
# Recommendation:   Cpu: 150m, Memory: 256Mi
# Potential savings: 70% CPU, 75% memory

echo "Right-sizing analysis complete"
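
The savings percentages in the annotation above follow directly from the current requests and the VPA target. The sketch below reproduces that arithmetic. The helper names are hypothetical, and the parsers handle only the millicore suffix and the binary memory suffixes used in this article, not the full Kubernetes quantity syntax.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity):
    """Convert a binary-suffixed memory quantity such as '256Mi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # no suffix: plain bytes


def reduction_pct(current, recommended):
    """Percentage by which the recommendation is below the current value."""
    return 100.0 * (current - recommended) / current


cpu_saving = reduction_pct(parse_cpu("500m"), parse_cpu("150m"))
mem_saving = reduction_pct(parse_memory("1Gi"), parse_memory("256Mi"))
print(f"CPU: {cpu_saving:.0f}%  memory: {mem_saving:.0f}%")
```

Running the same calculation across every workload gives a ranked list of where right-sizing effort pays off most.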

Rate Optimisation

Reserved Instances & Savings Plans

For workloads with predictable, steady-state resource consumption (databases, control planes, core services), reserved capacity offers 30-72% savings over on-demand pricing. The key is matching commitment levels to actual baseline usage — not peak capacity.

| Strategy | Commitment | Discount | Flexibility | Best For |
|---|---|---|---|---|
| On-Demand | None | 0% | Maximum | Unpredictable workloads |
| 1-Year Reserved | 1 year | 30-40% | Low | Stable, predictable services |
| 3-Year Reserved | 3 years | 55-72% | Very low | Databases, infrastructure |
| Savings Plans | $/hr spend | 20-66% | Medium | Flexible compute commitment |
| Spot/Preemptible | None | 60-90% | Interruptible | Batch, CI, fault-tolerant |
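
Why commit to baseline rather than peak? A small worked example helps. The sketch below assumes a hypothetical daily usage profile (10 instances for 18 hours, 30 for 6 hours), a normalised on-demand rate of 1.0 per instance-hour, and a 35% reservation discount taken from the middle of the 1-year range in the table. All names and figures are illustrative.

```python
def total_cost(hourly_usage, committed, on_demand_rate, discount):
    """Cost of serving hourly_usage (instances per hour) with a fixed commitment.

    Committed capacity is paid for every hour whether it is used or not;
    usage above the commitment is billed at the on-demand rate.
    """
    reserved_rate = on_demand_rate * (1 - discount)
    cost = 0.0
    for used in hourly_usage:
        cost += committed * reserved_rate
        cost += max(0, used - committed) * on_demand_rate
    return cost


def best_commitment(hourly_usage, on_demand_rate, discount):
    """Brute-force the commitment level with the lowest total cost."""
    candidates = range(0, max(hourly_usage) + 1)
    return min(
        candidates,
        key=lambda c: total_cost(hourly_usage, c, on_demand_rate, discount),
    )


usage = [10] * 18 + [30] * 6  # hypothetical day: steady baseline plus a peak
best = best_commitment(usage, on_demand_rate=1.0, discount=0.35)
for level in (0, best, max(usage)):
    print(f"commit {level:2d}: {total_cost(usage, level, 1.0, 0.35):.2f}")
```

Under these assumptions, committing to the peak costs more than using no reservations at all, because the reserved capacity sits idle for most of the day.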

Spot Instances for Kubernetes

# Karpenter NodePool: mix of on-demand and spot capacity
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        name: default
  limits:
    cpu: "1000"
    memory: 4000Gi
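  disruption:
    # In the v1beta1 API, consolidateAfter can only be combined with
    # consolidationPolicy: WhenEmpty, so it is omitted here.
    consolidationPolicy: WhenUnderutilized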

Showback & Chargeback Models

Showback presents cost data to teams without financial consequences — creating awareness. Chargeback actually bills teams from their budget for resources consumed. Most organisations start with showback and graduate to chargeback as cost visibility matures.
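
One design question in chargeback is how to handle spend that cannot be attributed to any team. A common approach is to spread it in proportion to each team's direct cost. The sketch below shows that calculation with hypothetical team figures; the `chargeback` function is illustrative, and proportional allocation is only one of several possible policies (an even split is another).

```python
def chargeback(direct_costs, shared_cost):
    """Spread shared_cost across teams in proportion to their direct cost."""
    total_direct = sum(direct_costs.values())
    if total_direct == 0:
        raise ValueError("no direct costs to apportion against")
    return {
        team: cost + shared_cost * cost / total_direct
        for team, cost in direct_costs.items()
    }


# Hypothetical weekly figures: three teams plus unallocated cluster spend.
direct = {"team-payments": 342.50, "team-frontend": 218.75, "team-data": 567.80}
invoices = chargeback(direct, shared_cost=89.20)

for team, amount in sorted(invoices.items()):
    print(f"{team:15s} ${amount:8.2f}")
print(f"{'TOTAL':15s} ${sum(invoices.values()):8.2f}")
```

The invoices sum to the full bill, so no spend is left without an owner.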

Kubecost-Driven Showback Reports

# Generate weekly cost report per team using Kubecost API
curl -s "http://kubecost:9090/model/allocation?window=7d&aggregate=label:team&accumulate=true" | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
print('Weekly Kubernetes Cost Report')
print('=' * 60)
total = 0
for name, alloc in sorted(data.get('data', [{}])[0].items()):
    cost = alloc.get('totalCost', 0)
    eff = alloc.get('totalEfficiency', 0) * 100
    total += cost
    bar = '#' * int(eff / 5)
    print(f'{name:20s}  \${cost:8.2f}  Efficiency: {eff:5.1f}% {bar}')
print('=' * 60)
print(f'{\"TOTAL\":20s}  \${total:8.2f}')
"
# Weekly Kubernetes Cost Report
# ============================================================
# __unallocated__       $   89.20  Efficiency:   0.0%
# team-data             $  567.80  Efficiency:  72.3% ##############
# team-frontend         $  218.75  Efficiency:  45.2% #########
# team-payments         $  342.50  Efficiency:  68.1% #############
# ============================================================
# TOTAL                 $ 1218.25

Common Pitfall: Don't optimise costs without measuring the impact on performance. A team that right-sizes aggressively but causes latency spikes hasn't saved money; it has created an incident. Always pair cost metrics with performance SLOs. The goal is cost efficiency (cost per request, cost per transaction), not simply lower bills.
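
Pairing cost with performance can be expressed as a simple gate. The following sketch is illustrative: the `Snapshot` fields, the sample numbers, and the 250 ms SLO are all hypothetical, and it assumes a non-zero request count.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    cost_usd: float
    requests: int
    p99_latency_ms: float


def cost_per_million(s):
    """Unit cost: dollars per one million requests served."""
    return s.cost_usd / s.requests * 1_000_000


def evaluate_change(before, after, slo_p99_ms):
    """Classify a cost change by the latency SLO first, then by unit cost."""
    if after.p99_latency_ms > slo_p99_ms:
        return "regression: SLO breached"
    if cost_per_million(after) < cost_per_million(before):
        return "improvement"
    return "no gain"


before = Snapshot(cost_usd=342.50, requests=50_000_000, p99_latency_ms=180)
too_aggressive = Snapshot(cost_usd=250.00, requests=50_000_000, p99_latency_ms=320)
well_sized = Snapshot(cost_usd=250.00, requests=50_000_000, p99_latency_ms=190)

print(evaluate_change(before, too_aggressive, slo_p99_ms=250))
print(evaluate_change(before, well_sized, slo_p99_ms=250))
```

Checking the SLO before the unit cost encodes the priority described above: a cheaper service that breaches its latency target is treated as a regression, not a saving.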

Cost Automation & Governance

Automated Idle Resource Cleanup

# CronJob to identify and report idle Kubernetes resources
apiVersion: batch/v1
kind: CronJob
metadata:
  name: idle-resource-reporter
  namespace: finops
spec:
  schedule: "0 8 * * 1"  # Every Monday at 8am
  jobTemplate:
    spec:
      template:
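        spec:
          # Assumed ServiceAccount (not defined here) with cluster-wide read
          # access to Deployments, PVCs, and Pods.
          serviceAccountName: idle-resource-reporter
          containers:
            - name: reporter
              # The image must provide both kubectl and jq; pin a version tag in production.
              image: bitnami/kubectl:latest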
              command:
                - /bin/sh
                - -c
                - |
                  echo "=== Idle Resource Report ==="
                  echo ""
                  echo "Deployments scaled to 0 replicas:"
                  kubectl get deployments -A -o json | \
                    jq -r '.items[] | select(.spec.replicas == 0) |
                    "\(.metadata.namespace)/\(.metadata.name)"'
                  echo ""
                  echo "PVCs not mounted to any pod:"
                  kubectl get pvc -A -o json | \
                    jq -r '.items[] | select(.status.phase == "Bound") |
                    "\(.metadata.namespace)/\(.metadata.name)"' | \
                    while read -r pvc; do
                      ns=$(echo "$pvc" | cut -d/ -f1)
                      name=$(echo "$pvc" | cut -d/ -f2)
                      # -c prints one line per matching volume, so wc -l counts mounts
                      mounted=$(kubectl get pods -n "$ns" -o json | \
                        jq -c ".items[].spec.volumes[]? | select(.persistentVolumeClaim.claimName == \"$name\")" | wc -l)
                      # Use if rather than && so a mounted PVC does not make the script exit non-zero
                      if [ "$mounted" -eq 0 ]; then echo "  UNUSED: $pvc"; fi
                    done
          restartPolicy: Never
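
Alongside idle-resource reports, the Operate phase relies on budget variance and anomaly detection. The sketch below shows one simple way to compute both, assuming daily spend figures are already available. The function names, sample data, and the three-standard-deviation threshold are hypothetical choices for illustration; production anomaly detection usually also accounts for seasonality and trend.

```python
from statistics import mean, stdev


def budget_variance_pct(actual, budget):
    """Positive values mean spend is over budget."""
    return 100.0 * (actual - budget) / budget


def is_anomaly(history, today, threshold=3.0):
    """Flag today's spend if it exceeds the historical mean by more than
    `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > threshold


daily_spend = [170, 175, 168, 180, 172, 176, 174]  # hypothetical last 7 days, USD

print(f"variance vs budget: {budget_variance_pct(1218.25, 1000):+.1f}%")
print("spike flagged:", is_anomaly(daily_spend, today=260))
print("normal day flagged:", is_anomaly(daily_spend, today=178))
```

Either check can feed an alert so that cost drift is caught within days rather than at the end of the billing cycle.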

Conclusion & Next Steps

FinOps is not a one-time exercise — it's a continuous practice of visibility, optimisation, and governance. The organisations that succeed at FinOps make cost a first-class engineering metric alongside latency, availability, and throughput.

  • Start with visibility — Tag everything, deploy Kubecost, and build dashboards that every team sees weekly.
  • Right-size first — VPA recommendations are the lowest-hanging fruit. Most workloads are 2-5× over-provisioned.
  • Commit strategically — Reserve baseline capacity with 1-year plans. Use spot for burst and batch workloads.
  • Automate governance — Budget alerts, idle resource cleanup, and cost policies prevent drift.
  • Measure efficiency, not just cost — Cost per request, cost per transaction, and utilisation percentages matter more than total spend.

Next in the Series

In Part 15: AIOps & Intelligent Automation, we'll explore ML-driven operations — anomaly detection, predictive alerting, automated incident response, chaos engineering, and intelligent runbook automation.