Why Policy Engines?
Every Kubernetes cluster has rules: "no containers running as root", "all images must come from our registry", "every Deployment must have resource limits", "PCI workloads must not share nodes with non-PCI workloads". Without a policy engine, these rules exist only as wiki pages and verbal agreements — violated within days.
A policy engine intercepts every API server admission request (create, update, delete) and evaluates it against codified rules. It can validate (accept/reject), mutate (add defaults), and generate (create companion resources). The two dominant engines in the Kubernetes ecosystem are OPA Gatekeeper and Kyverno.
Every kubectl apply from every team is evaluated against every rule in milliseconds, so governance scales at machine speed.
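The validate and mutate roles described above can be modelled in a few lines. This is a minimal sketch, not how any real engine is implemented: the `managed-by` default label and the "no root" rule are hypothetical stand-ins, and the object is a plain dict rather than a real AdmissionReview. It only illustrates the ordering, in which mutation runs before validation so that validators see the patched object.

```python
from copy import deepcopy


def mutate_default_labels(obj):
    """Mutating step: add a default label if the author omitted it."""
    patched = deepcopy(obj)  # never modify the caller's object in place
    labels = patched.setdefault("metadata", {}).setdefault("labels", {})
    labels.setdefault("managed-by", "platform")  # hypothetical default value
    return patched


def validate_no_root(obj):
    """Validating step: return violation messages (an empty list means allowed)."""
    violations = []
    for container in obj.get("spec", {}).get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("runAsUser") == 0:
            violations.append(f"container {container['name']} runs as root")
    return violations


def admit(obj):
    """Run mutation first, then validation on the mutated object."""
    patched = mutate_default_labels(obj)
    violations = validate_no_root(patched)
    return {"allowed": not violations, "object": patched, "violations": violations}
```

Calling `admit()` on a Pod whose container sets `runAsUser: 0` returns `allowed: False` together with the violation message, while the returned object still carries the injected default label.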
OPA Gatekeeper
Open Policy Agent (OPA) is a general-purpose policy engine — not Kubernetes-specific. Gatekeeper is the Kubernetes-native integration that runs OPA as a validating admission webhook, using ConstraintTemplate (the reusable rule definition in Rego) and Constraint (the parameterised application of that rule) CRDs.
OPA became a CNCF graduated project in 2021. Gatekeeper reached v3 stability in 2022 and is used at scale by Azure (Azure Policy for AKS), Google (GKE Policy Controller), and major enterprises like Goldman Sachs and Intuit.
flowchart TD
    User["kubectl apply"] --> API["K8s API Server"]
    API --> Webhook["Gatekeeper Webhook<br/>(Validating Admission)"]
    Webhook --> OPA["OPA Engine<br/>(Rego evaluation)"]
    OPA --> Templates["ConstraintTemplates<br/>(Rego rules)"]
    OPA --> Constraints["Constraints<br/>(Parameterised policy instances)"]
    OPA --> Data["Audit Cache<br/>(Replicated cluster state)"]
    OPA -->|Allow/Deny| API
    API --> etcd["etcd"]
    style User fill:#e8f4f4,stroke:#3B9797,color:#132440
    style API fill:#f0f4f8,stroke:#16476A,color:#132440
    style Webhook fill:#f0f4f8,stroke:#16476A,color:#132440
    style OPA fill:#132440,stroke:#132440,color:#ffffff
    style Templates fill:#e8f4f4,stroke:#3B9797,color:#132440
    style Constraints fill:#e8f4f4,stroke:#3B9797,color:#132440
    style Data fill:#fff5f5,stroke:#BF092F,color:#132440
    style etcd fill:#f0f4f8,stroke:#16476A,color:#132440
# Install Gatekeeper v3
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.16.0/deploy/gatekeeper.yaml
# Verify
kubectl get pods -n gatekeeper-system
kubectl get crd | grep gatekeeper
Writing Rego Policies
Rego is OPA's purpose-built policy language — declarative, logic-based, and designed for evaluating structured data (JSON/YAML). It feels like Prolog meets SQL. The learning curve is steeper than YAML but gives you full expressive power for complex policies.
# constrainttemplate-require-labels.yaml
# Step 1: Define the reusable rule (ConstraintTemplate)
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
              description: "List of labels that must be present"
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        # Deny if any required label is missing
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Missing required labels: %v", [missing])
        }
# constraint-require-team-label.yaml
# Step 2: Apply the rule with specific parameters
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  enforcementAction: deny   # deny | dryrun | warn
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
      - apiGroups: [""]
        kinds: ["Service"]
    namespaceSelector:
      matchExpressions:
        - key: environment
          operator: In
          values: ["production", "staging"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
  parameters:
    labels:
      - "team"
      - "cost-center"
      - "tier"
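For readers new to Rego, the required-labels rule boils down to a set difference between the labels the Constraint demands and the labels the object carries. The following Python sketch mirrors that logic under the assumption that the object and parameters arrive as plain dicts; the function names are illustrative and not part of Gatekeeper.

```python
def missing_labels(review_object, required):
    """Return the required label keys that the object does not carry."""
    metadata = review_object.get("metadata") or {}
    provided = set(metadata.get("labels") or {})
    return set(required) - provided


def violations(review_object, parameters):
    """Mirror the Rego rule: one violation message when any label is missing."""
    missing = missing_labels(review_object, parameters["labels"])
    if not missing:
        return []
    return [f"Missing required labels: {sorted(missing)}"]
```

With the parameters from the Constraint (`team`, `cost-center`, `tier`), a Deployment labelled only with `team` and `tier` yields a single violation naming `cost-center`.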
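A more complex example restricts container images to approved registries. Per-namespace exceptions are not written in the Rego itself; they are expressed through the match block of each Constraint that uses the template: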
# constrainttemplate-allowed-repos.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not image_allowed(container.image)
          msg := sprintf("Container '%v' uses disallowed image '%v'. Allowed repos: %v",
            [container.name, container.image, input.parameters.repos])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not image_allowed(container.image)
          msg := sprintf("Init container '%v' uses disallowed image '%v'. Allowed repos: %v",
            [container.name, container.image, input.parameters.repos])
        }

        image_allowed(image) {
          repo := input.parameters.repos[_]
          startswith(image, repo)
        }
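Because `image_allowed` is a plain prefix match, the exact strings passed as `repos` matter. The sketch below reproduces the same check in Python to show one consequence: a repo entry without a trailing slash also matches look-alike hostnames. The registry names are hypothetical.

```python
def image_allowed(image, repos):
    """Same check as the Rego helper: the image must start with an allowed prefix."""
    return any(image.startswith(repo) for repo in repos)


# Without a trailing slash the prefix also matches a different host.
LOOSE = ["registry.corp.com"]
# With a trailing slash the prefix ends at the host boundary.
STRICT = ["registry.corp.com/"]

LOOKALIKE = "registry.corp.com.evil.example/app:1.0"  # hypothetical attacker host
LEGIT = "registry.corp.com/payments/api:1.0"
```

Here `image_allowed(LOOKALIKE, LOOSE)` is true while `image_allowed(LOOKALIKE, STRICT)` is false, so ending every entry in `repos` with `/` is the safer convention.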
Kyverno
Kyverno (Greek for "govern") takes a radically different approach: policies are written in native Kubernetes YAML, with no separate language to learn. It joined CNCF as a sandbox project in 2020, reached incubating in 2022, and graduated in 2024. Kyverno has grown quickly within the CNCF ecosystem and is particularly popular with teams that find Rego's learning curve too steep.
Kyverno handles four policy types in one system: validate (accept/reject), mutate (inject defaults), generate (create companion resources), and verifyImages (cosign/sigstore verification).
# Install Kyverno via Helm
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno \
--namespace kyverno \
--create-namespace \
--set admissionController.replicas=3 \
--set backgroundController.replicas=2
# Verify
kubectl get pods -n kyverno
kubectl get crd | grep kyverno
Kyverno Validate Rules
Validation policies check whether a resource matches desired constraints. They can run in Enforce mode (block violations) or Audit mode (report but allow).
# policy-disallow-privileged.yaml
# Block privileged containers in all namespaces (except system ones)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
  annotations:
    policies.kyverno.io/title: Disallow Privileged Containers
    policies.kyverno.io/category: Pod Security Standards
    policies.kyverno.io/severity: high
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      Privileged containers bypass most kernel-level isolation.
      This policy blocks all privileged containers across the cluster.
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: deny-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kyverno
                - istio-system
      validate:
        message: >-
          Privileged mode is not allowed. The field
          spec.containers[*].securityContext.privileged must be unset or set to false.
        # =(...) is an equality anchor: the value is checked only if the field is present,
        # so Pods that omit securityContext or privileged are still admitted.
        pattern:
          spec:
            containers:
              - =(securityContext):
                  =(privileged): "false"
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            =(ephemeralContainers):
              - =(securityContext):
                  =(privileged): "false"
# policy-require-resource-limits.yaml
# Ensure every container specifies CPU and memory limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
  annotations:
    policies.kyverno.io/title: Require Resource Limits
    policies.kyverno.io/category: Best Practices
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "All containers must specify CPU and memory limits."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
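The `"?*"` pattern in the limits policy reads as "at least one character": `?` matches exactly one character and `*` matches zero or more. The sketch below approximates that check in Python using `fnmatch`, whose `?` and `*` behave the same way for these simple values. It is an illustration of the semantics, not Kyverno's matcher, and the helper name is made up.

```python
from fnmatch import fnmatchcase

# Same patterns as the policy: each limit must be present and non-empty.
REQUIRED_LIMITS = {"memory": "?*", "cpu": "?*"}


def containers_missing_limits(pod):
    """Return the names of containers that would fail the limits pattern."""
    failing = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        for key, pattern in REQUIRED_LIMITS.items():
            value = limits.get(key)
            # A missing key or an empty string does not match "?*".
            if value is None or not fnmatchcase(str(value), pattern):
                failing.append(container["name"])
                break
    return failing
```

A container with only a memory limit, one with an empty string for memory, and one with no `resources` block at all are all reported, which matches the intent of the policy message.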
Kyverno Mutate & Generate
Mutate policies inject or modify fields before the resource is persisted — perfect for adding defaults that teams would otherwise forget. Generate policies create companion resources automatically when a trigger resource appears.
# policy-add-default-limits.yaml
# Mutate: inject default resource limits if none specified
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resource-limits
spec:
  rules:
    - name: set-memory-limits
      match:
        any:
          - resources:
              kinds: [Pod]
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"
                resources:
                  limits:
                    +(memory): "512Mi"
                    +(cpu): "500m"
                  requests:
                    +(memory): "128Mi"
                    +(cpu): "100m"
---
# policy-generate-networkpolicy.yaml
# Generate: create a default-deny NetworkPolicy for every new namespace
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-deny-netpol
spec:
  rules:
    - name: default-deny
      match:
        any:
          - resources:
              kinds: [Namespace]
      exclude:
        any:
          - resources:
              names: ["kube-*", "kyverno", "istio-system", "flux-system"]
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        synchronize: true   # Keep in sync: recreate if deleted
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
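The `+(key)` anchors in the mutate policy mean "add this value only if the key is not already set", so values chosen by the workload author are never overwritten. A minimal Python model of that behaviour follows, assuming the Pod is a plain dict; it imitates the anchor semantics rather than Kyverno's strategic merge patch engine.

```python
from copy import deepcopy

# Same defaults as the add-default-resource-limits policy.
DEFAULTS = {
    "limits": {"memory": "512Mi", "cpu": "500m"},
    "requests": {"memory": "128Mi", "cpu": "100m"},
}


def add_default_resources(pod):
    """Return a copy of the Pod with missing limits and requests filled in."""
    patched = deepcopy(pod)
    for container in patched["spec"]["containers"]:
        resources = container.setdefault("resources", {})
        for section, values in DEFAULTS.items():
            target = resources.setdefault(section, {})
            for key, value in values.items():
                # setdefault mirrors +(key): existing values win over defaults.
                target.setdefault(key, value)
    return patched
```

A container that already declares `limits.memory: 1Gi` keeps that value and only receives the missing `cpu` limit and the default requests.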
Automated Tenant Onboarding with Kyverno Generate
A European bank uses a single Kyverno generate policy suite to handle the entire tenant onboarding workflow. When a new namespace is labelled tenant: true, Kyverno generates: (1) a default-deny NetworkPolicy, (2) a ResourceQuota based on the tier label, (3) a LimitRange, (4) an RBAC RoleBinding for the team's AD group, and (5) a ServiceMonitor for the observability stack. What previously required a 12-step Terraform module and two Jira tickets now happens in seconds — simply by creating a labelled namespace.
Image Verification
Both OPA Gatekeeper (via external data) and Kyverno (natively) can verify container image signatures and attestations. Kyverno's native verifyImages is the more ergonomic implementation — it checks cosign signatures directly without needing an external webhook.
# policy-verify-images.yaml
# Only allow images signed by the corporate signing key
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
  annotations:
    policies.kyverno.io/title: Verify Image Signatures
    policies.kyverno.io/category: Supply Chain Security
    policies.kyverno.io/severity: high
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "registry.corp.com/*"
            - "ghcr.io/my-org/*"
          attestors:
            - count: 1
              entries:
                - keys:
                    publicKeys: |
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
          attestations:
            - type: https://slsa.dev/provenance/v1
              conditions:
                - all:
                    - key: "{{ builder.id }}"
                      operator: Equals
                      value: "https://github.com/slsa-framework/slsa-github-generator/.github/workflows/builder_go_slsa3.yml@refs/tags/v1.7.0"
    - name: verify-sbom-attestation
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "registry.corp.com/*"
          attestations:
            - type: https://spdx.dev/Document
              conditions:
                - all:
                    - key: "{{ creationInfo.created }}"
                      operator: NotEquals
                      value: ""
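Two details of the verifyImages rule are easy to miss: only images matching an `imageReferences` glob are verified at all, and `count` sets how many attestor entries must succeed (all of them when `count` is omitted). The sketch below models both in Python. It performs no cryptography; the boolean results stand in for signature checks, the function names are illustrative, and `fnmatch` only approximates Kyverno's wildcard matching.

```python
from fnmatch import fnmatchcase

# Same globs as the verify-cosign-signature rule.
IMAGE_REFERENCES = ["registry.corp.com/*", "ghcr.io/my-org/*"]


def needs_verification(image, references=IMAGE_REFERENCES):
    """True if the rule applies to this image; other images are not checked by it."""
    return any(fnmatchcase(image, ref) for ref in references)


def attestor_set_satisfied(entry_results, count=None):
    """entry_results holds one boolean per attestor entry (True = verified)."""
    required = len(entry_results) if count is None else count
    return sum(1 for ok in entry_results if ok) >= required
```

Since `needs_verification("docker.io/library/nginx:latest")` is false, an image from an unlisted registry passes this rule untouched, which is why signature verification is usually paired with an allowed-registries policy.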
OPA Gatekeeper vs Kyverno
| Dimension | OPA Gatekeeper | Kyverno |
|---|---|---|
| Policy language | Rego (purpose-built, logic-based) | Native Kubernetes YAML |
| Learning curve | Steep (new language) | Gentle (familiar YAML patterns) |
| Validation | Yes | Yes |
| Mutation | Yes (v3.7+) | Yes (first-class) |
| Generation | No (need separate controller) | Yes (native) |
| Image verification | Via external data provider | Native verifyImages |
| Audit/reporting | Built-in constraint status | PolicyReport CRD (standard) |
| CLI testing | opa test + gator | kyverno apply / kyverno test |
| Multi-cluster | Replicate CRDs via GitOps | Replicate CRDs via GitOps |
| Expressiveness ceiling | Very high (full Rego query language; deliberately not Turing-complete) | High (CEL expressions + JMESPath) |
| CNCF status | Graduated (OPA) | Graduated |
| Best fit | Complex cross-resource rules; teams with Rego expertise | Teams preferring YAML; need mutation + generation |
Enterprise Patterns
Policy Library & GitOps Distribution
At enterprise scale, policies are managed as a Git repository of reusable templates distributed to every cluster via Flux or Argo CD. The pattern mirrors application GitOps: a policies/ directory per cluster, sourced from a central library.
# fleet-policies/base/kustomization.yaml
# Base policy library: shared across all clusters
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - disallow-privileged.yaml
  - require-resource-limits.yaml
  - require-labels.yaml
  - allowed-registries.yaml
  - default-deny-netpol-generate.yaml
  - verify-image-signatures.yaml
  - disallow-latest-tag.yaml
  - require-probes.yaml
---
# fleet-policies/clusters/prod-eu/kustomization.yaml
# Production EU overlay: stricter enforcement
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - patch: |
      - op: replace
        path: /spec/validationFailureAction
        value: Enforce
    target:
      kind: ClusterPolicy
      annotationSelector: policies.kyverno.io/severity=high
---
# fleet-policies/clusters/dev/kustomization.yaml
# Dev overlay: audit only (don't block developers)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - patch: |
      # "add" overwrites an existing value and, unlike "replace", does not fail
      # on policies that omit the field (for example mutate or generate policies).
      - op: add
        path: /spec/validationFailureAction
        value: Audit
    target:
      kind: ClusterPolicy
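The effect of the two overlays can be reasoned about as a function from the base library to a per-cluster policy set: pick the policies a selector matches, then set their enforcement action. The Python sketch below models that selection on plain dicts. It is only an aid for thinking about overlay behaviour, not a replacement for `kustomize build`, and the function name is hypothetical.

```python
from copy import deepcopy

SEVERITY = "policies.kyverno.io/severity"


def apply_overlay(policies, action, severity=None):
    """Set validationFailureAction on matching ClusterPolicies.

    severity=None matches every ClusterPolicy (the dev overlay);
    severity="high" matches only high-severity ones (the prod-eu overlay).
    """
    result = []
    for policy in policies:
        policy = deepcopy(policy)  # leave the base library untouched
        annotations = (policy.get("metadata") or {}).get("annotations") or {}
        matches = severity is None or annotations.get(SEVERITY) == severity
        if policy.get("kind") == "ClusterPolicy" and matches:
            policy.setdefault("spec", {})["validationFailureAction"] = action
        result.append(policy)
    return result
```

Applied to a base where everything defaults to Audit, the prod-eu overlay flips only the high-severity policies to Enforce and leaves medium-severity ones reporting.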
Policy Exception Workflow
Every policy system needs an escape hatch. Kyverno v1.9 introduced PolicyException — a namespaced resource that grants specific workloads an exception from specific rules, with an audit trail.
# exception-legacy-payments.yaml
# Grant an exception for the legacy payments pod that requires privileged access
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: legacy-payments-privileged
  namespace: payments
  annotations:
    exception.kyverno.io/reason: "Legacy PCI HSM driver requires privileged access. Migration to unprivileged driver tracked in JIRA-4521, ETA Q3 2026."
    exception.kyverno.io/approved-by: "security-team@corp.com"
    exception.kyverno.io/expires: "2026-09-30"
spec:
  exceptions:
    - policyName: disallow-privileged-containers
      ruleNames:
        - deny-privileged
  match:
    any:
      - resources:
          kinds: [Pod]
          namespaces: [payments]
          names: ["payments-hsm-*"]
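The reason, approved-by, and expires annotations provide the audit trail, but this sketch assumes they are a team convention that nothing enforces automatically. Under that assumption, a small scheduled check could flag exceptions whose expiry date has passed. The helper below is hypothetical and works on exception manifests already loaded as dicts (for example from `kubectl get policyexception -A -o json`).

```python
from datetime import date

EXPIRES = "exception.kyverno.io/expires"  # annotation key used in the manifest above


def expired_exceptions(exceptions, today):
    """Return 'namespace/name' for every exception whose expiry date is before today."""
    expired = []
    for exc in exceptions:
        metadata = exc.get("metadata") or {}
        raw = (metadata.get("annotations") or {}).get(EXPIRES)
        if raw is None:
            continue  # no expiry recorded; a stricter check could flag this too
        if date.fromisoformat(raw) < today:
            expired.append(f"{metadata.get('namespace')}/{metadata.get('name')}")
    return expired
```

Run on the manifest above with a date after 2026-09-30, the check reports `payments/legacy-payments-privileged`, giving the security team a prompt to renew or remove the exception.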
Troubleshooting
OPA Gatekeeper
# Check constraint violations (audit mode findings)
kubectl get constraints
kubectl describe k8srequiredlabels require-team-label
# See what the audit controller found
kubectl get k8srequiredlabels require-team-label -o jsonpath='{.status.violations}' | jq .
# Test a resource against constraints locally
gator test -f my-deployment.yaml -f constraints/ -f templates/
# Check Gatekeeper controller logs
kubectl logs -n gatekeeper-system deploy/gatekeeper-controller-manager
# Verify webhook is registered
kubectl get validatingwebhookconfigurations | grep gatekeeper
Kyverno
# Check policy reports (which resources are violating)
kubectl get policyreport -A
kubectl get clusterpolicyreport
kubectl describe policyreport -n payments
# Test a policy against a resource locally (no cluster needed)
kyverno apply policy.yaml --resource deployment.yaml
# Run the full test suite
kyverno test ./policies/tests/
# Check admission controller logs
kubectl logs -n kyverno deploy/kyverno-admission-controller -f
# Why was my resource rejected?
kubectl get events -n payments --field-selector reason=PolicyViolation
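# Request a background rescan (annotation-driven; verify that your Kyverno version
# honors it, otherwise scans run on the configured background scan interval)
kubectl annotate clusterpolicy require-resource-limits \
  policies.kyverno.io/trigger=scan --overwrite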
Common pitfalls:
- Policy blocks kube-system resources: Always exclude system namespaces. Blocking CoreDNS or kube-proxy is a cluster-breaking event.
- Webhook timeout: Kubernetes defaults admission webhook timeouts to 10s and caps them at 30s. Complex Rego or image verification can exceed the default. Increase webhookTimeoutSeconds or use background scan for expensive checks.
- Mutation ordering: When multiple mutate policies target the same resource, order matters. Rules within a single policy apply in the order they are defined, so keeping dependent mutations in one policy is the most predictable option. Kyverno processes policies alphabetically by name, so prefix names with zero-padded numbers if ordering across policies is critical (e.g., 01-add-labels, 02-add-limits).
- Generate + synchronize loops: If a generated resource triggers another policy that modifies it, you get an infinite reconciliation loop. Use preconditions to break cycles.
- False sense of security: Admission controllers only evaluate at create/update time. Resources that existed before the policy was installed remain non-compliant. Run background audits to catch pre-existing violations.
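On the mutation-ordering pitfall: if processing order follows policy names as described, the numeric prefixes need zero-padding, because name ordering is lexicographic rather than numeric. The short Python sketch below shows the difference; the policy names and the helper are hypothetical.

```python
import re


def zero_pad_prefix(name, width=2):
    """Rewrite a leading numeric prefix ('2-foo') to a fixed width ('02-foo')."""
    match = re.match(r"^(\d+)-(.*)$", name)
    if not match:
        return name  # names without a numeric prefix are left unchanged
    return f"{int(match.group(1)):0{width}d}-{match.group(2)}"


names = ["2-add-limits", "10-add-sidecar", "1-add-labels"]

# Lexicographic order puts "10-..." before "2-...".
unpadded_order = sorted(names)

# Zero-padded prefixes sort in the intended numeric order.
padded_order = sorted(zero_pad_prefix(n) for n in names)
```

`unpadded_order` is `['1-add-labels', '10-add-sidecar', '2-add-limits']`, whereas `padded_order` is `['01-add-labels', '02-add-limits', '10-add-sidecar']`.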