Introduction

When dealing with governance in Kubernetes clusters, organizations need mechanisms beyond authentication and authorization to enforce specific security and operational requirements. These policies might include:

  • Preventing root-level access across all pods
  • Blocking unauthorized operations in critical namespaces
  • Enforcing container registry restrictions
  • Automatically injecting security contexts
  • Validating resource configurations against compliance standards

The solution lies in Kubernetes Admission Controllers—components that intercept API requests before resource persistence, enabling policy enforcement at the cluster level.

In a recent enterprise project requiring comprehensive security policy implementation, we conducted a thorough evaluation of available approaches. This investigation revealed three viable paths: building custom admission controllers, leveraging OPA Gatekeeper’s Rego-based policies, or adopting Kyverno’s YAML-native approach.

As we researched Kubernetes architecture, the power and flexibility of Admission Controllers became apparent: they are the foundation for robust cluster governance.

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to the persistence of the resource, but after the request is authenticated and authorized.

Kubernetes Admission Controllers - original image from kubernetes.io blog

If we want to control any action in Kubernetes, we can rely on Admission Controllers to intercept requests, analyze them, and check if they comply with the rules. If the rules are met, the request passes; otherwise, it is blocked.
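Conceptually, an admission decision is just a predicate over the incoming request. The following Go sketch models that decision (the type and rule here are illustrative, not the real Kubernetes API types):

```go
package main

import "fmt"

// admissionDecision models the outcome an admission controller returns.
type admissionDecision struct {
	Allowed bool
	Message string
}

// reviewRequest applies a simple rule: block anything targeting the
// "protected" namespace, allow everything else.
func reviewRequest(namespace string) admissionDecision {
	if namespace == "protected" {
		return admissionDecision{
			Allowed: false,
			Message: "operations in 'protected' namespace are not allowed",
		}
	}
	return admissionDecision{Allowed: true}
}

func main() {
	fmt.Println(reviewRequest("default").Allowed)   // true
	fmt.Println(reviewRequest("protected").Allowed) // false
}
```

We will build exactly this rule into a real validating webhook below.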

To recap, there are three viable approaches for implementing Kubernetes governance:

  • Building custom admission controllers from scratch
  • Leveraging existing policy engines like OPA Gatekeeper and Kyverno
  • Hybrid approaches combining multiple solutions

This post provides a comprehensive comparison of these approaches, with practical implementation examples and production deployment guidance.

Implementing Our Own Admission Controllers

In this section, we’ll create two admission webhooks:

  1. A validating webhook that blocks Pod create and update operations in the “protected” namespace
  2. A mutating webhook that ensures all container images use our Azure Container Registry

Prerequisites

  • Go 1.21+
  • Docker
  • kubectl access to a Kubernetes cluster
  • openssl (for generating certificates)

Project Structure

First, let’s create our project structure:

mkdir custom-admission-webhook
cd custom-admission-webhook

# Initialize Go module
go mod init custom-admission-webhook

# Create directories
mkdir -p pkg/webhook
mkdir -p deployments/k8s
mkdir certs

Step 1: Creating the Webhook Server

First, let’s create our main webhook package in pkg/webhook/webhook.go:

package webhook

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "strings"

    admissionv1 "k8s.io/api/admission/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type WebhookServer struct {
    server *http.Server
}

// Webhook Server parameters
const (
    acrURL = "contoso.acr.io"
)

// Create a new WebhookServer instance
func NewWebhookServer(port int) *WebhookServer {
    return &WebhookServer{
        server: &http.Server{
            Addr: fmt.Sprintf(":%v", port),
        },
    }
}

// Start webhook server
func (ws *WebhookServer) Start() error {
    mux := http.NewServeMux()
    mux.HandleFunc("/validate", ws.validate)
    mux.HandleFunc("/mutate", ws.mutate)
    ws.server.Handler = mux

    return ws.server.ListenAndServeTLS(
        "/etc/webhook/certs/tls.crt",
        "/etc/webhook/certs/tls.key",
    )
}

// Validation webhook handler
func (ws *WebhookServer) validate(w http.ResponseWriter, r *http.Request) {
    admissionReview := ws.parseAdmissionReview(w, r)
    if admissionReview == nil {
        return
    }

    allowed := true
    var result *metav1.Status

    // Check if operation is in protected namespace
    if admissionReview.Request.Namespace == "protected" {
        allowed = false
        result = &metav1.Status{
            Message: "Operations on pods in 'protected' namespace are not allowed",
        }
    }

    admissionResponse := admissionv1.AdmissionResponse{
        Allowed: allowed,
        Result:  result,
    }
    
    admissionReview.Response = &admissionResponse
    ws.sendAdmissionResponse(w, admissionReview)
}

// Mutation webhook handler
func (ws *WebhookServer) mutate(w http.ResponseWriter, r *http.Request) {
    admissionReview := ws.parseAdmissionReview(w, r)
    if admissionReview == nil {
        return
    }

    var pod corev1.Pod
    if err := json.Unmarshal(admissionReview.Request.Object.Raw, &pod); err != nil {
        http.Error(w, "Could not unmarshal pod", http.StatusBadRequest)
        return
    }

    // Create patch operations
    var patches []map[string]string
    
    for i, container := range pod.Spec.Containers {
        newImage := ws.mutateImageName(container.Image)
        if newImage != container.Image {
            patch := map[string]string{
                "op":    "replace",
                "path":  fmt.Sprintf("/spec/containers/%d/image", i),
                "value": newImage,
            }
            patches = append(patches, patch)
        }
    }

    // Marshal an empty array (not null) when no patches are needed
    if patches == nil {
        patches = []map[string]string{}
    }
    patchBytes, err := json.Marshal(patches)
    if err != nil {
        http.Error(w, "Could not marshal patches", http.StatusInternalServerError)
        return
    }
    
    admissionResponse := admissionv1.AdmissionResponse{
        Allowed: true,
        Patch:   patchBytes,
        PatchType: func() *admissionv1.PatchType {
            pt := admissionv1.PatchTypeJSONPatch
            return &pt
        }(),
    }

    admissionReview.Response = &admissionResponse
    ws.sendAdmissionResponse(w, admissionReview)
}

// Helper function to mutate image names
func (ws *WebhookServer) mutateImageName(image string) string {
    // If no registry specified, add ACR
    if !strings.Contains(image, "/") {
        return fmt.Sprintf("%s/%s", acrURL, image)
    }

    // If registry specified but not ACR, replace it
    parts := strings.SplitN(image, "/", 2)
    if !strings.Contains(parts[0], "acr.io") {
        return fmt.Sprintf("%s/%s", acrURL, parts[1])
    }

    return image
}

// Helper functions for parsing and sending admission reviews
func (ws *WebhookServer) parseAdmissionReview(w http.ResponseWriter, r *http.Request) *admissionv1.AdmissionReview {
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Could not read request body", http.StatusBadRequest)
        return nil
    }

    admissionReview := &admissionv1.AdmissionReview{}
    if err := json.Unmarshal(body, admissionReview); err != nil {
        http.Error(w, "Could not parse admission review", http.StatusBadRequest)
        return nil
    }

    return admissionReview
}

func (ws *WebhookServer) sendAdmissionResponse(w http.ResponseWriter, admissionReview *admissionv1.AdmissionReview) {
    response, err := json.Marshal(admissionReview)
    if err != nil {
        http.Error(w, fmt.Sprintf("Could not marshal response: %v", err), http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/json")
    w.Write(response)
}

Step 2: Creating the Main Application

Create main.go in the root directory:

package main

import (
    "log"
    "custom-admission-webhook/pkg/webhook"
)

func main() {
    ws := webhook.NewWebhookServer(8443)
    log.Fatal(ws.Start())
}

Step 3: Generate TLS Certificates

Create a script gen-certs.sh to generate the required certificates:

#!/bin/bash

# Generate CA key and certificate
openssl genrsa -out ca.key 2048
openssl req -new -x509 -days 365 -key ca.key -subj "/O=Admission Webhook Demo/CN=Admission Webhook Demo CA" -out ca.crt

# Generate server key and certificate signing request
openssl genrsa -out webhook-server-tls.key 2048
openssl req -new -key webhook-server-tls.key -subj "/O=Admission Webhook Demo/CN=admission-webhook.default.svc" -out webhook-server-tls.csr

# Sign the certificate. The SAN is required: Go-based clients such as the
# API server reject serving certificates that only carry a CN.
openssl x509 -req -days 365 -in webhook-server-tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
    -extfile <(printf "subjectAltName=DNS:admission-webhook.default.svc") -out webhook-server-tls.crt

# Create k8s secret
kubectl create secret tls webhook-server-tls \
    --cert=webhook-server-tls.crt \
    --key=webhook-server-tls.key \
    --namespace=default

Step 4: Create Kubernetes Manifests

Create the deployment manifest in deployments/k8s/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: admission-webhook
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: admission-webhook
  template:
    metadata:
      labels:
        app: admission-webhook
    spec:
      containers:
      - name: webhook
        image: nebrass/admission-webhook:latest
        ports:
        - containerPort: 8443
        volumeMounts:
        - name: webhook-tls
          mountPath: /etc/webhook/certs
          readOnly: true
      volumes:
      - name: webhook-tls
        secret:
          secretName: webhook-server-tls
---
apiVersion: v1
kind: Service
metadata:
  name: admission-webhook
  namespace: default
spec:
  ports:
  - port: 443
    targetPort: 8443
  selector:
    app: admission-webhook

Create the webhook configurations in deployments/k8s/webhooks.yaml:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy-webhook
webhooks:
- name: pod-policy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: admission-webhook
      namespace: default
      path: "/validate"
    caBundle: ${CA_BUNDLE}
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]
    scope: "Namespaced"
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: image-mutating-webhook
webhooks:
- name: image-mutating.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: admission-webhook
      namespace: default
      path: "/mutate"
    caBundle: ${CA_BUNDLE}
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"

Step 5: Build and Deploy

Create a Dockerfile:

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o webhook

FROM alpine:3.18
WORKDIR /app
COPY --from=builder /app/webhook .
ENTRYPOINT ["/app/webhook"]

Deploy everything:

# Generate certificates
./gen-certs.sh

# Build and push Docker image
docker build -t nebrass/admission-webhook:latest .
docker push nebrass/admission-webhook:latest

# Replace CA_BUNDLE in webhook configurations
export CA_BUNDLE=$(base64 < ca.crt | tr -d '\n')
# For Linux:
sed -i "s/\${CA_BUNDLE}/${CA_BUNDLE}/g" deployments/k8s/webhooks.yaml
# For macOS:
# sed -i '' "s/\${CA_BUNDLE}/${CA_BUNDLE}/g" deployments/k8s/webhooks.yaml

# Apply Kubernetes manifests
kubectl apply -f deployments/k8s/deployment.yaml
kubectl apply -f deployments/k8s/webhooks.yaml

Testing the Webhooks

Test the validating webhook:

# test-pod-protected.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: protected
spec:
  containers:
  - name: nginx
    image: nginx:latest
# This should be blocked
kubectl apply -f test-pod-protected.yaml

Test the mutating webhook:

# test-pod-mutation.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:latest
  - name: redis
    image: registry.example.com/redis:latest
# Apply and check the modified images
kubectl apply -f test-pod-mutation.yaml
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].image}'

The output should show both images now using the contoso.acr.io registry.


Using a Policy Engine

While the custom admission controller approach demonstrates the fundamental concepts and gives you the most flexibility, it comes with significant development and maintenance overhead. Production deployments often benefit from purpose-built policy engines: ready-to-use solutions that deploy directly onto your Kubernetes clusters and handle the complexities of integrating with the admission controller system, maintaining webhook services, and providing a framework for defining and managing policies.

At their core, policy engines implement the Kubernetes admission webhook interface, but abstract away the complexities involved in writing, deploying, and maintaining custom webhooks. They typically provide:

  1. Policy Definition Framework: A structured way to define policies (rules)
  2. Policy Distribution: Methods to distribute policies across clusters
  3. Policy Enforcement: Integration with Kubernetes admission control
  4. Reporting: Visibility into policy violations and compliance
  5. Testing: Tools to validate policies before enforcement

Let’s explore two of the most popular policy engines: OPA Gatekeeper and Kyverno.


OPA Gatekeeper: The Veteran

As the first major policy engine for Kubernetes, OPA Gatekeeper has established itself as a mature solution with broad ecosystem support.

Open Policy Agent (OPA) Gatekeeper is a policy controller for Kubernetes that enforces policies defined using the Rego language. Gatekeeper combines the OPA policy engine with a specialized Kubernetes controller to provide a robust policy framework.

Architecture Deep Dive

Gatekeeper consists of several components:

  1. Webhook Server: Intercepts admission requests to the Kubernetes API server
  2. Controller Manager: Manages policy lifecycles and synchronization
  3. Audit: Periodically evaluates existing resources against policies
  4. Policy Engine: The core OPA engine that evaluates Rego policies

When a request hits the Kubernetes API server, it flows through these stages:

  Authentication → Authorization → Admission Control (Gatekeeper) → Object Schema Validation → Persistence

Gatekeeper integrates at the admission control stage, receiving the admission request, evaluating it against defined policies, and either allowing, denying, or modifying the request before it reaches persistence.

Understanding Rego

Rego is the policy language used by OPA. It’s a declarative language specifically designed for policy definitions, with roots in Datalog. Some key concepts:

  • Rules: Define policy decisions and evaluations
  • Documents: Structured data being evaluated
  • Packages: Group related rules
  • Imports: Include functionality from other packages
  • Virtual Documents: Computed values

Here’s a simple Rego policy that demonstrates its logic:

package example

# Default deny
default allow = false

# Allow if conditions are met
allow {
    input.user == "admin"
    input.action == "read"
}

allow {
    input.user == "operator"
    input.resource == "metrics"
    input.action == "read"
}

This policy implements a default-deny approach, only allowing access when specific conditions are met. The allow rule is true if any of the defined conditions match, implementing an OR relationship between rule blocks.
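The same OR semantics can be mirrored in ordinary code. Here is a Go sketch of the decision table from the Rego example above (the input fields are the ones used in that policy):

```go
package main

import "fmt"

type input struct {
	User, Action, Resource string
}

// allow mirrors the Rego policy: each if-clause corresponds to one
// "allow" block, and the overall result is the OR of the clauses.
func allow(in input) bool {
	if in.User == "admin" && in.Action == "read" {
		return true
	}
	if in.User == "operator" && in.Resource == "metrics" && in.Action == "read" {
		return true
	}
	return false // default deny
}

func main() {
	fmt.Println(allow(input{User: "admin", Action: "read"}))                         // true
	fmt.Println(allow(input{User: "operator", Resource: "metrics", Action: "read"})) // true
	fmt.Println(allow(input{User: "operator", Action: "write"}))                     // false
}
```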

Installing OPA Gatekeeper

Let’s walk through the installation process of Gatekeeper in detail:

# Install Gatekeeper using Helm
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update
helm install gatekeeper gatekeeper/gatekeeper --namespace gatekeeper-system --create-namespace

Under the hood, this Helm chart deploys:

  1. A validating webhook configuration
  2. The Gatekeeper controller deployment
  3. Required CRDs for constraint templates and constraints
  4. RBAC permissions for Gatekeeper to access Kubernetes resources

You can verify the installation by checking the deployed pods:

kubectl get pods -n gatekeeper-system

To understand what’s happening behind the scenes, let’s examine the validating webhook configuration:

kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration -o yaml

This output shows how Gatekeeper registers itself to intercept API requests for evaluation.

Creating a Constraint Template

Gatekeeper’s policy system takes a two-tier approach that separates policy logic from its application, using two custom resources that work together:

  1. ConstraintTemplate: Defines the policy logic in Rego
  2. Constraint: Instance of a template with specific parameters

The template defines the schema and the Rego code that implements the policy logic. Think of it as a class definition in programming, while constraints are instances of that class.

Let’s create a template that enforces our ACR policy with detailed explanations:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sacrregistry
spec:
  crd:
    # This section defines the CRD that will be created
    spec:
      names:
        kind: K8sAcrRegistry  # This defines the kind that constraints will use
      validation:
        # OpenAPI schema for the parameters
        openAPIV3Schema:
          properties:
            registry:
              type: string  # Constraints will provide a string parameter named 'registry'
  targets:
    # This identifies what this template targets
    - target: admission.k8s.gatekeeper.sh
      # The Rego policy
      rego: |
        package k8sacrregistry
        
        # Define a violation if container images don't use the required registry
        violation[{"msg": msg}] {
          # Extract container from input
          container := input.review.object.spec.containers[_]
          # Check if the image starts with the required registry
          not startswith(container.image, input.parameters.registry)
          # Construct the violation message
          msg := sprintf("Container image '%v' doesn't use the required registry '%v'", [container.image, input.parameters.registry])
        }
        
        # Repeat the same check for initContainers
        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not startswith(container.image, input.parameters.registry)
          msg := sprintf("InitContainer image '%v' doesn't use the required registry '%v'", [container.image, input.parameters.registry])
        }

A Rego policy works by defining a violation rule that will be non-empty if the policy is violated. The violation rule outputs a set of objects with a msg field describing each violation. In this case, we’re checking if container images start with the specified registry.

The key parts of this policy:

  1. Package Declaration: package k8sacrregistry names the policy
  2. Rule Definition: violation[{"msg": msg}] defines the rule structure
  3. Variable Binding: container := input.review.object.spec.containers[_] iterates through containers
  4. Condition: not startswith(container.image, input.parameters.registry) defines when there’s a violation
  5. Message Formatting: sprintf() creates a human-readable error message

The underscore in containers[_] is a special Rego syntax that iterates through all elements in the array.
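For readers more at home in imperative code, the violation rule behaves like this Go sketch: iterate over the images, and emit one message per image that fails the prefix check.

```go
package main

import (
	"fmt"
	"strings"
)

// violations mirrors the Rego rule: one message per container image
// that does not start with the required registry prefix.
func violations(images []string, registry string) []string {
	var msgs []string
	for _, image := range images {
		if !strings.HasPrefix(image, registry) {
			msgs = append(msgs, fmt.Sprintf(
				"Container image '%v' doesn't use the required registry '%v'",
				image, registry))
		}
	}
	return msgs
}

func main() {
	images := []string{"contoso.acr.io/app:v1", "nginx:latest"}
	for _, m := range violations(images, "contoso.acr.io") {
		fmt.Println(m) // only nginx:latest is flagged
	}
}
```

An empty result means the request is admitted; any non-empty result denies it, just as a non-empty violation set does in Rego.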

Creating a Constraint

Now, let’s apply the template with specific parameters:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAcrRegistry
metadata:
  name: require-acr-registry
spec:
  match:
    # Define which resources this constraint applies to
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    # Apply only to these namespaces
    namespaces:
      - "default"
      - "production"
    # You can also exclude namespaces or other resources
    excludedNamespaces:
      - "kube-system"
  parameters:
    # The registry parameter defined in the template
    registry: "contoso.acr.io"

This constraint applies the K8sAcrRegistry template (which we defined earlier) only to Pod resources in the “default” and “production” namespaces, with an exception for the “kube-system” namespace. The parameters section provides the registry value that will be passed to the template as input.parameters.registry.
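The namespace scoping rules compose as "included AND NOT excluded". A minimal Go sketch of that match logic, using the namespaces from the constraint above:

```go
package main

import "fmt"

// constraintApplies mirrors the match section of the constraint: the
// resource must live in an included namespace and not in an excluded one.
func constraintApplies(namespace string) bool {
	included := map[string]bool{"default": true, "production": true}
	excluded := map[string]bool{"kube-system": true}
	return included[namespace] && !excluded[namespace]
}

func main() {
	fmt.Println(constraintApplies("production"))  // true
	fmt.Println(constraintApplies("kube-system")) // false
	fmt.Println(constraintApplies("staging"))     // false: not in the included list
}
```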

Debugging Gatekeeper Policies

When a policy doesn’t work as expected, you can troubleshoot by:

  1. Checking the constraint status:

    kubectl get K8sAcrRegistry require-acr-registry -o yaml
    

    This shows the current state, including any violations.

  2. Examining Gatekeeper logs:

    kubectl logs -n gatekeeper-system -l control-plane=controller-manager
    
  3. Testing policies with the Rego Playground (https://play.openpolicyagent.org/) before deploying them.


Kyverno

As a newer entrant to the policy engine space, Kyverno takes a fundamentally different approach by embracing Kubernetes-native YAML instead of domain-specific languages.

Kyverno is a policy engine specifically designed for Kubernetes. Unlike OPA Gatekeeper, Kyverno uses a YAML-based policy language that aligns with Kubernetes’ native resource definitions. This design choice makes Kyverno particularly accessible to teams already familiar with Kubernetes manifests.

Kyverno Architecture

Kyverno is implemented as a Kubernetes dynamic admission controller, consisting of several components:

  1. Webhook Server: Registers with the Kubernetes API server to intercept admission requests
  2. Policy Controller: Manages policy lifecycle and status reporting
  3. Background Scanner: Periodically scans existing resources for policy violations
  4. Report Controller: Generates policy reports
  5. Webhooks: Two main webhooks - validating and mutating

The workflow when a request comes in:

  1. Kubernetes API server receives a request
  2. API server forwards the request to the Kyverno webhook server
  3. Kyverno evaluates the request against applicable policies
  4. Based on the policy type (validate, mutate, generate), Kyverno takes appropriate action
  5. The response is sent back to the API server for further processing

Policy Structure Deep Dive

A Kyverno policy consists of several components:

  1. Metadata: Standard Kubernetes resource metadata
  2. Spec: Contains policy configuration
    • Rules: The core of the policy, defining what actions to take
    • ValidationFailureAction: How to handle validation failures (enforce/audit)
    • Background: Whether to apply to existing resources
    • FailurePolicy: How to handle webhook failures

Each rule within a policy can have multiple components:

  • Match/Exclude: Define resources the rule applies to
  • Validate: Define validation rules
  • Mutate: Define mutation rules
  • Generate: Define resources to generate
  • VerifyImages: Define image verification rules

The pattern language used in Kyverno policies is designed to match the structure of Kubernetes resources, making it intuitive to define rules that target specific fields and values.

Installing Kyverno

Let’s install Kyverno with a detailed understanding of what’s happening:

# Install Kyverno using Helm
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
# Install with default configuration
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace

For production deployments, use custom Helm values:

# values-production.yaml
replicaCount: 3

resources:
  limits:
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 256Mi

config:
  webhooks:
    - namespaceSelector:
        matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values:
          - kube-system
          - kyverno

# Enable high availability
antiAffinity:
  enable: true

# Configure webhook timeouts
webhookAnnotations:
  admissionregistration.k8s.io/ca-injection-retry-count: "5"

# Deploy with production values
# helm install kyverno kyverno/kyverno -f values-production.yaml --namespace kyverno --create-namespace

This command installs:

  • The Kyverno deployment with the webhook server and controllers
  • Required CRDs for policies and reports
  • Service accounts and RBAC permissions
  • Webhook configurations

You can verify the installation:

# Check pods
kubectl get pods -n kyverno

# Check CRDs
kubectl get crds | grep kyverno

# Check webhook configurations
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep kyverno

Policy Types and JMESPath Expressions

Kyverno policies support three main action types:

  1. Validate: Check if resources meet specific criteria
  2. Mutate: Modify resources to comply with requirements
  3. Generate: Create additional resources based on triggers

Kyverno uses a combination of pattern matching and JMESPath expressions for complex policy definitions. JMESPath is a query language for JSON that allows you to extract and transform elements from JSON documents. In Kyverno, JMESPath enables complex conditional logic and data manipulation.

For example:

validate:
  message: "Deployment must have at least 3 replicas"
  deny:
    conditions:
      - key: "{{ request.object.spec.replicas || '0' }}"
        operator: LessThan
        value: 3

In this example, {{ request.object.spec.replicas || '0' }} is a JMESPath expression that:

  1. Accesses the replicas field from the request object
  2. Provides a default value of ‘0’ if the field is not present
  3. The condition then checks if this value is less than 3
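The "default if absent, then compare" logic is easy to state in code. A Go sketch (the function name is illustrative):

```go
package main

import "fmt"

// minReplicasViolated mirrors the JMESPath expression: if replicas is
// unset (nil), fall back to 0, then check whether the value is below
// the required minimum.
func minReplicasViolated(replicas *int, min int) bool {
	value := 0 // default when the field is absent
	if replicas != nil {
		value = *replicas
	}
	return value < min
}

func main() {
	one, three := 1, 3
	fmt.Println(minReplicasViolated(nil, 3))    // true: absent defaults to 0
	fmt.Println(minReplicasViolated(&one, 3))   // true
	fmt.Println(minReplicasViolated(&three, 3)) // false
}
```

Without the default, a Deployment that omits `replicas` would silently skip the check, which is exactly the loophole the `|| '0'` fallback closes.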

Testing Policies with Kyverno CLI

Before deploying policies to your cluster, you can test them locally using the Kyverno CLI:

# Install Kyverno CLI
# macOS
brew install kyverno

# Linux
curl -L https://github.com/kyverno/kyverno/releases/latest/download/kyverno-cli_linux_x86_64.tar.gz | tar -xz
sudo mv kyverno /usr/local/bin/

# Apply policies to resource manifests in dry-run mode
kyverno apply policy.yaml --resource resource.yaml

# Run declarative tests (kyverno test expects a directory containing a
# kyverno-test.yaml that declares the policies, resources, and expected results)
kyverno test .

The CLI is particularly useful for CI/CD integration:

# .github/workflows/policy-test.yml
- name: Test Kyverno Policies
  run: |
    kyverno apply policies/ --resource test-resources/
    kyverno test policies/

Creating a Simple Policy

Let’s create a policy to enforce our ACR registry requirement with a detailed explanation:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-acr-registry
spec:
  # Define how validation failures are handled
  validationFailureAction: Enforce  # 'Enforce' blocks non-compliant resources, 'Audit' only reports
  # Apply to existing resources
  background: true  # Check existing resources, not just new ones
  rules:
  - name: check-registry
    # Define which resources this rule applies to
    match:
      resources:
        kinds:
        - Pod
    # Define what to validate
    validate:
      # Message shown if validation fails
      message: "Image must use the contoso.acr.io registry"
      # Pattern to match against the resource
      pattern:
        spec:
          containers:
          - image: "contoso.acr.io/*"  # All images must start with this registry

This policy uses Kyverno’s pattern matching to require that all container images use the contoso.acr.io registry. The pattern works by checking if the actual resource matches the specified pattern. The asterisk (*) is a wildcard that matches any string, so contoso.acr.io/* matches any image that starts with contoso.acr.io/.
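The trailing-wildcard semantics amount to a prefix check. A Go sketch of that matching behavior (an approximation of Kyverno's matcher, not its actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// matchesPattern approximates Kyverno's trailing-wildcard match:
// "contoso.acr.io/*" matches any image that starts with the prefix
// before the '*'; a pattern without a wildcard must match exactly.
func matchesPattern(image, pattern string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(image, strings.TrimSuffix(pattern, "*"))
	}
	return image == pattern
}

func main() {
	fmt.Println(matchesPattern("contoso.acr.io/nginx:latest", "contoso.acr.io/*")) // true
	fmt.Println(matchesPattern("nginx:latest", "contoso.acr.io/*"))                // false
}
```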

Advanced Pattern Matching

Kyverno supports sophisticated pattern matching with wildcards, logical operators, and negation:

validate:
  pattern:
    spec:
      containers:
      - name: "*"  # Match any container name
        image: "contoso.acr.io/*"  # Must use ACR
        =(securityContext):
          =(runAsUser): "!0"  # If runAsUser is set, it must not be 0 (root)

The above pattern:

  1. Matches any container (name: "*")
  2. Requires the image to use contoso.acr.io
  3. Uses the =() equality anchor together with the "!" value operator so that securityContext.runAsUser, if present, must not be 0 (root)

Kyverno also supports logical operators in its pattern matching:

validate:
  pattern:
    spec:
      containers:
      - image: "contoso.acr.io/* | microsoft/*"  # Must use ACR OR the Microsoft registry

The | operator inside a pattern value represents a logical OR, requiring images to come from either contoso.acr.io or the microsoft registry.


Gatekeeper vs Kyverno: A Technical Deep Dive

Both OPA Gatekeeper and Kyverno are excellent policy engines, but understanding their fundamental differences helps you choose the right tool for your requirements. Let’s compare them across multiple dimensions.

The YAML philosophy vs domain-specific languages

YAML-Native Policy Development

Kyverno’s fundamental design philosophy centers on leveraging existing Kubernetes knowledge rather than introducing new languages. While OPA Gatekeeper requires learning Rego—a Prolog-inspired declarative language—Kyverno policies are written as standard Kubernetes YAML resources.

This approach dramatically reduces the learning curve for platform engineers already familiar with Kubernetes manifests. Teams can immediately begin writing effective policies without investing time in learning domain-specific languages.

Consider a policy requiring specific labels on pods. In Kyverno, this is expressed naturally:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-app-label
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Label 'app.kubernetes.io/name' is required"
      pattern:
        metadata:
          labels:
            app.kubernetes.io/name: "?*"

The same policy in Gatekeeper requires both a ConstraintTemplate defining the Rego logic and a Constraint resource applying it—significantly more complex for the same outcome.

Kyverno extends this simplicity with JMESPath expressions and over 50 custom functions for advanced logic, including:

  • Time operations and date calculations
  • Arithmetic functions and comparisons
  • String manipulation and pattern matching
  • External data integration
  • Cryptographic operations

This provides sufficient expressiveness for most policy requirements while maintaining readability and accessibility.
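As a small illustration of this expressiveness, a JMESPath expression can drive a deny condition (policy name and threshold are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: limit-replica-count   # illustrative name
spec:
  validationFailureAction: Audit
  rules:
  - name: check-replicas
    match:
      any:
      - resources:
          kinds:
          - Deployment
    validate:
      message: "Deployments should not exceed 10 replicas"
      deny:
        conditions:
          all:
          # JMESPath expression extracting the requested replica count
          - key: "{{ request.object.spec.replicas }}"
            operator: GreaterThan
            value: 10
```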


Advanced Kyverno Capabilities

Beyond basic validation, Kyverno offers sophisticated features that address real-world governance challenges in enterprise environments.

Mutation policies with surgical precision

Kyverno’s mutation capabilities go beyond simple field additions. The engine supports strategic merge patches with conditional anchors, enabling “if-then” logic that only modifies resources when specific conditions are met. The +() anchor adds a field only if it does not already exist, the =() equality anchor applies a change only when an existing field matches a given value, and the ^() existence anchor matches when at least one element of a list satisfies the pattern.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-security-defaults
spec:
  rules:
  - name: set-security-context
    match:
      any:
      - resources:
          kinds:
          - Pod
    mutate:
      patchStrategicMerge:
        spec:
          securityContext:
            +(runAsNonRoot): true
            +(fsGroup): 2000
          containers:
          - name: "*"
            securityContext:
              +(allowPrivilegeEscalation): false
              +(readOnlyRootFilesystem): true

This policy adds security defaults without overwriting existing configurations, essential for gradual security hardening in production environments.

Resource generation and synchronization

Generation policies create new resources triggered by cluster events, maintaining relationships between resources automatically. A common pattern generates NetworkPolicies for new namespaces:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-network-policy
spec:
  rules:
  - name: default-deny
    match:
      any:
      - resources:
          kinds:
          - Namespace
    exclude:
      any:
      - resources:
          namespaces:
          - kube-system
          - kyverno
    generate:
      synchronize: true
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: default-deny
      namespace: "{{request.object.metadata.name}}"
      data:
        spec:
          podSelector: {}
          policyTypes:
          - Ingress
          - Egress

The synchronize: true flag ensures generated resources remain in sync with the policy definition, preventing drift and unauthorized modifications.

Supply chain security with image verification

Kyverno natively integrates with Sigstore Cosign and Notary for cryptographic image verification, enforcing supply chain security at admission time:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-images
spec:
  rules:
  - name: check-signature
    match:
      any:
      - resources:
          kinds:
          - Pod
    verifyImages:
    - type: Cosign
      imageReferences:
      - "registry.io/production/*"
      attestors:
      - entries:
        - keyless:
            subject: "https://github.com/myorg/app/.github/workflows/release.yaml@refs/tags/*"
            issuer: "https://token.actions.githubusercontent.com"
      attestations:
      - type: "vuln/trivy"
        conditions:
        - all:
          - key: "{{ time_since('', '{{metadata.scanCompletedOn}}', '') }}"
            operator: LessThanOrEquals
            value: "24h"

This policy verifies both image signatures and vulnerability scan attestations, ensuring only recently scanned, signed images run in production.

Policy exceptions for real-world flexibility

PolicyException resources provide declarative exemptions without modifying core policies, essential for handling edge cases in production:

apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: legacy-app-exception
  namespace: production
spec:
  exceptions:
  - policyName: require-security-context
    ruleNames:
    - set-security-context
  match:
    any:
    - resources:
        kinds:
        - Pod
        names:
        - legacy-app-*

Performance Benchmarks and Production Metrics

Kyverno Performance Data

Recent Kyverno optimizations demonstrate significant performance improvements. Version 1.12 achieves average latency of 15.52ms for admission webhooks under moderate load, representing an 8X improvement from 127.95ms in earlier versions. These optimizations include:

  • Migration from JSON marshaling to in-memory Golang maps
  • Adoption of jsoniter for faster JSON processing
  • Dynamic webhook configuration reducing unnecessary API calls
  • Optimized policy matching algorithms

Stress Testing Results (500 virtual users, 10,000 iterations):

  • Average latency: <200ms with three replicas
  • Memory usage: Maximum 471Mi per pod
  • CPU usage: Peak 4.8 cores across all replicas
  • Throughput: >5,000 admission requests/second

Comparative Analysis

Resource efficiency comparisons show Kyverno generally uses less memory and CPU than Gatekeeper for equivalent policy complexity:

| Metric | Kyverno | OPA Gatekeeper |
|---|---|---|
| Average Memory/Pod | 256Mi | 384Mi |
| Admission Latency | 15.52ms | 28-45ms |
| CPU per 1K requests | 0.1 cores | 0.15 cores |
| Scaling Model | Stateless | Requires inventory |

Sources: Kubernetes SIG-Auth benchmarks, Nirmata performance testing, community user reports

Enterprise Scaling Characteristics

Kyverno’s architecture provides distinct advantages for large-scale deployments:

While Gatekeeper requires syncing all cluster information into memory for its inventory system, Kyverno’s stateless admission processing scales more efficiently. The dynamic webhook configuration means Kyverno only processes resources matching active policies, reducing unnecessary overhead.

Large-scale deployment benefits:

  • Reports Server offloads policy reports from etcd
  • Support for >10,000 policy reports with minimal API server impact
  • Independent background processing with configurable worker pools
  • Cleanup controller operates without blocking admission operations
  • Horizontal scaling of admission controllers without leader election

Production deployment examples:

  • Adevinta: 50+ clusters, 15,000+ nodes, reports 40% resource efficiency improvement
  • DoD Big Bang: Multi-cluster federation with 200+ policies
  • Compass Platform: Migration from Gatekeeper showing 60% reduction in policy management overhead

Kyverno vs Gatekeeper: technical decision matrix

The choice between Kyverno and OPA Gatekeeper depends on specific organizational needs and technical requirements. Here’s a comprehensive comparison:

| Aspect | Kyverno | OPA Gatekeeper |
|---|---|---|
| Learning Curve | Minimal - uses YAML/Kubernetes patterns | Steep - requires learning Rego language |
| Policy Complexity | Handles standard to moderate complexity well | Excels at highly complex computational logic |
| Mutation Support | Full-featured with strategic merge and JSON patches | Beta support with limitations |
| Resource Generation | Native support with synchronization | Not available |
| Image Verification | Built-in Cosign/Notary integration | Requires external tooling |
| Policy Exceptions | Native PolicyException resources | Manual policy modification required |
| Performance | 15.52ms avg latency, 256Mi memory | 28-45ms avg latency, 384Mi memory |
| Scaling Model | Stateless, dynamic webhook configuration | Inventory-based, higher memory requirements |
| Multi-platform | Kubernetes-focused | Part of broader OPA ecosystem |
| Enterprise Support | Nirmata Enterprise for Kyverno | Multiple vendors (Styra, Red Hat, Google) |
| Community | 280+ policies, 5k+ GitHub stars | 3k+ GitHub stars, broader OPA ecosystem |

When to Choose Kyverno

Kyverno is the optimal choice for most Kubernetes-focused organizations, particularly those prioritizing rapid deployment and team productivity.

Kyverno excels when your organization prioritizes rapid policy implementation with minimal training overhead. Teams already proficient in Kubernetes YAML can immediately write effective policies without learning new languages. Kyverno is particularly strong for organizations requiring comprehensive mutation capabilities, automatic resource generation, or native image verification.

Real-world migrations from Gatekeeper to Kyverno, such as Adevinta’s transition, cite improved resource efficiency, better mutation capabilities, and enhanced team productivity as key benefits. The platform’s native Kubernetes integration means policies work seamlessly with existing GitOps workflows, and the extensive library of 280+ community policies accelerates adoption.

Choose Kyverno for standard Kubernetes governance needs including Pod Security Standards enforcement, resource quota management, automatic sidecar injection, and supply chain security. The built-in policy exceptions mechanism handles edge cases elegantly without compromising security baselines.
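Pod Security Standards enforcement, for instance, needs no hand-written pattern at all thanks to Kyverno's built-in podSecurity subrule (a sketch; the policy name is illustrative and exclusions would be added for your environment):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: pod-security-restricted   # illustrative name
spec:
  validationFailureAction: Audit   # switch to Enforce after reviewing reports
  rules:
  - name: restricted
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      podSecurity:
        # apply the full "restricted" profile of the Pod Security Standards
        level: restricted
        version: latest
```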


When Gatekeeper Might Be Better

Despite Kyverno’s advantages, specific scenarios may warrant choosing OPA Gatekeeper:

Gatekeeper remains relevant for organizations with existing OPA investments across multiple platforms. If you’re already using OPA for Terraform validation, API gateway policies, or application-level authorization, maintaining language consistency with Rego provides operational benefits.

Complex scenarios requiring sophisticated computational logic—such as cost optimization algorithms, graph traversal for RBAC validation, or integration with machine learning models—benefit from Rego’s programming capabilities. Financial services organizations implementing complex compliance calculations or multi-step validation processes may find Gatekeeper’s expressiveness necessary.

Organizations with dedicated policy engineering teams who can invest in Rego expertise may prefer Gatekeeper’s power and flexibility, especially when policies need to work across Kubernetes, cloud APIs, and application layers uniformly.


Production Deployment Best Practices

Successful policy engine deployments require careful attention to high availability, performance, and operational considerations.

High Availability Architecture

For production deployments, implement these high availability practices:

Replica Configuration: Deploy Kyverno with minimum three replicas for the admission controller to ensure availability and distribute load. The admission controller processes webhooks in parallel without leader election, while background and reports controllers use leader election for consistency.
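With the official Kyverno Helm chart, this replica layout typically maps to values along these lines (key names follow recent chart versions; verify them against your chart release before applying):

```yaml
# values.yaml sketch for the Kyverno Helm chart
admissionController:
  replicas: 3        # no leader election; all replicas serve webhooks
backgroundController:
  replicas: 2        # leader election ensures a single active worker
reportsController:
  replicas: 2
```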

Resource Allocation: Configure appropriate resource limits based on your cluster size and policy complexity:

resources:
  limits:
    memory: 512Mi  # Increase from default 384Mi for high load
    # Avoid CPU limits for better performance
  requests:
    memory: 256Mi
    cpu: 100m

Webhook configuration and failure handling

Configure webhook failure policies based on criticality. Security-critical policies should use failurePolicy: Fail to block non-compliant resources, while policies in audit mode can use failurePolicy: Ignore to prevent disruption:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example-policy
spec:
  failurePolicy: Fail  # Secure by default
  webhookTimeoutSeconds: 10  # Adjust based on policy complexity

Namespace exclusions for system stability

Exclude critical system namespaces to prevent cluster lockout scenarios:

config:
  webhooks:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
        - kube-system
        - kyverno
        - calico-system

Monitoring and observability

Implement comprehensive monitoring using Prometheus metrics and Grafana dashboards. Key metrics include kyverno_admission_requests_total for request volume, kyverno_admission_review_duration_seconds for latency tracking, and kyverno_policy_results_total for policy evaluation outcomes. Enable distributed tracing with OpenTelemetry for complex policy debugging in production.
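The latency metric mentioned above is exposed as a histogram, so a Prometheus alerting rule can watch its tail latency directly. The following is an illustrative sketch; the threshold and alert name are examples, not recommendations:

```yaml
# Illustrative Prometheus alerting rule for Kyverno webhook latency
groups:
- name: kyverno
  rules:
  - alert: KyvernoHighAdmissionLatency
    expr: |
      histogram_quantile(0.99,
        sum(rate(kyverno_admission_review_duration_seconds_bucket[5m])) by (le)
      ) > 0.5
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Kyverno p99 admission review latency above 500ms"
```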

Policy optimization strategies

Optimize policy performance by avoiding wildcard matches, specifying exact operations needed, and structuring match logic with simple comparisons first:

# Optimized matching
match:
  any:
  - resources:
      kinds: ["Deployment", "StatefulSet"]  # Specific kinds
      namespaces: ["production"]  # Specific namespaces
      operations: ["CREATE", "UPDATE"]  # Only needed operations

Migration strategies

Organizations moving from Gatekeeper to Kyverno should adopt a phased approach. Deploy Kyverno alongside Gatekeeper initially, with new policies in audit mode. Kyverno provides mapping guides for over 50 common Gatekeeper policies, simplifying translation. The Compass platform successfully migrated by running both engines in parallel during transition, gradually moving policies while monitoring performance and compliance metrics.

For enterprises requiring both tools, namespace segmentation works well—use Kyverno for standard governance and Gatekeeper for complex computational policies. The DoD’s Big Bang project demonstrates successful parallel operation with careful webhook management and monitoring.

Recommendations for different scenarios

For most Kubernetes-focused organizations, Kyverno provides the best balance of functionality, performance, and ease of use. Its YAML-based approach, comprehensive feature set including mutation and generation, and native Kubernetes integration make it ideal for teams seeking rapid policy implementation without extensive training.

For organizations with complex policy logic requirements spanning multiple platforms, Gatekeeper’s Rego language and OPA ecosystem integration may justify the additional complexity. Evaluate whether the computational requirements truly exceed Kyverno’s capabilities before committing to the steeper learning curve.

For greenfield deployments, start with Kyverno unless you have specific requirements for Rego’s computational capabilities. The extensive community policy library, superior documentation, and lower operational overhead provide faster time-to-value.

For existing Gatekeeper users, evaluate migration if you’re experiencing challenges with mutation capabilities, resource efficiency, or team adoption. Kyverno’s growing ecosystem and active development make it an increasingly attractive alternative for Kubernetes-native policy management.

Troubleshooting Common Issues

Policy Not Working as Expected

Symptoms: Policies appear to be applied but don’t block or modify resources as intended.

Common Causes and Solutions:

  1. Incorrect resource matching:

    # Check policy status
    kubectl get cpol policy-name -o yaml
    # Look for conditions and status fields
    
  2. Webhook configuration issues:

    # Verify webhook registration
    kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
    # Check Kyverno pods status
    kubectl get pods -n kyverno
    
  3. Policy exceptions overriding rules:

    # List all policy exceptions
    kubectl get polex --all-namespaces
    

Performance Issues

High webhook latency symptoms:

  • Slow resource creation/updates
  • Timeout errors in kubectl commands
  • High CPU usage in Kyverno pods

Optimization strategies:

  1. Increase resource limits:

    resources:
      limits:
        memory: 1Gi
        cpu: 1000m
    
  2. Optimize policy matching:

    # Be specific about resource types and operations
    match:
      any:
      - resources:
          kinds: ["Deployment"]  # Instead of ["*"]
          operations: ["CREATE"]  # Instead of ["*"]
    
  3. Monitor controller resource usage:

    # Check CPU/memory consumption of Kyverno pods
    kubectl top pods -n kyverno
    

Image Verification Failures

Common issues with verifyImages policies:

  1. Certificate or keyless verification problems:

    # Check Kyverno logs for specific verification errors
    kubectl logs -n kyverno deployment/kyverno-admission-controller
    
  2. Network connectivity to registries:

    • Ensure Kyverno pods can reach container registries
    • Check for proxy configurations if needed
  3. Missing attestations:

    • Verify images have required signatures/attestations
    • Check issuer and subject configurations

Debugging Policy Logic

  1. Use kubectl explain for CRD schemas:

    kubectl explain clusterpolicy.spec.rules.validate.pattern
    
  2. Test policies with audit mode first:

    spec:
      validationFailureAction: Audit  # Use before Enforce
    
  3. Check policy reports for violations:

    kubectl get policyreports,clusterpolicyreports
    

Resource Generation Issues

When generate policies don’t create resources:

  1. Check generation controller logs:

    kubectl logs -n kyverno deployment/kyverno-background-controller
    
  2. Verify RBAC permissions:

    # Ensure Kyverno has permissions to create target resources
    kubectl auth can-i create networkpolicies --as=system:serviceaccount:kyverno:kyverno-background-controller
    
  3. Review UpdateRequest resources:

    kubectl get updaterequests -n kyverno
    


Conclusion

Implementing robust governance policies in Kubernetes is essential for production deployments. This comprehensive comparison of custom admission controllers, OPA Gatekeeper, and Kyverno reveals clear patterns for different organizational needs.

Kyverno emerges as the optimal choice for most Kubernetes-focused organizations. Its YAML-native approach eliminates the learning curve associated with domain-specific languages, while comprehensive features including mutation, generation, and image verification address real-world governance requirements. Performance benchmarks showing 15.52ms average latency and superior resource efficiency make it production-ready for enterprise deployments.

OPA Gatekeeper remains relevant for specific scenarios requiring complex computational logic or multi-platform policy consistency. Organizations with existing OPA investments or policies requiring sophisticated algorithmic decisions may justify the additional complexity of Rego.

Custom admission controllers provide maximum flexibility but require significant development and maintenance overhead. They’re best suited for highly specialized requirements that exceed the capabilities of policy engines.

The evidence from production deployments, performance benchmarks, and user adoption strongly indicates that Kyverno provides superior value for most Kubernetes policy management use cases. Its intuitive YAML-based approach, comprehensive functionality, and excellent performance characteristics make it the pragmatic choice for modern Kubernetes environments seeking robust, maintainable governance solutions.