A Kubernetes mutating webhook that overrides CPU requests

Author: Christophe Schmitz

TL;DR

If you work with Kubernetes, one day you will have a use case where you need to validate or mutate a resource sent to the Kubernetes API before it gets persisted. In my case, I want to override all pod CPU requests for the needs of my underspecced OpenShift “lab” cluster. If you need to validate or mutate a Kubernetes resource on the fly, read this post: it will debunk a few assumptions and give you what you need to get started. Otherwise, just bookmark this blog post and come back when you are ready.

Key concepts:

What are admission controllers?

Quoting the k8s documentation, “An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. Admission controllers may be validating, mutating, or both.”

Sounds cool.

I like to think of a validating admission controller like a fancy night club bouncer who checks the kind of shoes you are wearing and decides to let you in, or not.

A bouncer refusing entry to a dude in flip-flops. What was he thinking?

Wearing flip-flops (we call them thongs in Australia, how weird) would get you rejected, in other words, not validated.

I like to think of a mutating admission controller like the same bouncer giving hats to customers that want to join a hat-only party but didn’t bring their hat.

A bouncer giving a hat to a customer before they get into a hat-only party

Those customers get mutated with a red hat, before they get in. Now they look cool and ready to rock the party.

Going back to the first bouncer: if you paid close attention to the flip-flop image, and I know you did, you would have noticed that someone inside the club is already wearing flip-flops, right?

While this can be blamed on Dall-E 2, which failed to follow my prompt to the letter, it is also an opportunity to remind you: an admission webhook doesn't care about what's already inside your cluster, it only looks at resources that want to get in. So if you enable a webhook at time t1, everything that was created before t1 remains untouched. For my CPU requests webhook, that means I will have to do a rolling restart of some pods if I want their CPU requests overwritten.

Now, it’s unlikely someone (you, me) would ever write an admission controller: admission controllers are written in Go and compiled into the kube-apiserver. You may activate / deactivate the available admission controllers. The full list is available here: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#what-does-each-admission-controller-do Among this list, two are very important for us today: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. Those controllers call mutating or validating webhooks, and such a webhook is what we are going to write. It's easy, even I can do it, probably because I don't need to write it in Go.

What are admission webhooks?

Admission webhooks are simple custom HTTP callbacks that are triggered by the Kubernetes API server when handling API requests. In this blog, I will write a mutating admission webhook in Python that mutates the CPU requests of pods. Unlike admission controllers, which are part of the kube-apiserver and written in Go, admission webhooks can be written in any language, and are pretty straightforward. They just need to provide an API endpoint and manipulate JSON. Too easy!

The request received on the /mutate endpoint is a JSON AdmissionReview. For a pod creation, it may look like:

{
  "kind": "AdmissionReview",
  "apiVersion": "admission.k8s.io/v1",
  "request": {
    "uid": "0df28fbd-5f5f-11e8-bc74-36e6bb280816",
    "kind": {
      "group": "",
      "version": "v1",
      "kind": "Pod"
    },
    "namespace": "dummy",
    "operation": "CREATE",
    //...
    "object": { // the pod object
      "metadata": {
         //...
      },
      "spec": {
        // the pod spec, potentially defining some CPU requests
      },
      "status": {}
    }
  }
}
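As a rough illustration of what the webhook has to read from such a request, here is a minimal sketch using only the Python standard library. The summarize helper, the container names, and the CPU values are made up for the example; only the overall AdmissionReview shape comes from the request above.

```python
import json

# A trimmed-down AdmissionReview request, shaped like the example above.
# The container names and CPU values are hypothetical.
raw = """
{
  "kind": "AdmissionReview",
  "apiVersion": "admission.k8s.io/v1",
  "request": {
    "uid": "0df28fbd-5f5f-11e8-bc74-36e6bb280816",
    "kind": {"group": "", "version": "v1", "kind": "Pod"},
    "namespace": "dummy",
    "operation": "CREATE",
    "object": {
      "metadata": {},
      "spec": {
        "containers": [
          {"name": "app", "resources": {"requests": {"cpu": "500m"}}},
          {"name": "sidecar", "resources": {}}
        ]
      },
      "status": {}
    }
  }
}
"""


def summarize(review: dict) -> dict:
    """Pull out the few fields a CPU-request webhook cares about."""
    request = review["request"]
    containers = request["object"]["spec"].get("containers", [])
    return {
        "uid": request["uid"],  # echoed back in the response
        "operation": request["operation"],
        "namespace": request.get("namespace"),
        # None means the container defines no CPU request at all.
        "cpu_requests": {
            c["name"]: c.get("resources", {}).get("requests", {}).get("cpu")
            for c in containers
        },
    }


summary = summarize(json.loads(raw))
print(summary)
```

The uid is the one field that has to travel unchanged from the request to the response, which is why the sketch extracts it first.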

The response may look like:

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "0df28fbd-5f5f-11e8-bc74-36e6bb280816",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "some-base64-encoded-patch-array"
  }
}

where the patch is a base64-encoded JSON Patch array that could look like:

[
  {
    "op": "add",
    "path": "/spec/containers/0/resources/requests",
    "value": {}
  },
  {
    "op": "add",
    "path": "/spec/containers/0/resources/requests/cpu",
    "value": "0.001"
  },
  {
    "op": "replace",
    "path": "/spec/containers/1/resources/requests/cpu",
    "value": "0.001"
  }
]
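To see why the first "add" operation is needed, and what the base64 step amounts to, here is a small sketch. The apply_patch function is a toy written for this post, not a full JSON Patch (RFC 6902) implementation, and the pod dictionary is a made-up example with two containers.

```python
import base64
import copy
import json


def apply_patch(doc: dict, patch: list) -> dict:
    """Apply 'add' and 'replace' operations that target object members.

    Toy implementation: it walks dict keys and existing list indices,
    which is just enough to illustrate the patch shown above.
    """
    result = copy.deepcopy(doc)
    for op in patch:
        *parents, last = op["path"].lstrip("/").split("/")
        target = result
        for key in parents:
            # Numeric segments index into lists, the others into dicts.
            # A missing parent raises KeyError: parents must already exist.
            target = target[int(key)] if isinstance(target, list) else target[key]
        if op["op"] == "replace" and last not in target:
            raise KeyError(f"cannot replace missing member {op['path']}")
        target[last] = copy.deepcopy(op["value"])
    return result


# Hypothetical pod: the first container has no requests, the second has one.
pod = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {}},
            {"name": "sidecar", "resources": {"requests": {"cpu": "500m"}}},
        ]
    }
}

patch = [
    {"op": "add", "path": "/spec/containers/0/resources/requests", "value": {}},
    {"op": "add", "path": "/spec/containers/0/resources/requests/cpu", "value": "0.001"},
    {"op": "replace", "path": "/spec/containers/1/resources/requests/cpu", "value": "0.001"},
]

# What goes into the "patch" field of the response, and back again.
encoded = base64.b64encode(json.dumps(patch).encode("utf-8")).decode("utf-8")
decoded = json.loads(base64.b64decode(encoded))

patched = apply_patch(pod, decoded)
print(json.dumps(patched, indent=2))
```

Dropping the first operation makes the second one fail in this sketch, because /spec/containers/0/resources/requests does not exist yet when the cpu member is added.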

Why write a CPU request mutating webhook to set those CPU requests low?

I am running a small OpenShift lab cluster with limited CPU and memory. Every time I install an operator, such as GitOps, ACS, or ACM, it requests a certain amount of CPU and memory for some of its pods. The request is a guaranteed amount of resource for the pod, whether it's actually being used or not. There is a very good reason why those requests (and limits) are set: they guarantee that the pods will function correctly under moderate or heavy load. However, say I have 16 CPUs and, after installing a few operators, all 16 are already requested. There is then no more capacity to schedule new workloads with CPU requests, even if the cluster is actually idle, with say just 1 CPU being used. This quickly becomes limiting for a "lab" environment, which is why I wrote this admission webhook. In doing so, I am aware I am responsible for closely monitoring CPU usage. Clearly, the mutating webhook will need to be configured to ignore certain critical namespaces to protect critical components such as etcd and the API server.

Why not edit the deployments, statefulsets, and daemonsets instead of the pod?

Quite often, those resources are controlled by parent resources, and editing them may not stick. Pods, on the other hand, are usually created indirectly by deployments, statefulsets, and daemonsets, so editing pods on the fly with a mutating webhook is perfectly fine. In fact, it's a common strategy used to inject sidecar containers in pods, as done by the Istio project for example (that is, until the project gets rid of sidecar injection in an upcoming version), but you get the idea.

Why not use the ClusterResourceOverride operator?

This is a great operator that can mutate the ratio between limits and requests. However, I needed something a bit more aggressive. Besides, I anticipate I will need fine-grained control at a later stage, and the ClusterResourceOverride operator only lets you specify one override resource that is applied to selected namespaces: different namespaces can't have different overrides. Overall, I'll need more control.

How about memory?

Unlike CPU, if a pod doesn't have its minimum required memory allocated, it may start, but later run out of memory and be killed by the OOM killer. Think for example of a Java app with its heap size requirements. For those reasons, it's much more dangerous to rewrite memory requests on the fly. I will probably do it at some point, but I would need to be able to override memory requests differently for various containers.

Why not just remove the CPU requests of the pod?

It's probably valid. I guess by setting the request to 0.001, I'll be able to infer that any container with that value has been mutated by my webhook.
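A minimal sketch of that inference, assuming the pod is available as a plain dictionary. Both helper names are hypothetical. The comparison is done in millicores because, as far as I can tell, Kubernetes may report a request of 0.001 in its canonical form 1m, so a plain string comparison could miss it.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '0.001', '1m' or '2' to millicores.

    Only plain decimals and the 'm' (milli) suffix are handled here.
    """
    if quantity.endswith("m"):
        return Decimal(quantity[:-1])
    return Decimal(quantity) * 1000


def mutated_containers(pod: dict, marker: str = "0.001") -> list:
    """Return the names of containers whose CPU request equals the marker."""
    marker_value = cpu_to_millicores(marker)
    spec = pod["spec"]
    names = []
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        cpu = c.get("resources", {}).get("requests", {}).get("cpu")
        if cpu is not None and cpu_to_millicores(str(cpu)) == marker_value:
            names.append(c["name"])
    return names


# Hypothetical pod mixing mutated and untouched containers.
pod = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {"requests": {"cpu": "1m"}}},
            {"name": "db", "resources": {"requests": {"cpu": "500m"}}},
            {"name": "nothing", "resources": {}},
        ],
        "initContainers": [
            {"name": "init", "resources": {"requests": {"cpu": "0.001"}}},
        ],
    }
}

print(mutated_containers(pod))
```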

Writing and deploying the webhook in 5 easy steps.

  1. Write the HTTP API server. Very simple in Python, but choose your preferred language.
  2. Containerize this app. Simple too.
  3. Write and deploy the deployment / services. Simple.
  4. Get some certificates. Those can be injected, so very easy.
  5. Configure the webhook with a ValidatingWebhookConfiguration / MutatingWebhookConfiguration.

Let's get started.

1. Write the http api server.

I am using Python because I am not very smart. Bright people will of course use Clojure for an added challenge. The point is, it doesn't have to be written in Go, unlike what some blog posts out there seem to imply. The code should listen to POST requests, for example on the /mutate path, parse the AdmissionReview, and return an... AdmissionReview that now includes some patch statements. I am using the pydantic library to parse the JSON requests and FastAPI to manage the API requests. Finally, uvicorn is used as an embedded web server to serve the requests; uvicorn is so popular right now.


from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
import json
import os
import base64

app = FastAPI()

class Container(BaseModel):
    name: str
    resources: dict = {}

class PodSpec(BaseModel):
    containers: list[Container]
    initContainers: list[Container] = []

class Pod(BaseModel):
    spec: PodSpec

class AdmissionReviewRequest(BaseModel):
    request: dict

class AdmissionReviewResponse(BaseModel):
    uid: str
    allowed: bool
    patchType: str
    patch: str

class AdmissionReview(BaseModel):
    apiVersion: str
    kind: str
    response: AdmissionReviewResponse

def ensure_path(patch, base_path, keys):
    for i, key in enumerate(keys):
        path = f"{base_path}/{'/'.join(keys[:i+1])}"
        patch.append({
            "op": "add",
            "path": path,
            "value": {}
        })

@app.post("/mutate")
async def mutate_pod(admission_review: AdmissionReviewRequest):
    pod = Pod(**admission_review.request["object"])

    patches = []

    # Modify the CPU requests to 0.001 for all containers
    for i, container in enumerate(pod.spec.containers):
        base_path = f"/spec/containers/{i}/resources"
        if "requests" not in container.resources:
            ensure_path(patches, base_path, ["requests"])
        patches.append({
            "op": "add" if "cpu" not in container.resources.get("requests", {}) else "replace",
            "path": f"/spec/containers/{i}/resources/requests/cpu",
            "value": "0.001"
        })

    # Modify the CPU requests to 0.001 for all initContainers
    for i, container in enumerate(pod.spec.initContainers):
        base_path = f"/spec/initContainers/{i}/resources"
        if "requests" not in container.resources:
            ensure_path(patches, base_path, ["requests"])
        patches.append({
            "op": "add" if "cpu" not in container.resources.get("requests", {}) else "replace",
            "path": f"/spec/initContainers/{i}/resources/requests/cpu",
            "value": "0.001"
        })

    # Base64 encode the patch
    patch_str = json.dumps(patches)
    patch_bytes = patch_str.encode('utf-8')
    patch_base64 = base64.b64encode(patch_bytes).decode('utf-8')

    # Construct the response
    response = AdmissionReviewResponse(
        uid=admission_review.request["uid"],
        allowed=True,
        patchType="JSONPatch",
        patch=patch_base64
    )

    admission_review_response = AdmissionReview(
        apiVersion="admission.k8s.io/v1",
        kind="AdmissionReview",
        response=response
    )

    return admission_review_response

if __name__ == "__main__":
    cert_file = os.getenv("TLS_CERT_FILE")
    key_file = os.getenv("TLS_KEY_FILE")

    if not cert_file or not key_file:
        raise ValueError("TLS_CERT_FILE and TLS_KEY_FILE environment variables must be set")

    uvicorn.run(app, host="0.0.0.0", port=8000, ssl_keyfile=key_file, ssl_certfile=cert_file)

I am not going to comment much on the code above; after all, it does the job for my CPU request rewrite needs, and your code will look different. Just a few pointers:

  • your HTTP server needs to serve TLS, with a key and certificate provided one way or another. The API server only calls webhooks over HTTPS, so requests will fail otherwise.
  • you will need to encode the patch array with base64.
  • my mutate_pod function creates or overrides CPU requests to 0.001, for containers and init containers. Kids, don't try that at home work.

You can see how to run and test this code on my GitHub project page, https://github.com/joelapatatechaude/mutating-webhook-cpu , which includes an example folder with some AdmissionReview requests.
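The patch-building logic can also be exercised without a running server. The sketch below is a hypothetical refactoring, not code from the repository: build_cpu_patches is a name invented here, and it works on plain dictionaries instead of the pydantic models. Like the handler above, it assumes each container carries a resources object.

```python
import json


def build_cpu_patches(pod: dict, cpu: str = "0.001") -> list:
    """Return JSON Patch operations forcing every CPU request to `cpu`."""
    patches = []
    for field in ("containers", "initContainers"):
        for i, container in enumerate(pod["spec"].get(field, [])):
            base = f"/spec/{field}/{i}/resources"
            requests = container.get("resources", {}).get("requests")
            if requests is None:
                # The parent object must exist before a member can be added.
                patches.append({"op": "add", "path": f"{base}/requests", "value": {}})
            patches.append({
                # "replace" only when a cpu request is already present.
                "op": "replace" if requests and "cpu" in requests else "add",
                "path": f"{base}/requests/cpu",
                "value": cpu,
            })
    return patches


# Hypothetical pod: one container without requests, one with a CPU request.
pod = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {}},
            {"name": "sidecar", "resources": {"requests": {"cpu": "500m"}}},
        ]
    }
}

patches = build_cpu_patches(pod)
print(json.dumps(patches, indent=2))
```

For this pod, the output is the same three-operation patch array shown earlier in the post, which makes it easy to compare against a captured AdmissionReview.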

2. Containerize the app

I am using the following Containerfile:

FROM registry.redhat.io/ubi9/ubi-minimal
RUN microdnf install python pip -y
RUN pip install fastapi uvicorn
EXPOSE 8000

COPY webhook.py /opt/webhook.py
CMD python /opt/webhook.py

I am using the minimal flavor of the Red Hat Universal Base Image. It's slim and secure. You might need to create a Red Hat account to access it. It's worth it; just read this if you need convincing: https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image You should then push that image to a registry, as it will be pulled by the pod, according to the deployment (see next section).

3. Deploy that container.

Below is an example deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mutate-cpu
  namespace: admission-webhook
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mutate-cpu
  template:
    metadata:
      labels:
        app: mutate-cpu
    spec:
      volumes:
        - name: my-tls
          secret:
            secretName: my-tls
      containers:
        - name: mutate-cpu
          resources:
            requests:
              cpu: 0.001
          image: quay.io/rh_ee_cschmitz/cpu-request-override:blogpost
          volumeMounts:
            - mountPath: '/mount'
              name: my-tls
              readOnly: true
          env:
            - name: TLS_CERT_FILE
              value: '/mount/tls.crt'
            - name: TLS_KEY_FILE
              value: '/mount/tls.key'
          imagePullPolicy: Always
          ports:
            - containerPort: 8000

The only thing to notice here is that I am mounting my certificate and key from a secret as a volume, and passing their paths through environment variables that are consumed by my Python web server. One way or another, you will need to get some certs. OpenShift won't let you run your admission webhook server without TLS. You don't need to manually create or maintain those certs though; OpenShift can do that for you, as described in the next section.

4. Get some certificates. Those can be injected, so very easy

OpenShift services that are annotated with:

service.beta.openshift.io/serving-cert-secret-name=<secret_name>

will have a secret named... secret_name automatically generated, that contains... a cert and a key. Read more about this here.

So when creating the mutating cpu-request webhook service, all that is needed is to add that annotation to the service:

apiVersion: v1
kind: Service
metadata:
  name: mutate-cpu
  namespace: admission-webhook
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: my-tls
spec:
  selector:
    app: mutate-cpu
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000

And now I have a my-tls secret with tls.crt and tls.key entries, which, as you saw, is mounted by my pods (see the deployment section above).

5. Configure the webhook with a MutatingWebhookConfiguration

The last thing to do is to register this webhook. This is done via a MutatingWebhookConfiguration:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: mutate-cpu
  annotations:
    service.beta.openshift.io/inject-cabundle: 'true'
webhooks:
  - name: mutate-pods.example.com
    clientConfig:
      service:
        name: mutate-cpu
        namespace: admission-webhook
        path: '/mutate'
        port: 8000
    rules:
      - operations: ['CREATE', 'UPDATE']
        apiGroups: ['']
        apiVersions: ['v1']
        resources: ['pods']
        scope: Namespaced
    admissionReviewVersions:
      - v1
    sideEffects: None
    namespaceSelector:
      matchLabels:
        mutate-cpu: 'true'
    failurePolicy: Ignore
    timeoutSeconds: 3

A few important things:

  • The annotation service.beta.openshift.io/inject-cabundle ensures that the CA used to sign the service certificates (see section above) is injected. Therefore the service certificates will be validated / trusted.
  • failurePolicy is set to Ignore. This is important in this case, as I don't want to prevent pod creation in the unlikely event there is a bug in my Python code. On the other hand, for a webhook whose job is to check some critical security settings before validating a request, it may be important to set this to Fail (which is the default btw).
  • I lowered timeoutSeconds to 3. It's my belief that my HTTP server should respond quickly, or be ignored otherwise.
  • It's probably a safe practice to scope those webhooks to specific namespaces. This can be done via a namespaceSelector. In my case, only namespaces with the label mutate-cpu set to true will have their pods mutated by my webhook. Obviously, I would want to avoid certain critical namespaces for which CPU requests are absolutely critical for the health of the cluster (openshift-etcd, openshift-apiserver, ...).
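To make the namespaceSelector behaviour concrete, here is a small sketch of how a matchLabels selector decides which namespaces are in scope. The namespace_selected function and the namespace labels are made up for illustration; the real matching is done by the API server, not by the webhook.

```python
def namespace_selected(match_labels: dict, namespace_labels: dict) -> bool:
    """True when every matchLabels pair is present on the namespace."""
    return all(namespace_labels.get(k) == v for k, v in match_labels.items())


# The selector from the MutatingWebhookConfiguration above.
selector = {"mutate-cpu": "true"}

# Hypothetical namespaces and their labels.
namespaces = {
    "my-namespace": {"mutate-cpu": "true", "team": "lab"},
    "openshift-etcd": {"openshift.io/cluster-monitoring": "true"},
    "opted-out": {"mutate-cpu": "false"},
}

in_scope = [name for name, labels in namespaces.items()
            if namespace_selected(selector, labels)]
print(in_scope)
```

Because the match is on the label value as well as the key, a namespace labelled mutate-cpu=false stays out of scope in this sketch.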

And that's it. Now all I have to do is label some target namespace, with something like:

oc label namespace/my-namespace mutate-cpu=true

and that should do the trick. Remember the bouncer and the flip-flops: only new pods will get their CPU requests mutated. For existing pods in that namespace, you may need to do a rolling restart.

Concluding remarks

Like many things, getting an admission webhook to work the first time may seem a bit tricky. But once the pattern is understood, it becomes very easy to write more. Since then, when confronted with some Kubernetes challenge, I regularly think: oh! I can fix that with a mutating / validating webhook. Everything looks like a nail to my new hammer.

Hammer and nails

Most of the code is available at https://github.com/joelapatatechaude/mutating-webhook-cpu , including the kubernetes resources to create, which are located somewhere under the helm directory.