This used to be a different page. Go to cka old notes to check those out.

~ cka prep



K8s in a Nutshell

K8s is a container orchestration tool. Today many architectures are built from microservices: you can have a container per service, and managing all those containers by hand gets hard. K8s takes care of scalability, security, persistence, and load balancing for you.

When k8s is triggered to create a container, it delegates the work to the container runtime engine via the CRI (Container Runtime Interface).

Features

High-Level Architecture

There are two types of nodes (these can be VMs, bare-metal machines, whatever you call a computer):

Control Plane Nodes

These have different components to do their job: the kube-apiserver, etcd, the kube-scheduler, and the kube-controller-manager (plus the cloud-controller-manager when running on a cloud provider).

Components Shared by Nodes

There are some components that are shared by the nodes, whether they are control plane or workers: the kubelet, kube-proxy, and the container runtime.

Advantages of Using k8s


Interacting with K8s

API Primitives and Objects

K8s has api resources which are the building blocks of the cluster. These are your pods, deployments, services, and so on.

Every k8s primitive follows a general structure:
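As a rough sketch, almost every manifest has the same top-level fields (the values below are just placeholders):

apiVersion: v1          # API group/version the object belongs to
kind: Pod               # the type of resource
metadata:               # identifying data: name, namespace, labels, annotations
  name: example
  labels:
    app: example
spec:                   # the desired state, different for every kind
  containers:
  - name: example
    image: nginx:latest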

kubectl

This is how we talk to the api. Usually you do:

k <verb> <resource> <name>

Keep in mind we usually have different stuff in different namespaces, so we are always appending -n <some namespace> to the command.

The name of an object has to be unique across all objects of the same resource within a namespace.
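For example (the resource names and the namespace here are made up):

k get pods -n kube-system                 # list pods in the kube-system namespace
k describe deployment my-deploy -n my-ns  # inspect a deployment
k delete svc my-svc -n my-ns              # delete a service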

Managing Objects

There are two ways to manage objects: the imperative or the declarative way.

Imperative

The imperative way is where you use commands to make stuff happen in the cluster. Say you want to create an nginx pod, you would do:

k run --image=nginx:latest nginx --port=80

This would create the pod in the cluster when you hit enter. In my professional experience, you hardly ever create stuff like that. The only time I use it is to create temporary pods to test something.

There are other verbs which you might use a bit more. edit brings up the raw config of the resource and you can change it on the fly, although I would recommend only doing this for testing things. Hopefully your team has the manifests under a version control system; editing stuff like this would make them drift from what is in the repo.

There is also patch which I have never used, but it... "Update fields of a resource using strategic merge patch, a JSON merge patch, or a JSON patch."

There is also delete which -- as you probably guessed already -- deletes the resource. Usually the object gets a 30 second grace period to die; if it does not, the kubelet will kill it forcefully.

If you do:

k delete pod nginx --now

It will shorten the grace period to 1 second instead of waiting the full 30.

Declarative

This is where you have a bunch of YAMLs which are your definitions of resources. The cool thing about this is that you can version control them. Say you have an nginx-deploy.yaml; you can create it in the cluster with:

k apply -f nginx-deploy.yaml

This gives you more flexibility, since you can just go to the file, change things, and apply it again.

Hybrid

Usually I use a hybrid approach: most of the imperative commands have a --dry-run=client -o yaml flag that you can append to the command and it will render the yaml manifest without creating anything. You can redirect that to a file and start working on it: open the yaml with your favourite text editor, mount volumes, and so on.
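A sketch of that workflow (the pod and file names are made up):

k run nginx --image=nginx:latest --port=80 --dry-run=client -o yaml > nginx-pod.yaml
# edit nginx-pod.yaml (add volumes, env vars, and so on), then
k apply -f nginx-pod.yaml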

There are more ways to manage resources: for example, you can use kustomize to render different values based on the same manifests, or helm to bring up complete apps/releases to the cluster. We will probably go over them later on.


Cluster Installation and Upgrade

Get Infrastructure Ready

There are a million ways of doing this. I used terraform to create some droplets in Digital Ocean, and packer with ansible to build an image that would leave everything ready for me to run the kubeadm commands.

kubeadm is the tool to create a cluster.

Here is a non-comprehensive list of what is needed before running kubeadm stuff.
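As a rough sketch of the kind of host preparation involved (exact steps depend on your distro, container runtime and k8s version, so treat this as an assumption-heavy example rather than a recipe):

sudo swapoff -a                                  # kubelet does not run with swap enabled by default
sudo modprobe overlay                            # kernel modules needed for container networking
sudo modprobe br_netfilter
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward  # allow the node to forward traffic
# plus: install a container runtime (e.g. containerd) and the kubeadm, kubelet and kubectl
# packages following the version-specific instructions in the official docs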

Extension Interfaces

There are some things k8s does not have by default. You need to install these extensions as needed.

Setup Cluster

Once you have kubeadm in your system everything else is pretty straightforward. You just ssh to your control plane and run:


sudo kubeadm init --pod-network-cidr=10.244.0.0/16

This runs some preflight checks to see if everything is working properly; if not, it will likely print a message telling you what is wrong. In my case it complained about /proc/sys/net/ipv4/ip_forward being disabled, but I was able to fix it by just doing echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward.

Where does the CIDR come from? I had exactly the same question. It seems it depends on the CNI you will install, but do not quote me on that.

Once the command runs successfully, it will print next steps:


Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join ip.control.panel.and:port --token some-cool-token \
    --discovery-token-ca-cert-hash sha256:some-cool-hash

Just follow those steps. You ssh into the workers and join them with that command. If you lose the token for some reason you can reprint the join command with:


kubeadm token create --print-join-command

Now, before joining the workers, you need to install the CNI, you can pick any of the ones on the k8s add-ons docs.

Installing them is nothing fancy, you literally just run a `k apply -f some-manifest` and are done with it. I went with calico for no particular reason.

Highly Available (HA) Cluster

The control plane is the most important part of the cluster: if it fails, you are not even going to be able to talk to the API to do anything. We can add redundancy to improve this; here is where HA architectures come into play.

There are two topologies: stacked etcd and external etcd.

Remember etcd is the key-value DB k8s uses to store all of its info.

Stacked etcd topology

You have at least three control plane nodes, each running its own etcd member on the same node. All the nodes run at the same time and the workers talk to them through a load balancer; if one dies, we still have the others.

(Diagram of the stacked etcd topology, taken from the k8s docs on this topic.)

External etcd topology

Per control plane we have two nodes: one that runs etcd and one that runs the actual control plane components. The etcd members communicate with the kube-apiserver of each control plane node.

This topology requires more nodes, and that means a bit more management overhead.

(Diagram of the external etcd topology, taken from the k8s docs on this topic.)

Upgrading Cluster Version

It is recommended to upgrade from a minor version to the next higher one, say 1.18.0 to 1.19.0, or from a patch version to a higher one, say 1.18.0 to 1.18.3.

The high level plan is this:

  1. Upgrade a primary control plane node

  2. In case of HA, upgrade additional control planes

  3. Upgrade worker nodes

One last thing before going to the steps. You are going to see that when we drain a node we use the --ignore-daemonsets flag. Which begs the question, what is a daemonset?

A daemonset defines pods needed for node-local stuff; say you want a daemon on each node that collects logs, you can deploy a daemonset for it. When we drain a node to upgrade, we tell it not to evict the daemonset pods, since we might actually need those for the node to operate properly.
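A minimal sketch of a daemonset manifest (the busybox loop just stands in for a real log collector image):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      name: log-collector
  template:
    metadata:
      labels:
        name: log-collector
    spec:
      containers:
      - name: log-collector
        image: busybox:1.28
        command: ['sh', '-c', 'while true; do echo collecting logs; sleep 30; done']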

Upgrade Control Planes

  1. ssh into the node

  2. k get nodes to check current version

  3. Use your package manager apt/dnf and upgrade kubeadm

  4. Check which kubeadm versions are available to upgrade to

                
    $ sudo kubeadm upgrade plan
    ...
    [upgrade] Fetching available versions to upgrade to
    [upgrade/versions] Cluster version: v1.18.20
    [upgrade/versions] kubeadm version: v1.19.0
    I0708 17:32:53.037895   17430 version.go:252] remote version is much newer: \
    v1.21.2; falling back to: stable-1.19
    [upgrade/versions] Latest stable version: v1.19.12
    [upgrade/versions] Latest version in the v1.18 series: v1.18.20
    ...
    You can now apply the upgrade by executing the following command:
    
        kubeadm upgrade apply v1.19.12
    
    Note: Before you can perform this upgrade, you have to update kubeadm to v1.19.12.
    ...
                
                

  5. Upgrade it kubeadm upgrade apply v1.19.12

  6. Then we need to drain the node. This marks the node as unschedulable and evicts its pods, so new pods won't be scheduled there.

                
    kubectl drain kube-control-plane --ignore-daemonsets
                
                

  7. Use your package manager to upgrade both kubelet and kubectl to the same version

  8. Restart and reload kubelet daemon with systemctl

  9. Mark node as schedulable again

                
    k uncordon kube-control-plane
                
                

  10. k get nodes should show the new version

Upgrade Workers

  1. ssh into the node

  2. k get nodes to check current version

  3. Use your package manager apt/dnf and upgrade kubeadm

  4. Do kubeadm upgrade node to upgrade the kubelet configuration

  5. Drain the node as we did with the control plane

                
    kubectl drain worker-node --ignore-daemonsets
                
                

  6. Use your package manager to upgrade both kubelet and kubectl to the same version

  7. Restart and reload kubelet daemon with systemctl

  8. Mark node as schedulable again

                
    k uncordon worker-node
                
                

  9. k get nodes should show the new version

etcd

etcd is a key-value store used as k8s' backing store for all the cluster information. It is a standalone project with its own docs. Since it holds all the cluster data, we need to know how to use it in order to back up or restore the cluster.

There are two CLIs we will be working with: etcdctl and etcdutl.

kubeadm will set up etcd as a pod managed directly by the kubelet daemon (known as a static pod). You can actually see it by running k get pods -n kube-system.

Backing up etcd cluster

All k8s data is stored in etcd, and this includes sensitive data, therefore the snapshots you create should be encrypted and handled carefully.

In order to talk to etcd we can ssh into the control plane, then do etcdctl version to verify it is installed.

If you went with kubeadm as your installation method, you can see that there is a pod in the kube-system namespace for etcd. If you describe it you will see some information relevant for connecting to etcd.


k describe pod etcd-cka-control-plane -n kube-system | grep '\-\-'
      --listen-client-urls=https://10.2.0.9:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

If we want to talk to etcd from outside the control plane node, we will need the --listen-client-urls addresses; if you are inside the node, you can use the local endpoint. We are also going to need the paths to all the TLS material. A simple command to test if you have everything right is the following:

ETCDCTL_API=3 etcdctl --endpoints 10.2.0.9:2379 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    member list
    bbf4baa696b33a2e, started, control-plane, https://10.2.0.9:2380, https://10.2.0.9:2379

Since the certificates are inside a path which your user probably does not have access to, you will have to sudo it.

Then you can create a snapshot by running the snapshot save /path/to/new/snapshot command.


ETCDCTL_API=3 etcdctl --endpoints https://162.243.29.89:2379 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt  \
    snapshot save snapshot.db
2025-08-09 17:04:29.201951 I | clientv3: opened snapshot stream; downloading
2025-08-09 17:04:29.241278 I | clientv3: completed snapshot read; closing
Snapshot saved at snapshot.db

Restore from snapshot

We will use the etcdutl to restore a snapshot.


etcdutl --data-dir /path/to/be/restored/to snapshot restore snapshot.db

We also need to point the etcd pod to the new path we have restored the data to. You can find the manifest for the etcd pod under /etc/kubernetes/manifests/etcd.yaml. There is a volume called etcd-data; point it to the new path and the kubelet will recreate the pod.
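After the change, the relevant part of /etc/kubernetes/manifests/etcd.yaml would look roughly like this (assuming the snapshot was restored to /var/lib/etcd-restored):

  volumes:
  - hostPath:
      path: /var/lib/etcd-restored   # was /var/lib/etcd before the restore
      type: DirectoryOrCreate
    name: etcd-data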

Control Access to the k8s API

The way anyone talks to k8s is through the API; it does not matter if you are a human or a service account, you all talk to the k8s HTTP API. When a request gets to the server it goes through some stages, shown in the docs' access-control-overview diagram (copied here).

Transport Security

All the requests go through TLS. By default the API will listen on 0.0.0.0:6443, but this can be changed with the --secure-port and --bind-address flags.

When you kubeadm init your cluster, k8s will create its own Certificate Authority (CA) and its key (/etc/kubernetes/pki/ca.crt and /etc/kubernetes/pki/ca.key respectively). It will use this to sign the certificates used by the API server.

Inside your .kube/config file you will have a copy of that CA certificate; this is what verifies that the API's certificate is authentic and was signed by the cluster's CA.

Authentication

Once we have TLS, we can continue with authentication. The cluster admin may set up different authentication modules; if so, they will be tried sequentially until one of them succeeds.

K8s may use the whole http request to authenticate, although most modules only use the headers.

If all the modules fail, a 401 will be returned. If it is successful, the user is authenticated as a specific username.

Authorization

Once the request has passed the authentication stage, it is time to see if it can in fact do the action it is trying to accomplish. The request must include the username, the requested action, and the resource affected by the action. The request will then be authorized if there is an existing policy declaring that the user has permission to perform that action.

There are different authorization modules and the administrator can set up several in one cluster; they will be tried one by one, and if none of them allows the request a 403 will be returned.

Admission Control

If the authorization is successful, we jump to admission controllers. They are basically pieces of code that check the data arriving in requests that modify a resource; they do not act on requests that only read resources. They can validate or mutate the objects. The thing is that if one of them rejects, the whole request is rejected; it is not like the other stages where we try one by one until something passes.

Auditing

Auditing generates a chronological set of records, documenting everything that is happening in the cluster.

Using RBAC Authorization

Role-based access control is a way of controlling access to resources based on the roles an individual has. The rbac.authorization.k8s.io API group allows you to set those roles up dynamically in the k8s cluster.

API objects

RBAC introduces 4 new object types to the cluster: Role, ClusterRole, RoleBinding, and ClusterRoleBinding.
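A minimal sketch of a Role and RoleBinding pair (the names, namespace and user are made up):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]          # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane               # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io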

One last tip, you can always do


 k auth can-i get pod/logs --as="some-subject" -n "ns" # can-i verb resource

To check if the role is working as expected.

Service Account

A service account is a non-human account that provides an identity in a k8s cluster. Pods can use them to make requests against the k8s API or to authenticate against an image registry.

They are represented in the k8s cluster with the ServiceAccount object. They are namespaced, lightweight, portable.

There is also a default service account created in every namespace. If you try to delete it, the control plane replaces it. This account is assigned to all pods that do not specify one, and it only has API discovery permissions.

You can use RBAC to grant roles to it: it is just another subject you can include in a RoleBinding manifest.

To use one you just have to create it (k create sa some-name) and reference it from your workloads.

If you need its identity for an external service you can request a token:

k create token "sa-name" -n test-token

To assign one to a pod just add the spec.serviceAccountName field.
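A minimal sketch (the pod and service account names are made up):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  serviceAccountName: my-sa   # must already exist in the same namespace
  containers:
  - name: my-app
    image: nginx:latest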

Operators and Custom Resources

Custom Resources

A resource is an endpoint in the k8s API that manages objects of the same type. One example is the pods resource; it is an endpoint of the API and you use it to create, destroy, and list pod objects.

Then a custom resource is an extension of the k8s native API. You can create your own resources for your own needs. Custom resources can be created and destroyed dynamically on a running cluster, and once installed you can use kubectl to manage them as you would any other resource.

Say, you might create a database custom resource to represent and manage a database inside your cluster.

Custom Controllers

A custom resource by itself only represents some structured data. To make it work in a truly declarative way you also need to add a controller.

In an imperative API you tell the server to do something and it does it. In a declarative API, like k8s, you tell it the state you want to reach, in this case using the custom resource's endpoints, and then a controller makes sure that state stays true.


One small disclaimer: using a custom resource is not a one-size-fits-all solution. You have to check whether you really need to implement this as a CRD, or whether a separate API or even a configmap would do the trick; the k8s docs have a section to help you decide whether you really need them or not.

It is outside of the scope of the exam but there are two ways of creating a custom resource.

The first one is simple, as you do not need to code anything: just define the custom resource definition in a manifest and the k8s API will handle storage and so on. Go to the official docs for more info.

Operator Pattern

The operator pattern is creating a custom controller to manage a custom resource.

One example would be deploying an application on demand. This would look something like: we have a new custom resource called ApplicationDeployment where the developer specifies the application they want to deploy. When they k apply -f it, a controller takes care of the whole deployment of the app.
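A sketch of what such a custom resource could look like; the group, version and fields are entirely made up for illustration:

apiVersion: example.com/v1alpha1
kind: ApplicationDeployment
metadata:
  name: my-app
spec:
  image: registry.example.com/my-app:1.0.0
  replicas: 3
  exposePort: 8080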

There are many operators already created by the community; you can find several in the OperatorHub. A popular one is ArgoCD, which defines custom resources such as Application, where you point to a git repository and it makes sure what runs in the cluster stays in sync with that repository, among other things. Popular in organizations using GitOps.

Helm and Kustomize

Helm

Helm is a package manager for k8s. This means that you can use it similar to apt or dnf to install full working packages in the k8s cluster.

Usually a deployment of a full service in a k8s cluster involves multiple resources: services, pods, configmaps. It would be a bit complicated to deploy all of them using plain kubectl. With helm you can deploy full working solutions with just a few commands.

Say you want to deploy jenkins in your cluster. You could just look in the ArtifactHub for jenkins, and follow the instructions for installing the chart. It typically looks something like the following.
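Something along these lines; the repo URL, release name and namespace are taken as assumptions here, the chart's page on ArtifactHub has the authoritative instructions:

helm repo add jenkins https://charts.jenkins.io
helm repo update
helm install my-jenkins jenkins/jenkins -n jenkins --create-namespace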

Kustomize

Kustomize allows you to manage multiple k8s manifests in an easy way. It has different capabilities.

It is really not worth going into much detail since this will likely not come up in the certification.

Just a few quick things: the heart of this is the kustomization.yaml file, where you list all the resources kustomize will use to render the manifests.

You can also render how the manifests would look without having to apply them with


kustomize build /path/to/dir-with-kustomization # or
k kustomize /path/to/dir-with-kustomization

Here is a short example of how you can start using this, say to add the same namespace to two different manifests.


% tail -n +1  kustomization.yaml pod.yaml configmap.yaml
== kustomization.yaml ==
namespace: kustom
resources:
- pod.yaml
- configmap.yaml

== pod.yaml ==
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx:1.21.1
    name: nginx
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

== configmap.yaml ==
apiVersion: v1
data:
  dir: /etc/logs/traffic-log.txt #/etc/logs/traffic.log
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: logs-config


Workloads

A workload is an application running inside a k8s cluster. Whether your application has different components or just one, you will run it inside a set of Pods. A Pod is nothing more than a set of containers.

Pods have a defined life-cycle, meaning if you kill one or it dies due to some issue, it is not going to respawn by itself.

To make life easier, k8s has a set of different controllers that help manage these pods. Say you always want to keep 3 of them alive: even if one is killed, another one is spun up to take its place. You can use workload resources to make this happen; they configure these controllers depending on what you want to do. We will go more in depth on each, but here is a brief intro.

Pods


A pod is like a set of containers with shared namespaces and shared file systems. You can run just one or multiple containers in one Pod.

A Pod's Lifecycle

Pods are considered ephemeral: they are created, assigned a unique ID (UID), and scheduled to run on nodes where they live until their termination. If a node dies, the pods that lived there, or were scheduled to live there, are marked for deletion.

While a pod is running, kubelet can restart its containers to handle some kind of faults.

Pods are only scheduled once in their lifetime; assigning a pod to a node is called binding, and the process of selecting which node the pod should go to is known as scheduling. Once a pod is scheduled to a node they are bound until either of them dies.

A pod is never "re-scheduled", it is simply killed and replaced by maybe a super similar one but the UID will be different.

Pod phases

There are several pod phases:

Pending: The Pod has been accepted by k8s, but one or more containers are not ready to run. It might be waiting to be scheduled or downloading an image from a registry.
Running: The Pod has been bound to a Node and all the containers have been created. At least one of them is running, or in the process of starting/restarting.
Succeeded: All containers in the Pod have terminated successfully.
Failed: All containers in the Pod have terminated, and one or more terminated in failure.
Unknown: The state of the Pod could not be obtained, usually because of an error communicating with the Node the Pod is running on.

Note that CrashLoopBackOff and Terminating are not actually phases of a pod. Make sure not to confuse status with phase.

Pods handling issues

Similar to every living thing on this green Earth, a Pod will be presented with issues along its time in this world filled with thorns and thistles. Maybe, as with us, even its life will depend on how well it is able to solve them. This unnecessary biblical detour begs the question: how does a pod handle problems with its containers?

The pods spec has a restartPolicy. This will determine how k8s reacts to containers exiting due to errors.

  1. Initial crash: k8s immediately tries to restart it based on the restartPolicy.

  2. Repeated crashes: if it keeps failing, an exponential backoff delay is added before the next restarts.

  3. CrashLoopBackOff state: this indicates the backoff delay mechanism is in effect.

  4. Backoff reset: if a container manages to stay alive for a certain amount of time, the backoff delay is reset.

Troubleshooting is its own separate section, but here are some reasons a Pod might be CrashLoopBackOff-ing.

How to debug this? Check the logs and events, ensure the configuration is set up properly, check resource limits, debug the application. Maybe even run the image locally and see if it works fine.

A restartPolicy can be Never, Always, OnFailure.

Container Probes

A probe is a diagnostic periodically performed by the kubelet. There are three types: the livenessProbe, readinessProbe, and startupProbe.

Pretty self-explanatory; maybe the only thing to clarify is that the startupProbe indicates whether the app inside a container has started. All the other probes are disabled until it succeeds. It is usually used for containers that take a long time to start.

And the readinessProbe indicates whether the container is ready to respond to requests.

There are 4 check mechanisms (a short sketch using two of them follows the list):

  1. exec: executes a command inside the container; the diagnostic succeeds if the command exits with 0.

  2. grpc: performs a remote procedure call using gRPC.

  3. httpGet: makes an HTTP GET request against the pod's IP on a given path and port.

  4. tcpSocket: performs a TCP check; the diagnostic succeeds if the port is open.
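A minimal sketch combining an httpGet liveness probe and a tcpSocket readiness probe (the path and timings are made up):

apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
  - name: app
    image: nginx:latest
    livenessProbe:
      httpGet:
        path: /            # hypothetical health endpoint
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      tcpSocket:
        port: 80
      periodSeconds: 5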

Containers

Init Containers

An init container is one (or an array) of containers that run before your main application containers. They run until completion, meaning they cannot live side by side with your main containers; those are sidecars, which we will talk about later.

They run sequentially, and if one fails the kubelet will restart that init container until it succeeds. However, if the pod's restartPolicy is set to Never and an init container fails, the whole pod is treated as failed.

They have all the fields and features of regular containers, they just do not have probes.

They are useful for setting things up for your application: preparing data in volumes, downloading a file, waiting for a dependency, and so on.

Here is an example from the docs where the init container waits for a svc in k8s to be up and running before starting the pod's containers.


apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app.kubernetes.io/name: MyApp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]

Sidecar Containers

Sidecar containers are containers that run side by side with the application container; they can be used for logging, data sync, monitoring, and so on. Typically you only have one application container per pod. For example, if you have a web app that requires a web server, you would have your web app in the app container and use a sidecar container as the web server.

The implementation is super simple: just add restartPolicy: Always to an init container and there you have it. The cool thing is that they are still started sequentially, but the one with the restartPolicy keeps running.
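A minimal sketch (the names are placeholders and the busybox loop stands in for a real log shipper); note the restartPolicy on the init container is what makes it a sidecar:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: app
    image: nginx:latest
  initContainers:
  - name: log-shipper
    image: busybox:1.28
    restartPolicy: Always      # keeps running alongside the app container
    command: ['sh', '-c', 'while true; do echo shipping logs; sleep 10; done']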

You can also use probes on these containers, as opposed to regular init containers.

When a pod dies, first the app containers are terminated, then the sidecar containers, in the reverse order to the one they were started in.

You can have multiple containers as the main containers of a pod, so when should you use sidecars? The app containers are meant for executing the primary application logic; that is why usually you have just one and use sidecars for everything else.

Workload Management


Your applications will run inside pods, and the k8s API offers different resources to help you manage them. Say a pod dies: you would not like to have to get up in the middle of the night and start it again. We can use k8s objects that will help us manage them.

ReplicaSet

A replicaset's purpose is to keep a stable set of replica pods running at any given time. You tell it how many you want and what the pod looks like, and it will remove/create pods to maintain this state.

In a replicaset's fields you specify a selector, which tells the replicaset how to identify pods it can acquire, and a number specifying how many pods it should maintain.

You are not likely to create a replicaset by itself; you usually create higher-level resources like a Deployment, which then uses ReplicaSets.

Deployments

A deployment manages a set of pods that run an application. You describe the desired state for the application, say how many pods need to be available, and the controller will maintain that state.

There are some fields worth mentioning in a deployment's manifest. First, the spec.replicas field specifies the number of replicas, of course. We also have spec.selector, which tells the replicaset how to find the pods to manage; usually it matches the labels set in the spec.template pod template.
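To make those fields concrete, here is a minimal deployment sketch (names and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx           # must match spec.selector
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.1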

Once you k apply the deployment, there are some commands that will come in handy.
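For example (the deployment name is made up):

k rollout status deployment/some-deploy   # watch the rollout progress
k get rs                                  # list the replicasets it manages
k describe deployment some-deploy         # see events, strategy and current state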

If you update an image in the deployment (k set image deployment/some-deploy nginx=nginx:latest) and do k get rs, you will see that there are two replicasets now: the one with the previous image, which now shows 0 pods, and the new one with as many pods as spec.replicas says.

The deployment controller ensures that only a specific number of pods are down while being updated. By default it makes sure that at least 3/4 of the desired number of pods are up, meaning only 1/4 can be unavailable.

When updating, the controller looks for existing replicasets whose selector matches but whose pod template does not match the new spec.template, and scales those down, while a new replicaset with the new spec.template is scaled up.

Rollback

If you update a deployment but it is not going the way you wanted, you can easily go back to a previous version of your deployment. First you need to check the rollout history to choose which revision you will roll back to.


kubectl rollout history deployment/nginx-deployment
You can see more details on a revision by using the same command with the --revision=n flag.

If you decide to rollback you can do


kubectl rollout undo deployment/nginx-deployment --to-revision=n

Scale

You can scale the replicas in a deployment with


kubectl scale deployment/nginx-deployment --replicas=10
If the Horizontal Pod Autoscaler is available in the cluster, you can set up autoscaling based on CPU/memory usage:

kubectl autoscale deployment/nginx-deployment --min=10 --max=15 --cpu-percent=80

Strategy

The spec.strategy field tells you the type of strategy used to replace old pods with new pods. It can either be Recreate or RollingUpdate; the latter is the default value.

If Recreate, all pods are killed before new ones are created.

If RollingUpdate, one replicaset is scaled down while a new one is scaled up. You can specify maxUnavailable and maxSurge to control this.
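A sketch of how that looks in the manifest (the percentages are arbitrary; they happen to be the defaults):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # how many pods may be down during the update
      maxSurge: 25%         # how many extra pods may be created above spec.replicas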

StatefulSets

They are similar to deployments, but they maintain a sticky identity for each of the pods they create.

You will use stateful sets if you need stable network identities, stable persistent storage, or ordered deployment, scaling and rolling updates.

Compute Resource Management

In a pod you can specify how much of a resource (RAM and CPU) a container needs. You can set a request for a certain amount of resources, and the scheduler uses this information to decide which node to put it on. You can also set a limit on how many resources a container can use, and the kubelet will make sure the running container does not exceed it.

A pod may use more resources than it requested; as long as the node has enough of them, there will be no issue.

Limits work differently though; they are enforced by the Linux kernel. For CPU they are hard limits: the kernel restricts access to the CPU by throttling the container. For memory the kernel uses out-of-memory (OOM) kills. This does not mean that as soon as the container exceeds its memory limit it is killed; the kernel will only kill it if it detects memory pressure.

You specify CPU and memory limits/requests using specific units: Kubernetes CPU units and bytes, respectively.

Usually you specify the limits at container level


spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
But since v1.32 you can also set them at pod level:

spec.resources.limits.cpu
spec.resources.limits.memory
spec.resources.requests.cpu
spec.resources.requests.memory
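Going back to the usual container-level form, a minimal sketch (the amounts are arbitrary):

spec:
  containers:
  - name: app
    image: nginx:latest
    resources:
      requests:
        cpu: 250m        # a quarter of a CPU core
        memory: 64Mi
      limits:
        cpu: 500m
        memory: 128Mi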

Quotas

You can limit the resources per namespace using a k8s resource called ResourceQuota. These are not limited to only memory and CPU: you can also limit the number of objects that can be created in a namespace, say only 10 pods or something like that.
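A sketch of a ResourceQuota (the namespace and amounts are made up):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    pods: "10"              # at most 10 pods in this namespace
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi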

Users need to specify the resource limits or requests on their workloads, otherwise the API may refuse to create them. This can be a bit painful for developers, so you can define a LimitRange to set defaults on pods that do not explicitly set them.

Important note on all of these: they do not apply to running pods, only to new pods. So if you have some deployment and then set a LimitRange expecting the existing pods from that deployment to pick it up, you are wrong.

Here is an example of how they look:


apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default: # this section defines default limits
      cpu: 500m
    defaultRequest: # this section defines default requests
      cpu: 500m
    max: # max and min define the limit range
      cpu: "1"
    min:
      cpu: 100m
    type: Container

One last thing: the LimitRange won't check if your limits make sense. If you specify a limit lower than your request, it will let you fail.


Network

The k8s network model has several pieces:

Network Policies

You can specify how a pod is allowed to communicate with different entities over the network using Network Policies. They depend on the network plugin you installed, but you usually can specify which namespaces, which pods, or which IP blocks are allowed to send traffic to, or receive traffic from, a pod.

If you go for the namespaces/pods NetworkPolicy, you will use a selector to tell what traffic is allowed.

If you go with the IP blocks you will define CIDR ranges.

Two things worth mentioning: a pod always allows traffic between the node it runs on and itself, and a pod cannot block access to itself.

Pod Isolation

There are two types of pod isolation, egress and ingress. They are declared independently.

By default a pod allows all outbound (egress) and inbound (ingress) connections. You can create NetworkPolicy resources whose selector matches a pod; their rules then apply to that pod, and they are additive.

NetworkPolicy

Here is an example from the docs


apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978

A few fields worth mentioning:

Be careful, these two configs are different.


ingress:
- from:
  - namespaceSelector:
      matchLabels:
        user: alice
    podSelector:
      matchLabels:
        role: client
Here we are accepting traffic from pods that are in a namespace with the label user: alice and that also have the label role: client.


ingress:
- from:
  - namespaceSelector:
      matchLabels:
        user: alice
  - podSelector:
      matchLabels:
        role: client
Here you are accepting traffic from pods that are in a namespace with that label, or from pods (in the policy's own namespace) that have that label.

For the ipBlock the IP blocks you select must be cluster-external IPs since Pod IPs are ephemeral.

One last thing worth mentioning, to target a namespace by name you will have to use the immutable label kubernetes.io/metadata.name.

Services

Say you have a deployment in your cluster that serves as a backend for an application you want to access over the network. It can get tricky, because the pods of a replicaset are ephemeral and their IPs change all the time; your frontend cannot be expected to update the address every time something happens.

This is why we have services: they allow you to select a group of pods using label selectors (so if one is killed and respawned it will still be picked up) and assign them an IP that won't change inside your cluster.

This way, pods can be killed and respawned but the frontend only has to keep track of 1 address.

In the service definition you will specify the selector and the ports you want to target.


apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app.kubernetes.io/name: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376

The pods selected by that label form a resource called EndpointSlices, which basically does the mapping; the controller for that service updates the endpoints if they change.

You can define names for ports inside pods, which then you can use as a reference in the service.

You can also create a service for a deployment with the k expose command.
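For example (the deployment name and ports are made up):

k expose deployment nginx --port=80 --target-port=8080 --name=nginx-svc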

Service Types

Gateway API

The Gateway API is a set of k8s resources that provide traffic routing and make network services available. They are role-oriented, meaning each level of resource is meant to be managed by a different persona: infrastructure provider, cluster admin, and application developer. The 3 levels are GatewayClass, Gateway, and the Route objects (such as HTTPRoute).

Ingress

An Ingress is an object that manages external access to a svc. Here you can define hostnames, TLS, among other things.

To make them work in your cluster you first need an ingress controller (and its IngressClass); there are a few you can choose from, like the ingress-nginx controller.

The imperative way of creating one is actually a good way to understand them too. Look at the command:


k create ing website-api --rule='website.com/api=my-svc:8080'
The rule part tells the cluster that if a request arrives where the host is website.com and the path is /api, it should map it to the svc called my-svc on port 8080. Basically host/path=service:port.

In that example we are not specifying tls, but you can do it by pointing to a secret of that type.

There are some path types, like whether the /api you specified should be an exact match (Exact) or can be a prefix (Prefix), and so on.

CoreDNS

You can talk to pods and services within the cluster using its DNS. It is as simple as following the structure name.namespace.type.cluster.local, where type is svc for services or pod for pods.

In order to do this, k8s runs a DNS server implementation called CoreDNS. If you get the pods from kube-system you will be able to see the pod running it. The config is in a cm called coredns in the same namespace.
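A quick way to test it is to run a throwaway pod and resolve a name (the service name and namespace here are placeholders):

k run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup my-service.default.svc.cluster.local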

