K8s is a container orchestration tool. Microservices architectures are everywhere these days: with a container per service, managing all those containers by hand gets hard fast. K8s takes care of scalability, security, persistence and load balancing for you.
When k8s is triggered to create a container, it will delegate it to the container runtime engine via a CRI (container runtime interface).
There are two types of nodes (these can be VMs, bare-metal machines, whatever you call a computer):
Whenever you run `kubectl something`, you are talking to this API.
These have different components to do their job:
There are some components that are shared by the nodes, whether they are control or workers.
K8s has api resources which are the building blocks of the cluster. These are your pods, deployments, services, and so on.
Every k8s primitive follows a general structure:
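A minimal sketch of that structure, using a Pod as an example (the fields under spec vary per resource kind):
apiVersion: v1          # which API group/version the resource belongs to
kind: Pod               # the type of resource
metadata:               # identifying data: name, namespace, labels, annotations
  name: nginx
  labels:
    run: nginx
spec:                   # the desired state; its fields depend on the kind
  containers:
  - name: nginx
    image: nginx:latest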
Run `k api-versions` to see the API versions compatible with your cluster.
This is how we talk to the api. Usually you do:
k <verb> <resource> <name>
Keep in mind we usually have different stuff in different namespaces, so we are always appending `-n <some namespace>` to the command.
The name of an object has to be unique across all objects of the same resource within a namespace.
There are two ways to manage objects: the imperative or the declarative way.
The imperative is where you use commands to make stuff happen in the cluster. Say you want to create an nginx pod you would do:
k run --image=nginx:latest nginx --port=80
This would create the pod in the cluster when you hit enter. In my professional experience, you hardly ever create stuff like that. The only time I use it is to create temporary pods to test something.
There are other verbs which you might use a bit more. `edit` brings up the raw config of the resource and you can change it on the fly, although I would recommend doing this only for testing things. Hopefully your team has the manifests under a version control system; if you edit stuff like this, the live state will drift from what is versioned.
There is also `patch`, which I have never used, but it... "Update fields of a resource using strategic merge patch, a JSON merge patch, or a JSON patch."
There is also `delete`, which -- as you probably guessed already -- deletes the resource. Usually the object gets a 30 sec grace period to die; if it does not, the kubelet will try to kill it forcefully.
If you do:
k delete pod nginx --now
It will ignore the grace period.
This is where you have a bunch of YAML files which are your definitions of resources. The cool thing about this is that you can version control them. Say you have a `nginx-deploy.yaml`. You can create it in the cluster with:
k apply -f nginx-deploy.yaml
This gives you more flexibility on what you are doing, since you can just go to the file, change stuff and apply it again.
Usually I use a hybrid approach. Most of the imperative commands have this `--dry-run=client -o yaml` flag that you can append to the command, and it will render the YAML manifest instead of creating anything. You can redirect that to a file and start working on it: open the YAML with your favourite text editor, and then mount volumes and stuff like that.
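For example, to scaffold that nginx pod into a file (the file name is just my choice):
k run nginx --image=nginx:latest --port=80 --dry-run=client -o yaml > nginx-pod.yaml
From there you edit nginx-pod.yaml and `k apply -f` it when you are happy with it.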
There are more ways to manage the resources: for example you can use kustomize to render different values based on the same manifest, or helm to bring up complete apps/releases to the cluster. We will probably go over them later in the book.
There are a million ways of doing this. I used terraform to create some droplets in DigitalOcean, and packer with ansible to build an image that would leave everything ready for me to run the `kubeadm` commands. `kubeadm` is the tool to create a cluster.
Here is a non-comprehensive list of what is needed before running `kubeadm` stuff:
Open ports needed for k8s to work
Disable swap; otherwise kubelet is going to fail to start
Install a container runtime, like containerd
Install kubeadm
There are some things k8s does not have by default. You need to install these extensions as needed:
Container Network Interface (CNI)
Container Runtime Interface (CRI)
Container Storage Interface (CSI)
Once you have `kubeadm` on your system, everything else is pretty straightforward. You just ssh to your control plane and run:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
This runs some preflight checks to see if everything is working properly; if not, it will likely print a message telling you what is wrong. In my case it complained about `/proc/sys/net/ipv4/ip_forward` being disabled, but I was able to fix it by just doing `echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward`.
Where does the `cidr` come from? I had exactly the same question. It seems that it will depend on the CNI you install, but do not quote me on that.
Once the command runs successfully, it will print next steps:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join ip.control.panel.and:port --token some-cool-token \
--discovery-token-ca-cert-hash sha256:some-cool-hash
Just follow those steps. You ssh into the workers and join them with that command. If you lose the tokens for some reason you can reprint them with:
kubeadm token create --print-join-command
Now, before joining the workers, you need to install the CNI; you can pick any of the ones on the k8s add-ons docs.
Installing them is nothing fancy, you literally just run a `k apply -f some-manifest` and be done with it. I went with calico for no particular reason.
The control plane is the most important part of the cluster: if it fails, you are not even going to be able to talk to the API to do stuff. We can add redundancy to improve this, and this is where HA architectures come into play.
There are two topologies:
Stacked etcd topology
External etcd topology
In the stacked topology you have at least three control plane nodes, each with its own etcd running on the same node. All the nodes run at the same time and the workers talk to them through a load balancer; if one dies, we still have the others.
In the external topology, per control plane we have two nodes: one that runs etcd and one that runs the actual control plane stuff. Each etcd member communicates with the `kube-apiserver` of each control plane node. This topology requires more nodes, which means a bit more management overhead.
It is recommended to upgrade from a minor version to the next higher one, say, `1.18.0` to `1.19.0`, or from a patch version to a higher one, `1.18.0` to `1.18.3`.
The high level plan is this:
Upgrade a primary control plane node
In case of HA, upgrade additional control planes
Upgrade worker nodes
One last thing before going to the steps. You are going to see that when we `drain` a node we use the `--ignore-daemonsets` flag. Which begs the question, what is a daemonset? A daemonset defines pods needed for node-local stuff; say you want a daemon on each node that collects logs, you can deploy a daemonset for it. When we drain a node to upgrade, we tell it not to kick the daemonset pods out, since we might actually need those for the node to operate properly.
`ssh` into the node
`k get nodes` to check the current version
Use your package manager (`apt`/`dnf`) to upgrade `kubeadm`
Check which `kubeadm` versions are available to upgrade to:
$ sudo kubeadm upgrade plan
...
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.18.20
[upgrade/versions] kubeadm version: v1.19.0
I0708 17:32:53.037895 17430 version.go:252] remote version is much newer: \
v1.21.2; falling back to: stable-1.19
[upgrade/versions] Latest stable version: v1.19.12
[upgrade/versions] Latest version in the v1.18 series: v1.18.20
...
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.19.12
Note: Before you can perform this upgrade, you have to update kubeadm to v1.19.12.
...
Upgrade it with `kubeadm upgrade apply v1.19.12`
Then we need to drain the node, which means we mark the node as unschedulable and new pods won't arrive.
kubectl drain kube-control-plane --ignore-daemonsets
Use your package manager to upgrade both `kubelet` and `kubectl` to the same version
Restart and reload the `kubelet` daemon with `systemctl`
Mark the node as schedulable again: `k uncordon kube-control-plane`
`k get nodes` should show the new version
`ssh` into the node
`k get nodes` to check the current version
Use your package manager (`apt`/`dnf`) to upgrade `kubeadm`
Do `kubeadm upgrade node` to upgrade the `kubelet` configuration
Drain the node as we did with the control plane
kubectl drain worker-node --ignore-daemonsets
Use your package manager to upgrade both `kubelet` and `kubectl` to the same version
Restart and reload the `kubelet` daemon with `systemctl`
Mark the node as schedulable again: `k uncordon worker-node`
`k get nodes` should show the new version
etcd is a key-value store used as the k8s backing store for all cluster information. It is a standalone project with its own docs. Since it holds all the cluster data, we need to know how to use it in order to back up and restore the cluster.
There are two CLIs we will be working with: `etcdctl` and `etcdutl`.
`etcdctl`: the primary way to interact with etcd over the network.
`etcdutl`: designed to operate on etcd data files directly, not over the network.
`kubeadm` will set up etcd as pods managed directly by the kubelet daemon (known as static pods). You can actually see them by running `k get pods -n kube-system`.
All k8s data is stored in etcd, and this includes sensitive data; keep that in mind when handling snapshots, since they are not encrypted by default.
In order to talk to etcd we can `ssh` into the control plane, then do `etcdctl version` to verify it is installed.
If you went with `kubeadm` as your installation method, you can see that there is a pod in the `kube-system` namespace that runs etcd. If you `describe` it you will see some information relevant to connect to etcd.
k describe pod etcd-cka-control-plane -n kube-system | grep '\-\-'
--listen-client-urls=https://10.2.0.9:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--key-file=/etc/kubernetes/pki/etcd/server.key
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
If we want to talk to etcd from outside the control plane node, we will need the `--listen-client-urls` addresses; if you are inside the node, you can skip that. We are also going to need the paths to all the TLS material. A simple command to test if you have everything right is the following:
ETCDCTL_API=3 etcdctl --endpoints 10.2.0.9:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
member list
bbf4baa696b33a2e, started, control-plane, https://10.2.0.9:2380, https://10.2.0.9:2379
Since the certificates are inside a path which your user probably does not have access to, you will have to `sudo` it.
Then you can create a snapshot by running the `snapshot save /path/to/new/snapshot` command:
ETCDCTL_API=3 etcdctl --endpoints https://162.243.29.89:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
snapshot save snapshot.db
2025-08-09 17:04:29.201951 I | clientv3: opened snapshot stream; downloading
2025-08-09 17:04:29.241278 I | clientv3: completed snapshot read; closing
Snapshot saved at snapshot.db
We will use `etcdutl` to restore a snapshot.
etcdutl --data-dir /path/to/be/restored/to snapshot restore snapshot.db
We also need to point the etcd pod to this new path we have restored the data to. You can find the manifest for the etcd pod under `/etc/kubernetes/manifests/etcd.yaml`. There is a volume called `etcd-data`; point it to the new path, and restart the pod.
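As a rough sketch (paths may differ in your cluster), the relevant bit of that manifest looks something like this; you would point the hostPath to the directory you restored into:
  volumes:
  - name: etcd-data
    hostPath:
      path: /var/lib/etcd            # change this to the restored data dir
      type: DirectoryOrCreate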
The way anyone talks with k8s is through the API; it does not matter if you are a human or a service account, you all talk to the k8s HTTP API. When a request gets to the server it goes through some stages, shown in the docs diagram which I copy pasted here:
All the requests go through TLS. By default the API will run on `0.0.0.0:6443`, but this can be changed with the `--secure-port` and `--bind-address` flags.
When you `kubeadm init` your cluster, k8s will create its own Certificate Authority (CA) and its key (`/etc/kubernetes/pki/ca.crt` and `/etc/kubernetes/pki/ca.key` respectively). It will use this to sign the certificates used by the API server.
Inside your `.kube/config` file you will need a copy of that CA certificate; this verifies that the API's certificate is authentic and was signed with the cluster's CA.
Once we have TLS, we can continue with authentication. The cluster admin may set up different authentication modules; if so, they will be tried sequentially to see if any of them suffices.
K8s may use the whole HTTP request to authenticate, although most modules only use the headers.
If all the modules fail, a `401` will be returned. If it is successful, the user is authenticated as a specific `username`.
Once the request has passed the authentication stage, it is time to see if it can in fact do the action it is trying to accomplish. The request must include the username, a requested action, and the resource affected by the action. The request will then be authorized if there is an existing policy that declares the user has permission to perform the intended action.
There are different authorization modules; the administrator can set up many in one cluster. They will be tried one by one, and if all of them fail a `403` will be returned.
If the authorization is successful, then we jump to admission controllers. They are basically pieces of code that check the data arriving in a request that modifies a resource. They do not control requests to read resources, only those that modify them. They usually just validate stuff. The thing is, if one fails the request is rejected; it is not like the other stages where we try one by one until one succeeds.
Finally, auditing generates a chronological set of records documenting everything that is happening.
Role-based access control is a way of controlling access to network resources based on the roles an individual has. The `rbac.authorization.k8s.io` API group allows you to set them up dynamically in the k8s cluster.
RBAC introduces 4 new object types to the cluster: `Role`, `ClusterRole`, `RoleBinding`, `ClusterRoleBinding`.
Role and ClusterRole
These represent a set of permissions. The only difference between the two is that `Role` defines the permissions for a namespace, and `ClusterRole` is not limited to a namespace.
Role
Here is the command for creating a role to get and watch all the pods in the nginx namespace.
k create role --dry-run=client -o yaml pod-reader --resource=pod --verb=get,watch -n nginx
To be honest you might be better off just going to the docs and copying the manifest from there, since it can get a bit long to write all the verbs and resources in one command.
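For reference, the command above renders roughly this manifest:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: nginx
rules:
- apiGroups: [""]            # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "watch"]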
ClusterRole
Since these are not bound to a namespace, you can also use them to set permissions on cluster-scoped things like nodes and persistent volumes.
The command is super similar, we just do not specify a namespace:
k create clusterrole secret-watcher --resource=secret --verb=get,list --dry-run=client -o yaml
Another thing specific to `ClusterRole`s is that you can aggregate them. When you create one you can add a label to it; then you can create another one that selects that label.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: monitoring-aggregate
aggregationRule:
clusterRoleSelectors:
- matchLabels:
rbac.example.com/aggregate-to-monitoring: "true"
rules: []
In this example we are aggregating all `ClusterRole`s that have the label `rbac.example.com/aggregate-to-monitoring: "true"`.
RoleBinding and ClusterRoleBinding
Once you have created your `Role` object you can bind it to a user or service account. This makes `Role`s reusable. For example, you can create a pod-read-only role and bind it to many subjects (users, groups or service accounts).
A `RoleBinding` may bind any `Role` in the same namespace, buuut you can also use them to bind `ClusterRole`s to a single namespace.
You can create them as you would expect
k create rolebinding pod-reader --dry-run=client -o yaml --role=pod-readonly --user=jose
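That renders roughly the following; roleRef points to the role and subjects lists who gets it:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-readonly
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: jose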
Remember you can always `--help` stuff, or copy an example from the wiki.
You cannot patch/edit an existing rolebinding to change the role it refers to; you have to delete it and create it again. There is `kubectl auth reconcile` which will do that for you.
One last tip, you can always do
k auth can-i get pod/logs --as="some-subject" -n "ns" # can-i verb resource
To check if the role is working as expected.
A service account is a non-human account that provides an identity in a k8s cluster. Pods can use them to make requests against the k8s API, or to authenticate against an image registry.
They are represented in the k8s cluster with the `ServiceAccount` object. They are namespaced, lightweight and portable.
There is also a `default` service account created in every namespace. If you try to delete it, the control plane replaces it. This account is assigned to all pods if you do not manually assign one, and has API discovery permissions.
You can use RBAC to add roles to it; it is just another subject you can include in the role manifests.
To use one you just have to:
Create the service account, in a declarative or imperative way
Give it roles with RBAC
Assign it to a pod during its creation
k create token "sa-name" -n test-token
To assign one to a pod just add the `spec.serviceAccountName` field.
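A minimal sketch, assuming a service account called sa-name already exists in the namespace:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  serviceAccountName: sa-name      # the identity this pod will use against the API
  containers:
  - name: my-app
    image: nginx:latest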
A resource is an endpoint in the k8s API that manages objects of the same type. One example is the `pods` resource; it is an endpoint of the API and you use it to create, destroy and list pod objects.
A custom resource, then, is an extension of the k8s native API. You can create your own resources for your own needs. Custom resources can be created and destroyed dynamically on a running cluster, and once installed you can use `kubectl` to manage them as you would manage any other resource.
Say, you might create a database custom resource to represent and manage a database inside your cluster.
A custom resource by itself only represents some structured data. To make it work in a truly declarative way you also need to add a controller.
In an imperative API you tell the server to do something and it does it. In a declarative API, like k8s', you tell it the state you want to reach, in this case using the custom resource endpoints, and there will be a controller that makes sure that state becomes true.
It is outside of the scope of the exam, but there are two ways of creating a custom resource:
Using the `CustomResourceDefinition` API resource
Using API server aggregation
The operator pattern is creating a custom controller to manage a custom resource.
One example would be deploying an application on demand. This would look something like: we have a new custom resource called `ApplicationDeployment` where the developer specifies the application they want to deploy. Now when they `k apply -f` it, there is a controller that takes care of the whole deployment of the app.
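An instance of that hypothetical custom resource could look something like this (the group, version and fields are all made up for illustration):
apiVersion: example.com/v1alpha1             # hypothetical API group/version
kind: ApplicationDeployment
metadata:
  name: my-app
spec:
  image: registry.example.com/my-app:1.0.0   # hypothetical fields the controller would act on
  replicas: 3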
There are many operators already created by the community; you can find several in the OperatorHub. A popular one is ArgoCD, which defines custom resources such as `Application` where you can point to a git repository, and it will make sure the cluster stays in sync with that repository, among other things. Popular in organizations using GitOps.
Helm is a package manager for k8s. This means that you can use it, similar to `apt` or `dnf`, to install full working packages in the k8s cluster.
Usually a deployment of a full service in a k8s cluster would involve multiple resources: `services`, `pods`, `configmaps`. It would be a bit complicated to deploy all of them using `kubectl`.
With helm you can deploy full working solutions with just a few commands.
Say you want to deploy jenkins in your cluster. You could just look in the ArtifactHub for jenkins, and follow the instructions for installing the chart. It typically looks something like the following.
We first need to add the repo for helm to keep track of it.
helm repo add jenkins https://charts.jenkins.io
helm repo update
Then you can just install it, specifying a name for the release. Do not forget that you are using your `kubeconfig` configuration, so the namespace and cluster you are pointing to will be the target of this operation.
helm install my-jenkins jenkins/jenkins --version 5.8.25 # helm install [RELEASE_NAME] jenkins/jenkins [flags]
It will create all the k8s resources needed for it to work. The cool thing about this is that you can customize it a bit by passing values to certain variables of the package. Say you want to change the admin user; it varies per package of course, but here you can do something like:
helm install my-jenkins jenkinsci/jenkins --version 4.6.4 \
--set controller.adminUser=boss --set controller.adminPassword=password \
-n jenkins --create-namespace
You can discover a list of all the values too.
helm show values jenkinsci/jenkins
helm list
helm repo update; # so we have the most up-to-date version
helm upgrade my-jenkins jenkinsci/jenkins --version 5.8.26
helm uninstall my-jenkins
Kustomize allows you to manage multiple k8s manifests in an easy way. It has different capabilities:
You can build `configmaps` and other resources out of files.
You can patch different values, say the DNS for an application based on different overlays/environments.
Just a few quick things: the heart of this is the `kustomization.yaml` file; there you will list all the resources kustomize will use to render the templates.
You can also render how the manifests would look, without applying them, with:
kustomize build /path/to/kustomization.yaml # or
k kustomize /path/to/kustomization.yaml
Here is a short example of how you can start using this, say to add the same namespace to two different manifests.
% tail -n +1 kustomization.yaml pod.yaml configmap.yaml
== kustomization.yaml ==
namespace: kustom
resources:
- pod.yaml
- configmap.yaml
== pod.yaml ==
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: nginx
name: nginx
spec:
containers:
- image: nginx:1.21.1
name: nginx
resources: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
status: {}
== configmap.yaml ==
apiVersion: v1
data:
dir: /etc/logs/traffic-log.txt #/etc/logs/traffic.log
kind: ConfigMap
metadata:
creationTimestamp: null
name: logs-config
A workload is an application running inside a k8s cluster. Whether your application has different components running or just one, you will run it inside a set of Pods. A Pod is nothing more than a set of containers.
Pods have a defined life-cycle, meaning if you kill one or it dies due to some issue, it is not going to respawn by itself or anything.
To make life easier, k8s has a set of different controllers that will help manage these pods. Say, always keeping 3 of them alive: even if one is killed, spin up another one to take its place. You can use workload resources to make this happen. The workload resources will configure these controllers depending on what you want to do. We will go more in depth on each, but here is a brief intro to each of them.
Deployment and ReplicaSet: these are good for managing workloads where pods are replaceable/interchangeable, i.e. stateless applications.
StatefulSet: this will help you run applications where pods do keep track of state. Useful when mounting Persistent Volumes to different pods, so they stay consistent.
DaemonSet: pods that provide some functionality to Nodes, maybe for networking, or to manage the node. These are like daemons that will be assigned to each Node.
Job and CronJob: define tasks that run until completion and then stop.
A pod is like a set of containers with shared namespaces and shared file systems. You can run just one or multiple containers in one Pod.
Pods are considered ephemeral: pods are created, assigned a unique ID (UID), and scheduled to run on nodes where they will live until their termination. If a node dies, the pods that lived there, or were scheduled to live there, are marked for deletion.
While a pod is running, the `kubelet` can restart its containers to handle some kinds of faults.
Pods are only scheduled once in their lifetime; assigning a pod to a node is called binding, and the process of selecting which node the pod should go to is known as scheduling. Once a pod is scheduled to a node they are bound until either of them dies.
A pod is never "re-scheduled", it is simply killed and replaced by maybe a super similar one but the UID will be different.
There are several pod phases:
Phase | Description |
---|---|
Pending | The Pod has been accepted by k8s, but one or more containers are not ready to run. This means it might be waiting for scheduling or downloading an image from a registry. |
Running | The Pod has been bound to a Node, all the containers have been created. At least one of them is running, or in the process of starting/restarting. |
Succeeded | All containers in the Pod have been terminated in success. |
Failed | All containers in the Pod have been terminated, and at least one terminated in failure. |
Unknown | We could not get the state of the pod, usually due to an error communicating with the Node the pod is running on. |
CrashLoopBackOff and Terminating are not actually phases of a pod. Make sure not to confuse status with phase.
Similar to every living thing on this green Earth, a Pod will be presented with issues along its time in this world filled with thorns and thistles. Maybe, as us, even its own life will depend on how well it is able to solve them. This unnecessary biblical de-tour begs the question, how does it handle problems with containers?
The pod's `spec` has a `restartPolicy`. This will determine how k8s reacts to containers exiting due to errors.
Initial crash: k8s immediately tries to restart it based on the `restartPolicy`
Repeated crashes: if it keeps failing, it will add an exponential backoff delay to the next restarts
CrashLoopBackOff state: this status indicates the backoff delay mechanism is in effect
Backoff reset: if a container manages to stay alive for a certain duration of time, the backoff delay is reset
Troubleshooting is its own separate section, but here are some reasons a Pod might be `CrashLoopBackOff`ing:
Application errors are causing the container to exit
Configuration errors, missing files, or env vars
Resources, the container may not have enough memory or cpu to start
Healthchecks are failing if the application doesn't start serving in time.
How to debug this? Check the `logs` and `events`, ensure the configuration is set up properly, check resource limits, debug the application. Maybe even run the image locally and see if it works fine.
A `restartPolicy` can be `Never`, `Always`, or `OnFailure`.
A probe is a diagnostic periodically performed by the kubelet. There are three types: the `livenessProbe`, `readinessProbe` and `startupProbe`.
Pretty self explanatory; maybe the only thing to clarify is that the `startupProbe` indicates whether the app inside a container has started. All the other probes are disabled until this one succeeds. Usually this one is used for containers that take a long time to start.
And the `readinessProbe` indicates whether the container is ready to respond to requests.
There are 4 check mechanisms (a short example combining two of them follows this list):
`exec`: execute a command inside the container; successful if it returns `0`.
`grpc`: performs a remote procedure call using gRPC.
`httpGet`: makes an HTTP GET request against the pod IP on a given endpoint.
`tcpSocket`: performs a TCP check; considered successful if the port is open.
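A small sketch combining two of them; the /healthz path is just an assumption about the app:
spec:
  containers:
  - name: my-app
    image: nginx:latest
    livenessProbe:
      httpGet:
        path: /healthz             # hypothetical health endpoint
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      tcpSocket:
        port: 80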
An init container is one (or an array) of containers that will run before your main application containers. They run until completion, meaning they cannot live side by side with your main containers. Those are sidecars, which we will talk about later.
They run sequentially, and if one fails the `kubelet` will restart that init container until it succeeds. But if the pod's `restartPolicy` is set to `Never`, when an init container fails the whole pod is treated as failed.
They have all the fields and features of regular containers, they just do not have probes.
They are useful to setup different stuff in your application. Like set up things in volumes and stuff like that. Maybe download a file or something.
Here is an example from the docs where the init container waits for a svc in k8s to be up and running before starting this pod's containers.
apiVersion: v1
kind: Pod
metadata:
name: myapp-pod
labels:
app.kubernetes.io/name: MyApp
spec:
containers:
- name: myapp-container
image: busybox:1.28
command: ['sh', '-c', 'echo The app is running! && sleep 3600']
initContainers:
- name: init-myservice
image: busybox:1.28
command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
Sidecar Containers are containers that run side by side with the application container; they can be used for logging, data sync, monitoring, and so on. Typically you only have one application container per pod. For example, if you have a web app that requires a web server, you would have your web app in the app container and use a sidecar container as the web server.
The implementation is super simple: just add a `restartPolicy` to an init container and there you have it. The cool thing is that they will still be started sequentially, but the one with the `restartPolicy` will keep running.
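A minimal sketch, with a made-up log shipper as the sidecar:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app                   # the main application container
    image: nginx:latest
  initContainers:
  - name: log-shipper              # hypothetical sidecar
    image: busybox:1.28
    command: ['sh', '-c', 'while true; do echo shipping logs...; sleep 10; done']
    restartPolicy: Always          # this is what makes it a sidecar instead of a one-shot init container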
You can also use probes on these containers, as opposed to regular init containers.
When a pod dies, first the app containers are deleted, then the sidecar containers in the opposite order to which they were spawned.
You can have multiple containers as the main containers of the pod, so when to use this? The app containers are meant for executing primary application logic; that is why usually you just have one and use sidecars for anything else.
Your applications will run inside pods, and the k8s API offers different resources to help you manage them. Say a pod dies; you would not like to have to get up in the middle of the night and start it again. We can use k8s objects that will help us manage them.
A replicaset's purpose is to keep a stable set of replica pods running at any given time. You tell it how many you want and what the pod looks like, and it will remove/create pods to maintain this state.
In a replicaset's fields you specify a selector, which tells the replicaset how to identify pods it can acquire, and a number specifying how many pods it should be maintaining.
You are not likely to create a replicaset by itself; you usually create higher level resources like a Deployment, which then uses ReplicaSets.
A deployment manages a set of pods that run an application. You describe the desired state for the application, say, how many pods need to be available? and the controller will maintain that state.
There are some fields worth mentioning in a deployment's manifest. First, the `spec.replicas` field specifies the number of replicas, of course. We also have `spec.selector`, which tells the replicaset how to find the pods to manage; usually this one will match the labels set in the `spec.template` pod template.
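A minimal example showing how those fields relate (image and labels are arbitrary):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx                 # must match the labels in the pod template below
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.1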
Once you `k apply` the deployment, there are some commands that will come in handy:
`k get deploy` will give you overall information on the deployments; how many replicas have been created, how many are available, the age.
`k rollout status deployment/some-deployment` will print a message telling you how many replicas have been rolled out, and similar stuff.
`k get rs` will print the status of the replicasets.
If you update an image from the deployment (`k set image deployment/some-deploy nginx=nginx:latest`) and do `k get rs`, you will see that there are two replicasets now: the one with the previous image, which now marks its pods as 0, and the new one with the pods marked as `spec.replicas` says.
The deployment controller ensures that only a specific number of pods are down while being updated. By default it makes sure that at least 3/4 of the desired number of pods are up, meaning only 1/4 can be unavailable.
When updating, the controller will look for existing replicasets that match the `spec.selector` labels but whose pods do not match the current `spec.template`, and scale those down, while a new replicaset with the new `spec.template` is scaled up.
If you update a deployment but it is not going the way you wanted, you can easily go back to a previous version of your deployment. First you need to check the rollout history to choose which revision you will roll back to.
kubectl rollout history deployment/nginx-deployment
You can see more details on a revision by using the same command with the `--revision=n` flag.
If you decide to roll back you can do:
kubectl rollout undo deployment/nginx-deployment --to-revision=n
You can scale the replicas in a deployment with
kubectl scale deployment/nginx-deployment --replicas=10
If a Horizontal Pod Autoscaler is set up, you can scale based on CPU/memory usage:
kubectl autoscale deployment/nginx-deployment --min=10 --max=15 --cpu-percent=80
The `spec.strategy` field tells you the type of strategy used to replace old pods with new pods. It can be either Recreate or RollingUpdate. The latter is the default value.
If Recreate, all pods are killed before new ones are created.
If RollingUpdate, one replicaset is scaled down while a new one is scaled up. You can specify `maxUnavailable` and `maxSurge` to control this.
Max Unavailable: tells how many pods can be unavailable during the update process. If set to 30%, the deployment will scale down the old replicaset to 70% of its capacity, and will not scale it further down until the new pods in the new replicaset are ready, making sure that at least 70% of the pods are always available.
Max Surge: this specifies how much the number of pods can go over the desired count in the deployment. Say you have 10 pods running and set this to 3; when the update starts, the controller can scale the total number of pods up to 13. The higher this number, the faster the update, at the expense of using more resources.
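In the manifest this lives under spec.strategy; a sketch using the numbers from the examples above:
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 30%        # at least 70% of the desired pods stay available
      maxSurge: 3                # up to 13 pods may exist during the update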
These are similar to deployments, but they maintain a sticky identity for each of the pods they create.
You will use a StatefulSet if (see the sketch after this list):
You need stable network identities, meaning your pods keep the same name after (re)scheduling, as opposed to a random hash at the end of their name.
You want to specify a PVC per pod, and not have them fight for one pre-defined volume.
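A sketch of both ideas, with a hypothetical database image; volumeClaimTemplates gives each pod its own PVC (data-db-0, data-db-1, ...):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db                # headless service that gives the pods stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:16       # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi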
In a pod you can specify how much of a resource (RAM and CPU) a container needs.
You can set a request for a certain amount of resources, and the scheduler uses this information to decide which node to put the pod on. You can also set a limit on how many resources a container can use, and the `kubelet` will make sure the running container does not exceed those.
A pod may use more resources than it requested, as long as the node has enough of them there will be no issue.
Limits work differently though: they are enforced by the Linux kernel. For `cpu` they are hard limits; the kernel will restrict access to the CPU based on the limit by throttling it. For `memory` the kernel uses out-of-memory (OOM) kills. This does not mean that as soon as the container exceeds its memory limit it is killed; the kernel will only kill it if it detects memory pressure.
You specify `cpu` and `memory` requests/limits using specific units: Kubernetes CPU units and bytes respectively.
Usually you specify them at the container level (see the example after these field lists):
spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
But since `v1.32` you can also set them at the pod level:
spec.resources.limits.cpu
spec.resources.limits.memory
spec.resources.requests.cpu
spec.resources.requests.memory
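A container-level sketch using both kinds of units (m for millicores, Mi for mebibytes):
spec:
  containers:
  - name: my-app
    image: nginx:latest
    resources:
      requests:
        cpu: 250m               # a quarter of a CPU core
        memory: 64Mi
      limits:
        cpu: 500m
        memory: 128Mi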
You can limit the resources per namespace using a k8s resource called `ResourceQuota`. These are not limited to only `memory` and `cpu`; you can also limit the number of objects that can be created in a namespace, say only allow 10 pods or something like that.
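A sketch of one limiting compute and object counts in a hypothetical dev namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev                # hypothetical namespace
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "10"                  # object count limit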
Users need to specify the resource limits or requests on their workloads, otherwise the API may not give permission to create them. This can be a bit painful for developers, so you can define a `LimitRange` to set defaults on pods that do not explicitly set the requirements.
Important note on all these: they do not apply to running pods, only to new pods. So if you have some deployment and then set a LimitRange expecting the existing pods from that deployment to pick it up, you are wrong.
Here is an example of how they look:
apiVersion: v1
kind: LimitRange
metadata:
name: cpu-resource-constraint
spec:
limits:
- default: # this section defines default limits
cpu: 500m
defaultRequest: # this section defines default requests
cpu: 500m
max: # max and min define the limit range
cpu: "1"
min:
cpu: 100m
type: Container
One last thing: the LimitRange won't check if your limits make sense. If you specify a limit lower than your request, it will let you fail.
The k8s network model has several pieces:
Each pod has its own unique cluster-wide IP.
A pod has its own private network which is shared by all the containers running in the pod; they can talk to each other using `localhost`.
The pod network handles communication between pods. It makes sure pods can communicate with each other regardless of the node they are on. This also allows node daemons to talk to the pods living on the same node.
The Service api, provides a long lived IP address/hostname for a service implemented by pods. The pods can be replaced but the service will stay the same.
There is another object called `EndpointSlice` which provides information about the pods currently working for a service.
The Gateway API (or its predecessor, Ingress) allows you to make a `svc` accessible to clients outside the cluster.
NetworkPolicy allows you to control traffic between pods
You can specify how a pod is allowed to communicate with different entities over the network using Network Policies. They depend on the network plugin you use, but you can usually specify which namespaces, which pods, or which IP blocks are allowed to send and receive traffic to/from a pod.
If you go for the namespaces/pods NetworkPolicy, you will use a selector to tell what traffic is allowed.
If you go with the IP blocks you will define CIDR ranges.
Two things worth mentioning: a pod always allows traffic between itself and the node it runs on, and a pod cannot block access to itself.
There are two types of pod isolation, `egress` and `ingress`. They are declared independently.
`egress` tells us who the pod is allowed to send traffic to; meaning who it can speak to.
`ingress` tells us who the pod is allowed to receive traffic from; meaning who it can listen to.
By default a pod allows all outbound (egress) and inbound (ingress) connections. You can create NetworkPolicy resources whose selector matches a pod, and their rules apply to it; they are cumulative.
Here is an example from the docs
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: test-network-policy
namespace: default
spec:
podSelector:
matchLabels:
role: db
policyTypes:
- Ingress
- Egress
ingress:
- from:
- ipBlock:
cidr: 172.17.0.0/16
except:
- 172.17.1.0/24
- namespaceSelector:
matchLabels:
project: myproject
- podSelector:
matchLabels:
role: frontend
ports:
- protocol: TCP
port: 6379
egress:
- to:
- ipBlock:
cidr: 10.0.0.0/24
ports:
- protocol: TCP
port: 5978
A few fields worth mentioning:
`spec.podSelector`: selects the group of pods to which the policy will apply. An empty one will select all pods in the namespace.
`spec.policyTypes`: this may include Ingress, Egress, or both.
`spec.egress`: list of allowed egress rules. Has `to` and `ports` sections.
`spec.ingress`: list of allowed ingress rules. Has `from` and `ports` sections.
Be careful, these two configs are different.
ingress:
- from:
- namespaceSelector:
matchLabels:
user: alice
podSelector:
matchLabels:
role: client
Here we are accepting traffic from pods that are in a namespace with the label `user: alice` and that also have the label `role: client`.
ingress:
- from:
- namespaceSelector:
matchLabels:
user: alice
- podSelector:
matchLabels:
role: client
Here you are accepting traffic from pods that are in the namespace with that
label or from pods that have that label.
For `ipBlock`, the IP blocks you select must be cluster-external IPs, since Pod IPs are ephemeral.
One last thing worth mentioning: to target a namespace by name you will have to use the immutable label `kubernetes.io/metadata.name`.
Say you do a deployment to your cluster that serves as a backend for an application you want to access over the network. It can get tricky, because the pods of a replicaset are ephemeral and their IPs will change all the time; your front end cannot be expected to update the address every time something happens.
This is why we have services: they allow you to select a group of pods using label selectors (so if one is killed and respawned it will still be picked up) and assign an IP that won't change within your cluster.
This way, pods can be killed and respawned but the frontend only has to keep track of 1 address.
In the service definition you specify the selector and the ports you want to target.
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app.kubernetes.io/name: MyApp
ports:
- protocol: TCP
port: 80
targetPort: 9376
The pods selected by that label will form a resource called `EndpointSlices`, which basically does the mapping; the controller for that service will update the endpoints if the pod IPs change.
You can define names for the ports inside pods, which you can then use as a reference in the service (see the example below).
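A sketch reusing the example above, with the container port named and the service referring to it by name:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app.kubernetes.io/name: MyApp
spec:
  containers:
  - name: my-app
    image: nginx:latest
    ports:
    - name: http-web            # the named port
      containerPort: 9376
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app.kubernetes.io/name: MyApp
  ports:
  - protocol: TCP
    port: 80
    targetPort: http-web        # reference the port by name instead of number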
You can also create the service for a deployment with the `k expose` command.
`ClusterIP`: makes the service reachable from within the cluster.
`NodePort`: maps the service to a port on the node, which gives it outside access.
`LoadBalancer`: exposes the service externally using a load balancer. k8s does not offer a load balancing component; you will have to use a cloud provider or something.
`ExternalName`: maps the service to the `externalName` field, say `api.foo.example`; this sets up the cluster's DNS server to return a CNAME record with that hostname value.
The Gateway API is a set of k8s resources that provide traffic routing and make network services available. They are role-oriented, meaning each level of resource is supposed to be managed by a different persona: infra engineer, cluster admin, and developer. Here is a list of the 3 levels.
`GatewayClass`: these are managed by the infra engineer; they are similar to a `StorageClass` in that they are not limited to namespaces, they are cluster-scoped, and usually given by the cloud provider. This is how the cloud provider handles requests from the outside world.
They are as simple as this:
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: example-class
spec:
controllerName: example.com/gateway-controller
`Gateway`: these describe how traffic can be translated to Services within the cluster.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: example-gateway
spec:
gatewayClassName: example-class
listeners:
- name: http
protocol: HTTP
port: 80
This is basically saying: create a gateway using the class specified there, and listen on port `80`.
`HTTPRoute`: defines the behaviour of HTTP requests coming from the gateway listener.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: example-httproute
spec:
parentRefs:
- name: example-gateway
hostnames:
- "www.example.com"
rules:
- matches:
- path:
type: PathPrefix
value: /login
backendRefs:
- name: example-svc
port: 8080
Here we are telling the gateway we just created that if in the `Host:` header of the request you find `www.example.com` and the path is `/login`, then you should use `example-svc` on port `8080`.
An Ingress is an object that manages external access to a `svc`. Here you can define a hostname, TLS, among other things.
To make them work in your cluster you need to first have an Ingress Class; there are a few you can choose from, like the ingress-nginx controller.
The imperative way of creating one is actually a good way to understand them too. Look at the command:
k create ing website-api --rule='website.com/api=my-svc:8080'
The rule part tells the cluster that if a request comes in where the host is `website.com` and the path is `/api`, it should map it to the `svc` called `my-svc` on port `8080`.
Basically `host/path=service:port`.
In that example we are not specifying `tls`, but you can do it by pointing to a secret of that type.
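A sketch of the same rule with TLS added, assuming a kubernetes.io/tls secret called website-tls already exists:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: website-api
spec:
  tls:
  - hosts:
    - website.com
    secretName: website-tls     # secret holding the certificate and key
  rules:
  - host: website.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: my-svc
            port:
              number: 8080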
There are some path types, like whether the `/api` you specified should be an exact match (`Exact`) or can be a prefix (`Prefix`), and stuff like that.
You can talk with pods and services within the cluster using its DNS. It is as simple as following this structure (where type is svc for services or pod for pods):
name.namespace.type.cluster.local
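A quick way to test this, assuming a service called my-service exists in the default namespace:
k run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup my-service.default.svc.cluster.local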
In order to do this, k8s runs a DNS server implementation called CoreDNS. If you get the pods from `kube-system` you will be able to see the pod that is running it. The config is in a `cm` called coredns in the same namespace.