It is essential that you become one with `kubectl`, `kubeadm`, and `etcdctl`
When deleting objects you may not want to wait for a graceful shutdown; in that case you can use `--force` to kill the object immediately.
Typical tasks you would expect a k8s admin to know: understanding the architectural components, setting up a cluster from scratch, and maintaining a cluster going forward.
To control who can access what on the cluster we need to establish certain policies
RBAC defines policies for users, groups, and processes by allowing (or disallowing) them to manage certain API resources.
RBAC has 3 building blocks: subjects (users, groups, service accounts), API resources, and operations (verbs).
Users and groups are not stored in `etcd` (the k8s database); they are meant for processes running outside of the cluster. On the other hand, service accounts are k8s objects and are used by processes running inside the cluster.
As stated before, there is no k8s object for a user; instead, the job of creating the credentials and distributing them to users is done externally by the admin.
There are different ways k8s can authenticate a user:
You won't have to create a user in the exam, but here are the steps for creating one using the OpenSSL method.
Go to the k8s control plane node and create a temporary dir that will hold the keys.
mkdir cert && cd cert
Create a private key using openssl (username.key)
openssl genrsa -out johndoe.key 2048
Create a certificate signing request (a .csr file) with the key from the previous step
openssl req -new -key johndoe.key -out johndoe.csr \
  -subj "/CN=johndoe/O=cka-study-guide"
Sign the .csr with the k8s CA (it is usually under /etc/kubernetes/pki)
openssl x509 -req -in johndoe.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out johndoe.crt -days 364
Add it to your kubeconfig file
kubectl config set-credentials johndoe \
  --client-certificate=johndoe.crt --client-key=johndoe.key
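To actually use these credentials you would typically also create and switch to a context. The cluster name `kubernetes` below is just an assumption (it is the default name kubeadm uses; check yours with `kubectl config get-clusters`):

```bash
# pair the new user with the cluster in a context (names are illustrative)
kubectl config set-context johndoe-context --cluster=kubernetes --user=johndoe
# switch to it
kubectl config use-context johndoe-context
```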
The users we created in the last paragraphs are meant to be used by humans. If a Pod or a service needs to authenticate against the k8s cluster, we need to create a service account.
A k8s cluster already comes with a `sa` called default. Any Pod that does not explicitly assign a service account uses the default service account.
It is super simple to create one with the imperative approach
$ k create sa build-bot
When creating a service account a secret holding the API token will be created too. The Secret and token names use the Service Account name as a prefix.
To assign a service account to a Pod you can do it imperatively:
$ k run build-observer --image=alpine --restart=Never --serviceaccount=build-bot
Or add it to the manifest under `spec.serviceAccountName`.
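A minimal sketch of what that looks like in a manifest (the image and names just mirror the imperative example above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: build-observer
spec:
  serviceAccountName: build-bot   # instead of the default service account
  containers:
  - name: build-observer
    image: alpine
    command: ["sleep", "3600"]
```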
One more important thing.
If you want to make a call to the k8s API from within a Pod using your service account, you will need to create a token and then make the requests using the internal DNS name.
k exec -it mypod -- /bin/bash
# curl -k -H "Authorization: Bearer the_token_you_just_got" https://kubernetes.default.svc/api/v1/namespaces
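If you would rather not paste the token by hand, the service account token and CA bundle are mounted into the container automatically (assuming the default token auto-mount has not been disabled), so from inside the Pod something like this should also work:

```bash
# inside the container: token and CA live under this well-known path
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $TOKEN" \
  https://kubernetes.default.svc/api/v1/namespaces
```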
We have these two primitives:
Role: declares the API resources and their operations, e.g. allow listing and deleting Pods.
RoleBinding: connects (binds) the Role to a subject.
There are some default roles:
cluster-admin: rw access to resources across all namespaces
admin: rw access to resources in a namespace, including Roles and RoleBindings
edit: rw access to resources in a namespace, except Roles and RoleBindings
view: ro access to resources in a namespace, except Roles, RoleBindings and Secrets
Imperative mode for roles.
k create role read-only --verb=list,get,watch --resource=pods,deployments,services
There is also a `--resource-name` flag, where you can restrict the rule to specific resource names (e.g. particular Pods).
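For example, to limit the role to one specific Pod (the Pod name `app-backend` is hypothetical):

```bash
k create role read-app-pod --verb=get --resource=pods --resource-name=app-backend
```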
Imperative mode for rolebindings.
$ k create rolebinding read-only-binding --role=read-only --user=johndoe
If you then do a `get` you won't see the subject; you need to render the details (e.g. with `-o yaml` or `describe`) to see it.
You can of course use `describe` and so on to check each of the primitives once created, but there is also this neat little command: `k auth can-i`. It gives specific info on a user's permissions.
$ kubectl auth can-i --list --as johndoe
$ kubectl auth can-i list pods --as johndoe
Yes
Roles and RoleBindings are namespace-specific. If we want to define the same thing for the whole cluster we need to use ClusterRole and ClusterRoleBinding. The configuration elements are the same.
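The imperative commands mirror the namespaced ones; a quick sketch reusing the example user and resources from above:

```bash
k create clusterrole read-only --verb=list,get,watch --resource=pods,deployments,services
k create clusterrolebinding read-only-binding --clusterrole=read-only --user=johndoe
```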
Finally, we can aggregate different roles with label selection. Say you have one role that lets you list Pods, and another that lets you delete Services.
You can aggregate them with an `aggregationRule`
Here is an example from Benjamin Muschko's O'Reilly book Certified Kubernetes Administrator (CKA) Study Guide.
YAML manifest for listing pods
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: list-pods
  namespace: rbac-example
  labels:
    rbac-pod-list: "true"
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
YAML manifest for deleting svc's
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: delete-services
  namespace: rbac-example
  labels:
    rbac-service-delete: "true"
rules:
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - delete
YAML manifest aggregating them.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pods-services-aggregation-rules
  namespace: rbac-example
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      rbac-pod-list: "true"
  - matchLabels:
      rbac-service-delete: "true"
rules: []
Common tasks an admin is expected to perform are:
For bootstrapping operations we use `kubeadm`. You will need to provision the underlying infra beforehand, using Ansible or Terraform.
For the exam you can expect `kubeadm` to be installed already.
To start a cluster, you basically need a container runtime (such as `containerd`) up and running on your master machine. Then you follow some steps:
sudo kubeadm init --pod-network-cidr 172.18.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubeadm join "some_ip" --token some_token \
  --discovery-token-ca-cert-hash some:cert
Finally we need to deploy a Container Network Interface (CNI) plugin so Pods can talk to each other. There are a lot of options: Flannel, Calico, and so on.
At the end of the day, whichever you choose, you will probably only have to run a `k apply -f` command.
A cluster with just one control plane node is risky. If that node dies, we can no longer talk to the k8s API, and neither can the workers.
This is where the concept of high-availability (HA) clusters comes in; they help with scalability and redundancy. Due to the complexity of setting them up, it is not likely you will have to perform the steps during the exam, but the idea is to have multiple control plane nodes. There are different architectures; here I will briefly explain some.
Stacked etcd topology: this involves creating two or more control plane nodes where etcd is located inside each control plane node. These talk to the workers via a load balancer.
External etcd topology: this involves creating two or more control plane nodes where etcd is located outside the control plane nodes. These also talk to the workers via a load balancer. Do note that for this you need an extra node per control plane, since we will have one etcd node per control plane node.
The main difference is that etcd lives outside the control plane nodes, so if a control plane node fails, etcd is still there.
Only upgrade from one minor version to the next higher one (e.g. 1.18 -> 1.19), or from a patch version to a higher one (e.g. 1.18.0 -> 1.18.3). Abstain from jumping multiple minor versions, to avoid unexpected side effects.
If you are managing a highly available cluster, you need to upgrade one control plane node at a time.
ssh to the control plane
get the current version; you can do `k get nodes`
upgrade kubeadm to the version you want with your package manager.
sudo apt-get install -y kubeadm=1.19.0-00
Check which versions are available to upgrade to
$ sudo kubeadm upgrade plan
...
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.18.20
[upgrade/versions] kubeadm version: v1.19.0
I0708 17:32:53.037895   17430 version.go:252] remote version is much newer: \
  v1.21.2; falling back to: stable-1.19
[upgrade/versions] Latest stable version: v1.19.12
[upgrade/versions] Latest version in the v1.18 series: v1.18.20
...
You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.19.12

Note: Before you can perform this upgrade, you have to update kubeadm to v1.19.12.
...
upgrade it `sudo kubeadm upgrade apply v1.19.0`
we need to drain the node.
if the concept of draining is new to you, you are not alone; it means the given node is marked unschedulable (to prevent new Pods from being scheduled on it) and the existing Pods are evicted.
kubectl drain kube-control-plane --ignore-daemonsets
Upgrade kubelet and kubectl to the same version. Again, using your package manager.
Restart and reload the `kubelet` process using systemctl
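On an apt-based system, those last two steps might look like this (the version pin is just the one used in this example):

```bash
sudo apt-get install -y kubelet=1.19.0-00 kubectl=1.19.0-00
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```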
Mark node as schedulable again
`k uncordon kube-control-plane`
If you do `k get nodes` again now you should see that node with the new version.
ssh to the node
upgrade kubeadm using your package manager
upgrade the node using kubeadm
sudo kubeadm upgrade node
we need to drain (make unschedulable) the node.
kubectl drain worker-node --ignore-daemonsets
Upgrade kubelet and kubectl to the same version.
Restart and reload the `kubelet` process using systemctl
Mark node as schedulable again
`k uncordon worker-node`
If you do `k get nodes` again now you should see that node with the new version.
`etcd` is the distributed key-value store where k8s keeps both the declared and observed state of the cluster.
It is important to have a backup in case something goes wrong. The backup process should happen periodically and at short intervals.
We will use the `etcdctl` tool to do this:
ssh into the control plane
check that you have an etcdctl version higher than 3.4 installed
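A quick way to check the installed version:

```bash
ETCDCTL_API=3 etcdctl version
```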
Get and describe the Pod that etcd is running in
`k get pods -n kube-system`
`k describe pod etcd-smth -n kube-system`
Look for the value under --listen-client-urls. We are also going to need the paths to the server.crt, server.key, and ca.crt files.
$ sudo ETCDCTL_API=3 etcdctl --cacert=<ca.crt> --cert=<server.crt> --key=<server.key> \
  snapshot save <target-file>
It would look something like this:
sudo ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /opt/etcd-backup.db
Store the backup in a safe place
ssh into the system
run the `snapshot restore` command
sudo ETCDCTL_API=3 etcdctl --data-dir=/var/lib/from-backup \
  snapshot restore /opt/etcd-backup.db
Now we need to change the etcd manifest to point to our backup.
The data directory is mounted as a volume named `etcd-data`.
These manifests live under /etc/kubernetes/manifests. Because etcd runs as a static Pod, the kubelet will automatically re-create it pointing to the backup directory once the manifest changes.
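Roughly, the relevant bit of /etc/kubernetes/manifests/etcd.yaml to edit looks like this (a sketch; your generated file may differ slightly):

```yaml
  volumes:
  - hostPath:
      path: /var/lib/from-backup    # was /var/lib/etcd before the restore
      type: DirectoryOrCreate
    name: etcd-data
```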
Know how to define RBAC rules
Know how to create and manage a k8s cluster
Practice backing up and restoring etcd
Since we will have node logs, Pod logs, and container logs, it makes sense to think about how we will handle these streams. In a cluster, logs should have a separate storage and lifecycle, independent of nodes, Pods, or containers. This is called cluster-level logging.
One other thing worth mentioning is log rotation: the kubelet daemon is responsible for rotating container logs and managing the logging directory structure. It tells the container runtime where to write container logs.
You can configure `containerLogMaxSize` (max size of a log file) and `containerLogMaxFiles` (how many log files can exist per container).
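These two settings live in the kubelet's configuration file (KubeletConfiguration); a minimal sketch with arbitrary values:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi"   # rotate once a log file reaches this size
containerLogMaxFiles: 5       # keep at most this many files per container
```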
So for Cluster-Level logging we have these options:
Use a node-level logging agent
Create a DaemonSet that acts as an agent on each node. It looks inside the directory where the logs are stored and pushes them to a logging backend.
Include a dedicated sidecar container for handling logs in a Pod (see the sketch after this list)
Push the logs directly from the app to a back-end (you have to modify the app code itself)
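For the sidecar option, the usual pattern is a second container that tails a log file from a shared volume; a rough sketch (names, images, and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging-sidecar
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "while true; do date >> /var/log/app.log; sleep 5; done"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
  - name: log-sidecar
    image: busybox
    command: ["sh", "-c", "tail -F /var/log/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
  volumes:
  - name: logs
    emptyDir: {}
```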
Creating a Pod is pretty simple: as long as the YAML is right, k8s will try to create it. Nevertheless we need to verify the correct behaviour. The first thing is to check the high-level runtime information of the Pod.
Check the resources: the Deployment, the Pods, the status column, and the number of restarts.
If the number of restarts is greater than 0, then you might want to check the logic of the liveness probe. Identify why the restart was necessary.
Here are some of the common status errors that one might come across.
ImagePullBackOff or ErrImagePull: check that the image name is correct, check that the image exists in the registry, verify network access from the node to the registry, and check auth.
CrashLoopBackOff: the application or command run in the container crashes. Check the command being executed and make sure the container can be created at all, e.g. by running the image locally with podman or docker.
CreateContainerConfigError: the ConfigMap or Secret referenced by the container cannot be found. Double-check that the object exists in the namespace.
So a quick list of what to do (the corresponding commands are sketched after this list):
Check status
Check events
Inspect the logs of the pod
If nothing else works, exec into one of the Pod's containers and see if everything is behaving like it should
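In command form, with `mypod` standing in for whatever Pod you are debugging:

```bash
k get pods                   # check the STATUS column and restart count
k describe pod mypod         # events are listed at the bottom
k logs mypod                 # container logs (add -p for the previous, crashed container)
k exec -it mypod -- sh       # poke around inside if nothing else helps
```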
Ensure the label selector matches the assigned labels of the Pods
A really good command for checking this is `k get endpoints`. It tells you which Pods your Service is actually routing to.
Check the type of a Service.
The default is `ClusterIP`; if so, the Service is only reachable from within the cluster.
Check the port mapping: the Service's `targetPort` must be the same as the port exposed by the Pod's container.
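A sketch of what should line up (the port numbers and names here are made up):

```yaml
# Service spec (fragment)
spec:
  selector:
    app: web
  ports:
  - port: 80          # port the Service exposes
    targetPort: 8080  # must match the containerPort below
---
# Pod spec (fragment)
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 8080
```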