K8s is a container orchestration tool. Today a lot of systems are built as microservices: we can have a container per service, and managing all those containers by hand gets hard. K8s takes care of the scalability, security, persistence and load balancing for you.
When k8s is asked to create a container, it delegates the work to the container runtime engine via the CRI (Container Runtime Interface).
There are two types of nodes (these can be VMs, bare-metal machines, whatever you call a computer):

- Control plane nodes, which host the k8s API. Every time you run `kubectl something` you are talking to this API.
- Worker nodes, which run your actual workloads.

These have different components to do their job, and there are some components that are shared by the nodes, whether they are control plane or workers.
K8s has api resources which are the building blocks of the cluster. These are your pods, deployments, services, and so on.
Every k8s primitive follows a general structure:
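For most resources that means four top-level fields: `apiVersion`, `kind`, `metadata` and `spec`. A minimal sketch (the names here are just placeholders):

```yaml
apiVersion: v1        # which API group/version this resource belongs to
kind: Pod             # what type of resource this is
metadata:
  name: my-pod        # must be unique within the namespace for this resource
  namespace: default
spec:                 # the desired state; the fields here depend on the kind
  containers:
    - name: nginx
      image: nginx:latest
```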
You can run `k api-versions` to see the API versions compatible with your cluster.
This is how we talk to the API. Usually you do:
k <verb> <resource> <name>
Keep in mind we usually have different stuff in different namespaces, so we are always appending `-n <some-namespace>` to the command.
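For example, listing the pods that live in the `kube-system` namespace:

```sh
k get pods -n kube-system
```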
The name of an object has to be unique across all objects of the same resource within a namespace.
There are two ways to manage objects: the imperative or the declarative way.
The imperative way is where you use commands to make stuff happen in the cluster. Say you want to create an nginx pod; you would do:
k run nginx --image=nginx:latest --port=80
This would create the pod in the cluster when you hit enter. In my professional experience, you hardly ever create stuff like that. The only time I use it is to create temporary pods to test something.
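For example, a throwaway pod to poke around from inside the cluster might look like this (the name and image are just what I would pick):

```sh
# start an interactive busybox pod that gets deleted when the shell exits
k run tmp-shell --rm -it --image=busybox:1.36 --restart=Never -- sh
```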
There are other verbs which you might use a bit more. `edit` brings up the live config of the resource and you can change it on the fly, although I would recommend doing this only to test things. Hopefully your team has the manifests under a version control system, and if you edit stuff like this the live state drifts away from what is in the repo.
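For example, assuming a deployment called `nginx-deploy` exists, this opens its live config in your editor:

```sh
k edit deploy nginx-deploy
```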
There is also `patch`, which I have never used, but it will "update fields of a resource using strategic merge patch, a JSON merge patch, or a JSON patch."
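A sketch of what that looks like, say bumping the replicas of that same hypothetical `nginx-deploy` with a strategic merge patch:

```sh
# only the fields in the patch get changed; everything else stays as-is
k patch deploy nginx-deploy -p '{"spec": {"replicas": 3}}'
```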
There is also `delete` which, as you probably guessed already, deletes the resource. Usually the object gets a 30-second grace period to die, but if it does not, the kubelet will try to kill it forcefully.
If you do:
k delete pod nginx --now
It will ignore the grace period.
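And if even that is not enough, there is the truly nuclear variant, which tells the API server to drop the object without waiting for confirmation that it died:

```sh
k delete pod nginx --grace-period=0 --force
```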
The declarative way is where you have a bunch of `yaml`s which are your resource definitions. The cool thing about this is that you can version control them. Say you have an `nginx-deploy.yaml`. You can create it in the cluster with:
k apply -f nginx-deploy.yaml
This gives you more flexibility on what you are doing, since you can just go to the file, change stuff, and apply it again.
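For reference, a minimal sketch of what that `nginx-deploy.yaml` could contain (the replica count and labels are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  replicas: 2            # how many pod copies to keep running
  selector:
    matchLabels:
      app: nginx         # must match the pod template labels below
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
```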
Usually I use a hybrid approach: most of the imperative commands take this `--dry-run=client -o yaml` flag that you can append to the command, and instead of creating anything it will render the yaml manifest. You can redirect that to a file and start working on it. You open the yaml with your favourite text editor, and then mount volumes and stuff like that.
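Something like:

```sh
# render the manifest without creating anything, then save it for editing
k run nginx --image=nginx:latest --port=80 --dry-run=client -o yaml > nginx-pod.yaml
```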
There are more ways to manage the resources: for example you can use kustomize to render different values from the same manifest, or helm to bring complete apps/releases up in the cluster. We will probably go over them later in the book.
There are a million ways of doing this. I used Terraform to create some droplets in DigitalOcean, and Packer with Ansible to build an image that would leave everything ready for me to run the `kubeadm` commands.
`kubeadm` is the tool to create a cluster.
Here is a non-comprehensive list of what is needed before running `kubeadm` stuff (a sketch of the swap step follows the list):

- Open the ports needed for k8s to work
- Disable swap; otherwise the kubelet is going to fail to start
- Install a container runtime, like containerd
- Install kubeadm
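For example, disabling swap can be done like this (one common way of doing it; adapt to your distro):

```sh
# turn swap off until next boot
sudo swapoff -a
# comment the swap entry out of /etc/fstab so it stays off after reboots
sudo sed -i '/ swap / s/^/#/' /etc/fstab
```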
There are some things k8s does not have by default. You need to install these extensions as needed:

- Container Network Interface (CNI)
- Container Runtime Interface (CRI)
- Container Storage Interface (CSI)
Once you have `kubeadm` in your system everything else is pretty straightforward. You just ssh to your control plane and run:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
This runs some preflight checks to see if everything is working properly; if not, it will likely print a message telling you what is wrong. In my case it complained about `/proc/sys/net/ipv4/ip_forward` being disabled, but I was able to fix it by just doing `echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward`.
Where does the `cidr` come from? I had exactly the same question. It seems it depends on the CNI you will install (Flannel, for example, defaults to `10.244.0.0/16`), but do not quote me on that.
Once the command runs successfully, it will print the next steps:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join ip.control.plane.and:port --token some-cool-token \
--discovery-token-ca-cert-hash sha256:some-cool-hash
Just follow those steps. You ssh into the workers and join them with that command. If you lose the tokens for some reason, you can reprint them with:
kubeadm token create --print-join-command
Now, before joining the workers, you need to install the CNI. You can pick any of the ones on the k8s add-ons docs.
Installing them is nothing fancy: you literally just run a `k apply -f some-manifest` and are done with it. I went with Calico for no particular reason.
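In my case that was roughly this (a sketch; the manifest URL and version come from the Calico docs, so grab the current one from there):

```sh
k apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
```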
The control plane is the most important part of the cluster: if it fails, you are not even going to be able to talk to the API to do stuff. We can add redundancy to improve this, and this is where HA architectures come into play.
There are two:

- Stacked etcd topology
- External etcd topology
In the stacked topology you have at least three control plane nodes, each with its own etcd on the same node. All the nodes run at the same time and the workers talk to them through a load balancer; if one dies, we still have the others.
In the external topology we have two nodes per control plane: one that runs etcd and one that runs the actual control plane stuff. They communicate through the `kube-apiserver` API. This topology requires more nodes, and that means a bit more management overhead.
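For either topology, the `kubeadm init` changes a bit from the single-node case: you point every control plane at the load balancer. A sketch using the flags from the kubeadm HA docs (the endpoint is a placeholder):

```sh
# --control-plane-endpoint is the load balancer in front of the control planes;
# --upload-certs lets the other control plane nodes fetch the certs when joining
sudo kubeadm init \
  --control-plane-endpoint "lb.example.com:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16
```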