Docker and Kubernetes Security/Chapter 1

Chapter 1

Introduction to Containers and Container Security

Introduction to Kubernetes

Following Docker's impact on the software industry, Google open-sourced Kubernetes in 2014. Kubernetes is a container orchestration platform, designed for running and managing containers at scale. Kubernetes makes deployment of a containerized application to a cluster of machines easy. It also has integrated support for service discovery, load balancing, and scaling.

Kubernetes is a Greek word that means helmsman or pilot, helm being a ship's steering wheel. Speaking of helms, there is another tool usually used with Kubernetes called Helm. Helm makes writing Kubernetes manifests more DRY and reusable, by separating the common parts of the manifests into templates. As you can imagine, the Kubernetes ship can get really wet at times.

Some of the benefits of using Kubernetes are:

Automation: Kubernetes makes deploying an application to a cluster of machines easy. So, instead of SSHing into each machine, getting the latest version of the code (e.g. with git pull), building the application, and running it, you can just run a single command and Kubernetes will take care of the rest.
High Availability: Kubernetes can run multiple instances of the same application. If one of the instances fails, Kubernetes will restart it. Servers can fail, containers will die, but you can't kill the helmsman that easily!
Out of the Box: Kubernetes comes with a lot of features out of the box. It has support for service discovery, load balancing, and scaling. You can also extend it with plugins and custom resources.

Now that we've learned how awesome Kubernetes is as a whole, let's learn about its pieces.

Kubernetes Terminology

In this part, we're going to discuss the different building blocks of a Kubernetes application. As Kubernetes is already overwhelming for beginners, we're going to skip details about how it operates.

Let's start with a list of basic Kubernetes concepts:

Pod: A pod is the smallest unit of deployment in Kubernetes. It's usually used to run a single container. It's also used to run multiple containers that are tightly coupled and need to share resources.
Service: A service is an abstraction that defines a logical set of pods and a policy to access them. It gives those pods a stable address. Depending on its type, that address is reachable only from inside the cluster or from outside as well.
Deployment: A deployment is a declarative way to manage pods. It's used to manage pods that are part of the same application, and to scale them up and down. If you create a deployment with 3 replicas, Kubernetes will make sure that there are always 3 pods running.
Namespace: A namespace is a way to divide cluster resources between multiple users. It's used to separate different environments like production and staging.
Ingress: An ingress is an API object that manages external access to the services in a cluster, typically over HTTP or HTTPS. It's used to expose a service to the outside world.

These terms sound overwhelmingly complicated, because they are. But they'll make more sense when we start using them. So, let's do it!

Getting Started with Kubernetes

A Kubernetes cluster usually consists of multiple machines. There are two types of machines in a Kubernetes cluster:

Master node (also called a control plane node): It's a machine that runs the Kubernetes control plane. It's responsible for managing the cluster.
Worker node: It's a machine that runs the workloads. It's responsible for running the pods and their containers.

If one of the machines fails, the other machines of the same type can take over its responsibilities. For example, pods from a failed worker node can be rescheduled onto the remaining worker nodes. That's what makes a multi-machine cluster highly available.

To test Kubernetes locally (on a single machine), there are the following options:

Docker Desktop: This tool is a cross-platform GUI for Docker. It comes with a single-node Kubernetes cluster that you can enable in its settings.
Minikube: It's a tool that runs a local Kubernetes cluster (a single node by default) inside a virtual machine or a container. It's CLI-based and can be used on Linux, macOS, and Windows.
Kind: Stands for Kubernetes in Docker. It's a tool that runs a Kubernetes cluster whose nodes are Docker containers. By default, it creates a single node.

At the time of writing, there are other options to run Kubernetes locally as well, like MicroK8s, K3s, and K0s.

And of course there is the option of running Kubernetes on a cloud provider like AWS, GCP, or Azure. Or perhaps smaller cloud providers like DigitalOcean or Linode, for a more cost-effective solution.
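None of the local tools above starts a cluster on its own, so here is a minimal sketch of bringing one up with Minikube or Kind. It assumes the chosen tool is already installed. The LOCAL_TOOL and START_CMD variables are introduced here only for illustration and are not part of either tool.

```shell
# LOCAL_TOOL is a hypothetical variable used only to pick one of the two tools.
LOCAL_TOOL="${LOCAL_TOOL:-minikube}"

case "$LOCAL_TOOL" in
  minikube)
    # Starts a local cluster and adds a "minikube" context to kubectl.
    START_CMD="minikube start"
    ;;
  kind)
    # Creates a cluster named "kind"; its kubectl context is "kind-kind".
    START_CMD="kind create cluster"
    ;;
  *)
    echo "Unknown tool: $LOCAL_TOOL" >&2
    START_CMD=""
    ;;
esac

# Run the command only if the chosen tool is installed on this machine.
if [ -n "$START_CMD" ] && command -v "$LOCAL_TOOL" >/dev/null 2>&1; then
  $START_CMD
  # Confirms that kubectl can reach the new cluster's API server.
  kubectl cluster-info
else
  echo "Would run: $START_CMD"
fi
```

Setting LOCAL_TOOL=kind before running the script switches it to Kind. Docker Desktop needs no command, since its cluster is managed from the GUI.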

Now that we have a Kubernetes cluster running, let's get our hands dirty with some Kubernetes commands.

Kubernetes Hello World

Before starting with the example, we need to have kubectl installed. We mentioned how to install it in the technical requirements section.

kubectl can connect to more than one Kubernetes cluster, so it has a concept of contexts. To connect to a different cluster, we switch to its context. We can list the available contexts with the following command:

$ kubectl config get-contexts

The output on my machine is something like this:

CURRENT   NAME             CLUSTER              AUTHINFO
          docker-desktop   docker-desktop       docker-desktop
*         minikube         minikube             minikube

Let's switch to the docker-desktop context:

$ kubectl config use-context docker-desktop

If you want to use a different cluster, you can of course use the context of your choice. Now let's create a pod with the kubectl run command:

$ kubectl run hello-world --image=hello-world

The output should be something like this:

pod/hello-world created

We can list the pods with the kubectl get pods command:

$ kubectl get pods

The output should be something like this:

NAME          READY   STATUS             RESTARTS      AGE
hello-world   0/1     CrashLoopBackOff   3 (26s ago)   72s

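The pod is in the CrashLoopBackOff state. This is expected, as the hello-world image doesn't have a process that keeps running. It just prints a message and exits. Pods created with kubectl run are restarted by default, so Kubernetes keeps restarting the container and waits longer between attempts each time. We can get more information about the pod with the kubectl describe pod command: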

$ kubectl describe pod hello-world

The output should be something like this:

Events:
  Type     Reason     Age     Message
  ----     ------     ----    -------
  Normal   Scheduled  37s     Successfully assigned hello-world to docker-desktop
  Normal   Pulled     34s     Successfully pulled image "hello-world" in 1.7959s
  Normal   Pulled     33s     Successfully pulled image "hello-world" in 1.2793s
  Normal   Pulling    18s     Pulling image "hello-world"
  Normal   Created    17s     Created container hello-world
  Normal   Started    17s     Started container hello-world
  Normal   Pulled     17s     Successfully pulled image "hello-world" in 1.1725s
  Warning  BackOff    5s      Back-off restarting failed container

We can look at the logs of the pod with the kubectl logs command:

$ kubectl logs hello-world

The output should be something like this:

Hello from Docker!

The rest of the output is omitted, as it's the same output we got when we ran the docker run hello-world command.
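The restart loop is a consequence of the pod's restart policy, not of the image. As a sketch, a run-to-completion pod can be created by passing --restart=Never to kubectl run. The pod name hello-once is an assumption chosen so that it doesn't clash with the hello-world pod.

```shell
# Assumption: "hello-once" is an illustrative pod name, not one used elsewhere in this chapter.
POD_NAME="hello-once"

# Guarded so that the script does nothing on a machine without kubectl.
if command -v kubectl >/dev/null 2>&1; then
  # --restart=Never sets the pod's restart policy to Never,
  # so the container is not restarted after it exits.
  kubectl run "$POD_NAME" --image=hello-world --restart=Never

  # Once the container has exited successfully, the STATUS column
  # shows Completed instead of CrashLoopBackOff.
  kubectl get pod "$POD_NAME"

  # Clean up the extra pod.
  kubectl delete pod "$POD_NAME"
fi
```

If kubectl get pod is run right after creation, the status may still be ContainerCreating. Running it again a few seconds later should show the final state.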

Now let's delete the pod with the kubectl delete pod command:

$ kubectl delete pod hello-world

In real-world applications, we usually don't create pods directly. We use deployments instead. And to create deployments, we use YAML manifests. Going a step further, we often don't write deployment manifests by hand either. We use Helm charts instead, which are themselves built from YAML templates.
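To make the manifest idea concrete, here is a sketch of a deployment manifest, written to a file from the shell. The file name hello-world-deployment.yaml is an assumption. The container runs a small busybox loop rather than the hello-world image, so that it keeps running. The script only validates the file with a client-side dry run and doesn't create anything in the cluster.

```shell
# Assumption: the file name is arbitrary.
MANIFEST="hello-world-deployment.yaml"

# Write a minimal deployment manifest with one replica.
cat > "$MANIFEST" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
  labels:
    app: hello-world
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-world        # must match the pod template labels below
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: busybox
          image: busybox
          command: ["/bin/sh", "-c", "while true; do echo Hello World; sleep 1; done"]
EOF

# Guarded so that the script only writes the file on a machine without kubectl.
if command -v kubectl >/dev/null 2>&1; then
  # --dry-run=client reports what would be applied without changing the cluster.
  kubectl apply --dry-run=client -f "$MANIFEST"
fi
```

Dropping the --dry-run=client flag would create the deployment for real.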

Now let's create a deployment with the kubectl create deployment command:

$ kubectl create deployment hello-world --image=busybox -- /bin/sh -c \
  "while true; do echo Hello World; sleep 1; done"

Then let's list all the Kubernetes resources with the kubectl get all command:

$ kubectl get all

The output should be something like this:

NAME                                READY   STATUS    RESTARTS   AGE
pod/hello-world-57d969cbb9-tkdg4    1/1     Running   0          9s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   214d

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hello-world    1/1     1            1           10s

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-57d969cbb9    1         1         1       10s

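You can see there is a pod, a deployment, a service, and a replicaset. The service named kubernetes was not created by us. It's the cluster's built-in service for the API server. Let's look into each one of them in more detail.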

Pod

You can think of a pod as a container. It's called a pod (and not a container) in Kubernetes terminology, because it could potentially run multiple containers. But in most cases, it runs only one container.

To get more information about the pod, we can use the kubectl describe pod command:

$ kubectl describe pod hello-world-57d969cbb9-tkdg4

You should use your own pod name here. The output should be something like this:

Name:             hello-world-57d969cbb9-tkdg4
Namespace:        default
Priority:         0
Service Account:  default
Node:             docker-desktop/192.168.65.4
Start Time:       Sun, 17 Sep 2023 23:24:17 +0200
Labels:           app=hello-world
                  pod-template-hash=57d969cbb9
Annotations:      <none>
Status:           Running
IP:               10.1.1.0
IPs:
  IP:           10.1.1.0
Controlled By:  ReplicaSet/hello-world-57d969cbb9
Containers:
  busybox:
    Container ID:  docker://b683e2c5613ceb50dfc3166bed0506394cc4dc8688
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:b683e2c5613cecc4dc8688
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      while true; do echo Hello World; sleep 1; done
    State:          Running
      Started:      Sun, 17 Sep 2023 23:24:19 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qtjkm

There is even more output on the terminal, but we copied only up to the Containers section. You can see that the pod has a container called busybox. It's the container we created with the kubectl create deployment command. You can also see its command. We also have the labels and annotations, which help us identify the pod. We could also have selected the pod by its labels instead of its name:

$ kubectl describe pod -l app=hello-world
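Label selectors work with other kubectl subcommands too, which is handy because pod names change whenever a pod is recreated. A short sketch, assuming the hello-world deployment from this section is running:

```shell
# The label that kubectl create deployment attached to the pods.
SELECTOR="app=hello-world"

# Guarded so that the script does nothing on a machine without kubectl.
if command -v kubectl >/dev/null 2>&1; then
  # List the matching pods and print their labels in an extra column.
  kubectl get pods -l "$SELECTOR" --show-labels

  # Show the last five log lines of the matching pods.
  kubectl logs -l "$SELECTOR" --tail=5
fi
```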

Deployment

Deployments are ways to manage pods. They are declarative, so we don't need to create pods directly. In a deployment, we specify the number of replicas we want, and Kubernetes will make sure that there are always that many replicas running.

To get more information about the deployment, we can use the kubectl describe deployment command:

$ kubectl describe deployment hello-world

The output should be something like this:

Name:                   hello-world
Namespace:              default
CreationTimestamp:      Sun, 17 Sep 2023 23:24:17 +0200
Labels:                 app=hello-world
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=hello-world
Replicas:               1 desired | 1 updated | 1 total | 1 available
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=hello-world
  Containers:
   busybox:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      while true; do echo Hello World; sleep 1; done
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   hello-world-57d969cbb9 (1/1 replicas created)
Events:          <none>

The deployment has information like the number of replicas, the strategy for updating the replicas, and the pod template.
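To see the replica count in action, the deployment can be scaled up and down. This is a sketch that assumes the hello-world deployment from this section exists. It scales back to one replica at the end, so the outputs in the rest of the chapter still match.

```shell
DEPLOYMENT="hello-world"

# Guarded so that the script does nothing on a machine without kubectl.
if command -v kubectl >/dev/null 2>&1; then
  # Ask for three replicas instead of one.
  kubectl scale deployment "$DEPLOYMENT" --replicas=3

  # Wait until the new pods are rolled out.
  kubectl rollout status "deployment/$DEPLOYMENT"

  # There should now be three pods carrying the deployment's label.
  kubectl get pods -l "app=$DEPLOYMENT"

  # Scale back down to the original single replica.
  kubectl scale deployment "$DEPLOYMENT" --replicas=1
fi
```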

Service

Services are logical objects that make pods reachable over the network, from inside the cluster and, depending on the service type, from outside it. I say "logical" because services don't have a physical representation. They are just some rules that Kubernetes uses to route traffic to the pods.

To get more information about the service, we can use the kubectl describe service command:

$ kubectl describe service kubernetes

The output should be something like this:

Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.0.1
IPs:               10.96.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.65.4:6443
Session Affinity:  None
Events:            <none>

The service contains details such as the IP address, port, and endpoints. The endpoints are the addresses the service routes traffic to. Usually these are pods, but for the built-in kubernetes service it's the API server. This service has the type ClusterIP, which is the default type for services. It means that the service is only accessible from inside the cluster. We can change the type to NodePort to make the service accessible from outside the cluster.
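For comparison, here is a sketch of what a NodePort service for the hello-world pods could look like as a manifest. Everything specific in it is an assumption made for illustration: the file name, the service name, and the port numbers 80 and 8080. The busybox loop doesn't listen on any port, so this service would have nothing to answer requests. The point is only the shape of the manifest and how the selector ties the service to pods by label.

```shell
# Assumption: the file name is arbitrary.
SERVICE_MANIFEST="hello-world-service.yaml"

cat > "$SERVICE_MANIFEST" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  type: NodePort              # the default would be ClusterIP
  selector:
    app: hello-world          # routes to pods carrying this label
  ports:
    - port: 80                # hypothetical port of the service itself
      targetPort: 8080        # hypothetical port the container would listen on
EOF

# Guarded so that the script only writes the file on a machine without kubectl.
if command -v kubectl >/dev/null 2>&1; then
  # Validate the manifest without creating the service.
  kubectl apply --dry-run=client -f "$SERVICE_MANIFEST"
fi
```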

Replicaset

A replicaset keeps a given number of identical pods running. We rarely create replicasets directly. Deployments create and manage them, and add rolling updates and rollbacks on top.

To get more information about the replicaset, we can use the kubectl describe replicaset command:

$ kubectl describe replicaset hello-world-57d969cbb9

The output should be something like this:

Name:           hello-world-57d969cbb9
Namespace:      default
Selector:       app=hello-world,pod-template-hash=57d969cbb9
Labels:         app=hello-world
                pod-template-hash=57d969cbb9
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/hello-world
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=hello-world
           pod-template-hash=57d969cbb9
  Containers:
   busybox:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      while true; do echo Hello World; sleep 1; done
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:           <none>

The replicaset has information like the number of replicas, the pod template, and the deployment that manages it.
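A quick way to watch the replicaset do its job is to delete its pod and list the pods again. This sketch assumes the hello-world deployment from this section is running with one replica.

```shell
SELECTOR="app=hello-world"

# Guarded so that the script does nothing on a machine without kubectl.
if command -v kubectl >/dev/null 2>&1; then
  # Note the name of the current pod.
  kubectl get pods -l "$SELECTOR"

  # Delete it. The replicaset notices that a replica is missing.
  kubectl delete pod -l "$SELECTOR"

  # A replacement pod with a different random suffix should be listed.
  kubectl get pods -l "$SELECTOR"
fi
```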

We can say that, of the four resources we looked at, only the pods correspond to running processes. The rest are logical objects that describe how pods should be created and reached.

Kubernetes Hello World with Helm

A common way of packaging Kubernetes applications is using Helm charts. Helm charts are packages of templated Kubernetes manifests.

First, make sure helm is installed on your machine. Then let's clone the following git repository that contains a link shortener application:

$ git clone https://github.com/aerabi/link-shortener-js.git

Then let's go to the link-shortener-js directory:

$ cd link-shortener-js

At the root of the repository, there is a directory called chart. Let's take a look at it:

$ tree chart

The output should be something like this:

chart
    Chart.yaml
    templates
        _helpers.tpl
        deployment.yaml
        ingress.yaml
        service.yaml
    values.yaml

2 directories, 6 files
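Before installing a chart, it can help to see the manifests it produces. The following sketch checks the chart and renders its templates locally, without touching the cluster. It assumes it's run from the root of the cloned repository.

```shell
RELEASE="link-shortener-js"
CHART_DIR="chart"

# Guarded so that the script does nothing without helm or outside the repository.
if command -v helm >/dev/null 2>&1 && [ -d "$CHART_DIR" ]; then
  # Check the chart for structural problems.
  helm lint "$CHART_DIR"

  # Render the templates with the values from values.yaml and print the resulting manifests.
  helm template "$RELEASE" "$CHART_DIR" --namespace link-shortener
fi
```

The rendered output should contain the same kinds of resources we created by hand earlier, such as a deployment and a service.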

Let's run the following commands to create the link-shortener namespace and install the application in it:

$ kubectl create namespace link-shortener
$ helm install link-shortener-js chart --namespace link-shortener

The output should be something like this:

NAME: link-shortener-js
LAST DEPLOYED: Sun Sep 24 11:55:14 2023
NAMESPACE: link-shortener
STATUS: deployed
REVISION: 1
TEST SUITE: None

Let's list everything in the link-shortener namespace:

$ kubectl get all -n link-shortener

The output should be something like this:

NAME                                    READY   STATUS    RESTARTS   AGE
pod/link-shortener-js-5458ff4bb-nsrx2   1/1     Running   0          2m17s

NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/link-shortener-js   ClusterIP   10.102.219.230   <none>        3000/TCP   2m17s

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/link-shortener-js   1/1     1            1           2m17s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/link-shortener-js-5458ff4bb   1         1         1       2m17s

Let's describe the service:

$ kubectl describe service link-shortener-js -n link-shortener

The output should be something like this:

Name:              link-shortener-js
Namespace:         link-shortener
Labels:            app.kubernetes.io/instance=link-shortener-js
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=link-shortener-js
                   app.kubernetes.io/version=hashmap
                   helm.sh/chart=link-shortener-js-0.1.0
Annotations:       meta.helm.sh/release-name: link-shortener-js
                   meta.helm.sh/release-namespace: link-shortener
Selector:          app.kubernetes.io/instance=link-shortener-js,
                   app.kubernetes.io/name=link-shortener-js
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.102.219.230
IPs:               10.102.219.230
Port:              api  3000/TCP
TargetPort:        3000/TCP
Endpoints:         10.1.1.1:3000
Session Affinity:  None
Events:            <none>

As you can see, the service has the type ClusterIP, so it's only accessible from inside the cluster. To make it accessible from outside the cluster, we need to change the service type to NodePort. Open the chart/values.yaml file and change the service.type value to NodePort:

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

service:
  type: NodePort # <-- Change this line

ports:
  internal:
    name: "api"
    number: 3000

Then run the following command to upgrade the application:

$ helm upgrade link-shortener-js chart --namespace link-shortener

The output should be something like this:

Release "link-shortener-js" has been upgraded. Happy Helming!
NAME: link-shortener-js
LAST DEPLOYED: Sun Sep 24 12:24:18 2023
NAMESPACE: link-shortener
STATUS: deployed
REVISION: 2
TEST SUITE: None
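Editing values.yaml is not the only way to change a value. As a sketch, the same override could live in a separate file passed with -f, or be given inline with --set. The file name nodeport-values.yaml is an assumption. Both commands use --dry-run, so they only render the upgrade and don't create a new revision.

```shell
# Assumption: the file name is arbitrary.
OVERRIDES="nodeport-values.yaml"

# Only the values that differ from chart/values.yaml need to be listed.
cat > "$OVERRIDES" <<'EOF'
service:
  type: NodePort
EOF

# Guarded so that the script only writes the file without helm or outside the repository.
if command -v helm >/dev/null 2>&1 && [ -d chart ]; then
  # Override from a file.
  helm upgrade link-shortener-js chart --namespace link-shortener \
    -f "$OVERRIDES" --dry-run

  # The same override given inline.
  helm upgrade link-shortener-js chart --namespace link-shortener \
    --set service.type=NodePort --dry-run
fi
```

Keeping overrides out of values.yaml leaves the chart's defaults untouched, which is useful when the same chart is installed with different settings per environment.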

Let's list everything in the link-shortener namespace again:

$ kubectl get all -n link-shortener

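In the output, the service is now of type NodePort, and its PORT(S) column reads something like 3000:32045/TCP. That means port 3000 is mapped to the node port 32045. The node port is assigned by Kubernetes, so yours will likely be different. You can now access the service on that port: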

$ curl localhost:32045

The output should be the following:

Hello World!
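Rather than reading the node port off the table, it can be looked up with a JSONPath query. A sketch, assuming the service has a single port and the cluster is reachable on localhost, as in the example above:

```shell
SERVICE="link-shortener-js"
NAMESPACE="link-shortener"

# Guarded so that the script does nothing on a machine without kubectl.
if command -v kubectl >/dev/null 2>&1; then
  # Extract the node port of the service's first (and only) port.
  NODE_PORT=$(kubectl get service "$SERVICE" -n "$NAMESPACE" \
    -o jsonpath='{.spec.ports[0].nodePort}')
  echo "Node port: $NODE_PORT"

  # Call the application through the node port.
  curl "localhost:${NODE_PORT}"
fi
```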

Voilà! You have a Kubernetes application running. The next stop is container security.

Exercises

In the example above, investigate the Chart.yaml file. What's the name of the chart? What's the version of the chart?
In the example above, investigate the templates/deployment.yaml file. What's the name of the deployment? What's the name of the pod template? What's the name of the container? To find out, you should investigate the _helpers.tpl file as well. The _helpers.tpl file is a template that is used by the other templates. It's usually used to define reusable parts of the templates.
The templates/deployment.yaml file has a replicas field. The value is: {{ .Values.replicaCount }} which means it reads the value from the values.yaml file. What's the value of the replicaCount in the values.yaml file? Change the value to 2 and upgrade the application with the helm upgrade command. What would you expect to be the output of the kubectl get all command?