Introduction to Containers and Container Security
Introduction to Containers
In 2015, I joined a new company, hired as a backend engineer, and the main language there was Python. My onboarding to the project took forever as it was almost impossible to set up the dependencies on my machine. A colleague of mine suggested that I use Docker. It was 2 years old at the time and I had never heard of it. It took me 2 weeks to learn the new technology and fix my local setup. A month later, I was containerizing different projects at the company and making CI/CD pipelines for them.
Arguably the most important thing about containers is their isolation — they bundle everything an app needs to run, from OS packages to runtimes. If you have two different applications using clashing versions of Python, you can run them in two different containers and make them both happy. In that sense, containers are similar to Python virtual environments, but more sophisticated:
- Support for all packages: Python virtual environments only support Python packages, but containers support all kinds of packages, including OS packages. You can use containers to package Linux applications, C libraries, and even Java applications, all together.
- Shareable: You can share containers with others. Once you have packed your application and its dependencies into a container, you can share it with others. This package is called a Docker image. So, no more "it works on my machine" problems!
- Runnable: Another key difference is that you can — and probably should — run your application in a container. There are many mechanisms and tools for running your application in this isolated environment. When running a container, Docker orchestrates low-level components to manage networking, storage, and resource limits, and hides all that complexity from the developer. As Solomon Hykes mentioned in his talk, it simplifies container execution.
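To make the clashing-versions scenario concrete, here is a minimal sketch. It assumes two hypothetical apps pinned to different Python versions and runs each check in its own container from the official python images; the script skips itself when Docker isn't available:

```shell
#!/bin/sh
# Sketch: two apps with clashing Python requirements, one container each.
# Skips gracefully when Docker isn't installed or the daemon isn't running.
if ! docker info >/dev/null 2>&1; then
  echo "Docker not available; skipping demo"
  exit 0
fi

# Hypothetical "app A" is pinned to Python 3.8, "app B" to Python 3.12.
# The two interpreters live in separate containers and never clash.
docker run --rm python:3.8-slim  python --version
docker run --rm python:3.12-slim python --version
```

Each `docker run` here starts from a different image, so the two interpreters don't share a single site-packages directory the way they would on the host.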
Now that we've covered the concepts, let's try some Docker commands in practice. We'll first go through Docker basic commands quickly, and then dive deeper into what containers really are behind the curtains.
Docker 101
Let's go back to the command Solomon Hykes ran in his terminal.
$ docker run busybox /bin/echo hello world
This command has two parts. The first part is docker run busybox. It tells Docker to run a container based on the busybox image. The second part is /bin/echo hello world. It tells Docker to run the /bin/echo command inside the container and print hello world to the standard output.
This hello world example exits immediately after printing hello world. Let's run a container that doesn't exit:
$ docker run -d busybox /bin/sh -c \
"while true; do echo hello world; sleep 1; done"
This command writes to the standard output every second, and as we passed the -d flag, it runs in the background.
It's not a very useful piece of software to run, but it's a good example to show how containers work.
You can list the running containers with the docker ps command:
$ docker ps
You can look into the logs of the container with the docker logs command:
$ docker logs <container-id>
You can stop the container with the docker stop command:
$ docker stop <container-id>
So, what exactly are containers? Let's dive deeper and uncover the technical side.
Containers from a More Technical Perspective
Containers, in a nutshell, are isolated processes. They are isolated from the host by using Linux namespaces and control groups (abbreviated as cgroups). Namespaces isolate what a process can see, while cgroups limit the resources it can use. Cgroups are named this way because they are a way to group processes and control their resource usage.
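The cgroup side of this is exposed directly as `docker run` flags. A minimal sketch, using the busybox image purely as a tiny test workload (the script skips itself when Docker isn't available):

```shell
#!/bin/sh
# Sketch: cgroup-backed resource limits via plain docker run flags.
# Skips gracefully when Docker isn't installed or the daemon isn't running.
if ! docker info >/dev/null 2>&1; then
  echo "Docker not available; skipping demo"
  exit 0
fi

# The kernel (via cgroups), not the application, enforces these limits:
# at most 64 MB of memory and half a CPU core.
docker run --rm --memory=64m --cpus=0.5 busybox echo "running under limits"
```

If the process inside the container tried to exceed the memory limit, the kernel would kill it; the application itself needs no cooperation.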
Note. The word "namespace" generally means a space where names are defined and are unique. In C++, for example, you can define a namespace to avoid name clashes. There, you can have a function cout in your namespace, and it won't clash with the cout function from the standard library. This is the same in the Linux sense. If you have a namespace for process IDs, you can again have the same process ID in another namespace. This is why we call them namespaces.
The concept of namespaces in Linux was introduced in 2002 by the Linux kernel developers, to isolate the resources of processes. It was inspired by the operating system "Plan 9 from Bell Labs", which was developed in the 1980s, by the same people who developed Unix and C. Plan 9 was designed to have a namespace for each process.
There are different types of Linux namespaces. The ones Docker uses are the following:
- PID namespace: It isolates the process IDs. The process ID 1 in the container is not the same as the process ID 1 on the host.
- Network namespace: It isolates the network interfaces. As expected, the network interfaces in the container differ from those on the host.
- Mount namespace: It isolates the mount points. So, you have different mount points in the container than on the host.
- Unix Time Sharing (UTS) namespace: It isolates the hostname and the domain name. The hostname is, as you know, the name of the "computer": while your hostname is lisa-laptop, the container's hostname can be hello-java.
- Inter-Process Communication (IPC) namespace: It isolates the processes' communication resources. This is a bit more advanced, but it's used to isolate the shared memory between processes.
Hostname is the thing you see in the terminal, before the :. I'm on Ubuntu and my Bash prompt is mohammad-ali@StealthAMG:~/Dev/DKS/docker-security-book$. Here, StealthAMG is my hostname. You can also write hostname in your terminal to see your hostname.
Domain name is the name of the network domain. A domain name is a way to find a computer on a network. Some very popular domain names that most people know are "Internet domains" like google.com or wikipedia.org. But you can also have a domain name in your local network, like jackslaptop.local, so that next time you want to SSH into Jack's laptop, you can write ssh jackslaptop.local. This is very useful when, for example, you have a database server in your local network. In that case, instead of entering the IP address of your database server, you can use a domain name like db.local.
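Two of these namespaces are easy to observe with stock Docker flags. The sketch below sets a container hostname (UTS namespace) and lists the container's process tree (PID namespace); it skips itself when Docker isn't available:

```shell
#!/bin/sh
# Sketch: observing the UTS and PID namespaces of a container.
# Skips gracefully when Docker isn't installed or the daemon isn't running.
if ! docker info >/dev/null 2>&1; then
  echo "Docker not available; skipping demo"
  exit 0
fi

# UTS namespace: the container has its own hostname, independent of the host's.
uts=$(docker run --rm --hostname hello-java busybox hostname)
echo "container hostname: $uts"

# PID namespace: the container's process tree starts over at PID 1.
docker run --rm busybox ps
```

The `ps` output inside the container shows only the container's own processes, starting from PID 1, no matter how many processes the host is running.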
Note. There is also the concept of user namespaces on Linux that isolates the user and group IDs.
User namespaces are not used by Docker by default, but they can be enabled with the --userns-remap flag on the Docker daemon:
$ dockerd --userns-remap=default
The command here is dockerd, which we will get into in a bit.
The main reason Docker doesn't enable user namespaces by default is that it breaks some applications. For example, if you run a container with the --userns-remap=default flag, you won't be able to run the ping command inside the container.
As you can see, the containerized processes are isolated using Linux namespaces and cgroups, but they're still using the same Linux kernel as the host. That's the main difference between containers and virtual machines: virtual machines are isolated using hardware virtualization, while containers are isolated using kernel features. This difference makes containers much more lightweight than virtual machines. A container is usually smaller and has a much faster startup time than a virtual machine.
But this also means that an attacker could "potentially" break out of the container and access the host. This is a big security concern, and we'll cover it in the next chapters.
| Feature | Containers | Virtual Machines |
|---|---|---|
| Isolation | Process-level (shares OS kernel) | Hardware-assisted |
| Startup Time | Seconds | Minutes |
| Resource Usage | Lightweight (MBs) | Heavy (GBs) |
| Performance | Near-native | Slight overhead |
| Use Case | Microservices, CI/CD | Legacy apps, full OS isolation |
Table: How are containers different from virtual machines?
Docker is not the only tool making use of containers. Android, for example, being a Linux-based operating system, uses Linux namespaces to isolate applications from each other. This is how Android achieves security and isolation.
In this context, Docker as a tool does two things:
- Run the container: It does all the Linux magic in the background and runs the isolated process.
- Build the container: It builds the container image. The image is a read-only template that contains the application and its dependencies.
Docker is not the only container runtime though. There are other container runtimes like containerd and CRI-O. To see what they are, let's dive into the container runtime stack.
Container Runtimes
In the early days of Docker, it was a monolithic application that did everything. Its daemon, dockerd, was responsible for building and running containers and managing the container images. It was written in Go and was open-sourced in 2013.
Note: Docker and other container runtimes have begun integrating WebAssembly (WASM) as an alternative lightweight runtime model. While not yet mainstream, it's an emerging trend to watch in coming years.
As Docker grew, it became a huge monolithic application. It was hard to maintain and extend. So Docker, Inc. decided to split it into multiple components:
- dockerd: Responsible for running containers and being the interface for the Docker CLI. Its code was stripped down to the bare minimum and most responsibilities were delegated to containerd.
- containerd: Responsible for managing container images, as of Docker 1.11 (2016). It was written in Go and was open-sourced in 2015. Later, in 2017, it was donated to the Cloud Native Computing Foundation (CNCF). It's now a graduated project of the CNCF.
- runc: Responsible for running containers. It implements the OCI (Open Container Initiative) runtime specification. It was written in Go and was open-sourced in 2015, when Docker donated it to the OCI as the reference runtime implementation. It's maintained under the OCI to this day.
Cloud Native Computing Foundation is a foundation that was started by the Linux Foundation in 2015. It's a home for open-source projects that are used in cloud-native environments, like Kubernetes, containerd, and Prometheus.
OCI (Open Container Initiative) is a standard for container images and runtimes. It's a joint effort between Docker, Inc. and CoreOS. It was announced in 2015 and the first version was released in 2017. It was an initiative to standardize the container ecosystem. It was a huge success and now most container runtimes support OCI.
Moby is a project that was started by Docker, Inc. in 2017 to make Docker more modular. It was a collection of open-source projects that were used to build Docker. Containerd and runc were part of Moby, as well as other projects like SwarmKit, Notary, and Compose.
Kubernetes, the container orchestration platform that started in 2014, originally used Docker as its container runtime. In 2016, Kubernetes announced CRI, the Container Runtime Interface: an interface between Kubernetes and container runtimes that makes Kubernetes agnostic to the runtime. Support for containerd was added to Kubernetes in 2017; the Docker-specific integration (dockershim) was deprecated in Kubernetes 1.20 (2020) and later removed in 1.24, making CRI runtimes like containerd the standard choice.
CRI-O is a Kubernetes incubator project that was started in 2016. It's a lightweight container runtime that implements the CRI. It's written in Go and is open-sourced as an alternative to containerd.
Modern container runtimes are modular by design: Docker uses containerd to manage images, and containerd uses runc to create and run containers according to the OCI runtime specification. Now that we looked under the hood of Docker, let's do a test drive and run some containers.
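On a machine with Docker installed, you can ask the daemon which low-level runtime it delegates to. This is a small sketch using the standard `--format` Go templates of `docker info` and `docker version`; the exact output depends on your installation, and the script skips itself when Docker isn't available:

```shell
#!/bin/sh
# Sketch: asking Docker which low-level OCI runtime it delegates to.
# Skips gracefully when Docker isn't installed or the daemon isn't running.
if ! docker info >/dev/null 2>&1; then
  echo "Docker not available; skipping demo"
  exit 0
fi

docker info --format 'default runtime: {{.DefaultRuntime}}'   # typically runc
docker version --format 'server version: {{.Server.Version}}'
```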
Running a Container
The first command we're going to run is the hello-world container. It's a container that prints hello world and exits.
$ docker run hello-world
The output should be something like this:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
70f5ac315c5a: Pull complete
Digest: sha256:926fac19d22aa2d60f1a276b66a20eb765fbeea2db5dbdaafeb456ad8ce81598
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(arm64v8)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
There are a few things to notice here:
- Lines 1-2. The first time you run a container, Docker will download the container image from the Docker Hub. Docker Hub is a registry of container images, and it's the default registry for Docker. The version of the image is latest. It's the default tag for images.
- Lines 12-13. The arm64v8 version of the image was downloaded. It's the version for ARM64 processors. It's the default version for Apple Silicon Macs. The output could show other architectures like amd64 or arm32v6.
- The image was run in a container after it was downloaded. The container ran the hello command and printed the output to the standard output.
- As you can see, there is no command after the image name. The default command for the image is hello (an executable binary in the image). You can override the default command by passing a command after the image name. For example, you can run the busybox image and override the default command with /bin/echo hello world.
A Docker image name consists of the following parts:
<registry>/<organization>/<image-name>:<tag>
An example is the following:
ghcr.io/dockersecurity-io/book-chapter02:master
In this example:
- ghcr.io is the registry. It's the GitHub Container Registry.
- dockersecurity-io is the organization. It's the GitHub organization for the book.
- book-chapter02 is the image name.
- master is the tag. Here it's the branch name, but it can be any tag.
If the registry is not specified, Docker will use the default registry, which is Docker Hub. If the organization is not specified, Docker will use the default organization, which is library. If the tag is not specified, Docker will use the default tag: latest.
This is why when running the hello-world image, we didn't specify the registry or the organization or the tag, and it started pulling library/hello-world:latest (well, docker.io/library/hello-world:latest to be precise).
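These defaulting rules are simple enough to pin down in code. The helper below is purely illustrative (expand_ref is not a real Docker command), and it deliberately ignores edge cases like digests and registries with port numbers (host:5000/image):

```shell
#!/bin/sh
# expand_ref: expand a short image reference the way Docker does, applying
# the defaults registry=docker.io, organization=library, tag=latest.
# Illustrative only; ignores digests and registries with ports.
expand_ref() {
  ref=$1
  # Split off the tag, defaulting to "latest".
  case $ref in
    *:*) name=${ref%%:*}; tag=${ref#*:} ;;
    *)   name=$ref;       tag=latest    ;;
  esac
  # Fill in the missing registry and/or organization.
  case $name in
    */*/*) echo "$name:$tag" ;;                   # registry/org/name
    */*)   echo "docker.io/$name:$tag" ;;         # org/name
    *)     echo "docker.io/library/$name:$tag" ;; # bare name
  esac
}

expand_ref hello-world    # docker.io/library/hello-world:latest
expand_ref ubuntu:24.04   # docker.io/library/ubuntu:24.04
expand_ref ghcr.io/dockersecurity-io/book-chapter02:master
```

The last call already has all three parts, so it comes back unchanged.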
Now let's run the hello-world image with a specific command:
$ docker run hello-world /bin/echo hello world
The output should be something like this:
docker: Error response from daemon: failed to create shim task: OCI runtime
create failed: runc create failed: unable to start container process: exec:
"/bin/echo": stat /bin/echo: no such file or directory: unknown.
It says that the /bin/echo command doesn't exist. This is expected, as the hello-world image doesn't have normal Linux commands. It's a minimal image that only has the hello command.
Note. Here Docker daemon says it cannot create a shim task. The shim task failed because, in turn, OCI runtime failed to create a container. And yet again that's because runc failed to start the container process. And finally, the container process failed to start because it couldn't find the /bin/echo command. This is a good example of how the container runtime stack works. Shim is a small piece of code that acts as a proxy between the container runtime and the container process. It's used to handle signals and other low-level stuff.
We're going to test this command with the ubuntu image, but to do so, we're going to pull it first:
$ docker pull ubuntu
The output should be something like this:
Using default tag: latest
latest: Pulling from library/ubuntu
5af00eab9784: Already exists
Digest: sha256:0bced47fffa3361afa981854fcabcd4577cd43cebbb808cea2b1f33a3dd7f508
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest
This command downloaded the ubuntu:latest image from the Docker Hub. We can download a specific version of the image by specifying the version after the image name. For example, we can download the ubuntu:24.04 image:
$ docker pull ubuntu:24.04
You can find the available versions of the ubuntu image on the Docker Hub.
Note that ubuntu:24.04 also doesn't always download the same image. The tag is updated over time to point to the latest build of Ubuntu 24.04, so pulling it today and pulling it next month can give you different images (for example, different point releases of 24.04). The only way to download the exact same image every time is to use the image digest.
Now that we have downloaded the ubuntu image, let's learn about its digest:
$ docker image inspect ubuntu
It returns a huge JSON object. We're only interested in the RepoDigests field:
"RepoDigests": [
"ubuntu@sha256:33a5cc25d22c45900796a17cb09f09ea00b779e3b2026b4fc2faba"
]
We could download the same image with the following command:
$ docker pull ubuntu@sha256:33a5cc25d22c45900796a17cb09f09ea00b779e3b2026b4fc2faba
This command will always download the same image, regardless of the version of Ubuntu 24.04 or their tags.
Now let's run the ubuntu image:
$ docker run ubuntu
It doesn't do anything and exits. This is expected: the default command for the ubuntu image is bash, and since no terminal is attached, bash exits immediately. We can override the default command with /bin/echo hello world:
$ docker run ubuntu /bin/echo hello world
We can also attach our terminal to the container's terminal by passing the -it flag:
$ docker run -it ubuntu
This will attach our terminal to the container's terminal. We can run commands inside the container now. For example, we can run the ls command:
root@c7593865b1ac:/# ls
The output should be something like this:
bin boot dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
You can exit the container by typing exit or pressing Ctrl + D. This will also stop the container.
To have the container running in detached mode, you can pass the -d flag:
$ docker run -d ubuntu /bin/sh -c "while true; do echo hello world; sleep 1; done"
Docker will return a hash to you; that is the container ID.
Now you can attach your terminal to the container's terminal with the docker attach command:
$ docker attach <container-id>
It will start writing hello world every second. Pressing Ctrl + C now will kill the process, and hence the container.
To attach your terminal to the same container, but not the same process on it, you can use the docker exec command:
$ docker exec -it <container-id> /bin/bash
This will run another Bash instance on the container and attach your terminal to it. You can run the ls command again:
root@c7593865b1ac:/# ls
And exiting from it won't kill the container.
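The run/exec/stop cycle above can be played back as one short script. It captures the container ID that `docker run -d` prints, runs a second process with `docker exec`, and then stops the container explicitly; it skips itself when Docker isn't available:

```shell
#!/bin/sh
# Sketch: exec starts a second process; only PID 1's exit stops the container.
# Skips gracefully when Docker isn't installed or the daemon isn't running.
if ! docker info >/dev/null 2>&1; then
  echo "Docker not available; skipping demo"
  exit 0
fi

# Start the looping container in the background and keep its ID.
cid=$(docker run -d busybox /bin/sh -c 'while true; do echo hello world; sleep 1; done')

# A second process inside the same container; PID 1 keeps looping.
docker exec "$cid" ps

# Unlike Ctrl + C on an attached terminal, exiting an exec'd process leaves
# the container alive, so we stop and remove it explicitly.
docker stop "$cid" >/dev/null
docker rm "$cid" >/dev/null
```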
Exercises
- Install Docker on your machine. Create an account on Docker Hub and log in to it with the docker login command.

- Run the hello-world container. Inspect the container with the docker inspect command. What's the image name? What's the image digest?

- Create your own Docker image. This image will ping 8.8.8.8 every second. Create a new directory and in it create a file called Dockerfile with the content below:

  FROM ubuntu:24.04
  RUN apt-get update && apt-get install -y iputils-ping
  CMD ["ping", "8.8.8.8"]

  Build the image with the docker build command in the same directory as the Dockerfile:

  $ docker build -t pinger .

  Run the image with the docker run command:

  $ docker run pinger

  You can stop the container with Ctrl + C. You can also stop it with the docker stop command:

  $ docker stop <container-id>

- Tag the pinger image with your own Docker ID and push it to the Docker Hub:

  $ docker tag pinger <docker-id>/pinger
  $ docker push <docker-id>/pinger

  Now stop all the running containers with the docker stop command:

  $ docker stop $(docker ps -q)

  And then do a prune of the images and containers:

  $ docker system prune -a

  Now run the pinger container again:

  $ docker run <docker-id>/pinger

  It should start pinging. Kill it with Ctrl + C.

- Run the pinger container again, but this time in the background:

  $ docker run -d <docker-id>/pinger

  List the running containers with the docker ps command:

  $ docker ps

  Execute the following command on the container to get the process IDs of the processes running inside the container:

  $ docker exec <container-id> ps aux

  The output should be something like this:

  USER  PID  %CPU  %MEM  START  TIME  COMMAND
  root  1    0.0   0.0   12:39  0:00  ping 8.8.8.8
  root  7    0.0   0.0   12:43  0:00  ps aux

  Table: Output of the ps aux command on the container

  You can see that process ID 1 is the ping command, and process ID 7 is the ps aux command. Now let's try to find the same process IDs on the host machine. Execute the following command on the host machine:

  $ ps aux | grep ping

  If you're running Docker Desktop, the command will return only one process, which is the grep itself. This is because Docker Desktop runs containers inside a virtual machine, so you have to enter that virtual machine first. The easiest way to do that is using a Docker image created by Justin Cormack, the CTO of Docker, Inc.:

  $ docker run -it --rm --privileged --pid=host justincormack/nsenter1

  Let's run the ps aux command again:

  $ ps aux | grep ping

  The output should be something like this:

  45074 root  0:00  grep ping
  67277 root  0:07  ping 8.8.8.8

  You can see that the ping process is visible on the host machine, but with a different process ID. Let's try to kill the process on the host machine:

  $ kill 67277

  Of course, you need to adjust the process ID to the one you have on your machine. Now let's check the list of running containers:

  $ docker ps

  What would you expect to be the output?

- Following the instructions on Docker's website, run Docker in rootless mode. You can find the instructions here: Rootless mode. Then try to run the pinger container again. What would you expect to be the output?