An introduction to Docker and Docker Compose

Christian Digiorno · Published in devartis · Feb 7, 2020

What is Docker? What is it for? And why do people mention the word “compose” sometimes?

Perhaps you have heard (or read) about Docker as something related to software and, more specifically, applications, without knowing what Docker is. Or maybe you have a vague idea about it because Docker is being used in your project, but you never really got to understand how it works and just used a few useful commands your coworkers told you about. Either way, I will answer the three questions I started the post with.

However, before we begin, I would like to make sure we both share an understanding of a few concepts related to operating systems.

Operating system execution modes

Operating systems (OS) handle the execution of their own code and of user-defined code differently.

For their own code, they rely on the kernel, a program that functions as the OS’s core and has total control over the system.

  • Kernel mode: the mode used to protect the OS. The code to be run might contain instructions that can cause damage, so instructions known as privileged (those that can alter the OS, its resources, or input/output devices) are allowed to execute only in this mode.
  • User mode: the mode used while a user application runs. Every time a privileged instruction needs to be executed, the system switches to kernel mode and, once that instruction finishes, switches back to user mode (you can actually watch this happen, as shown below).
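
If you are on Linux and curious, a quick way to see these mode switches is strace (assuming it is installed), which intercepts the system calls a program makes; each system call is precisely a request that the kernel must execute in kernel mode on the application’s behalf:

strace -c ls   # runs "ls" and prints a summary of every system call it made

Even a command as simple as ls triggers dozens of switches into kernel mode and back.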

Virtualization

It is the process of creating a virtual representation of something such as applications, servers, storage devices, and networks. One of the classic examples of virtualization is one or more OS (known as guests) running within another OS (the host).

Applications running in a virtual machine cannot access the hardware directly; every time they try, software called a hypervisor emulates the operation in order to complete the task.

We can say that a virtual machine represents hardware-level virtualization. It is isolated from the host, managing its own kernel and resources.

Containerization

It is a method for bringing virtualization to the OS level: the kernel allows the existence of multiple isolated user-space instances. It does not require a hypervisor, since the goal is not hardware-level virtualization, which means the impact on the system’s performance is not as noticeable.

As a container runs without the need of booting a separate OS, it is lightweight, and the resources (e.g. memory) it uses to keep running can be limited.

The application you want to run in the container is packaged along with all the configuration files, libraries and dependencies it requires, and will function consistently in different environments.

All right, then. Now that we know for sure we are standing on the same ground, let’s begin with…

Docker

Docker is a containerization platform that packages an application along with all its dependencies as containers.

  • Every application runs in a separate container and has its own set of libraries and dependencies.
  • Docker also ensures process-level isolation, which means the applications are independent of each other, so there is no interference between them.

Even though a virtual machine is able to run on any platform with its own configuration, Docker provides a similar benefit with noticeably lower overhead.

Docker offers developers greater control and autonomy over the platform the software runs on, with all its dependencies already defined, integrated, and ready to be deployed. The application’s requirements are decoupled from the infrastructure’s.

When it comes to developing and testing an application, it is desirable for the development environment to be as similar as possible to the production environment. At the same time, it should be as fast as possible, since development is an interactive activity.

Taking all of this into consideration, we can say that Docker comes in handy because of its performance and the simplicity it brings to configuring applications. No more problems when trying to run the same code on different computers!
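
If you want to follow along, a quick sanity check (assuming Docker is already installed) is to run the official hello-world image, which simply prints a message and exits:

docker --version        # confirm the Docker client is available
docker run hello-world  # pulls the image (if absent) and runs a container from it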

Since Docker runs Linux images, employing it on Windows or macOS implies using machine-virtualization tools, at a significant performance cost. However, Docker Windows Containers were introduced, which helps mitigate the problem on Windows.

Docker containers

A Docker container is a running instance of an image as a process within the host. From that same image, multiple containers can be created.
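
For instance, you can start two completely independent containers from the same public nginx image (the container names here are just illustrative):

docker run -d --name app1 nginx   # first container from the nginx image
docker run -d --name app2 nginx   # second, independent container from the same image
docker ps                         # both appear in the list of running containers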

Containers hold all the binaries and libraries the application requires, as well as the application itself. Since those binaries and libraries run on the host’s kernel, they can execute rapidly.

In contrast, virtual machines run on a hypervisor and have their own OS. This notably increases their size, making virtual machines more complex to set up and more resource-intensive to keep running. Containers are more efficient because no extra OS is involved, and the libraries and resources they use are shared as needed.

Images

An image contains everything required to run applications in containers. Yes, I know I am being repetitive, but I want to make it crystal clear that containers are based on images.

Since all the application’s dependencies already exist within the image, the deployment process to other computers is fast. All they require is, of course, having Docker installed.

In order to create an image locally, we use a text file called a Dockerfile, which specifies an image to be used as a base and all the instructions that need to be executed on top of it. Or, if you want to use an already existing image, you can download it from https://hub.docker.com if it has been uploaded there.
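
For example, downloading an existing image from Docker Hub and listing what you have locally looks like this (the postgres tag is just an example):

docker pull postgres:12   # download the PostgreSQL image, version 12, from Docker Hub
docker images             # list the images stored locally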

Layers

An image is made of layers; each one is a set of differences from the previous layer, produced by certain instructions in the Dockerfile. Docker uses a layer cache to optimize the image generation process, making builds faster since the image does not have to be rebuilt from scratch.
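
You can inspect the layers of any local image with docker history; each row corresponds to an instruction that produced a layer:

docker history ubuntu:18.04   # shows each layer and the instruction that created it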

Some of the possible Dockerfile commands (ADD, RUN and COPY) change the previous image, creating new layers. However, if an instruction for which there is no cached layer is found while building the image (which happens when you modify an existing instruction or write a new one), the cache will not be used from that point onward. It is therefore convenient to minimize cache invalidation, for example by not writing frequently modified commands at the beginning of the Dockerfile; they can be left for execution as late as possible.

The problematic part is that layers increase the image’s size, which means it will occupy more space and take longer to be downloaded. I suggest writing multiple related commands in the same instruction.

Furthermore, it is strongly recommended not to run apt-get update in a separate instruction from the one that installs packages. The update might be cached (which means Docker will not run it again), while changing the packages to be installed does make Docker re-run the install; you can then run into mismatches between the package versions actually available and the ones the stale package lists point to. Therefore, it’s best to write the update and install commands in the same instruction, as sketched below.
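
As a sketch of both recommendations (the package names are just examples), chain related commands with && inside a single RUN so they end up in one layer, keeping update and install together:

# One layer: update and install together, then remove the package lists to keep the layer small
RUN apt-get update && apt-get install -y \
    curl \
    git \
 && rm -rf /var/lib/apt/lists/*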

Data persistence in Docker

Data isn’t persisted once a container no longer exists. That is really inconvenient, right? I mean, there are PostgreSQL images available on Docker Hub but, if what I said in the first sentence is true, how is a database inside a container supposed to be useful? As soon as the container goes down, it will lose everything!

Well, that is not a real issue, thanks to the existence of volumes. A volume is a data persistence mechanism used by Docker containers. Its life cycle is independent of the container’s; the container can be brought down without the volume disappearing. The data is stored on the host, generally under /var/lib/docker/volumes.
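
As a sketch, here is how a named volume could keep PostgreSQL’s data alive across containers (the volume name and password are illustrative; /var/lib/postgresql/data is where the official image stores its data):

docker volume create pgdata
docker run -d --name db \
    -e POSTGRES_PASSWORD=secret \
    -v pgdata:/var/lib/postgresql/data \
    postgres
# Even if the "db" container is removed, the "pgdata" volume (and the data) survives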

Bind mounts are a similar option: a location on the host and a location in the container point to the same file or directory. Unlike volumes, they cannot be configured in a Dockerfile. Since they live in the host’s filesystem, they can be modified by other containers or by the host’s processes.
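
A minimal bind-mount sketch, assuming you have an html directory in your current path, would serve it with the official nginx image:

docker run -d --name web \
    -v "$(pwd)/html":/usr/share/nginx/html \
    nginx
# Editing the files in ./html on the host immediately changes what the container serves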

Dockerfile

It is the file where the instructions that need to be run to build an image will be specified.

There are several different instructions you can customize your Dockerfile with, and lots of commands you can run with Docker to set up and manage your containers. While I’d like to list them all right here, it would make for a (very) long and tedious read; instead, I suggest that you visit the repository I originally wrote everything in. There you’ll find most of the important ways to configure your images, start/stop/delete containers, make them execute commands, and read their logs. I also explain how to install Docker on Ubuntu, and there is a directory “docker” with a Dockerfile and bash files that set up containers in different ways, which you can use (and modify) to practice and fully grasp how all the commands work.

Here’s a very basic Dockerfile sample:

FROM ubuntu:18.04
RUN echo "Building image!"
RUN apt-get update && apt-get install -y iputils-ping
CMD ["ping", "medium.com", "-c", "5"]

With this file, we are asking Docker to use an image of Ubuntu 18.04 as the base for our own image. Then, it will say “Building image!”, update the package lists from the repositories and install iputils-ping. Once we decide to start a container, it will ping medium.com five times, since it is the command we specified.

If you save this Dockerfile somewhere in your host and run docker build -t "my_example" . in the same directory, you’ll have a new, custom Ubuntu image! Then, you can run docker run my_example to set up the container.

The next step is making sure the container is running. docker ps lists all currently running containers; it should show one (the one we just started)… but there are none! Well, I sort of left out the fact that containers run for as long as their command does… and, since the ping has already ended, so has the container. docker ps -a shows all the containers in your computer, even the ones that aren’t currently running, which should show a container that has already exited. A container will run indefinitely with a command that doesn’t stop, like a runserver for a Django application.

Another thing I would like you to keep in mind is that docker run attaches the container to the terminal, which means that if the terminal were killed for whatever reason, the container would exit. By just adding a flag, the container runs in the background instead of depending on a terminal; to keep using the same example, let’s go with docker run -d my_example. Do note, however, that starting the container in detached mode will not show the ping’s output, so you will need to read the log.
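
Using our example (the container name is just a choice of mine), that would look like this:

docker run -d --name pinger my_example   # start the container in the background
docker logs pinger                       # print the output produced so far
docker logs -f pinger                    # or follow the log in real time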

Docker Compose

It is a tool that allows you to easily manage multiple containers that are related to each other. It works by applying rules declared in a “.yml” configuration file; these rules define services, each of which represents the configuration, managed by Docker Compose, that corresponds to a certain container.

Instead of using docker run, you merely need to run a compose command such as docker-compose up -d so all containers start one after the other.

Service != container. A container starts up based on a service defined in the “.yml” file, and multiple containers can be run per service. A minimal example follows.
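
Here is a sketch of a docker-compose.yml for the web + database scenario used below (the image tag, port, and password are illustrative):

version: "3"
services:
  web:
    build: .            # build the web application from the local Dockerfile
    ports:
      - "8000:8000"     # expose the application on the host
  db:
    image: postgres:12
    environment:
      POSTGRES_PASSWORD: secret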

Networks

Networks define communication rules between services, and between a container and its host. Docker containers on the same network can find each other and communicate.

Let’s say, for example, that you have two containers: one for your web application and another one for your database, with their services named “web” and “db” respectively. If web wants to use the database, it connects to postgres://db:5432 (assuming you left its port as 5432).
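
Compose places all the services of a file on a shared default network, which is what makes the db hostname resolvable from web. If you want finer control, you can declare networks explicitly; as a fragment (only the network-related lines of the compose file are shown, and the name backend is arbitrary):

services:
  web:
    networks:
      - backend
  db:
    networks:
      - backend
networks:
  backend: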

Dependencies between services

It is possible that a service needs to start before another one (e.g. the web application needs to communicate with the database’s container at startup). You can set the dependency in the “.yml” file, specifying in a service’s configuration which other service (or services) must be started before it.
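
With depends_on, Compose starts db before web (a sketch; note that this only waits for the container to start, not for the database inside it to be ready to accept connections):

services:
  web:
    build: .
    depends_on:
      - db        # "db" is started before "web"
  db:
    image: postgres:12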

In the repository I mentioned earlier there is a directory “compose” with a Dockerfile, an entrypoint file (which I wrote so a container executes its content on startup) and multiple compose configuration files you can use to practice.

When not to use Docker (examples):

  • You want to maximize your application’s performance (even though it has less overhead than virtual machines, it is not zero)
  • You consider security a critical concern: keeping different components isolated in containers has its benefits, but it also brings some trouble to the table. Containers have access to the kernel’s subsystems (so a “kernel panic” caused by a container will affect the host as well) and its resources (if a container can monopolize some resource, it might affect the other containers, facilitating a denial-of-service attack because parts of the system would become inaccessible). And if attackers manage to access a container and escape from it into another one (or even the host), they will keep the same user privileges they had in the first container they got into, which makes having root as the default user difficult to recommend.
  • Having a GUI for everything is a must: containers are driven from the terminal and, even though there are some tricks you can use (such as X-forwarding or VNC), they are somewhat ugly and difficult to manage.

Final thoughts

From a distance, Docker might seem difficult to use, perhaps even intimidating but, truth be told, it can prove to be a very useful tool for developers and sysadmins, packaging applications along with their dependencies and letting people simply download images and set them up consistently.

I strongly recommend creating your own Dockerfiles and playing with them to reinforce what you know (and you will probably learn new things along the way).

Thank you for reading, and I hope you’ll find the opportunity to use Docker in the future!
