In this article we will learn about the docker build context and how to optimize it (using the .dockerignore
file).
Docker images can run anywhere on cheap cloud services so why bother optimizing them? Well it turns out there are lots of advantages to using .dockerignore. It can help reduce Docker image size, speedup docker build
and avoid unintended secret exposure (read on to see what I mean). To understand why .dockerignore is so effective you have to understand the build context. Read this useful tutorial and jump into Codefresh to deploy your service!
Docker build context
The docker build
command is used to build a new Docker image. There is one argument you can pass to the build
command that specifies the build context. In most cases you usually pass the current directory as in docker build . -t my-app-image:1.0.1
.
So, what is the Docker build context?
First, remember, that Docker is a client-server application, it consists of Docker client and Docker server (also known as the Docker daemon). The Docker client command line tool talks with the Docker server and asks it to do things. One of these things is Docker build: building a new Docker image. The Docker server can run on the same machine as the client or in a virtual machine, that also can be local, remote or in the cloud.
Why is that important and how is the Docker build context related to this fact?
In order to create a new Docker image, the Docker server needs access to the files, you want to create the Docker image from. So, you need somehow to send these files to the Docker server (because remember the Docker server can be another remote machine). These files are the Docker build context. The Docker client packs all build context files into a tar
archive and uploads this archive to the Docker server. By default the client will take all files (and folders) in current working directory and use them as the build context. While the default behavior sounds good in theory, in practice you must be aware of its implications.
You can also use as Docker context an existing tar
archive or git
repository. In the case of a git
repository, the client will clone it with submodules into a temporary folder and will create a build context archive from it.
Impact on Docker build
The first output line, that you see, running the docker build
command is:
Sending build context to Docker daemon 45.3 MB Step 1: FROM ...
This should make things clear. Actually, every time you are running the docker build
command, the Docker client creates a new build context archive and sends it to the Docker server. So, you are always paying this “tax”: the time it takes to create an archive, storage and network traffic and latency time.
Tip: The rule of thumb is not adding files to the build context, if you do not need them in your Docker image. The Docker context should be minimal and secret-free.
The .dockerignore file
The .dockerignore
file is the tool, that can help you to define the Docker build context you really need. Using this file, you can specify ignore rules and exceptions from these rules for files and folder, that won’t be included in the build context and thus won’t be packed into an archive and uploaded to the Docker server.
Why should you care?
Indeed, why should you care? Computers today are fast, networks are also pretty fast (hopefully) and storage is cheap. So, this “tax” may be not that big, right?
I will try to convince you, that you should care.
Reason #1: Docker image size
The world of software development is shifting lately towards continuous delivery, elastic infrastructure and microservice architecture.
Your systems are composed of multiple components (or microservices), each one of them running inside Linux container. There might be tens or hundreds of services and even more service instances. These service instances can be built and deployed independently of each other and this can be done for every single code commit. More than that, elastic infrastructure means that new compute nodes can be added or removed from the system and its microservices can move from node to node, to support scale or availability requirements. That means, your Docker images will be frequently built and transferred.
When you practice continuous delivery and microservice architecture, image size and image build time do matter. If it much faster to deploy a 5MB Docker image to 100 servers, than a 700MB image. It also helps local development as well.
Reason #2: Unintended secrets exposure
Not controlling your build context, can also lead to an unintended exposure of your code, commit history, and secrets (keys and credentials).
If you copy files into you Docker image with ADD .
or COPY .
command, you may unintentionally include your source files, whole git
history (a .git
folder), secret files (like .aws
, .env
, private keys), cache and other files not only into the Docker build context, but also into the final Docker image.
There are multiple Docker images currently available on DockerHub, that expose application source code, passwords, keys and credentials (for example Twitter Vine). Copying the .git folder in a Docker image by mistake is especially damaging.
Tip: Always mention your
.git
folder in your.dockerignore
file
Reason #3: The Docker build – cache invalidation
A common pattern is to inject an application’s entire codebase into an image using an instruction like this:
COPY . /usr/src/app
In this case, we’re copying the entire build context into the image. It’s also important to understand, that every Dockerfile command generates a new layer. So, if any of the included files changes in the entire build context, this change will invalidate the build cache for COPY . /opt/myapp
layer and a new image layer will be generated on the next build (making the build much slower)
If your working directory contains files that are frequently updated (logs, test results, git history, temporary cache files and similar), you are going to regenerate this layer for every docker build
run. If you don’t manage the build context correctly, your builds will be very slow as cache cannot be used correctly.
The .dockerignore
syntax
Now that you know why you need to control the docker build context, we can see how this is done. The .dockerignore
file is similar to gitignore
file, used by the git
tool. similarly to .gitignore
file, it allows you to specify a pattern for files and folders that should be ignored by the Docker client when generating a build context. While .dockerignore
file syntax used to describe ignore patterns is similar to .gitignore
, it’s not the same.
The .dockerignore
pattern matching syntax is based on the Go filepath.Match()
function and includes some additions.
Here is the complete syntax for the .dockerignore
:
pattern: { term } term: '*' matches any sequence of non-Separator characters '?' matches any single non-Separator character '[' [ '^' ] { character-range } ']' character class (must be non-empty) c matches character c (c != '*', '?', '\', '[') '\' c matches character c character-range: c matches character c (c != '\', '-', ']') '\' c matches character c lo '-' hi matches character c for lo <= c <= hi additions: '**' matches any number of directories (including zero) '!' lines starting with ! (exclamation mark) can be used to make exceptions to exclusions '#' lines starting with this character are ignored: use it for comments
Note: Using the !
character is pretty tricky. The combination of this character and patterns before and after line with the !
character can be used to create more advanced rules.
Examples
Here are some examples. You can also find more if you search for “docker ignore for [your favorite programming language]” in Google.
# ignore .git and .cache folders .git .cache
# ignore all *.class files in all folders, including build root **/*.class
# ignore all markdown files (md) beside all README*.md other than README-secret.md *.md !README*.md README-secret.md
Should the Dockerfile itself be mentioned in .dockerignore
This is a question that has no clear answer and it mostly boils does to personal preference. We suggest you include the Dockerfile in the Docker image (i.e. not mention it in .dockerignore) as it can help the consumers of the image to understand how it was build. Before you do that, make sure that you Dockerfile does not contain any sensitive information.
Conclusion
Use .dockerignore
in every project, where you are building Docker images. It will make your Docker images small, fast and secure. It will help with the Docker cache during local development as well.
At the very least you must mention your .git folder in the docker ignore file. Then add extra files specific to your project such as
- build logs
- test scripts/results
- temporary files
- caching/intermediate artifacts
- local secrets
- Local development files such as docker-compose.yml
For more information see the official Documentation
Ready to try Codefresh, the CI/CD platform for Docker/Kubernetes/Helm? Create Your Free Account Today!
*** This story is also published at my personal blog ***