Do not ignore .dockerignore (it’s expensive and potentially dangerous)

In this article we will learn about the docker build context and how to optimize it (using the .dockerignore file).

Docker images can run anywhere on cheap cloud services, so why bother optimizing them? Well, it turns out there are lots of advantages to using .dockerignore. It can help reduce Docker image size, speed up docker build and avoid unintended secret exposure (read on to see what I mean). To understand why .dockerignore is so effective, you have to understand the build context. Read this useful tutorial and jump into Codefresh to deploy your service!

Docker build context

The docker build command is used to build a new Docker image. One of the arguments you pass to the build command specifies the build context. In most cases you pass the current directory, as in docker build . -t my-app-image:1.0.1.

So, what is the Docker build context?

First, remember that Docker is a client-server application: it consists of the Docker client and the Docker server (also known as the Docker daemon). The Docker client command line tool talks to the Docker server and asks it to do things. One of these things is docker build: building a new Docker image. The Docker server can run on the same machine as the client or in a virtual machine, which can be local, remote or in the cloud.

Why is that important and how is the Docker build context related to this fact?

In order to create a new Docker image, the Docker server needs access to the files you want to create the Docker image from. So you need to send these files to the Docker server somehow (because, remember, the Docker server can be on a remote machine). These files are the Docker build context. The Docker client packs all build context files into a tar archive and uploads this archive to the Docker server. By default, the client takes all files (and folders) in the directory you pass, usually the current working directory, and uses them as the build context. While the default behavior sounds good in theory, in practice you must be aware of its implications.
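To make the packing step concrete, here is a minimal Python sketch of what conceptually happens to the build context. It is not the Docker client's actual implementation: the helper name pack_context is hypothetical, and the sketch applies no ignore rules at all, so every file ends up in the archive.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def pack_context(context_dir: str) -> bytes:
    """Pack everything under context_dir into an in-memory tar archive.

    Illustrative only: a simplified model of the archive the Docker client
    sends to the server, not its real implementation.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # arcname="." stores paths relative to the context root.
        archive.add(context_dir, arcname=".")
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "Dockerfile").write_text("FROM scratch\n")
        # A large file the image does not need still travels with the context.
        Path(tmp, "debug.log").write_bytes(b"x" * 5_000_000)
        payload = pack_context(tmp)
        print(f"build context archive: {len(payload) / 1e6:.1f} MB")
```

The point of the sketch is that the archive grows with everything in the directory, whether or not the Dockerfile ever references it.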

You can also use an existing tar archive or a git repository as the Docker build context. In the case of a git repository, the client clones it, including submodules, into a temporary folder and creates a build context archive from it.

Impact on Docker build

The first line of output you see when running the docker build command is:

Sending build context to Docker daemon 45.3 MB
Step 1: FROM ...

This should make things clear. Every time you run the docker build command, the Docker client creates a new build context archive and sends it to the Docker server. So you are always paying this “tax”: the time it takes to create the archive, plus storage, network traffic and latency.

Tip: As a rule of thumb, do not add files to the build context if you do not need them in your Docker image. The Docker context should be minimal and secret-free.
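One way to find out what is inflating a build context is to total the size of each top-level entry in the directory. The following Python sketch does that; the function name entry_sizes is hypothetical, and it measures raw file sizes rather than the size of the tar archive Docker actually sends.

```python
import os
import tempfile
from pathlib import Path


def entry_sizes(context_dir: str) -> list[tuple[str, int]]:
    """Return (top-level entry, total bytes) pairs, largest first."""
    sizes = {}
    for entry in os.scandir(context_dir):
        if entry.is_dir(follow_symlinks=False):
            # Sum all regular files below this directory, skipping symlinks.
            total = sum(
                os.path.getsize(os.path.join(root, name))
                for root, _dirs, files in os.walk(entry.path)
                for name in files
                if not os.path.islink(os.path.join(root, name))
            )
        else:
            total = entry.stat(follow_symlinks=False).st_size
        sizes[entry.name] = total
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "Dockerfile").write_text("FROM scratch\n")
        Path(tmp, ".git").mkdir()
        Path(tmp, ".git", "pack").write_bytes(b"x" * 2_000_000)
        for name, size in entry_sizes(tmp):
            print(f"{size:>10} bytes  {name}")
```

Entries near the top of the list that the image does not need are good candidates for exclusion.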

The .dockerignore file

The .dockerignore file is the tool that helps you define the Docker build context you really need. Using this file, you can specify ignore rules, and exceptions to these rules, for files and folders that should not be included in the build context and thus won’t be packed into the archive and uploaded to the Docker server.

Why should you care?

Indeed, why should you care? Computers today are fast, networks are also pretty fast (hopefully) and storage is cheap. So this “tax” may not be that big, right?
I will try to convince you that you should care.

Reason #1: Docker image size

The world of software development has lately been shifting towards continuous delivery, elastic infrastructure and microservice architecture.

Your systems are composed of multiple components (or microservices), each one of them running inside a Linux container. There might be tens or hundreds of services and even more service instances. These service instances can be built and deployed independently of each other, and this can be done for every single code commit. More than that, elastic infrastructure means that compute nodes can be added to or removed from the system, and its microservices can move from node to node to support scale or availability requirements. That means your Docker images will be built and transferred frequently.

When you practice continuous delivery and microservice architecture, image size and image build time do matter. It is much faster to deploy a 5MB Docker image to 100 servers than a 700MB image. Smaller images also help local development.

Reason #2: Unintended secrets exposure

Not controlling your build context can also lead to an unintended exposure of your code, commit history, and secrets (keys and credentials).

If you copy files into your Docker image with an ADD . or COPY . instruction, you may unintentionally include your source files, your whole git history (the .git folder), secret files (like .aws, .env, private keys), caches and other files not only in the Docker build context, but also in the final Docker image.

There are multiple Docker images currently available on DockerHub that expose application source code, passwords, keys and credentials (for example Twitter Vine). Copying the .git folder into a Docker image by mistake is especially damaging.

Tip: Always mention your .git folder in your .dockerignore file.
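Before building, it can help to check whether the directory you are about to send contains anything that looks sensitive. Here is a minimal Python sketch of such a check. The pattern list and the name find_sensitive are hypothetical starting points, not a complete or authoritative inventory of secret files.

```python
import fnmatch
import os
import tempfile
from pathlib import Path

# Hypothetical starter list; extend it for your own project.
SENSITIVE_PATTERNS = [".git", ".aws", ".env", "*.pem", "id_rsa"]


def looks_sensitive(name: str) -> bool:
    """True if a file or folder name matches one of the patterns above."""
    return any(fnmatch.fnmatchcase(name, p) for p in SENSITIVE_PATTERNS)


def find_sensitive(context_dir: str) -> list[str]:
    """Return context-relative paths whose name looks sensitive."""
    hits = []
    for root, dirs, files in os.walk(context_dir):
        for name in dirs + files:
            if looks_sensitive(name):
                hits.append(os.path.relpath(os.path.join(root, name), context_dir))
        # No need to descend into a folder that is already flagged.
        dirs[:] = [d for d in dirs if not looks_sensitive(d)]
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, ".git").mkdir()
        Path(tmp, ".git", "config").write_text("[core]\n")
        Path(tmp, ".env").write_text("API_KEY=example\n")
        Path(tmp, "app.py").write_text("print('hello')\n")
        for hit in find_sensitive(tmp):
            print("would be sent to the Docker server:", hit)
```

Anything this reports is a candidate for a .dockerignore entry, though a name-based scan like this cannot find secrets embedded inside ordinary source files.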

Reason #3: The Docker build – cache invalidation

A common pattern is to inject an application’s entire codebase into an image using an instruction like this:

COPY . /usr/src/app

In this case, we’re copying the entire build context into the image. It’s also important to understand that instructions such as COPY each generate a new image layer. So if any file in the build context changes, that change invalidates the build cache for the COPY . /usr/src/app layer and for every layer after it, and new layers are generated on the next build (making the build much slower).

If your working directory contains files that are frequently updated (logs, test results, git history, temporary cache files and similar), you are going to regenerate this layer on every docker build run. If you don’t manage the build context correctly, your builds will be very slow because the cache cannot be reused.
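The effect can be modeled with a checksum over the files in the context. The Python sketch below is a simplified stand-in for the comparison Docker performs for a COPY layer, not its real algorithm; the exact inputs to Docker's check vary by builder and version, and the name context_digest is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def context_digest(context_dir: str) -> str:
    """Hash the relative path and contents of every file under context_dir.

    Assumption: a toy model of a cache key for `COPY . <dest>`.
    """
    digest = hashlib.sha256()
    root = Path(context_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "app.py").write_text("print('hello')\n")
        before = context_digest(tmp)

        # The application code is untouched; only a log file appears.
        Path(tmp, "test.log").write_text("1 test passed\n")
        after = context_digest(tmp)

        print("cache reusable:", before == after)
```

In this model, a new log file is enough to change the digest even though the application code is identical, which is why excluding such files from the context keeps the cached layer usable.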

The .dockerignore syntax

Now that you know why you need to control the docker build context, we can see how this is done. The .dockerignore file is similar to the .gitignore file used by the git tool. Like .gitignore, it allows you to specify patterns for files and folders that should be ignored by the Docker client when generating a build context. While the syntax used to describe ignore patterns in .dockerignore is similar to that of .gitignore, it’s not the same.

The .dockerignore pattern matching syntax is based on the Go filepath.Match() function and includes some additions.

Here is the complete syntax for the .dockerignore:

pattern:
    { term }
term:
    '*'         matches any sequence of non-Separator characters
    '?'         matches any single non-Separator character
    '[' [ '^' ] { character-range } ']'
                character class (must be non-empty)
    c           matches character c (c != '*', '?', '\', '[')
    '\' c       matches character c

character-range:
    c           matches character c (c != '\', '-', ']')
    '\' c       matches character c
    lo '-' hi   matches character c for lo <= c <= hi

additions:
    '**'        matches any number of directories (including zero)
    '!'         lines starting with ! (exclamation mark) can be used to make exceptions to exclusions
    '#'         lines starting with this character are ignored: use it for comments

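Note: Using the ! character is pretty tricky, because the order of lines matters. The last line in the .dockerignore file that matches a particular file determines whether that file is included or excluded, so combining a ! line with the patterns before and after it lets you create more advanced rules.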

Examples

Here are some examples. You can also find more if you search for “docker ignore for [your favorite programming language]” in Google.

# ignore .git and .cache folders
.git
.cache
# ignore all *.class files in all folders, including the context root
**/*.class
# ignore all markdown files (*.md) in the context root, except README*.md; README-secret.md is still ignored
*.md
!README*.md
README-secret.md
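To see how rule order plays out for the markdown rules above, here is a minimal Python sketch of a rule evaluator. It is an approximation written for illustration, not Docker's matcher: it supports only '*', '?' and '**' (no character classes or escapes), it does not treat a match on a parent folder as excluding the files inside it, and the names pattern_to_regex and is_ignored are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified .dockerignore pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # any number of directories, including zero
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any run of non-separator characters
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_ignored(path: str, rules: list[str]) -> bool:
    """Apply rules in order; the last rule that matches the path wins."""
    ignored = False
    for line in rules:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        if pattern_to_regex(pattern).fullmatch(path):
            ignored = not negated
    return ignored


if __name__ == "__main__":
    rules = ["*.md", "!README*.md", "README-secret.md"]
    for path in ["notes.md", "README.md", "README-secret.md"]:
        print(path, "ignored" if is_ignored(path, rules) else "kept")
```

With these rules, notes.md and README-secret.md are ignored while README.md is kept. Moving the README-secret.md line above the !README*.md line would change the result, because the exception would then be the last rule to match that file.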

Should the Dockerfile itself be mentioned in .dockerignore?

This is a question that has no clear answer, and it mostly boils down to personal preference. We suggest you include the Dockerfile in the Docker image (i.e. not mention it in .dockerignore), as it can help the consumers of the image understand how it was built. Before you do that, make sure that your Dockerfile does not contain any sensitive information.

Conclusion

Use .dockerignore in every project where you build Docker images. It will make your Docker images smaller, your builds faster and your images more secure. It will help with the Docker cache during local development as well.

At the very least, you must mention your .git folder in the .dockerignore file. Then add extra files specific to your project, such as:

  • build logs
  • test scripts/results
  • temporary files
  • caching/intermediate artifacts
  • local secrets
  • local development files such as docker-compose.yml

For more information, see the official documentation.

Ready to try Codefresh, the CI/CD platform for Docker/Kubernetes/Helm? Create Your Free Account Today!

*** This story is also published at my personal blog ***
