Docker Multi-Stage Builds

One of the most anticipated features of Docker’s 17.05 release is multi-stage builds. Even those familiar with Docker may find themselves asking what a multi-stage Docker build is. Before we learn about multi-stage Docker builds, let’s examine a single-stage build and the process by which Docker uses Dockerfile instructions to create an image.

Codefresh supports multi-stage builds as part of the pipeline. Add a Dockerfile and try it out.

Dockerfiles

Docker uses a special text file called a Dockerfile. Dockerfiles use a DSL to describe how to build a Docker image. This DSL is actually a series of commands that, when executed, assemble a Docker image. Users can issue the docker build command against the Dockerfile to perform an automated build that executes these command-line instructions in succession.

These special Dockerfile instructions direct the Docker builder to create a Docker image exactly as you specify. They determine exactly which file system, application, files, and ultimately what running processes will make up the eventual running container spawned from the Docker image.

FROM golang:1.8
LABEL maintainer="codefresh.io"

WORKDIR /go/src/myapp
COPY app.go .

# build
RUN go build -o app .

CMD ["./app"]

A Dockerfile begins with the FROM instruction (only comments, parser directives, and ARG instructions may come before it), which sets the base image on which to run all subsequent commands. The base image can be any valid Docker image, whether custom built or pulled from a public repository. Subsequent instructions like COPY and RUN define not only what the image will contain but, more often, the command-line steps that produce your final application.

For example, by using the Golang “SDK” image, a user may copy in their source code and perform a build all inside the Dockerfile, resulting in a convenient single image. Unfortunately, the size of the resulting image frequently ends up ballooning to over 700MB. If your application is deployed as several such Docker images, this can be a serious problem.

Image Construction

Docker images are made up of a series of union file system layers, the technical storage underpinnings of which are determined by the storage driver. Each Dockerfile instruction that changes the file system, such as RUN, COPY, or ADD, adds its own layer to the final Docker image; instructions such as CMD or ENV only record metadata. The lowest layers always come from the inherited base image, as defined by the FROM instruction. Each subsequent layer stores your application, files, etc.

Understanding Docker image layers is not necessary to get started with building and running Docker images and containers. However, it is essential for any Docker user who wishes to become proficient in the Docker tool set. How layers are constructed affects the build time, storage, and overall performance of your application. Each additional filesystem-changing instruction adds a layer, and because a file written in one layer still takes up space after a later layer deletes it, the image tends to grow with every step.

As shown in our previous Golang “SDK” example, an image can become quite large if it contains both the build tools, or SDKs, and the compiled application. Yet once the build has finished, we no longer need the build tools, only the final application binary.
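As a rough illustration of how the arrangement of instructions affects layers, the following sketch contrasts separate RUN instructions with a single combined one. The alpine base image and the curl package are assumptions chosen only for the example; they are not part of the application discussed in this article.

```dockerfile
FROM alpine:latest

# Less efficient (shown commented out): three RUN instructions, three layers.
# The package index deleted by the last RUN still occupies space in the
# layer written by the first RUN.
#   RUN apk update
#   RUN apk add curl
#   RUN rm -rf /var/cache/apk/*

# Leaner: a single RUN instruction produces a single layer, and the package
# index is removed before that layer is committed.
RUN apk update \
 && apk add curl \
 && rm -rf /var/cache/apk/*
```

Combining steps this way keeps temporary files out of the image, but it cannot remove an SDK that the base image itself provides.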

Life without Multi-Stage Builds

Working from our Dockerfile example above, we see the Go source code being copied in (line 5) and then built (line 8). The CMD instruction (line 10) defines what the container will run as its primary process: in this case, the application that we just built. We’re able to write such a concise Dockerfile because we inherit from the official golang image, which contains the Go SDK and the build tools needed to compile our application binary.

However, we now have these unnecessary tools in our production Docker image that are not required for running our application. Wouldn’t it be great if we could use a single Dockerfile to build our source code, throw away the unnecessary build tools, and then execute our golang application? This would make the built Docker image extremely small.

Docker Multi-Stage Builds

The new multi-stage build feature in Docker 17.05 does just that. We can now define one base image for tasks such as building from source, and then define a second (or even third!) base image to run our application or perform other steps as necessary. Let’s take a look at our previous Dockerfile, now with multi-stage builds:

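# Stage 0: build the application with the full Go SDK
FROM golang:1.8
LABEL maintainer="codefresh.io"

WORKDIR /go/src/myapp
COPY app.go .

# build; CGO_ENABLED=0 produces a statically linked binary that does not
# depend on C libraries missing from the Alpine image used below
RUN CGO_ENABLED=0 go build -o app .

# Stage 1: the final image, containing only what is needed at run time
FROM alpine:latest
RUN apk --no-cache add ca-certificates

WORKDIR /root/

# copy the compiled binary out of stage 0
COPY --from=0 /go/src/myapp/app .

CMD ["./app"]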

The final image that this Dockerfile will build starts with the last FROM instruction, FROM alpine:latest. We now have a single Dockerfile that builds our application from an official golang SDK image and runs it in the secure and lightweight Alpine image. Not only have we saved significant space, but we have also created a more secure runtime without defining any more Dockerfiles or custom scripts!

Depending on the size of your app, this image could be just a few MB, versus hundreds of MB with the build tools included.
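Stages can also be given names, which may be easier to read than the numeric index used in --from=0. The sketch below is a variation on the Dockerfile above, not a replacement for it. The stage name builder is an arbitrary choice, and CGO_ENABLED=0 is an assumption made so that the binary does not rely on C libraries that Alpine lacks.

```dockerfile
# Stage "builder": compile the application with the full Go SDK.
FROM golang:1.8 AS builder
WORKDIR /go/src/myapp
COPY app.go .
# Assumption: disable cgo so the resulting binary is statically linked.
RUN CGO_ENABLED=0 go build -o app .

# Final stage: only the compiled binary and CA certificates are shipped.
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Refer to the first stage by name rather than by index.
COPY --from=builder /go/src/myapp/app .
CMD ["./app"]
```

Referring to a stage by name means the COPY instruction keeps working if another stage is later inserted ahead of it.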

Less is More

Before multi-stage builds, and outside of Codefresh, getting the same result meant manually scripting several separate builds, which was time consuming and overly complicated. With the new multi-stage build feature, we can concisely define multiple build steps and arrive at a final Docker image that meets best practices for production. With Docker 17.06 now GA, this is a best practice moving forward.
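To illustrate what a third stage might look like, here is one possible sketch under a few assumptions: the application is a statically linked Go binary, it needs CA certificates at run time, and an empty scratch image is acceptable as the final base. The stage names builder and certs are hypothetical, and the use of scratch is a swapped-in variation rather than the approach taken earlier in this article.

```dockerfile
# Stage "builder": compile a statically linked binary.
FROM golang:1.8 AS builder
WORKDIR /go/src/myapp
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -o app .

# Stage "certs": used only to obtain a CA certificate bundle.
FROM alpine:latest AS certs
RUN apk --no-cache add ca-certificates

# Final stage: an empty base image holding just the certificates and the binary.
FROM scratch
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/src/myapp/app /app
CMD ["/app"]
```

Because scratch contains no shell or package manager, this variant trades debugging convenience for a smaller image; the Alpine-based final stage remains the simpler choice when a shell is useful.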

Multi-stage build Dockerfiles are fully supported in Codefresh and have always been available inside of Codefresh.yml.