Optimizing Docker Image Size with Multi-Stage Builds

Overview

One of the challenges developers face when building Docker images is managing image size. Large Docker images lead to slow download and deployment times, consume more storage, and can make container orchestration less efficient. Docker multi-stage builds provide a powerful way to tackle this problem by allowing you to separate the build environment from the runtime environment, reducing the overall size of the final image.

In this post, we’ll explore how to use multi-stage builds to optimize Docker images, improving performance while keeping your images as small and efficient as possible.

1. The Problem with Large Docker Images

Docker images grow as layers are added for software dependencies, libraries, tools, and application code. A common cause is installing compilers, development tools, or other utilities that are needed to build the software but not to run it in production. These build-time tools and artifacts inflate the final image without contributing anything at runtime.

Common issues with large Docker images include:

Slower deployments: Pulling large images from a registry takes more time, leading to slower deployments.
Higher storage costs: Large images take up more disk space on both the registry and the host machines where containers run.
Longer build times: The more layers your image contains, the longer it takes to rebuild and cache those layers.

To solve this, Docker offers multi-stage builds.

2. Introduction to Multi-Stage Builds

Multi-stage builds were introduced in Docker 17.05 to solve the problem of unnecessarily large images by allowing you to use multiple FROM statements in a single Dockerfile. Each FROM instruction starts a new stage in the build process. The idea is to use a builder stage that includes all the tools, dependencies, and files needed to build the application, and then use a separate runtime stage that only contains the necessary files and dependencies required to run the application.

This way, you can discard unnecessary files, tools, and libraries from the final image, making it as small as possible.

Key Concepts of Multi-Stage Builds:

Builder stage: This stage contains all the dependencies, tools, and code needed to compile or build your application.
Runtime stage: This stage contains only the minimal set of files needed to run the application in production, often using a much smaller base image.
Copying files between stages: You can selectively copy files or directories from one stage to another using the COPY --from=<stage> directive.
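
As a rough sketch of the last point, a stage can be referenced either by the name given after AS or by its zero-based position in the Dockerfile. The paths and the binary name myapp below are illustrative assumptions, not requirements:

```dockerfile
# Stage 0, also addressable by the name "builder".
FROM golang:1.18 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp main.go

# Final stage: only what is copied in explicitly ends up in the image.
FROM alpine:3.14
WORKDIR /app
# Reference the earlier stage by name (keeps working if stages are reordered).
COPY --from=builder /app/myapp .
# Equivalent reference by stage index:
# COPY --from=0 /app/myapp .
CMD ["./myapp"]
```

Referencing stages by name is usually the safer choice, because an index silently points at a different stage if another FROM is later added above it.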

3. Benefits of Multi-Stage Builds

Multi-stage builds offer several advantages:

Reduced Image Size: By copying only the necessary artifacts from the build stage to the final stage, you eliminate unnecessary build tools and dependencies from your final image, reducing its size.
Improved Security: A smaller image typically has fewer libraries and utilities, which minimizes the attack surface for security vulnerabilities.
Faster Deployments: Smaller images take less time to transfer between Docker registries and target environments (e.g., production servers), speeding up deployment pipelines.
Easier Maintenance: With smaller images, it's easier to manage and maintain your Docker setup over time, as you focus on keeping only essential files.

4. How to Implement a Multi-Stage Build

Implementing multi-stage builds involves splitting your Dockerfile into multiple stages, each with a different purpose. Here’s a high-level process for implementing multi-stage builds:

Step 1: Define the first stage (builder), where you compile the application or package the necessary binaries.
Step 2: In the next stage, use a smaller base image that only contains the necessary runtime dependencies (such as minimal operating system libraries and the runtime itself).
Step 3: Copy the application’s final artifacts (e.g., binaries, static files) from the builder stage to the runtime stage.
Step 4: Leave everything else behind. Docker does not include the builder stage's dependencies, libraries, and tools in the final image; only the files you copied explicitly are kept.

5. Step-by-Step Example: Optimizing a Go Application

Let’s walk through a real-world example of optimizing a Go application using multi-stage builds. Go applications are a good example because they typically compile down to a single self-contained binary (fully static when cgo is not used), making it easy to discard the build environment once the binary is produced. The example project builds on an earlier post of mine, Getting Started with Go: Writing and Running Your First "Hello, World!" Application on Linux.

Project Setup

Assume you have a simple Go project with the following files:

main.go: Contains your Go application code.
Dockerfile: Contains the instructions for building and running the Docker image.

Here’s a simple Go application (main.go):

// main.go
package main

import "fmt"

func main() {
    fmt.Println("Hello, Docker multi-stage builds!")
}

Traditional Dockerfile (Without Multi-Stage Builds)

Here’s how you might write a traditional Dockerfile for building and running the Go application:

# Traditional Dockerfile (without multi-stage builds)

# Build and run the Go app in a single image
FROM golang:1.18
WORKDIR /app
COPY . .
RUN go build -o myapp main.go
CMD ["./myapp"]

Build the image first, then check its size. The build command is:

docker build -t my-go-app-0 --no-cache .

Check image size:

docker images
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
my-go-app-0   latest    446251e15a7d   6 seconds ago   969MB
golang        1.18      c37a56a6d654   22 months ago   965MB

The golang:1.18 base image alone is 965 MB, so the final image (969 MB) is the base image plus our source code and the compiled binary.

Multi-Stage Dockerfile (with the Same Image)

# Dockerfile (with multi-stage builds, same base image in both stages)

# Step 1: Build the Go app
FROM golang:1.18 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp main.go

# Step 2: Run the Go app
FROM golang:1.18
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["./myapp"]

In this Dockerfile, both the build and runtime stages use the same base image (golang:1.18), which includes the Go compiler and all the build tools. The final stage therefore still carries the entire Go toolchain, even though none of it is needed to run the compiled binary.

If we build this image and check its size:

docker build -t my-go-app-1 --no-cache .
docker images
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
my-go-app-0   latest    446251e15a7d   6 seconds ago   969MB
my-go-app-1   latest    5ed8729274eb   3 minutes ago   7.38MB
golang        1.18      c37a56a6d654   22 months ago   965MB

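The listing reports my-go-app-1 at only 7.38 MB instead of 969 MB. An image that small holds just the compiled binary on a minimal base, detached from its build environment. Keep in mind that a final stage built FROM golang:1.18, as in the Dockerfile above, still includes the roughly 965 MB base image, so the real savings come from switching the runtime stage to a smaller base, as the next section does.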

Multi-Stage Dockerfile (with some Optimization)

Here’s how you can optimize the Dockerfile using multi-stage builds with different images for each step:

# Optimized Dockerfile (with multi-stage builds)

# Step 1: Build the Go app in a builder stage
FROM golang:1.18 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp main.go

# Step 2: Use a smaller base image for the runtime
FROM alpine:3.14
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["./myapp"]

Explanation of the Dockerfile:

Builder stage (FROM golang:1.18 AS builder):
- The first stage uses the official Go image (golang:1.18), which includes all the tools required to compile the Go app.
- The app is copied to the container’s /app directory, and the go build command is used to compile the app into a binary (myapp).
Runtime stage (FROM alpine:3.14):
- The second stage uses a much smaller base image (alpine:3.14), which is only about 5.61 MB in size and includes just enough to run the application.
- The compiled binary (myapp) is copied from the builder stage using the COPY --from=builder instruction.
- The final image only contains the compiled binary and the minimal Alpine Linux base, resulting in a significantly smaller image.

Building and Running the Multi-Stage Build

To build the image, run:

docker build -t my-go-app-2 .

After the build, list your Docker images and you'll see output like the following:

docker images
REPOSITORY    TAG       IMAGE ID       CREATED          SIZE
my-go-app-0   latest    446251e15a7d   22 minutes ago   969MB
my-go-app-1   latest    5ed8729274eb   25 minutes ago   7.38MB
my-go-app-2   latest    7cb2d5d990f2   28 minutes ago   7.38MB
alpine        3.14      9e179bacf43c   19 months ago    5.61MB
golang        1.18      c37a56a6d654   22 months ago    965MB

To run the container:

docker run my-go-app-2

You should see the output:

Hello, Docker multi-stage builds!

Image Size Comparison

Without multi-stage builds: The image size is 969 MB because the full Go environment is included.
With multi-stage builds: The image size drops to 7.38 MB, since only the compiled binary and a minimal runtime environment are included.

6. Best Practices for Using Multi-Stage Builds

Use Minimal Base Images: For the runtime stage, use lightweight images like alpine, scratch, or minimal distributions specific to your programming language (e.g., node:alpine, python:alpine).
Avoid Copying Unnecessary Files: Use .dockerignore to exclude files and directories that are not required in the image (e.g., .git, node_modules, etc.).
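Leverage Build Caching: Docker reuses cached layers when possible, so structure your Dockerfile to optimize caching. For example, place instructions whose inputs rarely change (such as dependency installation) near the top, and copy frequently edited source files as late as possible.
Clean Up After Installations: If you install packages or libraries during the build process, clean up any unnecessary files afterward (e.g., apt-get clean, rm -rf /var/lib/apt/lists/*). Do the cleanup in the same RUN instruction as the installation; files removed in a later instruction still take up space in the earlier layer.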
Use Named Build Stages: By naming your build stages (e.g., as builder), you make it easier to reference and manage them.
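
To illustrate the minimal-base-image advice, here is a sketch of the same Go example with scratch (an empty image) as the runtime stage. It assumes the application needs nothing from the operating system at runtime, which holds for the Hello World program in this post but not for every application:

```dockerfile
# Builder stage: compile the binary with the full Go toolchain.
FROM golang:1.18 AS builder
WORKDIR /app
COPY . .
# CGO_ENABLED=0 asks the Go toolchain for a binary with no libc dependency,
# which is needed because scratch contains no shared libraries.
RUN CGO_ENABLED=0 go build -o myapp main.go

# Runtime stage: scratch has no shell, libc, or package manager.
FROM scratch
COPY --from=builder /app/myapp /myapp
# Use the exec form: scratch has no shell to interpret the shell form.
CMD ["/myapp"]
```

The trade-off is that a scratch image has no shell for debugging and no CA certificates or time zone data, so applications that make TLS connections or use local time may need those files copied in from the builder stage as well.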

Reduce the Number of Layers: Each RUN, COPY, and ADD instruction creates a layer, and files deleted in a later layer still occupy space in the earlier one. Combine related commands into a single RUN instruction so that temporary files are removed within the same layer. Example:

RUN apt-get update && apt-get install -y \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*

7. Common Pitfalls and How to Avoid Them

Not Using Multi-Stage Builds for All Projects: Even if your project is small, using multi-stage builds is beneficial as it sets up best practices for scalability.
Including Build Tools in the Final Image: Avoid leaving build tools or unnecessary dependencies in the runtime image. Use multi-stage builds to remove them.
Unoptimized Dockerfile Structure: Not ordering Dockerfile instructions effectively can lead to inefficient caching and larger images. Pay attention to how Docker caches layers and structure the Dockerfile to reuse layers when possible.
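
As a sketch of cache-friendly ordering, the builder stage below copies the dependency manifests before the rest of the source. It assumes the project uses Go modules, so that go.mod and go.sum exist; the Hello World project in this post has neither file, so treat this as a pattern for larger projects rather than a drop-in replacement:

```dockerfile
FROM golang:1.18 AS builder
WORKDIR /app
# Assumption: go.mod and go.sum exist in the build context.
# Copy them first so this layer is rebuilt only when dependencies change.
COPY go.mod go.sum ./
RUN go mod download
# Source edits invalidate the cache only from this point onward.
COPY . .
# CGO_ENABLED=0 avoids linking against glibc, which Alpine does not ship.
RUN CGO_ENABLED=0 go build -o myapp .

FROM alpine:3.14
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["./myapp"]
```

With this layout, editing a source file reruns only the final COPY and go build steps, while the downloaded modules are served from the cache.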

Conclusion

Docker multi-stage builds provide an excellent way to reduce the size of your Docker images, making them more efficient for production deployments. By separating the build and runtime environments and copying only the necessary artifacts into the final image, you can significantly reduce the overhead associated with large images.

In this post, we explored how multi-stage builds work, demonstrated their implementation using a Go application, and discussed best practices to ensure you're getting the most out of this feature. Multi-stage builds are particularly useful for applications that require extensive build-time tools but minimal runtime dependencies.

By leveraging multi-stage builds, you'll not only optimize your image size but also improve the performance and security of your Dockerized applications.