Introduction
One of the most important aspects of working with Docker is building efficient and optimized Docker images. A well-optimized image not only reduces the size of the image itself but also speeds up the build process, improves security, and allows for faster deployment. One of the most powerful techniques for achieving these optimizations is multi-stage builds.
Multi-stage builds allow you to create complex Dockerfiles that only include the files and dependencies that are necessary for the final image. By using multiple stages, you can compile and build your application in one stage and then copy only the required artifacts into the final image, leaving behind the large build dependencies and tools. This results in a smaller, cleaner, and more efficient image that’s optimized for deployment.
In this post, we’ll dive into the following:
- The Problem with Traditional Docker Builds
- What Are Multi-Stage Builds?
- Benefits of Multi-Stage Builds
- How Multi-Stage Builds Work: Step-by-Step
- Writing an Optimized Multi-Stage Dockerfile
- Real-World Example of Multi-Stage Builds
- Best Practices for Multi-Stage Builds
- Additional Tips for Reducing Docker Image Size
- Conclusion
1. The Problem with Traditional Docker Builds
Before the introduction of multi-stage builds, developers would often write Dockerfiles that installed all the necessary build dependencies (such as compilers, SDKs, etc.) along with the application. However, these build dependencies aren’t needed when running the application in production.
The result is often a bloated Docker image that includes unnecessary libraries, files, and binaries, leading to slower builds, larger image sizes, and potential security risks. For example:
- You might have compilers, debuggers, or even package managers installed in the final image, even though they are only needed during the build process.
- Layers of unnecessary files (such as temporary build files) might remain in the final image, taking up space.
This problem is exacerbated when building large applications, especially when using frameworks or languages that require complex build steps, like Go, Java, or Node.js. The larger the image, the slower it becomes to push and pull from Docker registries, and it becomes more cumbersome to manage in production environments.
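To make the problem concrete, here is a sketch of a typical single-stage Dockerfile for a hypothetical Go service (the app name and paths are illustrative). The final image carries the entire Go toolchain and the source tree, even though only the compiled binary is needed at runtime:

```dockerfile
# Single-stage build: the full Go toolchain ships in the final image
FROM golang:1.18

WORKDIR /app

# Source code AND build tools both end up in the image we deploy
COPY . .
RUN go mod tidy && go build -o myapp

EXPOSE 8080
CMD ["./myapp"]
```

An image built this way is typically hundreds of megabytes larger than the binary it actually runs, because the compiler, package caches, and source files are all baked into its layers.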
2. What Are Multi-Stage Builds?
Multi-stage builds are a Docker feature, introduced in Docker 17.05, that allows you to use multiple `FROM` statements in a single Dockerfile. Each `FROM` statement initializes a new build stage, and you can selectively copy artifacts from one stage to another.
This means you can use one stage to compile and build your application (with all the necessary build dependencies) and then use a separate, minimal stage that includes only the files needed to run the application.
In essence, you can think of multi-stage builds as an internal optimization that allows you to leave behind the "heavy lifting" and only ship the lightweight, production-ready version of your application.
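In its most minimal form, the idea looks like this — two `FROM` statements, with the second stage copying a single artifact out of the first (the stage name and paths here are illustrative):

```dockerfile
# Stage 1: heavy build environment, named "build"
FROM golang:1.18-alpine AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app

# Stage 2: minimal runtime; only the compiled artifact is carried over
FROM alpine:latest
COPY --from=build /out/app /usr/local/bin/app
CMD ["app"]
```

Everything that existed only in the `build` stage — compiler, source code, module cache — is left behind; the final image is just the base image plus the one copied file.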
3. Benefits of Multi-Stage Builds
Multi-stage builds come with several key benefits:
- Smaller Image Sizes: By copying only the necessary build artifacts into the final image, you eliminate all unnecessary dependencies, resulting in a smaller and leaner image.
- Simplified Dockerfiles: Multi-stage builds allow you to keep your build process and runtime environment within the same Dockerfile, making the overall build process more straightforward and easier to manage.
- Better Security: By excluding unnecessary packages and dependencies from the final image, you reduce the potential attack surface of your application. This is especially important in production environments where minimizing security risks is a priority.
- Improved Build Times: Smaller images mean faster builds, which translates to faster CI/CD pipelines and quicker deployment times.
- Ease of Use for Multiple Environments: You can build an image for different environments (development, testing, production) in the same Dockerfile by using different stages.
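The last point relies on the `docker build --target` flag, which stops the build at a named stage. A hedged sketch of how one Dockerfile can serve several environments (stage names and commands are illustrative):

```dockerfile
# Shared base stage
FROM node:18-alpine AS base
WORKDIR /app
COPY package*.json ./

# Development stage: includes devDependencies and a dev server
FROM base AS development
RUN npm install
COPY . .
CMD ["npm", "run", "dev"]

# Production stage: production dependencies only
FROM base AS production
RUN npm install --omit=dev
COPY . .
CMD ["npm", "start"]
```

You would then select a stage at build time, e.g. `docker build --target development -t myapp:dev .` for local work and `docker build --target production -t myapp:prod .` for deployment.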
4. How Multi-Stage Builds Work: Step-by-Step
Multi-stage builds work by defining multiple build stages within a single Dockerfile. Each build stage can use a different base image, install different packages, and produce different build artifacts. You can then use the `COPY --from=<stage>` instruction to selectively copy files from one stage to another (or base a later stage on an earlier one with `FROM <stage>`).
Here’s a simplified step-by-step breakdown of how multi-stage builds work:
- Define the Build Stage: The first stage is where you install all your build dependencies and compile or package your application. This stage can be as complex as needed.
- Create the Final Stage: In the second (or final) stage, you start with a minimal base image that’s appropriate for running your application in production (e.g., `alpine` or `scratch`). You then copy only the necessary files from the first stage.
- Discard the Unnecessary Files: After the multi-stage build is complete, all the unnecessary files, dependencies, and build tools from the first stage are discarded. Only the files you explicitly copy to the final stage are included in the final image.
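One detail worth knowing: a stage can be referenced either by the name given with `AS` or by its zero-based index, though naming stages is generally clearer and more robust. A minimal illustration:

```dockerfile
FROM alpine AS first
RUN echo "artifact" > /tmp/file

FROM alpine
# Equivalent to: COPY --from=0 /tmp/file /file
COPY --from=first /tmp/file /file
```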
5. Writing an Optimized Multi-Stage Dockerfile
Let’s walk through a practical example of how to write a multi-stage Dockerfile.
Example: A Go Application
Imagine you’re building a Go application. Go applications are typically statically compiled, meaning you don’t need any dependencies at runtime—just the binary itself. Here’s how you would structure a multi-stage build for a Go application.
Dockerfile:

```dockerfile
# Stage 1: Build stage
FROM golang:1.18-alpine AS builder

# Set the working directory inside the container
WORKDIR /app

# Copy the source code into the container
COPY . .

# Compile the Go application
RUN go mod tidy && go build -o myapp

# Stage 2: Production stage
FROM alpine:latest

# Set the working directory inside the final container
WORKDIR /app

# Copy only the compiled binary from the first stage
COPY --from=builder /app/myapp .

# Expose the application port
EXPOSE 8080

# Run the Go application
CMD ["./myapp"]
```
Breakdown:
- Build Stage (Stage 1):
  - We start with the `golang:1.18-alpine` base image, which is lightweight but contains everything we need to build a Go application.
  - The `WORKDIR` is set to `/app`, and the source code is copied into this directory using the `COPY` command.
  - We then run `go mod tidy` to ensure all dependencies are installed, and `go build -o myapp` to compile the Go application into a binary named `myapp`.
- Production Stage (Stage 2):
  - In the second stage, we start with the `alpine` base image, which is a minimal image optimized for production use.
  - We use the `COPY --from=builder` instruction to copy only the compiled `myapp` binary from the first stage (`builder`) into the final image.
  - We set the `WORKDIR` to `/app` in this stage, expose the necessary port (`8080`), and define the command to run the application (`CMD ["./myapp"]`).
Result:
- The first stage includes the full Go compiler, source code, and dependencies, but these are not carried over into the final image.
- The final image contains only the statically compiled Go binary and the necessary Alpine base, resulting in a much smaller image size compared to using a single-stage Dockerfile.
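To see the difference yourself, you might build the image and inspect the result (the tag is illustrative, and the exact sizes you see will depend on your base-image versions):

```shell
# Build the multi-stage image
docker build -t myapp:multistage .

# Show the final image size
docker images myapp:multistage

# Show which layers actually made it into the final image
docker history myapp:multistage
```

Only the layers of the final stage appear in the history; the builder stage's layers exist only in the local build cache.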
6. Real-World Example of Multi-Stage Builds
Let’s explore a more complex real-world scenario involving a Node.js application with a frontend build process. In this example, you’ll compile the frontend assets in one stage and then copy them to the backend stage for the final image.
Dockerfile for a Node.js Application with Frontend Build
```dockerfile
# Stage 1: Build frontend assets
FROM node:18-alpine AS frontend-builder

# Set the working directory
WORKDIR /app/frontend

# Copy the frontend manifests and install dependencies first (better layer caching)
COPY frontend/package.json .
COPY frontend/package-lock.json .
RUN npm install

# Copy the frontend source code and build the assets
COPY frontend/ .
RUN npm run build

# Stage 2: Build backend
FROM node:18-alpine AS backend-builder
WORKDIR /app/backend

# Copy the backend manifests and install dependencies
COPY backend/package.json .
COPY backend/package-lock.json .
RUN npm install

# Copy the frontend build artifacts from the first stage
COPY --from=frontend-builder /app/frontend/build ./public

# Copy backend source code
COPY backend/ .

# Expose the backend port
EXPOSE 3000

# Run the backend server
CMD ["npm", "start"]
```
Explanation:
- Frontend Build Stage:
  - The first stage uses the `node:18-alpine` image to build the frontend assets.
  - We install frontend dependencies, copy in the frontend source, and run the `npm run build` command to compile the frontend assets into a `build` directory.
- Backend Build Stage:
  - The second stage also uses the `node:18-alpine` image but focuses on building and running the backend service.
  - The compiled frontend assets from the first stage (`/app/frontend/build`) are copied into the `./public` directory of the backend.
  - Finally, we expose the backend port (`3000`) and define the command to run the backend server.
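Because each stage is named, you can also build and inspect an individual stage from a Dockerfile like the one above. For example, to stop after the frontend stage and poke around inside it (the tag name is illustrative):

```shell
# Build only up to the frontend-builder stage
docker build --target frontend-builder -t myapp-frontend:debug .

# Open a shell in the intermediate image to inspect the build output
docker run --rm -it myapp-frontend:debug sh
```

This is a handy way to debug a failing build step without rebuilding the whole pipeline.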
7. Best Practices for Multi-Stage Builds
When working with multi-stage builds, keep the following best practices in mind:
- **Use Minimal Base Images**: Always start your final stage with a minimal base image, such as `alpine` or `scratch`. This ensures that your image contains only the essential runtime components.
- **Reduce the Number of Layers**: Combine related commands (such as multiple `RUN` instructions) into a single command whenever possible to reduce the number of layers in the final image.
- **Leverage Caching**: Docker caches layers from previous builds. Structure your Dockerfile to maximize cache usage by placing commands that change frequently (like `COPY . .`) later in the file.
- **Clean Up After Build**: In the build stage, remove any unnecessary files (such as intermediate build artifacts, temporary files, or unused dependencies) to reduce image size.
- **Exclude Unnecessary Files**: Use a `.dockerignore` file to exclude unnecessary files from being copied into the image, such as development environment files, documentation, and logs.
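The layer-reduction and clean-up advice can be combined into a single `RUN` instruction, so temporary tools never persist in any layer. A sketch for an Alpine-based build stage (the package names and build command are illustrative):

```dockerfile
# Install build tools, build, and remove the tools in ONE layer;
# files deleted in a later RUN would still exist in earlier layers
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
    && make build \
    && apk del .build-deps
```

The `--virtual .build-deps` flag groups the installed packages under one label so `apk del` can remove them all at once.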
8. Additional Tips for Reducing Docker Image Size
Beyond multi-stage builds, there are additional strategies you can use to reduce Docker image sizes:
- **Use Smaller Base Images**: Base images like `alpine` are much smaller than full-blown OS images like `ubuntu`. Use them whenever possible.
- **Avoid Installing Unnecessary Packages**: Be mindful of what you’re installing in your Dockerfile. If a package is only needed during development, exclude it from the final production image.
- **Use `.dockerignore`**: The `.dockerignore` file is similar to `.gitignore` and allows you to exclude files from being copied into the Docker image, reducing the size of the build context.
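A minimal `.dockerignore` for a Node.js project might look like the following; adjust the entries to your own project layout:

```
node_modules
.git
*.log
.env
Dockerfile
docs/
```

Excluding `node_modules` is especially important: it keeps the build context small and forces dependencies to be installed inside the image, rather than copied from the host.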
Conclusion
Multi-stage builds are a powerful Docker feature that lets you create smaller, more efficient, and more secure images by separating the build process from the final runtime environment. By optimizing your Dockerfiles with multi-stage builds, you can significantly reduce image size, build times, and your application’s attack surface.
Whether you’re building a Go, Node.js, or any other type of application, multi-stage builds provide a flexible and robust way to streamline your Docker workflows.
By following the best practices and tips mentioned in this post, you’ll be able to create Docker images that are optimized for production use, ensuring faster deployments, quicker pull times, and a leaner infrastructure.
Now that you’re familiar with multi-stage builds, start experimenting with your Dockerfiles and enjoy the benefits of more efficient and optimized containers!