Get started with multi-stage builds in Docker

A multi-stage Docker build is an advanced technique for optimizing Docker images, especially for applications that require a compilation or build step. This approach lets you define multiple stages in a single Dockerfile, with each stage potentially using a different base image, and then selectively copy artifacts from one stage to another. The key benefits and features of a multi-stage Dockerfile include:

  1. Separation of Build and Runtime Environments: You can have one stage to compile or build your application and a different, often lighter, stage for the runtime environment. This separation keeps the production image small and secure.

  2. Reduced Image Size: By separating build tools and intermediate files into earlier stages, the final image contains only the necessary runtime dependencies. This results in a smaller image, which is beneficial for storage efficiency and faster deployment.

  3. Better Caching and Faster Builds: Docker caches the results of each stage. When you rebuild the image, Docker only re-runs the stages that have changed. This can significantly speed up the build process.

  4. Simplified Management: Instead of managing multiple Dockerfiles for different purposes (e.g., one for building and one for running), you can manage a single Dockerfile that includes everything.

  5. Easier Maintenance and Readability: A multi-stage Dockerfile is often more readable and maintainable. Each stage is clearly defined, making it easier to understand the role of each part of the Dockerfile.
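
To make the separation concrete, here is a minimal sketch of a two-stage build for a Go program (the binary name and base images are illustrative assumptions, not tied to a specific project):

```dockerfile
# Stage 1: build environment with the full Go toolchain
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
# Static binary so it can run in a minimal base image
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: lean runtime image; only the compiled binary is copied over
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
```

The toolchain image weighs in at several hundred megabytes, while the final image contains little more than the binary itself.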

The Old Way: Builder Pattern

Before the introduction of multi-stage builds in Docker, a common technique to optimize Docker images was the "builder pattern". This pattern was used to create lean and efficient Docker images, especially for compiled languages like C++ or Go, or for Node.js applications where dependencies needed to be installed and built.

Concept of the Builder Pattern

The builder pattern in Docker involved using two separate Dockerfiles and images:

  1. Builder Image:

    • This image was used for compiling the source code, installing dependencies, and running any necessary build scripts.

    • It typically was a larger image, containing all the necessary build tools and environments.

    • The source code would be copied into this image, and the build process would generate the necessary binaries or compiled code.

  2. Runtime Image:

    • After the build process was complete, the runtime image was created.

    • This image was much leaner and only contained the necessary runtime environment and the compiled code or artifacts from the builder image.

    • Since it didn’t include the build environment and tools, it was significantly smaller and more secure.

Example Workflow

Here’s a simplified example of how the builder pattern might have been used:

  1. Dockerfile for Builder Image (e.g., Dockerfile.build):

     FROM node:14
     WORKDIR /app
     COPY . /app
     RUN npm install
     RUN npm run build

    This Dockerfile would create an image containing the built application.

  2. Building the Builder Image:

     docker build -f Dockerfile.build -t myapp-builder .
  3. Extract Artifacts:

    After building the builder image, the next step was to extract the necessary artifacts (like compiled binaries) from the builder container.

     docker create --name temp-container myapp-builder
     docker cp temp-container:/app/build ./build
     docker rm temp-container
  4. Dockerfile for Runtime Image (Dockerfile):

     FROM node:14-slim
     WORKDIR /app
     COPY ./build /app
     CMD ["node", "app.js"]

    This Dockerfile would create a much smaller runtime image containing only the necessary runtime environment and the built artifacts.

  5. Building the Runtime Image:

     docker build -t myapp .

Transition to Multi-Stage Builds

With the introduction of multi-stage builds in Docker 17.05, this process was significantly simplified. Multi-stage builds allow both the build and runtime environments to be specified in a single Dockerfile, eliminating the need for separate Dockerfiles and manual artifact extraction. This modern approach is more efficient, reduces complexity, and decreases the potential for errors.

Name your build stages

In Docker multi-stage builds, stages can be either named or unnamed. Naming a stage provides a reference that can be used in subsequent stages, making it easier to copy artifacts from one stage to another. Let's explore examples of both unnamed and named stages.

Example of Unnamed Stages

Here's an example of a Dockerfile with unnamed stages:

# First stage (unnamed)
FROM node:14
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build

# Second stage (unnamed)
FROM nginx:alpine
COPY --from=0 /app/build /usr/share/nginx/html

In this example, there are two stages:

  • The first stage is based on the node:14 image. It's used for building a Node.js application.

  • The second stage is based on the nginx:alpine image. It copies the build artifacts from the first stage.

The COPY --from=0 command in the second stage refers to the first stage by its index (0). Since the stages are unnamed, they are referred to by their sequential index numbers, starting from 0 for the first stage.

Example of Named Stages

Now, let's see an example with named stages:

# First stage named 'builder'
FROM node:14 as builder
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build

# Second stage named 'runtime'
FROM nginx:alpine as runtime
COPY --from=builder /app/build /usr/share/nginx/html

In this example, the stages are explicitly named using the as keyword:

  • The first stage is named builder. It's where the Node.js application is built.

  • The second stage is based on nginx:alpine and copies the build artifacts from the builder stage using COPY --from=builder.

By naming the stages, the Dockerfile becomes more readable, and it's easier to understand from which stage the artifacts are being copied.

Benefits of Naming Stages

  • Readability: Naming stages makes the Dockerfile more readable and easier to understand, especially in Dockerfiles with multiple stages.

  • Maintainability: It's easier to maintain and modify a Dockerfile with named stages, as changes can be made more confidently without counting stage indexes.

  • Flexibility: In complex builds with many stages, naming allows more flexibility in referring to different stages out of sequential order.

Naming stages in multi-stage Docker builds is a best practice for clarity and maintainability, especially in complex Dockerfiles.

Stop at a specific build stage

When you build your image, you don't necessarily need to build the entire Dockerfile including every stage. You can specify a target build stage. The following command assumes you are stopping at the stage named build:

$ docker build --target build -t hello .

A few scenarios where this might be useful are:

  • Debugging a specific build stage

  • Using a debug stage with all debugging symbols or tools enabled, and a lean production stage

  • Using a testing stage in which your app gets populated with test data, but building for production using a different stage which uses real data
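
For instance, the hello image above could come from a Dockerfile shaped like this hypothetical one (the stage names other than build, and the Go program itself, are illustrative assumptions):

```dockerfile
FROM golang:1.21 AS base
WORKDIR /src
COPY . .

FROM base AS build
RUN CGO_ENABLED=0 go build -o /bin/hello .

FROM scratch AS production
COPY --from=build /bin/hello /hello
ENTRYPOINT ["/hello"]
```

With --target build, Docker stops after the build stage, so the production stage is never executed.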

Use an external image as a stage

When using multi-stage builds, you aren't limited to copying from stages you created earlier in your Dockerfile. You can use the COPY --from instruction to copy from a separate image, either using the local image name, a tag available locally or on a Docker registry, or a tag ID. The Docker client pulls the image if necessary and copies the artifact from there. The syntax is:

COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf

Use a previous stage as a new stage

You can pick up where a previous stage left off by referring to it when using the FROM directive. For example:

# syntax=docker/dockerfile:1

FROM alpine:latest AS builder
RUN apk --no-cache add build-base

FROM builder AS build1
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp

FROM builder AS build2
COPY source2.cpp source.cpp
RUN g++ -o /binary source.cpp

Problems You Might Encounter with Docker Multi-stage Builds

Docker multi-stage builds are a powerful feature for creating optimized, lean Docker images. However, like any tool or approach, they come with their own set of potential problems and challenges. Understanding these helps you use multi-stage builds effectively and troubleshoot any issues that arise.

1. Complexity and Maintenance

  • Increased Complexity: Multi-stage builds can increase the complexity of your Dockerfile, especially when you have several stages or complex dependencies. This complexity can make it harder for new developers to understand and maintain the Dockerfile.

  • Maintenance Overhead: Keeping track of what happens in each stage and ensuring that each stage has exactly what it needs (and nothing more) can add to maintenance overhead.

2. Build Time and Resource Usage

  • Longer Build Times: Each stage in a multi-stage build essentially creates a separate image, which can lead to longer build times, especially if stages are not efficiently organized or if there are redundant operations across stages.

  • Increased Resource Usage: More stages can mean more temporary images and layers being created, which can use more disk space and resources during the build process.

3. Incorrect Artifact Copying

  • Artifact Confusion: When copying artifacts from one stage to another, it's essential to specify the correct paths and stage names. Mistakes here can lead to missing files, incorrect application behavior, or larger-than-necessary final images.

  • Dependency Errors: Omitting necessary files or dependencies in the final stage can lead to runtime errors. This is a common issue when dependencies are split across different stages.
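
One hedged sketch of keeping dependencies straight is to reinstall only the production dependencies in the final stage, rather than copying everything from the builder (file names follow npm conventions; the build output path is an assumption):

```dockerfile
# Build stage: full dependencies, including devDependencies
FROM node:14 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Final stage: production dependencies only, plus the build output
FROM node:14-slim
WORKDIR /app
COPY package*.json ./
RUN npm install --only=production
COPY --from=builder /app/build ./build
CMD ["node", "build/app.js"]
```

If a module the app needs at runtime is listed under devDependencies, this layout surfaces the mistake as a missing-module error rather than silently shipping the entire builder tree.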

4. Caching Issues

  • Ineffective Use of Cache: Docker caches intermediate layers to speed up subsequent builds. However, changes in early stages will invalidate the cache for all subsequent stages, which can slow down the build process. Properly structuring your Dockerfile to take advantage of caching can be tricky.

  • Cache Bust: Understanding when and how the Docker build cache is busted is crucial. Unintentional cache busts can significantly increase build times.
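
A common cache-friendly structure for the Node.js builds in this article is to copy the dependency manifests before the rest of the source, so the npm install layer survives source-only changes:

```dockerfile
FROM node:14 AS builder
WORKDIR /app
# These layers stay cached as long as package*.json are unchanged
COPY package*.json ./
RUN npm install
# Source edits only invalidate the layers from here down
COPY . .
RUN npm run build
```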

5. Overlooking Security Aspects

  • Security Oversight: Each stage in a multi-stage build can potentially introduce security vulnerabilities. It's important to ensure that each stage uses secure base images and that any added software is trusted and up-to-date.

  • Leaking Sensitive Data: If sensitive data (like credentials) is used in an early stage, care must be taken to ensure it’s not included in the final image.
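
With BuildKit enabled, one way to keep a credential out of every image layer is a secret mount; this is a sketch assuming a secret registered under id=npm_token and an .npmrc that reads ${NPM_TOKEN}:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:14 AS builder
WORKDIR /app
COPY package*.json .npmrc ./
# The secret is available only during this RUN step and is never
# written into an image layer or the build cache
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm install
```

The secret is supplied at build time, for example with docker build --secret id=npm_token,src=./npm_token.txt .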

6. Layer Size and Optimization

  • Suboptimal Layer Reduction: One of the goals of multi-stage builds is to reduce the final image size. However, if not carefully managed, these builds can still result in larger-than-necessary images, especially if large files are not excluded or if unnecessary layers are retained.

7. Incompatibility and Dependency Issues

  • Base Image Mismatch: If different stages use significantly different base images, there might be compatibility issues with binaries or libraries.

  • Runtime Environment Mismatch: There can be cases where the environment in which the application was built differs from the environment in which it runs, leading to unexpected behavior or errors.

Best Practices

To mitigate these problems, it's important to:

  • Plan your stages and artifacts carefully.

  • Use consistent and secure base images.

  • Understand Docker caching mechanisms.

  • Regularly update and maintain your Dockerfiles.

  • Test your Docker builds thoroughly in environments similar to your production setup.

By being aware of these potential issues and following best practices, you can effectively leverage Docker multi-stage builds to create efficient and secure containerized applications.

Docker BuildKit is an advanced feature introduced in Docker 18.09 that can be used to speed up the build process of Docker images. BuildKit offers a range of improvements over the traditional Docker image build process, making it more efficient and faster, particularly for complex builds or multi-stage Dockerfiles. You can enable BuildKit, and get faster multi-stage builds by setting the DOCKER_BUILDKIT environment variable to 1. In newer versions of Docker this is enabled by default.
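
On older Docker versions, enabling it for a session is a one-liner (the image tag below is illustrative):

```shell
# Enable BuildKit for the current shell session; recent Docker
# releases enable it by default, making this a no-op there
export DOCKER_BUILDKIT=1
# docker build -t myapp .
```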

Tutorial: Create Docker images with single-stage and multi-stage builds

Here's a basic example to illustrate how a multi-stage Dockerfile might look for a Node.js application and how it differs from a normal, single-stage build process.

To complete the tutorial, let's create a simple Node.js application. This application will be used in both the single-stage and multi-stage Docker builds.

Step 0: Create a Simple Node.js Application

  1. Create Application Files: In your project directory, create the following files:

    • app.js: This will be your Node.js application.

    • package.json: This file will define your application and its dependencies.

  2. app.js: Write a simple Node.js application.

     // app.js
     const express = require('express');
     const app = express();
     const port = 3000;
     app.get('/', (req, res) => {
       res.send('Hello World!');
     });
     app.listen(port, () => {
       console.log(`Example app listening at http://localhost:${port}`);
     });
  3. package.json: Define the application and its dependencies.

     {
       "name": "docker-node-app",
       "version": "1.0.0",
       "description": "A simple Node.js app for Docker tutorial",
       "main": "app.js",
       "scripts": {
         "start": "node app.js"
       },
       "dependencies": {
         "express": "^4.17.1"
       }
     }
  4. .dockerignore: (Optional) Create a .dockerignore file to exclude node_modules and other non-essential files from the Docker context.


This setup will allow you to effectively demonstrate the difference in image sizes between a normal Docker build and a multi-stage Docker build.
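
A minimal .dockerignore for this project might look like the following (the entries are the usual candidates; adjust to your project):

```
node_modules
npm-debug.log
.git
```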

Step 1: Create a Basic Dockerfile

  1. Create a Dockerfile: Create a simple Dockerfile. This Dockerfile will use a single stage build.

    Example Dockerfile:

     FROM node:14
     WORKDIR /app
     COPY . .
     RUN npm install
     CMD ["node", "app.js"]
  2. Build the Image: Run the Docker build command to create an image.

     docker build -t myapp:single-stage .
  3. Check Image Size: After the build completes, use the Docker images command to check the size of the image.

     docker images myapp:single-stage

  4. (Optional) To check that the app works as expected, you can run the Docker container in detached mode and access the web server at http://localhost:3000:

     docker run -d -p 3000:3000 myapp:single-stage

Step 2: Create a Multi-stage Dockerfile

  1. Create a Multi-stage Dockerfile: Now, modify the Dockerfile to use multi-stage builds. This approach typically reduces the size of the final image by discarding unnecessary build dependencies.

    Example Multi-stage Dockerfile:

     # Build stage
     FROM node:14 AS builder
     WORKDIR /app
     COPY . .
     RUN npm install
     # Final stage
     FROM node:14-slim
     WORKDIR /app
     COPY --from=builder /app .
     CMD ["node", "app.js"]
  2. Build the Multi-stage Image: Build the image using the new Dockerfile.

     docker build -t myapp:multi-stage .
  3. Check Image Size: Again, check the size of the newly built image.

     docker images myapp:multi-stage

Step 3: Compare Image Sizes

  • Compare the sizes reported in the outputs of the docker images commands for both the single-stage and multi-stage builds. The multi-stage build image is typically smaller.

    In the example above, the single-stage build produced an image of 916 MB, while the multi-stage build produced an image of 183 MB.

Step 4: Optional Testing

  • If you want to further test, you can run both images and ensure that they work as expected. This step is to confirm that the multi-stage build didn't break any functionality.

      docker run -d --name myapp-single -p 3000:3000 myapp:single-stage
      docker run -d --name myapp-multi -p 3001:3000 myapp:multi-stage
  • Check that both containers are running (docker ps) and respond at http://localhost:3000 and http://localhost:3001 respectively.

Additional Notes

  • Ensure that your application’s source code is in the same directory as your Dockerfiles for the COPY command to work correctly.

  • The node image is used as an example. Adjust your Dockerfile based on the technology stack of your application.

  • The effectiveness of a multi-stage build in reducing image size can vary significantly depending on the nature of the application and its dependencies.


Bonus: Complete Docker Cleanup

To perform a complete cleanup of all Docker containers and images from your system, you can follow these steps. This process will remove all containers, images, volumes, and networks that are not in use. Remember, this action is irreversible, so ensure you don't have important data in your containers or images before proceeding.

Clean Up Docker Containers

  1. Stop All Running Containers: This command stops all currently running containers.

     docker stop $(docker ps -aq)
  2. Remove All Containers: After stopping them, remove all containers (both running and stopped).

     docker rm $(docker ps -aq)

Clean Up Docker Images

  1. Remove All Docker Images: This command removes all Docker images from your system.

     docker rmi -f $(docker images -aq)

Clean Up Volumes and Networks

  1. Prune System: Docker provides a convenient command to clean up unused containers, networks, images (both dangling and unreferenced), and optionally, volumes.

     docker system prune -a
    • Adding -a includes all unused images, not just dangling ones.

    • If you want to remove volumes as well, include the --volumes flag:

        docker system prune -a --volumes

Additional Notes

  • Volumes: If you have data in Docker volumes that you want to preserve, be cautious with using docker system prune --volumes. This will delete all unused volumes.

  • Active Containers and Images: These commands will not remove containers and images that are currently in use. If you want to force the removal, you can use the -f (force) flag, but be very careful with this as it can disrupt running applications.

  • Data Loss Warning: These actions will result in the loss of all Docker containers and images on your system. Make sure to backup any important data before proceeding.

By following these steps, you will clean up your system by removing all Docker containers, images, volumes, and networks that are not currently in use.


  8. Cache Bust a Docker Build