Dockerfile Best Practices

Dockerfile best practices focus on creating efficient, maintainable, and secure Docker images. Here are some key guidelines:

Use Official Base Images

Start with official, minimal, and verified base images from trusted registries like Docker Hub to reduce security risks and ensure quality. These images are well-maintained, secure, and widely used.

When using official base images in a Dockerfile, it's essential to choose images that are minimal, well-maintained, and secure. Here's an example of how to use an official base image in a Dockerfile:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["python", "app.py"]

In this example:

  • FROM python:3.8-slim: This line specifies the base image. Here, we're using a slim version of the Python 3.8 image. Slim images are smaller and include only essential packages, which is good for security and efficiency. The tag 3.8-slim refers to Python 3.8 on a slim Debian base image. Using official images like this ensures that you're building on a secure and well-maintained foundation.

  • The rest of the Dockerfile sets up the application environment, copies the application into the container, installs dependencies, exposes a port, sets an environment variable, and specifies the command to run the application.

It's important to note that you should always use a specific tag (like 3.8-slim in this case) rather than the latest tag. This practice ensures consistency and reproducibility of your Docker builds.

Label Your Images

Use the LABEL instruction to add metadata to your image, like version, description, and maintainer information, making it easier to manage and identify images.

Labeling your Docker images is a good practice for adding metadata that describes the contents or purpose of the image. This metadata can include information like the version of the image, the maintainer's contact information, and a description of the image. Here's an example of how to label a Docker image in a Dockerfile:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Label the image
LABEL maintainer="example@example.com" \
      version="1.0" \
      description="This is a sample Docker image for a Python application."

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["python", "app.py"]

In this Dockerfile:

  • The LABEL instruction is used to add metadata to the image. Each label is a key-value pair. Common labels include:

    • maintainer: The email or name of the person maintaining the image.

    • version: The version of the image or the application it contains.

    • description: A brief description of the image or the application.

  • You can chain multiple labels in a single LABEL instruction by using backslashes (\) to continue the instruction on the next line. This is a best practice for readability and to avoid creating additional layers in the image.

Labels help with the management and identification of images, especially when you have multiple images and tags in your registry. They don't change the functionality of the image, but they provide useful information to users or automated tools that handle the images.
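
Once set, labels can be inspected and filtered on. For example, assuming the image is tagged my-image:

# Show all labels on an image
docker inspect --format '{{json .Config.Labels}}' my-image

# List images that carry a specific label
docker images --filter "label=version=1.0"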

Minimize Layers

Combine related commands into a single RUN instruction to reduce the number of layers in your image, which can improve build performance and readability.

Minimizing layers in a Dockerfile is a best practice aimed at reducing the overall size of the Docker image and improving its performance. In Docker, each RUN, COPY, and ADD instruction creates a new image layer (other instructions only add metadata). By combining instructions, you can reduce the number of layers. Here's an example:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install a build dependency, the Python packages from requirements.txt,
# and perform clean-up in one RUN command to minimize layers
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && pip install --no-cache-dir -r requirements.txt \
    && apt-get purge -y --auto-remove build-essential \
    && rm -rf /var/lib/apt/lists/*

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["python", "app.py"]

In this Dockerfile:

  • The RUN instruction installs a build dependency with apt-get, installs the Python packages from requirements.txt, and then removes the build dependency and the apt package lists (rm -rf /var/lib/apt/lists/*) in the same step. By combining these commands into one RUN statement using &&, all of these actions happen in a single layer, so the temporary build tools and caches never persist into the final image.

  • This practice of combining commands reduces the total number of layers in the image, which can lead to smaller image sizes and faster build times.

  • Note that while it's important to minimize layers for efficiency, readability and maintainability of the Dockerfile should also be considered. Don't over-combine commands to the point where the Dockerfile becomes hard to read or understand.

Use Multi-Stage Builds

Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. This practice is beneficial for minimizing the size of your final image by separating the build environment from the runtime environment. For more information, please see the following guide: "Get started with multi-stage builds in Docker".
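
As a brief illustration, here is a minimal two-stage sketch, assuming a Python app with a requirements.txt and an app.py entry point; the compilers needed to build dependencies exist only in the first stage, so the final image stays small:

# Build stage: compilers and headers are available here only
FROM python:3.11-slim AS build
WORKDIR /app
COPY requirements.txt .
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: no build tools, just the installed packages and the app
FROM python:3.11-slim
WORKDIR /app
COPY --from=build /install /usr/local
COPY . .
CMD ["python", "app.py"]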

Optimize for Cache Usage

Docker caches layers to speed up subsequent builds. Order your Dockerfile instructions so that the ones that change rarely (such as installing dependencies) come first and the ones that change often (such as copying application code) come last. Once a layer's input changes, that layer and every layer after it must be rebuilt, so this ordering lets you leverage Docker's caching mechanism effectively.
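
For example, copying the dependency manifest and installing packages before copying the rest of the source means routine code changes do not invalidate the dependency layer. A minimal sketch, assuming a Python app with a requirements.txt:

FROM python:3.11-slim
WORKDIR /app

# Dependencies change rarely: copy the manifest and install them first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often: copy it last so the layers above stay cached
COPY . .

CMD ["python", "app.py"]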

Additional Tips:

  • Use RUN --mount=type=cache (a BuildKit feature) to persist a cache directory, such as a package manager's download cache, across builds for a particular RUN instruction; see the sketch after this list.

  • Consider using a newer version of Docker: BuildKit, the default builder in recent releases, provides faster caching and parallel builds. For more information, please see the guide "Get started with Docker BuildKit".

  • Analyze your Dockerfile with a linter like hadolint to identify potential caching improvements.
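
A minimal sketch of a cache mount that keeps pip's download cache between builds (requires BuildKit; the leading syntax directive opts into the current Dockerfile frontend):

# syntax=docker/dockerfile:1
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .

# The cache mount persists /root/.cache/pip across builds, so packages
# that have not changed are not downloaded again
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt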

Use .dockerignore

Similar to .gitignore, a .dockerignore file prevents unnecessary files from being added to your Docker context, speeding up the build process.

Using a .dockerignore file in your Docker project is similar to using a .gitignore file in a Git project. It tells Docker to ignore certain files and directories when building an image, which can speed up the build process by reducing the amount of context sent to the Docker daemon. Here's a code example demonstrating how to use .dockerignore effectively:

Project structure:

.
├── Dockerfile
├── README.md
├── app/
│   ├── __init__.py
│   ├── main.py
├── tests/
│   ├── test_main.py
├── env/
│   ├── .env  # sensitive configuration file
└── ...

Example .dockerignore (a catalog of common patterns; in a real file, later rules override earlier ones, so pick the patterns that fit your project rather than combining all of these verbatim):

# Ignore development files
*.pyc
*.env

# Ignore test files
tests/

# Ignore documentation
README.md

# Ignore temporary files
**/.cache
**/.idea

# Ignore specific files
/path/to/secret.txt

# Ignore everything except app directory (explicit for clarity)
*
!app/
!Dockerfile

# Ignore patterns recursively
**/vendor/*

# Ignore dotfiles except Dockerfile and .gitignore
.*
!Dockerfile
!/.gitignore

Explanation:

  • We ignore common development artifacts like .pyc files and environment files containing sensitive information like .env.

  • Test files are excluded as they're not needed for running the application in production.

  • Documentation files like README.md are also excluded, unless specifically needed in the container.

  • Temporary and IDE-related directories are ignored to keep the image clean.

  • Specific files like secret.txt are explicitly excluded for security reasons.

  • Using * to ignore everything followed by specific inclusions for app/ and Dockerfile ensures only relevant files are copied.

  • Recursive patterns like **/vendor/* exclude all files within the vendor/ directory and its subdirectories.

  • Dotfiles are typically ignored, but Dockerfile and .gitignore are explicitly included.

Check out the VS Code extension for syntax highlighting of ignore files: https://github.com/ldez/vscode-language-ignore

Remember:

  • Tailor your .dockerignore to your specific project needs.

  • Don't ignore files necessary for your application to run.

  • Make sure the Dockerfile and .dockerignore themselves are not excluded by your patterns.

By using a .dockerignore file, you reduce the size of the build context sent to the Docker daemon, which can result in faster build times, especially in cases where the project directory contains large or numerous unnecessary files. This practice also helps in ensuring that only the necessary files are included in your Docker image, contributing to a cleaner, more secure final product.

Set Non-Root User

Running processes as a non-root user in your Docker containers improves security by minimizing the potential damage if vulnerabilities are exploited. Here are some ways to achieve this, depending on your needs:

1. Dockerfile Instruction:

This method directly sets the user within the Dockerfile using the USER instruction.

FROM python:3.11-slim

# Run as a non-root user with UID 1000 and GID 1000
USER 1000:1000

# Copy your application and run it with the appropriate command
COPY . /app
CMD ["python", "your_app.py"]

2. Docker Run Flag:

You can also set the user at runtime using the --user flag when running the container.

docker run --user 1000:1000 my-image python your_app.py

3. Docker Compose:

In Docker Compose, specify the user for a service in the user key within its definition.

version: "3.8"

services:
  my-service:
    image: my-image
    user: 1000
    command: ["python", "your_app.py"]

4. Dockerfile with Group Management:

For more complex scenarios, you can create a dedicated user and group within the image and set them accordingly.

FROM python:3.11-slim

# Create a dedicated group and user with fixed IDs
RUN groupadd -g 1000 mygroup \
    && useradd -m -s /bin/bash -u 1000 -g mygroup myuser

# Copy your application and set ownership in a single step
COPY --chown=myuser:mygroup . /home/myuser

# Set user for your application
USER myuser

CMD ["python", "/home/myuser/your_app.py"]

Choosing the Method:

  • The Dockerfile instruction is the preferred method for production deployments, as it ensures consistent user settings.

  • The docker run flag or Docker Compose user option are convenient for testing or ad-hoc runs.

  • The custom user creation method allows for finer control over group memberships and permissions.

Additional Tips:

  • Avoid running your application as root, and avoid escalating privileges inside the container with tools like sudo.

  • Use a dedicated, unprivileged UID/GID (conventionally 1000 or above) rather than reusing well-known system accounts.

  • Consider adopting least privilege principles and limit user access to only necessary resources.
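
A quick way to confirm which user a container runs as (assuming an image tagged my-image whose base provides the standard id utility):

docker run --rm my-image id

The output shows the UID and GID that the container's default user resolves to.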

By adopting non-root user practices in your Docker containers, you can significantly enhance the security and robustness of your deployments.

Specify Versions Explicitly

One crucial Dockerfile best practice is explicitly specifying versions for all components, including base images, dependencies, and tools. This improves reproducibility, security, and predictability in your builds and deployments.

Why Specify Versions?

Imagine building a Docker image with the following instruction:

FROM python:latest

While seemingly convenient, this approach has drawbacks:

  • Uncertainty: You're unaware of the exact Python version included. This can lead to unexpected behavior if different versions have incompatibilities.

  • Unwanted updates: The "latest" tag can change at any time, potentially breaking your build due to unforeseen changes in the base image.

  • Difficult debugging: Tracking down issues becomes harder as reproducing the exact build environment becomes challenging.

By explicitly specifying versions, you address these concerns:

  • Clarity: You know precisely which Python version your application relies on.

  • Stability: Your build remains consistent and predictable across environments.

  • Traceability: Reproducing past builds or debugging issues becomes easier by referencing specific versions.

How to Specify Versions?

Here are some ways to explicitly version components in your Dockerfile:

  1. Tag-based versioning:

FROM python:3.11-slim

# Install a specific package version
RUN pip install --upgrade package==1.2.3

The same idea applies to your own images: tag them with something immutable, such as a commit SHA, and reference that tag (for example, FROM my-custom-image:e8f8231).

  2. Environment variables:

FROM python:3.11-slim

# Pin the tool version in one place; the variable is also visible at runtime
ENV TOOL_VERSION=1.2.3

RUN pip install --upgrade tool==${TOOL_VERSION}

Note that ENV cannot appear before FROM, so it cannot parameterize the base image itself; use a build argument for that, as shown next.

  3. Build arguments:

# ARG is the only instruction allowed before FROM, so it can
# parameterize the base image
ARG PYTHON_VERSION=3.11-slim
FROM python:${PYTHON_VERSION}

# Arguments used after FROM must be declared (or re-declared) after it
ARG DEPENDENCY_VERSION=1.2.3

RUN pip install --upgrade package==${DEPENDENCY_VERSION}
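
To override the defaults at build time (assuming the Dockerfile from item 3 is in the current directory):

docker build --build-arg PYTHON_VERSION=3.12-slim \
             --build-arg DEPENDENCY_VERSION=1.3.0 \
             -t my-image .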

Remember to choose appropriate versioning strategies based on your needs and update dependencies regularly for security and bug fixes.

Benefits of Explicit Versioning

Following these best practices leads to:

  • Improved build reliability and consistency.

  • Easier debugging and troubleshooting.

  • Enhanced security by avoiding unexpected vulnerabilities.

  • Simplified version control and dependency management.

By embracing explicit versioning in your Dockerfiles, you cultivate a robust and dependable development environment for your applications.

Use ENTRYPOINT and CMD Appropriately

Dockerfiles offer two key instructions for defining how your container runs: ENTRYPOINT and CMD. Understanding their differences and using them appropriately is crucial for building clean, consistent, and adaptable images.

What they do:

  • ENTRYPOINT: Sets the executable that always runs when the container starts, similar to the program invoked on the command line. Prefer the exec form (a JSON array) so the process runs directly rather than through a shell.

  • CMD: Specifies the default arguments passed to the ENTRYPOINT (or, if no ENTRYPOINT is set, the default command). Arguments supplied after the image name on docker run replace CMD.

Best Practices:

  1. Use ENTRYPOINT for the main executable: Define the core program your container runs with ENTRYPOINT. This ensures consistent execution regardless of how the container is launched.

  2. Use CMD for default arguments: Set default parameters passed to your ENTRYPOINT command with CMD. This allows overriding them during container execution with custom flags.

  3. Avoid redundant usage: Don't specify the same executable in both ENTRYPOINT and CMD. Use ENTRYPOINT for the program itself and CMD only for its default arguments.

  4. Keep it modular: Use environment variables to inject dynamic configuration or settings without modifying the base image. Note that the exec form of CMD does not expand variables; use the shell form or an entrypoint script if you need expansion.

  5. Consider multi-stage builds: Separate build and runtime stages in your Dockerfile. Use ENTRYPOINT in the final stage to define the container's execution behavior.

FROM python:3.11-slim

# Copy your application files
COPY . /app

# Set the main entrypoint command (exec form, running the script with python)
ENTRYPOINT ["python", "/app/my_app.py"]

# Set default arguments for the script
CMD ["--server", "localhost", "--port", "8000"]

# Expose the port for your application
EXPOSE 8000

In this example, python /app/my_app.py is designated as the ENTRYPOINT, indicating the primary program to run. The CMD provides default arguments like the server address and port, which can be overridden when launching the container with custom flags.
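
For instance, assuming the image is tagged my-image:

# Runs python /app/my_app.py --server localhost --port 8000
docker run my-image

# Arguments after the image name replace CMD but keep the ENTRYPOINT
docker run my-image --server 0.0.0.0 --port 9000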

Remember:

  • Choose meaningful names for your ENTRYPOINT and CMD commands for improved readability.

  • Test your Dockerfile thoroughly to ensure your commands work as expected.

By utilizing ENTRYPOINT and CMD effectively, you can build flexible, consistent, and well-defined Docker images for your applications.

Avoid Sensitive Data

Dockerfiles offer a convenient way to build container images, but including sensitive data in them poses a significant security risk. Here are some best practices and a code example for avoiding sensitive information in your Dockerfiles:

Best Practices:

  1. Store secrets outside the Dockerfile: Never directly embed sensitive data like passwords, API keys, or database credentials within your Dockerfile instructions.

  2. Use environment variables: Store sensitive data in environment files outside the Dockerfile and load them as environment variables within the container at runtime.

  3. Leverage external secret management tools: Consider dedicated tools like AWS Secrets Manager or Vault to securely store and manage your secrets, accessed by the container using their respective SDKs.

  4. Mount volumes for configuration: For larger configuration files, consider mounting volumes containing them from the host system or another secure location.

  5. Minimize build context: Limit the files and directories copied into the Dockerfile's build context to avoid accidentally including sensitive files.

  6. Use .dockerignore: Exclude sensitive files and directories from being copied into the image using a properly configured .dockerignore file.

  7. Minimize permissions: Grant container users only the minimum permissions necessary to run the application, further reducing potential damage if access is compromised.

# syntax=docker/dockerfile:1
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .

# Build-time secret (requires BuildKit): the credentials file is mounted
# only for this RUN instruction and is never written into an image layer
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# No secrets are baked into the image; MY_SECRET is expected to be
# injected into the environment at runtime
CMD ["python", "/app/my_app.py"]

# Expose port for application
EXPOSE 8000

Note: This example keeps secrets out of the image entirely. Build-time credentials are supplied through a BuildKit secret mount, and runtime secrets such as MY_SECRET are injected into the container environment by your chosen mechanism, for example an orchestrator or a secret manager like HashiCorp Vault.
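
For illustration, assuming the image is tagged my-image and that secret/path and field_name are your actual Vault settings:

# Supply the build-time secret; it is used during the build but not stored
docker build --secret id=netrc,src=$HOME/.netrc -t my-image .

# Inject a runtime secret fetched from Vault on the host
docker run -e MY_SECRET="$(vault kv get -field=field_name secret/path)" my-image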

Remember:

  • Security is an ongoing process: Continuously assess your Dockerfile practices and update them as needed to maintain proper security hygiene.

  • Automate and manage secrets: Consider using automation tools to manage secret injection and rotation for improved security and operational efficiency.

By adopting these best practices and carefully avoiding sensitive data within your Dockerfiles, you can build secure and reliable container images for your applications.

Keep Your Images Up-to-Date

Keeping your Docker images updated is crucial for security, bug fixes, and performance improvements. Here are some best practices and a code example for maintaining up-to-date images:

Best Practices:

  • Automate updates: Implement automated workflows to rebuild and deploy new image versions whenever critical updates are available. CI/CD pipelines are ideal for this task.

  • Use multi-stage builds: Separate dependencies and application stages in your Dockerfile. Update only the relevant stage for dependency changes, maximizing efficiency and minimizing rebuild times.

  • Pin dependencies: Consider pinning critical dependencies to specific versions in your Dockerfile. This ensures consistent builds and avoids unexpected regressions due to dependency updates.

  • Monitor vulnerabilities: Utilize vulnerability scanners like Snyk or Anchore to regularly scan your images for known security flaws. Promptly update or patch vulnerable components.

  • Test updates: Before deploying updated images to production, thoroughly test them in a staging environment to ensure no unexpected issues arise.

  • Version your images: Tag your images with meaningful versions that reflect the underlying updates or changes. This facilitates tracking specific versions and simplifies rollbacks if needed.

# Stage 1: Build dependencies (Python requirements)
FROM python:3.10-slim AS builder

WORKDIR /app
COPY requirements.txt .
# Install into an isolated prefix so the packages are easy to copy
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Runtime image with only the application and its dependencies
FROM python:3.10-slim

WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .

CMD ["python", "your_app.py"]

Explanation:

  • This example utilizes a multi-stage build. The "builder" stage installs Python dependencies, while the final stage packages the application with those dependencies.

  • Updating only the dependencies would require rebuilding only the "builder" stage, minimizing overall build time.

  • You can add vulnerability scanning commands or integrate tools like Snyk into your CI/CD pipeline to automatically detect and manage updates based on identified vulnerabilities.

Remember: Adapting these practices to your specific needs and environment is crucial. Regularly evaluate your update process and optimize it for efficiency and security.

Additional Tips:

  • Consider using tools like Docker Compose to manage multi-container applications and facilitate version rollbacks.

  • Stay informed about upcoming updates for critical dependencies and base images to prioritize patching vulnerabilities.

  • Utilize automated rollback mechanisms to revert to previous image versions if necessary.

By keeping your Docker images up-to-date, you ensure your applications benefit from the latest improvements while maintaining robust security and performance.

Document Your Dockerfile

A well-documented Dockerfile improves clarity, maintainability, and collaboration around your container images. Here are some best practices and a code example for effective documentation:

What to Document:

  • Purpose: Briefly explain the purpose of the Dockerfile and what image it builds.

  • Base Image: Specify the base image used and why it was chosen.

  • Instructions: Clearly explain each FROM, COPY, RUN, CMD, and EXPOSE instruction and its purpose.

  • Environment Variables: Document any environment variables used and their expected values.

  • Multi-stage Builds: Explain the purpose and rationale behind each stage in a multi-stage build.

  • Volumes: If mounting volumes is crucial, explain the mounted folders and their roles.

  • Security Considerations: Highlight any security measures implemented (e.g., user accounts, non-root users).

  • Versioning: Explain how versioning works for the image and its components.

Documentation Formats:

  • Comments within the Dockerfile: Use clear and concise comments within the file itself.

  • Markdown files: Create a dedicated Markdown file with detailed explanations and diagrams.

  • Readme files: Include a brief overview of the Dockerfile within your project's README.

# A Dockerfile for building a Node.js API server

# 1. Base Image
FROM node:16-alpine AS builder

# 2. Install dependencies
WORKDIR /app
COPY package.json .
RUN npm install

# 3. Build the application (the source must be copied in before building)
COPY . .
RUN npm run build

# 4. Production Image
FROM nginx:stable-alpine

# 5. Copy the built application
COPY --from=builder /app/dist /usr/share/nginx/html

# 6. Configure Nginx
COPY nginx.conf /etc/nginx/nginx.conf

# 7. Expose port and run
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

# Note: This is a simplified example. Add additional comments and documentation
# as needed for your specific Dockerfile.

Documentation Tips:

  • Use consistent formatting and terminology throughout your documentation.

  • Include links to relevant resources and external references.

  • Update your documentation as the Dockerfile evolves.

  • Encourage team members to contribute to and maintain the documentation.

By following these best practices and incorporating them into your Dockerfile development process, you can create well-documented images that are easier to understand, maintain, and share.

Remember, good documentation is key to effective collaboration and ensures your Docker images are valuable assets for your entire team.

References:

  1. General best practices for writing Dockerfiles

  2. Build context

  3. Overview of best practices for writing Dockerfiles

  4. Multi-stage builds

  5. Best practices for Dockerfile instructions

  6. https://github.com/dnaprawa/dockerfile-best-practices

  7. 10 tips for writing secure, maintainable Dockerfiles

  8. Dockerfile reference

  9. python:3.8-slim

  10. Reduce the size of container images with DockerSlim

  11. Optimizing builds with cache management

  12. https://github.com/hadolint/hadolint

  13. Run the Docker daemon as a non-root user (Rootless mode)