Get started with Terraform State

Terraform state is a fundamental concept in Terraform, an infrastructure as code (IaC) tool used to manage and provision resources in a consistent and predictable manner. The state plays a crucial role in Terraform's operations, and understanding it is key to using Terraform effectively. The state file can be kept locally on the machine running Terraform or remotely, using a backend such as an Azure Storage Account, Amazon S3, or HashiCorp Consul. Here's an overview:

  1. What is Terraform State?

    • Terraform state is a record of the infrastructure Terraform manages. It stores information about the resources Terraform creates and manages, which allows Terraform to map real-world resources to your configuration, keep track of metadata, and improve performance for large infrastructures.
  2. Purpose of Terraform State:

    • Mapping to Real Resources: The state file helps Terraform know which real-world resources correspond to the resources in your configuration.

    • Synchronization: Before planning changes, Terraform by default refreshes the state against the real infrastructure, so its view stays up to date. This is crucial for teams and automation.

    • Metadata Storage: The state includes metadata such as resource dependencies, which Terraform uses to create and modify resources in the correct order.

  3. Location of State Files:

    • By default, Terraform stores state locally in a file named terraform.tfstate. However, for team environments, it's recommended to use remote state, which allows team members to share the state and ensures that everyone is working with the latest version of the infrastructure.
  4. Remote State Backends:

    • Terraform supports various backends for storing state remotely, including cloud storage services like AWS S3, Google Cloud Storage, and Azure Blob Storage. Remote backends not only help in collaboration but also provide additional features like state locking and versioning.
  5. State Locking:

    • State locking prevents others from running Terraform commands that might modify the state while someone else is already modifying it. This helps avoid conflicts and potential corruption of the state file.
  6. Security Considerations:

    • The state file can contain sensitive information, such as passwords or access keys. When using remote state, it's essential to secure access to the state file and consider encryption at rest.
  7. State Manipulation:

    • Terraform provides the terraform state subcommands (such as terraform state mv and terraform state rm) to modify the state. They are useful when resources need to be renamed, removed from management, or otherwise adjusted. Use them with caution, and avoid editing the state file by hand, because either can leave the state inconsistent with the real infrastructure.
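
For renames and removals, newer Terraform releases also offer a declarative route that goes through the normal plan and apply cycle instead of a one-off command. The sketch below is illustrative only: the resource addresses (aws_instance.web, aws_instance.app, aws_s3_bucket.legacy) are hypothetical, and it assumes a Terraform version that supports moved blocks (1.1 or later) and removed blocks (1.7 or later).

```hcl
# Rename a resource address without destroying and recreating the object.
# Roughly the declarative counterpart of:
#   terraform state mv aws_instance.web aws_instance.app
# (a resource "aws_instance" "app" block must exist in the configuration)
moved {
  from = aws_instance.web
  to   = aws_instance.app
}

# Stop managing a resource but leave the real object in place.
# Roughly the declarative counterpart of:
#   terraform state rm aws_s3_bucket.legacy
# (the resource "aws_s3_bucket" "legacy" block must be deleted from the configuration)
removed {
  from = aws_s3_bucket.legacy

  lifecycle {
    destroy = false
  }
}
```

Because these blocks live in the configuration, the change is reviewed in a plan and recorded in version control, which a manual state command is not.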

Understanding Terraform state is crucial for any team or individual using Terraform, as it is key to maintaining the integrity and consistency of your infrastructure deployments.

Storing Terraform state

Terraform state is a critical component in Terraform's infrastructure management process. Understanding how it is stored and the best practices for its management is essential for effective and safe infrastructure provisioning and maintenance.

How Terraform State is Stored

  1. Local State:

    • By default, Terraform stores state locally in a file named terraform.tfstate. When working in a team or on a larger infrastructure, local state can be problematic because it doesn't support collaboration or history tracking.
  2. Remote State:

    • For team-based or larger infrastructures, storing state remotely is recommended. Terraform supports various backends for remote state storage, like AWS S3, Azure Blob Storage, and Google Cloud Storage.

    • Remote state storage allows team members to access the most current state of the infrastructure, provides backup, and often includes state locking to prevent conflicting changes.
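
As an illustration of a non-AWS backend, here is a minimal sketch of an Azure Blob Storage backend. The resource group, storage account, and container names are placeholders I made up for this example, and authentication settings are omitted because they depend on your environment.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "example-tfstate-rg"     # hypothetical resource group
    storage_account_name = "exampletfstate"         # hypothetical storage account
    container_name       = "tfstate"                # hypothetical blob container
    key                  = "prod.terraform.tfstate" # blob name for this state
  }
}

# A Google Cloud Storage backend follows the same pattern, using a
# backend "gcs" block with bucket and prefix arguments instead.
```

A configuration can declare only one backend block, so pick the one that matches where your team already keeps its infrastructure.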

When to Use Remote vs. Local State

  1. Local State:

    • Best for individual use or small projects where you don't need to share state information with others.

    • Suitable for learning, experimentation, or personal projects.

  2. Remote State:

    • Essential for team environments to ensure everyone is working with the same state.

    • Important for larger infrastructures to manage state more efficiently and securely.

    • Necessary when implementing automation and CI/CD pipelines, as it ensures the latest state is always used.
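
For completeness, local state can also be configured explicitly. A minimal sketch, assuming you only want to change where the file is written (the path below is an arbitrary example):

```hcl
terraform {
  # Explicit local backend. Without this block Terraform still uses
  # local state, written to terraform.tfstate in the working directory.
  backend "local" {
    path = "state/terraform.tfstate" # example path, relative to the working directory
  }
}
```

If the project later outgrows local state, replacing this block with a remote backend and running terraform init -migrate-state lets Terraform offer to copy the existing state to the new location.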

Example backend configuration for the remote state

Below is an example of a Terraform configuration that specifies a backend for storing the remote state. In this example, I'll use AWS S3 as the remote backend, which is a common choice for storing Terraform state files.

Terraform Configuration for AWS S3 Backend

main.tf:

terraform {
  required_version = ">= 0.12"

  backend "s3" {
    bucket         = "my-terraform-state-bucket"  # Replace with your S3 bucket name
    key            = "path/to/my/terraform.tfstate" # Path in the bucket to store the state file
    region         = "us-east-1"              # Replace with the region your bucket is in
    encrypt        = true
    dynamodb_table = "my-lock-table"          # Replace with your DynamoDB table name for state locking
  }
}

# Provider configuration
provider "aws" {
  region = "us-east-1"  # Replace with your desired AWS region
}

# Your resource definitions go here
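
The backend above assumes the bucket and the lock table already exist. One possible way to create them is sketched below, reusing the placeholder names from the example. This is an assumption-laden sketch rather than a required layout: it presumes AWS provider version 4 or later (where versioning and encryption are separate resources), and these resources would normally live in a small, separate bootstrap configuration, since a backend cannot store state in a bucket that has not been created yet.

```hcl
# Bucket that holds the state file.
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state-bucket"
}

# Keep old versions of the state so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state objects at rest.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# State can contain secrets, so block every form of public access.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table; the S3 backend expects a string hash key named LockID.
resource "aws_dynamodb_table" "tf_lock" {
  name         = "my-lock-table"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```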

Storing Terraform State in Version Control System

  • Generally Not Recommended:

    • Storing state files in a version control system (VCS) like Git is typically not recommended, primarily for the following reasons:

      • Security Risks: State files can contain sensitive information, including credentials and private keys, which should not be exposed in a VCS.

      • Size: State files can be large and change on every apply, which makes them a poor fit for a version control system.

      • Merge Conflicts: State files are frequently updated, which can lead to merge conflicts in a VCS, making the process cumbersome and error-prone.

For more information, see Sensitive Data in State.

  • Best Practice:

    • Use a proper backend for state storage, and implement versioning and state locking if the backend supports it.

    • For change tracking, rely on the version control of Terraform configuration files rather than the state file itself.

    • Consider using Terraform Cloud or Enterprise for enhanced state management, security, and collaboration features, especially in larger or more complex environments. Terraform Cloud always encrypts state at rest and protects it with TLS in transit. Terraform Cloud also knows the identity of the user requesting state and maintains a history of state changes. This can be used to control access and track activity. Terraform Enterprise also supports detailed audit logging.

The choice between local and remote state storage in Terraform should be based on the scale of the project, the number of collaborators, and the need for secure and efficient state management. While local state might suffice for individual or small-scale use, remote state is crucial for teams and larger infrastructures. Storing state files in a VCS is not recommended due to security concerns, potential for merge conflicts, and the nature of state files.
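
If you go the Terraform Cloud route mentioned above, state storage is configured with a cloud block rather than a backend block. A minimal sketch, assuming Terraform 1.1 or later; the organization and workspace names are placeholders:

```hcl
terraform {
  cloud {
    organization = "example-org" # hypothetical organization name

    workspaces {
      name = "example-network" # hypothetical workspace name
    }
  }
}
```

After terraform login and terraform init, state for this configuration is stored in the named workspace instead of in a local file.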

State Isolation and Workspace Considerations in Terraform

Using separate state files for each Terraform configuration is a best practice, especially in large or complex environments. This approach, known as state isolation, makes the infrastructure safer and easier to manage. The details and a code example follow.

Best Practice of State Isolation

  1. Modular Approach:

    • Break down your infrastructure into logical units (for example, network and compute). Each unit should be its own root Terraform configuration and thus have its own state file. Note that reusable modules called from a configuration share that configuration's state; it is the separate root configuration that gives a unit its own state.

    • This reduces the size of each state file, limits the scope of changes, and minimizes the risk of accidental modifications to unrelated infrastructure components.

  2. Independent Management:

    • Each configuration can be independently applied, updated, and destroyed without impacting other parts of the infrastructure.

    • This is particularly useful for large teams where different members or teams might be responsible for different parts of the infrastructure.

  3. Enhanced Security:

    • By isolating state files, you limit the exposure of sensitive data contained in the state to only those who need access to that particular part of the infrastructure.

Code Example

Suppose you have an infrastructure with two main components: network and compute. You can create a separate directory for each component, each with its own Terraform configuration and its own state.

Directory Structure

infrastructure/
│
├── network/
│   ├── main.tf
│   └── variables.tf
│
└── compute/
    ├── main.tf
    └── variables.tf

network/main.tf

terraform {
  backend "s3" {
    bucket = "my-terraform-network-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Network resource definitions...

compute/main.tf

terraform {
  backend "s3" {
    bucket = "my-terraform-compute-state"
    key    = "compute/terraform.tfstate"
    region = "us-east-1"
  }
}

# Compute resource definitions...

In this example, each directory (network and compute) has its own set of Terraform files and a dedicated state file stored under its own key in its own S3 bucket, so no terraform.tfstate is kept in the working directory. This ensures that the state for network resources is independent of the state for compute resources.

Terraform Workspaces for Environment Isolation

Terraform workspaces allow you to use the same configuration for multiple environments (e.g., dev, stage, production) by switching between different workspaces. Each workspace has its own state file. However, using workspaces for environment isolation is generally discouraged for several reasons:

  1. Limited Isolation:

    • Workspaces store state files for different environments in the same backend, which can lead to accidental cross-environment impacts if not managed carefully.
  2. Complexity in Large Environments:

    • As the infrastructure grows, managing multiple environments with workspaces can become complex and error-prone. It's easy to mistakenly apply changes to the wrong environment.
  3. Access Control Challenges:

    • With all environments in the same configuration, it can be difficult to implement fine-grained access controls. Different environments, especially production, often have stricter access requirements.
  4. Difficulty in Promoting Changes:

    • Promoting changes from one environment to another (like dev to prod) can be more complicated with workspaces, as it may require manual intervention or complex automation.

Instead of using workspaces for environment isolation, it's recommended to use separate configurations (and thus separate state files) for each environment, just like the modular approach for resource isolation. This ensures clearer separation, better security, and more straightforward management of different environments.
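
One way to lay this out is a directory per environment, each a thin root configuration that calls a shared module. The sketch below is only an illustration: the file path, bucket name, module path, and the module's input variables (environment, instance_count) are all hypothetical.

```hcl
# environments/prod/main.tf (hypothetical path)

terraform {
  backend "s3" {
    bucket = "my-terraform-prod-state" # hypothetical bucket, separate from dev and stage
    key    = "app/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

# Shared module; environments/dev/main.tf would call the same module
# with its own backend block and smaller values.
module "app" {
  source = "../../modules/app" # hypothetical module path

  environment    = "prod" # hypothetical module input
  instance_count = 3      # hypothetical module input
}
```

Because each environment has its own backend block, access to the production bucket can be restricted independently, and a plan run in the dev directory cannot touch production state.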

Using Terraform remote state data source

The terraform_remote_state data source in Terraform is used to access the state data stored by another Terraform configuration. This feature is especially useful when you have split your infrastructure across multiple Terraform configurations and need to share information between them. For instance, a network infrastructure setup might output network IDs that a separate application setup needs to reference.

💡
A Terraform data source is a feature that allows you to retrieve and use information defined outside of your Terraform configuration or gathered from a provider's external resources. Data sources enable a Terraform configuration to use information that is not managed by Terraform (i.e., it wasn't created by your Terraform code but by some other means or is a part of a different Terraform configuration). This can include fetching data about an existing cloud infrastructure component, querying information from a cloud provider, or even accessing data from a Terraform state file created by another separate Terraform configuration. Data sources are read-only views into the data; they do not create or manage resources, but rather provide a way to integrate and reference external data and resources into your Terraform-managed infrastructure.
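
As a small illustration of a data source in general, the sketch below looks up an existing machine image and uses its ID. The name pattern and instance type are assumptions chosen for the example, not values from this article.

```hcl
# Read-only lookup: find the most recent Amazon-owned image whose name
# matches the pattern. Nothing is created by this block.
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"] # assumed name pattern for Amazon Linux 2023
  }
}

# Use the looked-up value in a managed resource.
resource "aws_instance" "example" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
}
```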

Here's how terraform_remote_state works, followed by an example:

How terraform_remote_state works:

  1. Configuration A Outputs Data: One Terraform configuration (let's call it Configuration A) applies and generates outputs, which are stored in its remote state.

  2. Configuration B Accesses Data: Another Terraform configuration (Configuration B) uses the terraform_remote_state data source to access the outputs from Configuration A's remote state.

  3. Reference Remote Outputs: Configuration B can then reference these outputs just like any other data source, enabling modular and decoupled infrastructure management.

Code Example:

Configuration A: Network Setup (network/main.tf)

First, let's assume we have a network configuration that outputs a VPC ID:

# Network configuration (network/main.tf)

provider "aws" {
  region = "us-west-2"
}

resource "aws_vpc" "my_vpc" {
  # VPC configuration...
}

output "vpc_id" {
  value = aws_vpc.my_vpc.id
}

terraform {
  backend "s3" {
    bucket = "my-terraform-bucket"
    key    = "network/terraform.tfstate"
    region = "us-west-2"
  }
}

Configuration B: Application Setup (app/main.tf)

Now, let's reference the VPC ID in a separate application configuration:

# Application configuration (app/main.tf)

provider "aws" {
  region = "us-west-2"
}

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-terraform-bucket"
    key    = "network/terraform.tfstate"
    region = "us-west-2"
  }
}

resource "aws_security_group" "my_app" {
  # ... other configuration ...

  # The remote output is a VPC ID, so it goes into an argument that expects one
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}

In this example, the application configuration (Configuration B) uses the terraform_remote_state data source to fetch the state from the network configuration (Configuration A). It references the vpc_id output from the network configuration to set the vpc_id argument of an AWS security group.

Access outputs:

Once you have defined the data source, you can read outputs from the remote state using the following syntax:

data.terraform_remote_state.remote_state.outputs.<output_name>
  • outputs.<output_name>: Accesses an output defined in the root module of the remote state.

  • Outputs of nested modules are not exposed. To share one, re-export it as an output of the remote configuration's root module (for example, an output named instance_type whose value is module.vpc.instance_type).

Here's an example:

# Create a resource using outputs from the remote state
resource "aws_instance" "my_instance" {
  ami           = data.terraform_remote_state.remote_state.outputs.ami
  # instance_type must be re-exported by the remote root module
  instance_type = data.terraform_remote_state.remote_state.outputs.instance_type
}

This code defines an AWS instance resource named my_instance. The ami attribute is set to the value of the ami output from the root module of the remote state. The instance_type attribute is set to the instance_type output, which the remote root module re-exports from its vpc module.

Important Notes:

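  • When using terraform_remote_state, make sure the reading configuration runs a Terraform version that can read the state written by the other configuration. A state snapshot written by a newer Terraform release may not be readable by an older one.

  • Only root-module outputs are exposed through terraform_remote_state. Resource attributes cannot be read directly, so publish anything other configurations need as an output.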

  • Always consider security implications when accessing remote state, as it might expose sensitive information.
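
Where exposing a whole state snapshot is a concern, one alternative is to look the shared object up through a provider data source instead of reading another configuration's state. A minimal sketch, assuming the network configuration tags its VPC with a known Name (the tag value and security group name below are hypothetical):

```hcl
# Look up the VPC by tag; the consumer needs read access to the AWS API,
# not to the network configuration's state.
data "aws_vpc" "network" {
  tags = {
    Name = "shared-network" # hypothetical tag set by the network configuration
  }
}

resource "aws_security_group" "my_app" {
  name   = "my-app"
  vpc_id = data.aws_vpc.network.id
}
```

The trade-off is that the two configurations now agree on a tag convention rather than on an output name.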

Terraform state and workspace management commands

Working with Terraform state involves a range of commands designed to view, modify, and manage the state file. Here's a comprehensive list of these commands:

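  1. Basic State Management Commands:

    • terraform show: Displays the current state (or a saved plan) in human-readable form.

    • terraform state list: Lists the resources tracked in the state.

    • terraform state show <address>: Shows the recorded attributes of a single resource.

  2. Advanced State Manipulation Commands:

    • terraform state mv <source> <destination>: Moves a resource to a new address in the state, for example after a rename.

    • terraform state rm <address>: Removes a resource from the state without destroying the real object.

    • terraform import <address> <id>: Brings an existing object under Terraform management.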

  3. State Backup and Restoration:

    • Terraform does not have dedicated commands for backing up and restoring state, since this is usually handled by the backend itself. With the local backend, the terraform apply command accepts a -backup flag that sets the path of the backup state file. In addition, terraform state pull and terraform state push can be used for manual backup and restoration.
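  4. State Locking and Unlocking (Dependent on Backend):

    • terraform force-unlock <lock_id>: Manually releases a lock that was left behind, for example by an interrupted run.

    • -lock-timeout=<duration>: Option on commands such as terraform plan and terraform apply that waits for a lock to be released instead of failing immediately.

  5. Workspace Commands:

    • terraform workspace list: Lists the available workspaces.

    • terraform workspace new <name>: Creates a workspace and switches to it.

    • terraform workspace select <name>: Switches to an existing workspace.

    • terraform workspace show: Prints the name of the current workspace.

    • terraform workspace delete <name>: Deletes a workspace.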

Using these commands appropriately can help you effectively manage and manipulate your Terraform state, which is crucial for maintaining the consistency and integrity of your infrastructure management. Remember that some of these commands, especially those that modify the state, should be used with caution as they can change how Terraform perceives your infrastructure.
