Get started with Amazon S3 Glacier

Amazon S3 Glacier is a secure, durable, and extremely low-cost cloud storage service from Amazon Web Services (AWS) designed for data archiving and long-term backup. It is part of the Amazon S3 family of storage services and targets large volumes of infrequently accessed data that must be kept at very low cost. For more information about S3 Glacier service pricing, see S3 Glacier pricing.

Key features and use cases of Amazon S3 Glacier include:

  1. Data Archiving: Ideal for organizations needing to archive data for compliance or regulatory reasons. It's commonly used to store financial records, healthcare records, and historical data that must be retained for long periods.

  2. Backup and Disaster Recovery: S3 Glacier is used as a cost-effective solution for backing up data. This ensures that in case of a disaster, data can be recovered, albeit with some delay due to retrieval times.

  3. Digital Media Preservation: Media companies use it to archive large volumes of media content, such as films, music, and images, especially content that is not frequently accessed but needs to be stored for a long time.

  4. Scientific and Research Data Storage: Researchers and scientists use S3 Glacier to store experimental data, research findings, and large datasets that are important for long-term studies but are not accessed frequently.

  5. Legal and Compliance Data Retention: Businesses use it to store data that needs to be retained for legal and compliance reasons, often for many years.

  6. Cold Data Storage: For data that is infrequently accessed but still needs to be stored, like old project files or historical data.

  7. Cost-Effective Storage Solution: For businesses looking to reduce costs in IT infrastructure, S3 Glacier provides a cheaper alternative to on-premises storage solutions for rarely accessed data.

It's important to note that S3 Glacier is optimized for data that is infrequently accessed. Except for S3 Glacier Instant Retrieval, retrieving data takes longer than with other storage options (from a few minutes to 48 hours, depending on the storage class and retrieval option), making it less suitable for data that needs to be accessed quickly or frequently.

Benefits of using Amazon S3 Glacier

Amazon S3 Glacier offers several significant benefits, particularly for businesses and organizations looking to store large volumes of data over the long term in a cost-effective and secure manner. Some of the key benefits include:

  1. Cost-Effectiveness: One of the primary advantages of S3 Glacier is its low cost. It is designed for data that is infrequently accessed, making it an ideal, cost-effective solution for long-term storage. This can lead to significant savings compared to using regular, frequently-accessed storage options.

  2. High Data Durability: S3 Glacier provides extremely high durability for stored data. It is designed for 99.999999999% (11 nines) durability, which means there is a very low probability of data loss. This is achieved through multiple redundancies and robust infrastructure.

  3. Security: Amazon S3 Glacier offers robust security features. Data is encrypted at rest and can be transferred over SSL/TLS (HTTPS) to protect it in transit. Additionally, AWS provides fine-grained access controls and auditing capabilities to help manage data securely.

  4. Scalability: With S3 Glacier, you can scale your storage needs up or down without worrying about capacity planning. This makes it an ideal solution for businesses with varying storage requirements.

  5. Compliance and Data Retention: S3 Glacier supports various compliance requirements, making it suitable for industries that need to retain data for regulatory purposes. It can be used to store sensitive data, including healthcare, legal, and financial records.

  6. Integration with AWS Ecosystem: Being part of the AWS ecosystem, S3 Glacier works well with other AWS services, allowing for seamless integration and management of data across various AWS products.

  7. Customizable Retrieval Options: S3 Glacier offers flexible data retrieval options, including expedited, standard, and bulk retrievals, to meet different access needs and cost considerations.

  8. Automatic Data Lifecycle Management: It can be used in conjunction with Amazon S3 lifecycle policies to automatically transfer data to Glacier based on age or other criteria, simplifying data management and reducing costs.

These benefits make Amazon S3 Glacier an attractive choice for organizations that need reliable, secure, and cost-effective long-term data storage solutions.

Data durability with Amazon S3 Glacier

Data durability refers to the likelihood of your data remaining intact and uncorrupted over a period of time. In the context of data storage and cloud services, durability is a measure of how reliably the system stores your data without losing it due to errors or failures.

Several factors contribute to data durability:

  1. Redundancy: This involves storing multiple copies of data across different physical locations or storage media. If one copy becomes corrupted or is lost, other copies remain available for recovery.

  2. Error Checking and Correction: Systems employ various methods to detect and correct errors in data. This includes algorithms that can identify and repair corrupted data segments.

  3. Robust Infrastructure: Durable storage systems are built on hardware and software that are resilient to failures. This includes using high-quality storage media and implementing robust backup systems.

  4. Regular Backups: Regularly backing up data to separate locations or media contributes to overall data durability. This practice ensures that even if the primary data source is compromised, a backup is available.

  5. Disaster Recovery Planning: Preparing for catastrophic events like natural disasters, power outages, or cyber-attacks is crucial. This may involve offsite backups, geographic redundancy, and having contingency plans in place.

All S3 Glacier storage classes provide virtually unlimited scalability and are designed for 99.999999999% (11 nines) of data durability, indicating that the probability of data loss is extremely low. This is achieved through a combination of the factors mentioned above, ensuring that your data remains safe and intact over the long term.
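
To make the durability figure concrete, here is a small worked calculation. It is only a sketch: it assumes the 99.999999999% (11 nines) design target applies per object per year, and the object count is a hypothetical example rather than a measured value.

```python
from fractions import Fraction

# Assumed design target: 99.999999999% (11 nines) durability per object per year.
DURABILITY = Fraction("0.99999999999")


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the assumed durability."""
    return object_count * (1 - DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


stored_objects = 10_000_000  # hypothetical archive size
print(float(expected_annual_losses(stored_objects)))  # prints 0.0001
print(years_per_expected_loss(stored_objects))        # prints 10000
```

Under these assumptions, an archive of ten million objects would lose on average one object every 10,000 years. Exact fractions are used instead of floats so that the subtraction from 1 does not introduce rounding noise.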

The S3 Glacier storage classes

Amazon S3 Glacier offers three storage classes, each designed for different use cases and access requirements:

  1. S3 Glacier Flexible Retrieval (formerly known as S3 Glacier): This class is optimized for infrequently accessed data where a retrieval time of several hours is acceptable. It combines low storage cost with low retrieval cost. Standard retrievals typically take 3-5 hours, and expedited retrievals (within minutes) are available for an additional cost. It is an ideal solution for backup, disaster recovery, and offsite data storage, and for cases where some data occasionally needs to be retrieved in minutes and you don't want to worry about costs.

  2. S3 Glacier Deep Archive: This is the lowest-cost storage option in AWS and is intended for data that is rarely accessed. Retrieval takes considerably longer: typically within 12 hours for standard retrievals and within 48 hours for bulk retrievals. It is particularly suitable for data that may need to be accessed less than once per year, such as long-term preservation of historical records and scientific data. S3 Glacier Deep Archive is a cost-effective and easy-to-manage alternative to tape. It is designed for customers, particularly those in financial services, healthcare, media and entertainment, and the public sector, that retain data sets for 7-10 years or longer to meet customer needs and regulatory compliance requirements. The cost savings compared to S3 Glacier Flexible Retrieval make it an attractive option for storing very large datasets that do not require frequent access.

  3. S3 Glacier Instant Retrieval: This is the newest of the S3 Glacier storage classes. It is designed for rarely accessed data, typically read about once per quarter, that still requires millisecond retrieval in performance-sensitive use cases like image hosting, online file-sharing applications, medical imaging and health records, news media assets, and satellite and aerial imaging. This class is ideal for backup and archival data where quick access is important but frequent retrieval is not.

The choice between these tiers depends on how frequently you need to access the stored data and how quickly you need to retrieve it.
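
The decision described above can be sketched as a small helper. The thresholds below are illustrative assumptions drawn loosely from the retrieval times in this section, not official AWS guidance, and the function name is hypothetical. The returned strings are the storage class identifiers that the S3 API uses for these classes.

```python
def choose_glacier_storage_class(max_wait_hours: float, accesses_per_year: float) -> str:
    """Pick an S3 Glacier storage class from two rough inputs.

    max_wait_hours:    longest delay you can tolerate before reading the data.
    accesses_per_year: how often you expect to read the data.
    Thresholds are illustrative assumptions, not AWS rules.
    """
    if max_wait_hours < 1 / 60:
        # Less than a minute of tolerance: only Instant Retrieval avoids a restore step.
        return "GLACIER_IR"
    if max_wait_hours >= 12 and accesses_per_year <= 1:
        # Can wait at least 12 hours and reads are very rare: cheapest class.
        return "DEEP_ARCHIVE"
    # Everything in between: Flexible Retrieval (minutes to hours).
    return "GLACIER"


print(choose_glacier_storage_class(max_wait_hours=0, accesses_per_year=4))
print(choose_glacier_storage_class(max_wait_hours=5, accesses_per_year=2))
print(choose_glacier_storage_class(max_wait_hours=48, accesses_per_year=0.5))
```

In practice, pricing (storage, retrieval, and minimum storage duration charges) should also feed into this choice; the sketch only captures the access-frequency and latency trade-off that the text describes.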

Amazon S3 Glacier Data Model

The Amazon S3 Glacier data model is structured around several key concepts that define how data is stored, organized, and accessed within the service. Understanding these concepts is crucial for effectively using S3 Glacier for data archiving and backup. The main components of the S3 Glacier data model include:

  1. Vaults: Vaults are the primary containers in S3 Glacier for storing archives. A vault is similar to an Amazon S3 bucket. Each vault is a unique namespace within an AWS account and a specific AWS region. The general form is:

     https://region-specific-endpoint/account-id/vaults/vault-name
    

    You can create up to 1,000 vaults per AWS Region in each account, and each vault can store an unlimited number of archives. Vaults are used to organize your data, set access policies, and configure vault lock policies for compliance. S3 Glacier supports various vault operations. Vault operations are Region-specific.

  2. Archives: An archive is any object, such as a file or a collection of files, that you store in a vault. An archive is similar to an Amazon S3 object, and is the base unit of storage in S3 Glacier. Archives can represent individual files or be aggregated into larger collections (such as tar or zip files) before being uploaded. Each archive is assigned a unique archive ID within the vault, which you use to access or manage the archive. An individual archive can be up to 40 TB in size. A single upload request can carry up to 4 GB, so larger archives must be uploaded with multipart upload.

  3. Jobs: To retrieve data or perform other actions (like inventory retrieval) in S3 Glacier, you initiate a job. Since S3 Glacier is optimized for infrequent access with a retrieval time ranging from minutes to several hours, jobs are asynchronous operations. You submit a job request, and once the job completes, you get a notification (if configured), and then you can download the data.

  4. Data Retrieval Policies: S3 Glacier allows you to set a data retrieval policy for each AWS Region in your account. These policies help manage and limit data retrieval costs, for example by restricting retrievals to the free tier or by setting a maximum retrieval rate.

  5. Vault Access Policy: A Vault Access Policy in Amazon S3 Glacier is a resource-based policy attached to a vault that defines who has access to the vault and what actions they can perform. It is like a bucket policy. This policy is separate from IAM policies, offering an additional layer of access control specifically for the vault.

  6. Vault Lock Policy: This feature allows you to enforce compliance controls on individual vaults by creating and locking data retention policies. Once locked, the policy cannot be changed, ensuring that the data retention rules comply with corporate or regulatory standards. You can specify controls such as "write once read many" (WORM) in a Vault Lock policy and lock the policy from future edits.

  7. Inventory: S3 Glacier maintains an inventory of all archives in each vault. This inventory is updated approximately once a day. To retrieve a list of archives in a vault, you initiate an inventory retrieval job.

  8. Access Control: You can control access to the data stored in S3 Glacier vaults using AWS Identity and Access Management (IAM) policies and resource-based policies such as vault access policies and vault lock policies.

  9. Encryption: Data stored in S3 Glacier is automatically encrypted at rest using server-side encryption with AES-256.

Understanding these components is essential for effectively utilizing Amazon S3 Glacier for long-term data storage, ensuring data security, compliance, and cost-effective retrieval and management of archives.
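
To show how vaults, archives, and jobs relate, the following sketch walks through the three calls in order. It assumes a boto3 Glacier client (for example `boto3.client("glacier")`) is passed in as `glacier`. The function name, vault name, and description are hypothetical, and error handling is omitted.

```python
from typing import Any


def archive_and_request_retrieval(glacier: Any, vault_name: str, payload: bytes,
                                  description: str, tier: str = "Standard") -> dict:
    """Create a vault, upload one archive, and start an asynchronous retrieval job.

    `glacier` is assumed to be a boto3 Glacier client.
    """
    # accountId "-" refers to the account that owns the credentials in use.
    glacier.create_vault(accountId="-", vaultName=vault_name)

    # Upload a single archive; S3 Glacier returns the archive ID needed later.
    upload = glacier.upload_archive(
        accountId="-",
        vaultName=vault_name,
        archiveDescription=description,
        body=payload,
    )
    archive_id = upload["archiveId"]

    # Retrieval is a job: the call returns at once and the data is ready later.
    job = glacier.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        },
    )
    return {"archive_id": archive_id, "job_id": job["jobId"]}
```

Once the job has completed, the data would be fetched with the client's `get_job_output` call using the returned job ID. Passing the client in as a parameter keeps the function easy to exercise with a stub before running it against a real account.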

Notifications for vault restore operations via an SNS topic

Notifications for restore operations in Amazon S3 Glacier, particularly when integrated with Amazon Simple Notification Service (SNS), are crucial for efficiently managing data retrieval processes. This integration plays a significant role in asynchronous operations like data retrieval from S3 Glacier. Here's an overview of the use case:

  1. Asynchronous Nature of Glacier Retrievals: Retrievals from Amazon S3 Glacier are not immediate. Depending on the retrieval option chosen (e.g., expedited, standard, or bulk), it can take from a few minutes to several hours for the data to be ready for download.

  2. Setting Up Notifications with SNS: To stay informed about the status of these retrieval requests, you can configure SNS notifications. Amazon SNS is a fully managed messaging service that enables you to send messages or notifications. By setting up an SNS topic and linking it to your Glacier vault, you can receive automated messages about the status of your data retrieval jobs.

  3. Use Case Scenario:

    • Initiating a Restore Operation: When you initiate a data retrieval request (a job) in S3 Glacier, it processes the request asynchronously.

    • Configuration of Notifications: You configure your Glacier vault to send notifications to an SNS topic when specific events, like the completion of a retrieval job, occur.

    • Receiving Notifications: Once the retrieval job is complete, S3 Glacier sends a message to the specified SNS topic.

    • Subscribing to the SNS Topic: You, or any intended recipient, can subscribe to this SNS topic. Subscriptions can be in various forms, such as an email, SMS, or even triggering a Lambda function.

    • Action Upon Notification: Upon receiving the notification, you can take appropriate actions, such as downloading the retrieved data. This helps in planning the next steps without the need to continuously check the status of the job manually.

  4. Benefits:

    • Efficiency and Time Management: You don't need to periodically check the job status manually, saving time and effort.

    • Automation of Downstream Processes: The notification can trigger other automated processes, like data processing or backup workflows.

    • Improved Responsiveness: Immediate notifications allow for quicker responses once the data is available.

Using SNS notifications for restore operations in Amazon S3 Glacier enhances the efficiency of data retrieval processes. It provides timely updates on the job status, enabling better resource management and automation of workflows in data-driven applications and scenarios.
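
The notification flow above can be sketched in two parts: attaching an SNS topic to a vault, and deciding from an incoming message whether the data is ready. This assumes a boto3 Glacier client is passed in as `glacier`; the function names are hypothetical, and the sample message in the usage lines is a trimmed, illustrative payload rather than a captured one.

```python
import json
from typing import Any


def enable_retrieval_notifications(glacier: Any, vault_name: str, topic_arn: str) -> None:
    """Ask S3 Glacier to publish job-completion events for a vault to an SNS topic."""
    glacier.set_vault_notifications(
        accountId="-",
        vaultName=vault_name,
        vaultNotificationConfig={
            "SNSTopic": topic_arn,
            "Events": ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"],
        },
    )


def job_ready_for_download(sns_message: str) -> bool:
    """Return True when a job notification reports a completed, successful job."""
    job = json.loads(sns_message)
    return bool(job.get("Completed")) and job.get("StatusCode") == "Succeeded"


# Illustrative message body (assumed shape, trimmed to the fields used here).
sample = json.dumps({
    "Action": "ArchiveRetrieval",
    "JobId": "example-job-id",
    "Completed": True,
    "StatusCode": "Succeeded",
})
print(job_ready_for_download(sample))
```

A subscriber such as a Lambda function could run a check like `job_ready_for_download` on each message and start the download only when it returns True.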

Tutorial: Get started with Amazon S3 Glacier via Management Console

Here's a basic tutorial to get started with Amazon S3 Glacier via Management Console:

Step 1: Sign in to AWS Management Console

  • Create or Access Your AWS Account: Go to aws.amazon.com and sign in to your account.

Step 2: Access Amazon S3

  • In the AWS Management Console, search for and select the S3 service.

  • Remember to select the appropriate AWS Region from the top right corner where you want to store your data.

Step 3: Create an S3 Bucket

  • Click on “Create bucket”.

  • Provide a unique name for your bucket and select the region.

  • Configure options and permissions as needed.

  • Click “Create bucket” at the bottom.

Step 4: Enable Glacier Storage Class for Objects

  • Once your bucket is created, click on its name to open it.

  • To upload an object (file) with the Glacier storage class, click “Upload”.

  • Add files and then click on “Properties”.

  • Under “Storage class,” select the desired Glacier storage class (Glacier Instant Retrieval, Glacier Flexible Retrieval, or Glacier Deep Archive).

  • Click “Upload” to finish.
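
The same upload can be scripted. This is a minimal sketch that assumes a boto3 S3 client (for example `boto3.client("s3")`) is passed in as `s3`; the function name is hypothetical, and the bucket and key are whatever you created in the previous steps.

```python
from typing import Any

# Storage class identifiers the S3 API uses for the three Glacier classes.
GLACIER_CLASSES = {"GLACIER_IR", "GLACIER", "DEEP_ARCHIVE"}


def upload_to_glacier_class(s3: Any, bucket: str, key: str, body: bytes,
                            storage_class: str = "GLACIER") -> Any:
    """Upload an object directly into one of the S3 Glacier storage classes."""
    if storage_class not in GLACIER_CLASSES:
        raise ValueError(f"not a Glacier storage class: {storage_class}")
    return s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        StorageClass=storage_class,
    )
```

Here `GLACIER` corresponds to S3 Glacier Flexible Retrieval, `GLACIER_IR` to S3 Glacier Instant Retrieval, and `DEEP_ARCHIVE` to S3 Glacier Deep Archive.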

Step 5: Accessing Glacier Data

  • Objects in S3 Glacier Instant Retrieval can be downloaded directly. Objects in S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive are not immediately accessible, so you need to restore them first.

  • Navigate to the object stored in Glacier, click on it, and select “Initiate restore”.

  • Specify the number of days for which you want the data to be accessible and choose the retrieval speed (if options are available).

  • Click “Initiate restore” to initiate the job.
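
As a scripted counterpart to the console steps, the sketch below builds the restore request and submits it. It assumes a boto3 S3 client is passed in as `s3`; the function names are hypothetical.

```python
from typing import Any

RETRIEVAL_TIERS = {"Expedited", "Standard", "Bulk"}


def build_restore_request(days: int, tier: str = "Standard") -> dict:
    """Build the RestoreRequest structure: how long to keep the copy, and how fast to fetch it."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def initiate_restore(s3: Any, bucket: str, key: str, days: int, tier: str = "Standard") -> Any:
    """Start an asynchronous restore of an archived object."""
    # Note: S3 Glacier Deep Archive accepts only the Standard and Bulk tiers.
    return s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=build_restore_request(days, tier),
    )


print(build_restore_request(days=7, tier="Bulk"))
```

Separating the request builder from the API call makes the parameters easy to inspect before any restore charges are incurred.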

Step 6: Monitor Restore Job

  • The restoration process can take from a few minutes to as long as 48 hours, depending on the Glacier storage class and the retrieval option you chose.

  • You can check the status by viewing the properties of the object. Once the data is restored, it will be temporarily accessible for the specified number of days.
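
The status check can also be done programmatically. When a restore has been requested, the object's metadata carries a restore field whose value contains `ongoing-request="true"` while the job is running and `ongoing-request="false"` plus an expiry date once the copy is available. The sketch below interprets that string; the function name and the returned labels are hypothetical, and the sample value is illustrative.

```python
import re
from typing import Optional


def restore_status(restore_header: Optional[str]) -> str:
    """Interpret the restore field of an archived object's metadata."""
    if restore_header is None:
        # No restore has been requested for this object.
        return "not-requested"
    match = re.search(r'ongoing-request="(true|false)"', restore_header)
    if match is None:
        return "unknown"
    return "in-progress" if match.group(1) == "true" else "restored"


print(restore_status('ongoing-request="true"'))
print(restore_status('ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'))
```

With a boto3 S3 client, the input would typically come from `s3.head_object(Bucket=..., Key=...).get("Restore")`.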

Step 7: Download Restored Data

  • After the object is restored, you can download it directly from the bucket.

Step 8: Clean Up

  • To avoid additional charges, make sure to delete any objects you no longer need.

  • If you created a bucket specifically for this tutorial, you can delete the bucket as well.

Additional Tips

  • Lifecycle Policies: You can automate the transition of objects to Glacier storage classes using S3 Lifecycle rules.

  • Cost Management: Regularly monitor your AWS usage and costs to avoid unexpected charges, especially if you perform multiple restores or transitions.
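
A lifecycle rule of the kind mentioned above might look like the following sketch. It assumes a boto3 S3 client is passed in as `s3`; the function names, rule ID, prefix, and day counts are hypothetical examples.

```python
from typing import Any


def build_glacier_lifecycle(prefix: str, flexible_after_days: int,
                            deep_archive_after_days: int) -> dict:
    """Build a lifecycle configuration that moves objects to colder classes as they age."""
    if deep_archive_after_days <= flexible_after_days:
        raise ValueError("Deep Archive transition must come after the Flexible Retrieval one")
    return {
        "Rules": [
            {
                "ID": "archive-old-objects",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": flexible_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


def apply_lifecycle(s3: Any, bucket: str, configuration: dict) -> Any:
    """Apply the configuration; this replaces any lifecycle configuration already on the bucket."""
    return s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=configuration,
    )


config = build_glacier_lifecycle("logs/", flexible_after_days=90, deep_archive_after_days=365)
print(config["Rules"][0]["Transitions"])
```

Because applying a configuration overwrites the existing one, it is worth reading the current rules first and merging them if the bucket already has lifecycle rules.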

References:

  1. Amazon S3 Glacier storage classes

  2. Maximize the value of cold storage with Amazon S3 Glacier

  3. Getting Started with Amazon S3 Glacier

  4. Amazon S3 Glacier Data Model

  5. Amazon S3 Glacier pricing (Glacier API only)

  6. S3 Glacier Data Retrieval Policies

  7. Code examples for S3 Glacier using AWS SDKs

  8. AWS Pi Week 2021: Modernizing your data archive with Amazon S3 Glacier | AWS Events

  9. Vault Access Policies

  10. Vault Lock Policies