Optimizing AWS Costs: Automating EBS Snapshot Cleanup with Lambda

Introduction

Managing cloud costs effectively is paramount for any organization leveraging AWS (Amazon Web Services). One proven strategy to optimize costs involves identifying and removing unused resources. This article will focus on saving storage costs by efficiently detecting and deleting stale EBS (Elastic Block Store) snapshots.

Understanding EBS Snapshots

EBS snapshots are point-in-time backups of your EBS volumes, which are crucial for data integrity and disaster recovery. However, snapshots are not deleted along with their source, so they can become orphaned over time when the associated instance is terminated or the volume is deleted. Orphaned snapshots continue to occupy storage space and are billed every month until they are removed.
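To get a feel for how much snapshot storage has accumulated, a rough inventory can help. The sketch below is only an illustration: the function name summarize_snapshots and the sample data are hypothetical, and the input is assumed to be shaped like the "Snapshots" entries returned by the EC2 DescribeSnapshots API. Because snapshots are incremental, the VolumeSize totals are an upper bound on billed storage, not the billed amount.

```python
from collections import defaultdict


def summarize_snapshots(snapshots):
    """Group snapshot descriptions by source volume.

    `snapshots` is a list of dicts shaped like the "Snapshots" entries in a
    DescribeSnapshots response. VolumeSize is the size of the source volume
    in GiB, so the totals overestimate what is actually billed.
    """
    summary = defaultdict(lambda: {"count": 0, "volume_gib": 0})
    for snap in snapshots:
        key = snap.get("VolumeId") or "unknown"
        summary[key]["count"] += 1
        summary[key]["volume_gib"] += snap["VolumeSize"]
    return dict(summary)


# Hypothetical sample data for illustration only.
sample = [
    {"SnapshotId": "snap-0aaa", "VolumeId": "vol-0111", "VolumeSize": 8},
    {"SnapshotId": "snap-0bbb", "VolumeId": "vol-0111", "VolumeSize": 8},
    {"SnapshotId": "snap-0ccc", "VolumeId": "vol-0222", "VolumeSize": 100},
]
print(summarize_snapshots(sample))
```

In a real account the list would come from Boto3's describe_snapshots call with OwnerIds=['self'] rather than from hand-written sample data.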

Automating Cleanup with Lambda

To address this challenge, we will demonstrate how to automate the identification and deletion of stale EBS snapshots using AWS Lambda. Lambda allows you to run code without provisioning or managing servers, making it ideal for cost-effective and scalable automation tasks.

Prerequisites

Before diving into the implementation, ensure the following prerequisites are in place:

  • AWS Account: You need permissions to manage Lambda functions, EBS snapshots, and EC2 instances.

  • AWS CLI: Installed and configured on your local machine for AWS service interactions.

  • Basic IAM Understanding: Ability to create roles and custom policies.

  • Basic AWS Lambda Knowledge: Understanding of how to create and deploy Lambda functions.

  • Python Knowledge: Familiarity with Python programming, as our Lambda function will be implemented in Python.

  • Boto3 Library: Basic knowledge of Boto3, the AWS SDK for Python, which will be used for AWS API interactions.

Step 1: Create an EC2 Instance

You have two options for provisioning an EC2 Instance: you can either create it manually through the AWS Management Console or utilize Terraform for automation. The choice is yours based on your preference and operational needs.

If you prefer to proceed with manual creation, you can follow the detailed guide provided in this article: Unlock the Cloud: A Step-by-Step Guide to Launching Your First EC2 Instance on AWS.

Alternatively, if you opt for automation using Terraform, you can refer to the comprehensive instructions outlined in this article: Creating EC2 Instance Using Terraform on AWS.

Once you have completed either method, we can proceed with the next steps.

Step 2: Verify the Volume

To access details about the root volume of your EC2 instance:

  1. Navigate to the EC2 Console.

  2. Find your instance and click on its ID or name.

  3. Open the Storage tab (in the older console layout, scroll down to the Description tab).

  4. Click the root volume link to view size, state, and type details.

  5. Note that this volume was automatically created during instance setup and serves as the root volume for your EC2 instance.
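The same lookup can be done programmatically. The following is a minimal sketch, assuming an instance description shaped like one entry of Reservations[].Instances[] in a DescribeInstances response; the helper name root_volume_id and the sample IDs are hypothetical.

```python
def root_volume_id(instance):
    """Return the EBS volume ID backing the instance's root device, or None.

    `instance` is one entry from Reservations[].Instances[] in an EC2
    DescribeInstances response.
    """
    root_device = instance.get("RootDeviceName")
    for mapping in instance.get("BlockDeviceMappings", []):
        if mapping.get("DeviceName") == root_device and "Ebs" in mapping:
            return mapping["Ebs"]["VolumeId"]
    return None


# Hypothetical instance description for illustration only.
sample_instance = {
    "InstanceId": "i-0123456789abcdef0",
    "RootDeviceName": "/dev/xvda",
    "BlockDeviceMappings": [
        {"DeviceName": "/dev/xvda",
         "Ebs": {"VolumeId": "vol-0aaa", "DeleteOnTermination": True}},
        {"DeviceName": "/dev/sdf",
         "Ebs": {"VolumeId": "vol-0bbb", "DeleteOnTermination": False}},
    ],
}
print(root_volume_id(sample_instance))  # prints vol-0aaa
```

The DeleteOnTermination flag in each mapping is also worth noting, since it decides whether the volume disappears when the instance is terminated.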

Step 3: Creating Snapshots of the Volume

Before creating a snapshot, confirm that none exist yet for the instance's volume:

  1. Navigate to the EC2 Dashboard by searching for EC2 in the services menu.

  2. Locate and click on the Snapshots section in the left-hand menu.

This will allow you to confirm that no snapshots are currently available for our instance.

  • To create a snapshot of the volume, click the Create Snapshot button.

  • Select the EBS volume you want to snapshot from the dropdown menu. You may also add a description with further details about the snapshot.

  • Click the Create Snapshot button to start the snapshot creation process.
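For readers who prefer scripting this step, the console actions above correspond to Boto3's create_snapshot call. This is a sketch under assumptions: the wrapper name snapshot_volume is hypothetical, and the ec2 argument is expected to be a Boto3 EC2 client.

```python
def snapshot_volume(ec2, volume_id, description=""):
    """Request a snapshot of `volume_id` and return the new snapshot's ID.

    `ec2` is expected to be a Boto3 EC2 client, e.g. boto3.client("ec2").
    The snapshot starts in the "pending" state and completes asynchronously.
    """
    response = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    return response["SnapshotId"]


# Example call (requires AWS credentials; the volume ID is a placeholder):
#   import boto3
#   snapshot_volume(boto3.client("ec2"), "vol-0aaa", "Demo snapshot")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object before pointing it at a real account.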

Step 4: AWS Lambda

AWS Lambda runs code in response to events without requiring you to manage servers. In this project, a Lambda function automates the identification and removal of stale EBS snapshots. Here are the steps to create it using the AWS Management Console:

A. Sign in to the AWS Management Console:

  • Access the AWS Management Console and authenticate using your AWS credentials.

B. Navigate to the Lambda Dashboard:

  • Locate and select Lambda from the services menu in the AWS Management Console.

  • Start Creating a Lambda Function:

    • On the Lambda Dashboard, click the Create function button.

C. Choose Authoring Method:

  • Opt for Author from scratch to initiate the creation of a new Lambda function.

  • Configure Basic Details:

    • Specify a descriptive name for your Lambda function.

    • Choose Python 3.12 as the runtime environment.

  • Create the Lambda Function:

    • Click on Create function to proceed with the creation of your Lambda function.

  • Edit Configuration:

    • Navigate to the Configuration tab of your Lambda function, open General configuration, and click Edit.

    • Adjust the function timeout setting to 10 seconds (the default is 3 seconds).

    • This setting is the maximum time Lambda lets an invocation run before terminating it. Lambda bills for the time your code actually runs, so keep the function fast and the timeout no larger than it needs to be.

    • Save your changes by clicking Save.
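The timeout can also be set from code. The sketch below assumes a Boto3 Lambda client and uses its update_function_configuration call; the wrapper name set_function_timeout and the function name in the example are hypothetical.

```python
def set_function_timeout(lambda_client, function_name, timeout_seconds=10):
    """Set a Lambda function's timeout and return the value Lambda reports back.

    `lambda_client` is expected to be boto3.client("lambda"). Lambda accepts
    timeouts from 1 to 900 seconds.
    """
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("Lambda timeouts must be between 1 and 900 seconds")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
    return response["Timeout"]


# Example call (requires AWS credentials; the function name is a placeholder):
#   import boto3
#   set_function_timeout(boto3.client("lambda"), "ebs-snapshot-cleanup", 10)
```

Accounts with many snapshots may need a larger value than 10 seconds, since the function makes one API call per snapshot.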

Step 5: Setup IAM Role

In our project, the Lambda function plays a crucial role in optimizing AWS costs by identifying and deleting stale EBS snapshots.

To accomplish this, the function requires specific permissions: the ability to describe and delete snapshots, and to describe volumes and instances.

Roles are used to delegate access to AWS resources securely, eliminating the need to share long-term credentials such as access keys.

Follow these steps to configure the necessary permissions:

  • Navigate to the Lambda function details page and click on the Configuration tab.

  • Scroll down to the Permissions section and expand it.

  • Click on the execution role link to open the IAM role configuration in a new tab.

In the newly opened tab, you will be directed to the IAM Console with details of the IAM role associated with your Lambda function:

  • Scroll down to the Permissions section of the IAM role details page.

  • Click Add permissions and choose Create inline policy (older console layouts show an Add inline policy button instead).

To configure the policy:

  • Choose EC2 as the service and filter permissions.

  • Search for Snapshot and add the DescribeSnapshots and DeleteSnapshot permissions.

  • Add the DescribeVolumes and DescribeInstances permissions as well.

Under the Resources section:

  • Select All.

  • Click the Next button.

Final steps:

  • Name the policy and click the Create Policy button.

  • Confirm that the new inline policy is listed under the role's permissions.
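The policy built in the console above is equivalent to a small JSON document. The sketch below constructs it in Python for reference; the function name build_snapshot_cleanup_policy is hypothetical. Note that the IAM action names are ec2:DeleteSnapshot (singular) and ec2:DescribeVolumes (plural). The "*" resource mirrors the All selection above.

```python
import json


def build_snapshot_cleanup_policy():
    """Return an IAM policy document granting the four EC2 actions
    the cleanup function needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeSnapshots",
                    "ec2:DescribeVolumes",
                    "ec2:DescribeInstances",
                    "ec2:DeleteSnapshot",
                ],
                "Resource": "*",
            }
        ],
    }


print(json.dumps(build_snapshot_cleanup_policy(), indent=2))

# The document could be attached as an inline policy with Boto3, for example
# (role and policy names are placeholders):
#   import boto3
#   boto3.client("iam").put_role_policy(
#       RoleName="my-lambda-role",
#       PolicyName="ebs-snapshot-cleanup",
#       PolicyDocument=json.dumps(build_snapshot_cleanup_policy()),
#   )
```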

Step 6: Writing the Lambda Function

Our Lambda function, powered by Boto3, automates the identification and deletion of stale EBS snapshots. Key features include:

  • Snapshot Retrieval: Fetching owned EBS snapshots and active EC2 instances.

  • Stale Snapshot Detection: Identifying snapshots whose source volume no longer exists or is not attached to a running instance.

  • Exception Handling: Treating a missing source volume (the InvalidVolume.NotFound error) as a sign that the snapshot is stale.

  • Cost Optimization: Efficiently managing resources to minimize storage costs.

import boto3
from botocore.exceptions import ClientError

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Collect the IDs of all running EC2 instances, following pagination
    active_instance_ids = set()
    instance_pages = ec2.get_paginator('describe_instances').paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    for page in instance_pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                active_instance_ids.add(instance['InstanceId'])

    # Iterate through every snapshot owned by this account and delete it if its
    # source volume is gone or is not attached to a running instance
    snapshot_pages = ec2.get_paginator('describe_snapshots').paginate(OwnerIds=['self'])
    for page in snapshot_pages:
        for snapshot in page['Snapshots']:
            snapshot_id = snapshot['SnapshotId']
            volume_id = snapshot.get('VolumeId')

            if not volume_id:
                # No source volume is recorded for this snapshot
                ec2.delete_snapshot(SnapshotId=snapshot_id)
                print(f"Deleted EBS snapshot {snapshot_id} as it was not attached to any volume.")
                continue

            try:
                # Check whether the source volume still exists
                volume_response = ec2.describe_volumes(VolumeIds=[volume_id])
            except ClientError as e:
                if e.response['Error']['Code'] != 'InvalidVolume.NotFound':
                    raise  # Surface unexpected errors instead of silently ignoring them
                # The source volume no longer exists (it was probably deleted)
                ec2.delete_snapshot(SnapshotId=snapshot_id)
                print(f"Deleted EBS snapshot {snapshot_id} as its associated volume was not found.")
                continue

            # Keep the snapshot only if its volume is attached to a running instance
            attachments = volume_response['Volumes'][0]['Attachments']
            if not any(a['InstanceId'] in active_instance_ids for a in attachments):
                ec2.delete_snapshot(SnapshotId=snapshot_id)
                print(f"Deleted EBS snapshot {snapshot_id} as it was taken from a volume not attached to any running instance.")

This script is the core of our cost optimization approach: each invocation scans every snapshot the account owns and deletes the ones it classifies as stale, with no servers to manage.
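Since snapshot deletion cannot be undone, it can be useful to review what would be removed before running the handler for real. The sketch below separates the decision from the action. It follows the rule stated in the handler's comments (no source volume, a missing volume, or a volume not attached to a running instance), and the function name stale_reason and all sample IDs are hypothetical.

```python
def stale_reason(snapshot, volumes_by_id, active_instance_ids):
    """Return why a snapshot looks stale, or None if it should be kept.

    snapshot: one entry from a DescribeSnapshots response.
    volumes_by_id: maps volume ID to its DescribeVolumes entry, for every
        volume that still exists.
    active_instance_ids: set of IDs of running instances.
    """
    volume_id = snapshot.get("VolumeId")
    if not volume_id:
        return "no source volume recorded"
    volume = volumes_by_id.get(volume_id)
    if volume is None:
        return "source volume no longer exists"
    attached_ids = {a["InstanceId"] for a in volume.get("Attachments", [])}
    if not attached_ids & active_instance_ids:
        return "source volume is not attached to a running instance"
    return None


# Hypothetical sample data for illustration only.
volumes = {
    "vol-0aaa": {"VolumeId": "vol-0aaa", "Attachments": [{"InstanceId": "i-0run"}]},
    "vol-0bbb": {"VolumeId": "vol-0bbb", "Attachments": []},
}
running = {"i-0run"}
for snap in [
    {"SnapshotId": "snap-1", "VolumeId": "vol-0aaa"},
    {"SnapshotId": "snap-2", "VolumeId": "vol-0bbb"},
    {"SnapshotId": "snap-3", "VolumeId": "vol-0gone"},
]:
    print(snap["SnapshotId"], stale_reason(snap, volumes, running))
```

Printing the reasons first and deleting only after reviewing them gives a simple dry run without changing the handler's rules.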

Step 7: Testing the Lambda Function

  • To simulate a real-world scenario, start by terminating the existing EC2 instance. When an instance is terminated, AWS deletes its root EBS volume by default, because the Delete on termination setting is enabled for root volumes unless you change it.

  • However, any EBS snapshots associated with that volume remain in storage, even though they are no longer needed.

  • These snapshots, termed ‘stale,’ incur additional storage costs without serving any purpose. Therefore, it’s crucial to regularly identify and remove such stale snapshots to optimize AWS storage costs effectively.

Once the instance is terminated, we can observe whether our Lambda function successfully identifies and removes the associated snapshot.

Follow these steps:

  1. Terminate the EC2 Instance:

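    • Navigate to the created EC2 instance and terminate it.

  2. Set Up the Lambda Function:

    • Navigate to the created Lambda function.

    • Under the Code section, open lambda_function.py and paste in the Python code from Step 6.

    • Ensure that your code includes the necessary imports (e.g., import boto3) and the lambda_handler function.

  3. Deploy and Test:

    • Click the Deploy button to save the code, then click Test and run the function with any test event (the handler ignores the event contents).

    • Check the execution log for the "Deleted EBS snapshot" messages, and confirm in the Snapshots section of the EC2 Console that the snapshot is gone.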

Each invocation of the Lambda function finds and deletes these stale snapshots, so running it after resources are torn down keeps snapshot storage costs from accumulating.
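The test can also be triggered from a script instead of the console. The following is a minimal sketch, assuming a Boto3 Lambda client and its invoke call; the wrapper name invoke_cleanup and the function name in the example are hypothetical.

```python
import json


def invoke_cleanup(lambda_client, function_name):
    """Invoke the function synchronously and return (ok, payload).

    `lambda_client` is expected to be boto3.client("lambda"). `ok` is False
    when Lambda reports a function error; `payload` is the decoded JSON body.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({}).encode("utf-8"),  # the handler ignores the event
    )
    raw = response["Payload"].read()
    payload = json.loads(raw) if raw else None
    ok = response["StatusCode"] == 200 and "FunctionError" not in response
    return ok, payload


# Example call (requires AWS credentials; the function name is a placeholder):
#   import boto3
#   ok, payload = invoke_cleanup(boto3.client("lambda"), "ebs-snapshot-cleanup")
```

Because the handler returns nothing, a successful run yields a payload of None; the "Deleted EBS snapshot" messages appear in the function's CloudWatch logs rather than in the response.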

Conclusion

In this project, we implemented a solution that automates the identification and deletion of stale EBS snapshots using AWS Lambda and Boto3. Removing snapshots whose source volumes are gone or unused stops them from accruing storage charges and keeps the account's snapshot inventory easier to manage. The same pattern of automated detection and cleanup can be applied to other unused AWS resources.