Amazon S3 (Simple Storage Service) in Simple Terms: Your Digital Storage Room in the Cloud ☁️
Table of contents
- Introduction
- S3 (Simple Storage Service) ☁️ 🗑️
- | Features of S3 ✨
- | S3 Buckets – Naming Rules
- | S3 Buckets – Sub resources
- | Objects in Amazon S3
- | Versioning in Amazon S3
- | Multi-Factor Authentication (MFA) in Amazon S3
- | Multipart Upload in Amazon S3
- | Storage Classes of Amazon S3 / Types of S3
- | Cross-Region Replication (CRR) in Amazon S3
- | Object Life Cycle Management in Amazon S3
- | Conclusion
Introduction
Imagine you have a room where you keep all your important stuff - your favourite toys, pictures, and maybe even some secret superhero drawings. Now, think of Amazon S3 as a special magical version of that room, but in the digital world.
Storage Space in the Cloud 🌐:
- Just like you have shelves and boxes in your room, S3 provides space in the cloud to store all kinds of digital things - pictures, videos, documents, you name it!
Safe and Secure 🔒:
- S3 is like having a magical lock on your room. It keeps your digital treasures safe with special codes and locks, so only you can get to them.
Easy Access Anytime 👀:
- The cool part? You can open your magical room from anywhere, anytime! Need that funny cat video at midnight? S3 makes it happen.
Digital Cleanup Helper 🧹:
- S3 can also be your digital cleanup buddy. It helps organize your digital toys, and if you want, it can even make old stuff disappear when you don't need it anymore.
Backup in Another World 🌍:
- S3 is like having a backup friend who stores copies of your digital goodies in another secret place. So, even if something happens to your magical room, your treasures are safe and sound.
S3 (Simple Storage Service) ☁️ 🗑️
S3 is object-level storage, which means you can store any object/file in any format; there is no restriction on file type.
S3 is a storage for the internet. It has a simple web services interface for simple storing and retrieving of any amount of data, anytime from anywhere on the internet.
S3 is an object-based storage
You cannot install an operating system on S3
S3 has a distributed data store architecture where objects are redundantly stored in multiple locations (a minimum of 3 Availability Zones for most storage classes)
Data is stored in buckets.
A bucket is a flat container of objects.
A bucket has no capacity limit; the maximum size of a single object is 5 TB
You can create folders in your bucket (available through the console)
You cannot create Nested Buckets
Bucket ownership is Non-transferable
S3 buckets are region-specific
You can have 100 buckets per account by default (this soft limit can be raised through a service quota increase)
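To make this concrete, here is a minimal sketch of the basic workflow with the AWS CLI; the bucket name is a hypothetical example, and it assumes your CLI credentials are already configured.
# Create a region-specific bucket (the bucket name here is a hypothetical example)
aws s3 mb s3://my-example-bucket-2024 --region us-east-1
# Upload an object (any file format) into a "folder" (prefix) inside the bucket
aws s3 cp photo.jpg s3://my-example-bucket-2024/photos/photo.jpg
# List what is stored under that prefix
aws s3 ls s3://my-example-bucket-2024/photos/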
| Features of S3 ✨
Amazon S3 (Simple Storage Service) is a highly scalable, durable, and secure object storage service provided by Amazon Web Services (AWS). Here are some key features of Amazon S3:
Scalability:
- S3 is designed to scale horizontally, allowing you to store and retrieve any amount of data seamlessly. It can handle virtually unlimited data storage.
Durability and Reliability:
- S3 is designed for 99.999999999% (11 9's) durability, meaning that your data is highly resilient to hardware failures and is redundantly stored across multiple devices and locations.
Availability:
- S3 provides high availability, ensuring that your data is accessible whenever you need it. It is built to offer a robust and reliable service with low-latency access to stored objects.
Security:
- S3 offers multiple layers of security to protect your data, including bucket policies, access control lists (ACLs), and AWS Identity and Access Management (IAM) roles. You can also use server-side encryption to encrypt your data at rest.
Versioning:
- S3 supports versioning, allowing you to keep multiple versions of an object in the same bucket. This is useful for data recovery and maintaining a history of changes.
Lifecycle Management:
- You can configure lifecycle policies to automatically transition objects between storage classes or delete them after a certain period. This helps in cost optimization and data management.
Multi-Region Replication:
- S3 provides Cross-Region Replication (CRR) for replicating objects across different AWS regions, enhancing data resilience and supporting business continuity and disaster recovery strategies.
Event Notifications:
- You can configure event notifications to trigger AWS Lambda functions or Amazon Simple Notification Service (SNS) notifications based on specific events in your S3 bucket, such as object creation or deletion.
Multipart Upload:
- Multipart Upload enables you to upload large objects in parts, which can be uploaded in parallel. This is useful for optimizing throughput, reliability, and efficiency for large files.
Data Transfer Acceleration:
- S3 Transfer Acceleration allows you to accelerate uploading and downloading of objects to and from S3 by using the Amazon CloudFront content delivery network.
Query-in-Place with Amazon S3 Select:
- Amazon S3 Select allows you to run SQL queries directly against your data stored in S3, reducing the amount of data you need to transfer and process (see the example after this list).
Requester Pays:
- With Requester Pays, the requester of the data (rather than the bucket owner) pays for the data transfer and request costs. This is useful in scenarios where you want to share data with others but avoid incurring all the costs.
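As a small illustration of the S3 Select feature mentioned above, the sketch below queries a CSV object directly in S3 and writes only the matching rows to a local file; the bucket name, key, and column names are hypothetical.
aws s3api select-object-content \
  --bucket my-example-bucket-2024 \
  --key data/users.csv \
  --expression "SELECT s.name FROM S3Object s WHERE CAST(s.age AS INT) > 25" \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
  --output-serialization '{"CSV": {}}' \
  results.csv
Only the selected rows leave S3, which is the point of query-in-place: less data transferred and less client-side processing.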
| S3 Buckets – Naming Rules
Amazon S3 follows certain naming rules for buckets and objects:
Bucket Names:
Must be between 3 and 63 characters in length.
Can contain only lowercase letters, numbers, hyphens (-), and periods (.).
Must start and end with a lowercase letter or number.
Object Names (Keys):
Can consist of any UTF-8 characters.
Can be a maximum of 1,024 bytes in length (UTF-8 encoded).
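A couple of hypothetical bucket names that follow (and break) these rules:
# Valid: lowercase letters, numbers, hyphens, and periods; 3-63 characters
aws s3 mb s3://my-app-logs-2024
aws s3 mb s3://data.backups.example
# Invalid: uppercase letters are not allowed, so this request would be rejected
# aws s3 mb s3://MyAppLogs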
| S3 Buckets – Sub resources
Lifecycle: Holds the lifecycle rules for the bucket's objects (transitions between storage classes and expiration/deletion).
Website: To hold configurations related to a static website hosted in S3 buckets.
Versioning: Keeps every version of an object as it is updated; versioning can be enabled or suspended, but once enabled it cannot be disabled.
Access Control List (ACL) / Bucket Policy: Controls who can access the bucket and its objects.
A bucket's address is simply two parts: the regional S3 endpoint followed by the bucket name. Sub-resources are addressed by appending a query parameter to the object or bucket URL:
acl (Access Control List): Used to retrieve or update the access control list of a bucket or object. Example:
https://s3.amazonaws.com/bucket/object?acl
versionId: Used to specify a particular version of an object in versioned buckets. Example:
https://s3.amazonaws.com/bucket/object?versionId=xyz
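The same sub-resources can also be read through the AWS CLI; a brief sketch with placeholder bucket, key, and version values:
# Read the acl sub-resource of a bucket
aws s3api get-bucket-acl --bucket <bucket-name>
# Download a specific version of an object (the versionId sub-resource)
aws s3api get-object --bucket <bucket-name> --key <object-key> --version-id <version-id> output-file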
| Objects in Amazon S3
An object stored in an S3 bucket can range in size from 0 bytes to 5 TB.
Each object is stored and retrieved by a unique key (ID or name)
An object in AWS S3 is uniquely identified and addressed through
Service endpoint
Bucket name
Object key (name)
Optionally object Version
Objects stored in an S3 bucket in a region never leave that region unless you explicitly move them to another region or configure Cross-Region Replication (CRR)
A bucket owner can grant cross-account permission to another AWS account (or a user in another account) to upload objects (see the sketch after this list)
You can grant S3 bucket/object permission to:
Individual Users
AWS Account
Make the resource public or require authenticated access
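As a hedged sketch of the cross-account case mentioned above, the bucket policy below lets a hypothetical account 111122223333 upload objects into the bucket; adjust the actions and principal to your own requirements.
aws s3api put-bucket-policy --bucket <bucket-name> --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    }
  ]
}'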
| Versioning in Amazon S3
Bucket versioning is an S3 Bucket sub resource used to protect against accidental object /data deletion or overwrites.
Versioning can also be used for data retention and archiving.
Once you enable versioning on a bucket, it cannot be disabled; it can only be suspended (see the example at the end of this section).
When enabled, bucket versioning protects existing and new objects and maintains their versions as they are updated.
Updating object refers to PUT, POST, COPY, and DELETE actions on the object.
When versioning is enabled and you try to delete an object, a delete marker is placed on the object.
You can still view the object and the delete marker.
If you reconsider deleting the objects, you can delete the “delete marker” and the object will be available again.
You are charged S3 storage costs for every stored version of every object.
You can use versioning with S3 Lifecycle policies to delete older versions, or move them to cheaper S3 storage classes (or Glacier).
Bucket Versioning state:
Enabled
Suspended
Un-versioned
Versioning applies to all objects in a bucket and is not partially applied
Objects that existed before versioning was enabled have a version ID of "null".
If you suspend versioning on a bucket that is already versioned, existing objects and their versions remain as they are.
However, they will not be versioned further by future updates while bucket versioning is suspended.
New objects (uploaded after suspension) get the version ID "null"; if the same key (name) is used to store another object, it overwrites the existing one.
Deleting an object in a suspended-versioning bucket removes only the version with ID "null".
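A minimal sketch of managing the versioning state and inspecting versions with the AWS CLI, using the same placeholder style as the rest of this article:
# Enable versioning on a bucket
aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled
# Suspend versioning (remember: it can never be fully disabled again)
aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Suspended
# List all versions and delete markers for a given key
aws s3api list-object-versions --bucket <bucket-name> --prefix <object-key>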
| Multi-Factor Authentication (MFA) in Amazon S3
Multi-factor authentication (MFA) adds an extra layer of security to your Amazon S3 (Simple Storage Service) account by requiring an additional authentication factor beyond the standard username and password. In the context of S3, MFA is often associated with protecting the deletion of objects or the modification of versioning configurations in a bucket.
Here's how MFA works in Amazon S3:
MFA Device Setup:
- To enable MFA on an S3 bucket, you need to associate a compatible MFA device with your AWS account. This MFA device is often a physical device or a virtual MFA application on a smartphone.
Bucket Versioning:
- MFA is often used in conjunction with versioning. When versioning is enabled for a bucket, multiple versions of an object can exist in that bucket. MFA is required to perform certain operations on versioned buckets, such as permanently deleting objects or suspending versioning.
Enabling MFA Delete:
- MFA can be enforced for object deletions by enabling "MFA Delete" on a versioned bucket. With MFA Delete enabled, you need to provide a valid MFA code in addition to your AWS credentials to delete objects.
aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "<mfa-serial> <mfa-code>"
- Replace <mfa-serial> and <mfa-code> with the serial number (or ARN) of your MFA device and the current MFA code, respectively. Note that MFA Delete can only be enabled using the bucket owner's (root account) credentials.
Using MFA for Operations:
- Once MFA Delete is enabled, any attempt to delete objects or suspend versioning will require providing the MFA code along with the AWS credentials.
aws s3api delete-object --bucket <bucket-name> --key <object-key> --version-id <version-id> --mfa "<mfa-serial> <mfa-code>"
- Replace <object-key> and <version-id> with the key and version ID of the object version you want to permanently delete.
Why Use MFA in S3:
Enhanced Security:
- MFA adds an additional layer of authentication, making it more difficult for unauthorized users to perform critical actions on your S3 bucket.
Preventing Accidental Deletions:
- By requiring MFA for deletions, you reduce the risk of accidental data loss. It adds a deliberate step to the deletion process.
Control over Versioning:
- MFA provides control over versioning operations. Without a valid MFA code, users cannot permanently delete versioned objects or suspend versioning.
Compliance Requirements:
- In certain industries or for specific compliance requirements, MFA may be mandated to secure sensitive data stored in S3.
While MFA is a powerful security feature, it's important to manage MFA credentials securely and ensure that MFA is configured appropriately based on your security and compliance needs.
| Multipart Upload in Amazon S3
Multipart upload is a feature in Amazon S3 that allows you to upload large objects in parts, improving efficiency, reliability, and performance. This is particularly useful when dealing with files that are too large to be uploaded in a single operation. Multipart upload enables parallelization of uploads and provides the ability to retry uploading failed parts.
Here's how multipart upload works in Amazon S3:
Why Multipart Upload?
Efficiency:
- Multipart upload allows you to upload parts of an object concurrently, optimizing data transfer speed and efficiency.
Resilience:
- If a part fails to upload, you only need to re-upload that specific part rather than the entire object. This enhances reliability, especially when dealing with large files and potentially unstable network connections.
Pause and Resume:
- Multipart upload enables you to pause and resume uploads. This is helpful when dealing with intermittent network issues or when you need to distribute the upload process over time.
Steps to Perform Multipart Upload:
Initialize Multipart Upload:
- Start by initiating the multipart upload and receive an upload ID.
aws s3api create-multipart-upload --bucket <bucket-name> --key <object-key>
Upload Parts:
- Break your large object into smaller parts and upload them in parallel. Each part is assigned a unique part number, and every part except the last must be at least 5 MB.
aws s3api upload-part --bucket <bucket-name> --key <object-key> --upload-id <upload-id> --part-number 1 --body <part1-file>
- Repeat this step for each part.
Complete Multipart Upload:
- After uploading all parts, complete the multipart upload to assemble them into a single object.
aws s3api complete-multipart-upload --bucket <bucket-name> --key <object-key> --upload-id <upload-id> --multipart-upload '{"Parts": [{"PartNumber": 1,"ETag": "<part1-etag>"}, {"PartNumber": 2,"ETag": "<part2-etag>"}]}'
- Replace <part1-etag>, <part2-etag>, and the other placeholders with the ETags returned by the corresponding upload-part calls.
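Incomplete multipart uploads keep their already-uploaded parts in storage (and keep accruing charges) until they are completed or aborted. A short cleanup sketch, using the same placeholders as above:
# See which multipart uploads are still in progress for a bucket
aws s3api list-multipart-uploads --bucket <bucket-name>
# Abort an upload you no longer need; its uploaded parts are discarded
aws s3api abort-multipart-upload --bucket <bucket-name> --key <object-key> --upload-id <upload-id>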
Benefits of Multipart Upload:
Parallelism:
- Multipart upload allows multiple parts to be uploaded concurrently, taking advantage of available bandwidth and improving overall upload speed.
Fault Tolerance:
- If a part fails to upload, only that specific part needs to be re-uploaded, reducing the impact of failures and ensuring the successful upload of the entire object.
Optimized for Large Files:
- Multipart upload is specifically designed for large files, avoiding limitations imposed by the maximum object size in a single operation.
Use Cases:
Uploading large video files, backups, or datasets.
Optimizing uploads for users with varying network conditions.
Efficiently handling interruptions in the upload process.
In summary, multipart upload in Amazon S3 is a powerful feature for handling large objects with efficiency, resilience, and flexibility. It's a key tool when dealing with files that exceed the standard upload size limitations.
| Storage Classes of Amazon S3 / Types of S3
Amazon S3 Standard (Very fast)
S3 Standard offers high durability, availability, and performance for frequently accessed data.
Durability of 99.999999999% (11 9's)
Designed for 99.99% availability over a given year
Supports SSL for data in transit and encryption of data at rest.
The storage cost per object is fairly high, but the charge for accessing objects is very low.
The largest object that can be uploaded in a single PUT is 5 GB.
Use Cases: Use Standard for a wide range of use cases, including big data analytics, mobile and gaming applications, content distribution, backup, and archival.
aws s3 cp <local-file> s3://<bucket-name>/<object-key>
Amazon S3 Standard-Infrequent Access (costs less to store, but you pay more each time you access the data)
S3 Standard-IA is for data that is accessed less frequently but requires rapid access when needed.
The storage cost is much cheaper than S3 Standard (roughly half the price), but you are charged more for accessing your objects.
Durability is 99.999999999%
Resilient against events that impact an entire AZ
Designed for 99.9% availability over a given year
Supports SSL for data in transit and encryption of data at rest.
Data deleted from S3 Standard-IA within 30 days is still charged for the full 30 days.
Backed with the Amazon S3 service level agreement for availability.
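Following the same pattern as the other storage classes, an upload that goes straight to Standard-IA looks like this (same placeholders as the other examples):
aws s3 cp <local-file> s3://<bucket-name>/<object-key> --storage-class STANDARD_IA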
Amazon S3 – Intelligent Tiering
The S3 Intelligent tiering storage class is designed to optimize cost by automatically moving data to the most cost-effective access tier
It works by storing objects in two access tiers.
If an object in the infrequent access tier is accessed, it is automatically moved back to the frequent access tier.
There are no retrieval fees when using the S3- Intelligent tiering storage class and no additional tiering fee when objects are moved between access tiers.
Same low latency and high performance of S3 standard
Objects smaller than 128 KB are never moved to the infrequent access tier (they are always billed at the frequent access rate)
Durability is 99.999999999%
Designed for 99.9% availability
How the tiering works: a new object starts in the frequent access tier; after 30 consecutive days without access it moves to the infrequent access tier; as soon as it is accessed again, it moves back to the frequent access tier.
Use Cases: Use Intelligent-Tiering for data with changing access patterns, where you want to optimize costs without managing the data lifecycle manually.
aws s3 cp <local-file> s3://<bucket-name>/<object-key> --storage-class INTELLIGENT_TIERING
Amazon S3 One Zone-IA (stores a single copy of your data in one AZ, at roughly 20% less cost than Standard-IA)
S3 One Zone-IA is for data that is accessed less frequently but requires rapid access when needed.
Data is stored in a single AZ.
Ideal for anyone who wants a lower-cost option for infrequently accessed data.
It is a good choice for storing secondary backup copies of on-premises data or easily re-creatable data.
You can use S3 lifecycle policies
Durability is 99.999999999%
Availability is 99.5%
Because S3 One Zone-IA stores data in a single AZ, data in this storage class will be lost if that AZ is destroyed.
Use Cases: Use One Zone-IA for data that can be recreated or is non-critical and can be easily reproduced in case of loss.
aws s3 cp <local-file> s3://<bucket-name>/<object-key> --storage-class ONEZONE_IA
Amazon S3 Glacier (cheap, long term)
S3 Glacier is a secure, durable, low-cost storage class for data archiving.
To keep costs low while covering varying needs, S3 Glacier provides three retrieval options that range from a few minutes to several hours.
You can upload objects directly to Glacier or move them there with lifecycle policies.
Durability is 99.999999999%
Data is resilient in the event of one entire AZ destruction.
Supports SSL for data in transit and encryption of data at rest.
You can retrieve 10 GB of your Amazon S3 Glacier data per month for free with the free tier.
Use Cases: Use Glacier for data archiving where access times are less critical, and costs need to be minimized.
aws s3 cp <local-file> s3://<bucket-name>/<object-key> --storage-class GLACIER
Amazon S3 Glacier Deep Archive (cheapest, very long term)
S3 Glacier Deep Archive is Amazon S3's lowest-cost storage class.
Designed to retain data for long periods, e.g., 10 years or more.
All objects stored in S3 Glacier Deep Archive are replicated and stored across at least three geographically dispersed AZs.
Durability is 99.999999999%
Ideal alternative to magnetic tape libraries
Retrieval time within 12 hours
Storage cost is up to 75% less than the S3 Glacier storage class
Availability is 99.99%
Use Cases: Use Glacier Deep Archive for data that is rarely accessed and where the retrieval time is not critical.
aws s3 cp <local-file> s3://<bucket-name>/<object-key> --storage-class DEEP_ARCHIVE
It's essential to carefully consider your specific use case, access patterns, and cost requirements when choosing the appropriate S3 storage class for your data.
| Cross-Region Replication (CRR) in Amazon S3
Cross-Region Replication (CRR) in Amazon S3 is a feature that enables automatic and asynchronous replication of objects across different AWS regions. This means that when you upload an object to a source bucket in one region, a copy of that object is automatically replicated to a destination bucket in another region. This provides data redundancy, fault tolerance, and compliance with regulatory requirements that might mandate data to be stored in specific geographic locations.
Here's a realistic example to illustrate Cross-Region Replication:
Scenario: Imagine you have a business with critical data stored in an Amazon S3 bucket located in the US East (N. Virginia) region (us-east-1). Due to business continuity and disaster recovery requirements, you want to ensure that a copy of this data is stored in another region, say the US West (Oregon) region (us-west-2).
Steps to Set Up Cross-Region Replication:
Create a Destination Bucket:
- In the US West (Oregon) region, create a new S3 bucket. This will be your destination bucket.
aws s3api create-bucket --bucket <destination-bucket-name> --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2
Enable Versioning on Both Buckets:
- Ensure that versioning is enabled on both the source and destination buckets. Cross-Region Replication works with versioning to replicate changes to objects.
aws s3api put-bucket-versioning --bucket <source-bucket-name> --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket <destination-bucket-name> --versioning-configuration Status=Enabled
Set Up Cross-Region Replication:
- Configure Cross-Region Replication rules to replicate objects from the source bucket to the destination bucket.
aws s3api put-bucket-replication --bucket <source-bucket-name> --replication-configuration '{
"Role": "arn:aws:iam::account-id:role/replication-role",
"Rules": [
{
"Prefix": "",
"Status": "Enabled",
"Destination": {
"Bucket": "arn:aws:s3:::<destination-bucket-name>"
}
}
]
}'
Note: Replace account-id with your AWS account ID. The replication-role IAM role must already exist and allow Amazon S3 to read from the source bucket and replicate objects into the destination bucket.
Example: Let's say you upload a file example.txt to your source bucket in the US East (N. Virginia) region:
aws s3 cp example.txt s3://<source-bucket-name>/
With Cross-Region Replication set up, the example.txt file will be automatically replicated to the destination bucket in the US West (Oregon) region.
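Replication is asynchronous, so the copy may take a moment to appear. A quick way to check, assuming the same bucket names as above:
# The source object's ReplicationStatus starts as PENDING and changes once the copy finishes
aws s3api head-object --bucket <source-bucket-name> --key example.txt
# The copy in the destination bucket reports a ReplicationStatus of REPLICA
aws s3api head-object --bucket <destination-bucket-name> --key example.txt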
Use Cases:
Business Continuity and Disaster Recovery: In the event of a regional outage or disaster, having a copy of your data in another region ensures business continuity and minimizes downtime.
Compliance and Data Residency: Some industries and regulations require data to be stored in specific geographic locations. Cross-Region Replication helps in meeting these compliance requirements.
Reduced Latency: If you have users or applications located closer to a specific region, you can use Cross-Region Replication to store a copy of data in that region, reducing data access latency.
It's important to consider the costs associated with data transfer and storage in both regions when implementing Cross-Region Replication. Additionally, monitoring and logging should be configured to track replication events and ensure the integrity of replicated data.
| Object Life Cycle Management in Amazon S3
Amazon S3 provides a feature called Object Lifecycle Management, which allows you to automatically manage the lifecycle of your objects by defining rules that transition objects between storage classes or expire them. This feature is useful for optimizing costs and ensuring that data is stored in the most cost-effective manner based on its lifecycle.
Here's an explanation of S3 Object Lifecycle Management along with an example:
Object Lifecycle Management Concepts:
Transition Actions:
- Transition actions define the conditions under which objects transition from one storage class to another. For example, you can transition objects from the Standard storage class to the Infrequent Access (IA) storage class after a certain number of days.
Expiration Actions:
- Expiration actions define the conditions under which objects expire and are automatically deleted. This is useful for managing the retention of objects, such as log files or temporary files, to avoid unnecessary storage costs.
Example Scenario:
Let's consider a scenario where you have a bucket for storing log files, and you want to automatically manage the lifecycle of these log files based on certain conditions:
Create a Bucket:
- Create an S3 bucket for storing log files.
aws s3api create-bucket --bucket <your-bucket-name> --region <your-region>
Enable Versioning:
- Enable versioning on the bucket. Lifecycle policies work with versioning to manage different versions of objects.
aws s3api put-bucket-versioning --bucket <your-bucket-name> --versioning-configuration Status=Enabled
Create a Lifecycle Configuration:
- Create a JSON file (lifecycle-config.json) with the lifecycle configuration rules. For example:
{
"Rules": [
{
"ID": "MoveToIAAfter30Days",
"Filter": {
"Prefix": "logs/"
},
"Status": "Enabled",
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
}
]
},
{
"ID": "ExpireAfter365Days",
"Filter": {
"Prefix": "logs/"
},
"Status": "Enabled",
"Expiration": {
"Days": 365
}
}
]
}
This configuration specifies two rules:
Move objects with the prefix "logs/" to the Infrequent Access storage class after 30 days.
Expire (delete) objects with the prefix "logs/" after 365 days.
Apply the Lifecycle Configuration:
- Apply the lifecycle configuration to your bucket.
aws s3api put-bucket-lifecycle-configuration --bucket <your-bucket-name> --lifecycle-configuration file://lifecycle-config.json
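To confirm the rules took effect, you can read the configuration back; a quick check using the same placeholder bucket name:
aws s3api get-bucket-lifecycle-configuration --bucket <your-bucket-name>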
Result:
Objects with the prefix "logs/" in your bucket will automatically transition to the Infrequent Access storage class after 30 days.
Objects with the prefix "logs/" will be automatically deleted (expired) after 365 days.
Use Cases:
Cost Optimization: Transitioning objects to lower-cost storage classes as they age helps optimize storage costs.
Data Retention: Automatically deleting or archiving objects after a certain period helps manage data retention policies.
Compliance: Ensure that data is retained or deleted in compliance with regulatory requirements.
It's important to carefully plan and test lifecycle configurations to ensure they align with your specific use case and business requirements. Additionally, consider monitoring and logging to track lifecycle events and ensure that the configured policies are working as expected.
| Conclusion
Amazon S3 isn't just a storage service; it's a comprehensive ecosystem designed to meet the diverse needs of modern businesses. Scalability, security, and a plethora of features make it the go-to solution for organizations of all sizes. So, unleash the power of Amazon S3 and elevate your data storage experience to new heights! 🚀🌐