Amazon Simple Storage Service (S3)

Amazon S3 provides a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of websites. The service not only aims to maximize the benefits of scale and pass those benefits on to developers; it is also a secure, durable, scalable, high-speed, web-based cloud storage service designed for online backup and archiving of data and applications on Amazon Web Services.

S3 Benefits

High performance:– Amazon S3 supports multipart uploads to help maximize network throughput and resilience and lets users choose the AWS region to store their data close to the end user and minimize network latency. Amazon S3 is integrated with Amazon CloudFront, a content delivery web service that distributes content to end users with 

  • Low latency, 
  • High data transfer speeds, and 
  • No minimum usage commitments.

Scalable:– Users can scale up or scale down anytime as per their business requirements. S3 is built to be flexible so that protocol or functional layers can easily be added.

  • The default download protocol is HTTP, and the S3 API also supports HTTPS.
  • AWS CLI and SDK use secure HTTPS connections by default.

Low cost:– Amazon S3 is very cost-effective and allows users to store a large amount of data at a low cost. There is no minimum cost associated with S3, and users pay only for what they need.

  • There are no up-front costs associated with S3. 
  • With the volume discount, the more data users store, the cheaper it becomes.

Easy to manage:– The Amazon S3 storage management features allow users to take a data-driven approach to storage optimization, data security, and management efficiency. As a result:

  • S3 provides a simple and robust abstraction for file storage that frees customers from many underlying details.
  • Unlike server-attached storage, which manages data as blocks or files using SCSI, CIFS, or NFS protocols, Amazon S3 is independent of a server and manages data as objects using an Application Program Interface (API) built on standard HTTP verbs.

Using Amazon S3, customers can write, read, and delete objects containing from 1 byte to 5 terabytes of data each.

  • The number of objects they can store is unlimited
  • Amazon S3 objects are automatically replicated on multiple devices in multiple facilities within a region.

Simple:– S3 has an intuitive graphical web-based console in which the data can be uploaded, downloaded, and managed. 

  • S3 can also be managed through the AWS Console mobile app.
  • For easy integration with third parties, S3 provides REST APIs and SDKs (see the sketch below).
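
To make the SDK integration concrete, here is a minimal sketch using boto3, the AWS SDK for Python; the bucket and key names are placeholders, not values from this document.

```python
# Minimal boto3 sketch: store an object in S3 and read it back.
import boto3

s3 = boto3.client("s3")  # the SDK uses HTTPS endpoints by default

# Upload a small object to a placeholder bucket.
s3.put_object(Bucket="example-bucket", Key="hello.txt", Body=b"Hello, S3!")

# Retrieve the same object and read its payload.
response = s3.get_object(Bucket="example-bucket", Key="hello.txt")
print(response["Body"].read())  # b'Hello, S3!'
```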

Durable:– The underlying infrastructure of S3 is designed to deliver 99.999999999% (11 nines) durability. It provides comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements, and it gives customers flexibility in the way they manage data for cost optimization, access control, and compliance. Data is stored redundantly across multiple devices in multiple data centers.

Secure:– Amazon S3 supports encryption, and data can be automatically encrypted once it is uploaded. It provides functionality to simplify manageability of data through its lifetime, including options for:

  • Segregating data by buckets
  • Monitoring and controlling spend 
  • Automatically archiving data to even lower cost storage options.
  • Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. 
  • Objects can be made private or public, and rights can be granted to specific users

Easy integration:– Amazon S3 can be easily integrated with third-party tools. As a result, it is easy to build an application on top of S3.

  • It can be used alone or in conjunction with other AWS services.
  • It offers a very high level of integration with many other AWS cloud services.

An Amazon S3 object contains both data and metadata. Objects reside in containers called buckets, and each object is identified by a unique user-specified key (filename). A bucket is a simple flat folder with no file system hierarchy (more on this later).

  • Amazon S3 is the only cloud storage solution with query-in-place functionality, allowing users to run powerful analytics directly on their data at rest in S3.

S3 Features

Amazon S3 offers various features that customers can use to organize and manage their data in ways that support specific use cases, enable cost efficiencies, enforce security, and meet compliance requirements. Data is stored as objects within resources called “buckets”, and a single object can be up to 5 terabytes in size. S3 features include capabilities to append metadata tags to objects, move and store data across the S3 Storage Classes, configure and enforce data access controls, secure data against unauthorized users, run big data analytics, and monitor data at the object and bucket levels. Objects can be accessed through S3 Access Points or directly through the bucket hostname.

Bucket Policies

Bucket policies provide centralized access control to buckets and objects based on a variety of conditions, including Amazon S3 operations, requesters, resources, and aspects of the request (for example, IP address).

  • The policies are expressed in the access policy language and enable centralized management of permissions (a sketch follows this list).
  • The permissions attached to a bucket apply to all of the objects in that bucket.
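
As a sketch of how such a policy is attached in practice (the bucket name and CIDR range are placeholders; the statement itself is standard policy language), a bucket policy granting read access only from a given IP range might look like this:

```python
import json

import boto3

s3 = boto3.client("s3")

# Allow GET on every object in the bucket, but only from one IP range.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ReadFromOfficeRange",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
    }],
}

# The policy applies to all objects in the bucket.
s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))
```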

Security

Amazon S3 offers flexible security features to block unauthorized users from accessing customers’ data.

  • Using VPC endpoints customers can connect to S3 resources from their Amazon Virtual Private Cloud (Amazon VPC).
  • Amazon S3 supports both server-side encryption (with three key management options) and client-side encryption for data uploads.
  • Customers can use S3 Inventory to check the encryption status of their S3 objects.

Access Control Lists

Access control lists (ACLs) are one of the resource-based access policy options that AWS customers can use to manage access to their buckets and objects.

  • Customers can use ACLs to grant basic read/write permissions to other AWS accounts. There are limits to managing permissions using ACLs. 
  • Customers can grant permissions only to other AWS accounts; they can’t grant permissions to individual users in their own account.
  • Customers can’t grant conditional permissions, nor can they explicitly deny permissions. A minimal grant is sketched below.
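
A minimal ACL sketch follows; the bucket, key, and canonical user ID are placeholders. It grants another AWS account read access to a single object:

```python
import boto3

s3 = boto3.client("s3")

# Grant READ on one object to another AWS account, identified by its
# canonical user ID (placeholder value below).
s3.put_object_acl(
    Bucket="example-bucket",
    Key="report.csv",
    GrantRead='id="79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be"',
)
```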

Data Transfer

AWS has a suite of data migration services that make transferring data into the AWS Cloud simple, fast, and secure. S3 Transfer Acceleration is designed to maximize transfer speeds to S3 buckets over long distances.

  • Customers who want to keep their on-premises applications and enable a cloud storage architecture can use AWS Storage Gateway (a hybrid cloud storage service) to seamlessly connect on-premises environments to Amazon S3. 
  • AWS clients can automate transferring data between on-premises storage and AWS (including Amazon S3) by using AWS DataSync, which can transfer data at speeds up to 10 times faster than open-source tools.

Storage Classes

Amazon S3 offers a range of storage classes designed for different use cases. These include Amazon S3 STANDARD for general-purpose storage of frequently accessed data, Amazon S3 STANDARD_IA for long-lived but less frequently accessed data, and GLACIER for long-term archive. The storage classes fall into four groups:

  • Storage Classes for Frequently Accessed Objects.
  • Storage Class That Automatically Optimizes Frequently and Infrequently Accessed Objects.
  • Storage Classes for Infrequently Accessed Objects.
  • Storage Classes for Archiving Objects.

Access management 

To protect customers’ data in Amazon S3, users by default only have access to the S3 resources they create. Customers can grant access to other users by using one or a combination of the following access management features:
  • AWS Identity and Access Management (IAM) to create users and manage their respective access;
  • Access Control Lists (ACLs) to make individual objects accessible to authorized users;
  • bucket policies to configure permissions for all objects within a single S3 bucket;
  • S3 Access Points to simplify managing data access to shared data sets by creating access points with names and permissions specific to each application or sets of applications; and
  • Query String Authentication to grant time-limited access to others with temporary URLs (sketched below). Amazon S3 also supports Audit Logs that list the requests made against S3 resources, for complete visibility into who is accessing what data.
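
Query String Authentication is exposed in the SDKs as presigned URLs. A minimal sketch (placeholder bucket and key):

```python
import boto3

s3 = boto3.client("s3")

# Create a temporary URL that grants GET access to one object for an hour.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "private/report.csv"},
    ExpiresIn=3600,  # lifetime in seconds
)
print(url)  # anyone holding this URL can fetch the object until it expires
```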

Query in place

Amazon S3 has built-in features and complementary services that query data without needing to copy and load it into a separate analytics platform or data warehouse. This means that AWS customers can run big data analytics directly on their data stored in Amazon S3.
  • S3 Select is an S3 feature designed to increase query performance by up to 400%, and reduce querying costs as much as 80%.
  • It works by retrieving a subset of an object’s data (using simple SQL expressions) instead of the entire object, which can be up to 5 terabytes in size. A sketch follows this list.
  • Amazon S3 is also compatible with the AWS analytics services Amazon Athena and Amazon Redshift Spectrum.
  • Amazon Athena queries customers data in Amazon S3 without needing to extract and load it into a separate service or platform.
  • Amazon Redshift Spectrum also runs SQL queries directly against data at rest in Amazon S3, and is more appropriate for complex queries and large data sets (up to exabytes).
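
A minimal S3 Select sketch, assuming a CSV object with a header row (the bucket, key, and column names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to return only the matching rows instead of the whole object.
resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="sales.csv",
    ExpressionType="SQL",
    Expression=(
        "SELECT s.product, s.amount FROM S3Object s "
        "WHERE CAST(s.amount AS FLOAT) > 100"
    ),
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; Records events carry the bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```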

S3 Use Cases

Backup and Storage:– Backup and archive for on-premises or cloud data. Customers can build scalable, durable, and secure backup and restore solutions with Amazon S3 and other AWS services, such as S3 Glacier, Amazon EFS, and Amazon EBS, to augment or replace existing on-premises capabilities.

Archive:– Customers can retire physical infrastructure and archive data with S3 Glacier and S3 Glacier Deep Archive. These S3 Storage Classes retain objects long-term at the lowest rates. Simply create an S3 Lifecycle policy to archive objects throughout their lifecycles (as sketched below), or upload objects directly to the archival storage classes.
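
A lifecycle-policy sketch (placeholder bucket and prefix) that archives objects as they age:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under "logs/" to Glacier after 90 days and to
# Glacier Deep Archive after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }],
    },
)
```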

Disaster recovery (DR):– With Amazon S3 storage, S3 Cross-Region Replication, and other AWS compute, networking, and database services, customers can create DR architectures in order to quickly and easily recover from outages caused by natural disasters, system failures, and human errors.

Data lake and big data analytics:– A data lake is a central place for storing massive amounts of data that can be processed, analyzed, and consumed by different business units in an organization.

Hybrid cloud storage:– Customers can create a seamless connection between on-premises applications and Amazon S3 with AWS Storage Gateway in order to reduce their data center footprint and leverage the scale, reliability, and durability of AWS, as well as AWS’ innovative machine learning and analytics capabilities.

Static website hosting:– Customers can configure a static website to run from an S3 bucket; a configuration sketch follows the list below.

  • A static website is one where the content does not change and remains static. A static website may contain some client-side scripts, but its content stays essentially the same all the time.
  • A dynamic web site is one where the content changes frequently, and a lot of server-side processing happens by running scripts built on PHP, JSP, and so on.
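
A minimal website-hosting sketch (placeholder bucket; the bucket must also permit public reads, for example through a bucket policy, before the endpoint is usable):

```python
import boto3

s3 = boto3.client("s3")

# Serve index.html as the default document and error.html on errors.
s3.put_bucket_website(
    Bucket="example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```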

Application hosting:– Provide services that deploy, install, and manage web applications. Customers can build fast, cost-effective mobile and Internet-based applications by using AWS services and Amazon S3 to store production data.

 

Media hosting (content distribution):– Customers can build a redundant, scalable, and highly available infrastructure that hosts video, photo, or music uploads and downloads.

  • This includes content, media, and software storage and distribution.

Private repository:– Customers can create their own private repositories, like those used with Git, Yum, or Maven.

Software delivery:– Host software applications that customers can download.

Buckets 

A bucket is a container (web folder) for objects (files) stored in Amazon S3.

For each bucket, customers can control access to it (who can create, delete, and list objects in the bucket), view access logs for it and its objects, and choose the geographical region where Amazon S3 will store the bucket and its contents.

  • Amazon S3 stores data as objects within buckets. 
    • An object consists of a file and optionally any metadata that describes that file.
    • To store an object in Amazon S3, customers upload the file they want to store to a bucket.

Objects are stored in a flat namespace organized by bucket. Every Amazon S3 object is contained in a bucket, and a single object can be up to 5 terabytes in size. Buckets are root-level folders, and any subfolder within a bucket is known as a “folder”. A bucket serves the following purposes:

  • Organizes the Amazon S3 namespace at the highest level
  • Identifies the account responsible for charges
  • Plays a role in access control
  • Serves as the unit of aggregation for usage reporting

Buckets and objects are resources, and Amazon S3 provides APIs for customers to manage them. Customers can use either the Amazon S3 API to create a bucket and upload objects to it, or the Amazon S3 console to perform these operations. The console uses the Amazon S3 APIs to send requests to Amazon S3. When customers upload a file, they can set permissions on the object as well as any metadata. Bucket names must follow a set of rules (a creation sketch follows the list):

  • Names must be unique across all of AWS.
  • Names must be 3 to 63 characters in length.
  • Names can only contain lowercase letters, numbers and hyphens.
  • Names cannot be formatted as an IP address.
  • Bucket names should not contain underscores (_)
  • Bucket names should not end with a dash
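
A bucket-creation sketch that respects these rules (the bucket name and region are placeholders; names must be globally unique, so this exact call may fail if the name is taken):

```python
import boto3

s3 = boto3.client("s3")

# Create a bucket pinned to a specific region. CreateBucketConfiguration
# is required for every region except us-east-1.
s3.create_bucket(
    Bucket="example-unique-bucket-name-2020",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```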

Buckets form the top-level namespace for Amazon S3, and bucket names are global. This means that customers’ bucket names must be unique across all AWS accounts. It also means that after a bucket is created, the name of that bucket cannot be used by another AWS account in any AWS Region until the bucket is deleted.

Although the namespace for Amazon S3 buckets is global, each Amazon S3 bucket is created in a specific region that the customer chooses. Customers can create and use buckets located close to a particular set of end users or customers in order to minimize latency, located in a particular region to satisfy data locality and sovereignty concerns, or located to serve disaster recovery and compliance needs.

 

Objects

Object Keys:– Objects stored in an S3 bucket are identified by a unique identifier called a key. These keys need to be unique within a single bucket. However, different buckets can contain objects with the same key. Since Amazon S3 is storage for the Internet, every Amazon S3 object can be addressed by a unique URL formed using the web services endpoint, the bucket name, and the object key.

  • A key can be up to 1024 bytes of Unicode UTF-8 characters, including embedded slashes, backslashes, dots, and dashes. 
  • The Amazon S3 data model is a flat structure: Customers create a bucket, and the bucket stores objects. There is no hierarchy of sub-buckets or subfolders.

Version ID:– Within a bucket, a key and version ID uniquely identify an object. 

  • The version ID is a string that Amazon S3 generates when customers add an object to a bucket.

Value:– The content that customers are storing. An object value can be any sequence of bytes. Objects can range in size from zero to 5 TB.

Subresources:– Amazon S3 uses the subresource mechanism to store object-specific additional information.

  • Because subresources are subordinates to objects, they are always associated with some other entity such as an object or a bucket. For more information, see Object Subresources.

Access Control Information:– AWS customers can control access to the objects they store in Amazon S3.

  • Amazon S3 supports both the resource-based access control, such as an access control list (ACL) and bucket policies, and user-based access control.

Objects are entities or files stored in Amazon S3 buckets, and each object can store virtually any kind of data in any format. Each Amazon S3 object has data, a key, and metadata. The object key (the file name) uniquely identifies the object in a bucket. Object metadata (data about the file) is a set of name-value pairs. The data portion of an Amazon S3 object is opaque to Amazon S3, which simply means the object’s data is treated as a stream of bytes.

Metadata:– The metadata associated with an Amazon S3 object is a set of name/value pairs that describe the object. There are two types of metadata: 

  • System metadata:– System metadata is created and used by Amazon S3 itself, and it includes things like the date last modified, object size, MD5 digest, and HTTP Content-Type.
    • For each object stored in a bucket, Amazon S3 maintains a set of system metadata. Amazon S3 processes this system metadata as needed.
  • User metadata:– User metadata can only be specified at the time an object is created.

S3 storage classes

Amazon S3 offers a range of storage classes for the objects that customers store. They can choose a class depending on their use case scenario and performance access requirements. All of these storage classes offer high durability.

  • S3 Standard (durable, immediately available, frequently accessed).
  • S3 Standard-IA (durable, immediately available, infrequently accessed).
  • S3 One Zone-IA (lower cost for infrequently accessed data with less resilience).
  • S3 Intelligent-Tiering (automatically moves data to the most cost-effective tier).
  • S3 Glacier (archived data, retrieval times in minutes or hours).
  • S3 Glacier Deep Archive (lowest-cost storage class for long-term retention); an upload sketch follows this list.
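
The storage class is chosen per object at upload time. A sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Upload directly into Standard-IA; other valid values include STANDARD,
# ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, and DEEP_ARCHIVE.
with open("2020-01.tar.gz", "rb") as f:
    s3.put_object(
        Bucket="example-bucket",
        Key="backups/2020-01.tar.gz",
        Body=f,
        StorageClass="STANDARD_IA",
    )
```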

Infrequently Accessed Objects (IA)

IA is for older data that is accessed infrequently, but that still requires millisecond access. The Standard_IA and Onezone_IA storage classes are designed for long-lived and infrequently accessed data. 

  • Standard_IA:– Amazon S3 stores the object data redundantly across multiple geographically separated Availability Zones, so objects are resilient to the loss of an Availability Zone. This storage class offers greater availability and resilience than the Onezone_IA class.
    • Standard_IA offers the high durability, throughput, and low latency of Amazon S3 Standard, with a low per-GB storage price and per-GB retrieval fee.
    • Customers can use it for the primary or only copy of data that can’t be recreated.
  • Onezone_IA:– S3 One Zone-IA is for data that is accessed less frequently but requires rapid access when needed. Unlike other S3 Storage Classes, which store data in a minimum of three Availability Zones (AZs), S3 One Zone-IA stores data in a single AZ, which makes it less expensive: it costs 20% less than S3 Standard-IA.
    • It is a good fit for data that customers can recreate if the Availability Zone fails, and for object replicas when setting up cross-region replication (CRR).
    • S3 One Zone-IA offers the same high durability, high throughput, and low latency as S3 Standard, with a low per-GB storage price and per-GB retrieval fee.

Frequently Accessed Objects

This class is for performance-sensitive use cases, such as those that require millisecond access time, and for frequently accessed data. Amazon S3 provides two kinds of storage classes for frequently accessed objects:

  • Standard:– This is the default storage class; if the client doesn’t specify the storage class when they upload an object, Amazon S3 assigns the STANDARD storage class.
    • Delivers low latency and high throughput, perfect for a wide variety of use cases.
    • There is no retrieval fee, minimum object size, or minimum storage duration.
    • Resilient against events that impact an entire Availability Zone.
    • Supports SSL for data in transit and encryption of data at rest.
    • S3 Lifecycle management allows automatic migration of objects to other S3 Storage Classes.
  • Reduced Redundancy:- The Reduced Redundancy Storage (RRS) storage class is designed for noncritical, reproducible data that can be stored with less redundancy than the STANDARD storage class.

Frequently and Infrequently Accessed Objects

S3 Intelligent-Tiering delivers automatic cost savings by moving data on a granular object level between two access tiers: a frequent access tier and a lower-cost infrequent access tier. The storage class is designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.

  • The Intelligent-Tiering storage class is ideal for customers who want to optimize storage costs automatically for long-lived data when access patterns are unknown or unpredictable. It stores objects in two access tiers:
    • One tier:– optimized for frequent access.
    • Lower-cost tier:– optimized for infrequently accessed data.

Amazon Glacier

Amazon Glacier is a low-cost storage service designed for data that is infrequently accessed; it is mainly used for data archiving and long-term backup. Amazon Glacier retrieval jobs typically complete in 3 to 5 hours. Just like S3, Glacier is extremely secure and durable, providing the same security and durability as S3. Amazon Glacier enables customers to offload the administrative burdens of operating and scaling storage to AWS so that they don’t have to worry about capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time-consuming hardware migrations.

  • Amazon Glacier is designed for use with other Amazon Web Services offerings. Customers can seamlessly move data between Amazon Glacier and Amazon S3 using S3 data life-cycle policies.
  • Customers can use Amazon Glacier for archiving offsite enterprise information, media assets, and research and scientific data, as well as for digital preservation and magnetic tape replacement.

Amazon Glacier is designed to provide average annual durability of 99.999999999 percent (11 nines) for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores customers’ data across multiple facilities before returning SUCCESS on uploading an archive.

Amazon Glacier scales to meet growing and often unpredictable storage requirements. A single archive is limited to 40 TB in size, but there is no limit to the total amount of data that customers can store in the service. Whether customers want to store petabytes or gigabytes, Amazon Glacier automatically and seamlessly scales their storage up or down as needed.

Amazon Glacier uses server-side encryption to encrypt all data at rest. Amazon Glacier handles key management and key protection for its clients by using one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES256). Clients who want to manage their own keys can encrypt data prior to uploading it.

There are two ways that AWS clients can use Amazon Glacier:

  1. Amazon Glacier provides a native, standards-based REST web services interface. This interface can be accessed using the Java SDK or the .NET SDK. Customers can use the AWS Management Console or Amazon Glacier API actions to create vaults to organize the archives in Amazon Glacier.
  2. Amazon Glacier can be used as a storage class in Amazon S3 by using object lifecycle management that provides automatic, policy-driven archiving from Amazon S3 to Amazon Glacier. Clients can simply set one or more life-cycle rules for an Amazon S3 bucket, defining what objects should be transitioned to Amazon Glacier and when.
Amazon S3 also provides a PUT API for direct uploads to the S3 Glacier storage class, alongside S3 Lifecycle management for automatic migration of objects.

Data stored in the GLACIER storage class has a minimum storage duration period of 90 days. AWS clients can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. To keep costs low yet suitable for varying needs, Amazon Glacier provides three retrieval options that range from a few minutes to several hours.

    • Expedited retrievals typically return data in 1-5 minutes; they are great for Active Archive use cases.
      • The expedited retrieval cost is $0.03 per gigabyte.
    • Standard retrievals typically complete in 3-5 hours and work well for less time-sensitive needs like backup data, media editing, or long-term analytics.
      • The standard retrieval cost is $0.01 per gigabyte.
    • Bulk retrievals are the lowest-cost retrieval option, returning large amounts of data within 5-12 hours (a restore sketch follows this list).
      • The bulk retrieval cost is $0.0025 per gigabyte.
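
When an object lives in the GLACIER storage class, customers request a retrieval tier through a restore call. A sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to stage a Glacier-class object for download; Tier selects
# Expedited, Standard, or Bulk retrieval.
s3.restore_object(
    Bucket="example-bucket",
    Key="archive/2015-records.zip",
    RestoreRequest={
        "Days": 7,  # keep the restored copy available for 7 days
        "GlacierJobParameters": {"Tier": "Expedited"},
    },
)
```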

Amazon Glacier stores archives in vaults. A vault is like a safe deposit box or locker: customers can group multiple archives together and put them in a vault, so a vault serves as a container for archives and gives customers the ability to organize their data residing in Amazon Glacier.

  • Amazon Glacier Vault Lock allows customers to easily deploy and enforce compliance controls on individual Glacier vaults via a lockable policy. 
  • Customers can specify controls such as a Write Once Read Many (WORM) in a Vault Lock policy and lock the policy from future edits. 
    • Once locked, the policy becomes immutable, and Glacier will enforce the prescribed controls to help achieve customers’ compliance objectives.
  • Glacier maintains a cold index of archives refreshed every 24 hours, which is known as an inventory or vault inventory. 
  • Whenever customers want to retrieve an archive or the vault inventory, they submit a Glacier job, which runs behind the scenes and delivers the requested files.

 

Deep-Archive

Deep-Archive is used for archiving data that rarely needs to be accessed. Data stored in the Deep-Archive storage class has a minimum storage duration period of 180 days and a default retrieval time of 12 hours.

  • The Amazon S3 Glacier Deep-Archive storage class provides two retrieval options ranging from 12-48 hours.
  • DEEP_ARCHIVE is the lowest cost storage option in AWS.

Both the Amazon S3 Glacier and S3 Glacier Deep Archive storage classes offer sophisticated integration with AWS CloudTrail to log, monitor, and retain storage API call activities for auditing, and both support three different forms of encryption.

All of the storage classes except for Onezone_IA are designed to be resilient to simultaneous complete data loss in a single Availability Zone and partial loss in another Availability Zone.

Versioning 

Amazon S3 versioning helps protect customers’ data against accidental or malicious deletion by keeping multiple versions of each object in the bucket, identified by a unique version ID.

  • Versioning allows customers to preserve, retrieve, and restore every version of every object stored in their Amazon S3 bucket. 
  • If a client makes an accidental change or even maliciously deletes an object in their S3 bucket, they can restore the object to its original state simply by referencing the version ID in addition to the bucket and object key.
  • Versioning is turned on at the bucket level, as sketched below. Once enabled, versioning cannot be removed from a bucket; it can only be suspended.
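
A minimal sketch of enabling versioning on a placeholder bucket:

```python
import boto3

s3 = boto3.client("s3")

# Versioning is a bucket-level setting; once enabled it can only be
# suspended ("Status": "Suspended"), never removed.
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```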

Cross-origin resource sharing (CORS)

Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. 

  • Customers can build rich client-side web applications with Amazon S3 and selectively allow cross-origin access to their Amazon S3 resources (a configuration sketch follows this list).
  • Cross-region replication enables customers to asynchronously replicate all new objects in the source bucket in one AWS region to a target bucket in another region. Any metadata and ACLs associated with the object are part of the replication.
  • Cross-region replication is commonly used to reduce the latency required to access objects in Amazon S3 by placing objects closer to a set of users or to meet requirements to store backup data at a certain distance from the original source data.
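
A CORS-configuration sketch (placeholder bucket and origin) that lets a browser page served from one domain fetch objects from the bucket:

```python
import boto3

s3 = boto3.client("s3")

# Allow pages served from https://www.example.com to issue GET requests
# against this bucket from the browser.
s3.put_bucket_cors(
    Bucket="example-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }],
    },
)
```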

Amazon S3 enables customers to store, retrieve, and delete objects. They can retrieve an entire object or a portion of an object. If the customers have enabled versioning on their bucket, they can retrieve a specific version of the object.

  • Uploading objects:– Clients can upload objects of up to 5 GB in size in a single operation; larger objects (up to 5 TB) are uploaded using multipart upload, as sketched below.
  • Copying objects:– The copy operation creates a copy of an object that is already stored in Amazon S3. Customers can create and copy an object up to 5 GB in size in a single atomic operation.
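
In boto3, the managed transfer helper handles the switch to multipart upload automatically. A sketch with placeholder file and bucket names:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# upload_file transparently switches to multipart upload above the
# threshold, which is how objects beyond the 5 GB single-PUT limit
# (up to 5 TB) are stored.
config = TransferConfig(multipart_threshold=64 * 1024 * 1024)  # 64 MB
s3.upload_file(
    "large-backup.tar",
    "example-bucket",
    "backups/large-backup.tar",
    Config=config,
)
```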

Deleting Objects from a Version-Enabled Bucket

Customers with version-enabled buckets can have multiple versions of the same object in the bucket. There are two options for deleting objects in version-enabled buckets (both sketched after the list):

  • Specify a non-versioned delete request:– Customers specify only the object’s key, not a version ID. In this case, Amazon S3 creates a delete marker and returns its version ID in the response. This makes the object disappear from the bucket.
  • Specify a versioned delete request:– Customers specify both the key and a version ID. In this case, the following two outcomes are possible:
    • If the version ID maps to a specific object version, then Amazon S3 deletes the specific version of the object.
    • If the version ID maps to the delete marker of that object, Amazon S3 deletes the delete marker. This makes the object reappear in the bucket.
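
A sketch of both delete flavors against a version-enabled placeholder bucket:

```python
import boto3

s3 = boto3.client("s3")

# Non-versioned delete request: S3 inserts a delete marker and the
# object "disappears"; earlier versions remain recoverable.
resp = s3.delete_object(Bucket="example-bucket", Key="report.csv")
marker_id = resp["VersionId"]  # version ID of the new delete marker

# Versioned delete of the marker itself makes the object reappear.
s3.delete_object(
    Bucket="example-bucket",
    Key="report.csv",
    VersionId=marker_id,
)
```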

Deleting Objects from an MFA-Enabled Bucket

MFA Delete adds another layer of data protection on top of bucket versioning. MFA Delete requires additional authentication in order to permanently delete an object version or change the versioning state of a bucket. 

  • MFA Delete requires an authentication code (a temporary, one-time password) generated by a hardware or virtual Multi-Factor Authentication (MFA) device. 
  • MFA Delete can be enabled only by the root account.

S3 Batch Operations

S3 Batch Operations helps customers manage billions of objects stored in Amazon S3 with a single API request or a few clicks in the S3 Management Console.

    • AWS customers can make changes to object properties and metadata, and perform other storage management tasks such as copying objects between buckets, replacing tag sets, modifying access controls, and restoring archived objects from Amazon S3 Glacier.
    • S3 Batch Operations manages retries, tracks progress, sends notifications, generates completion reports, and delivers events to AWS CloudTrail for all changes made and tasks executed.

Encryption

Data encryption can happen either on the clients’ side (client-side encryption) or on AWS (server-side encryption, or SSE). When customers encrypt data on their side, the data transferred to S3 is already encrypted, and S3 never sees the raw data. Server-side encryption is different because customers send the raw data to S3, where it is encrypted.

Clients can encrypt data in flight and at rest. To encrypt their Amazon S3 data in transit, they can use the Amazon S3 Secure Sockets Layer (SSL) API endpoints. This ensures that all data sent to and from Amazon S3 is encrypted while in transit using the HTTPS protocol.

To encrypt their Amazon S3 data at rest, clients can use several variations of Server-Side Encryption (SSE). All SSE performed by Amazon S3 and AWS Key Management Service (AWS KMS) uses the 256-bit Advanced Encryption Standard (AES). Clients can also encrypt their Amazon S3 data at rest using client-side encryption, encrypting their data on the client before sending it to Amazon S3.

Server-side encryption:– In this case clients send unencrypted raw data to AWS, and the AWS infrastructure encrypts the raw data and then stores it on disk. When clients retrieve data, AWS reads the encrypted data from the disk, decrypts it, and sends the raw data back to them. The encryption/decryption is transparent to the AWS user. The variants below are sketched in code after this list.

  • SSE-AES:– AWS handles encryption and decryption for clients on the server side using the AES-256 algorithm. AWS also controls the secret key that is used for encryption/decryption.
  • SSE-KMS (AWS managed CMK):– SSE-KMS is very similar to SSE-AES. The only difference is that the secret key (aka the AWS managed Customer Master Key (CMK)) is provided by the KMS service and not by S3.
  • SSE-KMS (customer managed CMK):– AWS clients can manage the secret key (aka the customer managed Customer Master Key) using the KMS service.
    • Clients can create a Customer Master Key (CMK) and reference that key for encryption/decryption.
    • At any time, they can delete the CMK to make all data useless. 
    • They have full control over the CMK by customizing the key policy.
  • SSE-C:– With SSE-C, AWS clients are in charge of the secret key while AWS still takes care of encryption/decryption. Every time clients call the S3 API, they also have to attach the secret key.
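
In the API, the SSE variants differ only in a few request parameters. A sketch with placeholder bucket and key names (the KMS key alias is also a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# SSE with S3-managed keys: one extra argument on the upload.
s3.put_object(
    Bucket="example-bucket", Key="a.txt", Body=b"data",
    ServerSideEncryption="AES256",
)

# SSE-KMS with a customer managed CMK, referenced by alias.
s3.put_object(
    Bucket="example-bucket", Key="b.txt", Body=b"data",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-cmk",
)
```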

Client-side encryption:– Client-side encryption means that AWS clients encrypt the data before they send it to AWS. It also means that they decrypt the data that they retrieve from AWS. Client-side encryption needs to be deeply embedded into their application. Clients have two options for using data encryption keys:

  • Use an AWS KMS-managed customer master key.
  • Use a client-side master key.

AWS SDK + KMS:– Clients can use the AWS SDK to upload/download files from S3. The KMS service can generate data keys that clients can use for encryption/decryption. The data key itself is encrypted using the KMS Customer Master Key (see the sketch below).

  • If the clients want to use the encrypted data key, they have to send the encrypted key to the KMS service and ask for decryption. The decrypted data key is only returned if the CMK is still available and the clients have permission to use it.
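
A simplified envelope-encryption sketch of this flow, assuming the third-party “cryptography” package and a placeholder CMK alias; the official AWS encryption clients wrap the same idea with more rigor:

```python
import base64

import boto3
from cryptography.fernet import Fernet  # pip install cryptography

kms = boto3.client("kms")
s3 = boto3.client("s3")

# Ask KMS for a data key: Plaintext is used locally, CiphertextBlob is
# stored alongside the object for later decryption via KMS.
key = kms.generate_data_key(KeyId="alias/example-cmk", KeySpec="AES_256")

fernet = Fernet(base64.urlsafe_b64encode(key["Plaintext"]))
ciphertext = fernet.encrypt(b"sensitive payload")

# S3 only ever sees ciphertext; the encrypted data key travels as
# user metadata next to the object.
s3.put_object(
    Bucket="example-bucket",
    Key="secret.bin",
    Body=ciphertext,
    Metadata={"enc-key": base64.b64encode(key["CiphertextBlob"]).decode()},
)
```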

When using client-side encryption, clients retain end-to-end control of the encryption process, including management of the encryption keys. For maximum simplicity and ease of use, however, server-side encryption is the better fit.

Access Control Lists (ACLs):– Access control lists are one of the resource-based access policy options that allow customers to manage access to their buckets and objects. Customers can also use ACLs to grant basic read/write permissions to other AWS accounts. To give others controlled access, Amazon S3 provides:

    • Coarse-grained access controls (Amazon S3 Access Control Lists [ACLs]):– Amazon S3 ACLs enable customers to grant certain coarse-grained permissions, such as READ, WRITE, or FULL_CONTROL, at the object or bucket level.
    • Fine-grained access controls (Amazon S3 bucket policies, AWS Identity and Access Management [IAM] policies, and query-string authentication):– Fine-grained access control enables administrators to use policies to scope down access permissions, limiting specific users’ access to specific items within their AWS resources.
    • Amazon S3 bucket policies are the recommended access control mechanism for Amazon S3 and provide much finer-grained control. Amazon S3 bucket policies are very similar to IAM policies.