Amazon Elastic Block Store (EBS)
Amazon Elastic Block Store (EBS) is an easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud (EC2) for both throughput- and transaction-intensive workloads at any scale. Amazon EBS provides persistent block-level storage volumes for use with Amazon EC2 instances; persistent storage means the storage is independent of the lifespan of any single EC2 instance. EBS volumes behave like raw, unformatted block devices, and AWS customers can mount these volumes as devices on their instances.
- Amazon EBS volumes are available in a variety of types that differ in performance characteristics and price.
- Although multiple Amazon EBS volumes can be attached to a single Amazon EC2 instance, a volume can only be attached to a single instance at a time.
- EBS is designed for mission-critical systems: EBS volumes are replicated within an Availability Zone (AZ) and can easily scale to petabytes of data. AWS clients can use EBS Snapshots with automated lifecycle policies to back up their volumes in Amazon S3.
In Amazon EBS, data changes relatively frequently and needs to persist beyond the life of an EC2 instance. Amazon EBS is well-suited for use as the primary storage for a database or file system, or for any application or instance (operating system) that requires direct access to raw block-level storage. Amazon EBS provides a range of options that allow customers to optimize storage performance and cost for their workload. These options are divided into two major categories:
- solid-state drive (SSD)-backed storage for transactional workloads such as databases and boot volumes (performance depends primarily on IOPS) and
- hard disk drive (HDD)-backed storage for throughput-intensive workloads such as big data, data warehouse, and log processing (performance depends primarily on MB/s).
Amazon EBS provides the ability to save point-in-time snapshots of its clients' volumes to Amazon S3. Snapshots are incremental backups, which means that only the blocks on the device that have changed since the most recent snapshot are saved.
- This minimizes the time required to create the snapshot and saves on storage costs by not duplicating data. Since Amazon EBS Snapshots are stored incrementally, only the blocks that have changed after clients’ last snapshot are saved, and they are billed only for the changed blocks.
- Customers can back up the data on their Amazon EBS volumes to Amazon S3 by taking point-in-time snapshots. When they delete a snapshot, only the data unique to that snapshot is removed.
- Each snapshot contains all of the information that is needed to restore your data (from the moment when the snapshot was taken) to a new EBS volume.
- EBS snapshots are incremental, point-in-time backups, containing only the data blocks changed since the last snapshot.
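The incremental behavior described above can be modeled in a few lines of plain Python (an illustration only, not an AWS API; the four-block volume and its contents are invented for the example):

```python
# Illustrative model of incremental EBS snapshots: only blocks changed
# since the previous snapshot are stored (and billed).

def snapshot(volume_blocks, previous_snapshots):
    """Store only blocks that differ from the state already snapshotted."""
    baseline = {}
    for snap in previous_snapshots:      # replay prior snapshots in order
        baseline.update(snap)            # to reconstruct the last state
    return {addr: data for addr, data in volume_blocks.items()
            if baseline.get(addr) != data}

# A 4-block volume: block address -> block contents.
volume = {0: "boot", 1: "aaaa", 2: "bbbb", 3: "cccc"}
snaps = []

snaps.append(snapshot(volume, snaps))   # first snapshot: all 4 blocks
volume[2] = "BBBB"                      # one block changes
snaps.append(snapshot(volume, snaps))   # second snapshot: 1 block only

print(len(snaps[0]), len(snaps[1]))     # 4 1
```

Replaying the snapshots in order reconstructs the full volume, which is why each snapshot still "contains" everything needed for a restore even though only changed blocks are stored.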
EBS is built to be secure for data compliance. Newly created EBS volumes can be encrypted by default with a single setting in your account. EBS volumes support encryption of data at-rest, data in-transit, and all volume backups. It offers seamless encryption of both EBS boot volumes and data volumes as well as snapshots, eliminating the need to build and manage a secure key management infrastructure.
- These encryption keys are Amazon-managed or keys that you create and manage using the AWS Key Management Service (AWS KMS).
- EBS encryption is supported by all volume types, includes built-in key management infrastructure, and has zero impact on performance.
- Data-in-motion security occurs on the servers that host EC2 instances, providing encryption of data as it moves between EC2 instances and EBS volumes.
- Access control plus encryption offers a strong defense-in-depth security strategy for your data.
EBS volumes are created in a specific Availability Zone, and can then be attached to any instances in that same Availability Zone. Amazon EBS volumes are designed to be highly available and reliable. EBS volume data is replicated across multiple servers in a single Availability Zone to prevent the loss of data from the failure of any single component. Taking snapshots of your EBS volumes increases the durability of the data stored on your EBS volumes.
- To make a volume available outside of the Availability Zone, AWS customers can create a snapshot and restore that snapshot to a new volume anywhere in that Region.
- Customers can copy snapshots to other Regions and then restore them to new volumes there, making it easier to leverage multiple AWS Regions for geographical expansion, data center migration, and disaster recovery.
Customers can create their EBS volumes as encrypted volumes, in order to meet a wide range of data-at-rest encryption requirements for regulated/audited data and applications. Amazon EBS encryption offers a straightforward encryption solution for customers' EBS resources that doesn't require them to build, maintain, and secure their own key management infrastructure. It uses AWS Key Management Service (AWS KMS) customer master keys (CMKs) when creating encrypted volumes and snapshots.
- When customers create an encrypted EBS volume and attach it to a supported instance type, data stored at rest on the volume, disk I/O, and snapshots created from the volume are all encrypted.
- Encryption operations occur on the servers that host EC2 instances, ensuring the security of both data-at-rest and data-in-transit between an instance and its attached EBS storage.
- EBS encrypts customers' volumes with a data key using the industry-standard AES-256 algorithm.
Amazon EBS enables its clients to increase storage without any disruption to their critical workloads. Build applications that require as little as a single GB of storage, or scale up to petabytes of data — all in just a few clicks.
- Performance metrics, such as bandwidth, throughput, latency, and average queue length, are available through the AWS Management Console.
- These metrics, provided by Amazon CloudWatch, allow the clients to monitor the performance of their volumes to make sure that they are providing enough performance for their applications without extra cost.
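The average queue length metric relates to the others through Little's law: requests in flight roughly equal the I/O rate multiplied by the average latency. A small sketch with hypothetical numbers (not AWS-published figures):

```python
def avg_queue_length(iops, avg_latency_ms):
    """Little's law: average requests in flight = rate * time in system."""
    return iops * (avg_latency_ms / 1000.0)

# A volume sustaining 4,000 IOPS at 1 ms average latency keeps about
# 4 I/O requests in flight -- the average queue length CloudWatch reports.
print(avg_queue_length(4000, 1.0))   # 4.0
```

This is why a rising queue length at constant IOPS is an early sign of growing latency on a volume.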
Amazon offers a REST management API for Amazon EBS, as well as support for Amazon EBS operations within both the AWS SDKs and the AWS CLI.
- The API actions and EBS operations are used to create, delete, describe, attach, and detach EBS volumes for your EC2 instances; to create, delete, and describe snapshots from Amazon EBS to Amazon S3; and to copy snapshots from one region to another.
- The AWS Management Console gives customers all the capabilities of the API in a browser interface.
EBS Volume Types
As described previously, Amazon EBS provides a range of volume types that are divided into two major categories: SSD-backed storage volumes and HDD-backed storage volumes. SSD-backed storage volumes offer great price/performance characteristics for random small block workloads, such as transactional applications, whereas HDD-backed storage volumes offer the best price/performance characteristics for large block sequential workloads. You can attach and stripe data across multiple volumes of any type to increase the I/O performance available to your Amazon EC2 applications. The following table presents the storage characteristics of the current generation volume types.
SSD-backed volume is ideal for transactional workloads, such as databases and boot volumes (performance depends primarily on IOPS).
- The performance of a block storage device is commonly measured and quoted in a unit called IOPS, short for input/output operations per second.
- SSD-backed volumes include General Purpose SSD (gp2), which balances price and performance for a wide variety of transactional data, and the highest-performance Provisioned IOPS SSD (io1) for latency-sensitive transactional workloads.
HDD-backed storage is good for throughput intensive workloads, such as MapReduce and log processing (performance depends primarily on MB/s).
- It also optimizes large streaming workloads where throughput (measured in MiB/s) is a better performance measure than IOPS.
- HDD-backed volumes include Throughput Optimized HDD (st1) for frequently accessed, throughput intensive workloads and the lowest cost Cold HDD (sc1) for less frequently accessed data.
Throughput-Optimized HDD volumes are low-cost HDD volumes designed for frequent access, throughput-intensive workloads such as big data, data warehouses, and log processing.
- Throughput Optimized HDD is designed for applications that require larger storage and higher throughput, such as big data or data warehousing, where IOPS is less relevant. Much like gp2, st1 volumes use a burst model, where the baseline throughput is tied to the volume size and credits are accumulated over time.
- Volumes can be up to 16 TB with a maximum IOPS of 500 and maximum throughput of 500 MB/s. These volumes are significantly less expensive than general purpose SSD volumes.
- ST1 is backed by hard disk drives (HDDs) and is ideal for frequently accessed, throughput intensive workloads with large datasets and large I/O sizes, such as MapReduce, Kafka, log processing, data warehouse, and ETL workloads.
- Low-cost HDD volume designed for frequently accessed, throughput-intensive workloads such as big data, data warehouses, and log processing.
- st1 is a good choice when the workload customers are going to run defines its performance metrics in terms of throughput instead of IOPS; the drives are magnetic (spinning) disks.
- HDD (sc1):– For less frequently accessed data; it has the lowest cost.
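The burst model mentioned above can be sketched numerically. The per-TiB rates and caps below follow the figures AWS published for this generation of st1 and sc1 volumes; treat the exact numbers as assumptions to verify against current documentation:

```python
# Baseline and burst throughput scale with volume size for the HDD types.
HDD_RATES = {
    # type: (baseline MiB/s per TiB, baseline cap, burst per TiB, burst cap)
    "st1": (40.0, 500.0, 250.0, 500.0),
    "sc1": (12.0, 192.0, 80.0, 250.0),
}

def hdd_throughput(volume_type, size_gib):
    """Return (baseline, burst) throughput in MiB/s for an HDD volume."""
    per_tib_base, base_cap, per_tib_burst, burst_cap = HDD_RATES[volume_type]
    size_tib = size_gib / 1024.0
    baseline = min(per_tib_base * size_tib, base_cap)
    burst = min(per_tib_burst * size_tib, burst_cap)
    return baseline, burst

# A 1 TiB st1 volume: 40 MiB/s baseline, bursting to 250 MiB/s.
print(hdd_throughput("st1", 1024))    # (40.0, 250.0)
# A 12.5 TiB st1 volume reaches the 500 MiB/s caps.
print(hdd_throughput("st1", 12800))   # (500.0, 500.0)
```

The same structure explains why larger st1/sc1 volumes sustain higher throughput: the credit bucket refills in proportion to provisioned size.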
Provisioned IOPS SSD (io1)
Provisioned IOPS SSD volumes are designed to meet the needs of I/O-intensive workloads, particularly database workloads that are sensitive to storage performance and consistency in random access I/O throughput.
- For customers, who have an IO-intense workload such as databases, predictable and consistent IO performance.
- Customers can use technologies such as RAID on top of multiple EBS volumes to stripe and mirror the data across multiple volumes.
- I/O-intensive NoSQL and relational databases
- They provide predictable, high performance and are well suited for:
- Critical business applications that require sustained IOPS performance
- Large database workloads.
- Consistently performs at provisioned level, up to 20,000 IOPS maximum
- I3:– High I/O instances. This family includes the High Storage Instances that provide Non-Volatile Memory Express (NVMe) SSD backed instance storage optimized for low latency, very high random I/O performance, high sequential read throughput and provide high IOPS at a low cost.
- D2:– Dense-storage instances. D2 instances feature up to 48 TB of HDD-based local storage, deliver high disk throughput, and offer the lowest price per disk throughput performance on Amazon EC2.
- For workloads requiring greater network performance, many instance types support enhanced networking.
- Enhanced networking reduces the impact of virtualization on network performance by enabling a capability called Single Root I/O Virtualization (SR-IOV). This results in:
- More Packets Per Second (PPS)
- Lower latency, and
- Less jitter
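One practical detail when provisioning io1: the ratio of provisioned IOPS to volume size was capped at 50:1 for this generation, alongside the 20,000 IOPS ceiling quoted above (treat both figures as era-specific assumptions). A quick validity check in plain Python:

```python
MAX_RATIO = 50      # provisioned IOPS per GiB (io1, this generation)
MAX_IOPS = 20000    # per-volume ceiling quoted in these notes

def valid_io1_request(size_gib, provisioned_iops):
    """Check an io1 (size, IOPS) request against the ratio and ceiling."""
    return (provisioned_iops <= MAX_IOPS
            and provisioned_iops <= MAX_RATIO * size_gib)

print(valid_io1_request(100, 5000))    # True: 5000 <= 50 * 100
print(valid_io1_request(100, 5001))    # False: exceeds the 50:1 ratio
print(valid_io1_request(500, 25000))   # False: exceeds the 20,000 cap
```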
General-purpose SSD volumes offer cost-effective storage at a moderate price point, delivering strong performance for a broad range of workloads.
- General Purpose SSD volume balances price performance for a wide variety of transactional workloads
- General-purpose SSD volumes are billed based on the amount of data space provisioned, regardless of how much data you actually store on the volume.
- Boot volumes, low-latency interactive apps, dev & test.
- General-Purpose SSD delivers single-digit millisecond latencies, which is actually a good use case for the majority of workloads.
- gp2 volumes can deliver between 100 and 10,000 IOPS.
Some use cases that are a good fit for gp2 are:
- System boot volumes
- Applications requiring low latency
- Virtual desktops
- Development and test environments
- It’s suited for a wide range of workloads where the very highest disk performance is not critical, such as:
- System boot volumes
- Small- to medium-sized databases
- Development and test environments
- T2 instances are Burstable Performance Instances that provide a baseline level of CPU performance with the ability to burst above the baseline.
- M4 instances are the latest generation of General Purpose Instances. This family provides a balance of compute, memory, and network resources, and it is a good choice for many applications.
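The 100-to-10,000 IOPS range quoted above comes from gp2's baseline rule of 3 IOPS per provisioned GiB, floored at 100 and capped at 10,000 (the 3 IOPS/GiB figure is taken from AWS documentation of this volume generation; verify against current limits):

```python
def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, floored at 100, capped at 10,000."""
    return min(max(3 * size_gib, 100), 10000)

print(gp2_baseline_iops(8))      # 100   (small volumes get the 100 floor)
print(gp2_baseline_iops(100))    # 300
print(gp2_baseline_iops(5000))   # 10000 (cap reached around 3,334 GiB)
```

This is why simply provisioning a larger gp2 volume is a common way to buy more baseline IOPS without moving to io1.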
Cold HDD:– Cold HDD volumes are designed for less frequently accessed workloads, such as colder data requiring fewer scans per day. SC1 is backed by hard disk drives (HDDs) and provides the lowest cost per GB of all EBS volume types. It is ideal for less frequently accessed workloads with large, cold datasets. Similar to st1, sc1 provides a burst model.
- Volumes can be up to 16 TB with a maximum IOPS of 250 and maximum throughput of 250 MB/s. These volumes are significantly less expensive than Throughput-Optimized HDD volumes.
- COLD HDD defines performance in terms of throughput instead of IOPS. The use case for COLD HDD is noncritical, cold data workloads and is designed to support infrequently accessed data. Similar to st1, sc1 uses a burst-bucket.
- Lowest cost HDD volume designed for less frequently accessed workloads.
- Colder data requiring fewer scans per day
Magnetic volume:– Magnetic volumes have the lowest performance characteristics of all Amazon EBS volume types. Magnetic volumes are billed based on the amount of data space provisioned, regardless of how much data you actually store on the volume. A magnetic EBS volume can range in size from 1 GB to 1 TB and will average 100 IOPS, but has the ability to burst to hundreds of IOPS.
They are best suited for:
- Workloads where data is accessed infrequently
- Sequential reads
- Situations where low-cost storage is a requirement
- Magnetic volumes are billed based on the amount of data space provisioned, regardless of how much data customers actually store on the volume.
- Cold workloads where data is infrequently accessed
- Scenarios where the lowest storage cost is important.
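The volume-type guidance in the sections above condenses into a simple decision table (the mapping below is a summary of this document, not an official AWS API):

```python
def suggest_volume_type(workload):
    """Map a workload description to the EBS volume type suggested above."""
    suggestions = {
        "boot_volume": "gp2",           # balanced price/performance
        "dev_test": "gp2",
        "large_database": "io1",        # sustained, consistent IOPS
        "big_data": "st1",              # throughput-intensive, frequent access
        "log_processing": "st1",
        "cold_data": "sc1",             # infrequently accessed, lowest-cost HDD
        "infrequent_cheap": "magnetic", # lowest performance, low cost
    }
    return suggestions[workload]

print(suggest_volume_type("big_data"))        # st1
print(suggest_volume_type("large_database"))  # io1
```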
Several factors, including I/O characteristics and the configuration of your instances and volumes, can affect the performance of Amazon EBS. Customers who follow the guidance on our Amazon EBS and Amazon EC2 product detail pages typically achieve good performance out of the box. However, there are some cases where customers may need to do some tuning in order to achieve peak performance on the platform. This topic discusses general best practices as well as performance tuning that is specific to certain use cases.
- Benchmarking AWS EBS Workloads with Fio
- Oracle Orion
Benchmarking AWS EBS Workloads with Fio
One of the main components of AWS EBS performance is I/O. Applications running on an AWS EC2 instance submit read and write operations to an EBS volume; each operation is then converted to a system call to the kernel.
- The kernel knows that the underlying file system sits on virtualized block storage, and through internal mechanisms the kernel redirects the read/write operation to the I/O domain, where the I/O operation passes through a grant-mapping process and, once mapped, is finally performed on the EBS volume.
- When customers create a new EBS volume they need to provide the size and the type of the volume.
- General Purpose SSD (gp2),
- Provisioned IOPS SSD (io1),
- Throughput Optimized HDD (st1), Cold HDD (sc1), and Magnetic.
Tools customers can use to benchmark the performance of EBS volumes
Oracle Orion:– Used to calibrate the I/O performance of storage systems to be used with Oracle databases. Oracle Orion is a tool for predicting the performance of an Oracle database without having to install Oracle or create a database.
- Oracle Orion is expressly designed for simulating Oracle database I/O workloads using the same I/O software stack as Oracle.
- Orion can also simulate the effect of striping performed by Oracle Automatic Storage Management.
- Orion can run tests using different I/O loads to measure performance metrics such as MBPS, IOPS, and I/O latency.
- Load is expressed in terms of the number of outstanding asynchronous I/Os.
- For random workloads, using either large or small sized I/Os, the load level is the number of outstanding I/Os.
- For large sequential workloads, the load level is a combination of the number of sequential streams and the number of outstanding I/Os per stream.
- Testing a given workload at a range of load levels can help you understand how performance is affected by load.
RAID 1:– Mirrors two volumes together (take one disk, mirror a copy to another disk), providing redundancy.
- A RAID 1 array offers a “mirror” of your data for extra redundancy. Before you perform this procedure, you need to decide how large your RAID array should be and how many IOPS you want to provision.
- The resulting size and bandwidth of a RAID 1 array is equal to the size and bandwidth of a single volume in the array.
- It's ideal to use when fault tolerance is more important than I/O performance; for example, in a critical application.
- It is safer from the standpoint of data durability.
- Does not provide a write performance improvement; requires more Amazon EC2 to Amazon EBS bandwidth than non-RAID configurations because the data is written to multiple volumes simultaneously.
RAID 5:– Requires at least 3 disks; good for reads, bad for writes. AWS does not recommend using RAID 5 on EBS.
RAID 10:– Striped & Mirrored, good redundancy, good performance
AWS does not recommend RAID 5 and RAID 6 for EBS because the parity write operations of these RAID modes consume some of the IOPS available to your volumes.
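The size and IOPS arithmetic behind these recommendations can be sketched as follows (a plain-Python illustration of the rules above, assuming n identical volumes):

```python
def raid_capacity(level, volume_size_gib, volume_iops, n_volumes):
    """Usable size and aggregate IOPS for n identical EBS volumes."""
    if level == 0:      # striped: capacity and I/O capability add up
        return n_volumes * volume_size_gib, n_volumes * volume_iops
    if level == 1:      # mirrored: capacity of a single volume; every
        return volume_size_gib, volume_iops   # write goes to each mirror
    raise ValueError("RAID 5/6 parity writes cost IOPS; not recommended on EBS")

print(raid_capacity(0, 500, 4000, 2))   # (1000, 8000)
print(raid_capacity(1, 500, 4000, 2))   # (500, 4000)
```

The contrast makes the trade-off explicit: striping (RAID 0) buys performance and capacity, mirroring (RAID 1) buys durability at the cost of extra EC2-to-EBS bandwidth.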
fio:– fio was created to allow benchmarking specific disk IO workloads. It can issue its IO requests using one of many synchronous and asynchronous IO APIs, and can also use various APIs which allow many IO requests to be issued with a single API call.
- fio's options let customers tune how large the files fio uses are, at what offsets in those files I/O happens, how much delay (if any) there is between issuing I/O requests, and what (if any) filesystem sync calls are issued between each I/O request.
- A sync call tells the operating system to make sure that any information that is cached in memory has been saved to disk and can thus introduce a significant delay.
- The options to fio allow customers to issue very precisely defined IO patterns and see how long it takes their disk subsystem to complete these tasks.
- fio is packaged in the standard repository for Fedora 8 and is available for openSUSE through the openSUSE Build Service.
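As an illustration of "precisely defined IO patterns", the shape of a typical fio invocation against an EBS volume can be sketched by assembling the argument list (the device path /dev/xvdf and the parameter defaults are assumptions for the example; the option names are standard fio flags):

```python
def fio_args(name, filename, rw="randread", bs="16k", iodepth=32,
             runtime=60):
    """Build a fio command line for one precisely defined I/O pattern."""
    return [
        "fio",
        "--name=" + name,
        "--filename=" + filename,    # device or file under test
        "--rw=" + rw,                # e.g. randread, randwrite, read, write
        "--bs=" + bs,                # block size per I/O request
        "--iodepth=" + str(iodepth), # outstanding asynchronous I/Os
        "--ioengine=libaio",         # asynchronous Linux I/O API
        "--direct=1",                # bypass the page cache
        "--runtime=" + str(runtime),
        "--time_based",
    ]

print(" ".join(fio_args("ebs-randread", "/dev/xvdf")))
```

Varying `rw`, `bs`, and `iodepth` reproduces the transactional (small random) versus throughput (large sequential) patterns discussed in the volume-type sections.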
AWS Snowball in transit
There are two ways to get started with Snowball. Customers can create an import or export job using the AWS Snowball Management Console, or they can use the Snowball Job Management API and integrate AWS Snowball as part of their data management solution.
- The primary functions of the API are to create, list, and describe import and export jobs, and it uses a simple standards-based REST web services interface.
Customers also have two ways to locally transfer data between a Snowball appliance and their on-premises data center.
- The Snowball client, available as a download from the AWS Import/Export Tools page, is a standalone terminal application that customers run on their local workstation to do the data transfer. They can use simple copy (cp) commands to transfer data, and error handling and logs are written to their local workstation for troubleshooting and auditing.
- The second option to locally transfer data between a Snowball appliance and your on-premises data center is the Amazon S3 Adapter for Snowball, which is also available as a download from the AWS Import/Export Tools page. You can programmatically transfer data between your on-premises data center and a Snowball appliance using a subset of the Amazon S3 REST API commands.
Snowball is a petabyte-scale physical data transport solution that uses devices designed to be secure to transfer large amounts of data into and out of the AWS Cloud. AWS transfers customers' data directly onto and off of Snowball storage devices using Amazon's high-speed internal network, bypassing the Internet. Snowball is simple to connect to customers' existing networks and applications, and customers can initiate a Snowball request through the AWS Management Console.
- The Snowball appliance is purpose-built for efficient data storage and transfer, including a high-speed, 10 Gbps network connection designed to minimize data transfer times, allowing you to transfer up to 80 TB of data from your data source to the appliance in 2.5 days, plus shipping time.
- While all AWS Regions have 80 TB Snowballs, US Regions have both 50 TB and 80 TB models. The Snowball appliance is rugged enough to withstand an 8.5 G jolt.
For datasets of significant size, transferring data with Snowball is simple, fast, more secure, and can be as little as one-fifth the cost of transferring data via high-speed Internet. AWS Snowball supports importing data into and exporting data from Amazon S3 buckets.
- Customers use Snowball to migrate analytics data, genomics data, video libraries, image repositories, and backups, and to archive data as part of data center shutdowns, tape replacement, or application migration projects.
- AWS Snowball Client is software that is installed on a local computer and is used to identify, compress, encrypt, and transfer data.
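The speed claim above is easy to sanity-check with back-of-the-envelope arithmetic (the 100 Mbps line is a hypothetical comparison point, and the calculation ignores protocol overhead):

```python
def transfer_days(terabytes, line_mbps):
    """Days to push a dataset over a network link at full utilization."""
    bits = terabytes * 1e12 * 8           # decimal TB to bits
    seconds = bits / (line_mbps * 1e6)
    return seconds / 86400

# Moving an 80 TB Snowball's worth of data over a dedicated 100 Mbps
# Internet link would take roughly 74 days of continuous transfer,
# versus days for a shipped appliance.
print(round(transfer_days(80, 100)))   # 74
```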
Parallelization can also help achieve maximum performance of customers data transfer. This could involve one or more of the following parallelization types:
- Using multiple instances of the Snowball client on a single workstation with a single Snowball appliance;
- Using multiple instances of the Snowball client on multiple workstations with a single Snowball appliance; and/or
- Using multiple instances of the Snowball client on multiple workstations with multiple Snowball appliances.
Customers can integrate Snowball with IAM to control which actions a user can perform. They can give the IAM users on their AWS account access to all Snowball actions or to a subset of them. Similarly, an IAM user that creates a Snowball job must have permissions to access the Amazon S3 buckets that will be used for the import operations.
- AWS KMS protects the encryption keys used to protect data on each Snowball appliance. All data loaded onto a Snowball appliance is encrypted using 256-bit encryption.
A job in AWS Snowball (Snowball) is a discrete unit of work, defined when the client creates it in the console or the job management API. Jobs have types, details, and statuses. Each of those elements is covered in greater detail in the sections that follow. There are two different job types: import jobs and export jobs.
- Both of the Snowball job types are summarized following, including the source of the data, how much data can be moved, and the result the client can expect at successful job completion.
- Although these two types of jobs have fundamental differences, they share some common details. The source can be local to your data center or office, or it can be an Amazon S3 bucket.
- Each import or export job for Snowball is defined by the details that customers specify when it’s created, which include name, type, ID, date, speed, IAM role ARN, AWS KMS key, Snowball capacity, Storage service, and Resources
Snowball includes a 10GBaseT network connection (both RJ45 as well as SFP+ with either a fiber or copper interface) to minimize data transfer times. The Snowball device is designed to transfer multiple terabytes of data from your data source to the device in about a day, plus shipping time.
Snowball includes a ruggedized case designed to be both durable and portable. The Snowball device weighs less than 50 pounds, so it’s portable.
Snowball uses an innovative, E Ink shipping label designed to ensure the device is automatically sent to the correct AWS facility and also aids in tracking. Once you have completed your data transfer job, it can be tracked via Amazon Simple Notification Service (SNS), text messages, and the Console.
Snowball supports APIs that enable customers and partners to integrate their own applications with Snowball. The Snowball Job Management API lets customers create and manage jobs outside of the AWS Management Console. In addition, the Snowball S3 Adapter gives customers direct access to Snowball as if it were a S3 endpoint.
All data transferred to Snowball is automatically encrypted with 256-bit encryption keys that you can manage by using the AWS Key Management Service (KMS). The encryption keys are never sent to, or stored on the device, to help ensure your data stays secure during transit.
The Snowball device is equipped with tamper-resistant seals and includes a built-in Trusted Platform Module (TPM) that uses a dedicated processor designed to detect any unauthorized modifications to the hardware, firmware, or software. AWS inspects every device for any signs of tampering and to verify that no changes were detected by the TPM.
Once the data transfer job has been processed and verified, AWS performs a software erasure of the Snowball device that follows the National Institute of Standards and Technology (NIST) guidelines for media sanitization.
Snowball Best Practices
The workstation should be a powerful computer, able to meet high demands in terms of processing, memory, and networking.
Run simultaneous instances of the Snowball client in multiple terminals, each using the copy operation to speed up your data transfer.
The workstation should be the local host for the customers' data.
Files must be in a static state while being copied. Files that are modified while they are being transferred are not imported into Amazon S3.
Don’t save a copy of the unlock code in the same location in the workstation as the manifest for that job. Saving the unlock code and manifest separately helps prevent unauthorized parties from gaining access to the Snowball.
To protect this potentially sensitive information, delete these logs after the job that the logs are associated with enters Completed status.
To prevent data corruption, don’t disconnect the Snowball or change its network settings while transferring data.
AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage. Customers use Storage Gateway to simplify storage management and reduce costs for key hybrid cloud storage use cases. These include moving tape backups to the cloud, reducing on-premises storage with cloud-backed file shares, providing low latency access to data in AWS for on-premises applications, as well as various migration, archiving, processing, and disaster recovery use cases.
- AWS Storage Gateway connects an on-premises software appliance with cloud-based storage to provide seamless integration with data security features between your on-premises IT environment and the AWS storage infrastructure.
- It provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in Amazon S3 or Amazon Glacier.
- For disaster recovery scenarios, AWS Storage Gateway, together with Amazon EC2, can serve as a cloud-hosted solution that mirrors your entire production environment.
- Customers can download the AWS Storage Gateway software appliance as a virtual machine (VM) image that they install on a host in their data center or as an EC2 instance.
- Gateway-cached volumes minimize the need to scale your on-premises storage infrastructure while still providing your applications with low-latency access to their frequently accessed data.
- Gateway-stored volumes store primary data locally, while asynchronously backing up that data to AWS. These volumes provide on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups.
- A gateway-VTL allows customers to perform offline data archiving by presenting their existing backup application with an iSCSI-based virtual tape library consisting of a virtual media changer and virtual tape drives.
A volume gateway provides cloud-backed storage volumes that customers can mount as Internet Small Computer System Interface (iSCSI) devices from their on-premises application servers.
- The volume gateway is deployed into your on-premises environment as a VM running on VMware ESXi, KVM, or Microsoft Hyper-V hypervisor.
- Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots.
When connecting to the Volume Gateway with the iSCSI block interface, the gateway offers two mode configurations: cached and stored.
- Cached volumes:– Customers store their primary data in Amazon S3 and retain their frequently accessed data locally in cache.
- Stored volumes:– Customers store their entire data set locally, while making an asynchronous copy of the volume in Amazon S3 and point-in-time EBS snapshots.
- Customers often choose the volume gateway to back up local applications, and use it for disaster recovery based on EBS Snapshots or Cached Volume Clones.
- The Volume Gateway integration with AWS Backup enables customers to use the AWS Backup service to protect on-premises applications that use Storage Gateway volumes.
- Using AWS Backup with Volume Gateway helps centralize backup management, reduce operational burden, and meet compliance requirements.
A tape gateway provides cloud-backed virtual tape storage. The tape gateway is deployed into on-premises environment as a VM running on VMware ESXi, KVM, or Microsoft Hyper-V hypervisor.
- The Tape Gateway presents itself to existing backup application as an industry-standard iSCSI-based virtual tape library (VTL), consisting of a virtual media changer and virtual tape drives.
- Tape gateway provides a cost-effective and durable way to archive backup data when virtual tapes are stored in the GLACIER or DEEP_ARCHIVE storage classes. A tape gateway provides a virtual tape infrastructure that scales seamlessly with your business needs and eliminates the operational burden of provisioning, scaling, and maintaining a physical tape infrastructure.
- Existing backup applications and workflows can continue to run unchanged while writing to a nearly limitless collection of virtual tapes.
- AWS Storage Gateway can run either on-premises as a VM appliance, as a hardware appliance, or in AWS as an Amazon EC2 instance.
- Gateways hosted on EC2 instances can be used for disaster recovery, data mirroring, and providing storage for applications hosted on Amazon EC2.
A file gateway supports a file interface into Amazon Simple Storage Service (Amazon S3) and combines a service and a virtual software appliance. The File Gateway presents a file interface that enables customers to store files as objects in Amazon S3 using the industry-standard NFS and SMB file protocols, and access those files via NFS and SMB from your datacenter or Amazon EC2, or access those files as objects with the S3 API.
- POSIX-style metadata, including ownership, permissions, and timestamps, is durably stored in Amazon S3 in the user-metadata of the object associated with the file.
- Once objects are transferred to S3, they can be managed as native S3 objects, and bucket features such as versioning, lifecycle management, and cross-region replication apply directly to objects stored in the bucket.
- The gateway provides access to objects in S3 as files or file share mount points, which enables customers to:
- Store and retrieve files directly using the NFS version 3 or 4.1 protocol.
- Store and retrieve files directly using the SMB protocol, versions 2 and 3.
- Access data directly in Amazon S3 from any AWS Cloud application or service.
- Manage Amazon S3 data using lifecycle policies, cross-region replication, and versioning.
- A file gateway simplifies file storage in Amazon S3, integrates with existing applications, and provides a cost-effective alternative to on-premises storage.
- It provides low-latency access to data through transparent local caching. A file gateway manages data transfer to and from AWS, buffers applications from network congestion, optimizes and streams data in parallel, and manages bandwidth consumption.
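Because file data written through a file gateway lands in S3 as ordinary objects, bucket-level lifecycle management applies to it directly. As a minimal sketch, the following builds the kind of lifecycle configuration that could be applied to the gateway's backing bucket; the prefix, rule ID, and day counts are illustrative assumptions, and in a real environment the resulting dictionary would be passed to an API call such as boto3's `put_bucket_lifecycle_configuration`.

```python
# Sketch: build an S3 lifecycle configuration for objects written through
# a file gateway, tiering them to cheaper storage classes over time.
# The prefix and day counts are hypothetical values for illustration.

def build_lifecycle_config(prefix, ia_days=30, glacier_days=90):
    """Return a lifecycle configuration that transitions objects under
    `prefix` to Standard-IA after `ia_days` and to Glacier after
    `glacier_days`."""
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }

# Configuration for file-share data stored under the "backups/" prefix.
config = build_lifecycle_config("backups/")
```

The same dictionary shape works for versioning-aware variants (e.g. adding `NoncurrentVersionTransitions`) when the bucket has versioning enabled.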
EC2 Instance Storage
Amazon EC2 instance store volumes, also called ephemeral drives, provide temporary block-level storage for many EC2 instance types. This storage consists of a preconfigured and pre-attached block of disk storage on the same physical server that hosts the EC2 instance for which the block provides storage. The amount of the disk storage provided varies by EC2 instance type. In the EC2 instance families that provide instance storage, larger instances tend to provide both more and larger instance store volumes.
- Instance store is ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content, or for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers.
- Some instance types, such as the micro instances (t1, t2) and the Compute-optimized c4 instances, use EBS storage only with no instance storage provided.
- Instances using Amazon EBS for the root device don’t expose the instance store volumes by default.
- AWS clients can choose to expose the instance store volumes at instance launch time by specifying a block device mapping.
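A block device mapping pairs a device name with a virtual name such as `ephemeral0`. As a minimal sketch, the helper below generates such a mapping for a given number of instance store volumes; the `/dev/sd*` device names follow a common Linux convention, but the actual names and the number of ephemeral volumes available depend on the instance type.

```python
# Sketch: build the BlockDeviceMappings list supplied at instance launch
# to expose instance store (ephemeral) volumes. Device names here follow
# the conventional /dev/sdb, /dev/sdc, ... scheme and are an assumption;
# availability depends on the chosen instance type.

def instance_store_mapping(count):
    """Map `count` instance store volumes to sequential device names."""
    letters = "bcdefghijk"  # /dev/sda is typically the root device
    return [
        {"DeviceName": f"/dev/sd{letters[i]}", "VirtualName": f"ephemeral{i}"}
        for i in range(count)
    ]

# Expose two instance store volumes at launch time.
mapping = instance_store_mapping(2)
```

The resulting list is what would be passed as the `BlockDeviceMappings` parameter of an EC2 `RunInstances` request (via boto3 or the AWS CLI).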
Amazon EC2 local instance store volumes are not intended to be used as durable disk storage. Unlike Amazon EBS volume data, data on instance store volumes persists only during the life of the associated EC2 instance.
- This means that data on instance store volumes persists across orderly instance reboots, but if the EC2 instance is stopped, terminated, or fails, all data on the instance store volumes will be lost.
- Don’t use local instance store volumes for any data that must persist over time, such as permanent file or database storage, without providing data persistence by replicating data or periodically copying data to durable storage such as Amazon EBS or Amazon S3.
The number and storage capacity of Amazon EC2 local instance store volumes are fixed and defined by the instance type. Although the number of instance store volumes on a single EC2 instance can't be increased or decreased, the storage is still scalable and elastic:
- That means customers can scale the total amount of instance store up or down by increasing or decreasing the number of running EC2 instances.
- To achieve full storage elasticity, include one of the other suitable storage options, such as Amazon S3, Amazon EFS, or Amazon EBS, in your Amazon EC2 storage strategy.
Instance store volumes can only be mounted and accessed by the EC2 instances they belong to. When an instance is stopped or terminated, the applications and data in its instance store are erased, so no other instance can access that instance store in the future.
EC2 local instance store volumes are ideal for temporary storage of information that is continually changing, such as buffers, caches, scratch data, and other temporary content, or for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers.
- This storage can only be used from a single EC2 instance during that instance’s lifetime. Unlike EBS volumes, instance store volumes cannot be detached or attached to another instance.
- For high I/O and high storage, use EC2 instance storage targeted to these use cases. High I/O instances (the i2 family) provide instance store volumes backed by SSD and are ideally suited for many high-performance database workloads.
- Applications using instance storage for persistent data generally provide data durability through replication, or by periodically copying data to durable storage.
Since the EC2 instance's virtual machine and the local instance store volumes are located on the same physical server, interaction with this storage is very fast, particularly for sequential access. To increase aggregate IOPS, or to improve sequential disk throughput, multiple instance store volumes can be grouped together using RAID 0 (disk striping) software.
- The SSD instance store volumes in EC2 high I/O instances provide from tens of thousands to hundreds of thousands of low-latency, random 4 KB IOPS.
- The instance store volumes in EC2 high-storage instances provide very high storage density and high sequential read and write performance.
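Because RAID 0 stripes data across members with no redundancy, aggregate IOPS, sequential throughput, and capacity all scale roughly linearly with the number of volumes. The following back-of-the-envelope model illustrates that scaling; the per-volume figures are illustrative assumptions, not published specs for any instance type, and real-world results are limited by factors such as the stripe size and instance network/CPU headroom.

```python
# Sketch: linear scaling model for RAID 0 across instance store volumes.
# With no redundancy, each metric is simply per-volume value x volume count.
# The per-volume numbers passed in below are hypothetical.

def raid0_aggregate(volumes, iops_per_volume, throughput_mb_per_volume,
                    size_gb_per_volume):
    """Estimate aggregate performance and capacity of a RAID 0 array."""
    return {
        "iops": volumes * iops_per_volume,
        "throughput_mb_s": volumes * throughput_mb_per_volume,
        "capacity_gb": volumes * size_gb_per_volume,
    }

# Four hypothetical SSD instance store volumes striped together.
agg = raid0_aggregate(4, iops_per_volume=35_000,
                      throughput_mb_per_volume=450,
                      size_gb_per_volume=800)
```

Note the trade-off this model omits: RAID 0 has no fault tolerance, so losing any one member volume loses the whole array, which is acceptable here only because instance store data is already treated as ephemeral.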