Amazon Elastic Block Store (Amazon EBS) is an easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud (Amazon EC2) for both throughput- and transaction-intensive workloads at any scale. Amazon EBS provides persistent block-level storage volumes for use with EC2 instances; persistent means the storage is independent of, and outlives, the lifespan of any single EC2 instance. EBS volumes behave like raw, unformatted block devices, and AWS customers can mount these volumes as devices on their instances.
- Amazon EBS volumes are available in a variety of types that differ in performance characteristics and price.
- Although multiple Amazon EBS volumes can be attached to a single Amazon EC2 instance, a volume can be attached to only one instance at a time.
- Amazon EBS is designed for mission-critical systems: EBS volumes are replicated within an Availability Zone (AZ) and can easily scale to petabytes of data. AWS customers can use EBS Snapshots with automated lifecycle policies to back up their volumes to Amazon S3.
Amazon EBS Features
In Amazon EBS, data changes relatively frequently and needs to persist beyond the life of an EC2 instance. Amazon EBS is well suited for use as the primary storage for a database or file system, or for any application or instance (operating system) that requires direct access to raw block-level storage. Amazon EBS provides a range of options that allow customers to optimize storage performance and cost for their workload. These options are divided into two major categories:
- solid-state drive (SSD)-backed storage for transactional workloads such as databases and boot volumes (performance depends primarily on IOPS) and
- hard disk drive (HDD)-backed storage for throughput-intensive workloads such as big data, data warehouse, and log processing (performance depends primarily on MB/s).
Amazon EBS provides the ability to save point-in-time snapshots of its customers' volumes to Amazon S3. Snapshots are incremental backups, which means that only the blocks on the device that have changed since the most recent snapshot are saved.
- This minimizes the time required to create the snapshot and saves on storage costs by not duplicating data: because snapshots are stored incrementally, customers are billed only for the changed blocks.
- When customers delete a snapshot, only the data unique to that snapshot is removed.
- Each snapshot contains all of the information needed to restore the data (as of the moment the snapshot was taken) to a new Amazon EBS volume.
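The incremental model above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not the AWS implementation: a volume is modeled as a map of block index to content, and each snapshot records only the blocks that differ from the state captured by all earlier snapshots.

```python
class Volume:
    """Toy model of an EBS volume as a map of block index -> block content."""

    def __init__(self):
        self.blocks = {}
        self.snapshots = []  # each snapshot stores only the changed blocks

    def write(self, index, content):
        self.blocks[index] = content

    def snapshot(self):
        # Reconstruct the state already captured by previous snapshots.
        baseline = {}
        for snap in self.snapshots:
            baseline.update(snap)
        # Store (and, in the real service, bill) only the changed blocks.
        changed = {i: c for i, c in self.blocks.items() if baseline.get(i) != c}
        self.snapshots.append(changed)
        return changed


vol = Volume()
vol.write(0, "boot")
vol.write(1, "data-v1")
first = vol.snapshot()   # first snapshot stores both blocks
vol.write(1, "data-v2")
second = vol.snapshot()  # second snapshot stores only the one changed block
```

Restoring from the second snapshot would merge the two stored deltas, which is why each snapshot still contains everything needed to rebuild the full volume.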
Amazon EBS is built to be secure for data compliance. Newly created EBS volumes can be encrypted by default with a single account-level setting. Amazon EBS volumes support encryption of data at rest, data in transit, and all volume backups. EBS offers seamless encryption of both boot volumes and data volumes as well as snapshots, eliminating the need to build and manage a secure key management infrastructure.
- The encryption keys are either Amazon-managed or keys that customers create and manage using the AWS Key Management Service (AWS KMS).
- EBS encryption is supported by all volume types, includes built-in key management infrastructure, and has zero impact on performance.
- Data-in-transit encryption occurs on the servers that host EC2 instances, protecting data as it moves between EC2 instances and EBS volumes.
- Access control plus encryption offers a strong defense-in-depth security strategy for your data.
EBS volumes are created in a specific Availability Zone, and can then be attached to any instances in that same Availability Zone. Amazon EBS volumes are designed to be highly available and reliable. EBS volume data is replicated across multiple servers in a single Availability Zone to prevent the loss of data from the failure of any single component. Taking snapshots of your EBS volumes increases the durability of the data stored on your EBS volumes.
- To make a volume available outside of the Availability Zone, AWS customers can create a snapshot and restore that snapshot to a new volume anywhere in that Region.
- Customers can copy snapshots to other Regions and then restore them to new volumes there, making it easier to leverage multiple AWS Regions for geographical expansion, data center migration, and disaster recovery.
Customers can create their EBS volumes as encrypted volumes in order to meet a wide range of data-at-rest encryption requirements for regulated/audited data and applications. Amazon EBS encryption offers a straightforward encryption solution for customers' EBS resources that doesn't require them to build, maintain, and secure their own key management infrastructure. It uses AWS Key Management Service (AWS KMS) customer master keys (CMKs) when creating encrypted volumes and snapshots.
- When customers create an encrypted EBS volume and attach it to a supported instance type, data stored at rest on the volume, disk I/O, and snapshots created from the volume are all encrypted.
- Encryption operations occur on the servers that host EC2 instances, ensuring the security of both data-at-rest and data-in-transit between an instance and its attached EBS storage.
- EBS encrypts each volume with a data key using the industry-standard AES-256 algorithm.
Amazon EBS enables its clients to increase storage without any disruption to their critical workloads. Build applications that require as little as a single GB of storage, or scale up to petabytes of data — all in just a few clicks.
- Performance metrics, such as bandwidth, throughput, latency, and average queue length, are available through the AWS Management Console.
- These metrics, provided by Amazon CloudWatch, allow clients to monitor the performance of their volumes and make sure they are delivering enough performance for their applications, at no additional charge.
Amazon offers a REST management API for Amazon EBS, as well as support for Amazon EBS operations within both the AWS SDKs and the AWS CLI.
- The API actions and EBS operations are used to create, delete, describe, attach, and detach EBS volumes for your EC2 instances; to create, delete, and describe snapshots from Amazon EBS to Amazon S3; and to copy snapshots from one region to another.
- The AWS Management Console gives customers all the capabilities of the API in a browser interface.
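For example, the volume and snapshot lifecycle described above maps onto AWS CLI calls roughly as follows. This is a sketch: the Availability Zone, device name, and all resource IDs are placeholders.

```shell
# Create a 100 GiB gp2 volume in a specific Availability Zone
aws ec2 create-volume --availability-zone us-east-1a --volume-type gp2 --size 100

# Attach it to an instance in the same AZ (IDs are hypothetical)
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf

# Back the volume up to Amazon S3 as a point-in-time snapshot
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "Nightly backup"

# Copy the snapshot to another Region (the command runs against the destination Region)
aws ec2 copy-snapshot --region us-west-2 --source-region us-east-1 \
    --source-snapshot-id snap-0123456789abcdef0
```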
EBS Volume Types
As described previously, Amazon EBS provides a range of volume types that are divided into two major categories: SSD-backed storage volumes and HDD-backed storage volumes. SSD-backed storage volumes offer great price/performance characteristics for random small block workloads, such as transactional applications, whereas HDD-backed storage volumes offer the best price/performance characteristics for large block sequential workloads. You can attach and stripe data across multiple volumes of any type to increase the I/O performance available to your Amazon EC2 applications. The following table presents the storage characteristics of the current generation volume types.
SSD-backed volumes are ideal for transactional workloads, such as databases and boot volumes (performance depends primarily on IOPS).
- The performance of a block storage device is commonly measured and quoted in a unit called IOPS, short for input/output operations per second.
- SSD-backed volumes include General Purpose SSD (gp2), which balances price and performance for a wide variety of transactional data, and the highest-performance Provisioned IOPS SSD (io1) for latency-sensitive transactional workloads.
HDD-backed storage is good for throughput-intensive workloads, such as MapReduce and log processing (performance depends primarily on MB/s).
- It is optimized for large streaming workloads where throughput (measured in MiB/s) is a better performance measure than IOPS.
- HDD-backed volumes include Throughput Optimized HDD (st1) for frequently accessed, throughput-intensive workloads and the lowest-cost Cold HDD (sc1) for less frequently accessed data.
Throughput-Optimized HDD volumes are low-cost HDD volumes designed for frequent access, throughput-intensive workloads such as big data, data warehouses, and log processing.
- Throughput Optimized HDD is designed for applications that require larger storage and higher throughput, such as big data or data warehousing, where IOPS is less relevant. Much like gp2, st1 volumes use a burst model: the baseline throughput is tied to the volume size, and credits are accumulated over time.
- Volumes can be up to 16 TB with a maximum IOPS of 500 and maximum throughput of 500 MB/s. These volumes are significantly less expensive than general purpose SSD volumes.
- ST1 is backed by hard disk drives (HDDs) and is ideal for frequently accessed, throughput intensive workloads with large datasets and large I/O sizes, such as MapReduce, Kafka, log processing, data warehouse, and ETL workloads.
- Low-cost HDD volume designed for frequently accessed, throughput-intensive workloads such as big data, data warehouses, and log processing.
- st1 is the right choice when the workload defines its performance requirements in terms of throughput rather than IOPS; the volumes are backed by magnetic drives.
- Cold HDD (sc1), by contrast, is the lowest-cost option for less frequently accessed data.
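The st1 burst model can be made concrete with a small sketch. The per-TiB figures below (40 MiB/s of baseline throughput and 250 MiB/s of burst throughput per provisioned TiB) are the numbers AWS has published for st1 and are assumptions here, with both capped at the 500 MB/s maximum mentioned above.

```python
ST1_BASELINE_PER_TIB = 40   # MiB/s of baseline throughput per provisioned TiB (assumed)
ST1_BURST_PER_TIB = 250     # MiB/s of burst throughput per provisioned TiB (assumed)
ST1_MAX_THROUGHPUT = 500    # MiB/s cap for any st1 volume

def st1_baseline_mibps(size_tib: float) -> float:
    """Throughput an st1 volume can sustain indefinitely."""
    return min(ST1_BASELINE_PER_TIB * size_tib, ST1_MAX_THROUGHPUT)

def st1_burst_mibps(size_tib: float) -> float:
    """Peak throughput while burst credits last."""
    return min(ST1_BURST_PER_TIB * size_tib, ST1_MAX_THROUGHPUT)

print(st1_baseline_mibps(2))  # a 2 TiB volume sustains 80 MiB/s
print(st1_burst_mibps(2))     # and bursts to the 500 MiB/s cap
```

Larger volumes therefore earn credits faster and hit the fixed cap sooner, which is why sizing an st1 volume by required throughput rather than by capacity is common.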
Provisioned IOPS SSD (io1)
Provisioned IOPS SSD volumes are designed to meet the needs of I/O-intensive workloads, particularly database workloads that are sensitive to storage performance and consistency in random access I/O throughput.
- For customers who have an I/O-intensive workload such as databases, io1 delivers predictable and consistent I/O performance.
- Customers can use technologies such as RAID on top of multiple EBS volumes to stripe and mirror the data across multiple volumes.
- I/O-intensive NoSQL and relational databases
- io1 volumes provide predictable, high performance and are well suited for:
- Critical business applications that require sustained IOPS performance
- Large database workloads.
- io1 consistently performs at the provisioned level, up to a maximum of 20,000 IOPS.
- I3:– High I/O instances. This family includes the High Storage Instances that provide Non-Volatile Memory Express (NVMe) SSD backed instance storage optimized for low latency, very high random I/O performance, high sequential read throughput and provide high IOPS at a low cost.
- D2:– Dense-storage instances. D2 instances feature up to 48 TB of HDD-based local storage, deliver high disk throughput, and offer the lowest price per disk throughput performance on Amazon EC2.
- For workloads requiring greater network performance, many instance types support enhanced networking.
- Enhanced networking reduces the impact of virtualization on network performance by enabling a capability called Single Root I/O Virtualization (SR-IOV). This results in:
- More Packets Per Second (PPS)
- Lower latency, and
- Lower jitter
General Purpose SSD volumes offer cost-effective storage at a moderate price point, delivering strong performance that is suitable for a broad range of workloads.
- General Purpose SSD volume balances price performance for a wide variety of transactional workloads
- General-purpose SSD volumes are billed based on the amount of data space provisioned, regardless of how much data you actually store on the volume.
- Boot volumes, low-latency interactive apps, dev & test.
- General Purpose SSD delivers single-digit-millisecond latencies, which makes it a good fit for the majority of workloads.
- gp2 volumes can deliver between 100 and 10,000 IOPS.
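The 100-to-10,000 IOPS range follows from gp2's published baseline formula of the era: 3 IOPS per provisioned GiB, with a floor of 100 IOPS and a ceiling of 10,000 (later raised). A minimal sketch:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS/GiB, floored at 100, capped at 10,000."""
    return max(100, min(3 * size_gib, 10_000))

print(gp2_baseline_iops(20))    # 100: small volumes get the 100 IOPS floor
print(gp2_baseline_iops(1000))  # 3000: 3 IOPS per GiB
print(gp2_baseline_iops(5000))  # 10000: the ceiling is reached
```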
Some use cases that are a good fit for gp2, where the very highest disk performance is not critical, include:
- System boot volumes
- Low-latency interactive applications and virtual desktops
- Small- to medium-sized databases
- Development and test environments
- T2 instances are Burstable Performance Instances that provide a baseline level of CPU performance with the ability to burst above the baseline.
- M4 instances are the latest generation of General Purpose Instances. This family provides a balance of compute, memory, and network resources, and it is a good choice for many applications.
Cold HDD volumes are designed for less frequently accessed workloads, such as colder data requiring fewer scans per day. SC1 is backed by hard disk drives (HDDs) and provides the lowest cost per GB of all EBS volume types. It is ideal for less frequently accessed workloads with large, cold datasets. Similar to st1, sc1 provides a burst model.
- Volumes can be up to 16 TB with a maximum IOPS of 250 and maximum throughput of 250 MB/s. These volumes are significantly less expensive than Throughput-Optimized HDD volumes.
- COLD HDD defines performance in terms of throughput instead of IOPS. The use case for COLD HDD is noncritical, cold data workloads and is designed to support infrequently accessed data. Similar to st1, sc1 uses a burst-bucket.
- Lowest-cost HDD volume, designed for less frequently accessed workloads such as colder data requiring fewer scans per day.
Magnetic volumes have the lowest performance characteristics of all Amazon EBS volume types. Magnetic volumes are billed based on the amount of data space provisioned, regardless of how much data you actually store on the volume. A magnetic EBS volume can range in size from 1 GB to 1 TB and will average 100 IOPS, but has the ability to burst to hundreds of IOPS.
They are best suited for:
- Cold workloads where data is accessed infrequently
- Sequential reads
- Scenarios where the lowest storage cost is important
Several factors, including I/O characteristics and the configuration of your instances and volumes, can affect the performance of Amazon EBS. Customers who follow the guidance on our Amazon EBS and Amazon EC2 product detail pages typically achieve good performance out of the box. However, there are some cases where customers may need to do some tuning in order to achieve peak performance on the platform. This topic discusses general best practices as well as performance tuning that is specific to certain use cases.
- Benchmarking AWS EBS Workloads with Fio
- Oracle Orion
Benchmarking AWS EBS Workloads with Fio
One of the main components of AWS EBS performance is I/O. Applications running on an AWS EC2 instance submit read and write operations to an EBS volume. Each operation is then converted into a system call to the kernel.
- The kernel knows that the underlying file system sits on virtualized block storage, so through internal mechanisms it redirects the read/write operation to the I/O domain, where it passes through a grant-mapping process and, once mapped, is finally performed on the EBS volume.
- When customers create a new EBS volume, they need to provide the size and the type of the volume:
- General Purpose SSD (gp2),
- Provisioned IOPS SSD (io1),
- Throughput Optimized HDD (st1), Cold HDD (sc1), and Magnetic.
Tools customers can use to benchmark the performance of EBS volumes
Oracle Orion:– Is used to calibrate the I/O performance of storage systems to be used with Oracle databases. Oracle Orion is a tool for predicting the performance of an Oracle database without having to install Oracle or create a database.
- Oracle Orion is expressly designed for simulating Oracle database I/O workloads using the same I/O software stack as Oracle.
- Orion can also simulate the effect of striping performed by Oracle Automatic Storage Management.
- Orion can run tests using different I/O loads to measure performance metrics such as MBPS, IOPS, and I/O latency.
- Load is expressed in terms of the number of outstanding asynchronous I/Os.
- For random workloads, using either large or small sized I/Os, the load level is the number of outstanding I/Os.
- For large sequential workloads, the load level is a combination of the number of sequential streams and the number of outstanding I/Os per stream.
- Testing a given workload at a range of load levels can help you understand how performance is affected by load.
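A typical Orion invocation looks roughly like the following sketch. The test name and LUN file are hypothetical, and the device path depends on how the EBS volume was attached; Orion reads the devices to exercise from a `<testname>.lun` file.

```shell
# mytest.lun lists the raw devices to exercise, one per line
echo "/dev/xvdf" > mytest.lun

# Run a basic test that measures small random and large sequential I/O
./orion -run simple -testname mytest -num_disks 1
```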
RAID 1:– Mirrors two volumes together (take one disk, mirror a copy to another disk), providing redundancy.
- A RAID 1 array offers a "mirror" of your data for extra redundancy. Before creating one, customers need to decide how large the RAID array should be and how many IOPS to provision.
- The resulting size and bandwidth of a RAID 1 array is equal to the size and bandwidth of the volumes in the array.
- It's ideal when fault tolerance is more important than I/O performance, for example in a critical application.
- It is safer from the standpoint of data durability.
- It does not provide a write performance improvement, and it requires more Amazon EC2-to-Amazon EBS bandwidth than non-RAID configurations because the data is written to multiple volumes simultaneously.
RAID 5:– Requires at least 3 disks; good for reads, bad for writes. AWS does not recommend ever putting RAID 5 on EBS.
RAID 10:– Striped & Mirrored, good redundancy, good performance
AWS does not recommend RAID 5 and RAID 6 for Amazon EBS because the parity write operations of these RAID modes consume some of the IOPS available to your volumes.
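The size and bandwidth rules above reduce to simple arithmetic over a set of identical volumes. The sketch below is illustrative only, not an AWS API: RAID 0 stripes data so capacity and I/O scale with the volume count, while RAID 1 mirrors data so capacity and I/O stay at a single volume's level.

```python
def raid0_profile(volume_size_gib: int, volume_iops: int, n: int) -> dict:
    """RAID 0 stripes data: capacity and I/O scale with the number of volumes."""
    return {"size_gib": n * volume_size_gib, "iops": n * volume_iops}

def raid1_profile(volume_size_gib: int, volume_iops: int, n: int) -> dict:
    """RAID 1 mirrors data: the array's capacity and I/O equal a single
    volume's, though every write consumes instance-to-EBS bandwidth n times."""
    return {"size_gib": volume_size_gib, "iops": volume_iops}

print(raid0_profile(500, 4000, 2))  # two 500 GiB volumes striped: 1000 GiB, 8000 IOPS
print(raid1_profile(500, 4000, 2))  # two 500 GiB volumes mirrored: 500 GiB, 4000 IOPS
```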
fio:– fio was created to allow benchmarking of specific disk I/O workloads. It can issue its I/O requests using one of many synchronous and asynchronous I/O APIs, and can also use APIs that allow many I/O requests to be issued with a single call.
- fio lets customers tune how large the files it uses are, at what offsets in those files I/O happens, and how much delay, if any, there is between issuing I/O requests.
- It also controls what filesystem sync calls, if any, are issued between each I/O request.
- A sync call tells the operating system to make sure that any information that is cached in memory has been saved to disk and can thus introduce a significant delay.
- The options to fio allow customers to issue very precisely defined IO patterns and see how long it takes their disk subsystem to complete these tasks.
- fio is packaged in the standard repository for Fedora 8 and is available for openSUSE through the openSUSE Build Service.
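As an example, a random-read benchmark against an attached EBS volume might look like the following sketch; the device name /dev/xvdf is an assumption and depends on how the volume was attached.

```shell
# 16 KiB random reads with direct I/O, 64 outstanding requests, for 60 seconds
sudo fio --filename=/dev/xvdf --name=ebs-randread --rw=randread --bs=16k \
    --ioengine=libaio --direct=1 --iodepth=64 --runtime=60 --time_based \
    --group_reporting
```

Note that write benchmarks (for example --rw=randwrite) against a raw device destroy any data on it, so they should only be run against volumes that hold no needed data.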