AWS Auto Scaling provides a simple, powerful user interface that lets AWS customers build scaling plans for resources including Amazon EC2 instances and Spot Fleets, Amazon ECS tasks, Amazon DynamoDB tables and indexes, and Amazon Aurora Replicas. The AWS Auto Scaling console provides a single user interface for the automatic scaling features of multiple AWS services. Using a scaling plan, customers can configure and manage scaling of their resources. The scaling plan uses dynamic and predictive scaling to automatically scale the application's resources, adding the required computing power to handle the load on the application and removing it when it is no longer required. There are two ways to scale automatically: dynamic scaling and predictive scaling.
- Dynamic scaling creates target tracking scaling policies for the scalable resources in your application. This lets your scaling plan add and remove capacity for each resource as required to maintain resource utilization at the specified target value.
- Predictive Scaling looks at historic traffic patterns and forecasts them into the future to schedule changes in the number of EC2 instances at the appropriate times going forward.
- Predictive Scaling uses machine learning models to forecast daily and weekly patterns.
Auto Scaling Features
AWS Auto Scaling continually calculates the appropriate scaling adjustments and immediately adds and removes capacity as needed to keep metrics on target. AWS target tracking scaling policies are self-optimizing and learn customers' actual load patterns to minimize fluctuations in resource capacity.
- AWS Auto Scaling allows customers to build scaling plans that automate how groups of different resources respond to changes in demand.
- AWS Auto Scaling automatically creates all of the scaling policies and sets targets for customers based on their preference.
- AWS Auto Scaling monitors customers' applications and automatically adds or removes capacity from their resource groups in real time as demand changes.
Predictive Scaling predicts future traffic, including regularly-occurring spikes, and provisions the right number of EC2 instances in advance of predicted changes. Predictive Scaling’s machine learning algorithms detect changes in daily and weekly patterns, automatically adjusting their forecasts. Auto Scaling enhanced with Predictive Scaling delivers faster, simpler, and more accurate capacity provisioning resulting in lower cost and more responsive applications.
- Load forecasting: AWS Auto Scaling analyzes up to 14 days of history for a specified load metric and forecasts the future demand for the next two days.
- Scheduled scaling actions: AWS Auto Scaling schedules the scaling actions that proactively add and remove resource capacity to reflect the load forecast. At the scheduled time, AWS Auto Scaling updates the resource’s minimum capacity with the value specified by the scheduled scaling action.
- Maximum capacity behavior: Each resource has a minimum and a maximum capacity limit, and the value specified by a scheduled scaling action is expected to lie between them.
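As a rough sketch of how a load forecast turns into scheduled minimum-capacity updates, the following Python computes, for each forecast point, the smallest instance count that keeps utilization at or below a target. The function name, the per-instance capacity, and the forecast values are all hypothetical; the real service derives its schedule from CloudWatch history.

```python
import math

def schedule_min_capacity(forecast, target_utilization, per_instance_capacity):
    """For each (hour, load) forecast point, return the minimum capacity
    a scheduled scaling action might set so that
    load / (n * per_instance_capacity) <= target_utilization."""
    actions = []
    for hour, load in forecast:
        n = math.ceil(load / (per_instance_capacity * target_utilization))
        actions.append((hour, n))
    return actions

# Hypothetical forecast: (hour of day, requests/sec); each instance
# handles 100 req/s at full utilization, target utilization 50%.
forecast = [(8, 300.0), (12, 900.0), (20, 150.0)]
print(schedule_min_capacity(forecast, 0.5, 100.0))  # [(8, 6), (12, 18), (20, 3)]
```

This only raises the minimum capacity ahead of forecast load; as noted above, target tracking still handles the moment-to-moment adjustments.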
AWS Auto Scaling automatically creates target tracking scaling policies for all of the resources in the scaling plan, using the customer selected scaling strategy to set the target values for each metric.
- AWS Auto Scaling also creates and manages the Amazon CloudWatch alarms that trigger scaling adjustments for each of the resources.
- AWS Auto Scaling continually monitors customers' applications to make sure that they are operating at the desired performance levels. When demand spikes, AWS Auto Scaling automatically increases the capacity of constrained resources.
Using AWS Auto Scaling, customers can select one of three predefined optimization strategies, designed to optimize performance, optimize costs, or balance the two. Scaling plans work well for applications with traffic patterns such as:
- Cyclical traffic, such as high use of resources during regular business hours and low use of resources overnight.
- On-and-off workload patterns, such as batch processing, testing, or periodic analysis.
- Variable traffic patterns, such as marketing campaigns with periods of spiky growth.
AWS Auto Scaling scans customers' environments and automatically discovers the scalable cloud resources underlying their application. Using AWS Auto Scaling, customers can set target utilization levels for multiple resources in a single, intuitive interface.
- Customers can quickly see the average utilization of all of their scalable resources without having to navigate to other consoles.
- For services such as Amazon EC2 and Amazon DynamoDB, AWS Auto Scaling manages resource provisioning for all of the EC2 Auto Scaling groups and database tables in the customer's application.
AWS Auto Scaling Plan
Predictive Scaling is an AWS Auto Scaling feature that looks at historic traffic patterns and forecasts them into the future to schedule changes in the number of EC2 instances at the appropriate times. Predictive Scaling uses machine learning models to forecast daily and weekly patterns, and works in conjunction with target tracking to make EC2 capacity changes more responsive to a customer's incoming application traffic.
- By predicting traffic changes, Predictive Scaling provisions EC2 instances in advance of changing traffic, making AWS Auto Scaling faster and more accurate.
- While Predictive Scaling sets the minimum capacity for a customer's application based on forecasted traffic, target tracking changes the actual capacity based on the actual traffic at the moment.
- Target tracking works to track the desired capacity utilization levels over varying traffic conditions and addresses unpredicted traffic spikes and other fluctuations.
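The proportional behavior of target tracking can be illustrated with a small Python sketch: capacity scales roughly in proportion to how far the metric sits from its target. This is an approximation for intuition, not the service's exact alarm-driven mechanics.

```python
import math

def target_tracking_desired(current_capacity, metric_value, target_value):
    """Approximate the capacity target tracking converges toward:
    scale current capacity by the ratio of the actual metric to its target."""
    return math.ceil(current_capacity * metric_value / target_value)

# 10 instances at 75% average CPU with a 50% target -> scale out to 15.
print(target_tracking_desired(10, 75.0, 50.0))  # 15
# 10 instances at 25% average CPU -> scale in toward 5.
print(target_tracking_desired(10, 25.0, 50.0))  # 5
```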
Predictive Scaling and target tracking are configured together by the user to generate a scaling plan. An AWS Auto Scaling plan is a collection of scaling instructions for multiple AWS resources. Customers configure a scaling plan by first selecting all the EC2 resources underlying their application in AWS Auto Scaling.
- The resource utilization metric and the incoming traffic metric are the key parameters for the scaling plan.
- The incoming traffic metric is used by Predictive Scaling to generate traffic forecasts. Based on these forecasts, Predictive Scaling then schedules future scaling actions to configure minimum capacity.
A launch configuration is an instance configuration template that an Auto Scaling group uses to launch EC2 instances. When AWS customers create a launch configuration, they also specify information for the instances: the ID of the Amazon Machine Image (AMI), the instance type, a key pair, one or more security groups, and a block device mapping. When creating an Auto Scaling group, a launch configuration, a launch template, or an EC2 instance must be specified.
- During the creation of an Auto Scaling group using an EC2 instance, Amazon EC2 Auto Scaling automatically creates a launch configuration for them and associates it with the Auto Scaling group.
A launch template is similar to a launch configuration in that it specifies instance configuration information, including the ID of the Amazon Machine Image (AMI), the instance type, a key pair, security groups, and other parameters.
- Defining a launch template instead of a launch configuration allows you to have multiple versions of a template.
- With launch templates, customers are able to provision capacity across multiple instance types using both On-Demand Instances and Spot Instances to achieve the desired scale, performance, and cost.
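A minimal sketch of the request body one might pass to boto3's ec2.create_launch_template is shown below; the template name, AMI ID, key pair, and security group ID are placeholders, not real identifiers.

```python
# Sketch of a create_launch_template request (placeholder identifiers).
launch_template = {
    "LaunchTemplateName": "web-tier",                  # hypothetical name
    "LaunchTemplateData": {
        "ImageId": "ami-0123456789abcdef0",            # placeholder AMI ID
        "InstanceType": "t3.micro",
        "KeyName": "my-key-pair",                      # placeholder key pair
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder group
    },
}
print(launch_template["LaunchTemplateData"]["InstanceType"])  # t3.micro
```

Later calls to create_launch_template_version with the same template name add new versions, which is what distinguishes templates from launch configurations.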
An Auto Scaling group contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management. An Auto Scaling group also enables customers to use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies. Both maintaining the number of instances in an Auto Scaling group and automatic scaling are the core functionality of the Amazon EC2 Auto Scaling service.
- The Auto Scaling group continues to maintain a fixed number of instances even if an instance becomes unhealthy. If an instance becomes unhealthy, the group terminates the unhealthy instance and launches another instance to replace it.
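To make the minimum/desired/maximum relationship concrete, here is the shape of a create_auto_scaling_group request as one might build it for boto3; the group, template, and subnet names are placeholders.

```python
# Sketch of a create_auto_scaling_group request (placeholder identifiers).
asg = {
    "AutoScalingGroupName": "web-asg",    # hypothetical group name
    "LaunchTemplate": {"LaunchTemplateName": "web-tier", "Version": "$Latest"},
    "MinSize": 2,          # the group never runs fewer healthy instances
    "MaxSize": 10,         # upper bound for scale-out
    "DesiredCapacity": 2,  # starting point; scaling policies adjust this
    "HealthCheckType": "ELB",       # replace instances the load balancer marks unhealthy
    "HealthCheckGracePeriod": 300,  # seconds to wait before first health check
    "VPCZoneIdentifier": "subnet-0123456789abcdef0",  # placeholder subnet
}
# Desired capacity must always sit inside the min/max bounds.
assert asg["MinSize"] <= asg["DesiredCapacity"] <= asg["MaxSize"]
```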
An AWS Auto Scaling group can launch On-Demand Instances, Spot Instances, or both. Spot Instances provide customers with access to unused Amazon EC2 capacity at steep discounts relative to On-Demand prices. For more information, see Amazon EC2 Spot Instances. There are key differences between Spot Instances and On-Demand Instances:
- The price for Spot Instances varies based on demand.
- Amazon EC2 can terminate an individual Spot Instance as the availability of, or price for, Spot Instances changes.
Auto Scaling Resources
AWS customers have multiple options for scaling resources. To configure automatic scaling for multiple resources across multiple services, customers use AWS Auto Scaling to create a scaling plan for the resources underlying their application. AWS Auto Scaling is also used to configure predictive scaling for EC2 resources.
- Amazon EC2 Auto Scaling helps AWS clients ensure that they have the correct number of Amazon EC2 instances available to handle the load for their application.
- In addition, Application Auto Scaling can scale Amazon ECS services, Amazon EC2 Spot fleets, Amazon EMR clusters, Amazon AppStream 2.0 fleets, provisioned read and write capacity for Amazon DynamoDB tables and global secondary indexes, Amazon Aurora Replicas, and Amazon SageMaker endpoint variants.
EC2 SPOT FLEET REQUESTS
Amazon EC2 Spot Fleet requests: Launch or terminate instances from a Spot Fleet request, or automatically replace instances that get interrupted for price or capacity reasons. Automatic scaling is the ability to increase or decrease the target capacity of the customer Spot Fleet automatically based on demand. A Spot Fleet can either launch instances (scale out) or terminate instances (scale in), within the range that was specified, in response to one or more scaling policies. Spot Fleet supports the following types of automatic scaling:
- Target tracking scaling – Increase or decrease the current capacity of the fleet based on a target value for a specific metric. This is similar to the way that your thermostat maintains the temperature of your home: you select a temperature and the thermostat does the rest.
- Step scaling – Increase or decrease the current capacity of the fleet based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
- Scheduled scaling – Increase or decrease the current capacity of the fleet based on the date and time.
The scaling policies created for a Spot Fleet support a cooldown period, which is the number of seconds after a scaling activity completes during which previous trigger-related scaling activities can influence future scaling events.
- Scale based on instance metrics with a 1-minute frequency to ensure a faster response to utilization changes. Scaling on metrics with a 5-minute frequency can result in slower response times and scaling on stale metric data.
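As an illustration of step adjustments and the cooldown period, the following is the shape of an Application Auto Scaling put_scaling_policy request for a Spot Fleet using step scaling; the fleet request ID and the breach thresholds are hypothetical.

```python
# Sketch: step scaling policy for a Spot Fleet (placeholder fleet ID).
policy = {
    "PolicyName": "fleet-step-scale-out",
    "ServiceNamespace": "ec2",
    "ResourceId": "spot-fleet-request/sfr-0123456789abcdef0",
    "ScalableDimension": "ec2:spot-fleet-request:TargetCapacity",
    "PolicyType": "StepScaling",
    "StepScalingPolicyConfiguration": {
        "AdjustmentType": "ChangeInCapacity",
        "Cooldown": 300,  # seconds during which prior activity still counts
        "StepAdjustments": [
            # alarm breached by 0-10 units: add 1 instance
            {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 10,
             "ScalingAdjustment": 1},
            # alarm breached by more than 10 units: add 3 instances
            {"MetricIntervalLowerBound": 10, "ScalingAdjustment": 3},
        ],
    },
}
print(len(policy["StepScalingPolicyConfiguration"]["StepAdjustments"]))  # 2
```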
DYNAMODB AUTO SCALING
Amazon DynamoDB auto scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on customers' behalf, in response to actual traffic patterns. This enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling. When the workload decreases, Application Auto Scaling decreases the throughput.
Enabling auto scaling on a DynamoDB table or a global secondary index lets it increase or decrease its provisioned read and write capacity to handle increases in traffic without throttling. With Application Auto Scaling, customers create a scaling policy for a table or a global secondary index.
- The scaling policy contains a target utilization, the percentage of consumed provisioned throughput at a point in time. Application Auto Scaling uses a target tracking algorithm to adjust the provisioned throughput of the table (or index) upward or downward in response to actual workloads, so that the actual capacity utilization remains at or near the customer target utilization.
- DynamoDB auto scaling also supports global secondary indexes. Every global secondary index has its own provisioned throughput capacity, separate from that of its base table.
- DynamoDB auto scaling modifies provisioned throughput settings only when the actual workload stays elevated (or depressed) for a sustained period of several minutes.
When AWS clients create a scaling policy, Application Auto Scaling creates a pair of Amazon CloudWatch alarms on their behalf. Each pair represents the upper and lower boundaries for provisioned throughput settings. To enable DynamoDB auto scaling for the ProductCatalog table, clients create a scaling policy. This policy specifies:
- The table or global secondary index that the clients want to manage.
- Which capacity type to manage (read capacity or write capacity).
- The upper and lower boundaries for the provisioned throughput settings.
- The customer's target utilization.
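Put together, the policy elements above map onto Application Auto Scaling's register_scalable_target and put_scaling_policy requests roughly as follows, using the ProductCatalog table from the text; the capacity bounds and target utilization are hypothetical values.

```python
# Sketch: enable auto scaling on ProductCatalog's read capacity.
scalable_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/ProductCatalog",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",  # read capacity
    "MinCapacity": 5,    # lower boundary for provisioned reads
    "MaxCapacity": 500,  # upper boundary
}
policy = {
    "PolicyName": "ProductCatalogReadScaling",
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/ProductCatalog",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # target utilization, percent
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
}
```

Write capacity would be managed by a second pair of requests with the `dynamodb:table:WriteCapacityUnits` dimension.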
EC2 AUTO SCALING
Amazon EC2 Auto Scaling groups enable customers to launch or terminate EC2 instances in an Auto Scaling group.
- Amazon EC2 Auto Scaling scales out the group (adds more instances) to deal with high demand at peak times, and scales in the group (runs fewer instances) to reduce costs during periods of low utilization.
- A scaling policy instructs Amazon EC2 Auto Scaling to track a specific CloudWatch metric, and it defines what action to take when the associated CloudWatch alarm is in ALARM.
- The metrics that are used to trigger an alarm are an aggregation of metrics coming from all of the instances in the Auto Scaling group.
Amazon EC2 Auto Scaling supports the following types of scaling policies:
- Target tracking scaling—Increase or decrease the current capacity of the group based on a target value for a specific metric.
- Step scaling—Increase or decrease the current capacity of the group based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
- Simple scaling—Increase or decrease the current capacity of the group based on a single scaling adjustment.
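For example, a target tracking policy keeping average CPU at 50% might be expressed as the following put_scaling_policy request for the EC2 Auto Scaling API; the group name is a placeholder.

```python
# Sketch: target tracking policy for an Auto Scaling group (placeholder name).
policy = {
    "AutoScalingGroupName": "web-asg",  # hypothetical group name
    "PolicyName": "keep-cpu-at-50",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            # average CPU across all instances in the group
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,  # percent
    },
}
```

With this policy type, the CloudWatch alarms that drive scaling are created and managed by the service rather than by the customer.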
ECS AUTO SCALING
Automatic scaling is the ability to increase or decrease the desired count of tasks in a customer's Amazon ECS service automatically. Amazon ECS leverages the Application Auto Scaling service to provide this functionality. Amazon ECS publishes CloudWatch metrics with the service's average CPU and memory usage, so customers can use these and other CloudWatch metrics to scale out the service to deal with high demand at peak times, and to scale in the service to reduce costs during periods of low utilization. Amazon ECS Service Auto Scaling supports:
- Target tracking scaling policies
- Scheduled scaling
The Application Auto Scaling service needs permission to describe the customer's Amazon ECS services and CloudWatch alarms, and to modify the service's desired count on the customer's behalf. Service Auto Scaling is a combination of the Amazon ECS, CloudWatch, and Application Auto Scaling APIs:
- Services are created and updated with Amazon ECS,
- Alarms are created with CloudWatch, and
- Scaling policies are created with Application Auto Scaling.
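A sketch of two of the Application Auto Scaling requests involved: registering the service's desired count as a scalable target, then scheduling a scale-up. The cluster name, service name, and schedule are hypothetical.

```python
# Sketch: register an ECS service's desired count, then schedule scaling.
scalable_target = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/my-cluster/my-service",  # placeholder names
    "ScalableDimension": "ecs:service:DesiredCount",
    "MinCapacity": 1,
    "MaxCapacity": 10,
}
scheduled_action = {
    "ServiceNamespace": "ecs",
    "ScheduledActionName": "morning-scale-up",
    "ResourceId": "service/my-cluster/my-service",
    "ScalableDimension": "ecs:service:DesiredCount",
    "Schedule": "cron(0 8 * * ? *)",  # every day at 08:00 UTC
    # raise the floor to 4 tasks for the business day
    "ScalableTargetAction": {"MinCapacity": 4, "MaxCapacity": 10},
}
```

A target tracking policy on `ECSServiceAverageCPUUtilization` would then handle demand-driven changes between those bounds.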
AURORA AUTO SCALING
Aurora Auto Scaling dynamically adjusts the number of Aurora Replicas provisioned for an Aurora DB cluster using single-master replication. Aurora Auto Scaling is available for both Aurora MySQL and Aurora PostgreSQL. Aurora Auto Scaling enables a customer's Aurora DB cluster to handle sudden increases in connectivity or workload.
- When the connectivity or workload decreases, Aurora Auto Scaling removes unnecessary Aurora Replicas.
- The scaling policy defines the minimum and maximum number of Aurora Replicas that Aurora Auto Scaling can manage.
- Customers define a scaling policy and apply it to an Aurora DB cluster.
Aurora Auto Scaling uses a scaling policy to adjust the number of Aurora Replicas in an Aurora DB cluster. Aurora Auto Scaling has the following components:
- A service-linked role
- Target metric – A predefined or custom metric and a target value for the metric, specified in a target-tracking scaling policy configuration.
- Minimum and maximum capacity – Customers specify the minimum and maximum number of Aurora Replicas (0–15) to be managed by Application Auto Scaling.
- A cooldown period – A cooldown period blocks subsequent scale-in or scale-out requests until the period expires, slowing the deletion of Aurora Replicas in the Aurora DB cluster.
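The components above combine into a target-tracking policy like the following put_scaling_policy sketch, which keeps average reader CPU near 60% for a hypothetical cluster named my-aurora-cluster.

```python
# Sketch: Aurora replica auto scaling policy (placeholder cluster name).
policy = {
    "PolicyName": "reader-cpu-60",
    "ServiceNamespace": "rds",
    "ResourceId": "cluster:my-aurora-cluster",        # placeholder cluster
    "ScalableDimension": "rds:cluster:ReadReplicaCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 60.0,  # average reader CPU, percent
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "RDSReaderAverageCPUUtilization"
        },
        "ScaleInCooldown": 300,   # block further scale-in for 5 minutes
        "ScaleOutCooldown": 300,  # block further scale-out for 5 minutes
    },
}
```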