Amazon EC2 Auto Scaling is a service that automatically adjusts the number of Amazon EC2 instances in a group based on conditions that you specify.
Here are the key points:
Automated Scaling: Amazon EC2 Auto Scaling automates the process of launching (scaling out) and terminating (scaling in) Amazon EC2 instances based on the traffic demand for your application.
Elasticity and Scalability: It helps ensure that you have the correct number of EC2 instances available to handle the application load, providing elasticity and scalability to your infrastructure. An Auto Scaling group (ASG) can span multiple Availability Zones within a single Region, but it cannot span multiple Regions.
Auto Scaling Groups (ASGs): You create collections of EC2 instances, called Auto Scaling groups. ASGs allow you to specify the minimum and maximum number of instances, ensuring the group never goes below or above a certain size.
Desired Capacity: You can also specify a desired capacity for your ASG, and Amazon EC2 Auto Scaling will work to maintain this desired number of instances.
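The relationship between minimum size, maximum size, and desired capacity can be sketched in a few lines. This is a simplified illustration, not the service's implementation: the names `GroupSize`, `clamp_desired`, and `instances_to_change` are hypothetical, and the clamping behavior is an assumption of the sketch (the real service may instead reject an out-of-range request).

```python
from dataclasses import dataclass


@dataclass
class GroupSize:
    """Hypothetical container for the three size settings of a group."""
    min_size: int
    max_size: int
    desired_capacity: int


def clamp_desired(group: GroupSize, requested: int) -> int:
    """Limit a requested capacity to the group's min/max bounds (assumed behavior)."""
    return max(group.min_size, min(group.max_size, requested))


def instances_to_change(group: GroupSize, running: int) -> int:
    """Positive result: instances to launch. Negative result: instances to terminate."""
    return clamp_desired(group, group.desired_capacity) - running
```

For a group with a minimum of 2, a maximum of 10, and a desired capacity of 4, three running instances would mean one launch, and six running instances would mean two terminations.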
Scaling Policies: Scaling policies control when Auto Scaling launches or terminates instances. Each policy defines the conditions under which a scaling action occurs, for example when CPU utilization or another metric crosses a threshold.
Scaling Plans: Scaling plans define the triggers and conditions for when instances should be provisioned or de-provisioned. They allow you to define a more comprehensive strategy for scaling your infrastructure.
Launch Configuration: A launch configuration is a template used to create new EC2 instances. It includes parameters such as the instance type, AMI, key pair, security groups, and other configuration settings. AWS now recommends launch templates over launch configurations for new Auto Scaling groups.
Health check types
Amazon EC2 Auto Scaling can determine the health status of an InService instance by using one or more of the following health checks:
Amazon EC2 status checks and scheduled events: Checks that the instance is running, and checks for underlying hardware or software issues that might impair the instance. This is the default health check type for an Auto Scaling group.
Elastic Load Balancing health checks: Checks whether the load balancer reports the instance as healthy, confirming whether the instance is available to handle requests. To run this health check type, you must turn it on for your Auto Scaling group.
VPC Lattice health checks: Checks whether VPC Lattice reports the instance as healthy, confirming whether the instance is available to handle requests. To run this health check type, you must turn it on for your Auto Scaling group.
Custom health checks: Checks for any other problems that might indicate instance health issues, according to your custom health checks.
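One way to picture how several health check types combine is shown below. It is a minimal sketch under stated assumptions: the check names (`"EC2"`, `"ELB"`), the function `is_healthy`, and the rule that a missing result counts as passing are all hypothetical choices made for illustration. The sketch assumes that an instance is treated as unhealthy as soon as any enabled check reports it unhealthy.

```python
from typing import Dict, Set

# Assumption: only the EC2 check is enabled unless others are turned on.
DEFAULT_CHECKS: Set[str] = {"EC2"}


def is_healthy(results: Dict[str, bool], enabled: Set[str] = DEFAULT_CHECKS) -> bool:
    """Return True only if every enabled check passes.

    results maps a check name to True (healthy) or False (unhealthy).
    A check with no reported result is treated as passing (sketch assumption).
    """
    return all(results.get(name, True) for name in enabled)
```

With only the default check enabled, a failing load balancer check is ignored; once `"ELB"` is added to the enabled set, the same failure marks the instance unhealthy.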
Health check grace period for an Auto Scaling group
When an Amazon EC2 Auto Scaling health check determines that an InService instance is unhealthy, it replaces it with a new instance. The health check grace period specifies the minimum amount of time (in seconds) to keep a new instance in service before terminating it if it's found to be unhealthy.
For example, you might want Amazon EC2 Auto Scaling to take no action when Elastic Load Balancing health checks fail only because the instance is still initializing. Elastic Load Balancing health checks run in parallel, starting when the instance is registered with the load balancer. The grace period prevents Amazon EC2 Auto Scaling from marking your newly launched instances Unhealthy and terminating them unnecessarily if they don't immediately pass these health checks after they enter the InService state.
In the console, by default, the health check grace period is 300 seconds when you create an Auto Scaling group. Its default value is 0 seconds when you create an Auto Scaling group using the AWS CLI or an SDK. A value of 0 turns off the health check grace period.
Setting this value too high reduces the effectiveness of the Amazon EC2 Auto Scaling health checks. If you use lifecycle hooks for instance launch, you can set the health check grace period to 0. With lifecycle hooks, Amazon EC2 Auto Scaling provides a way to make sure that instances are always initialized before they enter the InService state.
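The effect of the grace period can be illustrated with a small decision function. This is a simplified sketch, not the service's actual logic: the function name `should_replace` and its parameters are hypothetical, and it assumes that the timer starts when the instance enters the InService state and that all failed checks are ignored until the grace period has elapsed.

```python
def should_replace(seconds_in_service: float,
                   check_failed: bool,
                   grace_period: int = 0) -> bool:
    """Decide whether a failed health check should trigger replacement.

    grace_period is in seconds; 0 disables the grace period, so a
    failure is acted on immediately (matching the CLI/SDK default).
    """
    if not check_failed:
        return False
    # Failures during the grace period are ignored in this sketch.
    return seconds_in_service >= grace_period
```

With the console default of 300 seconds, a failure reported 120 seconds after the instance enters service would be ignored, while the same failure with a grace period of 0 would trigger replacement at once.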
The EC2 instances in an Auto Scaling group have a path, or lifecycle, that differs from that of other EC2 instances. The lifecycle starts when the Auto Scaling group launches an instance and puts it into service. The lifecycle ends when you terminate the instance, or the Auto Scaling group takes the instance out of service and terminates it.
The following illustration shows the transitions between instance states in the Amazon EC2 Auto Scaling lifecycle.
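As a rough textual companion to the lifecycle description, the core path can be modeled as a small state machine. This sketch is deliberately incomplete: apart from InService, the state names are not given in the text above and are assumed here, and states for standby, detaching, and lifecycle hook waits are omitted. The names `TRANSITIONS` and `advance` are hypothetical.

```python
from typing import Dict, Set

# Simplified subset of lifecycle states (assumed names, main path only).
TRANSITIONS: Dict[str, Set[str]] = {
    "Pending": {"InService"},       # launched, then put into service
    "InService": {"Terminating"},   # scale-in, failed health check, or manual termination
    "Terminating": {"Terminated"},
    "Terminated": set(),            # end of the lifecycle
}


def advance(state: str, target: str) -> str:
    """Move to target if the transition is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"cannot move from {state} to {target}")
    return target
```

An instance walks the path Pending, InService, Terminating, Terminated; skipping a step, such as moving from Pending straight to Terminated, is rejected by this model.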
Manual scaling: You can manually adjust the number of EC2 instances in your Auto Scaling group at any time. This process of changing the instance count manually is referred to as manual scaling. Manual scaling is an alternative to auto scaling, especially if you want to make one-time capacity changes.
Scheduled scaling: With scheduled scaling, you can set up automatic scaling for your application based on predictable load changes. You create scheduled actions that increase or decrease your group's desired capacity at specific times.
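The idea of scheduled actions that set desired capacity at specific times can be sketched as follows. The class `ScheduledAction` and the function `capacity_at` are hypothetical names for illustration only, and the model ignores recurrence and time zones.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ScheduledAction:
    """Hypothetical one-time action: set desired capacity at a start time."""
    start: datetime
    desired_capacity: int


def capacity_at(now: datetime, actions: List[ScheduledAction], initial: int) -> int:
    """Return the desired capacity in effect at `now`.

    The most recent action whose start time has passed wins; before
    any action has started, the initial capacity applies.
    """
    capacity = initial
    for action in sorted(actions, key=lambda a: a.start):
        if action.start <= now:
            capacity = action.desired_capacity
    return capacity
```

For instance, one action could raise capacity to 10 at 08:00 and a second could lower it to 2 at 20:00, so that a lookup at midday returns 10 and a lookup at 21:00 returns 2.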
Dynamic scaling: Dynamic scaling scales the capacity of your Auto Scaling group as traffic changes occur.
Amazon EC2 Auto Scaling supports the following types of dynamic scaling policies:
Target tracking scaling: Increase and decrease the current capacity of the group based on an Amazon CloudWatch metric and a target value. It works similarly to the way that your thermostat maintains the temperature of your home: you select a temperature and the thermostat does the rest.
Step scaling: Increase and decrease the current capacity of the group based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
Simple scaling: Increase and decrease the current capacity of the group based on a single scaling adjustment, with a cooldown period between each scaling activity.
Step scaling and simple scaling policies scale the capacity of your Auto Scaling group in predefined increments based on CloudWatch alarms. You can define separate scaling policies to handle scaling out (increasing capacity) and scaling in (decreasing capacity) when an alarm threshold is breached.
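The difference between target tracking and step scaling can be made concrete with a short sketch. Both functions are illustrative assumptions rather than the service's algorithms: `target_tracking_capacity` uses a simple proportional estimate, and `step_adjustment` picks a capacity change from hypothetical step bounds expressed relative to the alarm threshold (lower bound inclusive, upper bound exclusive, `None` meaning unbounded).

```python
import math
from typing import List, Optional, Tuple


def target_tracking_capacity(current_capacity: int,
                             metric_value: float,
                             target_value: float) -> int:
    """Estimate the capacity that brings the average metric back to target.

    Assumes the metric scales inversely with the number of instances.
    """
    return max(1, math.ceil(current_capacity * metric_value / target_value))


# (lower_bound, upper_bound, capacity_change), bounds relative to the threshold.
Step = Tuple[float, Optional[float], int]


def step_adjustment(metric_value: float, threshold: float, steps: List[Step]) -> int:
    """Return the capacity change for the step that contains the breach size."""
    breach = metric_value - threshold
    for lower, upper, change in steps:
        if breach >= lower and (upper is None or breach < upper):
            return change
    return 0  # no step matched, so no scaling action


# Example steps for a 60% CPU alarm: a larger breach adds more instances.
EXAMPLE_STEPS: List[Step] = [(0, 10, 1), (10, 20, 2), (20, None, 3)]
```

With these example steps, a metric of 65 against a threshold of 60 adds one instance, while a metric of 95 adds three. A simple scaling policy corresponds to a single adjustment followed by a cooldown, which this sketch does not model.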
Predictive scaling: Predictive scaling works by analyzing historical load data to detect daily or weekly patterns in traffic flows. It uses this information to forecast future capacity needs so Amazon EC2 Auto Scaling can proactively increase the capacity of your Auto Scaling group to match the anticipated load.
Predictive scaling is well suited for situations where you have:
Cyclical traffic, such as high use of resources during regular business hours and low use of resources during evenings and weekends
Recurring on-and-off workload patterns, such as batch processing, testing, or periodic data analysis
Applications that take a long time to initialize, causing a noticeable latency impact on application performance during scale-out events
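The general idea of forecasting from historical patterns can be illustrated with a deliberately naive model. This is not how the service produces its forecasts; it simply averages past load per hour of day and converts the forecast into an instance count. The names `hourly_forecast` and `capacity_for`, and the notion of a fixed load per instance, are assumptions of the sketch.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def hourly_forecast(history: List[Tuple[int, float]]) -> Dict[int, float]:
    """Average the observed load for each hour of day (0-23).

    history is a list of (hour_of_day, load) samples from past days.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for hour, load in history:
        totals[hour] += load
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}


def capacity_for(load: float, load_per_instance: float, min_size: int = 1) -> int:
    """Instances needed for a forecast load, never below min_size."""
    return max(min_size, math.ceil(load / load_per_instance))
```

Given samples showing heavy load at 09:00 and light load at 22:00, the forecast lets capacity be raised ahead of the morning peak instead of after the load has already arrived, which matters most for applications that take a long time to initialize.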