Increase Instance Utility
The intent of auto scaling is to improve instance utility and service availability. Rather than provisioning for peak needs, AWS can dynamically adjust the auto scaling group (ASG) size based on demand. This page explains the approach and provides a basic configuration for dynamic auto scaling.
Below is an example of an ASG statically provisioned to meet peak demand. Throughout the day and week, regardless of demand, the size is set to a fixed value. Note: the Min, Desired, and Max instance counts all have the same value.
Let's assume load average is a strong signal for instance utility. From the graphs below, assuming at least dual-core instance types, the ASG is underutilized. For example, a fully utilized instance with 2 cores should have a load average in the range 2-4.
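The rule of thumb above can be sketched as a small check. This is a minimal illustration, not part of any AWS API: the function name `classify_load` is hypothetical, and the "1x to 2x the core count" band is an assumption generalized from the 2-core example (range 2-4).

```python
def classify_load(load_avg: float, cores: int) -> str:
    """Classify a load average relative to the instance's core count.

    Assumption: "fully utilized" means a load average between 1x and 2x
    the number of cores, generalized from the 2-core example (range 2-4).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if load_avg < cores:
        return "underutilized"
    if load_avg <= 2 * cores:
        return "fully utilized"
    return "overloaded"


print(classify_load(0.5, 2))  # underutilized
print(classify_load(3.0, 2))  # fully utilized
```

Normalizing by core count keeps the same check usable across instance types.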
In general, to increase utilization, the first step is to identify the limiting system/application resource. This is usually accomplished with a load (squeeze) test. Once the limiting resource is identified, the goal is to size the ASG with enough capacity, plus some additional headroom, to meet demand with respect to the resource. With dynamic auto scaling, throughout the period (minute, hour, day, week), the ASG is constantly adjusting size according to the resource. For example, the graph below is the same ASG using load average to auto scale.
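As a rough sketch of the sizing step described above, the required group size can be computed from peak demand, the per-instance capacity found in the load test, and a headroom percentage. The function name, the 20% default headroom, and the sample figures are all hypothetical; real values should come from your own squeeze test.

```python
def required_instances(peak_demand: int,
                       per_instance_capacity: int,
                       headroom_pct: int = 20) -> int:
    """Instances needed to serve peak_demand plus extra headroom.

    Demand and capacity must use the same unit for the limiting resource
    (for example, requests per second). Integer arithmetic avoids
    floating-point rounding surprises when taking the ceiling.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    needed = peak_demand * (100 + headroom_pct)
    available = per_instance_capacity * 100
    # Ceiling division: round up so capacity is never short.
    return max(1, -(-needed // available))


# Hypothetical figures: 1000 units of peak demand, 25 units per instance.
print(required_instances(1000, 25))  # 48
```

With dynamic auto scaling, the same calculation is effectively repeated throughout the day against current demand rather than once against the peak.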
Note that the load average now ranges between 1 and 3. Also, the duration of time spent near a load average of 3 is much longer. This configuration has a much higher instance utility. In an optimal configuration, there would be low variance throughout the day. The above example used m1.large instances (2 cores), scaling down by 10% when load average dropped below 1 and scaling up by 10% when load average exceeded 3. The setup also had the min set to 9 instances, which is the reason for the large variance during a portion of the day. To close the gap (variance) with more aggressive scaling down, set the min to a lower number.
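The policy in this example can be expressed as a single scaling decision. The sketch below is an illustration only, not how AWS evaluates policies internally: the function name, the maximum of 40, and the choice to round the 10% step up to at least one instance are assumptions.

```python
def next_group_size(current: int, load_avg: float,
                    min_size: int, max_size: int,
                    low: float = 1.0, high: float = 3.0,
                    step_pct: int = 10) -> int:
    """Return the ASG size after one evaluation of the load-average policy.

    Scales up by step_pct when load_avg exceeds `high`, down by step_pct
    when it drops below `low`, and clamps the result to [min_size, max_size].
    """
    # Assumption: round the percentage step up, and move by at least 1.
    step = max(1, -(-current * step_pct // 100))
    if load_avg > high:
        target = current + step
    elif load_avg < low:
        target = current - step
    else:
        target = current
    return max(min_size, min(max_size, target))


print(next_group_size(20, 3.5, min_size=9, max_size=40))  # 22
print(next_group_size(9, 0.5, min_size=9, max_size=40))   # 9 (held at min)
```

The second call shows the effect described above: once the group reaches the min of 9, it cannot shrink further even though load is low, so utilization drops.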
To further improve instance utility, consider auto scaling on an application metric. The previous example used system load average, a measure of CPU queue length. The problem with scaling on a single system measurement is that it assumes there is a single limiting resource. A measurement that takes multiple limiting resources into consideration is probably more accurate. For example, requests per second (RPS) can be used to measure the average number of requests an instance can serve without impacting quality of service (QoS). The challenge is to find a strong correlation between RPS and QoS. Below are two graphs showing such a correlation. The first graph is RPS per instance, followed by aggregate system queuing, a proxy for QoS.
We can use this information to configure a much more specific scaling policy. From the graphs, queuing increases when the application exceeds 25 RPS. To avoid queuing and improve the QoS, we define an auto scaling policy as follows: if RPS exceeds 20, scale up by 10%; if RPS falls below 10, scale down by 10%. Below are the graphs, with dynamic auto scaling, of total RPS, RPS per instance, and total queuing.
Queuing still occurs, but is mitigated by the auto scaling policy. Note that queuing occurs more frequently in the morning, as the farm scales up to meet initial demand.
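To see why queuing can persist while the farm catches up, the RPS policy can be simulated over a series of evaluation periods. This is a simplified sketch under several assumptions: the thresholds of 20 and 10 are treated as per-instance RPS, the 10% step is rounded up to at least one instance, one scaling action happens per period, and the traffic numbers are invented for illustration.

```python
def simulate_rps_policy(total_rps_series, start_size,
                        min_size=1, max_size=100,
                        high=20, low=10, step_pct=10):
    """Apply the RPS scaling policy once per period; return the size history."""
    size = start_size
    history = []
    for total_rps in total_rps_series:
        per_instance = total_rps / size
        # Assumption: round the percentage step up, and move by at least 1.
        step = max(1, -(-size * step_pct // 100))
        if per_instance > high:
            size = min(max_size, size + step)
        elif per_instance < low:
            size = max(min_size, size - step)
        history.append(size)
    return history


# Invented traffic: a morning ramp from 100 to 300 total RPS, then a drop.
print(simulate_rps_policy([100, 300, 300, 300, 50], start_size=10))
# [10, 11, 13, 15, 13]
```

In this run the per-instance rate stays above 25 RPS for two periods after the ramp (30, then about 27), which is the kind of lag that shows up as morning queuing. A larger step or a shorter evaluation period would close the gap faster, at the cost of more scaling churn.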
A Netflix Original Production