Autoscaling, also spelled auto scaling or auto-scaling, and sometimes also called automatic scaling, is a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load on the farm. For example, the number of servers running behind a web application may be increased or decreased automatically based on the number of active users on the site. Since such metrics may change dramatically throughout the course of the day, and servers are a limited resource that cost money to run even while idle, there is often an incentive to run "just enough" servers to support the current load while still being able to support sudden and large spikes in activity. Autoscaling is helpful for such needs, as it can reduce the number of active servers when activity is low, and launch new servers when activity is high. Autoscaling is closely related to, and builds upon, the idea of load balancing. [1] [2]
Autoscaling offers the following advantages:
Autoscaling differs from having a fixed daily, weekly, or yearly cycle of server use in that it is responsive to actual usage patterns, and thus reduces the potential downside of having too few or too many servers for the traffic load. For instance, if traffic is usually lower at midnight, then a static scaling solution might schedule some servers to sleep at night, but this might result in downtime on a night where people happen to use the Internet more (for instance, due to a viral news event). Autoscaling, on the other hand, can handle unexpected traffic spikes better. [3] [7]
In the list below, we use the terminology used by Amazon Web Services (AWS). [8] However, alternative names are noted and terminology that is specific to the names of Amazon services is not used for the names.
Name (used in AWS, [8] unless otherwise noted) | Meaning | Alternative names (used in Google Cloud Platform, [9] Microsoft Azure, [10] or other platforms) |
---|---|---|
Instance | A single server or machine that is part of the group of machines subject to autoscaling | |
Autoscaling group | The collection of instances subject to autoscaling, along with all the associated policies and state information | Managed instance group (Google Cloud Platform) |
Size | The number of instances currently part of the autoscaling group | |
Desired capacity (or desired size) | The number of instances that the autoscaling group should have at any given point in time. If the size is less than the desired size, the autoscaling group will try to launch (provision and attach) new instances. If the size is more than the desired size, the autoscaling group will try to remove (detach and terminate) instances | |
Minimum size | A number of instances below which the desired capacity is not allowed to fall | |
Maximum size | A number of instances above which the desired capacity is not allowed to rise | |
Metric | A measurement (such as CPU utilization, memory usage, network usage) associated with the autoscaling group, for which a time series of data points is generated regularly. Thresholds for metrics can be used to set autoscaling policies. Metrics can be based on aggregates of metrics for instances of the autoscaling group, or based on load balancers associated with the autoscaling group | |
Scaling policy (or autoscaling policy) | A policy that specifies a change to the autoscaling group's desired capacity (or sometimes, its minimum and maximum size) in response to metrics crossing specific thresholds. Scaling policies can have associated cooldown periods, which prevent additional scaling actions from occurring immediately after a specific scaling action. Changes to desired capacity could be incremental (increase or decrease by a specific number) or could specify a new value of the desired capacity. Policies that increase the desired capacity are called "scaling out" or "scaling up" policies, and policies that decrease the desired capacity are called "scaling in" or "scaling down" policies | |
Health check | A way for the autoscaling group to determine if the instances attached to it are functioning properly. A health check may be based on whether the instance still exists and is reachable, or it could be based on whether the instance is still registered and in service with an associated load balancer | |
Launch configuration | A description of the parameters and scripts used when launching a new instance. This includes the instance type, purchase options (such as spot versus on-demand in the case of AWS), possible availability zones for launch, machine image, and scripts to run on launch | Instance template (Google Cloud Platform) |
Manual scaling | A scaling action executed manually | |
Scheduled scaling | A scaling policy that is executed at a specific time, for instance, time of day or week or month or year. See #Scheduled scaling for more |
Amazon Web Services launched the Amazon Elastic Compute Cloud (EC2) service in August 2006, that allowed developers to programmatically create and terminate instances (machines). [11] [12] At the time of initial launch, AWS did not offer autoscaling, but the ability to programmatically create and terminate instances gave developers the flexibility to write their own code for autoscaling.
Third-party autoscaling software for AWS began appearing around April 2008. These included tools by Scalr [13] and RightScale. RightScale was used by Animoto, which was able to handle Facebook traffic by adopting autoscaling. [14] [15]
On May 18, 2009, Amazon launched its own autoscaling feature along with Elastic Load Balancing, as part of Amazon Elastic Compute Cloud. [16] Autoscaling is now an integral component of Amazon's EC2 offering. [2] [17] [18] Autoscaling on Amazon Web Services is done through a web browser or the command line tool. [19] In May 2016 Autoscaling was also offered in AWS ECS Service. [20]
On-demand video provider Netflix documented their use of autoscaling with Amazon Web Services to meet their highly variable consumer needs. They found that aggressive scaling up and delayed and cautious scaling down served their goals of uptime and responsiveness best. [7]
In an article for TechCrunch , Zev Laderman, the co-founder and CEO of Newvem, a service that helps optimize AWS cloud infrastructure, recommended that startups use autoscaling in order to keep their Amazon Web Services costs low. [4]
Various best practice guides for AWS use suggest using its autoscaling feature even in cases where the load is not variable. That is because autoscaling offers two other advantages: automatic replacement of any instances that become unhealthy for any reason (such as hardware failure, network failure, or application error), and automatic replacement of spot instances that get interrupted for price or capacity reasons, making it more feasible to use spot instances for production purposes. [6] [21] [22] Netflix's internal best practices require every instance to be in an autoscaling group, and its conformity monkey terminates any instance not in an autoscaling group in order to enforce this best practice. [23]
On June 27, 2013, Microsoft announced that it was adding autoscaling support to its Windows Azure cloud computing platform. [24] [25] [26] Documentation for the feature is available on the Microsoft Developer Network. [10] [27]
Oracle Cloud Platform allows server instances to automatically scale a cluster in or out by defining an auto-scaling rule. [28] These rules are based on CPU and/or memory utilization and determine when to add or remove nodes.
On November 17, 2014, the Google Compute Engine announced a public beta of its autoscaling feature for use in Google Cloud Platform applications. [29] [30] [31] [32] As of March 2015, the autoscaling tool is still in Beta. [9]
In a blog post in August 2014, a Facebook engineer disclosed that the company had started using autoscaling to bring down its energy costs. The blog post reported a 27% decline in energy use for low traffic hours (around midnight) and a 10-15% decline in energy use over the typical 24-hour cycle. [3] [33]
Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replicaset based on observed CPU utilization (or, with beta support, on some other, application-provided metrics) [34]
Autoscaling by default uses reactive decision approach for dealing with traffic scaling: scaling only happens in response to real-time changes in metrics. In some cases, particularly when the changes occur very quickly, this reactive approach to scaling is insufficient. Two other kinds of autoscaling decision approaches are described below.
This is an approach to autoscaling where changes are made to the minimum size, maximum size, or desired capacity of the autoscaling group at specific times of day. Scheduled scaling is useful, for instance, if there is a known traffic load increase or decrease at specific times of the day, but the change is too sudden for reactive approach based autoscaling to respond fast enough. AWS autoscaling groups support scheduled scaling. [35]
This approach to autoscaling uses predictive analytics. The idea is to combine recent usage trends with historical usage data as well as other kinds of data to predict usage in the future, and autoscale based on these predictions.
For parts of their infrastructure and specific workloads, Netflix found that Scryer, their predictive analytics engine, gave better results than Amazon's reactive autoscaling approach. In particular, it was better for: [36] [33]
On November 20, 2018, AWS announced that predictive scaling would be available as part of its autoscaling offering. [37]
Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered, pay-as-you-go basis. Clients will often use this in combination with autoscaling. These cloud computing web services provide various services related to networking, compute, storage, middleware, IoT and other processing capacity, as well as software tools via AWS server farms. This frees clients from managing, scaling, and patching hardware and operating systems. One of the foundational services is Amazon Elastic Compute Cloud (EC2), which allows users to have at their disposal a virtual cluster of computers, with extremely high availability, which can be interacted with over the internet via REST APIs, a CLI or the AWS console. AWS's virtual computers emulate most of the attributes of a real computer, including hardware central processing units (CPUs) and graphics processing units (GPUs) for processing; local/RAM memory; Hard-disk(HDD)/SSD storage; a choice of operating systems; networking; and pre-loaded application software such as web servers, databases, and customer relationship management (CRM).
Amazon Simple Storage Service (S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, which allows uses like storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006, then in Europe in November 2007.
Amazon Elastic Compute Cloud (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications. EC2 encourages scalable deployment of applications by providing a web service through which a user can boot an Amazon Machine Image (AMI) to configure a virtual machine, which Amazon calls an "instance", containing any software desired. A user can create, launch, and terminate server-instances as needed, paying by the second for active servers – hence the term "elastic". EC2 provides users with control over the geographical location of instances that allows for latency optimization and high levels of redundancy. In November 2010, Amazon switched its own retail website platform to EC2 and AWS.
Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. Large clouds often have functions distributed over multiple locations, each of which is a data center. Cloud computing relies on sharing of resources to achieve coherence and typically uses a pay-as-you-go model, which can help in reducing capital expenses but may also lead to unexpected operating expenses for users.
Eucalyptus is a paid and open-source computer software for building Amazon Web Services (AWS)-compatible private and hybrid cloud computing environments, originally developed by the company Eucalyptus Systems. Eucalyptus is an acronym for Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems. Eucalyptus enables pooling compute, storage, and network resources that can be dynamically scaled up or down as application workloads change. Mårten Mickos was the CEO of Eucalyptus. In September 2014, Eucalyptus was acquired by Hewlett-Packard and then maintained by DXC Technology. After DXC stopped developing the product in late 2017, AppScale Systems forked the code and started supporting Eucalyptus customers.
An Amazon Machine Image (AMI) is a special type of virtual appliance that is used to create a virtual machine within the Amazon Elastic Compute Cloud ("EC2"). It serves as the basic unit of deployment for services delivered using EC2.
Amazon Virtual Private Cloud (VPC) is a commercial cloud computing service that provides a virtual private cloud, by provisioning a logically isolated section of Amazon Web Services (AWS) Cloud. Enterprise customers can access the Amazon Elastic Compute Cloud (EC2) over an IPsec based virtual private network. Unlike traditional EC2 instances which are allocated internal and external IP numbers by Amazon, the customer can assign IP numbers of their choosing from one or more subnets.
AppScale is a software company offering cloud infrastructure software and services to enterprises, government agencies, contractors, and third-party service providers. The company commercially supports one software product, AppScale ATS, a managed hybrid cloud infrastructure software platform that emulates the core AWS APIs. In 2019, the company ended commercial support for its open-source serverless computing platform AppScale GTS, but AppScale GTS source code remains freely available to the open-source community.
Amazon Relational Database Service is a distributed relational database service by Amazon Web Services (AWS). It is a web service running "in the cloud" designed to simplify the setup, operation, and scaling of a relational database for use in applications. Administration processes like patching the database software, backing up databases and enabling point-in-time recovery are managed automatically. Scaling storage and compute resources can be performed by a single API call to the AWS control plane on-demand. AWS does not offer an SSH connection to the underlying virtual machine as part of the managed service.
Amazon Silk is a web browser developed by Amazon. It was launched in November 2011 for Amazon Fire and Fire Phone, and a Fire TV version was launched in November 2017. The addition of Silk to the Echo Show was announced at an Amazon event in September 2018.
A cloud database is a database that typically runs on a cloud computing platform and access to the database is provided as-a-service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service, maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.
Amazon Route 53 is a scalable and highly available Domain Name System (DNS) service. Released on 5 December 2010, it is part of Amazon.com's cloud computing platform, Amazon Web Services (AWS). The name is a possible reference to U.S. Routes, and "53" is a reference to the TCP/UDP port 53, where DNS server requests are addressed.
AWS Elastic Beanstalk is an orchestration service offered by Amazon Web Services for deploying applications which orchestrates various AWS services, including EC2, S3, Simple Notification Service (SNS), CloudWatch, autoscaling, and Elastic Load Balancers. Elastic Beanstalk provides an additional layer of abstraction over the bare server and OS; users instead see a pre-built combination of OS and platform, such as "64bit Amazon Linux 2014.03 v1.1.0 running Ruby 2.0 (Puma)" or "64bit Debian jessie v2.0.7 running Python 3.4 ". Deployment requires a number of components to be defined: an 'application' as a logical container for the project, a 'version' which is a deployable build of the application executable, a 'configuration template' that contains configuration information for both the Beanstalk environment and for the product. Finally an 'environment' combines a 'version' with a 'configuration' and deploys them. Executables themselves are uploaded as archive files to S3 beforehand and the 'version' is just a pointer to this.
Cycle Computing is a company that provides software for orchestrating computing and storage resources in cloud environments. The flagship product is CycleCloud, which supports Amazon Web Services, Google Compute Engine, Microsoft Azure, and internal infrastructure. The CycleCloud orchestration suite manages the provisioning of cloud infrastructure, orchestration of workflow execution and job queue management, automated and efficient data placement, full process monitoring and logging, within a secure process flow.
Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that provides a series of modular cloud services including computing, data storage, data analytics, and machine learning, alongside a set of management tools. It runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and Google Docs, according to Verma, et.al. Registration requires a credit card or bank account details.
An elastic cloud is a cloud computing offering that provides variable service levels based on changing needs.
This is a timeline of Amazon Web Services, which offers a suite of cloud computing services that make up an on-demand computing platform.
Serverless computing is a cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers. "Serverless" is a misnomer in the sense that servers are still used by cloud service providers to execute code for developers. However, developers of serverless applications are not concerned with capacity planning, configuration, management, maintenance, fault tolerance, or scaling of containers, virtual machines, or physical servers. When an app is not in use, there are no computing resources allocated to the app. Pricing is based on the actual amount of resources consumed by an application. It can be a form of utility computing.
Amazon Elastic File System is a cloud storage service provided by Amazon Web Services (AWS) designed to provide scalable, elastic, concurrent with some restrictions, and encrypted file storage for use with both AWS cloud services and on-premises resources. Amazon EFS is built to be able to grow and shrink automatically as files are added and removed. Amazon EFS supports Network File System (NFS) versions 4.0 and 4.1 (NFSv4) protocol, and control access to files through Portable Operating System Interface (POSIX) permissions.
AWS Graviton is a family of 64-bit ARM-based CPUs designed by the Amazon Web Services (AWS) subsidiary Annapurna Labs. The processor family is distinguished by its lower energy use relative to x86-64, static clock rates, and omission of simultaneous multithreading. It was designed to be tightly integrated with AWS servers and datacenters, and is not sold outside Amazon.