Autoscaling

Autoscaling, also spelled auto scaling or auto-scaling, and sometimes called automatic scaling, is a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm, typically measured by the number of active servers, based on the load on the farm. For example, the number of servers running behind a web application may be increased or decreased automatically based on the number of active users on the site. Since such metrics may change dramatically throughout the course of the day, and servers are a limited resource that costs money to run even while idle, there is often an incentive to run "just enough" servers to support the current load while still being able to handle sudden, large spikes in activity. Autoscaling is helpful for such needs, as it can reduce the number of active servers when activity is low and launch new servers when activity is high. Autoscaling is closely related to, and builds upon, the idea of load balancing. [1] [2]

Advantages

Autoscaling offers advantages in both cost and availability over provisioning a fixed number of servers.

Autoscaling differs from having a fixed daily, weekly, or yearly cycle of server use in that it is responsive to actual usage patterns, and thus reduces the potential downside of having too few or too many servers for the traffic load. For instance, if traffic is usually lower at midnight, then a static scaling solution might schedule some servers to sleep at night, but this might result in downtime on a night when people happen to use the Internet more (for instance, due to a viral news event). Autoscaling, on the other hand, can handle such unexpected traffic spikes better. [3] [7]

Terminology

The list below uses the terminology of Amazon Web Services (AWS). [8] Alternative names used by Google Cloud Platform, [9] Microsoft Azure, [10] or other platforms are noted in parentheses, and terminology that is specific to the names of Amazon services is avoided.
Instance: A single server or machine that is part of the group of machines subject to autoscaling.
Autoscaling group: The collection of instances subject to autoscaling, along with all the associated policies and state information. (Google Cloud Platform: managed instance group.)
Size: The number of instances currently part of the autoscaling group.
Desired capacity (or desired size): The number of instances that the autoscaling group should have at any given point in time. If the size is less than the desired capacity, the autoscaling group will try to launch (provision and attach) new instances; if the size is greater than the desired capacity, the autoscaling group will try to remove (detach and terminate) instances.
Minimum size: A number of instances below which the desired capacity is not allowed to fall.
Maximum size: A number of instances above which the desired capacity is not allowed to rise.
Metric: A measurement (such as CPU utilization, memory usage, or network usage) associated with the autoscaling group, for which a time series of data points is generated regularly. Thresholds for metrics can be used to set autoscaling policies. Metrics can be based on aggregates of metrics for the instances of the autoscaling group, or on load balancers associated with the group.
Scaling policy (or autoscaling policy): A policy that specifies a change to the autoscaling group's desired capacity (or sometimes its minimum and maximum size) in response to metrics crossing specific thresholds. Scaling policies can have associated cooldown periods, which prevent additional scaling actions from occurring immediately after a specific scaling action. Changes to desired capacity can be incremental (increase or decrease by a specific number) or can specify a new value of the desired capacity. Policies that increase the desired capacity are called "scaling out" or "scaling up" policies, and policies that decrease the desired capacity are called "scaling in" or "scaling down" policies. (A sketch of such a policy appears after this list.)
Health check: A way for the autoscaling group to determine whether the instances attached to it are functioning properly. A health check may be based on whether the instance still exists and is reachable, or on whether the instance is still registered and in service with an associated load balancer.
Launch configuration: A description of the parameters and scripts used when launching a new instance, including the instance type, purchase options (such as spot versus on-demand in the case of AWS), possible availability zones for launch, the machine image, and scripts to run on launch. (Google Cloud Platform: instance template.)
Manual scaling: A scaling action executed manually.
Scheduled scaling: A scaling policy that is executed at a specific time, for instance a particular time of day, week, month, or year. See the section on scheduled autoscaling below.
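
To make the interaction between metrics, thresholds, cooldowns, and the minimum and maximum sizes concrete, the following is a minimal, provider-agnostic sketch of a reactive scaling policy loop. All names and numbers are illustrative placeholders, not any real provider's API.

```python
import random
import time

MIN_SIZE, MAX_SIZE = 2, 20
SCALE_OUT_THRESHOLD = 70.0   # average CPU % above which one instance is added
SCALE_IN_THRESHOLD = 30.0    # average CPU % below which one instance is removed
COOLDOWN_SECONDS = 300       # no further actions right after a scaling action
CHECK_INTERVAL = 60          # how often the metric is evaluated, in seconds

def get_average_cpu() -> float:
    # Placeholder: in practice this would query a monitoring system.
    return random.uniform(0.0, 100.0)

def set_desired_capacity(n: int) -> None:
    # Placeholder: in practice this would call the provider's autoscaling API.
    print(f"desired capacity -> {n}")

def run_policy_loop(desired: int) -> None:
    last_action = 0.0
    while True:
        cpu = get_average_cpu()
        in_cooldown = time.time() - last_action < COOLDOWN_SECONDS
        if not in_cooldown:
            if cpu > SCALE_OUT_THRESHOLD and desired < MAX_SIZE:
                desired += 1                      # scale out by one instance
                set_desired_capacity(desired)
                last_action = time.time()         # start the cooldown period
            elif cpu < SCALE_IN_THRESHOLD and desired > MIN_SIZE:
                desired -= 1                      # scale in by one instance
                set_desired_capacity(desired)
                last_action = time.time()
        time.sleep(CHECK_INTERVAL)
```

Real autoscaling groups evaluate metrics and enforce the minimum, maximum, and cooldown on the provider side; the loop above only illustrates the decision logic.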

Practice

Amazon Web Services (AWS)

Amazon Web Services launched the Amazon Elastic Compute Cloud (EC2) service in August 2006, which allowed developers to programmatically create and terminate instances (machines). [11] [12] At the time of initial launch, AWS did not offer autoscaling, but the ability to programmatically create and terminate instances gave developers the flexibility to write their own code for autoscaling.

Third-party autoscaling software for AWS began appearing around April 2008. These included tools by Scalr [13] and RightScale. RightScale was used by Animoto, which was able to handle Facebook traffic by adopting autoscaling. [14] [15]

On May 18, 2009, Amazon launched its own autoscaling feature along with Elastic Load Balancing, as part of Amazon Elastic Compute Cloud. [16] Autoscaling is now an integral component of Amazon's EC2 offering. [2] [17] [18] Autoscaling on Amazon Web Services is done through a web browser or the command line tool. [19] In May 2016, autoscaling was also offered for the Amazon ECS service. [20]
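
The same operations are also exposed through AWS SDKs. The following is a minimal sketch using the boto3 Python SDK; the group name, launch configuration name, region, and thresholds are hypothetical, and the launch configuration is assumed to already exist.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Create a group that keeps between 2 and 10 instances, starting with 2.
# "web-launch-config" is a hypothetical, pre-existing launch configuration.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchConfigurationName="web-launch-config",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,
    AvailabilityZones=["us-east-1a"],
)

# A simple scaling-out policy: when triggered (for example by a CloudWatch
# alarm), increase the desired capacity by one instance, then wait out a
# five-minute cooldown before acting again.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="scale-out-by-one",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)
```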

On-demand video provider Netflix documented their use of autoscaling with Amazon Web Services to meet their highly variable consumer needs. They found that aggressive scaling up and delayed and cautious scaling down served their goals of uptime and responsiveness best. [7]

In an article for TechCrunch, Zev Laderman, the co-founder and CEO of Newvem, a service that helps optimize AWS cloud infrastructure, recommended that startups use autoscaling in order to keep their Amazon Web Services costs low. [4]

Various best practice guides for AWS use suggest using its autoscaling feature even in cases where the load is not variable. That is because autoscaling offers two other advantages: automatic replacement of any instances that become unhealthy for any reason (such as hardware failure, network failure, or application error), and automatic replacement of spot instances that get interrupted for price or capacity reasons, making it more feasible to use spot instances for production purposes. [6] [21] [22] Netflix's internal best practices require every instance to be in an autoscaling group, and its conformity monkey terminates any instance not in an autoscaling group in order to enforce this best practice. [23]
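
A fixed-size but self-healing group can be sketched as below, reusing the hypothetical boto3 client from the earlier example: setting the minimum, maximum, and desired size to the same value disables load-driven scaling while keeping health-check-driven replacement.

```python
# Fixed-size, self-healing group: autoscaling never changes the size because
# min == max == desired, but it still replaces instances that fail health checks.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="worker-asg",
    LaunchConfigurationName="worker-launch-config",  # hypothetical name
    MinSize=3,
    MaxSize=3,
    DesiredCapacity=3,
    AvailabilityZones=["us-east-1a"],
    HealthCheckType="EC2",          # replace instances the hypervisor reports as failed
    HealthCheckGracePeriod=120,     # seconds to wait after launch before checking health
)
```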

Microsoft's Windows Azure

On June 27, 2013, Microsoft announced that it was adding autoscaling support to its Windows Azure cloud computing platform. [24] [25] [26] Documentation for the feature is available on the Microsoft Developer Network. [10] [27]

Oracle Cloud

Oracle Cloud Platform allows server instances to automatically scale a cluster in or out by defining an auto-scaling rule. [28] These rules are based on CPU and/or memory utilization and determine when to add or remove nodes.

Google Cloud Platform

On November 17, 2014, Google Compute Engine announced a public beta of its autoscaling feature for use in Google Cloud Platform applications. [29] [30] [31] [32] As of March 2015, the autoscaling tool was still in beta. [9]

Facebook

In a blog post in August 2014, a Facebook engineer disclosed that the company had started using autoscaling to bring down its energy costs. The blog post reported a 27% decline in energy use for low traffic hours (around midnight) and a 10-15% decline in energy use over the typical 24-hour cycle. [3] [33]

Kubernetes Horizontal Pod Autoscaler

The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, or replica set based on observed CPU utilization (or, with beta support, on other application-provided metrics). [34]
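
As an illustration, a CPU-based autoscaler targeting a hypothetical Deployment named "web" could be created roughly as follows with the official Kubernetes Python client; the deployment name, namespace, and thresholds are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig with cluster access
api = client.AutoscalingV1Api()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        # The workload whose replica count the autoscaler will manage.
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"),
        min_replicas=2,
        max_replicas=10,
        # Target average CPU utilization across the pods, in percent.
        target_cpu_utilization_percentage=50,
    ),
)

api.create_namespaced_horizontal_pod_autoscaler(namespace="default", body=hpa)
```

The same object can also be created declaratively from a YAML manifest or with the kubectl autoscale command.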

Alternative autoscaling decision approaches

By default, autoscaling uses a reactive decision approach to traffic scaling: scaling only happens in response to real-time changes in metrics. In some cases, particularly when changes occur very quickly, this reactive approach is insufficient. Two other kinds of autoscaling decision approaches are described below.

Scheduled autoscaling approach

This is an approach to autoscaling where changes are made to the minimum size, maximum size, or desired capacity of the autoscaling group at specific times of day. Scheduled scaling is useful, for instance, if there is a known traffic load increase or decrease at specific times of the day, but the change is too sudden for reactive autoscaling to respond quickly enough. AWS autoscaling groups support scheduled scaling. [35]
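
In AWS, for example, scheduled actions can be attached to a group. The following boto3 sketch, with hypothetical group and action names, raises capacity every weekday morning and lowers it again at night using cron-style recurrence expressions.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Scale out to 10 instances at 08:00 UTC on weekdays...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="weekday-morning-scale-out",
    Recurrence="0 8 * * 1-5",   # cron syntax: minute hour day month weekday
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=10,
)

# ...and back down to 2 instances at 22:00 UTC every day.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="nightly-scale-in",
    Recurrence="0 22 * * *",
    MinSize=2,
    MaxSize=20,
    DesiredCapacity=2,
)
```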

Predictive autoscaling

This approach to autoscaling uses predictive analytics. The idea is to combine recent usage trends with historical usage data as well as other kinds of data to predict usage in the future, and autoscale based on these predictions.
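
A toy illustration of the idea, not any particular product: predict the next hour's load from the same hour in previous weeks, add a safety margin, and provision capacity ahead of the demand. All names and figures here are assumptions for the sketch.

```python
import math
from datetime import datetime, timedelta
from statistics import mean

def predict_capacity(history: dict[datetime, float],
                     now: datetime,
                     capacity_per_instance: float = 100.0,  # requests/s one instance can serve
                     weeks_back: int = 4,
                     safety_margin: float = 1.2) -> int:
    """Return the number of instances to provision for the next hour.

    history maps past datetimes (truncated to the hour) to the observed
    requests per second at that time.
    """
    next_hour = (now + timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)
    # Look at the same hour of the same weekday in the previous few weeks.
    samples = [history[next_hour - timedelta(weeks=w)]
               for w in range(1, weeks_back + 1)
               if (next_hour - timedelta(weeks=w)) in history]
    if not samples:
        return 1  # no history: fall back to a minimal size
    predicted_load = mean(samples) * safety_margin
    # Round up to the number of instances needed to serve the predicted load.
    return max(1, math.ceil(predicted_load / capacity_per_instance))
```

The predicted capacity would then be applied ahead of time, for example by updating the group's desired capacity, with reactive scaling typically kept as a fallback.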

For parts of their infrastructure and specific workloads, Netflix found that Scryer, its predictive analytics engine, gave better results than Amazon's reactive autoscaling approach. [36] [33]

On November 20, 2018, AWS announced that predictive scaling would be available as part of its autoscaling offering. [37]

References

  1. "Above the Clouds: A Berkeley View of Cloud Computing" (PDF). Berkeley EECS. February 10, 2009. Retrieved March 21, 2015.
  2. 1 2 "Auto Scaling". Amazon Web Services . Retrieved March 21, 2015.
  3. 1 2 3 Wu, Qiang (August 8, 2014). "Making Facebook's software infrastructure more energy efficient with Autoscale". Facebook Code Blog. Retrieved March 21, 2015.
  4. 1 2 Laderman, Zev (April 22, 2012). "The 10 Biggest Mistakes Made With Amazon Web Services". TechCrunch . Retrieved March 21, 2015.
  5. Park, Andrew; Denlinger, Darrell; Watson, Coburn (September 18, 2015). "Creating Your Own EC2 Spot Market". Netflix . Retrieved December 16, 2016.
  6. 1 2 Wittig, Michael (December 26, 2015). "5 AWS mistakes you should avoid". cloudonaut. Retrieved December 16, 2016.
  7. 1 2 Orzell, Greg; Becker, Justin (January 18, 2012). "Auto Scaling in the Amazon Cloud". Netflix Tech Blog. Retrieved March 21, 2012.
  8. 1 2 "What Is Auto Scaling?". Amazon Web Services . Retrieved December 16, 2016.
  9. 1 2 "Autoscaler". Google Cloud Platform . Retrieved March 21, 2015.
  10. 1 2 "Autoscaling Guidance". Microsoft Developer Network.
  11. Cubrilovic, Nik (August 24, 2006). "Almost Exclusive: Amazon Readies Utility Computing Service". TechCrunch . Retrieved December 4, 2016.
  12. Barr, Jeff (August 25, 2006). "Amazon EC2 Beta". Amazon Web Services Blog. Retrieved May 31, 2013.
  13. Work, Henry (April 3, 2008). "Scalr: The Auto-Scaling Open-Source Amazon EC2 Effort". TechCrunch . Retrieved March 21, 2015.
  14. Howlett, Dennis (June 25, 2008). "RightScale cloud management extends to MySQL. RightScale, which specializes in cloud computing management for the Amazon Web Services platform today announced support for MySQL Enterprise. The service, which goes live July 1, provides automated deployment, management and scaling, coupled with MySQL Enterprise premium-level support for large database applications". ZDNet . Retrieved December 16, 2016.
  15. von Eicken, Thorsten (April 23, 2008). "Animoto's Facebook Scale-Up". Archived from the original on December 20, 2016. Retrieved December 16, 2016.
  16. Barr, Jeff (May 18, 2009). "New Features for Amazon EC2: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch". Amazon Web Services . Retrieved June 15, 2016.
  17. "What is autoscaling?". TechTarget. Archived from the original on April 29, 2019. Retrieved March 21, 2015.
  18. Barr, Jeff (July 30, 2014). "Auto Scaling Update – Lifecycle Management, Standby State, and DetachInstances". Amazon Web Services (official blog). Retrieved March 21, 2015.
  19. "Auto Scaling Command Line Tool". Amazon Web Services (community-edited page). Retrieved March 21, 2015.
  20. "Automatic Scaling with Amazon ECS". 18 May 2016. Archived from the original on 25 September 2019. Retrieved 12 February 2019.
  21. Adams, Rich (February 3, 2014). "AWS Tips I Wish I'd Known Before I Started. A collection of random tips for Amazon Web Services (AWS) that I wish I'd been told a few years ago, based on what I've learned by building and deploying various applications on AWS" . Retrieved December 16, 2016.
  22. "How to Use Amazon EC2 Spot Instances". wikiHow . Retrieved December 16, 2016.
  23. "The Netflix Simian Army". Netflix. July 19, 2011. Retrieved December 5, 2016.
  24. Lardinois, Frederic (June 27, 2013). "Microsoft Adds Auto Scaling To Windows Azure". TechCrunch . Retrieved March 21, 2015.
  25. "Microsoft to add autoscaling, alerts to Windows Azure". ZDNet . June 27, 2013. Retrieved March 21, 2015.
  26. Butler, Brandon (August 7, 2013). "Google, Microsoft play catch up to Amazon, add load balancing, auto-scaling to their clouds". Network World. Archived from the original on May 18, 2018. Retrieved March 21, 2015.
  27. "The Autoscaling Application Block". Microsoft Developer Network . Retrieved March 21, 2015.
  28. "Administering PaaS Services". Oracle Help Center. Retrieved 2018-05-16.
  29. Balejko, Filip (November 17, 2014). "Autoscaling, welcome to Google Compute Engine". Google Cloud Platform blog. Retrieved March 21, 2015.
  30. Protalinski, Emil (November 17, 2014). "Google Compute Engine gets Autoscaler to adjust app resources based on varying traffic and workloads". VentureBeat . Retrieved March 21, 2015.
  31. Lardinois, Frederic (November 17, 2014). "Google Brings Autoscaling To Compute Engine". TechCrunch . Retrieved March 21, 2015.
  32. Verge, Jason (November 17, 2014). "Google Launches Autoscaling Beta on Compute Engine". Data Center Knowledge. Retrieved March 21, 2015.
  33. 1 2 "Autoscaling: How the Cloud Provides a Tremendous Boost". Morpheus. November 2, 2016. Retrieved December 16, 2016.
  34. "Horizontal Pod Autoscaler Walkthrough" . Retrieved June 21, 2018.
  35. "Scheduled Scaling". Amazon Web Services . Retrieved December 16, 2016.
  36. Jacobson, Daniel; Yuan, Danny; Joshi, Neeraj. "Scryer: Netflix's Predictive Auto Scaling Engine". The Netflix Tech Blog. Netflix. Retrieved 28 May 2015.
  37. Barr, Jeff (November 20, 2018). "New – Predictive Scaling for EC2, Powered by Machine Learning". Amazon Web Services. Retrieved November 23, 2018.