
Scalability is the property of a system to handle a growing amount of work by adding resources to the system. [1]

In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a package delivery system is scalable because more packages can be delivered by adding more delivery vehicles. However, if all packages had to first pass through a single warehouse for sorting, the system would not be scalable, because one warehouse can handle only a limited number of packages. [2]

In computing, scalability is a characteristic of computers, networks, algorithms, networking protocols, programs and applications. An example is a search engine, which must support a growing number of users and a growing number of indexed topics. [3] Webscale is a computer architectural approach that brings the capabilities of large-scale cloud computing companies into enterprise data centers. [4]

In mathematics, scalability mostly refers to closure under scalar multiplication.


The Incident Command System (ICS) is used by emergency response agencies in the United States. ICS can scale resource coordination from a single-engine roadside brushfire to an interstate wildfire. The first resource on scene establishes command, with authority to order resources and delegate responsibility (managing five to seven officers, who will in turn delegate to up to seven each, and so on as the incident grows). As an incident expands, more senior officers assume command. [5]


Scalability can be measured over multiple dimensions, such as: [6]


Horizontal (Scale Out) and Vertical Scaling (Scale Up)

Resources fall into two broad categories: horizontal and vertical. [7]

Horizontal or Scale Out

Scaling horizontally (out/in) means adding more nodes to (or removing nodes from) a system, such as adding a new computer to a distributed software application. An example might involve scaling out from one web server to three. High-performance computing applications, such as seismic analysis and biotechnology workloads, have been scaled horizontally to support tasks that once would have required expensive supercomputers. Other workloads, such as large social networks, exceed the capacity of the largest supercomputer and can only be handled by scalable systems. Exploiting this scalability requires software for efficient resource management and maintenance. [6]
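As a rough illustration of scaling out from one web server to three, the following sketch spreads requests over a pool of nodes in round-robin order. The `NodePool` class, the node names and the round-robin policy are assumptions made for this example; real load balancers use many other policies.

```python
from collections import Counter


class NodePool:
    """Hypothetical pool of identical nodes behind a round-robin dispatcher."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # running count used to pick the next node

    def scale_out(self, node):
        """Add a node to the pool (scaling out)."""
        self.nodes.append(node)

    def scale_in(self, node):
        """Remove a node from the pool (scaling in)."""
        if len(self.nodes) == 1:
            raise ValueError("cannot remove the last node")
        self.nodes.remove(node)

    def dispatch(self, request):
        """Return the node that should handle this request."""
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node


def requests_per_node(pool, n_requests):
    """Dispatch n_requests dummy requests and count how many each node gets."""
    return Counter(pool.dispatch(i) for i in range(n_requests))


pool = NodePool(["web-1"])
before = requests_per_node(pool, 300)  # a single node takes every request
pool.scale_out("web-2")
pool.scale_out("web-3")
after = requests_per_node(pool, 300)   # the same load spread over three nodes
print(before)
print(after)
```

With three nodes each one handles a third of the requests, which is the sense in which capacity grows by adding nodes rather than by enlarging a single node.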

Vertical or Scale Up

Scaling vertically (up/down) means adding resources to (or removing resources from) a single node, typically involving the addition of CPUs, memory or storage to a single computer. [6]

A larger number of elements increases management complexity and requires more sophisticated programming to allocate tasks among resources and to handle issues such as throughput and latency across nodes. In addition, some applications do not scale horizontally.

Note that network function virtualization defines these terms differently: scaling out/in is the ability to scale by adding or removing resource instances (e.g. virtual machines), whereas scaling up/down is the ability to scale by changing allocated resources (e.g. memory, CPU or storage capacity). [8]

Database scalability

Scalability for databases requires that the database system be able to perform additional work given greater hardware resources, such as additional servers, processors, memory and storage. Workloads have continued to grow and demands on databases have followed suit.

Algorithmic innovations include row-level locking and table and index partitioning. Architectural innovations include shared-nothing and shared-everything architectures for managing multi-server configurations.
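Table partitioning can be sketched as follows. This is a minimal hash-partitioning example, not the method of any particular database: the function names, the choice of CRC-32 as the hash and the row layout are all assumptions made for illustration. In a shared-nothing configuration, each partition would be owned by a different server.

```python
import zlib


def partition_for(key: str, n_partitions: int) -> int:
    """Map a key to a partition index with a stable hash (CRC-32 here)."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def partition_rows(rows, key_field, n_partitions):
    """Split rows into n_partitions lists according to the hash of key_field."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        index = partition_for(str(row[key_field]), n_partitions)
        partitions[index].append(row)
    return partitions


# Hypothetical table of 100 rows split over 4 partitions.
rows = [{"id": i, "name": f"user{i}"} for i in range(100)]
partitions = partition_rows(rows, "id", 4)
print([len(p) for p in partitions])
```

A lookup by key only needs to visit the partition returned by `partition_for`, so adding servers lets point queries proceed in parallel. Changing the number of partitions with this simple modulo scheme moves most keys, which is one reason production systems often use more elaborate placement schemes.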

Strong versus eventual consistency (storage)

In the context of scale-out data storage, scalability is defined as the maximum storage cluster size which guarantees full data consistency, meaning there is only ever one valid version of stored data in the whole cluster, independent of the number of redundant physical data copies. Clusters which provide "lazy" redundancy by updating copies in an asynchronous fashion are called 'eventually consistent'. This type of scale-out design is suitable when availability and responsiveness are rated higher than consistency, which is true for many web file hosting services or web caches (if you want the latest version, wait some seconds for it to propagate). For all classical transaction-oriented applications, this design should be avoided. [9]

Many open source and even commercial scale-out storage clusters, especially those built on top of standard PC hardware and networks, provide eventual consistency only. The same is true of some NoSQL databases, such as CouchDB and others. Write operations invalidate other copies, but often don't wait for their acknowledgements. Read operations typically don't check every redundant copy prior to answering, potentially missing the preceding write operation. Handling the large amount of metadata signal traffic with acceptable performance (i.e. acting like a non-clustered storage device or database) would require specialized hardware and short distances.
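The stale read described above can be reproduced with a toy model. This is a deliberately simplified sketch, not the behaviour of any real storage product: the `EventuallyConsistentStore` class is hypothetical, and it models asynchronous propagation of new values rather than invalidation messages.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica, the others catch up later."""

    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = deque()  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        """Update replica 0 and return without waiting for the other copies."""
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica):
        """Answer from a single replica without checking the other copies."""
        return self.replicas[replica].get(key)

    def propagate(self):
        """Apply all queued updates, as background replication eventually would."""
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore(n_replicas=3)
store.write("x", 1)
store.propagate()                            # all replicas now agree on x = 1
store.write("x", 2)                          # acknowledged before copies are updated
stale = store.read("x", replica=2)           # still the old value
store.propagate()
fresh = store.read("x", replica=2)           # the write has now propagated
print(stale, fresh)
```

A strongly consistent design would not acknowledge the write, or would not answer the read, until every copy had been brought up to date, which is where the extra coordination traffic comes from.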

Whenever strong data consistency is expected, look for these indicators:

Indicators for eventually consistent designs (not suitable for transactional applications!) are:

Performance tuning versus hardware scalability

It is often advised to focus system design on hardware scalability rather than on capacity. It is typically cheaper to add a new node to a system in order to achieve improved performance than to partake in performance tuning to improve the capacity that each node can handle. But this approach can have diminishing returns (as discussed in performance engineering). For example: suppose 70% of a program can be sped up if parallelized and run on multiple CPUs instead of one. If $\alpha$ is the fraction of a calculation that is sequential, and $1-\alpha$ is the fraction that can be parallelized, the maximum speedup that can be achieved by using $P$ processors is given according to Amdahl's Law:

$$\frac{1}{\alpha + \frac{1-\alpha}{P}}$$

Substituting the value for this example ($\alpha = 0.3$), using 4 processors we get

$$\frac{1}{0.3 + \frac{1-0.3}{4}} \approx 2.105$$

If we double the compute power to 8 processors we get

$$\frac{1}{0.3 + \frac{1-0.3}{8}} \approx 2.581$$

Doubling the processing power has only improved the speedup by roughly one-fifth. If the whole problem were parallelizable, the speedup would also double. Therefore, adding more hardware is not necessarily the optimal approach.
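The arithmetic in this example can be checked with a short script. It is a sketch that assumes the usual statement of Amdahl's law, speedup = 1 / (α + (1 − α) / P), where α is the sequential fraction and P the number of processors. The function name is chosen for this example only.

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Maximum speedup under Amdahl's law for a given sequential fraction."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


# 70% of the program is parallelizable, so the sequential fraction is 0.3.
speedup_4 = amdahl_speedup(0.3, 4)
speedup_8 = amdahl_speedup(0.3, 8)
print(f"4 processors: {speedup_4:.3f}")
print(f"8 processors: {speedup_8:.3f}")
print(f"gain from doubling: {speedup_8 / speedup_4 - 1:.1%}")
```

With a sequential fraction of 0.3 the speedup can never exceed 1 / 0.3 ≈ 3.33, however many processors are added. With a sequential fraction of 0 the function returns P itself, which is the fully parallelizable case.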

Weak versus strong scaling

In the context of high performance computing there are two common notions of scalability:


  1. Bondi, André B. (2000). Characteristics of scalability and their impact on performance. Proceedings of the second international workshop on Software and performance – WOSP '00. p. 195. doi:10.1145/350391.350432. ISBN 158113195X.
  2. Hill, Mark D. (1990). "What is scalability?". ACM SIGARCH Computer Architecture News. 18 (4): 18. doi:10.1145/121973.121975. and
    Duboc, Leticia; Rosenblum, David S.; Wicks, Tony (2006). A framework for modelling and analysis of software systems scalability (PDF). Proceedings of the 28th international conference on Software engineering – ICSE '06. p. 949. doi:10.1145/1134285.1134460. ISBN 1595933751.
  3. Laudon, Kenneth Craig; Traver, Carol Guercio (2008). E-commerce: Business, Technology, Society. Pearson Prentice Hall/Pearson Education. ISBN 9780136006459.
  4. "Why web-scale is the future". Network World. 2020-02-13. Retrieved 2017-06-01.
  5. Bigley, Gregory A.; Roberts, Karlene H. (2001-12-01). "The Incident Command System: High-Reliability Organizing for Complex and Volatile Task Environments". Academy of Management Journal. 44 (6): 1281–1299. doi:10.5465/3069401. ISSN 0001-4273.
  6. Hesham El-Rewini and Mostafa Abd-El-Barr (April 2005). Advanced Computer Architecture and Parallel Processing. John Wiley & Sons. p. 66. ISBN 978-0-471-47839-3.
  7. Michael, Maged; Moreira, Jose E.; Shiloach, Doron; Wisniewski, Robert W. (March 26, 2007). Scale-up x Scale-out: A Case Study using Nutch/Lucene. 2007 IEEE International Parallel and Distributed Processing Symposium. p. 1. doi:10.1109/IPDPS.2007.370631. ISBN 978-1-4244-0909-9.
  8. "Network Functions Virtualisation (NFV); Terminology for Main Concepts in NFV" (PDF). [dead link]
  9. Sadek Drobi (January 11, 2008). "Eventual consistency by Werner Vogels". InfoQ. Retrieved April 8, 2017.
  10. "The Weak Scaling of DL_POLY 3". STFC Computational Science and Engineering Department. Archived from the original on March 7, 2014. Retrieved March 8, 2014.