Shared-nothing architecture

A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node (processor/memory/storage unit) in a computer cluster. The intent is to eliminate contention among nodes: no two nodes share, or independently access, the same memory or storage.
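
To make the idea concrete, here is a minimal sketch in Python of shared-nothing request routing; the node names and hash scheme are assumptions for illustration, not part of any particular system:

```python
import hashlib

# Hypothetical cluster members; in a real SN system each would be an
# independent machine with its own memory and storage.
NODES = ["node-a", "node-b", "node-c"]

def owner(key: str) -> str:
    """Map a key to the single node that owns it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every update for "user:42" is routed to one node, so it never
# contends with updates handled by the other nodes.
print(owner("user:42"))
```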

One alternative architecture is shared everything, in which requests are satisfied by arbitrary combinations of nodes. This may introduce contention, as multiple nodes may seek to update the same data at the same time. Shared-nothing also contrasts with shared-disk and shared-memory architectures, in which nodes share access to common storage or memory.

SN eliminates single points of failure, allowing the overall system to continue operating despite failures in individual nodes and allowing hardware or software on individual nodes to be upgraded without a system-wide shutdown. [1]

An SN system can scale simply by adding nodes, since no central resource bottlenecks the system. [2] In databases, the part of a database held on a single node is called a shard. An SN system typically partitions its data among many nodes. A refinement is to replicate commonly used but infrequently modified data across many nodes, allowing more requests to be resolved on a single node.
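
As an illustrative sketch (not any vendor's API), the following Python shows a node that owns a private shard plus a replica of a small, rarely modified reference table, so a lookup that joins against it resolves entirely on that node; the class and data are hypothetical:

```python
class Node:
    """One shared-nothing node: a private shard plus replicated reference data."""

    def __init__(self, name, reference_data):
        self.name = name
        self.shard = {}                         # this node's partition, private to it
        self.reference = dict(reference_data)   # replicated read-mostly data

    def put(self, key, row):
        self.shard[key] = row

    def get_with_country(self, key):
        # Resolved entirely on this node: both the shard row and the
        # replicated country table are local, so no other node is contacted.
        row = self.shard[key]
        return {**row, "country": self.reference[row["country_code"]]}

countries = {"DE": "Germany", "JP": "Japan"}    # hypothetical reference table
node = Node("node-a", countries)
node.put("user:42", {"name": "Aiko", "country_code": "JP"})
print(node.get_with_country("user:42"))
```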

History

Michael Stonebraker of the University of California, Berkeley, used the term in a 1986 database paper. [3] Teradata delivered the first SN database system in 1983. [4] Tandem Computers' NonStop systems, a shared-nothing implementation combining hardware and software, were released to market in 1976. [5] [6] Tandem Computers later released NonStop SQL, a shared-nothing relational database, in 1984. [7]

Applications

Shared-nothing is popular for web development because of its scalability: stateless web and application servers can be added or removed independently, without coordinating shared state.

Shared-nothing architectures are prevalent for data warehousing applications, although requests that require data from multiple nodes can dramatically reduce throughput. [8]
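
A hedged sketch of the throughput concern: a query confined to one shard does one node's worth of work, while an aggregate over all shards must fan out to every node and merge partial results. The shard contents below are invented for the example:

```python
shards = {
    "node-a": [{"region": "EU", "sales": 120}, {"region": "EU", "sales": 80}],
    "node-b": [{"region": "US", "sales": 200}],
    "node-c": [{"region": "APAC", "sales": 50}],
}

def single_shard_total(node, region):
    # One node does all the work; the others are untouched.
    return sum(r["sales"] for r in shards[node] if r["region"] == region)

def scatter_gather_total():
    # Every node must scan its shard, and a coordinator merges the
    # partial sums -- the fan-out step that erodes throughput at scale.
    return sum(sum(r["sales"] for r in rows) for rows in shards.values())

print(single_shard_total("node-a", "EU"))  # 200
print(scatter_gather_total())              # 450
```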

Related Research Articles

A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

<span class="mw-page-title-main">Ingres (database)</span> Database software

Ingres Database is a proprietary SQL relational database management system intended to support large commercial and government applications.

Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system.

Tandem Computers, Inc. was the dominant manufacturer of fault-tolerant computer systems for ATM networks, banks, stock exchanges, telephone switching centers, 911 systems, and other similar commercial transaction processing applications requiring maximum uptime and zero data loss. The company was founded by Jimmy Treybig in 1974 in Cupertino, California. It remained independent until 1997, when it became a server division within Compaq. It is now a server division within Hewlett Packard Enterprise, following Hewlett-Packard's acquisition of Compaq and the split of Hewlett-Packard into HP Inc. and Hewlett Packard Enterprise.

NonStop is a series of server computers introduced to market in 1976 by Tandem Computers Inc. It was followed by the Tandem Integrity NonStop line of lock-step fault-tolerant computers, now defunct. The original NonStop product line has been offered by Hewlett Packard Enterprise since the Hewlett-Packard Company split in 2015. Because NonStop systems are based on an integrated hardware/software stack, Tandem and later HPE also developed the NonStop OS operating system for them.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
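
The classic illustration of the model is a word count; this plain-Python sketch mimics the map and reduce phases without using any real framework's API:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle/reduce: group the pairs by key and sum the counts per word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the quick fox", "the lazy dog"]
print(reduce_phase(map_phase(docs)))  # {'the': 2, 'quick': 1, ...}
```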

High-availability clusters are groups of computers that support server applications with a minimum of downtime. They operate by using high-availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application is unavailable until the crashed server is fixed. HA clustering remedies this by detecting hardware or software faults and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it: appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.
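
As a rough sketch of that failover process, the health probe and start function below are stand-ins; no real clustering software exposes exactly this interface:

```python
import time

def failover_loop(primary, standby, check_health, start_app, poll_seconds=5):
    """Poll the primary; when it fails, restart the application on the standby."""
    while check_health(primary):
        time.sleep(poll_seconds)
    # Fault detected. A real cluster would first import and mount file
    # systems and configure network hardware on the standby (see above).
    start_app(standby)
    return standby

# Toy usage: simulate a crashed primary so failover fires immediately.
active = failover_loop(
    primary="node-a",
    standby="node-b",
    check_health=lambda node: False,                         # primary reports unhealthy
    start_app=lambda node: print(f"app restarted on {node}"),
    poll_seconds=0,
)
print(active)  # node-b is now the active node
```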

In database computing, Oracle Real Application Clusters (RAC) — an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i — provides software for clustering and high availability in Oracle database environments. Oracle Corporation includes RAC with the Enterprise Edition, provided the nodes are clustered using Oracle Clusterware.

In computing, the term data warehouse appliance (DWA) was coined by Foster Hinshaw for a computer architecture for data warehouses (DW), specifically marketed for big data analysis and discovery, that is simple to use and offers high performance for the workload. A DWA includes an integrated set of servers, storage, operating systems, and databases.

<span class="mw-page-title-main">Computer cluster</span> Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing.

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load.

<span class="mw-page-title-main">Michael Stonebraker</span> American computer scientist (born 1943)

Michael Ralph Stonebraker is a computer scientist specializing in database systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational databases. He is also the founder of many database companies, including Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica and VoltDB, and served as chief technical officer of Informix. For his contributions to database research, Stonebraker received the 2014 Turing Award, often described as "the Nobel Prize for computing."

<span class="mw-page-title-main">Vertica</span> Software company

Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.

Volt Active Data is an in-memory database designed by Michael Stonebraker, Sam Madden, and Daniel Abadi.

Data-intensive computing is a class of parallel computing applications which use a data parallel approach to process large volumes of data, typically terabytes or petabytes in size and typically referred to as big data. Computing applications that devote most of their execution time to computational requirements are deemed compute-intensive, whereas applications deemed data-intensive require large volumes of data and devote most of their processing time to I/O and the manipulation of data.

H-Store is an experimental database management system (DBMS) designed for online transaction processing applications. It was developed in 2007 by a team from Brown University, Carnegie Mellon University, the Massachusetts Institute of Technology, and Yale University, including researchers Michael Stonebraker, Sam Madden, Andy Pavlo, and Daniel Abadi.

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

<span class="mw-page-title-main">Apache Ignite</span>

Apache Ignite is a distributed database management system for high-performance computing.

Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases use a host of techniques to cope.

References

  1. Wright, Dave (2014-09-17). "The Advantages of a Shared Nothing Architecture for Truly Non-Disruptive Upgrades". netapp.com. Retrieved 2019-10-31.
  2. Blankenhorn, Dana (February 27, 2006). "Shared nothing coming to open source". ZDNet. Archived from the original on October 4, 2012. Retrieved June 21, 2012.
  3. Michael Stonebraker (1986). "The Case for Shared Nothing Architecture" (PDF). Database Engineering. 9 (1).
  4. "Teradata History". Teradata.com. Retrieved 2013-06-16.
  5. ""Tandem History: An Introduction"". Center Magazine: A Newsletter for Tandem Employees. 6 (1). Winter 1986.
  6. "History of TANDEM COMPUTERS, INC. – FundingUniverse". www.fundinguniverse.com. Retrieved 2023-03-01.
  7. "NonStop SQL, A Distributed, High-Performance, High-Availability Implementation of SQL, Tandem Technical Report TR-87.4" (PDF). Archived from the original (PDF) on 2012-03-16. Retrieved 2012-10-11.
  8. "Article on Shared Nothing from the point of view of a Shared Nothing Vendor" (PDF).