NewSQL

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. [1] [2] [3] [4]

Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) are too large for conventional relational databases, but have transactional and consistency requirements that are not practical for NoSQL systems. [5] [6] Previously, the only options available to these organizations were to purchase more powerful computers or to develop custom middleware that distributes requests over conventional DBMSs. Both approaches incur high infrastructure costs, high development costs, or both. NewSQL systems attempt to reconcile these conflicting requirements.

History

The term was first used by 451 Group analyst Matthew Aslett in a 2011 research paper discussing the rise of a new generation of database management systems. [5] One of the first NewSQL systems was the H-Store parallel database system. [7] [8]

Applications

Typical applications are characterized by heavy OLTP transaction volumes.

However, some NewSQL systems also support hybrid transactional/analytical processing (HTAP) applications. Such systems improve performance and scalability by omitting heavyweight recovery or concurrency control. [10]

Features

The two common distinguishing features of NewSQL database solutions are that they support the online scalability of NoSQL databases and that they retain the relational data model (including ACID consistency), with SQL as their primary interface. [11]

NewSQL systems can be loosely grouped into three categories: [2] [12]

New architectures

NewSQL systems adopt various internal architectures. Some systems employ a cluster of shared-nothing nodes, in which each node manages a subset of the data. These systems include components such as distributed concurrency control, flow control, and distributed query processing.
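The idea that each shared-nothing node manages its own subset of the data can be illustrated with a minimal sketch. It assumes hash partitioning on the primary key, which is only one possible placement scheme; the `Node` and `Cluster` classes and the node names are hypothetical and do not correspond to any particular NewSQL product.

```python
import hashlib


class Node:
    """One shared-nothing node: it owns a private store and shares no state."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # primary key -> row, held only by this node


class Cluster:
    """Routes each key to exactly one node by hashing the key (illustrative only)."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def node_for(self, key):
        # A stable hash keeps routing deterministic across processes,
        # unlike Python's built-in hash(), which is salted for strings.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, row):
        self.node_for(key).rows[key] = row

    def get(self, key):
        return self.node_for(key).rows.get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
for order_id in range(1000):
    cluster.put(order_id, {"order_id": order_id, "status": "new"})

# Every row lives on exactly one node; the per-node counts sum to 1000.
sizes = {node.name: len(node.rows) for node in cluster.nodes}
print(sizes)
```

A single-key read or write in this sketch touches only one node. Transactions that span keys on different nodes are what require the distributed concurrency control mentioned above, which the sketch does not attempt.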

SQL engines

The second category consists of optimized storage engines for SQL. These systems provide the same programming interface as SQL, but scale better than built-in engines.

Transparent sharding

These systems automatically split databases across multiple nodes, using a consensus algorithm such as Raft or Paxos.
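What "transparent" means for the application can be shown with a small sketch: callers use plain `put` and `get`, while the table decides on its own when to split a shard. The sketch assumes range-based sharding with a split at the median key, and it models every shard as an in-process dictionary. It deliberately leaves out replication and the Raft or Paxos consensus step; the `ShardedTable` class and its `max_rows` parameter are hypothetical.

```python
import bisect


class ShardedTable:
    """Key-ordered table that splits a shard once it grows past max_rows."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows
        self.bounds = []    # sorted split keys; shard i holds keys below bounds[i]
        self.shards = [{}]  # one dict per shard, each standing in for a node

    def _index(self, key):
        # bisect_right sends a key equal to a split point to the right-hand shard.
        return bisect.bisect_right(self.bounds, key)

    def put(self, key, value):
        i = self._index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_rows:
            self._split(i)

    def get(self, key):
        return self.shards[self._index(key)].get(key)

    def _split(self, i):
        # Split at the median key so both halves stay within max_rows.
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.shards[i].items() if k < mid}
        right = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i:i + 1] = [left, right]
        self.bounds.insert(i, mid)


table = ShardedTable(max_rows=4)
for k in range(20):
    table.put(k, f"row-{k}")

print(len(table.shards), table.bounds)
```

The caller never names a shard. In a real system, the mapping held here in `bounds` would itself have to be agreed upon by the nodes, which is where a consensus algorithm comes in.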

Related Research Articles

Database: Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

Ingres (database): Database software

Ingres Database is a proprietary SQL relational database management system intended to support large commercial and government applications.

IBM Db2: Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

A database transaction symbolizes a unit of work, performed within a database management system against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database. Transactions in a database environment have two main purposes:

  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, for example when execution stops prematurely and unexpectedly, leaving many operations on the database uncompleted and with unclear status.
  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.
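The first purpose, a unit of work that either applies completely or not at all, can be sketched with Python's standard `sqlite3` module. The `accounts` table, the account names, and the `transfer` helper are assumptions made up for this illustration; SQLite is a single-node database and is used here only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds as one unit of work: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is undone
print(ok, failed, balances(conn))
```

In the failing transfer, the credit to `bob` has already been executed when the debit violates the constraint. The rollback removes it again, so no partial update remains visible.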

NonStop SQL is a commercial relational database management system that is designed for fault tolerance and scalability, currently offered by Hewlett Packard Enterprise. The latest version is SQL/MX 3.4.

A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node in a computer cluster. The intent is to eliminate contention among nodes. Nodes do not share the same memory or storage.

Data engineering refers to the building of systems to enable the collection and usage of data. This data is usually used to enable subsequent analysis and data science, which often involves machine learning. Making the data usable usually involves substantial compute and storage, as well as data processing.

MonetDB: Open source column-oriented relational database management system

MonetDB is an open-source column-oriented relational database management system (RDBMS) originally developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It is designed to provide high performance on complex queries against large databases, such as combining tables with hundreds of columns and millions of rows. MonetDB has been applied in high-performance applications for online analytical processing, data mining, geographic information system (GIS), Resource Description Framework (RDF), text retrieval and sequence alignment processing.

Operational database management systems are used to update data in real time. These databases allow users to do more than simply view archived data: they allow that data to be modified in real time. OLTP databases provide transactions as their main abstraction, guaranteeing data consistency through the so-called ACID properties. In essence, the consistency of the data is guaranteed in the case of failures and/or concurrent access to the data.

Michael Stonebraker: American computer scientist (born 1943)

Michael Ralph Stonebraker is an American computer scientist specializing in database systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational databases. He is also the founder of many database companies, including Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica and VoltDB, and served as chief technical officer of Informix. For his contributions to database research, Stonebraker received the 2014 Turing Award, often described as "the Nobel Prize for computing."

Vertica: Software company

Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.

Volt Active Data is an in-memory database designed by Michael Stonebraker, Sam Madden, and Daniel Abadi.

H-Store is an experimental database management system (DBMS). It was designed for online transaction processing applications. H-Store was developed in 2007 by a team at Brown University, Carnegie Mellon University, the Massachusetts Institute of Technology, and Yale University that included researchers Michael Stonebraker, Sam Madden, Andy Pavlo and Daniel Abadi.

QUEL is a relational database query language, based on tuple relational calculus, with some similarities to SQL. It was created as a part of the Ingres DBMS effort at University of California, Berkeley, based on Codd's earlier suggested but not implemented Data Sub-Language ALPHA. QUEL was used for a short time in most products based on the freely available Ingres source code, most notably in an implementation called POSTQUEL supported by POSTGRES. As Oracle and DB2 gained market share in the early 1980s, most companies then supporting QUEL moved to SQL instead. QUEL continues to be available as a part of the Ingres DBMS, although no QUEL-specific language enhancements have been added for many years.

Clustrix, Inc. is a San Francisco-based private company founded in 2006 that developed a database management system marketed as NewSQL.

Spanner (database): Cloud-based distributed SQL DBMS service

Spanner is a distributed SQL database management and storage service developed by Google. It provides features such as global transactions, strongly consistent reads, and automatic multi-site replication and failover. Spanner is used in Google F1, the database for its advertising business Google Ads, as well as Gmail and Google Photos.

CockroachDB: Distributed database management system

CockroachDB is a source-available distributed SQL database management system developed by Cockroach Labs.

Database scalability is the ability of a database to handle changing demands by adding or removing resources. Databases use a host of techniques to cope. According to Marc Brooker: "a system is scalable in the range where marginal cost of additional workload is nearly constant." Serverless technologies fit this definition, but the total cost of ownership must be considered, not just the infrastructure cost.

TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. Designed to be MySQL compatible, it is developed and supported primarily by PingCAP and licensed under Apache 2.0. It is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.

A distributed SQL database is a single relational database which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and wide area networks including cloud availability zones and cloud geographic zones. Distributed SQL databases typically use the Paxos or Raft algorithms to achieve consensus across multiple nodes.
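Both Paxos and Raft rest on majority agreement: a decision stands once more than half of the replicas have accepted it. The arithmetic behind this can be sketched in a few lines. This is not an implementation of either algorithm (there is no leader election, log, or messaging), and the function names are hypothetical.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def is_committed(acks, replica_count):
    """A write counts as committed once a majority has acknowledged it."""
    return acks >= majority(replica_count)


def tolerated_failures(replica_count):
    """Replicas that can be unreachable while a majority can still form."""
    return replica_count - majority(replica_count)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Under this rule, three replicas tolerate one failure and five tolerate two. A fourth replica adds no fault tolerance over three, which is one reason odd replica counts are common.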

References

  1. Aslett, Matthew (2011). "How Will The Database Incumbents Respond To NoSQL And NewSQL?" (PDF). 451 Group (published April 4, 2011). Retrieved February 22, 2020.
  2. Pavlo, Andrew; Aslett, Matthew (2016). "What's Really New with NewSQL?" (PDF). SIGMOD Record. Retrieved February 22, 2020.
  3. Stonebraker, Michael (June 16, 2011). "NewSQL: An Alternative to NoSQL and Old SQL for New OLTP Apps". Communications of the ACM Blog. Retrieved February 22, 2020.
  4. Hoff, Todd (September 24, 2012). "Google Spanner's Most Surprising Revelation: NoSQL is Out and NewSQL is In". Retrieved February 22, 2020.
  5. Aslett, Matthew (April 6, 2011). "What we talk about when we talk about NewSQL". 451 Group. Retrieved February 22, 2020.
  6. Lloyd, Alex (2012). "Building Spanner" (PDF). Berlin Buzzwords (published June 5, 2012). Retrieved February 22, 2020.
  7. Aslett, Matthew (March 4, 2008). "Is H-Store the future of database management systems?". Retrieved February 22, 2020.
  8. Monash, Curt (February 20, 2008). "H-Store: Complete destruction of the old DBMS order?". ZDNet. Retrieved February 22, 2020.
  9. Stonebraker, Michael; et al. (2007). "The End of an Architectural Era (It's Time for a Complete Rewrite)" (PDF). VLDB '07: Proceedings of the 33rd international conference on Very large data bases. Vienna, Austria. Retrieved February 22, 2020.
  10. Stonebraker, Michael; Cattell, R. (2011). "10 rules for scalable performance in 'simple operation' datastores". Communications of the ACM. 54 (6): 72. doi:10.1145/1953122.1953144.
  11. Cattell, R. (2011). "Scalable SQL and NoSQL data stores" (PDF). ACM SIGMOD Record. 39 (4): 12–27. CiteSeerX 10.1.1.692.2621. doi:10.1145/1978915.1978919. S2CID 3357124. Retrieved February 22, 2020.
  12. Venkatesh, Prasanna (January 30, 2012). "NewSQL - The New Way to Handle Big Data". Retrieved February 22, 2020.