NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. [1] [2] [3] [4]
Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) are too large for conventional relational databases, but have transactional and consistency requirements that are not practical for NoSQL systems. [5] [6] The only options previously available for these organizations were to either purchase more powerful computers or to develop custom middleware that distributes requests over conventional DBMS. Both approaches feature high infrastructure costs and/or development costs. NewSQL systems attempt to reconcile the conflicts.
The term was first used by 451 Group analyst Matthew Aslett in a 2011 research paper discussing the rise of a new generation of database management systems. [5] One of the first NewSQL systems was the H-Store parallel database system. [7] [8]
Typical applications are characterized by heavy OLTP transaction volumes. OLTP transactions;
However, some support hybrid transactional/analytical processing (HTAP) applications. Such systems improve performance and scalability by omitting heavyweight recovery or concurrency control. [10]
The two common distinguishing features of NewSQL database solutions are that they support online scalability of NoSQL databases and the relational data model (including ACID consistency) using SQL as their primary interface. [11]
NewSQL systems can be loosely grouped into three categories: [2] [12]
NewSQL systems adopt various internal architectures. Some systems employ a cluster of shared-nothing nodes, in which each node manages a subset of the data. They include components such as distributed concurrency control, flow control, and distributed query processing.
The second category are optimized storage engines for SQL. These systems provide the same programming interface as SQL, but scale better than built-in engines.
These systems automatically split databases across multiple nodes using Raft or Paxos consensus algorithm.
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.
Ingres Database is a proprietary SQL relational database management system intended to support large commercial and government applications.
Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.
A database transaction symbolizes a unit of work, performed within a database management system against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database. Transactions in a database environment have two main purposes:
NonStop SQL is a commercial relational database management system that is designed for fault tolerance and scalability, currently offered by Hewlett Packard Enterprise. The latest version is SQL/MX 3.4.
A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node in a computer cluster. The intent is to eliminate contention among nodes. Nodes do not share the same memory or storage.
Data engineering refers to the building of systems to enable the collection and usage of data. This data is usually used to enable subsequent analysis and data science, which often involves machine learning. Making the data usable usually involves substantial compute and storage, as well as data processing.
MonetDB is an open-source column-oriented relational database management system (RDBMS) originally developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It is designed to provide high performance on complex queries against large databases, such as combining tables with hundreds of columns and millions of rows. MonetDB has been applied in high-performance applications for online analytical processing, data mining, geographic information system (GIS), Resource Description Framework (RDF), text retrieval and sequence alignment processing.
Operational database management systems, are used to update data in real-time. These types of databases allow users to do more than simply view archived data. Operational databases allow you to modify that data, doing it in real-time. OLTP databases provide transactions as main abstraction to guarantee data consistency that guarantee the so-called ACID properties. Basically, the consistency of the data is guaranteed in the case of failures and/or concurrent access to the data.
Michael Ralph Stonebraker is an American computer scientist specializing in database systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational databases. He is also the founder of many database companies, including Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica and VoltDB, and served as chief technical officer of Informix. For his contributions to database research, Stonebraker received the 2014 Turing Award, often described as "the Nobel Prize for computing."
Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.
Volt Active Data is an in-memory database designed by Michael Stonebraker, Sam Madden, and Daniel Abadi.
H-Store is an experimental database management system (DBMS). It was designed for online transaction processing applications. H-Store was developed by a team at Brown University, Carnegie Mellon University, the Massachusetts Institute of Technology, and Yale University in 2007 by researchers Michael Stonebraker, Sam Madden, Andy Pavlo and Daniel Abadi.
QUEL is a relational database query language, based on tuple relational calculus, with some similarities to SQL. It was created as a part of the Ingres DBMS effort at University of California, Berkeley, based on Codd's earlier suggested but not implemented Data Sub-Language ALPHA. QUEL was used for a short time in most products based on the freely available Ingres source code, most notably in an implementation called POSTQUEL supported by POSTGRES. As Oracle and DB2 gained market share in the early 1980s, most companies then supporting QUEL moved to SQL instead. QUEL continues to be available as a part of the Ingres DBMS, although no QUEL-specific language enhancements have been added for many years.
Clustrix, Inc. is a San Francisco-based private company founded in 2006 that developed a database management system marketed as NewSQL.
Spanner is a distributed SQL database management and storage service developed by Google. It provides features such as global transactions, strongly consistent reads, and automatic multi-site replication and failover. Spanner is used in Google F1, the database for its advertising business Google Ads, as well as Gmail and Google Photos.
CockroachDB is a source-available distributed SQL database management system developed by Cockroach Labs.
Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases use a host of techniques to cope. According to Marc Brooker: "a system is scalable in the range where marginal cost of additional workload is nearly constant." Serverless technologies fit this definition but you need to consider total cost of ownership not just the infra cost.
TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. Designed to be MySQL compatible, it is developed and supported primarily by PingCAP and licensed under Apache 2.0. It is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.
A distributed SQL database is a single relational database which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and wide area networks including cloud availability zones and cloud geographic zones. Distributed SQL databases typically use the Paxos or Raft algorithms to achieve consensus across multiple nodes.