YugabyteDB

Original author(s) Kannan Muthukkaruppan, Karthik Ranganathan, Mikhail Bautin
Developer(s) Yugabyte, Inc.
Initial release 2016; 8 years ago (2016)
Stable release
2.20 (Stable)
2.19 (Development) / January 25, 2024; 10 months ago (2024-01-25)
October 25, 2023; 13 months ago (2023-10-25)
Written in C++
Operating system Linux (Red Hat 7.x and derivatives), macOS
Platform Bare Metal, Virtual Machine, Docker, Kubernetes and various container management platforms
Available in English
Type RDBMS
License Apache 2.0
Website www.yugabyte.com
Yugabyte, Inc.
Company type Private
Industry Software
Founded 2016; 8 years ago (2016)
Founder Kannan Muthukkaruppan, Karthik Ranganathan, Mikhail Bautin
Headquarters Sunnyvale, California, USA
Key people
Kannan Muthukkaruppan (Co-Founder & President, Product Development)
Karthik Ranganathan (Co-Founder & CTO)
Mikhail Bautin (Co-Founder & Software Architect)
Bill Cook (CEO)
Services Commercial database management systems
Website yugabyte.com

YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte. [1]

History

Yugabyte was founded by ex-Facebook engineers Kannan Muthukkaruppan, Karthik Ranganathan, and Mikhail Bautin. At Facebook, they were part of the team that built and operated Cassandra and HBase [2] [3] for workloads such as Facebook Messenger and Facebook's Operational Data Store. [4]

The founders came together in February 2016 to build YugabyteDB. [5] [6]

YugabyteDB was initially available in two editions: community and enterprise. In July 2019, Yugabyte open-sourced previously commercial features and launched YugabyteDB as open-source under the Apache 2.0 license. [7]

Funding

In October 2021, five years after the company's inception, Yugabyte closed a $188 million Series C funding round, becoming a unicorn start-up with a valuation of $1.3 billion. [8]

Funding rounds
Series | Date announced | Amount | Investors
A | 10 Feb 2016 | $8M | Lightspeed Venture Partners, Jeff Rothschild [9] [10]
A | 12 Jun 2018 | $16M | Lightspeed Venture Partners, Dell Technologies Capital [11] [12]
B | 09 Jun 2020 | $30M | Wipro Ventures, Lightspeed Venture Partners, Dell Technologies Capital, 8VC [13] [14]
B | 03 Mar 2021 | $48M | Wipro Ventures, Lightspeed Venture Partners, Greenspring Associates, Dell Technologies Capital, 8VC [15] [16]
C | 28 Oct 2021 | $188M | Wells Fargo Strategic Capital, Sapphire Ventures, Meritech Capital Partners, Lightspeed Venture Partners, Dell Technologies Capital, 8VC [17] [18] [19]

Architecture

YugabyteDB is a distributed SQL database that aims to be strongly transactionally consistent across failure zones (i.e. ACID compliance). [20] [21] YugabyteDB has never fully passed Jepsen testing, the de facto industry standard for verifying correctness, mainly due to race conditions during schema changes. [22] In CAP theorem terms, YugabyteDB is a consistent/partition-tolerant (CP) database. [23] [24] [25]

YugabyteDB has two layers: [26] a storage engine known as DocDB and the Yugabyte Query Layer. [27]

Figure: YugabyteDB architecture

DocDB

The storage engine consists of a customized RocksDB [27] [28] combined with sharding and load-balancing algorithms for the data. In addition, the Raft consensus algorithm controls the replication of data between the nodes. [27] [28] A distributed transaction manager [27] [28] and multiversion concurrency control (MVCC) [27] [28] support distributed transactions. [28]
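The MVCC idea mentioned above can be illustrated with a minimal Python sketch. It is not DocDB's implementation; the class `VersionedStore` and its methods are hypothetical names chosen for illustration. Each write keeps a new version tagged with a commit timestamp, and a reader at a given timestamp sees the latest version committed at or before it, so readers never block writers.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy multiversion key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        # key -> commit timestamps, kept sorted in ascending order
        self._commit_times = defaultdict(list)
        # key -> values, aligned index-by-index with the timestamps above
        self._values = defaultdict(list)

    def write(self, key, value, commit_ts):
        """Record a new version of `key` instead of overwriting the old one."""
        i = bisect.bisect_right(self._commit_times[key], commit_ts)
        self._commit_times[key].insert(i, commit_ts)
        self._values[key].insert(i, value)

    def read(self, key, read_ts):
        """Return the newest version committed at or before `read_ts`, or None."""
        times = self._commit_times.get(key, [])
        i = bisect.bisect_right(times, read_ts)
        return self._values[key][i - 1] if i else None
```

A snapshot read at timestamp 15 would still return the value written at timestamp 10 even after a later write at timestamp 20, which is the property that lets a transaction work against a consistent view of the data.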

The engine also exploits a Hybrid Logical Clock [29] [27] that combines coarsely-synchronized physical clocks with Lamport clocks to track causal relationships. [30]
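How a hybrid logical clock combines the two kinds of time can be sketched in Python as follows. This follows the general hybrid logical clock algorithm from the literature rather than YugabyteDB's source code; the names `Timestamp` and `HybridLogicalClock`, and the injected `now` callable, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    physical: int  # reading of a coarsely synchronized wall clock
    logical: int   # Lamport-style counter that breaks ties


class HybridLogicalClock:
    """Sketch of a hybrid logical clock (illustrative, not YugabyteDB code)."""

    def __init__(self, now):
        self._now = now  # callable returning the local physical time as an int
        self._last = Timestamp(0, 0)

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        wall = self._now()
        if wall > self._last.physical:
            self._last = Timestamp(wall, 0)
        else:
            # Wall clock has not advanced: bump the logical counter instead.
            self._last = Timestamp(self._last.physical, self._last.logical + 1)
        return self._last

    def update(self, remote):
        """Merge a timestamp received from another node."""
        wall = self._now()
        physical = max(wall, self._last.physical, remote.physical)
        if physical == self._last.physical and physical == remote.physical:
            logical = max(self._last.logical, remote.logical) + 1
        elif physical == self._last.physical:
            logical = self._last.logical + 1
        elif physical == remote.physical:
            logical = remote.logical + 1
        else:
            logical = 0
        self._last = Timestamp(physical, logical)
        return self._last
```

Because timestamps compare first by physical time and then by the logical counter, an event that causally follows another always receives a larger timestamp, even when the receiving node's wall clock lags behind the sender's.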

The DocDB layer is not directly accessible by users. [27]

YugabyteDB Query Layer

YugabyteDB has a pluggable query layer that is abstracted from the storage layer below. [31] There are currently two APIs that can access the database: [28]

YSQL [32] is a PostgreSQL code-compatible API [33] [34] based around v11.2. YSQL is accessed via standard PostgreSQL drivers using native protocols. [35] It reuses the native PostgreSQL code for the query layer [36] and replaces the storage engine with calls to the pluggable query layer. This reuse means that YugabyteDB supports many PostgreSQL features.

YCQL [37] is a Cassandra-like API based around v3.10 and rewritten in C++. YCQL is accessed via standard Cassandra drivers [38] using the native protocol port of 9042. In addition to the 'vanilla' Cassandra components, YCQL is augmented with the following features: ACID transactions, [39] a JSONB data type, [40] and secondary indexes. [41]

Currently, data written to either API is not accessible via the other API; however, YSQL can access YCQL using the PostgreSQL foreign data wrapper feature. [42]

The security model for accessing the system is inherited from the API, so access controls for YSQL resemble those of PostgreSQL, [43] and access controls for YCQL resemble those of Cassandra. [44]
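Because both APIs are reached through standard drivers, client code looks much like ordinary PostgreSQL or Cassandra client code. The Python sketch below is illustrative only: it assumes a node on 127.0.0.1, the third-party `psycopg2` and `cassandra-driver` packages, a YSQL port of 5433, and a database and user both named `yugabyte`. None of those values come from the text above except the YCQL port 9042. The helper names `Endpoint`, `ysql_dsn`, `query_ysql` and `connect_ycql` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


# Assumed local endpoints; 5433 is an assumption, 9042 is the port named above.
YSQL = Endpoint("127.0.0.1", 5433)
YCQL = Endpoint("127.0.0.1", 9042)


def ysql_dsn(endpoint, dbname="yugabyte", user="yugabyte"):
    """Build a libpq-style connection string (dbname and user are assumptions)."""
    return f"host={endpoint.host} port={endpoint.port} dbname={dbname} user={user}"


def query_ysql(dsn):
    """Run a trivial query through a standard PostgreSQL driver."""
    import psycopg2  # third-party; imported lazily so the sketch loads without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()


def connect_ycql(endpoint):
    """Open and close a session through a standard Cassandra driver."""
    from cassandra.cluster import Cluster  # third-party cassandra-driver package

    cluster = Cluster([endpoint.host], port=endpoint.port)
    try:
        cluster.connect()
        return cluster.metadata.cluster_name
    finally:
        cluster.shutdown()
```

With a running cluster, `query_ysql(ysql_dsn(YSQL))` and `connect_ycql(YCQL)` would exercise the two APIs. Authentication settings would follow the PostgreSQL and Cassandra conventions respectively, as described above.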

Cluster-to-cluster replication

In addition to its core functionality of distributing a single database, YugabyteDB can replicate between database instances. [45] [46] The replication can be one-way or bi-directional and is asynchronous. One-way replication is used either to create a read-only copy for workload off-loading or, in a read-write mode, to create an active-passive standby. Bi-directional replication is generally used in read-write configurations such as active-active deployments and geo-distributed applications.

Migration tooling

Yugabyte also provides YugabyteDB Voyager, tooling to facilitate the migration of Oracle and other similar databases to YugabyteDB. [47] [48] This tool supports the migration of schemas, procedural code and data from the source platform to YugabyteDB.

References

  1. "YugabyteDB System Properties". DB-Engines. Retrieved 30 December 2021.
  2. "Karthik Ranganathan". Dataversity. Retrieved 30 December 2021.
  3. Borthakur, Dhruba; Rash, Samuel; Schmidt, Rodrigo; Aiyer, Amitanand; Gray, Jonathan; Sarma, Joydeep Sen; Muthukkaruppan, Kannan; Spiegelberg, Nicolas; Kuang, Hairong; Ranganathan, Karthik; Molkov, Dmytro; Menon, Aravind (2011). "Apache Hadoop goes realtime at Facebook". Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. p. 1071. doi:10.1145/1989323.1989438. ISBN 9781450306614. S2CID 207188340. Retrieved 15 January 2022.
  4. "YugaByte Raises $8M in Series A Funding". FINSMES. 2 November 2017. Retrieved 30 December 2021.
  5. "Yugabyte CTO outlines a PostgreSQL path to distributed cloud". VentureBeat. 26 July 2021. Retrieved 31 December 2021.
  6. "Yugabyte expands its fully managed enterprise cloud service with $188M". VentureBeat. 28 October 2021. Retrieved 30 December 2021.
  7. "Yugabyte Expands Multi-Region Database Capabilities and Enterprise-Grade Security with YugabyteDB 2.5". businesswire.com. 12 November 2020. Retrieved 30 November 2024.
  8. "Another cloud native SQL database unicorn: Yugabyte raises $188M Series C funding at $1.3B valuation". ZDNet. Retrieved 12 January 2022.
  9. "YugaByte Raises $8M in Series A Funding". Finsmes. 2 November 2017.
  10. "YugaByte Receives $8M Series A Round". VC News Daily. Retrieved 12 January 2022.
  11. "YugaByte raises $16 Million to combine SQL and NoSQL in a single database". Technologies.org. Retrieved 12 January 2022.
  12. "YugaByte's new database software rakes in $16 million so developers can move to any cloud". TechCrunch. 12 June 2018. Retrieved 12 January 2022.
  13. "Another globally distributed cloud native SQL database on the rise: Yugabyte Raises $30 million in Series B Funding". ZDNet. Retrieved 12 January 2022.
  14. "Yugabyte raises $30M for its cloud-native distributed SQL database". SiliconAngle. 9 June 2020. Retrieved 12 January 2022.
  15. "Yugabyte raises $48M for open source SQL database alternative". VentureBeat. 3 March 2021. Retrieved 12 January 2022.
  16. "Yugabyte Raises $48 Million Funding Round to Accelerate Distributed SQL Enterprise Adoption and Fuel Global Expansion". Yahoo Finance. Retrieved 12 January 2022.
  17. "Yugabyte's latest funding round values the distributed SQL system at $1.3bn". The Register. Retrieved 12 January 2022.
  18. "Another cloud native SQL database unicorn: Yugabyte raises $188M Series C funding at $1.3B valuation". ZDNet. Retrieved 12 January 2022.
  19. "High-performance database startup Yugabyte raises $188M in new funding round". SiliconAngle. 28 October 2021. Retrieved 12 January 2022.
  20. "ACID Transactions". Devopedia. 18 August 2019. Retrieved 12 January 2022.
  21. "ICT Solutions for local flexibility markets" (PDF). Academia de Studii Economice din Bucuresti. Proceedings of the IE 2020 International Conference. Retrieved 15 January 2022.
  22. "YugaByte DB 1.3.1". Jepsen.io. Retrieved 30 December 2021.
  23. "YugaByteDB: A Distributed Cloud Native Database for a Highly Scalable Data Store". Open Source Foru. 14 September 2020. Retrieved 15 January 2022.
  24. "Yugabyte Design Goals". Yugabyte.com. Retrieved 15 January 2022.
  25. Galić, Zdravko; Vuzem, Mario (2020). "A Generic and Extensible Core and Prototype of Consistent, Distributed, and Resilient LIS". ISPRS International Journal of Geo-Information. 9 (7): 437. Bibcode:2020IJGI....9..437G. doi:10.3390/ijgi9070437.
  26. "Yugabyte Layered Architecture". Yugabyte. Retrieved 15 January 2022.
  27. Hirsch, Orhan Henrik. "Scalability of NewSQL Databases in a Cloud Environment" (PDF). Norwegian University of Science and Technology. NTNU Open. Retrieved 15 January 2022.
  28. Budholia, Akash. "NewSQL Monitoring System". San Jose State University Scholar Works. Retrieved 15 January 2022.
  29. "Hybrid Clock". Martin Fowler. Retrieved 30 December 2021.
  30. "Distributed Transactions without Atomic Clocks" (PDF). Yugabyte. Retrieved 15 January 2022.
  31. "Yugabyte DB 2.0 Ships Production-Ready Distributed SQL Database for Going Cloud Native". Integration Developer News. Retrieved 15 January 2022.
  32. "Yugabyte Structured Query Language (YSQL)". Yugabyte. Retrieved 15 January 2022.
  33. "Yugabyte Meets Developer Demand for Comprehensive PostgreSQL Compatibility with YugabyteDB 2.11". BusinessWire. 23 November 2021. Retrieved 15 January 2022.
  34. "PostgreSQL Compatibility in YugabyteDB 2.0". Yugabyte. 17 September 2019.
  35. "Client Drivers for YSQL". Yugabyte.
  36. "Why We Built YugabyteDB by Reusing the PostgreSQL Query Layer". Yugabyte. 24 April 2020. Retrieved 15 January 2022.
  37. "Yugabyte Cloud Query Language (YCQL)". Yugabyte. Retrieved 15 January 2022.
  38. "Client drivers for YCQL". Yugabyte.
  39. "ACID Transactions". Yugabyte.
  40. "YCQL JSONB Data Type". Yugabyte. Retrieved 15 January 2022.
  41. "YCQL Secondary Indexes". Yugabyte. Retrieved 15 January 2022.
  42. "YugabyteDB: Postgres foreign data wrapper". Gruchalski. 8 November 2021. Retrieved 15 January 2022.
  43. "YSQL Access Control". Yugabyte. Retrieved 15 January 2022.
  44. "YCQL Access Controls". Yugabyte. Retrieved 15 January 2022.
  45. "Yugabyte Expands Multi-Region Database Capabilities and Enterprise-Grade Security with YugabyteDB 2.5". Business Wire. 12 November 2020. Retrieved 15 January 2022.
  46. "xCluster Replication". Yugabyte. Retrieved 15 January 2022.
  47. "Yugabyte simplifies SQL database migration with YugabyteDB Voyager". siliconANGLE. 24 January 2023. Retrieved 15 March 2023.
  48. "Yugabyte chomps into cloud migration". Techzine. 2 February 2023. Retrieved 15 March 2023.