YugabyteDB

Original author(s) Kannan Muthukkaruppan, Karthik Ranganathan, Mikhail Bautin
Developer(s) Yugabyte, Inc.
Initial release 2016
Stable release 2.20 (Stable) / January 25, 2024; 2.19 (Development) / October 25, 2023
Written in C++
Operating system Linux (Red Hat 7.x and derivatives), macOS
Platform Bare Metal, Virtual Machine, Docker, Kubernetes and various container management platforms
Available in English
Type RDBMS
License Apache 2.0
Website www.yugabyte.com
Yugabyte, Inc.
Company type Private
Industry Software
Founded 2016
Founders Kannan Muthukkaruppan, Karthik Ranganathan, Mikhail Bautin
Headquarters Sunnyvale, California, USA
Key people
Kannan Muthukkaruppan (Co-Founder & President, Product Development)
Karthik Ranganathan (Co-Founder & CTO)
Mikhail Bautin (Co-Founder & Software Architect)
Bill Cook (CEO)
Services Commercial database management systems
Website yugabyte.com

YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte. [1]

History

Yugabyte was founded by ex-Facebook engineers Kannan Muthukkaruppan, Karthik Ranganathan, and Mikhail Bautin. At Facebook, they were part of the team that built and operated Cassandra and HBase [2] [3] for workloads such as Facebook Messenger and Facebook's Operational Data Store. [4]

The founders came together in February 2016 to build YugabyteDB. [5] [6]

YugabyteDB was initially available in two editions: community and enterprise. In July 2019, Yugabyte open-sourced its previously commercial features and released YugabyteDB as open source under the Apache 2.0 license.

The rapid evolution of the product led to Yugabyte being named a 2020 Gartner Cool Vendor in Data Management. [7]

Yugabyte launched Yugabyte Cloud, [8] now renamed YugabyteDB Managed, a fully managed database-as-a-service offering of YugabyteDB, in September 2021. [9]

YugabyteDB was named in the 2023 Gartner Magic Quadrant™ for Cloud Database Management Systems. [10]

Funding

Five years after the company's inception, Yugabyte closed a $188 million Series C funding round, becoming a unicorn start-up with a valuation of $1.3 billion. [11]

Funding rounds
Series | Date announced | Amount | Investors
A | 10 Feb 2016 | $8M | Lightspeed Venture Partners, Jeff Rothschild [12] [13]
A | 12 Jun 2018 | $16M | Lightspeed Venture Partners, Dell Technology Capital [14] [15]
B | 09 Jun 2020 | $30M | Wipro Ventures, Lightspeed Venture Partners, Dell Technology Capital, 8VC [16] [17]
B | 03 Mar 2021 | $48M | Wipro Ventures, Lightspeed Venture Partners, Greenspring Associates, Dell Technology Capital, 8VC [18] [19]
C | 28 Oct 2021 | $188M | Wells Fargo Strategic Capital, Sapphire Ventures, Meritech Capital Partners, Lightspeed Venture Partners, Dell Technology Capital, 8VC [20] [21] [22]

Architecture

YugabyteDB is a distributed SQL database that aims to be strongly transactionally consistent across failure zones (i.e. to be ACID compliant). [23] [24] YugabyteDB has never fully passed Jepsen testing, the de facto industry standard for verifying correctness, mainly due to race conditions during schema changes. [25] In CAP theorem terms, YugabyteDB is a Consistent/Partition Tolerant (CP) database. [26] [27] [28]

YugabyteDB has two layers, [29] a storage engine known as DocDB and the Yugabyte Query Layer. [30]

[Figure: YugabyteDB architecture]

DocDB

The storage engine consists of a customized RocksDB [30] [31] combined with sharding and load-balancing algorithms for the data. In addition, the Raft consensus algorithm controls the replication of data between the nodes. [30] [31] A distributed transaction manager [30] [31] and multiversion concurrency control (MVCC) [30] [31] support distributed transactions. [31]
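The idea behind multiversion concurrency control can be illustrated with a minimal sketch. This is a generic toy model, not DocDB's actual implementation; the class name `MVCCStore` and its methods are hypothetical. Each write appends a new version tagged with a commit timestamp, and a read at a given timestamp sees the newest version committed at or before it.

```python
import bisect
from collections import defaultdict


class MVCCStore:
    """Toy multiversion key-value store (hypothetical, for illustration only).

    Writes never overwrite earlier data; they add a new version.
    """

    def __init__(self):
        # key -> parallel lists, kept sorted by commit timestamp
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, value, commit_ts):
        # Insert the new version at its sorted position.
        idx = bisect.bisect_right(self._timestamps[key], commit_ts)
        self._timestamps[key].insert(idx, commit_ts)
        self._values[key].insert(idx, value)

    def read(self, key, read_ts):
        """Return the newest value committed at or before read_ts, else None."""
        timestamps = self._timestamps.get(key, [])
        idx = bisect.bisect_right(timestamps, read_ts)
        if idx == 0:
            return None  # nothing was committed yet at read_ts
        return self._values[key][idx - 1]
```

Because older versions stay in place, a reader holding an earlier timestamp keeps seeing a consistent snapshot while later writes proceed.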

The engine also uses a Hybrid Logical Clock [32] [30] that combines coarsely synchronized physical clocks with Lamport clocks to track causal relationships. [33]
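A hybrid logical clock can be sketched as follows. This is the generic textbook algorithm under stated assumptions, not YugabyteDB's code; the names `Timestamp` and `HybridLogicalClock` are hypothetical, and the physical clock is injectable so that the behaviour is easy to follow.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    physical: int  # wall-clock component, e.g. microseconds
    logical: int   # Lamport-style counter that breaks ties


class HybridLogicalClock:
    def __init__(self, physical_clock=None):
        # Default to the system clock in microseconds.
        self._now = physical_clock or (lambda: time.time_ns() // 1_000)
        self._last = Timestamp(0, 0)

    def tick(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        wall = self._now()
        if wall > self._last.physical:
            self._last = Timestamp(wall, 0)
        else:
            # Wall clock has not advanced: bump the logical counter instead.
            self._last = Timestamp(self._last.physical, self._last.logical + 1)
        return self._last

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        wall = self._now()
        physical = max(wall, self._last.physical, remote.physical)
        if physical == self._last.physical and physical == remote.physical:
            logical = max(self._last.logical, remote.logical) + 1
        elif physical == self._last.physical:
            logical = self._last.logical + 1
        elif physical == remote.physical:
            logical = remote.logical + 1
        else:
            logical = 0
        self._last = Timestamp(physical, logical)
        return self._last
```

Even when the receiving node's physical clock lags behind the sender's, the merged timestamp is ordered after the received one, which is how causality is preserved without tightly synchronized clocks.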

The DocDB layer is not directly accessible by users. [30]

YugabyteDB Query Layer

YugabyteDB has a pluggable query layer that is decoupled from the storage layer below. [34] There are currently two APIs that can access the database: [31]

YSQL [35] is a PostgreSQL code-compatible API [36] [37] based on PostgreSQL v11.2. YSQL is accessed via standard PostgreSQL drivers using native protocols. [38] It reuses the native PostgreSQL code for the query layer [39] and replaces the storage engine with calls to the pluggable query layer. This reuse means that YugabyteDB supports many PostgreSQL features.

YCQL [40] is a Cassandra-like API based on Cassandra v3.10 and rewritten in C++. YCQL is accessed via standard Cassandra drivers [41] using the native protocol port of 9042. In addition to the 'vanilla' Cassandra components, YCQL is augmented with features such as ACID transactions, [42] a JSONB data type, [43] and secondary indexes. [44]

Currently, data written through one API is not accessible through the other; however, YSQL can access YCQL data using the PostgreSQL foreign data wrapper feature. [45]

The security model for accessing the system is inherited from the API, so access controls for YSQL look like those of PostgreSQL, [46] and access controls for YCQL look like those of Cassandra. [47]
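Because both APIs are reached through standard drivers, client configuration looks the same as for PostgreSQL or Cassandra. The sketch below only builds connection parameters and does not open a connection. The helper names are hypothetical, and the YSQL port 5433 and the default `yugabyte` user and database are assumptions about a default installation; port 9042 is the YCQL native protocol port mentioned above.

```python
from urllib.parse import quote

YSQL_DEFAULT_PORT = 5433  # assumption: default YSQL port of a stock install
YCQL_DEFAULT_PORT = 9042  # native protocol port for YCQL


def ysql_uri(host, user="yugabyte", password="", database="yugabyte",
             port=YSQL_DEFAULT_PORT):
    """Build a libpq-style URI that a standard PostgreSQL driver can consume."""
    auth = quote(user, safe="")
    if password:
        # Percent-encode so characters such as '@' do not break the URI.
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{quote(database, safe='')}"


def ycql_contact_point(host, port=YCQL_DEFAULT_PORT):
    """Return the (host, port) pair a Cassandra driver would connect to."""
    return (host, port)
```

With a PostgreSQL driver such as psycopg2 installed, the resulting string could be passed to `psycopg2.connect(...)`; the access controls then applied are those of the chosen API.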

Cluster-to-cluster replication

In addition to its core functionality of distributing a single database, YugabyteDB can replicate between database instances. [48] [49] The replication can be one-way or bi-directional and is asynchronous. One-way replication is used either to create a read-only copy for workload off-loading or, in a read-write mode, to create an active-passive standby. Bi-directional replication is generally used in read-write configurations, such as active-active deployments and geo-distributed applications.
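The practical consequence of asynchronous replication is that the target can lag behind the source. The following toy model illustrates one-way replication only; it is not how YugabyteDB's replication is implemented, and the names `Cluster` and `AsyncReplicator` are hypothetical.

```python
from collections import deque


class Cluster:
    """Stand-in for a database instance: just a named key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncReplicator:
    """One-way, asynchronous replication from a source to a target."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self._pending = deque()  # committed on source, not yet shipped

    def write(self, key, value):
        # The write completes once the source has it; the target lags.
        self.source.data[key] = value
        self._pending.append((key, value))

    def ship(self, max_changes=None):
        """Apply queued changes to the target in commit order.

        Returns the number of changes applied.
        """
        applied = 0
        while self._pending and (max_changes is None or applied < max_changes):
            key, value = self._pending.popleft()
            self.target.data[key] = value
            applied += 1
        return applied
```

Until `ship` runs, a read against the target returns stale data, which is why a one-way target suits read off-loading or standby use rather than strongly consistent reads.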

Migration tooling

Yugabyte also provides YugabyteDB Voyager, tooling to facilitate the migration of Oracle and other similar databases to YugabyteDB. [50] [51] This tool supports the migration of schemas, procedural code and data from the source platform to YugabyteDB.

References

  1. "YugabyteDB System Properties". DB-Engines. Retrieved 30 December 2021.
  2. "Karthik Ranganathan". Dataversity. Retrieved 30 December 2021.
  3. Borthakur, Dhruba; Rash, Samuel; Schmidt, Rodrigo; Aiyer, Amitanand; Gray, Jonathan; Sarma, Joydeep Sen; Muthukkaruppan, Kannan; Spiegelberg, Nicolas; Kuang, Hairong; Ranganathan, Karthik; Molkov, Dmytro; Menon, Aravind (2011). "Apache Hadoop goes realtime at Facebook". Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. p. 1071. doi:10.1145/1989323.1989438. ISBN 9781450306614. S2CID 207188340. Retrieved 15 January 2022.
  4. "YugaByte Raises $8M in Series A Funding". FINSMES. 2 November 2017. Retrieved 30 December 2021.
  5. "Yugabyte CTO outlines a PostgreSQL path to distributed cloud". VentureBeat. 26 July 2021. Retrieved 31 December 2021.
  6. "Yugabyte expands its fully managed enterprise cloud service with $188M". VentureBeat. 28 October 2021. Retrieved 30 December 2021.
  7. "Yugabyte Named a 2020 Gartner Cool Vendor in Data Management". BusinessWire. 2 November 2020. Retrieved 30 December 2021.
  8. "Yugabyte Cloud: a Managed Distributed SQL Database". InfoQ. Retrieved 31 December 2021.
  9. "Yugabyte Delivers Effortless Distributed SQL With Cloud Database-as-a-Service". BusinessWire. 21 September 2021. Retrieved 30 December 2021.
  10. "YugabyteDB Named in the 2023 Gartner® Magic Quadrant™ for Cloud Database Management Systems". 22 December 2023. Retrieved 22 December 2023.
  11. "Another cloud native SQL database unicorn: Yugabyte raises $188M Series C funding at $1.3B valuation". ZDNet. Retrieved 12 January 2022.
  12. "YugaByte Raises $8M in Series A Funding". Finsmes. 2 November 2017.
  13. "YugaByte Receives $8M Series A Round". VC News Daily. Retrieved 12 January 2022.
  14. "YugaByte raises $16 Million to combine SQL and NoSQL in a single database". Technologies.org. Retrieved 12 January 2022.
  15. "YugaByte's new database software rakes in $16 million so developers can move to any cloud". TechCrunch. 12 June 2018. Retrieved 12 January 2022.
  16. "Another globally distributed cloud native SQL database on the rise: Yugabyte Raises $30 million in Series B Funding". ZDNet. Retrieved 12 January 2022.
  17. "Yugabyte raises $30M for its cloud-native distributed SQL database". SiliconANGLE. 9 June 2020. Retrieved 12 January 2022.
  18. "Yugabyte raises $48M for open source SQL database alternative". VentureBeat. 3 March 2021. Retrieved 12 January 2022.
  19. "Yugabyte Raises $48 Million Funding Round to Accelerate Distributed SQL Enterprise Adoption and Fuel Global Expansion". Yahoo Finance. Retrieved 12 January 2022.
  20. "Yugabyte's latest funding round values the distributed SQL system at $1.3bn". The Register. Retrieved 12 January 2022.
  21. "Another cloud native SQL database unicorn: Yugabyte raises $188M Series C funding at $1.3B valuation". ZDNet. Retrieved 12 January 2022.
  22. "High-performance database startup Yugabyte raises $188M in new funding round". SiliconANGLE. 28 October 2021. Retrieved 12 January 2022.
  23. "ACID Transactions". Devopedia. 18 August 2019. Retrieved 12 January 2022.
  24. "ICT Solutions for local flexibility markets" (PDF). Academia de Studii Economice din Bucuresti. Proceedings of the IE 2020 International Conference. Retrieved 15 January 2022.
  25. "YugaByte DB 1.3.1". Jepsen.io. Retrieved 30 December 2021.
  26. "YugaByteDB: A Distributed Cloud Native Database for a Highly Scalable Data Store". Open Source For You. 14 September 2020. Retrieved 15 January 2022.
  27. "Yugabyte Design Goals". Yugabyte.com. Retrieved 15 January 2022.
  28. Galić, Zdravko; Vuzem, Mario (2020). "A Generic and Extensible Core and Prototype of Consistent, Distributed, and Resilient LIS". ISPRS International Journal of Geo-Information. 9 (7): 437. Bibcode:2020IJGI....9..437G. doi:10.3390/ijgi9070437.
  29. "Yugabyte Layered Architecture". Yugabyte. Retrieved 15 January 2022.
  30. Hirsch, Orhan Henrik. "Scalability of NewSQL Databases in a Cloud Environment" (PDF). Norwegian University of Science and Technology. NTNU Open. Retrieved 15 January 2022.
  31. Budholia, Akash. "NewSQL Monitoring System". San Jose State University Scholar Works. Retrieved 15 January 2022.
  32. "Hybrid Clock". Martin Fowler. Retrieved 30 December 2021.
  33. "Distributed Transactions without Atomic Clocks" (PDF). Yugabyte. Retrieved 15 January 2022.
  34. "Yugabyte DB 2.0 Ships Production-Ready Distributed SQL Database for Going Cloud Native". Integration Developer News. Retrieved 15 January 2022.
  35. "Yugabyte Structured Query Language (YSQL)". Yugabyte. Retrieved 15 January 2022.
  36. "Yugabyte Meets Developer Demand for Comprehensive PostgreSQL Compatibility with YugabyteDB 2.11". BusinessWire. 23 November 2021. Retrieved 15 January 2022.
  37. "PostgreSQL Compatibility in YugabyteDB 2.0". Yugabyte. 17 September 2019.
  38. "Client Drivers for YSQL". Yugabyte.
  39. "Why We Built YugabyteDB by Reusing the PostgreSQL Query Layer". Yugabyte. 24 April 2020. Retrieved 15 January 2022.
  40. "Yugabyte Cloud Query Language (YCQL)". Yugabyte. Retrieved 15 January 2022.
  41. "Client drivers for YCQL". Yugabyte.
  42. "ACID Transactions". Yugabyte.
  43. "YCQL JSONB Data Type". Yugabyte. Retrieved 15 January 2022.
  44. "YCQL Secondary Indexes". Yugabyte. Retrieved 15 January 2022.
  45. "YugabyteDB: Postgres foreign data wrapper". Gruchalski. 8 November 2021. Retrieved 15 January 2022.
  46. "YSQL Access Control". Yugabyte. Retrieved 15 January 2022.
  47. "YCQL Access Controls". Yugabyte. Retrieved 15 January 2022.
  48. "Yugabyte Expands Multi-Region Database Capabilities and Enterprise-Grade Security with YugabyteDB 2.5". Business Wire. 12 November 2020. Retrieved 15 January 2022.
  49. "xCluster Replication". Yugabyte. Retrieved 15 January 2022.
  50. "Yugabyte simplifies SQL database migration with YugabyteDB Voyager". SiliconANGLE. 24 January 2023. Retrieved 15 March 2023.
  51. "Yugabyte chomps into cloud migration". Techzine. 2 February 2023. Retrieved 15 March 2023.