YDB (database)

YDB
Developer(s): Yandex
Initial release: April 19, 2022
Stable release: v23.1.26 / May 16, 2023 [1]
Repository: github.com/ydb-platform/ydb/
Written in: C++
Operating system: Linux, macOS
License: Apache License 2.0
Website: ydb.tech

YDB (Yet another DataBase) is a distributed SQL database management system (DBMS) developed by Yandex, available as open-source technology.

Functionality

YDB is designed for building large web services that must sustain heavy operational loads of up to millions of requests per second. Its default query language is YDB Query Language (YQL) [3], a strongly typed dialect of SQL [2], and it supports ACID transactions. [4]

The closest analogues of this DBMS available as open-source software are YugabyteDB and CockroachDB.

YDB can be self-deployed on clusters of physical hosts or virtual machines, including via Kubernetes, or used as a managed service in Yandex Cloud. The managed service is available in serverless and dedicated modes.

Architecture

YDB runs on clusters with a shared-nothing architecture and uses standard commodity hardware. The system is built on tablets, which implement a communication protocol for reaching consensus in a network of unreliable processors. Functionally, this protocol is similar to Paxos and Raft.

User tables in YDB have a mandatory primary key and are sharded by ranges of that key. Each shard of user data is managed by a tablet called a DataShard. A DataShard can grow to several gigabytes and is automatically split into multiple tablets when its data-size threshold or load threshold is exceeded. This lets the system scale transparently with user load.
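The range-sharding and split behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class name `RangeShardedTable`, the row-count threshold `max_rows_per_shard`, and the split-at-median rule are hypothetical and are not taken from YDB's DataShard implementation, which also splits on load.

```python
from bisect import bisect_right


class RangeShardedTable:
    """Toy table sharded by primary-key ranges (illustrative only)."""

    def __init__(self, max_rows_per_shard=4):
        self.max_rows = max_rows_per_shard  # hypothetical split threshold
        self.bounds = []                    # sorted lower bounds of shards 1..n
        self.shards = [{}]                  # one dict of key -> value per shard

    def _shard_index(self, key):
        # Shard i holds keys in [bounds[i-1], bounds[i]); shard 0 has no lower bound.
        return bisect_right(self.bounds, key)

    def upsert(self, key, value):
        i = self._shard_index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split the oversized shard at its median key into two key ranges.
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.shards[i].items() if k < mid}
        right = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i:i + 1] = [left, right]
        self.bounds.insert(i, mid)

    def get(self, key):
        return self.shards[self._shard_index(key)].get(key)


table = RangeShardedTable(max_rows_per_shard=4)
for pk in range(1, 11):
    table.upsert(pk, f"row-{pk}")
```

After the ten inserts the table has split itself into several shards, and `table.get(7)` still finds the row by locating the shard whose key range contains 7.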

In addition to DataShard, other tablet types include, among others:

Data from tablets is stored in the Distributed Storage layer which is a key-value storage with a specialized protocol to support the tablet protocol. Distributed Storage ensures data replication, while data from tablets is stored as BLOBs.

YDB executes distributed transactions across data in one or more tables using a distributed transaction framework based on the Calvin [5] algorithm. Unlike Calvin, YDB supports interactive and non-deterministic transactions by using record locking.
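The core idea borrowed from Calvin, deterministic ordering, can be sketched as follows: transactions are first placed in one agreed global order, and every shard then applies its own part of each transaction in that order, so replicas that process the same plan reach the same state. The function names (`sequence`, `apply_plan`, `shard_of`) and the placement rule are assumptions made for illustration; the sketch does not model YDB's locking or its actual transaction framework.

```python
def shard_of(key, num_shards):
    # Stable toy placement rule; real systems use their own partitioning.
    return sum(key.encode("utf-8")) % num_shards


def sequence(transactions):
    """Agree on one global order up front: (sequence number, transaction)."""
    return list(enumerate(transactions, start=1))


def apply_plan(plan, num_shards=2):
    """Each shard applies only its own writes, always in plan order."""
    shards = [dict() for _ in range(num_shards)]
    for _seq_no, writes in plan:
        for key, delta in writes.items():
            shard = shards[shard_of(key, num_shards)]
            shard[key] = shard.get(key, 0) + delta
    return shards


transfers = [
    {"alice": -30, "bob": +30},  # transaction 1 touches two keys
    {"bob": -10, "carol": +10},  # transaction 2 overlaps on "bob"
]
plan = sequence(transfers)
replica_a = apply_plan(plan)
replica_b = apply_plan(plan)
```

Because both replicas consume the same pre-ordered plan, they end up with identical shard contents without coordinating during execution.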

YDB is based on the actor model. Actors are single-threaded back-end automata that exchange messages with each other and may reside on different cluster servers. Messages are exchanged over the network using the interconnect library developed as part of the project.
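As a rough illustration of the actor model in general, the sketch below runs a few actors on a single thread, each reacting only to messages delivered to it. The classes `ActorSystem`, `Counter`, and `Collector` are hypothetical names invented for this example; they do not mirror the API of YDB's actor system or its interconnect library, and everything here runs in one process.

```python
from collections import deque


class ActorSystem:
    """Minimal single-threaded message dispatcher (illustrative only)."""

    def __init__(self):
        self.actors = {}
        self.queue = deque()  # pending (recipient, message, sender) triples

    def register(self, name, actor):
        actor.name = name
        actor.system = self
        self.actors[name] = actor

    def send(self, to, message, sender=None):
        self.queue.append((to, message, sender))

    def run(self):
        # Deliver messages one at a time; actors never share state directly.
        while self.queue:
            to, message, sender = self.queue.popleft()
            self.actors[to].receive(message, sender)


class Counter:
    def __init__(self):
        self.count = 0

    def receive(self, message, sender):
        if message == "inc":
            self.count += 1
        elif message == "get":
            # Reply with a message instead of exposing internal state.
            self.system.send(sender, ("count", self.count), sender=self.name)


class Collector:
    def __init__(self):
        self.seen = []

    def receive(self, message, sender):
        self.seen.append((sender, message))


system = ActorSystem()
counter, collector = Counter(), Collector()
system.register("counter", counter)
system.register("collector", collector)
system.send("counter", "inc")
system.send("counter", "inc")
system.send("counter", "get", sender="collector")
system.run()
```

The collector learns the counter's value only through the reply message, which is the property that lets actors be placed on different servers.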

A number of digital services, such as virtual block devices or persistent queues, have been developed as a layer over YDB.

YDB supports user interaction via the gRPC protocol with several client SDKs implementing procedures for node discovery, client balancing, etc. [4]
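Client-side balancing over discovered nodes can be pictured with a simple round-robin sketch. Everything in it is an assumption for illustration: `RoundRobinBalancer`, the `discover` callback, and the endpoint strings are hypothetical and do not reproduce the interface of any YDB SDK, which may use different balancing policies.

```python
import itertools


class RoundRobinBalancer:
    """Rotates requests across endpoints returned by a discovery callback."""

    def __init__(self, discover):
        self._discover = discover  # hypothetical callable returning endpoint strings
        self.refresh()

    def refresh(self):
        # Re-run discovery, e.g. periodically or after a connection failure.
        endpoints = list(self._discover())
        if not endpoints:
            raise RuntimeError("discovery returned no endpoints")
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        return next(self._cycle)


def fake_discovery():
    # Placeholder node addresses, not real hosts.
    return ["node-1.example:2135", "node-2.example:2135", "node-3.example:2135"]


balancer = RoundRobinBalancer(fake_discovery)
picks = [balancer.next_endpoint() for _ in range(4)]
```

Each call to `next_endpoint` returns the next node in the discovered list and wraps around once the list is exhausted.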

YDB does not support UUID as a standalone data type, and it has no built-in function to automatically increment a field value when rows are added to a table. [6]
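One possible way to live with both limitations is to generate identifiers in the application and store them in an ordinary string column. The sketch below is an assumption rather than an approach prescribed by the cited sources; `new_row_key` and `parse_row_key` are hypothetical helper names, and only Python's standard `uuid` module is used.

```python
import uuid


def new_row_key():
    """Return a random UUID as a 32-character hex string for use as a primary key."""
    return uuid.uuid4().hex


def parse_row_key(key):
    """Turn a stored hex string back into a uuid.UUID object."""
    return uuid.UUID(hex=key)


key = new_row_key()
restored = parse_row_key(key)
```

Random keys also avoid the monotonically growing values that an auto-increment column would produce, so new rows are spread across key ranges instead of all landing in the last one.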

History

In 2010, Yandex started working on its own NoSQL DBMS, KiWi [3], and rolled it out for internal use in 2011. However, KiWi offered only eventual consistency and shared other disadvantages of the NoSQL model. [4]

In 2012, to cover its DBMS needs, Yandex started the KiKiMR project, which later became known as YDB. [3]

In 2016, YDB was rolled out to Yandex services.

In 2018, the Yandex Cloud platform was launched with data storage based on YDB. [7] At the same time, the company announced that it would make YDB available as a managed service in Yandex Cloud, and it later gave customers access to this service alongside other managed services such as PostgreSQL and MongoDB. [8] This cloud version was called Yandex Database and was later renamed Managed Service for YDB.

In April 2022, the YDB DBMS was published on GitHub as free software under the Apache 2.0 License.

References

  1. "Release v23.1.26". GitHub. Retrieved 16 May 2023.
  2. "Как писать меньше кода для MR, или Зачем миру ещё один язык запросов? История Yandex Query Language" [How to write less code for MR, or why the world needs yet another query language: the story of Yandex Query Language]. Habr (in Russian). Retrieved 2022-07-01.
  3. "YDB Is Now Available as Open-Source Project". medium.com. Retrieved 2022-07-01.
  4. "Бессерверная альтернатива традиционным базам данных" [A serverless alternative to traditional databases]. osp.ru (in Russian). Retrieved 2022-07-01.
  5. "Calvin: Fast Distributed Transactions for Partitioned Database Systems" (PDF). cs.yale.edu. Retrieved 2022-07-04.
  6. "Автоинкремент в Yandex Database" [Auto-increment in Yandex Database]. medium.com (in Russian). Retrieved 2022-07-04.
  7. "001. Яндекс Облако: обзор платформы – Ян Лещинский" [001. Yandex Cloud: platform overview – Yan Leshchinsky]. YouTube (in Russian). Retrieved 2022-07-04.
  8. "about:cloud". YouTube (in Russian). Retrieved 2022-07-04.