Ordered Key-Value Store

Last updated

An Ordered Key-Value Store (OKVS) is a type of data storage paradigm that can support multi-model database. An OKVS is an ordered mapping of bytes to bytes. It is a more powerful paradigm than Key-Value Store because OKVS allow to build higher level abstractions without the need to do full scans. An OKVS will keep the key-value pairs sorted by the key lexicographic order. OKVS systems provides different set of features and performance trade-offs. Most of them are shipped as a library without network interfaces, in order to be embedded in another process. Most OKVS support ACID guarantees. Some OKVS are distributed databases. Ordered Key-Value Store found their way into many modern database systems including NewSQL database systems.

Contents

History

The origin of Ordered Key-Value Store stems from the work of Ken Thompson on dbm in 1979. Later in 1991, Berkeley DB was released that featured a B-Tree backend that allowed the keys to stay sorted. Berkeley DB was said to be very fast and made its way into various commercial product. [1] It was included in Python standard library until 2.7. [2] In 2009, Tokyo Cabinet was released that was superseded by Kyoto Cabinet that support both transaction and ordered keys. In 2011, LMDB was created to replace Berkeley DB in OpenLDAP. There is also Google's LevelDB that was forked by Facebook in 2012 as RocksDB. In 2014, WiredTiger, successor of Berkeley DB was acquired by MongoDB and is since 2019 the primary backend of MongoDB database.

Other notable implementation of the OKVS paradigm are Sophia [3] and SQLite3 LSM extension. [4] Another notable use of OKVS paradigm is the multi-model database system called ArangoDB [5] based on RocksDB.

Some NewSQL databases are supported by Ordered Key-Value Stores. JanusGraph, a property graph database, has both a Berkeley DB backend and FoundationDB backend.

Key concepts

Lexicographic encoding

There are algorithms that encode basic data types (boolean, string, number) and composition of those data types inside sorted containers (tuple, list, vector) that preserve their natural ordering. It is possible to work with an Ordered Key-Value Store without having to work directly with bytes. In FoundationDB, it is called the tuple layer. [6]

Range query

Inside an OKVS, keys are ordered, and because of that it is possible to do range queries. A range query allow to retrieve all keys between two keys such as all keys that are fetched are ordered.

Subspaces

Key composition

One can construct key spaces to build higher level abstractions. The idea is to construct keys, that takes advantage of the ordered nature of the top level key space. When taking advantage of the ordered nature of the key space, one can query ranges of keys that have particular pattern.

Denormalization

Denormalization, as in, repeating the same piece of data in multiple subspace is common practice. It allows to create secondary representation, also called indices, that will allow to speed up queries.

Higher level abstractions

The following abstraction or databases were built on top Ordered Key-Value Stores:

All those abstraction can co-exist with the same OKVS database and when ACID is supported, the operations happens with the guarantees offered by the transaction system.

Feature matrix

Comparison of several Ordered Key-Value Stores
OKVSLicenseTransactionsDistributedIn-memoryMultiple threadsMultiple processes
Berkeley DB AGPLYesNoNoyesyes
WiredTiger GPLYesNoYesyesno
LevelDB ApacheNoNoNono
Kyoto Cabinet GPLYesNoNono
RocksDB ApacheYesNoNoyesno
SQLite LSM ExtensionPublic domainYes1NoYesyes2yes2
TiKV ApacheYesYesNoyesyes
FoundationDB ApacheYesYesYesyesyes
  1. SQLite LSM extension's transactions can be nested
  2. SQLite LSM extension support multiple readers, and only a single writer that do not block readers

See also

Related Research Articles

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

A table is a collection of related data held in a table format within a database. It consists of columns and rows.

In computing, a solution stack or software stack is a set of software subsystems or components needed to create a complete platform such that no additional software is needed to support applications. Applications are said to "run on" or "run on top of" the resulting platform.

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

NoSQL is an approach to database design that focuses on providing a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Instead of the typical tabular structure of a relational database, NoSQL databases house data within one data structure. Since this non-relational database design does not require a schema, it offers rapid scalability to manage large and typically unstructured data sets. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.

LevelDB is an open-source on-disk key-value store written by Google fellows Jeffrey Dean and Sanjay Ghemawat. Inspired by Bigtable, LevelDB source code is hosted on GitHub under the New BSD License and has been ported to a variety of Unix-based systems, macOS, Windows, and Android.

The following is provided as an overview of and topical guide to databases:

<span class="mw-page-title-main">Apache Drill</span> Open-source software framework

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.

<span class="mw-page-title-main">Key–value database</span> Data storage paradigm

A key–value database, or key–value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, and a data structure more commonly known today as a dictionary or hash table. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to find the data within the database.

Universal Binary JSON (UBJSON) is a computer data interchange format. It is a binary form directly imitating JSON, but requiring fewer bytes of data. It aims to achieve the generality of JSON, combined with being much easier to process than JSON.

Elliptics is a distributed key–value data storage with open source code. By default it is a classic distributed hash table (DHT) with multiple replicas put in different groups. Elliptics was created to meet requirements of multi-datacenter and physically distributed storage locations when storing huge amount of medium and large files.

QAL is an open-source development project that aims to create a collection of libraries for mixing, moving, merging, substituting and transforming data; also in some cases, such as MongoDB, schemas.

Lightning Memory-Mapped Database (LMDB) is an embedded transactional database in the form of a key-value store. LMDB is written in C with API bindings for several programming languages. LMDB stores arbitrary key/data pairs as byte arrays, has a range-based search capability, supports multiple data items for a single key and has a special mode for appending records (MDB_APPEND) without checking for consistency. LMDB is not a relational database, it is strictly a key-value store like Berkeley DB and DBM.

<span class="mw-page-title-main">RocksDB</span> Embedded key-value database

RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree data structure. It is written in C++ and provides official language bindings for C++, C, and Java. Many third-party language bindings exist. RocksDB is free and open-source software, released originally under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license. This change helped its adoption in Apache Software Foundation's projects after blacklist of the previous BSD+Patents license clause.

SurrealDB is an NewSQL open-sourced multi-model database and database-as-a-service platform. The platform is designed to offer a structured yet flexible way of storing and querying data, aiming to simplify the app development process by minimizing the need for building backend APIs and database layers.

References

  1. "BerkeleyDB & other distributed high performance key/value databases - High Scalability -". highscalability.com. Retrieved 2020-01-16.
  2. "11.11. bsddb — Interface to Berkeley DB library — Python 2.7.17 documentation". docs.python.org. Retrieved 2020-01-16.
  3. "sophia - modern transactional key-value/row storage library". sophia.systems. Retrieved 2020-01-16.
  4. "LSM Key/Value Storage in SQLite3".
  5. "Comparing new RocksDB and MMFiles storage engines". ArangoDB. Retrieved 2020-01-16.
  6. "Python API — FoundationDB 6.2". apple.github.io. Retrieved 2020-01-19.
  7. "How FoundationDB Powers Snowflake Metadata Forward | Snowflake Blog". Snowflake. 2018-04-19. Retrieved 2020-01-17.
  8. A record-oriented store built on FoundationDB., FoundationDB, 2020-01-16, retrieved 2020-01-17
  9. "Generic Tuple Store Database". srfi.schemers.org. Retrieved 2020-01-17.
  10. "Generic Tuple Store". GitHub .
  11. A document data model on FoundationDB, implementing MongoDB® wire protocol: FoundationDB/fdb-document-layer, FoundationDB, 2019-12-09, retrieved 2020-01-17
  12. meilisearch/MeiliSearch, MeiliSearch, 2021-06-19, retrieved 2021-06-19
  13. "6.1. GeoMesa Index Structure — GeoMesa 1.3.1 Manuals". www.geomesa.org. Retrieved 2020-01-19.
  14. "The JanusGraph FoundationDB Storage Adapter - Ted Wilmes, Expero Inc". www.youtube.com. Retrieved 2020-01-17.
  15. "Lightning Talk: Entity Store: A FoundationDB Layer for Versioned... - Stephen Pimentel, - YouTube". www.youtube.com. Retrieved 2020-01-17.