Ordered Key-Value Store

Last updated

An Ordered Key-Value Store (OKVS) is a type of data storage paradigm that can support multi-model database. An OKVS is an ordered mapping of bytes to bytes. It is a more powerful paradigm than Key-Value Store because OKVS allow to build higher level abstractions without the need to do full scans. An OKVS will keep the key-value pairs sorted by the key lexicographic order. OKVS systems provides different set of features and performance trade-offs. Most of them are shipped as a library without network interfaces, in order to be embedded in another process. Most OKVS support ACID guarantees. Some OKVS are distributed databases. Ordered Key-Value Store found their way into many modern database systems including NewSQL database systems.

Contents

History

The origin of Ordered Key-Value Store stems from the work of Ken Thompson on dbm in 1979. Later in 1991, Berkeley DB was released that featured a B-Tree backend that allowed the keys to stay sorted. Berkeley DB was said to be very fast and made its way into various commercial product. It was included in Python standard library until 2.7. [1] In 2009, Tokyo Cabinet was released that was superseded by Kyoto Cabinet that support both transaction and ordered keys. In 2011, LMDB was created to replace Berkeley DB in OpenLDAP. There is also Google's LevelDB that was forked by Facebook in 2012 as RocksDB. In 2014, WiredTiger, successor of Berkeley DB was acquired by MongoDB and is since 2019 the primary backend of MongoDB database.

Other notable implementation of the OKVS paradigm are Sophia [2] and SQLite3 LSM extension. Another notable use of OKVS paradigm is the multi-model database system called ArangoDB [3] based on RocksDB.

Some NewSQL databases are supported by Ordered Key-Value Stores. JanusGraph, a property graph database, has both a Berkeley DB backend and FoundationDB backend.

Key concepts

Lexicographic encoding

There are algorithms that encode basic data types (boolean, string, number) and composition of those data types inside sorted containers (tuple, list, vector) that preserve their natural ordering. It is possible to work with an Ordered Key-Value Store without having to work directly with bytes. In FoundationDB, it is called the tuple layer. [4]

Range query

Inside an OKVS, keys are ordered, and because of that it is possible to do range queries. A range query allow to retrieve all keys between two keys such as all keys that are fetched are ordered.

Subspaces

Key composition

One can construct key spaces to build higher level abstractions. The idea is to construct keys, that takes advantage of the ordered nature of the top level key space. When taking advantage of the ordered nature of the key space, one can query ranges of keys that have particular pattern.

Denormalization

Denormalization, as in, repeating the same piece of data in multiple subspace is common practice. It allows to create secondary representation, also called indices, that will allow to speed up queries.

Higher level abstractions

The following abstraction or databases were built on top Ordered Key-Value Stores:

All those abstraction can co-exist with the same OKVS database and when ACID is supported, the operations happens with the guarantees offered by the transaction system.

Feature matrix

Comparison of several Ordered Key-Value Stores
OKVSLicenseTransactionsDistributedIn-memoryMultiple threadsMultiple processes
Berkeley DB AGPLYesNoNoyesyes
WiredTiger GPLYesNoYesyesno
LevelDB ApacheNoNoNono
Kyoto Cabinet GPLYesNoNono
RocksDB ApacheYesNoNoyesno
SQLite LSM ExtensionPublic domainYes1NoYesyes2yes2
TiKV ApacheYesYesNoyesyes
FoundationDB ApacheYesYesYesyesyes
  1. SQLite LSM extension's transactions can be nested
  2. SQLite LSM extension support multiple readers, and only a single writer that do not block readers

See also

Related Research Articles

Berkeley DB (BDB) is an embedded database software library for key/value data, historically significant in open-source software. Berkeley DB is written in C with API bindings for many other programming languages. BDB stores arbitrary key/data pairs as byte arrays and supports multiple data items for a single key. Berkeley DB is not a relational database, although it has database features including database transactions, multiversion concurrency control and write-ahead logging. BDB runs on a wide variety of operating systems, including most Unix-like and Windows systems, and real-time operating systems.

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

<span class="mw-page-title-main">GConf</span> Software

GConf was a system used by the GNOME desktop environment for storing configuration settings for the desktop and applications. It is similar to the Windows Registry.

A table is a collection of related data held in a table format within a database. It consists of columns and rows.

Object–relational impedance mismatch is a set of difficulties going between data in relational data stores and data in domain-driven object models. Relational Database Management Systems (RDBMS) is the standard method for storing data in a dedicated database, while object-oriented (OO) programming is the default method for business-centric design in programming languages. The problem lies in neither relational databases nor OO programming, but in the conceptual difficulty mapping between the two logic models. Both logical models are differently implementable using database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.

In computing, a solution stack or software stack is a set of software subsystems or components needed to create a complete platform such that no additional software is needed to support applications. Applications are said to "run on" or "run on top of" the resulting platform.

A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

NoSQL is an approach to database design that focuses on providing a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Instead of the typical tabular structure of a relational database, NoSQL databases house data within one data structure. Since this non-relational database design does not require a schema, it offers rapid scalability to manage large and typically unstructured data sets. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

Redis is a source-available, in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.

LevelDB is an open-source on-disk key-value store written by Google fellows Jeffrey Dean and Sanjay Ghemawat. Inspired by Bigtable, LevelDB source code is hosted on GitHub under the New BSD License and has been ported to a variety of Unix-based systems, macOS, Windows, and Android.

<span class="mw-page-title-main">Apache Drill</span> Open-source software framework

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.

<span class="mw-page-title-main">Key–value database</span> Data storage paradigm

A key–value database, or key–value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, and a data structure more commonly known today as a dictionary or hash table. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to find the data within the database.

Elliptics is a distributed key–value data storage with open source code. By default it is a classic distributed hash table (DHT) with multiple replicas put in different groups. Elliptics was created to meet requirements of multi-datacenter and physically distributed storage locations when storing huge amount of medium and large files.

Lightning Memory-Mapped Database (LMDB) is an embedded transactional database in the form of a key-value store. LMDB is written in C with API bindings for several programming languages. LMDB stores arbitrary key/data pairs as byte arrays, has a range-based search capability, supports multiple data items for a single key and has a special mode for appending records (MDB_APPEND) without checking for consistency. LMDB is not a relational database, it is strictly a key-value store like Berkeley DB and DBM.

The following outline is provided as an overview of and topical guide to MySQL:

<span class="mw-page-title-main">RocksDB</span> Embedded key-value database

RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree data structure. It is written in C++ and provides official language bindings for C++, C, and Java. Many third-party language bindings exist. RocksDB is free and open-source software, released originally under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license. This change helped its adoption in Apache Software Foundation's projects after blacklist of the previous BSD+Patents license clause.

References

  1. "11.11. bsddb — Interface to Berkeley DB library — Python 2.7.17 documentation". docs.python.org. Retrieved 2020-01-16.
  2. "sophia - modern transactional key-value/row storage library". sophia.systems. Retrieved 2020-01-16.
  3. "Comparing new RocksDB and MMFiles storage engines". ArangoDB. Retrieved 2020-01-16.
  4. "Python API — FoundationDB 6.2". apple.github.io. Retrieved 2020-01-19.
  5. A record-oriented store built on FoundationDB., FoundationDB, 2020-01-16, retrieved 2020-01-17
  6. "Generic Tuple Store Database". srfi.schemers.org. Retrieved 2020-01-17.
  7. "Generic Tuple Store". GitHub .
  8. A document data model on FoundationDB, implementing MongoDB® wire protocol: FoundationDB/fdb-document-layer, FoundationDB, 2019-12-09, retrieved 2020-01-17
  9. meilisearch/MeiliSearch, MeiliSearch, 2021-06-19, retrieved 2021-06-19
  10. "6.1. GeoMesa Index Structure — GeoMesa 1.3.1 Manuals". www.geomesa.org. Retrieved 2020-01-19.
  11. "The JanusGraph FoundationDB Storage Adapter - Ted Wilmes, Expero Inc". www.youtube.com. Retrieved 2020-01-17.
  12. "Lightning Talk: Entity Store: A FoundationDB Layer for Versioned... - Stephen Pimentel, - YouTube". www.youtube.com. Retrieved 2020-01-17.