Ordered Key-Value Store

Last updated December 03, 2024

An Ordered Key-Value Store (OKVS) is a data storage paradigm that can support multi-model databases. An OKVS is an ordered mapping of bytes to bytes: it keeps key-value pairs sorted by the lexicographic order of their keys. OKVS systems provide different sets of features and performance trade-offs. Most of them are shipped as a library without a network interface, in order to be embedded in another process. Most OKVS support ACID guarantees, and some are distributed databases. Ordered Key-Value Stores have found their way into many modern database systems, including NewSQL database systems.

History

The origin of the Ordered Key-Value Store stems from the work of Ken Thompson on dbm in 1979. Later, in 1991, Berkeley DB was released; it featured a B-tree backend that allowed the keys to stay sorted. Berkeley DB was said to be very fast and made its way into various commercial products. It was included in the Python standard library until 2.7.^[1] In 2009, Tokyo Cabinet was released; it was superseded by Kyoto Cabinet, which supports both transactions and ordered keys. In 2011, LMDB was created to replace Berkeley DB in OpenLDAP. There is also Google's LevelDB, which was forked by Facebook in 2012 as RocksDB. In 2014, WiredTiger, a successor of Berkeley DB, was acquired by MongoDB, and since 2019 it has been the primary backend of the MongoDB database.

Other notable implementations of the OKVS paradigm are Sophia^[2] and the SQLite3 LSM extension. Another notable use of the OKVS paradigm is the multi-model database system ArangoDB,^[3] which is based on RocksDB.

Some NewSQL databases are supported by Ordered Key-Value Stores. JanusGraph, a property graph database, has both a Berkeley DB backend and FoundationDB backend.

Key concepts

Lexicographic encoding

There are algorithms that encode basic data types (boolean, string, number), and compositions of those data types inside ordered containers (tuple, list, vector), into bytes in a way that preserves their natural ordering. This makes it possible to work with an Ordered Key-Value Store without having to work directly with bytes. In FoundationDB, this is called the tuple layer.^[4]
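As an illustration only, the following Python sketch shows one way such an encoding can work. It is not FoundationDB's tuple layer: the function names `encode_item` and `pack` and the type tag values are assumptions made for this example, and only strings and unsigned 64-bit integers are handled. Integers are stored big-endian at a fixed width, and strings are terminated by a zero byte with embedded zero bytes escaped, so that comparing the resulting bytes gives the same order as comparing the original tuples.

```python
import struct

# Hypothetical type tags; they only need to be used consistently.
TAG_STR = b"\x02"
TAG_INT = b"\x15"


def encode_item(item):
    """Encode one item so that byte order matches the item's natural order."""
    if isinstance(item, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(item, int):
        if not 0 <= item < 2**64:
            raise ValueError("only unsigned 64-bit integers are handled")
        # Fixed-width big-endian: numeric order equals byte order.
        return TAG_INT + struct.pack(">Q", item)
    if isinstance(item, str):
        raw = item.encode("utf-8")
        # Escape embedded zero bytes, then terminate with a zero byte so
        # that a string always sorts before any string it is a prefix of.
        return TAG_STR + raw.replace(b"\x00", b"\x00\xff") + b"\x00"
    raise TypeError("unsupported type: " + type(item).__name__)


def pack(items):
    """Encode a tuple by concatenating the encodings of its items."""
    return b"".join(encode_item(item) for item in items)


rows = [("user", 256), ("user", 2), ("us", 99), ("user\x00x", 1), ("user", 10)]
by_value = sorted(rows)            # order of the original tuples
by_bytes = sorted(rows, key=pack)  # order of the encoded keys
```

In this sketch `by_value` and `by_bytes` come out in the same order. A naive textual encoding would not have this property, since the bytes `b"10"` sort before `b"2"`.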

Range query

Inside an OKVS, keys are ordered, which makes range queries possible. A range query retrieves all keys between two given keys, and the keys are returned in sorted order.
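A range query can be sketched with a toy in-memory store. The class `MemoryOKVS` and its methods are hypothetical names chosen for this example and do not correspond to the API of any store mentioned in this article; a real OKVS would typically use a B-tree or a log-structured merge-tree rather than a sorted Python list.

```python
import bisect


class MemoryOKVS:
    """Toy in-memory ordered mapping of bytes to bytes (illustration only)."""

    def __init__(self):
        self._keys = []    # keys, kept in lexicographic order
        self._values = {}  # key -> value

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = MemoryOKVS()
for name in [b"cherry", b"apple", b"date", b"banana"]:
    store.set(name, name.upper())

# All keys from b"b" (inclusive) up to b"d" (exclusive), in sorted order.
result = list(store.range(b"b", b"d"))
```

Here `result` holds the pairs for `b"banana"` and `b"cherry"`, in that order. The half-open interval (start included, end excluded) is a design choice of this sketch.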

Subspaces

Key composition

One can construct key spaces to build higher-level abstractions. The idea is to construct keys that take advantage of the ordered nature of the top-level key space, so that one can query ranges of keys that share a particular pattern, such as a common prefix.
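The following sketch illustrates key composition with a prefix scan over a sorted list of keys. The helpers `make_key`, `prefix_end` and `scan_prefix` are hypothetical, and the key layout (parts joined with a zero-byte terminator, assuming no part contains a zero byte) is an assumption made for this example rather than the convention of a particular store.

```python
import bisect


def make_key(*parts):
    """Compose a key from parts, each followed by a 0x00 terminator.

    Assumes that no part contains a 0x00 byte.
    """
    return b"".join(part + b"\x00" for part in parts)


def prefix_end(prefix):
    """Return the smallest byte string greater than every key with this prefix."""
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        raise ValueError("prefix has no upper bound")
    return stripped[:-1] + bytes([stripped[-1] + 1])


def scan_prefix(sorted_keys, prefix):
    """Return all keys starting with prefix, using two binary searches."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix_end(prefix))
    return sorted_keys[lo:hi]


keys = sorted([
    make_key(b"user", b"42", b"name"),
    make_key(b"user", b"42", b"email"),
    make_key(b"user", b"420", b"name"),
    make_key(b"user", b"7", b"name"),
    make_key(b"order", b"1001", b"total"),
])

user_42 = scan_prefix(keys, make_key(b"user", b"42"))  # fields of user 42 only
all_users = scan_prefix(keys, make_key(b"user"))       # every user key
```

Because each part is terminated, the scan for user `42` does not pick up the keys of user `420`, and the keys of one user sit next to each other in the key space.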

Denormalization

Denormalization, that is, repeating the same piece of data in multiple subspaces, is common practice. It makes it possible to create secondary representations, also called indices, that speed up queries.
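A minimal sketch of a secondary index follows. A plain dictionary stands in for the store, and the key layout (`user/<id>` for records, `city/<city>/<id>` for the index) and the function names are assumptions made for this example. With a real OKVS the lookup would be a range scan over the index subspace instead of a filter over all keys, and both writes would usually happen in one transaction.

```python
import json

store = {}  # stand-in for an OKVS; sorted(store) emulates key order


def put_user(user_id, name, city):
    """Write the record and its index entry (hypothetical key layout)."""
    # Primary representation: user/<id> -> JSON record.
    record = json.dumps({"name": name, "city": city}).encode("utf-8")
    store[b"user/" + user_id] = record
    # Secondary representation: the city is repeated inside an index key.
    store[b"city/" + city.encode("utf-8") + b"/" + user_id] = b""


def users_in_city(city):
    """Find user names by city through the index, without reading every record."""
    prefix = b"city/" + city.encode("utf-8") + b"/"
    ids = [key[len(prefix):] for key in sorted(store) if key.startswith(prefix)]
    return [json.loads(store[b"user/" + uid])["name"] for uid in ids]


put_user(b"1", "Ada", "Paris")
put_user(b"2", "Bob", "Lyon")
put_user(b"3", "Cy", "Paris")

in_paris = users_in_city("Paris")
```

The sketch does not handle updates: changing a user's city would also require deleting the old index entry, which is the usual cost of denormalization.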

Higher level abstractions

The following abstractions or databases have been built on top of Ordered Key-Value Stores:

Timeseries database
Record database,^[5] also known as a row store database, which behaves similarly to an RDBMS
Tuple stores, also known as triple stores or quad stores, and also generic tuple stores^[6]^[7]
Document database^[8] that mimics the MongoDB API
Full-text search^[9]
Geographic Information Systems^[10]
Property Graph^[11]
Versioned Data^[12]
Vector space database for Approximate Nearest Neighbor^[13]

All those abstractions can co-exist within the same OKVS database, and when ACID is supported, the operations happen with the guarantees offered by the transaction system.

Feature matrix

Comparison of several Ordered Key-Value Stores
OKVS	License	Transactions	Distributed	In-memory	Multiple threads	Multiple processes	Nested Transactions
Berkeley DB	AGPL	Yes	No	No	Yes	Yes	Yes
FoundationDB	Apache	Yes	Yes	Yes	Yes	Yes	No
Kyoto Cabinet	GPL	Yes	No	No		No	No
LevelDB	Apache	No	No	No		No	No
LMDB	OpenLDAP	Yes	No	No	Yes	Yes	Yes
RocksDB	Apache	Yes	No	No	Yes	No	No
SQLite LSM Extension	Public domain	Yes	No	Yes	Yes	Yes	Yes
TiKV	Apache	Yes	Yes	No	Yes	Yes	No
WiredTiger	GPL	Yes	No	Yes	Yes	No	No

References

[1] "11.11. bsddb — Interface to Berkeley DB library — Python 2.7.17 documentation". docs.python.org. Retrieved 2020-01-16.
[2] "sophia - modern transactional key-value/row storage library". sophia.systems. Retrieved 2020-01-16.
[3] "Comparing new RocksDB and MMFiles storage engines". ArangoDB. Retrieved 2020-01-16.
[4] "Python API — FoundationDB 6.2". apple.github.io. Retrieved 2020-01-19.
[5] A record-oriented store built on FoundationDB, FoundationDB, 2020-01-16, retrieved 2020-01-17.
[6] "Generic Tuple Store Database". srfi.schemers.org. Retrieved 2020-01-17.
[7] "Generic Tuple Store". GitHub.
[8] A document data model on FoundationDB, implementing MongoDB® wire protocol: FoundationDB/fdb-document-layer, FoundationDB, 2019-12-09, retrieved 2020-01-17.
[9] meilisearch/MeiliSearch, MeiliSearch, 2021-06-19, retrieved 2021-06-19.
[10] "6.1. GeoMesa Index Structure — GeoMesa 1.3.1 Manuals". www.geomesa.org. Retrieved 2020-01-19.
[11] "The JanusGraph FoundationDB Storage Adapter - Ted Wilmes, Expero Inc". www.youtube.com. Retrieved 2020-01-17.
[12] "Lightning Talk: Entity Store: A FoundationDB Layer for Versioned... - Stephen Pimentel, - YouTube". www.youtube.com. Retrieved 2020-01-17.
[13] meilisearch/arroy, Meilisearch, 2024-08-05, retrieved 2024-08-06.

