Original author(s) | Dhruba Borthakur |
---|---|
Developer(s) | Meta Platforms (was Facebook, Inc.) |
Initial release | May 2012 |
Stable release | |
Repository | |
Written in | C++ |
Operating system | Windows, macOS, Linux, FreeBSD, OpenBSD, Solaris, AIX |
Platform | Cross-platform |
Type | Embedded database |
License | Apache 2.0 or GPL 2 |
Website | rocksdb |
RocksDB is a high performance [2] [3] [4] [5] [6] embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree (LSM tree) data structure. It is written in C++ and provides official language bindings for C++, C, and Java. Many third-party language bindings exist. RocksDB is free and open-source software, released originally under a BSD 3-clause license. [7] [8] [9] However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license. [10] This change helped its adoption in Apache Software Foundation's projects after blacklist of the previous BSD+Patents license clause. [11] [12]
RocksDB is used in production systems at various web-scale enterprises [13] including Facebook, Yahoo!, [14] and LinkedIn. [15]
RocksDB, like LevelDB, stores keys and values in arbitrary byte arrays, and data is sorted byte-wise by key or by providing a custom comparator.
RocksDB provides all of the features of LevelDB, plus:
and others. [26]
RocksDB is not an SQL database (although MyRocks combines RocksDB with MySQL). Like other NoSQL and dbm stores, it has no relational data model, and it does not support SQL queries. Also, it has no direct support for secondary indexes, however a user may build their own internally using Column Families or externally. Applications use RocksDB as a library, as it provides no server or command-line interface.
RocksDB was created at Facebook by Dhruba Borthakur [27] [28] in April 2012, as a fork of LevelDB with the initial stated goal of improving performance for server workloads. [29] [30]
As an embeddable database, RocksDB can be used as a storage engine within a larger database management system (DBMS). For example, Rockset uses RocksDB [31] mostly for analytical data processing.
The following projects have been started to replace or offer alternative storage engines for already-established database systems with RocksDB:
ArangoDB has added RocksDB to its previous storage engine ("mmfiles"). [32] RocksDB is be the default storage engine since ArangoDB 3.4. [33]
Cassandra on RocksDB can improve the performance of Apache Cassandra significantly (3–4 times faster in general, 100 times faster in some use-cases).[ citation needed ] The Instagram team at Facebook developed and open-sourced their code, along with benchmarks of their performance results. [34]
MariaDB can use the MyRocks storage engine (which is forked from RocksDB) since MariaDB 10.2.5 (Alpha status) [35] and stable since MariaDB 10.2.16 in 2018. [36]
The MongoRocks project provides a storage module for MongoDB where the storage engine is RocksDB. [37] [38] [39]
A related program is Rocks Strata, a tool written in Go, which allows managing incremental backups of MongoDB when RocksDB is used as the storage engine. [40]
The MyRocks project created a new RocksDB-based storage engine for MySQL. [41] [42] In-depth details about MyRocks were presented at Percona Live 2016. [43]
Oxigraph [44] is a graph database implementing the SPARQL standard, based on RocksDB
The UKV [45] project allows users to use RocksDB on par with LevelDB as the underlying key-value store. It represents a shared abstraction for create, read, update and delete (CRUD) operations common to every storage engine. It augments it with structured bindings for several high-level languages, including Python, Java, and Go.
The following database systems and applications have chosen to use RocksDB as their embedded storage engine:
The Ceph's BlueStore storage layer uses RocksDB for metadata management in OSD devices. [46]
Apache Flink uses RocksDB to store checkpoints. [47]
FusionDB [48] uses RocksDB as its storage engine for XML, Key/Value, and JSON. [49]
LogDevice's LogsDB is built atop RocksDB. [50]
Kafka Streams uses RocksDB for its state stores. [51]
The Manhattan Distributed Key-Value Store has used RocksDB as its primary engine to store Twitter data since 2018. [52]
The Rockset [53] service that is used for operational data analytics uses RocksDB as its storage engine. [54]
The ssdb-rocks [55] project uses RocksDB as the storage engine for the SSDB [56] NoSQL Database.
The TiDB [57] project uses RocksDB as its storage engine. [58]
The YugabyteDB database uses a modified version of RocksDB as part of its DocDB storage engine. [59]
Third-party programming language bindings available for RocksDB include:
The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.
A time series database is a software system that is optimized for storing and serving time series through associated pairs of time(s) and value(s). In some fields, time series may be called profiles, curves, traces or trends. Several early time series databases are associated with industrial applications which could efficiently store measured values from sensory equipment, but now are used in support of a much wider range of applications. In many cases, the repositories of time-series data will utilize compression algorithms to manage the data efficiently. Although it is possible to store time-series data in many different database types, the design of these systems with time as a key index is distinctly different from relational databases which reduce discrete relationships through referential models.
A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.
An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:
HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database product, MongoDB utilizes JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and current versions are licensed under the Server Side Public License (SSPL). MongoDB is a member of the MACH Alliance.
NoSQL is an approach to database design that focuses on providing a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Instead of the typical tabular structure of a relational database, NoSQL databases house data within one data structure. Since this non-relational database design does not require a schema, it offers rapid scalability to manage large and typically unstructured data sets. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.
LevelDB is an open-source on-disk key-value store written by Google fellows Jeffrey Dean and Sanjay Ghemawat. Inspired by Bigtable, LevelDB source code is hosted on GitHub under the New BSD License and has been ported to a variety of Unix-based systems, macOS, Windows, and Android.
A cloud database is a database that typically runs on a cloud computing platform and access to the database is provided as-a-service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service, maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.
The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance capabilities of computer programs. It is often used to compare the relative performance of NoSQL database management systems.
SequoiaDB is a multi-model NewSQL database.
Presto is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata, and allows use of multiple data sources within a query. Presto is community-driven open-source software released under the Apache License.
MyRocks is open-source software developed at Facebook in order to use MySQL features with RocksDB implementations. It is based on Oracle MySQL 5.6.
DBeaver is a SQL client software application and a database administration tool. For relational databases it uses the JDBC application programming interface (API) to interact with databases via a JDBC driver. For other databases (NoSQL) it uses proprietary database drivers. It provides an editor that supports code completion and syntax highlighting. It provides a plug-in architecture that allows users to modify much of the application's behavior to provide database-specific functionality or features that are database-independent. It is written in Java and based on the Eclipse platform.
TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. Designed to be MySQL compatible, it is developed and supported primarily by PingCAP and licensed under Apache 2.0. It is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.
An Ordered Key-Value Store (OKVS) is a type of data storage paradigm that can support multi-model database. An OKVS is an ordered mapping of bytes to bytes. An OKVS will keep the key-value pairs sorted by the key lexicographic order. OKVS systems provides different set of features and performance trade-offs. Most of them are shipped as a library without network interfaces, in order to be embedded in another process. Most OKVS support ACID guarantees. Some OKVS are distributed databases. Ordered Key-Value Store found their way into many modern database systems including NewSQL database systems.
Trino is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. Trino can query data lakes that contain open column-oriented data file formats like ORC or Parquet residing on different storage systems like HDFS, AWS S3, Google Cloud Storage, or Azure Blob Storage using the Hive and Iceberg table formats. Trino also has the ability to run federated queries that query tables in different data sources such as MySQL, PostgreSQL, Cassandra, Kafka, MongoDB and Elasticsearch. Trino is released under the Apache License.
HammerDB is an open source database benchmarking application developed by Steve Shaw. HammerDB supports databases such as Oracle, SQL Server, Db2, MySQL and MariaDB. HammerDB is written in TCL and C, and is licensed under the GPL v3.
{{cite web}}
: Missing or empty |title=
(help)... The story of why we decided to do RocksDB ...