YCSB


The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance capabilities of computer programs. It is often used to compare the relative performance of NoSQL database management systems.
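As an illustration of what such an evaluation involves, the following is a minimal sketch of a mixed retrieval and update workload run against a toy in-memory store. It is not YCSB's own code or interface: the function `run_workload`, the `read_fraction` parameter, the `user…` key names, the uniform key selection, and the use of a Python dictionary as the "database" are all assumptions made for illustration.

```python
import random
import time


def run_workload(store, operations, read_fraction, seed=0):
    """Run a mixed read/update workload against a dict-like store.

    Hypothetical helper, not part of YCSB. Returns the number of
    reads, the number of updates, and the elapsed time in seconds.
    """
    rng = random.Random(seed)          # fixed seed keeps runs repeatable
    keys = list(store)
    reads = updates = 0
    start = time.perf_counter()
    for _ in range(operations):
        key = rng.choice(keys)         # uniform key choice (an assumption)
        if rng.random() < read_fraction:
            _ = store[key]             # retrieval
            reads += 1
        else:
            store[key] = rng.getrandbits(32)   # maintenance (update)
            updates += 1
    elapsed = time.perf_counter() - start
    return reads, updates, elapsed


# Populate a toy in-memory store with 1,000 records.
store = {f"user{i}": 0 for i in range(1000)}

# Issue 10,000 operations, roughly half reads and half updates.
reads, updates, elapsed = run_workload(store, 10_000, read_fraction=0.5)
throughput = (reads + updates) / elapsed if elapsed > 0 else float("inf")
print(f"{reads} reads, {updates} updates, {throughput:.0f} ops/sec")
```

Swapping the dictionary for a client of a real data store, and varying `read_fraction`, would give a rough comparison across systems under the same operation mix, which is the general idea behind a serving benchmark.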

The original benchmark was developed by staff in the research division of Yahoo!, who released it in 2010 with the stated goal of "facilitating performance comparisons of the new generation of cloud data serving systems", particularly for transaction-processing workloads that differed from those measured by benchmarks designed for more traditional database management systems. [1]

YCSB has been contrasted with the TPC-H benchmark from the Transaction Processing Performance Council, with YCSB characterized as a big data benchmark and TPC-H as a decision support system benchmark. [2]

YCSB has been used by DBMS vendors for "benchmark marketing". [3] It has been used in scholarly or tutorial discussions, particularly for Apache HBase. [4] [5] It has been used for multiple-product comparisons by industry and academic observers such as Network World (comparing Cassandra, MongoDB, and Riak), [6] Thumbtack Technologies (comparing Aerospike, Cassandra, Couchbase, and MongoDB), [7] and researchers at the Polytechnic Institute of Coimbra and the University of Coimbra (comparing Cassandra, HBase, Elasticsearch, MongoDB, Oracle NoSQL, OrientDB, Redis, Scalaris, Tarantool, and Voldemort). [8] SanDisk Corporation published results measured on the Oracle NoSQL Database. [9]


References

  1. Cooper, Brian F; et al. "Benchmarking cloud serving systems with YCSB" (PDF). Yahoo Research.
  2. Barata, Melyssa; Bernardino, Jorge; Furtado, Pedro (June 27, 2014). "YCSB and TPC-H: Big Data and Decision Support Benchmarks". 2014 IEEE International Congress on Big Data. IEEE. pp. 800–801. doi:10.1109/BigData.Congress.2014.128. ISBN 978-1-4799-5057-7. S2CID 10756715.
  3. Monash, Curt. "YCSB benchmark notes". Monash Research.
  4. Dey, Akon; Nambiar, Raghunath; Fekete, Alan; Röhm, Uwe. "YCSB+T: Benchmarking web-scale transactional databases" (PDF). IEEE.
  5. Jiang, Lifeng (2012). HBase Administration Cookbook. Packt Publishing.
  6. Bushik, Sergey (2012-10-22). "A vendor-independent comparison of NoSQL databases". Network World.
  7. Abel, Avram. "NoSQL Benchmark Compares Aerospike, Cassandra, Couchbase and MongoDB". InfoQ.
  8. Abramova, Veronika; Bernardino, Jorge; Furtado, Pedro. "Experimental Evaluation of NoSQL Databases" (PDF). International Journal of Database Management Systems.
  9. "Oracle NoSQL Database Cluster YCSB Testing with Fusion ioMemory Storage" (PDF). June 15, 2016. Retrieved September 20, 2016.