YCSB

The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance capabilities of computer programs. It is often used to compare the relative performance of NoSQL database management systems.

The original benchmark was developed by researchers in Yahoo!'s research division, who released it in 2010 with the stated goal of "facilitating performance comparisons of the new generation of cloud data serving systems", particularly for transaction-processing workloads that differed from those measured by benchmarks designed for more traditional database management systems. [1]
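To illustrate the kind of workload YCSB measures, the following is a minimal Python sketch of a YCSB-style driver. It is not the actual YCSB code, which is written in Java and reaches each database through a per-system binding. The sketch mimics the benchmark's two phases: a load phase that inserts records, and a run phase that issues a mix of reads and updates on keys drawn from a skewed (Zipfian) distribution, roughly the shape of YCSB's update-heavy core workload A (a 50/50 read/update mix). The `InMemoryStore` class, the `run_workload` and `zipfian_cum_weights` functions, their parameters, and the skew value `theta=0.99` are all hypothetical. The dictionary-backed store stands in for a real database.

```python
import random
import time
from collections import Counter


class InMemoryStore:
    """Hypothetical stand-in for a database: records kept in a plain dict."""

    def __init__(self):
        self.data = {}

    def insert(self, key, fields):
        self.data[key] = dict(fields)

    def read(self, key):
        return self.data.get(key)

    def update(self, key, fields):
        if key not in self.data:
            return False
        self.data[key].update(fields)
        return True


def zipfian_cum_weights(n, theta=0.99):
    """Cumulative weights where rank r (1..n) has weight 1 / r**theta.

    theta is an assumed skew parameter; larger values make low ranks hotter.
    """
    total = 0.0
    cum = []
    for rank in range(1, n + 1):
        total += 1.0 / rank ** theta
        cum.append(total)
    return cum


def run_workload(store, record_count=1000, operation_count=10000,
                 read_proportion=0.5, seed=42):
    """Load records, then run a read/update mix; return op counts and ops/sec."""
    rng = random.Random(seed)
    keys = [f"user{i}" for i in range(record_count)]

    # Load phase: insert every record once.
    for key in keys:
        store.insert(key, {"field0": "x" * 10})

    # Run phase: draw skewed keys up front (low-rank keys are "hot").
    cum = zipfian_cum_weights(record_count)
    chosen = rng.choices(keys, cum_weights=cum, k=operation_count)
    counts = Counter()
    start = time.perf_counter()
    for key in chosen:
        if rng.random() < read_proportion:
            store.read(key)
            counts["read"] += 1
        else:
            store.update(key, {"field0": "y" * 10})
            counts["update"] += 1
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return counts, operation_count / elapsed


if __name__ == "__main__":
    counts, throughput = run_workload(InMemoryStore())
    print(dict(counts), f"{throughput:,.0f} ops/sec")
```

Changing `read_proportion` (for example, 0.95 for a read-mostly mix or 1.0 for read-only) changes the workload mix. A real comparison would replace the in-memory store with a client for each system under test and report latency as well as throughput.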

YCSB has been contrasted with the TPC-H benchmark from the Transaction Processing Performance Council, with YCSB characterized as a big data benchmark and TPC-H as a decision support system benchmark. [2]

YCSB has been used by DBMS vendors for "benchmark marketing". [3] It has also appeared in scholarly and tutorial discussions, particularly of Apache HBase. [4] [5] Industry observers and researchers have used it for multiple-product comparisons, including Network World (comparing Cassandra, MongoDB, and Riak), [6] Thumbtack Technologies (comparing Aerospike, Cassandra, Couchbase, and MongoDB), [7] and the Polytechnic Institute and University of Coimbra (comparing Cassandra, HBase, Elasticsearch, MongoDB, Oracle NoSQL, OrientDB, Redis, Scalaris, Tarantool, and Voldemort). [8] SanDisk Corporation published results measured on the Oracle NoSQL Database. [9]

Implementations

References

  1. Cooper, Brian F; et al. "Benchmarking cloud serving systems with YCSB" (PDF). Yahoo Research.
  2. Barata, Melyssa; Bernardino, Jorge; Furtado, Pedro; et al. (June 27, 2014). "YCSB and TPC-H: Big Data and Decision Support Benchmarks". 2014 IEEE International Congress on Big Data. IEEE. pp. 800–801. doi:10.1109/BigData.Congress.2014.128. ISBN 978-1-4799-5057-7. S2CID 10756715.
  3. Monash, Curt. "YCSB benchmark notes". Monash Research.
  4. Dey, Akon; Nambiar, Raghunath; Fekete, Alan; Röhm, Uwe. "YCSB+T: Benchmarking web-scale transactional databases" (PDF). IEEE.
  5. Jiang, Lifeng (2012). HBase Administration Cookbook. Packt Publishing.
  6. Bushik, Sergey (October 22, 2012). "A vendor-independent comparison of NoSQL databases". Network World.
  7. Avram, Abel. "NoSQL Benchmark Compares Aerospike, Cassandra, Couchbase and MongoDB". InfoQ.
  8. Abramova, Veronika; Bernardino, Jorge; Furtado, Pedro. "Experimental Evaluation of NoSQL Databases" (PDF). International Journal of Database Management Systems.
  9. "Oracle NoSQL Database Cluster YCSB Testing with Fusion ioMemory Storage" (PDF). June 15, 2016. Retrieved September 20, 2016.