ScyllaDB

Last updated
ScyllaDB
Developer(s) ScyllaDB Inc.
Initial releaseSeptember 22, 2015;9 years ago (2015-09-22)
Stable release
ScyllaDB Open Source 6.2.0 / October 28, 2024;7 days ago (2024-10-28)
Repository
Written in C++
Operating system Linux
Type distributed data store
License GNU AGPL
Website www.scylladb.com

ScyllaDB is an open-source distributed NoSQL wide-column data store. It was designed to be compatible with Apache Cassandra while achieving significantly higher throughputs and lower latencies. It supports the same protocols as Cassandra (CQL and Thrift) and the same file formats (SSTable), but is a completely rewritten implementation, using the C++20 language replacing Cassandra's Java, and the Seastar [1] asynchronous programming library replacing classic Linux programming techniques such as threads, shared memory and mapped files. In addition to implementing Cassandra's protocols, ScyllaDB also implements the Amazon DynamoDB API. [2]

Contents

ScyllaDB uses a sharded design on each node, meaning that each CPU core handles a different subset of data. Cores do not share data, but rather communicate explicitly when they need to. The ScyllaDB authors claim that this design allows ScyllaDB to achieve much better performance on modern NUMA SMP machines, and to scale very well with the number of cores. They have measured as much as 2 million requests per second on a single machine, [3] and also claim that a ScyllaDB cluster can serve as many requests as a Cassandra cluster 10 times its size – and do so with lower latencies. [4] Independent testing has not always been able to confirm such 10-fold throughput improvements, and sometimes measured smaller speedups, such as 2x. [5] A 2017 benchmark from Samsung observed the 10x speedup on high-end machines – the Samsung benchmark reported that ScyllaDB outperformed Cassandra on a cluster of 24-core machines by a margin of 10–37x depending on the YCSB workload. [6]

ScyllaDB is available on-premises, on major public cloud providers, or as a DBaaS (ScyllaDB Cloud).

History

ScyllaDB was started in December 2014 by the startup Cloudius Systems (later renamed ScyllaDB Inc.), previously known for having created OSv. Co-founders were Avi Kivity and Dor Laor. ScyllaDB was released as open source in September 2015, [7] under the AGPL license. Employees of ScyllaDB Inc. remain the primary coders behind Scylla, but its development is open to the public and uses public GitHub repositories and public mailing lists.

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Windows, Linux, macOS, FreeBSD, and OpenBSD, and handles a range of workloads from single machines to data warehouses, data lakes, or web services with many concurrent users.

A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node in a computer cluster. The intent is to eliminate contention among nodes. Nodes do not share the same memory or storage.

MySQL Cluster, also known as MySQL Ndb Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

In computing, a solution stack or software stack is a set of software subsystems or components needed to create a complete platform such that no additional software is needed to support applications. Applications are said to "run on" or "run on top of" the resulting platform.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

<span class="mw-page-title-main">Apache Cassandra</span> Free and open-source database management system

Apache Cassandra is a free and open-source database management system designed to handle large volumes of data across multiple commodity servers. Cassandra supports clusters and spanning of multiple data centers, featuring asynchronous and masterless replication. It enables low-latency operations for all clients and incorporates Amazon's Dynamo distributed storage and replication techniques, combined with Google's Bigtable data storage engine model.

Redis is a source-available, in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.

<span class="mw-page-title-main">Node.js</span> JavaScript runtime environment

Node.js is a cross-platform, open-source JavaScript runtime environment that can run on Windows, Linux, Unix, macOS, and more. Node.js runs on the V8 JavaScript engine, and executes JavaScript code outside a web browser.

<span class="mw-page-title-main">Couchbase Server</span> Open-source NoSQL database

Couchbase Server, originally known as Membase, is a source-available, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value, or JSON document access, with low latency and high sustainability throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

<span class="mw-page-title-main">Amazon DynamoDB</span> NoSQL database service

Amazon DynamoDB is a fully managed proprietary NoSQL database offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB offers a fast persistent key–value datastore with built-in support for replication, autoscaling, encryption at rest, and on-demand backup among other features.

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

FoundationDB is a free and open-source multi-model distributed NoSQL database developed by Apple Inc. with a shared-nothing architecture. The product was designed around a "core" database, with additional features supplied in "layers." The core database exposes an ordered key–value store with transactions. The transactions are able to read or write multiple keys stored on any machine in the cluster while fully supporting ACID properties. Transactions are used to implement a variety of data models via layers.

<span class="mw-page-title-main">Apache Spark</span> Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance capabilities of computer programs. It is often used to compare the relative performance of NoSQL database management systems.

<span class="mw-page-title-main">Cosmos DB</span> Cloud-based NoSQL database service

Azure Cosmos DB is a globally distributed, multi-model database service offered by Microsoft. It is designed to provide high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a NoSQL and vector database, which means it can handle unstructured, semi-structured, structured, and vector data types.

<span class="mw-page-title-main">RocksDB</span> Embedded key-value database

RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree data structure. It is written in C++ and provides official language bindings for C++, C, and Java. Many third-party language bindings exist. RocksDB is free and open-source software, released originally under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license. This change helped its adoption in Apache Software Foundation's projects after blacklist of the previous BSD+Patents license clause.

<span class="mw-page-title-main">ClickHouse</span> Open-source database management system

ClickHouse is an open-source column-oriented DBMS for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time. ClickHouse Inc. is headquartered in the San Francisco Bay Area with the subsidiary, ClickHouse B.V., based in Amsterdam, Netherlands.

<span class="mw-page-title-main">Avi Kivity</span> Software developer and entrepreneur

Avi Kivity is a software engineer who created the Kernel-based Virtual Machine (KVM) hypervisor underlying many production clouds. Following his work on KVM, Kivity developed the Seastar framework and the ScyllaDB database. He co-founded the company ScyllaDB with Dor Laor; Kivity is CTO and an active project contributor.

References

  1. Seastar is an advanced, open-source C++ framework for high-performance server applications on modern hardware.
  2. ScyllaDB Secures $25 Million to Open Source Amazon DynamoDB-compatible API
  3. ScyllaDB: Cassandra compatibility at 1.8 million requests per node by Don Marti (then a ScyllaDB Inc. employee), presented at the Fourteenth Annual Southern California Linux Expo, January 24, 2016.
  4. YCSB cluster benchmark, on the ScyllaDB Inc. website, read February 19, 2017.
  5. ScyllaDB vs Cassandra: towards a new myth?, by Marc Alonso and Thomas Mouron on the octo.com website, December 15, 2015.
  6. Rezaei, Arash; Guz, Zvika; Balakrishnan, Vijay (February 2017), ScyllaDB and Samsung NVMe SSDs Accelerate NoSQL Database Performance (PDF), Samsung Semiconductor Inc., p. 12, retrieved 2019-02-07
  7. "Cassandra Rewritten In C++, Ten Times Faster", September 22, 2015, Slashdot