Gizzard (Scala framework)

Gizzard
Original author(s) Robey Pointer, Nick Kallen, Ed Ceaser, Matt Freels, John Kalucki from Twitter
Developer(s) Twitter
Initial release April 2010
Final release 3.0.2 / March 9, 2012 [1]
Repository
Written in Scala, Java
Type Database
License Apache License 2.0
Website github.com/twitter/gizzard

Gizzard was an open-source sharding framework for building custom fault-tolerant, distributed databases. It was initially used by Twitter and emerged from a wide variety of data storage problems. Gizzard operated as a middleware networking service that ran on the Java Virtual Machine. It managed the partitioning of data across arbitrary backend datastores, which allowed the data to be accessed efficiently. [2] [3]

The partitioning rules were stored in a forwarding table that mapped key ranges to partitions. Each partition managed its own replication through a declarative replication tree. Gizzard handled both physical and logical shards: physical shards pointed to a physical database backend, whereas logical shards were trees of other shards. [4]

In addition, Gizzard supported migrations and handled failures gracefully. The system was made eventually consistent by requiring that all write operations be idempotent and commutative. Because such writes give the same result when applied more than once or in a different order, operations that failed were simply retried at a later time. Gizzard is available on GitHub and licensed under the Apache License 2.0.
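The ideas in this description (a forwarding table over key ranges, logical shards as trees of physical shards, and retryable idempotent, commutative writes) can be illustrated with a small sketch. It is written in Python for brevity and is not Gizzard's actual Scala API. Every class and function name here (`PhysicalShard`, `ReplicatingShard`, `ForwardingTable`, `drain_retries`) is hypothetical. Resolving conflicts by keeping the entry with the highest timestamp is an assumption, chosen as one simple way to make writes idempotent and commutative.

```python
import bisect


class PhysicalShard:
    """Leaf shard: stands in for one database backend (here, a dict)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}          # key -> (timestamp, value)
        self.available = True   # toggled to simulate a backend outage

    def write(self, key, value, timestamp):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        # Assumed rule: keep the largest (timestamp, value) pair. Applying a
        # write twice, or applying writes in another order, gives the same state.
        current = self.rows.get(key)
        if current is None or (timestamp, value) > current:
            self.rows[key] = (timestamp, value)

    def read(self, key):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        entry = self.rows.get(key)
        return None if entry is None else entry[1]


class ReplicatingShard:
    """Logical shard: a tree node that fans writes out to its child shards."""

    def __init__(self, children, retry_queue):
        self.children = children
        self.retry_queue = retry_queue

    def write(self, key, value, timestamp):
        for child in self.children:
            try:
                child.write(key, value, timestamp)
            except ConnectionError:
                # Remember the failed write so it can be replayed later.
                self.retry_queue.append((child, key, value, timestamp))

    def read(self, key):
        for child in self.children:
            try:
                return child.read(key)
            except ConnectionError:
                continue
        raise ConnectionError("no replica available")


class ForwardingTable:
    """Maps key ranges to shards; each entry owns keys from its lower bound
    up to (but not including) the next entry's lower bound."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda entry: entry[0])
        self.bounds = [lower for lower, _ in self.entries]

    def lookup(self, key):
        index = bisect.bisect_right(self.bounds, key) - 1
        if index < 0:
            raise KeyError(f"no shard covers key {key}")
        return self.entries[index][1]


def drain_retries(retry_queue):
    """Replay queued writes; writes that fail again are re-queued."""
    pending = list(retry_queue)
    retry_queue.clear()
    for shard, key, value, timestamp in pending:
        try:
            shard.write(key, value, timestamp)
        except ConnectionError:
            retry_queue.append((shard, key, value, timestamp))


# Usage: two replicas serve keys below 1000, a single backend serves the rest.
retries = []
a1, a2, b1 = PhysicalShard("a1"), PhysicalShard("a2"), PhysicalShard("b1")
low = ReplicatingShard([a1, a2], retries)
high = ReplicatingShard([b1], retries)
table = ForwardingTable([(0, low), (1000, high)])

a2.available = False                              # simulate an outage
table.lookup(42).write(42, "old", timestamp=1)    # a2 write is queued
table.lookup(42).write(42, "new", timestamp=2)    # a2 write is queued
a2.available = True
drain_retries(retries)                            # a2 catches up with a1
```

In this sketch the order in which the queued writes are replayed does not matter. Whichever write reaches `a2` first, the replica ends up holding the entry with the higher timestamp, and that is why the retry-later strategy is safe.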

Related Research Articles

Database: Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

MySQL: SQL database engine software

MySQL is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer's storage system, manages users, allows for network access and facilitates testing database integrity and creation of backups.

Database design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate. With this information, they can begin to fit the data to the database model. A database management system manages the data accordingly.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

Btrieve is a database developed by Pervasive Software. The architecture of Btrieve has been designed with record management in mind. This means that Btrieve only deals with the underlying record creation, data retrieval, record updating and data deletion primitives. Together with the MicroKernel Database Engine it uses ISAM, Indexed Sequential Access Method, as its underlying storage mechanism.

Partition (database)

A partition is a division of a logical database or its constituent elements into distinct independent parts. Database partitioning is normally done for manageability, performance or availability reasons, or for load balancing. It is popular in distributed database management systems, where each partition may be spread over multiple nodes, with users at the node performing local transactions on the partition. This increases performance for sites that have regular transactions involving certain views of data, whilst maintaining availability and security.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

A grid file system is a computer file system whose goal is improved reliability and availability by taking advantage of many smaller file storage areas.

Apache Cassandra: Free and open-source database management system

Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra was designed to implement a combination of Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load.

FlockDB was an open-source distributed, fault-tolerant graph database for managing wide but shallow network graphs. It was initially used by Twitter to store relationships between users, such as followings and favorites. FlockDB differed from other graph databases, such as Neo4j, in that it was not designed for multi-hop graph traversal but rather for rapid set operations, not unlike the primary use case for Redis sets. FlockDB was posted on GitHub shortly after Twitter released its Gizzard framework, which it used to query the FlockDB distributed datastore. The database is licensed under the Apache License.

Amazon DynamoDB: NoSQL database service

Amazon DynamoDB is a fully managed proprietary NoSQL database offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB offers a fast persistent Key-Value Datastore with built-in support for replication, autoscaling, encryption at rest, and on-demand backup among other features.

Oracle NoSQL Database: Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

FoundationDB is a free and open-source multi-model distributed NoSQL database developed by Apple Inc. with a shared-nothing architecture. The product was designed around a "core" database, with additional features supplied in "layers." The core database exposes an ordered key–value store with transactions. The transactions are able to read or write multiple keys stored on any machine in the cluster while fully supporting ACID properties. Transactions are used to implement a variety of data models via layers.

Elliptics is an open-source distributed key–value data store. By default it is a classic distributed hash table (DHT) with multiple replicas placed in different groups. Elliptics was created to meet the requirements of multi-datacenter and physically distributed storage locations when storing huge numbers of medium and large files.

Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases use a host of techniques to cope.

TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is designed to be MySQL compatible. It is developed and supported primarily by PingCAP and licensed under Apache 2.0, though it is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers.

YugabyteDB: Transactional distributed SQL database

YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte.

YDB is a distributed SQL database management system (DBMS) developed by Yandex, available as open-source technology.

References

  1. "Releases · twitter-archive/gizzard". github.com. Retrieved 2021-04-10.
  2. "English (US)".
  3. "Twitter Open Sources New Distributed Database Solution, Gizzard".
  4. "Gizzard - Twitter just sharded".