TokuDB

Developer(s): Percona
Stable release: 7.5.5 [1] / January 29, 2015 [citation needed]
Type: Database engine
License: GNU General Public License (version 2) [2]
Website: Percona TokuDB

TokuDB is an open-source, high-performance storage engine for MySQL and MariaDB. It achieves its performance by using a fractal tree index. It is scalable, ACID- and MVCC-compliant, provides index-based query improvements, supports online schema modifications, and reduces replication lag on both hard disk drives and flash memory.

TokuDB is included in Percona Server, MariaDB, and the Nagios-based OpMon. However, it is deprecated in Percona Server 8 and MariaDB 10.5.

Fractal tree indexes

Overview

TokuDB uses a fractal tree index, a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree, but with insertions and deletions that are asymptotically faster. Fractal trees also allow messages to be injected into the tree so that schema changes (such as adding or dropping a column, or adding an index) can be done online and in the background. [3] As a result, more indexes can be maintained without a drop in performance: adding data to indexes tends to stress B-trees, whereas fractal tree indexes handle it well. [4]
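The buffering idea can be illustrated with a toy example. The sketch below is not TokuDB's implementation; it is a minimal two-level index, written for illustration only, in which the root holds a buffer of pending messages and pushes them to the leaves in batches. The class name `BufferedIndex`, the `buffer_capacity` parameter, and the `leaf_writes` counter (a stand-in for disk writes) are all assumptions made for this example.

```python
import bisect


class BufferedIndex:
    """Toy two-level write-buffered index (illustrative, not TokuDB's code)."""

    def __init__(self, pivots, buffer_capacity=8):
        self.pivots = sorted(pivots)      # keys that separate the leaves
        self.leaves = [dict() for _ in range(len(self.pivots) + 1)]
        self.buffer = []                  # pending (op, key, value) messages
        self.buffer_capacity = buffer_capacity
        self.leaf_writes = 0              # simulated number of leaf writes

    def _leaf_for(self, key):
        # Index of the leaf whose key range contains `key`.
        return bisect.bisect_right(self.pivots, key)

    def _enqueue(self, message):
        # Writes only touch the in-memory buffer until it fills up.
        self.buffer.append(message)
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def insert(self, key, value):
        self._enqueue(("put", key, value))

    def delete(self, key):
        self._enqueue(("del", key, None))

    def flush(self):
        # Group pending messages by destination leaf, then apply each group
        # in arrival order, charging one simulated write per touched leaf.
        by_leaf = {}
        for message in self.buffer:
            by_leaf.setdefault(self._leaf_for(message[1]), []).append(message)
        for leaf_index, messages in by_leaf.items():
            leaf = self.leaves[leaf_index]
            for op, key, value in messages:
                if op == "put":
                    leaf[key] = value
                else:
                    leaf.pop(key, None)
            self.leaf_writes += 1
        self.buffer.clear()

    def get(self, key):
        # The newest buffered message for a key overrides the leaf contents.
        for op, buffered_key, value in reversed(self.buffer):
            if buffered_key == key:
                return value if op == "put" else None
        return self.leaves[self._leaf_for(key)].get(key)

    def scan(self):
        # Sequential access: flush pending messages, then walk leaves in order.
        self.flush()
        for leaf in self.leaves:
            for key in sorted(leaf):
                yield key, leaf[key]


index = BufferedIndex(pivots=[100, 200], buffer_capacity=8)
for key in (5, 150, 7, 250, 9, 160, 11, 260):
    index.insert(key, f"row-{key}")
print(index.leaf_writes)  # 3 leaf writes for 8 inserts
```

In this toy model, eight inserts cost three leaf writes instead of eight, because each flush carries several changes to the same leaf at once. A real fractal tree index applies the same idea at every level of a multi-level tree, and schema-change messages travel through the same buffers.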

Uses

Fractal tree indexes suit applications characterized by near-real-time analysis of streaming data. They can serve as the storage layer of a database or of a file system. In a database, they can be used in any setting where a B-tree is used, with improved performance. Examples include network event management, online advertising networks, clickstream analytics, and air traffic control management. [5] Other uses include accelerated crawler performance for search engines for social media sites. They can also be used to create indexes and columns online, enabling query flexibility for e-commerce personalization, and to improve performance and reduce load on transactional websites. In general, they perform well in applications that must simultaneously store log file data and execute ad hoc queries.

Origins

This approach to building memory-efficient systems was originally developed jointly by researchers at the Massachusetts Institute of Technology, [6] [7] Rutgers University, [8] and Stony Brook University. [9]

Role in the big data market

TokuDB has been named as one of the technologies that enable big data in MySQL. [10] Tokutek was a Startup Showcase finalist at the O'Reilly Strata Conference 2012 on big data. [11]

References

  1. "Release Notes". Retrieved 2015-10-20.
  2. "Percona Server COPYING". Retrieved 2015-12-17.
  3. "Covering Indexes: Orders-of-Magnitude Improvements" (PDF). Percona. Retrieved 2011-01-17.
  4. "Detailed review of Tokutek storage engine". Percona. Retrieved 2012-02-22.
  5. "Air traffic queries in MyISAM and Tokutek (TokuDB)". MySQL Performance Blog. Retrieved 2011-01-17.
  6. "How TokuDB Fractal Tree Databases Work". O'Reilly. Retrieved 2011-01-17.
  7. "Cache-Oblivious Search Trees Project". Massachusetts Institute of Technology. Retrieved 2011-01-17.
  8. "Cache-Oblivious B-trees" (PDF). Rutgers University. Retrieved 2011-01-17.
  9. "Cache Oblivious B-trees". State University of New York (SUNY) at Stony Brook. Retrieved 2011-01-17.
  10. "Big Data is Creating The Future - It's A $50 Billion Market". Forbes. Retrieved 2012-05-21.
  11. "Strata 2012 Startup Showcase". O'Reilly. Retrieved 2012-05-21.