TokuMX

Last updated
TokuMX
Developer(s) Tokutek
Stable release
2.0.0 / 30 September 2014;8 years ago (2014-09-30)
Repository
Type Database
License GNU Affero General Public License (version 3) [1]
Website www.tokutek.com

TokuMX is an open-source distribution of MongoDB [2] which, among other functions, replaces the default B-tree data structure found in the basic MongoDB distribution with a fractal tree index. It is a drop-in replacement for MongoDB (applications will run "as is") that offers the scalability and performance improvements associated with fractal tree indexing. It also adds support for document-level locking, transaction support with ACID and MVCC, and replication optimization; it does not support full-text search.

Contents

TokuMX is specifically designed for high performance on write-intensive workloads. It achieves this using a fractal tree index, [3] which replaces 40-year-old B-tree indexing and is based on cache-oblivious algorithms. This approach to building memory-efficient systems was originally jointly developed by researchers at the Massachusetts Institute of Technology, [4] Rutgers University, [5] and the State University of New York at Stony Brook (SUNY). [6] TokuMX is a scalable, ACID, and MVCC compliant distribution of MongoDB that provides indexing-based query improvements, offers online schema modifications, and reduces slave lag for both hard disk drives and flash memory. It also adds transactions with MVCC and ACID reliability to any MongoDB application, making MongoDB suitable for a much wider range of solutions. [7]

Most TokuMX source files are made available under the terms of the GNU Affero General Public License (AGPL). The TokuKV Fractal Tree Indexing library is made available under the terms of the GNU General Public License (GPL) version 2, with an additional grant of a patent license.

B-trees

Most relational databases use indexes to increase query performance. Databases can leverage indexes to significantly reduce the amount of data they examine while responding to queries. Indexes are commonly implemented with B-trees, a data structure first described in 1970. The B-tree data structure allows for operations like inserting data and sorted order iteration, the primary operation used by an index. Depending on the workload and implementation, B-tree performance can be limited by the random I/O characteristics of disks. In addition, while freshly loaded databases tend to have good sequential behavior, this behavior becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges.

With the advent of big data and the ever-increasing database needs of the 21st century, many niche databases have been created to get around the limitations of 50-year-old B-tree indexing. These include some optimized for reads, some optimized for writes, and a series of other special-purpose databases designed for a narrow problem set. [8]

Fractal tree indexes

Overview

Fractal tree indexing technology is a new approach to indexing that replaces B-trees.

Fractal tree indexes implement the same operations as a B-tree, and thus are a drop-in replacement for B-trees. Fractal tree indexes effectively replaces small, frequent writes with larger, less frequent ones, enabling better compression and insertion performance. [9] [10] Fractal trees also allow for messages to be injected into the tree in such a fashion that schema changes such as adding or dropping a column, or adding an index can be done online and in the background. [11] As a result, more indexes can be maintained without a drop in performance. This is because adding data to indexes tends to stress the performance of B-trees, but performs well in fractal tree indexes. [12] Fractal tree index modifications do not cause the database files to fragment, so periodic maintenance to compact files is not needed. [13]

Uses

Fractal tree indexes can be applied to a number of applications characterized by near-real time analysis of streaming data. They can be used as the storage layer of a database or as the storage layer of a file system. When used in a database, they can be used in any setting where a B-tree is used, with improved performance. Examples include: network event management, online advertising networks, web 2.0 and clickstream analytics, and air traffic control management. [14] Other uses include accelerated crawler performance for search engines for social media sites. It can also be used to create indexes and columns online, enabling query flexibility for ecommerce personalization. It is also suited to improving performance and reducing existing loads on transactional websites. In general it performs well in applications that must simultaneously store log file data and execute ad hoc queries.

See also

Related Research Articles

<span class="mw-page-title-main">MySQL</span> SQL database engine software

MySQL is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language programmers use to create, modify and extract data from the relational database, as well as control user access to the database. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer's storage system, manages users, allows for network access and facilitates testing database integrity and creation of backups.

<span class="mw-page-title-main">Navicat</span> SQL database management software

Navicat is a series of graphical database management and development software produced by CyberTech Ltd. for MySQL, MariaDB, MongoDB, Oracle, SQLite, PostgreSQL and Microsoft SQL Server. It has an Explorer-like graphical user interface and supports multiple database connections for local and remote databases. Its design is made to meet the needs of a variety of audiences, from database administrators and programmers to various businesses/companies that serve clients and share information with partners.

Archive is a storage engine for the MySQL relational database management system. Users can use this analytic storage engine to create a table that is “archive” only. Data cannot be deleted from this table, only added. The Archive engine uses a compression strategy based on the zlib library and it packs the rows using a bit header to represent nulls and removes all whitespace for character type fields. When completed, the row is inserted into the compression buffer and flushed to disk by an explicit flush table, a read, or the closing of the table.

Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

<span class="mw-page-title-main">Drizzle (database server)</span>

Drizzle is a discontinued free software/open-source relational database management system (DBMS) that was forked from the now-defunct 6.0 development branch of the MySQL DBMS.

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load.

MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL) which is deemed non-free by several distributions. MongoDB is a member of the MACH Alliance.

A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but the name "NoSQL" was only coined in the early 21st century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called Not only SQL to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

TokuDB is an open-source, high-performance storage engine for MySQL and MariaDB. It achieves this by using a fractal tree index. It is scalable, ACID and MVCC compliant, provides indexing-based query improvements, offers online schema modifications, and reduces replication lag for both hard disk drives and flash memory.

eXtremeDB is a high performance, low-latency, ACID-compliant embedded database management system using an in-memory database system (IMDS) architecture and designed to be linked into C/C++ based programs. It works on Windows, Linux, and other real-time and embedded operating systems.

<span class="mw-page-title-main">SingleStore</span>

SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.

<span class="mw-page-title-main">Apache Drill</span> Open-source software framework

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system, also productized as BigQuery. Drill is an Apache top-level project. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.

In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree Bε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.

The following outline is provided as an overview of and topical guide to MySQL:

<span class="mw-page-title-main">Martin Farach-Colton</span> American computer scientist

Martin Farach-Colton is an American computer scientist, known for his work in streaming algorithms, suffix tree construction, pattern matching in compressed data, cache-oblivious algorithms, and lowest common ancestor data structures. He is a Distinguished Professor of computer science at Rutgers University, and a co-founder of storage technology startup company Tokutek.

<span class="mw-page-title-main">RocksDB</span>

RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit many CPU cores, and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree data structure. It is written in C++ and provides official language bindings for C++, C, and Java; alongside many third-party language bindings. RocksDB is open-source software, and was originally released under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license, possibly in response to the Apache Software Foundation's blacklist of the previous BSD+Patents license clause.

Amazon DocumentDB is a managed proprietary NoSQL database service that supports document data structures, with some compatibility with MongoDB version 3.6 and version 4.0. As a document database, Amazon DocumentDB can store, query, and index JSON data. It is available on Amazon Web Services. As of March 2023, AWS introduced some compliance with MongoDB 5.0, but lacks time series collection support.

Michael A. Bender is an American computer scientist, known for his work in cache-oblivious algorithms, lowest common ancestor data structures, scheduling (computing), and pebble games. He is David R. Smith Leading Scholar professor of computer science at Stony Brook University, and a co-founder of storage technology startup company Tokutek.

References

  1. "TokuMX README" . Retrieved 2014-03-19.
  2. "TokuMX – High-Performance MongoDB Distribution". Tokutek. Retrieved 2014-03-10.
  3. "How TokuDB Fractal Tree Databases Work". O'Reilly. Retrieved 2011-01-17.
  4. "Cache-Oblivious Search Trees Project". Massachusetts Institute of Technology. Retrieved 2011-01-17.
  5. "Cache-Oblivious B-trees" (PDF). Rutgers University. Retrieved 2011-01-17.
  6. "Cache Oblivious B-trees". State University of New York (SUNY) at Stony Brook. Retrieved 2011-01-17.
  7. "TokuMX is MongoDB on steroids". Percona. Retrieved 2014-04-30.
  8. "Cache Oblivious B-trees". State University of New York (SUNY) at Stony Brook. Retrieved 2011-01-17.
  9. "TokuMX VS MongoDB Bake Off Based on a Primary AOL Use case" (PDF). Meetup / AOL. Retrieved 2014-04-30.
  10. "Insert benchmark for InnoDB, MongoDB and TokuMX and flash storage" . Retrieved 2014-04-30.
  11. "Covering Indexes: Orders-of-Magnitude Improvements" (PDF). Percona. Retrieved 2011-01-17.
  12. "Detailed review of Tokutek storage engine". Percona. Retrieved 2012-02-22.
  13. "NoSQL Battle of the East Coast - Benchmarking MongoDB vs TokuMX Cluster". SeveralNines. Retrieved 2014-04-30.
  14. "Air traffic queries in MyISAM and Tokutek (TokuDB)". MySQL Performance Blog. Retrieved 2011-01-17.