Developer(s) | Oracle |
---|---|
Initial release | November 2004 |
Stable release | 8.4.3 / October 16, 2024 [1] |
Operating system | Cross-platform |
Available in | English |
Type | RDBMS |
License | GNU General Public License (version 2, with linking exception) or commercial EULA |
Website | [2] |
MySQL Cluster, also known as MySQL NDB Cluster, is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near-linear scalability. [3] MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL ("NDB" stands for Network Database).
MySQL Cluster is designed around a distributed, multi-master, ACID-compliant architecture with no single point of failure. It uses automatic sharding (partitioning) to scale out read and write operations on commodity hardware and can be accessed via SQL and non-SQL (NoSQL) APIs.
Internally, MySQL Cluster uses synchronous replication through a two-phase commit mechanism to guarantee that data is written to multiple nodes upon commit. Two copies (known as replicas) of the data are required to guarantee availability. MySQL Cluster automatically creates "node groups" from the number of replicas and data nodes specified by the user. Updates are synchronously replicated between the members of a node group to protect against data loss and to support fast failover between nodes. This cluster-internal replication differs from "MySQL Replication", which is asynchronous.
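A minimal config.ini sketch (host names are hypothetical) showing how the replica count and the number of data nodes determine the node groups: with NoOfReplicas=2 and four data nodes, two node groups of two nodes each are formed, and each group holds both replicas of its share of the data.

```ini
[ndbd default]
# Two synchronous replicas of every partition; data nodes are grouped in pairs.
NoOfReplicas=2

# Four data nodes -> two node groups (the first two nodes, then the next two).
[ndbd]
HostName=datanode1.example.com
[ndbd]
HostName=datanode2.example.com
[ndbd]
HostName=datanode3.example.com
[ndbd]
HostName=datanode4.example.com
```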
It is also possible to replicate asynchronously between clusters; this is sometimes referred to as "MySQL Cluster Replication" or "geographical replication". This is typically used to replicate clusters between data centers for IT disaster recovery or to reduce the effects of network latency by locating data physically closer to a set of users. Unlike standard MySQL replication, MySQL Cluster's geographic replication uses optimistic concurrency control and the concept of Epochs to provide a mechanism for conflict detection and resolution, [4] enabling active/active clustering between data centers.
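As a sketch only, conflict detection and resolution for active/active geographic replication is configured per table through the mysql.ndb_replication control table; the schema follows the documented layout, while the database, table and choice of conflict function below are purely illustrative.

```sql
-- Control table read by the replica applier (create it once if not present).
CREATE TABLE IF NOT EXISTS mysql.ndb_replication (
  db          VARBINARY(63),
  table_name  VARBINARY(63),
  server_id   INT UNSIGNED,
  binlog_type INT UNSIGNED,
  conflict_fn VARBINARY(128),
  PRIMARY KEY (db, table_name, server_id)
) ENGINE=NDBCLUSTER;

-- Apply epoch-based conflict detection to app_db.accounts on every server
-- (server_id = 0); binlog_type 0 keeps the server's default binary logging.
INSERT INTO mysql.ndb_replication
VALUES ('app_db', 'accounts', 0, 0, 'NDB$EPOCH()');
```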
Starting with MySQL Cluster 7.2, synchronous replication between data centers is supported with the Multi-Site Clustering feature. [5]
MySQL Cluster is implemented as a fully distributed multi-master database ensuring updates made by any application or SQL node are instantly available to all of the other nodes accessing the cluster, and each data node can accept write operations.
Data within MySQL Cluster (NDB) tables is automatically partitioned across all of the data nodes in the system, using a hash of each table's primary key; the partitioning is transparent to the end application. Clients can connect to any node in the cluster, and queries automatically access the correct shards needed to satisfy the query or commit the transaction. MySQL Cluster is able to support cross-shard queries and transactions.
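A minimal sketch (table and column names are illustrative) of creating such a table from any SQL node; the rows are then hash-partitioned on the primary key across the data nodes.

```sql
-- Stored in the NDB data nodes rather than in any single MySQL Server, and
-- visible from every SQL node attached to the cluster.
CREATE TABLE user_account (
  user_id INT UNSIGNED NOT NULL,
  email   VARCHAR(255) NOT NULL,
  balance DECIMAL(12,2) NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id)
) ENGINE=NDBCLUSTER;
```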
Users can define their own partitioning schemes. This allows developers to add "distribution awareness" to an application by partitioning on a sub-key that is common to all rows accessed by high-volume transactions. This ensures that the data needed to complete a transaction is localized on the same shard, thereby reducing network hops.
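For example, a transaction that touches all of one user's orders can be kept on a single shard by partitioning on the shared sub-key; the schema below is illustrative.

```sql
-- All rows sharing the same user_id hash to the same partition, so a
-- transaction over one user's orders stays local to one node group.
CREATE TABLE user_order (
  user_id  INT UNSIGNED NOT NULL,
  order_id INT UNSIGNED NOT NULL,
  amount   DECIMAL(12,2) NOT NULL,
  PRIMARY KEY (user_id, order_id)
) ENGINE=NDBCLUSTER
  PARTITION BY KEY (user_id);
```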
MySQL Cluster allows datasets larger than the capacity of a single machine to be stored and accessed across multiple machines.
MySQL Cluster maintains all indexed columns in distributed memory. Non-indexed columns can also be maintained in distributed memory or can be maintained on disk with an in-memory page cache. Storing non-indexed columns on disk allows MySQL Cluster to store datasets larger than the aggregate memory of the clustered machines.
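A sketch of the disk-data DDL (file names and sizes are illustrative): an undo logfile group and a tablespace are created first, and a table is then declared with on-disk storage for its non-indexed columns.

```sql
CREATE LOGFILE GROUP lg_1
  ADD UNDOFILE 'undo_1.log'
  INITIAL_SIZE 128M
  ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
  ADD DATAFILE 'data_1.dat'
  USE LOGFILE GROUP lg_1
  INITIAL_SIZE 512M
  ENGINE NDBCLUSTER;

-- The indexed primary key stays in memory; the non-indexed payload column is
-- held on disk and served through an in-memory page cache.
CREATE TABLE event_log (
  event_id BIGINT UNSIGNED NOT NULL,
  payload  VARCHAR(2000),
  PRIMARY KEY (event_id)
) TABLESPACE ts_1 STORAGE DISK
  ENGINE NDBCLUSTER;
```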
MySQL Cluster writes redo logs to disk for all data changes, and also checkpoints data to disk regularly. This allows the cluster to recover consistently from disk after a full cluster outage. Because the redo logs are written asynchronously with respect to transaction commit, a small number of transactions can be lost if the entire cluster fails; this can be mitigated by using the geographic replication or multi-site clustering discussed above. The current default asynchronous write delay is 2 seconds and is configurable. Normal single-point-of-failure scenarios do not result in any data loss, due to the synchronous data replication within the cluster.
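In current releases this interval is controlled by the TimeBetweenGlobalCheckpoints data node parameter; a sketch of setting it explicitly in config.ini (the value is in milliseconds).

```ini
[ndbd default]
# Interval between global checkpoints (redo flushes); 2000 ms is the default.
TimeBetweenGlobalCheckpoints=2000
```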
When a MySQL Cluster table is maintained in memory, the cluster only accesses disk storage to write redo records and checkpoints. Because these writes are sequential and involve few random-access patterns, MySQL Cluster can achieve higher write throughput rates with limited disk hardware than a traditional disk-based caching RDBMS. This checkpointing of in-memory table data to disk can be disabled (on a per-table basis) if disk-based persistence isn't needed.
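As a sketch, recent releases expose this per-table switch through the NDB_TABLE options in the table comment; the table below is illustrative.

```sql
-- NOLOGGING=1 marks the table as non-logging: its in-memory data is excluded
-- from redo logging and checkpoints, so it is empty after a full cluster restart.
CREATE TABLE session_cache (
  session_id BINARY(16) NOT NULL,
  data       VARBINARY(1024),
  PRIMARY KEY (session_id)
) ENGINE=NDBCLUSTER
  COMMENT='NDB_TABLE=NOLOGGING=1';
```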
MySQL Cluster is designed to have no single point of failure. Provided that the cluster is set up correctly, any single node, system, or piece of hardware can fail without the entire cluster failing. Shared disk (SAN) is not required. The interconnects between nodes can be standard Ethernet, Gigabit Ethernet, InfiniBand, or SCI interconnects.
As MySQL Cluster stores tables in the data nodes, rather than in the MySQL Server, there are multiple interfaces available to access the database: standard SQL through any attached MySQL Server, as well as the NoSQL APIs described below.
Part of the commercial MySQL Cluster CGE, MySQL Cluster Manager is a tool designed to simplify the creation and administration of the MySQL Cluster CGE database by automating common management tasks, including on-line scaling, upgrades, backup/restore and reconfiguration. MySQL Cluster Manager also monitors and automatically recovers MySQL Server application nodes and management nodes, as well as the MySQL Cluster data nodes.
The open source MySQL Ndb Operator simplifies the deployment and operation of MySQL Cluster on a Kubernetes cluster. Ndb Operator deploys containerized MySQL Cluster Data, Management and SQL nodes in a number of StatefulSets with data stored in Persistent Volumes. Kubernetes mechanisms extend the high availability features of MySQL Cluster, for example automatically restoring HA redundancy after hardware failures by migrating pods to new hardware. Operating MySQL Cluster on Kubernetes allows a full stack of cloud native software to be operated in the same way on private or public clouds.
NDB Cluster is the distributed database system underlying MySQL Cluster. It can be used independently of a MySQL Server with users accessing the Cluster via the NDB API (C++). "NDB" stands for Network Database.
From the MySQL Server's perspective, NDB Cluster is a storage engine for storing tables of rows.
From the NDB Cluster perspective, a MySQL Server instance is an API process connected to the Cluster. NDB Cluster can concurrently support access from other types of API processes including Memcached, JavaScript/Node.JS, Java, JPA and HTTP/REST. All API processes can operate on the same tables and data stored in the NDB Cluster.
MySQL Cluster uses the MySQL Server to provide the standard SQL interface and related server capabilities on top of NDB Cluster.
All API processes including the MySQL Server use the NDBAPI [6] C++ client library to connect to the NDB Cluster and perform operations.
MySQL Cluster uses three different types of node (process): data nodes, which store and replicate the data; management nodes, which handle cluster configuration and monitoring; and SQL or API nodes, such as the MySQL Server, through which applications access the cluster.
Generally, it is expected that each node will run on a separate physical host, VM or cloud instance (although it is very common to co-locate Management Nodes with MySQL Servers). For best practice, it is recommended not to co-locate nodes within the same node group on a single physical host (as that would represent a single point of failure).
From the 8.0 release onwards, MySQL Cluster is based directly on the corresponding releases of the MySQL Server. Previously, MySQL Cluster version numbers were decoupled from those of the MySQL Server; for example, MySQL Cluster 7.6 was based on (and contained) the server component from MySQL 5.7.
Higher versions of MySQL Cluster include all of the features of lower versions, plus some new features. The currently available versions are MySQL Cluster 8.0 and 8.4 (the source of the 8.4.3 stable release noted above); older versions, such as the 7.x series, are no longer in development.
For evaluation purposes, it is possible to run MySQL Cluster on a single physical server. For production deployments, the minimum system requirement is three instances or hosts, so that the two data nodes of a node group run on separate machines and the management node does not share a host with either of them; larger deployments place the SQL nodes, data nodes and management nodes on separate hosts, as described above.
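A config.ini sketch of one such minimal layout (host names are hypothetical), with the management node on its own host and a data node plus SQL node sharing each of the other two hosts:

```ini
[ndbd default]
NoOfReplicas=2

# Management node on its own host so it survives the loss of either data host.
[ndb_mgmd]
HostName=mgmt.example.com

# One data node and one SQL node share each of the two remaining hosts.
[ndbd]
HostName=host1.example.com
[ndbd]
HostName=host2.example.com

[mysqld]
HostName=host1.example.com
[mysqld]
HostName=host2.example.com
```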
Tips and recommendations on deploying highly performant, production grade clusters can be found in the MySQL Cluster Evaluation Guide and the Guide to Optimizing Performance of the MySQL Cluster Database.
MySQL AB acquired the technology behind MySQL Cluster from Alzato, a small venture company started by Ericsson. NDB was originally designed for the telecom market, with its high availability and high performance requirements. [14]
MySQL Cluster based on the NDB storage engine has since been integrated into the MySQL product, with its first release being in MySQL 4.1.
MySQL Cluster 7.5 Inside and Out, [15] a book written by Mikael Ronström, the founder of the NDB technology.
Pro MySQL NDB Cluster, [16] a book written by Jesper Wisborg Krogh and Mikiya Okuno, MySQL support engineers.
MySQL Cluster is licensed under the GPLv2 license. Commercial support is available as part of MySQL Cluster CGE, which also includes non-open-source add-ons such as MySQL Cluster Manager and MySQL Enterprise Monitor, in addition to MySQL Enterprise Security and MySQL Enterprise Audit.
MySQL is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the initialism for Structured Query Language. A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer's storage system, manages users, allows for network access and facilitates testing database integrity and creation of backups.
PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation and durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Windows, Linux, macOS, FreeBSD, and OpenBSD, and handles a range of workloads from single machines to data warehouses, data lakes, and web services with many concurrent users.
Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
In database computing, Oracle Real Application Clusters (RAC) — an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i — provides software for clustering and high availability in Oracle database environments. Oracle Corporation includes RAC with the Enterprise Edition, provided the nodes are clustered using Oracle Clusterware.
Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.
An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category.
oVirt is a free, open-source virtualization management platform. It was founded by Red Hat as a community project on which Red Hat Virtualization is based. It allows centralized management of virtual machines, compute, storage and networking resources, from an easy-to-use web-based front-end with platform-independent access. KVM on the x86-64, PowerPC64 and s390x architectures is the only hypervisor supported, but there is an ongoing effort to support the ARM architecture in a future release.
Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL, database management system intended to handle large amounts of data across multiple commodity servers, providing availability with no single point of failure. Cassandra supports clusters and spanning of multiple data centers with asynchronous and master-less replication. It allows low latency operations for all clients and implements Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.
A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load.
Log shipping is the process of automating the backup of transaction log files on a primary (production) database server, and then restoring them onto a standby server. This technique is supported by Microsoft SQL Server, 4D Server, MySQL, and PostgreSQL. Similar to replication, the primary purpose of log shipping is to increase database availability by maintaining a backup server that can replace a production server quickly. Other databases such as Adaptive Server Enterprise and Oracle Database support the technique but require the Database Administrator to write code or scripts to perform the work.
InMage was a computer software company based in the US and India. It marketed a product line called Scout that used continuous data protection (CDP) for backup and replication. Scout consisted of two product lines: the host-offload line, which uses a software agent on the protected servers, and the fabric line, which uses an agent on the Fibre Channel switch fabric. The software protects at the volume or block level, tracking all write changes. It allows for local or remote protection policies. The first version of the product was released in 2002.
DataBlitz is a general purpose main memory database management system, developed by Lucent Bell Labs Research from 1993 to 1995. It replaced various home-grown database products used throughout Lucent beginning in 1997.
Couchbase Server, originally known as Membase, is a source-available, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value, or JSON document access, with low latency and high sustainability throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.
Amazon Relational Database Service is a distributed relational database service by Amazon Web Services (AWS). It is a web service running "in the cloud" designed to simplify the setup, operation, and scaling of a relational database for use in applications. Administration processes like patching the database software, backing up databases and enabling point-in-time recovery are managed automatically. Scaling storage and compute resources can be performed by a single API call to the AWS control plane on-demand. AWS does not offer an SSH connection to the underlying virtual machine as part of the managed service.
eXtremeDB is a high-performance, low-latency, ACID-compliant embedded database management system using an in-memory database system (IMDS) architecture and designed to be linked into C/C++ based programs. It runs on Windows, Linux, and other real-time and embedded operating systems.
Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.
Elliptics is a distributed key–value data store with open source code. By default it is a classic distributed hash table (DHT) with multiple replicas put in different groups. Elliptics was created to meet the requirements of multi-datacenter and physically distributed storage locations when storing huge amounts of medium and large files.
The following outline is provided as an overview of and topical guide to MySQL:
Database scalability is the ability of a database to handle changing demands by adding or removing resources. Databases use a host of techniques to cope. According to Marc Brooker: "a system is scalable in the range where marginal cost of additional workload is nearly constant." Serverless technologies fit this definition, although total cost of ownership, not just infrastructure cost, must be considered.