Partition (database)

[Image: Partitioning options on a table in MySQL in the environment of the Adminer tool.]

A partition is a division of a logical database or its constituent elements into distinct independent parts. Database partitioning refers to intentionally breaking a large database into smaller ones for scalability purposes, distinct from network partitions, which are a type of network fault between nodes.[1] In a partitioned database, each piece of data belongs to exactly one partition, effectively making each partition a small database of its own.[1] Database partitioning is normally done for manageability, performance, or availability reasons,[2] or for load balancing. It is popular in distributed database management systems, where each partition may be spread over multiple nodes, and users of a node perform local transactions on the partition. This improves performance for sites that regularly run transactions involving certain views of the data, while maintaining availability and security.


Partitioning enables distribution of datasets across multiple disks and of query load across multiple processors. For queries that operate on a single partition, each node executes queries independently on its local partition, enabling query throughput to scale linearly with additional nodes. Larger, more complex queries can be parallelized across multiple nodes, though this presents additional challenges.[1]
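As an illustration, the following is a minimal sketch (not any particular system's API) of hash-based routing, in which a single-key lookup touches exactly one node; the node names and helper functions are hypothetical.

```python
import hashlib

# Hypothetical nodes; each holds only its own partition of the data.
NODES = ["node0", "node1", "node2", "node3"]
partitions = {node: {} for node in NODES}

def route(key: str) -> str:
    """Map a key to exactly one node, so each record has one home."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def put(key: str, value: object) -> None:
    partitions[route(key)][key] = value

def get(key: str) -> object:
    # A single-key query touches only the node that owns the key,
    # which is why throughput can scale with the number of nodes.
    return partitions[route(key)].get(key)

put("user:42", {"name": "Ada"})
print(get("user:42"))  # served entirely by one partition
```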

History

Database partitioning emerged in the 1980s with systems like Teradata and NonStop SQL. The approach was later adopted by NoSQL databases and Hadoop-based data warehouses. While implementations vary between transactional and analytical workloads, the core principles of partitioning remain consistent across both use cases.[1]

Terminology

Different databases use varying terminology for partitioning: what is called a partition here is known as a shard in MongoDB, Elasticsearch, and SolrCloud; a region in HBase; a tablet in Bigtable; a vnode in Cassandra and Riak; and a vBucket in Couchbase.[1]

Partitioning and replication

Partitioning is commonly implemented alongside replication, storing partition copies across multiple nodes. Each record belongs to one partition but may exist on multiple nodes for fault tolerance. In leader-follower replication systems, nodes can simultaneously serve as leaders for some partitions and followers for others.[1]
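The following is a minimal sketch of such a layout, assuming a hypothetical four-node cluster, four partitions, and a replication factor of two; the round-robin placement is illustrative, not a specific system's algorithm.

```python
NODES = ["n1", "n2", "n3", "n4"]
REPLICATION_FACTOR = 2
NUM_PARTITIONS = 4

def assign(num_partitions: int, nodes: list[str], rf: int) -> dict[int, dict]:
    """Round-robin placement: replica 0 of each partition is its leader."""
    layout = {}
    for p in range(num_partitions):
        replicas = [nodes[(p + i) % len(nodes)] for i in range(rf)]
        layout[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return layout

for pid, roles in assign(NUM_PARTITIONS, NODES, REPLICATION_FACTOR).items():
    print(pid, roles)
# n1 leads partition 0 and follows partition 3, and so on: every node
# is a leader for some partitions and a follower for others.
```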

Load balancing and hot spots

Partitioning aims to distribute data and query load evenly across nodes. With ideal distribution, system capacity scales linearly with added nodes: ten nodes should process ten times the data and throughput of a single node. Uneven distribution, termed skew, reduces partitioning efficiency. Partitions with excessive load are called hot spots.[1]

Several strategies address hot spots:[1]

- Using a hash of the key as the partitioning key, which spreads adjacent keys across partitions, although it cannot remove skew caused by a single extremely hot key.
- Splitting a known hot key by appending a random number to the beginning or end of the key, which scatters its writes across many partitions at the cost of reads having to fetch and combine all the resulting sub-keys (a minimal sketch of this technique follows the list).
- Leaving skew handling to the application, since most systems cannot automatically detect and compensate for hot spots.
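A minimal sketch of the key-splitting strategy, assuming a hypothetical application-level helper; the suffix width and key names are illustrative.

```python
import random

SPLIT_FACTOR = 100  # number of sub-keys a hot key is split into

def write_key(hot_key: str) -> str:
    """Scatter writes for a hot key across SPLIT_FACTOR sub-keys."""
    return f"{hot_key}:{random.randrange(SPLIT_FACTOR):02d}"

def read_keys(hot_key: str) -> list[str]:
    """Reads must now fetch and combine all sub-keys."""
    return [f"{hot_key}:{i:02d}" for i in range(SPLIT_FACTOR)]

print(write_key("celebrity_user"))       # e.g. celebrity_user:37
print(len(read_keys("celebrity_user")))  # 100 sub-keys to merge
```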

Partitioning criteria

Current high-end relational database management systems provide for different criteria to split the database. They take a partitioning key and assign a partition based on certain criteria. Some common criteria include:

- Range partitioning: selects a partition by determining whether the partitioning key falls within a certain range. An example could be a partition for all rows where the ZIP code column has a value between 70000 and 79999.
- List partitioning: a partition is assigned an explicit list of values; if the partitioning key has one of these values, the partition is chosen. For example, all rows where the country column is Iceland, Norway, Sweden, Finland, or Denmark could form a partition for the Nordic countries.
- Hash partitioning: applies a hash function to some attribute, yielding the partition number.
- Composite partitioning: allows for certain combinations of the above schemes, for example first applying range partitioning and then hash partitioning.
- Round-robin partitioning: distributes rows evenly across partitions regardless of their values, ensuring uniform data distribution.
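As an illustration, the following minimal sketch shows how the range, list, and hash criteria above might map a partitioning key to a partition; the partition names and thresholds are hypothetical, and real systems declare such rules in DDL rather than application code.

```python
def range_partition(zipcode: int) -> str:
    # Range: the key falls into a contiguous interval.
    return "p_low" if zipcode < 50000 else "p_high"

def list_partition(country: str) -> str:
    # List: the key matches an enumerated set of values.
    nordic = {"Iceland", "Norway", "Sweden", "Finland", "Denmark"}
    return "p_nordic" if country in nordic else "p_other"

def hash_partition(customer_id: int, num_partitions: int = 4) -> str:
    # Hash: a hash of the key spreads rows evenly across partitions.
    return f"p_{hash(customer_id) % num_partitions}"

print(range_partition(30301), list_partition("Norway"), hash_partition(42))
```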

In any partitioning scheme, data is typically arranged so that each piece of data (record, row, or document) belongs to exactly one partition.[1] While some databases support operations that span multiple partitions, this single-partition association is fundamental to the partitioning concept.

Partitioning methods

The partitioning can be done by either building separate smaller databases (each with its own tables, indices, and transaction logs), or by splitting selected elements, for example just one table.

Horizontal partitioning

Horizontal partitioning involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. A view with a union over the two partition tables can then be created to provide a complete view of all customers.
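A minimal sketch of this split, using the CustomersEast and CustomersWest tables from the example above; the in-memory "tables" and helper functions are illustrative stand-ins for real tables and a UNION view.

```python
customers_east: list[dict] = []
customers_west: list[dict] = []

def insert_customer(row: dict) -> None:
    # The ZIP code decides which partition table receives the row.
    (customers_east if row["zip"] < 50000 else customers_west).append(row)

def all_customers() -> list[dict]:
    # Equivalent to a view defined as a UNION over both partitions.
    return customers_east + customers_west

insert_customer({"name": "Ada", "zip": 30301})
insert_customer({"name": "Grace", "zip": 98101})
print(all_customers())
```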

Vertical partitioning

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns.[2] Generally, this practice is known as normalization, but vertical partitioning extends further and splits columns even when they are already normalized. This type of partitioning is also called "row splitting", since rows get split by their columns, and it might be performed explicitly or implicitly. Distinct physical machines might be used to realize vertical partitioning: for example, storing infrequently used or very wide columns that take up significant space on a different machine is a form of vertical partitioning. A common form of vertical partitioning is to split static data from dynamic data, since the former is faster to access than the latter, particularly for a table where the dynamic data is used less often than the static. Creating a view across the two newly created tables restores the original table with a performance penalty, but accessing the static data alone shows higher performance. A columnar database can be regarded as a database that has been vertically partitioned until each column is stored in its own table.
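A minimal sketch of row splitting, assuming a hypothetical table with small static columns and one wide, rarely read column; the stores and helper functions are illustrative.

```python
static_cols = {}   # id -> {"name", "zip"}: small, frequently read
dynamic_cols = {}  # id -> {"profile_blob"}: wide, rarely read

def insert(row_id: int, name: str, zip_code: int, profile_blob: bytes) -> None:
    # The two column groups are stored separately, keyed by the same id.
    static_cols[row_id] = {"name": name, "zip": zip_code}
    dynamic_cols[row_id] = {"profile_blob": profile_blob}

def full_row(row_id: int) -> dict:
    # A view joining both vertical partitions restores the original row,
    # at the cost of touching two stores instead of one.
    return {**static_cols[row_id], **dynamic_cols[row_id]}

insert(1, "Ada", 30301, b"...")
print(static_cols[1])  # fast path: static data only
print(full_row(1))     # slower path: reassembled row
```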


Related Research Articles

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.

Distributed hash table

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
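As an illustration of the key-to-node mapping, the following is a minimal sketch using a consistent-hash ring, one common DHT technique, so that adding or removing a node relocates only a small share of the keys; the node names and hash choice are illustrative.

```python
import bisect, hashlib

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Each node owns an arc of the hash space ending at its point.
        self.points = sorted((h(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        # The key is owned by the first node clockwise from its hash.
        i = bisect.bisect(self.points, (h(key), "")) % len(self.points)
        return self.points[i][1]

ring = Ring(["nodeA", "nodeB", "nodeC"])
print(ring.lookup("some-key"))  # any participant can compute the owner
```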

Extract, transform, load

Extract, transform, load (ETL) is a three-phase computing process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collected from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules either as single jobs or aggregated into a batch of jobs.

In distributed data storage, a P-Grid is a self-organizing structured peer-to-peer system, which can accommodate arbitrary key distributions while still providing storage load-balancing and efficient search by using randomized routing.

MySQL Cluster, also known as MySQL Ndb Cluster, is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

In a database, a table is a collection of related data organized in table format, consisting of columns and rows.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

Data (computer science)

In computer science, data is any sequence of one or more symbols; a datum is a single symbol of data. Data requires interpretation to become information. Digital data is data that is represented using the binary number system of ones (1) and zeros (0), instead of analog representation. In modern (post-1960) computer systems, all data is digital.

SAP IQ is a column-based, petabyte-scale relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, it is primarily used to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

Database model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and disadvantages. The most commonly used forms are B-trees and ISAM. Such forms or structures are one aspect of the overall schema used by a database engine to store information.

Apache Cassandra

Apache Cassandra is a free and open-source database management system designed to handle large volumes of data across multiple commodity servers. The system prioritizes availability and scalability over consistency, making it particularly suited for systems with high write throughput requirements due to its LSM tree indexing storage layer. As a wide-column database, Cassandra supports flexible schemas and efficiently handles data models with numerous sparse columns. The system is optimized for applications with well-defined data access patterns that can be incorporated into the schema design. Cassandra supports computer clusters which may span multiple data centers, featuring asynchronous and masterless replication. It enables low-latency operations for all clients and incorporates Amazon's Dynamo distributed storage and replication techniques, combined with Google's Bigtable data storage engine model.

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard may be held on a separate database server instance, to spread load.

Standard column family

The standard column family is a NoSQL object that contains columns of related data. It is a key–value pair in which the key maps to a value that is a set of columns. In analogy with relational databases, a standard column family is akin to a "table", with each key–value pair being a "row". Each column is a tuple consisting of a column name, a value, and a timestamp. In a relational database table, this data would be grouped together within a table with other non-related data.
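A minimal sketch of this structure, with a row key mapping to a set of (name, value, timestamp) column tuples; the names and data are illustrative.

```python
import time

# row key -> {column name -> (name, value, timestamp)}
column_family: dict[str, dict[str, tuple]] = {}

def put_column(row_key: str, name: str, value: str) -> None:
    column_family.setdefault(row_key, {})[name] = (name, value, time.time())

put_column("user:1", "email", "ada@example.com")
put_column("user:1", "city", "London")
print(column_family["user:1"])  # one "row": a keyed set of columns
```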

Within database management systems, the record columnar file or RCFile is a data placement structure that determines how to store relational tables on computer clusters. It is designed for systems using the MapReduce framework. The RCFile structure includes a data storage format, data compression approach, and optimization techniques for data reading. It is able to meet all four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to dynamic data access patterns.

Sherpa is a cloud storage platform developed by Yahoo!. It is a hosted, distributed, and geographically replicated key-value data store. The service is a NoSQL system that addresses the scalability, availability, and latency needs of the conglomerate's websites. Sherpa offers features such as elastic growth, multi-tenancy, global footprint for local low-latency access, asynchronous replication, representational state transfer (REST) based web service APIs, novel per-record consistency knobs, high availability, compression, secondary indexes, and record-level replication.

Amazon DynamoDB

Amazon DynamoDB is a managed NoSQL database service provided by Amazon Web Services (AWS). It supports key-value and document data structures and is designed to handle a wide range of applications requiring scalability and performance.

Oracle NoSQL Database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service offered by Microsoft. It is designed to provide high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a NoSQL and vector database, which means it can handle unstructured, semi-structured, structured, and vector data types.

Database scalability is the ability of a database to handle changing demands by adding or removing resources. Databases use a host of techniques to cope. According to Marc Brooker: "a system is scalable in the range where marginal cost of additional workload is nearly constant." Serverless technologies fit this definition, although total cost of ownership, not just infrastructure cost, must be considered.

References

  1. Kleppmann, Martin (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media. pp. 199–200. ISBN 9781491903100.
  2. Navathe, Shamkant; Ceri, Stefano; Wiederhold, Gio; Dou, Jinglie (1984). "Vertical Partitioning Algorithms for Database Design". Stanford University.