Block Range Index

Last updated

A Block Range Index or BRIN is a database indexing technique. They are intended to improve performance with extremely large [lower-roman 1] tables.

Contents

BRIN indexes provide similar benefits to horizontal partitioning or sharding but without needing to explicitly declare partitions. [1]

A BRIN is applicable to an index on a table that is large and where the index key value is easily sorted and evaluated with a MinMax function. [lower-roman 2]

BRIN were originally proposed by Alvaro Herrera of 2ndQuadrant in 2013 as 'Minmax indexes'. [2] Implementations thus far are tightly coupled to internal implementation and storage techniques for the database tables. This makes them efficient, but limits them to particular vendors. So far PostgreSQL is the only vendor to have announced a live product with this specific feature, in PostgreSQL 9.5. [3] [4] Other vendors have described some similar features, [2] including Oracle, [5] [6] Netezza 'zone maps', [7] Infobright 'data packs', [8] MonetDB [9] and Apache Hive with ORC/Parquet. [10]

Design

B-tree index structure B-tree.svg
B-tree index structure
BRIN index structure BRIN index.svg
BRIN index structure

BRIN operate by "summarising" large blocks of data into a compact form, which can be efficiently tested to exclude many of them from a database query, early on. These tests exclude a large block of data for each comparison. By reducing the data volume so early on, both by representing large blocks as small tuples, and by eliminating many blocks, BRIN substantially reduce the amount of detailed data that must be examined by the database node on a row-by-row basis. [11]

Data storage in large databases is layered and chunked, with the table storage arranged into 'blocks'. Each block contains perhaps 1MB in each chunk [lower-roman 3] [13] and they are retrieved by requesting specific blocks from a disk-based storage layer. BRIN are a lightweight in-memory summary layer above this: each tuple in the index summarises one block as to the range of the data contained therein: its minimum and maximum values, and if the block contains any non-null data for the column(s) of interest. [14]

Unlike a traditional index which locates the regions of the table containing values of interest, BRIN act as "negative indexes", [5] showing the blocks that are definitely not of interest and thus do not need to be processed further.

Some simple benchmarks suggest a five-fold improvement in search performance with an index scan, compared to the unindexed table. [3] Compared to B-trees, they avoid their maintenance overhead. [2]

As BRIN are so lightweight, they may be held entirely in memory, thus avoiding disk overhead during the scan. The same may not be true of B-tree: B-tree requires a tree node for every approximately N rows in the table, where N is the capacity of a single node, thus the index size is large. As BRIN only requires a tuple for each block (of many rows), the index becomes sufficiently small to make the difference between disk and memory. For a 'narrow' table [lower-roman 4] the B-tree index volume approaches that of the table itself; the BRIN may be only 5-15% of it. [15]

Advantages

Search and index scan

A large database index would typically use B-tree algorithms. BRIN is not always a substitute for B-tree, it is an improvement on sequential scanning of an index, with particular (and potentially large) advantages when the index meets particular conditions for being ordered and for the search target to be a narrow set of these values. In the general case, with random data, B-tree may still be superior. [3]

A particular advantage of the BRIN technique, shared with Oracle Exadata's Smart Scanning, [6] is in the use of this type of index with Big Data or data warehousing applications, where it is known that almost all of the table is irrelevant to the range of interest. BRIN allows the table to be queried in such cases by only retrieving blocks that may contain data of interest and excluding those which are clearly outside the range, or contain no data for this column.

Insert

A regular problem with the processing of large tables is that retrieval requires the use of an index, but maintaining this index slows down the addition of new records. Typical practices have been to group additions together and add them as a single bulk transaction, or to drop the index, add the batch of new records and then recreate the index. Both of these are disruptive to simultaneous read / write operations and may not be possible in some continuously-operating businesses. [16]

With BRIN, the slowdown from maintaining the index is much reduced compared to B-tree. [17] Wong reports that B-tree slowed down additions to an unindexed 10GB table by 85%, but a comparable BRIN only had an overhead of 11%. [1]

Index creation

BRIN may be created for extremely large data where B-tree would require horizontal partitioning. [14]

Creating the BRIN is also much faster than for a B-tree, by 80%. [1] This would be a useful improvement to refactoring existing database applications that use the drop-add-reindex approach, without requiring code changes.

Implementation

Dependence on table ordering

Multiple BRIN may be defined for different columns on a single table. However, there are restrictions.

BRIN are only efficient if the ordering of the key values follows the organisation of blocks in the storage layer. [13] [15] In the simplest case, this could require the physical ordering of the table, which is often the creation order of the rows within it, to match the key's order. Where this key is a creation date, that may be a trivial requirement. [14] :9

If the data is truly random, or if there is much churn of the key values in a 'hot' database, the assumptions underlying BRIN may break down. All blocks contain entries "of interest" and so few may be excluded early on by the BRIN range filter.

In most cases, BRIN is restricted to a single index per table. Multiple BRIN may be defined, but only one is likely to have suitable ordering. If two (or more) indexes have similar ordering behaviour, it may be possible and useful to define multiple BRIN on the same table. An obvious example is where both a creation date and a record_id column both increase monotonically with the record creation sequence. In other cases, the key value may not be monotonic, but provided that there is still a strong grouping within the record's physical order, BRIN is effective.

Exadata Storage Indexes

BRIN have some similarities to Oracle Exadata "Storage Indexes". [2] [5] [18] Exadata has the strong concept of a 'storage layer' in its architecture stack. Table data is held in blocks or 'storage cells' on the storage servers. These storage cells are opaque to the storage server and are returned to the database engine on request, by their identifier. Previously, the database nodes must request all the storage cells in order to scan them. [6]

Storage Indexes provides data pruning at this layer: efficiently indicating sections that are of no further interest. [13] [lower-roman 5] [19] The Storage Index is loaded into memory on the storage server, so that when a request for cells is issued it may be predicated with search values. These are compared to the Storage Index and then only the relevant cells need be returned to the database node.

Performance advantages with a Storage Index are most evident when the indexed column contains many nulls. Massive performance advantages are gained when scanning across sparse data. [20]

Development

Development for PostgreSQL was carried out as part of the AXLE project (Advanced Analytics for Extremely Large European Databases) [21] This work was partially funded by the European Union's Seventh Framework Programme (FP7/2007-2013). [22]

PostgreSQL

Implementation for PostgreSQL was first evident in 2013. [2] BRIN appeared in release 9.5 of PostgreSQL at the start of 2016. [15] [23]

See also

Notes

  1. 'Large' here refers to the number of rows in a table, rather than the field sizes or overall size.
  2. A function which efficiently evaluates a large number of data items and returns their minimum and maximum values. The concepts of "minimum" and "maximum" are broad and may be applied to any data type, or their combinations, which is sortable.
  3. PostgreSQL has a default BRIN chunk size of 128× 8k pages, or 1MB. [3] Oracle terms these 'storage regions' and gives them a default size of 1MB. [12]
  4. The table columns are barely wider than the indexed columns.
  5. Foote [13] describes the Index as holding "a flag to denote whether any Nulls exist". This is probably a typo: Oracle describes them as, "negative indexes" where "they identify the areas that definitely will not contain the values" [5] In such a case, the flag would be more clearly described as denoting either "Only Nulls exist" or if "Any non-Nulls exist".

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley. In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. After a review in 2007, the development team decided to keep the name PostgreSQL and the alias Postgres.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

Structured Query Language (SQL), is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural key.

<span class="mw-page-title-main">Join (SQL)</span> SQL clause

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

An SQL INSERT statement adds one or more records to any single table in a relational database.

<span class="mw-page-title-main">T-tree</span>

In computer science a T-tree is a type of binary tree data structure that is used by main-memory databases, such as Datablitz, eXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Query optimization is a feature of many relational database management systems and other databases such as NoSQL and graph databases. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans.

A WHERE clause in SQL specifies that a SQL Data Manipulation Language (DML) statement should only affect rows that meet specified criteria. The criteria are expressed in the form of predicates. WHERE clauses are not mandatory clauses of SQL DML statements, but can be used to limit the number of rows affected by a SQL DML statement or returned by a query. In brief SQL WHERE clause is used to extract only those results from a SQL statement, such as: SELECT, INSERT, UPDATE, or DELETE statement.

In database computing, Oracle Real Application Clusters (RAC) — an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i — provides software for clustering and high availability in Oracle database environments. Oracle Corporation includes RAC with the Enterprise Edition, provided the nodes are clustered using Oracle Clusterware.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data. Most spatial databases allow the representation of simple geometric objects such as points, lines and polygons. Some spatial databases handle more complex structures such as 3D objects, topological coverages, linear networks, and triangulated irregular networks (TINs). While typical databases have developed to manage various numeric and character types of data, such databases require additional functionality to process spatial data types efficiently, and developers have often added geometry or feature data types. The Open Geospatial Consortium (OGC) developed the Simple Features specification and sets standards for adding spatial functionality to database systems. The SQL/MM Spatial ISO/IEC standard is a part of the SQL/MM multimedia standard and extends the Simple Features standard with data types that support circular interpolations. Almost all current relational and object-relational database management systems now have spatial extensions, and some GIS software vendors have developed their own spatial extensions to database management systems.

An Entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix.

<span class="mw-page-title-main">Greenplum</span>

Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California around 2005. Greenplum was acquired by EMC Corporation in July 2010.

A full table scan is a scan made on a database where each row of the table is read in a sequential (serial) order and the columns encountered are checked for the validity of a condition. Full table scans are usually the slowest method of scanning a table due to the heavy amount of I/O reads required from the disk which consists of multiple seeks as well as costly disk to memory transfers.

PL/SQL is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database, Times Ten in-memory database, and IBM Db2. Oracle Corporation usually extends PL/SQL functionality with each successive release of the Oracle Database.

<span class="mw-page-title-main">Oracle NoSQL Database</span>

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. 1 2 3 Mark Wong (October 10, 2014). "Loading Tables and Creating B-tree and Block Range Indexes". AXLE project.
  2. 1 2 3 4 5 Alvaro Herrera (2013-06-14). "Minmax indexes". Pg Hackers.
  3. 1 2 3 4 "What's new in PostgreSQL 9.5". PostgreSQL.
  4. "Chapter 62. BRIN Indexes". PostgreSQL 9.5.0 Documentation. 2016.
  5. 1 2 3 4 Arup Nanda (May–June 2011). "Smart Scans Meet Storage Indexes". Oracle Magazine. Oracle.
  6. 1 2 3 "Understanding Storage Indexes in Oracle Exadata". 2 June 2015.
  7. "With Netezza Always Use Integer Join Keys For Good Compression, Zone Maps, And Joins". Netezza. 2010.
  8. "Data packs". Infobright. Archived from the original on 2009-06-27.
  9. "Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS". 2007. pp. 723–734. CiteSeerX   10.1.1.108.2662 .
  10. "Hive Optimizations with Indexes, Bloom-Filters and Statistics". Jörn Franke. 2015.
  11. Herrera, Alvaro (7 November 2014). "commitdiff - BRIN: Block Range Indexes". git.postgresql.org. Retrieved 2017-10-03.
  12. "When is Exadata's storage indexes used?". OakTable.net.
  13. 1 2 3 4 Richard Foote (4 October 2012). "Exadata Storage Indexes – Part I".
  14. 1 2 3 Mark Wong (10 March 2015). "PostgreSQL Performance Presentation" (PDF). pp. 7–10.
  15. 1 2 3 "Block Range (BRIN) Indexes in PostgreSQL 9.5". Python Sweetness. May 22, 2015.
  16. Petr Jelinek (28 November 2014). "Progress on online upgrade". AXLE project.
  17. Mark Wong (10 October 2014). "Index Overhead on a Growing Table". 2Ndquadrant | Postgresql.
  18. "Oracle Sun Database Machine Application Best Practices for Data Warehousing". Oracle. 1094934.1.{{cite web}}: Missing or empty |url= (help)
  19. Marc Fielding (July 20, 2010). "Exadata's Best Kept Secret: Storage Indexes".
  20. Kerry Osborne (August 10, 2010). "Oracle Exadata – Storage Indexes".
  21. "AXLE project (Advanced Analytics for Extremely Large European Databases)". 2014.
  22. "European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633".{{cite web}}: Missing or empty |url= (help)
  23. Alvaro Herrera (2014-11-07). "BRIN: Block Range Indexes". PostgreSQL. Retrieved 2016-01-14.