Reverse index

Last updated

Database management systems provide multiple types of indexes to improve performance and data integrity across diverse applications. Index types include b-trees, bitmaps, and r-trees.

Contents

In database management systems, a reverse key index strategy reverses the key value before entering it in the index. [1] E.g., the value 24538 becomes 83542 in the index. Reversing the key value is particularly useful for indexing data such as sequence numbers, where each new key value is greater than the prior value, i.e., values monotonically increase. Reverse key indexes have become particularly important in high volume transaction processing systems because they reduce contention for index blocks.

Creating data

Reversed key indexes use b-tree structures, but preprocess key values before inserting them. Simplifying, b-trees place similar values on a single index block, e.g., storing 24538 on the same block as 24539. This makes them efficient both for looking up a specific value and for finding values within a range. However, if the application inserts values in sequence, each insert must have access to the newest block in the index in order to add the new value. If many users attempt to insert at the same time, they all must write to that block and have to get in line, slowing down the application. This is particularly a problem in clustered databases, which may require the block to be copied from one computer's memory to another's to allow the next user to perform their insert.

Reversing the key spreads similar new values across the entire index instead of concentrating them in any one leaf block. This means that 24538 appears on the same block as 14538 while 24539 goes to a different block, eliminating this cause of contention. (Since 14538 would have been created long before 24538, their inserts don't interfere with each other.)

Querying data

Reverse indexes are just as efficient as unreversed indexes for finding specific values, although they aren't helpful for range queries. Range queries are uncommon for artificial values such as sequence numbers. When searching the index, the query processor simply reverses the search target before looking it up.

Deleting data

Typically, applications delete data that is older on average before deleting newer data. Thus, data with lower sequence numbers generally go before those with higher values. As time passes, in standard b-trees, index blocks for lower values end up containing few values, with a commensurate increase in unused space, referred to as "rot". Rot not only wastes space, but slows query speeds, because a smaller fraction of a rotten index's blocks fit in memory at any one time. In a b-tree, if 14538 gets deleted, its index space remains empty. In a reverse index, if 14538 goes before 24538 arrives, 24538 can reuse 14538's space.[ citation needed ]

See also

Footnotes

  1. "Introduction To Reverse Key Indexes: Part I". Richard Foote's Oracle Blog. 2008-01-14. Retrieved 2019-04-13.

Related Research Articles

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems have an option of using the SQL for querying and maintaining the database.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural key.

B+ tree

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

T-tree

In computer science a T-tree is a type of binary tree data structure that is used by main-memory databases, such as Datablitz, EXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

Extensible Storage Engine (ESE), also known as JET Blue, is an ISAM data storage technology from Microsoft. ESE is the core of Microsoft Exchange Server, Active Directory, and Windows Search. It's also used by a number of Windows components including Windows Update client and Help and Support Center. Its purpose is to allow applications to store and retrieve data via indexed and sequential access.

In a database, a view is the result set of a stored query on the data, which the database users can query just as they would in a persistent database collection object. This pre-established query command is kept in the database dictionary. Unlike ordinary base tables in a relational database, a view does not form part of the physical schema: as a result set, it is a virtual table computed or collated dynamically from data in the database when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view. In some NoSQL databases, views are the only way to query data.

In online transaction processing (OLTP), information systems typically facilitate and manage transaction-oriented applications.

Oracle Spatial and Graph, formerly Oracle Spatial, is a free option component of the Oracle Database. The spatial features in Oracle Spatial and Graph aid users in managing geographic and location-data in a native type within an Oracle database, potentially supporting a wide range of applications — from automated mapping, facilities management, and geographic information systems (AM/FM/GIS), to wireless location services and location-enabled e-business. The graph features in Oracle Spatial and Graph include Oracle Network Data Model (NDM) graphs used in traditional network applications in major transportation, telcos, utilities and energy organizations and RDF semantic graphs used in social networks and social interactions and in linking disparate data sets to address requirements from the research, health sciences, finance, media and intelligence communities.

Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and disadvantages. The most commonly used forms are B-trees and ISAM. Such forms or structures are one aspect of the overall schema used by a database engine to store information.

Cartesian tree

In computer science, a Cartesian tree is a binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequence. Introduced by Vuillemin (1980) in the context of geometric range searching data structures, Cartesian trees have also been used in the definition of the treap and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack-based algorithm for finding all nearest smaller values in a sequence.

In pattern recognition, the iDistance is an indexing and query processing technique for k-nearest neighbor queries on point data in multi-dimensional metric spaces. The kNN query is one of the hardest problems on multi-dimensional data, especially when the dimensionality of the data is high. The iDistance is designed to process kNN queries in high-dimensional spaces efficiently and it is especially good for skewed data distributions, which usually occur in real-life data sets. The iDistance can be augmented with machine learning models to learn the data distributions for searching and storing the multi-dimensional data.

In computer science, the Bx tree is basically a query that is used to update efficient B+ tree-based index structures for moving objects.

In database management systems, block contention refers to multiple processes or instances competing for access to the same index or data block at the same time. In general this can be caused by very frequent index or table scans, or frequent updates. Concurrent statement executions by two or more instances may also lead to contention, and subsequently busy waiting for the process without the lock.

Amazon DynamoDB Cloud-based service

Amazon DynamoDB is a fully managed proprietary NoSQL database service that supports key–value and document data structures and is offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB exposes a similar data model to and derives its name from Dynamo, but has a different underlying implementation. Dynamo had a multi-leader design requiring the client to resolve version conflicts and DynamoDB uses synchronous replication across multiple data centers for high durability and availability. DynamoDB was announced by Amazon CTO Werner Vogels on January 18, 2012, and is presented as an evolution of Amazon SimpleDB.

Log-structured merge-tree

In computer science, the log-structured merge-tree is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches.

A Block Range Index or BRIN is a database indexing technique. They are intended to improve performance with extremely large tables.

Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases have adopted a host of techniques to cope.