Partial index

Last updated

In databases, a partial index, also known as filtered index is an index which has some condition applied to it so that it includes a subset of rows in the table.

Contents

This allows the index to remain small, even though the table may be rather large, and have extreme selectivity.

Suppose you have a transaction table where entries start out with STATUS = 'A' (active), and then may pass through other statuses ('P' for pending, 'W' for "being worked on") before reaching a final status, 'F', at which point it is no longer likely to be processed again.

In PostgreSQL, a useful partial index might be defined as:

createindexpartial_statusontxn_table(status)wherestatusin('A','P','W');

This index would not bother storing any of the millions of rows that have reached "final" status, 'F', and would allow queries looking for transactions that still "need work" to efficiently search via this index.

Similarly, a partial index can be used to index only those rows where a column is not null, which will be of benefit when the column usually is null.

createindexpartial_object_updateonobject_table(updated_on)whereupdated_onisnotnull;

This index would allow the following query to read only the updated tuples:

select*fromobject_tablewhereupdated_onisnotnullorderbyupdated_on;

It is not necessary that the condition be the same as the index criterion; Stonebraker's paper below presents a number of examples with indexes similar to the following:

createindexpartial_salaryonemployee(age)wheresalary>2100;

Support

In SQL Server, this type of index is called a filtered index. [1]

Partial indexes have been supported in PostgreSQL since version 7.2, released in February 2002. [2]

SQLite supports partial indexes since version 3.8.0. [3]

MongoDB supports partial indexes since version 3.2. [4]

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley. In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. After a review in 2007, the development team decided to keep the name PostgreSQL and the alias Postgres.

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables.

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP.

A join clause in SQL – corresponding to a join operation in relational algebra – combines columns from one or more tables into a new table. Informally, a join stitches two tables and puts on the same row records with matching fields on the join condition. ANSI-standard SQL specifies five types of JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

In a database, a view is the result set of a stored query on the data, which the database users can query just as they would in a persistent database collection object. This pre-established query command is kept in the database dictionary. Unlike ordinary base tables in a relational database, a view does not form part of the physical schema: as a result set, it is a virtual table computed or collated dynamically from data in the database when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view. In some NoSQL databases, views are the only way to query data.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

A pivot table is a table of grouped values that aggregates the individual items of a more extensive table within one or more discrete categories. This summary might include sums, averages, or other statistics, which the pivot table groups together using a chosen aggregation function applied to the grouped values.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

A relational database management system uses SQL MERGE statements to INSERT new records or UPDATE existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded in the SQL:2008 standard.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data. Most spatial databases allow the representation of simple geometric objects such as points, lines and polygons. Some spatial databases handle more complex structures such as 3D objects, topological coverages, linear networks, and triangulated irregular networks (TINs). While typical databases have developed to manage various numeric and character types of data, such databases require additional functionality to process spatial data types efficiently, and developers have often added geometry or feature data types. The Open Geospatial Consortium (OGC) developed the Simple Features specification and sets standards for adding spatial functionality to database systems. The SQL/MM Spatial ISO/IEC standard is a part the SQL/MM multimedia standard and extends the Simple Features standard with data types that support circular interpolations.

The DUAL table is a special one-row, one-column table present by default in Oracle and other database installations. In Oracle, the table has a single VARCHAR2(1) column called DUMMY that has a value of 'X'. It is suitable for use in selecting a pseudo column such as SYSDATE or USER.

<span class="mw-page-title-main">Navicat</span> SQL database management software

Navicat is a series of graphical database management and development software produced by CyberTech Ltd. for MySQL, MariaDB, MongoDB, Oracle, SQLite, PostgreSQL and Microsoft SQL Server. It has an Explorer-like graphical user interface and supports multiple database connections for local and remote databases. Its design is made to meet the needs of a variety of audiences, from database administrators and programmers to various businesses/companies that serve clients and share information with partners.

Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures.

In database management systems (DBMS), a prepared statement, parameterized statement, or parameterized query is a feature used to pre-compile SQL code, separating it from data. Benefits of prepared statements are:

A Block Range Index or BRIN is a database indexing technique. They are intended to improve performance with extremely large tables.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. SQL Server 2008 Documentation: Filtered Index Design Guidelines. Microsoft TechNet.
  2. "PostgreSQL Documentation: Release Notes: Release 7.2". PostgreSQL. Retrieved 2009-10-09. Enable partial indexes (Martijn van Oosterhout)
  3. "Partial Indexes" . Retrieved 2014-02-04.
  4. MongoDB V302 Release Notes.