Materialized view

Last updated

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

Contents

The process of setting up a materialized view is sometimes called materialization. [1] This is a form of caching the results of a query, similar to memoization of the value of a function in functional languages, and it is sometimes described as a form of precomputation. [2] [3] As with other forms of precomputation, database users typically use materialized views for performance reasons, i.e. as a form of optimization. [4]

Materialized views that store data based on remote tables were also known as snapshots [5] (deprecated Oracle terminology).

In any database management system following the relational model, a view is a virtual table representing the result of a database query. Whenever a query or an update addresses an ordinary view's virtual table, the DBMS converts these into queries or updates against the underlying base tables. A materialized view takes a different approach: the query result is cached as a concrete ("materialized") table (rather than a view as such) that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of extra storage and of some data being potentially out-of-date. Materialized views find use especially in data warehousing scenarios, where frequent queries of the actual base tables can be expensive.[ citation needed ]

In a materialized view, indexes can be built on any column. In contrast, in a normal view, it's typically only possible to exploit indexes on columns that come directly from (or have a mapping to) indexed columns in the base tables; often this functionality is not offered at all.

Implementations

Oracle

Materialized views were implemented first by the Oracle Database: the Query rewrite feature was added from version 8i. [6]

Example syntax to create a materialized view in Oracle:

CREATEMATERIALIZEDVIEWMV_MY_VIEWREFRESHFASTSTARTWITHSYSDATENEXTSYSDATE+1ASSELECT*FROM<table_name>;

PostgreSQL

In PostgreSQL, version 9.3 and newer natively support materialized views. [7] In version 9.3, a materialized view is not auto-refreshed, and is populated only at time of creation (unless WITH NO DATA is used). It may be refreshed later manually using REFRESH MATERIALIZED VIEW. [8] In version 9.4, the refresh may be concurrent with selects on the materialized view if CONCURRENTLY is used. [9]

Example syntax to create a materialized view in PostgreSQL:

CREATEMATERIALIZEDVIEWMV_MY_VIEW[WITH(storage_parameter[=value][,...])][TABLESPACEtablespace_name]ASSELECT*FROM<table_name>;

SQL Server

Microsoft SQL Server differs from other RDBMS by the way of implementing materialized view via a concept known as "Indexed Views". The main difference is that such views do not require a refresh because they are in fact always synchronized to the original data of the tables that compound the view. To achieve this, it is necessary that the lines of origin and destination are "deterministic" in their mapping, which limits the types of possible queries to do this. This mechanism has been realised since the 2000 version of SQL Server.

Example syntax to create a materialized view in SQL Server:

CREATEVIEWMV_MY_VIEWWITHSCHEMABINDINGASSELECTCOL1,SUM(COL2)ASTOTALFROM<table_name>GROUPBYCOL1;GOCREATEUNIQUECLUSTEREDINDEXXVONMV_MY_VIEW(COL1);

Stream processing frameworks

Apache Kafka (since v0.10.2), Apache Spark (since v2.0), Apache Flink, Kinetica DB, [10] Materialize, [11] and RisingWave [12] all support materialized views on streams of data.

Others

Materialized views are also supported in Sybase SQL Anywhere. [13] In IBM Db2, they are called "materialized query tables". [14] ClickHouse supports materialized views that automatically refresh on merges. [15] MySQL doesn't support materialized views natively, but workarounds can be implemented by using triggers or stored procedures [16] or by using the open-source application Flexviews. [17] Materialized views can be implemented in Amazon DynamoDB using data modification events captured by DynamoDB Streams. Google announced in 8 April 2020 [18] the availability of materialized views for BigQuery [19] as a beta release.

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley. In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. After a review in 2007, the development team decided to keep the name PostgreSQL and the alias Postgres.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP.

<span class="mw-page-title-main">Join (SQL)</span> SQL clause

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

The SQL SELECT statement returns a result set of records, from one or more tables.

<span class="mw-page-title-main">Physical schema</span> Representation of a data design

A physical data model is a representation of a data design as implemented, or intended to be implemented, in a database management system. In the lifecycle of a project it typically derives from a logical data model, though it may be reverse-engineered from a given database implementation. A complete physical data model will include all the database artifacts required to create relationships between tables or to achieve performance goals, such as indexes, constraint definitions, linking tables, partitioned tables or clusters. Analysts can usually use a physical data model to calculate storage estimates; it may include specific storage allocation details for a given database system.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

In a database, a view is the result set of a stored query, which can be queried in the same manner as a persistent database collection object. This pre-established query command is kept in the data dictionary. Unlike ordinary base tables in a relational database, a view does not form part of the physical schema: as a result set, it is a virtual table computed or collated dynamically from data in the database when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

A relational database management system uses SQL MERGE statements to INSERT new records or UPDATE existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded in the SQL:2008 standard.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.

The nested set model is a technique for representing nested set collections in relational databases.

A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures.

In various SQL implementations, a hint is an addition to the SQL standard that instructs the database engine on how to execute the query. For example, a hint may tell the engine to use or not to use an index.

SQL:2011 or ISO/IEC 9075:2011 is the seventh revision of the ISO (1987) and ANSI (1986) standard for the SQL database query language. It was formally adopted in December 2011. The standard consists of 9 parts which are described in detail in SQL. The next version is SQL:2016.

The following outline is provided as an overview of and topical guide to MySQL:

MEMORY is a storage engine for MySQL and MariaDB relational database management systems, developed by Oracle and MariaDB. Before the version 4.1 of MySQL it was called Heap.

References

  1. Compare: C.J. Date (28 August 2006). The Relational Database Dictionary: A Comprehensive Glossary of Relational Terms and Concepts, with Illustrative Examples. "O'Reilly Media, Inc.". p. 59. ISBN   978-1-4493-9115-7 . Retrieved 26 October 2016. materialization[:] A somewhat unsophisticated technique for implementing operations on views according to which (a) the relational expression that defines the view is evaluated at the time the operation is invoked, (b) the view is thereby materialized, and (c) the operation in question is then executed against the relation so materialized.
  2. Karen Morton; Kerry Osborne; Robyn Sands; Riyaj Shamsudeen; Jared Still (28 October 2013). Pro Oracle SQL. Apress. p. 48. ISBN   978-1-4302-6220-6.
  3. Marie-Aude Aufaure; Esteban Zimányi (16 January 2012). Business Intelligence: First European Summer School, EBISS 2011, Paris, France, July 3-8, 2011, Tutorial Lectures. Springer Science & Business Media. p. 43. ISBN   978-3-642-27357-5.
  4. Michael L. Gonzales (25 February 2003). IBM Data Warehousing: with IBM Business Intelligence Tools. John Wiley & Sons. p. 214. ISBN   978-0-471-45736-7.
  5. C.J. Date (28 August 2006). The Relational Database Dictionary: A Comprehensive Glossary of Relational Terms and Concepts, with Illustrative Examples. "O'Reilly Media, Inc.". p. 59. ISBN   978-1-4493-9115-7 . Retrieved 26 October 2016. materialized view[:] Deprecated term for a snapshot. [...] The problem is [...] that (as the definition indicates) snapshots have come to be known, at least in some circles, not as snapshots at all but as materialized views. But snapshots aren't views; views are virtual and snapshots aren't, and 'materialized view' is a contradiction in terms (at least as far as the model is concerned). Worse yet, the unqualified term view is often taken to mean a materialized view specifically, and thus we're in danger of no longer having a good term for a view in the original sense.
  6. Oracle8i Tuning Release 8.1.5. Ecst.csuchico.edu. Retrieved on 2012-02-09.
  7. "Materialized Views - PostgreSQL wiki". wiki.postgresql.org. Retrieved 29 November 2022.
  8. "CREATE MATERIALIZED VIEW". PostgreSQL Documentation. 10 November 2022. Retrieved 29 November 2022.
  9. "REFRESH MATERIALIZED VIEW". PostgreSQL Documentation. 13 February 2020. Retrieved 29 November 2022.
  10. "Materialized Views" . Retrieved 28 December 2022.
  11. "CMU DB Talk: Building Materialize" . Retrieved 30 March 2022.
  12. "Is RisingWave the Next Apache Flink?". www.singularity-data.com. 28 April 2022. Retrieved 30 June 2022.
  13. Materialized Views – Sybase SQL Anywhere Archived 2009-12-14 at the Wayback Machine . Ianywhere.com. Retrieved on 2012-02-09.
  14. Improving Performance with SQL Server 2005 Indexed Views. Microsoft.com. Retrieved on 2012-02-09.
  15. ClickHouse Documentation MaterializedView. Clickhouse.yandex. Retrieved on 2019-09-05.
  16. Implementing materialized views in MySQL. Shinguz.ch (2006-11-06). Retrieved on 2012-02-09.
  17. Flexviews for MySQL – incrementally refreshable materialized views w/ MySQL. Flexviews.sourceforge.net. Retrieved on 2012-02-09.
  18. "Release notes". Google.com. 8 April 2020. Retrieved 21 July 2021.
  19. Google BigQuery Materialized Views documentation Google.com Retrieved on 2020-05-20.