Window function (SQL)

Last updated

In SQL, a window function or analytic function [1] is a function which uses values from one or multiple rows to return a value for each row. (This contrasts with an aggregate function, which returns a single value for multiple rows.) Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function. [2]

Contents

Example

As an example, here is a query which uses a window function to compare the salary of each employee with the average salary of their department (example from the PostgreSQL documentation): [3]

SELECTdepname,empno,salary,avg(salary)OVER(PARTITIONBYdepname)FROMempsalary;

Output:

 depname  | empno | salary |          avg           ----------+-------+--------+---------------------- develop   |    11 |   5200 | 5020.0000000000000000 develop   |     7 |   4200 | 5020.0000000000000000 develop   |     9 |   4500 | 5020.0000000000000000 develop   |     8 |   6000 | 5020.0000000000000000 develop   |    10 |   5200 | 5020.0000000000000000 personnel |     5 |   3500 | 3700.0000000000000000 personnel |     2 |   3900 | 3700.0000000000000000 sales     |     3 |   4800 | 4866.6666666666666667 sales     |     1 |   5000 | 4866.6666666666666667 sales     |     4 |   4800 | 4866.6666666666666667 (10 rows)

The PARTITION BY clause groups rows into partitions, and the function is applied to each partition separately. If the PARTITION BY clause is omitted (such as with an empty OVER() clause), then the entire result set is treated as a single partition. [4] For this query, the average salary reported would be the average taken over all rows.

Window functions are evaluated after aggregation (after the GROUP BY clause and non-window aggregate functions, for example). [1]

Syntax

According to the PostgreSQL documentation, a window function has the syntax of one of the following: [4]

function_name([expression[,expression...]])OVERwindow_namefunction_name([expression[,expression...]])OVER(window_definition)function_name(*)OVERwindow_namefunction_name(*)OVER(window_definition)

where window_definition has syntax:

[existing_window_name][PARTITIONBYexpression[,...]][ORDERBYexpression[ASC|DESC|USINGoperator][NULLS{FIRST|LAST}][,...]][frame_clause]

frame_clause has the syntax of one of the following:

{RANGE|ROWS|GROUPS}frame_start[frame_exclusion]{RANGE|ROWS|GROUPS}BETWEENframe_startANDframe_end[frame_exclusion]

frame_start and frame_end can be UNBOUNDED PRECEDING, offset PRECEDING, CURRENT ROW, offset FOLLOWING, or UNBOUNDED FOLLOWING. frame_exclusion can be EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES, or EXCLUDE NO OTHERS.

expression refers to any expression that does not contain a call to a window function.

Notation:

Example

Window functions allow access to data in the records right before and after the current record. [5] [6] [7] [8] A window function defines a frame or window of rows with a given length around the current row, and performs a calculation across the set of data in the window. [9] [10]

      NAME | ------------       Aaron| <-- Preceding (unbounded)      Andrew|      Amelia|       James|        Jill|      Johnny| <-- 1st preceding row     Michael| <-- Current row        Nick| <-- 1st following row     Ophelia|        Zach| <-- Following (unbounded)

In the above table, the next query extracts for each row the values of a window with one preceding and one following row:

SELECTLAG(name,1)OVER(ORDERBYname)"prev",name,LEAD(name,1)OVER(ORDERBYname)"next"FROMpeopleORDERBYname

The result query contains the following values:

|     PREV |     NAME |     NEXT | |----------|----------|----------| |    (null)|     Aaron|    Andrew| |     Aaron|    Andrew|    Amelia| |    Andrew|    Amelia|     James| |    Amelia|     James|      Jill| |     James|      Jill|    Johnny| |      Jill|    Johnny|   Michael| |    Johnny|   Michael|      Nick| |   Michael|      Nick|   Ophelia| |      Nick|   Ophelia|      Zach| |   Ophelia|      Zach|    (null)|

History

Window functions were introduced to the SQL:2003 standard and had functionality expanded in later specifications. [11]

Support for particular database implementations was added as follows:

See also

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

<span class="mw-page-title-main">Join (SQL)</span> SQL clause

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

The SQL SELECT statement returns a result set of records, from one or more tables.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Multidimensional Expressions (MDX) is a query language for online analytical processing (OLAP) using a database management system. Much like SQL, it is a query language for OLAP cubes. It is also a calculation language, with syntax similar to spreadsheet formulae.

A WHERE clause in SQL specifies that a SQL Data Manipulation Language (DML) statement should only affect rows that meet specified criteria. The criteria are expressed in the form of predicates. WHERE clauses are not mandatory clauses of SQL DML statements, but can be used to limit the number of rows affected by a SQL DML statement or returned by a query. In brief SQL WHERE clause is used to extract only those results from a SQL statement, such as: SELECT, INSERT, UPDATE, or DELETE statement.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

A relational database management system uses SQL MERGE statements to INSERT new records or UPDATE existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded in the SQL:2008 standard.

A HAVING clause in SQL specifies that an SQL SELECT statement must only return rows where aggregate values meet the specified conditions.

The DUAL table is a special one-row, one-column table present by default in Oracle and other database installations. In Oracle, the table has a single VARCHAR2(1) column called DUMMY that has a value of 'X'. It is suitable for use in selecting a pseudo column such as SYSDATE or USER.

SQL/XML or XML-Related Specifications is part 14 of the Structured Query Language (SQL) specification. In addition to the traditional predefined SQL data types like NUMERIC, CHAR, TIMESTAMP, ... it introduces the predefined data type XML together with constructors, several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in a SQL database.

A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures.

In a SQL database query, a correlated subquery is a subquery that uses values from the outer query. Because the subquery may be evaluated once for each row processed by the outer query, it can be slow.

An alias is a feature of SQL that is supported by most, if not all, relational database management systems (RDBMSs). Aliases provide users with the ability to reduce the amount of code required for a query, and to make queries simpler to understand. In addition, aliasing is required when doing self joins

In relational databases a virtual column is a table column whose value(s) is automatically computed using other columns values, or another deterministic expression. Virtual columns are defined of SQL:2003 as Generated Column, and are only implemented by some DBMSs, like MariaDB, SQL Server, Oracle, PostgreSQL, SQLite and Firebird.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. 1 2 "Analytic function concepts in Standard SQL | BigQuery". Google Cloud. Retrieved 2021-03-23.
  2. "Window Functions". sqlite.org. Retrieved 2021-03-23.
  3. "3.5. Window Functions". PostgreSQL Documentation. 2021-02-11. Retrieved 2021-03-23.
  4. 1 2 "4.2. Value Expressions". PostgreSQL Documentation. 2021-02-11. Retrieved 2021-03-23.
  5. Leis, Viktor; Kundhikanjana, Kan; Kemper, Alfons; Neumann, Thomas (June 2015). "Efficient Processing of Window Functions in Analytical SQL Queries". Proc. VLDB Endow. 8 (10): 1058–1069. doi:10.14778/2794367.2794375. ISSN   2150-8097.
  6. Cao, Yu; Chan, Chee-Yong; Li, Jie; Tan, Kian-Lee (July 2012). "Optimization of Analytic Window Functions". Proc. VLDB Endow. 5 (11): 1244–1255. arXiv: 1208.0086 . doi:10.14778/2350229.2350243. ISSN   2150-8097.
  7. "Probably the Coolest SQL Feature: Window Functions". Java, SQL and jOOQ. 2013-11-03. Retrieved 2017-09-26.
  8. "Window Functions in SQL - Simple Talk". Simple Talk. 2013-10-31. Retrieved 2017-09-26.
  9. "SQL Window Functions Introduction". Apache Drill.
  10. "PostgreSQL: Documentation: Window Functions". www.postgresql.org. Retrieved 2020-04-04.
  11. "Window Functions Overview". MariaDB KnowledgeBase. Retrieved 2021-03-23.
  12. "PostgreSQL Release 8.4". www.postgresql.org. Retrieved 2024-03-10.
  13. "MySQL :: What's New in MySQL 8.0? (Generally Available)". dev.mysql.com. Retrieved 2022-11-21.
  14. "MySQL :: MySQL 8.0 Reference Manual :: 12.21.2 Window Function Concepts and Syntax". dev.mysql.com.
  15. "MariaDB 10.2.0 Release Notes". mariadb.com. Retrieved 2024-03-10.