Window function (SQL)

Last updated

In SQL, a window function or analytic function [1] is a function which uses values from one or multiple rows to return a value for each row. (This contrasts with an aggregate function, which returns a single value for multiple rows.) Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function. [2]

Contents

Example

As an example, here is a query which uses a window function to compare the salary of each employee with the average salary of their department (example from the PostgreSQL documentation): [3]

SELECTdepname,empno,salary,avg(salary)OVER(PARTITIONBYdepname)FROMempsalary;

Output:

 depname  | empno | salary |          avg           ----------+-------+--------+---------------------- develop   |    11 |   5200 | 5020.0000000000000000 develop   |     7 |   4200 | 5020.0000000000000000 develop   |     9 |   4500 | 5020.0000000000000000 develop   |     8 |   6000 | 5020.0000000000000000 develop   |    10 |   5200 | 5020.0000000000000000 personnel |     5 |   3500 | 3700.0000000000000000 personnel |     2 |   3900 | 3700.0000000000000000 sales     |     3 |   4800 | 4866.6666666666666667 sales     |     1 |   5000 | 4866.6666666666666667 sales     |     4 |   4800 | 4866.6666666666666667 (10 rows)

The PARTITION BY clause groups rows into partitions, and the function is applied to each partition separately. If the PARTITION BY clause is omitted (such as with an empty OVER() clause), then the entire result set is treated as a single partition. [4] For this query, the average salary reported would be the average taken over all rows.

Window functions are evaluated after aggregation (after the GROUP BY clause and non-window aggregate functions, for example). [1]

Syntax

According to the PostgreSQL documentation, a window function has the syntax of one of the following: [4]

function_name([expression[,expression...]])OVERwindow_namefunction_name([expression[,expression...]])OVER(window_definition)function_name(*)OVERwindow_namefunction_name(*)OVER(window_definition)

where window_definition has syntax:

[existing_window_name][PARTITIONBYexpression[,...]][ORDERBYexpression[ASC|DESC|USINGoperator][NULLS{FIRST|LAST}][,...]][frame_clause]

frame_clause has the syntax of one of the following:

{RANGE|ROWS|GROUPS}frame_start[frame_exclusion]{RANGE|ROWS|GROUPS}BETWEENframe_startANDframe_end[frame_exclusion]

frame_start and frame_end can be UNBOUNDED PRECEDING, offset PRECEDING, CURRENT ROW, offset FOLLOWING, or UNBOUNDED FOLLOWING. frame_exclusion can be EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES, or EXCLUDE NO OTHERS.

expression refers to any expression that does not contain a call to a window function.

Notation:

Example

Window functions allow access to data in the records right before and after the current record. [5] [6] [7] [8] A window function defines a frame or window of rows with a given length around the current row, and performs a calculation across the set of data in the window. [9] [10]

      NAME | ------------       Aaron| <-- Preceding (unbounded)      Andrew|      Amelia|       James|        Jill|      Johnny| <-- 1st preceding row     Michael| <-- Current row        Nick| <-- 1st following row     Ophelia|        Zach| <-- Following (unbounded)

In the above table, the next query extracts for each row the values of a window with one preceding and one following row:

SELECTLAG(name,1)OVER(ORDERBYname)"prev",name,LEAD(name,1)OVER(ORDERBYname)"next"FROMpeopleORDERBYname

The result query contains the following values:

|     PREV |     NAME |     NEXT | |----------|----------|----------| |    (null)|     Aaron|    Andrew| |     Aaron|    Andrew|    Amelia| |    Andrew|    Amelia|     James| |    Amelia|     James|      Jill| |     James|      Jill|    Johnny| |      Jill|    Johnny|   Michael| |    Johnny|   Michael|      Nick| |   Michael|      Nick|   Ophelia| |      Nick|   Ophelia|      Zach| |   Ophelia|      Zach|    (null)|

History

Window functions were incorporated into the SQL:2003 standard and had functionality expanded in later specifications. [11]

Support for particular database implementations was added as follows:

See also

Related Research Articles

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Windows, Linux, macOS, FreeBSD, and OpenBSD, and handles a range of workloads from single machines to data warehouses, data lakes, or web services with many concurrent users.

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

<span class="mw-page-title-main">Data definition language</span> Syntax for defining data structures

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP. If you see a .ddl file, that means the file contains a statement to create a table. Oracle SQL Developer contains the ability to export from an ERD generated with Data Modeler to either a .sql file or a .ddl file.

The SQL SELECT statement returns a result set of rows, from one or more tables.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A data control language (DCL) is a syntax similar to a computer programming language used to control access to data stored in a database (authorization). In particular, it is a component of Structured Query Language (SQL). Data Control Language is one of the logical group in SQL Commands. SQL is the standard language for relational database management systems. SQL statements are used to perform tasks such as insert data to a database, delete or update data in a database, or retrieve data from a database.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

A relational database management system uses SQL MERGE statements to INSERT new records or UPDATE or DELETE existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded in the SQL:2008 standard.

The DUAL table is a special one-row, one-column table present by default in Oracle and other database installations. In Oracle, the table has a single VARCHAR2(1) column called DUMMY that has a value of 'X'. It is suitable for use in selecting a pseudo column such as SYSDATE or USER.

SQL/XML or XML-Related Specifications is part 14 of the Structured Query Language (SQL) specification. In addition to the traditional predefined SQL data types like NUMERIC, CHAR, TIMESTAMP, ... it introduces the predefined data type XML together with constructors, several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in a SQL database.

A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures.

In a SQL database query, a correlated subquery is a subquery that uses values from the outer query. This can have major impact on performance because the correlated subquery might get recomputed every time for each row of the outer query is processed. A correlated subquery can contain another correlated subquery.

Hyper Text Structured Query Language (HTSQL) is a schema-driven URI-to-SQL query language that takes a request over HTTP, converts it to a SQL query, executes the query against a database, and returns the results in a format best suited for the user agent The HTSQL language is implemented on "HTSQL servers," which use HTSQL to convert web requests into equivalent SQL, executes requests on a server-side database, and returns results in XML, HTML, CSV, JSON, or YAML formats. The current implementation as of April 2010 uses Python and works with PostgreSQL, MySQL, SQLite, Oracle, and Microsoft SQL Server databases.

In database management systems (DBMS), a prepared statement, parameterized statement, is a feature where the database pre-compiles SQL code and stores the results, separating it from data. Benefits of prepared statements are:

PL/SQL is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database, TimesTen in-memory database, and IBM Db2. Oracle Corporation usually extends PL/SQL functionality with each successive release of the Oracle Database.

In relational databases a virtual column is a table column whose value(s) is automatically computed using other columns values, or another deterministic expression. Virtual columns are defined of SQL:2003 as Generated Column, and are only implemented by some DBMSs, like MariaDB, SQL Server, Oracle, PostgreSQL, SQLite and Firebird.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. 1 2 "Analytic function concepts in Standard SQL | BigQuery". Google Cloud. Retrieved 2021-03-23.
  2. "Window Functions". sqlite.org. Retrieved 2021-03-23.
  3. "3.5. Window Functions". PostgreSQL Documentation. 2021-02-11. Retrieved 2021-03-23.
  4. 1 2 "4.2. Value Expressions". PostgreSQL Documentation. 2021-02-11. Retrieved 2021-03-23.
  5. Leis, Viktor; Kundhikanjana, Kan; Kemper, Alfons; Neumann, Thomas (June 2015). "Efficient Processing of Window Functions in Analytical SQL Queries". Proc. VLDB Endow. 8 (10): 1058–1069. doi:10.14778/2794367.2794375. ISSN   2150-8097.
  6. Cao, Yu; Chan, Chee-Yong; Li, Jie; Tan, Kian-Lee (July 2012). "Optimization of Analytic Window Functions". Proc. VLDB Endow. 5 (11): 1244–1255. arXiv: 1208.0086 . doi:10.14778/2350229.2350243. ISSN   2150-8097.
  7. "Probably the Coolest SQL Feature: Window Functions". Java, SQL and jOOQ. 2013-11-03. Retrieved 2017-09-26.
  8. "Window Functions in SQL - Simple Talk". Simple Talk. 2013-10-31. Retrieved 2017-09-26.
  9. "SQL Window Functions Introduction". Apache Drill.
  10. "PostgreSQL: Documentation: Window Functions". www.postgresql.org. Retrieved 2020-04-04.
  11. "Window Functions Overview". MariaDB KnowledgeBase. Retrieved 2021-03-23.
  12. "Oracle 8i Release 2 (8.1.6) New Features". www.oracle.com. Retrieved 2025-01-23.
  13. "Analytic Functions in Oracle 8i" (PDF). www.stanford.edu. Retrieved 2025-01-23.
  14. "PostgreSQL Release 8.4". www.postgresql.org. 24 July 2014. Retrieved 2024-03-10.
  15. "MySQL :: What's New in MySQL 8.0? (Generally Available)". dev.mysql.com. Retrieved 2022-11-21.
  16. "MySQL :: MySQL 8.0 Reference Manual :: 12.21.2 Window Function Concepts and Syntax". dev.mysql.com.
  17. "MariaDB 10.2.0 Release Notes". mariadb.com. Retrieved 2024-03-10.
  18. "SQLite Release 3.25.0 On 2018-09-15". www.sqlite.org. Retrieved 5 February 2025.