Hierarchical and recursive queries in SQL

Last updated

A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures.

Contents

In standard SQL:1999 hierarchical queries are implemented by way of recursive common table expressions (CTEs). Unlike Oracle's earlier connect-by clause, recursive CTEs were designed with fixpoint semantics from the beginning. [1] Recursive CTEs from the standard were relatively close to the existing implementation in IBM DB2 version 2. [1] Recursive CTEs are also supported by Microsoft SQL Server (since SQL Server 2008 R2), [2] Firebird 2.1, [3] PostgreSQL 8.4+, [4] SQLite 3.8.3+, [5] IBM Informix version 11.50+, CUBRID, MariaDB 10.2+ and MySQL 8.0.1+. [6] Tableau has documentation describing how CTEs can be used. TIBCO Spotfire does not support CTEs, while Oracle 11g Release 2's implementation lacks fixpoint semantics.

Without common table expressions or connected-by clauses it is possible to achieve hierarchical queries with user-defined recursive functions. [7]

Common table expression

A common table expression, or CTE, (in SQL) is a temporary named result set, derived from a simple query and defined within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement.

CTEs can be thought of as alternatives to derived tables (subquery), views, and inline user-defined functions.

Common table expressions are supported by Teradata (starting with version 14), IBM Db2, Informix (starting with version 14.1), Firebird (starting with version 2.1), [8] Microsoft SQL Server (starting with version 2005), Oracle (with recursion since 11g release 2), PostgreSQL (since 8.4), MariaDB (since 10.2), MySQL (since 8.0), SQLite (since 3.8.3), HyperSQL, Informix (since 14.10), [9] Google BigQuery, Sybase (starting with version 9), Vertica, H2 (experimental), [10] and many others. Oracle calls CTEs "subquery factoring". [11]

The syntax for a CTE (which may or may not be recursive) is as follows:

WITH[RECURSIVE]with_query[,...]SELECT...

where with_query's syntax is:

query_name[(column_name[,...])]AS(SELECT...)

Recursive CTEs can be used to traverse relations (as graphs or trees) although the syntax is much more involved because there are no automatic pseudo-columns created (like LEVEL below); if these are desired, they have to be created in the code. See MSDN documentation [2] or IBM documentation [12] [13] for tutorial examples.

The RECURSIVE keyword is not usually needed after WITH in systems other than PostgreSQL. [14]

In SQL:1999 a recursive (CTE) query may appear anywhere a query is allowed. It's possible, for example, to name the result using CREATE [RECURSIVE] VIEW. [15] Using a CTE inside an INSERT INTO, one can populate a table with data generated from a recursive query; random data generation is possible using this technique without using any procedural statements. [16]

Some Databases, like PostgreSQL, support a shorter CREATE RECURSIVE VIEW format which is internally translated into WITH RECURSIVE coding. [17]

An example of a recursive query computing the factorial of numbers from 0 to 9 is the following:

WITHrecursivetemp(n,fact)AS(SELECT0,1-- Initial SubqueryUNIONALLSELECTn+1,(n+1)*factFROMtempWHEREn<9-- Recursive Subquery)SELECT*FROMtemp;

CONNECT BY

An alternative syntax is the non-standard CONNECT BY construct; it was introduced by Oracle in the 1980s. [18] Prior to Oracle 10g, the construct was only useful for traversing acyclic graphs because it returned an error on detecting any cycles; in version 10g Oracle introduced the NOCYCLE feature (and keyword), making the traversal work in the presence of cycles as well. [19]

CONNECT BY is supported by Snowflake, EnterpriseDB, [20] Oracle database, [21] CUBRID, [22] IBM Informix [23] and IBM Db2 although only if it is enabled as a compatibility mode. [24] The syntax is as follows:

SELECTselect_listFROMtable_expression[WHERE...][STARTWITHstart_expression]CONNECTBY[NOCYCLE]{PRIORchild_expr=parent_expr|parent_expr=PRIORchild_expr}[ORDERSIBLINGSBYcolumn1[ASC|DESC][,column2[ASC|DESC]]...][GROUPBY...][HAVING...]...
For example,
SELECTLEVEL,LPAD(' ',2*(LEVEL-1))||ename"employee",empno,mgr"manager"FROMempSTARTWITHmgrISNULLCONNECTBYPRIORempno=mgr;

The output from the above query would look like:

 level |  employee   | empno | manager -------+-------------+-------+---------      1 | KING        |  7839 |      2 |   JONES     |  7566 |    7839      3 |     SCOTT   |  7788 |    7566      4 |       ADAMS |  7876 |    7788      3 |     FORD    |  7902 |    7566      4 |       SMITH |  7369 |    7902      2 |   BLAKE     |  7698 |    7839      3 |     ALLEN   |  7499 |    7698      3 |     WARD    |  7521 |    7698      3 |     MARTIN  |  7654 |    7698      3 |     TURNER  |  7844 |    7698      3 |     JAMES   |  7900 |    7698      2 |   CLARK     |  7782 |    7839      3 |     MILLER  |  7934 |    7782 (14 rows)

Pseudo-columns

Unary operators

The following example returns the last name of each employee in department 10, each manager above that employee in the hierarchy, the number of levels between manager and employee, and the path between the two:

SELECTename"Employee",CONNECT_BY_ROOTename"Manager",LEVEL-1"Pathlen",SYS_CONNECT_BY_PATH(ename,'/')"Path"FROMempWHERELEVEL>1ANDdeptno=10CONNECTBYPRIORempno=mgrORDERBY"Employee","Manager","Pathlen","Path";

Functions

See also

Related Research Articles

Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

The SQL SELECT statement returns a result set of records, from one or more tables.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases which are in turn a category of NoSQL database.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

A WHERE clause in SQL specifies that a SQL Data Manipulation Language (DML) statement should only affect rows that meet specified criteria. The criteria are expressed in the form of predicates. WHERE clauses are not mandatory clauses of SQL DML statements, but can be used to limit the number of rows affected by a SQL DML statement or returned by a query. In brief SQL WHERE clause is used to extract only those results from a SQL statement, such as: SELECT, INSERT, UPDATE, or DELETE statement.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

A relational database management system uses SQL MERGE statements to INSERT new records or UPDATE existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded in the SQL:2008 standard.

Embedded SQL is a method of combining the computing power of a programming language and the database manipulation capabilities of SQL. Embedded SQL statements are SQL statements written inline with the program source code, of the host language. The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to a code library. The output from the preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in programs written in any number of languages such as C/C++, COBOL and Fortran. This differs from SQL-derived programming languages that don't go through discrete preprocessors, such as PL/SQL and T-SQL.

The recursive join is an operation used in relational databases, also sometimes called a "fixed-point join". It is a compound operation that involves repeating the join operation, typically accumulating more records each time, until a repetition makes no change to the results.

In relational databases, the information schema is an ANSI-standard set of read-only views that provide information about all of the tables, views, columns, and procedures in a database. It can be used as a source of the information that some databases make available through non-standard commands, such as:

 => SELECT count(table_name) FROM information_schema.tables;  count   -------  99    => SELECT column_name, data_type, column_default, is_nullable  FROM information_schema.columns WHERE table_name='alpha';  column_name | data_type | column_default | is_nullable   -------------+-----------+----------------+-------------  foo | integer | | YES  bar | character | | YES    => SELECT * FROM information_schema.information_schema_catalog_name;  catalog_name   --------------  johnd  

The DUAL table is a special one-row, one-column table present by default in Oracle and other database installations. In Oracle, the table has a single VARCHAR2(1) column called DUMMY that has a value of 'X'. It is suitable for use in selecting a pseudo column such as SYSDATE or USER.

SQL/XML or XML-Related Specifications is part 14 of the Structured Query Language (SQL) specification. In addition to the traditional predefined SQL data types like NUMERIC, CHAR, TIMESTAMP, ... it introduces the predefined data type XML together with constructors, several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in a SQL database.

SQL:1999 was the fourth revision of the SQL database query language. It introduced many new features, many of which required clarifications in the subsequent SQL:2003. In the meanwhile SQL:1999 is deprecated.

<span class="mw-page-title-main">XQuery API for Java</span> Application programming interface

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

SQL:2011 or ISO/IEC 9075:2011 is the seventh revision of the ISO (1987) and ANSI (1986) standard for the SQL database query language. It was formally adopted in December 2011. The standard consists of 9 parts which are described in detail in SQL. The next version is SQL:2016.

In relational databases a virtual column is a table column whose value(s) is automatically computed using other columns values, or another deterministic expression. Virtual columns are defined of SQL:2003 as Generated Column, and are only implemented by some DBMSs, like MariaDB, SQL Server, Oracle, PostgreSQL, SQLite and Firebird.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. 1 2 Jim Melton; Alan R. Simon (2002). SQL:1999: Understanding Relational Language Components. Morgan Kaufmann. ISBN   978-1-55860-456-8.
  2. 1 2 Microsoft. "Recursive Queries Using Common Table Expressions" . Retrieved 2009-12-23.
  3. Helen Borrie (2008-07-15). "Firebird 2.1 Release Notes" . Retrieved 2015-11-24.
  4. "WITH Queries". 10 February 2022. PostgreSQL
  5. "WITH Clause". SQLite
  6. "MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs)". mysqlserverteam.com
  7. Paragon corporation: Using PostgreSQL User-Defined Functions to solve the Tree Problem, February 15, 2004, accessed September 19, 2015
  8. https://firebirdsql.org/file/documentation/reference_manuals/reference_material/Firebird-2.5-LangRef-Update.pdf [ bare URL PDF ]
  9. possible before 14.10 with temp tables https://stackoverflow.com/questions/42579298/why-does-a-with-clause-give-a-syntax-error-on-informix
  10. "Advanced".
  11. Karen Morton; Robyn Sands; Jared Still; Riyaj Shamsudeen; Kerry Osborne (2010). Pro Oracle SQL. Apress. p. 283. ISBN   978-1-4302-3228-5.
  12. "IBM Docs".
  13. "IBM Docs".
  14. Regina Obe; Leo Hsu (2012). PostgreSQL: Up and Running. O'Reilly Media. p. 94. ISBN   978-1-4493-2633-3.
  15. Jim Melton; Alan R. Simon (2002). SQL:1999: Understanding Relational Language Components. Morgan Kaufmann. p. 352. ISBN   978-1-55860-456-8.
  16. Don Chamberlin (1998). A Complete Guide to DB2 Universal Database. Morgan Kaufmann. pp. 253–254. ISBN   978-1-55860-482-7.
  17. "Create View". 10 February 2022.
  18. Benedikt, M.; Senellart, P. (2011). "Databases". In Blum, Edward K.; Aho, Alfred V. (eds.). Computer Science. The Hardware, Software and Heart of It. p. 189. doi:10.1007/978-1-4614-1168-0_10. ISBN   978-1-4614-1167-3.
  19. Sanjay Mishra; Alan Beaulieu (2004). Mastering Oracle SQL. O'Reilly Media, Inc. p. 227. ISBN   978-0-596-00632-7.
  20. Hierarchical Queries Archived 2008-06-21 at the Wayback Machine , EnterpriseDB
  21. Hierarchical Queries, Oracle
  22. "CUBRID Hierarchical Query" . Retrieved 11 February 2013.
  23. Hierarchical Clause, IBM Informix
  24. Jonathan Gennick (2010). SQL Pocket Guide (3rd ed.). O'Reilly Media, Inc. p. 8. ISBN   978-1-4493-9409-7.

Further reading

Academic textbooks. Note that these cover only the SQL:1999 standard (and Datalog), but not the Oracle extension.