Select (SQL)

The SQL SELECT statement returns a result set of rows, from one or more tables. [1] [2]

A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language (DML) command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan" which may vary between executions, database versions and database software. This functionality is called the "query optimizer" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.

The SELECT statement has many optional clauses, including FROM, WHERE, GROUP BY, HAVING, and ORDER BY.

Overview

SELECT is the most common operation in SQL, called "the query". SELECT retrieves data from one or more tables, or expressions. Standard SELECT statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax provided in some databases. [4]

Queries allow the user to describe desired data, leaving the database management system (DBMS) to carry out planning, optimizing, and performing the physical operations necessary to produce that result as it chooses.

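A query includes a list of columns to include in the final result, normally immediately following the SELECT keyword. An asterisk ("*") can be used to specify that the query should return all columns of all the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include the FROM clause (the tables to retrieve data from), the WHERE clause (a predicate that restricts the rows returned), the GROUP BY clause (which groups rows so that aggregate functions can be applied to each group), the HAVING clause (which filters those groups), and the ORDER BY clause (which sorts the result).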

The following example of a SELECT query returns a list of expensive books. The query retrieves all rows from the Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the Book table should be included in the result set.

SELECT *
FROM Book
WHERE price > 100.00
ORDER BY title;

The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.

SELECT Book.title AS Title,
       count(*) AS Authors
FROM Book
JOIN Book_author
  ON Book.isbn = Book_author.isbn
GROUP BY Book.title;

Example output might resemble the following:

Title                  Authors
---------------------- -------
SQL Examples and Guide 4
The Joy of SQL         1
An Introduction to SQL 2
Pitfalls of SQL        1

Under the precondition that isbn is the only common column name of the two tables and that a column named title only exists in the Book table, one could re-write the query above in the following form:

SELECT title,
       count(*) AS Authors
FROM Book
NATURAL JOIN Book_author
GROUP BY title;

However, many vendors either do not support this approach, or require certain column-naming conventions for natural joins to work effectively.
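The equivalence of the two forms can be checked with a minimal sketch that uses Python's built-in sqlite3 module (SQLite is one engine that accepts NATURAL JOIN). The author_name column, the sample rows, and the added ORDER BY (used only to make the comparison deterministic) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# isbn is the only column name shared by the two tables (the stated precondition).
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")
conn.execute("CREATE TABLE Book_author (isbn TEXT, author_name TEXT)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [("111", "The Joy of SQL", 30.0), ("222", "An Introduction to SQL", 45.0)],
)
conn.executemany(
    "INSERT INTO Book_author VALUES (?, ?)",
    [("111", "A"), ("222", "B"), ("222", "C")],
)

explicit_join = conn.execute(
    """
    SELECT Book.title AS Title, count(*) AS Authors
    FROM Book
    JOIN Book_author ON Book.isbn = Book_author.isbn
    GROUP BY Book.title
    ORDER BY Book.title
    """
).fetchall()

natural_join = conn.execute(
    """
    SELECT title, count(*) AS Authors
    FROM Book
    NATURAL JOIN Book_author
    GROUP BY title
    ORDER BY title
    """
).fetchall()

print(explicit_join)
print(natural_join)
```

With these sample rows both queries return the same list of (title, author count) pairs.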

SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the select list to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional sales_tax column containing a sales tax figure calculated at 6% of the price.

SELECT isbn,
       title,
       price,
       price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;

Subqueries

Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a subquery. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases, depending on the implementation, the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, a subquery computes the average price with the aggregation function AVG, and the outer query compares each book's price with that result:

SELECT isbn,
       title,
       price
FROM Book
WHERE price < (SELECT AVG(price) FROM Book)
ORDER BY title;

A subquery can use values from the outer query, in which case it is known as a correlated subquery.
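A correlated subquery can be sketched with Python's built-in sqlite3 module. The Book table reuses the column names from the examples above; the category column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, category TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?, ?)",
    [
        ("1", "SQL Examples and Guide", "sql", 120.0),
        ("2", "The Joy of SQL", "sql", 40.0),
        ("3", "Cooking Basics", "food", 30.0),
        ("4", "Advanced Cooking", "food", 50.0),
    ],
)

# The inner query refers to b.category, a value from the outer query,
# so it is logically re-evaluated for each outer row.
cheaper_than_category_avg = conn.execute(
    """
    SELECT b.title
    FROM Book AS b
    WHERE b.price < (SELECT AVG(b2.price)
                     FROM Book AS b2
                     WHERE b2.category = b.category)
    ORDER BY b.title
    """
).fetchall()

print(cheaper_than_category_avg)
```

Here each book is compared with the average price of its own category rather than with the average over the whole table.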

Since 1999 the SQL standard allows WITH clauses, i.e. named subqueries often called common table expressions (named and designed after the IBM DB2 version 2 implementation; Oracle calls these subquery factoring). CTEs can also be recursive by referring to themselves; the resulting mechanism allows tree or graph traversals (when represented as relations), and more generally fixpoint computations.
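As a rough sketch of a recursive common table expression, the following Python snippet uses the built-in sqlite3 module to walk a small tree stored as a relation. The Category table, its columns, and its rows are hypothetical and not part of the article's examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO Category VALUES (?, ?, ?)",
    [(1, None, "Books"), (2, 1, "Databases"), (3, 2, "SQL"), (4, 1, "Cooking")],
)

# The CTE "subtree" refers to itself: the first SELECT seeds it with the root,
# and the second SELECT repeatedly adds the children of rows already found.
descendants = conn.execute(
    """
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM Category WHERE id = 1
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM Category AS c
        JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
    """
).fetchall()

print(descendants)
```

The recursion stops when a step produces no new rows, which is the fixpoint mentioned above.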

Derived table

A derived table is a subquery in a FROM clause. Essentially, the derived table is a subquery that can be selected from or joined to. Derived table functionality allows the user to reference the subquery as a table. The derived table also is referred to as an inline view or a select in from list.

In the following example, the SQL statement involves a join from the initial Book table to the derived table "sales". This derived table captures associated book sales information using the ISBN to join to the Book table. As a result, the derived table provides the result set with additional columns (the number of items sold and the company that sold the books):

SELECT b.isbn, b.title, b.price, sales.items_sold, sales.company_nm
FROM Book b
JOIN (SELECT SUM(Items_Sold) Items_Sold, Company_Nm, ISBN
      FROM Book_Sales
      GROUP BY Company_Nm, ISBN) sales
  ON sales.isbn = b.isbn

Examples

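Table "T":

C1  C2
--  --
1   a
2   b

Query: SELECT * FROM T;
Result:

C1  C2
--  --
1   a
2   b

Query: SELECT C1 FROM T;
Result:

C1
--
1
2

Query: SELECT * FROM T WHERE C1 = 1;
Result:

C1  C2
--  --
1   a

Query: SELECT * FROM T ORDER BY C1 DESC;
Result:

C1  C2
--  --
2   b
1   a

Table: does not exist
Query: SELECT 1+1, 3*2;
Result:

`1+1`  `3*2`
-----  -----
2      6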

Given a table T, the query SELECT * FROM T will result in all the elements of all the rows of the table being shown.

With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a projection in relational algebra, except that in the general case, the result may contain duplicate rows. This is also known as a vertical partition in some database terms, restricting query output to view only specified fields or columns.

With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown; in relational algebra terms, a selection will be performed, because of the WHERE clause. This is also known as a horizontal partition, restricting rows output by a query according to specified conditions.

With more than one table, the result set will be every combination of rows. So if two tables are T1 and T2, SELECT * FROM T1, T2 will result in every combination of T1 rows with every T2 row. E.g., if T1 has 3 rows and T2 has 5 rows, then 15 rows will result.
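The row count of such a product can be checked with a small sketch using Python's built-in sqlite3 module. The single-column layouts of T1 and T2 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (a INTEGER)")
conn.execute("CREATE TABLE T2 (b INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?)", [(i,) for i in range(3)])  # 3 rows
conn.executemany("INSERT INTO T2 VALUES (?)", [(i,) for i in range(5)])  # 5 rows

# Listing two tables without a join condition yields every combination of rows.
pairs = conn.execute("SELECT * FROM T1, T2").fetchall()
print(len(pairs))  # 15
```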

Although it is not in the standard, most DBMSs allow a SELECT clause without a table by pretending that an imaginary table with one row is used. This is mainly used to perform calculations where a table is not needed.
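SQLite is one of the systems that accept a SELECT without a FROM clause, so the calculation from the examples above can be tried through Python's built-in sqlite3 module. This is only a sketch for one engine; other systems may require a dummy table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No table is referenced: the expressions are evaluated once, giving one row.
row = conn.execute("SELECT 1+1, 3*2").fetchone()
print(row)  # (2, 6)
```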

The SELECT clause specifies a list of properties (columns) by name, or the wildcard character (“*”) to mean “all properties”.

Limiting result rows

Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.

In ISO SQL:2003, result sets may be limited by using cursors, or by adding an SQL window function to the SELECT statement.

ISO SQL:2008 introduced the FETCH FIRST clause.

According to PostgreSQL v.9 documentation, an SQL window function "performs a calculation across a set of table rows that are somehow related to the current row", in a way similar to aggregate functions. [7] The name recalls signal processing window functions. A window function call always contains an OVER clause.

ROW_NUMBER() window function

ROW_NUMBER() OVER may be used to number the returned rows and then filter on that number, e.g. to return no more than ten rows:

SELECT * FROM
( SELECT
    ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,
    columns
  FROM tablename
) AS foo
WHERE row_number <= 10

ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.

RANK() window function

The RANK() OVER window function acts like ROW_NUMBER, but may return more or fewer than n rows in case of tie conditions, e.g. to return the top-10 youngest persons:

SELECT * FROM (
  SELECT
    RANK() OVER (ORDER BY age ASC) AS ranking,
    person_id,
    person_name,
    age
  FROM person
) AS foo
WHERE ranking <= 10

The above code could return more than ten rows, e.g. if there are two people of the same age, it could return eleven rows.
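The difference between the two functions under ties can be illustrated with a short sketch using Python's built-in sqlite3 module (window functions require SQLite 3.25 or later). The person rows, the limit of two instead of ten, the alias rn, and the person_id tie-breaker are assumptions chosen to keep the example small and deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Ann", 20), (2, "Bob", 25), (3, "Cy", 25), (4, "Dee", 30)],
)

# ROW_NUMBER: every row gets a distinct number; person_id breaks the age tie
# so that the numbering is deterministic.
by_row_number = conn.execute(
    """
    SELECT person_name FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY age ASC, person_id ASC) AS rn,
               person_name
        FROM person
    ) AS foo
    WHERE rn <= 2
    ORDER BY rn
    """
).fetchall()

# RANK: rows with the same age share a rank, so a tie at the cut-off
# lets more than two rows through.
by_rank = conn.execute(
    """
    SELECT person_name FROM (
        SELECT RANK() OVER (ORDER BY age ASC) AS ranking,
               person_id, person_name
        FROM person
    ) AS foo
    WHERE ranking <= 2
    ORDER BY ranking, person_id
    """
).fetchall()

print(by_row_number)
print(by_rank)
```

Bob and Cy share rank 2, so the RANK query returns three rows where the ROW_NUMBER query returns two.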

FETCH FIRST clause

Since ISO SQL:2008, result limits can be specified as in the following example using the FETCH FIRST clause.

SELECT * FROM T FETCH FIRST 10 ROWS ONLY

This clause currently is supported by CA DATACOM/DB 11, IBM DB2, SAP SQL Anywhere, PostgreSQL, EffiProz, H2, HSQLDB version 2.0, Oracle 12c and Mimer SQL.

Microsoft SQL Server 2012 and higher supports FETCH FIRST, but it is considered part of the ORDER BY clause. The ORDER BY, OFFSET, and FETCH FIRST clauses are all required for this usage.

SELECT * FROM T ORDER BY acolumn DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY

Non-standard syntax

Some DBMSs offer non-standard syntax either instead of or in addition to SQL standard syntax. Below, variants of the simple limit query for different DBMSes are listed:

SET ROWCOUNT 10 SELECT * FROM T
    MS SQL Server (this also works on Microsoft SQL Server 6.5, while SELECT TOP 10 * FROM T does not)
SELECT * FROM T LIMIT 10 OFFSET 20
    Netezza, MySQL, MariaDB (also supports the standard version, since version 10.6), SAP SQL Anywhere, PostgreSQL (also supports the standard, since version 8.4), SQLite, HSQLDB, H2, Vertica, Polyhedra, Couchbase Server, Snowflake Computing, OpenLink Virtuoso
SELECT * FROM T WHERE ROWNUM <= 10
    Oracle
SELECT FIRST 10 * FROM T
    Ingres
SELECT FIRST 10 * FROM T ORDER BY a
    Informix
SELECT SKIP 20 FIRST 10 * FROM T ORDER BY c, d
    Informix (row numbers are filtered after ORDER BY is evaluated; the SKIP clause was introduced in a v10.00.xC4 fixpack)
SELECT TOP 10 * FROM T
    MS SQL Server, SAP ASE, MS Access, SAP IQ, Teradata
SELECT * FROM T SAMPLE 10
    Teradata
SELECT TOP 20, 10 * FROM T
    OpenLink Virtuoso (skips 20, delivers next 10) [8]
SELECT TOP 10 START AT 20 * FROM T
    SAP SQL Anywhere (also supports the standard, since version 9.0.1)
SELECT FIRST 10 SKIP 20 * FROM T
    Firebird
SELECT * FROM T ROWS 20 TO 30
    Firebird (since version 2.1)
SELECT * FROM T WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY
    IBM Db2
SELECT * FROM T WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY
    IBM Db2 (new rows are filtered after comparing with key column of table T)

Rows Pagination

Rows pagination [9] is an approach used to limit and display only a part of the total data of a query in the database. Instead of showing hundreds or thousands of rows at the same time, the client requests only one page from the server (a limited set of rows, for example only 10 rows), and the user navigates by requesting the next page, and then the next one, and so on. It is very useful, especially in web systems, where there is no dedicated connection between the client and the server, so the client does not have to wait to read and display all the rows of the server.

Data in Pagination approach

  • {rows} = Number of rows in a page
  • {page_number} = Number of the current page
  • {begin_base_0} = Zero-based number of the row where the page starts = (page_number - 1) * rows

Simplest method (but very inefficient)

  1. Select all rows from the database
  2. Read all rows but send to display only when the row_number of the rows read is between {begin_base_0 + 1} and {begin_base_0 + rows}
SELECT * FROM {table} ORDER BY {unique_key}

Other simple method (a little more efficient than reading all rows)

  1. Select all the rows from the beginning of the table to the last row to display ({begin_base_0 + rows})
  2. Read the {begin_base_0 + rows} rows but send to display only when the row_number of the rows read is greater than {begin_base_0}
SQL: SELECT * FROM {table} ORDER BY {unique_key} FETCH FIRST {begin_base_0 + rows} ROWS ONLY
Dialect: SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12, Mimer SQL

SQL: SELECT * FROM {table} ORDER BY {unique_key} LIMIT {begin_base_0 + rows}
Dialect: MySQL, SQLite

SQL: SELECT TOP {begin_base_0 + rows} * FROM {table} ORDER BY {unique_key}
Dialect: SQL Server 2005

SQL: SELECT * FROM {table} ORDER BY {unique_key} ROWS LIMIT {begin_base_0 + rows}
Dialect: Sybase, ASE 16 SP2

SQL:
    SET ROWCOUNT {begin_base_0 + rows}
    SELECT * FROM {table} ORDER BY {unique_key}
    SET ROWCOUNT 0
Dialect: Sybase, SQL Server 2000

SQL: SELECT * FROM (SELECT * FROM {table} ORDER BY {unique_key}) a WHERE rownum <= {begin_base_0 + rows}
Dialect: Oracle 11


Method with positioning

  1. Select only {rows} rows starting from the next row to display ({begin_base_0 + 1})
  2. Read and send to display all the rows read from the database
SQL: SELECT * FROM {table} ORDER BY {unique_key} OFFSET {begin_base_0} ROWS FETCH NEXT {rows} ROWS ONLY
Dialect: SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12, Mimer SQL

SQL: SELECT * FROM {table} ORDER BY {unique_key} LIMIT {rows} OFFSET {begin_base_0}
Dialect: MySQL, MariaDB, PostgreSQL, SQLite

SQL: SELECT * FROM {table} ORDER BY {unique_key} LIMIT {begin_base_0}, {rows}
Dialect: MySQL, MariaDB, SQLite

SQL: SELECT * FROM {table} ORDER BY {unique_key} ROWS LIMIT {rows} OFFSET {begin_base_0}
Dialect: Sybase, ASE 16 SP2

SQL:
    SELECT TOP {begin_base_0 + rows} *, _offset=identity(10) INTO #temp FROM {table} ORDER BY {unique_key}
    SELECT * FROM #temp WHERE _offset > {begin_base_0}
    DROP TABLE #temp
Dialect: Sybase 12.5.3

SQL:
    SET ROWCOUNT {begin_base_0 + rows}
    SELECT *, _offset=identity(10) INTO #temp FROM {table} ORDER BY {unique_key}
    SELECT * FROM #temp WHERE _offset > {begin_base_0}
    DROP TABLE #temp
    SET ROWCOUNT 0
Dialect: Sybase 12.5.2

SQL: SELECT TOP {rows} * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY {unique_key}) AS _offset FROM {table}) xx WHERE _offset > {begin_base_0}
Dialect: SQL Server 2005

SQL:
    SET ROWCOUNT {begin_base_0 + rows}
    SELECT *, _offset=identity(int,1,1) INTO #temp FROM {table} ORDER BY {unique_key}
    SELECT * FROM #temp WHERE _offset > {begin_base_0}
    DROP TABLE #temp
    SET ROWCOUNT 0
Dialect: SQL Server 2000

SQL: SELECT * FROM (SELECT rownum-1 AS _offset, a.* FROM (SELECT * FROM {table} ORDER BY {unique_key}) a WHERE rownum <= {begin_base_0 + rows}) WHERE _offset >= {begin_base_0}
Dialect: Oracle 11
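The positioning method can be sketched in Python with the built-in sqlite3 module, using the LIMIT {rows} OFFSET {begin_base_0} form that SQLite accepts. The item table, its contents, and the fetch_page helper are hypothetical names introduced for this illustration.

```python
import sqlite3


def fetch_page(conn, page_number, rows):
    """Return one page of the item table using LIMIT/OFFSET positioning."""
    begin_base_0 = (page_number - 1) * rows  # zero-based index of the first row
    return conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT ? OFFSET ?",
        (rows, begin_base_0),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)", [(i, f"item-{i}") for i in range(1, 26)]
)

page_1 = fetch_page(conn, page_number=1, rows=10)
page_3 = fetch_page(conn, page_number=3, rows=10)  # last, partially filled page
print([row[0] for row in page_3])
```

With 25 rows and 10 rows per page, the third page starts at offset 20 and holds only the five remaining rows.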


Method with filter (it is more sophisticated but necessary for very big datasets)

  1. Select only {rows} rows with a filter:
    1. First Page: select only the first {rows} rows, depending on the type of database
    2. Next Page: select only the first {rows} rows, depending on the type of database, where the {unique_key} is greater than {last_val} (the value of the {unique_key} of the last row in the current page)
    3. Previous Page: sort the data in the reverse order, select only the first {rows} rows, where the {unique_key} is less than {first_val} (the value of the {unique_key} of the first row in the current page), and sort the result in the correct order
  2. Read and send to display all the rows read from the database
First page: SELECT * FROM {table} ORDER BY {unique_key} FETCH FIRST {rows} ROWS ONLY
Next page: SELECT * FROM {table} WHERE {unique_key} > {last_val} ORDER BY {unique_key} FETCH FIRST {rows} ROWS ONLY
Previous page: SELECT * FROM (SELECT * FROM {table} WHERE {unique_key} < {first_val} ORDER BY {unique_key} DESC FETCH FIRST {rows} ROWS ONLY) a ORDER BY {unique_key}
Dialect: SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12, Mimer SQL

First page: SELECT * FROM {table} ORDER BY {unique_key} LIMIT {rows}
Next page: SELECT * FROM {table} WHERE {unique_key} > {last_val} ORDER BY {unique_key} LIMIT {rows}
Previous page: SELECT * FROM (SELECT * FROM {table} WHERE {unique_key} < {first_val} ORDER BY {unique_key} DESC LIMIT {rows}) a ORDER BY {unique_key}
Dialect: MySQL, SQLite

First page: SELECT TOP {rows} * FROM {table} ORDER BY {unique_key}
Next page: SELECT TOP {rows} * FROM {table} WHERE {unique_key} > {last_val} ORDER BY {unique_key}
Previous page: SELECT * FROM (SELECT TOP {rows} * FROM {table} WHERE {unique_key} < {first_val} ORDER BY {unique_key} DESC) a ORDER BY {unique_key}
Dialect: SQL Server 2005

First page: SET ROWCOUNT {rows} SELECT * FROM {table} ORDER BY {unique_key} SET ROWCOUNT 0
Next page: SET ROWCOUNT {rows} SELECT * FROM {table} WHERE {unique_key} > {last_val} ORDER BY {unique_key} SET ROWCOUNT 0
Previous page: SET ROWCOUNT {rows} SELECT * FROM (SELECT * FROM {table} WHERE {unique_key} < {first_val} ORDER BY {unique_key} DESC) a ORDER BY {unique_key} SET ROWCOUNT 0
Dialect: Sybase, SQL Server 2000

First page: SELECT * FROM (SELECT * FROM {table} ORDER BY {unique_key}) a WHERE rownum <= {rows}
Next page: SELECT * FROM (SELECT * FROM {table} WHERE {unique_key} > {last_val} ORDER BY {unique_key}) a WHERE rownum <= {rows}
Previous page: SELECT * FROM (SELECT * FROM (SELECT * FROM {table} WHERE {unique_key} < {first_val} ORDER BY {unique_key} DESC) a1 WHERE rownum <= {rows}) a2 ORDER BY {unique_key}
Dialect: Oracle 11
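The filter method can be sketched in Python with the built-in sqlite3 module, following the LIMIT variant listed for SQLite. The item table, its id key, and the three helper functions are hypothetical; the sketch assumes the key is unique and that the client remembers the first and last key values of the page it is showing.

```python
import sqlite3


def first_page(conn, rows):
    return conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT ?", (rows,)
    ).fetchall()


def next_page(conn, last_val, rows):
    # Only rows whose key is greater than the last key shown are considered.
    return conn.execute(
        "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
        (last_val, rows),
    ).fetchall()


def previous_page(conn, first_val, rows):
    # Read backwards from the first key shown, then restore ascending order.
    return conn.execute(
        """
        SELECT id, name FROM (
            SELECT id, name FROM item WHERE id < ? ORDER BY id DESC LIMIT ?
        ) AS a
        ORDER BY id
        """,
        (first_val, rows),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)", [(i, f"item-{i}") for i in range(1, 26)]
)

page_1 = first_page(conn, rows=10)
page_2 = next_page(conn, last_val=page_1[-1][0], rows=10)
back_to_1 = previous_page(conn, first_val=page_2[0][0], rows=10)
print([row[0] for row in page_2])
```

Because each request filters on the key instead of skipping rows, an index on the key lets the database jump directly to the start of the page.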

Hierarchical query

Some databases provide specialised syntax for hierarchical data.

A window function in SQL:2003 is an aggregate function applied to a partition of the result set.

For example,

sum(population) OVER (PARTITION BY city)

calculates the sum of the populations of all rows having the same city value as the current row.

Partitions are specified using the OVER clause which modifies the aggregate. Syntax:

<OVER_CLAUSE> ::=
    OVER ( [ PARTITION BY <expr>, ... ]
           [ ORDER BY <expression> ] )

The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
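The population example above can be tried with a minimal sketch using Python's built-in sqlite3 module (window functions require SQLite 3.25 or later). The district table, its columns other than city and population, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE district (name TEXT, city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO district VALUES (?, ?, ?)",
    [
        ("North", "Springfield", 100),
        ("South", "Springfield", 150),
        ("Centre", "Shelbyville", 80),
    ],
)

# Unlike GROUP BY, the window aggregate keeps every input row and attaches
# the total of the row's own partition (its city) to it.
rows = conn.execute(
    """
    SELECT name, city, population,
           SUM(population) OVER (PARTITION BY city) AS city_population
    FROM district
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
```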

Query evaluation ANSI

The processing of a SELECT statement according to ANSI SQL would be the following: [10]

  1. the example query is:
    select g.* from users u inner join groups g on g.Userid = u.Userid where u.LastName = 'Smith' and u.FirstName = 'John'
  2. the FROM clause is evaluated; a cross join or Cartesian product is produced for the first two tables in the FROM clause, resulting in a virtual table, Vtable1
  3. the ON clause is evaluated for Vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2
  4. if an outer join is specified, records which were dropped from Vtable2 are added into Vtable3, for instance if the above query were:
    select u.* from users u left join groups g on g.Userid = u.Userid where u.LastName = 'Smith' and u.FirstName = 'John'
    all users who did not belong to any groups would be added back into Vtable3
  5. the WHERE clause is evaluated; in this case only group information for user John Smith would be added to Vtable4
  6. the GROUP BY is evaluated; if the above query were:
    select g.GroupName, count(g.*) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid group by GroupName
    Vtable5 would consist of members returned from Vtable4 arranged by the grouping, in this case the GroupName
  7. the HAVING clause is evaluated; groups for which the HAVING condition is true are inserted into Vtable6. For example:
    select g.GroupName, count(g.*) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid group by GroupName having count(g.*) > 5
  8. the SELECT list is evaluated and returned as Vtable7
  9. the DISTINCT clause is evaluated; duplicate rows are removed and returned as Vtable8
  10. the ORDER BY clause is evaluated, ordering the rows and returning VCursor9. This is a cursor and not a table because ANSI defines a cursor as an ordered set of rows (not relational).
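The observable effect of this ordering (WHERE filters rows before grouping, HAVING filters groups afterwards) can be sketched with Python's built-in sqlite3 module. The table name user_groups (used here in place of groups to avoid any keyword clash), the sample rows, and the use of count(*) instead of count(g.*) are assumptions for this illustration; an actual engine is free to execute the steps in a different physical order as long as the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (Userid INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.execute("CREATE TABLE user_groups (GroupName TEXT, Userid INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "Smith"), (2, "Jane", "Doe"), (3, "Bob", "Smith")],
)
conn.executemany(
    "INSERT INTO user_groups VALUES (?, ?)",
    [("admins", 1), ("admins", 2), ("staff", 1), ("staff", 2), ("staff", 3), ("guests", 3)],
)

# Without WHERE or HAVING: every membership row is counted.
all_counts = conn.execute(
    """
    SELECT g.GroupName, count(*) AS NumberOfMembers
    FROM users AS u
    INNER JOIN user_groups AS g ON g.Userid = u.Userid
    GROUP BY g.GroupName
    ORDER BY g.GroupName
    """
).fetchall()

# WHERE removes non-Smith rows before grouping; HAVING then drops the
# groups whose remaining count is not greater than 1.
smith_counts = conn.execute(
    """
    SELECT g.GroupName, count(*) AS NumberOfMembers
    FROM users AS u
    INNER JOIN user_groups AS g ON g.Userid = u.Userid
    WHERE u.LastName = 'Smith'
    GROUP BY g.GroupName
    HAVING count(*) > 1
    ORDER BY g.GroupName
    """
).fetchall()

print(all_counts)
print(smith_counts)
```

The staff group has three members overall but is reported with two, because the count is taken after the WHERE clause has already removed the non-Smith row.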

Window function support by RDBMS vendors

The implementation of window function features by vendors of relational databases and SQL engines differs widely. Most databases support at least some flavour of window functions. However, a closer look shows that most vendors implement only a subset of the standard. Take the powerful RANGE clause as an example: only Oracle, DB2, Spark/Hive, and Google BigQuery fully implement this feature. More recently, vendors have added new extensions to the standard, e.g. array aggregation functions. These are particularly useful in the context of running SQL against a distributed file system (Hadoop, Spark, Google BigQuery), where data co-locality guarantees are weaker than on a distributed relational database (MPP). Rather than evenly distributing the data across all nodes, SQL engines running queries against a distributed filesystem can achieve data co-locality guarantees by nesting data and thus avoiding potentially expensive joins involving heavy shuffling across the network. User-defined aggregate functions that can be used in window functions are another extremely powerful feature.

Generating data in T-SQL

Method to generate data based on the union all

select 1 a, 1 b union all
select 1, 2 union all
select 1, 3 union all
select 2, 1 union all
select 5, 1

SQL Server 2008 supports the "row constructor" feature, specified in the SQL:1999 standard

select * from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)

References

  1. Microsoft (23 May 2023). "Transact-SQL Syntax Conventions".
  2. MySQL. "SQL SELECT Syntax".
  3. Omitting FROM clause is not standard, but allowed by most major DBMSes.
  4. "Transact-SQL Reference". SQL Server Language Reference. SQL Server 2005 Books Online. Microsoft. 2007-09-15. Retrieved 2007-06-17.
  5. SAS 9.4 SQL Procedure User's Guide. SAS Institute (published 2013). 10 July 2013. p. 248. ISBN 9781612905686. Retrieved 2015-10-21. Although the UNIQUE argument is identical to DISTINCT, it is not an ANSI standard.
  6. Leon, Alexis; Leon, Mathews (1999). "Eliminating duplicates - SELECT using DISTINCT". SQL: A Complete Reference. New Delhi: Tata McGraw-Hill Education (published 2008). p. 143. ISBN 9780074637081. Retrieved 2015-10-21. [...] the keyword DISTINCT [...] eliminates the duplicates from the result set.
  7. PostgreSQL 9.1.24 Documentation - Chapter 3. Advanced Features
  8. OpenLink Software. "9.19.10. The TOP SELECT Option". docs.openlinksw.com. Retrieved 1 October 2019.
  9. Ing. Óscar Bonilla, MBA
  10. Inside Microsoft SQL Server 2005: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, and Dejan Sarka
