A GROUP BY
statement in SQL specifies that a SQL SELECT
statement partitions result rows into groups, based on their values in one or several columns. Typically, grouping is used to apply some sort of aggregate function for each group. [1] [2]
The result of a query using a GROUP BY
statement contains one row for each group. This implies constraints on the columns that can appear in the associated SELECT
clause. As a general rule, the SELECT
clause may only contain columns with a unique value per group. This includes columns that appear in the GROUP BY
clause as well as aggregates resulting in one value per group. [3]
Returns a list of Department IDs along with the sum of their sales for the date of January 1, 2000.
SELECTDeptID,SUM(SaleAmount)FROMSalesWHERESaleDate='01-Jan-2000'GROUPBYDeptID
In the following example one can ask "How many units were sold in each region for every ship date?":
Sum of units | Ship date ▼ | |||||
---|---|---|---|---|---|---|
Region ▼ | 2005-01-31 | 2005-02-28 | 2005-03-31 | 2005-04-30 | 2005-05-31 | 2005-06-30 |
East | 66 | 80 | 102 | 116 | 127 | 125 |
North | 96 | 117 | 138 | 151 | 154 | 156 |
South | 123 | 141 | 157 | 178 | 191 | 202 |
West | 78 | 97 | 117 | 136 | 150 | 157 |
(blank) | ||||||
Grand total | 363 | 435 | 514 | 581 | 622 | 640 |
The following code returns the data of the above pivot table which answers the question "How many units were sold in each region for every ship date?":
SELECTRegion,Ship_Date,SUM(Units)ASSum_of_UnitsFROMFlatDataGROUPBYRegion,Ship_Date
Common grouping (aggregation) functions include:
In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.
Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.
Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to the SQL used to interact with relational databases. T-SQL expands on the SQL standard to include procedural programming, local variables, various support functions for string processing, date processing, mathematics, etc. and changes to the DELETE and UPDATE statements.
A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER
, LEFT OUTER
, RIGHT OUTER
, FULL OUTER
and CROSS
.
The SQL SELECT statement returns a result set of records, from one or more tables.
An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique of analyzing data to look for insights. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than three.
An SQL INSERT statement adds one or more records to any single table in a relational database.
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.
A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. UDFs are usually written for the requirement of its creator.
In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL
is a reserved word used to identify this marker.
A pivot table is a table of values which are aggregations of groups of individual values of a more extensive table within one or more discrete categories. The aggregations or summaries on the groups of the individual terms might include sums, averages, counts, or other statistics. A pivot table is an outcome of statistically processing on a tabularized raw data and can be used for decision making.
Multidimensional Expressions (MDX) is a query language for online analytical processing (OLAP) using a database management system. Much like SQL, it is a query language for OLAP cubes. It is also a calculation language, with syntax similar to spreadsheet formulae.
Gadfly is a relational database management system written in Python. Gadfly is a collection of Python modules that provides relational database functionality entirely implemented in Python. It supports a subset of the standard RDBMS Structured Query Language (SQL).
In database management, an aggregate function or aggregation function is a function where the values of multiple rows in a single column are processed together to form a single summary value.
A HAVING
clause in SQL specifies that an SQL SELECT
statement must only return rows where aggregate values meet the specified conditions.
An ORDER BY
clause in SQL specifies that a SQL SELECT
statement returns a result set with the rows being sorted by the values of one or more columns. The sort criteria does not have to be included in the result set The sort criteria can be expressions, including column names, user-defined functions, arithmetic operations, or CASE
expressions. The expressions are evaluated and the results are used for the sorting, i.e., the values stored in the column or the results of the function call.
Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.
In SQL, a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row. Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function.
A relational data stream management system (RDSMS) is a distributed, in-memory data stream management system (DSMS) that is designed to use standards-compliant SQL queries to process unstructured and structured data streams in real-time. Unlike SQL queries executed in a traditional RDBMS, which return a result and exit, SQL queries executed in a RDSMS do not exit, generating results continuously as new data become available. Continuous SQL queries in a RDSMS use the SQL Window function to analyze, join and aggregate data streams over fixed or sliding windows. Windows can be specified as time-based or row-based.
The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.