Group by (SQL)

Last updated November 08, 2023

A GROUP BY statement in SQL specifies that a SQL SELECT statement partitions result rows into groups, based on their values in one or several columns. Typically, grouping is used to apply some sort of aggregate function for each group.^[1]^[2]

The result of a query using a GROUP BY statement contains one row for each group. This implies constraints on the columns that can appear in the associated SELECT clause. As a general rule, the SELECT clause may only contain columns with a unique value per group. This includes columns that appear in the GROUP BY clause as well as aggregates resulting in one value per group.^[3]

Examples

Returns a list of Department IDs along with the sum of their sales for the date of January 1, 2000.

SELECTDeptID,SUM(SaleAmount)FROMSalesWHERESaleDate='01-Jan-2000'GROUPBYDeptID

In the following example one can ask "How many units were sold in each region for every ship date?":

Sum of units	Ship date ▼
Region ▼	2005-01-31	2005-02-28	2005-03-31	2005-04-30	2005-05-31	2005-06-30
East	66	80	102	116	127	125
North	96	117	138	151	154	156
South	123	141	157	178	191	202
West	78	97	117	136	150	157
(blank)
Grand total	363	435	514	581	622	640

The following code returns the data of the above pivot table which answers the question "How many units were sold in each region for every ship date?":

SELECTRegion,Ship_Date,SUM(Units)ASSum_of_UnitsFROMFlatDataGROUPBYRegion,Ship_Date

Common groupings

Common grouping (aggregation) functions include:

Count(expression) - Quantity of matching records (per group)
Sum(expression) - Summation of given value (per group)
Min(expression) - Minimum of given value (per group)
Max(expression) - Maximum of given value (per group)
Avg(expression) - Average of given value (per group)

Related Research Articles

In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.

Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to the SQL used to interact with relational databases. T-SQL expands on the SQL standard to include procedural programming, local variables, various support functions for string processing, date processing, mathematics, etc. and changes to the DELETE and UPDATE statements.

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

The SQL SELECT statement returns a result set of records, from one or more tables.

An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique of analyzing data to look for insights. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than three.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. UDFs are usually written for the requirement of its creator.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

A pivot table is a table of values which are aggregations of groups of individual values of a more extensive table within one or more discrete categories. The aggregations or summaries on the groups of the individual terms might include sums, averages, counts, or other statistics. A pivot table is an outcome of statistically processing on a tabularized raw data and can be used for decision making.

Multidimensional Expressions (MDX) is a query language for online analytical processing (OLAP) using a database management system. Much like SQL, it is a query language for OLAP cubes. It is also a calculation language, with syntax similar to spreadsheet formulae.

Gadfly is a relational database management system written in Python. Gadfly is a collection of Python modules that provides relational database functionality entirely implemented in Python. It supports a subset of the standard RDBMS Structured Query Language (SQL).

In database management, an aggregate function or aggregation function is a function where the values of multiple rows in a single column are processed together to form a single summary value.

A HAVING clause in SQL specifies that an SQL SELECT statement must only return rows where aggregate values meet the specified conditions.

An ORDER BY clause in SQL specifies that a SQL SELECT statement returns a result set with the rows being sorted by the values of one or more columns. The sort criteria does not have to be included in the result set The sort criteria can be expressions, including column names, user-defined functions, arithmetic operations, or CASE expressions. The expressions are evaluated and the results are used for the sorting, i.e., the values stored in the column or the results of the function call.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

In SQL, a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row. Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function.

A relational data stream management system (RDSMS) is a distributed, in-memory data stream management system (DSMS) that is designed to use standards-compliant SQL queries to process unstructured and structured data streams in real-time. Unlike SQL queries executed in a traditional RDBMS, which return a result and exit, SQL queries executed in a RDSMS do not exit, generating results continuously as new data become available. Continuous SQL queries in a RDSMS use the SQL Window function to analyze, join and aggregate data streams over fixed or sliding windows. Windows can be specified as time-based or row-based.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

↑ "SQL GROUP BY Statement". www.w3schools.com. Retrieved 2020-09-18.
↑ shkale-msft. "GROUP BY (Transact-SQL) - SQL Server". docs.microsoft.com. Retrieved 2020-09-18.
↑ "SQL Grouping and Aggregation". databaselecture.com. Retrieved 2020-12-09.

External links

SQL Snippets: SQL Features Tutorials - Grouping Rows with GROUP BY

This database-related article is a stub. You can help Wikipedia by expanding it.

This programming-language-related article is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "SQL GROUP BY Statement". www.w3schools.com. Retrieved 2020-09-18.

[2] shkale-msft. "GROUP BY (Transact-SQL) - SQL Server". docs.microsoft.com. Retrieved 2020-09-18.

[3] "SQL Grouping and Aggregation". databaselecture.com. Retrieved 2020-12-09.

[1]

[2]

[3]

v t e SQL
Versions	SQL-86 SQL-89 SQL-92 SQL:1999 SQL:2003 SQL:2006 SQL:2008 SQL:2011 SQL:2016 SQL:2023
Keywords	As Case Create Delete From Group by Having Insert Join Merge Null Order by Over Prepare Select Truncate Union Update Where With
Related	Edgar Codd Relational database
ISO/IEC SQL parts	Framework Foundation Call-Level Interface Persistent Stored Modules Management of External Data Object Language Bindings Information and Definition Schemas SQL Routines and Types for the Java Programming Language XML-Related Specifications