
SQL 2005+SQL Server Grouping
The twenty-eighth part of the SQL Server Programming Fundamentals tutorial adds to the description of the use of aggregate functions within queries. In this article the GROUP BY and HAVING clauses are used to summarise groups of similar result rows.
Grouping Results
Is the previous article in this tutorial we examined some of the aggregate functions provided by SQL Server. These functions allow us to count rows, calculate totals and averages, and find the maximum and minimum values across a series of rows returned by a query. In each case, we performed an aggregation over a query's entire result set. This allowed us to answer questions such as "What is the entire sales income for the business?"
A common requirement is to perform similar functions over groups of related rows in a result set, calculating the aggregates for each group to provide a summary. This allows us to refine the questions asked into, for example, "What is the sales income by region for the business?" The results of this query would include one row per group with totals in each.
This type of query can be created using the GROUP BY clause within a SELECT statement. The GROUP BY clause defines one or more columns or expressions that should be used to determine which rows belong to each group. The groups can then be summarised using the appropriate aggregate functions.
JoBS Database
The examples in this article use the JoBS database. This is the database that has been created and populated with information throughout the course of the tutorial. If you do not have an up-to-date copy of the database you should create one with the script that may be downloaded using the link at the top of this page.
Using GROUP BY
The GROUP BY clause is added to the end of a query's SELECT statement. The clause is followed by a comma-separated list of the expressions for which you wish to create grouped sets. Often each item in the list will be a simple column name. However, you can create expressions that combine column names, literal values and functions to create groups.
GROUP BY expression1, expression2, ..., expressionN
As an example, we may wish to calculate the number of units of stock held by each engineer in the JoBS database. To calculate the total number of stocked parts for all of the engineers combined, we would use the following statement:
SELECT sum(UnitsHeld) FROM EngineerStock
To modify the statement to show a subtotal for each engineer individually we can add the engineer ID to the list of returned columns and to a GROUP BY clause. NB: The EngineerId is not required in the column list but without it you will not be able to see the link between each total and the engineer.
SELECT EngineerId, sum(UnitsHeld) FROM EngineerStock GROUP BY EngineerId
As with previous aggregation operations you will see that the calculated column in the results has no column name assigned. As before, we can add a column alias if desired:
SELECT EngineerId, sum(UnitsHeld) AS Units FROM EngineerStock GROUP BY EngineerId
When creating queries that use grouping, the list of columns to be returned in the results is more restricted than for a standard query. The column list may only include aggregated values and columns or expressions that appear in the GROUP BY clause's list. The following query attempts to select the PartNumber column, which meets neither criterion, so causes an error when executed.
SELECT
EngineerId,
PartNumber,
sum(UnitsHeld)
FROM
EngineerStock
GROUP BY
EngineerId
28 May 2009