 SQL 2005+SQL Server Summaries with ROLLUP and CUBE
The twenty-ninth part of the SQL Server Programming Fundamentals tutorial describes the ROLLUP and CUBE clauses, which can be used when grouping data. These clauses create additional summary rows containing aggregated values from multiple groups of rows.
Summarising Grouped Data
In the previous article in this series we investigated the use of the GROUP BY clause to create queries that performed aggregation of data for groups of related rows. In the article I gave the example that you could use this grouping to calculate a business' sales income for each targeted region and list the results using a single query. Without the grouping the same aggregate functions could be used to calculate the total sales income for the entire business. However, to obtain both the subtotals and the overall total would require two separate queries to be executed. If there were multiple groups to calculate subtotals across, for example sales region and sales person, to obtain the subtotals required to populate a simple pivot table would require further queries.
In this article we will look at two clauses that can be added to a query containing a GROUP BY clause. The WITH ROLLUP and WITH CUBE clauses assist in summarising information in the manner described above. In addition to the totals for each grouped row, new rows are added to the results to provide grand totals. NB: Although the examples in this article use the "sum" aggregate function, the clauses can also be used with count, avg, min and max.
JoBS Database
The examples in this article refer to the JoBS tutorial database. This is the database that has been created and populated with data throughout the course of this series of articles. If you do not have an up-to-date copy of the database you can create one using the script that may be downloaded via the link at the top of this page.
Using ROLLUP
Let's start with a very simple example of the WITH ROLLUP clause that can be built upon as we progress through the article. If we wish to find the total number of stock items held by each engineer in the JoBS database, we can use a query with a GROUP BY clause. The query below obtains this information by summing the UnitsHeld value from the EngineerStock table, with grouping according to the engineer holding the stock.
SELECT
E.EngineerName,
sum(S.UnitsHeld)
FROM
EngineerStock S
INNER JOIN
Engineers E
ON
S.EngineerId = E.EngineerId
GROUP BY
E.EngineerName
The above query finds the subtotals of stock levels for each engineer but does not find the total stock held for all of the engineers combined. To obtain this, we must execute the aggregate function without any grouping, as follows:
SELECT sum(UnitsHeld) FROM EngineerStock
If we wished to combine the results from the two queries above, we could add a new first column to the second query, possibly using a literal VarChar value of 'Total'. This would make the column lists compatible so that the queries' result sets could be combined using a UNION. However, using two separate queries would be inefficient. Instead, we can use a rollup.
A Simple ROLLUP
When only one column is used for grouping, a rollup adds one new row to the query's output. This row contains aggregated values for the entire result set. If you are using the sum function, the new row includes the total of all of the rows returned by the query. For other aggregate functions the appropriate result for the entire result set is calculated.
To use the rollup functionality, add the WITH ROLLUP clause to the end of the query:
SELECT
E.EngineerName,
sum(S.UnitsHeld)
FROM
EngineerStock S
INNER JOIN
Engineers E
ON
S.EngineerId = E.EngineerId
GROUP BY
E.EngineerName
WITH ROLLUP
The results from the above query should contain the grand total for all of the rows. In the grouping column, "EngineerName", the new row has a NULL value. Sometimes this is useful and sometimes you will want to replace the NULL with more descriptive text. This process will be explained shortly.
Using Multiple Grouping Columns
The ROLLUP clause become more interesting when used in queries that have grouping across several columns. In these cases, the grouping columns are used to generate a hierarchy. The first grouping column is the root of the hierarchy and each subsequent grouping column adds another layer. Totals are then added at each level of the hierarchy.
For example, if we were to modify the previous query so that it grouped by the engineer's name, then by the stock item's part number, subtotals would be created for each engineer and a grand total created for all of the engineers in the results. These would be the rows added by the clause that are supplementary to the totals calculated for each combination of engineer and part.
You can picture this by placing the data into a simple pivot table. Below there are three engineers and three products. We can see grand totals for each engineer but not for each product. The missing totals are shown in the table as question marks (?).
| Part / Engineer | Engineer A | Engineer B | Engineer C | Totals |
|---|
| Part 1 | 10 | 16 | 20 | ? |
|---|
| Part 2 | 8 | 10 | 7 | ? |
|---|
| Part 2 | 12 | 11 | 23 | ? |
|---|
| Totals | 30 | 37 | 50 | 117 |
|---|
To show this using the data from the JoBS database, execute the following query. This will not, of course, be displayed as a pivot table.
SELECT
E.EngineerName,
S.PartNumber,
sum(S.UnitsHeld)
FROM
EngineerStock S
INNER JOIN
Engineers E
ON
S.EngineerId = E.EngineerId
GROUP BY
E.EngineerName,
S.PartNumber
WITH ROLLUP
If the order of grouping is altered, the totals that are calculated also change. If we change the grouping to be by part first, then by engineer, we lose the engineer totals but gain summaries for each part. Again, we can see this by pivoting the data:
| Part / Engineer | Engineer A | Engineer B | Engineer C | Totals |
|---|
| Part 1 | 10 | 16 | 20 | 46 |
|---|
| Part 2 | 8 | 10 | 7 | 25 |
|---|
| Part 2 | 12 | 11 | 23 | 46 |
|---|
| Totals | ? | ? | ? | 117 |
|---|
The query for this is as follows, note the changed order of grouping columns.
SELECT
S.PartNumber,
E.EngineerName,
sum(S.UnitsHeld)
FROM
EngineerStock S
INNER JOIN
Engineers E
ON
S.EngineerId = E.EngineerId
GROUP BY
S.PartNumber,
E.EngineerName
WITH ROLLUP
The Grouping Function
When summarising information using the ROLLUP clause, SQL Server provides a new function named "Grouping". This function requires a single parameter containing the name of one of the grouping columns in the query. The function then returns either 1 or 0 for every row in the results set. If the named column is aggregated for the row, meaning that the rollup has inserted a NULL value into the column for that row, the value of the function will be 1. If not, it will be zero.
We can see the Grouping function in action by executing the following query. This query adds two columns containing the Grouping results for PartNumber and EngineerName. Scan the results to confirm that wherever a NULL is inserted by the ROLLUP, a corresponding 1 appears in the matching Grouping column.
SELECT
S.PartNumber,
E.EngineerName,
sum(S.UnitsHeld),
grouping(S.PartNumber) AS PartGrouping,
grouping(E.EngineerName) AS EngineerGrouping
FROM
EngineerStock S
INNER JOIN
Engineers E
ON
S.EngineerId = E.EngineerId
GROUP BY
S.PartNumber,
E.EngineerName
WITH ROLLUP
Using the CASE Statement and Grouping Function to Replace NULL Values
With the use of the Grouping function, we can determine whether a NULL has been generated for a grouping column by the summarising process. We can use this knowledge to replace the NULL with a more suitable value when necessary. To do this we will use a conditional statement named CASE. I will not describe the CASE statement in detail in this article; as it is covered later in the tutorial. In this instance we will simply describe one usage, which allows a value to be tested and used to determine which of two other values to return in the results.
We will use two CASE statements. The first will be for the PartNumber column. The condition being checked, or predicate, will be whether the Grouping function for the column returns 1. If it does, the literal text, 'ALL ENGINEERS' will be returned for the row. If not, the column value from the table will be returned. The syntax for this condition is:
CASE WHEN grouping(S.PartNumber)=1 THEN 'ALL PARTS' ELSE S.PartNumber END
As you can see, the predicate appears after the WHEN clause. The two possible answers appear after the THEN and ELSE clauses. This makes the statement easy to read in English as "When the grouping value of the PartNumber column is 1 then return 'ALL PARTS'. Otherwise, return the value of the PartNumber column". The second CASE statement will be similar to the first, except that it will operate on the engineer's name.
To demonstrate, execute the following query. You should see several occurrences of the ALL ENGINEERS text and one grand total row containing both ALL PARTS and ALL ENGINEERS.
SELECT
CASE WHEN grouping(S.PartNumber)=1
THEN 'ALL PARTS' ELSE S.PartNumber END,
CASE WHEN grouping(E.EngineerName)=1
THEN 'ALL ENGINEERS' ELSE E.EngineerName END,
sum(S.UnitsHeld)
FROM
EngineerStock S
INNER JOIN
Engineers E
ON
S.EngineerId = E.EngineerId
GROUP BY
S.PartNumber,
E.EngineerName
WITH ROLLUP
NB: Sometimes you will find queries where the NULLs are replaced with standard text using the ISNULL function. This gives the correct results when the grouped columns cannot contain NULL values. However, if they can contain NULLs, these will be replaced too, potentially giving misleading results.
Using CUBE
For our final example we will use the WITH CUBE clause. This is similar to the WITH ROLLUP clause in its operation as it adds new summary rows for groups of results. However, rather than creating a hierarchy of results, a cube contains totals for every possible permutation of group. Returning to our pivot table layout, every total will be calculated.
| Part / Engineer | Engineer A | Engineer B | Engineer C | Totals |
|---|
| Part 1 | 10 | 16 | 20 | 46 |
|---|
| Part 2 | 8 | 10 | 7 | 25 |
|---|
| Part 2 | 12 | 11 | 23 | 46 |
|---|
| Totals | 30 | 37 | 50 | 117 |
|---|
To show the results, execute the following query. This is the same as the previous query except that it uses CUBE instead of ROLLUP. You can see that the result set contains subtotals for every engineer and for every part individually.
SELECT
CASE WHEN grouping(S.PartNumber)=1
THEN 'ALL PARTS' ELSE S.PartNumber END,
CASE WHEN grouping(E.EngineerName)=1
THEN 'ALL ENGINEERS' ELSE E.EngineerName END,
sum(S.UnitsHeld)
FROM
EngineerStock S
INNER JOIN
Engineers E
ON
S.EngineerId = E.EngineerId
GROUP BY
S.PartNumber,
E.EngineerName
WITH CUBE
|