BlackWaspTM
SQL Server
SQL 2005+

SQL Server Grouping (2)

The twenty-eighth part of the SQL Server Programming Fundamentals tutorial adds to the description of the use of aggregate functions within queries. In this article the GROUP BY and HAVING clauses are used to summarise groups of similar result rows.

Linking Tables

A GROUP BY clause operates on any data that is gathered by the preceding query. This makes it appropriate for queries that gather information from more than one linked table. However, there are some common mistakes that can occur when aggregating data from joined tables with grouping. When using joins for one-to-many or many-to-many relationships, queries often return some duplicated data from the joined tables. If the duplicated data includes values that are to be included within sum or avg functions they can return unexpected results due to double counting. Similarly, when using the count function, duplicates can be counted twice unless the DISTINCT clause is included.

When using small amounts of data for testing purposes it is possible that you will not notice such calculation errors. You should always pay extra attention when testing aggregates in grouped queries. It can be useful to execute queries without the grouping to help identify areas of potential duplication that are likely to cause miscalculations.

In the following query we join the Engineers and EngineerStock tables to improve the previous query. In this example we include the engineer's name in addition to their unique ID. As both the ID and name are displayed, they are both included in the GROUP BY clause. This does not generate extra groups because there is a one-to-one relationship between an engineer's name and ID.

SELECT
    E.EngineerName,
    S.EngineerId,
    sum(S.UnitsHeld)
FROM
    EngineerStock S
INNER JOIN
    Engineers E
ON
    S.EngineerId = E.EngineerId
GROUP BY
    S.EngineerId,
    E.EngineerName

GROUP BY ALL

The behaviour of the GROUP BY clause can be modified using the ALL keyword. This keyword causes all of the possible groups from a table to be returned, even if the query's WHERE clause filters out all of the results for a particular group. In this case, the group will be displayed but all aggregated values in the row will be null.

GROUP BY ALL has some limitations. Key to these is that it should never be used with remote tables. Additionally, the clause is not supported by Microsoft when used with a WHERE clause. Without such a WHERE clause the addition of the ALL keyword has no meaning. Generally you should not use the ALL variation; it is included in this article only for completeness. Microsoft has indicated that GROUP BY ALL will be removed from future versions of SQL Server.

28 May 2009