Transact-SQL (T-SQL) includes several aggregate functions that can be used in queries. An aggregate function calculates a value based upon the contents of multiple result rows. The calculated values of aggregations are returned in columns in a result set.
In this article we will examine the five most commonly used aggregate functions. These allow you to count the rows returned by a query, calculate a sum or average value and determine the maximum and minimum values in a column. There are further aggregations for calculating checksums and statistical values but these are used less often and will not be examined here. We will consider simple examples that perform calculations across entire result sets in this article. In the next article we will introduce grouping, which allows multiple aggregate values to be generated for groups of related rows in a result set.
The examples in this article use queries that are executed against the JoBS tutorial database, which has been constructed and populated through the course of these articles. If you do not have an up-to-date version of the database, you can create one using the script that can be downloaded using the link at the top of this page.
The first aggregation function that we will examine is count. In its simplest form, this function returns the number of rows in a table as an integer value. For example, to count all of the rows in the CustomerAddresses table you can execute the following query. This should return a value of twenty-one:
SELECT count(*) FROM CustomerAddresses
If you have a query that may count billions of rows, an int value may not be large enough. In this case, you may use the count_big function. The functionality is almost identical to count, except that the returned result is a bigint value.
SELECT count_big(*) FROM CustomerAddresses
The above syntaxes use an asterisk to specify that all rows in the resultant data will be counted irrespective of the data that they contain. If you replace the asterisk character with the name of a column, the functionality is modified slightly. In that case, all rows that contain information in the named column are counted. Any rows with a NULL value in the stated column are excluded from the count.
The CustomerAddresses table contains several columns for each customer address. The second of these, "Address2", is a nullable column so can be used in a demonstration. Run the following query to see that the NULL values are not counted:
SELECT count(Address2) FROM CustomerAddresses
In addition to counting only non-null values, you may decide that you want to count only distinct, non-null values. This can be achieved by adding the DISTINCT clause to the column name. The following query should return a count of nineteen rows even though there are twenty-one values in the "TownOrCity" column. As the Leeds and Manchester values are duplicated, each is only counted once.
SELECT count(DISTINCT TownOrCity) FROM CustomerAddresses
Using Column Aliases
You will have noticed that the name of the calculated column is generated automatically by SQL Server when running aggregation queries. As with other types of query, you can include an alias to give a name to the calculated columns. The next query calculates a count of distinct cities and a count of all addresses. To easily identify the two values, the columns are named.
SELECT count(DISTINCT TownOrCity) AS Cities, count(*) AS Addresses
NB: This query includes two aggregate functions. It is acceptable to add many aggregated columns to a query. However, you may not mix aggregated and non-aggregated columns in a query without the use of grouping.
So far, the queries we have executed have counted the rows from an entire table. It is possible to include a WHERE clause in a query to filter the results before performing a calculation. In the next example, only addresses within the city of Leeds are counted. This should return a value of two.
SELECT count(*) FROM CustomerAddresses WHERE TownOrCity='Leeds'
The second aggregate function that we will consider is sum. This function can be applied to numeric columns to add all of the non-null values together and return the total. The syntax is similar to that of the count function but you may not use an asterisk between the parentheses.
To obtain the total value for all of the contracts in the JoBS database, we can use the following T-SQL statement:
SELECT sum(ContractValue) FROM Contracts
The value being totalled need not be a simple column name. You may include a mathematical expression to be calculated for each row in the table before the summing occurs. A very simplistic expression could double the contract value before totalling.
NB: This query is for example purposes only. In reality it would be more efficient to calculate the sum and double the final result.
SELECT sum(ContractValue * 2) FROM Contracts
As will the count function, you may decide to only sum distinct results using the DISTINCT clause.
SELECT sum(DISTINCT ContractValue) FROM Contracts
The avg function calculates the arithmetic mean of a column or expression. This is achieved by summing all of the values and dividing by the number of rows. However, as with all other aggregate functions, any NULL values are ignored. This is important when calculating the average as it can produce unexpected results. For example, if the column being processed contains the values 1, 2, 3, 4 and NULL the sum of the values is 10. You may expect that the average would be 2 as there are five rows. Actually, the result of the avg function will be 2.5 as the total is divided by the number of non-null rows (4), not the overall number of rows (5).
To calculate the average contract value for all contracts, run the following query:
SELECT avg(ContractValue) FROM Contracts
NB: As with the sum function, you can use the DISTINCT clause.
Maximum and Minimum Values
The last two aggregate functions that we will consider are min and max. These return the smallest and largest values respectively when used with numeric columns. These functions may also be applied to character-based columns. Used in this manner, they return the first or last item in the column according to the sort sequence of the table.
SELECT max(ContractValue), min(ContractValue) FROM Contracts
14 May 2009