BlackWasp
SQL Server
SQL 2005+

SQL Server Ranking Functions

The thirty-first part of the SQL Server Programming Fundamentals tutorial describes the use of ranking functions within SQL queries. These functions add a new column to the results of a query, applying a ranking value to every row that is returned.

What are Ranking Functions?

Ranking functions were first introduced in SQL Server 2005. They are used within the column list of a query to create a new column containing an integer value for each row in a set of results. For example, the Row_Number function generates a unique, sequential number for each row. Other functions may generate duplicate values depending upon their usage and the data they are operating against.

Considerations

When a ranking function is used, a sort order must be specified so that the rows can be appropriately ranked. This is not necessarily linked to the final ordering of the results. However, sorting operations can impact performance so these functions should be used with due consideration.

The values generated by a ranking function are non-deterministic, as they are not linked to the data in the associated rows. This restricts their use in indexed views.

Using Ranking Functions

In this article we will examine ranking functions and execute some example queries. The examples in this article use the JoBS database. This is the database that has been created and expanded throughout the course of this tutorial. To set up your database, download the script using the link at the top of this page. Execute the script to create the database and populate it with data. It is important that even if you have an up-to-date version of the JoBS database that you recreate it with this script to undo changes made in the previous article.

Using Row_Number

The first ranking function that we will look at is Row_Number. This function generates a unique integer value for every row in the results. The first row is assigned a value of one and each subsequent row increments the number. The syntax for this function is interesting as it introduces a new clause that will be seen in all of the ranking functions:

SELECT row_number() OVER (ORDER BY column-name), columns FROM table-name

The OVER clause specifies the order of the rows that should be used when generating values with the ranking function. Following the OVER keyword, the sort order to apply is provided in parentheses. In the following example, a row number is applied to each row from the Engineer table. In this case the engineer with the first name, when sorted alphabetically, receives a row number of 1.

NB: The OVER clause also allows the data to be partitioned. This divides the results into sections and resets the ranking for each section. However, this is beyond the scope of this article.

SELECT
    EngineerId,
    EngineerName,
    HourlyRate,
    row_number() OVER (ORDER BY EngineerName) AS RowNumber
FROM
    Engineers

If you wish, you can add more than one ranking function to a query. Often each function will use the same sort order within the OVER clause. However, it is possible to use a different order for the each ranking, as in the following example:

SELECT
    EngineerId,
    EngineerName,
    HourlyRate,
    row_number() OVER (ORDER BY EngineerName) AS Forward,
    row_number() OVER (ORDER BY EngineerName DESC) AS Reverse
FROM
    Engineers

In addition to the ordering used for the ranking, you can specify the order in which you wish to retrieve the results. In the following example the ranking is still based upon the engineer's name. However, the results are sorted according to the engineer's unique ID.

SELECT
    EngineerId,
    EngineerName,
    HourlyRate,
    row_number() OVER (ORDER BY EngineerName) AS RowNumber
FROM
    Engineers
ORDER BY
    EngineerId

Using Rank

The Rank function is used with a similar syntax to Row_Number. The results are ranked according to the sort order, with each row being given a number that is one greater than the number of results that appear before it. Where there are two or more items that are equivalent in the sort order they are each given the same rank. For example, if the first two items in the list are equivalent, they are both given a rank of one. The next item in the list will have a rank of three.

The table below shows the rank compared to the row number for a group of salespeople. Each ranking function in the grid is applied with a sort order based upon descending sales value:

SalespersonSalesRow_NumberRank
Bob£21,00011
Jim£20,00022
Sam£20,00032
Tim£18,00044
Tom£16,00055
Liz£16,00065
Vic£16,00075
Mel£14,00088

To demonstrate, execute the following statement against the JoBS database. This query lists all of the engineers ranked by their hourly charge rate.

SELECT
    EngineerId,
    EngineerName,
    HourlyRate,
    rank() OVER (ORDER BY HourlyRate) AS Rank
FROM
    Engineers

Using Dense_Rank

The Dense_Rank function is very similar to Rank. The key difference is that there are no gaps between the rank values when duplicates are encountered. If the first two values in the sort order are identical, each will be given a value of one. The third value would have a dense rank of two, not three.

SalespersonSalesRow_NumberRankDense_Rank
Bob£21,000111
Jim£20,000222
Sam£20,000322
Tim£18,000443
Tom£16,000554
Liz£16,000654
Vic£16,000754
Mel£14,000885

The following query generates the dense rank for the engineers, again based upon hourly rates.

SELECT
    EngineerId,
    EngineerName,
    HourlyRate,
    dense_rank() OVER (ORDER BY HourlyRate) AS DenseRank
FROM
    Engineers

Using Ntile

The Ntile function splits the results of a query into a number of groups. Each group is then given a number that is applied to every row within the group. The numbering starts with group one and is incremented for every additional group. The syntax for the statement is similar to that of the previous functions with the addition of a parameter to specify the number of groups that should be generated.

Where possible, each group will contain exactly the same number of rows. However, if the number of rows is not divisible by the number of groups, the later groups will be one row smaller than the first group. The table below shows this in the final two columns. The penultimate column divides the results into four equal groups. The final column has three groups so the third group contains two rows compared to the first two groups' three rows.

SalespersonSalesRow_NumberRankDense_RankNtile(4)Ntile(3)
Bob£21,00011111
Jim£20,00022211
Sam£20,00032221
Tim£18,00044322
Tom£16,00055432
Liz£16,00065432
Vic£16,00075443
Mel£14,00088543

We can modify the previous SQL statement to use Ntile. In the following example, the engineers are split into four groups:

SELECT
    EngineerId,
    EngineerName,
    HourlyRate,
    ntile(4) OVER (ORDER BY HourlyRate) AS GroupNumber
FROM
    Engineers
Link to this Page4 August 2009
TwitterTwitter RSS Feed RSS