BlackWasp
SQL Server
SQL 2005+

SQL Server Subqueries

The thirtieth part of the SQL Server Programming Fundamentals tutorial examines subqueries. These can be nested queries allowing complex selections that are not possible using joins. They can also be used to update rows using the contents of other tables.

What are Subqueries?

A subquery is a query that appears within another statement. Generally this is a SELECT statement that is nested within another SELECT, although subqueries may also be used within commands that update data in a table. Usually a subquery returns a single value or one row of data, though there are situations where multiple rows are returned. This information can be used to set a variable's value, as a column in a query or within a WHERE clause of a containing statement.

Subqueries can be nested to several layers deep, with subqueries within other subqueries. This provides the means to generate very complex queries that solve complicated problems. This includes the creation of queries containing multiple tables that are not possible using standard joins. However, this benefit can be a drawback if the queries become too large to be easily understood. It can also lead to inefficient queries that are slow to execute and that should be separated into several smaller queries executed within a stored procedure.

Correlated Subqueries

A standard subquery usually generates a single value that is used in every row of the parent query's results. This allows SQL Server to optimise the statement so that the subquery is only executed once. The performance of such a query can then be comparable to the equivalent query using joins, where this is possible.

A correlated subquery may provide a different answer for each row of the parent query. This may be because it is being used to calculate a value that is based upon the contents of the other columns in each row of the parent query. Correlated subqueries can be very powerful but may exhibit lower performance because they are executed once for each row in the results.

Using Subqueries

In this article we will examine the use of standard and correlated subqueries by executing a series of example statements. Some of the queries are possible to recreate without the use of subqueries, using joins instead. In some cases, the equivalent query using joins would be more efficient. The samples are designed to demonstrate the syntax and use of subqueries. In a real-world scenario you should use judgement and performance testing to determine the most suitable style of query for the problem being addressed.

JoBS Database

The examples in this article use the JoBS database. This is the database that has been generated and populated with data throughout the course of this tutorial. If you do not have an up-to-date copy of the database you can create one using the script provided via the link at the top of this page.

Standard Subqueries

Let's start by running a simple query containing a standard subquery. The query below retrieves a list of all contracts within the JoBS database. The third column in the results is generated by the subquery, which retrieves the total value of all known contracts. This query allows the value of each contract to be compared to the total value. Note the requirement for the subquery to be contained within parentheses (). In this case a name has also been provided for the subquery's column.

SELECT
    ContractNumber,
    ContractValue,
    (SELECT sum(ContractValue) FROM Contracts) AS AllContractsTotal
FROM
    Contracts

In the above statement the query contains a standard subquery. The result of the subquery need only be retrieved once because it is the same for every row in the result set.

The resultant value of a subquery can be used in much the same way as any other value. For example, if we wished to determine the percentage value that each contract contributes towards the company's income, we can use the subquery within a calculation, as in the next query. Again, the result of the subquery does not vary from row to row, even though the result of the calculation does.

SELECT
    ContractNumber,
    ContractValue,
    100 * ContractValue / (SELECT sum(ContractValue) FROM Contracts) AS Percentage
FROM
    Contracts

Subqueries can be used in places other than the column list for a query. In the next example the subquery is used in the WHERE clause. In this case only contract rows that contribute at least 3% of the total contract income are retrieved:

SELECT
    ContractNumber,
    ContractValue,
    100 * ContractValue / (SELECT sum(ContractValue) FROM Contracts) AS Percentage
FROM
    Contracts
WHERE
    ContractValue / (SELECT sum(ContractValue) FROM Contracts) >= 0.03

Correlated Subqueries

Correlated subqueries use information from the main query when retrieving or calculating their values. This leads to the subquery being executed multiple times, as the results can differ from row to row. This does lower the performance of the statement but can lead to very powerful queries that may be difficult or even impossible to implement using joins.

Try executing the following statement that includes two correlated subqueries:

SELECT
    EngineerId,
    EngineerName,
    (SELECT count(*) FROM Jobs WHERE EngineerId = E.EngineerId) AS Jobs,
    (SELECT sum(Duration) * E.HourlyRate FROM Jobs WHERE EngineerId = E.EngineerId)
        AS Cost
FROM
    Engineers E

In the above sample, the main part of the query obtains the full list of engineers, retrieving the ID and name for each row. Note that the Engineers table in the FROM clause has been given the alias, "E". The third column returned by the query is generated by a subquery. In this case the subquery retrieves the number of jobs that have been undertaken by the engineer. The link between the two queries is determined by the WHERE clause of the subquery, which compares the EngineerId from the Jobs table with the EngineerId of the parent query, using the "E" alias.

The fourth column in the query is also generated by a subquery. In this case, the total duration of all of the jobs undertaken by each engineer is multiplied by the engineer's hourly rate to determine a total cost. Again, the "E" alias is used to access the columns of the outer table.

As with standard subqueries, correlated subqueries may also be used within the WHERE clause of a query. The following example adds a WHERE clause that filters the results so that only engineers that have attended jobs are listed.

SELECT
    EngineerId,
    EngineerName,
    (SELECT count(*) FROM Jobs WHERE EngineerId = E.EngineerId) AS Jobs,
    (SELECT sum(Duration) * E.HourlyRate FROM Jobs WHERE EngineerId = E.EngineerId)
        AS Cost
FROM
    Engineers E
WHERE
    (SELECT count(*) FROM Jobs WHERE EngineerId = E.EngineerId) > 0

Using a Subquery to Check if a Row Exists

Some Transact-SQL (T-SQL) commands are designed specifically to work with subqueries. One commonly used function, named "EXISTS", allows the creation of a predicate that ideal for use in a WHERE clause. The function returns true if the subquery passed to it as a parameter returns one or more rows. If the subquery does not generate any results, the function returns false.

In the previous sample, the subquery in the WHERE clause returned the number of jobs for each engineer. If this number was greater than zero, the row was included. It would be more appropriate to use EXISTS for such a query. Below is a similar query that returns a list of engineers that have jobs.

SELECT
    EngineerId,
    EngineerName
FROM
    Engineers E
WHERE
    EXISTS(SELECT * FROM Jobs WHERE EngineerId = E.EngineerId)

The EXISTS command can be used in reverse by prefixing a NOT clause. The next query returns the engineers who have not yet been assigned any jobs:

SELECT
    EngineerId,
    EngineerName
FROM
    Engineers E
WHERE
    NOT EXISTS(SELECT * FROM Jobs WHERE EngineerId = E.EngineerId)

Using IN

A second T-SQL command that uses a subquery is "IN". The subquery provided must return rows with only one column. You can then use the clause to determine whether the value to the left of the IN appears in any of the rows in the subquery to the right.

Consider the following query. The subquery here returns a list of EngineerId values that is filtered so that only IDs linked to the "GAS" skill are included. The main query lists the engineers with IDs that appear in the results of the subquery. Overall, the query returns all of the engineers that have the "GAS" skill. NB: This query could, and indeed probably should, be created using an inner join.

SELECT
    EngineerId,
    EngineerName
FROM
    Engineers E
WHERE
    E.EngineerId IN (SELECT EngineerId FROM EngineerSkills WHERE SkillCode = 'GAS')

You can also retrieve all of the engineers that do not have the "GAS" skill by using NOT IN.

SELECT
    EngineerId,
    EngineerName
FROM
    Engineers E
WHERE
    E.EngineerId NOT IN
    (SELECT EngineerId FROM EngineerSkills WHERE SkillCode = 'GAS')

Using a Subquery to Set Values During an Update

Subqueries are most commonly used within SELECT statements. However, they can be used elsewhere. In the final example below, two subqueries are used within an UPDATE to determine the new values for the rows being modified. In this case, the hourly rate and overtime rate for all engineers are being modified. The new rates are based upon the number of recognised skills for each engineer.

UPDATE Engineers
SET
    HourlyRate = 10 + 5 *
    (SELECT count(*) FROM EngineerSkills WHERE EngineerId = Engineers.EngineerId),
    OvertimeRate = 15 + 7.5 *
    (SELECT count(*) FROM EngineerSkills WHERE EngineerId = Engineers.EngineerId)
Link to this Page5 July 2009
TwitterTwitter RSS Feed RSS