BlackWaspTM
SQL Server
SQL 2005+

Transact-SQL Joins (2)

The nineteenth part of the SQL Server Programming Fundamentals tutorial returns to the topic of querying. In this article, we examine joins using Transact-SQL (T-SQL). Joins allow data from several tables to be combined in a single set of results.

Performing Queries with Joins

Inner Joins

Possibly the most common form of join that you will use is the inner join. With this type of join, the two tables are combined based upon a join predicate. Wherever a row in one table matches a row in the other, the two rows are combined and added to the outputted results. If a row in either table matches several in the other, each combination will be included in the results. If a row in either table does not match any in the other table, it will be excluded from the results altogether.

The join clause, second table name and join predicate are included in the SELECT statement immediately after the name of the first table. The join uses the INNER JOIN clause and the predicate uses the ON clause. The basic syntax for the SELECT statement is as follows:

SELECT columns FROM table-1 INNER JOIN table-2 ON predicate

As an example, we can join the Jobs and Engineers tables. The following query returns a list of jobs and the details of the engineer that performed the work. In this case, the predicate specifies that the tables will be joined only where the EngineerId in both tables is a match. These columns are used in a foreign key relationship definition for the two tables, although this is not a requirement for the join to be executed. However, if non-key columns are frequently used in join operations, you should consider adding appropriate indexes to improve query performance.

SELECT * FROM Jobs INNER JOIN Engineers ON Jobs.EngineerId = Engineers.EngineerId

The query returns four results as each of the four jobs in the database has an engineer associated with it. However, not all of the engineers have been assigned work so the twenty-one that have not undertaken a job are not included in the results.

The example query returns every column from both tables. For larger tables, or when using multiple joins, this may include a lot of data that you do not require. In addition to making the results more difficult to read, it can also lengthen the processing time for the query and increase the amount of network traffic generated. You should therefore only return the columns that you require. This can be achieved by specifying a column list as usual. However, if the two tables include columns with the same name, the ambiguity can cause an error. Try executing the following query:

SELECT
    EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId

This query fails because the EngineerId column appears in both tables. You must therefore specify which table's EngineerId column you require by prefixing it with the table name and a full-stop (period) character. The following query resolves the problem:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId

When using joins, you can also use WHERE clauses, ORDER BY clauses, etc. As with the column selection, you must specify the table name for any columns with ambiguous names. For example:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId
WHERE
    Jobs.EngineerId = 4
OR
    Jobs.EngineerId = 8
ORDER BY
    Jobs.EngineerId
28 December 2008