BlackWasp
SQL Server
SQL 2005+

Transact-SQL Joins

The nineteenth part of the SQL Server Programming Fundamentals tutorial returns to the topic of querying. In this article, we examine joins using Transact-SQL (T-SQL). Joins allow data from several tables to be combined in a single set of results.

Joins

In the previous two articles we have examined the process of normalising a database and creating foreign key relationships between tables that are logically linked. When you are working with a normalised relational database, it is essential that you can perform queries that gather data from several linked tables but return the matching data in a single set of results. This is achieved with the use of joins.

A join is a clause within a SELECT statement that specifies that the results will be obtained from two tables. It usually includes a join predicate, which is a conditional statement that determines exactly which rows from each of the tables will be joined by the query. Usually the join will be based upon a foreign key relationship and will only return combined results from the two tables when the key values in both tables match. However, it is possible to join tables based upon non-key values or even perform a cross join that has no predicate and returns all possible combinations of values from the two tables.

Usually normalisation results in a database with many related tables. You may therefore want to join more than two tables for a single set of results. In this case, you can include multiple joins in a SELECT statement. The first join combines the first two tables. The second join combines the results of the first join with a third table, and so on.

JoBS Database

The examples in this article use the JoBS tutorial database. If you do not have a copy of the JoBS database available, download the script using the link at the top of this page. You can execute the script to create and populate a new database.

Performing Queries with Joins

Inner Joins

Possibly the most common form of join that you will use is the inner join. With this type of join, the two tables are combined based upon a join predicate. Wherever a row in one table matches a row in the other, the two rows are combined and added to the outputted results. If a row in either table matches several in the other, each combination will be included in the results. If a row in either table does not match any in the other table, it will be excluded from the results altogether.

The join clause, second table name and join predicate are included in the SELECT statement immediately after the name of the first table. The join uses the INNER JOIN clause and the predicate uses the ON clause. The basic syntax for the SELECT statement is as follows:

SELECT columns FROM table-1 INNER JOIN table-2 ON predicate

As an example, we can join the Jobs and Engineers tables. The following query returns a list of jobs and the details of the engineer that performed the work. In this case, the predicate specifies that the tables will be joined only where the EngineerId in both tables is a match. These columns are used in a foreign key relationship definition for the two tables, although this is not a requirement for the join to be executed. However, if non-key columns are frequently used in join operations, you should consider adding appropriate indexes to improve query performance.

SELECT * FROM Jobs INNER JOIN Engineers ON Jobs.EngineerId = Engineers.EngineerId

The query returns four results as each of the four jobs in the database has an engineer associated with it. However, not all of the engineers have been assigned work so the twenty-one that have not undertaken a job are not included in the results.

The example query returns every column from both tables. For larger tables, or when using multiple joins, this may include a lot of data that you do not require. In addition to making the results more difficult to read, it can also lengthen the processing time for the query and increase the amount of network traffic generated. You should therefore only return the columns that you require. This can be achieved by specifying a column list as usual. However, if the two tables include columns with the same name, the ambiguity can cause an error. Try executing the following query:

SELECT
    EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId

This query fails because the EngineerId column appears in both tables. You must therefore specify which table's EngineerId column you require by prefixing it with the table name and a full-stop (period) character. The following query resolves the problem:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
    FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId

When using joins, you can also use WHERE clauses, ORDER BY clauses, etc. As with the column selection, you must specify the table name for any columns with ambiguous names. For example:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
    FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId
WHERE
    Jobs.EngineerId = 4
OR
    Jobs.EngineerId = 8
ORDER BY
    Jobs.EngineerId

Implicit Inner Joins

A second syntax can be used for creating inner joins without providing a join clause. This syntax is known as an implicit inner join. In this case, the names of the tables to be joined are provided in a comma-delimited list. The join predicate becomes part of the WHERE clause.

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs, Engineers
WHERE
    Jobs.EngineerId = Engineers.EngineerId

Generally speaking, the syntax that you use for inner joins is a matter of personal preference or will be proscribed by your organisation's coding standards. Many people prefer the explicit syntax as the join predicates are separated from those in the WHERE clause and are kept closer to the names of the tables that they are acting upon. It is also easier to change the type of join with the explicit syntax. It is useful to understand both variations, as you are likely to encounter each syntax in real-world scenarios.

Using Table Aliases

Providing the full name of the table as a prefix to column names can lead to long query statements. This is especially true if, to make the query more easily understood, you prefer to prefix every column name rather than just those that are ambiguous. In such circumstances, it is useful to replace the table names with table aliases.

A table alias provides a short code, usually of one or two letters, that can be used in place of a table name. Each alias is defined in the query after the full table name. In the next example, the "J" alias represents the Jobs table and the Engineers table has the alias "E". This makes the query much more readable.

SELECT
    J.EngineerId,
    J.JobId,
    J.VisitDate,
    E.EngineerName
FROM
    Jobs J
INNER JOIN
    Engineers E
ON
    J.EngineerId = E.EngineerId

Self-Referencing Joins

Table aliases are always required when creating self-referencing joins, where the two tables that are being joined are actually the same table. In the following example, the Jobs table is being joined to itself so that the initial job and follow up job can be combined in a single row in the results. The initial job's table has an alias of "J", whilst the follow up job uses the "JF" alias.

SELECT
    J.JobId,
    J.StandardJobId,
    J.EngineerId,
    JF.JobId AS FollowUpJobId,
    JF.StandardJobId AS FollowUpStandardJobId,
    JF.EngineerId AS FollowUpEngineerId
FROM
    Jobs J
INNER JOIN
    Jobs JF
ON
    J.FollowUpJobId = JF.JobId

Joining Multiple Tables

When you wish to join more than two tables, you can simply add further INNER JOIN and ON clauses to the query. For example, in the JoBS database there is a many-to-many link between Engineers and their Skills via a junction table named "EngineerSkills". We can join the three tables to generate a list that contains all engineers that have skills with one row for every engineer and skill combination. The following query returns these results in order of skill name. Note that some engineers appear more than once in the list because they have multiple skills.

SELECT
    E.EngineerName,
    E.HourlyRate,
    E.OvertimeRate,
    S.SkillName
FROM
    Engineers E
INNER JOIN
    EngineerSkills ES
ON
    E.EngineerId = ES.EngineerId
INNER JOIN
    Skills S
ON
    ES.SkillCode = S.SkillCode
ORDER BY
    S.SkillName

Outer Joins

Outer joins use a similar syntax to explicit inner joins but provide different results in some cases. The key difference is that outer joins include all of the rows from one or both tables, even if there are no matching rows defined by the join predicate. Where information is missing, NULL values are substituted in the columns of the returned results. The easiest way to explain the difference is with an example.

Firstly, let's consider an inner join. The query below joins the CustomerComplaints table to the Engineers table so that each complaint can be shown alongside the engineer that was at fault. Although there are three complaints in the database, only one row is returned by the query. This is because the other two rows define complaints that were deemed not to be the engineer's fault. In these cases the EngineerId column in the CustomerComplaints table is set to NULL.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
INNER JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId
EngineerNameComplaint
Joey OharaEngineer did not have appropriate parts and was rude.

This may not be the information that we require from the query. If we actually want to list every complaint in the database, showing the engineer's name only where an engineer is associated with the complaint, we must use an outer join. In this case, we will use a left outer join. This indicates that all of the values from the table to the left of the join clause will be returned. If there is no associated row from the table on the right of the join, the columns from that table will be included but will contain NULL values.

To demonstrate, execute the following command, noting that the join clause has been modified to LEFT JOIN:

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
LEFT JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId

The results from this query contain all three complaints. For the two complaints not associated with an engineer, the EngineerName column's value is NULL.

EngineerNameComplaint
NULLCustomer has received an incorrect charge on their direct debit account.
NULLCustomer does not wish to receive direct marketing information.
Joey OharaEngineer did not have appropriate parts and was rude.

The opposite of a left outer join is a right outer join. As you may imagine, this returns all of the rows from the table on the right of the join clause with null values for columns from the table to the left of the clause when no matching row exists. If you alter the previous query to use a right join, all twenty-five engineers will be returned. One of these engineers, Joey Ohara, will have a complaint listed, whilst all of the others will have NULL in the Complaint column.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
RIGHT JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId

The third style of outer join is the full outer join. This type of join ensures that every row from both tables is included in the final results. If any row in either table does not have a matching partner, the row is populated with NULL values for the missing data. Try modifying the query to use a full join as shown below. This time you will see twenty-seven returned rows. One will be a complaint with a matching engineer, two will be complaints without engineers and twenty-four will be for engineers with no complaints made against them.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
FULL JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId

Cross Joins

Cross joins are the simplest form of join and possibly the least used. A cross join does not define a join predicate. This means that the results contain every possible combination from the two tables. This is the Cartesian product of the table so this type of join is often called a Cartesian join.

If we modify the outer join query to provide a cross join, all combinations of data are returned. As there are twenty-five rows in the Engineers table and three rows in the CustomerComplaints table, the resultant list contains seventy-five rows.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
CROSS JOIN
    Engineers E

NB: Cross joins should be used with care, particularly with large tables as the number of results can grow very quickly.

Implicit Cross Joins

As with inner joins, cross joins have an implicit syntax variation. This is the same syntax as for inner joins but with no join predicate in the WHERE clause:

SELECT E.EngineerName, C.Complaint FROM CustomerComplaints C, Engineers E
Link to this Page28 December 2008
TwitterTwitter RSS Feed RSS