 SQL 2005+Transact-SQL Joins
The nineteenth part of the SQL Server Programming Fundamentals tutorial returns to the topic of querying. In this article, we examine joins using Transact-SQL (T-SQL). Joins allow data from several tables to be combined in a single set of results.
Joins
In the previous two articles we have examined the process of normalising a database and creating foreign key relationships between tables that are logically linked. When you are working with a normalised relational database, it is essential that you can perform queries that gather data from several linked tables but return the matching data in a single set of results. This is achieved with the use of joins.
A join is a clause within a SELECT statement that specifies that the results will be obtained from two tables. It usually includes a join predicate, which is a conditional statement that determines exactly which rows from each of the tables will be joined by the query. Usually the join will be based upon a foreign key relationship and will only return combined results from the two tables when the key values in both tables match. However, it is possible to join tables based upon non-key values or even perform a cross join that has no predicate and returns all possible combinations of values from the two tables.
Usually normalisation results in a database with many related tables. You may therefore want to join more than two tables for a single set of results. In this case, you can include multiple joins in a SELECT statement. The first join combines the first two tables. The second join combines the results of the first join with a third table, and so on.
JoBS Database
The examples in this article use the JoBS tutorial database. If you do not have a copy of the JoBS database available, download the script using the link at the top of this page. You can execute the script to create and populate a new database.
Performing Queries with Joins
Inner Joins
Possibly the most common form of join that you will use is the inner join. With this type of join, the two tables are combined based upon a join predicate. Wherever a row in one table matches a row in the other, the two rows are combined and added to the outputted results. If a row in either table matches several in the other, each combination will be included in the results. If a row in either table does not match any in the other table, it will be excluded from the results altogether.
The join clause, second table name and join predicate are included in the SELECT statement immediately after the name of the first table. The join uses the INNER JOIN clause and the predicate uses the ON clause. The basic syntax for the SELECT statement is as follows:
SELECT columns FROM table-1 INNER JOIN table-2 ON predicate
As an example, we can join the Jobs and Engineers tables. The following query returns a list of jobs and the details of the engineer that performed the work. In this case, the predicate specifies that the tables will be joined only where the EngineerId in both tables is a match. These columns are used in a foreign key relationship definition for the two tables, although this is not a requirement for the join to be executed. However, if non-key columns are frequently used in join operations, you should consider adding appropriate indexes to improve query performance.
SELECT * FROM Jobs INNER JOIN Engineers ON Jobs.EngineerId = Engineers.EngineerId
The query returns four results as each of the four jobs in the database has an engineer associated with it. However, not all of the engineers have been assigned work so the twenty-one that have not undertaken a job are not included in the results.
The example query returns every column from both tables. For larger tables, or when using multiple joins, this may include a lot of data that you do not require. In addition to making the results more difficult to read, it can also lengthen the processing time for the query and increase the amount of network traffic generated. You should therefore only return the columns that you require. This can be achieved by specifying a column list as usual. However, if the two tables include columns with the same name, the ambiguity can cause an error. Try executing the following query:
SELECT
EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs
INNER JOIN
Engineers
ON
Jobs.EngineerId = Engineers.EngineerId
This query fails because the EngineerId column appears in both tables. You must therefore specify which table's EngineerId column you require by prefixing it with the table name and a full-stop (period) character. The following query resolves the problem:
SELECT
Jobs.EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs
INNER JOIN
Engineers
ON
Jobs.EngineerId = Engineers.EngineerId
When using joins, you can also use WHERE clauses, ORDER BY clauses, etc. As with the column selection, you must specify the table name for any columns with ambiguous names. For example:
SELECT
Jobs.EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs
INNER JOIN
Engineers
ON
Jobs.EngineerId = Engineers.EngineerId
WHERE
Jobs.EngineerId = 4
OR
Jobs.EngineerId = 8
ORDER BY
Jobs.EngineerId
Implicit Inner Joins
A second syntax can be used for creating inner joins without providing a join clause. This syntax is known as an implicit inner join. In this case, the names of the tables to be joined are provided in a comma-delimited list. The join predicate becomes part of the WHERE clause.
SELECT
Jobs.EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs, Engineers
WHERE
Jobs.EngineerId = Engineers.EngineerId
Generally speaking, the syntax that you use for inner joins is a matter of personal preference or will be proscribed by your organisation's coding standards. Many people prefer the explicit syntax as the join predicates are separated from those in the WHERE clause and are kept closer to the names of the tables that they are acting upon. It is also easier to change the type of join with the explicit syntax. It is useful to understand both variations, as you are likely to encounter each syntax in real-world scenarios.
Using Table Aliases
Providing the full name of the table as a prefix to column names can lead to long query statements. This is especially true if, to make the query more easily understood, you prefer to prefix every column name rather than just those that are ambiguous. In such circumstances, it is useful to replace the table names with table aliases.
A table alias provides a short code, usually of one or two letters, that can be used in place of a table name. Each alias is defined in the query after the full table name. In the next example, the "J" alias represents the Jobs table and the Engineers table has the alias "E". This makes the query much more readable.
SELECT
J.EngineerId,
J.JobId,
J.VisitDate,
E.EngineerName
FROM
Jobs J
INNER JOIN
Engineers E
ON
J.EngineerId = E.EngineerId
Self-Referencing Joins
Table aliases are always required when creating self-referencing joins, where the two tables that are being joined are actually the same table. In the following example, the Jobs table is being joined to itself so that the initial job and follow up job can be combined in a single row in the results. The initial job's table has an alias of "J", whilst the follow up job uses the "JF" alias.
SELECT
J.JobId,
J.StandardJobId,
J.EngineerId,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
JF.EngineerId AS FollowUpEngineerId
FROM
Jobs J
INNER JOIN
Jobs JF
ON
J.FollowUpJobId = JF.JobId
Joining Multiple Tables
When you wish to join more than two tables, you can simply add further INNER JOIN and ON clauses to the query. For example, in the JoBS database there is a many-to-many link between Engineers and their Skills via a junction table named "EngineerSkills". We can join the three tables to generate a list that contains all engineers that have skills with one row for every engineer and skill combination. The following query returns these results in order of skill name. Note that some engineers appear more than once in the list because they have multiple skills.
SELECT
E.EngineerName,
E.HourlyRate,
E.OvertimeRate,
S.SkillName
FROM
Engineers E
INNER JOIN
EngineerSkills ES
ON
E.EngineerId = ES.EngineerId
INNER JOIN
Skills S
ON
ES.SkillCode = S.SkillCode
ORDER BY
S.SkillName
Outer Joins
Outer joins use a similar syntax to explicit inner joins but provide different results in some cases. The key difference is that outer joins include all of the rows from one or both tables, even if there are no matching rows defined by the join predicate. Where information is missing, NULL values are substituted in the columns of the returned results. The easiest way to explain the difference is with an example.
Firstly, let's consider an inner join. The query below joins the CustomerComplaints table to the Engineers table so that each complaint can be shown alongside the engineer that was at fault. Although there are three complaints in the database, only one row is returned by the query. This is because the other two rows define complaints that were deemed not to be the engineer's fault. In these cases the EngineerId column in the CustomerComplaints table is set to NULL.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
INNER JOIN
Engineers E
ON
C.EngineerId = E.EngineerId
| EngineerName | Complaint |
|---|
| Joey Ohara | Engineer did not have appropriate parts and was rude. |
This may not be the information that we require from the query. If we actually want to list every complaint in the database, showing the engineer's name only where an engineer is associated with the complaint, we must use an outer join. In this case, we will use a left outer join. This indicates that all of the values from the table to the left of the join clause will be returned. If there is no associated row from the table on the right of the join, the columns from that table will be included but will contain NULL values.
To demonstrate, execute the following command, noting that the join clause has been modified to LEFT JOIN:
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
LEFT JOIN
Engineers E
ON
C.EngineerId = E.EngineerId
The results from this query contain all three complaints. For the two complaints not associated with an engineer, the EngineerName column's value is NULL.
| EngineerName | Complaint |
|---|
| NULL | Customer has received an incorrect charge on their direct debit account. | | NULL | Customer does not wish to receive direct marketing information. | | Joey Ohara | Engineer did not have appropriate parts and was rude. |
The opposite of a left outer join is a right outer join. As you may imagine, this returns all of the rows from the table on the right of the join clause with null values for columns from the table to the left of the clause when no matching row exists. If you alter the previous query to use a right join, all twenty-five engineers will be returned. One of these engineers, Joey Ohara, will have a complaint listed, whilst all of the others will have NULL in the Complaint column.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
RIGHT JOIN
Engineers E
ON
C.EngineerId = E.EngineerId
The third style of outer join is the full outer join. This type of join ensures that every row from both tables is included in the final results. If any row in either table does not have a matching partner, the row is populated with NULL values for the missing data. Try modifying the query to use a full join as shown below. This time you will see twenty-seven returned rows. One will be a complaint with a matching engineer, two will be complaints without engineers and twenty-four will be for engineers with no complaints made against them.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
FULL JOIN
Engineers E
ON
C.EngineerId = E.EngineerId
Cross Joins
Cross joins are the simplest form of join and possibly the least used. A cross join does not define a join predicate. This means that the results contain every possible combination from the two tables. This is the Cartesian product of the table so this type of join is often called a Cartesian join.
If we modify the outer join query to provide a cross join, all combinations of data are returned. As there are twenty-five rows in the Engineers table and three rows in the CustomerComplaints table, the resultant list contains seventy-five rows.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
CROSS JOIN
Engineers E
NB: Cross joins should be used with care, particularly with large tables as the number of results can grow very quickly.
Implicit Cross Joins
As with inner joins, cross joins have an implicit syntax variation. This is the same syntax as for inner joins but with no join predicate in the WHERE clause:
SELECT E.EngineerName, C.Complaint FROM CustomerComplaints C, Engineers E
|