BlackWasp
SQL Server
SQL 2005

Basic T-SQL Queries

The eighth part of the SQL Server Programming Fundamentals tutorial describes the use of the SELECT statement of the structured query language. This command allows the creation of database queries that return table rows that meet specified criteria.

Querying Data

The power of a database management system such as SQL Server is the ability to create both simple and complex queries. A query interrogates one or more tables, finding all of the rows that match the specified criteria. The rows are gathered into a set of results, formatted into rows and columns, which are returned to the process that performed the query.

Querying of data tables in SQL Server is achieved using the SELECT statement. The command appears deceptively simple, allowing basic queries to be generated very quickly. However, its flexibility permits the creation of very complex queries that combine information from many tables that match detailed criteria. It allows calculations to be performed upon the data as it is gathered, including row-by-row calculations and aggregation of information based upon groups of rows.

In this article we will concentrate on simple querying of a single table. More advanced querying techniques that join the results of multiple tables, calculate aggregate information and perform grouping will be described in later articles.

This article's examples require that you are using the JoBS tutorial database and that the database is correctly populated with information. If you have not already created this database, download the script from the link at the top of the page and use it to create the database and sample data. If you have used the tutorial database, you should drop the old version and replace it using the downloaded script to ensure that the data is correct.

The SELECT Statement

The SELECT statement can include a variety of clauses that control the behaviour of the command and the results that are returned. The most basic form of the statement simply returns literal values that are not sourced from a table. These values are provided after the command as a comma-separated list.

Try executing the following command in a SQL Server Management Studio query window:

SELECT 1, 2, 3.4, 'Hello'

The results area shows a row of data containing the four values specified. The value of such a statement may not immediately be apparent. However, there are various reasons that you may wish to select a fixed row of data or include literal values within other styles of query. For example, when combining the results of multiple queries you may want to add a predefined row or a column categorising the results according to the query from which they originated.

Selecting Data From a Table

When selecting information from a table, the name of the table must be provided. This is specified using the FROM clause. You must also provide details of which columns to return from the database. If you wish to return data from every column in the table, you can use an asterisk (*) as shown below. This query retrieves every column of each of the twenty-five rows in the Engineers table.

SELECT * FROM Engineers

NB: Note that the results are returned as rows and columns, in a similar format to the storage of information in a table.

Specifying a Column List

Although using the previous syntax to retrieve all of the columns for a query is useful for testing and maintenance tasks, it is not best practise for production systems. The main problem is that changes to the underlying structure of the table mean that the results may change. If the software using the results assumes the position of the columns, this could lead to incorrect data being used. A second issue is that the query may return many columns that are not required by the calling function. In this case, unnecessary network traffic is generated and the query's performance is impacted.

A more effective and efficient use of the SELECT statement is with the inclusion of a comma-separated column list in place of the asterisk. In this case, only the columns that are required are retrieved from the database and passed back to the function that initiated the query.

The following statement retrieves only the unique identifier, name and hourly rate for engineers. As the other column in the table holds binary data for a photograph of the engineer, omitting it from the list when it is not needed could vastly reduce the network traffic and time required to return the data.

SELECT EngineerId, EngineerName, HourlyRate FROM Engineers

When using a column list, the information does not need to be ordered in the same manner as the underlying table. If you change the column list, the results are modified accordingly.

SELECT HourlyRate, EngineerName, EngineerId FROM Engineers

The JoBS tutorial database does not include any columns that include spaces in their names. If you are using a database with columns that have names that include spaces, the name in the column list must be delimited using square brackets. For example:

SELECT [Engineer Name] FROM Engineers

NB: This notation can be used elsewhere in SQL Server where names include spaces. The brackets can even be included where spaces are not present.

Using Column Aliases

Sometimes you will want to change the names of the columns in the set of results. To do so, a column alias can be provided in the column list using the AS keyword. The AS keyword appears after the real column name and before the alias. In the following example, all three columns have aliases, one including a space.

SELECT
    EngineerId AS Id,
    EngineerName AS [Engineer Name],
    HourlyRate AS Cost
FROM
    Engineers

Limiting the Result Set Size

If a query will return a very large number of results, you may decide to limit the result set size. By adding the TOP clause immediately after the SELECT keyword you can specify the maximum number of rows to return. This is particularly useful when working with ordered results as the sorting occurs before the result set is limited. The sample below limits the number of engineers returned to ten.

NB: The parentheses around the number 10 are optional for backward compatibility. Microsoft advises that these parentheses are included.

SELECT TOP (10) EngineerId, EngineerName, HourlyRate FROM Engineers

The row limitation can be expressed as a percentage, rather than a discrete value, by adding the PERCENT clause. If the exact percentage is not an integer, the number of rows will be rounded upwards. For example, in the following statement ten percent of the results would be 2.5 rows. This is rounded and three rows are returned.

SELECT TOP (10) PERCENT EngineerId, EngineerName, HourlyRate FROM Engineers

Eliminating Duplicates

In all of the above examples, each use of the SELECT statement has resulted in the return of every row. Had there been duplicate rows in the tables, you would have seen duplicate rows in the results. We can demonstrate this by switching to a different table. In this case, we will use the EngineerWorkingAreas table to find the list of areas that are serviced by DisasterFix.

SELECT AreaCode FROM EngineerWorkingAreas

As the table simply links engineers to their working areas, the results include duplicates. For example, "CEN" appears four times in the output as several engineers work in this area. If you want to eliminate these duplicates from the results, you can use the DISTINCT clause before the column list. The previous statement returned all sixty-two rows of data; the modified version below only returns the forty-eight unique rows.

SELECT DISTINCT AreaCode FROM EngineerWorkingAreas

The DISTINCT clause operates on the columns that are included in the column list. If multiple columns are provided, the combination of data must be unique. When using the clause to remove duplicates, null values are considered equal.

Using Expressions

As we have seen, literal values can be present in the column list of a query. In addition, expressions can be included to perform calculations and various operations using literal values and column names as their operands. A review of all of the operators and functions available is beyond the scope of this article; some of the more important functions will be described later in the tutorial. In this article we will review some simple arithmetic and concatenation operators.

To perform a calculation using the arithmetic operators (+ - * /), the expression is simply included in the column list. Usually you will specify an alias for the calculated column, as no name will be provided by default. The statement below calculates the total cost of the parts used on all jobs by multiplying the unit cost by the number of parts used.

SELECT PartNumber, UnitsUsed, Cost, UnitsUsed * Cost AS ExtendedCost FROM UsedParts

Another useful operation is the concatenation of character values. This is achieved using the concatenation operator (+). The text to be combined can be sourced from columns or can include literal strings surrounded by apostrophes, as in the next sample, which combines the part names and numbers from the Parts table.

SELECT PartName + ' (' + PartNumber + ')' FROM Parts

Type Casting

Operations may require that the values involved be of specific types. If the column values are not of the correct type, they can be changed using the CAST function. This function has two parameters. The first is the value to be converted and the second is the new type, including a size where appropriate.

A simple example of the function is when concatenating text. For this combination to work correctly, the two operands must be character types. Concatenating numeric values is not permitted unless they are first cast to fixed or variable length character fields. The following statement generates a full parts list including a formatted price. The price is held in a money column that must be cast before concatenation with the part name.

SELECT PartName + ' (£' + CAST(Cost AS VARCHAR(7)) + ')' FROM Parts

NB: The current and desired types must be compatible in order for the cast operation to be successful. A full matrix of the possible conversions can be found on the Cast and Convert page of the MSDN web site.

Filtering Result Sets

The real power and flexibility of basic queries is provided by the ability to filter the result set, only retrieving rows that match a given set of criteria. To filter the results, one or more predicates must be provided in a WHERE clause. A predicate is an expression that can be evaluated to true, false or unknown. When the combined value of predicates is true for a row, the row is included in the returned results. If the evaluation is false or unknown, the row is omitted from the results.

Comparison Operators

Predicates can be created using comparison operators that require two operands. The six basic operators are as follows:

OperatorTrue Condition
=The two operands are equal.
!=The two operands are not equal.
>The first operand is greater than the second.
>=The first operand is greater than or equal to the second.
<The first operand is less than the second.
<=The first operand is less than or equal to the second.

Let's try some of these operators using the Engineers table. Firstly, we can find all of the engineers that have an hourly rate of exactly £20.00. There should be five such engineers.

SELECT
    EngineerId AS Id,
    EngineerName AS [Engineer Name],
    HourlyRate AS Cost
FROM
    Engineers
WHERE
    HourlyRate = 20

To find all of the engineers who earn £20.00 per hour or less, try this query:

SELECT
    EngineerId AS Id,
    EngineerName AS [Engineer Name],
    HourlyRate AS Cost
FROM
    Engineers
WHERE
    HourlyRate <= 20

Finding Values in a Range

Another interesting method for building a predicate is using the BETWEEN clause. This allows an inclusive range of values to be specified. When evaluated for a row, if the value appears within the range then the row is included in the results. The following statement retrieves all of the customers that were created within a specific date range.

SELECT
    FirstName,
    LastName,
    CreatedDate
FROM
    Customers
WHERE
    CreatedDate BETWEEN '2007-12-08' AND '2008-01-15'

NB: When using dates without specifying the time element, the time is assumed to be 12:00am. This means that in the previous query, if a customer were created on 15 January 2008 at any time other than midnight the row would be excluded from the results. To include such a row the end of the range should be '2008-01-15 23:59:59.997' or comparison operators should be used instead of BETWEEN.

Comparing Values Against a List

Sometimes you will want to determine if a value matches an item from a list. This is possible using the IN clause and a comma-separated list of values in parentheses. If we want to retrieve a list of the ID's of engineers that service the Humberside or Yorkshire areas, we can use the information in the EngineerWorkingAreas table and the following query:

SELECT DISTINCT
    EngineerId
FROM
    EngineerWorkingAreas
WHERE
    AreaCode IN ('HUM', 'NYO', 'SYO', 'WYO')

Performing Wildcard Searches

When working with character columns it is possible to perform pattern-matching queries using the LIKE keyword and a pattern containing wildcard characters and escape sequences. The column or expression to be evaluated is positioned to the left of the LIKE keyword and the pattern to be compared is placed to the right.

There are several wildcards that can be included in the pattern. The first is a percentage sign (%), which indicates that the value can include any number of any characters in place of the wildcard, including no characters at all. This is useful for "begins with", "ends with" or "contains" queries. For example, executing the following statement retrieves all customers with a last name that starts with the letter "M".

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    LastName LIKE 'M%'

The underscore (_) wildcard in a pattern is used to represent any single character. In the following example, four underscores are used to return customers with a last name that is exactly five characters long and ends with an "s".

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    LastName LIKE '____s'

Sometimes checking for any character in the pattern is not desired. Instead, you may wish to check that a particular character position contains a single character from a list. In these cases, you can use a pair of square brackets containing each acceptable character. When the predicate is evaluated it returns true only if the character at the position exists within the brackets. The following statement demonstrates this by finding customers with a first name of "Dita" or "Rita".

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    FirstName LIKE '[RD]ita'

The list of characters within the square brackets can be expressed as a range, rather than a discrete set. This is achieved by specifying the start and end of the range with a hyphen between the two extremes.

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    FirstName LIKE '[D-R]ita'

The list or range of values can also be negated using a caret symbol (^). In this case, the wildcard specifies that the character at the given position must not be within the stated list or range. In the following example engineers are included in the result set only if their name does not begin with any letter between "A" and "M".

SELECT
    EngineerName AS [Engineer Name],
    HourlyRate AS Cost
FROM
    Engineers
WHERE
    EngineerName LIKE '[^A-M]%'

One final use of the square bracket syntax is when you wish to find patterns that include one of the wildcard characters. To search for a string that includes a percentage symbol, for example, you would use a pattern such as "%[%]%". The outer percentage signs are wildcards but the inner one is interpreted as a literal character.

The NOT Operator

The NOT operator is a useful keyword that negates the value of a predicate. This can make it much simpler to express some conditions in a where clause, allowing the SELECT statement to be more readable. The operator is placed to the left of a predicate to reverse its meaning. The predicate can, if preferred, be contained within parentheses.

The statement below provides the opposite result to an earlier example. The predicate alone would cause the command the return all of the engineers earning £20 or less per hour. With the introduction of the NOT keyword, the statement returns only those engineers earning more than this amount.

SELECT
    EngineerId AS Id,
    EngineerName AS [Engineer Name],
    HourlyRate AS Cost
FROM
    Engineers
WHERE
    NOT HourlyRate <= 20

The NOT keyword is sometimes included in an operator directly. An example of this is with the IN keyword. This can be negated using the NOT IN variant, as shown below.

SELECT DISTINCT
    EngineerId
FROM
    EngineerWorkingAreas
WHERE
    AreaCode NOT IN ('HUM', 'NYO', 'SYO', 'WYO')

I mentioned earlier that predicates can return one of three possible values, these being true, false and unknown. When using NOT, an expression that evaluates to true becomes false. A false value becomes true. We will shortly see predicates that return unknown. It is important to understand that when the NOT operator is used with an unknown value, the result remains unknown.

Checking for Null Values

When a column value or an expression evaluates as NULL, this indicates that the value is undefined or empty. This is different from a value of zero or an empty string as every null value should be considered to be a different value. Any comparison of a value with NULL or of two NULL values returns a result of "unknown".

The result of this type of comparison has some effects that are unusual but that must be carefully anticipated. We can query the Customers table to demonstrate the potential pitfalls. This table contains twenty rows. Of these, only three represent business customers, so the BusinessName column in the other seventeen rows contains NULL.

If we execute a query that returns all of the rows where the business name is "Blue-Green ltd", we find that only one row matches the predicate.

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName = 'Blue-Green ltd'

Developers who are new to SQL Server may expect that changing this query to use the "not equal" operator, as shown below, would reverse the results. They may expect to see the other nineteen rows. However, as the comparison returns unknown for all comparisons with NULL, only two rows are included in the result set. This means that two apparently opposite queries do not return every row when combined.

SELECT FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName != 'Blue-Green ltd'

The same effect would be seen when using the NOT operator. If the predicate returns unknown because of a comparison with NULL, negating the result still returns an unknown value so the row is excluded from the results.

IS NULL and IS NOT NULL

If you wish to test if a column or expression is null, you can use the IS NULL keyword, rather than another comparison operator. To retrieve all of the customers that have no business name specified, execute the following statement:

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName IS NULL

The opposite of "IS NULL" is "IS NOT NULL". This evaluates to true only when the column value does not contain a null value. Therefore, you can find all of the customers that are associated with businesses by executing the following.

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName IS NOT NULL

Using Multiple Criteria

So far in this article we have only considered WHERE clauses with a single predicate. To expand the flexibility of the criteria in a query, you can add as many predicates as required to obtain the desired results. To do so, you use the AND and the OR operators.

AND Operator

The AND operator combines two predicates using a logical AND operation. However, as the predicates in a query can be evaluated as true, false or unknown, the results are not as simple as for pure Boolean logic. The table below shows the possible outcomes of combining these values. Remember that rows are returned if the outcome is true and omitted from the result set if false or unknown.

Operand 1Operand 2Result
truetruetrue
truefalsefalse
falsetruefalse
falsefalsefalse
trueunknownunknown
falseunknownfalse
unknowntrueunknown
unknownfalsefalse
unknownunknownunknown

Let's use the AND keyword to perform a SELECT that requires that two predicates return true. In this case, we will obtain all of the rows in the Customers table where the customer is not linked to a business and was created on or after 1 January 2008.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
WHERE
    BusinessName IS NULL
AND
    CreatedDate >= '2008-01-01'

OR Operator

The OR operator combines predicates using a logical OR operation. This allows you to build a result set containing results that meet at least one of several criteria. Again, consideration must be given to the effects of a predicate being unknown.

Operand 1Operand 2Result
truetruetrue
truefalsetrue
falsetruetrue
falsefalsefalse
trueunknowntrue
falseunknownunknown
unknowntruetrue
unknownfalseunknown
unknownunknownunknown

We can modify the previous query to use the OR operator instead of AND. This time, we will obtain all of the rows in the Customers table where the customer is not linked to a business or where the customer was created on or after 1 January 2008.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
WHERE
    BusinessName IS NULL
OR
    CreatedDate >= '2008-01-01'

Combining AND and OR

Although the examples shown above only use two predicates in their criteria, you can add as many predicates as required. This may include the use of AND and OR operators in the same WHERE clause. In these cases, it is important to know the order in which these operators will be considered. SQL Server provides a simple rule to determine the order. Firstly, all of the AND operations are evaluated. Afterwards, the OR operations are processed to obtain the final result. If you wish to modify this order, you can include parentheses:

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
WHERE
    BusinessName IS NOT NULL
AND
    (FirstName LIKE 'A%' OR CreatedDate >= '2008-01-01')

Sorting Data

The final clause to be considered in this article is ORDER BY. When added to the end of a SELECT statement, the clause specifies the columns that should be used to sort the results of the query. The results may be sorted by a single column or by multiple columns specified in a comma-separated list. For example, we can sort the customers by business name and then last name using the following query. If the business names match, the matching rows are sorted by last name. NB: Note that NULL values appear first in the list.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    BusinessName,
    LastName

Adding the DESC keyword to one or more of the columns reverses the sort order. This time, we will sort the customers by business name in descending order. For duplicate business names, we will sort by first name in ascending order. NB: Note the use of the ASC keyword for ascending order sorting. This is optional as ascending order is the default.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    BusinessName DESC,
    LastName ASC

Using TOP and ORDER BY

A very useful technique is to combine the use of ORDER BY with the use of the TOP clause. This allows the creation of queries such as the following, which retrieves the newest nine customers based on the CreatedDate column value.

SELECT TOP 9
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    CreatedDate DESC

This does cause an interesting effect. In the case of the above query, the ninth and tenth newest customers actually have the same CreatedDate value. In the query, SQL Server arbitrarily selects one of these for inclusion in the results and discards the other. This may be acceptable in some cases but often you will want to retrieve additional results where such duplication exists. This is possible if you add the WITH TIES clause to the TOP keyword. Now if there are duplicates that would normally be discarded, they are included in the result set.

The final sample statement shows the WITH TIES clause. Although the top nine results are requested, a duplicate means that the query returns ten rows.

SELECT TOP 9 WITH TIES
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    CreatedDate DESC
Link to this Page3 August 2008
RSS RSS Feed