SQL Server Indexes Part 1
The fifteenth part of the SQL Server Programming Fundamentals tutorial describes indexes. Indexes provide a lookup facility for a table, allowing rows to be found more quickly and without the need to scan the data in every row when querying a table.
What is an Index?
In this article I will describe the purpose of indexes in a SQL Server database and some of the types of index that are supported by SQL Server 2005. Indexes are a complex topic so we will not look at any examples in this instalment, this will be saved until the next part of the tutorial.
Table Scans and Index Lookups
When you perform a query, SQL Server can identify the rows that should be returned using one of two methods. The most basic method is called a table scan. As the name suggests, when a table scan is performed, the querying engine scans through every row in the tables that are included in the query. When a row matches the query's WHERE clause, the relevant information is added to the results that will be returned. This process is generally only acceptable for small tables or those where all of the rows are usually returned unfiltered. In almost all other scenarios, reading every table row is very inefficient.
The second method that a query can use to find information is an index lookup. This type of query uses the contents of an index that has previously been defined for a table. An index on a database table performs a similar function to an index in a book. Rather than reading through the entire book to find the word or phrase that you are looking for, you can examine the book's index to find all of the important references to that item. A database index contains the contents of one or more columns in the table. If the query filters its results based only upon columns within the index, the correct database rows can be found much more quickly.
Indexes are stored as sorted, balanced tree structures, also known as b-trees. This type of tree is very fast to access. Several of these index structures can be added to a single table to improve the performance of queries that use that table. When a query is executed, SQL Server's query optimiser determines which of the indexes are suitable for the query and which of these will provide the best performance. Once selected, the b-tree is searched to find the first entry where the data that matches the query. The initial search is fast because the data in the index is sorted. Furthermore, the sorting minimises the range of index entries that must be scanned.
Index lookups are almost always quicker and much more efficient than table scans when retrieving information. However, this improvement in reading speed has an associated impact upon the performance when creating and updating information. In addition to changing the data in the table, each index must also be modified. These two factors should be considered together when designing your tables. If a table is often updated and seldom read, there may not be an overall benefit in adding multiple indexes. However, when certain columns are queried often, such as with columns included in foreign keys, it is often beneficial to index them.
NB: Foreign keys are not indexed by default in SQL Server so these are often ideal candidates for indexing.
SQL Server 2005 permits the creation of several types of index. In this article, we will examine four of the more common types. These are:
- Clustered Indexes
- Non-Clustered Indexes
- Unique Indexes
- Indexes with Included Columns
One of the most common types of index is the clustered index. When a clustered index is added to a table, the physical ordering of the rows is changed; they are sorted to match the index definition. This prevents you from creating more than one clustered index per clustered table.
The leaf nodes of a clustered index's balanced tree contain the actual data rows from the table. This gives very good performance for queries that are assisted by a clustered index for two key reasons. Firstly the data is immediately available once located in the index with no further lookups required. Secondly, the sorting of the data means that items that are adjacent in the index are likely to be in the same page of data on disk and thus can be read efficiently.
There are several design considerations when using clustered indexes:
- Clustered indexes are particularly useful for queries that use relational operators such as =, <, >, <=, => and BETWEEN or for queries that return large result sets.
- Clustered indexes are often used where the columns in the index will be used in queries that join multiple tables together. Typically, such an index is added to a primary key or foreign key. When you create a primary key using SQL Server Management Studio, a clustered index is added by default.
- Clustered indexes give good performance to queries that include ORDER BY clauses or GROUP BY clauses. Grouping will be described in a future article in this tutorial.
- Ideally, the values held in clustered index columns should be unique or include low levels of duplication. They should be accessed and created sequentially, making identity columns ideal.
- Clustered indexes are not advised to be used for columns that are updated frequently, as each update could potentially require a re-ordering operation of the table data.
- The composite values of the columns in a clustered index are used by non-clustered indexes on the same table. For this reason, the combined size of the columns should be minimised to prevent other indexes from becoming too large.
5 November 2008