
SQL 2005+SQL Server Indexes Part 1 (2)
The fifteenth part of the SQL Server Programming Fundamentals tutorial describes indexes. Indexes provide a lookup facility for a table, allowing rows to be found more quickly and without the need to scan the data in every row when querying a table.
Clustered Indexes
One of the most common types of index is the clustered index. When a clustered index is added to a table, the physical ordering of the rows is changed; they are sorted to match the index definition. This prevents you from creating more than one clustered index per clustered table.
The leaf nodes of a clustered index's balanced tree contain the actual data rows from the table. This gives very good performance for queries that are assisted by a clustered index for two key reasons. Firstly the data is immediately available once located in the index with no further lookups required. Secondly, the sorting of the data means that items that are adjacent in the index are likely to be in the same page of data on disk and thus can be read efficiently.
There are several design considerations when using clustered indexes:
- Clustered indexes are particularly useful for queries that use relational operators such as =, <, >, <=, => and BETWEEN or for queries that return large result sets.
- Clustered indexes are often used where the columns in the index will be used in queries that join multiple tables together. Typically, such an index is added to a primary key or foreign key. When you create a primary key using SQL Server Management Studio, a clustered index is added by default.
- Clustered indexes give good performance to queries that include ORDER BY clauses or GROUP BY clauses. Grouping will be described in a future article in this tutorial.
- Ideally, the values held in clustered index columns should be unique or include low levels of duplication. They should be accessed and created sequentially, making identity columns ideal.
- Clustered indexes are not advised to be used for columns that are updated frequently, as each update could potentially require a re-ordering operation of the table data.
- The composite values of the columns in a clustered index are used by non-clustered indexes on the same table. For this reason, the combined size of the columns should be minimised to prevent other indexes from becoming too large.
Non-Clustered Indexes
Non-clustered indexes are held separately from the data in a table and do not cause any changes to the ordering of the physical data. You can therefore create multiple non-clustered indexes on a single table.
In a non-clustered index, the leaf nodes of the b-tree structure contain pointers to the rows of data they represent, rather than the data itself. The pointer is either a reference to a location in the table's clustered index or, when the table is not clustered, to the table row. This adds an extra lookup to the process when reading from an index, lowering the performance of queries. A further reduction in speed can occur because adjacent rows in the index are less likely to be in the same data pages increasing the required disk access.
- Non-clustered indexes are ideal for tables that will be updated rarely and queried often. If there are large volumes of data in the table, the querying speed can be drastically improved.
- Non-clustered indexes are useful where a clustered index would be preferred but one already exists for the table.
- If a table is updated often and queried less so, you should carefully consider the benefits and drawbacks of adding such an index.
- Non-clustered indexes provide further performance gains when they cover the query's returned columns. Coverage means that all of the columns in the query are also included in the index. In this case, the query engine can obtain its results entirely from the index without visiting the table data.
- When creating a non-clustered index, you should not add a large number of columns just to achieve coverage of the returned data. The columns in the index should be those that are likely to be included in a WHERE clause only. To achieve additional coverage, you should consider using indexes with included columns, described later in this article.
- If a non-clustered index is included in a clustered table, the columns from the clustered index are automatically included in the non-clustered index as they are the pointer that will ultimately find the data. You can obtain coverage of some queries from this combined set of columns.
5 November 2008