BlackWasp
SQL Server
SQL 2005

SQL Server Primary Keys

The tenth part of the SQL Server Programming Fundamentals tutorial explains the use and creation of primary keys. A primary key is a column, or group of columns, within a database table that provides a unique reference for each row added to that table.

What is a Primary Key?

A primary key provides a unique reference for every row in a table. A primary key is defined in terms of a single column or a group of columns in the table. A primary key with more than one column is known as a composite key. Once created, every row within the table must have a different value in the column, or a different combination of values across composite key columns.

A primary key is a special type of unique index or unique constraint. A unique index indicates that a single column, or a combination of columns, in a table must contain only unique values. You may create several unique constraints in a table to ensure that different elements of the data have uniqueness. However, you may only define one primary key for a table. Another important difference is that whilst a unique index may contain nullable columns, a primary key cannot.

In addition to defining a unique reference for every row, a primary key adds a clustered index to the table. This special type of index controls the physical ordering of information within the table, so that the data is sorted according to the values in the primary key columns. This vastly improves the performance of queries that select data according to the primary key column values. NB: Although the primary key index is clustered by default, the clustering can be removed and applied to another index. This will be described in a later article in this tutorial.

An important use of primary keys in relational databases is for the creation of "one-to-many" relationships between tables. In most cases, the primary key is used as the unique reference that is specified in the linked table, by means of a foreign key.

Natural Keys

A natural key is a primary key that uses naturally unique information from the table in its constituent columns. Many database developers prefer this type of key as the use of naturally unique data makes reading the raw data in linked tables easier. In the Parts table below, the PartNumber column provides a naturally unique value that is an ideal candidate for a primary key.

PartNumberPartNameUnitCost
15COPIPE15mm Copper Pipe1.52
22COPIPE22mm Copper Pipe2.97
10MICCOP10mm Microbore Copper Tube1.17
25STPTAP25x25mm Stop Tap5.99

This table defines a set of parts that engineers may use when performing jobs for customers. With the part number being used as the primary key, a related table can be defined that holds details of the stock that engineers have in their possession. In the EngineerStock table below, the PartNumber column provides the link to the Parts table and is easily readable by a developer who understands the data. For example, we can see immediately that the engineer with an EngineerId of "1" has 18 units of 15mm copper pipe.

EngineerIdPartNumberUnitsHeld
115COPIPE18
122COPIPE15
222COPIPE9
225STPTAP1

There can be problems when trying to select naturally unique values for a primary key. These occur when the unique information in the table does not meet the following guidelines and rules:

  • Primary key data should be immutable. This means that once set, the information should never be changed. Although it is possible to modify the information in a primary key column, it is inadvisable. This is because the change would need to be propagated to every linked table to maintain referential integrity. This usually prevents the use of data such as a car's registration number as the primary key; if a car's registration number is changed, the primary key information would need to be updated.
  • The natural key must be unique for every row in the table and must not include nullable information. This would prevent the use of a person's name as a primary key as names are not unique. It is also possible that a person's name changes through marriage or a legal name change, breaking the first guideline.
  • The information within the primary key columns should be as compact as possible. If the column sizes in the primary key are large, the number of index entries that can be held in memory simultaneously is reduced. Large keys causes inefficiencies in index access and input/output, reducing the performance of your database. In a flight school database, the combination of pilot reference number, co-pilot reference number, aircraft registration and departure date and time may be unique but could be considered too large for use in a primary key.
  • SQL Server does not permit the use of large object columns in primary keys. If you are holding binary file data in a database it may be unique but it will not be possible to use it in a primary key.

Surrogate Keys

When no suitable natural key exists for a table, a surrogate key can be used instead. A surrogate key is generally a single-column primary key containing a unique value. Often the value is generated automatically. The use of such a key guarantees a unique and immutable value for every row in the table. However, it can be more difficult to read the information in a table when used in a foreign key.

If the previously described Parts table were updated to include an integer-based surrogate key, the data may appear as follows:

PartsKeyPartNumberPartNameUnitCost
115COPIPE15mm Copper Pipe1.52
222COPIPE22mm Copper Pipe2.97
310MICCOP10mm Microbore Copper Tube1.17
425STPTAP25x25mm Stop Tap5.99

If the new surrogate key is used in relationship in the EngineerStock table, the stock data is more difficult to read. To determine the number of units of 15mm copper pipe, the data in both tables must be compared.

EngineerIdPartsKeyUnitsHeld
1118
1215
229
241

Surrogate keys add a small number of bytes to the storage requirement for every row in the table in which they are defined. This can increase the overall size of the database. However, if the size of the surrogate key is small, this size increase may be offset by the decrease in size of related tables, when compared to the use of larger natural keys.

Surrogate keys are often defined as integers or globally unique identifiers (GUIDs). In the case of integers, the size of integer column selected is dependant upon the anticipated number of rows that will be required in the table. The number usually starts at one and increments by one for each row added to the table. This is ideal for databases that are only used in on-line scenarios. If a database includes synchronisation for offline use, a GUID may be more appropriate to reduce the risk of duplicated values being created during disconnected use.

Creating a Primary Key

Primary keys can be added to tables using various methods. In this article we will concentrate on two options: adding a primary key using the SQL Server Management Studio graphical user interface tools and creating a key using Transact-SQL statements.

The examples below add primary keys to all of the tables in the JoBS database that we have created over the course of the tutorial. If you have not followed the tutorial, download the script using the link at the top of this page and use it to create the database and some sample data. You should also use this script to recreate the database if you have followed the tutorial so far, as is includes modifications to the schema and data not described in the previous article.

Creating a Primary Key Using the SQL Server Management Studio

During development of a new database, SQL Server Management Studio (SSMS) tools can be used to create primary keys. To demonstrate, we will add a primary key to the Engineers table. To begin, open SSMS and expand the Object Explorer so that you can see the Engineers table. Right-click the table name and choose "Design" to open the table designer.

To add a single-column primary key, you simply select the column in question and choose "Set Primary Key" from the Table Designer menu. You can also use the Set Primary Key toolbar button. To try this, ensure that the EngineerId column is selected by clicking the first row selection button. Use the menu option or toolbar button to set the primary key. You will see a small key icon alongside the column definition.

NB: The EngineerId column provides a surrogate key as this information has been added to give a unique reference.

SQL Server Primary Key Definition

If you mistakenly add a primary key, you can remove it using the same menu option or toolbar button. You will notice that the description of the menu item, or the tooltip for the button, is updated to show that they will remove the primary key. Once the key is correct, save the changes to the table design and close the table designer window.

Now that the Engineers table has a primary key, it is impossible to create rows with duplicates in the EngineerId column. To see what happens when you try, open a new query window for the JoBS database and execute the following T-SQL statement:

INSERT INTO Engineers VALUES (1, 'Bill Jones', 19.85, null)

The primary key prevents the new row being created and reports an error. NB: Note the default name of PK_Engineers that has been applied to the new primary key constraint.

Violation of PRIMARY KEY constraint 'PK_Engineers'. Cannot insert duplicate key in object 'dbo.Engineers'.
The statement has been terminated.

If you review the contents of the table you will see that the new row was not stored.

Creating a Composite Key Using the Table Designer

A composite key is a primary key containing more than one column. Once defined, the combined value of all of the primary key columns must be unique but the individual columns may include duplicates. This is the type of key that is required for the EngineerStock table. In this table the EngineerId may be repeated, as may the values in the PartNumber column. However, we want to restrict the data so that the combination of these columns is always unique. This will ensure that there is only one stock figure per engineer and part.

In the table designer for the EngineerStock, select both the EngineerId and PartNumber column. To do this, select the EngineerId column first and then click the row selector button for the PartNumber column whilst holding the Ctrl key. Once both columns are highlighted, click the Set Primary Key toolbar button or menu item. A key icon will appear alongside both columns to indicate that the primary key has been set. You can now save the table and close the designer window.

SQL Server Composite Primary Key Definition

Creating a Primary Key Using Transact-SQL

Primary keys can be generated using T-SQL commands, either at the time of creation of the table or at a later date. To create a constraint when creating a table, the CONSTRAINT clause is used within the parentheses containing the column specifications. To create a simple primary key with a clustered index, use the following syntax:

CONSTRAINT constraint-name PRIMARY KEY CLUSTERED (column1, column2, ..., columnN)

The constraint-name provides a unique name for the primary key. This name must be a unique name for the entire set of database objects rather than just being unique amongst the primary keys. The comma-separated column list specifies which columns make up the primary key.

NB: This is a simplified syntax for the creation of keys with default settings. Additional settings related to the primary key's index exist and will be examined in a later article describing indexes.

To use the above syntax within the creation of the Engineers table, you could execute the following statement:

CREATE TABLE Engineers
(
    EngineerId INT NOT NULL,
    EngineerName VARCHAR(50) NOT NULL,
    HourlyRate MONEY NOT NULL,
    Photograph VARBINARY(MAX)
    CONSTRAINT PK_Engineers PRIMARY KEY CLUSTERED 
    (
        EngineerId
    )
)

NB: The above statement will fail as the Engineers table already exists.

Altering an Existing Table

A primary key can be added to an existing table using the ALTER TABLE statement with the ADD clause. The syntax for the constraint element is the same as for when creating tables:

ALTER TABLE table-name ADD
CONSTRAINT constraint-name PRIMARY KEY CLUSTERED (column1, column2, ..., columnN)

We can test this syntax by adding a primary key to the Parts table. This table has a natural key in the unique part number. To set this column to be the primary key, execute the following statement:

ALTER TABLE Parts ADD CONSTRAINT PK_Parts PRIMARY KEY CLUSTERED (PartNumber)

JoBS Database Primary Keys

The remainder of the primary keys can now be added to the JoBS tutorial database. Use a combination of the graphical user interface tools and Transact-SQL to add the following primary keys.

TableColumn(s)
ContractsContractNumber
CustomersCustomerNumber
CustomerAddressesAddressId
CustomerComplaintsComplaintReference
EngineerSkillsEngineerId, SkillCode
EngineerWorkingAreasEngineerId, AreaCode
GeographicalAreasAreaCode
JobsJobId
RequiredSkillsStandardJobId, SkillCode
SkillsSkillCode
StandardJobsStandardJobId
UsedPartsJobId, PartNumber
Link to this Page17 August 2008
RSS RSS Feed