
Database Normalisation
The seventeenth part of the SQL Server Programming Fundamentals tutorial discusses the concept of database normalisation. Normalisation is a database design technique that minimises duplication of information, reducing the risk of introducing data errors.
Data Anomalies
When information is held in a list format, the data is likely to include some duplication of information. This is often the case when creating a large list with many columns, such as in a spreadsheet format. Duplication can often occur when a single database table is used to hold lots of related information.
Duplication of information causes two key problems in a relational database. Firstly, the additional information can often lead to increased storage requirements. Secondly, and more importantly, you can experience data integrity problems, the majority of which can lead to one or more types of anomaly. Three anomalies are described in the following sections.
Update Anomalies
An update anomaly occurs when all copies of a single piece of information are not updated correctly. For example, consider the following table:
| Name | Job Title | Skill |
|---|
| Bob | Developer | C# |
| Bob | Developer | SQL |
| Jim | Developer | C# |
| Sue | Developer | VB |
| Pat | Project Manager | PRINCE2 |
To allow the table to hold the details of multiple employees, each with a job title and one or more skills, this table includes duplication in every column. This gives many opportunities for update anomalies. For example, if Bob's information is updated so that his job title is "Programmer", this should probably also update all other developer rows to indicate that the new job title replaces the old one for all developers. If not, the update could leave the table in the following state:
| Name | Job Title | Skill |
|---|
| Bob | Programmer | C# |
| Bob | Programmer | SQL |
| Jim | Developer | C# |
| Sue | Developer | VB |
| Pat | Project Manager | PRINCE2 |
The information in the table may still be considered accurate. However, a query of the database that intends to return all of the developers has become more difficult. A worse scenario will occur if the next update to change Bob's title back to "Developer" is only executed against one of his two entries, especially if the new job title is misspelled:
| Name | Job Title | Skill |
|---|
| Bob | Developr | C# |
| Bob | Programmer | SQL |
| Jim | Developer | C# |
| Sue | Developer | VB |
| Pat | Project Manager | PRINCE2 |
Insert Anomalies
Insert anomalies occur when the structure of a database table makes it difficult to insert information. This can happen when two or more distinct areas of information (or domains) exist within a table. In our previous example table, we may want to create a new employee who does not have a specific technical skill. This forces us to use a null value or dummy value for the skill:
| Name | Job Title | Skill |
|---|
| Bob | Developr | C# |
| Bob | Programmer | SQL |
| Jim | Developer | C# |
| Sue | Developer | VB |
| Pat | Project Manager | PRINCE2 |
| Lee | Trainee Programmer | null) |
In this situation, the information in the table seems reasonable. However, when Lee obtains his first skill, simply adding a row, as would be done for other developers, causes an anomaly:
| Name | Job Title | Skill |
|---|
| Bob | Developr | C# |
| Bob | Programmer | SQL |
| Jim | Developer | C# |
| Sue | Developer | VB |
| Pat | Project Manager | PRINCE2 |
| Lee | Trainee Programmer | (null) |
| Lee | Trainee Programmer | C# |
15 December 2008