This web site uses cookies. By using the site you accept the cookie policy.This message is for compliance with the UK ICO law.

Programming Concepts

Database Normalisation

The seventeenth part of the SQL Server Programming Fundamentals tutorial discusses the concept of database normalisation. Normalisation is a database design technique that minimises duplication of information, reducing the risk of introducing data errors.

Normal Forms

The normal forms of relational databases describe the level of vulnerability that a particular design has to data integrity problems. They define a set of rules that should be applied when designing a database. The forms each have an ordinal number, with "first normal form" being the least normalised and most vulnerable to anomalies. Higher ordinals indicate progressively stricter rules. Additionally, there are some named normal forms, such as Boyce-Codd Normal Form, that apply additional rules.

Normal forms should be considered to be guidelines for database design, rather than absolute rules. There are some situations where it is necessary to break with the normal forms. This is generally to improve the performance of queries that otherwise would require a large number of join operations.

Originally, Edgar F. Codd described the first, second and third normal forms. These three forms are those that are the most commonly adhered to and that are always achievable when designing databases. They will be described in the remaining sections of this article. There are further normal forms, including Boyce-Codd Normal Form and fourth, fifth and sixth normal form. However, these stricter forms are not always achievable.

First Normal Form

For a table to be compliant with first normal form (1NF), it must meet certain conditions. The key rules are:

  • The columns of the table must not require any specific ordering.
  • The rows of the table must not require any specific ordering.
  • It must not be possible to create duplicate rows.
  • Every column of every row must contain exactly one element of data.
  • Rows must not contain hidden information such as row identifiers or hidden timestamps.

These rules can be illustrated with some example tables.

Example 1

Opened text file.Information
Outputted 15 lines to text file.Information
Failed to write line 16.Error

This table holds information, warning and error messages that are logged by a computer program. In order to read the log correctly, the details must be retrieved in the order in which they were written to the database. This breaks the second rule, which states that rows of the table must not require any specific ordering to be understood. It also breaks the third rule, as there is no possible unique candidate key for the table.

To resolve the ordering issue, the date and time of every log entry can be recorded. The candidate key problem can be fixed by adding a surrogate identifier to the table, shown underlined. In the following table, this column has been added and is using an identity column to generate values.

Log IDLog TimeMessageType
115/12/2008 12:01:23.123456Opened text file.Information
215/12/2008 12:01:23.123472Outputted 15 lines to text file.Information
315/12/2008 12:01:23.130120Failed to write line 16.Error

Example 2

NameJob TitleSkills
BobDeveloperC#, SQL
JimDeveloperC#, VB, SQL

In the second example we return to the employee list. In this case, the database designer has created a Skills column to contain the list of skills that each employee has. Where an employee has several skills, they are comma-separated. This breaks the rule that every column of every row contains exactly one element of data.

This issue can be resolved by creating two tables instead of one. The first table contains employees' names and job titles with the name as the primary key. The second table contains one row for each skill for each employee with both columns being included in a composite key. On performing a query that requires both sets of data, a join is made based upon the employee name.

Employees Table
NameJob Title
Employee Skills Table
15 December 2008