BlackWasp
Programming Concepts

Relational Database Concepts

This is the first in a series of articles describing the use of Microsoft's SQL Server database management system (DBMS). The tutorial is aimed at software developers using SQL Server for data storage. Part one reviews database concepts and terminology.

Tutorial Prerequisites

The tutorial has been created using Microsoft SQL Server 2005 for all code samples and downloadable demonstrations. However, all of the general concepts and the majority of the samples are valid for earlier versions of the database management system. They should also be valid for future releases of SQL Server.

To try the tutorial examples, a SQL Server system must be available. This can be a locally installed instance of SQL Server or a version installed on an available networked system. Any service-based edition of SQL Server 2005 is ideal, including the free-of-charge Express edition, which may be downloaded from the Microsoft SQL Server web site. SQL Server Compact Edition (CE) may also be used for some samples, although this edition is designed to be usable on hand-held devices and, as such, has limited functionality. Access to SQL Server Management Studio, SQL Server Management Studio Express or a similar tool is strongly recommended.

What is a Database?

A database is simply a place to store and process structured information. Databases permit manipulation or retrieval of their contained data in usable manners. There are many forms of database that store and organise information using different structures.

Microsoft SQL Server is a database management system or DBMS. This terminology should not be confused with the term, "database". A DBMS is the software that provides the facilities to create and maintain databases and manipulate the information stored within. In the case of SQL Server, the databases that are created are relational databases, so the product is often known as a relational database management system or RDBMS.

What is a Relational Database?

Relational databases are probably the most common type of database used for general-purpose tasks. In a relational database, information is grouped according to its type, generally in tables (see below). For example, in a database designed to hold fleet information you may include a table of employees and a table of vehicles. These two types of structured data would be held separately as they hold fundamentally different information.

In addition to separating information according to its data structure, a relational database allows relationships to be created. A relationship defines a possible link between data types; the actual linkage of data is dependant upon the information held. For example, in the fleet database there may be a relationship between employees and vehicles, indicating that one or more employees drive a particular vehicle.

Normalisation and Denormalisation

Relational databases use the concept of normalisation. Normalisation is a design technique that minimises the duplication of information. It also reduces the risk of errors in data caused by copies of the same information not being kept synchronised. By using relationships, the duplication required can be lessened or eliminated completely. For example, the fleet database could be created with a single table containing all employees and the details of the vehicle that they drive. If two employees use the same vehicle, the vehicle's details will be entered twice and must be maintained in two places. By creating a relationship between the employees table and the vehicles table, the vehicle data appears only once and requires updating in only one place.

The opposite of normalisation is denormalisation. In a denormalised database information is purposefully duplicated. This provides greater performance than a normalised database at the cost of increased database size and the risk of data integrity problems. Denormalised databases are useful where speed of data retrieval is paramount, such as in a database used for reporting purposes only.

Normalisation and denormalisation will be considered in more detail in a later article in this tutorial.

Elements of a Relational Database Schema

There are several key elements to a relational database created using Microsoft SQL Server. Each of these forms a part of the database's schema. The schema is the logical data model that determines the information that may be stored in the database and how it is to be arranged. This section describes some of the key terminology that should be understood before continuing with the tutorial.

Tables

The table is the most important element of a relational database. Tables are where the information within a database is held. A table can be thought of as a grid of information where the columns in the grid define the elements of data required and each row is an individual record. In fact, the terms "columns" and "rows" are those used to describe the table's schema and its data respectively.

Employees Table
Payroll IDNameVehicle Registration
1001John SmithZX08 W4S
1002Bill JonesNX08 DS8
1003Lisa BrownNJ57 D1D
1005Louise GreenNX08 DS8

The above data could be held in a table of employees. The table contains three columns for the payroll ID, name and vehicle registration number; these being defined in the table's schema. The table has four rows of data, one for each employee that has access to a vehicle. The vehicle's details will be held in a separate, related table linked by the unique vehicle registration number.

NB: The storage of data in a table is more complicated than that shown above. However, this is a useful metaphor for this article.

Indices

One of the greatest benefits of holding information in a database is the ability to quickly retrieve it. When querying a database, it is possible to apply criteria to ask for a specific set of rows. For example, returning all employees that use a specific vehicle or returning employees within a particular range of payroll IDs.

If the DBMS needed to scan through all of the data within a table in order to retrieve the desired information, the process would be very slow, particularly for tables with millions of rows. To improve retrieval performance a table can have one or more indices. Each index provides a fast "look-up" facility for rows, as an index in a book allows all references to a topic to be located without reading every page.

In the fleet database example, an index may be created for the Vehicle Registration column. This would facilitate fast queries that required data to be filtered using that column.

Keys

There are two types of key that are important for Microsoft SQL Server databases. The first is the primary key. A primary key is a single column, or group of several columns (compound key), that can be used to uniquely identify rows in a table. Each table in a database may have a single primary key. Once defined, no two rows in the table may contain matching data in the primary key columns. In the fleet database example, the unique Payroll ID value could be the primary key for the Employees table.

Foreign keys are used when defining relationships between tables. A foreign key is a single column, or group of columns, in a table that reference the primary key in another table. This creates a link between the two tables. In the fleet database, a table named "Vehicles" may be used to contain the details of each vehicle owned by the company. The Vehicles table could have a primary key based upon the vehicle registration number. By creating a foreign key in the matching column of the Employees table, the employee and vehicle data become linked.

The information in foreign key columns does not have to be unique. Matching information can appear in multiple rows, as in the example employees table where two people use the same car. For this reason, relationships are often called "one-to-many links", as one value in the primary key of a table may link to many rows in related tables.

NB: A foreign key does not always refer to a secondary table. By linking a foreign key in a table to the primary key of the same table, a self-referencing relationship is created. These can be used to create hierarchical data structures.

Constraints

Constraints are rules that are applied to the information in a database. These are usually used to enforce business rules upon the tabular data. For example, a constraint may reject any attempt to create an Employee row where the employee's age is not within an acceptable range, ensuring compliance with local employment legislation.

Views

Views provide the useful concept of virtual tables. A view gathers specific information from one or more sources and presents it in the format of a single table. The information may be filtered within the view to remove unnecessary information. For example, the Vehicles table in the fleet database could contain a column to hold the type of each vehicle. A view could then be created that returns the employee and vehicle details for all drivers of heavy goods vehicles.

Stored Procedures

A stored procedure is a predefined set of statements that can be executed when required. Stored procedures provide the main means of creating programs within SQL Server databases. They can be as simple as single queries that return data or update rows, or can be complicated programs containing looping and conditional structures. Stored procedures can require the passing of parameters to control their exact execution.

Structured Query Language

The structured query language (SQL) is the language used to query and manipulate information within a SQL Server database. SQL is actually an ISO and ANSI standardised language. However, as with most RDBMS software, SQL Server uses proprietary extensions within its own Transact-SQL (T-SQL) variant of SQL.

SQL Statements

The basic building block of the structured query language is the SQL statement. Using statements, information in a database can be manipulated and queried. The four most common commands are:

  • SELECT. Retrieves a set of rows from one or more tables or views, permitting joining of tables using their primary and foreign keys and filtering of information according to defined criteria.
  • INSERT. Creates rows in a table.
  • UPDATE. Modifies data in one or more rows in a table, depending upon specified criteria.
  • DELETE. Permanently removes one or more data rows from a table.

Transactions

The last concept to be considered in this article is that of transactions. When updating information in a database, it is often necessary to make multiple updates to obtain the correct end result. In the short period between the individual updates, the information in the database could lose integrity.

As an example, a database may be used to hold bank account information. To transfer money between two bank accounts, one account must be debited and the other credited. If one of these updates filed during a transfer, either the bank or its customer would appear to lose money.

To avoid this type of problem, relational database use transactions. A transaction is started before the first update and committed after the final update. Until the transaction is committed, the individual changes do not appear in the database. If an error occurs during the transaction, all of the changes are lost, ensuring that the database's state is still valid. Transactions may also be manually rolled back, undoing all of the changes made since the transaction began.

Transactions must pass the ACID test. The ACID mnemonic is defined as:

  • Atomicity. The data modifications that occur within a transaction are atomic. If the transaction is successful, all changes are committed to the database. If a transaction fails or is rolled back manually, all of the updates are reversed.
  • Consistency. Each command executed by a transaction must result in valid data. Any attempt to create invalid information, breaking the database's consistency, will result in the transaction being rolled back.
  • Isolation. Databases are generally accessible by multiple concurrent users or processes. When several transactions are executing at the same time, each must be isolated from the others. This means that one transaction should not interfere with another and that transactions that operate on the same data should appear to execute in series.
  • Durability. The durability rule states that once a transaction is committed, all of the individual changes to the database are committed and are not lost, irrespective of any external errors or hardware or software failures.

What Next?

In the remainder of the SQL Server Programming Fundamentals Tutorial, we will create, populate and manipulate databases using Microsoft SQL Server. All of the concepts described above, and more, will be used in this series of articles.

Link to this Page18 May 2008
RSS RSS Feed