HPlogo ALLBASE/SQL Database Administration Guide: HP 3000 MPE/iX Computer Systems > Chapter 2 Logical Design

Normalizing the Data

» 

Technical documentation

Complete book in PDF
» Feedback

 » Table of Contents

 » Index

Normalization is a process that helps you arrange data into tables. The goal of normalization is to reduce data redundancy and facilitate efficient updates. In theory, a perfectly normalized database uses data at maximum efficiency and minimum redundancy. In practice, however, normalization does not take into consideration the specific applications of a database. Consequently, normalization is recommended only as a guideline for database design. Use normalization to get your data into relational format (two-dimensional tables) and then alter the table format to fit the specific needs of users and applications.

The process of normalization entails examining the data and the table format of a database against each of the criteria for each of the normal forms. Although there are at least five normal forms, most database tables need to be processed only to the second or third normal form. Therefore, only the first three normal forms are discussed here. When data is normalized, each lower normal form is a prerequisite to the next higher normal form. In other words, data must be in first normal form before it can be put into second normal form.

First Normal Form

In first normal form, data contains rows that have the same number of columns. Unnormalized data is not so orderly. For example, some purchase orders contain one order item, others contain two or more. Thus, if each order number from the purchase order in Figure 2-1 “Identifying Data Categories” were to occupy a single row, each row would have a different number of columns. By separating the data into three groups as in Figure 2-3 “First Normal Form: Removing Repeating Groups”, repeated categories are removed, leaving each row with the same number of columns.

Figure 2-3 First Normal Form: Removing Repeating Groups

[First Normal Form: Removing Repeating Groups]

Second Normal Form

In second normal form, shown in Figure 2-4 “Second Normal Form: Establishing Dependency”, all attributes are functionally dependent on the primary key.

Figure 2-4 Second Normal Form: Establishing Dependency

[Second Normal Form: Establishing Dependency]

Attributes are functionally dependent when their values depend directly upon the value of the primary key. The primary key in the Order Item information is the order and item number combination. Each such combination represents a unique row.

Since the vendor part number and the order quantity relate to a particular order and item number, they are functionally dependent on the primary key. However, the unit price depends on the order quantity (which, in turn, depends on the order and item number), but it does not depend directly on order or item number. This is called a transitive dependency.

Third Normal Form

In third normal form, data does not have transitive dependencies. In Figure 2-5 “Third Normal Form: Removing Transitive Dependencies”, unit price is removed from the Order Items information and placed in a new set of information called Supply Price, which is used to calculate the price of an item dependent on the quantity ordered and date delivered.

Figure 2-5 Third Normal Form: Removing Transitive Dependencies

[Third Normal Form: Removing Transitive Dependencies]

Third normal form relieves update problems. For example, in second normal form, when an item's price changes, all orders containing that item would have to be altered. In third normal form, only the rows for that item in the Supply Price table need to be changed.

Normalized data provides a starting point for table definition. Note that the data in Figure 2-3 “First Normal Form: Removing Repeating Groups”, Figure 2-4 “Second Normal Form: Establishing Dependency”, and Figure 2-5 “Third Normal Form: Removing Transitive Dependencies” is in tabular form. The next step in table design is to take the normalized data and fit it into ALLBASE/SQL columns, tables, views, and indexes.